WO2022270805A1

WO2022270805A1 - Automatic packet analysis-based automatic network failure resolution device and method therefor

Info

Publication number: WO2022270805A1
Application number: PCT/KR2022/008280
Authority: WO
Inventors: 김신규; 오현세
Original assignee: (주)소울시스템즈
Priority date: 2021-06-21
Filing date: 2022-06-13
Publication date: 2022-12-29
Also published as: KR102376349B1

Abstract

The present invention is for providing an automatic packet analysis-based automatic network failure resolution device and a method therefor. The present invention collects packet information on a network status, automatically determines whether there is a failure in a specific area through analyzed data, and automatically resolves a network failure. Accordingly, the present invention provides, by means of one click, a guide for accurate and fast cause identification and resolution for various and complicated network issues (performance, failure, etc.), enables any network operator to easily and conveniently manage the network, and interworks with various systems by means of information collection and analysis functions so as to provide a customized network management service tailored to a user's needs.

Description

Device and method for automatically solving network failure based on automatic packet analysis

The present invention relates to an intelligent network management system, and in particular to an apparatus and method for automatically solving network failures based on automatic packet analysis. The invention collects packet information on network conditions, automatically determines from the analyzed data whether there is a failure in a specific area, and automatically resolves network failures. It thereby provides, with one click, a guide for accurate and quick cause identification and resolution of various and complex network issues (performance, failure, etc.), enables any network operator to manage the network easily and conveniently, and links with various systems through its information collection and analysis functions, making it suitable for providing customized network management services tailored to user needs.

In general, intelligent network technology collectively refers to the network and infrastructure technologies commonly used for the 4th industrial revolution and intelligence-based innovative growth. In detail, it includes software-defined networking (SDN), network functions virtualization (NFV), network intelligence technology, low-latency/time-deterministic network technology, quantum information communication technology, network structure technology, transport network technology, and wired/wireless access technology.

In addition, network intelligence technology is a technology that automatically performs network functions by repeating a series of procedures, such as automatic data collection and feedback for autonomous decision-making, using artificial intelligence technologies such as machine learning.

The meaning of these intelligent networks is evolving over time, primarily leading to breakthroughs in computation and algorithms.

Prior art that has been disclosed includes Korean Patent Registration No. 10-1998863, 'System for Communication Failure Management and Maintenance of Network Equipment', and Korean Patent Registration No. 10-2133001, 'Network Management Apparatus, Network Management System and Network Management Method'.

A network outage directly disrupts business. In addition, business processing delays caused by degraded network performance lead to direct losses for the organization. The average loss from a single network outage has been reported as USD 402,542 in the US. (Source: The Rise of AIOps: How Data, Machine Learning, and AI Will Transform Performance Monitoring, Appdynamics News, 2018.12.17.) Therefore, it is necessary to minimize network interruptions.

The Uptime Institute, which evaluates the performance of networks, has studied publicly reported cases of network outages. Looking at this, among IT failures, network failures increased significantly from 19% in 2017 to 32% in 2018. Therefore, in the event of a network outage, a technology capable of quickly tracking the cause and suggesting a solution is required.

Conventional network management includes a Network Management System (NMS), a Traffic Management System (TMS), a Data Packet Inspector (DPI), and a Packet Analyzer.

However, NMS (Network Management System) focused on equipment and line monitoring has limitations in solving complex network issues. In addition, TMS (Traffic Management System) for network traffic management has a limitation in not supporting in-depth analysis of payload. In addition, DPI (Data Packet Inspector) and Packet Analyzer are very complicated and difficult to use, and have problems in that they require a high level of expertise.

Accordingly, the present invention has been proposed to solve the above conventional problems. An object of the present invention is to provide a device and method for automatically solving network failures based on automatic packet analysis that collects packet information about the network state, automatically determines from the analyzed data whether there is a failure in a specific area, and automatically resolves network failures. The device and method provide, with one click, a guide for accurate and quick cause identification and resolution of various and complex network issues (performance, failure, etc.), enable any network operator to manage the network easily and conveniently, and can provide customized network management services tailored to user needs by linking with various systems through information collection and analysis functions.

FIG. 1 is a conceptual diagram of an apparatus for automatically solving network failures based on automatic packet analysis according to an embodiment of the present invention.

As shown in FIG. 1, the automatic network failure resolution device 110 of the intelligent network management system 100, which performs network management, is characterized in that it is configured to include the following:

a control unit 120 that controls the device so that it collects packet information about the network state, automatically determines from the analyzed data whether there is a failure in a specific area, and automatically resolves the network failure, thereby providing, with one click, a guide for accurate and quick cause identification and resolution of various and complex network issues, enabling any network operator to manage the network easily and conveniently, and providing customized network management services tailored to user needs in conjunction with various systems through information collection and analysis functions;

a packet capture unit 130 that, under the control of the control unit 120, receives packet data from the network equipment 210 of the data center or the remote intelligent network management device 220 through a network interface card (NIC), combines the received packet data into one data stream to generate the raw data necessary for generating an information bundle, stores it in a raw packet storage buffer, and measures data, including network packets, SNMP TRAP, and SYSLOG information, of the data center 210 or the remote intelligent network management device 220;

an information bundle generating unit 140 that, under the control of the control unit 120, generates metadata for the packets collected by the packet capture unit 130, the metadata including packet check time, packet size, session ID, MAC address, and TCP information, generates a session information bundle, BPS information bundle, PPS information bundle, RTT information bundle, timeout information bundle, TCP information bundle, Remarks information bundle, and event information bundle, and simultaneously compresses and stores the data for each type of information bundle;

a performance indicator generator 150 that, under the control of the control unit 120, receives the information bundle generated by the information bundle generating unit 140 and generates the performance indicators required for network management, namely basic performance indicators and additional performance indicators;

a performance indicator analysis unit 160 that, under the control of the control unit 120, selects the type of information to be used from the information bundle based on the performance indicators generated by the performance indicator generator 150, and then analyzes the performance of the network to generate a performance indicator analysis result; and

a network failure handling unit 170 that automatically determines whether there is a failure in a specific area using the analysis result of the performance indicator analysis unit 160 and automatically resolves the network failure.

FIG. 2 is a conceptual diagram showing an example in which the present invention of FIG. 1 is applied.

As shown in FIG. 2, when the network failure processing unit 170 determines that a network failure has occurred, it either presents a recommendation or controls the network itself. If it cannot control the network itself, or the action is not authorized, it presents a solution in the form of a recommendation. If it can control the network itself, it connects through SSH and uses a remote shell command to access a specific device in the network and change its settings or reboot it.

When traffic passing through a specific hop in a very busy network suddenly slows down, the network failure handling unit 170 determines that a packet loop is expected at that hop and delivers the recommendation 'check cable wiring'. If a network service slows down at a specific time every day, it handles the failure automatically by connecting to a separate QoS device or switch and applying QoS to the specific location discovered through NetFlow, using the QoS function to adjust the total amount of server requests so that requests from that location do not cause congestion. It also recommends adjusting clients' network access times so that requests are distributed across time slots, and recommends server and network expansion, including a proposal for the additional bandwidth required.

FIG. 3 is a flowchart illustrating a method for automatically solving network failures based on automatic packet analysis according to an embodiment of the present invention.

As shown in FIG. 3, when the automatic network failure resolution device 110 of the intelligent network management system 100 performs automatic resolution of a network failure, the method is characterized in that it includes the following steps:

a packet capture step (ST1), in which the packet capture unit 130 samples NetFlow information from the network equipment 210 of the data center or receives packet data from the remote intelligent network management device 220 through a network interface card (NIC), bundles the received packet data into one data stream to generate the raw data needed to create an information bundle, stores it in a raw packet storage buffer, and measures data, including network packets, SNMP TRAP, and SYSLOG information, of the data center 210 or the remote intelligent network management device 220;

an information bundle generation step (ST2), in which, after the packet capture step, the information bundle generating unit 140 generates metadata for the collected packets, the metadata including packet confirmation time, packet size, session ID, MAC address, and TCP information, creates a session information bundle, BPS information bundle, PPS information bundle, RTT information bundle, timeout information bundle, TCP information bundle, Remarks information bundle, and event information bundle, and simultaneously compresses and stores the data for each type of information bundle;

a performance indicator generation step (ST3), in which, after the information bundle generation step, the performance indicator generation unit 150 receives the generated information bundle and generates the performance indicators necessary for network management, namely basic performance indicators and additional performance indicators;

a performance indicator analysis step (ST4), in which, after the performance indicator generation step, the performance indicator analysis unit 160 selects the type of information to be used from the information bundle based on the generated performance indicators, analyzes the performance of the network, and generates a performance indicator analysis result; and

a network failure handling step (ST5), in which, after the performance indicator analysis step, the network failure processing unit 170 automatically determines whether there is a failure in a specific area using the analysis result and automatically resolves the network failure.

FIG. 4 is a detailed flowchart of the network failure processing in FIG. 3.

As shown in FIG. 4, the network failure handling step is characterized in that it includes a failure handling determination step (ST11, ST12) and a failure handling execution step (ST13, ST14). In the failure handling determination step (ST11, ST12), whether there is a failure in a specific area is determined from data obtained by analyzing NetFlow information and packet information, and the failure handling to be performed is determined. In the failure handling execution step (ST13, ST14), if a network failure was determined, a recommendation is presented or the network is controlled directly. If the network cannot be controlled directly or the action is not permitted, a solution is presented in the form of a recommendation. If the network can be controlled directly, the device connects via SSH and uses a remote shell command to access a specific device on the network and change its settings or reboot it. The failure handling is then performed and the network failure handling result is provided.
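The choice between presenting a recommendation and controlling the network over SSH can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `FailureAction` type, the `decide_action` function, and the example host and command are hypothetical, and the sketch only builds the argument list for an `ssh host command` invocation without executing it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailureAction:
    kind: str                              # "remote_control" or "recommendation"
    message: str                           # text shown to the operator
    command: Optional[List[str]] = None    # argv for ssh, only for remote control


def decide_action(can_control: bool, authorized: bool, host: str,
                  remote_command: str, recommendation: str) -> FailureAction:
    """Pick the failure handling described for steps ST13/ST14 (hypothetical API)."""
    if can_control and authorized:
        # The device can act by itself: connect via SSH and run a remote shell command.
        return FailureAction("remote_control",
                             f"run '{remote_command}' on {host}",
                             ["ssh", host, remote_command])
    # Otherwise fall back to presenting the solution as a recommendation.
    return FailureAction("recommendation", recommendation)
```

A caller could pass the returned `command` list to a process launcher only after its own access controls have approved the action.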

The apparatus and method for automatically solving network failures based on automatic packet analysis according to the present invention collect packet information on network conditions, automatically determine from the analyzed data whether there is a failure in a specific area, and automatically resolve network failures. They thereby provide, with one click, a guide for accurate and quick cause identification and resolution of various and complex network issues (performance, failure, etc.), enable any network operator to manage the network easily and conveniently, and have the effect of providing customized network management services tailored to user needs by linking with various systems through information collection and analysis functions.

In addition, the present invention can operate as a single system (All-In-One) covering information collection, analysis, diagnosis, and results; allows the optimal system form to be selected to suit the operating environment (Portable, Rack Mount, Rugged PC, Cloud, etc.); and can be used immediately without pre-configuration (Zero Configuration).

In addition, the present invention uses a packet collection technique based on a general NIC (Network Interface Controller), so it does not depend on a vendor. Because it has a built-in automatic L7 protocol classification engine, it is not affected by the user environment. It can also interwork with EMS, SIEM, NMS, and similar systems (via REST API), and it allows customized services according to user needs.

In addition, the present invention has the effect of reducing MTTR (mean time to repair) by more than 1/5 while being about 1/4 cheaper than foreign solutions for the same purpose. In the prior art, it takes about 1 to 2 weeks to collect system setting information, analyze it, solve the problem, and write a report. The present invention, by contrast, can complete information collection, analysis, report writing, and problem solving within about 2 to 3 days. (Here, the total time required to solve the problem is a general empirical value and may vary depending on the nature of the problem.) The present invention can shorten the preparation (setup) time for collecting network information and the time needed to identify the cause of a problem through analysis. It can also reduce the action and recovery time for problem resolution and the time needed to prepare the final report.

In addition, in the case of the prior art, a network and solution operation expert is absolutely necessary for network management, whereas the present invention has the advantage that it can be operated even by a beginner network engineer.

A preferred embodiment of the apparatus and method for automatically solving network failures based on automatic packet analysis according to the present invention, configured as described above, will be described in detail with reference to the accompanying drawings. In the following description, detailed descriptions of related known functions or configurations are omitted where they could unnecessarily obscure the subject matter of the present invention. In addition, the terms used below are defined in consideration of their functions in the present invention and may vary according to the intention of a user or operator or according to precedent; accordingly, the meaning of each term should be interpreted based on the contents throughout this specification.

First, the present invention is intended to collect packet information on the network state, automatically determine from the analyzed data whether there is a failure in a specific area, and automatically resolve the network failure. It thereby provides, with one click, a guide for accurate and quick cause identification and resolution of various and complex network issues (performance, failure, etc.), allows any network operator to manage the network easily and conveniently, and provides customized network management services tailored to user needs by linking with various systems through information collection and analysis functions.

The automatic network failure resolution device 110 of the intelligent network management system 100 that performs network management includes a control unit 120, a packet capture unit 130, an information bundle generator 140, a performance indicator generator 150, a performance indicator analysis unit 160, and a network failure processing unit 170.

The control unit 120 controls the automatic network failure resolution device 110 so that it collects packet information on the network state, automatically determines from the analyzed data whether there is a problem in a specific area, and automatically resolves the network problem. In this way it provides, with one click, a guide for accurate and quick cause identification and resolution of various and complex network issues, allows any network operator to manage the network easily and conveniently, and provides customized network management services according to user needs by linking with various systems through information collection and analysis functions.

The packet capture unit 130, under the control of the control unit 120, receives packet data from the network equipment 210 of the data center or the remote intelligent network management device 220 through a network interface card (NIC). It bundles the received packet data into one data stream, generates the raw data necessary for generating an information bundle, stores it in a raw packet storage buffer, and measures data, including network packets, SNMP TRAP, and SYSLOG information, of the data center 210 or the remote intelligent network management device 220.

The information bundle generator 140, under the control of the control unit 120, generates metadata about the packets collected by the packet capture unit 130. The metadata includes packet check time, packet size, session ID, MAC address, and TCP information. The information bundle generator 140 creates a session information bundle, BPS information bundle, PPS information bundle, RTT information bundle, timeout information bundle, TCP information bundle, Remarks information bundle, and event information bundle, and simultaneously compresses and stores the data for each type of information bundle.

The performance indicator generation unit 150, under the control of the control unit 120, receives the information bundle generated by the information bundle generation unit 140 and generates the performance indicators required for network management, namely basic performance indicators and additional performance indicators.

The performance indicator analyzer 160, under the control of the control unit 120, selects the type of information to be used from the information bundle based on the performance indicators generated by the performance indicator generator 150, and then analyzes the performance of the network to produce a performance indicator analysis result.

The network failure processing unit 170 automatically determines whether or not there is a failure in a specific area using the analysis result of the performance indicator analysis unit 160 and automatically solves the network failure.

The operation of the present invention will be described in more detail as follows.

The packet capture unit 130 receives information from the data center 210 as packets and manages the packet data as an information bundle. When packets are collected in the NIC, the packet capture unit 130 distributes each packet to an individual queue (hardware buffer) inside the NIC for load distribution. The application program then takes the data out of the hardware buffers and processes it.

In addition, the packet capture unit 130 creates a pre-specified number of queues in the NIC hardware itself and allocates separate threads for reading the NIC data, one per queue. A separate buffer is created in advance so that raw packets in the NIC internal queues can be moved to it and stored. Whether packets have accumulated in a queue can be checked automatically or manually. With automatic checking, there is a delay between the system checking the queue and notifying the program of the result, because the process is "Checking the queue in the system → Sending messages to the program → Processing messages in the program → Processing the queue", and the 'sending messages to the program' and 'processing messages in the program' steps cause the delay. The delay is therefore avoided by manual control based on an infinite loop, that is, by performing the process of "checking the queue in the program → processing the queue → repeating".
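The "check the queue in the program → process the queue → repeat" loop with one thread per queue can be sketched as below. This is only an illustration under stated assumptions: `collections.deque` objects stand in for the NIC hardware queues, the names `poll_queue` and `capture` are hypothetical, and a real capture engine would read NIC queues from native code rather than Python.

```python
import threading
from collections import deque


def poll_queue(queue: deque, sink: list, stop: threading.Event) -> None:
    """Busy-poll one queue: check it, process what is there, repeat.

    No notification or message passing is involved, which is the source of
    delay that the text attributes to automatic checking.
    """
    while True:
        try:
            sink.append(queue.popleft())   # process one raw packet
        except IndexError:                 # queue is currently empty
            if stop.is_set():              # capture ended and queue is drained
                return


def capture(queues):
    """Run one polling thread per queue and return what each thread collected."""
    stop = threading.Event()
    sinks = [[] for _ in queues]
    threads = [threading.Thread(target=poll_queue, args=(q, s, stop))
               for q, s in zip(queues, sinks)]
    for t in threads:
        t.start()
    stop.set()                             # no more packets will arrive in this demo
    for t in threads:
        t.join()
    return sinks
```

The loop trades CPU time for latency: each thread spins on its own queue instead of sleeping until it is notified.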

In addition, the packet capture unit 130 runs the threads simultaneously to calculate and check the size of the data accumulated in each queue. Then, for each queue, the location at which its data will be stored in the buffer is selected. If the size of the data to be stored is larger than the remaining size of the buffer, the buffer is replaced with a new, empty buffer. Once the storage locations have been specified in advance, each thread simultaneously writes its data to the single buffer. In general, a problem can arise when multiple threads write to the same location in a single buffer at the same time, but here there is no problem because the areas being written do not overlap. Since the packet capture unit 130 specifies the storage locations in advance, no memory is wasted. The prior art either uses a single thread, which limits the writing speed (up to around 10 Gbps), or solves the speed problem with separate FPGA-based hardware, whereas the present invention has the advantage of enabling high-speed processing with a purely software-based implementation.
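The idea of reserving non-overlapping regions first and then letting every thread write into one shared buffer can be sketched as follows. The function names are hypothetical, and the sketch raises an error when the data does not fit, where the text says the buffer would be swapped for a new, empty one.

```python
import threading
from itertools import accumulate


def reserve_offsets(sizes):
    """Start offset of each queue's region: prefix sums of the data sizes."""
    return [0] + list(accumulate(sizes))[:-1]


def parallel_store(chunks, capacity):
    """Write each chunk into its own pre-reserved region of a single buffer."""
    sizes = [len(c) for c in chunks]
    total = sum(sizes)
    if total > capacity:
        # Assumption: the caller would switch to a new, empty buffer here.
        raise ValueError("data does not fit in the remaining buffer space")
    buffer = bytearray(capacity)
    offsets = reserve_offsets(sizes)

    def write(chunk, offset):
        # Regions never overlap, so no lock is needed around this write.
        buffer[offset:offset + len(chunk)] = chunk

    threads = [threading.Thread(target=write, args=(c, o))
               for c, o in zip(chunks, offsets)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bytes(buffer[:total]), offsets
```

Because the offsets are computed before any write starts, the regions are packed back to back and no space is left unused between them.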

In addition, the information bundle generating unit 140 manages a buffer for storing a plurality of packets, and each packet includes an L2 header, an L3 header, an L4 header, and a packet body (body and payload).

The information bundle of the information bundle generator 140 has the structure 'first time stored, last time stored, information block 1, information block 2, information block 3, ..., information block n'. Each information block has the structure 'compressed size, actual size, compressed binary information data'. Fixed-length information data has the structure 'fixed width data 1, fixed width data 2, fixed width data 3, fixed width data 4, ..., fixed width data n'. Variable-length information data has the structure 'fixed-width data 1 (including variable-length information), variable-length data 1, fixed-width data 2 (including variable-length information), variable-length data 2, fixed-width data 3 (including variable-length information), variable-length data 3, ..., fixed-width data n (including variable-length information), variable-length data n'.
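An information block of the form 'compressed size, actual size, compressed binary information data' holding fixed-width records might be encoded as in the sketch below. The field widths, byte order, record layout (`RECORD`), and the use of zlib are all assumptions made for illustration; the source does not specify the compression method or the field sizes.

```python
import struct
import zlib

BLOCK_HEADER = struct.Struct("<II")  # compressed size, actual size (widths assumed)
RECORD = struct.Struct("<IIQ")       # hypothetical record: session ID, second, bit count


def pack_block(records):
    """Serialize fixed-width records and wrap them in an information block."""
    raw = b"".join(RECORD.pack(*r) for r in records)
    compressed = zlib.compress(raw)
    return BLOCK_HEADER.pack(len(compressed), len(raw)) + compressed


def unpack_block(block):
    """Read the header, decompress the payload, and split it into records."""
    compressed_size, actual_size = BLOCK_HEADER.unpack_from(block)
    start = BLOCK_HEADER.size
    raw = zlib.decompress(block[start:start + compressed_size])
    if len(raw) != actual_size:
        raise ValueError("decompressed size does not match the block header")
    return [RECORD.unpack_from(raw, i) for i in range(0, actual_size, RECORD.size)]
```

Storing both sizes lets a reader skip over a block without decompressing it and allocate the output buffer before decompression.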

The information bundle generator 140 generates metadata for individual packets. The metadata includes packet confirmation time, packet size, session ID, MAC address, and various types of TCP-specific information.

The information bundle generated by the information bundle generator 140 includes a session information bundle, a BPS information bundle, a PPS information bundle, an RTT information bundle, a timeout information bundle, a TCP information bundle, a Remarks information bundle, an event information bundle, and the like.

The session information bundle stores session ID, client IP/port, server IP/port, L4 protocol, and L7 protocol information.

The BPS information bundle stores session ID, transmission time (in seconds), data size transmitted per second from client to server, and data size information transmitted from server to client per second.

The PPS information bundle stores the session ID, transmission time (in seconds), the number of packets transmitted per second from the client to the server, and the number of packets transmitted per second from the server to the client.
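The per-second BPS and PPS records described above could be derived from packet metadata roughly as follows. This is a sketch with assumed inputs: each packet is a tuple of timestamp in seconds, session ID, size in bytes, and a direction label (`"c2s"` or `"s2c"`), none of which are names taken from the source.

```python
from collections import defaultdict


def aggregate(packets):
    """Group packets by (session ID, whole second) and direction.

    Returns two dicts keyed by (session_id, second): bits per second and
    packets per second, each split into client-to-server and server-to-client.
    """
    bps = defaultdict(lambda: {"c2s": 0, "s2c": 0})
    pps = defaultdict(lambda: {"c2s": 0, "s2c": 0})
    for timestamp, session_id, size_bytes, direction in packets:
        key = (session_id, int(timestamp))      # truncate to the transmission second
        bps[key][direction] += size_bytes * 8   # BPS is counted in bits
        pps[key][direction] += 1
    return dict(bps), dict(pps)
```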

The RTT (Round Trip Time) information bundle stores session ID, transmission delay time from client to server, and transmission delay time information from server to client.

The entire session information and occurrence time information are stored in the timeout information bundle.

The TCP information bundle stores the occurrence time and session information of TCP SYN, TCP RST, TCP DUP ACK, and TCP packet retransmission events, as well as other TCP problem information, such as the occurrence time and the type of problem (TCP Zero Window, Port Reused, Out of Order).

The Remarks information bundle stores HTTP request/response headers, DNS query and response results, SMTP email sender ID, FTP/IMAP/POP3 error content information.

The event information bundle stores information on events that occur when a value rises above or falls below a predefined threshold, or when its rate of change exceeds a predefined rate.

The performance indicator generation unit 150 generates performance indicators of BPS, PPS, latency, and timeout included in the basic performance indicators.

In addition, the performance indicator generation unit 150 generates, as additional performance indicators, one or more of the following: the number of flows generated by time and by IP, TCP performance indicators, the list of IPs that provide TCP-based services, the list of IPs that provide UDP-based services, the MAC address for each IP, the data usage status for each port number, and performance indicators for each L7 protocol.

TCP performance indicators include TCP RST, TCP Zero Window, TCP DUP ACK, TCP retransmission, TCP port reuse, and TCP packet out-of-order performance indicators. Performance indicators for each L7 protocol may include analysis by DNS query result, HTTP connection status, and measurement of the amount of SMTP data transmitted for each sender/receiver.

The performance indicator analyzer 160 determines which performance indicator analysis to use from among BPS-based analysis, PPS-based analysis, timeout-based analysis, TCP RST-based analysis, TCP Zero Window analysis, TCP DUP ACK analysis, TCP retransmission analysis, TCP port reuse analysis, TCP packet out-of-order analysis, HTTP error status analysis, and additional performance indicator analysis.

If the performance indicator analysis is BPS-based analysis, the performance indicator analysis unit 160 analyzes it as 'traffic surge' if the traffic is 85% or more of the total available bandwidth, and if the traffic surge condition lasts for more than 60 seconds, it is classified as 'traffic excessive state persistence'. If more than 50% of the total traffic is concentrated on a single IP, it is analyzed as 'concentration of traffic to a specific IP', and if the traffic in use is less than 2% of the total available bandwidth, it is analyzed as 'suspected network failure'.
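The BPS-based rules can be expressed as a small function, sketched here under assumptions: the input is a list of per-second traffic totals ending at the current second, the link capacity in the same unit, and a per-IP traffic map for the current second. The function name and parameters are hypothetical, and "lasts for more than 60 seconds" is read as more than 60 consecutive one-second samples at or above the surge level.

```python
def analyze_bps(per_second_bps, capacity_bps, per_ip_bps):
    """Apply the 85% / 60 s / 50% / 2% rules to the most recent samples."""
    findings = []
    current = per_second_bps[-1]

    # Count how many consecutive seconds, ending now, were at or above 85%.
    surge_run = 0
    for value in reversed(per_second_bps):
        if value < 0.85 * capacity_bps:
            break
        surge_run += 1

    if surge_run >= 1:
        findings.append("traffic surge")
    if surge_run > 60:
        findings.append("traffic excessive state persistence")

    total = sum(per_ip_bps.values())
    if total and max(per_ip_bps.values()) > 0.5 * total:
        findings.append("concentration of traffic to a specific IP")

    if current < 0.02 * capacity_bps:
        findings.append("suspected network failure")
    return findings
```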

In PPS-based analysis, if broadcast packets account for more than 70% of all packets, it is analyzed as 'high bandwidth occupancy due to rapid increase in broadcast packets', and if non-IP packets account for more than 50% of all packets, it is analyzed as 'unknown packets occupy a large amount of bandwidth'.

In the case of timeout-based analysis, if timeouts occur for more than 20 IPs per second during the period specified by the user, it is analyzed as 'suspicion of unavailability of service due to network interface shutdown or equipment outage'. If timeouts occur simultaneously for more than 10 but fewer than 20 IPs per second during the period specified by the user, it is analyzed as 'suspicion of service interruption due to cable or GBIC (Gigabit Interface Converter) failure'.
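A minimal sketch of the timeout classification follows. The function name is hypothetical, and the boundary handling is an assumption: the text leaves the exact behavior at 10 and 20 IPs ambiguous, so this sketch treats 20 or more as the first case and 10 to 19 as the second.

```python
from typing import Optional


def classify_timeouts(ips_timed_out_per_second: int) -> Optional[str]:
    """Map the number of IPs timing out per second to the suspected cause."""
    if ips_timed_out_per_second >= 20:   # assumed inclusive boundary
        return ("suspicion of unavailability of service due to "
                "network interface shutdown or equipment outage")
    if ips_timed_out_per_second >= 10:   # assumed inclusive boundary
        return "suspicion of service interruption due to cable or GBIC failure"
    return None                          # below both thresholds: no finding
```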

In the case of TCP RST-based analysis, if the same server sends RST more than 10 times per second, it is analyzed as 'a request arrived for a destination port that does not exist on the server side, or a connection was attempted to a port that had already been disconnected'. If the same client sends RST 5 or more times per second, it is analyzed as 'the application terminates the connection using Reset instead of FIN'. If the same client/server generates RST 3 to 4 times per second, it is analyzed as 'either the server or the client terminated without notifying the other side'.
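The three RST rules can be sketched as a single classifier. The `sender` labels, the function name, and the shortened result strings are hypothetical; the rule order (server rule, then client rule, then the 3 to 4 range for either side) is also an assumption, since the text does not say how overlapping cases are prioritized.

```python
from typing import Optional


def classify_rst(sender: str, rst_per_second: int) -> Optional[str]:
    """sender is 'server' or 'client'; rst_per_second is the RST rate from that host."""
    if sender == "server" and rst_per_second > 10:
        return "request to a nonexistent or already disconnected port"
    if sender == "client" and rst_per_second >= 5:
        return "application terminates with Reset instead of FIN"
    if 3 <= rst_per_second <= 4:
        return "one side terminated without notifying the other"
    return None   # rate matches none of the described cases
```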

In the case of TCP Zero Windows analysis, if the TCP Zero Window phenomenon occurs more than 10 times per second, it is analyzed as 'suspicion of zero window creation due to errors in security devices such as firewalls and IPS or WAN accelerators'.

In case of TCP DUP ACK analysis, if DUP ACK occurs more than 60 times per second in a specific IP, it is analyzed as 'Network Congestion'.

In the case of TCP retransmission analysis, if TCP retransmission occurs more than 1000 times per second in a specific IP, it is analyzed as 'suspicion of loop occurrence in the duplication section'.

In TCP port reuse analysis, if TCP port reuse is confirmed more than 3 times per second, it is analyzed as 'client-side local port exhaustion and server time wait state maintenance suspicion'.

In the case of TCP packet out-of-order analysis, if out-of-order occurs more than 3 times per second, it is analyzed as 'suspicion of TCP segment loss due to packet loss'.
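The TCP Zero Window, DUP ACK, retransmission, port reuse, and out-of-order rules all share one shape: a per-second count compared against a threshold. A table-driven sketch is shown below; the counter names and the `analyze_tcp` function are hypothetical, while the thresholds and finding texts follow the paragraphs above.

```python
# counter name -> (threshold per second, finding reported when the count exceeds it)
TCP_RULES = {
    "zero_window": (10, "suspicion of zero window creation due to errors in "
                        "security devices such as firewalls and IPS or WAN accelerators"),
    "dup_ack": (60, "Network Congestion"),
    "retransmission": (1000, "suspicion of loop occurrence in the duplication section"),
    "port_reuse": (3, "client-side local port exhaustion and server time wait "
                      "state maintenance suspicion"),
    "out_of_order": (3, "suspicion of TCP segment loss due to packet loss"),
}


def analyze_tcp(counts_per_second):
    """Return the findings whose per-second count is more than its threshold."""
    return [finding
            for name, (threshold, finding) in TCP_RULES.items()
            if counts_per_second.get(name, 0) > threshold]
```

Keeping the thresholds in a table makes it straightforward to add the user-defined indicators that the additional performance indicator analysis allows.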

In the case of HTTP error status analysis, if the status code is HTTP 4XX and the same phenomenon is found in fewer than 10 IPs, it is recognized and analyzed as a 'user input problem'. If the same phenomenon is found across a larger number of IPs, it is recognized and analyzed as 'there is a problem in the server or client code'.
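
A hedged sketch of the HTTP 4XX rule follows. The description states only the "fewer than 10 IPs" boundary for the user input case; treating every other 4XX situation as the code-problem case is an assumption of this sketch, as are the function name `classify_http_4xx` and the `(client_ip, status)` event shape.

```python
from typing import Iterable, Optional, Tuple

USER_INPUT_MAX_IPS = 10  # "fewer than 10 IPs" per the description


def classify_http_4xx(events: Iterable[Tuple[str, int]]) -> Optional[str]:
    """events: (client_ip, http_status) pairs seen in one analysis window."""
    ips = {ip for ip, status in events if 400 <= status <= 499}
    if not ips:
        return None  # no 4XX responses observed
    if len(ips) < USER_INPUT_MAX_IPS:
        return "user input problem"
    # Assumption: the second threshold is not stated in the description,
    # so everything else is treated as the code-problem case.
    return "there is a problem in the server or client code"
```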

When additional performance indicators are to be analyzed, they are added either through the system settings or by the user, and are then analyzed.

The external device may be network equipment 210 of a data center, a remote intelligent network management device 220, or the like. The network equipment 210 of the data center connects to the physical network (Physical NW) through tapping or port mirroring and measures data. In addition, the network equipment 210 of the data center may be a virtualization environment including a virtual switch (vSwitch). The remote intelligent network management device 220 may be a device installed in a remote office.

In addition, the intelligent network management system to which the present invention is applied can automatically diagnose the network using information bundles. Looking at the contents of automatic diagnosis, it defines diagnosis items, measures the condition of the diagnosis subject, provides symptoms of the diagnosis subject, provides expected causes, provides countermeasures for each expected cause, and provides analysis results. Automatic diagnosis targets include performance, usage, UDP, TCP or HTTP errors.

Auto-diagnosis of the network includes network status, network usage and performance, faults/events, application services, auto-diagnosis, L2 to L7 analysis, statistics and trend analysis, and event handling functions. Through these functions, auto-diagnosis of the network can be performed.

In the automatic diagnosis of the network status, the status of network equipment is identified through SNMP Trap information analysis, and Syslog data analysis is performed.

In the automatic diagnosis of network usage and performance, BPS (Bits Per Second), PPS (Packets Per Second), Latencies, and Timeout are automatically diagnosed.
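
For illustration, BPS and PPS can be derived from captured packet metadata by grouping packets into one-second buckets. This is a minimal sketch; the function name `bps_pps` and the `(timestamp, size_in_bytes)` record shape are assumptions, not the format used by the described device.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bps_pps(packets: Iterable[Tuple[float, int]]) -> Dict[int, Tuple[int, int]]:
    """packets: (timestamp_seconds, size_bytes). Returns {second: (bps, pps)}."""
    bits = defaultdict(int)
    count = defaultdict(int)
    for timestamp, size_bytes in packets:
        second = int(timestamp)          # one-second bucket
        bits[second] += size_bytes * 8   # bytes to bits
        count[second] += 1
    return {second: (bits[second], count[second]) for second in sorted(bits)}
```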

Automatic diagnosis of failure/event performs automatic diagnosis on UDP Flag, TCP Resets, TCP Zero Windows, TCP Reuse, TCP Duplicate ACKs, and TCP Retransmission. It also performs automatic diagnosis for HTTP 4XX and HTTP 5XX.

Automatic diagnosis of application service performs automatic recognition of application service and detailed payload analysis. So, it performs automatic diagnosis for HTTP, DNS, SMTP, POP3, IMAP, and FTP.

Automatic diagnosis for problem causes and solutions is performed for TCP Retransmission, Hop Low, Microburst, RTT (Round Trip Time), TCP Reset, TCP Zero Windows, TCP DUP ACKs, and Timeout.

Automatic diagnosis of L2 to L7 analysis performs MAC usage analysis as Layer 2 analysis, Hop Account analysis as Layer 3 analysis, analysis by port (by source and destination) as Layer 4 analysis, and automatic diagnosis of application services through Layer 7 analysis.

Automatic diagnosis of statistics and trend analysis performs automatic diagnosis of performance indicators (BPS, PPS, Latency, Timeout), TCP related, HTTP error, Layer 7 analysis, and flow trend.

Automatic diagnosis of events performs threshold setting and control by performance, alarm generation and level setting, search and inquiry by alarm level, and automatic diagnosis of Syslog Server (Remote) and SNMP Trap Server. And Alarm/Event provides real-time network status monitoring and notification service.
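
Threshold setting, alarm level assignment, and search by alarm level could be sketched as follows. All names and numeric limits in this example (`THRESHOLDS`, `raise_alarm`, `search_by_level`, the latency limits, and the level names) are hypothetical and chosen only for illustration; the description does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alarm:
    metric: str
    value: float
    level: str


# Hypothetical per-metric thresholds, ordered from most to least severe.
THRESHOLDS = {
    "latency_ms": [("critical", 500.0), ("warning", 200.0)],
    "timeout_ips_per_sec": [("critical", 20.0), ("warning", 10.0)],
}


def raise_alarm(metric: str, value: float) -> Optional[Alarm]:
    """Return an alarm at the most severe level whose limit is reached."""
    for level, limit in THRESHOLDS.get(metric, []):
        if value >= limit:
            return Alarm(metric, value, level)
    return None


def search_by_level(alarms: List[Alarm], level: str) -> List[Alarm]:
    """Filter previously raised alarms by their level."""
    return [alarm for alarm in alarms if alarm.level == level]
```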

Accordingly, the automatic network failure resolution device 110 of the intelligent network management system 100 collects network information from various network equipment 210, such as switches or routers, using NetFlow sampled at 5-minute intervals, and receives packet information from the remote intelligent network management device 220 and performs packet analysis on it.

NetFlow is network information provided by switches and routers, and its data can be collected for each hop. However, collecting the data places load on both the equipment and the network bandwidth, so in practice the data is sampled at 5-minute intervals in most cases.

Due to its nature, NetFlow is limited in the kind of data it can obtain. That is, only data of L3 or lower levels (L1, L2) can be received.

Since the automatic network failure resolution device 110 of the intelligent network management system 100 of the present invention is mirroring-based, it places no load on network bandwidth. However, it can obtain information only about the section in which packets are collected, so per-hop information (for example, for the interval between one specific switch and another) cannot be obtained.

The automatic network failure resolution device 110 of the intelligent network management system 100 may analyze all data from L2 to L7 in detail through Deep Packet Inspection (DPI).

Therefore, the NetFlow of the network equipment 210 of the data center and the remote intelligent network management device 220 to which the present invention is applied are complementary in terms of the breadth and depth of the network data to be collected.

The device for automatically solving network failures 110 can determine whether or not there is a failure in a specific area through data collected and analyzed through the NetFlow of the network equipment 210 and the remote intelligent network management device 220 to which the present invention is applied.

Therefore, when a failure is determined, the network failure processing unit 170 may present recommendations or control the network by itself. When it cannot control the network itself, or the action is not authorized, it presents a solution in the form of recommendations. When it can control the network itself, it can access specific equipment to change settings or reboot it. In most cases, the equipment is accessed via SSH and controlled using remote shell commands.
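
The choice between presenting a recommendation and controlling the equipment directly can be sketched as follows. This is an illustration under stated assumptions: `Action`, `plan_action`, and the example host and remote command are hypothetical, and the sketch only builds the SSH argument list without running it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    kind: str          # "recommend" or "control"
    detail: List[str]  # recommendation text, or the SSH argv to run


def plan_action(can_control: bool, authorized: bool, host: str,
                remote_command: str, recommendation: str) -> Action:
    """Pick direct control only when it is both possible and authorized."""
    if can_control and authorized:
        # Only the argument list is built here; executing it (for example
        # with subprocess.run) is left to the caller.
        return Action("control", ["ssh", host, remote_command])
    return Action("recommend", [recommendation])
```

Returning a plan instead of executing immediately leaves room for an operator approval step before any remote command is run.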

The network failure processing unit 170 can handle the case where, in a very busy network, traffic suddenly slows down when passing through a specific hop, while the packet volume confirmed through NetFlow and the BPS reported by NetFlow are both at normal levels. In this case, the automatic network failure resolution device 110 detects hundreds to thousands of TCP DUP ACKs and retransmissions per second from IPs related to the corresponding hop, so it can be determined that a packet loop is suspected at that hop. Packet loops are almost always caused by incorrect cable wiring. Because this issue cannot be resolved by changing settings, a recommendation such as 'check cable routing' is delivered.
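
The loop scenario above combines two observations: NetFlow volume looks normal, yet packet analysis shows a storm of DUP ACKs and retransmissions. A minimal sketch of that combination follows; the function name, the parameter names, and the reading of "hundreds per second" as at least 100 are assumptions of this example.

```python
from typing import Optional

LOOP_EVENT_FLOOR = 100  # assumption: "hundreds per second" read as >= 100


def suspect_packet_loop(netflow_bps: float, normal_bps_max: float,
                        dup_acks_per_sec: int,
                        retrans_per_sec: int) -> Optional[str]:
    """Return a recommendation when a packet loop is suspected at a hop."""
    volume_normal = netflow_bps <= normal_bps_max
    tcp_storm = (dup_acks_per_sec >= LOOP_EVENT_FLOOR
                 and retrans_per_sec >= LOOP_EVENT_FLOOR)
    if volume_normal and tcp_storm:
        # Not fixable by configuration, so only a recommendation is returned.
        return "check cable routing"
    return None
```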

In addition, the network failure processing unit 170 can cope with the case where a network service slows down at a specific time every day. In this case, both the NetFlow of the network equipment 210 of the data center and the remote intelligent network management device 220 show that traffic to the server in charge of the network service increases rapidly during the corresponding time period. NetFlow can further confirm that the traffic to the network service is concentrated at a specific location, and the remote intelligent network management device 220 can confirm that the response delay from the server becomes very long only in that time period. Server personnel may also report spikes in CPU and disk usage only during those hours. Various solutions can be proposed or applied for this situation. For example, QoS can be applied to the specific location discovered by NetFlow by connecting to a separate QoS device or switch, and the network failure can be handled automatically by using the QoS function to adjust the total amount of traffic so that server requests from that location do not cause congestion. The unit may also recommend that clients adjust their network access times so that requests are distributed over time, or recommend server and network expansion, including a suggestion for the additional bandwidth needed.
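
One simple way to spot a service that slows down at the same time every day is to compare the mean response time per hour of day with the overall mean. The sketch below is only an illustration of that idea; the function name `slow_hours`, the sample shape, and the factor of 2 are assumptions and are not taken from the description.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Tuple


def slow_hours(samples: List[Tuple[int, float]],
               factor: float = 2.0) -> List[int]:
    """samples: (hour_of_day, response_ms) pairs collected over several days.

    Returns the hours whose mean response time exceeds factor times the
    overall mean. The factor is a hypothetical tuning value.
    """
    if not samples:
        return []
    by_hour = defaultdict(list)
    for hour, response_ms in samples:
        by_hour[hour].append(response_ms)
    overall = mean(response_ms for _, response_ms in samples)
    return sorted(hour for hour, values in by_hour.items()
                  if mean(values) > factor * overall)
```

The hours returned could then be cross-checked against NetFlow volume for the same period before QoS is applied or a recommendation is issued.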

In the packet capture step (ST1), when the automatic network failure resolution device 110 of the intelligent network management system 100 performs automatic resolution of a network failure, the packet capture unit 130 samples NetFlow information from the network equipment 210 of the data center, or receives packet data from the remote intelligent network management device 220 through a network interface card (NIC). It then bundles the received packet data into one data stream, generates the raw data necessary for generating information bundles, and stores it in a raw packet storage buffer, thereby measuring data including network packets, SNMP TRAP, and SYSLOG information of the network equipment 210 of the data center or the remote intelligent network management device 220.

In the information bundle generating step (ST2), after the packet capture step, the information bundle generating unit 140 generates metadata for the collected packets. The metadata includes packet confirmation time, packet size, session ID, MAC address, and TCP information. The unit generates session information bundles, BPS information bundles, PPS information bundles, RTT information bundles, timeout information bundles, TCP information bundles, Remarks information bundles, and event information bundles, and at the same time compresses and stores the data for each type of information bundle.
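
Grouping metadata by bundle type and compressing each group separately might look like the following sketch. The record fields, the use of JSON for serialization, and the use of zlib for compression are assumptions made for illustration; the description does not name a storage format or compression algorithm.

```python
import json
import zlib
from collections import defaultdict
from typing import Dict, List


def build_bundles(records: List[dict]) -> Dict[str, bytes]:
    """Group metadata records by bundle type and compress each group.

    Each record carries a hypothetical "bundle" key such as "session",
    "bps", "pps", "rtt", "timeout", "tcp", "remarks", or "event".
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["bundle"]].append(record)
    return {kind: zlib.compress(json.dumps(items).encode("utf-8"))
            for kind, items in grouped.items()}


def read_bundle(blob: bytes) -> List[dict]:
    """Decompress one stored bundle back into its metadata records."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Compressing per bundle type means a later analysis step can load only the bundle types it needs.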

In the performance indicator generation step (ST3), after the information bundle generation step, the performance indicator generation unit 150 receives the generated information bundle and generates the performance indicators necessary for network management. The generated performance indicators include basic performance indicators and additional performance indicators.

In the performance indicator analysis step (ST4), after the performance indicator generation step, the performance indicator analyzer 160 selects the type of information to be used from the information bundle based on the generated performance indicator, analyzes the performance of the network, and generates the performance indicator analysis result.

In the network failure processing step (ST5), after the performance index analysis step, the network failure processing unit 170 automatically determines whether there is a failure in a specific area using the analysis result, and automatically solves the network failure.
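
The ordering of steps ST2 through ST5 can be sketched as a small pipeline that takes the raw packets produced by ST1 and passes each step's output to the next. The function name `run_pipeline` and the four callables are hypothetical stand-ins for the units 140 to 170; their internals are not specified here.

```python
from typing import Any, Callable, List


def run_pipeline(raw_packets: List[dict],
                 make_bundles: Callable[[List[dict]], Any],
                 make_indicators: Callable[[Any], Any],
                 analyze: Callable[[Any, Any], Any],
                 handle: Callable[[Any], Any]) -> Any:
    """Run ST2 to ST5 in the order described, on the output of ST1."""
    bundles = make_bundles(raw_packets)      # ST2: information bundles
    indicators = make_indicators(bundles)    # ST3: performance indicators
    findings = analyze(indicators, bundles)  # ST4: indicator analysis
    return handle(findings)                  # ST5: failure processing
```

Passing the steps in as callables keeps each unit replaceable, for example when additional performance indicators are introduced.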

FIG. 4 is a detailed flowchart of the network failure processing in FIG. 3.

In the failure handling determination step (ST11, ST12), it is determined whether there is a failure in a specific area through the data obtained by analyzing NetFlow information and packet information, and what failure handling is to be performed.

In the failure handling execution step (ST13, ST14), if a network failure was determined in the failure handling determination step, a recommendation is presented or the network is controlled directly. If the network cannot be controlled directly or the action is not permitted, a solution is presented in the form of a recommendation. If the network can be controlled directly, the device connects to specific equipment on the network via SSH, uses remote shell commands to change settings or reboot the equipment, and provides the network failure handling results.

As such, the present invention collects packet information on the network state, automatically determines whether there is a failure in a specific area through the analyzed data, and automatically resolves the network failure. It thereby provides, with one click, a guide for accurate and quick cause identification and resolution of various and complex network issues (performance, failure, etc.), allows any network operator to manage the network easily and conveniently, and provides customized network management services tailored to user needs by linking with various systems through its information collection and analysis functions.

Although the present invention has been described in more detail by way of examples above, the present invention is not necessarily limited to these examples, and may be variously modified without departing from the spirit of the present invention. Therefore, the embodiments disclosed in the present invention are not intended to limit the technical spirit of the present invention, but to explain, and the scope of the technical spirit of the present invention is not limited by these embodiments. The protection scope of the present invention should be construed according to the claims, and all technical ideas within the equivalent range should be construed as being included in the scope of the present invention.

Claims

In the automatic network failure solving device of the intelligent network management system for managing the network,

a control unit that controls the device for automatically solving network failures so as to collect packet information on the network status, automatically determine whether there is a failure in a specific area through the analyzed data, and automatically resolve network failures, thereby providing a guide for accurate and quick cause identification and resolution, enabling any network operator to manage the network easily and conveniently, and providing customized network management services tailored to user needs by linking with various systems through information collection and analysis functions;

a packet capture unit that, under the control of the control unit, receives packet data from network equipment in a data center or from a remote intelligent network management device through a network interface card, bundles the received packet data into one data stream, generates the raw data required to generate an information bundle, and stores the raw data in a raw packet storage buffer, thereby measuring data including network packets, SNMP TRAP, and SYSLOG information of the data center or the remote intelligent network management device;

an information bundle generation unit that, under the control of the control unit, generates metadata for the packets collected by the packet capture unit, the metadata including packet check time, packet size, session ID, MAC address, and TCP information, generates a session information bundle, a BPS information bundle, a PPS information bundle, an RTT information bundle, a timeout information bundle, a TCP information bundle, a Remarks information bundle, and an event information bundle, and simultaneously compresses and stores data for each type of information bundle;

a performance indicator generation unit that is controlled by the control unit, receives the information bundle generated by the information bundle generation unit, generates performance indicators required for network management, and generates basic performance indicators and additional performance indicators as performance indicators;

a performance indicator analysis unit under the control of the control unit, selecting a type of information to be used from the information bundle based on the performance indicator generated by the performance indicator generation unit, and then analyzing network performance to generate a performance indicator analysis result;

a network failure processing unit that automatically determines whether or not there is a failure in a specific area using the analysis result of the performance indicator analysis unit and automatically solves the network failure;

The automatic packet analysis-based automatic network failure resolution device is characterized by comprising the above units.
The device according to claim 1, wherein the network failure processing unit,

presents a recommendation or controls the network by itself when a network failure is determined; presents a solution in the form of a recommendation when it cannot control the network itself or the action is not authorized; and, when it can control the network itself, accesses specific equipment on the network via SSH and uses remote shell commands to change settings or reboot the equipment, in the automatic packet analysis-based automatic network failure resolution device.
The device according to claim 1, wherein the network failure processing unit,

when the speed suddenly slows down on passing through a specific hop in a very busy network, determines that a packet loop is suspected at the corresponding hop and delivers a recommendation of 'check cable wiring'; and, when the network service slows down at a specific time every day, applies QoS to the specific location discovered by NetFlow by connecting to a separate QoS device or switch, automatically handles the network failure by using the QoS function to adjust the total amount of traffic so that server requests from that location do not cause congestion, recommends adjusting the clients' network access times so that requests are distributed over time, and recommends server and network expansion, including a proposal for the additional bandwidth required, in the automatic packet analysis-based automatic network failure resolution device.
A packet capture step in which, when the automatic network failure resolution device of the intelligent network management system performs automatic resolution of network failures, the packet capture unit samples NetFlow information from the network equipment in the data center or receives packet data from the remote intelligent network management device through the network interface card, bundles the received packet data into one data stream, generates the raw data necessary to generate the information bundle, and stores it in the raw packet storage buffer, thereby measuring data including network packets, SNMP TRAP, and SYSLOG information of the data center or the remote intelligent network management device;

An information bundle generation step in which, after the packet capture step, the information bundle generating unit generates metadata for the collected packets, the metadata including packet confirmation time, packet size, session ID, MAC address, and TCP information, generates session information bundles, BPS information bundles, PPS information bundles, RTT information bundles, timeout information bundles, TCP information bundles, Remarks information bundles, and event information bundles, and simultaneously compresses and stores data for each type of information bundle;

After the information bundle generating step, the performance indicator generating unit receives the generated information bundle, generates performance indicators necessary for network management, and generates a basic performance indicator and an additional performance indicator as performance indicators;

After the performance indicator generating step, the performance indicator analysis unit selects the type of information to be used from the information bundle based on the generated performance indicator, and then analyzes the performance of the network to generate a performance indicator analysis result;

After the performance indicator analysis step, the network failure processing unit automatically determines whether or not there is a failure in a specific area using the analysis result and automatically solves the network failure;

A method for automatically solving network failures based on automatic packet analysis, comprising:
The method according to claim 4, wherein the network failure processing step,

a failure handling determination step of determining whether there is a failure in a specific area through data obtained by analyzing NetFlow information and packet information, and determining which failure handling is to be performed;

a failure handling execution step of, if a network failure is determined in the failure handling determination step, presenting a recommendation or controlling the network directly, presenting a solution in the form of a recommendation if the network cannot be controlled directly or the action is not permitted, and, if the network can be controlled directly, connecting to specific equipment on the network via SSH, using remote shell commands to change settings or reboot the equipment, and providing the network failure handling results;

A method for automatically solving network failures based on automatic packet analysis, comprising: