KR20040027705A

KR20040027705A - SVM Based Advanced Packet Marking Mechanism for Traceback AND Router

Info

Publication number: KR20040027705A
Application number: KR1020040012788A
Authority: KR
Inventors: 이형우
Original assignee: 이형우
Priority date: 2004-02-25
Filing date: 2004-02-25
Publication date: 2004-04-01
Also published as: KR100608210B1

Abstract

PURPOSE: A traceback method and a router applying an SVM-based packet marking technique are provided to reduce the number of packets required to reconstruct the traceback path to the victim system. CONSTITUTION: The traceback method includes: a first step of inspecting the bandwidth of the traffic for packets input to the router; a second step (S4) of determining whether a packet matches a congestion signature; a third step (S6) of analyzing the traffic pattern with the SVM module when a large number of packets are generated within a short period; a fourth step (S10) of marking the pushback field of the packet when the packet is judged aggressive; a fifth step (S12) of marking the packet; a sixth step of transmitting the marked packet to the front-end router; and a seventh step (S20) of determining whether the pushback field is marked when a large number of packets are not generated. The packet marking of the fifth step (S12) is performed when the pushback field is found to be marked in the seventh step (S20), and the sixth step is performed when it is not marked.
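The control flow described in the abstract can be sketched as follows. This is a minimal illustration rather than the patented implementation: the dictionary packet, the field names `pushback` and `mark`, the `burst_detected` flag (standing in for the bandwidth and congestion-signature checks of the first two steps), and the `svm_says_attack` callback are all hypothetical, and forwarding an SVM-cleared packet unmarked is an assumption that the abstract does not spell out.

```python
def handle_packet(packet, router_id, burst_detected, svm_says_attack):
    """Return the packet as this router would forward it (sixth step).

    packet          -- dict of header fields (hypothetical representation)
    router_id       -- identifier this router writes when marking
    burst_detected  -- True when many packets arrived in a short period
    svm_says_attack -- callable(packet) -> bool, stands in for the SVM module
    """
    out = dict(packet)  # work on a copy so the caller's packet is untouched

    if burst_detected:
        # S6: the SVM module is consulted only during a traffic burst.
        if svm_says_attack(out):
            out["pushback"] = True   # S10: flag the packet as aggressive
            out["mark"] = router_id  # S12: write the traceback mark
    elif out.get("pushback"):
        # S20: no burst here, but an earlier router already set the flag.
        out["mark"] = router_id      # S12
    # Otherwise the packet is forwarded unmarked (assumed behaviour).
    return out
```

Under these assumptions only packets that some router has classified as attack traffic ever carry a mark, which is how the scheme aims to reduce the number of packets the victim needs for path reconstruction.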

Description

SVM Based Advanced Packet Marking Mechanism for Traceback AND Router

The present invention relates to a source traceback method and a router that apply an SVM-based packet marking technique. More specifically, it relates to a source traceback method and a router applying an SVM-based denial-of-service attack packet marking technique which, based on an SVM, combines the existing pushback technique (which provides a control function against DDoS attacks) with a traceback function in order to trace back the IP source of spoofed DDoS packets.

Various techniques have been disclosed for tracing or blocking packets transmitted between an attacker and a victim connected over a network.

Denial-of-service (DoS) attacks [L. Garber, "Denial-of-service attacks trip the Internet," Computer, p. 12, Apr. 2000.] such as TCP SYN flooding [Computer Emergency Response Team, "TCP SYN Flooding and IP Spoofing Attacks," CERT Advisory CA-1996-21, Sept. 1996.] have exposed vulnerabilities in the TCP/IP architecture, so research is under way on ways to counter hacking attacks on networks and the Internet. Among existing countermeasures, a firewall system applies access control and is passive with respect to hacking attacks. An intrusion detection system (IDS) is likewise a passive countermeasure: it only detects and blocks abnormal traffic that has already arrived at the victim system.

The techniques proposed to date therefore do not provide active countermeasures such as identifying and tracing the source of a DoS attack. Because most hacking attacks spoof the source IP address (IP spoofing), an active countermeasure against spoofing must be developed. Even if traceroute is used to determine the source address, the address contained in a distributed denial-of-service (DDoS) packet is spoofed, so the real address cannot be identified or traced.

Countermeasures against hacking attacks such as DDoS fall broadly into passive methods, such as antivirus software, intrusion detection, and intrusion tolerance, and active methods, such as attack source traceback. Active methods are further divided, according to how the attack source is detected, into proactive traceback and reactive traceback.

FIG. 1 shows a conventional technique for blocking attackers at a router (Korean Patent Publication No. 2003-42318, published May 28, 2003). Referring to FIG. 1, an attack in a global network environment proceeds as follows: the attacker (10) generates illegitimate traffic, which passes through the border router (20) of the attacker's own domain and the border router (30) of the ISP, and is finally delivered to the victim (50) by the border router (40) of the victim's domain. The prior art thus concerns detecting and blocking illegitimate traffic at the first border router (30) traversed in the ISP. This packet forwarding structure can classify packets but provides no packet marking scheme. As a result, the response to a hacking attack remains passive.

When a DDoS attack occurs, dropping packets that a router or similar device judges to be malicious, as in ingress filtering [P. Ferguson and D. Senie, "Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing," RFC 2827, May 2000.], is a router-based removal and filtering approach and is passive with respect to DDoS attacks. A more effective solution is for the victim system to trace back the real address of the source of the spoofed DDoS attack when an attack occurs.

In this traceback approach, while packets are in transit a router generates traceback path information in advance and either inserts it into the packet or delivers it to the packet's destination IP address, where it is managed periodically. When a hacking attack occurs at the victim system, the path information already generated and collected is used to identify the source of the spoofed attack. Probabilistic packet marking (PPM) [K. Park and H. Lee, "On the effectiveness of probabilistic packet marking for IP traceback under denial of service attack," in Proc. IEEE INFOCOM '01, pages 338-347, 2001; D. X. Song and A. Perrig, "Advanced and Authenticated Marking Scheme for IP Traceback," Proc. INFOCOM, vol. 2, pp. 878-886, 2001.] and iTrace (ICMP traceback) [Steve Bellovin, Tom Taylor, "ICMP Traceback Messages", RFC 2026, Internet Engineering Task Force, February 2003.], which adapts ICMP messages, belong to this category.

In addition, the recently proposed pushback technique [S. Floyd, S. Bellovin, J. Ioannidis, K. Kompella, R. Mahajan, V. Paxson, "Pushback Message for Controlling Aggregates in the Network," Internet Draft, 2001] classifies packets when a DDoS attack occurs and controls packet transmission along the packet forwarding path. It provides control over DDoS attack traffic and thereby improves overall network performance, but it cannot trace back the source of the attack.

FIG. 2 shows a conventional attacker traceback method that uses the log information of edge routers (Korean Patent Publication No. 2003-39732, published May 22, 2003). According to the network configuration diagram of FIG. 1, edge routers (102, 103) connect the attacker's local network to the Internet service provider (ISP) network, and edge routers (105, 106) connect the intruded local network to the ISP network. A management server (109, 110, 111) for tracing attackers exists in each of the intruded local network, the ISP network, and the attacker's local network. Each of these networks also contains multiple hosts and an intrusion detection system (IDS). In the figure, an external hacker spoofs the IP address at the attacking host (101) and, passing through the attacker's edge router (102) and the edge routers (103, 105) of the ISP domain, attacks the intruded host (107) through the edge router (106) of the intruded domain. During this process the edge routers (103, 106) of each domain record log information on packets arriving from external domains. When the intrusion detection system (108) detects the intrusion, it reports the intrusion information to the management server (109). The management server (109) then uses the log information of the edge router (106) to trace the attacking host (101) in the attacker's domain. However, this log-based traceback requires many intrusion detection systems and management servers, so the system becomes large, and its heavy dependence on log information prevents fast tracing, so it cannot respond effectively to intrusions. In short, managing packet logs at routers demands a large amount of router memory and, although it offers some traceback capability, it has a weak security structure overall and is vulnerable to DDoS.

Conventional DDoS attack countermeasure technology

For tracing back spoofed DDoS packets, research is actively pursuing traceback at the IP layer, which concerns the network transmission of the packet itself, rather than service-oriented traceback centered on the TCP layer. The IP-layer traceback techniques proposed to date can be classified by response style into proactive traceback and reactive traceback. In more detail, they can be classified into router-centric traceback, management systems for packet information, special-network-centric techniques, and management-centric traceback.

Proactive traceback generates traceback path information in advance while packets are in transit, inserts it into the packet or delivers it to the destination, and manages it periodically. If a hacking attack occurs, the information already generated and collected is used to identify the attack source. Proactive techniques divide into probabilistic packet marking (PPM) and iTrace (ICMP traceback), which provides traceback by adapting conventional ICMP messages.

PPM technique

To identify the original transmission path of a spoofed packet, routers, the main elements of the network, insert into each IP packet the information that the packet passed through them. A router inspects the header of each packet at the IP layer in order to route it; at that point it marks its own address information in a modifiable field of the IP header and forwards the packet to the next router. As shown in FIG. 3, the router inserts its own IP information into the 16-bit ID field of the IP header.

In FIG. 3, the IP datagram input to the router includes the source IP address, destination IP address, protocol, and service type. The remaining fields are version, header length, total length, identification, flags, fragmentation offset, time to live, header checksum, and options.

The information inserted by each router is passed on to the next router and finally reaches the victim system at the destination. As shown in FIG. 4, once the marked information has been delivered, when a hacking attack later occurs the router information recorded in the attack packets is used to reconstruct the actual packet delivery path.

If routers marked every packet, the whole network would suffer delay, so a router generally samples packets with probability ρ and marks only those. Depending on what information the router marks, node sampling, edge sampling, and improved packet marking schemes have been proposed. As shown in FIG. 5, node sampling samples the path information of a packet with probability ρ and sends it to the destination.
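A minimal sketch of node sampling as described above, assuming each router independently overwrites a single mark field with probability `p` (the ρ of the text). The function name, the list-of-names path, and the `rng` argument are hypothetical and serve only for illustration.

```python
import random


def node_sampling(path, p, rng=None):
    """Simulate one packet crossing `path`, attacker side first.

    Each router overwrites the mark with its own identifier with
    probability p. Returns the mark seen by the victim, or None if
    no router marked the packet.
    """
    rng = rng if rng is not None else random
    mark = None
    for router in path:
        if rng.random() < p:
            mark = router  # a later router overwrites any earlier mark
    return mark
```

Because later routers overwrite earlier marks, a mark from a router far from the victim survives less often than one from a nearby router, which is one reason the victim needs many packets before every router on the path has been observed.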

FIG. 6 shows edge sampling, in which a router marks in the packet header not only its own IP address but also the IP address of the preceding router from which the packet arrived. Edge sampling reconstructs the attack path better than node sampling. Variants of PPM include schemes that authenticate the packets marked by routers, adding security to the marking process.
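Edge sampling can be sketched in the same style. The version below follows the commonly described form in which a packet carries a `start` address, an `end` address, and a hop `distance`; the distance counter and all names here are assumptions for illustration, since the text only states that the router's own address and the preceding router's address are marked together.

```python
import random


def edge_sampling(path, p, rng=None):
    """Simulate one packet crossing `path`, attacker side first.

    Returns (start, end, distance) as seen by the victim, or
    (None, None, None) if no router started an edge mark.
    """
    rng = rng if rng is not None else random
    start = end = distance = None
    for router in path:
        if rng.random() < p:
            # This router starts a new edge and discards any older mark.
            start, end, distance = router, None, 0
        elif distance is not None:
            if distance == 0:
                end = router  # the next router completes the edge
            distance += 1     # every later router only counts hops
    return start, end, distance
```

Each marked packet then yields one edge of the path plus its distance from the victim, so the victim can order the edges instead of inferring order from how often each router appears.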

Problems of existing PPM technology

In PPM, a router samples packets with probability ρ, marks its own IP address in the header, and sends the packet on to its destination. Because packets are selected with probability ρ, a considerable number of marked packets is needed to reconstruct the path to the source of a DDoS attack. If the edge or node information of a particular router happens not to be marked, the complete attack path cannot be reconstructed from the remaining marks. Moreover, the algorithm must select and mark at least eight packets to convey a single node or edge, which is inefficient overall.
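To give a feel for "a considerable number of marked packets", the sketch below evaluates two quantities. The first follows directly from the marking rule: a packet reaches the victim still carrying the mark of a router `d` hops away only if that router marks it and none of the `d - 1` later routers overwrite it, which has probability p(1 - p)^(d - 1). The second is the coupon-collector style upper bound k·ln(k·d) / (p(1 - p)^(d - 1)) from the edge-sampling analysis by Savage et al., a work this document does not cite; treating `k = 8` as the eight packets per edge mentioned above is an assumption. Function names are hypothetical.

```python
import math


def mark_survival_probability(p, d):
    """Probability that a packet arrives carrying the mark of the
    router d hops from the victim (d >= 1)."""
    if d < 1:
        raise ValueError("d must be at least 1")
    return p * (1 - p) ** (d - 1)


def expected_packets_bound(p, d, k=1):
    """Upper bound on the expected number of packets needed to
    reconstruct a path of length d when each edge is split into
    k fragments. Informative only when k * d > 1."""
    return k * math.log(k * d) / mark_survival_probability(p, d)
```

For example, `expected_packets_bound(0.04, 20, k=8)` evaluates the bound for a 20-hop path at a 4% marking probability; the denominator shrinks geometrically with path length, which is why long paths are costly to reconstruct.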

Furthermore, because conventional PPM samples a packet only when the fixed probability ρ is satisfied, attack traffic may be forwarded unmarked. In that case the traceback path information ends up marked on ordinary packets instead, so the spoofed attack source cannot be reconstructed when an attack such as DDoS occurs. If, instead of sampling with a fixed probability ρ, the router could adjust ρ actively according to the traffic characteristics of the whole network, the scheme would improve on existing ones in network load, memory use, and traceback capability.

Conventional hash-based traceback manages and transmits hash values of packets at regular intervals, but it causes serious performance problems in large networks. In addition, because traceback starts only after an IDS or similar system detects an attack, the scheme does not work if the network itself is attacked first. A new approach is therefore needed in which routers apply hash-based integrity and authentication to packets and mark traceback information only on traffic selected as DDoS traffic according to its characteristics.

In other words, conventional packet marking by node and edge sampling and the iTrace technique impose little load on the management system and the network, but they require heavy computation at the victim system when reconstructing the traceback path and are vulnerable to DDoS attacks. Overall, most IP traceback techniques proposed to date require modifications to existing routers and add network and system load.

The SVM is described below as the theory underlying the SVM module.

SVM Research

Whereas traditional techniques are based on minimizing empirical risk, the support vector machine (SVM) is based on minimizing structural risk. Empirical risk minimization means optimizing performance on the training set; structural risk minimization means minimizing the probability of misclassifying data drawn from a fixed but unknown probability distribution [A.C. Snoeren, C. Partridge, L.A. Sanchez, W.T. Strayer, C.E. Jones, F. Tchakountio, and S.T. Kent, "Hash-Based IP Traceback", BBN Technical Memorandum No. 1284, February 7, 2001.].

Consider the problem of linearly separating a set of training vectors belonging to two classes. The training data set $\{(x_i, y_i)\}_{i=1}^{N}$ is used to learn a hyperplane $w^{T}x + b = 0$ defined by a weight vector $w$ and a bias $b$, where $x_i$ is an input pattern and $y_i$ is its target value. The hyperplane $w^{T}x + b = 0$ satisfies the conditions of Equation (4).

Among the input patterns that satisfy Equation (4) with equality, those lying closest to the decision surface are called support vectors. Conceptually, these are the vectors nearest the hyperplane and therefore the hardest to classify. Learning to classify thus means finding the optimal hyperplane that satisfies the constraint of Equation (5). This is a constrained optimization problem: given the training data set, it is a quadratic problem of finding the optimal parameters $w$ and $b$ of the optimal hyperplane.

Here "optimal" means having the maximum margin, and the maximum-margin hyperplane is the hyperplane that best separates the two classes. If the optimal linear separating boundary is written as $w^{T}x + b = 0$, the distance to the support vectors can be expressed as $1/\lVert w \rVert$, and the hyperplane that optimally classifies the input patterns minimizes the cost function $\Phi(w)$ of Equation (6).
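The displayed forms of Equations (4) to (6) are not reproduced in this text. Under the standard maximum-margin formulation that the passage describes, they would plausibly read as follows. The symbols $x_i$ (input pattern), $y_i \in \{+1, -1\}$ (target value), $N$ (number of training patterns), and $\Phi$ (cost function) are assumed notation rather than the patent's own.

```latex
\begin{align}
w^{T} x_i + b &\ge +1 \ \text{for } y_i = +1, \qquad
w^{T} x_i + b \le -1 \ \text{for } y_i = -1 \tag{4}\\
y_i \left( w^{T} x_i + b \right) &\ge 1, \qquad i = 1, \dots, N \tag{5}\\
\Phi(w) &= \tfrac{1}{2}\, w^{T} w \tag{6}
\end{align}
```

Constraint (5) is simply the two cases of (4) multiplied by the label $y_i$, and minimizing (6) is equivalent to maximizing the margin $1/\lVert w \rVert$.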

The cost function of Equation (6) is a convex function of $w$, and the constraint of Equation (5) is linear in $w$. To summarize the SVM for classification described so far: given the training patterns $\{(x_i, y_i)\}$, it is an optimization problem of finding the weight vector $w$ and bias $b$ that satisfy constraint (5), where minimizing $\lVert w \rVert^{2}$ maximizes the separation margin and so yields the optimal separating surface. Solving this optimization problem with the method of Lagrange multipliers gives the Lagrangian $J(w, b, \alpha)$ of Equation (7).

In this expression the $\alpha_i$ are the Lagrange multipliers. The solution of the optimization problem must minimize $J$ with respect to $w$ and $b$ and maximize it with respect to $\alpha$. The minimum of $J$ with respect to $w$ and $b$ is obtained by setting the corresponding derivatives to zero.

To obtain $\alpha$ from Equation (8), the Lagrangian $J$ of the primal problem is rewritten as the objective function $Q(\alpha)$ of the dual problem, as shown in Equation (9).

The objective function of Equation (9) is in general a quadratic programming problem, and it involves the training patterns only through their inner products $x_i^{T} x_j$. Viewed as the dual problem of Equation (9), the classification problem is therefore: given the training patterns $\{(x_i, y_i)\}$, find the Lagrange multipliers $\alpha_i$ that maximize the objective function (9) subject to the constraints $\sum_i \alpha_i y_i = 0$ and $\alpha_i \ge 0$. Once a quadratic programming algorithm has found the optimal Lagrange multipliers subject to these constraints, the optimal weight vector $w$ can be computed from Equation (8) and the optimal bias $b$ can be computed from the support vectors. The expressions for the weight vector and bias are given in Equation (10).
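Equations (7) to (9) are likewise missing from this text. A reconstruction consistent with the description (Lagrange multipliers $\alpha_i \ge 0$, stationarity in $w$ and $b$, and a dual objective $Q(\alpha)$) is sketched below; the notation is assumed, with $y_i$ denoting the target value of pattern $x_i$.

```latex
\begin{align}
J(w, b, \alpha) &= \tfrac{1}{2}\, w^{T} w
  - \sum_{i=1}^{N} \alpha_i \left[ y_i \left( w^{T} x_i + b \right) - 1 \right] \tag{7}\\
\frac{\partial J}{\partial w} = 0 \ &\Rightarrow\ w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad
\frac{\partial J}{\partial b} = 0 \ \Rightarrow\ \sum_{i=1}^{N} \alpha_i y_i = 0 \tag{8}\\
Q(\alpha) &= \sum_{i=1}^{N} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, x_i^{T} x_j \tag{9}
\end{align}
```

Substituting the two conditions of (8) into (7) removes $w$ and $b$: the term in $b$ vanishes and $\tfrac{1}{2} w^{T} w - w^{T} w = -\tfrac{1}{2} w^{T} w$, which gives (9).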

Here $x_r$ and $x_s$ are support vectors that satisfy the conditions of Equation (11).

Collecting these results, the SVM classification rule of Equation (12) has a linear decision surface.
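A plausible form of Equations (10) to (12), assuming that $x_r$ and $x_s$ denote one support vector from each class and that $w_o$, $b_o$ denote the optimal weight vector and bias (all of this notation is assumed, since the original equations are not reproduced here):

```latex
\begin{align}
w_o &= \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad
b_o = -\tfrac{1}{2}\, w_o^{T} \left( x_r + x_s \right) \tag{10}\\
\alpha_r &> 0, \quad y_r = -1, \qquad \alpha_s > 0, \quad y_s = +1 \tag{11}\\
f(x) &= \operatorname{sgn}\!\left( w_o^{T} x + b_o \right) \tag{12}
\end{align}
```

The bias follows from the margin conditions $w_o^{T} x_s + b_o = +1$ and $w_o^{T} x_r + b_o = -1$: adding the two gives $w_o^{T}(x_r + x_s) + 2 b_o = 0$.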

Here $\operatorname{sgn}(\cdot)$ is a function that equals +1 if its argument is positive and -1 otherwise. To construct a generalized hyperplane that can also handle problems that are not linearly separable, non-negative scalar variables $\xi_i$ are introduced. The $\xi_i$ are slack variables, a measure of the error associated with misclassification. The constraints that include the slack variables $\xi_i$ for the non-separable case are obtained by changing Equation (5) into Equation (13) [Tatsuya Baba, Shigeyuki Matsuda, "Tracing Network Attacks to Their Sources," IEEE Internet Computing, pp. 20-26, March 2002.].

The cost function $\Phi(w, \xi)$, which involves the weight vector $w$ satisfying the constraints and the slack variables $\xi_i$, can be expressed as Equation (14).
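In the standard soft-margin formulation, which matches this description, Equations (13) and (14) would take the form below. The notation is assumed: $\xi_i$ are the slack variables, $y_i$ the target values, and $C$ a positive trade-off parameter.

```latex
\begin{align}
y_i \left( w^{T} x_i + b \right) &\ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, N \tag{13}\\
\Phi(w, \xi) &= \tfrac{1}{2}\, w^{T} w + C \sum_{i=1}^{N} \xi_i \tag{14}
\end{align}
```

A pattern with $0 < \xi_i \le 1$ lies inside the margin but on the correct side of the hyperplane, while $\xi_i > 1$ corresponds to a misclassified pattern.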

Here C is a positive parameter that controls the trade-off between training error and generalization. To test the proposed method, various values of C were tried and the value giving the best training error was selected. The dot (linear), polynomial, and RBF kernels were used as SVM kernel functions.
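The three kernel families named above can be written in a few lines. The sketch below uses their common textbook definitions; the parameter names `degree`, `coef0`, and `gamma` and their default values are assumptions, since the text does not give the settings that were used.

```python
import math


def dot_kernel(x, z):
    """Linear (dot-product) kernel: K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel: K(x, z) = (x . z + coef0) ** degree"""
    return (dot_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)
```

Replacing the inner product between training patterns in the dual problem with one of these functions lets the same optimization produce a non-linear decision surface.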

Support Vector Machine Overview

Whereas traditional pattern recognition techniques are based on minimizing empirical risk, the SVM is based on minimizing structural risk. Empirical risk minimization means optimizing performance on the training set; structural risk minimization means minimizing the probability of misclassifying data drawn from a fixed but unknown probability distribution.

The advantages of the SVM are, first, its ability to capture the information contained in the training set and, second, its use of a family of decision surfaces of relatively low dimension. When the patterns are linearly separable, the SVM uses supervised learning to classify the input patterns into two classes, +1 and -1. Once the training set S is divided into two classes, a hyperplane separating the training patterns of each class is determined. A hyperplane here is the cutting plane that separates the groups, and the input patterns that determine the hyperplane are called support vectors.

When the pattern set is separable, the distance (margin) from the hyperplane to the support vectors is maximized, and all support vectors lie at the same minimum distance from the hyperplane. In practice, however, pattern sets are rarely linearly separable, so the two classes often cannot be separated linearly. In that case the hyperplane and the support vectors are obtained by solving a constrained optimization problem.

The optimal solution trades off maximizing the margin (the distance between the support vectors of each class) against minimizing the number of errors; this trade-off is controlled by a regularization parameter.

Support Vector Machine for Classification

This subsection reviews the basic concepts of classification with an SVM. Given training data $(x_i, y_i)$, each $x_i$ belongs to one of two classes and $y_i$ is the label indicating that class. To find the optimal boundary separating the classes, the SVM maximizes the distance between the separating boundary and the points closest to it. Writing the optimal linear separating boundary as $w \cdot x + b = 0$, the distance to the support vectors can be expressed as $1/\lVert w \rVert$. The SVM finds the optimal separating surface by minimizing $\lVert w \rVert$, which maximizes the separation margin. This is a convex optimization problem.

Converting this problem to its dual using Lagrange multipliers yields a quadratic programming problem.

For overlapping patterns that cannot be completely separated by a linear boundary, slack variables ($\xi_i$) are used, and the corresponding model is derived from equation (15).

If every slack variable in equation (17) is zero, all patterns can be separated completely. Most patterns, however, are not linearly separable. To separate nonlinear patterns, the input space of the nonlinear patterns is therefore mapped to a feature space in which the patterns become linear.

That is, once a kernel function is defined, the model for separating nonlinear patterns follows from equations (15), (16), and (17).

Here C is the penalty parameter of equation (17). Solving this model for the Lagrange multipliers $\alpha_i$ gives equation (19), the flattest function in the feature space.
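As a reference point, the standard textbook forms of the problems described in this subsection can be written as follows. The symbols ($w$, $b$, $\alpha_i$, $\xi_i$, $K$) and the ordering are assumptions based on the usual SVM formulation; they are not a reproduction of equations (15) to (19) as numbered in this description.

```latex
% Hard-margin primal problem (linearly separable case)
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{s.t.}\quad y_i\,(w \cdot x_i + b) \ge 1,\qquad i = 1,\dots,n

% Dual problem obtained with Lagrange multipliers \alpha_i
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
  - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,(x_i \cdot x_j)
\quad \text{s.t.}\quad \alpha_i \ge 0,\quad \sum_{i=1}^{n}\alpha_i y_i = 0

% Soft-margin primal problem with slack variables \xi_i and penalty C
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad \text{s.t.}\quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0

% Kernelized dual problem, K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
  - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,K(x_i, x_j)
\quad \text{s.t.}\quad 0 \le \alpha_i \le C,\quad \sum_{i=1}^{n}\alpha_i y_i = 0

% Resulting decision function
f(x) = \operatorname{sign}\!\Big(\sum_{i=1}^{n}\alpha_i\,y_i\,K(x_i, x) + b\Big)
```

With the normalization $y_i (w \cdot x_i + b) \ge 1$, the support vectors lie at distance $1/\lVert w \rVert$ from the hyperplane, which is why minimizing $\lVert w \rVert$ maximizes the margin.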

The Support Vector Machine (SVM) is a learning algorithm developed and proposed by Vapnik in 1995. It was originally developed for binary classification and has since been applied successfully in fields such as bioinformatics, character recognition, handwriting recognition, and face and object recognition.

A binary classification problem can be viewed as the process of estimating, from collected training data, a target function that separates two classes. The classifier estimated in this way must generalize well, meaning it must produce correct results for new data samples that were not used during training.

As shown in FIG. 7, the SVM avoids overfitting by selecting one particular hyperplane (the optimal hyperplane) from among the possible hyperplanes that can divide the data in feature space. The SVM finds the hyperplane that maximizes the minimum distance from the hyperplane to the nearest training point, that is, the maximum-margin hyperplane. Only the training samples lying close to the decision boundary between the two classes, called support vectors, receive non-zero weights. The larger the margin, which is the distance between the hyperplanes that contain the support vectors, the better the classification performance. Test data are then classified against the hyperplane found in this way. That is, for binary classification as in the figure, the SVM is described by the following equation.

The example of FIG. 7 is a linearly separable data set, which is easy to classify. Most classification problems, however, have a nonlinear distribution, which causes serious problems for generalization. A soft margin classifier, which introduces slack variables and a penalty function, can solve non-linearly separable classification problems to some extent. C is a variable that acts as the penalty for non-separable data and trades off against model complexity. As C grows, the learned machine tends toward a solution that fits the training data with few misclassifications. As C approaches 0, the optimization favors the margin-maximization term and places little weight on the term that minimizes misclassification error, which produces an SVM classifier with a very wide margin.

In most cases the linear boundary used above is not adequate for classifying the input vectors. The SVM then maps the input vector x to a vector in a higher-dimensional feature space and recasts the task as finding a linear boundary in that space. The transformation from input space to feature space is generally a nonlinear mapping, and by Cover's theorem, when certain conditions hold, a problem that is not linearly separable in the input space is likely to become linearly separable in the feature space. The nonlinear mapping functions used for this transformation are generally known to be admissible when they satisfy Mercer's theorem; examples include polynomials of degree q, radial basis functions, and two-layer perceptrons. To support this, SVMs provide linear, polynomial, and radial basis function kernels. Choosing the kernel best suited to the data, together with its parameters, has a very significant effect on the performance of SVM classification.
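The three kernel families named above can be sketched in a few lines of Python. This is a minimal illustration under assumed parameter names (`degree`, `coef0`, `gamma`) and default values; the description does not state which parameter values were used.

```python
import math


def linear_kernel(x, z):
    """Dot-product (linear) kernel: x . z"""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel of degree q: (x . z + coef0) ** q (coef0 is an assumed offset)."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2) (gamma is an assumed width)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)
```

Each function returns the inner product of two input vectors in the implicit feature space, so a kernelized SVM never has to compute the high-dimensional mapping explicitly.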

Advantages and Disadvantages of Applying SVM to DDoS Attack Detection

In conventional approaches, a learning machine learns from given training data and is expected to produce the correct answer when new data not used in training arrive as input. Machine learning based on an empirical data set, however, may fail to reflect the distribution of the target function being estimated, because the available data are finite.

Accordingly, neural networks, a representative machine learning technique, exhibit many problems in generalization. The parameters that critically affect system performance are set by the user's heuristics rather than by an analytical procedure, and in some cases the method yields different solutions depending on the problem environment.

The present invention is proposed to remedy the problems of the prior art described above, and its object is to provide a method of detecting, through an SVM, covert channels present in TCP/IP header fields and marking the packets.

According to an embodiment of the present invention for achieving the above object, there is provided a source traceback method applying an SVM-based packet marking technique in a communication system selectively connected to a plurality of peer networks and a plurality of routers, the method comprising: a first step of inspecting the traffic bandwidth of packets entering a router; a second step of determining, when packets arrive above a given rate, whether they match a congestion signature corresponding to an attack pattern, that is, whether a large number of packets were generated in a short time; a third step of analyzing the SVM-based traffic pattern through an SVM module when it is determined that a large number of packets were generated in a short time; a fourth step of determining, by the SVM module, whether a packet is an attack packet and, if so, marking the pushback field of that packet; a fifth step of marking the packet; a sixth step of having the output queue of the router transmit the marked packet to the next router; and a seventh step of determining, when the bandwidth condition is not satisfied in the second step, that is, when a large number of packets were not generated, whether the pushback field is marked; wherein the packet marking of the fifth step is performed when the pushback field is found to be marked in the seventh step, and the sixth step is performed when the pushback field is not marked.

Preferably, in the marking step, two bits of the TOS field that are currently unused in the IP datagram of a packet entering the router are defined and marked as a PF (pushback flag) and a CF (congestion flag); in particular, the CF is set to 1 and marked when congestion occurs on the network, as is also described in RFC2474.

According to another aspect of the present invention, there is provided a router applying an SVM-based packet marking technique in a communication system selectively connected to a plurality of peer networks and a plurality of routers, the router comprising: a preprocessing module that preprocesses individual TCP/IP packets received through an input port; an SVM module that performs SVM learning on the preprocessed data, inspects the traffic bandwidth of incoming packets, determines, when packets arrive above a given rate, whether they match a congestion signature corresponding to an attack pattern, that is, whether a large number of packets were generated in a short time, and, if so, analyzes the SVM-based traffic pattern and determines whether a packet is an attack packet; and a marking module that, when the SVM module determines that a packet is an attack packet, marks the pushback field of that packet and has the output queue of the router transmit the marked packet to the next router.

Preferably, when a large number of packets were not generated, the marking module determines whether the pushback field is marked; if it is marked, the module marks the packet, and if it is not marked, the module treats the packet as an ordinary packet and forwards it.

Preferably, the preprocessing module uses only a single packet.

Also preferably, as a detection scheme that takes the relationship between packets into account, the preprocessing module slides over several consecutive TCP/IP packets one at a time and preprocesses the inter-packet time delays.

FIG. 1 is a network diagram showing a method by which a router blocks an attacker according to the prior art.

FIG. 2 is a diagram of an attacker traceback method using router log information according to the prior art.

FIG. 3 is a diagram showing the IP header format in conventional PPM.

FIG. 4 is a diagram showing the structure of the conventional PPM technique.

FIG. 5 is a diagram showing the conventional node-sampling-based PPM technique.

FIG. 6 is a diagram showing the conventional edge-sampling-based PPM technique.

FIG. 7 is a diagram showing classification using a conventionally known SVM.

FIG. 8 is a diagram showing detection scheme 1 through SVM learning according to the present invention.

FIG. 9 is a diagram showing detection scheme 2 through SVM learning according to the present invention.

FIG. 10 is a flowchart of router-based DDoS source traceback according to the present invention.

FIG. 11 is a diagram showing the packet marking field in the source traceback method applying the SVM-based packet marking technique according to the present invention.

FIG. 12 is a diagram showing the packet marking structure in the source traceback method applying the SVM-based packet marking technique.

FIG. 13 is a diagram showing routers and attack paths in the source traceback method applying the SVM-based packet marking technique.

FIG. 14 is a diagram showing the ns-2 based experimental network for the source traceback method applying the SVM-based packet marking technique according to the present invention.

FIG. 15 is a diagram showing an ns-2 based DDoS simulation of the source traceback method applying the SVM-based packet marking technique according to the present invention.

FIG. 16 is a diagram showing traffic under the conventional PPM scheme.

FIG. 17 is a diagram showing traffic under the source traceback method applying the SVM-based packet marking technique according to the present invention.

* Description of reference numerals for main parts of the drawings *

1001: preprocessing module 1002: SVM module

1004: marking module

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

The present invention proposes two schemes for training an SVM to detect covert channels or attack packets in TCP/IP header fields. FIG. 8 shows the first, general scheme: at an arbitrary router (R), a preprocessing module (1001) preprocesses each individual TCP/IP packet received through an input port, and an SVM module (1002) performs SVM learning on the preprocessed data and processes it (covert channel or attack packet detection scheme 1 through SVM learning). This scheme treats a single packet as the input data of the SVM.

Reference numeral 1004, not yet described, denotes the marking module, which is one of the features of the present invention. The function of the marking module (1004) is described later.

This scheme, however, learns from the characteristics of a single packet without considering relationships between packets, so the detection result is expected to depend closely on the features chosen during preprocessing. FIG. 9 therefore shows a detection scheme that considers the relationship between packets rather than using a single packet: the preprocessing module (1001) slides over several consecutive TCP/IP packets one at a time and preprocesses the inter-packet time delays, after which the SVM module (1002) performs SVM learning and processing.

This packet sliding technique rests on the observation that data hidden in a covert channel, and the packets carrying that data, have characteristics that differ from those of ordinary packets. That is, data concealed in a covert channel can be assumed to be correlated with one another, so treating several consecutive packets as one training input unit is expected to give better covert channel detection performance.

An SVM learning scheme that takes this inter-packet time correlation into account can be proposed as shown in FIG. 9 (detection scheme 2 through SVM learning).
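The sliding preprocessing of detection scheme 2 can be sketched as follows. This is a minimal illustration, assuming that packet arrival timestamps are available as a list of numbers and that each window of consecutive inter-packet delays forms one SVM input vector; the function names and the window size are hypothetical and are not taken from the description.

```python
def inter_packet_delays(timestamps):
    """Return the time gap between each pair of consecutive packet arrivals."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def sliding_windows(values, window_size):
    """Slide over the values one position at a time, yielding fixed-size windows.

    Each window would serve as one feature vector for the SVM (an assumption).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    last_start = len(values) - window_size
    return [values[i:i + window_size] for i in range(last_start + 1)]
```

For example, five arrival times produce four delays, and a window of three delays slid by one position gives two overlapping feature vectors.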

Packet Marking-Based Traceback for DDoS Attacks

Traceback Structure with Pushback

A network can be defined as a graph $G = (V, E)$ consisting of a node set $V$ and an edge set $E$. The node set $V$ is divided into end systems and routers, which are the internal nodes. Edges correspond to physical links between nodes in $V$. The attackers are defined as $S \subset V$ and the victim system as $t \in V \setminus S$.

If $|S| = 1$, the attack is a hacking attack by a single attacker, and the attack path information denotes the path from the attacking system s to the victim system t through d routers. Let N be the number of packets delivered. If the packet has a field in which link information about routers can be marked, packets are sampled with probability ρ and the information is written there. That is, a router selects packets with a fixed probability and includes edge information and the distance to the router in the packet before forwarding it.

Conventional schemes select packets with an arbitrary probability ρ and mark them with the link information of the router before forwarding. The probability that a mark written at a given node on the network is delivered without being re-marked by another router can then be computed.

This probability is thus the probability that the packet information corresponding to the attacker reaches the victim system without being re-marked by another router. To raise this value at the victim system, ρ must be made larger, which means routers must perform the marking process more often; conventional schemes therefore end up degrading network performance.
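The probability discussed above can be illustrated numerically. The sketch below assumes the form commonly used in analyses of probabilistic packet marking: a router marks a packet with probability ρ, and each of the routers between it and the victim independently leaves that mark untouched with probability 1 - ρ. The exact formula and indexing of the original equation are not available here, so the function name and its parameters are assumptions.

```python
def survival_probability(rho, downstream_routers):
    """Probability that a router marks a packet and no later router re-marks it.

    rho: per-router marking probability (0.0 to 1.0).
    downstream_routers: number of routers between the marking router and the
        victim, each of which could overwrite the mark (an assumed parameter).
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be between 0 and 1")
    if downstream_routers < 0:
        raise ValueError("downstream_routers must be non-negative")
    return rho * (1.0 - rho) ** downstream_routers
```

Under this assumed form, marks from routers far from the victim survive less often, which is why conventional schemes need many packets to reconstruct a full path.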

The technique proposed in the present invention does not have the router (R) sample and mark packets with an arbitrary probability ρ; instead, it marks packets when the SVM module (1002) detects abnormal traffic. Unlike the existing ACC technique, which simply propagates a pushback message recursively to upstream routers when abnormal traffic is found, it marks the matching packets while sending the pushback message upstream. An upstream router that receives the pushback message recognizes the attack traffic characteristics contained in the message, likewise performs marking with two router address values, and forwards the packet toward the destination. FIG. 10 shows the packet decision and marking procedure for the router structure of the present invention.

According to FIG. 10, the router (R) inspects the traffic bandwidth of incoming packets (S2) and, when packets arrive above a given rate, determines whether they match a congestion signature corresponding to an attack pattern (S4). That is, if it is determined in step S4 that a large number of packets were generated in a short time, the SVM-based traffic pattern is analyzed through the SVM module (1002) (S6). The SVM module then determines whether the packet is an attack packet (S8), and if so, a pushback message for that packet is generated (S10). This process can be regarded as the pushback field marking process (see FIG. 11). The pushback message generated for the packet in this step is updated or added as a parameter in the SVM module (1002). After step S10, the packet is marked (S12). The output queue of the router then transmits the packet to the next router (S14).

If the bandwidth condition is not satisfied in step S4, the router checks whether information was previously received from a neighboring router through a pushback message, that is, it determines whether the pushback field is marked (S20). If the pushback field is marked, the packet is marked as in step S12. If the pushback field is not marked in step S20, the packet is regarded as ordinary traffic and step S14, forwarding to the next router, is performed. Step S20 is also performed when the SVM module determines in step S8 that the packet is not an attack packet.
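The decision flow of FIG. 10 as described above can be sketched as a single function that records which steps a packet passes through. This is an illustrative sketch only: the `Packet` class, the `rate` and `threshold` arguments, and the `is_attack` callable standing in for the SVM module's decision are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    pushback_flag: bool = False  # PF: set when the pushback field is marked
    marked: bool = False         # set when traceback marking (S12) was applied


def process_packet(packet, rate, threshold, is_attack):
    """Walk one packet through steps S2 to S20 and return the steps taken.

    rate/threshold model the bandwidth check; is_attack models the SVM verdict.
    """
    steps = ["S2", "S4"]                 # inspect bandwidth, test congestion signature
    if rate > threshold:                 # many packets in a short time
        steps += ["S6", "S8"]            # SVM traffic analysis and attack decision
        if is_attack(packet):
            packet.pushback_flag = True  # S10: mark the pushback field
            packet.marked = True         # S12: mark the packet
            steps += ["S10", "S12", "S14"]
            return steps
    steps.append("S20")                  # was the pushback field already marked?
    if packet.pushback_flag:
        packet.marked = True             # S12: mark on behalf of the pushback
        steps.append("S12")
    steps.append("S14")                  # forward through the output queue
    return steps
```

Returning the list of steps keeps the sketch easy to compare against the flowchart; a real router would act on the packet instead of recording labels.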

The traceback marking technique with pushback is described in more detail below.

Traceback Marking Technique with Pushback

(1) Packet header marking field

Consider a router, the IP address of that router, and an IP packet arriving at that router; the 24 header bits of the packet that can hold marking information are used as the marking field.

- Router, and the IP address of that router

- Packet arriving at the router, and the 24 modifiable header bits of that packet

As shown in FIG. 11, these 24 bits consist of the 8-bit TOS (type of service) field and the 16-bit ID field. The TOS field is defined but not used in practice, so using its value does not affect the network as a whole.

In the current TOS field, the upper 3 bits are precedence bits and the next 3 bits are defined as the minimum delay, maximum throughput, and reliability fields, but these are not used at present. RFC2474 has since redefined the field as the Differentiated Services field (DS field), which uses only the upper 6 of the 8 TOS bits and leaves the lower 2 bits unused. The present invention therefore defines the two currently unused bits of the TOS field as a PF (pushback flag) and a CF (congestion flag). In particular, the CF is set to 1 when congestion occurs on the network, as is also described in RFC2474.
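Setting and testing the two flags amounts to simple bit masking on the 8-bit TOS byte. The sketch below assumes that PF and CF occupy the two lowest bits, with PF above CF; the description says only that the two unused bits are used, so the exact bit positions and the helper names are assumptions.

```python
PF_MASK = 0b00000010  # pushback flag (assumed bit position)
CF_MASK = 0b00000001  # congestion flag (assumed bit position)


def set_flag(tos, mask):
    """Return the TOS byte with the given flag bit set."""
    return (tos | mask) & 0xFF


def clear_flag(tos, mask):
    """Return the TOS byte with the given flag bit cleared."""
    return tos & ~mask & 0xFF


def has_flag(tos, mask):
    """Report whether the given flag bit is set in the TOS byte."""
    return bool(tos & mask)
```

Because the masks touch only the low two bits, the upper six bits of the TOS byte are left unchanged by these helpers.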

(2) Marking structure using TTL information

The process of marking the IP address of the router into the 24-bit marking field of the packet header is as follows.

When abnormal traffic is reported through the pushback process, the router marks the 24 markable bits of the packet with its own IP address and with the IP address of the previous-stage router identified through the pushback. Because two router address values must fit within 24 bits, a hash of each router address is marked instead, which also provides an authentication function.

The TTL (time to live) field of every packet consists of 8 bits and is generally set to 255 when the packet is sent. The TTL value is decremented by 1 at each router along the way until the packet reaches its destination.

The TTL value is currently used to conserve bandwidth during packet transmission and to control packets that never reach their destination. Existing studies do not use the TTL value; they add a separate hop counter field to compute the distance a packet has travelled. In the present invention, by contrast, part of the TTL value of a packet arriving at the router is used in the packet marking process.

Specifically, because the network hop distance is generally at most about 32, the low 6 bits of the 8-bit TTL field of a packet arriving at the router are enough to compute the distance the packet has travelled. The router therefore extracts the low 6 bits of the TTL field of the packet and stores them in the 6-bit part of the TOS field.

This value represents the distance the packet has travelled from the attacking system. If it is included in the packet, then when the packet reaches the destination system V, V can compare it with the value it computes in the same way and thereby also determine the distance the packet has travelled from the marking router.
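The distance comparison can be sketched with modular arithmetic on the 6-bit values. The sketch assumes that the TTL drops by exactly 1 per hop and that the distance is below 64 hops, so the difference of the low 6 bits (taken modulo 64) equals the hop count; the function names are hypothetical.

```python
def ttl_low6(ttl):
    """Extract the low 6 bits of an 8-bit TTL value."""
    return ttl & 0x3F


def hops_since_marking(stored_low6, ttl_at_victim):
    """Hops between the marking router and the victim, from the stored 6-bit value.

    Subtraction is done modulo 64 so that a wrap of the low 6 bits is handled.
    """
    return (stored_low6 - ttl_low6(ttl_at_victim)) % 64
```

For instance, a packet marked when its TTL was 250 and received with TTL 245 has travelled 5 hops since marking.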

(3) Marking Traceback Paths in Routers

When the router is notified by the SVM module 1002 described above that abnormal traffic has occurred, it performs the marking process on packets that match the congestion signature contained in the pushback message.

First, because a pushback message has been received, the router sets the PF field in the TOS field to 1. It then calculates the lower 6-bit value from the 8-bit TTL field of the current packet and stores it in the 6 bits of the TOS field. Next, it applies a hash function to its own address and the calculated value to obtain an 8-bit hash value, and marks this in the first 8 bits of the ID field. The marked packet is forwarded to the next router on the routing path toward the packet's destination address.

The next router checks the PF field of the packet. If it is set to 1, the router applies the hash function in the same way to its own IP address and the value obtained by subtracting 1 from the 6-bit value in the packet's TOS field, and marks the result in the packet.

After performing the marking process, the router sets the CF field to 1 and forwards the packet to the next router. When a router receives a packet in which both the PF field and the CF field are set to 1, the packet has already been marked by the preceding routers, so it performs no further marking.
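The two-router marking sequence described above can be sketched roughly as follows. Several details here are assumptions made for illustration and are not stated in the source: the bit positions of PF and CF within the TOS byte, the placement of the second router's hash in the second 8 bits of the ID field, and the use of a truncated SHA-256 digest as a stand-in for the unspecified 8-bit hash function. The names `Packet`, `hash8`, `mark_first`, and `mark_next` are hypothetical.

```python
import hashlib
from dataclasses import dataclass

PF = 0b10  # pushback flag: assumed bit position in the TOS byte
CF = 0b01  # congestion flag: assumed bit position in the TOS byte


@dataclass
class Packet:
    ttl: int        # 8-bit TTL field
    tos: int = 0    # 8-bit TOS field: 6-bit value plus PF and CF
    ident: int = 0  # 16-bit ID field


def hash8(router_ip: str, t: int) -> int:
    """Stand-in 8-bit hash of (router address, TTL-derived value)."""
    return hashlib.sha256(f"{router_ip}|{t}".encode()).digest()[0]


def mark_first(pkt: Packet, router_ip: str) -> None:
    """Marking by the router that received the pushback message."""
    t = pkt.ttl & 0x3F                   # lower 6 bits of the TTL
    pkt.tos = (t << 2) | PF              # store the value, set PF, clear CF
    pkt.ident = (hash8(router_ip, t) << 8) | (pkt.ident & 0x00FF)


def mark_next(pkt: Packet, router_ip: str) -> None:
    """Marking by the next router; does nothing if CF is already set."""
    if (pkt.tos & PF) and not (pkt.tos & CF):
        t = (pkt.tos >> 2) & 0x3F
        h = hash8(router_ip, (t - 1) & 0x3F)
        pkt.ident = (pkt.ident & 0xFF00) | h  # assumed: second 8 bits of ID
        pkt.tos |= CF
```

In this sketch the TTL decrement performed during forwarding is left to the caller, so the functions only touch the 24 marking bits (8 bits of TOS and 16 bits of ID).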

Traceback Path Reconstruction

(1) DDoS Attack Packet Traceback

For the packets delivered through the network, the victim system V reconstructs the DDoS attack path. Assume that the DDoS attack is launched from S1, S2, and S3 as shown in FIG. 13. For each attack packet, the routers marked their own IP information and the 6-bit TTL field information of the packet within the 24 bits of the packet header available for marking. When a DDoS attack occurs, the victim system performs the following path traceback process on the packets that arrive.

First, define the packets that arrive at the victim system V as a set. Within it, consider the set of packets that belong to the DDoS attack and, within that set, the set of packets that were marked and forwarded by routers.

The marked packets are distinguished within the set of packets arriving at the victim system by examining the TOS field of each packet and selecting those packets in which both the PF field and the CF field are set.

That is, for any packet belonging to the set of marked packets at the victim system, take its 8-bit TTL value. By comparing this with the value marked in the TOS field, the network hop distance the packet traveled after being marked by the router can be calculated.

If this hop distance is 1, the packet was marked by the router immediately in front of the victim system. However, because the technique proposed in the present invention is combined with the pushback technique, traceback path reconstruction can be performed directly on packets whose hop distance is 2.
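The selection and distance calculation at the victim system might be sketched as below. The exact formula is not reproduced in this text, so the sketch assumes the hop distance is the difference between the 6-bit value stored in the TOS field and the lower 6 bits of the TTL observed at the victim, taken modulo 64 to handle wraparound. The flag bit positions and the names `Packet`, `is_marked`, and `hop_distance` are likewise hypothetical.

```python
from dataclasses import dataclass

PF = 0b10  # pushback flag: assumed bit position in the TOS byte
CF = 0b01  # congestion flag: assumed bit position in the TOS byte


@dataclass
class Packet:
    ttl: int  # TTL as observed at the victim system
    tos: int  # 6-bit marked value plus PF and CF


def is_marked(pkt: Packet) -> bool:
    """A packet counts as marked when both PF and CF are set."""
    return bool(pkt.tos & PF) and bool(pkt.tos & CF)


def hop_distance(pkt: Packet) -> int:
    """Assumed formula: marked value minus current TTL low bits, mod 64."""
    marked_t = (pkt.tos >> 2) & 0x3F
    return (marked_t - (pkt.ttl & 0x3F)) % 64


def packets_at_distance(packets, distance):
    """Select the marked packets that traveled `distance` hops after marking."""
    return [p for p in packets if is_marked(p) and hop_distance(p) == distance]
```

For example, a packet marked when its TTL was 250 and observed at the victim with TTL 248 yields a distance of 2 under this assumption.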

(2) DDoS Attack Path Reconstruction

A packet whose hop distance is 2 was marked by the two routers within two hops of the victim system. That is, the packet was marked by the router directly connected to the victim system and by a router 2 hops away, so its hop distance value is 2. Accordingly, the router at a hop distance of 2 is identified from the packet first.

Of course, the fact that the packet was also marked by the router at hop distance 1 from the victim system can be verified in the same manner.
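One way such a check could work is for the victim to recompute the 8-bit hash for each router address it knows about and compare it with the value marked in the packet. This is only a sketch under stated assumptions: the source does not say how the victim obtains candidate router addresses, and the truncated SHA-256 digest stands in for the unspecified hash function. The names `hash8` and `candidate_routers` are hypothetical.

```python
import hashlib


def hash8(router_ip: str, t: int) -> int:
    """Stand-in 8-bit hash of (router address, TTL-derived value)."""
    return hashlib.sha256(f"{router_ip}|{t}".encode()).digest()[0]


def candidate_routers(known_ips, mark8: int, t: int):
    """Return the known router addresses whose hash matches the 8-bit mark."""
    return [ip for ip in known_ips if hash8(ip, t) == mark8]
```

Under these assumptions, the router 2 hops away would be checked against the first 8 bits of the ID field using the 6-bit value from the TOS field, and the router 1 hop away against the other marked 8 bits using that value minus 1. Because the hash is only 8 bits wide, more than one address can match, so the result is a candidate list rather than a single router.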

Repeating this process for each packet that satisfies the corresponding condition reconstructs the actual attack path along which the packets in the DDoS attack packet set were delivered.

Applying the technique proposed in the present invention to the network structure shown in FIG. 13 yields the DDoS attack path AP toward the victim system.

Through this process, the router monitors and evaluates network traffic through the ACC module while performing network control by applying a modified pushback technique. It also applies an improved packet marking technique to trace back the DDoS attack path, providing traceback for spoofed packets so that the attacker's origin can be reconstructed. In addition, applying a hash provides a structure for verifying marking information against forgery by an attacker.

Performance Analysis of the Proposed Technique

1. Experimental Results

To evaluate the performance of the proposed technique, it was analyzed using the ns-2 simulator in a Linux environment. The network shown in FIG. 14 was configured, and a DDoS attack launched from nodes 0, 1, and 2 was simulated as shown in FIG. 15.

In the experiment, the conventional packet marking technique samples and marks packets with probability p at each router during a DDoS attack, so, as shown in FIG. 16, the total number of marked packets (blue line: v1.tr) grows in proportion to the DDoS traffic (red line: r0.tr). With the technique proposed in the present invention, as shown in FIG. 17, the marking process is applied to DDoS traffic through the pushback technique, and the number of marked packets was observed to decrease by about 25%.

Because the technique proposed in the present invention operates in a manner similar to the conventional PPM technique, its management overhead is low. Because packet identification and control functions are applied at the router, the load on the entire network can be reduced when an attack such as DDoS occurs. Furthermore, whereas the conventional PPM technique selects packets for marking with an arbitrary probability p, the proposed technique marks path information using the TTL field value, which reduces the number of packets needed to reconstruct the traceback path to the victim system.

Consequently, the available bandwidth of the whole network can be improved, and the path to the origin of a DDoS attack can be reconstructed from a small number of marked packets. When a packet passes through n routers in the network, the origin path can be reconstructed from only n traceback messages.

Claims

A source traceback method applying an SVM-based packet marking technique, in a communication system selectively connected to a plurality of partner networks and a plurality of routers, the method comprising:

A first step of checking the bandwidth of the traffic for packets entering the router;

A second step of determining, when traffic arrives above a predetermined amount, whether it matches a congestion signature corresponding to an attack type, that is, whether a large amount of packets is generated in a short time;

A third step of analyzing the SVM-based traffic pattern through the SVM module if it is determined that a large amount of packets is generated in a short time;

A fourth step of determining, by the SVM module, whether the packet is an attack packet and, if so, marking a pushback field for the packet;

A fifth step of marking the packet;

A sixth step of causing the router to output the marked packet to the preceding-stage router; and

A seventh step of determining whether the pushback field is marked if the bandwidth condition is not satisfied in the second step, that is, if a large amount of packets is not generated;

Wherein, when the pushback field is marked in the seventh step, the packet marking of the fifth step is performed, and when the pushback field is not marked, the sixth step is performed.

The method of claim 1, wherein the marking is performed on the IP datagram of a packet input to the router by defining two currently unused bits of the packet's TOS field as a pushback flag (PF) and a congestion flag (CF), the CF being set to 1 when congestion occurs in the network, as is also the case in RFC 2474.

A router applying an SVM-based packet marking technique, comprising:

A preprocessing module for preprocessing individual TCP/IP packets input through an input port;

An SVM module that performs SVM learning on the preprocessed data, checks the bandwidth of the traffic for incoming packets, determines, when traffic arrives above a certain amount, whether it matches a congestion signature corresponding to an attack type, that is, whether a large amount of packets is generated in a short time, analyzes the SVM-based traffic pattern when it is determined that a large amount of packets is generated in a short time, and determines whether the packet is an attack packet; and

A marking module that, when the SVM module determines that the packet is an attack packet, marks a pushback field for the packet and transmits the marked packet to the router connected to the output unit of the router.

The router of claim 3, wherein the marking module determines whether the pushback field is marked when a large amount of packets is not generated, marks the packet when the pushback field is marked, and transmits the packet as a normal packet when the pushback field is not marked.

The router of claim 3, wherein the preprocessing module uses only one packet.

The router of claim 3, wherein the preprocessing module preprocesses the time delay between packets by sliding over a plurality of TCP/IP packets one at a time, as a detection scheme that takes the correlation between packets into account.