KR100594755B1

KR100594755B1 - Packet classification method through hierarchical rulebase partitioning

Info

Publication number: KR100594755B1
Application number: KR1020040059058A
Authority: KR
Inventors: 최린
Original assignee: 삼성전자주식회사
Priority date: 2004-05-11
Filing date: 2004-07-28
Publication date: 2006-06-30
Also published as: KR20050108301A

Abstract

A packet classification method through hierarchical rulebase partitioning is disclosed. The method classifies packets arriving from an external network by finding, among the rules in a given rulebase, the highest-priority rule that matches each packet and then processing the packet according to that rule. It comprises a preprocessing step, which partitions the rulebase into a plurality of independent sub-rulebases according to predetermined criteria and builds a hash table from them, and a packet classification step, which looks up the hash table using a hash key extracted from the header of an incoming packet and thereby maps the packet to the corresponding sub-rulebase. Because each incoming packet is classified only within its associated sub-rulebase, packet classification is accelerated.

Packet, firewall, protocol, port, IP address, entropy

Description

Packet classification method through hierarchical rulebase partitioning

FIG. 1 is a diagram illustrating the concept of the packet classification method through hierarchical rulebase partitioning according to the present invention;

FIG. 2 is a graph showing the maximum and average sub-rulebase sizes after the first hashing step is applied;

FIG. 3 is a graph showing the average number of rules per sub-rulebase, by rulebase size, after the second hashing step is applied with various hash key selection algorithms;

FIG. 4 is a graph showing the maximum number of rules per sub-rulebase, by rulebase size, after the second hashing step is applied with various hash key selection algorithms;

FIG. 5 is a diagram showing a rulebase and a hash table built with a hash key algorithm that extracts the 8 most significant bits from each of the source and destination IP addresses; and

FIG. 6 is a graph showing classification performance in terms of the average number of rules examined to find the matching rule.

The present invention relates to a packet classification method, and more particularly to a packet classification method through hierarchical rulebase partitioning in which a rulebase is partitioned into independent sub-rulebases through multiple hashing steps, so that each incoming packet is classified only within its associated sub-rulebase and the search time required for classification is greatly reduced.

The Internet has developed quantitatively, through growing bandwidth, and qualitatively, through an increasingly diverse range of services. On the quantitative side, bandwidth has doubled every six months since the Internet became widespread: OC-768 (40 Gbps) links are deployed in the network core, and gigabit Ethernet is spreading at the network edge. On the qualitative side, the services offered have diversified as the number of Internet users has grown. This means that packets exchanged over the network require different treatment, as in Diff-Serv, firewalls, and virtual private networks (VPNs). As networks grow, the ability to classify packets at high speed therefore becomes increasingly important.

Conventional packet classification operates on the packet header rather than the payload. In general, classification refers to five fields: the layer-3 protocol, the source and destination IP addresses, and the layer-4 source and destination port numbers. These five fields are called the 5-tuple. In practice, a rulebase is defined and packets are classified by matching them against it.

When a packet arrives from outside at a gateway or firewall that performs classification, the packet classifier extracts the five header fields and checks them against the rules in order, starting from rule 0. IP addresses carry variable-length masks under CIDR (Classless Inter-Domain Routing), and port numbers are expressed as ranges. Here, CIDR refers to the scheme in which an allocated IP address block is split into several subnets to suit the size of an organization.

These characteristics make packet classification harder. Rules within a rulebase are prioritized, so once the first matching rule is found the search stops and the packet is processed according to the action associated with that rule. A rulebase typically holds several hundred rules in a small organization and several thousand rules in a large organization or company.

Rulebases keep growing, both because the number of services increases and rules become more elaborate, and because when several networks are merged and managed as one, their rulebases are merged as well.

Conventional packet classification methods include the following.

1) Hardware processing using TCAM; 2) RFC, which progressively simplifies the pattern by which a packet matches the rules; 3) representing the rulebase as bit vectors and partitioning them to find the matching rule (ABV); and 4) representing the rulebase as a tree and then searching for the frequently matched rules.

Of these, the TCAM method 1) and the RFC method 2) are severely constrained in storage space and preprocessing time when the rules are reorganized. For this reason, conventional packet classification methods have a practical limit of about 5,000 (5K) rules and have difficulty handling larger rulebases. The ABV method 3) is the only one that can handle rulebases on the order of tens of thousands of rules. Its efficiency is poor, however: classifying against a rulebase of roughly 20,000 rules requires an average of 140 lookup-table references, so packet classification takes a long time.

In addition, most of these conventional methods require a preprocessing pass that reorganizes the rulebase, which makes it difficult to apply two or more packet classification algorithms together.

Accordingly, an object of the present invention is to provide a packet classification method through hierarchical rulebase partitioning that improves search efficiency by using an efficient hash key selection algorithm to partition a rulebase hierarchically into small, independent sub-rulebases.

To this end, the present invention proposes several hash key algorithms and, as the most efficient of them, maximum entropy hashing, which selects the hash key that maximizes the entropy of the key.

Another object of the present invention is to provide a packet classification method that, in keeping with the trend toward larger rulebases, can be applied to classifiers with large rulebases that existing packet classification methods have difficulty handling.

To achieve the above objects, the scalable packet classification method using maximum entropy hashing according to the present invention classifies a plurality of packets arriving from an external network so that each is matched to one of a plurality of rules in a given rulebase, and preferably comprises: a preprocessing step of partitioning the rulebase into a plurality of independent sub-rulebases according to predetermined criteria and generating a hash table based on them; and a classification step of looking up the hash table using a hash key extracted from the header of an incoming packet and classifying the packet by mapping it to the corresponding sub-rulebase.

Here, the preprocessing step preferably includes a first partitioning step of partitioning the rulebase into a plurality of sub-rulebases according to protocol and port number and generating a hash table based on them.

Here, the protocol and port number preferably determine a specific application.

Here, the preprocessing step preferably further includes a second partitioning step in which, when the number of rules in any sub-rulebase exceeds a preset threshold, that sub-rulebase is partitioned again using a hash key extracted in a predetermined manner from the source and destination IP addresses, and a hash table is generated based on the result.

Here, the predetermined manner is preferably one of the MSB pattern, the exponential growing pattern, the mask distribution pattern, and the entropy-maximizing pattern.

Here, the MSB pattern selects a predetermined number of bits starting from the most significant bit of the destination IP address and of the source IP address.

Here, the exponential growing pattern selects, from the destination IP address and the source IP address, the bit positions that correspond to powers of 2.

Here, the mask distribution pattern counts, at each bit position bi, the rules in the rulebase for which that bit is not a don't-care bit, giving a per-position count; accumulates these counts from the most significant bit to the least significant bit to obtain the total; and selects a bit position each time the running sum reaches a multiple of the total divided by K.

Here, K is preferably the number of bits of the hash key to be generated.

Here, the entropy-maximizing pattern selects, from each of the destination IP address and the source IP address, a predetermined number of bits that maximize entropy.

Hereinafter, the present invention will be described with reference to the accompanying drawings.

The packet classification method according to the present invention is based on the assumption that, within a rulebase, only a few rules can possibly match any given packet. This is explained with reference to Table 1 below.

Table 1 shows the rulebase of a typical firewall.

Rule  Protocol  Source port  Destination port  Source IP  Destination IP  Action  Purpose
R0    *         *            *                 Inside     Inside          Deny    Protection against spoofing attacks
R1    TCP       1024-65535   80                Outside    Inside          Accept  HTTP service
R2    TCP       1024-65535   23                Outside    Inside          Accept  Telnet service
R3    TCP       1024-65535   21                Outside    Inside          Accept  FTP service
D     *         *            *                 *          *               Deny    Default rule

In Table 1, 'Inside' refers to the local network protected by the firewall, and 'Outside' refers to the network separated from the internal network by the firewall.

The internal network provides various application services such as HTTP, Telnet, and FTP. Rules R1 to R3 accept connection requests for these application services under the stated conditions, and rule R0 protects the internal network against spoofing attacks, in which an attacker forges its own identifying information to attack a target system. D is the default rule, which denies all remaining communication with the outside. In Table 1, a packet using the UDP protocol can match only R0 or D, so R1 to R3 never need to be checked against UDP packets.

FIG. 1 is a diagram illustrating the concept of the packet classification method through hierarchical rulebase partitioning according to the present invention. Referring to FIG. 1, the method is performed in two phases: preprocessing and classification.

First, in the preprocessing phase, the original rulebase is hierarchically partitioned into small, independent sub-rulebases by hashing bit fields selected from the classification fields. The degree of partitioning depends on the density of the sub-rulebases in the classification space: the denser a sub-rulebase, the more partitioning it requires. Partitioning stops once every sub-rulebase is small enough.

Next, in the classification phase, the packet classifier examines each incoming packet with the same hash keys used during preprocessing, identifies the sub-rulebase associated with the packet, and matches the packet against the rules there. The search for the matching rule is performed only in the final sub-rulebase, and any existing search algorithm can be used for it.
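The two phases can be sketched in Python using the rules of Table 1. This is a minimal illustration, not the patented implementation: the hash key is simplified to the pair (protocol, destination port), the zone strings stand in for real IP ranges, and copying wildcard rules into every bucket is an assumption made here so that each bucket can be searched on its own.

```python
from collections import defaultdict

# Table 1 in priority order: (name, protocol, dst_port, src_zone, dst_zone, action).
# "*" is a wildcard; the zone strings stand in for the inside/outside IP ranges.
RULES = [
    ("R0", "*", "*", "inside", "inside", "Deny"),
    ("R1", "TCP", 80, "outside", "inside", "Accept"),
    ("R2", "TCP", 23, "outside", "inside", "Accept"),
    ("R3", "TCP", 21, "outside", "inside", "Accept"),
    ("D", "*", "*", "*", "*", "Deny"),
]

def field_matches(rule_value, packet_value):
    return rule_value == "*" or rule_value == packet_value

def preprocess(rules):
    """Preprocessing: partition rules into buckets keyed by (protocol, dst_port)."""
    buckets = defaultdict(list)
    shared = []  # rules with a wildcard in a key field may match in any bucket
    for priority, rule in enumerate(rules):
        proto, port = rule[1], rule[2]
        if proto == "*" or port == "*":
            shared.append((priority, rule))
        else:
            buckets[(proto, port)].append((priority, rule))
    # Assumption of this sketch: wildcard rules are copied into every bucket.
    table = {key: sorted(own + shared, key=lambda entry: entry[0])
             for key, own in buckets.items()}
    return table, shared

def classify(packet, table, fallback):
    """Classification: one hash lookup, then first-match search in one bucket."""
    proto, port, src, dst = packet
    examined = 0
    for _, (name, r_proto, r_port, r_src, r_dst, action) in table.get((proto, port), fallback):
        examined += 1
        if (field_matches(r_proto, proto) and field_matches(r_port, port)
                and field_matches(r_src, src) and field_matches(r_dst, dst)):
            return name, action, examined
    return None, None, examined

table, fallback = preprocess(RULES)
print(classify(("TCP", 80, "outside", "inside"), table, fallback))
print(classify(("UDP", 53, "outside", "inside"), table, fallback))
```

The UDP packet is compared only against R0 and D, which mirrors the observation made about Table 1.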

The preprocessing step is described in detail below, followed by the classification step.

1. Preprocessing step

1-1. First partitioning step

In the first partitioning step of preprocessing, the rulebase is partitioned into a large number of independent sub-rulebases according to protocol type and port number. For example, the rules that control HTTP, FTP, and SMTP traffic can be placed in separate, independent sub-rulebases. Then, by examining the header fields of an incoming packet, only the sub-rulebase whose protocol field is the same as that of the packet is searched.

To partition the rulebase, the preprocessing step selects a hash key of a given number of bits from the classification space. At this stage the classification space consists of 40 bits: 8 bits of protocol field, 16 bits of source port, and 16 bits of destination port. That is, only the protocol number and the port numbers are considered for the hash key, because they naturally group rules by the Internet service that the rules control. For example, the HTTP service corresponds to protocol 6 and server port 80.

An 8-bit hash key yields a hash table with 2^8 (= 256) entries, where each entry represents one sub-rulebase. To limit the size of the hash table, only a subset of the classification space is selected as the hash key, and the key should be chosen so that the table does not grow too large. For example, a 17-bit hash key produces a hash table with 128K entries. There is a trade-off between memory space and the depth of the partitioning hierarchy, which depends on the size of the hash key.

With a 17-bit hash key, the first 6 bits are extracted from the protocol field using the entropy-maximizing key selection algorithm. Because only two protocols, TCP and UDP, specify port numbers, the 11 additional bits are selected from the port numbers for these protocols.

Of the port bits, one bit indicates whether the source port or the destination port is used; in general it is preferable to select the denser server port. If the selected port is a server port, an additional 10 LSBs (least significant bits) are taken from that port field. If the selected port is a client port, which is specified by an upper and a lower bound, an additional 6 MSBs (most significant bits) are taken from that port field.

Typically, the dense server ports use specific port numbers between 0 and 1023, whereas the sparse client ports use arbitrary port numbers from 1024 to 65535. The lower 10 bits are therefore used for server ports, and the upper 6 bits for client ports.

In other words, the hash key is composed of [protocol field bits], [direction bit], and [lower 10 bits of the server port or upper 6 bits of the client port]. Because a server port partitions more effectively than a client port, if a rule specifies a server port in either its source port or its destination port, the server port is used regardless of direction. If a rule specifies client ports in both its source port and its destination port, the 6 MSBs are used to distribute the rule across the source and destination hash tables.
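The key layout described above can be sketched for a packet header as follows. Several details are assumptions made only for illustration and are not stated in the text: the 6 protocol bits are taken to be the low 6 bits of the protocol number (the text selects them by entropy), the direction bit is 0 for a destination-side server port and 1 for a source-side one, and the 6 client-port MSBs are right-aligned in the 10-bit port slot.

```python
SERVER_PORT_LIMIT = 1024  # ports 0-1023 are treated as server ports

def first_level_key(protocol, src_port, dst_port):
    """Build a 17-bit key: 6 protocol bits, 1 direction bit, 10 port bits."""
    proto_bits = protocol & 0x3F  # assumption: low 6 bits of the protocol number
    if dst_port < SERVER_PORT_LIMIT:
        direction, port_bits = 0, dst_port & 0x3FF  # 10 LSBs of the server port
    elif src_port < SERVER_PORT_LIMIT:
        direction, port_bits = 1, src_port & 0x3FF  # server port on the source side
    else:
        direction, port_bits = 0, dst_port >> 10    # 6 MSBs of a client port
    return (proto_bits << 11) | (direction << 10) | port_bits

# HTTP request: TCP (protocol 6) from a client port to server port 80.
print(first_level_key(6, 40000, 80))
```

In this simplified layout a client-port key can coincide with a server-port key. That costs only partitioning efficiency, since the final search still compares every field of the candidate rules.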

Intuitively, two rules mapped to the same sub-rulebase may overlap, but two rules mapped to different sub-rulebases can never overlap; that is, the two rules are independent. A sub-rulebase larger than the threshold can be partitioned again using a hash key different from the first one. Here, the threshold is expressed as a number of rules per sub-rulebase.

This hierarchical partitioning continues until every sub-rulebase is small enough. Experiments showed, however, that two levels of partitioning suffice for rulebases of up to 500,000 rules. The time and space complexity of the packet classification method depends on the depth of the partitioning hierarchy and on the number of nodes.

To reduce this complexity, the number of partitioning levels must be kept small, which requires partitioning the rulebase into sub-rulebases as evenly as possible: the number of empty sub-rulebases should be minimized, and the rules should be distributed evenly across the sub-rulebases. How well this works depends on the hash key selection algorithm.

FIG. 2 shows the result of the first partitioning step: the average and maximum sub-rulebase sizes after partitioning. The average size drops substantially. For rulebases with 10,000 (10K), 100,000 (100K), and 500,000 (500K) rules, the average sub-rulebase size is 0.0028, 0.0014, and 0.0014 of the original rulebase size, respectively. As the maximum sizes in the figure show, however, the rules are not distributed evenly across the sub-rulebases.

For every rulebase used in the experiments, the largest sub-rulebase holds about 24 percent of the rules of the original rulebase. As expected, these rules relate to the HTTP service, which corresponds to protocol 6 and port number 80. For rulebases with 10,000 (10K), 100,000 (100K), and 500,000 (500K) rules, the number of sub-rulebases that exceed the threshold (16 rules per sub-rulebase) is 141, 192, and 1114, respectively. The second partitioning step is applied to these sub-rulebases.

1-2. Second partitioning step

The second partitioning step is applied, after the first partitioning step is complete, only to buckets (sub-rulebases) larger than the threshold. Because the second-step hash key builds on the first-step hash key, which already covers the protocol and port fields, the second step partitions the rulebase using only the source and destination IP addresses. The source and destination addresses together comprise 64 bits, from which a suitable number of bits is selected for partitioning; preferably about 16 bits are selected from the source and destination combined. Because the selected bits directly affect the final result, the present invention proposes the following four hash key selection algorithms for finding a good bit pattern.

1) MSB pattern

A 16-bit hash key is generated by extracting the 8 MSBs (most significant bits) from each of the source and destination IP addresses. The algorithm relies on the observation that most prefix masks cover the first few significant bits of the IP address field. This simple algorithm serves as the baseline against which the other hash key algorithms are compared. Its time complexity is O(1), meaning that it is independent of the number of rules in the rulebase.
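A minimal Python sketch of the MSB pattern follows. The function name and the ordering of the two halves (source bits first, then destination bits) are assumptions made for illustration.

```python
import ipaddress

def msb_key(src_ip, dst_ip, bits_per_field=8):
    """Concatenate the top bits of the source and destination IPv4 addresses."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    shift = 32 - bits_per_field
    return ((src >> shift) << bits_per_field) | (dst >> shift)

# The first octets 192 and 10 form the 16-bit key 0xC00A.
print(hex(msb_key("192.168.1.10", "10.0.0.1")))
```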

2) Exponential growing pattern (EXP)

This algorithm generates a 12-bit hash key by selecting, from each of the source and destination IP addresses, the bit positions that correspond to powers of 2, namely b1, b2, b4, b8, b16, and b32.

That is, starting from the most significant bit, the distance between selected bits doubles each time. The algorithm is based on the observation that the lower a bit position in the IP address field, the more often that bit is masked out. Preferably, two more bits (b6 and b11) are added to produce a 16-bit hash key. The time complexity of this key selection algorithm is also O(1).
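The bit selection can be sketched as below. Two points are assumptions of this sketch: b1 is taken to be the most significant bit of the 32-bit address, and b6 and b11 are added to both addresses, which is the reading that turns 12 bits into 16.

```python
# Bit positions counted from the MSB (b1) of a 32-bit address:
# the powers of two plus the two extra positions b6 and b11.
EXP_POSITIONS = (1, 2, 4, 6, 8, 11, 16, 32)

def select_bits(value, positions, width=32):
    """Pack the bits of `value` at the given MSB-based positions into an integer."""
    key = 0
    for pos in positions:
        key = (key << 1) | ((value >> (width - pos)) & 1)
    return key

def exp_key(src, dst):
    """16-bit key: 8 selected bits of the source, then 8 of the destination."""
    return (select_bits(src, EXP_POSITIONS) << 8) | select_bits(dst, EXP_POSITIONS)

print(exp_key(0x80000000, 0x00000001))
```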

3) Mask distribution pattern (MASK)

This algorithm is based on the observation that a don't-care bit in the classification space provides no information. Each bit bi in the classification space therefore carries information in inverse proportion to the number of don't-care bits at that position.

The hash key is found as follows. At each bit position bi, count the rules in the rulebase for which that bit is not a don't-care bit, and accumulate these counts from the MSB to the LSB. To generate a K-bit hash key, one bit is selected each time the running sum at a bit position reaches a multiple of the total divided by K. If the rulebase contains n rules in total, the time complexity of this algorithm is O(Kn). The experiments use K = 16.
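The procedure can be sketched as follows for a single 32-bit address field. It assumes, for illustration only, that each rule is summarized by its prefix length, so that bit position i is specified exactly when the prefix length is at least i. Integer comparisons are used to avoid floating-point rounding at the thresholds.

```python
def mask_positions(prefix_lengths, k, width=32):
    """Select bit positions (1 = MSB) where the running count of specified bits
    crosses successive multiples of total / k."""
    # counts[i] = number of rules whose bit at position i + 1 is not don't-care
    counts = [sum(1 for p in prefix_lengths if p >= pos) for pos in range(1, width + 1)]
    total = sum(counts)
    if total == 0:
        return []
    selected, running, mark = [], 0, 1
    for pos, count in enumerate(counts, start=1):
        running += count
        # running >= mark * total / k, written with integers only
        if len(selected) < k and running * k >= mark * total:
            selected.append(pos)
            # A single position may cross several marks; it is selected once.
            while mark * total <= running * k:
                mark += 1
    return selected

print(mask_positions([8, 8, 16, 24], 4))
```

With short prefixes dominating, the selected positions cluster toward the most significant bits, where most rules actually specify a value.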

4) Entropy-maximizing pattern (Ent)

To find a good hash key, the preferred embodiment of the present invention uses the concept of entropy. A good hash key distributes the rules evenly over the hash table, so that the number of null entries and the differences in population among the sub-rulebases are minimized. As is well known, entropy is maximal when all entries are equally likely. Entropy calculations can therefore be used to find a hash key that partitions the rulebase evenly. The maximum-entropy calculation used to find a suitable hash key is described below.

The method starts with a hash key of length 1. The entropy of each bit of the field being hashed is computed, and the bit position with the maximum entropy is chosen. The process is then repeated for a hash key of length 2: the entropy is computed for every 2-bit hash key that contains the already selected bit, and the additional bit position that gives the maximum entropy is chosen.

This iteration continues until the hash key length reaches L or the entropy no longer increases. In this way, an 8-bit hash key is selected from each of the source IP address and the destination IP address, giving a 16-bit hash key. The time complexity of this algorithm is O(n[s(2w-s+1)/s]), where w is the length of the classification space, s is the length of the hash key, and n is the total number of rules in the rulebase.
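The greedy selection described above can be sketched as follows. This is a simplified illustration under stated assumptions: rules are exact-value bit strings (no don't-care bits), entropy is the Shannon entropy of the bucket sizes induced by the chosen bit positions, ties are broken toward the lower bit index, and the names `entropy` and `greedy_max_entropy_key` are hypothetical.

```python
import math
from collections import Counter


def entropy(rules, positions):
    """Shannon entropy (in bits) of the bucket distribution for these key bits."""
    buckets = Counter(tuple(rule[p] for p in positions) for rule in rules)
    n = len(rules)
    return -sum((c / n) * math.log2(c / n) for c in buckets.values())


def greedy_max_entropy_key(rules, max_len):
    """Grow the key one bit at a time, keeping the bit that maximizes entropy."""
    width = len(rules[0])
    selected, best = [], 0.0
    while len(selected) < max_len:
        candidates = [p for p in range(width) if p not in selected]
        if not candidates:
            break
        # Score every key formed by adding one candidate bit to the current key.
        scored = [(entropy(rules, selected + [p]), p) for p in candidates]
        h, p = max(scored, key=lambda t: (t[0], -t[1]))
        if h <= best:  # stop when entropy no longer increases
            break
        selected.append(p)
        best = h
    return selected, best


rules = ["0000", "0110", "1000", "1010"]
print(greedy_max_entropy_key(rules, 3))  # ([0, 2], 2.0)
```

In this toy rulebase bits 0 and 2 already separate all four rules, so the search stops at two bits even though up to three were allowed.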

FIGS. 3 and 4 show the results of the second-stage partitioning performed with the various hash key selection algorithms. FIG. 3 shows the average number of rules per sub-rulebase, and FIG. 4 shows the size of the largest sub-rulebase. With the maximum-entropy hash key algorithm, the second-stage partitioning reduced the result of the first-stage partitioning by ratios of 0.054, 0.054, and 0.052 for rulebases of 10,000 (10K), 100,000 (100K), and 500,000 (500K) rules, respectively.

After both the first-stage and second-stage partitioning, rulebases of 10,000 (10K), 100,000 (100K), and 500,000 (500K) rules were divided into sub-rulebases holding on average 1.6, 7.6, and 36.6 rules, respectively. Relative to the original rulebase sizes, this corresponds to ratios of 0.00016, 0.000076, and 0.000073.

The second-stage partitioning is effective at shrinking the huge sub-rulebase that still contains 24 percent of the original rulebase after the first-stage partitioning. After second-stage partitioning with the maximum-entropy hashing algorithm, the largest sub-rulebase is reduced to 31, 258, and 1281 rules for rulebases of 10,000 (10K), 100,000 (100K), and 500,000 (500K) rules, respectively.

This corresponds to 0.31, 0.26, and 0.26 percent of the original rulebase. For a rulebase of 100K rules, it means that at most 258 rules have to be checked for a single packet in the worst case. The second-stage partitioning is more effective than the first-stage partitioning at reducing the size of the largest sub-rulebase. This is expected, because the first-stage partitioning divides the original rulebase according to the Internet service that each rule controls.

As a result, the rules controlling the HTTP service, which form the largest group, all end up in one single sub-rulebase. In contrast, the second-stage partitioning can shrink the largest sub-rulebase because the maximum-entropy hash key algorithm takes the distribution of rules in the classification space into account.

FIGS. 3 and 4 also allow the efficiency of the different hash key algorithms in the second-stage partitioning to be compared. As expected, selecting the maximum-entropy key gives the best partitioning result. In particular, this method reduces the size of the largest sub-rulebase more than the other key selection algorithms do. Compared with the MSB pattern key algorithm, the maximum-entropy selection algorithm is better by factors of 2.38, 3.42, and 3.49 for rulebases of 10K, 100K, and 500K rules. The average sub-rulebase sizes are shown for all key selection algorithms. The maximum-entropy algorithm is the most efficient in all cases, especially for rulebases with 100K or more rules.

It is difficult to judge scalability directly from the partitioning results shown in FIGS. 3 and 4, because the rulebase size does not increase linearly along the horizontal axis. However, the experiments show that the maximum and average sizes of the sub-rulebases grow linearly as the rulebase size increases. These results were obtained with only two stages of partitioning. If necessary, further stages of partitioning can be performed to reduce the sub-rulebase size even more. The memory requirements of the algorithm were also examined in terms of the growth rates of the total number of rules, the total number of buckets, and the total number of buckets exceeding the threshold, and all three were confirmed to scale well.

1-3. Adaptive algorithm for prefix mask and range specification

So far, exact-value matching has been assumed for packet classification. Rule definitions, however, often include fields specified with a prefix mask or a range. The following describes how the present invention handles these other kinds of field specification.

1) Prefix Mask Field

This form is commonly used to specify IP address fields. Table 2 shows an example of a rulebase with a prefix mask.

Rule    Field Description (b0b1b2b3b4b5b6b7)
R0      0000 0000
R1      0110 0000
R2      1000 0000
R3      1*10 0000

Don't-care bits are marked with *. The problem is that when a rule contains a field specified with a mask, that single rule may have to be replicated into multiple entries of the hash table.

Table 3 shows two different hash tables that can be built for the rulebase shown in Table 2.

Index   Hash Key b0b1   Hash Key b0b2
00      R0              R0
01      R1              R1
10      R2, R3          R2
11      R3              R3

If b0b1 is chosen as the hash key, R3 is spread over two entries, those with indices 10 and 11, because b1 is a don't-care term in R3. This phenomenon is referred to below as 'rule spreading'. Because rule spreading duplicates rules, it can increase the size of the hash table.

If b0b2 is chosen as the hash key instead, rule spreading is avoided, as Table 3 shows. To prevent rule spreading as far as possible, the maximum-entropy key selection algorithm has to be modified: when calculating the entropy of a bit, the algorithm must ignore the rules that specify a don't-care condition for that bit. This is because a don't-care bit gives the system no information in terms of entropy.
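The rule spreading effect and the don't-care-aware entropy can be illustrated with the four rules of Table 2. This is a minimal sketch: the helper names `build_table` and `bit_entropy` are hypothetical, and a rule with a don't-care bit in the key is simply copied into every matching entry.

```python
import math
from itertools import product

# Rules of Table 2, bits b0..b7, '*' marks a don't-care bit.
RULES = {"R0": "00000000", "R1": "01100000", "R2": "10000000", "R3": "1*100000"}


def build_table(rules, positions):
    """Build a hash table; a don't-care key bit spreads the rule to both values."""
    table = {}
    for name, pattern in rules.items():
        choices = ["01" if pattern[p] == "*" else pattern[p] for p in positions]
        for bits in product(*choices):
            table.setdefault("".join(bits), []).append(name)
    return table


def bit_entropy(rules, p):
    """Entropy of bit p, ignoring rules that leave this bit as don't-care."""
    values = [pattern[p] for pattern in rules.values() if pattern[p] != "*"]
    if not values:
        return 0.0
    ones = values.count("1") / len(values)
    return -sum(q * math.log2(q) for q in (ones, 1 - ones) if q > 0)


print(build_table(RULES, [0, 1]))
# {'00': ['R0'], '01': ['R1'], '10': ['R2', 'R3'], '11': ['R3']}
print(build_table(RULES, [0, 2]))
# {'00': ['R0'], '01': ['R1'], '10': ['R2'], '11': ['R3']}
```

With the don't-care rule excluded, bit b2 scores a higher entropy than bit b1 in this example, so the modified selection favors the key b0b2 that avoids spreading R3.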

2) Range Field

This form is generally used to specify TCP or UDP ports. As described above, a server port is usually a specific port number from 0 to 1023, whereas a client port is an arbitrary port number from 1024 to 65535. The basic idea is to convert a range specification into a prefix mask specification.

A range-to-prefix conversion can be used, which splits any given range into a group of prefix masks. For example, the 16-bit range [1024, 65535] can be split into six prefix masks: 000001*, 00001*, 0001*, 001*, 01*, and 1*. However, this approach causes excessive rule spreading in the present algorithm and is therefore undesirable.
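The range-to-prefix conversion mentioned above can be sketched with the standard approach of repeatedly carving off the largest aligned power-of-two block. This is an illustrative sketch, not code from the patent; the function name `range_to_prefixes` and the string representation of prefixes are assumptions.

```python
def range_to_prefixes(lo, hi, width=16):
    """Split the inclusive range [lo, hi] into a minimal list of prefix masks."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo else 1 << width
        # ... and that still fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        free = size.bit_length() - 1   # number of wildcard (don't-care) bits
        fixed = width - free           # number of fixed leading bits
        bits = format(lo >> free, "0{}b".format(fixed)) if fixed else ""
        prefixes.append(bits + ("*" if free else ""))
        lo += size
    return prefixes


print(range_to_prefixes(1024, 65535))
# ['000001*', '00001*', '0001*', '001*', '01*', '1*']
```

The output reproduces the six prefix masks given in the text for [1024, 65535]. Each of these prefixes leaves many key bits unspecified, which is why this conversion leads to heavy rule spreading when combined with bit-selection hashing.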

The preferred embodiment of the present invention proposes precision-directed grouping instead. Ideally, a rule with a range would be split exactly into multiple exact-value rules. For example, a rule with the range [71, 74] can be split into four rules with the exact values 71 to 74. However, a rule with a wide range such as [49152, 65535] would produce as many as 16,384 rules. Fortunately, the TCP/UDP ports that appear in ranges are generally skewed. For example, although the reserved ports span the very large range [0, 49151], 80 percent of the port numbers used in most rules are 3,999 or below.

The idea is to group different numbers of rules depending on the density of the range. Preferably, for a 16-bit port range, the 10 LSBs are used as the hash key in the high-density region [0, 1023], creating a single entry for each port below 1024. For the low-density region [1024, 65535], the 6 MSBs are used as the hash key, creating 63 entries that each cover 1024 ports.
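Precision-directed grouping of ports can be sketched as below. This is one possible reading of the scheme, with assumptions labeled: the region tags `"dense"` and `"sparse"` and the function names `port_bucket` and `buckets_for_range` are hypothetical, and a range rule is simply placed into every bucket it overlaps.

```python
def port_bucket(port):
    """Map a 16-bit port to its hash entry under precision-directed grouping."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    if port < 1024:
        return ("dense", port & 0x3FF)   # 10 LSBs: one entry per port
    return ("sparse", port >> 10)        # 6 MSBs: entries 1..63, 1024 ports each


def buckets_for_range(lo, hi):
    """List the hash entries that a rule with port range [lo, hi] must occupy."""
    buckets = [("dense", p) for p in range(lo, min(hi, 1023) + 1)]
    if hi >= 1024:
        first = max(lo, 1024) >> 10
        buckets += [("sparse", b) for b in range(first, (hi >> 10) + 1)]
    return buckets


print(len(buckets_for_range(71, 74)))        # 4
print(len(buckets_for_range(49152, 65535)))  # 16
```

Under this reading, the narrow range [71, 74] still expands to four exact entries, while the wide range [49152, 65535] occupies 16 coarse entries instead of 16,384 exact-value rules.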

2. Classification stage

Once the rulebase has been partitioned and the hash table generated in the preprocessing stage, the classifier can narrow the search range by mapping each incoming packet to its corresponding sub-rulebase. The classifier looks up the hash table using the hash key extracted from the packet header.

FIG. 5 shows a rulebase and the hash tables built with a hash key algorithm that extracts the most significant 8 bits from each of the source and destination IP addresses.

Referring to FIG. 5(a), rules R0 to R3 are assumed to be listed in order of decreasing priority. If a packet matches none of the four rules R0 to R3, it matches D, the rule with the lowest priority.

The preferred embodiment of the present invention assumes five-dimensional packet classification. This uses a total of 104 bits from the packet header: 8 bits from the protocol field, 16 bits from the source port, 16 bits from the destination port, 32 bits from the source IP address, and 32 bits from the destination IP address. The protocol field is 8 bits wide, and each protocol type has its own specific number.

The protocol type is therefore identified by the protocol number. In practice, no more than about 5 to 20 protocols appear in rules. Ports are divided into a source port and a destination port. Each port is 16 bits wide and is specified as either a single fixed number or a range. In particular, as seen from the firewall, one of the source and destination is the service side and has a single fixed port number, while the other is the client side and uses the range from 1024 to 65535 as ephemeral ports.

Referring to FIG. 5(b), only the eight most significant bits of the classification space (hereinafter '8 MSBs') are shown; these represent a given header field such as the protocol field.

The hash table shown in FIG. 5(b) is generated by dividing the rulebase of FIG. 5(a) into 256 buckets using the 8 MSBs. In the hash table of FIG. 5(b), the rules belonging to one sub-rulebase do not overlap with the rules belonging to any other sub-rulebase.

When a packet arrives at the classifier, the classifier extracts the 8 MSBs from the packet header and uses them as the index into the hash table. The hash key used for packet classification is the same as the hash key used to partition the rulebase. After hashing, if the hash table entry is not empty, packet classification is performed within the sub-rulebase belonging to that entry. If the entry is empty, the default rule (D) is the matching rule. FIG. 5(c) shows a hash table built using b3 and b5 as the hash key.
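The lookup described above can be sketched as follows. The rules of FIG. 5 are not reproduced in the text, so the rule patterns, the 16-bit toy header, and the names `build_hash_table`, `classify`, and `matches` are all hypothetical. The sketch assumes the key bits of every rule are fully specified and that the final sub-rulebase is searched linearly in priority order.

```python
DEFAULT = "D"  # lowest-priority rule, matched when nothing else applies


def matches(pattern, header_bits):
    """True if every specified pattern bit equals the corresponding header bit."""
    return all(p in ("*", h) for p, h in zip(pattern, header_bits))


def build_hash_table(rules, key_len=8):
    """Preprocessing: bucket rules by their first key_len bits (the MSBs)."""
    table = {}
    for name, pattern in rules:  # rules are listed from high to low priority
        key = pattern[:key_len]
        if "*" in key:
            raise ValueError("sketch assumes fully specified key bits")
        table.setdefault(key, []).append((name, pattern))
    return table


def classify(table, header_bits, key_len=8):
    """Classification: index the table with the same key, then search linearly."""
    bucket = table.get(header_bits[:key_len])
    if not bucket:
        return DEFAULT                 # empty entry: the default rule matches
    for name, pattern in bucket:       # bucket preserves priority order
        if matches(pattern, header_bits):
            return name
    return DEFAULT


rules = [
    ("R0", "00000110" + "0000****"),
    ("R1", "00000110" + "********"),
    ("R2", "00010001" + "00110101"),
]
table = build_hash_table(rules)
print(classify(table, "0000011000001111"))  # R0
print(classify(table, "0000011011110000"))  # R1
print(classify(table, "1111111100000000"))  # D
```

Only the rules in the selected bucket are examined, which is where the reduction in referenced rules reported for FIG. 6 comes from.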

FIG. 6 is a graph showing classification performance in terms of the average number of rules referenced to find a matching rule. Referring to FIG. 6, with the maximum-entropy key selection algorithm, the number of rules referenced is reduced to 5.6, 42.64, and 207.02 for rulebases of 10,000 (10K), 100,000 (100K), and 500,000 (500K) rules, respectively.

This result is very encouraging, considering that the best conventional packet classification method requires at least 13 memory references to classify a packet in a rulebase of 5,000 (5K) rules. That is, according to the present invention, even when the final sub-rulebase is searched linearly, which is the worst case, only two memory references are needed to look up the hash tables and only 5.6 rules need to be referenced.

As described above, the present invention substantially reduces the size of the rulebase to be searched through two stages of partitioning. For example, a rulebase of 100,000 (100K) rules is reduced to sub-rulebases of 7.6 rules on average and 258 rules in the worst case. In the worst case, the classifier therefore checks only 258 rules for an incoming packet.

Experimental data from real packet traces show that, for rulebases of 5,000 (5K), 10,000 (10K), 20,000 (20K), 50,000 (50K), 100,000 (100K), 200,000 (200K), and 500,000 (500K) rules, the classifier only needs to check on average 4.2, 5.6, 8.4, 20.4, 42.6, 83.2, and 207 rules, respectively.

According to the present invention, even when the sub-rulebase is searched under worst-case conditions, the method outperforms RFC, one of the best existing algorithms, for relatively small rulebases. RFC requires 13 memory references for a rulebase of 5,000 (5K) rules. In addition, the present invention scales well in both space and time as the rulebase grows.

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Those of ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed, and such modifications fall within the scope of the claims.

Claims

A packet classification method that finds, among a plurality of rules included in a predetermined rulebase, the highest-priority rule matching each of a plurality of packets input from an external network and processes the packet according to the found rule, the method comprising:

a preprocessing step of dividing the rulebase into a plurality of independent sub-rulebases based on at least one of a predetermined protocol and a predetermined port number, and generating a hash table based on the divided sub-rulebases; and

a classification step of searching the hash table for the sub-rulebase corresponding to the packet using a hash key extracted from the header of the packet, and classifying the packet by mapping the packet to the retrieved sub-rulebase.

delete

The method of claim 1,

wherein, when the number of rules belonging to a given sub-rulebase among the plurality of sub-rulebases exceeds a predetermined threshold value,

the preprocessing step further comprises repartitioning that sub-rulebase using a hash key extracted in a predetermined manner from the source IP address and the destination IP address, and generating a hash table based on the repartitioned sub-rulebases.

The method of claim 1,

wherein a specific application is determined according to the protocol and the port number.

The method of claim 3, wherein the predetermined manner is

any one of the MSB pattern, the Exponential growing pattern, the Mask distribution pattern, and the Entropy-maximizing pattern.

The method of claim 5, wherein the MSB pattern method

selects a predetermined number of bits starting from the most significant bits of the destination IP address and the source IP address.

The method of claim 5, wherein the Exponential growing pattern method

selects, from the destination IP address and the source IP address, the bit positions corresponding to powers of two.

The method of claim 5, wherein the Mask distribution pattern method

counts, at each bit position bi, the bits that are not don't-care bits over all defined rules belonging to the rulebase to obtain a value for that bit position, accumulates these values from the most significant bit to the least significant bit to obtain the total accumulated value, and selects a bit position whenever the accumulated value at that bit position becomes a multiple of the total accumulated value divided by K,

where K is the number of bits of the hash key to be generated.

The method of claim 5, wherein the Entropy-maximizing pattern method

selects, from each of the destination IP address and the source IP address, a predetermined number of bits that maximize entropy.