KR101716017B1

KR101716017B1 - Apparatus and method for matching of character string

Info

Publication number: KR101716017B1
Application number: KR1020150005896A
Authority: KR
Inventors: 김현진
Original assignee: 단국대학교 산학협력단
Priority date: 2015-01-13
Filing date: 2015-01-13
Publication date: 2017-03-14
Also published as: KR20160087134A

Abstract

The present invention relates to a character string matching apparatus and a method thereof. The apparatus includes: a pattern separator that analyzes the suffixes of a plurality of target string patterns and separates and groups the patterns into non-unique patterns and unique patterns; a first string matching unit that maps each set of grouped non-unique patterns to a first bit-split string matcher including a partial matching vector (PMV) table; and a second string matching unit that maps each set of grouped unique patterns to a second bit-split string matcher including a partial matching index (PMI).

Description

APPARATUS AND METHOD FOR MATCHING OF CHARACTER STRING

The present invention relates to a character string matching apparatus and a method thereof, and more particularly to a character string matching apparatus and method that propose a memory-efficient bit-split string matching technique for deep packet inspection (DPI).

As network environments become global, network security can be threatened by a variety of remote attacks. Although conventional firewalls and application gateways were developed to protect networks against malicious attacks, they do not support content filtering with network transparency. Moreover, quality of service in network communication can be improved only through packet control based on information in the packet content.

Deep packet inspection (DPI) was developed to extract and use useful information from packet payloads. The key requirement is operation at wire speed in the network environment. A string matching engine is an essential component of DPI for detecting the relevant patterns at wire speed. Because the number of patterns grows as malicious packets become more varied, the memory requirement of the string matching engine must be reduced to keep hardware cost reasonable relative to the number of patterns.

In this regard, Korean Patent Laid-Open Publication No. 2009-0104425 discloses a "DPI apparatus and method in a mobile Internet environment, and pattern matching method and recording medium used therefor".

The present invention was devised to solve the above problems, and its object is to provide a character string matching apparatus and method that analyze the suffixes of string patterns and separate the patterns into non-unique patterns and unique patterns.

Another object of the present invention is to provide a character string matching apparatus and method that propose a parallel string matching structure in which, for a group of non-unique patterns, partial matching vectors are not stored in each state; instead, a separate partial matching vector table holds the vectors needed by the states.

A further object of the present invention is to provide a character string matching apparatus and method that propose a parallel string matching structure in which, for a group of unique patterns, each state holds a partial matching index storing the memory address of the partial matching vector table required for that state.

To achieve the above objects, a character string matching apparatus according to the present invention includes: a pattern separator that analyzes the suffixes of a plurality of target string patterns and separates and groups the patterns into non-unique patterns and unique patterns; a first string matching unit that maps each set of grouped non-unique patterns to a first bit-split string matcher including a partial matching vector (PMV) table; and a second string matching unit that maps each set of grouped unique patterns to a second bit-split string matcher including a partial matching index (PMI).

The pattern separator determines that a target string pattern is a non-unique pattern when it is itself a suffix of another pattern, and that it is a unique pattern when it is not a suffix of another pattern.

The first string matching unit divides each set of grouped non-unique patterns into a plurality of bit sets and maps it to one first bit-split string matcher among a plurality of bit-split string matchers.

The first bit-split string matcher includes a plurality of tiles, each in the form of a finite-state machine (FSM) and each receiving one of the divided bit sets. A per-state vector pointer is entered in each row of a tile, and, based on the vector pointer, the corresponding partial matching vector is stored in the partial matching vector table provided for each tile.

A full matching vector is extracted by applying a bitwise AND operation to the partial matching vectors stored for the tiles.

The second string matching unit divides each set of grouped unique patterns into a plurality of bit sets and maps it to one second bit-split string matcher among a plurality of bit-split string matchers.

The second bit-split string matcher includes a plurality of tiles, each in the form of a finite-state machine (FSM) and each receiving one of the divided bit sets, and a partial matching index is stored for each state in each row of a tile.

The partial matching indexes stored for the tiles are compared to determine whether they are identical.

The partial matching index is memory address information of the partial matching vector table allocated for each state.

To achieve the above objects, a character string matching method according to the present invention includes: analyzing, by a pattern separator, the suffixes of a plurality of target string patterns and separating and grouping the patterns into non-unique patterns and unique patterns; mapping, by a first string matching unit, each set of grouped non-unique patterns to a first bit-split string matcher including a partial matching vector (PMV) table; and mapping, by a second string matching unit, each set of grouped unique patterns to a second bit-split string matcher including a partial matching index (PMI).

In the separating and grouping step, a target string pattern is determined to be a non-unique pattern when it is itself a suffix of another pattern, and a unique pattern when it is not a suffix of another pattern.

In the step of mapping to the first bit-split string matcher, each set of grouped non-unique patterns is divided into a plurality of bit sets and mapped to one first bit-split string matcher among a plurality of bit-split string matchers.

In the step of mapping to the first bit-split string matcher, the first bit-split string matcher includes a plurality of tiles, each in the form of a finite-state machine (FSM) and each receiving one of the divided bit sets. A per-state vector pointer is entered in each row of a tile, and, based on the vector pointer, the corresponding partial matching vector is stored in the partial matching vector table provided for each tile.

In the step of mapping to the first bit-split string matcher, a full matching vector is extracted by applying a bitwise AND operation to the partial matching vectors stored for the tiles.

In the step of mapping to the second bit-split string matcher, each set of grouped unique patterns is divided into a plurality of bit sets and mapped to one second bit-split string matcher among a plurality of bit-split string matchers.

In the step of mapping to the second bit-split string matcher, the second bit-split string matcher includes a plurality of tiles, each in the form of a finite-state machine (FSM) and each receiving one of the divided bit sets, and the partial matching index is stored for each state in each tile.

In the step of mapping to the second bit-split string matcher, the partial matching indexes stored for the tiles are compared to determine whether they are identical.

The character string matching apparatus and method according to the present invention, configured as above, analyze the suffixes of string patterns to separate them into non-unique patterns and unique patterns, and then perform string matching using a separate partial matching vector table for groups of non-unique patterns and a partial matching index for groups of unique patterns. Exploiting pattern uniqueness in this way mitigates the variation in pattern length and thereby reduces the total memory requirement of the string matching apparatus.

FIG. 1 is a diagram illustrating the configuration of a character string matching apparatus according to the present invention.
FIG. 2 is a diagram illustrating an embodiment in which patterns are grouped by the pattern separator employed in the character string matching apparatus according to the present invention.
FIG. 3 is a diagram showing an example of a DFA for the non-unique pattern group of FIG. 2.
FIG. 4 is a diagram showing an example of DFAs for the unique pattern group of FIG. 2.
FIG. 5 is a diagram illustrating the structure of the first string matching unit for a non-unique pattern group according to the present invention.
FIG. 6 is a diagram showing an example of the structure of a plurality of tiles, each in the form of a finite-state machine, for the non-unique pattern group of FIG. 5.
FIG. 7 is a diagram illustrating the structure of the second string matching unit for a unique pattern group according to the present invention.
FIG. 8 is a diagram showing an example of the structure of a plurality of tiles, each in the form of a finite-state machine, for the unique pattern group of FIG. 7.
FIG. 9 is a diagram illustrating the sequence of the string matching method for a non-unique pattern group according to the present invention.
FIG. 10 is a diagram illustrating the sequence of the string matching method for a unique pattern group according to the present invention.

Hereinafter, the most preferred embodiments of the present invention are described with reference to the accompanying drawings, in enough detail that a person of ordinary skill in the art can readily practice the technical idea of the invention. In assigning reference numerals to the elements of the drawings, the same elements are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they would obscure the subject matter of the invention.

Hereinafter, a character string matching apparatus according to an embodiment of the present invention is described in detail.

FIG. 1 is a diagram illustrating the configuration of a character string matching apparatus according to the present invention.

Referring to FIG. 1, the character string matching apparatus 100 according to the present invention broadly includes a pattern separator 110, a first string matching unit 120, and a second string matching unit 130.

The pattern separator 110 analyzes the suffixes of a plurality of target string patterns and separates and groups the patterns into non-unique patterns and unique patterns.

The pattern separator 110 determines that a target string pattern is a non-unique pattern when it is itself a suffix of another pattern. For example, given the pattern set "abcd", "cd", "d", and "fg", the patterns "cd" and "d", which are suffixes of "abcd", are determined to be non-unique patterns.

The pattern separator 110 determines that a target string pattern is a unique pattern when it is not a suffix of another pattern. For example, given the pattern set "abcd", "adfg", "cd", "d", and "fg", the patterns "abcd" and "adfg" are determined to be unique patterns; excluded are "cd" and "d", which are suffixes of "abcd", and "fg", which is a suffix of "adfg".

The first string matching unit 120 maps each set of grouped non-unique patterns to a first bit-split string matcher including a partial matching vector (PMV) table. Its configuration is described in detail below with reference to FIGS. 5 and 6.

The second string matching unit 130 maps each set of grouped unique patterns to a second bit-split string matcher including a partial matching index (PMI). Its configuration is described in detail below with reference to FIGS. 7 and 8.

FIG. 2 is a diagram illustrating an embodiment in which patterns are grouped by the pattern separator employed in the character string matching apparatus according to the present invention, FIG. 3 is a diagram showing an example of a DFA for the non-unique pattern group of FIG. 2, and FIG. 4 is a diagram showing an example of DFAs for the unique pattern group of FIG. 2.

Referring to FIG. 2, the pattern separator analyzes the suffixes of a plurality of target string patterns and separates and groups the patterns into non-unique patterns and unique patterns.

In more detail, given the patterns "abcd", "adfg", "cfg", "fg", "cd", and "d", the pattern separator separates and groups "fg", "cd", and "d", each of which is a suffix of "abcd", "adfg", or "cfg", as non-unique patterns, and groups "abcd", "adfg", and "cfg" as unique patterns.
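The grouping in this example can be reproduced with a short sketch. It assumes, consistent with the examples in the description, that a pattern is non-unique exactly when it is a suffix of some other pattern in the set; the function name `separate_patterns` is hypothetical and not taken from the patent.

```python
def separate_patterns(patterns):
    """Split patterns into (non_unique, unique) groups.

    Assumption: a pattern is non-unique when it is a suffix of a
    different pattern in the same set; otherwise it is unique.
    """
    non_unique = [
        p for p in patterns
        if any(q != p and q.endswith(p) for q in patterns)
    ]
    unique = [p for p in patterns if p not in non_unique]
    return non_unique, unique


non_unique, unique = separate_patterns(["abcd", "adfg", "cfg", "fg", "cd", "d"])
print(non_unique)  # ['fg', 'cd', 'd']
print(unique)      # ['abcd', 'adfg', 'cfg']
```

The quadratic comparison is adequate for illustration; a production rule set would more likely use a trie built over reversed patterns.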

Referring to FIG. 3, arrows denote state transitions, and state S0 is the initial state. The double-circled states S4, S6, S9, S11, S12, and S13 are output states, and the patterns listed in braces at each of them are the patterns reported as matched in that state. Here, patterns are matched at S4, S6, S9, and S12, and these patterns are elements of the non-unique pattern set. In this way, once the target patterns have been collected, a non-unique pattern group and a unique pattern group are generated.

Referring to FIG. 4, (a) shows the unique pattern group, in which, unlike FIG. 3, each output state matches a single pattern, and (b) shows the output state S4, which belongs to the non-unique pattern group.

FIG. 5 is a diagram illustrating the structure of the first string matching unit for a non-unique pattern group according to the present invention, and FIG. 6 is a diagram showing an example of the structure of a plurality of tiles, each in the form of a finite-state machine, for the non-unique pattern group of FIG. 5.

Referring to FIG. 5, the first string matching unit according to the present invention maps each set of grouped non-unique patterns to a first bit-split string matcher including a partial matching vector table.

That is, the first string matching unit divides each set of grouped non-unique patterns into a plurality of bit sets and maps it to one first bit-split string matcher among a plurality of bit-split string matchers.

Here, the first bit-split string matcher includes a plurality of tiles, each in the form of a finite-state machine (FSM) and each receiving one of the divided bit sets. The first string matching unit divides each 8-bit character into four 2-bit sets, so the first bit-split string matcher has four tiles. A finite-state machine, often simply called a state machine, is a mathematical model used in designing electronic logic circuits. It is an automaton, that is, an abstract machine, that can have a finite number of states. The machine is in exactly one state at a time, and the state at any given time is called the current state. An event can move the machine from one state to another, which is called a transition. A particular finite automaton is defined by the states reachable by transition from the current state and the set of conditions that trigger those transitions.
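The division of one 8-bit character into four 2-bit sets can be illustrated as follows. This is a minimal sketch; the assignment of the lowest two bits to tile 0 is an assumption, since the description does not state which bits go to which tile, and `split_byte` is a hypothetical name.

```python
def split_byte(byte, bits_per_tile=2):
    """Split an 8-bit value into 8 // bits_per_tile groups, lowest bits first."""
    mask = (1 << bits_per_tile) - 1
    return [(byte >> shift) & mask for shift in range(0, 8, bits_per_tile)]


# 'a' is 0x61 = 0b01100001, so the 2-bit groups from low to high are 01, 00, 10, 01.
print(split_byte(ord("a")))  # [1, 0, 2, 1]
```

Each of the four values is the input symbol for one tile, so every tile FSM needs only four outgoing transitions per state instead of 256.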

As shown in FIG. 6, a per-state vector pointer is entered in each row of each tile, and, based on that vector pointer, the corresponding partial matching vector is stored in the partial matching vector table provided for each tile. Each state thus refers, through its vector pointer, to its own partial matching vector in the separately provided partial matching vector table.
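The indirection through vector pointers can be sketched as follows. This is an illustration under an assumption: the table stores each distinct partial matching vector once, and states with the same vector share a pointer. The description does not spell out how the table is filled, and the name `build_pmv_table` is hypothetical.

```python
def build_pmv_table(state_pmvs):
    """Replace per-state PMVs with pointers into a table of distinct PMVs.

    state_pmvs: one integer bitmask per tile state.
    Returns (pointers, table) such that table[pointers[s]] == state_pmvs[s].
    """
    table = []       # distinct partial matching vectors, in first-seen order
    position = {}    # vector -> index in table
    pointers = []    # one vector pointer per state
    for vector in state_pmvs:
        if vector not in position:
            position[vector] = len(table)
            table.append(vector)
        pointers.append(position[vector])
    return pointers, table


pointers, table = build_pmv_table([0b000, 0b101, 0b000, 0b101, 0b111])
print(pointers)  # [0, 1, 0, 1, 2]
print(table)     # [0, 5, 7]
```

Under this assumption, each state row holds a pointer of roughly log2(len(table)) bits rather than a full vector with one bit per pattern, which is where the memory saving would come from.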

In this way, a full matching vector can be extracted by applying a bitwise AND operation to the partial matching vectors stored for the tiles.
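The bitwise AND step can be sketched as follows. The sketch assumes that bit i of every vector refers to pattern i of the group; the vector values are made up for illustration, and the function names are hypothetical.

```python
from functools import reduce
from operator import and_


def full_matching_vector(partial_vectors):
    """AND the partial matching vectors reported by all tiles."""
    return reduce(and_, partial_vectors)


def matched_indices(vector):
    """List the pattern indices whose bit is set in the vector."""
    return [i for i in range(vector.bit_length()) if (vector >> i) & 1]


# Hypothetical PMVs from four tiles after consuming one character.
full = full_matching_vector([0b0110, 0b0111, 0b1110, 0b0110])
print(bin(full))              # 0b110
print(matched_indices(full))  # [1, 2]
```

A pattern is reported only when every tile agrees that its own bit slice of the pattern has just been seen, which is why a single tile's vector may contain bits that do not survive the AND.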

FIG. 7 is a diagram illustrating the structure of the second string matching unit for a unique pattern group according to the present invention, and FIG. 8 is a diagram showing an example of the structure of a plurality of tiles, each in the form of a finite-state machine, for the unique pattern group of FIG. 7.

Referring to FIG. 7, the second string matching unit maps each set of grouped unique patterns to a second bit-split string matcher including a partial matching index.

That is, the second string matching unit divides each set of grouped unique patterns into a plurality of bit sets and maps it to one second bit-split string matcher among a plurality of bit-split string matchers.

Here, the second bit-split string matcher includes a plurality of tiles, each in the form of a finite-state machine (FSM) and each receiving one of the divided bit sets. The second string matching unit divides each 8-bit character into four 2-bit sets, so the second bit-split string matcher has four tiles.

As shown in FIG. 8, the partial matching index is stored for each state in each row of each tile. The partial matching index is memory address information of the partial matching vector table allocated for each state.

In this way, the partial matching indexes stored for the tiles are compared to determine whether they are identical.
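The comparison of partial matching indexes can be sketched as follows. The description only says that the indexes of the tiles are compared for equality; the use of 0 as a "no match" value and the name `match_by_pmi` are assumptions made for this illustration.

```python
def match_by_pmi(pmis, no_match=0):
    """Return the shared partial matching index if all tiles agree, else None.

    pmis: the PMI read from the current state of each tile.
    no_match: assumed sentinel meaning the state is not an output state.
    """
    first = pmis[0]
    if first != no_match and all(pmi == first for pmi in pmis):
        return first
    return None


print(match_by_pmi([3, 3, 3, 3]))  # 3
print(match_by_pmi([3, 3, 2, 3]))  # None
print(match_by_pmi([0, 0, 0, 0]))  # None
```

Compared with the AND over full vectors, the equality test works on short indexes, so each state row stores only an address-sized field.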

Hereinafter, a character string matching method according to an embodiment of the present invention is described in detail.

FIG. 9 is a diagram illustrating the sequence of the string matching method for a non-unique pattern group according to the present invention.

Referring to FIG. 9, first, the suffixes of a plurality of target string patterns are analyzed (S100).

Next, based on the analysis, the non-unique patterns are identified and separated (S110). In step S110, a target string pattern is determined to be a non-unique pattern when it is itself a suffix of another pattern.

Next, each set of grouped non-unique patterns is divided into a plurality of bit sets and stored in a plurality of tiles, each in the form of a finite-state machine (S120).

Next, a per-state vector pointer is entered in each row of each tile (S130).

Next, based on the entered vector pointer, the corresponding partial matching vector is stored in the separate partial matching vector table (S140).
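The flow of FIG. 9 can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: it assumes the source automaton is an Aho-Corasick DFA (the description only speaks of a DFA), that tile t receives bits 2t and 2t+1 of each character, and, for brevity, it keeps each partial matching vector directly with its tile state instead of behind a vector pointer. The names `build_ac_dfa`, `build_tile`, and `scan` are hypothetical.

```python
from collections import deque


def build_ac_dfa(patterns):
    """Build an Aho-Corasick DFA over bytes.

    Returns (delta, out): delta[s][c] is the next state, and out[s] is a
    bitmask whose bit i is set when patterns[i] ends at state s.
    """
    goto, out = [{}], [0]
    for index, pattern in enumerate(patterns):
        state = 0
        for byte in pattern:
            if byte not in goto[state]:
                goto.append({})
                out.append(0)
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state] |= 1 << index
    fail = [0] * len(goto)
    delta = [[0] * 256 for _ in goto]
    queue = deque()
    for byte, child in goto[0].items():
        delta[0][byte] = child
        queue.append(child)
    while queue:  # breadth-first, so shallower states are completed first
        state = queue.popleft()
        out[state] |= out[fail[state]]
        for byte in range(256):
            child = goto[state].get(byte)
            if child is None:
                delta[state][byte] = delta[fail[state]][byte]
            else:
                fail[child] = delta[fail[state]][byte]
                delta[state][byte] = child
                queue.append(child)
    return delta, out


def build_tile(delta, out, tile):
    """Subset construction for one 2-bit slice; returns (rows, pmvs)."""
    shift = 2 * tile
    groups = [[c for c in range(256) if (c >> shift) & 3 == v] for v in range(4)]
    start = frozenset([0])
    ids = {start: 0}
    rows, pmvs = [], []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        row = []
        for group in groups:
            target = frozenset(delta[s][c] for s in current for c in group)
            if target not in ids:
                ids[target] = len(ids)
                queue.append(target)
            row.append(ids[target])
        rows.append(row)
        vector = 0
        for s in current:  # PMV = union of outputs of the merged DFA states
            vector |= out[s]
        pmvs.append(vector)
    return rows, pmvs


def scan(tiles, data):
    """Return, for each input byte, the AND of the tiles' PMVs."""
    states = [0] * len(tiles)
    vectors = []
    for byte in data:
        full = -1  # all bits set
        for t, (rows, pmvs) in enumerate(tiles):
            states[t] = rows[states[t]][(byte >> (2 * t)) & 3]
            full &= pmvs[states[t]]
        vectors.append(full)
    return vectors


patterns = [b"fg", b"cd", b"d"]
delta, out = build_ac_dfa(patterns)
tiles = [build_tile(delta, out, t) for t in range(4)]
vectors = scan(tiles, b"abcdfg")
print(vectors)  # [0, 0, 0, 6, 0, 1]
```

In the printed result, the value 6 (binary 110) at the fourth byte reports "cd" and "d" together, and the value 1 at the last byte reports "fg".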

FIG. 10 is a diagram illustrating the sequence of the string matching method for a unique pattern group according to the present invention.

Referring to FIG. 10, first, the suffixes of a plurality of target string patterns are analyzed (S200).

Next, based on the analysis, the unique patterns are identified and separated (S210). In step S210, a target string pattern is determined to be a unique pattern when it is not a suffix of another pattern.

Next, each set of grouped unique patterns is divided into a plurality of bit sets and stored in a plurality of tiles, each in the form of a finite-state machine (S220).

Next, the partial matching index is stored for each state in each row of each tile (S230).

As described above, the character string matching apparatus and method according to the present invention analyze the suffixes of string patterns to separate them into non-unique patterns and unique patterns, and then perform string matching using a separate partial matching vector table for groups of non-unique patterns and a partial matching index for groups of unique patterns. Exploiting pattern uniqueness in this way mitigates the variation in pattern length and thereby reduces the total memory requirement of the string matching apparatus.

Although preferred embodiments of the present invention have been described above, the invention may be modified in various forms, and it is understood that a person of ordinary skill in the art can carry out various variations and modifications without departing from the scope of the claims of the present invention.

100: character string matching apparatus
110: pattern separator
120: first string matching unit
130: second string matching unit

Claims

A pattern separator for analyzing the suffixes of a plurality of target string patterns to separate and group the patterns into non-unique patterns and unique patterns;
A first string matching unit for mapping each set of grouped non-unique patterns to a first bit-split string matcher including a partial matching vector (PMV) table; and
A second string matching unit for mapping each set of grouped unique patterns to a second bit-split string matcher including a partial matching index (PMI);
wherein the pattern separator determines that a target string pattern is a non-unique pattern when it is itself a suffix of another pattern, and determines that it is a unique pattern when it is not a suffix of another pattern.

2. (Deleted)

3. The apparatus of claim 1,
wherein the first string matching unit divides each set of grouped non-unique patterns into a plurality of bit sets and maps the set to the first bit-separated string matcher among a plurality of bit-separated string matchers.

4. The apparatus of claim 3,
wherein the first bit-separated string matcher includes a plurality of tiles, each receiving one of the divided bit sets, a vector pointer for each state is entered in each row of a tile, and the partial matching vector corresponding to the entered vector pointer is stored in a partial matching vector table provided for each tile.

5. The apparatus of claim 4,
wherein a full matching vector is extracted by a bitwise AND operation on the partial matching vectors stored for each tile.

6. The apparatus of claim 1,
wherein the second string matching unit divides each set of grouped unique patterns into a plurality of bit sets and maps the set to the second bit-separated string matcher among a plurality of bit-separated string matchers.

7. The apparatus of claim 6,
wherein the second bit-separated string matcher includes a plurality of tiles, each receiving one of the divided bit sets, and a partial matching indexer for each state is stored in each row of the tiles.

8. The apparatus of claim 7,
wherein the partial matching indexers stored for each tile are compared to determine whether they are identical.

9. The apparatus of claim 7,
wherein the partial matching indexer is memory address information of a partial matching vector table allocated for each state.

10. A character string matching method comprising:
analyzing, by a pattern separator, suffixes of a plurality of target character string patterns to separate and group the patterns into non-unique patterns or unique patterns;
mapping, by a first string matching unit, each set of grouped non-unique patterns to a first bit-separated string matcher including a partial matching vector (PMV) table; and
mapping, by a second string matching unit, each set of grouped unique patterns to a second bit-separated string matcher including a partial matching indexer (PMI),
wherein, in the analyzing of the suffixes of the plurality of target character string patterns,
a target character string pattern is determined to be a non-unique pattern if the suffix of the target character string pattern is a suffix of another pattern, and is determined to be a unique pattern if the suffix of the target character string pattern is not a suffix of another pattern.

11. (Deleted)

12. The method of claim 10,
wherein, in the mapping of each set of grouped non-unique patterns to the first bit-separated string matcher including the partial matching vector table,
each set of grouped non-unique patterns is divided into a plurality of bit sets and mapped to the first bit-separated string matcher among a plurality of bit-separated string matchers.

13. The method of claim 12,
wherein, in the mapping of each set of grouped non-unique patterns to the first bit-separated string matcher including the partial matching vector table,
the first bit-separated string matcher includes a plurality of tiles, each receiving one of the divided bit sets, a vector pointer for each state is entered in each row of a tile, and the partial matching vector corresponding to the entered vector pointer is stored in a partial matching vector table provided for each tile.

14. The method of claim 13,
wherein, in the mapping of each set of grouped non-unique patterns to the first bit-separated string matcher including the partial matching vector table,
a full matching vector is extracted by a bitwise AND operation on the partial matching vectors for each tile.

15. The method of claim 10,
wherein, in the mapping of each set of grouped unique patterns to the second bit-separated string matcher including the partial matching indexer,
each set of grouped unique patterns is divided into a plurality of bit sets and mapped to the second bit-separated string matcher among a plurality of bit-separated string matchers.

16. The method of claim 15,
wherein, in the mapping of each set of grouped unique patterns to the second bit-separated string matcher including the partial matching indexer,
the second bit-separated string matcher includes a plurality of tiles, each receiving one of the divided bit sets, and a partial matching indexer is stored in each tile.

17. The method of claim 15,
wherein, in the mapping of each set of grouped unique patterns to the second bit-separated string matcher including the partial matching indexer,
the partial matching indexers stored for each tile are compared to determine whether they are identical.
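The two tile-combining operations recited in the claims, a bitwise AND over per-tile partial matching vectors and an equality comparison over per-tile partial matching indexers, can be sketched in Python as follows. This is only an illustration under stated assumptions: bit vectors are modeled as Python integers, the example uses four tiles, and the function names and the `NO_MATCH` sentinel are hypothetical and do not appear in the specification.

```python
from functools import reduce

NO_MATCH = 0  # hypothetical sentinel: a tile reporting "no pattern index"


def full_matching_vector(partial_vectors):
    """AND the per-tile partial matching vectors (ints used as bit vectors).

    A bit stays set only if every tile reports a partial match for it.
    """
    return reduce(lambda a, b: a & b, partial_vectors)


def matched_index(partial_indexes):
    """Return the shared index when all tiles report the same one, else None.

    Assumption: NO_MATCH means a tile has no candidate in its current state.
    """
    first = partial_indexes[0]
    if first != NO_MATCH and all(i == first for i in partial_indexes):
        return first
    return None


# Four assumed tiles; only bit 1 is set in every partial vector.
fmv = full_matching_vector([0b1011, 0b0011, 0b0110, 0b1010])

# All four assumed tiles agree on index 5, so the comparison succeeds.
idx = matched_index([5, 5, 5, 5])
```

The AND needs one bit per pattern in every tile, whereas the comparison needs only one index per tile entry, which is consistent with the memory saving the description attributes to handling unique patterns separately.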