KR20100073134A

KR20100073134A - String pattern containment decision apparatus and method using hash for automatic signature generation system

Info

Publication number: KR20100073134A
Application number: KR1020080131725A
Authority: KR
Inventors: 박상길; 문화신; 이성원; 오진태; 장종수; 조현숙
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute)
Priority date: 2008-12-22
Filing date: 2008-12-22
Publication date: 2010-07-01
Also published as: KR101079817B1

Abstract

PURPOSE: A string containment determination apparatus and method for an automatic signature generation system are provided. They check similarity and containment in document classification or search engine results, thereby minimizing the results presented to the user. CONSTITUTION: If input data is new data, a whitelist management unit (110) updates a bit vector table using the data's hash values. A data search unit (120) searches the bit vector table through the whitelist management unit for indexes having a specified hash value and, where applicable, searches the active (set) indexes to trace the hash values of the corresponding index.

Description

String pattern containment decision apparatus and method using hash for automatic signature generation system

The present invention relates to an apparatus and method for determining string containment in an automatic signature generation system and, in particular, to an apparatus and method that manage large volumes of data by determining whether new data matches, or is contained in, existing data.

This invention is derived from research carried out as part of the IT Growth Engine Technology Development program of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project No. 2006-S-042-03, Project title: Development of real-time attack signature generation and management technology for responding to zero-day network attacks].

Security devices and technologies for service protection can be characterized as techniques that extract application characteristics from the server being protected and apply them to users accessing that server in order to detect abnormal users.

To detect and block external intrusions or access attempts, such a security device needs data for judging whether traffic is dangerous or illegitimate. The device must continuously collect and manage this data, and because the volume is very large, a system that can manage it efficiently is required.

Network security first requires identifying the characteristics of attack packets. These characteristics are registered as signatures; when a registered signature is detected in a received packet, the corresponding security policy is applied to protect the target network from malicious users or programs.

Most techniques for extracting the characteristics of network attack packets are based on techniques for checking the similarity of, or classifying, electronic documents, including web documents on the Internet. To examine similarity across a huge number of electronic documents, the characteristics of each document must first be represented compactly. Comparing these compact representations minimizes the computation required to verify similarity.

That is, a system that continuously collects and manages similar data needs a technique for determining whether new data is contained in an existing data set, contains an existing data set, or belongs to a different set.

However, when a document or the database is very large, comparing every hash value computed for a document becomes a major cause of performance degradation. Sampling is used to address this: instead of comparing all computed hash values, only the hash values selected by proven sampling methods are compared, which yields reliable results without degrading system performance.
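As a rough illustration of sampling, the sketch below hashes every overlapping 3-gram and then keeps either the hash values divisible by m or the s smallest ones, two sampling rules described by Broder. The patent does not say which sampling method it means; CRC-32 (`zlib.crc32`) as the hash and the helper names are assumptions for illustration.

```python
import zlib


def ngram_hashes(data, n=3):
    """Hash every overlapping n-gram of `data` (CRC-32 is an assumed stand-in hash)."""
    return {zlib.crc32(data[i:i + n]) for i in range(len(data) - n + 1)}


def sample_mod(hashes, m):
    """Keep only the hash values divisible by m (MOD_m-style sampling)."""
    return {h for h in hashes if h % m == 0}


def sample_min(hashes, s):
    """Keep the s smallest hash values (MIN_s-style sampling)."""
    return set(sorted(hashes)[:s])


full = ngram_hashes(b"GET /index.html HTTP/1.1")
print(len(full), len(sample_mod(full, 4)), len(sample_min(full, 5)))
```

Two documents are then compared on their samples rather than on every hash value, trading a small loss of accuracy for far fewer comparisons.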

Such a system requires a comparison against every stored entry. The technique used in Andrei Z. Broder's "On the Resemblance and Containment" hashes the data with m-gram hashing and judges containment and resemblance on a reduced data set built from these hash values. When a comparison with N entries is required, this conventional technique takes O(N) time to reach a result. To handle a large number of patterns, an "approximate pattern matching" technique can be applied.
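Broder's two measures can be written directly over sets of n-gram "shingles". The sketch below uses raw 3-byte shingles rather than their hash values to stay short (hashing only shrinks the sets); it compares a single pair of strings, and a conventional system repeats this for each of the N stored entries, which is where the O(N) cost above comes from.

```python
def shingles(data, n=3):
    """Set of overlapping n-byte substrings (shingles) of `data`."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def resemblance(a, b, n=3):
    """r(a, b) = |S(a) & S(b)| / |S(a) | S(b)|, the Jaccard similarity of shingle sets."""
    sa, sb = shingles(a, n), shingles(b, n)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def containment(a, b, n=3):
    """c(a, b) = |S(a) & S(b)| / |S(a)|, the share of a's shingles that occur in b."""
    sa, sb = shingles(a, n), shingles(b, n)
    return len(sa & sb) / len(sa) if sa else 1.0


print(containment(b"abcd", b"xxabcdyy"), resemblance(b"abcd", b"xxabcdyy"))
```

Containment is asymmetric: a short string fully inside a long one scores 1.0 for containment while its resemblance stays low.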

However, when several signatures are generated from a single packet, the problem of deciding which one to use as the signature remains. Without this step, multiple signatures are generated for a single attack, making signature management impossible, and verifying the generated signatures requires a large amount of manual work.

In such cases exact matching can be applied, but existing exact pattern matching techniques easily miss detections and cannot determine whether the current pattern is contained in, or completely matches, a specific existing pattern.

An object of the present invention is to provide a string containment determination apparatus and method for an automatic signature generation system that check similarity and containment in document classification or search engine results to minimize the results presented to the user and that, when a whitelist is applied in a security system, allow fast processing, according to the processing rules, of data that matches or is contained in the whitelist rules.

To achieve this object, a string containment determination apparatus according to the present invention includes: a hashing unit that hashes incoming data with a sliding window using consecutive hashing (Consecutive 3-gram Hashing) or overlapping hashing (Overlapping n-gram hashing); a data search unit that searches a bit vector table for whitelist indexes having the same hash values as those generated by the hashing unit, to determine whether the data is contained in the whitelist; a whitelist management unit that, when the data is new data containing a string to be added to the whitelist, registers the data in the whitelist by updating the bit vector table entries corresponding to the data's hash values; and a control unit that controls registration by the whitelist management unit and discards or keeps the data according to the determination of the data search unit.

A string containment determination method according to the present invention includes: hashing a string contained in incoming data; searching a bit vector table for whitelist indexes having the same values as the resulting hash values; when at least one whitelist index is found, matching the data against those whitelist indexes; and, when the matching shows that the string of the data appears in order in the string corresponding to a whitelist index, determining that the data is contained in the previously registered whitelist and processing it as normal data.

A further string containment determination method according to the present invention includes: receiving data containing a string to be added to the whitelist; determining whether the data is already contained in the whitelist; registering the string in the whitelist if the data is new; and, if the data is already contained in the whitelist, outputting a result indicating this together with the whitelist index corresponding to the data.

According to the present invention, the string containment determination apparatus and method for an automatic signature generation system check similarity and containment in document classification or search engine results to minimize the results presented to the user. When a whitelist is applied in a security system, data that matches or is contained in the whitelist rules can be processed according to the processing rules. Because unnecessary operations are avoided, processing needs little memory and is fast, which greatly reduces the computation time for a wide variety of patterns.

Embodiments of the present invention are described below with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating the configuration of a string containment determination apparatus according to an embodiment of the present invention.

The string containment determination apparatus of the present invention performs a series of operations that check similarity and containment in document classification or search engine results to minimize redundancy in the output. In addition, when a whitelist is applied to security systems, including an automatic signature generation system, the apparatus performs predetermined tasks based on the whitelist processing rules for data that matches, or is completely contained in, a whitelist rule.

As shown in FIG. 1, the apparatus includes an input unit (180), an output unit (190), a whitelist management unit (110), a data search unit (120), a bit vector map (150), a total result table (160), a current result table (170), a hashing unit, and a control unit (100) that controls the overall operation.

The apparatus has an input/output interface through which the user enters data and through which data is displayed. Data entered through the input unit (180), or otherwise received, is output through the graphical interface of the output unit (190), as is the whitelist-related information for that data.

The input unit (180) has at least one input means and inputs data in response to operation of that input means. The output unit (190) has at least one of a display device, a light-emitting device, and an audio output device, and outputs the operating state or provides a graphical interface to the user. The input and output units may include various input/output means depending on the type of input/output interface.

The bit vector map (150), the total result table (160), and the current result table (170) are storage areas, held separately on a storage device, for their respective data. They belong to the data unit and are collections in which related data is stored in designated areas in response to control commands from the control unit (100).

The bit vector map (150) contains the bit vector table, and each bit vector table stores the results of hashing together with their index data.

The current result table (170) stores intermediate data for the operations performed by the whitelist management unit (110), the data search unit (120), and the hashing unit, as well as the data produced by those operations. In particular, it stores the intermediate and result data produced when the data search unit searches the bit vector table in the bit vector map for indexes corresponding to a given hash value, and when the whitelist management unit registers new data.

The total result table (160) stores data taken from the current result table (170) by the whitelist management unit (110), the data search unit (120), or the control unit (100), and accumulates data that satisfies a given computation result or condition.

The hashing unit hashes the data received from the input unit (180) with a hash function in response to control commands from the control unit (100). Depending on the hashing scheme, it performs overlapping hashing or consecutive hashing.

The overlapping hashing unit (130) hashes the string in units of a designated size with the units overlapping, and the consecutive hashing unit (140) hashes the string in non-overlapping units of the designated size. Both apply a sliding window.

The whitelist management unit (110) registers a series of strings to be used as a whitelist by reflecting them, through containment pattern matching, in the bit vector table, the memory used for whitelist management.

In accordance with control commands from the control unit (100), the whitelist management unit (110) determines whether the input data is already contained in the whitelist stored in the bit vector map and, if the data is new, registers it in the whitelist by updating the bit vector table with its hash values.

The control unit (100) selects either the overlapping hashing unit or the consecutive hashing unit according to the determination of the whitelist management unit (110) and passes the data to it.

If the whitelist management unit (110) finds that the data is already contained in the stored whitelist, the control unit (100) outputs, through the output unit (190), a result message indicating that the data is already contained, together with the corresponding whitelist index.

When the data is newly registered in the whitelist, the control unit (100) outputs the registration result and the corresponding whitelist index through the output unit (190). If an error occurs during registration, the control unit (100) outputs a registration failure and an error message.

When a whitelist is applied to incoming packets or content, the control unit (100) passes the packets or content to the data search unit (120) to determine whether they are contained in the whitelist, and then transmits, receives, or blocks them accordingly.

The data search unit (120) searches the bit vector table through the whitelist management unit (110) for indexes having a specified hash value and, where applicable, searches the active (set) indexes to trace the hash values of the corresponding index.

In response to a control command from the control unit (100), the data search unit (120) applies a sliding-window technique to incoming packets or content. For 3-gram hashing, it takes the hash value of each run of three consecutive bytes and checks whether any existing entry BT[*][hash value], among all BT[white_index][hash value], is set to '1'.

The data search unit (120) writes the whitelist index (white_index) values of the matching entries in the bit vector table to the current result table (current table) (170).
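A minimal sketch of this step, assuming each BT row (one per white_index) is held as a Python integer used as a bit set, so that BT[w][h] is bit h of row w. The names `make_row` and `current_table` are illustrative only.

```python
def make_row(*hash_values):
    """Build one BT row: an int whose bit h is 1 for every given hash value h."""
    row = 0
    for h in hash_values:
        row |= 1 << h
    return row


def current_table(bt_rows, hash_value):
    """Collect every white_index w for which BT[w][hash_value] == 1."""
    return {w for w, row in enumerate(bt_rows) if (row >> hash_value) & 1}


bt = [make_row(7, 99), make_row(1234), make_row(99, 1234)]
print(sorted(current_table(bt, 1234)))
```

Scanning the rows is linear in the number of whitelists; storing the table column-wise instead (one set of white_index values per hash value) would make this step a single lookup.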

The data search unit (120) then performs an AND operation between this current result table (170) and the total result table (total table) (160), starting from the total table's initial state, and writes the intersection of the two back to the total result table (160).

In other words, the data search unit (120) finds the whitelist indexes (white_index) in which every hash value generated by the hashing unit, as the sliding window shifts, is present.

If the total result table (160) becomes NULL, that is, contains no index, even before the end of the string being compared is reached, the data search unit (120) determines that the string is not present in the whitelist.
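The AND step and the early exit can be sketched as below, again assuming int-bitset BT rows. Reading the "initial state" of the total table as "every white_index" is an assumption; with that reading, data too short to produce any window leaves all indexes as candidates for the exact check that follows.

```python
def make_row(*hash_values):
    """Build one BT row: an int whose bit h is 1 for every given hash value h."""
    row = 0
    for h in hash_values:
        row |= 1 << h
    return row


def candidate_indexes(bt_rows, window_hashes):
    """AND the current table of every window into the total table."""
    total = set(range(len(bt_rows)))  # assumed initial total table: every white_index
    for h in window_hashes:
        current = {w for w, row in enumerate(bt_rows) if (row >> h) & 1}
        total &= current              # intersection of current and total tables
        if not total:                 # total table is NULL: not in the whitelist
            break                     # stop before reaching the end of the string
    return total


rows = [make_row(1, 2, 3), make_row(2, 3), make_row(5)]
print(sorted(candidate_indexes(rows, [2, 3])))
```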

The control unit (100) then handles the content according to the program's normal processing routine.

If at least one whitelist index (white_index) remains in the total result table (160) after the entire string has been processed, the data search unit (120) applies exact multi-pattern matching between those whitelist indexes and the current string to check whether the current string appears, in order, in the strings corresponding to the existing whitelist indexes.

If the current string appears in order within the string corresponding to a whitelist index, the control unit (100) determines that the incoming object is covered by the whitelist; if the order differs, it determines that the object does not match the whitelist. Based on this result, the packet or content is discarded or kept according to the whitelist application policy.
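The final check can be sketched as a substring test over the surviving candidates. In the example, both stored strings contain the query's consecutive 3-grams 'abc' and 'bcd', so both would pass the bit-vector filter, but only index 1 holds them in the right order. Python's `in` on bytes (a single-pattern search) stands in for a real exact multi-pattern matcher such as Aho-Corasick, and the discard/keep policy is left to the caller; both are assumptions.

```python
def verify(data, candidates, whitelist):
    """Keep the candidates whose stored string contains `data` contiguously, in order."""
    return sorted(w for w in candidates if data in whitelist[w])


def is_whitelisted(data, candidates, whitelist):
    """True if at least one candidate survives exact verification."""
    return bool(verify(data, candidates, whitelist))


whitelist = {0: b"bcdxabc", 1: b"zzabcbcdzz"}  # white_index -> registered string
query = b"abcbcd"
print(verify(query, {0, 1}, whitelist))
```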

FIG. 2 illustrates the hashing methods used by the string containment determination apparatus to determine string containment according to an embodiment of the present invention. FIG. 2(a) shows overlapping hashing, and FIG. 2(b) shows consecutive hashing.

In an automatic signature generation system that uses the string containment determination apparatus, the content of each signature consists mostly of ASCII code values or strings.

The apparatus does not store this data itself; instead it stores hash values in place of the data values. Nor does it keep the hash values themselves: it records them in the bit vector table in the form BT[white_index][hash value], using the hash value of the data and the whitelist index.

For example, if the whitelist index is 1 and the hash value is '1234', the bit BT[1][1234] in the bit vector table is set to '1'.
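A minimal sketch of BT[white_index][hash value] with one packed bit array (`bytearray`) per whitelist entry, reproducing the BT[1][1234] example. The class and method names, and the choice of a 2^20 hash range, are illustrative assumptions.

```python
class BitVectorTable:
    """Sketch of BT[white_index][hash_value]: one packed bit array per whitelist entry."""

    def __init__(self, num_whitelists, hash_range):
        self.hash_range = hash_range
        row_bytes = (hash_range + 7) // 8  # one bit per possible hash value
        self.rows = [bytearray(row_bytes) for _ in range(num_whitelists)]

    def _check(self, hash_value):
        if not 0 <= hash_value < self.hash_range:
            raise ValueError("hash value outside the table's hash range")

    def set_bit(self, white_index, hash_value):
        """Set BT[white_index][hash_value] = 1."""
        self._check(hash_value)
        self.rows[white_index][hash_value >> 3] |= 1 << (hash_value & 7)

    def test_bit(self, white_index, hash_value):
        """Return True if BT[white_index][hash_value] == 1."""
        self._check(hash_value)
        return bool(self.rows[white_index][hash_value >> 3] & (1 << (hash_value & 7)))


bt = BitVectorTable(num_whitelists=4, hash_range=1 << 20)
bt.set_bit(1, 1234)  # whitelist index 1, hash value 1234
print(bt.test_bit(1, 1234), bt.test_bit(0, 1234))
```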

When hashing with a hash function, the hashing unit uses either overlapping hashing or consecutive hashing.

In overlapping hashing with a sliding window, shown in FIG. 2(a), a window size for hashing is specified for the input data and hashing is performed on windows of that size: the first window (11) is '85, ff, 53', the second window (12) is 'ff, 53, 4d', and the third window (13) is '53, 4d, 42'.

In consecutive hashing, shown in FIG. 2(b), the data is divided according to the specified window size, so the first window (21) is '85, ff, 53' and the second window (22) is '4d, 42, 72', and each window is hashed.
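The two window schemes of FIG. 2 can be reproduced on the bytes 85 ff 53 4d 42 72 taken from the windows listed above; the figure may show longer input. Dropping a trailing fragment shorter than the window in consecutive mode is an assumption, since the text does not say how a short tail is handled.

```python
def overlapping_windows(data, n=3):
    """FIG. 2(a): the window slides one byte at a time, so windows overlap."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def consecutive_windows(data, n=3):
    """FIG. 2(b): the window jumps n bytes at a time; a short tail is dropped (assumption)."""
    return [data[i:i + n] for i in range(0, len(data) - n + 1, n)]


sample = bytes.fromhex("85ff534d4272")
print([w.hex(" ") for w in overlapping_windows(sample)])
print([w.hex(" ") for w in consecutive_windows(sample)])
```

Each window would then be passed to the hash function; the windows themselves are shown here so they can be compared with the figure.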

The overlapping hashing unit (130) is used when hashing data to register it in the whitelist, and the consecutive hashing unit (140) is used when hashing data to check whether it is contained in the existing whitelist, that is, whether the current content is covered by the whitelist.
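One way to see why registration and lookup use different schemes: wherever a query starts inside a registered string, each of its consecutive (non-overlapping) grams is also one of the registered string's overlapping grams, so the lookup does not depend on alignment. The sketch checks this on raw grams (hashing preserves the subset relation); the strings are made-up examples.

```python
def overlapping_grams(data, n=3):
    """All n-grams, window advancing by 1 (registration side)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def consecutive_grams(data, n=3):
    """Non-overlapping n-grams, window advancing by n (lookup side)."""
    return [data[i:i + n] for i in range(0, len(data) - n + 1, n)]


registered = b"GET /admin/login.php HTTP/1.1"
query = b"/admin/login"

# Overlapping registration: every consecutive gram of the query is found.
print(all(g in overlapping_grams(registered) for g in consecutive_grams(query)))

# Consecutive registration would make the result depend on where the query starts.
print(all(g in set(consecutive_grams(registered)) for g in consecutive_grams(query)))
```

The converse does not hold (grams can be present in the wrong order), which is why an exact matching step follows the bit-vector search.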

FIG. 3 illustrates the bit vector table of a string containment determination apparatus according to an embodiment of the present invention.

When the apparatus hashes data and stores it in the bit vector table of the bit vector map, it does not store a 32-bit integer for every possible value in the hash function's range; instead it stores only as many bits as the product of the number of whitelist indexes and the size of the hashable range. The apparatus can therefore handle a large number of whitelists with a small amount of memory.

FIG. 3 shows the schematic data structure of the bit vector table generated to process 2,000 whitelists with a hash function whose values, under both overlapping hashing with a sliding window and consecutive hashing, are distributed over a range of up to 1M.

Depending on the range of the hash function, the bit vector table can be scaled to a 1M, 16K, or 1K range, and when a large number of whitelists is required, it can be sized to fit that range.
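For the FIG. 3 setting of 2,000 whitelists, the table size is the number of whitelists times the hash range, divided by 8 bits per byte. The sketch reads 1M, 16K, and 1K as 2^20, 2^14, and 2^10 (an assumption) and compares against storing a 32-bit integer in each slot.

```python
def table_bytes(num_whitelists, hash_range):
    """Bytes for a bit vector table with one bit per (white_index, hash value) pair."""
    return num_whitelists * hash_range // 8


for label, hash_range in [("1M", 1 << 20), ("16K", 1 << 14), ("1K", 1 << 10)]:
    bits_only = table_bytes(2000, hash_range)
    as_int32 = bits_only * 32  # same slots stored as 32-bit integers instead of bits
    print(f"{label:>3}: {bits_only:,} bytes (vs {as_int32:,} bytes as 32-bit integers)")
```

Under these assumptions the 1M table takes 262,144,000 bytes (250 MiB), the 16K table 4,096,000 bytes, and the 1K table 256,000 bytes, so the hash range is the main lever on memory.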

The string containment determination apparatus of the present invention therefore makes it easy, during data management, to determine whether a new data set exactly matches a database or an existing catalogued data set, and whether the current data set is contained in an existing one. In areas such as security systems that manage whitelists, a large number of whitelists can be managed easily with little data.

Whereas most engines require execution time proportional to the number M of whitelists, the string containment determination apparatus provides fast search times by using the bit vector table as described above.

FIG. 4 is a flowchart of the whitelist registration method of the string containment determination apparatus according to an embodiment of the present invention.

As shown in FIG. 4, the apparatus registers a series of strings to be used as a whitelist by reflecting them in the bit vector table, the memory used for whitelist management.

The apparatus presents an input/output interface through which the user enters or views data. When whitelist data is entered through an interface such as a graphical user interface (S310), the control unit (100) passes the data to the whitelist management unit (110).

The whitelist management unit (110) first determines whether the incoming data is already contained in the existing whitelist (S320) and, if so, notifies the control unit (100). The control unit (100) then outputs, through the output unit (190), a result indicating that the data is already contained, together with the corresponding whitelist index (S370).

If the data is not contained in the whitelist, the whitelist management unit (110) confirms that it is new data (S330) and passes it to the hashing unit for registration.

Because the data is being registered in the whitelist, the control unit (100) passes it to the overlapping hashing unit (130) rather than the consecutive hashing unit (140).

The overlapping hashing unit (130) hashes the input data using the sliding-window overlapping scheme of FIG. 2(a) (S340).

The whitelist management unit (110) receives the hash values from the hashing unit and updates the corresponding entries of the bit vector table (S350).

When registration of the input data is complete, the whitelist management unit (110) notifies the control unit (100) that registration succeeded, and the control unit (100) outputs the result and the whitelist index information through the output unit (190).

If an error occurs during this registration procedure, the white list manager 110 reports the failure to the controller 100, which outputs the failure result and the related white list information (S380).

FIG. 5 is a flowchart illustrating the preprocessing procedure of the string inclusion determining apparatus according to an embodiment of the present invention. The preprocessing used for the white list registration and inclusion determination of FIG. 4 is described in more detail below.

Referring to FIG. 5, the controller 100 passes the content to be used as the current white list, together with its white list index, to the white list manager 110 (S410).

The white list manager 110 determines whether this is the first white list (S420). If it is, the hashing unit performs overlapping n-gram hashing (S430). The following description uses overlapping 3-gram hashing as an example.

Each time overlapping 3-gram hashing is applied to the three characters currently covered by the window, the white list manager 110 updates the bit vector map with the resulting hash value (S440). Specifically, it sets the bit vector table entry BT[white_index][hash value] to '1'.
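A minimal sketch of this registration step, assuming one 2^16-bit row per white list and a CRC-32 based hash. The description fixes neither the hash function nor the table size, so `TABLE_BITS`, `gram_hash`, `register` and `get_bit` are illustrative names only. Each row is a `bytearray`, so setting BT[white_index][h] means setting bit `h % 8` of byte `h // 8`.

```python
from __future__ import annotations

import zlib

TABLE_BITS = 1 << 16  # assumed hash range; not given in the description


def gram_hash(gram: bytes) -> int:
    """Hypothetical hash: CRC-32 reduced to the table range."""
    return zlib.crc32(gram) % TABLE_BITS


def register(bt: dict[int, bytearray], white_index: int, content: bytes, n: int = 3) -> None:
    """S430-S450: set BT[white_index][h] = 1 for the hash h of every overlapping n-gram."""
    row = bt.setdefault(white_index, bytearray(TABLE_BITS // 8))
    for i in range(len(content) - n + 1):
        h = gram_hash(content[i:i + n])
        row[h >> 3] |= 1 << (h & 7)  # byte h // 8, bit h % 8


def get_bit(bt: dict[int, bytearray], white_index: int, h: int) -> int:
    """Read BT[white_index][h]; a row that was never registered reads as all zeros."""
    row = bt.get(white_index)
    return 0 if row is None else (row[h >> 3] >> (h & 7)) & 1


bt: dict[int, bytearray] = {}
register(bt, 1, b"GET /index.html")
print(get_bit(bt, 1, gram_hash(b"GET")))  # 1
print(get_bit(bt, 2, gram_hash(b"GET")))  # 0
```

With a 2^16-bit range each row occupies 8 KiB regardless of content length; a smaller range would save memory at the cost of more hash collisions.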

The white list manager 110 repeats these steps (S430, S440) until the end of the content being added is reached (S450). Once the whole content has been processed, the controller 100 outputs the result and the corresponding white list index (S570).

If the input content is not the first white list, the white list manager 110 has the hashing unit apply consecutive hashing to it (S460).

The hashing unit performs consecutive 3-gram hashing with the sliding window and looks up the bit vector white list map (S470). The white list manager 110 repeats this hash-and-search process until the last string of the white list is reached.
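The one-third figure given for FIG. 6 implies that consecutive hashing advances the window by n bytes rather than one, producing non-overlapping grams. The sketch below assumes that reading, and also assumes that a trailing fragment shorter than n is simply dropped, which the description does not specify.

```python
from __future__ import annotations


def consecutive_ngrams(data: bytes, n: int = 3) -> list[bytes]:
    """Return non-overlapping n-byte windows, advancing the window n bytes per step.

    Assumption: a trailing fragment shorter than n is dropped.
    """
    return [data[i:i + n] for i in range(0, len(data) - n + 1, n)]


print(consecutive_ngrams(b"GET /index"))  # [b'GET', b' /i', b'nde']
```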

For each hash value, the data search unit 120 finds the white list indexes, from 1 up to the current white list index minus 1, whose bit vector table entry for that hash value is set to '1' (S480).

For example, when the hash value is '1234' and the current white list index is 8, the data search unit 120 scans the bit vector table from BT[1][1234] to BT[7][1234] and collects the indexes whose value is 1.

The data search unit 120 writes the result of this search to the current result table 170 (S490).

That is, if BT[1][1234], BT[2][1234], BT[5][1234], and BT[7][1234] are set to 1, the data search unit 120 stores indexes 1, 2, 5, and 7 in the current result table 170 as the search result for the generated hash value.
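The per-hash search of S480 and S490 can be sketched as below. For readability each bit row BT[i] is modelled as a Python set holding the hash values whose bit is 1; that representation and the function name `search_indexes` are assumptions, not part of the description. The sample table reproduces the example in which bit 1234 is set for indexes 1, 2, 5 and 7.

```python
from __future__ import annotations


def search_indexes(bt: dict[int, set[int]], h: int, upto: int) -> list[int]:
    """S480: white list indexes i in 1..upto whose row BT[i] has bit h set."""
    return [i for i in range(1, upto + 1) if h in bt.get(i, set())]


# Hypothetical rows for white lists 1..7; bit 1234 is set in rows 1, 2, 5 and 7.
bt = {1: {1234}, 2: {1234, 77}, 3: {99}, 4: set(), 5: {1234}, 6: {5}, 7: {1234}}
current_index = 8  # the white list being added
current_result = search_indexes(bt, 1234, current_index - 1)  # S490: current result table
print(current_result)  # [1, 2, 5, 7]
```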

If the sliding window is at its first position (S500), the data search unit 120 also copies the same indexes into the full result table 160 (S510). In the example of FIG. 2, when the current window is the first window, i.e. the bytes '85, ff, 53', the search result is also stored in the full result table.

If the sliding window is not at its first position, the data search unit 120 performs a logical AND between the white list indexes in the current result table 170 and those in the full result table 160 computed at the previous step (S520). It then writes the result, i.e. the intersection of the previous full result table and the current result table, to the full result table 160 (S530).
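The table update of S510 to S530 reduces to a set operation: the first window seeds the full result table, and each later window keeps only the indexes present in both tables. The helper name `update_full` and the per-window results below are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def update_full(full: Optional[set[int]], current: set[int]) -> set[int]:
    """S510: the first window copies the current result table into the full one.
    S520-S530: later windows keep the AND (intersection) of the two tables."""
    return set(current) if full is None else full & current


# Hypothetical current-result-table contents for three successive windows.
windows = [{1, 2, 5, 7}, {2, 3, 7}, {7, 9}]
full = None
for current in windows:
    full = update_full(full, current)
    if not full:  # S540/S550: empty table, stop early
        break
print(full)  # {7}
```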

If, during this process, the current result table 170 or the full result table 160 contains no index (NULL) (S540, S550), the current white list is judged not to be included in any existing white list. To reflect the current white list in the bit vector table, the overlapping hashing and the update of the bit vector table entry for each hash value are then repeated up to the last string (S430, S440). Once the bit vector table has been updated, the controller 100 outputs the corresponding result (S570).

If at least one index remains in both the current result table and the full result table, the steps above, including the bit vector table search with the hash values produced by consecutive hashing, are repeated (S460 to S550) until the last string is reached (S560).

When the last string of the white list has been processed and at least one index remains in the full result table 160, the current white list is contained in an existing white list. The corresponding result and white list index are passed to the controller 100 and output through the output unit 190 (S570).
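Putting the FIG. 5 steps together, the following is a compact sketch of the preprocessing under these assumptions: for a deterministic example the "hash" of a 3-byte gram is its 24-bit big-endian value (collision-free; a real table would likely use a smaller hash range), each bit row is modelled as a set of hash values, and the class and method names are illustrative. As in the description, the decision at this stage rests on the filter alone; no exact matching is performed before a list is reported as included.

```python
from __future__ import annotations

N = 3  # 3-gram, as in the description's example


def gram_hash(gram: bytes) -> int:
    # Assumption: the 24-bit big-endian value of the 3-byte gram (collision-free).
    return int.from_bytes(gram, "big")


class WhiteListManager:
    """Sketch of the FIG. 5 preprocessing; rows[i - 1] models bit row BT[i] as a set."""

    def __init__(self) -> None:
        self.rows: list[set[int]] = []

    def _register(self, content: bytes) -> int:
        # S430-S450: overlapping hashing, setting BT[white_index][h] = 1.
        self.rows.append({gram_hash(content[i:i + N])
                          for i in range(len(content) - N + 1)})
        return len(self.rows)  # 1-based index of the new white list

    def _included_in(self, content: bytes) -> set[int]:
        # S460-S560: consecutive hashing against rows 1..(current index - 1).
        full: set[int] | None = None
        for i in range(0, len(content) - N + 1, N):
            h = gram_hash(content[i:i + N])
            current = {k for k, row in enumerate(self.rows, start=1) if h in row}  # S480-S490
            full = current if full is None else full & current  # S510-S530
            if not full:  # S540-S550: empty table, not included
                return set()
        return full or set()

    def add(self, content: bytes) -> tuple[str, list[int]]:
        if self.rows:  # S420: not the first white list
            hits = self._included_in(content)
            if hits:  # S570: contained in existing white lists
                return "included", sorted(hits)
        return "registered", [self._register(content)]


wm = WhiteListManager()
print(wm.add(b"GET /index.html HTTP/1.1"))  # ('registered', [1])
print(wm.add(b"/index.htm"))                # ('included', [1])
print(wm.add(b"POST /login"))               # ('registered', [2])
```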

This completes the preprocessing that turns the requested white list into a rule.

FIG. 6 is a flowchart illustrating how the string inclusion determining apparatus determines whether a new object is included, according to an embodiment of the present invention.

As shown in FIG. 6, the string inclusion determining apparatus performs the following process on incoming content to determine whether it matches an existing white list.

When a new object consisting of a sequence of characters is input (S610), the controller 100 passes it to the data search unit 120. The object is hashed with the consecutive hashing method (S620). The data search unit 120 looks up the bit vector white list map (S630) and searches the bit vector table for indexes whose entry for the generated hash value is set (S640).

Here, the bit vector table is searched from the first white list up to the number of white lists present in the system. For example, if the generated hash value is '1234' and there are 8 white lists, the entries BT[1][1234] through BT[8][1234] are examined.

Because each white list pattern is registered with the hash values of all of its overlapping 3-grams, the object being checked needs only the hash values of its consecutive (non-overlapping) 3-grams, about one third as many, to determine inclusion. When comparing against a large amount of data, this reduces execution time compared with the conventional approach.
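The one-third figure follows from counting windows: a string of L bytes yields L - 2 overlapping 3-grams but only floor(L/3) consecutive ones. Using fewer windows on the query side is safe because, if the object occurs anywhere inside a registered pattern, each of its non-overlapping 3-grams is also one of the pattern's overlapping 3-grams, whatever the alignment. The sketch checks both points on raw bytes (the function names are illustrative); with hashing, collisions can only add candidates, never remove them.

```python
def overlapping_count(length: int, n: int = 3) -> int:
    return max(length - n + 1, 0)


def consecutive_count(length: int, n: int = 3) -> int:
    return length // n


def covered(query: bytes, pattern: bytes, n: int = 3) -> bool:
    """True if every non-overlapping n-gram of query is an overlapping n-gram of pattern."""
    grams = {pattern[i:i + n] for i in range(len(pattern) - n + 1)}
    return all(query[i:i + n] in grams for i in range(0, len(query) - n + 1, n))


for length in (30, 300, 1500):
    print(length, overlapping_count(length), consecutive_count(length))
# 30 28 10
# 300 298 100
# 1500 1498 500

pattern = b"GET /index.html HTTP/1.1"
# Every substring of the pattern passes the filter, whatever its offset.
assert all(covered(pattern[a:b], pattern)
           for a in range(len(pattern)) for b in range(a, len(pattern) + 1))
```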

If the bit vector table contains an index whose entry for the generated hash value is set (S650), the current result table 170 and the full result table 160 are updated (S660).

If, during this process, the current result table 170 or the full result table 160 contains no index (NULL) (S670, S680), the data search unit 120 determines that the input object does not match any white list and reports this to the controller 100. The controller 100 then discards or passes the packet or traffic of the input object (S750).

If at least one index remains in both the current result table and the full result table, the steps above, including the bit vector table search with the hash values produced by consecutive hashing, are repeated (S620 to S690) until the end of the incoming object is reached (S690).
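The filtering loop of S620 to S690 can be sketched as follows, assuming for a deterministic example that the hash of a 3-byte gram is its 24-bit big-endian value and that each bit row is modelled as a set of hash values; the function names are illustrative. Unlike the FIG. 5 preprocessing, the search covers all registered white lists, and the function also counts how many windows were examined to show the early exit on an empty table.

```python
from __future__ import annotations

N = 3


def gram_hash(gram: bytes) -> int:
    # Assumption: the 24-bit big-endian value of the 3-byte gram (collision-free).
    return int.from_bytes(gram, "big")


def build_rows(whitelists: list[bytes]) -> list[set[int]]:
    """Overlapping-hash each registered white list; rows[i - 1] models BT[i]."""
    return [{gram_hash(w[i:i + N]) for i in range(len(w) - N + 1)} for w in whitelists]


def filter_object(obj: bytes, rows: list[set[int]]) -> tuple[set[int], int]:
    """S620-S690: consecutive-hash obj, search BT[1..len(rows)], intersect, stop early.

    Returns the candidate white list indexes and the number of windows examined.
    """
    full: set[int] | None = None
    examined = 0
    for i in range(0, len(obj) - N + 1, N):
        examined += 1
        h = gram_hash(obj[i:i + N])
        current = {k for k, row in enumerate(rows, start=1) if h in row}  # S640-S660
        full = current if full is None else full & current
        if not full:  # S670/S680: no index left, the object is not white-listed
            return set(), examined
    return full or set(), examined


rows = build_rows([b"GET /index.html", b"POST /login.php"])
print(filter_object(b"/login", rows))      # ({2}, 2)
print(filter_object(b"qwertyuiop", rows))  # (set(), 1)
```

With a real hash of limited range, collisions can keep a non-matching index alive; the exact-matching stage that follows removes such candidates.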

After these steps have been repeated up to the end of the incoming object, if one or more white list indexes remain in the full result table 160, exact matching is performed against each of those white list indexes (S700).

The matching determines whether the strings of the object appear in sequence in the candidate white list (S710). If they appear in the white list's string set in the same order, the object is judged to belong to the white list (S720), and the result is passed to the controller 100.
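A minimal sketch of the exact-matching stage (S700 to S720), paired with a small hash-free stand-in for the bit vector filter so the example stays deterministic; both function names are hypothetical. It shows why the exact match is needed: the filter only checks that each consecutive 3-gram occurs somewhere in a white list, so grams that occur in a different order still pass it.

```python
from __future__ import annotations


def gram_filter(obj: bytes, contents: list[bytes], n: int = 3) -> set[int]:
    """Hash-free stand-in for the bit vector filter: keep white list k (1-based) if
    every consecutive n-gram of obj occurs somewhere in contents[k - 1]."""
    hits = set(range(1, len(contents) + 1))
    for i in range(0, len(obj) - n + 1, n):
        gram = obj[i:i + n]
        hits &= {k for k, c in enumerate(contents, start=1) if gram in c}
    return hits


def exact_match(obj: bytes, candidates: set[int], contents: list[bytes]) -> list[int]:
    """S700-S710: keep the candidates in which obj appears contiguously, in order."""
    return sorted(k for k in candidates if obj in contents[k - 1])


contents = [b"abcXYZ", b"XYZabc!"]
obj = b"XYZabc"
candidates = gram_filter(obj, contents)
print(candidates)  # {1, 2}: both lists contain "XYZ" and "abc"
matched = exact_match(obj, candidates, contents)
print(matched)  # [2]
print("whitelisted" if matched else "discard or pass")  # S720 versus S740/S750
```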

The controller 100 then outputs a result indicating that the incoming object is included in the white list, releases the memory, and returns (S730).

If instead the matching shows that the object's strings appear in a different order, the input object is judged not to match the white list (S740); the result is output and the current object is discarded or passed (S750).

As described above, during registration, which turns a white list into a rule, the present invention maintains a bit vector table over the full range of possible hash values. For the hash values produced by the sliding window, it continuously tracks the indexes of the rules whose bit vector entries are set, and it stops computing as soon as no index remains in the full result table, thereby avoiding unnecessary operations.

Furthermore, instead of conventional pattern matching over a large amount of data, the present invention first applies a filtering step using a small data set and only then performs multi-pattern matching to check whether the current content conforms to an existing rule. This increases processing speed and reduces the amount of data used. It also provides, within a short computation time, multi-pattern matching that preserves inclusion across a large number of patterns.

The apparatus and method for determining string inclusion for an automatic signature generation system according to the present invention have been described with reference to the accompanying drawings. However, the present invention is not limited to the embodiments and drawings disclosed herein and may be applied in various ways by those skilled in the art.

FIG. 1 is a block diagram illustrating the configuration of a string inclusion determining apparatus according to an embodiment of the present invention;

FIG. 2 illustrates the hashing methods used by the string inclusion determining apparatus to determine whether a string is included, according to an embodiment of the present invention;

FIG. 3 illustrates the bit vector table of the string inclusion determining apparatus according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating the white list registration method of the string inclusion determining apparatus according to an embodiment of the present invention;

FIG. 5 is a flowchart illustrating the preprocessing procedure of the string inclusion determining apparatus according to an embodiment of the present invention;

<Explanation of reference numerals for the main parts of the drawings>

100: controller

110: white list manager

120: data search unit

130: hashing unit

150: bit vector map

160: full result table

170: current result table

Claims

A string inclusion determining method, comprising:

Generating a hash value by hashing a string included in incoming data;

Retrieving, from a bit vector table, a white list index having a value equal to the hash value;

Matching a string corresponding to the retrieved white list index against the string of the data; and

If the string corresponding to the white list index contains a string in the same order as the string of the data, determining that the data is included in the white list and treating the data as regular data.

The method of claim 1,

further comprising, after the searching step, repeating the hashing step and the searching step up to the last string of the data and, if no white list index having the same value as the hash value exists, determining that the data is not included in the white list and discarding or passing the data.

The method of claim 2,

wherein the searching step searches for a white list index having the same value as the hash value from the beginning of the bit vector table up to the last registered white list index, which corresponds to the number of registered white lists.

The method of claim 1,

further comprising, if the order of the string of the data differs from that of the string corresponding to the white list index, determining that the data does not belong to the white list and discarding or passing the data.

The method of claim 1,

wherein, in the hashing step, the string included in the data is hashed by a consecutive n-gram hashing method according to a sliding-window technique.

Inputting data including a string to be registered in a white list;

Determining whether the data is included in the white list;

If the data is new data, registering the string in the white list; and

If the data is already included in the white list, outputting a result value indicating that the data is already registered, together with the white list index corresponding to the data.

The method of claim 6,

wherein the determining of whether the data is included in the white list comprises: hashing the string of the data by a consecutive n-gram hashing method according to a sliding-window technique; and

determining that the data is included in the white list if at least one white list index whose bit vector table entry for the generated hash value is 1 exists.

The method of claim 7, wherein

in the determining of whether the data is included in the white list, if no white list index whose bit vector table entry for the hash value is 1 exists, the data is determined to be new data not included in the white list.

The method of claim 6,

wherein the registering step comprises: hashing the string of the data by an overlapping n-gram hashing method according to a sliding-window technique; and

updating the bit vector table entries corresponding to the generated hash values to register the string of the data in the white list.

A hashing unit which hashes incoming data by a consecutive n-gram hashing method or an overlapping n-gram hashing method according to a sliding-window technique;

A data search unit which searches a bit vector table for a white list index having a value equal to the generated hash value to determine whether the data is included in a white list;

A white list manager which, when the data is new data including a string to be added to the white list, updates the bit vector table entries corresponding to the hash values and registers the data in the white list; and

A controller which controls registration by the white list manager and discards or keeps the data according to the determination result of the data search unit.