KR101210622B1

KR101210622B1 - Method for detecting ip shared router and system thereof

Info

Publication number: KR101210622B1
Application number: KR1020100060259A
Authority: KR
Inventors: 김경수; 김봉기; 손정표; 정현호
Original assignee: 주식회사 케이티
Priority date: 2010-06-24
Filing date: 2010-06-24
Publication date: 2012-12-11
Also published as: KR20110140063A

Abstract

According to an embodiment of the present invention, an IP router that provides Internet service to a plurality of user terminals through a single public IP address is detected. The IP router detection system takes the web service request messages received over an IP network whose source IP address and User-Agent field match, and extracts their source port numbers in the order in which the messages were received. It then analyzes the pattern in which the extracted source port numbers change over that time order and classifies them into one or more groups.
The number of user terminals using the IP router is calculated based on the number of classified groups.

Description

Method for detecting IP router and system for performing the same {METHOD FOR DETECTING IP SHARED ROUTER AND SYSTEM THEREOF}

The present invention relates to a method for detecting an IP router and a system for performing the method, and more particularly to a method and system for detecting whether an IP router is in use and how many user terminals are connected to it.

Devices such as computers are assigned a unique identifier called a public IP (Internet Protocol) address by an Internet Service Provider (ISP) and use it to communicate over the Internet.

IP routers are now used to make efficient, private use of such public IP addresses. An IP router lets several user terminals access the Internet at the same time through a single public IP address, for example by building a private network with a NAT (Network Address Translation) device.

As IP routers become more common, however, the bandwidth available in wired and wireless network environments is shrinking, and subscribers very frequently use IP routers to get network access at low cost.

The resulting growth in network traffic causes serious problems. In addition, incidents that exploit network insecurity and application vulnerabilities on client PCs have become frequent: phishing that steals valuable information such as IDs and passwords, financial security incidents in which a site poses as an Internet banking site to obtain transaction funds illegally, and DDoS (Distributed Denial of Service) attacks on major portal sites and security sites. If a PC carrying out such an attack sits inside a private network, there is a strong need to block it.

Various approaches have been proposed to solve these problems by detecting whether an IP router is in use, and they fall broadly into the following three categories.

The first analyzes TCP/IP (Transmission Control Protocol/Internet Protocol) packets, using either cookie information or the ISN (Initial Sequence Number) in the TCP header.

Using cookie information requires monitoring downstream traffic as well as upstream traffic, which increases the workload.

The ISN is the initial sequence number generated when a TCP connection is requested. Its initial value is 1, and a time counter increments it by 1 every 4 ms. Because the ISN increases sequentially over time after time synchronization, it is easy to guess.

A sequentially increasing ISN, however, invites security incidents, because the three-way handshake used by the TCP/IP protocol relies on the ISN and the AN (Acknowledgement Number). For example, if an attacker spoofs the IP address and sends an ACK to the server (IP: 2.2.2.2) before the original client (IP: 1.1.1.1) does, a trusted connection is established between the attacker and the server (IP: 2.2.2.2), and a malicious attack on the server becomes possible.

For this reason, ISN generation has recently been changed so that each device produces the ISN as a random number, which makes the ISN hard to guess.

Because each operating system now generates the ISN with a random generator for security, it is difficult to infer which ISNs belong to the same client, so the detection rate of ISN-based IP router detection is inevitably low.

The second approach installs an agent on the client terminal, either as a Java applet or as an ActiveX control. Installing a separate program on the client terminal is problematic because the Internet user can notice it and refuse the installation.

The last approach uses the User-Agent field to distinguish users and reports an IP router when more than one User-Agent field value is found.

However, while there are 2^32 public IP addresses, the number of known User-Agent field values is far smaller. Different users can therefore have the same User-Agent field value, and classifying users by the User-Agent field alone can produce the following errors.

First, an IP router may be reported even though none is in use. When a single client terminal without an IP router accesses the Internet with two or more web programs, for example Internet Explorer, Firefox, and Opera, two or more User-Agent field values appear although the user is the same. Classification by the User-Agent field alone mistakes them for different client terminals.

Second, an IP router may go unreported even though one is in use. When different client terminals behind an IP router have the same User-Agent field value, classification by the User-Agent field alone treats them as a single client terminal and concludes that no IP router is in use.
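The two failure modes of User-Agent-only classification can be illustrated with a minimal Python sketch. The function name, the placeholder User-Agent strings, and the address 125.0.0.2 are assumptions made for illustration and do not come from the patent.

```python
from collections import defaultdict


def count_terminals_by_user_agent(requests):
    """Naive estimate: number of distinct User-Agent values per source IP."""
    agents = defaultdict(set)
    for src_ip, user_agent in requests:
        agents[src_ip].add(user_agent)
    return {ip: len(values) for ip, values in agents.items()}


# One terminal, two browsers, no IP router: counted as two terminals.
single_pc = [("125.0.0.1", "UA-browser-A"), ("125.0.0.1", "UA-browser-B")]
# Two terminals behind an IP router with identical browsers: counted as one.
shared = [("125.0.0.2", "UA-browser-A"), ("125.0.0.2", "UA-browser-A")]

false_positive = count_terminals_by_user_agent(single_pc)
false_negative = count_terminals_by_user_agent(shared)
```

In both cases the count disagrees with the real number of terminals, which is why a second classification stage based on source port numbers is introduced.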

Accordingly, the technical problem addressed by the present invention is to provide a method for detecting an IP router using a two-stage classification technique, HTTP request message analysis followed by statistical modeling, and a system for performing the method.

According to one aspect of the present invention, an IP router detection method is provided. The method includes: extracting, from a plurality of web service request messages received over an IP network, the source port numbers of the web service request messages whose source IP address and User-Agent field match, in the order in which the messages were received, where the User-Agent field contains information about the user who sent the web service request message; classifying the extracted source port numbers into one or more groups by analyzing the pattern in which they change over that time order; and calculating the number of user terminals using the IP router based on the number of classified groups.

According to another aspect of the present invention, an IP router detection system is provided. The system includes: an extraction unit that extracts, from a plurality of web service request messages received over an IP network, the source port numbers of the web service request messages whose source IP address and User-Agent field match, in the order in which the messages were received, where the User-Agent field contains information about the user who sent the web service request message; a classification unit that classifies the extracted source port numbers into one or more groups by analyzing the pattern in which they change over that time order; and a detection unit that calculates the number of user terminals using the IP router based on the number of classified groups.

According to an embodiment of the present invention, the User-Agent field is extracted from HTTP request messages and the IP router is detected by exploiting the fact that source port numbers increase linearly over time, which reduces complexity and improves the accuracy of IP router detection.

This resolves two problems of the conventional approach, which detects IP routers using only the User-Agent field values seen per public IP address: the false positive error, in which one client PC running two or more web programs yields different User-Agent field values and is reported as an IP router user although the user is the same, and the false negative error, in which different client PCs yield the same User-Agent value and the IP router goes undetected.

FIG. 1 is a network configuration diagram to which an IP router detection system according to an embodiment of the present invention is applied.
FIG. 2 is a block diagram showing the detailed configuration of an IP router detection system according to an embodiment of the present invention.
FIGS. 3 and 4 show the structure of an HTTP request message according to an embodiment of the present invention.
FIG. 5 shows the structure of the User-Agent field according to an embodiment of the present invention.
FIG. 6 is a histogram of the differences between source port numbers over time according to an embodiment of the present invention.
FIG. 7 is a graph showing the result of applying a Gaussian mixture model (GMM) to the histogram of FIG. 6.
FIG. 8 is a flowchart of an IP router detection method according to an embodiment of the present invention.
FIG. 9 is a flowchart of a threshold setting method according to an embodiment of the present invention.
FIG. 10 is a graph of the source port numbers over time of HTTP request messages having different User-Agent fields according to an embodiment of the present invention.
FIG. 11 is a graph showing an example of IP router detection according to an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that those of ordinary skill in the art can readily practice the invention. The invention may, however, be embodied in many different forms and is not limited to the embodiments described here. Parts unrelated to the description are omitted from the drawings for clarity, and like reference numerals denote like parts throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise.

A method for detecting an IP router according to an embodiment of the present invention, and a system for performing the method, are described in detail below with reference to the drawings.

FIG. 1 is a network configuration diagram to which an IP router detection system according to an embodiment of the present invention is applied.

Referring to FIG. 1, the client 1 terminal 100, the client 2 terminal 200, and the client 3 terminal 300 are all connected to the IP router 400.

The IP router 400 translates one public IP address into one or more corresponding private IP addresses, or one or more private IP addresses into one public IP address.

The IP router 400 holds a routing table in which three private IP addresses, 192.168.0.2, 192.168.0.3, and 192.168.0.4, are mapped to one public IP address, 125.0.0.1.

The IP router 400 assigns 192.168.0.2, 192.168.0.3, and 192.168.0.4 to the client 1 terminal 100, the client 2 terminal 200, and the client 3 terminal 300, respectively. Each terminal connects to the IP router 400 with its assigned private IP address and, through the mapped public IP address 125.0.0.1, connects to the service server 500 to send and receive traffic.
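The address mapping described for FIG. 1 can be sketched in Python as follows. This is a simplified illustration, not the router's actual implementation: the packet dictionaries, the function name, and the destination address 203.0.113.10 (standing in for the service server 500) are assumptions, and port handling is omitted.

```python
PUBLIC_IP = "125.0.0.1"
PRIVATE_IPS = {"192.168.0.2", "192.168.0.3", "192.168.0.4"}


def translate_outbound(packet):
    """Rewrite the private source address to the shared public address."""
    if packet["src_ip"] not in PRIVATE_IPS:
        raise ValueError("source address is not in the private mapping table")
    translated = dict(packet)  # leave the caller's packet untouched
    translated["src_ip"] = PUBLIC_IP
    return translated


# All three terminals appear to the server under the same public address.
outbound = [
    translate_outbound({"src_ip": ip, "dst_ip": "203.0.113.10"})
    for ip in sorted(PRIVATE_IPS)
]
```

After translation the source IP address no longer distinguishes the terminals, so the detection system has to rely on other fields of the request.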

Upstream traffic bound for the service server 500 flows into the IP router detection system 600. The IP router detection system 600 analyzes this traffic to detect the IP router 400 and the number of client terminals connected to the IP router 400.

In particular, unlike conventional approaches that analyze all TCP/IP (Transmission Control Protocol/Internet Protocol) packets, the IP router detection system 600 analyzes only HTTP (HyperText Transfer Protocol) request messages, which are the web service request messages in the upstream traffic. Because only 10 of every 100 TCP/IP packets contain an HTTP request message, the load placed on the system is lower than when all TCP/IP packets are used.

The client 1 terminal 100, the client 2 terminal 200, and the client 3 terminal 300 send HTTP request messages whose source IP addresses are set to the private IP addresses assigned by the IP router 400, namely 192.168.0.2, 192.168.0.3, and 192.168.0.4, respectively. The IP router 400 translates the source IP address of each received HTTP request message to the public IP address (125.0.0.1) and forwards the message to the service server 500.

The IP router detection system 600 collects the HTTP request messages on the traffic path and extracts the User-Agent field to perform a first-stage classification of users. No field other than the User-Agent field carries information that can distinguish the client terminals 100, 200, and 300 uniquely, yet the User-Agent field itself is not unique to each user.

The extraction unit 603 therefore also extracts the source port numbers from HTTP request messages that have the same User-Agent field. The extracted source port numbers are analyzed for the pattern in which they change over the order in which the HTTP request messages were received and are classified into one or more groups, which constitutes the second-stage classification of users.

The number of user terminals using the IP router is calculated based on the number of groups obtained in this way.

The linearity of the source port numbers is analyzed with a statistical model, and the linear equation obtained from that analysis is used to detect the IP router 400 and the number of client terminals connected to the IP router 400.

From a network security standpoint, the IP router detection system 600 can be used to build a notification system for virus-infected client terminals 100, 200, and 300. A network may broadly consist of central server devices and regional server devices installed at regional nodes, and the IP router detection system 600 may be implemented as a regional server device.

The configuration of the IP router detection system 600 is now described in more detail.

FIG. 2 is a block diagram showing the detailed configuration of the IP router detection system according to an embodiment of the present invention, FIGS. 3 and 4 show the structure of an HTTP request message according to an embodiment of the present invention, FIG. 5 shows the structure of the User-Agent field according to an embodiment of the present invention, FIG. 6 is a histogram of the differences between source port numbers over time according to an embodiment of the present invention, and FIG. 7 is a graph showing the result of applying a Gaussian mixture model (GMM) to the histogram of FIG. 6.

Referring first to FIG. 2, the IP router detection system 600 includes a collection unit 601, an extraction unit 603, a classification unit 605, a storage unit 607, a setting unit 609, an analysis unit 611, and a detection unit 613.

The collection unit 601 continuously monitors upstream traffic from the client terminals 100, 200, and 300 to the service server 500. From this traffic it collects, per time period, the web service request messages that share the same User-Agent field and source IP address.

A web service request message is a message that a client terminal sends to the service server 500 to use an Internet service; in this embodiment of the present invention it is an HTTP request message.

Referring to FIG. 3, the HTTP request message 700 includes an ETH (Ethernet header) 710, an IPH (Internet Protocol header) 720, a TCPH (Transmission Control Protocol header) 730, an HTTPH (HTTP header) 740, and an HTTP body 750.

The User-Agent field is contained in the HTTPH 740, and the source port number is stored in the source port number field 731 of the TCPH 730.
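As a rough illustration of where these two values sit, the following Python sketch reads the source port from the first two bytes of a TCP header and the User-Agent value from an HTTP header block. The function names and the sample bytes are assumptions for illustration; the patent does not specify a parser.

```python
import struct


def tcp_source_port(tcp_header: bytes) -> int:
    """Return the source port: the first 16 bits of the TCP header, big-endian."""
    (port,) = struct.unpack_from("!H", tcp_header, 0)
    return port


def user_agent(http_header: bytes):
    """Return the User-Agent header value, or None if the header is absent."""
    for line in http_header.split(b"\r\n")[1:]:  # skip the request line
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"user-agent":
            return value.strip().decode("latin-1")
    return None


# Illustrative 20-byte TCP header: source port 49152, destination port 80.
sample_tcp = struct.pack("!HHIIBBHHH", 49152, 80, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
sample_http = (
    b"GET / HTTP/1.1\r\nHost: example.com\r\n"
    b"User-Agent: ExampleBrowser/1.0\r\n\r\n"
)
```

A real capture would first strip the Ethernet and IP headers and honour the TCP data offset before reaching the HTTP header; those steps are left out here.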

The client terminals 100, 200, and 300 send HTTP request messages to the service server 500 to use the Internet. On the first connection request, a new source port number is assigned at random and a session is created.

Referring to FIG. 4, which shows the structure of the HTTPH (HTTP header) 740 and the HTTP body 750 in detail, the HTTP request message 700 consists of a request line 741, three kinds of header information 742, 743, and 744, and a body 750 containing the actual data.

The three kinds of header information 742, 743, and 744 are the general header 742, the request header 743, and the entity header 744. Of these, the request header 743 exists only in the HTTP request message 700.

The request header 743 specifies the configuration of the client terminals 100, 200, and 300 and the document formats they prefer. It consists of 14 detailed items in total and carries various information about the sender. Among these, the User-Agent field 745 provides information that identifies the web browser used by the client terminals 100, 200, and 300.

The User-Agent field 745 is structured as shown in FIG. 5.

Referring to FIG. 5, the User-Agent field 745 contains OS information, web browser information, user information, or detailed information about the client terminals 100, 200, and 300 that is carried in their traffic.

The field value 747 of the User-Agent field 745 differs according to the type and version of the operating system, the type and version of the web program (browser), and the application program 749.

In other words, the User-Agent field 745 can take a variety of field values 747 depending on the installed operating system, web browser, and application program 749.

The details differ slightly from browser to browser, depending on the browser type and version, the operating system, the installed software, and so on. For example, with the Windows XP Professional SP3 operating system and Internet Explorer 6, the field value 747 of the User-Agent field 745 is "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1".

The extraction unit 603 extracts, from the HTTP request messages collected by the collection unit 601, the source port numbers of those messages whose source IP address and User-Agent field match, in the order in which the HTTP request messages were received.
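The extraction step can be sketched in Python as a grouping of source ports by the pair (source IP address, User-Agent), ordered by receive time. The message dictionaries and the key names `time`, `src_ip`, `user_agent`, and `src_port` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def extract_port_sequences(messages):
    """Group source ports by (source IP, User-Agent), ordered by receive time."""
    sequences = defaultdict(list)
    for msg in sorted(messages, key=lambda m: m["time"]):
        sequences[(msg["src_ip"], msg["user_agent"])].append(msg["src_port"])
    return dict(sequences)


messages = [
    {"time": 3, "src_ip": "125.0.0.1", "user_agent": "UA-A", "src_port": 1026},
    {"time": 1, "src_ip": "125.0.0.1", "user_agent": "UA-A", "src_port": 1025},
    {"time": 2, "src_ip": "125.0.0.1", "user_agent": "UA-B", "src_port": 4000},
]
sequences = extract_port_sequences(messages)
```

Each resulting list is one time-ordered port sequence that the classification step can then split into groups.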

The extraction unit 603 reads the source port number from the source port number field 731 of packets whose TCP flags in the TCPH 730 of FIG. 3 are set to 0x18 (both PSH and ACK set to 1).
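A minimal Python sketch of the flag test follows. It assumes the standard TCP header layout, in which the control bits occupy the byte at offset 13; the function name and the choice to ignore the two ECN bits are assumptions, not details from the patent.

```python
TCP_FLAG_PSH = 0x08
TCP_FLAG_ACK = 0x10
PSH_ACK = TCP_FLAG_PSH | TCP_FLAG_ACK  # 0x18


def is_psh_ack(tcp_header: bytes) -> bool:
    """True when the flag bits of the TCP header are exactly PSH|ACK."""
    flags = tcp_header[13] & 0x3F  # low six bits: URG, ACK, PSH, RST, SYN, FIN
    return flags == PSH_ACK


# Illustrative 20-byte headers that differ only in the flags byte.
data_segment = bytes(13) + bytes([0x18]) + bytes(6)
syn_segment = bytes(13) + bytes([0x02]) + bytes(6)
```

Filtering on PSH|ACK keeps segments that carry pushed data, such as an HTTP request, and skips handshake segments like the SYN shown above.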

The classification unit 605 classifies the extracted source port numbers into one or more groups according to the pattern in which they change over the order in which the HTTP request messages were received.

Specifically, the source port numbers are classified into one or more groups, each made up of source port numbers for which the difference between the source port number at the previous time and the source port number at the current time has a predefined linearity. The resulting groups are arranged in a table that maps them to each User-Agent field, and the table is stored in the storage unit 607.
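One way to picture this grouping is the Python sketch below. It is a simplified greedy variant under stated assumptions: a port joins the first existing group whose most recent port differs from it by at most `max_step`, and otherwise starts a new group. The parameter `max_step` is a hypothetical placeholder for the threshold that the patent derives from a Gaussian mixture model.

```python
def group_ports(ports, max_step=1):
    """Split a time-ordered port sequence into groups that advance linearly."""
    groups = []
    for port in ports:
        for group in groups:
            if abs(port - group[-1]) <= max_step:
                group.append(port)
                break
        else:
            groups.append([port])  # no group fits: treat as a new terminal
    return groups


def build_group_table(sequences, max_step=1):
    """Map each User-Agent value to the port groups found for it."""
    return {ua: group_ports(ports, max_step) for ua, ports in sequences.items()}


# Two terminals with the same User-Agent, interleaved in time.
interleaved = [1025, 40010, 1026, 40011, 1027, 40012]
table = build_group_table({"UA-A": interleaved})
```

Here the single User-Agent value maps to two groups, which suggests two terminals behind the same public address even though their User-Agent fields are identical.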

The storage unit 607 stores the table in which the one or more groups formed by the classification unit 605 are mapped to their respective User-Agent fields.

The setting unit 609 sets a threshold derived by applying a Gaussian mixture model to the User-Agent fields and source port numbers extracted from the HTTP request messages that the collection unit 601 collected over a predefined period.

That is, the Gaussian mixture model is used to demonstrate that the source port numbers extracted from HTTP request messages with the same User-Agent field increase linearly over time, and the threshold is derived from that model.

FIG. 6 is a histogram of the differences between the source port number extracted at the current time (T₁) and the source port number extracted at the previous time (T₀), taken from HTTP request messages collected over a predefined period. As FIG. 6 shows, most of the differences fall in {-1, 0, 1}, and the difference 1 is the most frequent. This indicates that the source port number increases or decreases linearly over time.
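A histogram of this kind can be computed with a few lines of Python. The sample port sequence is invented for illustration and is not data from FIG. 6.

```python
from collections import Counter


def port_difference_histogram(ports):
    """Count differences between each source port and the one before it."""
    return Counter(cur - prev for prev, cur in zip(ports, ports[1:]))


# Hypothetical time-ordered source ports from one (source IP, User-Agent) pair.
histogram = port_difference_histogram([1025, 1026, 1027, 1027, 1028, 1027])
```

For this sample the differences are 1, 1, 0, 1, and -1, so the bin for a difference of 1 is the tallest, matching the shape the text describes.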

To generalize FIG. 6, a probability distribution model can be defined over the differences between the current and previous source port numbers of the client terminals 100, 200, and 300 that use most IP routers; this model is shown in FIG. 7.

Here, a Gaussian mixture model (GMM) is used as the probability distribution model. A GMM is one method of classifying patterns by estimating the probability distribution of data; it is a probability model extended so that it can represent distributions of various shapes, and it may be defined by Equation 1 below.

Here, Y_i(x) denotes a single Gaussian distribution, and μ_i and σ_i denote the mean and variance of that distribution, respectively.
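Equation 1 is not reproduced in this text. For orientation only, a one-dimensional Gaussian mixture with K components is conventionally written as below. The mixing weights w_i, the component count K, and the symbol p(x) are assumptions of this sketch rather than the patent's own notation, and σ_i is placed in the variance position to match the definition given here.

```latex
p(x) = \sum_{i=1}^{K} w_i \, \frac{1}{\sqrt{2\pi\sigma_i}}
       \exp\!\left( -\frac{(x-\mu_i)^2}{2\sigma_i} \right),
\qquad \sum_{i=1}^{K} w_i = 1, \quad w_i \ge 0
```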

The setting unit 609 analyzes the variables defined in Equation 1 and derives the threshold as in Equation 2.

Meanwhile, the analysis unit 611 determines whether the plurality of source port numbers received from the classification unit 605 are linear and reports the result.

The linearity determination is performed as follows: for the plurality of source port numbers received from the classification unit 605, the difference between the source port number at the previous time and the source port number at the current time is calculated, and Equation 3 is evaluated.

Here, Current_Src_Port is the source port number at the current time as extracted by the extraction unit 603, and Previous_Src_Port is the source port number at the previous time.

If Equation 3 evaluates to 'true', the linearity is judged to be satisfied; if it evaluates to 'false', the sequence is judged to be nonlinear.
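Equation 3 itself is not reproduced in this text. Based on the surrounding description (a difference between consecutive source port numbers compared against the threshold), one plausible reading is sketched below. The absolute value and the symbol θ for the threshold are assumptions.

```latex
\left| \text{Current\_Src\_Port} - \text{Previous\_Src\_Port} \right| \le \theta
```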

The detection unit 613 determines that an IP router is in use when it finds two or more distinct groups whose source port numbers increase linearly with time. It then takes the number of groups as the number of user terminals using the IP router.

A method of detecting an IP router is now described on the basis of the foregoing. The same reference numerals are used for the components described with reference to FIGS. 1 to 7.

FIG. 8 is a flowchart of a method of detecting an IP router according to an embodiment of the present invention, and describes the operation of the IP router detection system 600 of FIG. 2.

Referring to FIG. 8, the collection unit 601 collects HTTP request messages having the same source IP address on the upstream traffic path of FIG. 1 (S101).

The extraction unit 603 then extracts the User-Agent field and the source port number from each HTTP request message collected by the collection unit 601 (S103, S105).

The classification unit 605 determines whether the User-Agent field extracted in step S103 is already stored in the table (S107).

If it is not stored, the User-Agent field and the current source port number extracted in steps S103 and S105 are mapped to each other and stored in the table (S109), and the process returns to step S101.

The source port numbers mapped to each User-Agent field in the table in this way can be represented as a graph such as that of FIG. 8.

If, however, step S107 determines that the field is stored, the analysis unit 611 calculates the difference between the previous source port number stored in the table and the current source port number extracted in step S105 (S111). It then determines whether the calculated difference is less than or equal to the threshold set in advance by the setting unit 609 (S113).

If the difference exceeds the threshold, the analysis unit 611 judges the sequence to be nonlinear (S115).

If the difference is less than or equal to the threshold, the analysis unit 611 judges that linearity is satisfied (S117), and the classification unit 605 deletes the previous source port number stored in the table and replaces it with the current source port number extracted in step S105 (S119).

The detection unit 613 then generates a linear curve of source port number versus time, as shown in FIG. 9 (S121).

The detection unit 613 determines whether two or more of the linear curves generated in step S121 exist (S123).

If fewer than two exist, the process restarts from step S101. If two or more exist, the detection unit 613 determines that an IP router is in use (S125) and takes the number of linear curves as the number of user terminals behind the IP router (S127). For example, if four distinct linear curves are generated, it is calculated that four user terminals share the IP router.
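The flow of steps S101 to S127 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `detect_ip_router`, the use of an absolute difference in S111, leaving the table unchanged on a nonlinear result, and counting one curve per User-Agent that shows at least one linear step are all choices made for the example.

```python
def detect_ip_router(messages, threshold):
    """Follow steps S101-S127 for messages that share one source IP address.

    `messages` is an iterable of (user_agent, src_port) in arrival order.
    Returns (router_detected, curve_count).
    """
    last_port = {}          # table: User-Agent -> previous source port
    linear_agents = set()   # User-Agents with a linear step (assumed curve criterion)
    for user_agent, src_port in messages:             # S101, S103, S105
        if user_agent not in last_port:               # S107: not yet in the table
            last_port[user_agent] = src_port          # S109
            continue
        diff = abs(src_port - last_port[user_agent])  # S111 (absolute value assumed)
        if diff <= threshold:                         # S113
            last_port[user_agent] = src_port          # S117, S119: update the table
            linear_agents.add(user_agent)             # S121: this agent has a curve
        # Otherwise the step is judged nonlinear (S115); table left unchanged (assumed).
    curves = len(linear_agents)
    return curves >= 2, curves                        # S123, S125, S127

# Hypothetical traffic: two agents advance by one port, a third appears once.
messages = [
    ("UA-A", 1025), ("UA-B", 40310),
    ("UA-A", 1026), ("UA-B", 40311),
    ("UA-C", 52000),
]
detected, terminals = detect_ip_router(messages, threshold=2)
```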

FIG. 9 is a flowchart of a threshold setting method according to an embodiment of the present invention, and shows how the threshold used in step S113 of FIG. 8 is set in advance.

Referring to FIG. 9, the collection unit 601 collects, over a predefined period, a plurality of HTTP request messages having the same public IP address (S201).

The extraction unit 603 extracts the User-Agent field from the HTTP request messages whose source IP addresses match (S203). It then extracts the source port numbers of the HTTP request messages whose User-Agent fields match, in the order in which the messages were received (S205).

The setting unit 609 calculates, for the source port numbers extracted in step S205, the difference between each source port number and the one preceding it in time (S207). It then applies a Gaussian mixture model to the calculated differences (S209) and derives the threshold (S211).
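Steps S203 to S207 amount to ordering the messages by receipt time, grouping them by User-Agent, and differencing consecutive ports. A minimal Python sketch follows; the function name `port_differences_by_agent`, the record layout, and the sample values are hypothetical. The Gaussian mixture fit (S209) and the threshold formula of Equation 2 (S211) are not reproduced in this text and are deliberately left out.

```python
from collections import defaultdict

def port_differences_by_agent(records):
    """Return consecutive source-port differences per User-Agent.

    `records` is an iterable of (timestamp, user_agent, src_port) tuples
    collected for a single public IP address.
    """
    by_agent = defaultdict(list)
    # S205: take the ports of each User-Agent in order of receipt time.
    for _, user_agent, src_port in sorted(records, key=lambda r: r[0]):
        by_agent[user_agent].append(src_port)
    # S207: current port minus previous port, for each User-Agent.
    return {
        ua: [cur - prev for prev, cur in zip(ports, ports[1:])]
        for ua, ports in by_agent.items()
    }

# Hypothetical records, deliberately out of time order.
records = [
    (3.0, "UA-A", 2002),
    (1.0, "UA-A", 2000),
    (2.0, "UA-B", 5000),
    (2.5, "UA-A", 2001),
    (4.0, "UA-B", 5001),
]
diffs = port_differences_by_agent(records)
```

The resulting difference lists are the input to which a mixture model would be fitted.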

FIGS. 10 and 11 show simulation results obtained by applying the IP router detection method described above.

FIG. 10 is a graph of the source port numbers, over time, of HTTP request messages having different User-Agent fields according to an embodiment of the present invention, and FIG. 11 is a graph showing an example of IP router detection according to an embodiment of the present invention.

The IP router used in the simulation is an ipTIME-series model with four UTP (unshielded twisted pair) ports. A private network was built with a total of three client PCs (client 1 PC, client 2 PC, and client 3 PC).

Of these, client 1 PC accessed the Internet with two web programs in order to show the false detection rate of the primary classification scheme, while client 2 PC and client 3 PC each used only one web program.

FIG. 10 shows that the source port numbers increase differently for each client. This is because the source port number for the first HTTP request is chosen at random within a range, whereas for subsequent HTTP requests it increases linearly, so each client follows its own increasing sequence.

FIG. 11 shows the result of generating the linear curves of source port number versus time by applying the threshold to the source port numbers of FIG. 10.

Here, each thick solid line is the linear curve of one client's source port numbers.

Referring to FIG. 11, two or more linear curves are detected, which shows that an IP router sharing a single public IP address is in use. Moreover, three distinct linear curves are detected, which shows that all three client PCs used in the simulation (client 1 PC, client 2 PC, and client 3 PC) are detected.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto; various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

100, 200, 300: client terminal 400: IP router
500: service server 600: IP router detection system
601: collection unit 603: extraction unit
605: classification unit 607: storage unit
609: setting unit 611: analysis unit
613: detection unit

Claims

1. An IP router detection method comprising:
extracting, from among a plurality of web service request messages received through an IP network, the source port number of each web service request message whose source IP address and user agent field match, in the time order in which the plurality of web service request messages were received, the user agent field containing information about the user who sent the web service request message;
classifying the extracted plurality of source port numbers into one or more groups by analyzing the pattern in which they change in the time order; and
calculating the number of user terminals using the IP router based on the number of classified groups,
wherein the classifying step classifies source port numbers having mutual linearity among the plurality of source port numbers into one group and generates a linear curve based on the source port numbers classified into the same group, the method thereby detecting use of the IP router.

2. The method of claim 1, wherein the classifying step comprises:
calculating, for the extracted plurality of source port numbers, the difference between the source port number at the previous time and the source port number at the current time; and
classifying the source port numbers into one or more groups, each consisting of a plurality of source port numbers whose differences have a predefined linearity.

3. The method of claim 2, wherein the classifying into one or more groups comprises determining that linearity exists when the difference between source port numbers is less than or equal to a predefined threshold, and classifying into one or more groups consisting of a plurality of source port numbers whose differences are less than or equal to the predefined threshold.

4. The method of claim 3, further comprising, before the extracting step:
setting the threshold by applying a Gaussian mixture model to user agent fields and a plurality of source port numbers extracted from a plurality of web service request messages collected over a predefined period.

5. The method according to any one of claims 2 to 4, wherein the calculating step comprises:
creating a table that maps the one or more groups to user agent fields;
generating, from the plurality of source port numbers included in the one or more groups mapped to each user agent field stored in the table, a linear curve along which the source port numbers increase linearly with time; and
detecting use of the IP router when two or more distinct linear curves are generated, and calculating the number of user terminals from the number of distinct linear curves.

6. The method of claim 5, wherein the web service request message is an HTTP (HyperText Transfer Protocol) request message including the user agent field and the source port number in header information.

7. An IP router detection system comprising:
an extraction unit that extracts, from among a plurality of web service request messages received through an IP network, the source port number of each web service request message whose source IP address and user agent field match, in the time order in which the plurality of web service request messages were received, the user agent field containing information about the user who sent the web service request message;
a classification unit that classifies the extracted plurality of source port numbers into one or more groups by analyzing the pattern in which they change in the time order; and
a detection unit that calculates the number of user terminals using the IP router based on the number of classified groups,
wherein the classification unit classifies source port numbers having mutual linearity among the plurality of source port numbers into one group and generates a linear curve based on the source port numbers classified into the same group, and
wherein the detection unit detects use of the IP router when there are two or more linear curves.

8. The system of claim 7, further comprising an analysis unit that determines whether the difference between the source port number at the previous time and the source port number at the current time, among the extracted plurality of source port numbers, satisfies a predefined linearity,
wherein the classification unit classifies the extracted plurality of source port numbers into one or more groups, each consisting of a plurality of source port numbers having the predefined linearity between them.

9. The system of claim 8, wherein the analysis unit sets a threshold by applying a Gaussian mixture model to user agent fields and a plurality of source port numbers extracted from a plurality of web service request messages collected over a predefined period, and thereafter determines that linearity exists when the difference between source port numbers is less than or equal to the set threshold.

10. The system according to claim 8 or 9, wherein the detection unit generates a table in which the one or more groups are mapped to user agent fields, generates, from the plurality of source port numbers included in the one or more groups mapped to each user agent field stored in the table, a linear curve along which the source port numbers increase linearly with time, and, when two or more distinct linear curves are generated, detects use of the IP router and calculates the number of user terminals from the number of distinct linear curves.

11. The system of claim 10, further comprising a collection unit that continuously monitors upstream traffic, collects from the upstream traffic HTTP (HyperText Transfer Protocol) request messages including the user agent field and the source port number in header information, and forwards the HTTP request messages to the extraction unit.