KR20180052243A

KR20180052243A - Method and device for detecting frauds by using click log data

Info

Publication number: KR20180052243A
Application number: KR1020160149329A
Authority: KR
Inventors: 김상욱; 이상철; 채동규
Original assignee: 주식회사 자이냅스
Priority date: 2016-11-10
Filing date: 2016-11-10
Publication date: 2018-05-18
Also published as: WO2018088824A1; KR101879829B1

Abstract

The present invention relates to a method and an apparatus for detecting an abnormal user by using click log data for items within a site. According to an embodiment of the present invention, a method for detecting an abnormal user through click log data for items within a site comprises the steps of: collecting click log data for each site user; extracting an inter-arrival time (IAT) for each site user from the collected click log data; extracting a diurnal activity (DA) for each site user from the collected click log data; extracting an eigenscore (ES) for each site user from the collected click log data; calculating a suspicion score for each site user by using at least one of the extracted IAT, DA, and ES; and detecting an abnormal user among the site users based on the calculated suspicion score for each user. The present invention makes it possible to detect abnormal users easily and at low cost.

Description

TECHNICAL FIELD [0001] The present invention relates to a method and an apparatus for detecting an abnormal user using click log data.

The present invention relates to a method and an apparatus for detecting an abnormal user by using click log data for items in a site. According to an embodiment of the present invention, the behavior patterns of site users can be analyzed using their click log data, and abnormal users can be detected.

Conventional studies on detecting abnormal users (fraud) have been based on observing the activity domains and activity patterns of abnormal users. For example, click fraud detection identifies users who intentionally click a specific advertisement many times, and reputation fraud detection identifies users who manipulate public opinion on e-commerce sites through ratings, comments, and the like. In addition, ranking fraud detection identifies users who use abnormal methods to raise the popularity ranking of a specific application in a mobile app store.

In contrast, no study has so far analyzed the behavior patterns (e.g., click actions) of price comparison site users and detected users whose behavior patterns suggest that they are abnormal users. This is because users who behave abnormally on a price comparison site have behavioral characteristics different from those of abnormal users of other services, for example click fraud in advertisement networks, reputation fraud on online e-commerce sites, or ranking fraud in mobile app stores.

More specifically, users who behave abnormally on a price comparison site are characterized in that a single user can show various reactions to a single item (for example, purchasing it, rating it, downloading data related to it, or clicking it).

In existing domain systems, a single user showing more than one reaction to a single item is either not allowed or regarded as unnatural behavior. Therefore, existing domain systems, and likewise existing fraud detection methods, cannot be applied to detecting abnormal users on a price comparison site, where one user can show various reactions to one item.

As described above, when one user can show various reactions to one item, existing abnormal user detection methods cannot be used. Accordingly, an embodiment of the present invention can provide a method of detecting abnormal users among the users of a price comparison site.

According to an embodiment of the present invention, a method for detecting an abnormal user through click log data for items in a site may include: collecting click log data for each site user; extracting an inter-arrival time (IAT) for each site user using the collected click log data; extracting a diurnal activity (DA) for each site user using the collected click log data; extracting an eigenscore (ES) for each site user using the collected click log data; calculating a suspicion score for each site user using at least one of the extracted IAT, DA, and ES; and detecting an abnormal user among the site users based on the calculated suspicion score for each user.

According to an embodiment of the present invention, the method for detecting an abnormal user through click log data for items in a site may include: storing click log data for each user who accessed the site during a preset time; and removing, from the stored click log data, click log data for which the total number of clicks on items in the site is less than a threshold.
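The filtering step above can be sketched as follows. This is only a minimal illustration: the `(user, item, timestamp)` tuple layout, the function name `filter_low_activity`, and the reading that the threshold applies to each user's total click count are assumptions, not details given in the specification.

```python
from collections import Counter

def filter_low_activity(click_log, min_clicks):
    """Drop the log entries of users whose total click count is below min_clicks.

    click_log is a list of (user, item, timestamp) tuples (hypothetical layout).
    """
    totals = Counter(user for user, _item, _ts in click_log)
    return [entry for entry in click_log if totals[entry[0]] >= min_clicks]

# Hypothetical log: u1 clicked three times, u2 only once.
log = [("u1", "itemA", 0), ("u1", "itemB", 5), ("u1", "itemA", 9), ("u2", "itemA", 3)]
filtered = filter_low_activity(log, min_clicks=3)
```

Removing sparse users first keeps the per-user vectors computed later from being dominated by a handful of clicks.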

According to an embodiment of the present invention, the method for detecting an abnormal user through click log data for items in a site may include: setting an I_normal vector containing time interval information for pairs of two consecutive click logs of all users of the site; setting an I_u vector containing time interval information for pairs of two consecutive click logs of an arbitrary user u of the site; and calculating the IAT of the arbitrary user u based on the I_normal vector and the I_u vector.

According to an embodiment of the present invention, the method for detecting an abnormal user through click log data for items in a site may include: setting a D_normal vector containing the number of clicks per time slot for all users of the site; setting a D_u vector containing the number of clicks per time slot for an arbitrary user u of the site; and calculating the DA for each user based on the D_normal vector and the D_u vector.

According to an embodiment of the present invention, the method for detecting an abnormal user through click log data for items in a site may include: constructing a user-item matrix using the click log data for each site user; searching for a dense block in the constructed matrix through singular value decomposition (SVD); and extracting the ES for each site user based on the dense block found.
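As a rough illustration of how a decomposition of the user-item matrix can expose a dense block, the sketch below approximates the first left singular vector by power iteration, using only the standard library. This passage does not state how the ES is derived from the dense block, so taking the magnitude of a user's component in that vector as the score is an assumption made purely for illustration; the function names and the sample matrix are likewise hypothetical.

```python
import math

def leading_left_singular_vector(matrix, iterations=100):
    """Approximate the first left singular vector by power iteration on A * A^T."""
    rows, cols = len(matrix), len(matrix[0])
    u = [1.0 / math.sqrt(rows)] * rows
    for _ in range(iterations):
        # v = A^T u, then w = A v, then renormalise w to unit length.
        v = [sum(matrix[r][c] * u[r] for r in range(rows)) for c in range(cols)]
        w = [sum(matrix[r][c] * v[c] for c in range(cols)) for r in range(rows)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return u
        u = [x / norm for x in w]
    return u

def eigenscores(matrix):
    """Assumed score: magnitude of each user's component in the leading singular vector."""
    return [abs(x) for x in leading_left_singular_vector(matrix)]

# Hypothetical user-item click counts: users 0 and 1 hammer items 0 and 1 (a dense block),
# while users 2 and 3 click sparsely.
clicks = [
    [30, 28, 0, 0],
    [29, 31, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
scores = eigenscores(clicks)
```

In this toy matrix the two users forming the dense block receive large components and the sparse users receive components near zero, which is the kind of separation a dense-block search relies on.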

According to an embodiment of the present invention, an apparatus for detecting an abnormal user through click log data for items in a site includes a database that stores the click log data, and a processor. The processor may collect click log data for each site user, extract an inter-arrival time (IAT) for each site user using the collected click log data, extract a diurnal activity (DA) for each site user using the collected click log data, extract an eigenscore (ES) for each site user using the collected click log data, calculate a suspicion score for each site user using at least one of the extracted IAT, DA, and ES, and detect an abnormal user among the site users based on the calculated suspicion score for each user.

According to an embodiment of the present invention, the apparatus for detecting an abnormal user through click log data for items in a site may perform operations of storing click log data for each user who accessed the site during a preset time, and removing, from the stored click log data, click log data for which the total number of clicks on items in the site is less than a threshold.

According to an embodiment of the present invention, the apparatus for detecting an abnormal user through click log data for items in a site may perform operations of setting an I_normal vector containing time interval information for pairs of two consecutive click logs of all users of the site, setting an I_u vector containing time interval information for pairs of two consecutive click logs of an arbitrary user u of the site, and calculating the IAT of the arbitrary user u based on the I_normal vector and the I_u vector.

According to an embodiment of the present invention, the apparatus for detecting an abnormal user through click log data for items in a site may perform operations of setting a D_normal vector containing the number of clicks per time slot for all users of the site, setting a D_u vector containing the number of clicks per time slot for an arbitrary user u of the site, and calculating the DA for each user based on the D_normal vector and the D_u vector.

According to an embodiment of the present invention, the apparatus for detecting an abnormal user through click log data for items in a site may perform operations of constructing a user-item matrix using the click log data for each site user, searching for a dense block in the constructed matrix through singular value decomposition (SVD), and extracting the ES for each site user based on the dense block found.

According to an embodiment of the present invention, abnormal users can be detected among the users of a price comparison site. More specifically, the present invention can provide a method by which operators of existing price comparison services (Naver Shopping, Enuri.com, Danawa, etc.) can detect abnormal users among their site users without installing any additional device. Accordingly, an operator providing a price comparison service can easily detect abnormal users at low cost.

In addition, according to an embodiment of the present invention, the various attempts of users who resort to illegitimate methods to raise the popularity of the products they sell can be restricted.

FIG. 1 is a view for explaining how recommended products are displayed on a site according to an embodiment of the present invention.
FIG. 2 is a diagram comparing the IAT of general users and abnormal users of a site according to an embodiment of the present invention.
FIG. 3 is a diagram comparing the DA of general users and abnormal users of a site according to an embodiment of the present invention.
FIG. 4 is a diagram comparing the ES of general users and abnormal users of a site according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a method of detecting an abnormal user using click log data for items in a site according to an embodiment of the present invention.

The specific structural or functional descriptions of the embodiments according to the concept of the present invention disclosed in this specification are given only for the purpose of describing those embodiments. The embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described in this specification.

Because the embodiments according to the concept of the present invention can be modified in various ways and can take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. However, this is not intended to limit the embodiments according to the concept of the present invention to the specific forms disclosed, and the embodiments include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another; for example, without departing from the scope of rights according to the concept of the present invention, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between. Expressions describing the relationship between components, such as "between" and "immediately between" or "directly adjacent to", should be interpreted in the same way.

The terms used in this specification are used only to describe particular embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art and, unless explicitly defined in this specification, are not to be interpreted in an idealized or overly formal sense.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. However, the scope of the patent application is not restricted or limited by these embodiments. The same reference numerals in the drawings denote the same members.

FIG. 1 is a view for explaining how recommended products are displayed on a site according to an embodiment of the present invention.

A site according to an embodiment means a site containing items through which a user's behavior pattern can be analyzed from the click log data of site users. For example, a price comparison site that compares and presents the prices of various products can rank products based on their popularity, sales volume, and the like. The items offered on such a price comparison site as image or text links can serve as a means of capturing users' click log data.

FIG. 1 is an example of a screen that may be displayed when a user searches for "laptop" on a price comparison site. In area 110 of FIG. 1, various items related to the laptop the user searched for may be displayed. The items corresponding to the search term entered by the user may be provided as image or text links.

In area 120 of FIG. 1, items related to the laptop the user searched for may be displayed in order of popularity. The popularity ranking of the items displayed in 120 may be derived from the click log data of the users of the site. For example, the item that received the most clicks from site users may be ranked first. Alternatively, the item sold most to site users may be ranked first.

The site shown in FIG. 1 may be provided to users through an apparatus for detecting abnormal users according to an embodiment of the present invention. The apparatus may include a processor that collects the click log data of the users of the site and extracts an inter-arrival time (IAT), a diurnal activity (DA), and an eigenscore (ES) for each user from the collected click log data, and a database in which the collected click log data is stored. The IAT, DA, and ES are described in detail below with reference to FIGS. 2 to 4.

FIG. 2 is a diagram comparing the IAT of general users and abnormal users of a site according to an embodiment of the present invention.

The inter-arrival time (IAT) for each site user, which can be extracted from the collected click log data, is a value indicating the time taken from when a first user clicks a first item until that user clicks the first item again. The IAT is one of the indicators that can be used to distinguish general users from abnormal users.

The processor included in the apparatus for detecting abnormal users may use the click log data to set an I_u vector based on the IAT values of an arbitrary user u. The I_u vector is an n-dimensional vector, where n may denote the session length (the time difference between the first click and the last click of user u while connected to the site). The value of the i-th dimension of the I_u vector may denote the proportion, among all pairs of two consecutive click logs of user u, of the pairs whose click interval is i seconds. For example, n may be set to 1,200. This means that if user u, after performing an initial click on any item in the site, does not perform the next click within 1,200 seconds (i.e., 20 minutes), the session is regarded as ended. In the same manner, the processor may also set an I_normal vector using the click log data of all users of the site.
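The construction of the I_u vector described above can be sketched as follows. This is a minimal illustration under stated assumptions: timestamps are given in seconds, intervals are truncated to whole seconds, and intervals of 0 seconds or longer than n seconds (treated here as session breaks) are simply skipped. None of these handling rules, nor the function name `iat_vector`, comes from the specification.

```python
def iat_vector(timestamps, n=1200):
    """Build an n-dimensional vector of click-interval proportions for one user.

    Entry i-1 holds the share of consecutive-click pairs whose interval is i seconds.
    """
    ts = sorted(timestamps)
    counts = [0] * n
    for prev, cur in zip(ts, ts[1:]):
        gap = int(cur - prev)
        if 1 <= gap <= n:  # assumption: other gaps are ignored
            counts[gap - 1] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n
    return [c / total for c in counts]

# Hypothetical user: three 3-second intervals and one 40-second interval.
vec = iat_vector([0, 3, 6, 9, 49])
```

Applying the same function to the pooled intervals of all users would give a counterpart of the I_normal vector, provided intervals are computed per user before pooling.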

In the graphs of FIG. 2, the x-axis represents i, the time interval between a user's two consecutive clicks, and the y-axis may represent the number of click log pairs having a click interval of i seconds. For example, if a user connected to the site performed clicks at 3-second intervals 10 times, this can be represented by the point (3, 10) on graph (a) of FIG. 2.

Graph (a) of FIG. 2 is based on the IAT of all users of the site, and graph (b) is based only on the IAT of users classified as abnormal users.

Referring to graph (a), most of the IAT values of all users of the site lie between 1 and 100 seconds. An IAT pattern like that of graph (a) is the typical IAT pattern extracted from click log data collected through a shopping site where operations such as item search or price comparison are performed. For example, the IAT extracted from the click log data of users of a price comparison site can take a form similar to graph (a).

In contrast, the IAT extracted from the click log data of users classified as abnormal users can take a form similar to graph (b) of FIG. 2. Referring to graph (b), the majority of the pairs of two consecutive click logs have a click interval of 9 to 14 seconds (210) or a click interval of 40 seconds (220). The behavior pattern of a user who clicks a single item many times at a specific time interval thus differs considerably from that of a general user.

According to one embodiment, whether an individual user is an abnormal user can be determined by comparing the I_normal vector, based on the IAT values extracted from the click log data of all users of the site, with the I_u vector, based on the IAT values extracted from the click log data of the individual user. For example, the similarity between the I_normal vector and the I_u vector of user u can be calculated to obtain a_u^IAT, the click interval suspicion score of user u.

According to one embodiment, the Kullback-Leibler divergence (D_KL) may be used to calculate the similarity between the I_normal vector and the I_u vector. This can be expressed as Equation 1 below.

[Equation 1]
D_{KL}(I_u \| I_{normal}) = \sum_{i} I_u(i) \log \frac{I_u(i)}{I_{normal}(i)}

Here, D_{KL}(I_u \| I_{normal}) is the Kullback-Leibler divergence of the I_u vector and the I_normal vector, and i is the index over the dimensions of the I_u and I_normal vectors. Because D_KL is in general an asymmetric function, a_u^IAT can be calculated by averaging the values of D_{KL}(I_u \| I_{normal}) and D_{KL}(I_{normal} \| I_u). This can be expressed as Equation 2 below.

[Equation 2]
a_u^{IAT} = \frac{1}{2} \left( D_{KL}(I_u \| I_{normal}) + D_{KL}(I_{normal} \| I_u) \right)

a_u^IAT can be expressed as a value between 0 and 1 through min-max normalization.
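The averaged Kullback-Leibler comparison and the min-max normalization described above can be sketched as follows. The small additive constant `eps` that keeps zero entries from producing undefined logarithms is an assumption, since the specification does not say how empty dimensions are handled; the function names and the sample vectors are also hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) for two equal-length discrete distributions, with additive smoothing."""
    ps = [x + eps for x in p]
    qs = [x + eps for x in q]
    sp, sq = sum(ps), sum(qs)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(ps, qs))

def symmetric_kl(p, q):
    """Average of the two directed divergences, since D_KL is asymmetric."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def min_max_normalize(scores):
    """Rescale a dict of raw scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

# Hypothetical 3-dimensional vectors, for brevity.
normal = [0.5, 0.3, 0.2]
users = {"u1": [0.5, 0.3, 0.2], "u2": [0.0, 0.0, 1.0], "u3": [0.4, 0.4, 0.2]}
raw = {u: symmetric_kl(vec, normal) for u, vec in users.items()}
scores = min_max_normalize(raw)
```

A user whose vector matches the overall distribution receives the lowest score, and the user who deviates most receives 1 after normalization.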

Although the above embodiment calculates the similarity between vectors using D_KL, the present invention is not limited thereto, and any method of calculating the similarity between vectors may be applied. For example, the Euclidean distance, the Manhattan distance, the cosine similarity, or the Hamming distance may be applied.

FIG. 3 is a diagram comparing the DA of general users and abnormal users of a site according to an embodiment of the present invention.

The diurnal activity (DA) for each site user, which can be extracted from the collected click log data, is a value representing each user's amount of activity over the day, and is one of the indicators that can be used to distinguish general users from abnormal users.

The processor included in the apparatus for detecting abnormal users may use the click log data to set a D_u vector based on the DA values of an arbitrary user u. The D_u vector is a 24-dimensional vector, where 24 corresponds to the 24 hours of a day. The value of the i-th dimension of the D_u vector denotes the proportion of clicks that occurred between i:00:00 and i:59:59; more specifically, it represents the share of user u's clicks falling in each hour of the day. In the same manner, the processor may also set a D_normal vector using the click log data of all users of the site.
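The 24-dimensional D_u vector described above can be sketched as follows. It is a minimal illustration that assumes each click carries a `datetime` timestamp in the site's local time; the function name `da_vector` and the sample clicks are hypothetical.

```python
from datetime import datetime

def da_vector(click_times):
    """Return a 24-dimensional vector: entry i is the share of clicks made during hour i."""
    counts = [0] * 24
    for t in click_times:
        counts[t.hour] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 24
    return [c / total for c in counts]

# Hypothetical user who clicks mostly in the evening.
clicks = [
    datetime(2016, 11, 10, 19, 5),
    datetime(2016, 11, 10, 19, 40),
    datetime(2016, 11, 10, 21, 10),
    datetime(2016, 11, 11, 3, 0),
]
d_u = da_vector(clicks)
```

Pooling the click times of all users and calling the same function would give a counterpart of the D_normal vector.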

In the graphs of FIG. 3, the x-axis represents the 24 hours of a day, and the y-axis may represent the proportion of clicks in each hour.

Graph (a) of FIG. 3 is based on the DA of all users of the site, and graph (b) is based only on the DA of users classified as abnormal users.

Referring to graph (a), the DA of all users of the site is lowest during sleeping hours (the Sleeping interval), from 3 a.m. to 8 a.m., and highest at night, from 10 p.m. to midnight. A DA pattern like that of graph (a) is the typical DA pattern extracted from click log data collected through a shopping site where operations such as item search or price comparison are performed. For example, the DA extracted from the click log data of users of a price comparison site can take a form similar to graph (a).

In contrast, the DA extracted from the click log data of users classified as abnormal users can take a form similar to graph (b) of FIG. 3. Referring to 310 in graph (b), most of the abnormal users' clicks were performed between 7 p.m. and 10 p.m. The behavior pattern of a user who clicks only at a specific time of day thus differs considerably from that of a general user.

According to one embodiment, whether an individual user is an abnormal user can be determined by comparing the vector D_normal, based on the DA values extracted from the click log data of all users of the site, with the vector D_u, based on the DA values extracted from the click log data of the individual user. For example, the similarity between D_normal and the D_u of user u can be calculated to obtain a_u^DA, the daily activity suspicion score of user u.

According to one embodiment, the Kullback-Leibler divergence (D_KL) may be used to calculate the similarity between D_normal and D_u; a larger divergence indicates that D_u deviates more from D_normal. This can be expressed as Equation 3.

[Equation 3]

$$D_{KL}(D_u \,\|\, D_{normal}) = \sum_{i} D_u(i) \log \frac{D_u(i)}{D_{normal}(i)}$$

Here, $D_{KL}(D_u \,\|\, D_{normal})$ is the Kullback-Leibler divergence of D_u and D_normal, and i is the index over the dimensions of D_u and D_normal. Since $D_{KL}$ is in general an asymmetric function, a_u^DA can be calculated by averaging $D_{KL}(D_u \,\|\, D_{normal})$ and $D_{KL}(D_{normal} \,\|\, D_u)$. This can be expressed as Equation 4.

[Equation 4]

$$a_u^{DA} = \frac{D_{KL}(D_u \,\|\, D_{normal}) + D_{KL}(D_{normal} \,\|\, D_u)}{2}$$

The score a_u^DA can be expressed as a value between 0 and 1 through min-max normalization.
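
The averaged Kullback-Leibler divergence and the min-max normalization described above can be sketched as follows. The additive smoothing constant `EPS` is an assumption that the source does not mention; it is added here only so that hours with zero clicks do not produce a logarithm of zero. The function names and the sample vectors are likewise hypothetical.

```python
import math

EPS = 1e-9  # assumed smoothing constant; the source does not specify one


def smooth(p, eps=EPS):
    """Add eps to every entry and renormalize so the entries sum to 1."""
    shifted = [x + eps for x in p]
    total = sum(shifted)
    return [x / total for x in shifted]


def kl_divergence(p, q):
    """D_KL(p || q) for two strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def da_score(d_u, d_normal):
    """Average of D_KL(D_u || D_normal) and D_KL(D_normal || D_u)."""
    p, q = smooth(d_u), smooth(d_normal)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical vectors: a uniform site-wide profile, a user who matches it,
# and a user whose clicks all fall between 19:00 and 21:59.
d_normal = [1 / 24] * 24
d_same = [1 / 24] * 24
d_burst = [0.0] * 19 + [1 / 3] * 3 + [0.0] * 2

raw = [da_score(d_same, d_normal), da_score(d_burst, d_normal)]
normalized = min_max(raw)
```

The raw score of the bursty user depends strongly on the choice of `EPS`, because the reverse divergence grows as the smoothed zero entries shrink; only the ordering of the scores should be read from this sketch.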

FIG. 4 is a diagram for comparing the difference in ES between general users and abnormal users of a site according to an embodiment of the present invention.

The per-user ES (eigenscore) that can be extracted from the collected click log data is a value derived from each user's total daily clicks on each item in the site. For example, a high ES for a particular user can mean that the user has a high number of clicks on a single item and, accordingly, that the user is likely to be an abnormal user. The per-user ES is one of the indicators that can be used to distinguish general users from abnormal users.

The processor included in the apparatus for detecting an abnormal user may use the click log data of the site's users to generate a matrix representing each user's total daily clicks on each item in the site. For example, the rows of the matrix may list the first to n-th users, and the columns may list the per-date click counts of the first to n-th items.

Referring to the table of FIG. 4, the entry in row 1, column 1 of the matrix according to one embodiment may represent the number of clicks by the first user on the first item on the first day. Referring to 410, the first user performed 75 clicks on the second item on the first day and 42 clicks on the second day. Referring to 420, the third user performed 69 clicks on the third item on the first day and 80 clicks on the second day.
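
A matrix of this shape can be built from raw click records as in the sketch below. The `(user_id, item_id, day)` record layout and the function name `build_click_matrix` are assumptions made for illustration; the click counts reuse the figures quoted for 410 and 420, and users or item-day pairs with no clicks are simply absent from this small example.

```python
def build_click_matrix(records):
    """Count clicks per user (row) and per (item, day) pair (column)."""
    records = list(records)
    users = sorted({user for user, _, _ in records})
    columns = sorted({(item, day) for _, item, day in records})
    row_of = {user: r for r, user in enumerate(users)}
    col_of = {col: c for c, col in enumerate(columns)}
    matrix = [[0] * len(columns) for _ in users]
    for user, item, day in records:
        matrix[row_of[user]][col_of[(item, day)]] += 1
    return matrix, users, columns


# Hypothetical records, one tuple per click, using the counts quoted for FIG. 4.
records = (
    [("user1", "item2", 1)] * 75
    + [("user1", "item2", 2)] * 42
    + [("user3", "item3", 1)] * 69
    + [("user3", "item3", 2)] * 80
)

matrix, users, columns = build_click_matrix(records)
```

Keeping one column per (item, day) pair mirrors the description of columns that list per-date click counts for each item.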

The processor according to one embodiment may classify users associated with a dense block in the generated matrix as abnormal users. The density of a dense block may be set to correspond to the proportion of non-zero elements in the block and the magnitude of each element. According to this embodiment, a dense block can arise when a user clicks a specific item intensively within a short time.

For example, the processor may use the click log data to generate a daily user-item click matrix like the one shown in FIG. 4 and then find dense blocks in the generated matrix. The processor may then measure the association between each user and the dense blocks.

SVD (singular value decomposition) may be used when the processor searches for dense blocks. (SVD is a commonly used method for finding dense blocks in a matrix; because it is a known technique, the details of applying it are omitted.) For example, SVD may be applied to the matrix generated from the click log data. If 50 dense blocks are detected, the processor can derive 50 singular values and the singular vectors corresponding to each singular value.

The processor may set the absolute values of the entries of each derived singular vector as the ES (eigenscore). The ES can be used as an indicator of how strongly a user is associated with a dense block. For example, a larger ES can mean a stronger association between the user and the dense block. The processor may normalize the ES to a value between 0 and 1.
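
One way to read the ES computation described above is sketched below with NumPy's `numpy.linalg.svd`. The source does not say how many singular vectors are kept or how the normalization is done, so the parameter `k`, the global min-max rescaling, the function name `eigenscores`, and the sample matrix are all assumptions.

```python
import numpy as np


def eigenscores(click_matrix, k=1):
    """Absolute entries of the top-k left singular vectors, rescaled to [0, 1]."""
    a = np.asarray(click_matrix, dtype=float)
    # Rows are users, so the left singular vectors hold one entry per user.
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(k, s.size)
    scores = np.abs(u[:, :k])
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        return np.zeros_like(scores)
    # Assumed normalization: min-max over all retained entries.
    return (scores - lo) / (hi - lo)


# Hypothetical matrix: user 0 forms a dense block on the first two columns,
# while the other users click sparsely.
click_matrix = [
    [75, 42, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]

es = eigenscores(click_matrix, k=1)
most_suspicious_row = int(np.argmax(es[:, 0]))
```

In this toy matrix the leading singular vector is dominated by the heavy row, so that user receives the largest score; with several dense blocks, a larger `k` would be needed to expose each of them.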

According to one embodiment, the processor may obtain a_u^ES, the singular value suspicion score for an arbitrary user u. To this end, the processor may calculate the distance between RE_average, the average of the relative ES (eigenscore) over all users, and RE_u, the average of the relative ES of user u. This can be expressed as Equation 5.

The processor according to one embodiment may calculate a final suspicion score for user u based on at least one of a_u^IAT, the click interval suspicion score of user u; a_u^DA, the daily activity suspicion score of user u; or a_u^ES, the singular value suspicion score of user u. To calculate the final suspicion score, the processor may use the extended p-norm model, which is widely used in the field of information retrieval. The extended p-norm model takes one of two forms depending on the type of query (AND or OR).
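
The two forms of the p-norm combination can be sketched as follows, using the unweighted OR and AND formulas of the extended Boolean (p-norm) retrieval model. The source does not state the value of p, whether weights are used, or which form the embodiment prefers, so `p=2.0`, the equal weighting, and the sample scores are assumptions.

```python
def p_norm_or(scores, p=2.0):
    """OR-query form: large when any single score is large."""
    if not scores:
        raise ValueError("scores must not be empty")
    return (sum(s ** p for s in scores) / len(scores)) ** (1.0 / p)


def p_norm_and(scores, p=2.0):
    """AND-query form: large only when every score is large."""
    if not scores:
        raise ValueError("scores must not be empty")
    return 1.0 - (sum((1.0 - s) ** p for s in scores) / len(scores)) ** (1.0 / p)


# Hypothetical normalized scores for one user: [a_u^IAT, a_u^DA, a_u^ES].
user_scores = [0.9, 0.2, 0.1]

final_or = p_norm_or(user_scores)
final_and = p_norm_and(user_scores)
```

With p = 1 both forms reduce to the arithmetic mean of the scores; as p grows, the OR form approaches the maximum score and the AND form approaches the minimum, which is what separates "suspicious on any indicator" from "suspicious on all indicators".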

FIG. 5 is a flowchart illustrating a method of detecting an abnormal user using click log data for items in a site according to an embodiment of the present invention.

According to an embodiment of the present invention, a method performed by an apparatus for detecting an abnormal user through click log data for items in a site may include the following steps.

In step 500, the processor may collect the click log data of every user of the site. For example, users who access the site can click various items included in the site. The items included in the site may be links or content provided in the form of images or text. When a user clicks an item, the processor may store identification information for the user who performed the click, together with the corresponding click log data, in the database.

The processor may edit the click log data stored in the database during a preset collection period. For example, if the number of clicks that a first user performed within the site during the collection period is below a threshold, the click log data for the first user may be excluded from the sample, and the processor may delete the first user's click log data from the database.
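
The filtering step described above can be sketched as follows. The in-memory list of `(user_id, item_id, timestamp)` tuples stands in for the database, and the function name `drop_inactive_users` and the threshold value are hypothetical; the source does not give a concrete threshold.

```python
from collections import Counter


def drop_inactive_users(records, min_clicks):
    """Keep only the records of users with at least min_clicks clicks in total."""
    records = list(records)
    clicks_per_user = Counter(user for user, _, _ in records)
    return [rec for rec in records if clicks_per_user[rec[0]] >= min_clicks]


# Hypothetical records collected during one collection period.
records = [
    ("u1", "itemA", 1000),
    ("u1", "itemA", 1005),
    ("u1", "itemB", 1010),
    ("u2", "itemA", 2000),
]

kept = drop_inactive_users(records, min_clicks=2)
```

Here the single click of `u2` falls below the assumed threshold of 2, so only the three records of `u1` remain in the sample.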

In steps 510 to 530, the processor may extract at least one of the IAT (inter-arrival time), DA (diurnal activity), or ES (eigenscore) for the users of the site based on the click log data stored in the database.

For example, in step 510, the processor may extract the IAT (inter-arrival time) for each site user based on the click log data stored in the database. The per-user IAT that can be extracted from the stored click log data is, for example, the time that elapses between a first user clicking a first item and the same user clicking the first item again. The IAT is one of the indicators that can be used to distinguish general users from abnormal users.
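
The IAT extraction in step 510 can be sketched as follows, taking the example in the text literally: the gap between a user's consecutive clicks on the same item. The `(user_id, item_id, timestamp)` record layout, the function name `inter_arrival_times`, and the sample data are assumptions; a variant that ignores the item and measures gaps between any two consecutive clicks of a user would only change the grouping key.

```python
from collections import defaultdict
from datetime import datetime


def inter_arrival_times(records):
    """Map each user to the gaps, in seconds, between consecutive clicks on the same item."""
    stamps_by_key = defaultdict(list)
    for user, item, ts in records:
        stamps_by_key[(user, item)].append(ts)
    gaps = defaultdict(list)
    for (user, _item), stamps in stamps_by_key.items():
        stamps.sort()
        # Pair each click with the next click by the same user on the same item.
        for earlier, later in zip(stamps, stamps[1:]):
            gaps[user].append((later - earlier).total_seconds())
    return dict(gaps)


# Hypothetical click log.
records = [
    ("u1", "itemA", datetime(2016, 11, 1, 10, 0, 0)),
    ("u1", "itemA", datetime(2016, 11, 1, 10, 0, 5)),
    ("u1", "itemA", datetime(2016, 11, 1, 10, 0, 10)),
    ("u1", "itemB", datetime(2016, 11, 1, 10, 1, 0)),
    ("u2", "itemA", datetime(2016, 11, 1, 9, 0, 0)),
    ("u2", "itemA", datetime(2016, 11, 1, 12, 30, 0)),
]

iat = inter_arrival_times(records)
```

A user who clicks the same item every few seconds produces many short gaps, while an ordinary user tends to produce fewer and longer ones.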

In step 520, the processor may extract the DA (diurnal activity) for each site user based on the click log data stored in the database. The per-user DA that can be extracted from the stored click log data is a value representing each user's activity over the course of a day, and it is one of the indicators that can be used to distinguish general users from abnormal users.

In step 530, the processor may extract the ES (eigenscore) for each site user based on the click log data stored in the database. The per-user ES that can be extracted from the stored click log data is a value derived from each user's total daily clicks on each item in the site. For example, a high ES for a first user can mean that the first user has a high number of clicks on a single item and, accordingly, that the first user is likely to be an abnormal user. The per-user ES is one of the indicators that can be used to distinguish general users from abnormal users.

In step 540, the processor may calculate a suspicion score for each site user using at least one of the extracted IAT, DA, or ES. More specifically, the processor may calculate the final suspicion score for user u using at least one of a_u^IAT, the click interval suspicion score calculated from the IAT of user u; a_u^DA, the daily activity suspicion score calculated from the DA of user u; or a_u^ES, the singular value suspicion score calculated from the ES of user u.

In step 550, the processor may detect abnormal users among the site users based on the calculated per-user suspicion scores. For example, the suspicion scores calculated in step 540 may take values in the range [0, 1], and the closer the calculated value is to 1, the more likely the user is an abnormal user. A system administrator who manages the apparatus for detecting abnormal users can define the classification criterion for abnormal users by setting a reference value. For example, if the reference value set by the system administrator is 0.7, users who obtain a suspicion score of 0.7 or higher may be classified as abnormal users.
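
The classification in step 550 amounts to a threshold test, sketched below. The reference value 0.7 comes from the example in the text; the function name `detect_abnormal_users`, the dictionary layout, and the sample scores are hypothetical.

```python
def detect_abnormal_users(final_scores, reference_value=0.7):
    """Return the users whose final suspicion score is at or above the reference value."""
    return sorted(user for user, score in final_scores.items() if score >= reference_value)


# Hypothetical final suspicion scores in [0, 1].
final_scores = {"u1": 0.92, "u2": 0.35, "u3": 0.70}

abnormal = detect_abnormal_users(final_scores)
```

A score exactly equal to the reference value counts as abnormal, matching the "0.7 or higher" wording of the example.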

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but those of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or command the processing device independently or collectively. The software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may also be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to limited embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A method for detecting an abnormal user through click log data on items in a site, the method comprising:
collecting click log data for each site user;
extracting an IAT (inter-arrival time) for each site user using the collected click log data;
extracting a DA (diurnal activity) for each site user using the collected click log data;
extracting an ES (eigenscore) for each site user using the collected click log data;
calculating a suspicion score for each site user using at least one of the extracted IAT, DA, or ES; and
detecting an abnormal user among the site users based on the calculated per-user suspicion scores.

The method according to claim 1,
wherein the collecting of the click log data comprises:
storing, for a preset time, click log data for each user connected to the site; and
removing, from the stored click log data, the click log data of users whose total number of clicks on items in the site is less than a threshold.

The method according to claim 1,
wherein the extracting of the IAT (inter-arrival time) comprises:
setting a vector I_normal including time interval information for pairs of consecutive click logs of all users of the site;
setting a vector I_u including time interval information for pairs of consecutive click logs of an arbitrary user u of the site; and
calculating the IAT of the arbitrary user u based on the set I_normal and I_u vectors.

The method according to claim 1,
wherein the extracting of the DA (diurnal activity) for each site user comprises:
setting a vector D_normal including information on the number of clicks per hour of day for all users of the site;
setting a vector D_u including the number of clicks per hour of day for an arbitrary user u of the site; and
calculating the per-user DA based on the set D_normal and D_u vectors.

The method according to claim 1,
wherein the extracting of the ES (eigenscore) comprises:
constructing a per-user user-item matrix using the click log data of each site user;
searching for a dense block in the constructed matrix through SVD (singular value decomposition); and
extracting an ES for each site user based on the dense block found.

An apparatus for detecting an abnormal user through click log data on items in a site, the apparatus comprising:
a database for storing the click log data; and
a processor,
wherein the processor:
collects click log data for each site user,
extracts an IAT (inter-arrival time) for each site user using the collected click log data,
extracts a DA (diurnal activity) for each site user using the collected click log data,
extracts an ES (eigenscore) for each site user using the collected click log data,
calculates a suspicion score for each site user using at least one of the extracted IAT, DA, or ES, and
detects an abnormal user among the site users based on the calculated per-user suspicion scores.

The apparatus according to claim 6,
wherein the processor, in collecting the click log data:
stores, for a preset time, click log data for each user connected to the site, and
removes, from the stored click log data, the click log data of users whose total number of clicks on items in the site is less than a threshold.

The apparatus according to claim 6,
wherein the processor, in extracting the IAT (inter-arrival time) for each site user:
sets a vector I_normal including time interval information for pairs of consecutive click logs of all users of the site,
sets a vector I_u including time interval information for pairs of consecutive click logs of an arbitrary user u of the site, and
calculates the IAT of the arbitrary user u based on the set I_normal and I_u vectors.

The apparatus according to claim 6,
wherein the processor, in extracting the DA (diurnal activity) for each site user:
sets a vector D_normal including information on the number of clicks per hour of day for all users of the site,
sets a vector D_u including the number of clicks per hour of day for an arbitrary user u of the site, and
calculates the per-user DA based on the set D_normal and D_u vectors.

The apparatus according to claim 6,
wherein the processor, in extracting the ES (eigenscore):
constructs a per-user user-item matrix using the click log data of each site user,
searches for a dense block in the constructed matrix through SVD (singular value decomposition), and
extracts an ES for each site user based on the dense block found.