KR101601820B1

KR101601820B1 - Method and program for similar user index

Info

Publication number: KR101601820B1
Application number: KR1020140108643A
Authority: KR
Inventors: 이해성; 권준희
Original assignee: 경기대학교 산학협력단
Priority date: 2014-08-20
Filing date: 2014-08-20
Publication date: 2016-03-14
Also published as: KR20160023937A

Abstract

A similar user index method for a recommendation system is provided according to one aspect of the present invention. The method according to an embodiment of the present invention may include: generating a social network from a recommender data set; selecting seeds using the generated social network; clustering around the selected seeds; readjusting the resulting clusters; and indexing the clusters as similar-user clusters. By applying a similar-user clustering technique based on social network analysis, the present invention can improve the accuracy and reliability of recommended items.

Description

METHOD AND PROGRAM FOR SIMILAR USER INDEX

The present invention relates to a similar user index method and program, and more particularly to a similar user index method and program based on social clustering.

With the recent explosive growth of web content, recommendation systems have been advancing in the field of electronic commerce. The purpose of a recommendation system is to recommend items or products that suit a user's tastes and to help the user select or purchase items. Personalized recommendation systems are very important in e-commerce, where choices are diverse and customer taste matters.

Large e-commerce sites such as Amazon and Netflix have successfully deployed recommendation systems that deliver automatically generated, personalized recommendations to their customers. Collaborative filtering (CF) was the first recommendation approach and has been the most successful in both research and practice. The collaborative filtering approach predicts a user's preference for a product or service by learning from the user's past history data. That is, almost all recommendation algorithms based on collaborative filtering consider a variety of data sets, such as the ratings of products that users have experienced. For example, Amazon has recorded data on more than 30 million users and several million products for use in its collaborative-filtering-based recommendation system.

Many recommendation techniques have been developed over the past decades, but most were built on small data sets and were not realistic overall. With the recent explosive growth of data sets, many recommendation systems have had difficulty handling larger data sets because of performance and scalability problems.

Therefore, there is a need to handle large data sets while producing high-quality recommendations quickly.

The background art of the present invention is disclosed in Korean Patent Laid-Open Publication No. 10-2009-0130774 (published December 24, 2009; "User recommendation method and recording medium on which a program therefor is recorded").

An object of the present invention is to provide a similar user index method and program that effectively improve the performance of the overall recommendation system by considering both its accuracy and its scalability.

Another object of the present invention is to provide a similar user index method and program that take explosively growing data sizes into account, so that the performance of the overall recommendation system does not degrade as the data size increases.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood from the following description.

To achieve the above objects, a similar user index method in a recommendation system is provided according to one aspect of the present invention.

The similar user index method in a recommendation system according to an embodiment of the present invention may include: generating a social network from a recommender data set; selecting seeds using the generated social network; clustering around the selected seeds; readjusting the resulting clusters; and indexing the clusters as similar-user clusters.

In addition, according to another aspect of the present invention, a computer program that executes the similar user index method in a recommendation system is provided.

The computer program that executes the similar user index method according to an embodiment of the present invention may include: generating a social network from a recommender data set; selecting seeds using the generated social network; clustering around the selected seeds; readjusting the resulting clusters; and indexing the clusters as similar-user clusters.

The present invention is a new indexing technique that addresses the shortcomings of existing recommendation algorithms in order to improve the performance of large-scale recommendation systems, which are increasingly needed in the era of large-scale data.

The present invention applies social network analysis techniques to improve the accuracy and reliability of recommended items.

By applying a similar-user clustering technique, the present invention can improve not only the accuracy of clustering but also the accuracy of recommended items.

By using a similar-user cluster index, the present invention greatly reduces the computation time needed to extract similar users in a recommendation technique, and thereby improves the performance of the overall recommendation system more effectively.

Because the index is built on similar-user clustering, the present invention can improve the performance of the overall recommendation system without being greatly affected by the explosively growing volume of data.

FIGS. 1 and 2 are diagrams for explaining a similar user index method according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining a seed extraction method according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining a clustering method according to an embodiment of the present invention.
FIGS. 5 and 6 are diagrams for explaining a cluster readjustment method according to an embodiment of the present invention.
FIG. 7 is a diagram for explaining a similar user index method according to an embodiment of the present invention.
FIGS. 8 and 9 are diagrams for explaining the effects of the similar user index method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. In addition, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise.

Specific details for carrying out the present invention are described below with reference to the accompanying drawings.

FIGS. 1 and 2 are diagrams for explaining a similar user index method according to an embodiment of the present invention.

The similar user index method according to the present invention indexes similar-user clusters in support of recommending items to users with similar tastes in a recommendation system. The method can therefore be used on a computer that performs the computation or on a server that provides the service, and it can clearly be implemented in the form of a program.

Referring to FIG. 1, the similar user index method includes generating a social network from the recommender's data set (S110), selecting seeds using the social network (S120), clustering around the seeds (S130), readjusting the clusters (S140), and indexing the similar-user clusters (S150).

In step S110, a social network is generated from the recommender data set.

Social network theory can be applied to systems that recommend items to people, and it distinguishes a first mode from a second mode. A mode here refers to a distinct set of entities having the same contribution. From the viewpoint of a recommendation system, the first mode can be regarded as the users and the second mode as the items. In other words, people's rating patterns on items in a recommendation system give rise to an explicit social network and also affect the connectivity of that network.

The recommendation data set consists of a user set U = {u1, u2, u3, …, un}, item metadata I = {i1, i2, …, ik}, and an item context set S = {s1, s2, …, sl}. The context set may be composed of, for example, the number of times users have experienced items in common, the number of relationships with other users, or the rating values given to items, but is not limited to these.

Referring to FIG. 2, users U1 and U2 have both experienced items I1 and I2. Accordingly, a social network such as the one shown on the right side of FIG. 2 is generated from the recommender data set.
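As a rough illustration of step S110, the sketch below builds a weighted user-user network from a mapping of users to the items they have experienced. The function name build_social_network, the dictionary layout, and the sample data are assumptions made for this example and do not come from the patent; the edge weight is taken to be the number of commonly experienced items, as the description suggests.

```python
from collections import defaultdict
from itertools import combinations


def build_social_network(experiences):
    """Return {(user_a, user_b): weight} from {user: set_of_items}.

    The weight of an edge is the number of items both users experienced.
    """
    # Invert the mapping: which users experienced each item?
    users_by_item = defaultdict(set)
    for user, items in experiences.items():
        for item in items:
            users_by_item[item].add(user)

    # Every pair of users sharing an item gains one unit of edge weight.
    weights = defaultdict(int)
    for users in users_by_item.values():
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical sample data in the spirit of the U1/U2 example.
experiences = {
    "U1": {"I1", "I2"},
    "U2": {"I1", "I2"},
    "U3": {"I3"},
}
network = build_social_network(experiences)
print(network)
```

With this sample data, U1 and U2 share items I1 and I2 and are linked by an edge of weight 2, while U3 shares no item with anyone and stays unconnected.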

In step S120, seeds are selected using the social network.

Here, a seed is the node located at the center of a cluster. Most existing clustering techniques select the initial seeds at random, which degrades clustering accuracy and makes them unsuitable for similar-user clustering.

To build the seeds from more reliable user nodes, influential nodes are extracted from the social network generated in the first step of the present invention. A node is judged to be influential according to its number of relationships with other user nodes and its number of common experiences.

The social network depicted on the right side of FIG. 2 is used to select the seed of each cluster. In the social network, the weight of a relationship between users is the number of items the users have experienced in common. The more items a pair of users has experienced in common, the more similar the two users are likely to be. It is therefore assumed that influential user nodes have more relationships with other users, so an influential node can be considered as the seed of a similar-user cluster, because it is more likely to have a relationship with any given user node. The seed extraction step is described in more detail with reference to FIG. 3.

In step S130, clustering is performed around the seeds.

Clustering is a data mining technique for discovering interesting patterns in a given database. Its main idea is to divide n given data points in an m-dimensional space into k clusters so that the data assigned to one cluster are more similar to each other than to data in other clusters. In a recommendation system, a clustering method works by identifying groups of users with similar preferences. If the clustering is good, the recommendation system can perform very well, because the amount of data that must be analyzed becomes much smaller.

The clustering method according to the present invention calculates the distance between the seed of each cluster and each user node, and assigns each user node to the cluster whose seed is at the smallest distance. This is described in more detail with reference to FIG. 4.

In step S140, the clusters that have been formed are readjusted.

The cluster readjustment step relocates the users belonging to the clusters formed in the clustering step, in order to raise the accuracy of similar-user clustering. It uses a cluster readjustment algorithm based on Q, a measure for evaluating the accuracy of network-based clustering. This is described in more detail with reference to FIGS. 5 and 6.

In step S150, the similar-user clusters are indexed.

The similar-user cluster index step not only raises the accuracy of recommended items but also improves the performance of the overall recommendation system by effectively removing data that the recommendation algorithm does not need. This is described in more detail with reference to FIG. 7.

FIG. 3 is a diagram for explaining a seed extraction method according to an embodiment of the present invention.

Referring to FIG. 3, in step S310 the top N nodes by centrality are extracted, where N is a natural number equal to the preset number of seeds. Here, centrality may be the number of relationships a node has with its neighboring nodes.

For example, in FIG. 2 the high-centrality nodes U2 = 3, U3 = 4, U1 = 2, and U7 = 2 are extracted first.

In step S320, the extracted nodes that have the most relationships with other nodes are selected as seed candidates.

For example, in FIG. 2, U2 (3) and U3 (4) are selected.

In step S330, for each seed candidate node, the centralities of its connected neighbor nodes (the number of relationships each neighbor has) are summed, and seeds are extracted in descending order of this sum.

For example, in FIG. 2, U2 = 2 + 1 + 4 = 7 and U3 = 2 + 2 + 3 + 1 = 8, so U3 is extracted as a seed.

In step S340, seeds are arbitrarily extracted from the seed candidate nodes, up to the preset number of clusters.
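The seed extraction steps S310 to S340 can be sketched as follows. This is only a minimal illustration under stated assumptions: the function select_seeds and its parameters top_n, n_candidates, and k are hypothetical names, centrality is taken to be the plain node degree, and the edge list is invented so that the degrees match the numbers quoted in the text (it is not read from FIG. 2). Where the text extracts seeds arbitrarily in S340, the sketch simply takes the highest-scoring candidates so that the result is repeatable.

```python
from collections import defaultdict


def select_seeds(edges, top_n, n_candidates, k):
    """Pick k seeds from an undirected edge list, loosely following S310-S340."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Centrality here is the plain degree: the number of relations of a node.
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}

    # S310: keep the top_n nodes by centrality (ties broken by name).
    top = sorted(degree, key=lambda n: (-degree[n], n))[:top_n]
    # S320: the best-connected of those become seed candidates.
    candidates = top[:n_candidates]
    # S330: score each candidate by the summed centrality of its neighbours.
    score = {c: sum(degree[n] for n in adjacency[c]) for c in candidates}
    ranked = sorted(candidates, key=lambda c: (-score[c], c))
    # S340: the text picks seeds arbitrarily; this sketch takes the best k.
    return ranked[:k], score


# Hypothetical edge list, chosen only so the degrees match the quoted numbers.
edges = [("U1", "U2"), ("U1", "U3"), ("U2", "U3"), ("U2", "U4"),
         ("U3", "U5"), ("U3", "U7"), ("U6", "U7")]
seeds, score = select_seeds(edges, top_n=4, n_candidates=2, k=1)
print(seeds, score)
```

On this invented graph the candidates are U3 and U2, their neighbor-centrality sums are 8 and 7, and U3 is returned as the seed, which agrees with the arithmetic in the example above.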

FIG. 4 is a diagram for explaining a clustering method according to an embodiment of the present invention.

Referring to FIG. 4, in step S410 a cluster is allocated to each of the extracted seeds.

In step S420, each cluster is assigned an initial center value and an included-node count of 1. Here, the initial center value is the position value of the extracted seed.

In step S430, each node other than the seeds is assigned to the cluster whose seed is at the shortest distance, using Equation 1 below, which is based on the Euclidean distance. Here, the distance is regarded as the similarity between the user node and the seed.

[Equation 1]

Here, TN is the total number of items experienced by the user node or the seed, and MN is the number of items that the pair of nodes has experienced in common.
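The assignment loop of steps S410 to S430 might look like the sketch below. Equation 1 itself is not reproduced in this text, so the distance used here is explicitly a stand-in: a Jaccard-style distance 1 - MN/TN, with TN interpreted as the number of distinct items experienced by either node and MN as the number experienced by both. That interpretation, the function names, and the sample data are all assumptions; a different distance can be passed in through the distance parameter.

```python
def stand_in_distance(items_a, items_b):
    """Stand-in for Equation 1 (NOT the patent's formula): 1 - MN / TN."""
    tn = len(items_a | items_b)   # assumed meaning of TN: items seen by either
    mn = len(items_a & items_b)   # MN: items seen by both
    return 1.0 if tn == 0 else 1.0 - mn / tn


def assign_to_seeds(experiences, seeds, distance=stand_in_distance):
    """Assign every non-seed user to the cluster of its nearest seed."""
    # S410/S420: one cluster per seed, each starting with a node count of 1.
    clusters = {seed: [seed] for seed in seeds}
    # S430: attach each remaining node to the seed at the shortest distance.
    for user, items in experiences.items():
        if user in clusters:
            continue
        nearest = min(seeds, key=lambda s: distance(items, experiences[s]))
        clusters[nearest].append(user)
    return clusters


# Hypothetical users and the items they experienced.
experiences = {
    "A": {1, 2, 3},
    "B": {7, 8},
    "C": {1, 2},
    "D": {8, 9},
}
clusters = assign_to_seeds(experiences, seeds=["A", "B"])
print(clusters)
```

In this sample, C shares two items with seed A and none with seed B, so it joins A's cluster, and D joins B's cluster for the same reason.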

FIGS. 5 and 6 are diagrams for explaining a cluster readjustment method according to an embodiment of the present invention.

In the cluster readjustment step, the accuracy measure Q is applied to relocate the users belonging to the clusters that have been formed, in order to raise the accuracy of similar-user clustering. The closer Q is to 1, the higher the accuracy, and the clustering is readjusted so that Q reaches its maximum value.

First, a first accuracy evaluation value Q is calculated for the clusters.

Referring to FIG. 5, V1 and V2 are initially assigned to the first cluster, and V3, V4, and V5 are assigned to the second cluster.

The Q value is then calculated on this basis. In the formula for Q, TR(e) denotes the fraction of edges that connect user nodes within the same cluster, and a good clustering has a high value of it. a_k² is the fraction of edges expected within cluster k if the edges of the social network were distributed at random. ∥e∥ is the sum of the elements of the matrix e.

Next, two arbitrary nodes belonging to different clusters are extracted and swapped between those clusters, and a second accuracy evaluation value Q is calculated.

Referring to FIG. 6, the second Q value is calculated after V3 has been reassigned to the first cluster.

Finally, the first and second accuracy evaluation values are compared, and if the second value is larger, the extracted nodes are swapped between the clusters and the clusters are readjusted accordingly.

For example, a second Q value of -1.05476 is obtained. Since the second Q value (-1.05476) is greater than the first Q value (-1.81852), the clusters are readjusted to the arrangement that produced the second Q value.

As described above, two arbitrary nodes belonging to different clusters are repeatedly extracted, their cluster IDs are swapped, and Q is recalculated, and this is repeated until Q reaches its maximum value.
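The swap-and-compare loop can be sketched as follows. The formula for Q is not reproduced in this text, so the sketch assumes the standard Newman-Girvan modularity, Q = sum over clusters k of (e_kk - a_k²), which uses the same ingredients as the terms described above but may differ from the patent's exact definition; in particular it will not reproduce the Q values quoted in the example. The function names, the exhaustive pair order (instead of arbitrary pairs), and the sample graph are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def modularity(edges, labels):
    """Newman-Girvan modularity (assumed stand-in for the patent's Q)."""
    m = len(edges)
    inside = defaultdict(float)   # e_kk: fraction of edges inside cluster k
    ends = defaultdict(float)     # a_k: fraction of edge ends in cluster k
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        if cu == cv:
            inside[cu] += 1.0 / m
        ends[cu] += 0.5 / m
        ends[cv] += 0.5 / m
    return sum(inside[k] - ends[k] ** 2 for k in ends)


def refine_by_swaps(edges, labels, max_passes=10):
    """Swap cluster IDs of node pairs and keep a swap only if Q increases."""
    labels = dict(labels)
    best = modularity(edges, labels)
    for _ in range(max_passes):
        improved = False
        for u, v in combinations(sorted(labels), 2):
            if labels[u] == labels[v]:
                continue
            labels[u], labels[v] = labels[v], labels[u]      # trial swap
            q = modularity(edges, labels)
            if q > best + 1e-12:
                best, improved = q, True                     # keep the swap
            else:
                labels[u], labels[v] = labels[v], labels[u]  # undo the swap
        if not improved:
            break
    return labels, best


# Hypothetical graph: two triangles joined by the single edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
start = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}
labels, q = refine_by_swaps(edges, start)
print(labels, q)
```

Starting from a deliberately poor split, the loop ends with each triangle in its own cluster. Because a swap preserves cluster sizes, this procedure alone cannot move a single node from one cluster to another.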

FIG. 7 is a diagram for explaining a similar user index method according to an embodiment of the present invention.

Referring to FIG. 7, the similar user index step, which is the last step of the present invention, builds the final index from the clusters formed in the similar-user clustering step.

Each cluster has its own cluster ID. The recommendation system can use these cluster IDs, and a hash function can be defined as shown in FIG. 7 to give fast access to the similar-user index. Given a user ID, the users similar to that user can be retrieved through the hash function. The retrieved set of similar users corresponds to a similar-user cluster. The similar-user index consists of a cluster ID dictionary and user ID lists. The cluster ID dictionary records each cluster ID together with that cluster's center value. The defined hash function calculates the similarity between the given user ID and the center of each similar-user cluster and outputs, for the given user, the ID of the cluster with the highest similarity.
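One possible shape for such an index is sketched below: a cluster ID dictionary that maps each cluster ID to its center value, plus a user ID list per cluster, with a lookup that returns the cluster whose center is most similar to the query. The class name SimilarUserIndex, the representation of a center as a set of items, and the Jaccard similarity are assumptions made for this example; the text does not specify how the similarity or the center value is computed.

```python
def jaccard_similarity(items_a, items_b):
    """Assumed similarity between two item sets (not taken from the patent)."""
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0


class SimilarUserIndex:
    """Cluster ID dictionary (ID -> center) plus a user ID list per cluster."""

    def __init__(self, centers, members, similarity=jaccard_similarity):
        self.centers = dict(centers)                       # cluster ID dictionary
        self.members = {cid: list(users) for cid, users in members.items()}
        self.similarity = similarity

    def cluster_of(self, profile):
        """Play the role of the hash function: profile -> best cluster ID."""
        return max(self.centers,
                   key=lambda cid: self.similarity(profile, self.centers[cid]))

    def similar_users(self, profile):
        """Return the user ID list of the most similar cluster."""
        return self.members[self.cluster_of(profile)]


# Hypothetical clusters: centers are the item sets of the seed users.
index = SimilarUserIndex(
    centers={"C1": {1, 2, 3}, "C2": {7, 8}},
    members={"C1": ["A", "C"], "C2": ["B", "D"]},
)
print(index.cluster_of({2, 3}), index.similar_users({2, 3}))
```

A lookup only compares the query against one center per cluster rather than against every user, which is the source of the time savings the description attributes to the index.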

FIGS. 8 and 9 are diagrams for explaining the effects of the similar user index method according to an embodiment of the present invention.

Referring to FIG. 8, the efficiency of the index method based on similar-user clustering was evaluated for a large-scale recommendation system.

The recommendation-system data sets, generated arbitrarily for the efficiency evaluation, are as follows.

Name    Number of user IDs    Data size (MB)
D1      3706                  1.4
D2      3580                  0.97
D3      3430                  0.85
D4      3310                  0.78
D5      2940                  0.71
D6      2610                  0.67
D7      2470                  0.58
D8      2310                  0.51
D9      2005                  0.48
D10     1850                  0.41

Using the data sets above, experiments were performed on the response time of a recommendation system using the similar user index method according to the present invention and of one using a general collaborative filtering method.

Referring to FIG. 8, the response-time experiments on the two recommendation systems showed that, with the similar user index method according to the present invention, the response time did not increase by a relatively large amount as the data size grew.

Referring to FIG. 9, the clustering accuracy of K-means was compared with the clustering accuracy of the present invention. With the clustering method according to the present invention, the accuracy value Q falls relatively less as the data size grows, which confirms that it is more accurate than K-means clustering.

The user authentication service method according to various embodiments of the present invention can be implemented in the form of program instructions that can be executed by computer means such as various servers.

In addition, programs and applications that execute the user authentication service method according to the present invention may be installed on computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the computer software field. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The medium may also be a transmission medium, such as an optical or metal line or a waveguide, that includes a carrier wave carrying signals that specify program instructions, data structures, and the like. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The present invention has been described above with a focus on its embodiments. Those of ordinary skill in the art to which the invention pertains will understand that it may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered from an illustrative rather than a restrictive point of view. The scope of the present invention is set out in the claims rather than in the foregoing description, and all differences within an equivalent scope should be construed as being included in the present invention.

Claims

(Deleted)

A similar user index method in a recommendation system, the method comprising:
generating a social network from a recommender data set;
selecting seeds using the generated social network;
clustering around the selected seeds;
readjusting the resulting clusters; and
indexing the clusters as similar-user clusters,
wherein the clustering around the selected seeds comprises:
assigning an initial center value and a node to each cluster; and
calculating, for each node other than the seeds, a distance using Equation 2 below, which is based on the Euclidean distance measure, and assigning the node to the cluster at the shortest distance,
[Equation 2]

where TN is the total number of items experienced by the user node or the seed, and MN is the number of items that the pair of nodes has experienced in common.

A similar user index method in a recommendation system, the method comprising:
generating a social network from a recommender data set;
selecting seeds using the generated social network;
clustering around the selected seeds;
readjusting the resulting clusters; and
indexing the clusters as similar-user clusters,
wherein the readjusting of the clusters comprises:
calculating a first accuracy evaluation value of each cluster;
extracting two arbitrary nodes belonging to different clusters, swapping them between the different clusters, and calculating a second accuracy evaluation value of the different clusters; and
comparing the first accuracy evaluation value with the second accuracy evaluation value and, when the second accuracy evaluation value is larger, readjusting the clusters with the extracted nodes swapped between the different clusters.

A similar user index method in a recommendation system, the method comprising:
generating a social network from a recommender data set;
selecting seeds using the generated social network;
clustering around the selected seeds;
readjusting the resulting clusters; and
indexing the clusters as similar-user clusters,
wherein the indexing of the clusters as similar-user clusters comprises
calculating, using a hash function, a similarity between a given user ID and the center of each similar-user cluster, and outputting a cluster ID according to the calculated similarity.

(Deleted)

A computer program stored on a medium for performing similar user indexing in a recommendation system, the program executing:
generating a social network from a recommender data set;
selecting seeds using the generated social network;
clustering around the selected seeds;
readjusting the resulting clusters; and
indexing the clusters as similar-user clusters,
wherein the clustering around the selected seeds comprises:
assigning an initial center value and a node to each cluster; and
calculating, for each node other than the seeds, a distance using the Euclidean distance measure below, and assigning the node to the cluster at the shortest distance.
[Equation 3]

A computer program stored on a medium for performing similar user indexing in a recommendation system, the program executing:
generating a social network from a recommender data set;
selecting seeds using the generated social network;
clustering around the selected seeds;
readjusting the resulting clusters; and
indexing the clusters as similar-user clusters,
wherein the readjusting of the clusters comprises:
calculating a first accuracy evaluation value of each cluster;
extracting two arbitrary nodes belonging to different clusters, swapping them between the different clusters, and calculating a second accuracy evaluation value of the different clusters; and
comparing the first accuracy evaluation value with the second accuracy evaluation value and, when the second accuracy evaluation value is larger, readjusting the clusters with the extracted nodes swapped between the different clusters.

A computer program stored on a medium for performing similar user indexing in a recommendation system, the program executing:
generating a social network from a recommender data set;
selecting seeds using the generated social network;
clustering around the selected seeds;
readjusting the resulting clusters; and
indexing the clusters as similar-user clusters,
wherein the indexing of the clusters as similar-user clusters comprises
calculating, using a hash function, a similarity between a given user ID and the center of each similar-user cluster, and outputting a cluster ID according to the calculated similarity.