KR101567684B1

KR101567684B1 - Method for selecting recommendation type in product recommending system

Info

Publication number: KR101567684B1
Application number: KR1020140037910A
Authority: KR
Inventors: 김재경; 문현실
Original assignee: 경희대학교 산학협력단
Priority date: 2014-03-31
Filing date: 2014-03-31
Publication date: 2015-11-10
Also published as: KR20150113644A

Abstract

The present invention relates to a product recommendation method based on collaborative filtering. More specifically, it relates to a method that, based on the data characteristics of a product purchase database, organizes the database into a plurality of sample networks, judges the recommendation performance for the entire product purchase data using the characteristics that are significant in each sample network, and selects the recommendation technique best suited to the product purchase database.

Description

[0001] The present invention relates to a method for selecting recommendation techniques in a product recommendation system based on collaborative filtering,

With the rapid growth of e-commerce, competition among companies has intensified, creating a need for marketing strategies that give a competitive advantage over rivals. At the same time, the excess of product information has caused product overload, in which customers can no longer select products effectively.

One of the information technologies that addresses this problem is the product recommendation system, which helps customers find products that match their preferences. A product recommendation system is a personalized information-filtering technology that provides customers with a list of recommended products so that they can easily find products they are likely to purchase. Product recommendation systems use a variety of recommendation techniques to recommend products suited to a user; among them, techniques based on collaborative filtering are known as the most successful and are widely used today. Collaborative filtering, one of the successful product recommendation techniques used in web-based online shopping malls, recommends useful products to a target customer based on the product preferences of similar customers whose purchase histories resemble that of the target customer.

However, conventional collaborative-filtering-based product recommendation systems have the following problems, depending on the data characteristics of the product purchase database.

First, when the purchase data in the product purchase database is scarce relative to the number of products, a conventional collaborative-filtering-based product recommendation system has difficulty calculating the similarity between the target customer and neighboring customers accurately. That is, it suffers from the data sparsity problem.

Second, when the target customer shows heterogeneous and unusual purchasing behavior, a conventional collaborative-filtering-based product recommendation system has difficulty measuring the similarity between the target customer and the neighboring customers in the product purchase database.

Third, when the target customer is a new customer, no purchase information exists for that customer, so a conventional collaborative-filtering-based product recommendation system has difficulty measuring the similarity with the neighboring customers in the product purchase database.

Fourth, when customers in the product purchase database are clustered around particular neighboring customers, a conventional collaborative-filtering-based product recommendation system makes recommendations based on those particular neighbors, so it has difficulty recommending other products that the target customer has not purchased but is expected to find highly interesting. That is, the recommended products have little serendipity.

A wide variety of collaborative-filtering-based recommendation techniques have been developed and are in use. As described above, conventional collaborative-filtering-based techniques can exhibit different problems depending on the kind of product purchase database, so a product recommendation system needs to be operated with a recommendation technique chosen according to the data characteristics of its product purchase database. Conventional collaborative-filtering-based product recommendation systems, however, had no technique at all for selecting an appropriate recommendation technique from the available ones according to the characteristics of the product purchase database; the recommendation technique was simply designed on the basis of the system designer's experience.

The present invention is intended to solve the above problems of conventional collaborative-filtering-based recommendation techniques. An object of the present invention is to provide a method for selecting, from among various recommendation techniques, the one suited to a product purchase database based on the data characteristics of that database.

Another object of the present invention is to provide a method that introduces network theory to generate a plurality of sample networks from the product purchase database, extracts the data characteristics that are commonly significant across the sample networks, and applies those significant characteristics to the whole network formed from the entire product purchase database, thereby selecting a recommendation technique suited to the data characteristics of the product purchase database.

To achieve this object, the method for selecting a recommendation technique in a collaborative-filtering-based product recommendation system according to the present invention comprises: generating a sample network based on the product purchase similarity between users included in a sample group generated from the product purchase database, and calculating sample characteristic coefficients that represent the characteristics of the sample network; applying one or more mutually different collaborative-filtering-based recommendation techniques to the sample network and calculating the recommendation success rate of the sample network for each recommendation technique; generating a multiple regression equation for each recommendation technique, with the sample characteristic coefficients as independent variables and the recommendation success rate as the dependent variable, and determining the significance of the sample characteristic coefficients in each equation; calculating, for the whole network consisting of all users in the product purchase database, the overall characteristic coefficients that are commonly significant across the multiple regression equations of the recommendation techniques; and applying the overall characteristic coefficients to the multiple regression equation of each recommendation technique and selecting the technique with the highest recommendation value as the recommendation technique for the product purchase database.

Here, one or more sample groups are generated, each consisting of a set number of users selected at random from the product purchase database.

Preferably, the step of calculating the sample characteristic coefficients comprises: calculating the purchase similarity between the users constituting the sample group; generating the sample network by connecting users whose purchase similarity is equal to or greater than a threshold similarity; and calculating the sample characteristic coefficients that represent the characteristics of the sample network from the connection relationships between users in the sample network.

Here, the sample characteristic coefficient is at least one of the following, each based on the connection relationships of the users constituting the sample network (S_i): a density coefficient (DEN(S_i)), which indicates how closely the users are connected; an inclusiveness coefficient (DIV(S_i)), which indicates the proportion of users who are not connected to any neighboring user; a clustering coefficient (CLU(S_i)), which indicates the degree of connection among the neighboring users connected to a user; and a centralization coefficient (CEN(S_i)), which indicates how strongly the connections to neighboring users are concentrated around a user.

The density coefficient (DEN(S_i)) is calculated by Equation (1) below,

[Equation 1]

where k_i is the total number of links in the sample network (S_i), and n_i is the total number of users constituting the sample network (S_i).
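Equation (1) itself is not reproduced in this text. As a minimal sketch, assuming it is the standard density of an undirected network (existing links divided by the n_i(n_i - 1)/2 possible links), the coefficient could be computed as follows. The function name `density` is illustrative and not taken from the patent.

```python
def density(num_users: int, num_links: int) -> float:
    """Share of possible user pairs that are actually linked.

    Assumption: standard undirected-network density, k / (n * (n - 1) / 2).
    The patent's Equation (1) is not shown in the source text.
    """
    if num_users < 2:
        return 0.0  # no pair of users exists, so no link is possible
    max_links = num_users * (num_users - 1) / 2
    return num_links / max_links


# Toy example: 4 users joined by 3 of the 6 possible links
print(density(4, 3))  # 0.5
```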

The inclusiveness coefficient (DIV(S_i)) is calculated by Equation (2) below,

[Equation 2]

where n_it is the number of users in the sample network (S_i) who are not connected to any neighboring user.
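Equation (2) is likewise missing from this text, so its exact form is unknown. The sketch below assumes the simplest reading of the definition, namely the share n_it / n_i of isolated users; the patent may instead use the complement, 1 - n_it / n_i. The adjacency-dictionary representation and the name `isolated_share` are illustrative choices.

```python
def isolated_share(adjacency: dict) -> float:
    """Share of users with no link to any other user.

    Assumption: n_it / n_i. If the patent's Equation (2) is the
    complement, use 1 - isolated_share(adjacency) instead.
    """
    n = len(adjacency)
    if n == 0:
        return 0.0
    isolated = sum(1 for neighbors in adjacency.values() if not neighbors)
    return isolated / n


# Toy network: a and b are linked, c and d are isolated
network = {"a": {"b"}, "b": {"a"}, "c": set(), "d": set()}
print(isolated_share(network))  # 0.5
```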

The clustering coefficient (CLU(S_i)) is calculated by Equation (3) below,

[Equation 3]

where n_v is the number of users who have two or more links to neighboring users, V is the set of users who have two or more links to neighboring users, and clu(p) is the individual clustering coefficient of any one user (p) in that set, calculated by Equation (4) below,

[Equation 4]

where triple_p is the number of pairs of neighboring users connected to user (p), and triangle_p is the number of those pairs in which the two neighboring users are also connected to each other.
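Equations (3) and (4) are not shown in this text. Reading the definitions above as the usual local clustering coefficient, clu(p) = triangle_p / triple_p, averaged over the n_v users in V, gives the following sketch. That reading is an assumption, and the function names are illustrative.

```python
from itertools import combinations


def local_clustering(adjacency: dict, p) -> float:
    """triangle_p / triple_p for one user p (assumed form of Equation (4))."""
    neighbors = adjacency[p]
    triples = len(neighbors) * (len(neighbors) - 1) // 2  # pairs of neighbors of p
    if triples == 0:
        return 0.0
    # A triangle is a pair of p's neighbors that are linked to each other.
    triangles = sum(1 for a, b in combinations(sorted(neighbors), 2)
                    if b in adjacency[a])
    return triangles / triples


def network_clustering(adjacency: dict) -> float:
    """Mean of clu(p) over users with at least two links (assumed Equation (3))."""
    v = [p for p, neighbors in adjacency.items() if len(neighbors) >= 2]
    if not v:
        return 0.0
    return sum(local_clustering(adjacency, p) for p in v) / len(v)


# Toy network: triangle a-b-c, with d attached only to a
network = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(local_clustering(network, "a"))  # 1 triangle out of 3 triples
print(network_clustering(network))     # (1/3 + 1 + 1) / 3
```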

The centralization coefficient (CEN(S_i)) is calculated by Equation (5) below,

[Equation 5]

where cen(u_l) is the individual centralization coefficient of a user (u_l) in the sample network (S_i), and cen(u^*) is the highest individual centralization coefficient among the users constituting the sample network (S_i),

and where the individual centralization coefficient is calculated by Equation (6) below,

[Equation 6]

where m_j takes the value 1 if user U_l is connected to the neighboring user (U_j) in the sample network, and the value 0 if user U_l is not connected to the neighboring user (U_j).
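Equations (5) and (6) are not reproduced here. The definitions resemble Freeman's degree centralization, so the sketch below assumes cen(u) = degree(u) / (n - 1) and CEN = sum(cen(u^*) - cen(u)) / (n - 2). Both normalizations are assumptions and may differ from the patent's formulas; the function name is illustrative.

```python
def degree_centralization(adjacency: dict) -> float:
    """Freeman-style degree centralization of an undirected network.

    Assumptions (not confirmed by the source text):
      cen(u) = degree(u) / (n - 1)
      CEN    = sum(cen(u*) - cen(u)) / (n - 2)
    With these, a star network scores 1 and a complete network scores 0.
    """
    n = len(adjacency)
    if n < 3:
        return 0.0
    cen = {u: len(neighbors) / (n - 1) for u, neighbors in adjacency.items()}
    top = max(cen.values())  # cen(u*), the most central user
    return sum(top - c for c in cen.values()) / (n - 2)


# Toy star network: every link goes through the hub
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(round(degree_centralization(star), 6))  # 1.0
```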

Preferably, the step of calculating the recommendation success rate of the sample network comprises: dividing the users constituting the sample network into a training data group and a test data group of set sizes; applying one or more mutually different collaborative-filtering-based recommendation techniques to the training data group and extracting recommended products for each technique; and comparing the extracted products with the products actually purchased by the users in the test data group to calculate the recommendation success rate of the sample network for each technique.

Preferably, the step of calculating the overall characteristic coefficients determines which kinds of sample characteristic coefficients are commonly significant across the multiple regression equations of the recommendation techniques, applies the overall characteristic coefficients of those kinds to the multiple regression equation of each recommendation technique, and finally selects the technique with the highest recommendation value as the recommendation technique for the product purchase database.

To achieve the object of the present invention, the apparatus for selecting a recommendation technique in a collaborative-filtering-based product recommendation system according to the present invention comprises: a network generation unit that generates one or more sample groups, each consisting of a set number of users selected at random from the product purchase database, and generates a sample network based on the product purchase similarity between the users included in each sample group; a characteristic coefficient calculation unit that calculates, for a sample network, the sample characteristic coefficients representing the characteristics of that sample network, or, for the whole network consisting of all users in the product purchase database, the overall characteristic coefficients representing the characteristics of the whole network; a recommendation success rate calculation unit that applies one or more mutually different collaborative-filtering-based recommendation techniques to the sample network and calculates the recommendation success rate of the sample network for each technique; a significance determination unit that generates a multiple regression equation for each recommendation technique, with the characteristic coefficients as independent variables and the recommendation success rate as the dependent variable, and determines the significance of the characteristic coefficients in each equation; and a recommendation technique selection unit that applies the overall characteristic coefficients that are commonly significant across the multiple regression equations to the multiple regression equation of each recommendation technique and selects the technique with the highest recommendation value as the recommendation technique for the product purchase database.

Preferably, the network generation unit comprises: a sample group generation unit that generates one or more sample groups, each consisting of a set number of users selected at random from the product purchase database; a similarity calculation unit that calculates the purchase similarity between the users constituting a sample group; and a sample network generation unit that forms the sample network by connecting users whose purchase similarity is equal to or greater than the threshold similarity.

Preferably, the recommendation success rate calculation unit comprises: a data group generation unit that divides the users constituting the sample network into a training data group and a test data group of set sizes; a recommended product extraction unit that applies one or more mutually different collaborative-filtering-based recommendation techniques to the training data group and extracts recommended products for each technique; and a success rate calculation unit that compares the extracted products with the products actually purchased by the users in the test data group and calculates the recommendation success rate of the sample network for each technique.

The method for selecting a recommendation technique in a collaborative-filtering-based product recommendation system according to the present invention selects, from among various recommendation techniques, the one suited to the product purchase database based on the data characteristics of that database, so that the products most appropriate for a user can be recommended according to the data characteristics.

In addition, the method according to the present invention introduces network theory to generate a plurality of sample networks from the product purchase database and extracts only the data characteristics that are commonly significant across the sample networks. By applying these significant characteristics to the whole network formed from the entire product purchase database, it can accurately select a recommendation technique suited to the data characteristics of the product purchase database.

FIG. 1 is a functional block diagram of an apparatus for selecting a recommendation technique in a collaborative-filtering-based product recommendation system according to the present invention.
FIG. 2 is a functional block diagram illustrating the network generation unit according to the present invention in more detail.
FIG. 3 illustrates an example of the whole group and the sample groups according to the present invention.
FIG. 4 is a functional block diagram illustrating the recommendation success rate calculation unit (140) according to the present invention in more detail.
FIG. 5 is a flowchart of a method for selecting a recommendation technique according to the data characteristics of the product purchase database in a collaborative-filtering-based product recommendation system according to the present invention.
FIG. 6 is a flowchart illustrating the step of calculating the sample characteristic coefficients according to the present invention in more detail.
FIG. 7 shows an example of a product purchase database.
FIG. 8 is a diagram for explaining the clustering coefficient.
FIG. 9 is a flowchart illustrating the step of calculating the recommendation success rate according to the present invention in more detail.

Hereinafter, the selection of a recommendation technique in a collaborative-filtering-based product recommendation system according to the present invention is described with reference to the accompanying drawings.

FIG. 1 is a functional block diagram of an apparatus for selecting a recommendation technique in a collaborative-filtering-based product recommendation system according to the present invention.

Referring to FIG. 1 in more detail, the network generation unit (110) generates one or more sample groups, each consisting of a set number of users selected at random from the product purchase database (120), and generates a sample network based on the product purchase similarity between the users included in each sample group. The product purchase database (120) stores the user information managed by the product recommendation system (for example, user name, gender, age, address, and occupation) together with data on the list of products each user has purchased through the product recommendation system to date.

The characteristic coefficient calculation unit (130) calculates, for each generated sample network, the sample characteristic coefficients that represent the characteristics of that sample network. In the present invention, the following sample characteristic coefficients are used, each based on the connection relationships of the users constituting the sample network (S_i): the density coefficient (DEN(S_i)), which indicates how closely the users are connected; the inclusiveness coefficient (DIV(S_i)), which indicates the proportion of users who are not connected to any neighboring user; the clustering coefficient (CLU(S_i)), which indicates the degree of connection among the neighboring users connected to a user; and the centralization coefficient (CEN(S_i)), which indicates how strongly the connections to neighboring users are concentrated around a user.

The recommendation success rate calculation unit (140) applies one or more mutually different collaborative-filtering-based recommendation techniques to the sample network and calculates the recommendation success rate of the sample network for each technique. To do so, it divides the users constituting the sample network into a training data group and a test data group of set sizes, applies each collaborative-filtering-based recommendation technique to the training data group, and calculates the recommendation success rate of each technique as the proportion of the extracted products that users in the test data group actually purchased.

The significance determination unit (150) generates a multiple regression equation for each recommendation technique, with the characteristic coefficients as independent variables and the recommendation success rate as the dependent variable, and determines which of the characteristic coefficients are significant in the multiple regression equation of each technique.
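The per-technique regression can be sketched as an ordinary least-squares fit, with one row of characteristic coefficients per sample network and that technique's success rate as the target. The source does not say which estimator or significance test is used, so the pure-Python OLS below, the helper names `solve` and `fit_ols`, and the use of t-statistics are assumptions. The data are synthetic and chosen only so the result is easy to check.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(features, target):
    """Return (coefficients, t_statistics); index 0 is the intercept."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    residuals = [y - sum(b * x for b, x in zip(beta, r))
                 for r, y in zip(rows, target)]
    sigma2 = sum(e * e for e in residuals) / (len(rows) - k)
    t_stats = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        var_i = sigma2 * solve(xtx, unit)[i]  # i-th diagonal entry of (X'X)^-1
        t_stats.append(beta[i] / var_i ** 0.5 if var_i > 0 else float("inf"))
    return beta, t_stats


# Synthetic data following y = 1 + 2*x1 - 3*x2 exactly (not patent data)
features = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
target = [1, 3, -2, 0, 2]
beta, t_stats = fit_ols(features, target)
print([round(b, 6) for b in beta])  # [1.0, 2.0, -3.0]
```

A coefficient would then be treated as significant when the absolute value of its t-statistic exceeds a critical value chosen by the analyst. That threshold is not given in the source.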

The recommendation technique selection unit (160) identifies the characteristic coefficients that the significance determination unit (150) found to be commonly significant across the multiple regression equations of the recommendation techniques, and has the characteristic coefficient calculation unit (130) calculate, for only those commonly significant coefficients, the overall characteristic coefficients that represent the whole network consisting of all users in the product purchase database (120). The characteristic coefficient calculation unit (130) calculates the overall characteristic coefficients, and the recommendation technique selection unit (160) applies them to the multiple regression equation of each recommendation technique and selects the technique with the highest recommendation value as the recommendation technique for the product purchase database (120).
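The selection step can be sketched as follows: keep the coefficients that are significant for every technique, plug the whole-network values into each technique's regression, and pick the technique with the highest predicted value. The data layout, the names `select_technique`, `CF1` and `CF2`, and all numbers are hypothetical. Dropping the non-common terms from each equation, rather than refitting it, is an assumption, since the source does not say which is done.

```python
def select_technique(models, significant, whole_network):
    """Pick the technique whose regression predicts the highest value.

    models:        technique -> {"intercept": float, "coef": {name: float}}
    significant:   technique -> names of coefficients significant for it
    whole_network: name -> coefficient value measured on the whole network
    """
    # Only coefficients significant for every technique are used.
    common = set.intersection(*(set(s) for s in significant.values()))
    scores = {}
    for name, model in models.items():
        scores[name] = model["intercept"] + sum(
            model["coef"][c] * whole_network[c] for c in common)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical regression results for two techniques
models = {
    "CF1": {"intercept": 0.1, "coef": {"DEN": 0.5, "CLU": 0.2}},
    "CF2": {"intercept": 0.2, "coef": {"DEN": 0.1, "CLU": 0.4}},
}
significant = {"CF1": ["DEN", "CLU"], "CF2": ["DEN"]}
best, scores = select_technique(models, significant, {"DEN": 0.4})
print(best)  # CF1
```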

FIG. 2 is a functional block diagram illustrating the network generation unit according to the present invention in more detail.

Referring to FIG. 2, the sample group generation unit (111) generates one or more sample groups, each consisting of a set number of users selected at random from the product purchase database. As shown in FIG. 3, the users constituting the product purchase database form the whole group (TG), and each sample group (SG) is generated by randomly selecting a set number of users from the whole group. A user included in one sample group is not simultaneously included in another sample group. Research confirmed that the characteristics of each sample can be calculated accurately when each sample group is set to contain at least 100 users.
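One simple way to obtain random, mutually exclusive sample groups of a fixed size is to shuffle the user list once and cut it into consecutive slices. The sketch below assumes this approach; the function name, its parameters, and the use of a seed are illustrative and not taken from the patent.

```python
import random


def make_sample_groups(user_ids, group_size=100, num_groups=None, seed=None):
    """Randomly partition users into disjoint groups of `group_size` users.

    Users left over after the last full group are not assigned to any group.
    """
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    max_groups = len(shuffled) // group_size
    if num_groups is None or num_groups > max_groups:
        num_groups = max_groups
    return [shuffled[i * group_size:(i + 1) * group_size]
            for i in range(num_groups)]


# 350 hypothetical user IDs yield three disjoint groups of 100 users each
groups = make_sample_groups(range(350), group_size=100, seed=0)
print(len(groups), [len(g) for g in groups])  # 3 [100, 100, 100]
```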

The similarity calculation unit (113) calculates the purchase similarity between the users constituting a sample group according to whether the products they actually purchased, as recorded in the product purchase database, are the same. The sample network generation unit (115) connects users whose purchase similarity is equal to or greater than the threshold similarity with an edge, forming one sample network per sample group.
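The source does not name the similarity measure, only that it depends on whether purchased products coincide. The sketch below assumes Jaccard similarity over each user's set of purchased products and links every pair of users at or above the threshold. The function names and the toy purchase data are illustrative.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared products divided by all products bought by either user."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_sample_network(purchases: dict, threshold: float) -> dict:
    """Return an adjacency dict linking users whose similarity >= threshold.

    Assumption: Jaccard similarity stands in for the patent's unspecified
    purchase-similarity measure.
    """
    adjacency = {user: set() for user in purchases}
    for u, v in combinations(purchases, 2):
        if jaccard(purchases[u], purchases[v]) >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


# Toy purchase histories: u1 and u2 share two of three products
purchases = {"u1": {"A", "B", "C"}, "u2": {"A", "B"}, "u3": {"D"}}
print(build_sample_network(purchases, threshold=0.5))
```

In this toy input only u1 and u2 end up connected, and u3 remains an isolated user.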

FIG. 4 is a functional block diagram illustrating the recommendation success rate calculation unit (140) according to the present invention in more detail.

Referring to FIG. 4 in more detail, the data group generation unit (141) divides the users constituting the sample network into a training data group and a test data group of set sizes. For example, when each sample network consists of 100 users, 80% of the users are drawn at random from the sample network to form the training data group, and the remaining 20% form the test data group.

The recommended product extraction unit (143) applies one or more mutually different collaborative-filtering-based recommendation techniques to the training data group and extracts recommended products for each technique. The success rate calculation unit (145) compares the products extracted by each technique with the products actually purchased by the users in the test data group and calculates the recommendation success rate of the sample network for each technique. For example, when collaborative-filtering-based recommendation techniques 1, 2, and 3 exist, each of the three techniques is applied to the sample network to extract the products to recommend from the training data group, and the recommendation success rate is calculated as the proportion of recommended products that match products actually purchased by users in the test data group.
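The success rate described above can be sketched as the share of recommended products that at least one test-group user actually bought. The recommendation techniques themselves are outside this sketch, so the recommended list is passed in directly. The function name and the toy data are illustrative, and counting a product as a hit if any test user bought it is an assumption about how matches are tallied.

```python
def recommendation_success_rate(recommended, test_purchases: dict) -> float:
    """Share of distinct recommended products bought by at least one test user."""
    distinct = set(recommended)
    if not distinct:
        return 0.0
    bought = set().union(*test_purchases.values())
    return len(distinct & bought) / len(distinct)


# Toy data: products A and C were recommended and also bought by test users
recommended = ["A", "B", "C", "D"]
test_purchases = {"u1": {"A", "X"}, "u2": {"C"}}
print(recommendation_success_rate(recommended, test_purchases))  # 0.5
```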

FIG. 5 is a flowchart illustrating a method of selecting a recommendation technique according to the data characteristics of a product purchase database in a collaborative-filtering-based product recommendation system according to the present invention.

Referring to FIG. 5, sample groups are generated from the product purchase database (S110). A sample network is generated based on the product purchase similarity between the users included in each sample group, and sample characteristic coefficients indicating the characteristics of the sample network are calculated (S120). In the present invention, a density coefficient (DEN(S_i)), a coverage coefficient (DIV(S_i)), a clustering coefficient (CLU(S_i)), and a concentration coefficient (CEN(S_i)) may be used as the sample characteristic coefficients of a sample network (S_i).

All four coefficients are derived from the connection relationships of the users constituting the sample network (S_i). The density coefficient indicates how closely the users are connected. The coverage coefficient indicates the proportion of users who are not connected to any neighboring user. The clustering coefficient indicates the degree of connection among the neighboring users connected to a user. The concentration coefficient indicates how strongly connections to neighboring users are concentrated on a particular user.

[Equation 1]

Here k_i is the total number of connections in the sample network (S_i), and n_i is the total number of users constituting the sample network (S_i). The density coefficient addresses the data sparsity problem of the sample purchase database: an increase in the density coefficient means a decrease in data sparsity.
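Equation 1 is not reproduced in this text. As a hedged sketch, the code below assumes the usual definition of network density for an undirected network, namely the number of connections k_i divided by the number of possible user pairs n_i(n_i - 1)/2. That form, and the function name `density_coefficient`, are assumptions for illustration only.

```python
def density_coefficient(num_users, edges):
    """Share of possible user pairs that are actually connected (assumed form)."""
    possible_pairs = num_users * (num_users - 1) / 2
    if possible_pairs == 0:
        return 0.0                     # a network of 0 or 1 users has no pairs
    return len(edges) / possible_pairs


# Four users with four connections: 4 of the 6 possible pairs are linked.
edges = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")}
print(density_coefficient(4, edges))
```

Under this reading, a sparser purchase database produces fewer connections and therefore a lower value, which matches the statement that a higher density coefficient means lower sparsity.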

[Equation 2]

Here n_it is the number of users in the sample network (S_i) who are not connected to any neighboring user. The coverage coefficient addresses the problem that arises when recommending products to users with heterogeneous, unusual purchasing behavior in the sample purchase database: an increase in the coverage coefficient means a decrease in the number of customers with unusual purchasing behavior.
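A minimal sketch of counting the isolated users n_it follows. Equation 2 itself is not reproduced in this text, so the sketch stops at the isolated-user ratio n_it / n_i and its complement. Which of the two (or what other normalization) Equation 2 uses is an assumption left open here, and the function names are illustrative.

```python
def isolated_users(users, edges):
    """Users that do not appear in any connection."""
    connected = {user for edge in edges for user in edge}
    return [user for user in users if user not in connected]


def isolated_ratio(users, edges):
    """n_it / n_i: fraction of users with no neighboring user."""
    users = list(users)
    if not users:
        return 0.0
    return len(isolated_users(users, edges)) / len(users)


users = ["A", "B", "C", "D", "E"]
edges = {("A", "B"), ("B", "C")}
n_it = len(isolated_users(users, edges))   # D and E have no connections
ratio = isolated_ratio(users, edges)
print(n_it, ratio, 1 - ratio)
```

The complement `1 - ratio` grows as isolated users disappear, which is consistent with the statement that a higher coverage coefficient means fewer customers with unusual purchasing behavior.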

[Equation 3]

[Equation 4]

Here triple_p is the number of pairs of neighboring users connected to user p, and triangle_p is the number of those pairs in which the two neighboring users are also connected to each other. For example, as shown in FIG. 8, if user A is connected to neighboring users B, C, and D, and neighboring users B and C are connected to each other, then user A is connected to the neighbor pairs (B, C), (B, D), and (C, D), so triple_p for user A is 3. Only the pair (B, C) is mutually connected, so triangle_p for user A is 1. An increase in the clustering coefficient means that serendipitous product recommendations become harder in the sample purchase database.
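The FIG. 8 example can be checked with a short sketch that counts neighbor pairs (triples) and mutually connected neighbor pairs (triangles) for one user. The adjacency-dictionary representation and the function name are assumptions for illustration. Equations 3 and 4 are not reproduced in this text, so the sketch reports only the two counts and does not combine them.

```python
from itertools import combinations


def triples_and_triangles(user, adjacency):
    """Return (triple_p, triangle_p) for one user of an undirected network."""
    neighbors = sorted(adjacency[user])
    pairs = list(combinations(neighbors, 2))                 # every pair of neighbors
    triangles = sum(1 for a, b in pairs if b in adjacency[a])  # pairs that are linked
    return len(pairs), triangles


# FIG. 8 example: A is linked to B, C and D; B and C are linked to each other.
adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
triple_a, triangle_a = triples_and_triangles("A", adjacency)
print(triple_a, triangle_a)  # 3 1
```

A user with fewer than two neighbors, such as D here, has no neighbor pairs at all, which is why the claims restrict the set V to users with two or more connections.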

[Equation 5]

[Equation 6]

Here m_j is 1 if user U_l is connected to neighboring user U_j in the sample network and 0 if not. An increase in the concentration coefficient means that recommendations depend on specific users, so a diverse range of products cannot be recommended and the recommendation range narrows.
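Equations 5 and 6 are not reproduced in this text. The sketch below is therefore only one plausible reading: it assumes the individual value for a user is the sum of the indicators m_j (that is, the user's number of connections) and that the network-level value is built from the gaps between the most connected user and every other user. Any normalization is omitted, and all names are illustrative.

```python
def individual_concentration(user, users, edges):
    """Sum of m_j over all other users: m_j is 1 if connected to the user, else 0."""
    links = {frozenset(edge) for edge in edges}
    return sum(
        1 for other in users
        if other != user and frozenset((user, other)) in links
    )


users = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]

scores = {u: individual_concentration(u, users, edges) for u in users}
most_central = max(scores, key=scores.get)       # user with the highest individual value
gap = sum(scores[most_central] - s for s in scores.values())  # unnormalized (assumed)
print(scores, most_central, gap)
```

A large gap indicates that connections are concentrated on one user, which is the situation the text associates with a narrower recommendation range.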

For each sample network (S_i), one or more mutually different collaborative-filtering-based recommendation techniques (R_j) are applied, and the recommendation success rate (F(S_i, R_j)) of the sample network is calculated for each technique (S130). A multiple regression equation is then generated for each recommendation technique, with the sample characteristic coefficients as independent variables and the recommendation success rate as the dependent variable, and the significance of each sample characteristic coefficient in each equation is determined (S140). Multiple regression analysis is regression analysis with several independent variables. Even with the same number of variables, the model can differ depending on which variables are included, and only the statistically significant variables can be selected from among them.

Equation (7) below shows an example of the multiple regression equation used in the present invention.

[Equation 7]

Here R_j denotes the j-th of the collaborative-filtering-based recommendation techniques.

In the present invention, with the sample characteristic coefficients as independent variables and the recommendation success rate as the dependent variable, the standard coefficients (β₁, β₂, β₃, β₄) of the multiple regression equation are calculated using a multiple regression analysis tool such as SAS (Statistical Analysis System), SPSS (Statistical Package for the Social Sciences), LISREL (Linear Structural Relations), EQS (Equations), or AMOS (Analysis of Moment Structures), and the significance of each sample characteristic coefficient is determined from the result. A detailed description is omitted.
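The text leaves the regression to a statistics package. As a rough illustration of the fitting step only, the sketch below solves an ordinary least-squares problem with the four characteristic coefficients as predictors. It assumes a linear model with an intercept, uses synthetic, noise-free data with invented coefficients, and does not compute significance (t-statistics or p-values), which would still come from a statistics package. The function name is hypothetical.

```python
def fit_multiple_regression(rows, targets):
    """Least-squares fit of targets = b0 + b1*x1 + ... via the normal equations."""
    design = [[1.0] + list(row) for row in rows]   # leading 1.0 is the intercept term
    p = len(design[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(p)]
        + [sum(x[i] * y for x, y in zip(design, targets))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot_row = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < 1e-12:
            raise ValueError("characteristic coefficients are collinear")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [value / pivot for value in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


# Synthetic sample networks: (DEN, DIV, CLU, CEN) per network. Values are invented.
rows = [
    (0.10, 0.90, 0.30, 0.20),
    (0.20, 0.80, 0.50, 0.10),
    (0.30, 0.95, 0.20, 0.40),
    (0.15, 0.70, 0.60, 0.30),
    (0.25, 0.85, 0.40, 0.50),
    (0.35, 0.75, 0.10, 0.25),
]
# Synthetic success rates generated from invented coefficients, without noise.
targets = [0.1 + 0.5 * d + 0.2 * v - 0.3 * c - 0.1 * e for d, v, c, e in rows]

coefficients = fit_multiple_regression(rows, targets)
print([round(b, 6) for b in coefficients])
```

One such fit would be produced per recommendation technique, each using that technique's success rates as the targets.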

Total characteristic coefficients are calculated, based on the entire network made up of all users in the product purchase database, only for those characteristic coefficients found to be significant in common across the multiple regression equations of the recommendation techniques (S150). The total characteristic coefficients are then applied to the multiple regression equation of each recommendation technique, and the technique with the highest recommendation value is selected as the recommendation technique for the product purchase database (S160). The total characteristic coefficients are calculated in the same way as the sample characteristic coefficients described above, but over the entire network.
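Steps S150 and S160 can be sketched as evaluating each technique's fitted equation at the total characteristic coefficients and taking the maximum. The sketch assumes each equation is linear with an intercept and that two coefficient types were found to be significant in common. The technique names, regression coefficients, and feature values are all invented for illustration.

```python
def predict(coefficients, features):
    """Evaluate b0 + b1*x1 + ... for one fitted regression equation."""
    intercept, *weights = coefficients
    return intercept + sum(w * x for w, x in zip(weights, features))


def select_technique(models, total_features):
    """Return the technique with the highest predicted value, plus all scores."""
    scores = {name: predict(coeffs, total_features) for name, coeffs in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical fitted equations: [intercept, weight for DEN, weight for CLU].
models = {
    "technique_1": [0.10, 0.50, -0.30],
    "technique_2": [0.20, 0.20, -0.10],
    "technique_3": [0.05, 0.80, -0.60],
}
# Hypothetical total characteristic coefficients of the entire network: DEN, CLU.
total_features = [0.30, 0.40]

best, scores = select_technique(models, total_features)
print(best, scores)
```

Restricting every equation to the same commonly significant coefficients keeps the predicted values comparable across techniques.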

Here, significance refers to the statistical meaning of a hypothesis about a sample population. In other words, calling an experimental result "statistically significant" means that, in terms of probability, it is meaningful enough that it is unlikely to be mere chance. Conversely, "not statistically significant" means that the result may be mere chance.

FIG. 6 is a flowchart illustrating the step of calculating the sample characteristic coefficients according to the present invention in more detail.

Referring to FIG. 6, the purchase similarity between the users constituting a sample group is calculated (S121), and users whose purchase similarity is at or above a threshold similarity are connected to generate the sample network. FIG. 7 shows an example of the product purchase database: the list of products purchased by each user is stored as a matrix, with a value of 1 assigned when the user purchased the product and 0 when the user did not. A set number of users is drawn at random from the product purchase database to form a sample group, and the product purchase similarity between the users in the sample group is calculated as the ratio of the number of products the users purchased in common to the total number of products. Depending on the field to which the present invention is applied, Pearson's correlation coefficient may be used to calculate the product purchase similarity, and this falls within the scope of the present invention.
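The similarity and thresholding steps can be sketched as follows, using the ratio of commonly purchased products to the total number of products as described above. The purchase matrix, the threshold value of 0.4, and the function names are assumptions for illustration.

```python
from itertools import combinations


def purchase_similarity(row_a, row_b):
    """Commonly purchased products divided by the total number of products."""
    common = sum(1 for a, b in zip(row_a, row_b) if a == 1 and b == 1)
    return common / len(row_a)


def build_sample_network(purchases, threshold):
    """Connect every pair of users whose similarity is at or above the threshold."""
    edges = set()
    for u, v in combinations(sorted(purchases), 2):
        if purchase_similarity(purchases[u], purchases[v]) >= threshold:
            edges.add((u, v))
    return edges


# Rows of a small purchase matrix: 1 = purchased, 0 = not purchased.
purchases = {
    "A": [1, 1, 1, 0, 0],
    "B": [1, 1, 0, 0, 0],
    "C": [0, 1, 1, 0, 1],
    "D": [0, 0, 0, 1, 0],
}
edges = build_sample_network(purchases, threshold=0.4)
print(sorted(edges))  # [('A', 'B'), ('A', 'C')]
```

User D shares no purchases with anyone and stays unconnected, which is the kind of user counted by n_it in the coverage coefficient.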

The sample characteristic coefficients indicating the characteristics of the sample network are calculated using the connection relationships between users in the sample network (S125).

FIG. 9 is a flowchart illustrating the step of calculating the recommendation success rate according to the present invention in more detail.

Referring to FIG. 9, the users constituting the sample network are divided into a set number of training data groups and test data groups (S131), and one or more mutually different collaborative-filtering-based recommendation techniques are applied to the training data group to extract recommended products for each technique (S133).

The extracted products are compared with the products actually purchased by the users of the test data group, and the recommendation success rate of the sample network is calculated for each recommendation technique (S135). In the present invention, the recommendation success rate can be calculated from precision and recall as in Equation (8) below.

[Equation 8]

Here C_re is the recall, calculated as the ratio of the number of products purchased through recommendation to the total number of products, and C_pre is the precision, calculated as the ratio of the number of products purchased through recommendation to the number of recommended products.
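Equation 8 is not reproduced in this text. The sketch below assumes it is the common F1 measure, the harmonic mean of precision and recall, and reads the "total number of products" in the recall as the number of products the test user actually purchased. Both points are assumptions, and the product identifiers are invented.

```python
def recommendation_success(recommended, purchased):
    """Return (precision, recall, f1) for one list of recommendations."""
    recommended, purchased = set(recommended), set(purchased)
    hits = len(recommended & purchased)          # recommended products that were bought
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(purchased) if purchased else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)   # assumed form of Equation 8
    return precision, recall, f1


# Four products recommended, three actually purchased, two in common.
precision, recall, f1 = recommendation_success(
    ["p1", "p2", "p3", "p4"], ["p2", "p4", "p7"]
)
print(precision, recall, f1)
```

Combining the two ratios in a single value avoids rewarding a technique that scores well on precision simply by recommending very few products, or on recall by recommending very many.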

The embodiments of the present invention described above can be written as a program executable on a computer and can be implemented on a general-purpose digital computer that runs the program using a computer-readable recording medium.

The computer-readable recording medium includes storage media such as electrical or magnetic storage media (for example, ROM, floppy disks, and hard disks), optically readable media (for example, CD-ROMs and DVDs), and carrier waves (for example, transmission over the Internet).

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely illustrative, and those of ordinary skill in the art will understand that various modifications and equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical idea of the appended claims.

110: Network generation unit 120: Product purchase database
130: Characteristic coefficient calculation unit 140: Recommendation success rate calculation unit
150: Significance determination unit 160: Recommendation technique selection unit

Claims

A method for selecting a recommendation technique in a product recommendation system based on collaborative filtering, the method comprising:
generating a plurality of sample groups from a product purchase database;
generating, for each sample group, a sample network based on the product purchase similarity between the users included in the sample group, and calculating sample characteristic coefficients indicating characteristics of each sample network;
calculating a recommendation success rate of the sample network for each recommendation technique by applying a plurality of collaborative-filtering-based recommendation techniques to the sample network;
generating a multiple regression equation for each recommendation technique with the sample characteristic coefficients as independent variables and the recommendation success rate as the dependent variable, and determining the significance of the sample characteristic coefficients in the multiple regression equation of each recommendation technique;
determining the type of common sample characteristic coefficient that is significant in common across the multiple regression equations of the recommendation techniques;
calculating a total characteristic coefficient of the same type as the common sample characteristic coefficient based on the entire network made up of all users constituting the product purchase database; and
selecting, as the recommendation technique for the product purchase database, the recommendation technique having the highest recommendation value when the total characteristic coefficient is applied to the multiple regression equation of each recommendation technique.

The method of claim 1,
wherein the plurality of sample groups are generated from a set number of users randomly selected from the product purchase database of the users.

3. The method of claim 2, wherein the step of calculating a sample characteristic coefficient indicating a characteristic of the sample network comprises:
calculating the purchase similarity between the users constituting the sample group;
connecting users whose purchase similarity is at or above a threshold similarity to generate the sample network; and
calculating the sample characteristic coefficient indicating the characteristic of the sample network using the connection relationships between users in the sample network.

4. The method of claim 3, wherein the sample characteristic coefficient indicating the characteristics of the sample network is at least one of:
a density coefficient (DEN(S_i)) indicating how closely users are connected, based on the connection relationships of the users constituting the sample network (S_i);
a coverage coefficient (DIV(S_i)) indicating the proportion of users who are not connected to neighboring users, based on the connection relationships of the users constituting the sample network (S_i);
a clustering coefficient (CLU(S_i)) indicating the degree of connection among the neighboring users connected to a user, based on the connection relationships of the users constituting the sample network; and
a concentration coefficient (CEN(S_i)) indicating how strongly connections to neighboring users are concentrated on a user, based on the connection relationships of the users constituting the sample network.

The method of claim 4, wherein the density coefficient (DEN(S_i)) is calculated by Equation (1) below:
[Equation 1]

where k_i is the total number of connections in the sample network (S_i), and n_i is the number of users constituting the sample network (S_i).

The method of claim 4, wherein the coverage coefficient (DIV(S_i)) is calculated by Equation (2) below:
[Equation 2]

where n_it is the number of users in the sample network (S_i) who are not connected to any neighboring user.

The method of claim 4, wherein the clustering coefficient (CLU(S_i)) is calculated by Equation (3) below:
[Equation 3]

where n_v is the number of users who have two or more connections with neighboring users, V is the set of users who have two or more connections with neighboring users, and clu(p) is the individual clustering coefficient of any one user p in that set, calculated by Equation (4) below:
[Equation 4]

where triple_p is the number of pairs of neighboring users connected to user p, and triangle_p is the number of those pairs in which the two neighboring users are connected to each other.

The method of claim 4, wherein the concentration coefficient (CEN(S_i)) is calculated by Equation (5) below:
[Equation 5]

where cen(u_l) is the individual concentration coefficient of a user (u_l) constituting the sample network (S_i), and cen(u*) is the highest individual concentration coefficient among the users constituting the sample network (S_i),
and the individual concentration coefficient is calculated by Equation (6) below:
[Equation 6]

where m_j is 1 if user U_l is connected to neighboring user U_j in the sample network and 0 if not.

5. The method of claim 4, wherein the step of calculating the recommendation success rate of the sample network comprises:
dividing the users constituting the sample network into a set number of training data groups and test data groups;
extracting recommended products for each recommendation technique by applying a plurality of collaborative-filtering-based recommendation techniques to the training data group; and
comparing the extracted products with the products actually purchased by the users of the test data group, and calculating the recommendation success rate of the sample network for each recommendation technique.

(Deleted)

An apparatus for selecting a recommendation technique, comprising:
a network generation unit that generates a plurality of sample groups from a set number of users randomly selected from a product purchase database of users and generates, for each sample group, a sample network based on the product purchase similarity between the users included in the sample group;
a characteristic coefficient calculation unit that calculates sample characteristic coefficients indicating characteristics of the sample network based on the sample network, and total characteristic coefficients indicating characteristics of the entire network based on the entire network made up of all users constituting the product purchase database;
a recommendation success rate calculation unit that calculates a recommendation success rate of the sample network for each recommendation technique by applying a plurality of collaborative-filtering-based recommendation techniques to the sample network;
a significance determination unit that generates a multiple regression equation for each recommendation technique with the sample characteristic coefficients as independent variables and the recommendation success rate as the dependent variable, and determines the significance of the sample characteristic coefficients in the multiple regression equation of each recommendation technique; and
a recommendation technique selection unit that determines the type of common sample characteristic coefficient that is significant in common across the multiple regression equations of the recommendation techniques, applies the total characteristic coefficient of the same type as the common sample characteristic coefficient to the multiple regression equation of each recommendation technique, and selects the recommendation technique with the highest recommendation value as the recommendation technique for the product purchase database.

12. The apparatus of claim 11, wherein the network generation unit comprises:
a sample group generation unit that generates a plurality of sample groups from a set number of users randomly selected from the product purchase database of the users;
a similarity calculation unit that calculates the purchase similarity between the users constituting each sample group; and
a sample network generation unit that connects users whose purchase similarity is at or above a threshold similarity to generate the sample network.

13. The apparatus of claim 12, wherein the characteristic coefficient calculation unit calculates:
a density coefficient (DEN(S_i)) indicating how closely users are connected, based on the connection relationships of the users constituting the sample network (S_i) or the entire network (T);
a coverage coefficient (DIV(S_i)) indicating the proportion of users who are not connected to neighboring users, based on the connection relationships of the users constituting the sample network (S_i) or the entire network (T);
a clustering coefficient (CLU(S_i)) indicating the degree of connection among the neighboring users connected to a user, based on the connection relationships of the users constituting the sample network or the entire network (T); and
a concentration coefficient (CEN(S_i)) indicating how strongly connections to neighboring users are concentrated on a user, based on the connection relationships of the users constituting the sample network or the entire network (T).

14. The apparatus of claim 13, wherein the recommendation success rate calculation unit comprises:
a data group generation unit that divides the users constituting the sample network into a set number of training data groups and test data groups;
a recommended product extraction unit that extracts recommended products for each recommendation technique by applying a plurality of collaborative-filtering-based recommendation techniques to the training data group; and
a success rate calculation unit that compares the extracted products with the products actually purchased by the users of the test data group and calculates the recommendation success rate of the sample network for each recommendation technique.

(Deleted)