KR100447137B1

KR100447137B1 - A extraction and prediction method of latent user's preference

Info

Publication number: KR100447137B1
Application number: KR10-2001-0019165A
Authority: KR
Inventors: 이필규; 정준
Original assignee: 학교법인 인하학원 (Inha Hakwon school foundation)
Priority date: 2001-04-11
Filing date: 2001-04-11
Publication date: 2004-09-04
Also published as: KR20020080063A

Abstract

The present invention relates to a method for extracting and predicting latent user preferences from a graded rating matrix entered by users. Removing noise from the input matrix and reducing it to a meaningful number of dimensions improves prediction accuracy and reduces execution time. The method comprises six steps:

- a first step of extracting latent user preferences using singular value decomposition (SVD), which removes noise from the input matrix and reduces it to a meaningful number of dimensions;
- a second step of mapping the users into the meaningful reduced-dimensional space, representing each user as a vector in that space;
- a third step of computing, from the vectors representing the users, a distance that expresses the similarity between each pair of users;
- a fourth step of filtering out less similar users with a threshold of predetermined size, so as to select the users similar to each user;
- a fifth step of computing weights that give more similar users greater influence; and
- a sixth step of obtaining users' preferences for items from the average of the weighted ratings of all similar users.

Description

Method for extracting and predicting latent user preferences {A EXTRACTION AND PREDICTION METHOD OF LATENT USER'S PREFERENCE}

The present invention relates to a method for extracting and predicting latent user preferences. More specifically, it applies singular value decomposition (SVD) to a graded rating matrix entered by users in order to extract and predict their latent preferences for multimedia items.

The invention described below can be regarded as a core part of the technique known as collaborative information filtering. Collaborative filtering uses users' graded ratings of items to find users whose rating patterns are similar. It then cross-recommends to each user the items that the similar users preferred.

Conventionally, several methods have been used for collaborative information filtering, as described below.

The first method computes the similarity between users with a correlation coefficient.

The correlation-coefficient method predicts a user's rating of an unrated item as a linear combination of the observed ratings. Its main idea is to use a formal similarity measure so that users with similar preferences receive greater weight. The representative way of computing the correlation coefficient is Pearson's r.

For pairs (x_i, y_i), i = 1, ..., N, Pearson's r is given by Equation 1 below.
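The image of Equation 1 did not survive extraction. The sketch below assumes it is the standard Pearson correlation:

r = Σ(x_i − x̄)(y_i − ȳ) / sqrt(Σ(x_i − x̄)² · Σ(y_i − ȳ)²)

It computes r over two users' paired ratings. The name `pearson_r` is hypothetical, and the patent's own notation for Equation 1 may differ.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation r between paired ratings (x_i, y_i), i = 1..N."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the summed squared deviations.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    if denom == 0:
        return 0.0  # a user with constant ratings carries no correlation signal
    return cov / denom


# Ratings two users gave to the same five items (illustrative data).
print(pearson_r([5, 3, 4, 4, 1], [4, 2, 5, 4, 1]))
```

Returning 0.0 for a constant rating vector is a design choice of this sketch. It avoids dividing by zero when a user has given every item the same grade.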

The second method uses vector similarity.

In information retrieval, the similarity of two documents is measured by treating each document as a vector of word frequencies and computing the cosine of the angle between the two vectors. Applied to collaborative filtering, the roles map as follows:

- users play the role of documents;
- items play the role of words;
- graded ratings play the role of word frequencies.

In this algorithm, observed votes express positive preference, and negative votes play no role. Unobserved items are given a value of 0.

The relevance weight is given by Equation 2 below.

Here, the squared terms in the denominator normalize the votes. As a result, users who vote on more items do not, a priori, appear more similar to other users.
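Equation 2 is also missing from this copy. The sketch below assumes it is the usual cosine-style vote similarity described above. Each user's votes are divided by the square root of that user's sum of squared votes, and unobserved items count as 0. The dictionary representation and the name `vote_similarity` are hypothetical.

```python
from math import sqrt
from typing import Dict

# Hypothetical representation: each user's observed votes, item id -> rating.
# Items a user has not voted on are absent, which is equivalent to a vote of 0.
Votes = Dict[str, float]


def vote_similarity(a: Votes, b: Votes) -> float:
    """Cosine-style relevance weight between two users' vote vectors."""
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Only items voted on by both users contribute; absent votes count as 0.
    shared = a.keys() & b.keys()
    return sum((a[j] / norm_a) * (b[j] / norm_b) for j in shared)


alice = {"m1": 5, "m2": 3}
bob = {"m1": 4, "m2": 3, "m3": 5}
print(round(vote_similarity(alice, bob), 3))  # → 0.703
```

Bob's vote on `m3` enlarges his norm without adding anything to the numerator. Voting on more items therefore does not by itself make a user look more similar, which is the normalization effect described above.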

The third method is clustering.

One possible model introduces an unobserved class variable C that takes a relatively small number of discrete values. Given membership in C, the probability distributions of the votes are assumed to be conditionally independent.

The idea is that there are certain groups, or types, of users who share a common set of preferences and tastes. Once the class is given, the preferences for the various items are independent.

The probability model relates the joint distribution of votes and class to a tractable set of conditional and marginal distributions. It is the standard "naive" Bayes formulation, expressed as Equation 3 below.
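Equation 3 is missing from this copy. The sketch below assumes the standard naive Bayes form:

P(C = c, v_1, ..., v_n) = P(C = c) · ∏_i P(v_i | C = c)

It uses that form to predict the distribution of a vote on an unrated item. Several things are hypothetical here: the model parameters are assumed to be already estimated (for example by EM, which the source does not describe), and the parameter values and function names are illustrative.

```python
from typing import Dict, List

# Hypothetical parameters: prior[c] = P(C = c);
# cond[c][item][vote] = P(v_item = vote | C = c).
Prior = List[float]
Cond = List[Dict[str, Dict[int, float]]]


def joint(prior: Prior, cond: Cond, c: int, votes: Dict[str, int]) -> float:
    """Naive Bayes joint P(C=c, v_1..v_n) = P(C=c) * prod_i P(v_i | C=c)."""
    p = prior[c]
    for item, vote in votes.items():
        p *= cond[c][item][vote]
    return p


def predict_vote_dist(prior: Prior, cond: Cond, votes: Dict[str, int],
                      target: str) -> Dict[int, float]:
    """P(v_target = k | observed votes), marginalizing over the hidden class."""
    # Posterior over classes given the observed votes.
    weights = [joint(prior, cond, c, votes) for c in range(len(prior))]
    total = sum(weights)
    posterior = [w / total for w in weights]
    # Mix each class's conditional vote distribution for the target item.
    dist: Dict[int, float] = {}
    for c, pc in enumerate(posterior):
        for k, pk in cond[c][target].items():
            dist[k] = dist.get(k, 0.0) + pc * pk
    return dist


# Two illustrative classes; votes are 1 (like) or 0 (dislike).
prior = [0.5, 0.5]
cond = [
    {"a": {1: 0.9, 0: 0.1}, "b": {1: 0.8, 0: 0.2}},  # class 0 tends to like both
    {"a": {1: 0.2, 0: 0.8}, "b": {1: 0.3, 0: 0.7}},  # class 1 tends to dislike both
]
d = predict_vote_dist(prior, cond, {"a": 1}, "b")
print({k: round(p, 3) for k, p in d.items()})  # → {1: 0.709, 0: 0.291}
```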

However, the prior-art methods described above are difficult to apply because of problems with execution time, prediction accuracy, and the extraction of latent meaning.

The present invention was devised to solve these problems of the prior art. Its object is to provide a method for extracting and predicting latent user preferences from a graded rating matrix entered by users. The method improves prediction accuracy and reduces execution time by removing noise from the input matrix and reducing it to a meaningful number of dimensions.

FIG. 1 is a diagram illustrating an Internet-based system for recommending multimedia items using SVD according to the present invention.

FIG. 2 is a schematic block diagram illustrating the configuration of the system of FIG. 1.

FIGS. 3A to 3C show database tables of the system according to the present invention.

FIG. 4 describes the overall algorithm applied to the system of the present invention.

FIG. 5 is a flowchart illustrating the overall operation of the system for recommending multimedia items.

To achieve the above object, one aspect of the present invention provides a method for extracting and predicting latent user preferences. The method comprises six steps:

- a first step of extracting latent user preferences using singular value decomposition (SVD), which removes noise from the input matrix and reduces it to a meaningful number of dimensions;
- a second step of mapping the users into the meaningful reduced-dimensional space, representing each user as a vector in that space;
- a third step of computing, from the vectors representing the users, a distance that expresses the similarity between each pair of users;
- a fourth step of filtering out less similar users with a threshold of predetermined size, so as to select the users similar to each user;
- a fifth step of computing weights that give more similar users greater influence; and
- a sixth step of obtaining users' preferences for items from the average of the weighted ratings of all similar users.

The above objects and various advantages of the present invention will become clearer to those skilled in the art from the preferred embodiments described below with reference to the accompanying drawings.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The drawings are as follows:

- FIG. 1 is a diagram of an Internet-based system for recommending multimedia items using SVD according to the present invention.
- FIG. 2 is a schematic block diagram of the configuration of the system of FIG. 1.
- FIGS. 3A to 3C show database tables of the system according to the present invention.
- FIG. 4 describes the overall algorithm applied to the system of the present invention.
- FIG. 5 is a flowchart of the overall operation of the system for recommending multimedia items.

Referring to FIG. 1, each user PC is connected to a web server through the Internet. A DBMS and a preference extraction and prediction server are connected to the web server.

With the configuration shown in FIG. 2, the system receives user information, item information, graded rating information, and the like, and derives preference information for the items. The forms of these data are:

- user information: as shown in FIG. 3A;
- item information: as shown in FIG. 3B;
- graded rating information: as shown in FIG. 3C;
- preference information: as shown in FIG. 3D.

As shown in FIG. 2, the system consists of the following components:

- The User Information Manager, Item Manager, and Rating Manager manage the user information, item information, and graded rating information, respectively.
- The Singular Value Decomposer receives the user, item, and rating information from these three managers and performs singular value decomposition (SVD) as described below.
- The Latent Preference Extractor uses the SVD result to extract latent user preferences as described below.
- The Preference Predictor predicts users' item preferences as described below.
- The Recommendation Manager finds users with similar rating patterns and manages their item preference information, which enables collaborative filtering.

Referring to FIGS. 4 and 5, latent semantic structure has long been used as a method for retrieving documents. Term-by-document matrices have mainly been analyzed by singular value decomposition (SVD) to derive a particular latent semantic structure. SVD is closely related to techniques in many areas of mathematics and statistics, such as eigenvector decomposition, spectral analysis, and factor analysis.

As shown in FIG. 4, the overall algorithm applied to the system of the present invention comprises:
1. Performing SVD on R, the users' graded rating matrix.
2. Computing the matrix product of U and S from the decomposed matrices, so that each user is represented as a vector in the reduced-dimensional space.
3. Computing the cosine distance between all users from the k predetermined columns of US. This is a distance expressing the similarity between each pair of users.
4. Treating as similar users, N_u, all users whose cosine value exceeds some threshold T. Here a larger value means the two vectors point in more similar directions. This step filters out less similar users with a threshold of predetermined size, so as to select the users similar to each user.
5. Computing weights among the similar users, so that more similar users have greater influence.
6. Averaging the weighted ratings of all similar users to obtain the users' preferences for items j.
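The six steps above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The source leaves several details open, so the sketch makes these choices:

- a rating of 0 means "unrated";
- the step-5 weight of a neighbour is its cosine value;
- T lies in [0, 1);
- a prediction averages only over similar users who actually rated the item.

The name `predict_preferences` is hypothetical. As in step 4, a larger cosine value means more similar.

```python
import numpy as np


def predict_preferences(R: np.ndarray, k: int, T: float) -> np.ndarray:
    """Sketch of the six-step procedure; returns a users x items prediction matrix.

    Assumptions not fixed by the source: 0 marks an unrated item, a neighbour's
    weight is its cosine value, and T lies in [0, 1).
    """
    # Step 1: singular value decomposition R = U S V^T (s is sorted descending).
    U, s, _ = np.linalg.svd(R, full_matrices=False)
    # Step 2: represent each user as a row of US restricted to the first k columns.
    US = U[:, :k] * s[:k]
    # Step 3: cosine value between every pair of user vectors.
    norms = np.linalg.norm(US, axis=1, keepdims=True)
    unit = US / np.where(norms == 0, 1.0, norms)
    cos = unit @ unit.T

    n_users, n_items = R.shape
    P = np.zeros((n_users, n_items))
    for u in range(n_users):
        # Step 4: similar users N_u are those whose cosine value exceeds T.
        neighbours = [v for v in range(n_users) if v != u and cos[u, v] > T]
        for j in range(n_items):
            raters = [v for v in neighbours if R[v, j] != 0]
            if not raters:
                continue  # no similar user rated item j; leave the prediction at 0
            # Step 5: weights give more similar users more influence.
            w = cos[u, raters]
            # Step 6: weighted average of the similar users' ratings of item j.
            P[u, j] = float(w @ R[raters, j] / w.sum())
    return P


R_example = np.array([[5, 4, 0, 1],
                      [4, 5, 1, 0],
                      [1, 0, 5, 4],
                      [0, 1, 4, 5]], dtype=float)
P = predict_preferences(R_example, k=2, T=0.5)
print(float(P[0, 2]))  # user 0 has not rated item 2 → 1.0
```

For this toy matrix the two largest singular values are 10 and 8. In the reduced space, users 0 and 1 point in the same direction, while users 0 and 2 have a cosine of 9/41. With T = 0.5, user 1 is therefore user 0's only neighbour, and user 0's predicted preference for item 2 is user 1's rating of 1.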

SVD starts with an arbitrary rectangular matrix whose rows and columns correspond to different kinds of entities. For example, rows can be users and columns can be items. SVD decomposes such a rectangular matrix into three other matrices of a special form.

Most of the linearly independent components are typically very small and can be ignored, which yields an approximate model with fewer dimensions. In this reduced model, user-user, item-item, and user-item similarities are approximated by values in the reduced dimensions. The dot product or cosine distance of two vectors then corresponds, geometrically, to the measured similarity.

For collaborative filtering, SVD can be viewed as a technique for removing the items that are irrelevant to distinguishing users' characteristics. Each user is represented as a vector of rating values. Because of dimensionality reduction, users with somewhat different profiles can be mapped to the same vector. This property is what allows results on unreliable data to be improved.

The original user-item matrix can be reconstructed from the component weights, ideally but not with perfect accuracy. Because the original user space is unreliable, the derived k-dimensional component space does not fully reconstruct it. What is required is only a derived structure that expresses reliability and importance, using items as indicators that point to users.

SVD is a particular model of latent structure. Conceptually, the u × i rectangular matrix R representing u users and i items can be decomposed into three other matrices:

$$R = U S V^{T}$$

Here U and V are orthogonal matrices and S is a diagonal matrix. This form of decomposition is called the singular value decomposition (SVD) of the matrix R. The columns of U and V are called the left and right singular vectors, respectively, and S is the diagonal matrix of singular values.
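The decomposition can be checked numerically. The sketch below uses NumPy's `numpy.linalg.svd` with `full_matrices=False` (the "thin" SVD) on an illustrative 4 × 3 rating matrix. Three points about NumPy's conventions:

- NumPy returns V^T (here `Vt`) rather than V.
- It returns the singular values as a 1-D array `s` rather than the diagonal matrix S.
- In the thin form U is not square, so "orthogonal" means that its columns are orthonormal.

```python
import numpy as np

# Illustrative 4-user x 3-item rating matrix (rows = users, columns = items).
R = np.array([[5, 3, 0],
              [4, 0, 1],
              [1, 1, 5],
              [0, 2, 4]], dtype=float)

# Thin SVD: U is 4x3, s holds the singular values, Vt is V^T (3x3).
U, s, Vt = np.linalg.svd(R, full_matrices=False)
S = np.diag(s)

# The three factors multiply back to R (up to floating-point rounding).
assert np.allclose(U @ S @ Vt, R)
# Columns of U and V are orthonormal: U^T U = I and V^T V = I.
assert np.allclose(U.T @ U, np.eye(3))
assert np.allclose(Vt @ Vt.T, np.eye(3))
print(s)  # singular values, sorted from largest to smallest
```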

The following figure shows the structure of the SVD of a user-by-item matrix, i.e., a conceptual diagram of the SVD of the rating matrix R.

[Model 1]

In general, SVD offers a simple way to obtain an optimal approximation using smaller matrices. The singular values in S are sorted by magnitude; the k largest values are kept and the remaining smaller values are set to zero. The resulting matrix product, R', is approximately equal to R and is defined by Equation 4.
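Equation 4 is missing from this copy. The sketch below assumes it is the usual truncated product R' = U_k S_k V_k^T, where U_k and V_k keep the first k columns of U and V, and S_k keeps the k largest singular values. The name `truncate_svd` and the matrix are illustrative.

By the Eckart-Young theorem, this truncation is the best rank-k approximation in the Frobenius norm. Its error equals the square root of the sum of the squared discarded singular values, and the code checks exactly that.

```python
import numpy as np


def truncate_svd(R: np.ndarray, k: int) -> np.ndarray:
    """Rank-k approximation R' = U_k S_k V_k^T (keep the k largest singular values)."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    s_k = s.copy()
    s_k[k:] = 0.0  # zero out all but the k largest singular values
    return U @ np.diag(s_k) @ Vt


R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
R2 = truncate_svd(R, 2)
# The reconstruction error equals the energy of the discarded singular values.
_, s, _ = np.linalg.svd(R)
print(np.linalg.norm(R - R2), np.sqrt(np.sum(s[2:] ** 2)))
```

For this matrix the singular values are 10, 8, √2 and √2. Both printed values are therefore approximately 2.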

Keeping only the k largest linearly independent components of R lets R' capture the important underlying structure of the data while removing most of the noise. The following figure shows the reduced model that approximates the rating matrix.

[Model 2]

The reduced dimension k should be large enough to fit the real structure of the data, yet small enough to avoid modeling sampling errors and unimportant detail.
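The source gives no numeric rule for choosing k. One common heuristic keeps the smallest k whose leading singular values retain a chosen fraction of the total squared singular-value "energy". This heuristic is an assumption here, not the patent's method, and the function `choose_k` and its 0.9 default are hypothetical.

```python
import numpy as np


def choose_k(singular_values: np.ndarray, energy: float = 0.9) -> int:
    """Smallest k whose k largest singular values keep `energy` of the total squared sum.

    The energy criterion is a common heuristic, not a rule stated in the patent.
    """
    sq = np.sort(np.asarray(singular_values, dtype=float))[::-1] ** 2
    cumulative = np.cumsum(sq) / sq.sum()
    # First index at which the retained share reaches the target (1-based k).
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(sq))  # guard against rounding when energy is 1.0


print(choose_k(np.array([10.0, 8.0, np.sqrt(2), np.sqrt(2)]), energy=0.9))  # → 2
```

A higher `energy` keeps more structure but also more noise. This is the same trade-off the paragraph describes, now expressed as a single tunable number.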

In general, the SVD allows user-user, user-item, and item-item similarities to be compared. In collaborative filtering, however, the similarity among users is the most important factor. The following description therefore focuses on user-user comparisons.

The dot product of two row vectors of R' reflects how closely the two users' rating patterns over the items agree. R'R'^T, which contains all user-by-user dot products, is a symmetric square matrix. Because S is diagonal and V is orthogonal, R'R'^T can be computed by Equation 5 below.

The (i, j) entry of R'R'^T is the dot product of rows i and j of the matrix US. In other words, if the rows of US are taken as the coordinates of the users, the dot product of two such points compares the two users.
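Equation 5 is also missing from this copy. Assuming the rank-k factors of the truncated SVD, the identity follows from V_k^T V_k = I:

R'R'^T = U_k S_k V_k^T V_k S_k U_k^T = (U_k S_k)(U_k S_k)^T

The NumPy sketch below checks this numerically. The matrix is illustrative, not data from the source.

```python
import numpy as np

R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]

R_prime = Uk @ Sk @ Vtk          # rank-k approximation R'
user_coords = Uk @ Sk            # rows of US: one coordinate vector per user

# Because V_k^T V_k = I, R' R'^T collapses to (US)(US)^T.
gram_direct = R_prime @ R_prime.T
gram_from_us = user_coords @ user_coords.T
assert np.allclose(gram_direct, gram_from_us)

# Entry (0, 1) is the dot product of rows 0 and 1 of US.
print(gram_direct[0, 1], user_coords[0] @ user_coords[1])
```

Both printed values are 41 up to floating-point rounding. For this matrix, rows 0 and 1 of U_k S_k are identical, with components of magnitude 5 and 4.

Computing the small u × k matrix US is cheaper than forming R' first, which is why the user comparison works directly on US.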

Either the dot product of two points or the cosine distance between the two vectors can serve as the measure of user similarity. The rows of the matrix product US are the vectors used to compare the two users' rating patterns. The smaller the cosine distance between two such vectors, the more similar the two users.
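The source does not define cosine distance precisely. The sketch below assumes the common definition 1 − cos θ, under which a smaller distance means more similar users, as the paragraph states. It also computes the plain dot products as the alternative measure. The coordinates are illustrative rows of US, and `cosine_distance_matrix` is a hypothetical name. Step 4 of the algorithm instead thresholds the cosine value itself, where larger means more similar.

```python
import numpy as np


def cosine_distance_matrix(user_coords: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance 1 - cos(theta) between user vectors (rows of US)."""
    norms = np.linalg.norm(user_coords, axis=1, keepdims=True)
    unit = user_coords / np.where(norms == 0, 1.0, norms)
    return 1.0 - unit @ unit.T


# Illustrative user coordinates in a 2-dimensional reduced space.
US = np.array([[5.0, 4.0],
               [5.0, 4.0],
               [5.0, -4.0],
               [5.0, -4.0]])
D = cosine_distance_matrix(US)
dots = US @ US.T  # the alternative measure: plain dot products
print(round(float(D[0, 2]), 3))  # → 0.78
```

Cosine distance ignores vector length, whereas the dot product grows with it. Users 0 and 1 are therefore at distance 0 regardless of how strongly they rate, while their dot product is 41.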

Although the present invention has been shown and described in connection with specific embodiments, those of ordinary skill in the art will readily appreciate that various modifications and changes can be made. Such changes remain within the spirit and scope of the invention as set forth in the claims.

As described above, the method of the present invention extracts and predicts latent user preferences from a graded rating matrix entered by users. It removes noise from the input matrix and reduces it to a meaningful number of dimensions. Compared with Pearson's r, neural networks, and similar methods, this improves prediction accuracy and reduces execution time.

Claims

A method for extracting and predicting latent user preferences, performed in a system consisting of:

- user PCs;
- a web server connected to the user PCs through the Internet;
- a DBMS connected to the web server; and
- a preference extraction and prediction server,

wherein the preference extraction and prediction server performs the extraction and prediction, the method comprising:

A first step of extracting latent user preferences using singular value decomposition (SVD), which removes noise from the input matrix and reduces it to a meaningful number of dimensions;

A second step of mapping the users into the meaningful reduced-dimensional space, representing each user as a vector in that space;

A third step of obtaining, from the vectors representing the users, a distance that expresses the similarity between each pair of users;

A fourth step of filtering out less similar users with a threshold of predetermined size, so as to select the users similar to each user;

A fifth step of obtaining weights that give more similar users greater influence; and

A sixth step of obtaining users' preferences for items from the average of the weighted ratings of all similar users.