KR20080043140A

KR20080043140A - Collaborative filtering system and method

Info

Publication number: KR20080043140A
Application number: KR1020060111793A
Authority: KR
Inventors: 구만영; 김선영; 임성욱; 김지환; 이주명
Original assignee: 에스케이커뮤니케이션즈 주식회사
Priority date: 2006-11-13
Filing date: 2006-11-13
Publication date: 2008-05-16
Also published as: KR100907744B1

Abstract

A collaborative filtering system and method are provided that give accurate recommendation results to users in a minority group by taking the preferences of that group into account, and that calculate the relation between user preference and content accurately by excluding user actions that would lead to incorrect results. A content providing server interface (105) receives users' content usage information by interfacing with a content providing server. A content usage information management database (115) stores and manages the users' content usage information. A user content preference calculator (120) analyzes the content usage information stored in the database and calculates content preferences, applying weights according to the user's actions and/or the usage time of each content item. A content similarity calculator (130) analyzes the content usage information stored in the database to determine pairs of content items that are frequently used together, and calculates the similarity of each pair by applying weights according to how the pair was used and the attributes of the paired items. A user recommendation data generator (140) generates recommendation data for each user from both the user content preference results and the similarity data.

Description

Collaborative filtering system and method {COLLABORATIVE FILTERING SYSTEM AND METHOD}

FIG. 1 is a schematic block diagram of a collaborative filtering system according to an embodiment of the present invention;

FIG. 2 is an example of a table structure for managing content usage information for collaborative filtering according to an embodiment of the present invention;

FIG. 3 is an example of the time weights applied when calculating content preference according to an embodiment of the present invention;

FIGS. 4A to 4C are examples of table structures for managing per-user content usage information separated from the content usage information illustrated in FIG. 2;

FIG. 5 is an example of a user information management table structure that serves as the basis for classifying user attributes for collaborative filtering according to an embodiment of the present invention;

FIGS. 6A and 6B are examples of table structures for managing content usage information per user attribute, separated from the content usage information illustrated in FIG. 2 according to the user attribute classification criteria illustrated in FIG. 5;

FIG. 7 is an example of a table structure for managing the results of similarity analysis between specific content items for collaborative filtering according to an embodiment of the present invention;

FIG. 8 is a flowchart of a process for providing per-user recommendation data using a collaborative filtering method according to an embodiment of the present invention;

FIG. 9 is a flowchart of a per-user content preference calculation process according to an embodiment of the present invention;

FIG. 10 is a flowchart of a content similarity calculation process according to an embodiment of the present invention.

<Explanation of symbols for the main parts of the drawings>

100: collaborative filtering system
105: content providing server interface unit
110: content usage information storage unit
115: content usage information management DB
120: per-user content preference calculation unit
125: per-user content preference management DB
130: content similarity calculation unit
135: similarity data management DB
140: per-user recommendation data generation unit
145: per-user recommendation data management DB
150: per-user recommendation data providing unit

The present invention relates to a collaborative filtering system and method, and more particularly to a collaborative filtering system and method that improve recommendation accuracy by properly reflecting user attributes, content attributes, and the times at which users use content.

With the rapid growth of the Internet, users are offered a vast amount of information, which makes it increasingly difficult for them to find what they want. Even when the desired information can be found, the time needed to find it (that is, the information retrieval time) has increased. This has created a need for 'personalization services' that shorten retrieval time and give each user only the data that the user wants, that is, data of value to that user.

A 'personalization service' uses information that a user has provided implicitly or explicitly (for example, the user's personal details and preferences) to select only the data expected to be valuable to that user, and provides the selected data to the user. Hereinafter, providing data selected for each user in this way is referred to as 'recommending'.

'Collaborative filtering' is one kind of personalization service. It is a method that uses the similarity between users or between content items to decide which data to recommend to users. Collaborative filtering is broadly divided into 'user-to-user collaborative filtering', which determines recommendation data from the similarity between users; 'item-to-item collaborative filtering', which determines recommendation data from the similarity between content items; and collaborative filtering that combines the two. The characteristics of user-to-user and item-to-item collaborative filtering are as follows.

User-to-user collaborative filtering first uses information provided by users (for example, personal details and preferences) to classify users who show similar patterns into groups, and then determines the recommendation data for a particular user from the content purchase information or content usage information of the other users in the same group. In other words, it performs cross-recommendation among the users of a group. For example, in a recommendation system that uses user-to-user collaborative filtering, if 'user 1' and 'user 2' belong to 'group 1' and 'user 1' purchases 'content 2', the system includes 'content 2' in the recommendation data for 'user 2'.

Item-to-item collaborative filtering, by contrast, determines the recommendation data for a particular user from pairs of content items that the same user has used together. For example, in a recommendation system that uses item-to-item collaborative filtering, if 'user 1' purchased 'content 1' and 'content 2' together and 'user 2' then purchases 'content 1', the system includes 'content 2' in the recommendation data for 'user 2'.

These conventional collaborative filtering methods have the following problems.

First, conventional collaborative filtering does not take into account the preferences of users in a minority group, so its recommendation results are biased toward the preferences of the majority. For example, suppose that out of 100 users, 80 select 'content 1' and 'content 2' together and 20 select 'content 1' and 'content 3' together. Conventional collaborative filtering then concludes that the pair 'content 1' and 'content 2' is about four times as similar as the pair 'content 1' and 'content 3'. That is, similarity is calculated simply from the number of users, without considering user attributes (for example, gender, age, or region). Now suppose that the 80 users who selected 'content 1' and 'content 2' together consist of 70 women and 10 men, and that the 20 users who selected 'content 1' and 'content 3' together consist of 5 women and 15 men. Among men, those who selected 'content 1' and 'content 3' together (15) are 1.5 times as many as those who selected 'content 1' and 'content 2' together (10). Nevertheless, because the result is calculated from the total number of users, the conventional method recommends 'content 2' rather than 'content 3' to a male user who selected 'content 1'. This example shows that when the preferences of 100 users, 25 men and 75 women, are analyzed without regard to gender, the preferences of the men, who form the minority group, are not reflected, and the result is biased toward the preferences of the women, who form the majority.
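The arithmetic of this example can be checked with a short sketch. The counts are the ones given above; the dictionary layout and the function name `pair_counts` are illustrative assumptions, not part of the patent.

```python
from collections import Counter

# (gender, co-selected pair) -> number of users, taken from the example above.
co_selections = {
    ("female", ("content 1", "content 2")): 70,
    ("male", ("content 1", "content 2")): 10,
    ("female", ("content 1", "content 3")): 5,
    ("male", ("content 1", "content 3")): 15,
}

def pair_counts(data, gender=None):
    """Count users per content pair, optionally restricted to one gender."""
    counts = Counter()
    for (g, pair), n in data.items():
        if gender is None or g == gender:
            counts[pair] += n
    return counts

overall = pair_counts(co_selections)                  # 80 vs. 20
male_only = pair_counts(co_selections, gender="male")  # 10 vs. 15

# The pair with the largest count differs between the two views.
best_overall = overall.most_common(1)[0][0]
best_for_men = male_only.most_common(1)[0][0]
print(best_overall)  # ('content 1', 'content 2')
print(best_for_men)  # ('content 1', 'content 3')
```

Counting over all users favors 'content 2', while counting over men only favors 'content 3', which is the bias the paragraph describes.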

Second, when calculating per-user preferences and the similarity between content items, conventional collaborative filtering applies as-is the content usage information of users who access content at random rather than selectively according to their own interests (for example, shopping mall promoters, spam advertisers, and program developers). This leads to inaccurate results and lowers recommendation accuracy.

Third, users' content preferences change over time, and conventional collaborative filtering has difficulty reflecting this properly. Conventionally, preference is determined simply from the time at which each individual content item was used. If the period of usage history treated as valid is set long, accuracy falls; if the period is set short to prevent this, the content that can be recommended is limited.

Finally, even when content attributes from which similarity can be inferred (such as category, color, or music genre) are already known, conventional collaborative filtering refers only to users' content usage history when calculating content similarity. This makes it difficult to calculate similarity more accurately.

To solve the above problems, the present invention provides a collaborative filtering system and method that deliver accurate recommendation data.

The present invention also provides a collaborative filtering system and method that give accurate recommendation results to users in a minority group by taking the preferences of that group into account.

The present invention also provides a collaborative filtering system and method that calculate per-user preferences and the similarity between content items accurately by excluding user actions that lead to inaccurate results.

The present invention also provides a collaborative filtering system and method that calculate the similarity between content items accurately by using the time interval between uses of those items by the same user.

The present invention also provides a collaborative filtering system and method that calculate the similarity between content items accurately by using content attributes from which similarity can be inferred.

To achieve the above objects, the collaborative filtering method provided by the present invention includes: receiving users' content usage information from a content providing server and storing it; analyzing the content usage information to determine each user's actions on each content item and/or usage time information, and calculating each user's preference for each content item by applying weights according to those actions and/or that usage time information; analyzing the content usage information to determine content pairs that are frequently used together, and calculating the similarity of the content pairs by applying weights according to how each pair was used and the attributes of each item in the pair; and generating recommendation data for each user using the per-user preferences and the content pair similarities.
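The final step, generating recommendation data from per-user preferences and pair similarities, might look like the following minimal sketch. The text here does not state how the two values are combined, so the scoring rule (sum of preference times similarity over the items the user has already used) and the names `recommend`, `prefs`, and `sims` are assumptions made for illustration only.

```python
def recommend(user_prefs, pair_similarity, top_n=3):
    """Rank unused items by sum(preference * similarity) over used items.

    user_prefs: {content: preference score} for one user.
    pair_similarity: {(content_a, content_b): similarity}, unordered pairs.
    The multiplicative scoring rule is an assumption, not the patent's formula.
    """
    scores = {}
    for used, pref in user_prefs.items():
        for (a, b), sim in pair_similarity.items():
            if used == a:
                candidate = b
            elif used == b:
                candidate = a
            else:
                continue
            if candidate in user_prefs:
                continue  # skip items the user has already used
            scores[candidate] = scores.get(candidate, 0.0) + pref * sim
    # Highest score first; ties broken by name for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

prefs = {"content 1": 9.0, "content 2": 2.0}
sims = {
    ("content 1", "content 3"): 0.5,
    ("content 2", "content 4"): 1.0,
    ("content 1", "content 2"): 0.8,
}
result = recommend(prefs, sims)
print(result)  # [('content 3', 4.5), ('content 4', 2.0)]
```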

Here, the per-user preference calculation preferably includes: analyzing the content usage information to determine each user's actions on each content item; applying preset per-action preference weights to each of the determined actions to calculate a per-user action preference for the content item; determining usage time information for each content item and applying a time weight according to that information to calculate a per-user time preference for the content item; and calculating the per-user preference for the content item using the per-user action preference and the per-user time preference.

The per-user preference calculation preferably further includes filtering out abnormal user information according to a preset criterion for identifying abnormal users.
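The text says only that the criterion for abnormal users is preset; it does not say what the criterion is. The sketch below assumes one possible criterion, a cap on the number of usage events per user, purely for illustration. The function name and the threshold value are hypothetical.

```python
from collections import defaultdict

def filter_abnormal_users(records, max_events_per_user=100):
    """Drop every record of users whose event count exceeds the assumed cap.

    records: list of (user, content) tuples.
    max_events_per_user: hypothetical threshold standing in for the
    'preset criterion'; the patent text does not specify one.
    """
    events = defaultdict(int)
    for user, _content in records:
        events[user] += 1
    abnormal = {u for u, n in events.items() if n > max_events_per_user}
    return [(u, c) for u, c in records if u not in abnormal]

records = [("user 1", "content 1"), ("user 1", "content 2")]
# A crawler-like account that touches 500 items at random.
records += [("bot", f"content {i}") for i in range(500)]
kept = filter_abnormal_users(records)
print(len(kept))  # 2
```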

The per-user preference calculation preferably further includes separating the per-user preference results by user attribute (for example, gender, age, and region) and then calculating a preference for each user attribute.

The time weight applying step preferably applies a time weight that is inversely proportional to the interval between the most recent time the content item was used and the current time.

The similarity calculation preferably includes: analyzing the content usage information and, from two or more different content items used consecutively by the same user within a preset time interval, determining each selectable combination of two items to be a similar content pair; applying to the similar content pair a time weight that is inversely proportional to the interval between the times at which the same user used the two items; and applying an attribute weight according to the attributes of each item in the similar content pair.
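The three similarity steps above can be sketched as follows. How the time weight and the attribute weight are combined, and how contributions from different users are aggregated, are not stated at this point in the text; the sketch multiplies the two weights and sums over users as an assumption. The `1 / max(gap, 1)` form is one simple way to make the weight fall with the gap without dividing by zero.

```python
from collections import defaultdict
from itertools import combinations

def pair_similarity(events, max_gap, attribute_weight):
    """Score content pairs used by the same user within max_gap time units.

    events: list of (user, content, time) with numeric times.
    attribute_weight: callable (content_a, content_b) -> float.
    Multiplying the weights and summing over users is an assumption.
    """
    by_user = defaultdict(list)
    for user, content, t in events:
        by_user[user].append((t, content))

    scores = defaultdict(float)
    for uses in by_user.values():
        uses.sort()  # chronological order per user
        for (t1, c1), (t2, c2) in combinations(uses, 2):
            gap = t2 - t1
            if c1 == c2 or gap > max_gap:
                continue
            pair = tuple(sorted((c1, c2)))
            time_weight = 1.0 / max(gap, 1)  # shrinks as the gap grows
            scores[pair] += time_weight * attribute_weight(*pair)
    return dict(scores)

events = [
    ("user 1", "A", 0), ("user 1", "B", 2), ("user 1", "C", 100),
    ("user 2", "A", 10), ("user 2", "B", 14),
]
same_category = {("A", "B")}  # hypothetical attribute data

def weight(a, b):
    return 2.0 if (a, b) in same_category else 1.0

scores = pair_similarity(events, max_gap=10, attribute_weight=weight)
print(scores)  # {('A', 'B'): 1.5}
```

Here 'C' forms no pair because it was used far outside the window, and the pair ('A', 'B') collects 1.0 from user 1 (gap 2) and 0.5 from user 2 (gap 4).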

The similarity calculation preferably further includes filtering out a content pair when the number of users who used the pair together is less than or equal to a preset minority-user threshold.
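A minimal sketch of this filter, assuming the set of users who co-used each pair is already available. The names `filter_minority_pairs` and `pair_users` are illustrative.

```python
def filter_minority_pairs(pair_users, threshold):
    """Keep only pairs co-used by more than `threshold` distinct users.

    pair_users: {(content_a, content_b): set of user ids}.
    Pairs whose user count is less than or equal to the threshold are dropped.
    """
    return {
        pair: users
        for pair, users in pair_users.items()
        if len(users) > threshold
    }

pair_users = {
    ("A", "B"): {"user 1", "user 2", "user 3"},
    ("A", "C"): {"user 4"},
}
kept_pairs = filter_minority_pairs(pair_users, threshold=1)
print(sorted(kept_pairs))  # [('A', 'B')]
```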

In the step of determining similar content pairs, when the number of times two or more different content items were used consecutively by the same user within the preset time interval is greater than or equal to a preset count, each selectable combination of two of those items is preferably determined to be a similar content pair.

The time weight applying step preferably includes: determining the interval between the times at which the same user used the items of the similar content pair; selecting, from time weights preset for successive time intervals so as to be inversely proportional to those intervals, the time weight that corresponds to the determined interval; and applying the selected time weight.
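Selecting a preset weight by interval amounts to a table lookup. The sketch below uses the standard-library `bisect` module for the lookup; the bucket boundaries and weight values are invented for illustration and are not taken from the patent.

```python
import bisect

# Upper bounds of the interval buckets, in minutes, and the weight for each
# bucket. Both lists are illustrative; weights fall as the interval grows.
BOUNDS = [10, 60, 1440]
WEIGHTS = [1.0, 0.5, 0.2, 0.1]  # last weight applies beyond the final bound

def time_weight(gap_minutes):
    """Return the preset weight for the bucket containing gap_minutes."""
    return WEIGHTS[bisect.bisect_left(BOUNDS, gap_minutes)]

print(time_weight(5))     # 1.0  (up to 10 minutes)
print(time_weight(60))    # 0.5  (over 10, up to 60 minutes)
print(time_weight(2000))  # 0.1  (more than one day)
```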

The attribute weight applying step preferably applies an attribute weight that is proportional to the similarity of the categories to which the content items belong.
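The text does not define how category similarity is measured. One possible reading, assumed here only for illustration, treats each category as a path in a hierarchy and measures similarity as the fraction of leading levels two paths share. The function names and the `scale` parameter are hypothetical.

```python
def category_similarity(path_a, path_b):
    """Fraction of leading category levels two paths share (0.0 to 1.0)."""
    if not path_a or not path_b:
        return 0.0
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return shared / max(len(path_a), len(path_b))

def attribute_weight(path_a, path_b, scale=1.0):
    """Weight proportional to category similarity; `scale` is an assumed constant."""
    return scale * category_similarity(path_a, path_b)

jazz = ("music", "jazz")
rock = ("music", "rock")
novel = ("books", "novel")
print(attribute_weight(jazz, rock))   # 0.5
print(attribute_weight(jazz, novel))  # 0.0
```

With a strictly proportional weight, a pair with no shared category gets weight zero; whether such pairs should be dropped entirely is a design choice the text leaves open.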

Meanwhile, to achieve the above objects, the collaborative filtering system provided by the present invention includes: a content providing server interface unit that receives users' content usage information from a content providing server, which provides a content service to users, through an interface with that server; a content usage information management database unit that stores and manages the users' content usage information; a per-user content preference calculation unit that analyzes the content usage information stored in the content usage information management database unit and calculates per-user content preferences to which weights are applied according to each user's actions on each content item and/or usage time information; a content similarity calculation unit that analyzes the content usage information stored in the content usage information management database unit to determine content pairs that are frequently used together, and calculates the similarity of the content pairs by applying weights according to how each pair was used and the attributes of each item in the pair; and a per-user recommendation data generation unit that generates recommendation data for each user using the per-user content preference results of the preference calculation unit and the content pair similarity results of the content similarity calculation unit.

Here, the per-user content preference calculation unit preferably analyzes the content usage information to determine each user's actions on each content item, applies preset per-action preference weights to each of the determined actions to calculate a per-user action preference, and then calculates a content preference that includes that action preference.

The per-user content preference calculation unit preferably analyzes the content usage information to determine usage time information for each content item, applies a time weight according to that information to calculate a per-user time preference for the content item, and then calculates a content preference that includes that time preference.

The per-user content preference calculation unit preferably filters out abnormal user information according to a preset criterion for identifying abnormal users and then calculates the per-user preference.

The per-user content preference calculation unit preferably separates the per-user preference results by user attribute and then calculates a preference for each user attribute.

The per-user content preference calculation unit preferably calculates the per-user content preference by applying a time weight that is inversely proportional to the interval between the most recent time the content item was used and the current time.

The content similarity calculation unit preferably analyzes the content usage information and, from two or more different content items used consecutively by the same user within a preset time interval, determines each selectable combination of two items to be a similar content pair, and then calculates a similarity to which a time weight is applied that is inversely proportional to the interval between the times at which the same user used the two items.

When the number of times two or more different content items were used consecutively by the same user within the preset time interval is greater than or equal to a preset count, the content similarity calculation unit preferably determines each selectable combination of two of those items to be a similar content pair.

The content similarity calculation unit preferably calculates a similarity to which an attribute weight is applied according to the attributes of each item in the similar content pair.

The content similarity calculation unit preferably filters out a similar content pair when the number of users who used the pair together is less than or equal to a preset minority-user threshold.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Note that the same components are denoted by the same reference numerals wherever possible throughout the drawings. Detailed descriptions of well-known functions and configurations that could unnecessarily obscure the subject matter of the present invention are omitted.

FIG. 1 is a schematic block diagram of a collaborative filtering system according to an embodiment of the present invention. Referring to FIG. 1, the collaborative filtering system 100 according to an embodiment of the present invention includes a content providing server interface unit 105, a content usage information storage unit 110, a content usage information management database (DB) 115, a per-user content preference calculation unit 120, a per-user content preference management DB 125, a content similarity calculation unit 130, a similarity data management DB 135, a per-user recommendation data generation unit 140, a per-user recommendation data management DB 145, and a per-user recommendation data providing unit 150.

The content providing server interface unit 105 interfaces with a content providing server (not shown) that provides a content service to users. For example, the content providing server interface unit 105 receives users' content usage information from the content providing server and passes it to the content usage information storage unit 110, and it receives per-user recommendation data from the per-user recommendation data providing unit 150 and passes it to the content providing server. The content providing server interface unit 105 preferably receives the content usage information and provides the per-user recommendation data at fixed time intervals.

The content usage information storage unit 110 stores and manages users' content usage information. For example, it stores the content usage information received from the content providing server interface unit 105 in the content usage information management DB 115.

The content usage information management DB 115 stores and manages the users' content usage information delivered through the content usage information storage unit 110. The content usage information management DB 115 preferably stores users' actions on content as a three-dimensional array divided by action. FIG. 2 illustrates an example of a table structure for managing the content usage information in the content usage information management DB 115.

The example of FIG. 2 illustrates a table structure that stores the content usage information together with the weighted preference results for each content usage information entry. In particular, FIG. 2 shows users' actions on content sorted in chronological order. Referring to FIG. 2, the content usage information includes the time at which a content item was used, information on the user who used it, and information on the content item that was used, as well as information on the user's actions on that item (for example, the number of actions, the action types, and the per-action weights). The fields not described here, namely the time preference, per-action weight, action preference, and preference score, are results calculated by the per-user content preference calculation unit 120 with reference to the content usage information; they are described in more detail together with the operation of the per-user content preference calculation unit 120.
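As a rough picture of one row of such a table, the sketch below models only the stored fields the paragraph names (usage time, user, content, and actions) and sorts rows chronologically. The class name, field names, and sample timestamps are assumptions; the actual column layout is whatever FIG. 2 shows.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class UsageRecord:
    """One content usage entry; field names are illustrative."""
    used_at: datetime   # time the content item was used
    user: str           # user who used it
    content: str        # content item that was used
    actions: tuple      # action types performed, e.g. ("read", "purchase")

records = [
    UsageRecord(datetime(2006, 11, 2, 9, 0), "user 3", "content 1", ("read", "purchase")),
    UsageRecord(datetime(2006, 11, 1, 8, 30), "user 1", "content 1", ("read", "purchase", "review")),
]

# Sort chronologically, as the table in FIG. 2 is described to be.
records_by_time = sorted(records, key=lambda r: r.used_at)
print(records_by_time[0].user)  # user 1
```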

다시 도 1을 참조하면, 사용자별 컨텐츠 선호도 산출부(120)는 도 2에 예시된 바와 같은 컨텐츠 이용 정보(예컨대, 컨텐츠 이용시간, 그 사용자 및 이용 대상이 된 컨텐츠들에 대한 정보)를 이용하여 각 컨텐츠에 대한 사용자별 선호도를 산출한다. 예를 들어, 사용자별 컨텐츠 선호도 산출부(120)는 컨텐츠 이용 정보 관리 DB(115)에 저장된 사용자들의 컨텐츠 이용 정보를 분석한 후, 그 결과에 따라 가중 치가 적용된 사용자별 컨텐츠 선호도를 산출한다. 특히 사용자별 컨텐츠 선호도 산출부(120)는 각 컨텐츠에 대한 사용자의 행위를 고려한 행위 선호도 및 컨텐츠 이용 시간 정보를 고려한 시간 선호도를 각각 산출한 후 그 두 값을 함께 고려한 사용자 선호도를 산출하는 것이 바람직하다.Referring back to FIG. 1, the content preference calculator 120 for each user uses content usage information (eg, content usage time, information on the user, and contents targeted for use) as illustrated in FIG. 2. The user preference for each content is calculated. For example, the content preference calculator 120 for each user analyzes content usage information of users stored in the content usage information management DB 115, and then calculates content preference for each user to which the weighted value is applied. In particular, the content preference calculator 120 for each user calculates a user's preference for each content and a time preference in consideration of content usage time information, and then calculates a user's preference considering both values. .

To this end, the per-user content preference calculator 120 analyzes the content usage information to determine the user's behaviors on each content, and applies a preset per-behavior preference weight to each determined behavior to calculate the per-user behavior preference. FIG. 2 shows an example in which the behavior preference is calculated for a user's behaviors on a specific content when the weights for the behavior types 'read', 'purchase', and 'write review' are set to '1', '5', and '3', respectively. In the example of FIG. 2, the content usage information registered in the first item is a case where 'User 1' performed all of 'read', 'purchase', and 'write review' on 'Content 1', so the sum of the corresponding weights, '9', was calculated as the behavior preference. The content usage information registered in the second item is a case where 'User 3' performed 'read' and 'purchase' on 'Content 1', so the sum of the corresponding weights, '6', was calculated as the behavior preference.
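The behavior preference calculation above can be sketched as a weighted sum. This is a minimal illustration, not the patented implementation: the weights come from the FIG. 2 example, while the function name, the dictionary keys, and the choice to count each behavior type once per user and content are assumptions.

```python
# Per-behavior weights from the FIG. 2 example; the key names are illustrative.
BEHAVIOR_WEIGHTS = {"read": 1, "purchase": 5, "write_review": 3}


def behavior_preference(behaviors, weights=BEHAVIOR_WEIGHTS):
    """Sum the preset weight of each distinct behavior type performed on one content.

    Assumption: a behavior type repeated by the same user counts only once.
    """
    return sum(weights[b] for b in set(behaviors))


# 'User 1' on 'Content 1': read + purchase + write review
user1_content1 = behavior_preference(["read", "purchase", "write_review"])
# 'User 3' on 'Content 1': read + purchase
user3_content1 = behavior_preference(["read", "purchase"])
print(user1_content1, user3_content1)  # 9 6
```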

In addition, the per-user content preference calculator 120 analyzes the content usage information to determine the usage time information of each content, and applies a time weight according to that usage time information to calculate the per-user time preference for the content. The usage time information is preferably 'the interval between the most recent usage time of a specific content and the current time'. That is, as illustrated in FIG. 3, the per-user content preference calculator 120 preferably calculates the per-user time preference by applying a time weight that is inversely proportional to 'the interval between the most recent usage time of a specific content and the current time'. The purpose is to give a higher weight to recently used content.

The behavior preference and the time preference are then applied together to calculate the user's preference for the content. FIG. 2 shows an example in which the sum of the behavior preference and the time preference is calculated as the preference score.
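As a rough sketch of the time preference and the preference score: the description only states that the time weight is inversely proportional to the interval since the most recent use and that FIG. 2 sums the two preferences. The exact form `scale / (1 + gap_days)`, the day unit, and the function names below are assumptions chosen to avoid division by zero.

```python
from datetime import datetime


def time_preference(last_used, now, scale=1.0):
    """Time weight that shrinks as the gap since the most recent use grows.

    Assumed form: scale / (1 + gap in days).
    """
    gap_days = max((now - last_used).total_seconds(), 0.0) / 86400.0
    return scale / (1.0 + gap_days)


def preference_score(behavior_pref, time_pref):
    """Preference score as the sum of behavior and time preference (as in FIG. 2)."""
    return behavior_pref + time_pref


now = datetime(2006, 11, 13, 12, 0, 0)
recent = time_preference(datetime(2006, 11, 12, 12, 0, 0), now)  # used 1 day ago
old = time_preference(datetime(2006, 10, 14, 12, 0, 0), now)     # used 30 days ago
score = preference_score(9, recent)
```

With these assumed parameters, content used one day ago receives a larger time preference than content used thirty days ago, which matches the stated goal of favoring recent use.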

Meanwhile, the per-user content preference calculator 120 presets criteria for identifying abnormal users and filters out the content usage information of users who meet those criteria. That is, the content usage information of a user determined to be abnormal is not considered when calculating per-user content preferences. This prevents inaccurate results in calculating the associations between contents when some users of the service access contents at random rather than selectively according to their own preferences, for example by posting advertisements to an unspecified large number of people as spam advertisers do.

To this end, the per-user content preference calculator 120 preferably presets criteria for identifying abnormal users based on user attributes or behavior patterns and filters out the content usage information of users who meet those criteria. Here, the per-user content preference calculator 120 determines a user to be abnormal if the user's occupation is the owner of an online or offline shop that sells the content or a program developer who develops the content, or if the user's nationality is a country other than the countries in which the content is sold. In addition, by analyzing users' behavior patterns, it determines to be abnormal a user who accesses the same content at least the maximum access count within the minimum access time interval, or a user who performs the same behavior at least the maximum execution count within the minimum execution time interval. In particular, the user is preferably determined to be abnormal when the target of the same behavior is an unspecified large number of users.

In the example of FIG. 2, if the interval 'yy/mm/dd hh:mm:ss 6_1 ~ yy/mm/dd hh:mm:ss 6_3' is within the preset minimum access time interval and the maximum access count is 3, the content usage information in the sixth through eighth items of the table of FIG. 2 corresponds to the behavior pattern of 'accessing the same content at least the maximum access count within the minimum access time interval'. Accordingly, the per-user content preference calculator 120 determines these items to be filtering targets.
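The repeated-access criterion can be sketched with a sliding window over each user's sorted access times. The record layout, the names `user_x` and `content_y`, and the concrete interval are hypothetical; only the rule itself (at least the maximum access count within the minimum access time interval) comes from the description.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_abnormal_users(records, min_interval, max_count):
    """Return users who accessed one content at least max_count times within min_interval.

    records: iterable of (user, content, timestamp) tuples (hypothetical layout).
    """
    times = defaultdict(list)
    for user, content, ts in records:
        times[(user, content)].append(ts)

    abnormal = set()
    for (user, _content), stamps in times.items():
        stamps.sort()
        # Slide a window of max_count consecutive accesses over the sorted timestamps.
        for i in range(len(stamps) - max_count + 1):
            if stamps[i + max_count - 1] - stamps[i] <= min_interval:
                abnormal.add(user)
                break
    return abnormal


t0 = datetime(2006, 11, 13, 9, 0, 0)
records = [
    ("user_x", "content_y", t0),
    ("user_x", "content_y", t0 + timedelta(seconds=5)),
    ("user_x", "content_y", t0 + timedelta(seconds=10)),
    ("user_1", "content_1", t0),
    ("user_1", "content_1", t0 + timedelta(days=2)),
]
abnormal = find_abnormal_users(records, timedelta(minutes=1), max_count=3)
# Usage records of abnormal users are excluded from the preference calculation.
normal_records = [r for r in records if r[0] not in abnormal]
```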

In addition, to address the problem that the characteristics of users belonging to minority groups are not reflected, the per-user content preference calculator 120 preferably separates the per-user preference calculation results by user attribute and then calculates preferences per user attribute. For example, the per-user preference calculation results are preferably separated by a combination of at least one user attribute among gender, age, and region.

To this end, the per-user content preference calculator 120 first splits the content usage information management table showing per-user content preferences, as illustrated in FIG. 2, into per-user tables as illustrated in FIGS. 4A to 4C. That is, FIG. 4A shows the content usage information and preference information of 'User 1' separated from the table of FIG. 2, FIG. 4B shows those of 'User 3', and FIG. 4C shows those of 'User 4'.

Once the per-user content usage information and preference information have been separated in this way, the content preference calculator 120 determines the users' attributes from a user information management table such as that illustrated in FIG. 5, and then merges the content usage information and preference information tables of users who share a given user attribute, thereby generating content usage information and preference information management tables per user attribute.
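The split-then-merge step can be sketched as two grouping passes. The row layout, the attribute keys, and all sample values below are made up for illustration; the assignment of users to regions and genders is not taken from the figures.

```python
from collections import defaultdict


def split_by_user(rows):
    """Split the combined usage/preference table into one table per user."""
    tables = defaultdict(list)
    for row in rows:
        tables[row["user"]].append(row)
    return dict(tables)


def merge_by_attributes(user_tables, user_info, keys=("region", "gender")):
    """Merge per-user tables for users who share the same values for the given attributes."""
    merged = defaultdict(list)
    for user, table in user_tables.items():
        group = tuple(user_info[user][k] for k in keys)
        merged[group].extend(table)
    return dict(merged)


# Hypothetical rows and user attributes.
rows = [
    {"user": "user1", "content": "content1", "score": 9.5},
    {"user": "user3", "content": "content1", "score": 6.2},
    {"user": "user4", "content": "content2", "score": 1.1},
]
user_info = {
    "user1": {"region": "Seoul/Gyeonggi", "gender": "F"},
    "user3": {"region": "Seoul/Gyeonggi", "gender": "F"},
    "user4": {"region": "Chungcheong", "gender": "M"},
}
per_user = split_by_user(rows)
per_attribute = merge_by_attributes(per_user, user_info)
```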

FIGS. 6A and 6B show examples of the content usage information and preference information management tables per user attribute generated by the above process. That is, FIG. 6A illustrates a table managing the content preferences of women in the Seoul/Gyeonggi region, and FIG. 6B illustrates a table managing the content preferences of men in the Chungcheong region.

The per-user content preference results calculated by the per-user content preference calculator 120 in this way are stored and managed in the per-user content preference management DB 125.

The content similarity calculator 130 analyzes the users' content usage information stored in the content usage information management DB 115 to determine content pairs that are often used together, and calculates the similarity of those content pairs by applying weights according to the usage pattern of each pair and the attributes of the contents forming the pair.

To this end, the content similarity calculator 130 analyzes the content usage information and determines, as a similar content pair, each combination of two contents selectable from two or more different contents used consecutively by the same user within a preset time interval. It then calculates a similarity for the similar content pair by applying a time weight that is inversely proportional to the interval between the times at which the same user used the two contents. For example, suppose 10 users used 'Content A' and 'Content B' together on the same day 30 days ago, and 10 users used 'Content C' 10 days ago and 'Content D' 1 day ago. Although the second case is more recent, 'Content A' and 'Content B' were used on the same day, so the weights are assigned such that 'Content A' and 'Content B' are calculated to be more similar than 'Content C' and 'Content D'.
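The worked example above can be reproduced with a small sketch. Several details are assumptions: the weight form `1 / (1 + gap_days)`, summing the weights over users, using day numbers as timestamps, and treating any two distinct contents used within the window as a pair (a simplification of "used consecutively").

```python
from collections import defaultdict
from itertools import combinations


def time_weighted_pair_scores(usage, max_gap_days):
    """Accumulate a time-weighted co-usage score per content pair.

    usage: maps user -> list of (content, day_number) events (hypothetical layout).
    Each user contributes 1 / (1 + gap) for the closest uses of a pair, if within the window.
    """
    scores = defaultdict(float)
    for events in usage.values():
        best_gap = {}
        for (c1, d1), (c2, d2) in combinations(events, 2):
            if c1 == c2:
                continue
            gap = abs(d1 - d2)
            if gap > max_gap_days:
                continue
            pair = tuple(sorted((c1, c2)))
            best_gap[pair] = min(gap, best_gap.get(pair, gap))
        for pair, gap in best_gap.items():
            scores[pair] += 1.0 / (1.0 + gap)
    return dict(scores)


usage = {}
for i in range(10):
    usage[f"ab_user{i}"] = [("A", -30), ("B", -30)]  # same day, 30 days ago
    usage[f"cd_user{i}"] = [("C", -10), ("D", -1)]   # 9 days apart
scores = time_weighted_pair_scores(usage, max_gap_days=10)
```

Under these assumptions the pair (A, B) scores about ten times higher than (C, D), even though C and D were used more recently.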

Here, the content similarity calculator 130 preferably determines a combination of two contents selectable from two or more different contents as a similar content pair only when the number of times those contents were used consecutively by the same user within the preset time interval is at least a preset count. Meanwhile, the content similarity calculator 130 determines the similar content pairs by clustering, and, as is widely known, any algorithm used in general collaborative filtering, such as Pearson correlation, the popularity difference method, or cosine similarity, may be used as the clustering method. A detailed description of the clustering process for generating similar content pairs is therefore omitted from this specification.

In addition, the content similarity calculator 130 preferably calculates the similarity by applying an attribute weight according to the attributes of the contents forming the similar content pair. That is, the content similarity calculator 130 preferably applies an attribute weight proportional to the similarity of the categories to which the contents of the pair belong. For example, suppose 'Content A', 'Content B', 'Content C', and 'Content D' are works of art, and the raw similarity calculated from behavior records is '0.3' both between 'Content A' and 'Content B' and between 'Content C' and 'Content D'. If 'Content A' is a Western painting, 'Content B' is an Oriental painting, 'Content C' is a sculpture, and 'Content D' is video art, then 'Content A' and 'Content B' share the already known attribute of being paintings. The content similarity calculator 130 therefore weights the similarity score of 'Content A' and 'Content B' so that it comes out higher than that of 'Content C' and 'Content D'.
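A minimal sketch of the attribute weight follows. The description only requires a weight proportional to category similarity; the binary category similarity, the multiplicative form `base * (1 + alpha * category_similarity)`, the value of `alpha`, and the parent-category table are all assumptions for illustration.

```python
# Hypothetical parent categories for the four artworks in the example.
PARENT_CATEGORY = {"A": "painting", "B": "painting", "C": "sculpture", "D": "video art"}


def category_similarity(c1, c2, parents=PARENT_CATEGORY):
    """Assumed category similarity: 1.0 if both contents share a parent category, else 0.0."""
    return 1.0 if parents[c1] == parents[c2] else 0.0


def apply_attribute_weight(base_similarity, c1, c2, alpha=0.5):
    """Boost the behavior-based similarity in proportion to the category similarity."""
    return base_similarity * (1.0 + alpha * category_similarity(c1, c2))


sim_ab = apply_attribute_weight(0.3, "A", "B")  # both paintings, so boosted
sim_cd = apply_attribute_weight(0.3, "C", "D")  # different categories, so unchanged
```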

Meanwhile, the content similarity calculator 130 filters out a similar content pair when the number of users who used the pair together is less than or equal to a preset minority user threshold. This prevents a small number of users from intentionally distorting the similarity between contents, and it is one way of calculating content similarity more accurately. Here, the content similarity calculator 130 preferably compares the attributes of those minority users and, if at least a preset number of attributes are identical, passes the minority users' attributes and the filtered content pair information to the per-user content preference calculator so that they are managed as preference information of that user group. This is because even a small group of users needs to have its tendencies taken into account if the users share common attributes.
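The minority filter and the hand-off of shared-attribute groups can be sketched as follows. The data layout, the threshold values, and the function and variable names are hypothetical; the two rules (filter pairs with too few users, but keep them as group preferences when enough attributes match) follow the description.

```python
def filter_minority_pairs(pair_users, user_attrs, min_users, min_shared):
    """Drop pairs used by at most min_users users; keep those for attribute-sharing groups.

    Returns (kept_pairs, group_prefs), where group_prefs lists (shared_attributes, pair)
    for filtered pairs whose users share at least min_shared attribute values.
    """
    kept, group_prefs = {}, []
    for pair, users in pair_users.items():
        if len(users) > min_users:
            kept[pair] = users
            continue
        if not users:
            continue
        attrs = [user_attrs[u] for u in users]
        # Attributes whose value is identical for every user of this pair.
        shared = {k: v for k, v in attrs[0].items() if all(a.get(k) == v for a in attrs)}
        if len(shared) >= min_shared:
            group_prefs.append((shared, pair))
    return kept, group_prefs


pair_users = {
    ("A", "B"): ["u1", "u2", "u3", "u4"],
    ("C", "D"): ["u5", "u6"],
    ("E", "F"): ["u7", "u8"],
}
user_attrs = {
    "u5": {"gender": "F", "region": "Seoul"},
    "u6": {"gender": "F", "region": "Seoul"},
    "u7": {"gender": "M", "region": "Seoul"},
    "u8": {"gender": "F", "region": "Busan"},
}
kept, group_prefs = filter_minority_pairs(pair_users, user_attrs, min_users=2, min_shared=2)
```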

The similarity results for the content pairs calculated by the content similarity calculator 130 in this way are stored and managed in the similarity data management DB 135.

FIG. 7 is an example of a table for storing and managing such similarity data. Because the similarity data according to the present invention is, as described above, the general clustering result with a time weight and an attribute weight added, it can be managed as illustrated in FIG. 7.

The per-user recommendation data generator 140 generates per-user recommendation data using the per-user content preference results stored in the per-user content preference management DB 125 and the similarity data for content pairs stored in the similarity data management DB 135. Various known techniques may be used to generate the per-user recommendation data. The per-user recommendation data generator 140 stores the generated per-user recommendation data in the per-user recommendation data management DB 145.
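The description leaves the generation technique open, so the following is only one common item-based approach, shown as an assumption rather than as the method of the invention: each unseen content is scored by summing, over the contents the user already prefers, the preference multiplied by the pair similarity. All names and sample values are hypothetical.

```python
def recommend(user_prefs, similarity, top_n=3):
    """Rank contents the user has not used by sum(preference * pair similarity).

    user_prefs: maps content -> preference score for one user.
    similarity: maps (content_a, content_b) -> similarity of the pair.
    """
    scores = {}
    for item, pref in user_prefs.items():
        for (a, b), sim in similarity.items():
            if item == a:
                other = b
            elif item == b:
                other = a
            else:
                continue
            if other in user_prefs:
                continue  # skip contents the user already used
            scores[other] = scores.get(other, 0.0) + pref * sim
    # Highest score first; ties broken by content name for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]


user_prefs = {"content1": 9.0, "content2": 1.0}
similarity = {
    ("content1", "content3"): 0.5,
    ("content2", "content4"): 0.5,
    ("content1", "content2"): 0.9,
}
recommended = recommend(user_prefs, similarity)
```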

To provide per-user recommendation data to the corresponding user, the per-user recommendation data provider 150 passes the per-user recommendation data, which was generated by the per-user recommendation data generator 140 and stored in the per-user recommendation data management DB 145, to the content providing server interface 105.

FIG. 8 is a flowchart of a process for providing per-user recommendation data using a collaborative filtering method according to an embodiment of the present invention. That is, FIG. 8 shows an example of the collaborative filtering process performed by the collaborative filtering system illustrated in FIG. 1.

A collaborative filtering method according to an embodiment of the present invention is described below with reference to FIGS. 1 and 8.

The present invention calculates per-user preferences and content similarities according to the user's behavior information on each content and the usage patterns of the contents, and generates per-user recommendation data based on the results. Accordingly, the collaborative filtering method of the present invention must be preceded by a step (not shown) of receiving the users' content usage information from the content providing server and storing it.

Once the content usage information is stored, the collaborative filtering system 100 analyzes it to determine the user's behavior and/or usage time information for each content, and calculates the per-user preference for each content by applying weights according to that behavior and/or usage time information (S100). It also analyzes the content usage information to determine content pairs that are often used together, and calculates the similarity of the content pairs by applying weights according to the usage pattern of each pair and the attributes of the contents forming the pair (S200).

Once the per-user preferences and the content pair similarities have been calculated as described above, the collaborative filtering system 100 uses them to generate per-user recommendation data (S300). When the condition for providing content recommendation data to a user is satisfied, the system provides the content recommendation data to that user (S400). The condition for providing content recommendation data is preferably that the user has accessed the content providing server.

FIG. 9 is a flowchart of a process for calculating per-user content preferences according to an embodiment of the present invention. Referring to FIG. 9, the content preference calculation process (S100) illustrated in FIG. 8 is as follows.

First, the collaborative filtering system 100 illustrated in FIG. 1 analyzes the previously stored content usage information and stores the user's behavior information for each content (S110). That is, it determines the user's behavior information for each content and stores the result. It then calculates a behavior preference to which a behavior weight according to the behavior type is applied (S120). To this end, the collaborative filtering system 100 preferably applies a preset per-behavior preference weight to each determined user behavior to calculate the per-user behavior preference for the content.

The per-content user preference calculation process of the present invention also calculates a time preference to which a time weight according to the content usage time is applied (S130). To this end, the collaborative filtering system 100 of the present invention preferably determines the usage time information of each content and applies a time weight according to that usage time information to calculate the per-user time preference for the content.

The per-user preference for the content is then calculated using the per-user behavior preference and the per-user time preference.

Meanwhile, having calculated the per-user preferences in this way, the collaborative filtering system 100 of the present invention filters out abnormal user information (S140) and then analyzes content preferences per user attribute (S150). The abnormal user information filtering and the per-attribute content preference analysis are similar to what was described for the operation of the per-user content preference calculator 120 with reference to FIG. 1, so a detailed description is omitted.

This per-user content preference calculation process of the present invention has the advantage of producing more accurate recommendation data, because it ignores the content usage history of abnormal users while taking the content usage tendencies of minority groups into account.

FIG. 10 is a flowchart of a content similarity calculation process according to an embodiment of the present invention. Referring to FIG. 10, the content pair similarity calculation process (S200) illustrated in FIG. 8 is as follows.

First, the collaborative filtering system 100 illustrated in FIG. 1 analyzes the previously stored content usage information and determines, as a similar content pair, each combination of two contents selectable from two or more different contents used consecutively by the same user within a preset time interval (S210).

A time weight inversely proportional to the interval between the times of use is then assigned to the clustered content pair (S220), and an attribute weight according to the pair's attributes is assigned to the clustered content pair (S230), thereby calculating the similarity between the contents forming the similar content pair. The clustering step (S210) and the time and attribute weighting steps (S220, S230) are similar to what was described for the operation of the content similarity calculator 130 with reference to FIG. 1, so a detailed description is omitted.

Having calculated the similarity as described above, the collaborative filtering system 100 of the present invention filters out a content pair classified as a similar content pair when the number of users who used the pair together is less than or equal to the preset minority user threshold (S240). This prevents a small number of users from intentionally distorting the similarity between contents.

Meanwhile, the collaborative filtering system 100 of the present invention preferably compares the attributes of those minority users and, if at least a preset number of attributes are identical, passes the minority users' attributes and the filtered content pair information to the per-user content preference calculator so that they are managed as preference information of that user group. This is because even a small group of users needs to have its tendencies taken into account if the users share common attributes.

The detailed filtering procedure is similar to what was described for the operation of the content similarity calculator 130 with reference to FIG. 1, so a detailed description is omitted.

Specific preferred embodiments of the present invention have been illustrated and described above. However, the present invention is not limited to the embodiments described above, and anyone with ordinary skill in the art to which the invention pertains may make various modifications without departing from the gist of the invention as claimed in the appended claims.

As described above, the present invention generates per-user recommendation data using per-user content preferences and inter-content similarities, and reflects the user's attributes, the content's attributes, and the time at which the user used the content when calculating those preferences and similarities. That is, the present invention applies weights using the user's content usage information and user attribute information, and filters out the behavior of users judged not to be typical users. This improves the accuracy of recommendations. In particular, accurate recommendation results can be provided even to users in minority groups. It also broadens the set of users who receive recommendations.

Claims

A collaborative filtering method comprising:

receiving and storing content usage information of users from a content providing server;

analyzing the content usage information to determine the user's behavior and/or usage time information for each content, and calculating a per-user preference for each content by applying a weight according to the user's behavior and/or usage time information;

analyzing the content usage information to determine content pairs that are often used together, and calculating the similarity of the content pairs by applying weights according to the usage pattern of each content pair and the attributes of the contents forming the pair; and

generating recommendation data for each user by using the per-user preference and the similarity of the content pairs.

The method of claim 1, wherein the calculating of the per-user preference comprises:

analyzing the content usage information to determine the user's behaviors on each content;

calculating a per-user behavior preference for the content by applying a preset per-behavior preference weight to each of the determined behaviors of the user;

determining usage time information of each of the contents and calculating a per-user time preference by applying a time weight according to the usage time information; and

calculating the per-user preference for the content by using the per-user behavior preference and the per-user time preference.

The method of claim 2, wherein the calculating of the per-user preference further comprises:

filtering out abnormal user information based on a preset abnormal user determination criterion.

The method of claim 3, wherein the filtering of abnormal user information comprises:

determining a user who satisfies a preset abnormal user attribute to be an abnormal user.

The method of claim 4, wherein the abnormal user attribute is

that the user's occupation is one that performs at least one of creating, developing, processing, selling, and advertising the content.

The method of claim 4, wherein the abnormal user attribute is

that the nationality of the user is a country other than the countries in which the content is sold.

The method of claim 4, wherein the abnormal user attribute is

that the user accesses the same content at least a maximum access count within a minimum access time interval, or performs the same behavior at least a maximum execution count within a minimum execution time interval.

The method of claim 7, wherein the abnormal user attribute is

that the target of the same behavior is an unspecified large number of users.

And separating the preference calculation result for each user using a user attribute and calculating a preference for each user attribute.

The method of claim 9, wherein the calculating of the content preference for each user attribute comprises

separating the preference calculation results for each user based on a combination of at least one user attribute among gender, age, and region.
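
Grouping per-user preferences by an attribute combination might look like the following sketch. The data layout (dictionaries keyed by user id) and the choice to sum preferences within a segment are assumptions; the claims do not say how the separated results are aggregated.

```python
from collections import defaultdict


def preference_by_segment(user_prefs, profiles, keys=("gender", "age")):
    """Aggregate per-user content preferences by a combination of attributes.

    user_prefs: {user_id: {content_id: preference}}
    profiles:   {user_id: {"gender": ..., "age": ..., "region": ...}}
    """
    totals = defaultdict(lambda: defaultdict(float))
    for user, prefs in user_prefs.items():
        segment = tuple(profiles[user][k] for k in keys)
        for content, value in prefs.items():
            totals[segment][content] += value  # assumed aggregation: sum
    return {segment: dict(contents) for segment, contents in totals.items()}
```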

The method of claim 2 or 9, wherein the time weight is

a time weight that is inversely proportional to the interval between the most recent use time of the corresponding content and the current time.
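
A recency weight of this kind can be sketched as below. The `1 / (1 + elapsed)` form and the `scale_days` parameter are assumptions: a strictly inverse-proportional weight would be undefined at an interval of zero, so the sketch adds one to the denominator to keep the value bounded.

```python
from datetime import datetime


def recency_weight(last_used, now, scale_days=1.0):
    """Weight that decreases as the time since the most recent use grows."""
    elapsed_days = max((now - last_used).total_seconds() / 86400.0, 0.0)
    # Assumed form: 1 / (1 + elapsed / scale); equals 1.0 for a use right now.
    return 1.0 / (1.0 + elapsed_days / scale_days)
```

For example, a content last used exactly one day ago receives a weight of 0.5 with the default scale, and a content used a week ago receives 0.125.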

The method of claim 1, wherein the similarity calculation process comprises

analyzing the content usage information to determine, as a similar content pair, a combination of two contents selectable from among two or more different contents used consecutively by the same user within a preset time interval;

applying a time weight inversely proportional to the interval between the times at which the same user used the contents of the similar content pair; and

applying an attribute weight according to an attribute of each of the contents constituting the similar content pair.
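
The three steps above can be sketched as follows. The event tuple layout, the `1 / (1 + gap)` time weight, and the pluggable `attribute_weight` callback are assumptions for illustration. The sketch also treats any two uses by the same user within the window as a pair, which is a simplification of "consecutively used".

```python
from collections import defaultdict
from itertools import combinations


def pair_similarity(events, window, attribute_weight=lambda a, b: 1.0):
    """Accumulate a similarity score for each pair of co-used contents.

    events: iterable of (user_id, content_id, timestamp_seconds).
    window: maximum gap in seconds for two uses to count as a pair.
    """
    by_user = defaultdict(list)
    for user, content, ts in events:
        by_user[user].append((ts, content))

    scores = defaultdict(float)
    for uses in by_user.values():
        uses.sort()  # chronological order, so gap below is never negative
        for (t1, c1), (t2, c2) in combinations(uses, 2):
            gap = t2 - t1
            if c1 == c2 or gap > window:
                continue
            pair = tuple(sorted((c1, c2)))
            time_weight = 1.0 / (1.0 + gap)  # assumed inverse form
            scores[pair] += time_weight * attribute_weight(*pair)
    return dict(scores)
```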

The method of claim 12, wherein the similarity calculation process further comprises

filtering out a content pair when the number of users who used the content pair is less than or equal to a predetermined minority user determination value.

The method of claim 13, wherein the similarity calculation process further comprises

comparing the attributes of the minority users and, when the number of matching attributes is equal to or greater than a preset number, managing the filtered content pairs as preference information of the corresponding user group.
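
Minority filtering and the follow-up attribute comparison might be sketched like this. The function names, the dictionary layout, and the interpretation of "attributes equal to or more than a preset number" as a count of attribute values shared by every minority user are assumptions.

```python
def split_minority_pairs(pair_users, min_users):
    """Separate content pairs used by at most min_users distinct users.

    pair_users: {content_pair: set of user ids who used the pair}
    Returns (kept, filtered).
    """
    kept, filtered = {}, {}
    for pair, users in pair_users.items():
        target = filtered if len(users) <= min_users else kept
        target[pair] = users
    return kept, filtered


def shared_attributes(users, profiles):
    """Return the (attribute, value) items common to every given user."""
    items = [set(profiles[u].items()) for u in users]
    return set.intersection(*items) if items else set()
```

A filtered pair whose users share at least the preset number of attributes could then be stored as a preference of that user group rather than discarded.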

The method of claim 12 or 13, wherein determining the similar content pair comprises

when two or more different contents are used consecutively by the same user within a preset time interval, determining a combination of two contents selectable from among those contents as a similar content pair.

The method of claim 12 or 13, wherein the time weighting step comprises

determining the interval between the times at which the same user used the contents of the similar content pair;

selecting, from among time weights preset for predetermined time intervals in inverse proportion to those intervals, the time weight corresponding to the determined interval; and

applying the selected time weight.
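
Selecting a preset weight for the interval bucket that contains the measured gap can be sketched with a sorted list of bounds. The bucket boundaries and weight values below are hypothetical; the claims say only that the preset weights decrease as the interval grows.

```python
from bisect import bisect_left

# Hypothetical upper bounds (seconds) of each interval bucket.
INTERVAL_BOUNDS = [60, 600, 3600]
# One weight per bucket, plus a final weight for gaps beyond the last bound.
TIME_WEIGHTS = [1.0, 0.5, 0.25, 0.0]


def select_time_weight(gap_seconds):
    """Look up the preset weight for the bucket containing gap_seconds."""
    return TIME_WEIGHTS[bisect_left(INTERVAL_BOUNDS, gap_seconds)]
```

A table lookup like this avoids computing a continuous function per pair and makes the weights easy to tune.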

The method of claim 12 or 13, wherein the attribute weighting step comprises

applying an attribute weight proportional to the similarity of the categories to which the contents belong.
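
One way to make an attribute weight proportional to category similarity is to compare category paths. The representation of a category as a list of path segments and the shared-prefix measure are assumptions; the claims do not define how category similarity is measured.

```python
def category_similarity(path_a, path_b):
    """Fraction of the longer category path covered by the shared prefix."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    depth = max(len(path_a), len(path_b))
    return shared / depth if depth else 0.0


def attribute_weight(path_a, path_b, scale=1.0):
    """Attribute weight proportional to category similarity."""
    return scale * category_similarity(path_a, path_b)
```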

In a collaborative filtering system,

A content providing server interface unit configured to interface with a content providing server that provides content services to users and to receive content usage information of the users from the content providing server;

A content usage information management database unit which stores and manages the content usage information of the users;

A content preference calculator for each user, which analyzes the content usage information of users stored in the content usage information management database unit and calculates a content preference for each user to which a weight is applied according to the user's behavior and/or usage time information for each content;

A content similarity calculation unit configured to analyze the content usage information of users stored in the content usage information management database unit to determine content pairs that are frequently used together, and to calculate the similarity of each content pair by applying weights according to the usage form of the content pair and the attributes of each of the paired contents; and

a user-specific recommendation data generator configured to generate recommendation data for each user by using the content preference calculation result of the content preference calculator for each user and the content pair similarity calculation result of the content similarity calculation unit.
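
How the recommendation data generator might combine the two calculation results can be sketched as an item-based scoring step. The scoring rule (sum of preference times similarity over the contents a user already used) is a common item-based collaborative filtering approach assumed here for illustration; the claims state only that both results are used.

```python
def recommend(preferences, similarities, top_n=3):
    """Rank unused contents for one user.

    preferences:  {content_id: preference} for contents the user has used.
    similarities: {(content_a, content_b): similarity} for similar pairs.
    """
    scores = {}
    for (a, b), sim in similarities.items():
        # A pair contributes in whichever direction leads to an unused content.
        for used, candidate in ((a, b), (b, a)):
            if used in preferences and candidate not in preferences:
                scores[candidate] = scores.get(candidate, 0.0) + preferences[used] * sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [content for content, _ in ranked[:top_n]]
```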

The collaborative filtering system of claim 18, further comprising

A content preference management database unit for storing the content preference calculation result for each user;

A similarity management database unit for storing similarity calculation results of the content pairs; and

a user-specific recommendation data management database unit for storing the user-specific recommendation data.

The collaborative filtering system of claim 18 or 19,

further comprising a user-specific recommendation data providing unit that transmits the user-specific recommendation data generated by the user-specific recommendation data generator to the content providing server through the interface, so that the data is provided to the corresponding user.

The collaborative filtering system of claim 18, wherein the content preference calculator for each user

analyzes the content usage information to determine the user's behavior for each content, calculates a behavior preference for each user by applying a preset preference weight for each behavior to each of the determined behaviors, and calculates the content preference including the behavior preference for each user.

Analyzes the content usage information to determine usage time information for each of the contents, calculates a time preference for each user by applying a time weight according to the usage time information, and calculates the content preference including the time preference for each user.

Filters abnormal user information based on a preset abnormal user determination criterion when calculating the preference for each user.

The collaborative filtering system of claim 23, wherein the content preference calculator for each user

determines a user to be an abnormal user when the user's occupation involves at least one of creating, developing, processing, selling, or advertising the content.

Determines a user to be an abnormal user when the user's nationality is a country other than the country in which the content is sold.

Determines to be an abnormal user a user who accesses the same content more than a maximum number of times within a minimum access time interval, or who performs the same action more than a maximum number of times within a minimum execution time interval.

The collaborative filtering system of claim 26, wherein the content preference calculator for each user

determines the user to be an abnormal user when the same action is directed at an unspecified number of users.

The collaborative filtering system of claim 28, wherein the content preference calculator for each user

separates the preference calculation results for each user based on a combination of at least one user attribute among gender, age, and region.

Calculates the content preference for each user by applying a time weight inversely proportional to the interval between the most recent use time of the content and the current time.

The collaborative filtering system of claim 18, wherein the content similarity calculation unit

analyzes the content usage information to determine, as a similar content pair, a combination of two contents selectable from among two or more different contents used consecutively by the same user within a preset time interval, and calculates a similarity to which a time weight is applied that is inversely proportional to the interval between the times at which the same user used the contents of the similar content pair.

The collaborative filtering system of claim 31, wherein the content similarity calculation unit

determines, when two or more different contents are used consecutively by the same user within a preset time interval, a combination of two contents selectable from among those contents as a similar content pair.

The collaborative filtering system of claim 18, wherein the content similarity calculation unit

calculates the similarity by applying attribute weights according to the attributes of each of the contents constituting the similar content pair.

The collaborative filtering system of claim 33, wherein the content similarity calculation unit

applies an attribute weight that is proportional to the similarity of the categories to which the contents of the similar content pair belong.

The collaborative filtering system of claim 18, wherein the content similarity calculation unit

filters out the similar content pair when the number of users who used the similar content pair is less than or equal to a predetermined minority user determination value.

The collaborative filtering system of claim 35, wherein the content similarity calculation unit

compares the attributes of the minority users and, when the number of matching attributes is equal to or greater than a preset number, transmits the attributes of the minority users and the filtered content pair information to the content preference calculator for each user so that they are managed as preference information of the corresponding user group.