KR20210142484A - Method and Apparatus for Recommendation of Similar Content Items

Info

Publication number: KR20210142484A
Application number: KR1020200059377A
Authority: KR
Inventors: 장시영
Original assignee: 주식회사 엘지유플러스
Priority date: 2020-05-18
Filing date: 2020-05-18
Publication date: 2021-11-25
Also published as: KR102380871B1

Abstract

A method for recommending content items is provided. The method includes the steps of: selecting a specific content from among at least one content viewed by a user; selecting a target content; calculating a content-based collaborative filtering (CF) similarity between the specific content and the target content; calculating a keyword similarity between the specific content and the target content; calculating a final similarity as a weighted average of the CF similarity and the keyword similarity; and determining, based on the final similarity, whether to recommend the target content to the user.

Description

Method and Apparatus for Recommendation of Similar Content Items

The present invention relates to information processing technology for video on demand (VOD) content.

In recent years, as use of the Internet has become commonplace, the era of the home network has arrived. One concrete example of a home network service is Internet Protocol Television (IPTV). IPTV is an interactive TV service delivered over the Internet: a set-top box connected to the Internet connects to a content providing server operated by a content service provider, and content such as VOD content is downloaded or streamed for viewing. Unlike ordinary cable broadcasting, IPTV lets viewers select and watch the programs they want at a time convenient to them. It therefore offers viewers a wide range of content and convenience, while giving providers a revenue model that can earn more per subscriber than the flat monthly fee collected by an ordinary cable broadcaster. For IPTV service providers, marketing diverse content to viewers in diverse ways to encourage purchases has accordingly become a major concern.

An object of the present invention is to provide a method and apparatus for recommending similar content that can identify and recommend content sharing a genre and/or subject matter with content the user has already viewed.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

In one aspect, a content recommendation method is provided. The method may include: selecting a specific content from among at least one content viewed by a user; selecting a target content; calculating a content-based collaborative filtering (CF) similarity between the specific content and the target content; calculating a keyword similarity between the specific content and the target content; calculating a final similarity as a weighted average of the CF similarity and the keyword similarity; and determining, based on the final similarity, whether to recommend the target content to the user.

In one embodiment, calculating the content-based collaborative filtering (CF) similarity between the specific content and the target content includes calculating the CF similarity according to the following formula:

$$x_i = \frac{|w \cap W_i|}{|w \cup W_i|}$$

where $w$ is the set of IDs (identifications) of users who have viewed the specific content, $W_i$ is the set of IDs of users who have viewed content $i$, $w \cup W_i$ is the union of $w$ and $W_i$, $|w \cup W_i|$ is the number of elements in that union, $w \cap W_i$ is the intersection of $w$ and $W_i$, $|w \cap W_i|$ is the number of elements in that intersection, and $x_i$ is the CF similarity between the specific content and content $i$.

In one embodiment, calculating the keyword similarity between the specific content and the target content includes referring to a keyword vector for the specific content and a keyword vector for the target content.

In one embodiment, the keyword vector for the specific content has as many elements as there are candidate keywords in total. The elements correspond one-to-one to the candidate keywords, each element has a value whose magnitude corresponds to the importance of the corresponding candidate keyword in the specific content, and each element takes a value from 0 up to a maximum value.

In one embodiment, an element value of 0 in the keyword vector for the specific content indicates that the candidate keyword corresponding to that element is unrelated to the specific content.

In one embodiment, the keyword vector for the target content has as many elements as there are candidate keywords in total. The elements correspond one-to-one to the candidate keywords, each element has a value whose magnitude corresponds to the importance of the corresponding candidate keyword in the target content, and each element takes a value from 0 up to the maximum value.

In one embodiment, an element value of 0 in the keyword vector for the target content indicates that the candidate keyword corresponding to that element is unrelated to the target content.

In one embodiment, calculating the keyword similarity between the specific content and the target content includes calculating the keyword similarity according to the following formula:

$$y_i = 1 - \frac{\lVert k - K_i \rVert_1}{\mathrm{MAX}}$$

where $k$ is the keyword vector for the specific content, $K_i$ is the keyword vector for content $i$, $\lVert k - K_i \rVert_1$ is the Manhattan distance between $k$ and $K_i$, $\mathrm{MAX}$ is the maximum value multiplied by the total number of candidate keywords, and $y_i$ is the keyword similarity between the specific content and content $i$.

In one embodiment, calculating the final similarity as a weighted average of the CF similarity and the keyword similarity includes calculating the final similarity according to the following formula:

[formula not reproduced in the source text]

In that formula, the two weights are each a value from 0 to 1 and sum to 1. The formula also uses the Manhattan distance of the keyword vector for the specific content (that is, its L1 norm), the Manhattan distance of the keyword vector for content $i$, and a keyword filter function. $S_i$ is the final similarity between the specific content and content $i$.

In one embodiment, when the elements having nonzero values in the keyword vector for the specific content and the elements having nonzero values in the keyword vector for the target content are mutually exclusive, the specific content and the target content share none of the candidate keywords, and the keyword filter function has a value of 0.

In one embodiment, determining, based on the final similarity, whether to recommend the target content to the user includes determining to recommend the target content to the user when the final similarity is greater than or equal to a predetermined threshold.
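As a rough illustration of the last two steps, the sketch below combines a CF similarity and a keyword similarity into a final similarity and applies the threshold test. Several things here are assumptions rather than part of the disclosure: the function names, the equal default weights, the example threshold of 0.5, and in particular the choice to apply the keyword filter function as a multiplicative factor of 0 or 1, since the exact form of the formula is not reproduced in this text. The sample vectors and similarity values are those used in the worked examples later in the description.

```python
from typing import Sequence


def shares_keyword(k: Sequence[float], k_i: Sequence[float]) -> bool:
    """True if at least one candidate keyword is nonzero in both vectors."""
    return any(a != 0 and b != 0 for a, b in zip(k, k_i))


def final_similarity(cf_sim: float, kw_sim: float,
                     k: Sequence[float], k_i: Sequence[float],
                     cf_weight: float = 0.5) -> float:
    """Weighted average of the two similarities, gated by a keyword filter.

    Assumption: the filter is applied as a 0/1 multiplier; the source only
    states that it is 0 when the two contents share no candidate keyword.
    """
    if not 0.0 <= cf_weight <= 1.0:
        raise ValueError("cf_weight must lie between 0 and 1")
    kw_weight = 1.0 - cf_weight  # the two weights sum to 1
    keyword_filter = 1.0 if shares_keyword(k, k_i) else 0.0
    return keyword_filter * (cf_weight * cf_sim + kw_weight * kw_sim)


def should_recommend(final_sim: float, threshold: float = 0.5) -> bool:
    """Recommend when the final similarity reaches the threshold (hypothetical value)."""
    return final_sim >= threshold


v1 = [7, 1, 9, 0, 2]   # specific content (content 1)
v2 = [1, 3, 0, 0, 10]  # target content 2: shares keywords 1, 2 and 5 with content 1
v3 = [0, 0, 0, 4, 0]   # target content 3: shares no keyword with content 1

s_12 = final_similarity(0.4, 0.5, v1, v2)   # filter passes, plain weighted average
s_13 = final_similarity(0.6, 0.54, v1, v3)  # filter blocks, result is 0
```

Under these assumptions content 3 is never recommended on the basis of content 1, despite having the higher CF similarity, because the two share no keyword. That is the kind of case the keyword filter appears intended to screen out.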

In another aspect, a content recommendation method is provided. The method may include: selecting a specific content from among at least one content viewed by a user; selecting a plurality of target contents; calculating final similarities between the specific content and the plurality of target contents; and determining, based on the final similarities, to recommend at least one of the plurality of target contents to the user. Each of the final similarities may be calculated based on a content-based collaborative filtering (CF) similarity between the specific content and the corresponding target content and a keyword similarity between the specific content and the corresponding target content.

In one embodiment, each of the final similarities is calculated as a weighted average of the CF similarity and the keyword similarity.

In yet another aspect, an apparatus for content recommendation is provided. The apparatus may include a database unit and a processing engine. The database unit stores a content table and keyword vectors for contents; the content table stores, for each of the contents, the IDs of users who have viewed that content in association with it. The processing engine may be configured to perform: an operation of calculating, with reference to the content table, a content-based collaborative filtering (CF) similarity between a specific content viewed by a user and a target content; an operation of calculating a keyword similarity between the specific content and the target content using the keyword vector for the specific content and the keyword vector for the target content; an operation of calculating a final similarity as a weighted average of the CF similarity and the keyword similarity; and an operation of determining, based on the final similarity, whether to recommend the target content to the user.

In one embodiment, the processing engine is further configured to perform an operation of calculating the CF similarity according to the following formula:

$$x_i = \frac{|w \cap W_i|}{|w \cup W_i|}$$

where $w$ is the set of IDs of users who have viewed the specific content, $W_i$ is the set of IDs of users who have viewed content $i$, $w \cup W_i$ is the union of $w$ and $W_i$, $|w \cup W_i|$ is the number of elements in that union, $w \cap W_i$ is the intersection of $w$ and $W_i$, $|w \cap W_i|$ is the number of elements in that intersection, and $x_i$ is the CF similarity between the specific content and content $i$.

In one embodiment, the processing engine is further configured to perform an operation of calculating the keyword similarity between the specific content and the target content according to the following formula:

$$y_i = 1 - \frac{\lVert k - K_i \rVert_1}{\mathrm{MAX}}$$

where $k$ is the keyword vector for the specific content, $K_i$ is the keyword vector for content $i$, $\lVert k - K_i \rVert_1$ is the Manhattan distance between $k$ and $K_i$, $\mathrm{MAX}$ is the maximum value multiplied by the total number of candidate keywords, and $y_i$ is the keyword similarity between the specific content and content $i$.

In one embodiment, the processing engine is further configured to perform an operation of calculating the final similarity according to the following formula:

[formula not reproduced in the source text]

In that formula, the two weights are each a value from 0 to 1 and sum to 1. The formula also uses the Manhattan distance of the keyword vector for the specific content (that is, its L1 norm), the Manhattan distance of the keyword vector for content $i$, and a keyword filter function. $S_i$ is the final similarity between the specific content and content $i$.

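In one embodiment, when the elements having nonzero values in the keyword vector for the specific content and the elements having nonzero values in the keyword vector for the target content are mutually exclusive, the keyword filter function has a value of 0.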

In one embodiment, the processing engine is further configured to perform an operation of determining to recommend the target content to the user when the final similarity is greater than or equal to a predetermined threshold.

In yet another aspect, an apparatus for content recommendation is provided. The apparatus may include a database unit that stores content-related data, and a processing engine. The processing engine is configured to calculate, with reference to the content-related data, final similarities between a specific content viewed by a user and a plurality of target contents, and to determine, based on the final similarities, to recommend at least one of the plurality of target contents to the user. The processing engine may be further configured to calculate each of the final similarities based on a content-based collaborative filtering (CF) similarity between the specific content and the corresponding target content and a keyword similarity between the specific content and the corresponding target content.

In one embodiment, the content-related data includes data on a content table and on keyword vectors for contents, and the content table stores, for each of the contents, the IDs of users who have viewed that content in association with it.

In one embodiment, the processing engine is further configured to calculate each of the final similarities as a weighted average of the CF similarity and the keyword similarity.

In yet another aspect, a computer-readable recording medium on which a program is recorded is provided. The program includes instructions that, when executed by a computer, perform the method described above.

According to embodiments of the present invention, similar content sharing a genre and/or subject matter with content the user has already viewed is identified and recommended on the basis of a similarity obtained by content-based collaborative filtering and a keyword similarity, which has the technical effect of increasing content consumption.

FIG. 1 is a block diagram illustrating one embodiment of a similar content recommendation apparatus.
FIG. 2 is a diagram illustrating one embodiment of a content table.
FIG. 3 is a flowchart illustrating one embodiment of a similar content recommendation method.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in a variety of different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention pertains are fully informed of its scope; the present invention is defined only by the scope of the claims.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. For example, a component expressed in the singular should be understood to include a plurality of such components unless the context clearly indicates the singular only. In addition, terms such as "comprise" or "have" in this specification are intended only to indicate that the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification exist; their use does not exclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. Further, in the embodiments described in this specification, a "module" or "unit" may mean a functional part that performs at least one function or operation.

Furthermore, unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and, unless explicitly so defined in this specification, are not to be interpreted in an idealized or excessively formal sense.

Hereinafter, embodiments of the present invention are described in more detail with reference to the accompanying drawings. In the following description, detailed explanations of well-known functions or configurations are omitted where they might unnecessarily obscure the gist of the present invention.

FIG. 1 is a block diagram illustrating one embodiment of a similar content recommendation apparatus.

The similar content recommendation apparatus 100 of FIG. 1 may be an example of a system implemented as computer programs that run on one or more server computers, installed at one or more locations, operated by an IPTV service provider. As shown in FIG. 1, the similar content recommendation apparatus 100 may include a database unit 110 and a processing engine 120 communicatively coupled to the database unit 110. The database unit 110 may store content-related data, which may include data on a content table. The content table stores, for each content, the IDs of users who have viewed that content in association with it. In this specification, a user having viewed a content should be interpreted broadly, covering cases such as the user watching a trailer for the content or purchasing the content and watching the main feature. Referring to FIG. 2, which shows one embodiment of the content table, the IDs (identifications) of the users who have viewed or looked up each content are listed per content. In the illustrated embodiment, content 1 is recorded as viewed or looked up by user 10, user 33, user 72, and user 165, and content 3 is recorded as viewed or looked up by user 10, user 72, user 165, and user 289. The content table 200 shown in FIG. 2 merely illustrates conceptually, for purposes of example, how data is stored in association; it should be understood that FIG. 2 does not illustrate or imply the structure of the stored data.

The content-related data may further include data on keyword vectors for the contents. The keyword vector for any given content has as many elements as there are candidate keywords in total. Here, the candidate keywords may be the complete set of keywords related to the contents provided by the IPTV service. In one embodiment, a candidate keyword is a meta keyword associated with content. The elements of the keyword vector correspond to, or are associated with, the candidate keywords one-to-one. Each element may have a value whose magnitude corresponds to the importance of the corresponding candidate keyword in the content, and may take a value from 0 up to a maximum value. For example, suppose there are 5 candidate keywords, the maximum value is 10, and the keyword vector for content 1 is $v_1 = [7, 1, 9, 0, 2]$. Then, for content 1, the importance of candidate keyword 1 is 7, that of candidate keyword 2 is 1, that of candidate keyword 3 is 9, and that of candidate keyword 5 is 2, while candidate keyword 4 is unrelated to content 1. As another example, if the keyword vector for content 2 is $v_2 = [1, 3, 0, 0, 10]$, then for content 2 the importance of candidate keyword 1 is 1, that of candidate keyword 2 is 3, and that of candidate keyword 5 is 10, the highest level, while candidate keywords 3 and 4 are unrelated to content 2. As yet another example, if the keyword vector for content 3 is $v_3 = [0, 0, 0, 4, 0]$, then for content 3 the importance of candidate keyword 4 is 4 and the remaining candidate keywords are unrelated to content 3.

The database unit 110 may further store software/firmware needed to implement the processing engine 120. The database unit 110 may be implemented as any one of the following storage media: flash memory, a hard disk, a MultiMedia Card (MMC), card-type memory (for example, a Secure Digital (SD) card or an eXtreme Digital (XD) card), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disk. Those skilled in the art will appreciate, however, that the implementation of the database unit 110 is not limited to these.

The processing engine 120 may be designed to select a specific content from among at least one content viewed by the user and to determine that other content highly similar to it should be recommended to the user. To this end, the processing engine 120 may be configured to select a target content from among the contents provided by the IPTV service, excluding the specific content viewed by the user, and to compare it with that specific content. The processing engine 120 may be configured to perform: an operation of calculating, with reference to the content table 200, a content-based collaborative filtering (CF) similarity between the specific content and the target content; an operation of calculating a keyword similarity between the specific content and the target content using the keyword vector for the specific content and the keyword vector for the target content; an operation of calculating a final similarity as a weighted average of the CF similarity and the keyword similarity; and an operation of determining, based on the final similarity, whether to recommend the target content to the user.

The processing engine 120 may be further configured to perform an operation of calculating the CF similarity between the specific content and the target content according to Equation 1 below.

[Equation 1]

$$x_i = \frac{|w \cap W_i|}{|w \cup W_i|}$$

Here, $w$ is the set of IDs of users who have viewed the specific content, $W_i$ is the set of IDs of users who have viewed content $i$, $w \cup W_i$ is the union of $w$ and $W_i$, $|w \cup W_i|$ is the number of elements in that union, $w \cap W_i$ is the intersection of $w$ and $W_i$, $|w \cap W_i|$ is the number of elements in that intersection, and $x_i$ is the CF similarity between the specific content and content $i$.

As an example, referring to the content table 200 of FIG. 2, the set of users who viewed content 1 is $C_1 = \{10, 33, 72, 165\}$ and the set of users who viewed content 4 is $C_4 = \{72, 162, 555\}$. The two sets share one user and their union has six, so the CF similarity between content 1 (the specific content) and content 4 (the target content) is $x_4 = 1/6 \approx 0.17$. As another example, the set of users who viewed content 2 is $C_2 = \{10, 92, 165\}$, so the CF similarity between content 1 (the specific content) and content 2 (the target content) is $x_2 = 2/5 = 0.4$. As yet another example, the set of users who viewed content 3 is $C_3 = \{10, 72, 165, 289\}$, so the CF similarity between content 1 (the specific content) and content 3 (the target content) is $x_3 = 3/5 = 0.6$, a relatively high value. This is because contents 1 and 3 were viewed by largely the same users.
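The worked examples above can be reproduced with a few lines of Python. This is a minimal sketch, not the apparatus's actual implementation: the dictionary `CONTENT_TABLE` is a hypothetical in-memory stand-in for the content table 200, the function name is illustrative, and returning 0 for two contents with no viewers at all is an assumption the source does not address.

```python
# Hypothetical stand-in for the content table of FIG. 2:
# content number -> set of IDs of users who viewed that content.
CONTENT_TABLE = {
    1: {10, 33, 72, 165},
    2: {10, 92, 165},
    3: {10, 72, 165, 289},
    4: {72, 162, 555},
}


def cf_similarity(w: set, w_i: set) -> float:
    """Ratio of shared viewers to all viewers of either content."""
    union = w | w_i
    if not union:
        # Assumption: contents that nobody has viewed are treated as dissimilar.
        return 0.0
    return len(w & w_i) / len(union)


specific = CONTENT_TABLE[1]  # content 1 plays the role of the specific content
cf_scores = {
    content: cf_similarity(specific, viewers)
    for content, viewers in CONTENT_TABLE.items()
    if content != 1  # the specific content itself is not a target
}
```

Here `cf_scores` maps target contents 2, 3 and 4 to 2/5, 3/5 and 1/6 respectively, matching the values in the text. The ratio computed by `cf_similarity` is the Jaccard index of the two viewer sets.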

The processing engine 120 may be further configured to calculate the keyword similarity between the specific content and the target content according to Equation 2 below.

$$y_i = 1 - \frac{\lVert k - K_i \rVert_1}{\mathrm{MAX}} \qquad \text{(Equation 2)}$$

Here, $k$ is the keyword vector for the specific content, $K_i$ is the keyword vector for content $i$, $\lVert k - K_i \rVert_1$ is the Manhattan distance between the keyword vector $k$ for the specific content and the keyword vector $K_i$ for content $i$, $\mathrm{MAX}$ is the maximum value that an element of a keyword vector can take multiplied by the total number of candidate keywords, and $y_i$ denotes the keyword similarity between the specific content and content $i$.

As in the example above, suppose that the maximum value an element of a keyword vector can take is 10, the total number of candidate keywords is 5, and the keyword vectors for content 1, content 2, and content 3 are v₁ = [7, 1, 9, 0, 2], v₂ = [1, 3, 0, 0, 10], and v₃ = [0, 0, 0, 4, 0], respectively. Then MAX = 10 × 5 = 50. The Manhattan distance between v₁ and v₂ is 6 + 2 + 9 + 0 + 8 = 25, so the keyword similarity between content 1 (the specific content) and content 2 (the target content) is y₂ = 1 - 25/50 = 0.5. As another example, the Manhattan distance between v₁ and v₃ is 7 + 1 + 9 + 4 + 2 = 23, so the keyword similarity between content 1 (the specific content) and content 3 (the target content) is y₃ = 1 - 23/50 = 0.54.
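The keyword similarity arithmetic above can be sketched in Python as follows. The sketch assumes the similarity is one minus the Manhattan distance divided by MAX, which matches the worked values 0.5 and 0.54. The function name `keyword_similarity` and the parameter `max_element` are hypothetical names chosen for illustration.

```python
def keyword_similarity(k, k_i, max_element=10):
    """Return 1 - (Manhattan distance between k and k_i) / MAX."""
    if not k or len(k) != len(k_i):
        raise ValueError("keyword vectors must be non-empty and of equal length")
    distance = sum(abs(a - b) for a, b in zip(k, k_i))
    # MAX: largest value an element can take times the number of candidate keywords.
    max_distance = max_element * len(k)
    return 1 - distance / max_distance


v1 = [7, 1, 9, 0, 2]
v2 = [1, 3, 0, 0, 10]
v3 = [0, 0, 0, 4, 0]

y2 = keyword_similarity(v1, v2)  # distance 25, MAX 50
y3 = keyword_similarity(v1, v3)  # distance 23, MAX 50
print(round(y2, 2), round(y3, 2))
```

Because every element lies between 0 and `max_element`, the distance can never exceed MAX, so the result stays between 0 and 1.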

The processing engine 120 may be further configured to calculate the final similarity according to Equations 3 and 4 below.

$$S_i = f(k, K_i)\,(\alpha x_i + \beta y_i) \qquad \text{(Equation 3)}$$

$$f(k, K_i) = \begin{cases} 0, & \lVert k - K_i \rVert_1 = \lVert k \rVert_1 + \lVert K_i \rVert_1 \\ 1, & \text{otherwise} \end{cases} \qquad \text{(Equation 4)}$$

In Equation 3, $\alpha$ and $\beta$ are each greater than or equal to 0 and less than or equal to 1, and the sum of $\alpha$ and $\beta$ is 1. In one embodiment, $\alpha$ and $\beta$ are both set to 0.5, so that the final similarity is calculated by multiplying the CF similarity and the keyword similarity by the same weight. In one embodiment, $\beta$ is set to a larger value than $\alpha$, so that the final similarity is calculated by multiplying the keyword similarity by a larger weight than the CF similarity. In Equation 4, $\lVert k \rVert_1$ is the Manhattan distance of the keyword vector for the specific content (that is, the sum of its elements), and $\lVert K_i \rVert_1$ is the Manhattan distance of the keyword vector for content $i$. In Equation 3, $S_i$ denotes the final similarity between the specific content and content $i$. In Equation 4, $f(k, K_i)$ is a keyword filter function that makes $S_i$ equal to 0 when the specific content and content $i$ do not share any candidate keyword.

To calculate the final similarity for the example above: the keyword vector for content 1 is v₁ = [7, 1, 9, 0, 2] and the keyword vector for content 2 is v₂ = [1, 3, 0, 0, 10], so the two contents share candidate keywords and the keyword filter function is 1. Since x₂ = 0.4 and y₂ = 0.5, when both weights are 0.5 the final similarity between content 1 (the specific content) and content 2 (the target content) is S₂ = 0.5 × 0.4 + 0.5 × 0.5 = 0.45. As another example, the keyword vector for content 1 is v₁ = [7, 1, 9, 0, 2] and the keyword vector for content 3 is v₃ = [0, 0, 0, 4, 0], so content 1 and content 3 do not share any candidate keyword and the keyword filter function is 0. The final similarity S₃ between content 1 (the specific content) and content 3 (the target content) is therefore 0 regardless of the values of x₃ and y₃. In general, when the elements having non-zero values in the keyword vector for the specific content and the elements having non-zero values in the keyword vector for the target content are mutually exclusive, the specific content and the target content share none of the candidate keywords. In this case, the keyword filter function is 0 and the final similarity between the specific content and the target content is determined to be 0.
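The final similarity and the keyword filter can be illustrated with the Python sketch below. It assumes that the filter compares the Manhattan distance between the two vectors with the sum of their individual Manhattan distances from the zero vector. For non-negative element values, the two quantities are equal exactly when no keyword is non-zero in both vectors. The names `alpha`, `beta`, `keyword_filter`, and `final_similarity` are hypothetical, and the exact equality test assumes integer element values as in the example.

```python
def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def keyword_filter(k, k_i):
    """Return 0 when the vectors share no keyword, otherwise 1."""
    zero = [0] * len(k)
    # With non-negative elements, disjoint non-zero positions mean the
    # distance between the vectors equals the sum of their own L1 norms.
    separate = manhattan(k, zero) + manhattan(k_i, zero)
    return 0 if manhattan(k, k_i) == separate else 1


def final_similarity(x, y, k, k_i, alpha=0.5, beta=0.5):
    """Weighted average of CF similarity x and keyword similarity y, gated by the filter."""
    if alpha < 0 or beta < 0 or abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return keyword_filter(k, k_i) * (alpha * x + beta * y)


v1 = [7, 1, 9, 0, 2]
v2 = [1, 3, 0, 0, 10]
v3 = [0, 0, 0, 4, 0]

s2 = final_similarity(0.4, 0.5, v1, v2)   # filter is 1
s3 = final_similarity(0.6, 0.54, v1, v3)  # filter is 0
print(round(s2, 2), s3)
```

Content 3 has the highest CF similarity in the example, yet the filter removes it because it shares no keyword with content 1.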

The processing engine 120 may be further configured to determine that the target content is to be recommended to the user when the final similarity is greater than or equal to a predetermined threshold. Setting the threshold low makes the processing engine 120 recommend content aggressively, while setting it high makes the processing engine 120 recommend content conservatively.
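The effect of the threshold can be shown with a minimal Python sketch. The two threshold values and the function name `should_recommend` are hypothetical and chosen only to contrast an aggressive setting with a conservative one. The final similarity 0.45 is the value obtained for content 2 in the example above.

```python
def should_recommend(final_similarity, threshold):
    """Recommend when the final similarity is at or above the threshold."""
    return final_similarity >= threshold


AGGRESSIVE_THRESHOLD = 0.3    # hypothetical low threshold
CONSERVATIVE_THRESHOLD = 0.6  # hypothetical high threshold

s2 = 0.45  # final similarity between content 1 and content 2
print(should_recommend(s2, AGGRESSIVE_THRESHOLD))    # True
print(should_recommend(s2, CONSERVATIVE_THRESHOLD))  # False
```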

The processing engine 120 has been described above as selecting one target content and comparing it with the specific content to determine whether to recommend that target content to the user. However, the processing engine 120 may also be configured to calculate final similarities between the specific content and a plurality of target contents and, based on those final similarities, to determine to recommend at least one of the plurality of target contents to the user.
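One possible way to choose among several target contents is sketched below in Python. The disclosure does not fix a selection rule. Keeping the targets at or above a threshold, ordering them by final similarity, and optionally truncating to `top_n` are all assumptions made for illustration, as is the function name `select_recommendations`.

```python
def select_recommendations(final_similarities, threshold, top_n=None):
    """Pick target IDs from a {target_id: final_similarity} mapping."""
    passing = [(tid, s) for tid, s in final_similarities.items() if s >= threshold]
    # Highest final similarity first.
    passing.sort(key=lambda item: item[1], reverse=True)
    if top_n is not None:
        passing = passing[:top_n]
    return [tid for tid, _ in passing]


# Final similarities of content 2 and content 3 with respect to content 1.
final_similarities = {2: 0.45, 3: 0.0}
print(select_recommendations(final_similarities, threshold=0.3))  # [2]
```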

The processing engine 120 may be implemented as a hardware platform based on at least one of application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, and microprocessors. The processing engine 120 may also be implemented as a firmware/software module executable on such a hardware platform. In this case, the software module may be implemented by a software application written in an appropriate programming language.

FIG. 3 is a flowchart illustrating an embodiment of a method for recommending similar content.

As shown in FIG. 3, the similar content recommendation method starts with step S305 of selecting a specific content from among at least one content that the user has watched. In step S310, a target content is selected. The target content may be any one of the contents provided by the IPTV service other than the specific content. In step S315, the CF similarity between the specific content and the target content is calculated. The CF similarity may be calculated according to Equation 1 above. In step S320, the keyword similarity between the specific content and the target content is calculated. The keyword similarity may be calculated according to Equation 2 above with reference to the keyword vector for the specific content and the keyword vector for the target content. In step S325, the final similarity is calculated as a weighted average of the CF similarity and the keyword similarity. The final similarity may be calculated according to Equations 3 and 4 above. In step S330, whether to recommend the target content to the user is determined based on the final similarity. In this step, the target content may be determined to be recommended to the user when the final similarity is greater than or equal to a predetermined threshold.
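Steps S305 to S330 can be strung together in one Python sketch that loops over every candidate target. It is an illustrative arrangement under several assumptions: the specific content is passed in by ID rather than chosen by any particular rule, the keyword filter is implemented as a direct check for a keyword that is non-zero in both vectors, and the threshold 0.3 is hypothetical. The names `recommend_similar`, `viewers`, and `keywords` are likewise not from the disclosure.

```python
def recommend_similar(specific_id, viewers, keywords,
                      max_element=10, alpha=0.5, beta=0.5, threshold=0.3):
    """Return {target_id: (final_similarity, recommend)} for one specific content."""
    w, k = viewers[specific_id], keywords[specific_id]  # S305: specific content
    decisions = {}
    for target_id, W_i in viewers.items():              # S310: each target content
        if target_id == specific_id or target_id not in keywords:
            continue
        K_i = keywords[target_id]
        # S315: CF similarity from shared and distinct viewers.
        union = w | W_i
        x = len(w & W_i) / len(union) if union else 0.0
        # S320: keyword similarity from the Manhattan distance.
        distance = sum(abs(a - b) for a, b in zip(k, K_i))
        y = 1 - distance / (max_element * len(k))
        # S325: weighted average, forced to 0 when no keyword is shared.
        shares_keyword = any(a > 0 and b > 0 for a, b in zip(k, K_i))
        s = (alpha * x + beta * y) if shares_keyword else 0.0
        # S330: threshold decision.
        decisions[target_id] = (s, s >= threshold)
    return decisions


viewers = {1: {10, 33, 72, 165}, 2: {10, 92, 165}, 3: {10, 72, 165, 289}}
keywords = {1: [7, 1, 9, 0, 2], 2: [1, 3, 0, 0, 10], 3: [0, 0, 0, 4, 0]}

for target_id, (score, recommend) in recommend_similar(1, viewers, keywords).items():
    print(target_id, round(score, 2), recommend)
```

With the example data, content 2 is recommended and content 3 is filtered out, which agrees with the worked values given for Equations 1 to 4.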

In the above description, a statement that a component is connected or coupled to another component should be understood to mean not only that the component is directly connected or coupled to the other component, but also that the two may be connected or coupled through one or more other components interposed between them. Other terms describing relationships between components (for example, "between" and "among") should be interpreted in a similar manner.

In the embodiments disclosed herein, the arrangement of the illustrated components may vary depending on the environment in which the invention is implemented or on its requirements. For example, some components may be omitted, or several components may be integrated and implemented as one. The arrangement order and connections of some components may also be changed.

Although various embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. The embodiments may be modified in various ways by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the appended claims, and such modified embodiments should not be understood separately from the technical spirit or scope of the present invention. Accordingly, the technical scope of the present invention should be defined only by the appended claims.

100: content recommendation device
110: database unit
120: processing engine
200: content table

Claims

1. A content recommendation method, comprising:
selecting a specific content from among at least one content watched by a user;
selecting a target content;
calculating a content-based collaborative filtering (CF) similarity between the specific content and the target content;
calculating a keyword similarity between the specific content and the target content;
calculating a final similarity as a weighted average of the CF similarity and the keyword similarity; and
determining whether to recommend the target content to the user based on the final similarity.

2. The method of claim 1,
wherein calculating the content-based collaborative filtering (CF) similarity between the specific content and the target content includes calculating the CF similarity according to the following equation:

$$x_i = \frac{|w \cap W_i|}{|w \cup W_i|}$$

where $w$ denotes the set of IDs (identifications) of users who have viewed the specific content, $W_i$ denotes the set of IDs of users who have viewed content $i$, $w \cup W_i$ denotes the union of $w$ and $W_i$, $|w \cup W_i|$ denotes the number of elements in the union of $w$ and $W_i$, $w \cap W_i$ denotes the intersection of $w$ and $W_i$, $|w \cap W_i|$ denotes the number of elements in the intersection of $w$ and $W_i$, and $x_i$ denotes the CF similarity between the specific content and content $i$.

3. The method of claim 2,
wherein calculating the keyword similarity between the specific content and the target content includes referring to a keyword vector for the specific content and a keyword vector for the target content.

4. The method of claim 3,
wherein the keyword vector for the specific content has a number of elements corresponding to the total number of candidate keywords, the elements correspond respectively to the candidate keywords, each of the elements has an element value whose magnitude corresponds to the importance of the corresponding candidate keyword in the specific content, and each of the elements takes a value greater than or equal to 0 and less than or equal to a maximum value.

5. The method of claim 4,
wherein an element value of 0 for a specific element of the keyword vector for the specific content indicates that the candidate keyword corresponding to that element is not related to the specific content.

6. The method of claim 3,
wherein the keyword vector for the target content has a number of elements corresponding to the total number of candidate keywords, the elements correspond respectively to the candidate keywords, each of the elements has an element value whose magnitude corresponds to the importance of the corresponding candidate keyword in the target content, and each of the elements takes a value greater than or equal to 0 and less than or equal to a maximum value.

7. The method of claim 6,
wherein an element value of 0 for a specific element of the keyword vector for the target content indicates that the candidate keyword corresponding to that element is not related to the target content.

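8. The method of claim 4 or 6,
wherein calculating the keyword similarity between the specific content and the target content includes calculating the keyword similarity according to the following equation:

$$y_i = 1 - \frac{\lVert k - K_i \rVert_1}{\mathrm{MAX}}$$

where $k$ is the keyword vector for the specific content, $K_i$ is the keyword vector for content $i$, $\lVert k - K_i \rVert_1$ is the Manhattan distance between the keyword vector $k$ for the specific content and the keyword vector $K_i$ for content $i$, $\mathrm{MAX}$ is the maximum value multiplied by the total number of the candidate keywords, and $y_i$ denotes the keyword similarity between the specific content and content $i$.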

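9. The method of claim 8,
wherein calculating the final similarity as a weighted average of the CF similarity and the keyword similarity includes calculating the final similarity according to the following equations:

$$S_i = f(k, K_i)\,(\alpha x_i + \beta y_i)$$

$$f(k, K_i) = \begin{cases} 0, & \lVert k - K_i \rVert_1 = \lVert k \rVert_1 + \lVert K_i \rVert_1 \\ 1, & \text{otherwise} \end{cases}$$

where $\alpha$ and $\beta$ are each greater than or equal to 0 and less than or equal to 1, the sum of $\alpha$ and $\beta$ is 1, $\lVert k \rVert_1$ is the Manhattan distance of the keyword vector for the specific content, $\lVert K_i \rVert_1$ is the Manhattan distance of the keyword vector for content $i$, $f(k, K_i)$ is a keyword filter function, and $S_i$ denotes the final similarity between the specific content and content $i$.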

10. The method of claim 9,
wherein, when the elements having non-zero values in the keyword vector for the specific content and the elements having non-zero values in the keyword vector for the target content are mutually exclusive, the specific content and the target content share none of the candidate keywords and the keyword filter function has a value of 0.

11. The method of claim 1,
wherein determining whether to recommend the target content to the user based on the final similarity includes determining to recommend the target content to the user when the final similarity is greater than or equal to a predetermined threshold.

12. A content recommendation method, comprising:
selecting a specific content from among at least one content watched by a user;
selecting a plurality of target contents;
calculating final similarities between the specific content and the plurality of target contents; and
determining to recommend at least one target content among the plurality of target contents to the user based on the final similarities,
wherein each of the final similarities is calculated based on a content-based collaborative filtering (CF) similarity between the specific content and the corresponding target content and a keyword similarity between the specific content and the corresponding target content.

13. The method of claim 12,
wherein each of the final similarities is calculated as a weighted average of the CF similarity and the keyword similarity.

14. A content recommendation device, comprising:
a database unit that stores a content table and keyword vectors for contents, wherein the content table stores, in association with each of the contents, IDs of users who have viewed that content; and
a processing engine,
wherein the processing engine is configured to perform an operation of calculating a content-based collaborative filtering (CF) similarity between a specific content viewed by a user and a target content with reference to the content table, an operation of calculating a keyword similarity between the specific content and the target content with reference to a keyword vector for the specific content and a keyword vector for the target content, an operation of calculating a final similarity as a weighted average of the CF similarity and the keyword similarity, and an operation of determining whether to recommend the target content to the user based on the final similarity.

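15. The device of claim 14,
wherein the processing engine is further configured to perform an operation of calculating the CF similarity between the specific content and the target content according to the following equation:

$$x_i = \frac{|w \cap W_i|}{|w \cup W_i|}$$

where $w$ denotes the set of IDs of users who have viewed the specific content, $W_i$ denotes the set of IDs of users who have viewed content $i$, $w \cup W_i$ denotes the union of $w$ and $W_i$, $|w \cup W_i|$ denotes the number of elements in the union of $w$ and $W_i$, $w \cap W_i$ denotes the intersection of $w$ and $W_i$, $|w \cap W_i|$ denotes the number of elements in the intersection of $w$ and $W_i$, and $x_i$ denotes the CF similarity between the specific content and content $i$.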

16. The device of claim 15,
wherein the keyword vector for the specific content has a number of elements corresponding to the total number of candidate keywords, the elements correspond respectively to the candidate keywords, each of the elements has an element value whose magnitude corresponds to the importance of the corresponding candidate keyword in the specific content, and each of the elements takes a value greater than or equal to 0 and less than or equal to a maximum value.

17. The device of claim 16,
wherein an element value of 0 for a specific element of the keyword vector for the specific content indicates that the candidate keyword corresponding to that element is not related to the specific content.

18. The device of claim 15,
wherein the keyword vector for the target content has a number of elements corresponding to the total number of candidate keywords, the elements correspond respectively to the candidate keywords, each of the elements has an element value whose magnitude corresponds to the importance of the corresponding candidate keyword in the target content, and each of the elements takes a value greater than or equal to 0 and less than or equal to a maximum value.

19. The device of claim 18,
wherein an element value of 0 for a specific element of the keyword vector for the target content indicates that the candidate keyword corresponding to that element is not related to the target content.

20. The device of claim 16 or 18,
wherein the processing engine is further configured to perform an operation of calculating the keyword similarity between the specific content and the target content according to the following equation:

$$y_i = 1 - \frac{\lVert k - K_i \rVert_1}{\mathrm{MAX}}$$

where $k$ is the keyword vector for the specific content, $K_i$ is the keyword vector for content $i$, $\lVert k - K_i \rVert_1$ is the Manhattan distance between the keyword vector $k$ for the specific content and the keyword vector $K_i$ for content $i$, $\mathrm{MAX}$ is the maximum value multiplied by the total number of the candidate keywords, and $y_i$ denotes the keyword similarity between the specific content and content $i$.

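21. The device of claim 20,
wherein the processing engine is further configured to perform an operation of calculating the final similarity according to the following equations:

$$S_i = f(k, K_i)\,(\alpha x_i + \beta y_i)$$

$$f(k, K_i) = \begin{cases} 0, & \lVert k - K_i \rVert_1 = \lVert k \rVert_1 + \lVert K_i \rVert_1 \\ 1, & \text{otherwise} \end{cases}$$

where $\alpha$ and $\beta$ are each greater than or equal to 0 and less than or equal to 1, the sum of $\alpha$ and $\beta$ is 1, $\lVert k \rVert_1$ is the Manhattan distance of the keyword vector for the specific content, $\lVert K_i \rVert_1$ is the Manhattan distance of the keyword vector for content $i$, $f(k, K_i)$ is a keyword filter function, and $S_i$ denotes the final similarity between the specific content and content $i$.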

22. The device of claim 21,
wherein, when the elements having non-zero values in the keyword vector for the specific content and the elements having non-zero values in the keyword vector for the target content are mutually exclusive, the specific content and the target content share none of the candidate keywords and the keyword filter function has a value of 0.

23. The device of claim 14,
wherein the processing engine is further configured to perform an operation of determining to recommend the target content to the user when the final similarity is greater than or equal to a predetermined threshold.

24. A content recommendation device, comprising:
a database unit that stores content-related data; and
a processing engine,
wherein the processing engine is configured to calculate final similarities between a specific content viewed by a user and a plurality of target contents with reference to the content-related data, and to determine to recommend at least one target content among the plurality of target contents to the user based on the final similarities, and
wherein the processing engine is further configured to calculate each of the final similarities based on a content-based collaborative filtering (CF) similarity between the specific content and the corresponding target content and a keyword similarity between the specific content and the corresponding target content.

25. The device of claim 24,
wherein the content-related data includes a content table and data on keyword vectors for contents, and the content table stores, in association with each of the contents, IDs of users who have viewed that content.

26. The device of claim 24,
wherein the processing engine is further configured to calculate each of the final similarities as a weighted average of the CF similarity and the keyword similarity.

27. A computer-readable recording medium on which a program is recorded, the program including instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 11.

28. A computer-readable recording medium on which a program is recorded, the program including instructions that, when executed by a computer, cause the computer to perform the method according to claim 12 or 13.