KR101372928B1

KR101372928B1 - Apparatus, System, Method and Computer Readable Recording Media Storing the Program for Automatic Recommendation of TV Program Contents based on User-Preferred Topic

Info

Publication number: KR101372928B1
Application number: KR1020120011067A
Authority: KR
Inventors: 김문철; 김은희; 표신지
Original assignee: 한국과학기술원
Priority date: 2012-02-03
Filing date: 2012-02-03
Publication date: 2014-03-14
Also published as: KR20130090042A

Abstract

The present invention presents a technology for automatically recommending TV program content. It applies Latent Dirichlet Allocation (LDA), the basic model of topic modeling, to users' viewing records, and uses a collaborative filtering technique that considers both individual and public preferences to find and recommend user-preferred topics. One aspect of the invention provides an apparatus for automatically recommending TV program content based on user-preferred topics, comprising: a user grouping unit that sets user groups using the preferred viewing information of a plurality of users; a topic model parameter learning unit that learns topic model parameters for each user group using the preferred viewing information; a parameter mapping unit that maps the latent topic parameters among the topic model parameters to individual preferences and group preferences; and a parameter processing unit that performs additional processing or inference on the topic model parameters to reveal the diversity of the individual preferences. According to the invention, TV programs are recommended in consideration of the user's viewing habits and publicly preferred content, which makes it easier to analyze users' viewing characteristics and increases the user's convenience in using the TV.

Description

Apparatus, System, Method and Computer Readable Recording Media Storing the Program for Automatic Recommendation of TV Program Contents based on User-Preferred Topic

The present invention relates to an apparatus, a system, and a method for automatically recommending TV program content based on user-preferred topics, and to a computer-readable recording medium on which a program for executing the method is recorded. Specifically, in learning the Dirichlet distribution that serves as the prior of the latent topic parameters, the invention either processes the preferred viewing information so that only a predetermined top portion is used as-is and the remaining lower portion is set to 0, or computes the Dirichlet distribution with an asymmetric LDA (Latent Dirichlet Allocation) model. It thereby infers latent topics, and applies to the inferred latent topics a collaborative filtering technique that uses per-user and per-group preferences.

With the recent arrival of IPTV and Smart TV, brought about by multi-channel and multi-media services and the convergence of broadcasting and communications, a vast number of TV programs are offered to viewers, and it has become difficult for viewers (users) to find and watch the TV program content they want. According to surveys, 75% of users take several minutes or more to find the content they want; 34% of customers feel daunted by selecting the content they want on TVs, PCs, and mobile devices; 50% of people in the UK are interested in recommendations tailored to their personal viewing experience; and 43% of people in the UK are willing to pay to consume VoD content if interesting content is recommended.

Content recommendation has been studied, and its effectiveness demonstrated, by online stores and search companies such as Amazon, Google, and Apple, which aim at selling individual items through video, music, book, and news recommendations, and through app recommendation on mobile phones. Recommendation is closely related to search, in which related search terms are derived from a keyword-centered query. Product recommendation has generally taken a similar form: the user's consumption history is matched to a query, and related content is recommended. Much research builds on Hofmann's PLSI (Probabilistic Latent Semantic Indexing), a representative model that expresses a user's preference as a feature vector and represents it over virtual hidden (latent) topics.

First, there is prior art on recommendation using viewing preference vectors based on a set of categories. Given a set of known categories (genre, channel, etc.), a user preference is built from the user's preferred viewing as a feature vector with one component per category, and preferred items are recommended based on the similarity (or correlation) between the user's feature vector and the feature vector of each content item. Widely used similarity measures include vector cosine correlation and Euclidean distance.
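The category-vector approach described above can be illustrated with a minimal sketch. The genre order, the profile values, and the program names below are hypothetical and serve only to show how cosine similarity and Euclidean distance compare a user feature vector with content feature vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical category order: [drama, news, sports]
user_profile = [0.6, 0.1, 0.3]
programs = {
    "drama_A": [1.0, 0.0, 0.0],
    "sports_B": [0.0, 0.0, 1.0],
    "news_C": [0.0, 1.0, 0.0],
}

# Rank programs from most to least similar to the user's profile.
ranked = sorted(programs,
                key=lambda name: cosine_similarity(user_profile, programs[name]),
                reverse=True)
print(ranked)
```

Because the vectors are indexed by fixed categories, this kind of ranking can only reflect genre or channel affinity, which is the limitation the specification goes on to discuss.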

Second, there is prior art on systems that recommend movies based on topic modeling. This approach excludes single-piece content such as movies from the main recommendation targets, because movies have a low re-watch rate and are mostly stand-alone works, with few movies consisting of multiple episodes under one title. In addition, because the data are based on users' rating data and are highly sparse, the approach is mainly used to predict a user's ratings for unrated items.

FIG. 1 is a flowchart of a conventional topic-modeling-based technique for recommending user-preferred content. According to this prior art, model parameters for recommendation are learned; ratings for unrated items are predicted from the learned parameters; the items the user has already watched are excluded from the recommendation candidates; and the items with the highest model-based similarity (relevance) to the user are sorted to the top and recommended.

Existing TV program recommendation systems based on such prior art infer user preference mainly from TV program genre or channel when building the recommendation model, and so fall short of semantically analyzing why a user watches a particular TV program. Moreover, existing automatic recommendation systems have applied latent topic modeling and collaborative filtering to user-supplied ratings of content that consists mostly of stand-alone films, unlike TV program content, which characteristically airs as multiple episodes under one title. Their main purpose is to predict ratings for content the user has not watched, whereas automatic TV program recommendation mainly aims to select content the user is likely to watch, for the user's convenience. They are therefore of limited use in TV program recommendation applications.

To solve the above problems, an object of the present invention is to enable semantic recommendation of TV programs by inferring and learning users' preferred topics from the viewing history data of actual users.

A further object is to provide improved recommendation performance that weights individual preference over public preference and reflects the diversity of individual preferences, by asymmetrically optimizing the Dirichlet distribution while using a collaborative filtering technique based on the LDA model.

To achieve the above objects, a first aspect of the present invention provides an apparatus for automatically recommending TV program content based on user-preferred topics, comprising: a user grouping unit that sets user groups using the preferred viewing information of a plurality of users; a topic model parameter learning unit that learns topic model parameters for each user group using the preferred viewing information; a parameter mapping unit that maps the latent topic parameters among the topic model parameters to individual preferences and group preferences; and a parameter processing unit that performs additional processing or inference on the topic model parameters to reveal the diversity of the individual preferences.

In addition, there is provided the apparatus wherein the parameter mapping unit performs any one of: a mapping that denotes the words appearing in a document in document generation modeling; a mapping in which, in document generation modeling, a user is regarded as a document and a user's viewing token as a word; a mapping that associates each document with a TV program and each viewing token with a viewing user; or a mapping in which public preference and individual preference are constructed by topic modeling.

In addition, there is provided the apparatus wherein the parameter processing unit, in learning the Dirichlet distribution that serves as the prior of the latent topic parameters, either processes the preferred viewing information so that only a predetermined top portion is used as-is and the remaining lower portion is set to 0, or computes the Dirichlet distribution as an asymmetric Dirichlet distribution.

In addition, there is provided the apparatus further comprising a non-preferred viewing item filter that uses the preferred viewing information to exclude from the recommendation candidates at least one of content that has finished airing and content to which the user is not subscribed.

In addition, there is provided the apparatus further comprising a recommendation candidate content sorting unit that sorts the recommendation candidate content for each user using the topic model parameters.

In addition, there is provided the apparatus wherein the mapping or the additional inference uses at least one of GS (Gibbs Sampling), CGS (Collapsed Gibbs Sampling), VBI (Variational Bayesian Inference), and CVBI (Collapsed Variational Bayesian Inference).
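As an illustration of one of the listed inference techniques, the following is a minimal sketch of collapsed Gibbs sampling for LDA under the mapping in which each user is a document and each viewing token is a word. It is a textbook sampler, not the implementation of this specification; the function name, the integer program ids, and the hyperparameter values are assumptions. The prior `alpha` is passed as a per-topic list so that an asymmetric prior can be supplied.

```python
import random

def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha, beta, n_iters=100, seed=0):
    """docs: one list of program ids (ints) per user. alpha: per-topic prior weights
    (may be asymmetric). beta: symmetric prior weight over programs.
    Returns (theta, phi): per-user topic proportions and per-topic program distributions."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per user
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # program counts per topic
    n_k = [0] * n_topics                                # total tokens per topic
    z = []                                              # topic assignment of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)                 # random initial assignment
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | everything else), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha[t]) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    alpha_sum = sum(alpha)
    theta = [[(n_dk[d][t] + alpha[t]) / (len(doc) + alpha_sum) for t in range(n_topics)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta) for w in range(vocab_size)]
           for t in range(n_topics)]
    return theta, phi

# Hypothetical viewing histories: three users, four programs with ids 0..3.
docs = [[0, 0, 1, 1, 0], [2, 3, 3, 2, 2], [0, 1, 2, 3]]
theta, phi = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=4,
                                 alpha=[0.5, 0.1], beta=0.01, n_iters=50)
```

Here `theta` plays the role of the individual preference over latent topics, and `phi` describes each latent topic as a distribution over programs.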

To achieve the above objects, a second aspect of the present invention provides a system for automatically recommending TV program content based on user-preferred topics, comprising: a media streaming server that broadcasts TV content; a user TV that receives TV content from the media streaming server, transmits viewing records, and thereby receives a TV content recommendation list; a TV user viewing record database that receives and stores the viewing records from the user TV; a user management and recommendation server that sets user groups using preferred viewing information generated from the viewing records received from a plurality of the user TVs, learns topic model parameters for each user group using the preferred viewing information, maps the latent topic parameters among the topic model parameters to individual preferences and group preferences, performs additional processing or inference on the topic model parameters to reveal the diversity of the individual preferences, and transmits a TV content recommendation list to the user TV; and a TV content database that stores the TV content to be transmitted to the media streaming server and to the user management and recommendation server.

In addition, there is provided the system wherein the user management and recommendation server, in learning the Dirichlet distribution that serves as the prior of the latent topic parameters, either processes the preferred viewing information so that only a predetermined top portion is used as-is and the remaining lower portion is set to 0, or computes the Dirichlet distribution as an asymmetric Dirichlet distribution; uses the preferred viewing information to exclude from the recommendation candidates at least one of content that has finished airing and content to which the user TV is not subscribed; and sorts the recommendation candidate content for each user using the topic model parameters.

To achieve the above objects, a third aspect of the present invention provides a method for automatically recommending TV program content based on user-preferred topics, comprising: a user grouping step of setting user groups using the preferred viewing information of a plurality of users; a topic model parameter learning step of learning topic model parameters for each user group using the preferred viewing information; a parameter mapping step of mapping the latent topic parameters among the topic model parameters to individual preferences and group preferences; a parameter processing step of performing additional processing or inference on the topic model parameters to reveal the diversity of the individual preferences; a non-preferred viewing item filtering step of using the preferred viewing information to exclude from the recommendation candidates at least one of content that has finished airing and content to which the user is not subscribed; and a recommendation candidate content sorting step of sorting the recommendation candidate content for each user using the topic model parameters.

In addition, there is provided the method wherein the parameter processing step, in learning the Dirichlet distribution that serves as the prior of the latent topic parameters, either processes the preferred viewing information so that only a predetermined top portion is used as-is and the remaining lower portion is set to 0, or computes the Dirichlet distribution as an asymmetric Dirichlet distribution.

To achieve the above objects, a fourth aspect of the present invention provides a method for automatically recommending TV program content based on user-preferred topics, comprising: a topic model parameter learning step of learning topic model parameters for each user group set using preferred viewing information received from a plurality of users; a parameter mapping step of mapping the latent topic parameters among the topic model parameters to individual preferences and group preferences, by performing any one of a mapping that denotes the words appearing in a document in document generation modeling, a mapping in which, in document generation modeling, a user is regarded as a document and a user's viewing token as a word, a mapping that associates each document with a TV program and each viewing token with a viewing user, or a mapping in which public preference and individual preference are constructed by topic modeling; and a parameter processing step of performing additional inference that reveals the diversity of the individual preferences by, in learning the Dirichlet distribution that serves as the prior of the latent topic parameters, either processing the preferred viewing information so that only a predetermined top portion is used as-is and the remaining lower portion is set to 0, or computing the Dirichlet distribution as an asymmetric Dirichlet distribution.

To achieve the above objects, a fifth aspect of the present invention provides a computer-readable recording medium on which a program for executing any one of the above methods is recorded.

According to the present invention, users' preferred topics are expressed in terms of frequently watched content, which makes it easier to analyze users' viewing characteristics.

In addition, by presenting TV content selected from a large number of TV programs through recommendations that consider the user's viewing habits and publicly preferred content, the invention increases the user's convenience in using the TV.

FIG. 1 is a flowchart of a conventional topic-modeling-based technique for recommending user-preferred content.
FIG. 2 is a data distribution diagram showing the relationship between the Dirichlet distribution and the multinomial distribution for symmetric and asymmetric distributions.
FIG. 3 is a block diagram of a system for automatically recommending TV program content based on user-preferred topics according to an embodiment of the present invention.
FIG. 4 is a block diagram of an apparatus (user management and recommendation server) for automatically recommending TV program content based on user-preferred topics according to an embodiment of the present invention.
FIG. 5 is a simplified flowchart of a method for automatically recommending TV program content based on user-preferred topics according to an embodiment of the present invention.
FIG. 6 is a conceptual diagram of grouping based on user preference.
FIG. 7 is a diagram of an LDA model based on symmetric α, symmetric β (SS prior) hyperparameters.
FIG. 8 is a diagram of an LDA model based on asymmetric α, symmetric β (AS prior) hyperparameters.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the following description, detailed explanations of well-known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary with the intentions or customs of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

In this specification, a topic refers to a set of content items that reveals why an individual user watches. Preference (or interest) expresses an individual user's degree of interest in TV content; it is generally represented as a feature vector in the form of a multinomial distribution and is also called a user profile.

The present invention proposes a TV program recommendation model based on applying LDA (Latent Dirichlet Allocation) model-based collaborative filtering to users' viewing history data. The proposed model makes recommendations to each individual using a per-user ranking model that considers both individual and public preferences; to improve recommendation performance, it weights individual preference over public preference and reflects the diversity of individual preferences. To this end, the elements with the highest probability values are selected from the multinomial distribution that expresses an individual's preference, and the Dirichlet distribution, which is the prior distribution of the multinomial distribution, is asymmetrically optimized.
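The selection of top-probability elements can be sketched as follows. The function name, the cutoff `k`, and the optional renormalization are assumptions for illustration; the specification states only that a predetermined top portion is kept as-is and the remainder is set to 0.

```python
def keep_top_k(preference, k, renormalize=False):
    """Keep the k largest entries of a preference vector and set the rest to 0.
    renormalize=True rescales the kept entries to sum to 1 (an assumption,
    not something the specification requires)."""
    order = sorted(range(len(preference)), key=lambda i: preference[i], reverse=True)
    keep = set(order[:k])
    truncated = [p if i in keep else 0.0 for i, p in enumerate(preference)]
    if renormalize:
        total = sum(truncated)
        if total > 0.0:
            truncated = [p / total for p in truncated]
    return truncated

# Hypothetical preference of one user over four latent topics.
preference = [0.5, 0.1, 0.3, 0.1]
print(keep_top_k(preference, k=2))
print(keep_top_k(preference, k=2, renormalize=True))
```

Zeroing the low-probability tail leaves only the topics that dominate a user's viewing, which is one way to give individual preference more weight relative to the group-level signal.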

The present invention expresses a user's individual preference as a multinomial distribution and defines a Dirichlet distribution as its prior. The main reasons for using the Dirichlet distribution as the prior of the multinomial distribution are that changes in each element of the multinomial distribution can be expressed through the parameters of the Dirichlet distribution, and that the Dirichlet distribution is the conjugate prior of the multinomial distribution, so the posterior is again a Dirichlet distribution, which makes the computation efficient.
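The computational benefit of conjugacy can be shown with a short sketch: observing multinomial counts turns a Dirichlet prior with parameters α into a Dirichlet posterior with parameters α plus the counts, so the update is a simple addition. The function names and the example counts are hypothetical.

```python
def dirichlet_posterior(alpha, counts):
    """Posterior Dirichlet parameters after observing multinomial counts."""
    return [a + n for a, n in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """Expected multinomial probabilities under a Dirichlet with parameters alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]

# Hypothetical example: a symmetric prior over three topics and the number of
# viewing tokens of one user assigned to each topic.
prior = [1.0, 1.0, 1.0]
counts = [7, 2, 1]
posterior = dirichlet_posterior(prior, counts)
print(posterior)
print(dirichlet_mean(posterior))
```

An asymmetric prior such as `[2.0, 0.5, 0.5]` would shift the posterior mean toward the first topic before any viewing data is observed.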

FIG. 2 is a data distribution diagram showing the relationship between the Dirichlet distribution and the multinomial distribution for symmetric and asymmetric distributions. Darker color indicates a higher concentration of data. The first panel in the first row of FIG. 2 shows the case where all parameter values are 1, which gives a uniform distribution. The second and third panels in the first row show that increasing the parameter values above 1 gives a shape resembling a normal distribution. The first and second panels in the second row show the data distribution when asymmetric rather than symmetric parameter values are given; the distribution departs from the normal-like shape and becomes asymmetric. When the parameter values are less than 1, the probability values for the random variable in each dimension take a deterministic form, that is, the probability mass concentrates on a few dimensions.
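The qualitative behavior described for FIG. 2 can be checked numerically with a small sampling sketch. It draws Dirichlet samples by normalizing independent gamma draws (a standard construction) and reports the component-wise mean and the average size of the largest component. The parameter values and function names are chosen only for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one point on the simplex by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def summarize(alpha, n_samples=2000, seed=0):
    """Return (component-wise mean, average largest component) over n_samples draws."""
    rng = random.Random(seed)
    samples = [sample_dirichlet(alpha, rng) for _ in range(n_samples)]
    mean = [sum(s[k] for s in samples) / n_samples for k in range(len(alpha))]
    avg_max = sum(max(s) for s in samples) / n_samples
    return mean, avg_max

# Symmetric (all 1), symmetric (> 1), asymmetric, and symmetric (< 1) parameters.
for alpha in ([1.0, 1.0, 1.0], [10.0, 10.0, 10.0], [8.0, 1.0, 1.0], [0.1, 0.1, 0.1]):
    mean, avg_max = summarize(alpha)
    print(alpha, [round(m, 2) for m in mean], round(avg_max, 2))
```

With parameters below 1 the largest component is typically close to 1, matching the near-deterministic behavior noted above, while large symmetric parameters keep samples near the center of the simplex.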

For the data, the K-dimensional Dirichlet distribution is defined by the following formula.

[Equation 1]

Equation 1 can also be written more simply as follows.

[Equation 2]

[Equation 3]

In particular, Equation 3 expresses the Dirichlet distribution in terms of a concentration parameter α and a vector m that represents the distribution of the variable in each dimension.
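For reference, the standard textbook form of a K-dimensional Dirichlet density, together with its shorthand and its parameterization by a concentration parameter α and a vector m, can be written as below. The symbol θ for the data, the constraint that the components of m sum to 1, and the correspondence with Equations 1 to 3 of this specification are assumptions based on common usage.

```latex
% Standard Dirichlet density over a probability vector \theta (assumed notation)
p(\theta \mid \alpha_1, \dots, \alpha_K)
  = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}
    \prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
  \qquad \theta_k \ge 0, \quad \sum_{k=1}^{K} \theta_k = 1

% Shorthand
\theta \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K)

% Concentration parameter \alpha > 0 and base vector m, with \alpha_k = \alpha m_k
\theta \sim \mathrm{Dir}(\alpha m), \qquad \sum_{k=1}^{K} m_k = 1
```

In this parameterization a symmetric prior corresponds to a uniform m, and an asymmetric prior to a non-uniform m.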

FIG. 3 is a block diagram of a system for automatically recommending TV program content based on user-preferred topics according to a first embodiment of the present invention. In addition to the user management and recommendation server 400, which is directly involved in recommendation, a TV program recommendation system requires a TV user viewing record database 320, a TV content database 330, and a media streaming server 340 for broadcasting TV program content. This embodiment shows one possible configuration of an automatic TV program recommendation system based on latent topic inference, and the present invention is not limited to the specific form of this embodiment.

When a user subscribes to the TV program recommendation service and the user TV 310 is registered with the user management and recommendation server 400, the recommendation service of this embodiment starts (S310). During a period such as a fixed training period, when the user TV 310 receives TV content from the media streaming server 340 (S320), the viewing records are stored in the TV user viewing record database 320 for purposes such as latent inference of the user's preferred topics (S330). The user management and recommendation server 400 receives the viewing records from the TV user viewing record database 320 and the TV content list from the TV content database 330 (S340). The specific configuration and functions of the user management and recommendation server 400 are described below.

FIG. 4 is a block diagram of an apparatus (user management and recommendation server) for automatically recommending TV program content based on user-preferred topics according to a second embodiment of the present invention. FIG. 5 is a simplified flowchart of a method for automatically recommending TV program content based on user-preferred topics according to a third embodiment of the present invention. Below, the second and third embodiments are described together for the case where the apparatus of FIG. 4 carries out the method of FIG. 5. However, the invention may be implemented in various forms, for example where an apparatus different from that of FIG. 4 carries out the method of FIG. 5, or where the apparatus of FIG. 4 carries out a method different from that of FIG. 5.

The user management and recommendation server 400 first extracts and processes information on user preferences (user profiles) from the TV program viewing records (S510), and groups users with similar viewing behavior on the basis of those preferences (S520). It then learns the topic model parameters of the recommendation model (S530), infers hidden topics by mapping the hidden-topic parameters of the topic model to individual preference and public preference (S540), and performs processing and additional inference on the topic model parameters to bring out the diversity of individual preferences (S550). Next, using information about the TV content, it excludes from the recommendation targets any TV program content that is unavailable to the viewer (S560), and then sorts and presents the recommendation candidate programs for each user using the topic model parameters (S570, S580). The specific method is described below in the order of this flow.

First, the preferred viewing item extraction unit 410 extracts information on each user's preferred viewing items from the items watched during the training period (S510).

This step makes it possible to learn a user's preferences and recommend suitable TV programs without asking the user directly whether he or she likes a given program. In the system of this embodiment, it can be implemented by having the TV user viewing record database 320 receive viewing record information from the user TV 310.

Second, the user grouping unit 420 groups the users using the extracted preferred viewing item information (S520).

FIG. 6 is a conceptual diagram of user-preference-based grouping. The users 610 with priority membership in group 1 are those users, among the group 1 users 620, who have relatively little relation to the other groups.

Depending on the embodiment, the grouping step (S520) may reduce the dimensionality of the watched TV content items using PCA (Principal Component Analysis), or may shrink the preference vector by building it from information such as genre and channel. Either approach is computationally more efficient than building the vector with one dimension for every item of TV content. Depending on the embodiment, group membership may also be multi-membership, which can improve recommendation performance: because topic modeling is a generative model, each model needs a sufficient number of documents (users). Representative grouping algorithms include K-means clustering, fuzzy clustering, and the hierarchical Dirichlet process.
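The grouping step can be illustrated with a minimal K-means sketch. It assumes each user has already been reduced to a short genre-share vector; the function name `kmeans_groups`, the three genres, and the sample values are hypothetical and are not taken from the patent.

```python
import numpy as np

def kmeans_groups(pref, n_groups, n_iter=50, seed=0):
    """Cluster users by preference vector; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from n_groups distinct users chosen at random.
    centroids = pref[rng.choice(len(pref), size=n_groups, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every user to every centroid.
        dist = ((pref[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for g in range(n_groups):
            members = pref[labels == g]
            if len(members):  # keep the old centroid if a group empties
                centroids[g] = members.mean(axis=0)
    return labels, centroids

# Hypothetical genre-share vectors (drama, news, sports) for six users.
pref = np.array([
    [0.80, 0.10, 0.10], [0.70, 0.20, 0.10], [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80], [0.05, 0.15, 0.80], [0.10, 0.20, 0.70],
])
labels, centroids = kmeans_groups(pref, n_groups=2)
```

This sketch gives hard, single-group assignments. The multi-membership variant mentioned above would need a soft method such as fuzzy clustering instead.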

Third, the parameter learning unit 430 learns the topic model parameters for each group using the preferred viewing item information (S530).

FIG. 7 is a diagram of the LDA model with symmetric α and symmetric β hyperparameters (SS prior). FIG. 8 is a diagram of the LDA model with asymmetric α and symmetric β hyperparameters (AS prior).

The recommendation model of this embodiment is based on Blei's LDA (Latent Dirichlet Allocation) model, a classic of topic modeling. LDA is a document generation model that formalizes the process by which a person writes. A writer goes through a series of steps, such as choosing a topic and then choosing words suited to that topic, and produces a text made up of passages on several topics. The main variables of this process (the document, the topics, the topic of each word, and the words) are assigned to the main nodes of a graphical model, which is formulated as a probabilistic Bayesian model.

Here each node represents a random variable, each line between nodes represents a dependency between random variables, and each rectangle represents a repeated operation. Shaded nodes represent observed data variables, and unshaded nodes represent hidden variables. FIG. 8 shows an embodiment in which, to learn the asymmetric α Dirichlet distribution from data, a gamma distribution Γ(a, b) is defined as the prior of the Dirichlet distribution, so that the Dirichlet prior of the parameter θ can be learned.

The notation in the LDA graphical model is as follows. U is the number of viewing users, and $N_u$ is the number of TV program content items watched by user u. K is the total number of hidden topics, and V is the total number of TV programs.

The shaded node w, which represents observed data, is based on the record that each user watched TV program w in accordance with his or her preference $\theta_u$. From a generative-model point of view, the asymmetric-LDA-based TV viewing model of FIG. 8 can be designed as shown in [Table 1] below.

[Table 1]
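The contents of Table 1 are not reproduced in this text. As a hedged illustration only, the standard LDA generative process, adapted to viewing records and given an asymmetric α drawn from a gamma prior, might be sketched as follows. The function name, the default values, and the choice to treat b as a rate parameter are all assumptions rather than details from the patent.

```python
import numpy as np

def generate_viewing_logs(n_users, n_topics, n_programs, tokens_per_user,
                          a=2.0, b=1.0, beta=0.5, seed=0):
    """Sample synthetic viewing logs from an LDA model with asymmetric alpha."""
    rng = np.random.default_rng(seed)
    # Asymmetric alpha: one concentration per topic, alpha_k ~ Gamma(a, rate=b).
    alpha = rng.gamma(shape=a, scale=1.0 / b, size=n_topics)
    # phi[k]: distribution over programs for topic k, symmetric Dirichlet(beta).
    phi = rng.dirichlet(np.full(n_programs, beta), size=n_topics)
    logs = []
    for _ in range(n_users):
        theta = rng.dirichlet(alpha)  # this user's topic preference
        # One hidden topic per viewing token, then one program per topic draw.
        z = rng.choice(n_topics, size=tokens_per_user, p=theta)
        w = np.array([rng.choice(n_programs, p=phi[k]) for k in z])
        logs.append(w)
    return alpha, phi, logs

alpha, phi, logs = generate_viewing_logs(
    n_users=5, n_topics=3, n_programs=8, tokens_per_user=20)
```

Inference runs this process in reverse: only the program indices in `logs` are observed, and θ, φ, and the topic assignments are recovered from them.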

It has been shown that the various inference techniques of the prior art reach similar performance once their hyperparameters are optimized. These techniques include GS (Gibbs Sampling), CGS (Collapsed Gibbs Sampling), VBI (Variational Bayesian Inference), and CVBI (Collapsed Variational Bayesian Inference). Of these, Gibbs sampling has a relatively simple inference procedure. CGS in particular marginalizes out the main random variables, which minimizes the independence assumptions between random variables, and it has been shown to be relatively more accurate than the other techniques. This embodiment therefore uses CGS for inference, although embodiments using other techniques such as GS, VBI, or CVBI are equally possible.

The Bayesian probabilistic model aims to maximize the probability P(w), which is defined as follows.

[Equation 4]

[Equation 5]

[Equation 6]

[Equation 7]
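The bodies of Equations 4 to 7 are not reproduced in this text. As a hedged aid to the reader, the standard LDA marginal likelihood and its collapsed factorization are given below. This is one decomposition consistent with the surrounding description (a product of two terms, one integrated over θ and the other over φ); the correspondence to the original equation numbers is an assumption.

```latex
\begin{aligned}
P(\mathbf{w} \mid \alpha, \beta)
  &= \sum_{\mathbf{z}} P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta) \\
P(\mathbf{w}, \mathbf{z} \mid \alpha, \beta)
  &= P(\mathbf{z} \mid \alpha)\, P(\mathbf{w} \mid \mathbf{z}, \beta) \\
P(\mathbf{z} \mid \alpha)
  &= \int P(\mathbf{z} \mid \theta)\, P(\theta \mid \alpha)\, d\theta \\
P(\mathbf{w} \mid \mathbf{z}, \beta)
  &= \int P(\mathbf{w} \mid \mathbf{z}, \phi)\, P(\phi \mid \beta)\, d\phi
\end{aligned}
```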

Because the random variables z and θ appear together in the expansion above, it cannot be evaluated by simple computation. An inference technique such as VB (Variational Bayesian inference) or MCMC (Markov Chain Monte Carlo) is therefore required, and this embodiment uses CGS, which is one of the MCMC methods.

To derive the CGS updates, the first term on the right-hand side of Equation 5 is integrated over θ and the second term over φ. This uses the conjugate prior relationship, under which the posterior also takes the form of a Dirichlet distribution. The result for the first term is expressed in terms of θ and the result for the second in terms of φ. Gibbs sampling is then applied: a hidden topic (z in FIGS. 7 and 8) is sampled for each viewing token of each user, and the model parameters θ and φ are computed through the update process given in the following equations.

[Equation 8]

[Equation 9]

Here u denotes a user, w a TV program (an index over all TV programs), i a viewing token of the user, and k a hidden-topic index. In Equations 8 and 9, the superscript $\neg i$ means that the user's current viewing token i is excluded from the topic-assignment counts. $n_{kw}$ is the number of times the current TV program w is assigned to topic k. $n_{k}$ is the number of TV programs assigned to topic k, that is, the sum of $n_{kw}$ over all w. $n_{uk}$ is the number of times topic k is assigned to the current user u. $n_{u}$ is the total number of topic assignments for user u, that is, the sum of $n_{uk}$ over all topics k.

Accordingly, $\hat{\phi}_{kw}$ is the estimate of the model parameter $\phi_{kw}$, the probability of TV program w under the k-th hidden topic, and $\hat{\theta}_{uk}$ is the estimate of the model parameter $\theta_{uk}$, the probability of the k-th hidden topic for user u. α and β are the Dirichlet priors of the parameters θ and φ, respectively, and θ and φ are multinomial distributions as described above. In general, α and β have been set to small symmetric values.
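The count-based updates described above follow the usual collapsed Gibbs sampling pattern for LDA. The sketch below is a minimal version of that textbook sampler, not a transcription of Equations 8 and 9, whose bodies are not reproduced here. The function name, the array names `n_uk`, `n_kw`, `n_k`, and the toy viewing logs are assumptions.

```python
import numpy as np

def collapsed_gibbs_lda(logs, n_topics, n_programs, alpha, beta=0.1,
                        n_iter=200, seed=0):
    """logs[u] is the list of program indices watched by user u."""
    rng = np.random.default_rng(seed)
    alpha = np.broadcast_to(np.asarray(alpha, dtype=float), (n_topics,))
    n_uk = np.zeros((len(logs), n_topics))   # topic counts per user
    n_kw = np.zeros((n_topics, n_programs))  # program counts per topic
    n_k = np.zeros(n_topics)                 # total tokens per topic
    # Random initial topic for every viewing token.
    z = [rng.integers(n_topics, size=len(doc)) for doc in logs]
    for u, doc in enumerate(logs):
        for i, w in enumerate(doc):
            k = z[u][i]
            n_uk[u, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for u, doc in enumerate(logs):
            for i, w in enumerate(doc):
                k = z[u][i]
                # Exclude the current token i from all counts.
                n_uk[u, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Conditional over topics (per-user denominator is constant in k).
                p = (n_kw[:, w] + beta) / (n_k + n_programs * beta)
                p = p * (n_uk[u] + alpha)
                k = rng.choice(n_topics, p=p / p.sum())
                z[u][i] = k
                n_uk[u, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    # Point estimates of the multinomial parameters from the final counts.
    phi = (n_kw + beta) / (n_k[:, None] + n_programs * beta)
    theta = (n_uk + alpha) / (n_uk.sum(axis=1, keepdims=True) + alpha.sum())
    return theta, phi, n_uk

# Toy logs: users 0 and 1 watch programs 0/1, users 2 and 3 watch programs 2/3.
logs = [[0, 1, 0, 1, 0], [0, 0, 1, 1], [2, 3, 3, 2, 2], [3, 2, 3]]
theta, phi, n_uk = collapsed_gibbs_lda(logs, n_topics=2, n_programs=4, alpha=0.5)
```

Passing a scalar `alpha` gives the symmetric prior of FIG. 7, and passing a length-K vector gives the asymmetric prior of FIG. 8.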

Various methods can be used to optimize or estimate the asymmetric hyperparameters. Examples are Minka's method, which solves maximum-likelihood estimation of the Dirichlet distribution with a simple iterative scheme, and Wallach's method, which improves on Minka's fixed-point iteration by exploiting properties of the gamma distribution to make the computation efficient. When Wallach's method is used, the asymmetric Dirichlet parameter α is computed as in Equation 10 below.

[Equation 10]

Here $C_k(n)$ is the number of users for whom the count $n_{uk}$ equals n, and $C(n)$ is the number of users for whom the count $n_{u}$ equals n.
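The body of Equation 10 is not reproduced in this text. The sketch below implements the published fixed-point update for a Dirichlet parameter in its histogram form, where differences of digamma functions are replaced by finite sums. It is offered as an assumption about the kind of computation intended, and the function name, the floor value, and the sample counts are hypothetical.

```python
import numpy as np

def update_alpha(n_uk, alpha, n_iter=20):
    """Fixed-point re-estimation of an asymmetric Dirichlet parameter alpha."""
    n_uk = np.asarray(n_uk, dtype=int)
    alpha = np.asarray(alpha, dtype=float).copy()
    n_u = n_uk.sum(axis=1)
    max_n = int(n_u.max())
    # Histograms: C_k[k, n] = users with n_uk == n; C[n] = users with n_u == n.
    C_k = np.stack([np.bincount(col, minlength=max_n + 1) for col in n_uk.T])
    C = np.bincount(n_u, minlength=max_n + 1)
    f = np.arange(max_n)  # f = 0 .. max_n - 1
    for _ in range(n_iter):
        # digamma(n + a) - digamma(a) equals sum_{f=0}^{n-1} 1 / (f + a).
        num_steps = np.cumsum(1.0 / (f[None, :] + alpha[:, None]), axis=1)
        den_steps = np.cumsum(1.0 / (f + alpha.sum()))
        num = (C_k[:, 1:] * num_steps).sum(axis=1)
        den = (C[1:] * den_steps).sum()
        # Floor keeps never-used topics from collapsing to exactly zero.
        alpha = np.maximum(alpha * num / den, 1e-6)
    return alpha

# Hypothetical per-user topic counts for four users and three topics.
counts = [[9, 1, 0], [7, 1, 2], [3, 6, 1], [10, 0, 0]]
alpha = update_alpha(counts, alpha=np.ones(3))
```

Topics that many users are assigned to often end up with a larger α, which is what lets the prior reflect uneven topic popularity.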

Depending on the embodiment, the Gibbs sampling computation in the topic model parameter learning step (S530) may proceed as follows for efficient sampling. First, after a burn-in phase, the samples are averaged. During this averaging phase, the asymmetric Dirichlet distribution is learned at the same time on the basis of Equation 10. Finally, starting from the z assignments produced by the averaged sampling, a further inference (sampling) pass is run so that the asymmetric prior is learned.

Fourth, the parameter mapping unit 440 maps the hidden-topic parameters of the topic model to individual preference and public preference (S540).

Depending on the embodiment, TV viewing can be mapped onto the document generation model. As described above for the LDA graphical model, U in FIG. 7 is the total number of users, which corresponds to the total number of documents in document generation modeling. The shaded node w represents a token for a TV program watched by a user, which corresponds to a word appearing in a document. A mapping that treats each user as a document and each user viewing token as a word can therefore be advantageous. The reverse mapping, in which each document corresponds to a TV program and each viewing token to a viewing user, is also possible.
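The user-as-document mapping can be sketched as a small preprocessing step. This assumes viewing records arrive as `(user_id, program_id)` pairs; the record format, the function name, and the sample identifiers are hypothetical.

```python
from collections import defaultdict

def build_user_documents(viewing_records):
    """Turn (user_id, program_id) records into one token list per user."""
    program_index = {}        # program_id -> vocabulary index (the "word" id)
    docs = defaultdict(list)  # user_id -> list of word ids (the "document")
    for user_id, program_id in viewing_records:
        # Assign the next free index the first time a program is seen.
        w = program_index.setdefault(program_id, len(program_index))
        docs[user_id].append(w)
    return dict(docs), program_index

records = [("u1", "news9"), ("u1", "drama_a"), ("u2", "news9"), ("u1", "news9")]
docs, vocab = build_user_documents(records)
```

Repeat viewings are kept as repeated tokens, so a program watched often carries more weight in that user's document.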

Depending on the embodiment, public preference and individual preference can be mapped onto the topic model. When LDA, the classical topic model, is used, the parameter θ in FIG. 7, a multinomial distribution over topics, is mapped to individual preference, and φ, a multinomial distribution over all words for each topic, is mapped to public preference.

Fifth, the parameter processing unit 450 performs processing and additional inference on the topic model parameters to bring out the diversity of individual preferences (S550).

To preserve the diversity of per-user preferences, the Dirichlet distribution, which is the prior of the multinomial distribution representing individual preference, is learned with asymmetric parameters. FIG. 8 shows an embodiment in which a gamma distribution Γ(a, b) is placed as the prior of the Dirichlet distribution so that the Dirichlet prior of the parameter θ can be learned. Depending on the embodiment, only a fixed number of the highest-probability elements of an individual's hidden-topic distribution may be kept as they are, with the remaining values set to 0, and the result used for further inference.
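The top-N truncation described in the last sentence can be sketched as follows. The function name and the sample values are hypothetical, and the kept values are left unnormalized because the text says they are used as they are.

```python
import numpy as np

def keep_top_topics(theta, n_top):
    """Zero all but the n_top largest entries in each user's topic vector."""
    theta = np.asarray(theta, dtype=float)
    out = np.zeros_like(theta)
    # Column indices of the n_top largest values in each row.
    top = np.argsort(theta, axis=1)[:, -n_top:]
    rows = np.arange(theta.shape[0])[:, None]
    out[rows, top] = theta[rows, top]
    return out

theta = np.array([[0.50, 0.10, 0.30, 0.10],
                  [0.05, 0.60, 0.05, 0.30]])
truncated = keep_top_topics(theta, n_top=2)
```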

Sixth, the non-preferred viewing item filter 460 excludes the user's non-preferred viewing items from the recommendation candidates (S560). A filtering function may be performed that removes unavailable content from the candidates, for example content whose broadcast has already ended or content belonging to a service to which the user has not subscribed.
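A minimal sketch of such a filter is shown below. The dictionary fields `id` and `channel`, the function name, and the sample data are assumptions made for illustration only.

```python
def filter_candidates(candidates, ended_programs, subscribed_channels):
    """Drop programs that have ended or sit on channels the user lacks."""
    return [
        p for p in candidates
        if p["id"] not in ended_programs and p["channel"] in subscribed_channels
    ]

candidates = [
    {"id": "p1", "channel": "basic"},
    {"id": "p2", "channel": "premium"},
    {"id": "p3", "channel": "basic"},
]
available = filter_candidates(
    candidates, ended_programs={"p3"}, subscribed_channels={"basic"})
```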

Seventh, the recommendation candidate content sorting unit 470 sorts the recommendation candidate content for each user using the topic model parameters (S570).

This step predicts each user's preference for each content item using a model, such as a product or a linear sum, that can take both public preference and individual preference into account. Star ratings or rank ordering can then be built on these predictions. When multi-membership grouping has been used, the implementation may take the per-user recommendations computed in the group in which the user's membership is strongest.

Here, with θ as the individual preference variable and φ as the public preference variable, the score of TV program w for user u can be computed as in Equation 11 or Equation 12 below (where K is the number of hidden topics).

[Equation 11]

[Equation 12]
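The bodies of Equations 11 and 12 are not reproduced in this text. As a hedged illustration of the product form mentioned above, the sketch below scores program w for user u as the sum over the K hidden topics of θ_uk · φ_kw. That this matches either equation is an assumption, and the function name and sample values are hypothetical.

```python
import numpy as np

def rank_programs(theta_u, phi, top_n=3, exclude=()):
    """Rank programs for one user by sum_k theta_u[k] * phi[k, w]."""
    scores = np.asarray(theta_u) @ np.asarray(phi)
    excluded = set(exclude)
    # Highest score first, skipping programs already filtered out.
    order = [int(w) for w in np.argsort(-scores) if int(w) not in excluded]
    return order[:top_n], scores

theta_u = [0.7, 0.3]                 # one user's topic preference
phi = [[0.6, 0.3, 0.1, 0.0],         # topic 0 over four programs
       [0.0, 0.1, 0.3, 0.6]]         # topic 1 over four programs
ranked, scores = rank_programs(theta_u, phi, top_n=2, exclude=(1,))
```

The `exclude` argument stands in for the output of the filtering step (S560), so that unavailable programs never reach the presented list.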

Finally, the recommendation item presentation unit 480 presents to the user the list of preferred TV program content sorted through the above process (S580).

The third embodiment, which is the method form of the present invention, and its variations may also be implemented as a computer program stored on a computer-readable recording medium.

In the embodiments of the present invention, the main variables of the model, namely each individual's preference over topics and the public's preference within each topic, are each expressed as a multinomial distribution. In addition, so that recommendations for an individual are driven by that individual's preferences rather than by public preference, each individual's preferred topics are limited to the topics of highest interest. Finally, because viewing interests differ widely between individuals, an asymmetric Dirichlet distribution is learned so that each user's own preferences stand out.

In experiments on an embodiment of the present invention, the parameter-based ranking model learned with these methods produced meaningful recommendation performance for individual users. The topic modeling results also showed that a single topic may consist of TV programs from a consistent genre or channel, but may equally consist of a list of preferred programs that no single genre or channel defines. Assuming that a hidden topic in TV program recommendation reflects the topic of interest of the users who watched the programs belonging to it, the proposed automatic recommendation method based on hidden-topic inference was found to express the diversity of users' viewing preferences better than existing methods that recommend programs by preferred genre or preferred channel.

The modules, functional blocks, or means of the present embodiments may be implemented with various known components such as electronic circuits, integrated circuits, and ASICs (Application Specific Integrated Circuits). Each may be implemented separately, or two or more may be integrated into one.

Embodiments have been described above to aid understanding of the present invention. As those skilled in the art will recognize, the invention is not limited to the specific embodiments described in this specification and may be modified, changed, and substituted in various ways without departing from its scope. For example, the technology of the invention may also be applied to pictures, images, and other items that can be shown on a display such as an LCD in place of text. Accordingly, all modifications and changes that fall within the true spirit and scope of the invention are intended to be covered by the claims.

[310] User TV
[320] TV content database
[330] TV user viewing record database
[340] Media streaming server
[400] User management and recommendation server
[610] Users with priority membership in group 1
[620] Group 1 users

Claims

1. An apparatus for automatically recommending TV program content based on user-preferred topics, comprising:
a user grouping unit configured to set user groups using preferred-viewing information of a plurality of users;
a topic model parameter learning unit configured to learn topic model parameters for each user group using the preferred-viewing information;
a parameter mapping unit configured to map hidden-topic parameters among the topic model parameters to individual preferences and group preferences; and
a parameter processing unit configured to perform additional processing or inference on the topic model parameters to reveal the diversity of the individual preferences.

2. The apparatus of claim 1, wherein the parameter mapping unit performs any one of:
a mapping to the words appearing in a document in document generation modeling;
a mapping in which, in document generation modeling, each user is treated as a document and each user viewing token as a word;
a mapping of each document to a TV program and each viewing token to a viewing user; or
a mapping of public preference and individual preference onto topic modeling.

3. The apparatus of claim 1, wherein, in learning the Dirichlet distribution that is the prior of the hidden-topic parameters, the parameter processing unit either keeps only a predetermined top portion of the preferred-viewing information as it is and sets the remaining lower portion to 0, or computes the Dirichlet distribution as an asymmetric Dirichlet distribution.

4. The apparatus of claim 1, further comprising a non-preferred viewing item filter configured to exclude from the recommendation candidates, using the preferred-viewing information, one or more of content whose broadcast has ended and content to which the user is not subscribed.

5. The apparatus of claim 1, further comprising a recommendation candidate content sorting unit configured to sort recommendation candidate content for each user using the topic model parameters.

6. The apparatus of any one of claims 1 to 5, wherein the mapping or the additional inference uses one or more of Gibbs Sampling (GS), Collapsed Gibbs Sampling (CGS), Variational Bayesian Inference (VBI), and Collapsed Variational Bayesian Inference (CVBI).

7. A system for automatically recommending TV program content based on user-preferred topics, comprising:
a media streaming server that broadcasts TV content;
a user TV that transmits viewing records while receiving TV content from the media streaming server, and that receives a TV content recommendation list;
a TV user viewing record database that receives and stores the viewing records from the user TV;
a user management and recommendation server that sets user groups using preferred-viewing information generated from the viewing records received from a plurality of user TVs, learns topic model parameters for each user group using the preferred-viewing information, maps hidden-topic parameters among the topic model parameters to individual preferences and group preferences, performs additional processing or inference on the topic model parameters to reveal the diversity of the individual preferences, and transmits the TV content recommendation list to the user TV; and
a TV content database that stores TV content to be transmitted to the media streaming server and to the user management and recommendation server.

8. The system of claim 7, wherein the user management and recommendation server:
in learning the Dirichlet distribution that is the prior of the hidden-topic parameters, either keeps only a predetermined top portion of the preferred-viewing information as it is and sets the remaining lower portion to 0, or computes the Dirichlet distribution as an asymmetric distribution;
excludes from the recommendation candidates, using the preferred-viewing information, one or more of content whose broadcast has ended and content to which the user TV is not subscribed; and
sorts recommendation candidate content for each user using the topic model parameters.

9. A method for automatically recommending TV program content based on user-preferred topics, comprising:
a user grouping step of setting user groups using preferred-viewing information of a plurality of users;
a topic model parameter learning step of learning topic model parameters for each user group using the preferred-viewing information;
a parameter mapping step of mapping hidden-topic parameters among the topic model parameters to individual preferences and group preferences;
a parameter processing step of performing additional processing or inference on the topic model parameters to reveal the diversity of the individual preferences;
a non-preferred viewing item filtering step of excluding from the recommendation candidates, using the preferred-viewing information, one or more of content whose broadcast has ended and content to which the user is not subscribed; and
a recommendation candidate content sorting step of sorting recommendation candidate content for each user using the topic model parameters.

10. The method of claim 9, wherein, in learning the Dirichlet distribution that is the prior of the hidden-topic parameters, the parameter processing step either keeps only a predetermined top portion of the preferred-viewing information as it is and sets the remaining lower portion to 0, or computes the Dirichlet distribution as an asymmetric Dirichlet distribution.

11. A method for automatically recommending TV program content based on user-preferred topics, comprising:
a topic model parameter learning step of learning topic model parameters for each user group set using preferred-viewing information received from a plurality of users;
a parameter mapping step of mapping hidden-topic parameters among the topic model parameters to individual preferences and group preferences by performing any one of: a mapping to the words appearing in a document in document generation modeling; a mapping in which, in document generation modeling, each user is treated as a document and each user viewing token as a word; a mapping of each document to a TV program and each viewing token to a viewing user; or a mapping of public preference and individual preference onto topic modeling; and
a parameter processing step of performing additional inference that reveals the diversity of the individual preferences by, in learning the Dirichlet distribution that is the prior of the hidden-topic parameters, either keeping only a predetermined top portion of the preferred-viewing information as it is and setting the remaining lower portion to 0, or computing the Dirichlet distribution as an asymmetric distribution.

12. A computer-readable recording medium on which a program for executing the method of any one of claims 9 to 11 is recorded.