KR100925511B1

KR100925511B1 - Method and system for classifying content using collaborative filtering

Info

Publication number: KR100925511B1
Application number: KR1020080027109A
Authority: KR
Inventors: 임성욱; 이수정; 김지환
Original assignee: 에스케이커뮤니케이션즈 주식회사
Priority date: 2008-03-24
Filing date: 2008-03-24
Publication date: 2009-11-06
Also published as: KR20090101770A

Abstract

The present invention relates to a content classification method and system for classifying unclassified content without a classification name by using previously classified content having a classification name according to a classification system.

Specifically, the present invention provides a content classification method and system using collaborative filtering, the method comprising: (a) transmitting content usage information including content information, user information, and usage history information; (b) dividing the content usage information into previously classified content usage information and unclassified content usage information according to whether a classification name under a classification system exists; (c) calculating each user's preference for each classification name from the previously classified content usage information according to the usage history information; (d) calculating each user's preference for each content item from the unclassified content usage information according to the usage history information; (e) calculating a similarity between the classification names and the content from the per-classification-name preferences and the per-content preferences; (f) generating classification name-content group information according to the similarity; and (g) classifying the unclassified content under the corresponding classification name according to the classification name-content group information.

Description

Method and system for classifying content using collaborative filtering

The present invention relates to a content classification method and system for classifying unclassified content, which has no classification name, by using previously classified content that has a classification name under a classification system.

More specifically, the present invention relates to a content classification method and system using collaborative filtering that, in order to classify unclassified content having no classification name under a classification system, calculates each user's preference for each classification name of previously classified content and each user's preference for each unclassified content item, and generates classification name-content group information from the similarity between classification names and content, thereby classifying the unclassified content under the corresponding classification name.

Recently, as society has entered a full-fledged information age, Internet penetration and information and communication technology have advanced rapidly, and Internet-based electronic commerce is spreading quickly as a result.

A representative example of such electronic commerce is business-to-consumer (B2C) e-commerce, which refers collectively to all forms of electronic transactions conducted between suppliers and consumers over the Internet. The object of such a transaction may be anything, tangible or intangible, and is hereinafter referred to as content.

Meanwhile, because the customers in general e-commerce are often an unspecified large group, suppliers need to establish and operate more efficient marketing strategies. Accordingly, many suppliers already run a 'personalization service' that actively uses transaction-related customer information to provide valuable services to each user. Here, a 'personalization service' is a service that uses information provided implicitly or explicitly by the user, such as personal details, preferences, and consumption patterns, to select and provide data expected to be valuable to that consumer.

A representative example of such a personalization service is a 'content recommendation information service based on collaborative filtering'. Here, 'collaborative filtering' is a method of predicting a particular user's interests from taste information obtained from users; on the premise that users with similar tendencies prefer similar content, it selects content suitable for a particular user by using the similarity between users and/or between content items. A 'content recommendation information service' is a service that recommends and introduces to the user information on the specific content obtained through collaborative filtering.

Accordingly, general collaborative filtering can be divided into 'user-to-user collaborative filtering', which uses similarity between users, and 'content-to-content collaborative filtering', which uses similarity between content items.

Specifically, 'user-to-user collaborative filtering' uses information obtained from users to group users who show similar behavior, and then provides content recommendation information to a particular user based on the content purchase or usage information of other users in the same group. By contrast, 'content-to-content collaborative filtering' typically groups content items that are frequently used together within a certain period, and when a user uses a particular content item, provides recommendation information for the other content in that group.

For example, with user 1 and user 2 grouped together, recommending content 1 to user 2 when user 1 uses content 1 is a good example of 'user-to-user collaborative filtering'; with content 1 and content 2 grouped together, providing recommendation information for content 2 when user 1 uses content 1 is a good example of 'content-to-content collaborative filtering'. More recently, a new form of user-content collaborative filtering that merges the two approaches has been introduced to provide a more efficient content recommendation information service.

Meanwhile, given the recent trend of rapidly increasing content variety, clear classification of content has an important effect on both content providers and consumers. Here, 'classification of content' means organizing content by grouping or dividing it according to a predetermined classification system. When content classification lacks clarity, accurate content selection becomes difficult, and duplicate classification or omission from classification becomes likely.

For this reason, a method of finding classification clues inside the content itself and organizing the content accordingly is commonly used. However, this is possible only when the content, like a document, consists of words and symbols from which context is easily extracted; for music or image data, practical classification in this way is impossible.

Therefore, a concrete way of automatically and clearly classifying content is required.

Accordingly, the present invention has been made to solve the above problems, and its object is to provide a content classification scheme that uses collaborative filtering to classify content automatically, more quickly and more clearly.

That is, an object of the present invention is to provide a concrete way of automatically classifying unclassified content under a corresponding classification name in cases where classification clues are difficult or impossible to find inside the content, by determining the similarity between classification names and content from each user's preference for each classification name of previously classified content and each user's preference for each unclassified content item, and by generating classification name-content group information using the similarity.

To achieve the above object, the present invention provides a content classification method using collaborative filtering, comprising: (a) transmitting content usage information including content information, user information, and usage history information; (b) dividing the content usage information into previously classified content usage information and unclassified content usage information according to whether a classification name under a classification system exists; (c) calculating each user's preference for each classification name from the previously classified content usage information according to the usage history information; (d) calculating each user's preference for each content item from the unclassified content usage information according to the usage history information; (e) calculating a similarity between the classification names and the content from the per-classification-name preferences and the per-content preferences; (f) generating classification name-content group information according to the similarity; and (g) classifying the unclassified content under the corresponding classification name according to the classification name-content group information.

In this case, the method further comprises, after step (b) and before step (c), replacing the content information of the previously classified content usage information with the corresponding classification name, so that each user's preference for each classification name is calculated in step (c). The usage history information includes usage type information, usage time information, and usage count information of the content. Step (c) and/or step (d) includes at least one of calculating a usage-type preference according to the usage type information, calculating a usage-time preference according to the usage time information, and calculating a usage-count preference according to the usage count information, so that the per-classification-name preference and/or the per-content preference is calculated from at least one of the usage-type preference, the usage-time preference, and the usage-count preference.

In addition, the usage-type preference is calculated using a weight assigned in advance to each usage type in the usage type information; the usage-time preference is calculated with reference to a specific decreasing relationship according to the interval between the most recent usage time in the usage time information and the current time; and the specific decreasing relationship follows a sigmoid decay curve.

In addition, the usage-count preference is calculated with reference to a specific increasing relationship according to the usage count in the usage count information. The method further comprises, after step (c) and before step (e), filtering the per-classification-name preference of a user according to preset abnormal-user criteria, and, after step (d) and before step (e), filtering the per-content preference of a user according to the preset abnormal-user criteria.

In addition, the abnormal-user criteria are at least one of the following: the user's occupation relates to the creation, development, processing, sale, or advertising of the content; the user's nationality is foreign; the user uses the same content at least a maximum number of times within a minimum usage time interval; or the user performs the same action at least a maximum number of times within a minimum usage time interval. Step (e) comprises: (e1) generating a preferred user group for each classification name from the per-classification-name preferences; and (e2) calculating the similarity between the classification name and the content from the per-content preferences of the preferred user group. The user information further includes at least one item of user attribute information among the user's gender, age, and region, and the preferred user group is generated taking the user attributes into account.
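Steps (e1) and (e2) can be illustrated with a minimal Python sketch. The text does not fix how the preferred user group is chosen or how the similarity is computed, so the threshold `min_pref`, the use of a plain average, and all function names here are assumptions made for illustration only.

```python
from typing import Dict, Set, Tuple

Prefs = Dict[Tuple[str, str], float]  # (user, classification name or content) -> score

def preferred_users(category_prefs: Prefs, category: str, min_pref: float) -> Set[str]:
    """(e1) Users whose preference for the classification name reaches min_pref (assumed rule)."""
    return {user for (user, cat), score in category_prefs.items()
            if cat == category and score >= min_pref}

def category_content_similarity(category_prefs: Prefs, content_prefs: Prefs,
                                category: str, content: str, min_pref: float) -> float:
    """(e2) Average preference of the preferred user group for one unclassified content item."""
    group = preferred_users(category_prefs, category, min_pref)
    if not group:
        return 0.0
    # Users in the group who never used the content contribute a preference of 0.
    total = sum(content_prefs.get((user, content), 0.0) for user in group)
    return total / len(group)

category_prefs = {("u1", "ballad"): 25.0, ("u2", "ballad"): 3.0, ("u3", "rock"): 20.0}
content_prefs = {("u1", "x"): 8.0, ("u3", "x"): 2.0}
print(category_content_similarity(category_prefs, content_prefs, "ballad", "x", 10.0))
```

Averaging over the whole group, rather than only over members who used the content, penalizes content that few group members touched; other aggregation rules would fit the text equally well.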

The method further comprises, after step (e) and before step (f), comparing the similarity with a reference value and filtering out the corresponding content when the similarity is below the reference value.
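The reference-value filter and the subsequent grouping of steps (f) and (g) might look like the following Python sketch. Assigning each content item to the single classification name with the highest surviving similarity is an assumption; the text only states that group information is generated according to the similarity. The function name and data layout are likewise hypothetical.

```python
from typing import Dict, Tuple

def build_category_content_groups(similarities: Dict[Tuple[str, str], float],
                                  reference_value: float) -> Dict[str, str]:
    """Map each unclassified content item to a classification name.

    similarities maps (classification name, content) to a similarity score.
    Pairs below reference_value are filtered out; among the rest, the
    highest-scoring classification name wins (assumed tie-breaking rule).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (category, content), score in similarities.items():
        if score < reference_value:
            continue  # filtered out before group information is generated
        if content not in best or score > best[content][1]:
            best[content] = (category, score)
    return {content: category for content, (category, _) in best.items()}

similarities = {("ballad", "x"): 8.0, ("rock", "x"): 2.0, ("rock", "y"): 1.0}
print(build_category_content_groups(similarities, reference_value=3.0))
```

Content whose every similarity falls below the reference value simply stays unclassified in this sketch.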

The present invention also provides a content classification method using collaborative filtering, comprising: (a) collecting previously classified content usage information, which carries information on a classification name, and unclassified content usage information; (b) calculating each user's preference from the previously classified content usage information to generate a predetermined user group; (c) selecting, from the unclassified content usage information, predetermined unclassified content preferred by the user group; and (d) setting the classification name of the selected unclassified content to the classification name of the previously classified content preferred by the user group.

In this case, the content usage information includes content information, user information, and usage history information; the usage history information includes at least one of usage type information, usage time information, and usage count information of the content; and step (b) further includes generating, from previously classified content usage information having the same classification name, the user group whose preference for the previously classified content exceeds a predetermined criterion.

The present invention also provides a content classification system using collaborative filtering, comprising: a content providing server interface unit that receives content usage information containing content information, user information, and usage history information; a per-classification-name preference calculation unit that selects, from the content usage information, previously classified content usage information having a classification name under a classification system and calculates each user's preference for each classification name; a per-content preference calculation unit that selects, from the content usage information, unclassified content usage information having no classification name under the classification system and calculates each user's preference for each content item; a similarity calculation unit that computes the similarity between the classification names and the content from the per-classification-name preferences and the per-content preferences and generates classification name-content group information from the similarity; and a classification unit that classifies the unclassified content under the corresponding classification name using the classification name-content group information.

In this case, the similarity calculation unit generates a preferred user group for each classification name from the per-classification-name preferences and calculates the similarity between the classification name and the content from the per-content preferences of the preferred user group. The system further comprises: a content usage information database in which the content usage information is stored and managed; a per-classification-name preference information database in which the per-classification-name preferences are stored and managed; a per-content preference information database in which the per-content preferences are stored and managed; and a classification name-content group database in which the classification name-content group information for the unclassified content is stored.

As described above, the present invention determines the similarity between classification names and content by using each user's preference for each classification name of previously classified content and each user's preference for each unclassified content item, and generates classification name-content group information from the similarity, thereby automatically classifying unclassified content under the corresponding classification name.

As a result, the present invention enables content classification that directly reflects user preferences, so the classification is highly reliable and supports the selection and provision of more accurate content recommendation information.

Hereinafter, the present invention is described in detail with reference to the drawings.

Before the detailed description, some terms used in this specification are defined. 'Content' is an object provided to users by a supplier; for example, it may be a specific article distinguishable by product name or model name, or a data file such as a music, image, video, or document file distinguishable by file name and extension. A 'classification name' is a criterion for classifying content under an arbitrary classification system; it may be the category to which the article or content belongs, or, for a music file, its genre. Each content item is therefore classified under a classification name, and the absence of a classification name means that the category to which the content belongs cannot be determined even by referring to its internal attributes. 'Use' of content is a concept covering all forms of interaction, including not only purchasing the content but also viewing it, introducing it, scrapping it to a personal area, and leaving a short review. These definitions are used consistently with the same meaning throughout this specification.

FIG. 1 is a block diagram showing a preferred embodiment of a content classification system using collaborative filtering according to the present invention (hereinafter referred to simply as the content classification system).

As shown, the content classification system according to the present invention includes a content providing server interface unit 110, a content usage information storage unit 112, and a content usage information management DB 114, as well as a per-classification-name preference calculation unit 122 and a per-classification-name preference information management DB 124, a per-content preference calculation unit 126 and a per-content preference information management DB 128, a similarity calculation unit 130, a classification name-content information management DB 132, and a classification name-content information providing unit 134.

Each of these is described in detail below.

First, the content providing server interface unit 110 is the part that interfaces with a separate content providing server (not shown).

Here, the content providing server can provide personalized content recommendation information for each user and also provide the content chosen by the user. Accordingly, the content providing server generates and manages content usage information containing user information on the content user, content information on the content, and usage history information on the user's use of the content, and this content usage information is transmitted to the content providing server interface unit 110 of the content classification system according to the present invention. The content usage information may be transmitted to the content providing server interface unit 110 at regular intervals.

Next, the content usage information storage unit 112 stores the content usage information received through the content providing server interface unit 110 in the content usage information management DB 114. If necessary, the content usage information storage unit 112 may convert the content usage information into an appropriate format or otherwise process it.

Next, the content usage information management DB 114 stores and manages the content usage information delivered through the content usage information storage unit 112. The content usage information management DB 114 may store the content usage information separately by user, by content, and by usage history; an example is shown in FIG. 2.

That is, FIG. 2 is an example of the content usage information table stored and managed in the content usage information management DB 114. It includes user information for identifying the user, content information for identifying the content, and usage history information on the use of the content. The usage history information may include at least one of usage type information, usage count information, and usage time information. Although not shown in the figure, the user information may include user attribute information such as the user's gender, age, region, and occupation, and content is divided, according to whether it has a classification name, into previously classified content with a classification name and unclassified content without one.

Returning to FIG. 1, the content usage information management DB 114 divides the content usage information, according to whether a classification name is present, into previously classified content usage information and unclassified content usage information. It passes the previously classified content usage information to the per-classification-name preference calculation unit 122 and the unclassified content usage information to the per-content preference calculation unit 126.
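The split by presence of a classification name can be sketched in Python as follows. The record layout (`UsageRecord` and its field names) is hypothetical, and representing a missing classification name as `None` is an assumption.

```python
from typing import List, NamedTuple, Optional, Tuple

class UsageRecord(NamedTuple):
    user: str
    content: str
    category: Optional[str]  # None means the content has no classification name
    usage_type: str

def split_by_category(records: List[UsageRecord]
                      ) -> Tuple[List[UsageRecord], List[UsageRecord]]:
    """Return (previously classified records, unclassified records)."""
    classified = [r for r in records if r.category is not None]
    unclassified = [r for r in records if r.category is None]
    return classified, unclassified

records = [
    UsageRecord("user1", "song_a", "ballad", "listen"),
    UsageRecord("user1", "song_x", None, "purchase"),
    UsageRecord("user2", "song_b", "rock", "inquiry"),
]
classified, unclassified = split_by_category(records)
print(len(classified), len(unclassified))
```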

Next, the per-classification-name preference calculation unit 122 uses the previously classified content usage information to calculate each user's preference for each classification name. To do so, it analyzes each user's usage history to calculate, for each classification name, at least one of a usage-type preference, a usage-time preference, and a usage-count preference, and calculates each user's per-classification-name preference from these.

FIG. 3 is an example of the previously classified content usage information table produced by the per-classification-name preference calculation unit 122. Compared with FIG. 2, each item of content information has been replaced with its classification name. The per-classification-name preference calculation unit 122 then calculates the usage-type preference, the usage-time preference, and the usage-count preference from each user's usage history information, and calculates each user's per-classification-name preference from these.
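Replacing content information with its classification name, and then collecting each user's history per classification name, could be sketched in Python as below. The tuple layout, the `category_of` lookup table, and the function name are hypothetical; the figures themselves are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def group_by_user_and_category(records: List[Tuple[str, str, str]],
                               category_of: Dict[str, str]
                               ) -> Dict[Tuple[str, str], List[str]]:
    """Replace each content id with its classification name and collect
    the usage types per (user, classification name) pair."""
    grouped: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for user, content, usage_type in records:
        grouped[(user, category_of[content])].append(usage_type)
    return dict(grouped)

records = [("user1", "song_a", "listen"), ("user1", "song_b", "purchase"),
           ("user2", "song_c", "inquiry")]
category_of = {"song_a": "ballad", "song_b": "ballad", "song_c": "rock"}
print(group_by_user_and_category(records, category_of))
```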

Here, the usage-type preference can be calculated using weights assigned in advance to each usage type. In the figure, assuming for convenience that all content items are music files, the usage types are divided into three, namely listening, purchase, and inquiry, with weights of 1, 5, and 3 assigned respectively. Since user 1 shows a usage history of listening, purchase, and inquiry for the classification name 'ballad' in records 1 and 5, the usage-type preference can be scored as 18.
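The weighted usage-type score can be sketched in Python as follows. The weights 1, 5, and 3 come from the text; summing the weights over all recorded usage events is an assumption. One reading that yields the stated score of 18, also an assumption because the figure is not reproduced here, is that records 1 and 5 each contain all three usage types.

```python
from typing import Iterable

# Weights from the text: listening 1, purchase 5, inquiry 3.
USAGE_TYPE_WEIGHTS = {"listen": 1, "purchase": 5, "inquiry": 3}

def usage_type_preference(usage_types: Iterable[str]) -> int:
    """Sum the pre-assigned weight of every recorded usage event (assumed rule)."""
    return sum(USAGE_TYPE_WEIGHTS[t] for t in usage_types)

# Hypothetical history: two 'ballad' records, each with all three usage types.
history = ["listen", "purchase", "inquiry"] * 2
print(usage_type_preference(history))
```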

The usage-time preference can be calculated with reference to a specific decreasing relationship according to the interval between the current time and the most recent of each user's usage times for the classification name. The most recent usage time is considered in order to account for preference declining over time, and, to improve reliability, the specific decreasing relationship may follow the S-shaped sigmoid decay curve commonly seen in natural decay phenomena. Accordingly, the usage-time preference of user 1 can be scored as 5, on the premise that the usage time of record 5 is the more recent.
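A sigmoid decay of this kind might be sketched in Python as below. The text names only the shape of the curve, so the maximum score, the midpoint, and the steepness are illustrative assumptions, as is measuring the interval in hours.

```python
import math

def time_preference(hours_since_last_use: float,
                    max_score: float = 5.0,
                    midpoint_hours: float = 168.0,
                    steepness: float = 0.05) -> float:
    """Score that decays along a reversed S-curve as the last use gets older.

    The score is close to max_score for very recent use, equals max_score / 2
    at midpoint_hours, and approaches 0 for very old use.
    """
    # Clamp the exponent so math.exp cannot overflow for very large intervals.
    exponent = min(steepness * (hours_since_last_use - midpoint_hours), 700.0)
    return max_score / (1.0 + math.exp(exponent))

for hours in (0.0, 168.0, 1000.0):
    print(hours, round(time_preference(hours), 3))
```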

The preference by usage count can be computed by applying a specific increasing relationship to each user's usage count for a classification name. A user who has used a classification name n times naturally has a higher preference than a user who has used it once, but, given individual tendencies, that preference cannot simply be quantified as n times higher. For better reliability, the increasing relationship may therefore follow the law of diminishing marginal utility, and user 1's preference by usage count can accordingly be scored as 2.
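One simple way to model diminishing marginal utility is a concave function of the usage count. The logarithmic form and the `scale` parameter below are assumptions made for illustration; the text does not specify the curve, and the sketch does not attempt to reproduce the example score of 2.

```python
import math


def usage_count_preference(count, scale=1.0):
    """Concave growth: each additional use adds less than the previous one."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return scale * math.log1p(count)  # log(1 + count), so 0 uses scores 0


# Ten uses score higher than one use, but far less than ten times higher.
print(round(usage_count_preference(1), 3), round(usage_count_preference(10), 3))
```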

As a result, user 1's preference for the classification name 'ballad' can be scored as 25 (18 + 5 + 2).

It will be apparent to those skilled in the art that the method of computing the preference by classification name, including the preferences by usage type, usage time, and usage count, can be modified freely as needed, and that the above is only one example.

The classification-name preference calculator (122) can also improve reliability by filtering a specific user's preferences by classification name according to preset abnormal-user criteria. This reduces the possibility that preferences by classification name are manipulated through the deliberate behavior of illegitimate users. To this end, the classification-name preference calculator (122) compares the preset abnormal-user criteria with the user's attribute information and usage history information and filters that user's preferences by classification name.

Among the abnormal-user criteria, those based on user attribute information may include cases where the user's occupation is related to the creation, development, processing, sale, or advertising of the content, or where the user resides outside the country in which the content is sold. Those based on usage history information may include cases where the same content is accessed at least the maximum number of access times within the minimum access interval, or where a specific user repeats the same action at least the maximum number of times within the minimum execution interval.

The criteria based on user attribute information are readily understood without further explanation and are omitted here. As for the criteria based on usage history information, suppose for example that, among the usage times in user 2's usage history for the classification name 'ballad', the interval 'yy/mm/dd hh:mm:ss 6_1 ~ yy/mm/dd hh:mm:ss 6_3' falls within the preset minimum access interval and the maximum usage count is 3. The actions of user 2 in entries 6 to 8 then correspond to accessing "the same content at least the maximum number of access times within the minimum access interval". The classification-name preference calculator (122) therefore treats these actions of user 2 as filtering targets and excludes the corresponding preference by classification name.
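The usage-history criterion can be sketched as a sliding-window check over one user's access times for the same content. The function name, the one-minute interval, and the sample timestamps are hypothetical; only the rule itself (at least the maximum number of accesses inside the minimum access interval) comes from the text.

```python
from datetime import datetime, timedelta


def find_burst_accesses(timestamps, min_interval, max_count):
    """Return indices of accesses that fall in a window no longer than
    min_interval containing at least max_count accesses."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    flagged = set()
    start = 0
    for end in range(len(order)):
        # Shrink the window until it spans at most min_interval.
        while timestamps[order[end]] - timestamps[order[start]] > min_interval:
            start += 1
        if end - start + 1 >= max_count:
            flagged.update(order[start:end + 1])
    return sorted(flagged)


t0 = datetime(2008, 3, 24, 12, 0, 0)
accesses = [t0, t0 + timedelta(seconds=10), t0 + timedelta(seconds=20),
            t0 + timedelta(days=1)]
# Three accesses within one minute are flagged; the access a day later is not.
print(find_burst_accesses(accesses, timedelta(minutes=1), 3))  # [0, 1, 2]
```

The flagged entries would then be excluded before the preference by classification name is computed.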

In addition, so that the characteristics of users belonging to minority groups are not ignored, the classification-name preference calculator (122) can separate the preferences by classification name by user attribute and compute preferences by classification name for each user attribute. To do so, it divides each user's preferences by classification name shown in FIG. 2 according to user attributes and then merges them for each identical attribute.

FIGS. 4 and 5 are example tables of preferences by classification name according to user attributes. Taking region and gender from the user attribute information, FIG. 4 shows the preferences by classification name of women in the Seoul/Gyeonggi area, and FIG. 5 shows those of men in the Chungcheong area.
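Splitting preferences by user attribute and merging them within each attribute group could be sketched as below. The text does not say how the merge is performed, so summing the scores is an assumption, and all names and sample values are hypothetical.

```python
from collections import defaultdict


def merge_by_attributes(user_prefs, user_attrs, keys=("region", "gender")):
    """Group users by the chosen attributes and sum their preference scores
    per classification name within each group (summing is an assumption)."""
    merged = defaultdict(lambda: defaultdict(float))
    for user, prefs in user_prefs.items():
        segment = tuple(user_attrs[user][key] for key in keys)
        for name, score in prefs.items():
            merged[segment][name] += score
    return {segment: dict(scores) for segment, scores in merged.items()}


user_prefs = {"u1": {"ballad": 25, "rock": 4}, "u2": {"ballad": 10}, "u3": {"rock": 7}}
user_attrs = {
    "u1": {"region": "Seoul/Gyeonggi", "gender": "F"},
    "u2": {"region": "Seoul/Gyeonggi", "gender": "F"},
    "u3": {"region": "Chungcheong", "gender": "M"},
}
segments = merge_by_attributes(user_prefs, user_attrs)
print(segments[("Seoul/Gyeonggi", "F")])  # {'ballad': 35.0, 'rock': 4.0}
```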

The per-user preferences by classification name described with reference to FIG. 2 and the preferences by classification name according to user attributes described with reference to FIGS. 4 and 5 are stored in the classification-name preference information management DB (124).

Meanwhile, the content preference calculator (126) computes each user's preference per content item from the unclassified content usage information. To do so, it analyzes each user's usage history, computes at least one of a preference by usage type, a preference by usage time, and a preference by usage count for each content item, and derives the preference per content item from these.

FIG. 6 is an example of the unclassified content usage information table as processed by the content preference calculator (126). It differs from FIG. 2 in that preferences are computed per content item rather than per classification name; otherwise the preceding description applies unchanged. As a result, for content 2 used by user 1 in entry 3, the preference by usage type can be scored as 6, the preference by usage time as 3, and the preference by usage count as 1, giving a preference of 10 for content 2.

The content preference calculator (126) can likewise improve reliability by filtering a specific user's preferences per content item according to the preset abnormal-user criteria. So that the characteristics of users belonging to minority groups are not ignored, it also divides the preferences per content item according to user attributes and then merges them for each identical attribute.

Accordingly, taking region and gender from the user attribute information, FIG. 7 shows the preferences per content item of women in the Seoul/Gyeonggi area, and FIG. 8 shows those of men in the Chungcheong area. The detailed operation of the content preference calculator (126) can be readily understood from the description of the classification-name preference calculator (122) above, so a duplicate description is omitted.

The per-user preferences per content item described with reference to FIG. 6 and the preferences per content item according to user attributes described with reference to FIGS. 7 and 8 are stored in the content preference information management DB (128).

Referring again to FIG. 1, the similarity calculator (130) computes the similarity between a 'classification name' and a 'content item' using the per-user and per-attribute preferences by classification name stored in the classification-name preference information management DB (124) and the per-user and per-attribute preferences per content item stored in the content preference information management DB (128). It then groups classification names and content items with high similarity to generate 'classification name-content group information'. This information may take the form of a matrix that maps mutually related classification names and content items according to their similarity, or of one-to-one, one-to-many, or many-to-one correspondences.

The process by which the similarity calculator (130) computes the similarity between a classification name and a content item is described in more detail below.

First, the similarity calculator (130) generates a preferred user group for each classification name from each user's preferences by classification name stored in the classification-name preference information management DB (120). A preferred user group for a classification name consists of the users whose preference for that classification name is equal to or greater than a preset reference value. The similarity calculator (130) then selects the content used by the users in the preferred user group and computes the similarity between the classification name and each such content item; if the similarity is equal to or greater than the reference value, it generates classification name-content group information from that classification name and content item.
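Building the preferred user groups amounts to a threshold filter over the per-user preferences. In this minimal sketch the data layout (a dictionary of user to classification-name scores), the function name, and the threshold value are all hypothetical.

```python
def preferred_user_groups(user_prefs, threshold):
    """Map each classification name to the set of users whose preference
    for it is at or above the threshold."""
    groups = {}
    for user, prefs in user_prefs.items():
        for name, score in prefs.items():
            if score >= threshold:
                groups.setdefault(name, set()).add(user)
    return groups


prefs = {
    "user1": {"ballad": 25, "rock": 3},
    "user2": {"ballad": 12},
    "user3": {"ballad": 4, "rock": 15},
}
groups = preferred_user_groups(prefs, threshold=10)
print(sorted(groups["ballad"]), sorted(groups["rock"]))  # ['user1', 'user2'] ['user3']
```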

To aid understanding with a different example, suppose that content items A, B, C, and D are classified content with the classification name 'ballad' and that content items E, F, G, H, L, M, N, X, Y, and Z have no classification name. If user Gap (甲) has used A, B, C, D, E, F, G, and H, user Eul (乙) has used A, B, C, D, E, X, Y, and Z, and user Byeong (丙) has used A, B, C, D, E, L, M, and N, the similarity calculator (130) uses each user's preferences by classification name to group users Gap, Eul, Byeong, and Jeong into the preferred user group of the classification name 'ballad'.

The similarity calculator (130) then selects the unclassified content used by users Gap, Eul, Byeong, and Jeong in the preferred user group of 'ballad' and computes the similarity between each selected content item and the classification name. Every user in the group has used content E, so the similarity between 'ballad' and content E is 100%. Only one user in the group has used content F, so the similarity between 'ballad' and content F is about 33%, and in the same way the similarity between 'ballad' and each of G, H, L, M, N, X, Y, and Z is about 33%.

Assuming the preset similarity threshold is 50%, the similarity between 'ballad' and content E is at or above the threshold, so classification name-content group information is generated for 'ballad'-'content E'. Well-known clustering methods can be used for the similarity computation; specific examples include algorithms used in ordinary collaborative filtering, such as Pearson correlation, the popularity-difference method, and cosine similarity. The present invention therefore does not separately describe a specific clustering method for the similarity computation.
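The worked example can be reproduced with the simple usage-share measure it implies: the fraction of the preferred user group that used each unclassified item. This is a sketch of that example only, using the three users whose usage is listed; it is not Pearson correlation or cosine similarity, and the function and variable names are hypothetical.

```python
def similarity_to_classification(group_users, usage, classified_items):
    """Share of preferred-group users who used each unclassified item."""
    group_users = list(group_users)
    if not group_users:
        return {}
    counts = {}
    for user in group_users:
        for item in usage[user]:
            if item not in classified_items:
                counts[item] = counts.get(item, 0) + 1
    return {item: n / len(group_users) for item, n in counts.items()}


usage = {
    "Gap": set("ABCDEFGH"),
    "Eul": set("ABCDEXYZ"),
    "Byeong": set("ABCDELMN"),
}
ballad_items = set("ABCD")  # items that already carry the name 'ballad'
similarity = similarity_to_classification(usage.keys(), usage, ballad_items)

THRESHOLD = 0.5  # the 50% reference value assumed in the example
ballad_group = sorted(item for item, s in similarity.items() if s >= THRESHOLD)
print(ballad_group)  # ['E']
```

Content E reaches 3/3 and passes the threshold, while every other unclassified item reaches only 1/3 and is left out.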

In this case, to further increase the accuracy of the classification name-content group information, it is preferable to take into account the preferences by classification name and per content item according to user attributes; the preferences by classification name for each user attribute can be considered when the preferred user group for each classification name is generated. That is, in the example above, users Gap, Eul, Byeong, and Jeong may, depending on the user attribute chosen, be women living in the Seoul/Gyeonggi area, or may form a preferred user group based on some other user attribute.

Several weights can also be considered when generating the classification name-content group information. For example, when the interval between uses of a classification name and a content item within a preferred user group exceeds a certain time, a time weight inversely proportional to the usage interval is applied, and when they are used on the same day or within a certain time, a high weight is assigned, which increases reliability.
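One possible reading of this time weighting is a piecewise function: a fixed high weight inside a window and a weight inversely proportional to the interval beyond it. The 24-hour window, the weight of 1.0, and the function name are assumptions; the text gives no concrete values.

```python
def interval_weight(interval_hours, window_hours=24.0, high_weight=1.0):
    """Weight for a classification name-content pair by usage interval.

    Within window_hours the pair gets the full high_weight; beyond it the
    weight falls off in inverse proportion to the interval.
    """
    if interval_hours < 0:
        raise ValueError("interval_hours must be non-negative")
    if interval_hours <= window_hours:
        return high_weight
    return high_weight * window_hours / interval_hours


print(interval_weight(12), interval_weight(48), interval_weight(240))  # 1.0 0.5 0.1
```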

In addition, the similarity calculator (130) can filter out a preferred user group for a classification name when the number of users in it is below a preset minimum number of users, thereby improving the accuracy of the classification name-content groups.
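Filtering out small preferred user groups is a simple size check. In this sketch the group layout (classification name to set of users), the function name, and the minimum of three users are hypothetical.

```python
def drop_small_groups(groups, min_users):
    """Keep only preferred user groups with at least min_users members."""
    return {name: users for name, users in groups.items() if len(users) >= min_users}


candidate_groups = {
    "ballad": {"Gap", "Eul", "Byeong"},
    "trot": {"Gap"},
}
print(sorted(drop_small_groups(candidate_groups, min_users=3)))  # ['ballad']
```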

Next, the preferred user groups for each classification name and the classification name-content group information generated by the similarity calculator (130) are stored in the classification name-content information management DB (132), and the classification name-content group information is passed to the classification name-content information provider (134).

Finally, the classification name-content information provider (134) assigns each unclassified content item to its classification name and provides the result to a separate content providing server through the content providing server interface (110), so that the unclassified content is classified under the corresponding classification name.

FIG. 9 is a flowchart of the content classification method according to the present invention; the description of the invention is summarized below with reference to FIG. 9 together with FIG. 1.

In the content classification method according to the present invention, content usage information is first transmitted to the content providing server interface (110) (st1). The content usage information includes user information, which contains user identification information as well as user attribute information such as gender, age, region, and occupation; content information for identifying the content; and usage history information on the use of the content.

The content usage information is then stored in the content usage information management DB (114) by the content usage information storage unit (112) and divided, according to whether a classification name exists, into pre-classified content usage information and unclassified content usage information (st2).
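Step st2 is a partition of the usage records by whether a classification name is present. The record layout below, including the `category` field, is a hypothetical illustration and not the schema of the tables in the figures.

```python
def split_usage_records(records):
    """Split usage records into pre-classified and unclassified lists,
    depending on whether a classification name is present."""
    classified, unclassified = [], []
    for record in records:
        target = classified if record.get("category") else unclassified
        target.append(record)
    return classified, unclassified


records = [
    {"user": "user1", "content": "content1", "category": "ballad"},
    {"user": "user1", "content": "content2", "category": None},
    {"user": "user2", "content": "content3"},
]
pre_classified, not_classified = split_usage_records(records)
print(len(pre_classified), len(not_classified))  # 1 2
```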

Next, the classification-name preference calculator (122) replaces the content identification information in the pre-classified content usage information with classification names and computes the preferences by usage type, usage time, and usage count from each user's usage history information. It derives the preferences by classification name from these, filters a specific user's preferences by classification name according to the preset abnormal-user criteria, and divides the preferences by classification name according to user attributes before merging them for each identical attribute (st11 to st15). The per-user and per-attribute preferences by classification name are stored in the classification-name preference information management DB (124).

Likewise, the content preference calculator (126) computes the preferences by usage type, usage time, and usage count from each user's usage history information in the unclassified content usage information and derives the preferences per content item from these. It filters a specific user's preferences by classification name according to the preset abnormal-user criteria, and divides the preferences per content item according to user attributes before merging them for each identical attribute (st21 to st24). The per-user and per-attribute preferences per content item are stored in the content preference information management DB (128).

Next, the similarity calculator (130) generates a preferred user group for each classification name from the preferences by classification name, and computes the similarity between classification names and content items from that group's preferences per content item to generate classification name-content group information. It filters out content whose similarity is at or below the reference value and filters out preferred user groups made up of only a few users, thereby generating highly reliable classification name-content group information (st31 to st33).

Once the classification name-content group information is complete, it is stored in the classification name-content information management DB (132), and the classification name-content information provider (134) classifies the unclassified content according to it (st34).

The preferred embodiment of the present invention has been shown and described above, but the present invention is not limited to it. The invention can be modified in many ways depending on the specific implementation, and all such modifications should be regarded as falling within its scope, which is set out more clearly in the claims below.

FIG. 1 is a block diagram of the content classification system according to the present invention.

FIG. 2 is an example table of content usage information according to the present invention.

FIG. 3 is an example table of preferences by classification name for pre-classified content usage information according to the present invention.

FIGS. 4 and 5 are example tables of preferences by classification name according to user attributes for pre-classified content usage information according to the present invention.

FIG. 6 is an example table of preferences per content item for unclassified content usage information according to the present invention.

FIGS. 7 and 8 are example tables of preferences per content item according to user attributes for unclassified content usage information according to the present invention.

FIG. 9 is a flowchart of the content classification method according to the present invention.

Claims

A content classification method using collaborative filtering, performed by a content classification system that classifies unclassified content using content usage information, the content usage information including content information on pre-classified content that has a classification name under a classification system and on unclassified content that has no classification name under the classification system, user information on the users of the content, and usage history information comprising at least one of the users' usage type, usage time, and usage count for the content, the method comprising:

(a) transmitting the content usage information from a content providing server to the content classification system;

(b) dividing, by the content classification system, the content usage information into content usage information for the pre-classified content and content usage information for the unclassified content;

(c) calculating, by the content classification system, preferences for each classification name of the user according to the usage history information on the previously classified content;

(d) generating, by the content classification system, a preferred user group for each classification name from the users whose preference by classification name is equal to or greater than a reference value, selecting the unclassified content chosen by the users of the preferred user group for that classification name, and calculating a similarity between the classification name and the unclassified content through clustering by collaborative filtering;

(e) generating, by the content classification system, classification name-content group information according to the classification name for the unclassified content whose similarity is equal to or greater than a reference value; and

(f) classifying, by the content classification system, the unclassified content into a corresponding classification name according to the classification name-content group information.

delete

The method according to claim 1,

wherein step (c) comprises at least one of calculating a preference by usage type in proportion to the sum of the weights pre-assigned to each usage type, calculating a preference by usage time in proportion to the usage time, and calculating a preference by usage count in proportion to the usage count, and wherein the preference by classification name is calculated as the sum of at least one of the preference by usage type, the preference by usage time, and the preference by usage count.

delete

The method according to claim 4,

wherein the preference by usage time reflects a specific decreasing relationship according to the interval between the most recent usage time and the current time.

The method according to claim 6,

wherein the specific decreasing relationship follows a sigmoid decay curve.

The method according to claim 4,

wherein the preference by usage count reflects a specific increasing relationship according to the usage count.

The method according to claim 8,

wherein the specific increasing relationship follows the law of diminishing marginal utility.

delete

The method according to claim 1,

further comprising, after step (c) and before step (d), filtering the user's preferences by classification name according to preset abnormal-user criteria, wherein the abnormal-user criteria are at least one of: the user's occupation being related to the creation, development, processing, sale, or advertising of the content; the user being located outside the country in which the content is sold; the user using the same content at least a maximum number of usage times within a minimum usage time interval; and the user performing the same action at least the maximum number of usage times within the minimum usage time interval.

delete

The method according to claim 1,

wherein the user information further includes at least one user attribute among the user's gender, age, and region, and wherein the preferred user group for each classification name in step (d) is generated for each user attribute.

delete

Connected to the content providing server, the content information for the pre-classified content having the classification name according to the classification system and the non-classified content without the classification name according to the classification system, the user information for the user of the content, and the content A content providing server interface unit configured to receive content usage information including at least one usage history information of the user's usage type, usage time, and usage frequency of the user;

A classification name preference calculator configured to select content use information of the previously classified content having a classification name according to a classification system among the content use information, and calculate a preference for each classification name of the user according to the usage history information;

A similarity calculator configured to generate a preferred user group from the users whose preference by classification name is equal to or greater than a reference value, select the unclassified content chosen by the users of the preferred user group, calculate a similarity between the classification name and the unclassified content through clustering by collaborative filtering, and generate classification name-content group information according to the classification name for the unclassified content whose similarity is equal to or greater than a reference value; and

A classification unit configured to classify the unclassified content into a corresponding classification name using the classification name-content group information.

The system according to claim 18,

wherein the classification name preference calculator calculates at least one of a preference by usage type in proportion to the sum of the weights pre-assigned to each usage type, a preference by usage time in proportion to the usage time, and a preference by usage count in proportion to the usage count, and wherein the preference by classification name is calculated as the sum of at least one of the preference by usage type, the preference by usage time, and the preference by usage count.

The system according to claim 18,

A content usage information database in which the content usage information is stored and managed;

A preference information database for each classification name in which the preferences for each classification name are stored and managed; and

A classification name-content group database storing the classification name-content group information.