KR20080045659A

KR20080045659A - Information processing device, method, and program

Info

Publication number: KR20080045659A
Application number: KR1020077011814A
Authority: KR
Inventors: 노리유끼 야마모또; 게이 다떼노; 마리 사이또; 도모히로 쯔노다; 미쯔히로 미야자끼
Original assignee: 소니 가부시끼 가이샤
Priority date: 2005-09-28
Filing date: 2006-09-15
Publication date: 2008-05-23
Also published as: US20090077132A1; EP1835419A4; EP1835419A1; US8117211B2; JP4378646B2; JP2007122683A; CN100594496C; WO2007037139A1; CN101069184A

Abstract

Provided are an information processing device, an information processing method, and a program that, when recommending content by the CF method, can suppress the concentration of recommendations on a small part of the content and can recommend content even to a user with little history information. In step S11, another user X whose history information is most similar to that of user A, to whom a music composition is to be recommended, is detected. In step S12, a music composition "a" that user X owns and user A does not own is detected. In step S13, the cluster to which the music composition "a" belongs is identified in each cluster layer. In step S14, music compositions classified in common into all of the identified clusters are extracted as recommendation candidates. In step S15, the one music composition whose cluster information is most similar to that of the music composition "a" is selected from among the recommendation candidates and recommended to user A. The present invention can be applied, for example, to a content sales site on the Internet.

Description

Information processing apparatus, methods, and programs {INFORMATION PROCESSING DEVICE, METHOD, AND PROGRAM}

The present invention relates to an information processing apparatus, an information processing method, and a program, and more particularly to an information processing apparatus, an information processing method, and a program that classify content into clusters, manage the features of the content using the clusters into which it is classified, and use those features for content search and recommendation.

Conventionally, inventions have been proposed for searching for and recommending content such as television programs and music based on a user's preferences (so-called content personalization) (see, for example, Patent Document 1).

Two methods are widely used for content personalization: collaborative filtering (CF) and content-based filtering (CBF).

The CF method manages the purchase history of each user. For a user A to whom content is to be recommended, it detects another user X with a similar purchase history and recommends content that user X has purchased and user A has not. It is used, for example, by mail-order sites on the Internet.
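
The CF procedure just described can be sketched as follows. This is a minimal illustration, not the method of any particular site: the Jaccard overlap used as the similarity measure, the function names, and the sample purchase histories are all assumptions made for the example.

```python
def jaccard(a, b):
    """Overlap between two purchase sets, from 0.0 to 1.0 (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_cf(histories, target):
    """Return items bought by the most similar other user but not by the target user."""
    others = [u for u in histories if u != target]
    if not others:
        return set()
    nearest = max(others, key=lambda u: jaccard(histories[target], histories[u]))
    return histories[nearest] - histories[target]


# Hypothetical purchase histories: user ID -> set of purchased item IDs.
histories = {
    "A": {"song1", "song2"},
    "X": {"song1", "song2", "song3"},
    "Y": {"song4"},
}
print(recommend_cf(histories, "A"))  # {'song3'}
```

Note that an item absent from every history can never be returned by this function, which is the first of the problems the description goes on to list.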

In the CBF method, metadata assigned to the content in advance by the distributor or seller is used directly to extract preferences and recommend content. Specifically, the distance (cosine correlation or the like) between a feature vector representing the user's preferences and the feature vector of each candidate piece of music is calculated, and pieces for which the calculated distance is short are recommended as matching the user's preferences.
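
A rough sketch of the CBF ranking follows, assuming cosine similarity as the measure (a high similarity corresponds to a short distance). The three-dimensional feature vectors and the song IDs are invented for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def recommend_cbf(user_vector, candidates):
    """Return candidate IDs ordered from closest to farthest from the user's preferences."""
    return sorted(
        candidates,
        key=lambda cid: cosine_similarity(user_vector, candidates[cid]),
        reverse=True,
    )


# Hypothetical feature vectors derived from metadata.
user_vector = [1.0, 0.0, 1.0]
candidates = {
    "song1": [1.0, 0.0, 1.0],
    "song2": [0.0, 1.0, 0.0],
    "song3": [1.0, 1.0, 0.0],
}
print(recommend_cbf(user_vector, candidates))  # ['song1', 'song3', 'song2']
```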

[Patent Document 1] Japanese Unexamined Patent Application Publication No. 2004-194107

<Disclosure of the Invention>

<Problems to Be Solved by the Invention>

The CF method described above has the following problems.

(1) Content that no user has purchased is never recommended to anyone. As a result, out of a vast catalog of content, recommendations concentrate on a small part of it and the great majority of the content is never recommended.

(2) When content is to be recommended to a new user, that user has little purchase history, so no other user with similar history information can be detected and no content can be recommended (the so-called cold-start problem).

(3) The number of content items and the number of users normally both keep growing. As they grow, the computation required to detect other users with similar purchase histories increases, and the content to recommend cannot be determined quickly.

The present invention has been made in view of this situation. It suppresses the concentration of recommendations on a part of the content in the CF method and also makes it possible to recommend content to users with little history information.

<Means for Solving the Problems>

An information processing apparatus according to one aspect of the present invention selects content satisfying a predetermined condition from a content group and presents it to a user. The apparatus includes: content classification means for classifying each content item constituting the content group into one of a plurality of first clusters in each of the layers corresponding to the metadata of the content; holding means for holding a database indicating the correspondence between each content item and the first cluster into which it is classified in each layer; management means for managing the user's history information regarding content; selection means for identifying a first cluster of interest based on the history information and selecting content classified into the identified first cluster; and presentation means for presenting the selected content.

The selection means may include: detection means for detecting a second user whose history information is similar to that of a first user; identification means for identifying the first cluster into which content that is absent from the first user's history information but present in the second user's history information is classified; and extraction means for extracting content classified into the identified first cluster. The presentation means may then present the extracted content to the first user.
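
The detection, identification, and extraction just described correspond to steps S11 to S14 of the abstract and might be sketched as below. Several points are assumptions made for illustration only: similarity is taken to be the number of shared pieces, a candidate must match the clusters of the detected piece in every layer, pieces the first user already has are excluded, and all IDs and cluster assignments are hypothetical.

```python
def recommend_candidates(histories, song_clusters, target):
    """Cluster-based CF sketch: returns candidate pieces for the target user."""
    others = [u for u in histories if u != target]
    if not others:
        return set()
    # S11: detect the other user whose history overlaps most with the target's.
    nearest = max(others, key=lambda u: len(histories[u] & histories[target]))
    # S12: pieces present in the similar user's history but absent from the target's.
    difference = histories[nearest] - histories[target]
    candidates = set()
    for song in difference:
        # S13: the cluster of this piece in each layer (one cluster ID per layer).
        clusters = song_clusters[song]
        # S14: every piece classified into all of those same clusters.
        for other, other_clusters in song_clusters.items():
            if other_clusters == clusters and other not in histories[target]:
                candidates.add(other)
    return candidates


# Hypothetical two-layer cluster assignments: piece ID -> (layer 1, layer 2).
song_clusters = {
    "s1": ("CL11", "CL21"),
    "s2": ("CL12", "CL22"),
    "s3": ("CL12", "CL22"),  # never purchased by anyone
    "s4": ("CL12", "CL22"),  # never purchased by anyone
    "s5": ("CL13", "CL21"),
}
histories = {"A": {"s1"}, "X": {"s1", "s2"}, "Y": {"s5"}}
print(sorted(recommend_candidates(histories, song_clusters, "A")))  # ['s2', 's3', 's4']
```

In this sketch the pieces s3 and s4 become candidates even though no user has them in a history, which is how routing the recommendation through clusters can spread recommendations beyond already-purchased content.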

The information processing apparatus according to one aspect of the present invention may further include generation means for generating, based on the user's history information and the database, preference information indicating the user's preferences in units of the first clusters, and grouping means for grouping users based on the preference information. The selection means may then include: detection means for detecting a second user belonging to the same group as a first user; identification means for identifying the first cluster into which content that is absent from the first user's history information but present in the second user's history information is classified; and extraction means for extracting content classified into the identified first cluster. The presentation means may then present the extracted content to the first user.

The information processing apparatus according to one aspect of the present invention may further include generation means for generating, based on the user's history information and the database, preference information indicating the user's preferences in units of the first clusters. The selection means may then include: detection means for detecting a second user whose preferences, as represented by the preference information, are similar to those of a first user; identification means for identifying a first cluster of interest based on the preference information of the first user and that of the second user; and extraction means for extracting content classified into the identified first cluster. The presentation means may then present the extracted content to the first user.

The detection means may include: normalization means for normalizing the users' preference information; weight calculation means for calculating, for each user, a weight for each layer from the normalized preference information of each user; and similarity calculation means for calculating, from the per-layer weights and the preference information, a similarity indicating how closely the preferences of the first user resemble those of each other user. The second user, whose preferences are similar to those of the first user, may then be detected from the calculated similarities.

The information processing apparatus according to one aspect of the present invention may further include generation means for generating, based on the user's history information and the database, preference information indicating the user's preferences in units of the first clusters, and grouping means for grouping users based on the preference information. The selection means may then include: detection means for detecting a second user belonging to the same group as a first user; identification means for identifying a first cluster of interest based on the preference information of the first user and that of the second user; and extraction means for extracting content classified into the identified first cluster. The presentation means may then present the extracted content to the first user.

The information processing apparatus according to one aspect of the present invention may further include setting means for setting a keyword for each of the first clusters into which the metadata is classified by the content classification means, and creation means for creating, using the keywords set by the setting means, a reason statement indicating why the content is presented. The presentation means may then also present the reason statement.

The apparatus may further include metadata classification means for classifying the metadata of the content into one of a plurality of second clusters and assigning the layers to the second clusters. The content classification means may then classify each content item into one of the plurality of first clusters in each of the assigned layers.

The apparatus may further include generation means for generating, based on the user's history information and the database, preference information indicating the user's preferences in units of the first clusters. The selection means may then select, considering all the first clusters of all the layers, the content that is classified into the largest number of the first clusters indicated by the preference information.
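
One way to read this selection rule is sketched below. It assumes that the clusters "indicated by the preference information" are simply the clusters with a nonzero preference value, that pieces the user already has are skipped, and that ties are broken by ID order; none of these details is stated in the text, and the data is hypothetical.

```python
def select_by_cluster_overlap(preferred_clusters, song_clusters, owned=frozenset()):
    """Return the piece that falls into the most preferred clusters, or None."""
    best, best_count = None, 0
    for song in sorted(song_clusters):  # sorted for a deterministic tie-break
        if song in owned:
            continue
        count = len(preferred_clusters & set(song_clusters[song]))
        if count > best_count:
            best, best_count = song, count
    return best


# Hypothetical data: clusters the user prefers, and three-layer cluster assignments.
preferred = {"CL11", "CL22", "CL31"}
song_clusters = {
    "s1": ("CL11", "CL21", "CL31"),
    "s2": ("CL11", "CL22", "CL31"),
    "s3": ("CL12", "CL22", "CL32"),
}
print(select_by_cluster_overlap(preferred, song_clusters))                # s2
print(select_by_cluster_overlap(preferred, song_clusters, owned={"s2"}))  # s1
```

Because this rule needs only the user's own preference information, it does not depend on finding a similar user.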

An information processing method according to one aspect of the present invention is a method for an information processing apparatus that selects content satisfying a predetermined condition from a content group and presents it to a user. The method includes the steps of: classifying each content item constituting the content group into one of a plurality of clusters in each layer corresponding to the metadata of the content; holding a database indicating the correspondence between each content item and the cluster into which it is classified in each layer; managing the user's history information regarding content; identifying a cluster of interest based on the history information; selecting content classified into the identified cluster; and presenting the selected content.

A program according to one aspect of the present invention is a program for selecting content satisfying a predetermined condition from a content group and recommending it to a user. The program causes a computer to execute processing including the steps of: classifying each content item constituting the content group into one of a plurality of clusters in each layer corresponding to the metadata of the content; holding a database indicating the correspondence between each content item and the cluster into which it is classified in each layer; managing the user's history information regarding content; identifying a cluster of interest based on the history information; selecting content classified into the identified cluster; and presenting the selected content.

In one aspect of the present invention, each content item constituting the content group is classified into one of a plurality of clusters in each layer corresponding to the metadata of the content. A database indicating the correspondence between each content item and the cluster into which it is classified in each layer is held, and the user's history information regarding content is managed. A cluster of interest is then identified based on the history information, content classified into the identified cluster is selected, and the selected content is presented.

<Effects of the Invention>

As described above, according to one aspect of the present invention, content can be recommended to a user by the CF method.

Also, according to one aspect of the present invention, the concentration of recommendations on a part of the whole body of content can be suppressed.

Furthermore, according to one aspect of the present invention, content can be recommended even to a user with little history information.

FIG. 1 is a block diagram showing a configuration example of a recommendation system to which the present invention is applied.

FIG. 2 is a diagram showing the concept of the clusters and cluster layers into which music metadata is classified.

FIG. 3 is a diagram showing an example of a music-cluster correspondence table.

FIG. 4 is a diagram showing an example of a cluster-music correspondence table.

FIG. 5 is a diagram showing an example of a user's preference vector.

FIG. 6 is a flowchart explaining offline preprocessing.

FIG. 7 is a flowchart explaining a first recommendation process.

FIG. 8 is a flowchart explaining second and third recommendation processes.

FIG. 9 is a flowchart explaining a fourth recommendation process.

FIG. 10 is a flowchart explaining fifth and sixth recommendation processes.

FIG. 11 is a flowchart explaining a seventh recommendation process.

FIG. 12 is a block diagram showing a configuration example of a general-purpose personal computer.

FIG. 13 is a block diagram showing another configuration example of the recommendation system according to one embodiment of the present invention.

FIG. 14 is a flowchart explaining another example of offline preprocessing.

FIG. 15 is a diagram showing an example of the soft-clustered metadata of each piece of music.

FIG. 16 is a diagram showing an example of the metadata of each piece of music.

FIG. 17 is a diagram showing an example of the clustered metadata of each piece of music.

FIG. 18 is a block diagram showing a configuration example of a similar user detection unit.

FIG. 19 is a flowchart explaining the process of detecting a user X with similar preferences.

FIG. 20 is a diagram showing an example of preference vectors.

FIG. 21 is a diagram showing an example of normalized preference vectors.

FIG. 22 is a diagram showing an example of weights.

FIG. 23 is a diagram showing an example of similarities calculated without weighting.

FIG. 24 is a diagram showing an example of similarities calculated with weighting.

<Description of Reference Numerals>

1: Recommendation system

11: Music DB

12: Clustering unit

13: Keyword setting unit

14: Clustered DB

15: Cluster-music correspondence table

16: Music-cluster correspondence table

17: User history information DB

18: Recommendation candidate selection unit

19: Preference vector generation unit

20: Similar user detection unit

21: User grouping unit

22: Difference detection unit

23: Recommendation cluster determination unit

24: Extraction unit

25: Music selection unit

26: Novelty determination unit

27: Selection reason generation unit

28: Presentation unit

100: Personal computer

101: CPU

111: Recording medium

201: Metadata clustering unit

202: Music clustering unit

203: Similar user detection unit

231: Normalization unit

232: Weight calculation unit

233: Similarity calculation unit

Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings.

FIG. 1 shows a configuration example of a recommendation system according to one embodiment of the present invention. The recommendation system 1 is used, for example, at a music data sales site on the Internet. It manages users' history information (information on the purchase, trial listening, searching, ownership, and so on of music data), selects music to recommend using the CF method, and presents it to the user. The recommendation system 1 can also be applied to sales sites that sell content other than music, such as television programs, movies, and books.

The recommendation system 1 includes: a music database (DB) 11 in which metadata is recorded for the large number of pieces of music data (hereinafter also simply called pieces of music) that are recommended and sold to users; a clustering unit 12 that clusters each piece of music based on its metadata recorded in the music DB 11 and generates cluster information for each piece; a keyword setting unit 13 that sets keywords representing the characteristics of each cluster layer and of each cluster in each cluster layer; and a clustered database (DB) 14 that holds the clustering result for each piece.

The clustered DB 14 holds, as the clustering result, a cluster-music correspondence table 15 indicating the pieces belonging to each cluster and a music-cluster correspondence table 16 indicating the clusters to which each piece belongs.

The recommendation system 1 further includes: a user history information database (DB) 17 that manages the history information of each user; a recommendation candidate selection unit 18 that selects a plurality of pieces as recommendation candidates based on user information; a music selection unit 25 that selects one piece from the selected recommendation candidates; a novelty determination unit 26 that determines whether the selected piece is novel to the user to whom it is recommended; a selection reason generation unit 27 that generates a recommendation reason statement to accompany the selected piece when it is presented to the user; and a presentation unit 28 that presents the selected piece and the recommendation reason statement to the user.

The recommendation candidate selection unit 18 includes a preference vector generation unit 19, a user grouping unit 20, a similar user detection unit 21, a difference detection unit 22, a recommendation cluster determination unit 23, and an extraction unit 24.

Like CDDB (CD Data Base) and Music Navi, which are data servers on the Internet that supply metadata for the music recorded on music CDs, the music DB 11 holds the metadata of the music that is recommended and sold.

For all the pieces in the music DB 11, the clustering unit 12 creates cluster layers (first to n-th layers) as shown in FIG. 2, based on the individual items of the metadata (artist name, genre, album, artist review, music review, title, tempo, beat, rhythm, and so on) or on combinations of them (tempo, beat, rhythm, and so on), and classifies (clusters) each piece into one or more of the clusters provided in each cluster layer.

Although music is used as the example here, artists and albums are likewise each clustered into multiple layers using many kinds of metadata. Multi-layer clusters for music, for artists, and for albums are used for music recommendation, artist recommendation, and album recommendation, respectively.

Any clustering method may be used, but the optimal clustering method and distance measure are selected for each cluster layer. For example, if the actual information of the metadata is a numerical attribute such as tempo, it is used as is; if it is a nominal attribute such as title, it is converted into numerical values by a quantification method such as principal component analysis. A distance measure such as Euclidean distance is then defined and clustering is performed. Representative clustering methods include the K-means method, hierarchical clustering methods (group average method, furthest neighbor method, Ward's method), and soft clustering methods.
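
As an illustration of clustering one layer from a numerical attribute, the following is a minimal K-means sketch on tempo values using absolute difference (one-dimensional Euclidean distance). The piece IDs, tempo values, initial centers, and fixed iteration count are all assumptions chosen for the example.

```python
def nearest_center(value, centers):
    """Index of the center closest to the value."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


def kmeans_1d(values, centers, iterations=10):
    """Plain K-means on scalar values, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            groups[nearest_center(v, centers)].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


# Hypothetical tempo metadata in beats per minute.
tempos = {"s1": 62, "s2": 70, "s3": 128, "s4": 134}
centers = kmeans_1d(list(tempos.values()), centers=[60.0, 140.0])
tempo_layer = {song: nearest_center(t, centers) for song, t in tempos.items()}
print(centers)      # [66.0, 131.0]
print(tempo_layer)  # {'s1': 0, 's2': 0, 's3': 1, 's4': 1}
```

Each cluster index here would play the role of one cluster ID within a single cluster layer; other layers could use a different attribute, distance measure, or clustering method.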

At this time it is preferable to perform clustering that reflects preference distance (for example, constrained clustering). To do so, a partial set of correct answers (sets of actual information that are close in terms of preference, sets that are far apart, and so on) is prepared by a preliminary survey, and a numerical representation, distance, and clustering method suited to it are used. It is also preferable to select clustering methods that increase the independence of the resulting cluster layers (that is, clustering methods with differing characteristics).

One item of actual information may be classified into a plurality of clusters in the same cluster layer. The distances between clusters in the same cluster layer (indicating their degree of similarity) are assumed to be known. This clustering method is described later. Then, as information representing the characteristics of a piece in place of its metadata, cluster information consisting of the cluster IDs (CL11 and so on in FIG. 2) of the clusters into which the actual information of each metadata item is classified is generated and output to the clustered DB 14.

If no cluster suitable for classification exists, a new cluster may be created. Each cluster may be of any size and can contain multiple items of actual information. A cluster into which only a single item of actual information can be classified may also be provided. In that case, the ID (artist ID, album ID, title ID) of the only item of actual information that can be classified into the cluster may be used as the cluster ID.

The clustered DB 14 generates and holds the cluster-music correspondence table 15 and the music-cluster correspondence table 16 based on the cluster information of each piece generated by the clustering unit 12. The clustered DB 14 also holds the keywords set by the keyword setting unit 13 for each cluster layer and each cluster.

FIG. 3 shows an example of the music-cluster correspondence table 16. In FIG. 3, for example, the cluster information of the piece with music ID = ABC123 is (CL12, CL21, CL35, CL47, CL52, …, CLn2), and the cluster information of the piece with music ID = CTH863 is (CL11, CL25, CL31, CL42, CL53, …, CLn1).

FIG. 4 shows an example of the cluster-music correspondence table 15 corresponding to the music-cluster correspondence table 16 shown in FIG. 3. In FIG. 4, for example, music ID = CTH863 corresponds to cluster ID = CL11, music ID = ABC123 corresponds to cluster ID = CL21, and music ID = XYZ567 corresponds to cluster ID = CL32.
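
The relationship between the two tables can be sketched as a simple inversion. The dictionary layout and function name are assumptions, and only the first five cluster layers quoted for FIG. 3 are used because the remaining layers are elided in the text.

```python
from collections import defaultdict


def build_cluster_to_songs(song_to_clusters):
    """Invert a music-cluster table into a cluster-music table."""
    cluster_to_songs = defaultdict(list)
    for song_id, cluster_ids in song_to_clusters.items():
        for cluster_id in cluster_ids:
            cluster_to_songs[cluster_id].append(song_id)
    return dict(cluster_to_songs)


# Music-cluster table, limited to the cluster IDs quoted from FIG. 3.
song_to_clusters = {
    "ABC123": ["CL12", "CL21", "CL35", "CL47", "CL52"],
    "CTH863": ["CL11", "CL25", "CL31", "CL42", "CL53"],
}
cluster_to_songs = build_cluster_to_songs(song_to_clusters)
print(cluster_to_songs["CL11"])  # ['CTH863']
print(cluster_to_songs["CL21"])  # ['ABC123']
```

Keeping both directions precomputed lets a lookup by piece and a lookup by cluster each run without scanning the whole table.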

The processing of the clustering unit 12, the keyword setting unit 13, and the clustered DB 14 needs to be executed each time metadata for a new piece of music is added to the music DB 11.

Returning to FIG. 1, the user history information DB 17 holds history information indicating the pieces of music that each user has purchased, previewed (trial listening), or searched for on the sales site, or that the user has reported as already owned after purchasing them by some other means. The user history information DB 17 also holds the preference vector of each user generated by the preference vector generation unit 19. In addition, the user history information DB 17 holds the result of the grouping of users by the user grouping unit 20, that is, information indicating which user group each user belongs to.

On the basis of the history information of each user held in the user history information DB 17, the preference vector generation unit 19 generates, for each user, a multidimensional preference vector in which every cluster is one dimension, and outputs it to the user history information DB 17. Specifically, for each piece of music in the user's history information, it refers to the music-cluster correspondence table 16 of the clustered DB 14 and adds a predetermined value to the dimensions of the preference vector that correspond to the clusters to which the piece belongs. The generated preference vector of each user is managed in the user history information DB 17. When the user's history information is updated, for example by the purchase of a piece of music, the preference vector is updated as well.

For convenience of explanation, assume that, as shown in FIG. 5, only the following clusters exist: three clusters in the first layer (CL11, CL12, CL13), four clusters in the second layer (CL21, CL22, CL23, CL24), three clusters in the third layer (CL31, CL32, CL33), and three clusters in the fourth layer (CL41, CL42, CL43). In this case, the preference vector has 13 dimensions.

For example, suppose the history information of user A records the purchase of two pieces of music. If the first piece belongs to the clusters with IDs CL11, CL22, CL33, and CL41, 1 is added to the value of each corresponding dimension. If the second piece belongs to the clusters with IDs CL12, CL24, CL32, and CL43, 1 is added to the value of each corresponding dimension. The preference vector (1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1) of user A is thus generated.

Likewise, suppose the history information of user X records the purchase of three pieces of music. If the first piece belongs to the clusters with IDs CL11, CL22, CL32, and CL43, 1 is added to the value of each corresponding dimension. If the second piece belongs to the clusters with IDs CL12, CL22, CL33, and CL42, 1 is added to the value of each corresponding dimension. If the third piece belongs to the clusters with IDs CL13, CL24, CL33, and CL41, 1 is added to the value of each corresponding dimension. The preference vector (1, 1, 1, 0, 2, 0, 1, 0, 1, 2, 1, 1, 1) of user X is thus generated.

The value added to each dimension may be varied according to the type of history information (that is, purchase, preview, search, or ownership). For example, 1 may be added for a purchase or ownership, 0.5 for a preview, and 0.3 for a search.
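The vector construction described above can be sketched as follows. The cluster order follows the 13-cluster example of FIG. 5 and the weights are the example values from the text; the names `preference_vector`, `WEIGHTS`, and the music IDs `a1` and `a2` are hypothetical and used only for illustration.

```python
# Dimension order assumed from the FIG. 5 example (13 clusters in 4 layers).
CLUSTERS = ["CL11", "CL12", "CL13",
            "CL21", "CL22", "CL23", "CL24",
            "CL31", "CL32", "CL33",
            "CL41", "CL42", "CL43"]

# Example increments per history type, as given in the text.
WEIGHTS = {"purchase": 1.0, "owned": 1.0, "preview": 0.5, "search": 0.3}

def preference_vector(history, music_to_clusters):
    """history: list of (music_id, history_type) pairs for one user."""
    index = {c: i for i, c in enumerate(CLUSTERS)}
    vec = [0.0] * len(CLUSTERS)
    for music_id, kind in history:
        for cluster_id in music_to_clusters[music_id]:
            vec[index[cluster_id]] += WEIGHTS[kind]
    return vec

# User A's two purchases from the example (music IDs are placeholders).
music_to_clusters = {
    "a1": ["CL11", "CL22", "CL33", "CL41"],
    "a2": ["CL12", "CL24", "CL32", "CL43"],
}
vec_a = preference_vector([("a1", "purchase"), ("a2", "purchase")],
                          music_to_clusters)
print([int(v) for v in vec_a])  # [1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]
```

The result reproduces the preference vector given for user A; replacing `"purchase"` with `"preview"` would add 0.5 per dimension instead.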

Returning to FIG. 1, the user grouping unit 20 groups all users on the basis of the similarity of the preference vectors of the users held in the user history information DB 17. To simplify the determination of similarity between multidimensional preference vectors, the value of each dimension of each user's preference vector is binarized: a value of 1 or more is replaced with 1, and a value of 0 is left as 0. Binarizing the value of each dimension in this way reduces the amount of computation required to determine similarity compared with the case where no binarization is performed, so the grouping can be carried out easily. The grouping result is managed in the user history information DB 17.
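The binarization can be illustrated with a minimal sketch. The text does not specify how similarity between binarized vectors is computed or how groups are formed, so the bitmask packing and the Hamming distance below are assumptions chosen only to show why binary vectors are cheap to compare.

```python
def binarize(vec):
    # Values of 1 or more become 1; 0 stays 0. Values strictly between
    # 0 and 1 are not covered by the text; mapping them to 0 is an assumption.
    return [1 if v >= 1 else 0 for v in vec]

def to_bitmask(bits):
    """Pack a list of 0/1 values into a single integer."""
    mask = 0
    for b in bits:
        mask = (mask << 1) | b
    return mask

def hamming(mask_a, mask_b):
    """Number of dimensions in which two binarized vectors differ."""
    return bin(mask_a ^ mask_b).count("1")

vec_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]  # user A in the example
vec_x = [1, 1, 1, 0, 2, 0, 1, 0, 1, 2, 1, 1, 1]  # user X in the example
dist = hamming(to_bitmask(binarize(vec_a)), to_bitmask(binarize(vec_x)))
print(dist)  # 2
```

After binarization each comparison reduces to one XOR and a bit count, which is the kind of saving the paragraph above refers to.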

The similar user detection unit 21 compares the history information of the users held in the user history information DB 17 and thereby detects another user whose history information is similar to that of the user to whom a piece of music is to be recommended. The similar user detection unit 21 also compares the preference vectors of the users held in the user history information DB 17 and thereby detects another user whose preference vector is similar to that of the user to whom a piece of music is to be recommended. The difference detection unit 22 detects, on the basis of the history information of the users held in the user history information DB 17, the difference between the history information of the user to whom a piece of music is to be recommended and that of the other user detected by the similar user detection unit 21. The recommended cluster determination unit 23 determines recommended clusters on the basis of the difference between the preference vector of the user to whom a piece of music is to be recommended and that of the other user detected by the similar user detection unit 21. The extraction unit 24 extracts the pieces of music that become recommendation candidates on the basis of the processing results of the similar user detection unit 21 through the recommended cluster determination unit 23 and of the clustered DB 14.

The music selection unit 25 selects one piece of music from the extracted pieces according to a predetermined condition. For example, it selects a piece that belongs to a larger number of recommended clusters, a piece that belongs to a recommended cluster in a cluster layer with a high preset priority, or a piece chosen at random, and outputs the selection result to the novelty determination unit 26 and the selection reason generation unit 27. On the basis of the preference vector of the user to whom the piece is to be recommended, the novelty determination unit 26 determines that the selected piece has no novelty when the degree of overlap between the clusters to which the selected piece belongs and the preference vector is equal to or greater than a predetermined ratio (for example, 30%), and determines that the piece has novelty when the degree of overlap is less than the predetermined ratio. It outputs the determination result to the selection reason generation unit 27.
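One possible reading of the novelty determination is sketched below. The text does not define how the degree of overlap is computed; here it is assumed to be the fraction of the selected piece's clusters whose dimension in the user's preference vector is non-zero. The function name `is_novel` and the dictionary representation of the preference vector are hypothetical.

```python
def is_novel(piece_clusters, preference, threshold=0.30):
    """Return True when the overlap ratio is below the threshold.

    piece_clusters: cluster IDs of the selected piece.
    preference: mapping of cluster ID -> value in the user's preference vector.
    Overlap is assumed to be the share of the piece's clusters that the
    user already has a non-zero value for; this definition is an assumption.
    """
    if not piece_clusters:
        return True
    seen = sum(1 for c in piece_clusters if preference.get(c, 0) > 0)
    return seen / len(piece_clusters) < threshold

# Non-zero dimensions of user A's preference vector from the example.
pref_a = {c: 1 for c in ["CL11", "CL12", "CL22", "CL24",
                         "CL32", "CL33", "CL41", "CL43"]}

print(is_novel(["CL13", "CL21", "CL31", "CL42"], pref_a))  # True  (overlap 0%)
print(is_novel(["CL11", "CL21", "CL31", "CL42"], pref_a))  # True  (overlap 25%)
print(is_novel(["CL11", "CL22", "CL33", "CL41"], pref_a))  # False (overlap 100%)
```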

The selection reason generation unit 27 acquires from the clustered DB 14 the keywords corresponding to the cluster layers and clusters to which the selected piece of music belongs, and uses the acquired keywords and the like to generate a selection reason statement indicating the reason for the selection. It also bases the selection reason statement on the determination result from the novelty determination unit 26, for example by including wording such as "unexpected" for a piece that has novelty and "as always" or "familiar" for a piece that has none. It then outputs the generated selection reason statement to the presentation unit 28 together with the music ID of the selected piece.

Alternatively, the review text of the selected piece of music may be quoted as the selection reason statement as it is, or the selection reason statement may be generated using words extracted from the review text of the selected piece. The tf-idf method can be applied to extract from the review text the words to be used in the selection reason statement.
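A minimal sketch of tf-idf word extraction follows. The text names the method but not a particular weighting, so the common form tf × log(N / df) is assumed here; the function name `top_words`, the pre-tokenized input, and the sample reviews are all hypothetical. The corpus is assumed to include the target review, so every target word has a document frequency of at least 1.

```python
import math
from collections import Counter

def top_words(target, corpus, k=2):
    """Rank the words of one review by tf-idf against a corpus of reviews.

    target: list of tokens of the selected piece's review.
    corpus: list of token lists; assumed to contain the target review.
    """
    n_docs = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc))  # document frequency
    tf = Counter(target)                                 # term frequency
    scores = {w: (tf[w] / len(target)) * math.log(n_docs / df[w]) for w in tf}
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

reviews = [
    ["mellow", "piano", "ballad", "piano"],
    ["fast", "rock", "guitar"],
    ["mellow", "rock", "ballad"],
]
print(top_words(reviews[0], reviews, k=1))  # ['piano']
```

Words that appear in every review receive a score of zero under this weighting, so they would not be picked for the selection reason statement.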

The presentation unit 28 acquires information on the selected piece of music from the music DB and presents it to the user together with the generated selection reason statement.

Next, the operation of the recommendation system 1 will be described. First, the offline preprocessing that prepares for the processing of recommending a piece of music will be described with reference to the flowchart of FIG. 6.

In step S1, for every piece of music in the music DB 11, the clustering unit 12 assigns each item of the piece's metadata to one of the cluster layers (first to n-th layers) and classifies (clusters) the actual information of each item into one of the clusters provided in the assigned cluster layer. The clustering unit 12 then generates, as information representing the characteristics of the piece in place of the metadata, cluster information consisting of the cluster IDs of the clusters into which the actual information of each metadata item has been classified, and outputs it to the clustered DB 14. Clustering may be omitted for pieces that have already been clustered and performed only for pieces that have not yet been clustered. The clustered DB 14 generates the cluster-music correspondence table 15 and the music-cluster correspondence table 16 on the basis of the cluster information of each piece generated by the clustering unit 12.

In step S2, the preference vector generation unit 19 of the recommendation candidate selection unit 18 generates a preference vector for each user on the basis of the history information of each user held in the user history information DB 17, and outputs it to the user history information DB 17. In step S3, the user grouping unit 20 groups all users on the basis of the similarity of the preference vectors of the users held in the user history information DB 17. To simplify the determination of similarity between multidimensional preference vectors, the value of each dimension of each user's preference vector is binarized. The grouping result is then output to the user history information DB 17. This completes the offline preprocessing.

By performing the clustering of all pieces of music held in the music DB 11, the generation of each user's preference vector, and the grouping of users as preprocessing in this way, the first to seventh recommendation processes described later can be executed quickly. Some of the first to seventh recommendation processes do not use the users' group information, so step S3 may be omitted when only recommendation processes that do not use the group information are executed.

Next, the first recommendation process will be described with reference to the flowchart of FIG. 7. In the following, the user to whom a piece of music is recommended is referred to as user A. This process is started, for example, when user A accesses the sales site.

In step S11, the similar user detection unit 21 compares the history information of user A held in the user history information DB 17 with the history information of the other users, and thereby detects another user X whose history information is most similar to that of user A. In step S12, the difference detection unit 22 detects, on the basis of the history information of user A and user X held in the user history information DB 17, a piece of music that user X has (has purchased in the past or owns) and that user A does not have. If multiple pieces satisfy this condition, one of them is selected, for example at random. The detected piece is referred to as piece a.
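Steps S11 and S12 can be sketched as set operations over the users' histories. The text does not say how similarity between histories is measured, so the Jaccard index used here is an assumption, as are the function names and the sample data.

```python
import random

def jaccard(a, b):
    """Assumed similarity between two histories (sets of music IDs)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar_user(target, histories):
    """Step S11 sketch: the other user whose history is most similar."""
    others = sorted(u for u in histories if u != target)
    return max(others, key=lambda u: jaccard(histories[target], histories[u]))

def pick_missing_piece(owned_a, owned_x, rng=random):
    """Step S12 sketch: a piece X has and A lacks, chosen at random if several."""
    missing = sorted(owned_x - owned_a)
    return rng.choice(missing) if missing else None

histories = {
    "A": {"m1", "m2"},
    "X": {"m1", "m2", "m3"},
    "Y": {"m9"},
}
user_x = most_similar_user("A", histories)
piece_a = pick_missing_piece(histories["A"], histories[user_x])
print(user_x, piece_a)  # X m3
```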

In step S13, the recommended cluster determination unit 23 refers to the music-cluster correspondence table 16 of the clustered DB 14 and identifies the cluster in each cluster layer to which piece a belongs. In step S14, the extraction unit 24 refers to the cluster-music correspondence table 15 of the clustered DB 14 and extracts the pieces of music that are classified in common into all the clusters identified in step S13. The pieces extracted here become the recommendation candidates. There may be multiple recommendation candidates. If no piece is classified in common into all the clusters identified in step S13, the pieces that are classified in common into as many of those clusters as possible are extracted and used as the recommendation candidates.
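The extraction in step S14, including the fallback to "as many clusters as possible", can be sketched by counting how many of the identified clusters each piece appears in. The function name and sample tables are hypothetical. Whether piece a itself is removed from the candidates is not stated in the text; the `exclude` parameter is an assumption added so that the seed piece does not trivially match all of its own clusters.

```python
from collections import Counter

def extract_candidates(seed_clusters, cluster_to_music, exclude=()):
    """Pieces shared by all seed clusters, else by as many as possible."""
    counts = Counter()
    for cluster_id in seed_clusters:
        for music_id in cluster_to_music.get(cluster_id, ()):
            if music_id not in exclude:
                counts[music_id] += 1
    if not counts:
        return set()
    best = max(counts.values())  # equals len(seed_clusters) when a full match exists
    return {m for m, n in counts.items() if n == best}

seed = ["CL11", "CL22", "CL33"]

# Case 1: piece "b" is in all three clusters.
table_1 = {"CL11": {"a", "b", "c"}, "CL22": {"a", "b"}, "CL33": {"a", "b", "d"}}
print(extract_candidates(seed, table_1, exclude={"a"}))  # {'b'}

# Case 2: no piece is in all three, so the best partial matches are kept.
table_2 = {"CL11": {"a", "c"}, "CL22": {"a", "c", "d"}, "CL33": {"a", "d"}}
print(sorted(extract_candidates(seed, table_2, exclude={"a"})))  # ['c', 'd']
```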

In step S15, the music selection unit 25 selects, from the recommendation candidates, the one piece of music whose cluster information is most similar to that of piece a detected in step S12, and outputs the selection result to the novelty determination unit 26 and the selection reason generation unit 27. In step S16, the novelty determination unit 26 determines whether the selected piece has novelty on the basis of the preference vector of user A and the clusters to which the selected piece belongs, and outputs the determination result to the selection reason generation unit 27. The selection reason generation unit 27 acquires from the clustered DB 14 the keywords corresponding to the cluster layers and clusters to which the selected piece belongs, and uses the acquired keywords and the like to generate a selection reason statement indicating the reason for the selection. The selection reason statement is also based on the determination result from the novelty determination unit 26.

The generated selection reason statement is then output to the presentation unit 28 together with the music ID of the selected piece. In step S17, the presentation unit 28 acquires information on the selected piece from the music DB and presents it to the user together with the generated selection reason statement. This completes the first recommendation process.

Next, the second and third recommendation processes will be described with reference to the flowchart of FIG. 8. The second recommendation process is described first. This process is started, for example, when user A accesses the sales site.

In step S21, the similar user detection unit 21 compares the preference vector of user A held in the user history information DB 17 with the preference vectors of the other users, and thereby detects another user X whose preference vector is most similar to that of user A. The similarity between the preference vector of user A and that of another user is determined, for example, by calculating the cosine correlation value of the two vectors.
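The cosine comparison in step S21 can be sketched as follows, using the example vectors of users A and X. The third user Y and the function names are hypothetical and serve only to show the selection of the most similar user.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with an empty history is treated as dissimilar
    return dot / (norm_u * norm_v)

def most_similar_user(target, vectors):
    """Other user whose preference vector has the highest cosine value."""
    others = sorted(u for u in vectors if u != target)
    return max(others, key=lambda u: cosine(vectors[target], vectors[u]))

vectors = {
    "A": [1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1],
    "X": [1, 1, 1, 0, 2, 0, 1, 0, 1, 2, 1, 1, 1],
    "Y": [0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0],  # hypothetical third user
}
print(round(cosine(vectors["A"], vectors["X"]), 3))  # 0.884
print(most_similar_user("A", vectors))               # X
```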

In step S22, the difference detection unit 22 detects the dimensions whose value is 0 in the preference vector of user A and non-zero in the preference vector of user X, and determines the clusters corresponding to the detected dimensions to be the recommended clusters.

For example, when the preference vector of user A is (1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1) as shown in A of FIG. 5 and the preference vector of user X is (1, 1, 1, 0, 2, 0, 1, 0, 1, 2, 1, 1, 1) as shown in B of FIG. 5, clusters CL13 and CL42 are determined to be the recommended clusters, as indicated by hatching in B of FIG. 5.
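The determination of recommended clusters in this example can be checked with a short sketch. The dimension order is taken from the FIG. 5 example; the function name `recommended_clusters` is hypothetical.

```python
# Dimension order assumed from the FIG. 5 example.
CLUSTERS = ["CL11", "CL12", "CL13",
            "CL21", "CL22", "CL23", "CL24",
            "CL31", "CL32", "CL33",
            "CL41", "CL42", "CL43"]

def recommended_clusters(pref_a, pref_x, clusters=CLUSTERS):
    """Clusters whose dimension is 0 for user A and non-zero for user X."""
    return [c for c, a, x in zip(clusters, pref_a, pref_x) if a == 0 and x != 0]

pref_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]
pref_x = [1, 1, 1, 0, 2, 0, 1, 0, 1, 2, 1, 1, 1]
print(recommended_clusters(pref_a, pref_x))  # ['CL13', 'CL42']
```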

In step S23, the extraction unit 24 refers to the user history information DB 17 and the cluster-music correspondence table 15 of the clustered DB 14, and extracts, from all the pieces of music classified into the recommended clusters, those that user X has and user A does not have. These become the recommendation candidates.

In step S24, the music selection unit 25 selects one piece of music from the recommendation candidates by any one of the following three methods, or by a combination of them, and outputs the selection result to the novelty determination unit 26 and the selection reason generation unit 27. The first method selects a piece that belongs in common to a larger number of recommended clusters. The second method assigns priorities to the cluster layers in advance and selects a piece classified into a recommended cluster that belongs to a cluster layer with a higher priority. The third method selects a piece at random.
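One way of combining the three methods is sketched below: candidates are ranked first by the number of recommended clusters they belong to, then by the priority of the best cluster layer among those clusters, and remaining ties are broken at random. This ordering is only one possible combination and is an assumption. The helper `layer_of` also assumes that the character after "CL" in a cluster ID is a single-digit layer number, as in the CL11 to CL43 example; the priority table is hypothetical.

```python
import random

def layer_of(cluster_id):
    # Assumes IDs of the form "CL<layer><index>" with a single-digit layer.
    return int(cluster_id[2])

def select_piece(candidates, recommended, music_to_clusters,
                 layer_priority, rng=random):
    """Pick one candidate; a lower layer_priority value means higher priority."""
    def key(music_id):
        hits = [c for c in music_to_clusters[music_id] if c in recommended]
        best_layer = min((layer_priority[layer_of(c)] for c in hits),
                         default=float("inf"))
        return (-len(hits), best_layer)  # more hits first, then better layer

    best = min(key(m) for m in candidates)
    tied = sorted(m for m in candidates if key(m) == best)
    return rng.choice(tied)              # third method: random among ties

recommended = {"CL13", "CL42"}
music_to_clusters = {
    "p": ["CL13", "CL21", "CL31", "CL42"],  # in two recommended clusters
    "q": ["CL13", "CL22", "CL31", "CL41"],  # one, in layer 1
    "r": ["CL11", "CL22", "CL33", "CL42"],  # one, in layer 4
}
layer_priority = {1: 1, 2: 2, 3: 3, 4: 0}   # hypothetical: layer 4 ranks highest

print(select_piece({"p", "q", "r"}, recommended, music_to_clusters, layer_priority))  # p
print(select_piece({"q", "r"}, recommended, music_to_clusters, layer_priority))       # r
```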

In step S25, the novelty determination unit 26 determines whether the selected piece of music has novelty on the basis of the preference vector of user A and the clusters to which the selected piece belongs, and outputs the determination result to the selection reason generation unit 27. The selection reason generation unit 27 acquires from the clustered DB 14 the keywords corresponding to the cluster layers and clusters to which the selected piece belongs, and uses the acquired keywords and the like to generate a selection reason statement indicating the reason for the selection. The selection reason statement is also based on the determination result from the novelty determination unit 26. The generated selection reason statement is then output to the presentation unit 28 together with the music ID of the selected piece. In step S26, the presentation unit 28 acquires information on the selected piece from the music DB and presents it to the user together with the generated selection reason statement. This completes the second recommendation process.

Next, the third recommendation process will be described. In the third recommendation process, step S23 of the second recommendation process described above is changed so that, from all the pieces of music classified into the recommended clusters, those that user A does not have are extracted as recommendation candidates. In other words, pieces that user X does not have can also become recommendation candidates. The rest of the processing is the same as in the second recommendation process, so its description is omitted.
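The difference between the candidate extraction of the second and third recommendation processes can be sketched with one function and an optional argument. The function name, the sample table, and the ownership sets are hypothetical.

```python
def extract_candidates(recommended, cluster_to_music, owned_a, owned_x=None):
    """Pieces in the recommended clusters that user A does not have.

    With owned_x given (second process), only pieces that user X has are kept.
    With owned_x=None (third process), pieces X does not have are also kept.
    """
    pool = set()
    for cluster_id in recommended:
        pool |= cluster_to_music.get(cluster_id, set())
    pool -= owned_a
    if owned_x is not None:
        pool &= owned_x
    return pool

cluster_to_music = {"CL13": {"m3", "m5"}, "CL42": {"m2", "m4", "m6"}}
owned_a = {"m1", "m2"}
owned_x = {"m1", "m3", "m4"}

second = extract_candidates(["CL13", "CL42"], cluster_to_music, owned_a, owned_x)
third = extract_candidates(["CL13", "CL42"], cluster_to_music, owned_a)
print(sorted(second))  # ['m3', 'm4']
print(sorted(third))   # ['m3', 'm4', 'm5', 'm6']
```

The third process yields a superset of the second process's candidates, since it drops the requirement that user X has the piece.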

Next, the fourth recommendation process will be described with reference to the flowchart of FIG. 9. This process is started, for example, when user A accesses the sales site.

In step S41, the similar user detection unit 21 randomly determines another user X who belongs to the same group as user A, on the basis of the group information of user A held in the user history information DB 17. In step S42, the difference detection unit 22 detects, on the basis of the history information of user A and user X held in the user history information DB 17, a piece of music that user X has and that user A does not have. If multiple pieces satisfy this condition, one of them is selected, for example at random. The detected piece is referred to as piece a.

In step S43, the recommended cluster determination unit 23 refers to the music-cluster correspondence table 16 of the clustered DB 14 and identifies the cluster in each cluster layer to which piece a belongs.

In step S44, the extraction unit 24 refers to the cluster-music correspondence table 15 of the clustered DB 14 and extracts the pieces of music that are classified in common into all the clusters identified in step S43. The pieces extracted here become the recommendation candidates. There may be multiple recommendation candidates. If no piece is classified in common into all the clusters identified in step S43, the pieces that are classified in common into as many of those clusters as possible are extracted and used as the recommendation candidates.

In step S45, the music selection unit 25 selects, from the recommendation candidates, the one piece of music whose cluster information is most similar to that of piece a detected in step S42, and outputs the selection result to the novelty determination unit 26 and the selection reason generation unit 27. In step S46, the novelty determination unit 26 determines whether the selected piece has novelty on the basis of the preference vector of user A and the clusters to which the selected piece belongs, and outputs the determination result to the selection reason generation unit 27. The selection reason generation unit 27 acquires from the clustered DB 14 the keywords corresponding to the cluster layers and clusters to which the selected piece belongs, and uses the acquired keywords and the like to generate a selection reason statement indicating the reason for the selection. The selection reason statement is also based on the determination result from the novelty determination unit 26.

The generated selection reason statement is then output to the presentation unit 28 together with the music ID of the selected piece. In step S47, the presentation unit 28 acquires information on the selected piece from the music DB and presents it to the user together with the generated selection reason statement. This completes the fourth recommendation process.

Because the fourth recommendation process uses the group information of users who have already been grouped by the offline preprocessing, a user X whose history is similar to that of user A can be determined quickly.
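The group-based choice of user X can be sketched as a lookup followed by a random pick, with no similarity computation at request time. The mapping `user_group` and the function name are hypothetical.

```python
import random

def pick_group_peer(user_id, user_group, rng=random):
    """Randomly choose another user from the same precomputed group."""
    group = user_group[user_id]
    peers = sorted(u for u, g in user_group.items()
                   if g == group and u != user_id)
    return rng.choice(peers) if peers else None

# Hypothetical grouping result as it might be held in the user history DB.
user_group = {"A": 1, "X": 1, "Y": 2}
print(pick_group_peer("A", user_group))  # X
print(pick_group_peer("Y", user_group))  # None (no other member in group 2)
```

The cost of comparing users is paid once during offline grouping, which is why this step can return quickly when user A accesses the site.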

Next, the fifth and sixth recommendation processes will be described with reference to the flowchart of FIG. 10.

The fifth recommendation process is described first. This process is started, for example, when user A accesses the sales site.

In step S51, the similar user detection unit 21 randomly determines another user X who belongs to the same group as user A, on the basis of the group information of user A held in the user history information DB 17.

In step S52, the difference detection unit 22 detects the dimensions whose value is 0 in the preference vector of user A and non-zero in the preference vector of user X, and determines the clusters corresponding to the detected dimensions to be the recommended clusters.

In step S53, the extraction unit 24 refers to the user history information DB 17 and the cluster-music correspondence table 15 of the clustered DB 14, and extracts, from all the pieces of music classified into the recommended clusters, those that user X has and user A does not have. These become the recommendation candidates.

In step S54, the music selection unit 25 selects one piece of music from the recommendation candidates by any one of the following three methods, or by a combination of them, and outputs the selection result to the novelty determination unit 26 and the selection reason generation unit 27. The first method selects a piece that belongs in common to a larger number of recommended clusters. The second method assigns priorities to the cluster layers in advance and selects a piece classified into a recommended cluster that belongs to a cluster layer with a higher priority. The third method selects a piece at random.

In step S55, the novelty determination unit 26 determines whether the selected music piece is novel, based on the preference vector of user A and the clusters to which the selected music piece belongs, and outputs the determination result to the selection reason generation unit 27. The selection reason generation unit 27 acquires, from the clustering-completed DB 14, the keywords corresponding to the cluster layers and clusters to which the selected music piece belongs, and uses the acquired keywords and the like to generate a selection reason statement indicating why the piece was selected. The selection reason statement is also generated on the basis of the determination result from the novelty determination unit 26. The generated selection reason statement is then output to the presentation unit 28 together with the music ID of the selected music piece. In step S56, the presentation unit 28 acquires information about the selected music piece from the music DB and presents it to the user together with the generated selection reason statement. This completes the fifth recommendation process.

Next, the sixth recommendation process will be described. In the sixth recommendation process, step S53 of the fifth recommendation process described above is changed so that, from all the music pieces classified into the recommended clusters, those that user A does not own are extracted as recommendation candidates. That is, music pieces that user X does not own can also become recommendation candidates. The rest of the processing is the same as in the fifth recommendation process, so its description is omitted.
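The only difference between the fifth and sixth recommendation processes is the ownership filter applied in step S53, which can be sketched as below. The data structures (a dictionary standing in for the cluster-music correspondence table 15 and sets of owned music IDs) and all identifiers are assumptions made for illustration.

```python
def extract_candidates(recommended, cluster_to_music, owned_by_a, owned_by_x=None):
    """Collect music in the recommended clusters that user A does not own.

    If owned_by_x is given (fifth process), keep only pieces user X owns.
    If it is None (sixth process), pieces user X does not own are kept too.
    """
    pool = set()
    for cluster in recommended:
        pool |= cluster_to_music.get(cluster, set())
    pool -= owned_by_a
    if owned_by_x is not None:
        pool &= owned_by_x
    return pool


# Hypothetical stand-in for the cluster-music correspondence table.
cluster_to_music = {0: {"m1", "m2"}, 4: {"m2", "m3", "m4"}}
owned_by_a = {"m4"}
owned_by_x = {"m1", "m2", "m4"}

fifth = extract_candidates([0, 4], cluster_to_music, owned_by_a, owned_by_x)
sixth = extract_candidates([0, 4], cluster_to_music, owned_by_a)
```

In this sample, the sixth process yields a superset of the fifth because the piece `m3`, which user X does not own, is no longer filtered out.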

In the fifth and sixth recommendation processes, the group information of users grouped by the offline preprocessing is used, so a user X whose history is similar to that of user A can be determined quickly.

Next, the seventh recommendation process will be described with reference to the flowchart of FIG. 11. This process is suited to cases where the history information of user A is extremely sparse, where there are few other users, and the like, and it is started, for example, when user A accesses the sales site.

In step S61, the difference detection unit 22 detects the dimensions of the preference vector of user A whose values are equal to or greater than a predetermined value, and determines the clusters corresponding to those dimensions to be the recommended clusters.

In step S62, the extraction unit 24 refers to the user history information DB 17 and the cluster-music correspondence table 15 of the clustering-completed DB 14 and, from all the music pieces classified into the recommended clusters, extracts those that user A does not own as recommendation candidates.

In step S63, the music selection unit 25 selects, from the recommendation candidates, the one music piece that belongs to the largest number of recommended clusters, and outputs the selection result to the novelty determination unit 26 and the selection reason generation unit 27. When several music pieces belong to the largest number of recommended clusters, one of them is selected, for example, at random.
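Steps S61 to S63 can be sketched together as follows. This is an illustrative outline only: the threshold value, the dictionary standing in for the cluster-music correspondence table 15, and the function and variable names are all assumptions.

```python
import random


def seventh_process(pref_a, threshold, cluster_to_music, owned_by_a, rng=random):
    # S61: clusters whose preference value reaches the threshold are recommended.
    recommended = [i for i, v in enumerate(pref_a) if v >= threshold]

    # S62: count, for each piece user A does not own, how many recommended
    # clusters it belongs to.
    counts = {}
    for cluster in recommended:
        for music in cluster_to_music.get(cluster, set()):
            if music not in owned_by_a:
                counts[music] = counts.get(music, 0) + 1
    if not counts:
        return None

    # S63: pick the piece in the most recommended clusters; break ties randomly.
    best = max(counts.values())
    top = sorted(m for m, n in counts.items() if n == best)
    return rng.choice(top)


# Hypothetical data: user A strongly prefers clusters 1 and 3.
pref_a = [0.0, 2.8, 0.0, 2.2]
cluster_to_music = {1: {"m1", "m2"}, 3: {"m2", "m3"}}
pick = seventh_process(pref_a, 1.0, cluster_to_music, owned_by_a={"m1"})
```

Because this process relies only on user A's own preference vector, it needs no similar user X, which is why it remains usable when there are few other users.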

In step S64, the novelty determination unit 26 determines whether the selected music piece is novel, based on the preference vector of user A and the clusters to which the selected music piece belongs, and outputs the determination result to the selection reason generation unit 27. The selection reason generation unit 27 acquires, from the clustering-completed DB 14, the keywords corresponding to the cluster layers and clusters to which the selected music piece belongs, and uses the acquired keywords and the like to generate a selection reason statement indicating why the piece was selected. The selection reason statement is also generated on the basis of the determination result from the novelty determination unit 26. The generated selection reason statement is then output to the presentation unit 28 together with the music ID of the selected music piece. In step S65, the presentation unit 28 acquires information about the selected music piece from the music DB and presents it to the user together with the generated selection reason statement. This completes the seventh recommendation process.

According to the first to seventh recommendation processes described above, the CF method is applied after the user's history information has been replaced with a preference vector in which each cluster is one dimension, so recommendations to user A can be prevented from concentrating on a small part of all the music pieces in the music DB 11. In addition, music pieces can be recommended even to a user with little history information, which avoids the so-called cold start problem. Furthermore, the reason a recommended music piece was selected can be presented to user A, so that, for example, user A can tell whether the recommended music piece is novel to him or her.

The present invention can be applied not only to recommending music pieces but also to sales sites that sell content other than music, such as television programs, movies, and books.

The series of processes described above can be executed by hardware, but can also be executed by software. When the series of processes is executed by software, the programs constituting the software are installed from a recording medium onto a computer built into dedicated hardware or onto, for example, a general-purpose personal computer configured as shown in FIG. 12, which can execute various functions when various programs are installed on it.

This personal computer 100 incorporates a CPU (Central Processing Unit) 101. An input/output interface 105 is connected to the CPU 101 via a bus 104. A ROM (Read Only Memory) 102 and a RAM (Random Access Memory) 103 are connected to the bus 104.

Connected to the input/output interface 105 are an input unit 106 made up of input devices such as a keyboard and a mouse with which the user enters operation commands; an output unit 107 made up of a display such as a CRT (Cathode Ray Tube) or an LCD (Liquid Crystal Display) that shows a screen; a storage unit 108 made up of a hard disk drive or the like that stores programs and various data; and a communication unit 109 made up of a modem, a LAN (Local Area Network) adapter, or the like, which executes communication processing via a network typified by the Internet. Also connected is a drive 110 that reads data from and writes data to a recording medium 111 such as a magnetic disk (including a flexible disk), an optical disc (including a CD-ROM (Compact Disc-Read Only Memory) and a DVD (Digital Versatile Disc)), a magneto-optical disc (including an MD (Mini Disc)), or a semiconductor memory.

The program that causes the personal computer 100 to execute the series of processes described above is supplied to the personal computer 100 stored on the recording medium 111, read by the drive 110, and installed on the hard disk drive built into the storage unit 108. The program installed in the storage unit 108 is loaded from the storage unit 108 into the RAM 103 and executed in response to an instruction from the CPU 101 that corresponds to a command from the user entered at the input unit 106.

FIG. 13 is a block diagram showing another example of the configuration of the recommendation system 1 according to an embodiment of the present invention. In FIG. 13, parts that are the same as those shown in FIG. 1 are denoted by the same reference numerals, and their description is omitted.

The recommendation system 1 shown in FIG. 13 is composed of the music DB 11, the keyword setting unit 13, the clustering-completed DB 14, the user history information DB 17, the recommendation candidate selection unit 18, the music selection unit 25, the novelty determination unit 26, the selection reason generation unit 27, the presentation unit 28, a metadata clustering unit 201, and a music clustering unit 202.

The metadata clustering unit 201 clusters the metadata of each music piece recorded in the music DB 11. That is, the metadata clustering unit 201 classifies the metadata of the music pieces, which are the content, into any of a plurality of clusters and assigns layers to the clusters.

The metadata clustering unit 201 supplies the result of clustering the metadata of each music piece to the music clustering unit 202.

Based on the result of the clustering of the metadata of each music piece by the metadata clustering unit 201, the music clustering unit 202 clusters the music pieces in the same manner as the clustering unit 12 and generates cluster information for each music piece. That is, the music clustering unit 202 generates cluster information according to the result of clustering each music piece and outputs it to the clustering-completed DB 14.

The recommendation candidate selection unit 18 of the recommendation system 1 shown in FIG. 13 includes the preference vector generation unit 19, the user grouping unit 20, the difference detection unit 22, the recommended cluster determination unit 23, the extraction unit 24, and a similar user detection unit 203.

The similar user detection unit 203 compares the preference vectors of the users held in the user history information DB 17 and thereby detects another user whose preference vector is similar to that of the user to whom music is to be recommended. More specifically, the similar user detection unit 203 normalizes the preference vectors, which are one example of user preference information; calculates, from the normalized preference vector of each user, a weight for each layer for each user; calculates, from the per-layer weights and the preference vectors, a similarity indicating how similar the preferences of two users are; and detects, from the calculated similarities, a second user whose preferences are similar to those of a first user.

Next, with reference to the flowchart of FIG. 14, another example of the offline preprocessing, which prepares for the music recommendation processing in the recommendation system 1 shown in FIG. 13, will be described.

In step S201, the metadata clustering unit 201 acquires the metadata of the music pieces from the music DB 11 and compresses the dimensions of the acquired metadata. For example, in step S201, the metadata clustering unit 201 compresses the dimensions of the music metadata acquired from the music DB 11 by a method such as LSA (latent semantic analysis), PLSA (probabilistic latent semantic analysis), or quantification method type III.

In step S201, the metadata clustering unit 201 may also vectorize the metadata of the music pieces.

In step S202, the metadata clustering unit 201 clusters the metadata of each music piece. For example, in step S202, the metadata clustering unit 201 soft-clusters the metadata of each music piece.

More specifically, as shown for example in FIG. 15, the metadata clustering unit 201 soft-clusters the metadata of each music piece so that, within each layer, the membership weights of an item over the clusters sum to 1.

For example, for the metadata of the music piece identified by the music ID ABC123, the membership weights for the first, second, third, and fourth clusters in the first layer are 0.0, 0.8, 0.0, and 0.2, respectively. The membership weights for the fifth, sixth, seventh, and eighth clusters in the second layer are 0.4, 0.6, 0.0, and 0.0, respectively. The membership weights for the ninth, tenth, and eleventh clusters in the third layer are 0.0, 0.0, and 1.0, respectively. The membership weights for the four clusters in the n-th layer are 1.0, 0.0, 0.0, and 0.0, respectively.

For the metadata of the music piece identified by the music ID CTH863, the membership weights for the first, second, third, and fourth clusters in the first layer are 1.0, 0.0, 0.0, and 0.0, respectively. The membership weights for the fifth, sixth, seventh, and eighth clusters in the second layer are 0.0, 0.5, 0.5, and 0.0, respectively. The membership weights for the ninth, tenth, and eleventh clusters in the third layer are 0.7, 0.3, and 0.0, respectively. The membership weights for the four clusters in the n-th layer are 0.0, 0.8, 0.2, and 0.0, respectively.

For the metadata of the music piece identified by the music ID XYZ567, the membership weights for the first, second, third, and fourth clusters in the first layer are 0.0, 0.4, 0.6, and 0.0, respectively. The membership weights for the fifth, sixth, seventh, and eighth clusters in the second layer are 0.0, 0.0, 0.0, and 1.0, respectively. The membership weights for the ninth, tenth, and eleventh clusters in the third layer are 0.9, 0.0, and 0.1, respectively. The membership weights for the four clusters in the n-th layer are 0.3, 0.0, 0.0, and 0.7, respectively.
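The membership (attribution) weights listed above for the first three layers can be held in a simple nested structure and checked against the sum-to-1 condition, as in the sketch below. The dictionary layout and the helper name are assumptions for illustration; the n-th layer is left out for brevity.

```python
import math

# Membership weights per music ID: one list per layer (layers 1 to 3),
# one value per cluster in that layer, copied from the example values.
weights = {
    "ABC123": [[0.0, 0.8, 0.0, 0.2], [0.4, 0.6, 0.0, 0.0], [0.0, 0.0, 1.0]],
    "CTH863": [[1.0, 0.0, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0], [0.7, 0.3, 0.0]],
    "XYZ567": [[0.0, 0.4, 0.6, 0.0], [0.0, 0.0, 0.0, 1.0], [0.9, 0.0, 0.1]],
}


def sums_to_one(layers):
    """True if the weights in every layer sum to 1 (within float tolerance)."""
    return all(math.isclose(sum(layer), 1.0) for layer in layers)


all_ok = all(sums_to_one(layers) for layers in weights.values())
```

Unlike hard clustering, a piece such as ABC123 belongs to two clusters of the first layer at once, with weights 0.8 and 0.2.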

The soft clustering of the metadata of each music piece is not limited to the case in which, within each layer, the membership weights of an item, that is, a music piece, over the clusters sum to 1. An item may also belong to no cluster at all in a given layer.

In step S203, the metadata clustering unit 201 assigns layers to the clusters.

The clustering of the metadata and the assignment of cluster layers will now be described with reference to FIGS. 16 and 17. FIG. 16 is a diagram showing an example of metadata. For simplicity, the metadata shown in FIG. 16 is categorical data that takes a value of either 0 or 1.

Metadata 1, metadata 2, and metadata 3 belong to meta group 1, a higher-order category, and metadata 4, metadata 5, and metadata 6 belong to meta group 2, another higher-order category. For example, metadata about the artist belongs to meta group 1: metadata 1 indicates the artist's appearance, and metadata 2 indicates that the artist is a group. Likewise, metadata about the genre belongs to meta group 2: metadata 4 indicates pop, and metadata 5 indicates rock.

In the example shown in FIG. 16, metadata 1 to metadata 6 of the music piece identified by the music ID ABC123 are 1, 1, 1, 1, 1, 1, respectively; those of the music piece identified by the music ID CTH863 are 0, 1, 0, 0, 1, 1; and those of the music piece identified by the music ID XYZ567 are 1, 1, 1, 1, 1, 1. Metadata 1 to metadata 6 of the music piece identified by the music ID EKF534 are 1, 0, 1, 0, 0, 1, respectively, and those of the music piece identified by the music ID OPQ385 are 1, 0, 1, 1, 0, 0.

Here, the values of metadata 1 for the music pieces identified by the music IDs ABC123 through OPQ385 are regarded as a vector. Similarly, the values of each of metadata 2 to metadata 6 for the music pieces identified by the music IDs ABC123 through OPQ385 are regarded as a vector. That is, the values of one metadata item across the plurality of music pieces are regarded as a vector.

Attention is then paid to the distances between these vectors.

In the example shown in FIG. 16, metadata 1, metadata 3, and metadata 4, regarded as vectors, are gathered into one cluster within a Manhattan distance of 1, and metadata 2, metadata 5, and metadata 6 are gathered into another cluster within a Manhattan distance of 1.

These clusters are therefore used as the new layers of the metadata. That is, metadata items that are closer to one another are assigned to the same layer of the hierarchy.

FIG. 17 shows an example of metadata that has been clustered and assigned to layers in this way. In the example shown in FIG. 17, metadata 1, metadata 3, and metadata 4 belong to the first layer, and metadata 2, metadata 5, and metadata 6 belong to the second layer.
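The layer assignment of FIGS. 16 and 17 can be reproduced from the example values with a short script. The grouping rule used here (a metadata item joins the first existing group whose members are all within Manhattan distance 1 of it) is an assumption chosen for illustration; the specification does not name a particular clustering algorithm.

```python
# Metadata 1..6 for each music ID, as in the FIG. 16 example.
songs = {
    "ABC123": [1, 1, 1, 1, 1, 1],
    "CTH863": [0, 1, 0, 0, 1, 1],
    "XYZ567": [1, 1, 1, 1, 1, 1],
    "EKF534": [1, 0, 1, 0, 0, 1],
    "OPQ385": [1, 0, 1, 1, 0, 0],
}

# Column j across all songs is the vector for metadata j + 1.
columns = list(zip(*songs.values()))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def group_within(columns, max_dist):
    """Greedily group columns so that every pair in a group is within max_dist."""
    groups = []
    for j, col in enumerate(columns):
        for group in groups:
            if all(manhattan(col, columns[k]) <= max_dist for k in group):
                group.append(j)
                break
        else:
            groups.append([j])
    # Report 1-based metadata numbers.
    return [[j + 1 for j in group] for group in groups]


layers = group_within(columns, 1)
```

With these values, `layers` groups metadata 1, 3, and 4 together and metadata 2, 5, and 6 together, which matches the first and second layers of FIG. 17.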

Because each layer is formed from a set of highly correlated metadata and the content is clustered within that layer, subtle differences between pieces of content that cannot be fully expressed by an ordinary hierarchy, in which genres, artists, and the like are used directly as layers, can be reflected in the clusters.

Returning to FIG. 14, in step S204 the music clustering unit 202 clusters the music pieces layer by layer. That is, the music clustering unit 202 classifies each piece of content into one of a plurality of clusters in each of the assigned layers.

Steps S205 and S206 are the same as steps S2 and S3 of FIG. 6, respectively, so their description is omitted.

In this way, the content can be clustered with a reduced amount of data and computation while the level of detail with which the metadata expresses the content is maintained.

In addition, by layering the metadata as described above, the content can be clustered so that subtle differences between pieces of content are expressed well.

Next, the similar user detection unit 203 will be described in detail.

FIG. 18 is a block diagram showing an example of the configuration of the similar user detection unit 203. The similar user detection unit 203 is composed of a normalization unit 231, a weight calculation unit 232, and a similarity calculation unit 233.

The normalization unit 231 normalizes the preference vectors, which are one example of user preference information. The weight calculation unit 232 calculates, from the normalized preference vector of each user, a weight for each layer for each user. The similarity calculation unit 233 calculates, from the per-layer weights and the preference vectors, a similarity indicating how similar the preferences of the user to whom music is to be recommended are to those of another user.

Next, the process by which the similar user detection unit 203 detects a user X with similar preferences, which corresponds to step S21 of FIG. 8, will be described with reference to the flowchart of FIG. 19.

In step S231, the normalization unit 231 normalizes the preference vector of each user.

The normalization of the preference vectors will be described with reference to FIGS. 20 and 21. FIG. 20 is a diagram showing an example of the preference vectors of the users, which are generated by the preference vector generation unit 19 and held in the user history information DB 17. That is, FIG. 20 shows an example of the preference vectors before normalization.

Of the elements of each preference vector shown in FIG. 20, the first four elements belong to the first layer, the next four elements belong to the second layer, the following three elements belong to the third layer, and the last four elements belong to the fourth layer.

In the example shown in FIG. 20, the preference vector of the user identified by the user ID U001 is (0.0, 2.8, 0.0, 2.2, 0.4, 0.6, 0.8, 0.0, 0.5, 0.4, 0.4, 0.0, 0.5, 0.4, 0.0). Here, the first four elements, 0.0, 2.8, 0.0, and 2.2, belong to the first layer; the next four elements, 0.4, 0.6, 0.8, and 0.0, belong to the second layer; the following three elements, 0.5, 0.4, and 0.4, belong to the third layer; and the last four elements, 0.0, 0.5, 0.4, and 0.0, belong to the fourth layer.

In the example shown in FIG. 20, the preference vector of the user identified by the user ID U002 is (0.2, 0.8, 0.5, 0.6, 0.0, 0.5, 0.5, 0.0, 0.7, 0.3, 0.6, 0.0, 0.6, 0.2, 0.0). Here, the first four elements, 0.2, 0.8, 0.5, and 0.6, belong to the first layer; the next four elements, 0.0, 0.5, 0.5, and 0.0, belong to the second layer; the following three elements, 0.7, 0.3, and 0.6, belong to the third layer; and the last four elements, 0.0, 0.6, 0.2, and 0.0, belong to the fourth layer.

In the example shown in FIG. 20, the preference vector of the user identified by the user ID U003 is (0.0, 2.2, 0.1, 1.6, 0.0, 1.0, 2.0, 1.4, 0.0, 1.2, 0.1, 0.3, 0.4, 0.6, 0.7). Here, the first four elements, 0.0, 2.2, 0.1, and 1.6, belong to the first layer; the next four elements, 0.0, 1.0, 2.0, and 1.4, belong to the second layer; the following three elements, 0.0, 1.2, and 0.1, belong to the third layer; and the last four elements, 0.3, 0.4, 0.6, and 0.7, belong to the fourth layer.

For example, in step S231, the normalization unit 231 normalizes each preference vector so that its norm within each layer is 1.

FIG. 21 is a diagram showing an example of the preference vectors obtained by normalizing the preference vectors of FIG. 20 so that the norm within each layer is 1.

In the example shown in FIG. 21, the normalized preference vector of the user identified by the user ID U001 is (0.0, 0.8, 0.0, 0.6, 0.4, 0.6, 0.7, 0.0, 0.7, 0.5, 0.5, 0.0, 0.5, 0.4, 0.0). Here, the first four elements, 0.0, 0.8, 0.0, and 0.6, belong to the first layer; the next four elements, 0.4, 0.6, 0.7, and 0.0, belong to the second layer; the following three elements, 0.7, 0.5, and 0.5, belong to the third layer; and the last four elements, 0.0, 0.5, 0.4, and 0.0, belong to the fourth layer.

도 21에 도시하는 예에서, U002인 유저 ID로 특정되는 이용자의 정규화된 기호 벡터는, (0.2, 0.7, 0.4, 0.5, 0.0, 0.7, 0.7, 0.0, 0.7, 0.3, 0.6, 0.0, 0.8, 0.3, 0.0)이다. 여기서, 각각 0.2, 0.7, 0.4, 0.5인 최초의 4개의 요소는, 제1층 에 속하고, 각각 0.0, 0.7, 0.7, 0.0인 다음의 4개의 요소는, 제2층에 속하고, 각각 0.7, 0.3, 0.6인 또 다음의 3개의 요소는, 제3층에 속하고, 각각 0.0, 0.8, 0.3, 0.0인 최후의 4개의 요소는, 제4층에 속한다.In the example shown in FIG. 21, the normalized symbol vector of the user specified by the user ID of U002 is (0.2, 0.7, 0.4, 0.5, 0.0, 0.7, 0.7, 0.0, 0.7, 0.3, 0.6, 0.0, 0.8, 0.3, 0.0). Here, the first four elements, which are 0.2, 0.7, 0.4, and 0.5, respectively, belong to the first layer, and the next four elements, which are 0.0, 0.7, 0.7, and 0.0, respectively, belong to the second layer, and 0.7, respectively. The next three elements, 0.3 and 0.6, belong to the third layer, and the last four elements, 0.0, 0.8, 0.3 and 0.0, respectively, belong to the fourth layer.

도 21에 도시하는 예에서, U003인 유저 ID로 특정되는 이용자의 정규화된 기호 벡터는, (0.0, 0.8, 0.0, 0.6, 0.0, 0.4, 0.8, 0.5, 0.0, 1.0, 0.1, 0.3, 0.2, 0.2, 0.3)이다. 여기서, 각각 0.0, 0.8, 0.0, 0.6인 최초의 4개의 요소는, 제1층에 속하고, 각각 0.0, 0.4, 0.8, 0.5인 다음의 4개의 요소는, 제2층에 속하고, 각각 0.0, 1.0, 0.1인 또 다음의 3개의 요소는, 제3층에 속하고, 각각 0.3, 0.2, 0.2, 0.3인 최후의 4개의 요소는, 제4층에 속한다.In the example shown in FIG. 21, the normalized symbol vector of the user specified by the user ID of U003 is (0.0, 0.8, 0.0, 0.6, 0.0, 0.4, 0.8, 0.5, 0.0, 1.0, 0.1, 0.3, 0.2, 0.2, 0.3). Here, the first four elements, which are 0.0, 0.8, 0.0, and 0.6, respectively, belong to the first layer, and the next four elements, which are 0.0, 0.4, 0.8, and 0.5, respectively, belong to the second layer, and 0.0, respectively. The next three elements, 1.0 and 0.1, belong to the third layer, and the last four elements, respectively, 0.3, 0.2, 0.2 and 0.3, belong to the fourth layer.
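The per-layer normalization of step S231 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the norm is the Euclidean (L2) norm, which the text does not state explicitly but which agrees with the FIG. 21 values, and the names `U003_RAW` and `normalize_layers` are hypothetical. Applied to the raw vector of user U003 from FIG. 20, the first three layers reproduce the FIG. 21 values to one decimal place.

```python
import math

# Raw preference vector of user U003 from FIG. 20, split by layer.
# Storing the vector as one list per layer is an illustrative choice.
U003_RAW = [
    [0.0, 2.2, 0.1, 1.6],  # first layer
    [0.0, 1.0, 2.0, 1.4],  # second layer
    [0.0, 1.2, 0.1],       # third layer
    [0.3, 0.4, 0.6, 0.7],  # fourth layer
]


def normalize_layers(layers):
    """Scale each layer so that its Euclidean norm is 1.

    Assumption: "norm" means the L2 norm. All-zero layers are left unchanged.
    """
    result = []
    for layer in layers:
        norm = math.sqrt(sum(x * x for x in layer))
        result.append([x / norm if norm > 0 else 0.0 for x in layer])
    return result


normalized = normalize_layers(U003_RAW)
rounded = [[round(x, 1) for x in layer] for layer in normalized]
print(rounded[:3])
```

Normalizing each layer separately, rather than the whole vector at once, keeps a layer with large raw values from dominating the later per-layer comparison.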

Returning to FIG. 19, in step S232, the weight calculation unit 232 calculates a weight for each layer of each user's preference vector. For example, in step S232, the weight calculation unit 232 calculates, for each layer, a weight that is the variance of the elements belonging to that layer.

FIG. 22 is a diagram showing an example of the weights, calculated per layer for each user, that are the variances of the elements belonging to each layer. In the example shown in FIG. 22, the weights of the first, second, third, and fourth layers for the user specified by the user ID U001 are 0.17, 0.10, 0.01, and 0.06, respectively.

The weights of the first, second, third, and fourth layers for the user specified by the user ID U002 are 0.05, 0.17, 0.05, and 0.16, respectively. The weights of the first, second, third, and fourth layers for the user specified by the user ID U003 are 0.16, 0.10, 0.31, and 0.00, respectively.
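The variance weighting of step S232 can be sketched as below. The text does not say which variance estimator is used; this sketch assumes the sample variance (division by n - 1), which gives 0.17 for the first layer of user U001, whereas the population variance would give about 0.13. The names `U001_NORMALIZED` and `layer_weights` are hypothetical. Because the inputs are the FIG. 21 values rounded to one decimal place, small deviations from FIG. 22 are to be expected in some layers.

```python
from statistics import variance  # sample variance (divides by n - 1)

# Normalized preference vector of user U001 from FIG. 21, one list per layer.
U001_NORMALIZED = [
    [0.0, 0.8, 0.0, 0.6],  # first layer
    [0.4, 0.6, 0.7, 0.0],  # second layer
    [0.7, 0.5, 0.5],       # third layer
    [0.0, 0.5, 0.4, 0.0],  # fourth layer
]


def layer_weights(layers):
    """Return one weight per layer: the variance of that layer's elements.

    Assumption: the sample variance is used; the source only says "variance".
    """
    return [variance(layer) for layer in layers]


weights = layer_weights(U001_NORMALIZED)
print([round(w, 2) for w in weights])
```

A layer whose elements are nearly equal gets a weight close to zero, so it contributes little to the weighted similarity.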

In step S233, the similarity calculation unit 233 calculates the weighted preference similarity for each of the users. In step S234, the similar user detection unit 203 detects, from among the users, the user X with the highest preference similarity, and the processing ends.

Suppose that the similarity sim(u, v) between a user u and a user v were calculated by Equation 1, without weighting. Taking the user specified by the user ID U001 as the reference, the similarities to the user specified by the user ID U002 and to the user specified by the user ID U003 would be as shown in FIG. 23.

In Equation 1, L is the number of layers of the preference vector, and l identifies a layer of the preference vector. C(l) denotes the entire set of clusters of the preference vector in layer l, and c identifies a cluster. h denotes the value of an element of the normalized preference vector.
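Equation 1 itself is not reproduced in this text. From the symbol definitions above and the worked example for FIG. 23 (per-layer products of corresponding elements, summed over all layers), a form consistent with the description would be the following. The subscript notation, with h_{uc} as the normalized element of user u for cluster c, is an assumption.

```latex
\mathrm{sim}(u, v) = \sum_{l=1}^{L} \sum_{c \in C(l)} h_{uc}\, h_{vc}
```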

When the first-layer elements of the preference vector of the user specified by the user ID U001 are multiplied by the corresponding first-layer elements of the preference vector of the user specified by the user ID U002, and the products are summed, the result is 0.88, the value placed in the first layer for the user ID U002 in FIG. 23. Similarly, for the second, third, and fourth layers, multiplying the elements of the preference vector of the user specified by the user ID U001 by the corresponding elements of the preference vector of the user specified by the user ID U002 and summing the products gives 0.92, 0.97, and 0.50, the values placed in the second, third, and fourth layers for the user ID U002 in FIG. 23.

Finally, the preference similarity between the user specified by the user ID U001 and the user specified by the user ID U002 is 3.27, the sum of the values 0.88, 0.92, 0.97, and 0.50 obtained for the first, second, third, and fourth layers.

Similarly, when the first-layer elements of the preference vector of the user specified by the user ID U001 are multiplied by the corresponding first-layer elements of the preference vector of the user specified by the user ID U003, and the products are summed, the result is 1.00, the value placed in the first layer for the user ID U003 in FIG. 23. For the second, third, and fourth layers, multiplying the elements of the preference vector of the user specified by the user ID U001 by the corresponding elements of the preference vector of the user specified by the user ID U003 and summing the products gives 0.77, 0.57, and 0.15, the values placed in the second, third, and fourth layers for the user ID U003 in FIG. 23.

Finally, the preference similarity between the user specified by the user ID U001 and the user specified by the user ID U003 is 2.50, the sum of the values 1.00, 0.77, 0.57, and 0.15 obtained for the first, second, third, and fourth layers.
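The unweighted calculation described above (a dot product per layer, then a sum over layers) can be sketched as follows, using the normalized vectors listed for FIG. 21. The names `VECTORS`, `layer_scores`, and `similarity` are hypothetical. Since the inputs here are rounded to one decimal place, the per-layer results differ slightly from the FIG. 23 values, which were presumably computed from unrounded vectors; the ordering of the two similarities is the same.

```python
# Normalized preference vectors from FIG. 21, one list per layer.
VECTORS = {
    "U001": [[0.0, 0.8, 0.0, 0.6], [0.4, 0.6, 0.7, 0.0],
             [0.7, 0.5, 0.5], [0.0, 0.5, 0.4, 0.0]],
    "U002": [[0.2, 0.7, 0.4, 0.5], [0.0, 0.7, 0.7, 0.0],
             [0.7, 0.3, 0.6], [0.0, 0.8, 0.3, 0.0]],
    "U003": [[0.0, 0.8, 0.0, 0.6], [0.0, 0.4, 0.8, 0.5],
             [0.0, 1.0, 0.1], [0.3, 0.2, 0.2, 0.3]],
}


def layer_scores(u, v):
    """Dot product of corresponding elements, computed separately per layer."""
    return [sum(a * b for a, b in zip(lu, lv)) for lu, lv in zip(u, v)]


def similarity(u, v):
    """Unweighted similarity: the sum of the per-layer dot products."""
    return sum(layer_scores(u, v))


for other in ("U002", "U003"):
    scores = layer_scores(VECTORS["U001"], VECTORS[other])
    print(other, [round(s, 2) for s in scores],
          round(similarity(VECTORS["U001"], VECTORS[other]), 2))
```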

Thus, when the similarity is calculated without weighting, the preference similarity between the user specified by the user ID U001 and the user specified by the user ID U002 is larger than the preference similarity between the user specified by the user ID U001 and the user specified by the user ID U003, so the user specified by the user ID U002 would be detected as the user X with the highest preference similarity.

In contrast, in step S233, the similarity calculation unit 233 calculates the weighted similarity sim(u, v) between a user u and a user v by Equation 2.

In Equation 2, L is the number of layers of the preference vector, and l identifies a layer of the preference vector. C(l) denotes the entire set of clusters of the preference vector in layer l, and c identifies a cluster. h denotes the value of an element of the normalized preference vector. b denotes the weight for each layer.
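Equation 2 itself is not reproduced in this text. Based on the symbol definitions above and on the worked example for FIG. 24, in which each user's elements are multiplied by that user's own layer weight before the products are summed, a consistent form would be the following. The subscripts, with b_{ul} as the weight of layer l for user u and h_{uc} as the normalized element of user u for cluster c, are an assumption.

```latex
\mathrm{sim}(u, v) = \sum_{l=1}^{L} \sum_{c \in C(l)} b_{ul}\, h_{uc}\, b_{vl}\, h_{vc}
```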

FIG. 24 is a diagram showing an example of the weighted preference similarities to the user specified by the user ID U002 and to the user specified by the user ID U003, taking the user specified by the user ID U001 as the reference. The values shown in FIG. 24 are 100 times the similarity sim(u, v) calculated by Equation 2.

Each first-layer element of the preference vector of the user specified by the user ID U001 is multiplied by the first-layer weight of that user, and each first-layer element of the preference vector of the user specified by the user ID U002 is multiplied by the first-layer weight of that user. When the corresponding weighted elements are multiplied together and the products are summed, the result is 0.72, the value placed in the first layer for the user ID U002 in FIG. 24.

Similarly, for the second, third, and fourth layers, each element of the preference vector of the user specified by the user ID U001 is multiplied by that user's weight for the corresponding layer, and each element of the preference vector of the user specified by the user ID U002 is multiplied by that user's weight for the corresponding layer. When the corresponding weighted elements are multiplied together and the products are summed, the results are 1.54, 0.03, and 0.48, the values placed in the second, third, and fourth layers for the user ID U002 in FIG. 24.

Finally, the weighted preference similarity between the user specified by the user ID U001 and the user specified by the user ID U002 is 2.76, the sum of the values 0.72, 1.54, 0.03, and 0.48 obtained for the first, second, third, and fourth layers.

Similarly, each first-layer element of the preference vector of the user specified by the user ID U001 is multiplied by the first-layer weight of that user, and each first-layer element of the preference vector of the user specified by the user ID U003 is multiplied by the first-layer weight of that user. When the corresponding weighted elements are multiplied together and the products are summed, the result is 2.74, the value placed in the first layer for the user ID U003 in FIG. 24. For the second, third, and fourth layers, each element of the preference vector of the user specified by the user ID U001 is multiplied by that user's weight for the corresponding layer, and each element of the preference vector of the user specified by the user ID U003 is multiplied by that user's weight for the corresponding layer. When the corresponding weighted elements are multiplied together and the products are summed, the results are 0.79, 0.10, and 0.00, the values placed in the second, third, and fourth layers for the user ID U003 in FIG. 24.

Finally, the weighted preference similarity between the user specified by the user ID U001 and the user specified by the user ID U003 is 3.64, the sum of the values 2.74, 0.79, 0.10, and 0.00 obtained for the first, second, third, and fourth layers.

As a result, when the similarity is calculated with weighting, the preference similarity between the user specified by the user ID U001 and the user specified by the user ID U003 is larger than the preference similarity between the user specified by the user ID U001 and the user specified by the user ID U002, so the user specified by the user ID U003 is detected as the user X with the highest preference similarity.
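Steps S233 and S234 can be sketched together as below, using the normalized vectors of FIG. 21 and the layer weights of FIG. 22. This is an illustrative sketch under the reading that each user's elements are scaled by that user's own weight for the layer; the names `VECTORS`, `WEIGHTS`, `weighted_similarity`, and `most_similar_user` are hypothetical. Because the inputs are rounded figure values, the computed similarities only approximate FIG. 24, but the user with the highest weighted similarity to U001 is U003, in agreement with the text.

```python
# Normalized preference vectors (FIG. 21) and per-layer weights (FIG. 22).
VECTORS = {
    "U001": [[0.0, 0.8, 0.0, 0.6], [0.4, 0.6, 0.7, 0.0],
             [0.7, 0.5, 0.5], [0.0, 0.5, 0.4, 0.0]],
    "U002": [[0.2, 0.7, 0.4, 0.5], [0.0, 0.7, 0.7, 0.0],
             [0.7, 0.3, 0.6], [0.0, 0.8, 0.3, 0.0]],
    "U003": [[0.0, 0.8, 0.0, 0.6], [0.0, 0.4, 0.8, 0.5],
             [0.0, 1.0, 0.1], [0.3, 0.2, 0.2, 0.3]],
}
WEIGHTS = {
    "U001": [0.17, 0.10, 0.01, 0.06],
    "U002": [0.05, 0.17, 0.05, 0.16],
    "U003": [0.16, 0.10, 0.31, 0.00],
}


def weighted_similarity(u, v):
    """Sum over layers of (weight_u * weight_v * per-layer dot product)."""
    total = 0.0
    for layer_u, layer_v, b_u, b_v in zip(VECTORS[u], VECTORS[v],
                                          WEIGHTS[u], WEIGHTS[v]):
        dot = sum(a * b for a, b in zip(layer_u, layer_v))
        total += b_u * b_v * dot
    return total


def most_similar_user(target):
    """Return the other user with the highest weighted similarity (step S234)."""
    others = [user for user in VECTORS if user != target]
    return max(others, key=lambda user: weighted_similarity(target, user))


for other in ("U002", "U003"):
    # FIG. 24 shows the similarity multiplied by 100.
    print(other, round(100 * weighted_similarity("U001", other), 2))
print(most_similar_user("U001"))
```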

Looking at the preference vectors shown in FIG. 20, the element values of the preference vector of the user specified by the user ID U001 vary more in the first layer than in the second through fourth layers. The first-layer element values are therefore expected to be more closely related to the preferences of that user than the element values of the second through fourth layers.

Looking at the first-layer element values of the preference vectors of the users specified by the user IDs U002 and U003, the first-layer element values of the user specified by the user ID U003 are closer to those of the user specified by the user ID U001 than the first-layer element values of the user specified by the user ID U002 are. The preferences of the user specified by the user ID U003 are therefore expected to be more similar to those of the user specified by the user ID U001 than the preferences of the user specified by the user ID U002 are.

By weighting in this way, a preference similarity is obtained whose value changes more with values expected to be closely related to the user's preferences than with values expected to be only weakly related to them, so users with similar preferences can be detected more accurately.

Although the weight calculation unit 232 has been described as calculating, in step S232, a weight that is the variance of the elements belonging to each layer, the weight is not limited to this. Any weight that takes a larger value when the elements in a layer vary more may be used. For example, the entropy H may be calculated by Equation 3, and the weight may be the value obtained by subtracting the entropy H from 1.
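Equation 3 is not reproduced in this text, so the entropy-based alternative can only be sketched under explicit assumptions. The sketch below assumes that the non-negative elements of a layer are turned into a probability distribution by dividing by their sum, and that H is the Shannon entropy divided by the logarithm of the number of clusters so that it lies between 0 and 1. Both choices, and the name `entropy_weight`, are hypothetical and may differ from the patent's Equation 3.

```python
import math


def entropy_weight(layer):
    """Return 1 - H for one layer of a preference vector.

    Assumptions (not taken from the source): elements are non-negative,
    they are normalized to sum to 1, and H is the Shannon entropy divided
    by log(n) so that 0 <= H <= 1.
    """
    total = sum(layer)
    n = len(layer)
    if total <= 0 or n < 2:
        return 0.0  # no usable information in this layer
    probs = [x / total for x in layer]
    h = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
    return 1.0 - h


print(entropy_weight([1.0, 1.0, 1.0, 1.0]))  # evenly spread layer
print(entropy_weight([0.0, 1.0, 0.0, 0.0]))  # layer concentrated on one cluster
```

Under these assumptions, an evenly spread layer gets a weight near 0 and a layer concentrated on one cluster gets a weight of 1, which matches the stated requirement that the weight grow with the variation of the elements in a layer.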

In this way, the amount of calculation needed to select appropriate content can be reduced while keeping the loss of information to a minimum. In addition, content can be presented that reliably reflects which information the user pays attention to when selecting content.

In this specification, the steps executed on the basis of a program include not only processing performed in time series in the described order, but also processing that is executed in parallel or individually and not necessarily in time series.

The program may be processed by a single computer or may be processed in a distributed manner by a plurality of computers. The program may also be transferred to a remote computer and executed there.

In this specification, a system refers to an entire apparatus composed of a plurality of devices.

Embodiments of the present invention are not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present invention.

Claims

An information processing apparatus for selecting content that satisfies predetermined conditions from a content group and presenting the selected content to a user, the apparatus comprising:

content classification means for classifying each content item constituting the content group into one of a plurality of first clusters in each of a plurality of hierarchies according to metadata of the content;

holding means for holding a database indicating the correspondence between each content item and the first cluster, in each hierarchy, into which that content item is classified;

management means for managing history information of the user regarding the content;

selection means for specifying a first cluster of interest based on the history information and selecting content classified into the specified first cluster; and

presentation means for presenting the selected content.

The information processing apparatus of claim 1, wherein

the selection means comprises:

detecting means for detecting a second user whose history information is similar to that of a first user;

specifying means for specifying a first cluster into which content that exists in the history information of the second user but does not exist in the history information of the first user is classified; and

extracting means for extracting content classified into the specified first cluster,

and the presenting means presents the extracted content to the first user.

The information processing apparatus of claim 1, further comprising:

generation means for generating preference information indicating the preference of the user in units of the first cluster, based on the history information of the user and the database; and

grouping means for grouping users based on the preference information,

wherein the selection means comprises

detecting means for detecting a second user belonging to the same group as a first user,

and the presenting means presents the extracted content to the first user.

The information processing apparatus of claim 1, wherein

the selection means comprises:

detecting means for detecting a second user whose preferences, as represented by the preference information, are similar to those of a first user; and

specifying means for specifying a first cluster of interest based on the preference information of the first user and the preference information of the second user,

and the presenting means presents the extracted content to the first user.

The information processing apparatus of claim 4, wherein

the detecting means comprises:

normalization means for normalizing the preference information of each user;

weight calculation means for calculating a weight for each hierarchy from the preference information of each user; and

similarity calculation means for calculating, based on the weight for each hierarchy and the preference information, a similarity indicating the degree to which the preferences of the first user and of each of the other users are similar,

and detects, from the calculated similarity, a second user whose preferences are similar to those of the first user.

The information processing apparatus of claim 1,

Grouping means for grouping users based on the preference information,

The selection means,

And the presenting means presents the content extracted for the first user.

The information processing apparatus of claim 1, further comprising:

setting means for setting a keyword for each of the first clusters into which the metadata is classified by the content classification means; and

creation means for creating a reason text indicating the reason for presenting the content, using the keyword set by the setting means,

wherein the presenting means also presents the reason text.

The information processing apparatus of claim 1, further comprising:

metadata classification means for classifying the metadata of the content into one of a plurality of second clusters and assigning a hierarchy to each second cluster,

wherein the content classification means classifies each content item into one of a plurality of first clusters in each of the assigned hierarchies.

The information processing apparatus of claim 1,

wherein the selection means selects, among the first clusters of all the hierarchies, the content classified into the largest number of the first clusters represented by the preference information.

An information processing method for an information processing apparatus that selects content satisfying predetermined conditions from a content group and presents the selected content to a user, the method comprising the steps of:

classifying each content item constituting the content group into one of a plurality of clusters in each of a plurality of hierarchies according to metadata of the content;

holding a database indicating the correspondence between each content item and the cluster, in each hierarchy, into which that content item is classified;

managing history information of the user regarding the content;

specifying a cluster of interest based on the history information and selecting content classified into the specified cluster; and

presenting the selected content.

A program for selecting content that satisfies predetermined conditions from a content group and recommending the selected content to a user, the program causing a computer to execute processing comprising the steps of:

managing history information of the user regarding the content; and

presenting the selected content.