JP2010134714A

JP2010134714A - Collaborative sorting apparatus, method, and program, and computer readable recording medium

Info

Publication number: JP2010134714A
Application number: JP2008310173A
Authority: JP
Inventors: Takeharu Eda; 毅晴江田; Toshiro Uchiyama; 俊郎内山; Masashi Uchiyama; 匡内山
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-12-04
Filing date: 2008-12-04
Publication date: 2010-06-17

Abstract

PROBLEM TO BE SOLVED: To divide a set of collected tags, used as category axes, into two groups: objective tags on which users agree, and subjective tags that act as noise.
SOLUTION: Tags serving as sorting axes are converted into feature vectors by probabilistic clustering of co-occurrence information on the plurality of sorting axes, sorting objects, and users stored in a database, and the sorting axes arbitrarily given by users are divided into objective tags and subjective tags on the basis of the feature vectors. Entropy values of the feature vectors are calculated; tags whose feature vectors have entropy values higher than a prescribed threshold are defined as subjective tags, and those whose entropy values are lower than the threshold are defined as objective tags. Alternatively, a set of sorting axes serving as correct answers is obtained from the database, and nearby tags at short distances from that set are defined as subjective tag candidates.
COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a collaborative classification method, apparatus, and program, and to a computer-readable recording medium, and in particular to those in which a plurality of users classify and share information such as bookmarks, photos, videos, books, and papers. More specifically, in a collaborative classification apparatus, the invention mechanically distinguishes objective classification axes (objective tags), which are suitable as classification axes for information, from subjective classification axes (subjective tags), which are not, using only the classification information collected on the service side, so that the service provider can present classification information in a more sophisticated way.

In recent years, collaborative classification systems (collaborative/social tagging systems) have flourished. By sharing the results of each user organizing and classifying information such as URLs (bookmarks), photos, videos, and papers, they make it possible to organize fresh information. A collaborative classification system lets users assign tags freely; the classification results produced by users are collected and can be used as search keys when retrieving information.

However, because users assign tags freely in a collaborative classification system, it has been pointed out that subjective tags unsuitable as classification axes get mixed in (see, for example, Document 1: "The Structure of Collaborative Tagging System", Scott Golder and Bernardo A. Huberman, Journal of Information Science, 2006). According to Document 1, tags are used in seven different ways in collaborative classification systems.

● Objective tags:
1. What the target is about (What it is about?)
-- [Examples] "web2.0", "コンピュータ" (computer), "programming"
2. What the target is (What it is?)
-- [Examples] "blog", "news", "記事" (article)
3. Who owns the target (Who owns it?)
-- [Examples] "jkondo", "dankogai", "timbray"
4. Refining categories (Refining Categories)
-- [Examples] "コンピュータ" (computer) + "プログラミング" (programming), "仕事" (work) + "納期" (deadline)
● Subjective tags:
5. Qualities of the target (Qualities or Characteristics)
-- [Examples] "これはすごい" (this is amazing), "fun", "これはひどい" (this is terrible), "★★★"
6. Self reference (Self Reference)
-- [Examples] "セルクマ" (self-bookmark), "mystuff"
7. Task organizing (Task Organizing)
-- [Examples] "あとで読む" (read later), "toread", "todo"
The type of a tag cannot be determined from its surface form alone. For example, whether a URL tagged "blog" is an article about blogs or is itself a blog cannot be known without looking at the content of the URL. Nevertheless, among these seven types, types 1 to 4 and types 5 to 7 are considered to differ in kind. Tags of types 1 to 4 are objective tags about the content of the target, on which users may be able to reach agreement, whereas tags of types 5 to 7 are subjective tags that are strongly influenced not only by the target but also by the state of the user, and are considered to be used differently from user to user. In this specification, tags considered to belong to types 1 to 4 are called "objective tags", and tags considered to belong to types 5 to 7 are called "subjective tags".

<Problems of subjective and objective tags>
The problems of subjective tags and objective tags are described below.

[Examples of subjective and objective tags]
Examples of objective and subjective tags are shown below, taken from the more frequently used tags obtained in August 2008 from social bookmark services published on the Web.

● Objective tags:
⇒ "mobile", "science", "apple", "osx", "turorials", "technology", "life", "gtd"
● Subjective tags:
⇒ "fun", "cool", "これはひどい" (this is terrible), "これはすごい" (this is amazing)
One reason current collaborative classification systems have become popular is that any keyword can be assigned as a tag (classification axis). As a result, these systems collect objective tags and subjective tags mixed together. Because users can agree on objective tags, such tags are considered useful for searching for information and for browsing lists of tags. Subjective tags, by contrast, are used differently by each person, so they are useful for that person's own use but difficult for others to use as they are.

For example, the reasons for calling something "これはすごい" (this is amazing) vary greatly from person to person. It is not unusual for a URL that has been tagged "これはすごい" many times to be tagged "これはひどい" (this is terrible) by several other users.

However, if the service provider forbids arbitrary tags and forces users to use only predesignated tags, the advantages of a collaborative classification system are greatly impaired. If the service provider could instead let users assign arbitrary tags while automatically distinguishing tag types and presenting them effectively when they are reused, the convenience and usefulness of the service could be expected to increase without losing the merits of the existing collaborative classification interface, which in turn could attract more users.

For this reason, a technique for distinguishing objective tags from subjective tags is desired for collaborative classification systems.

Technologies relevant to collaborative classification systems include PLSI (Probabilistic Latent Semantic Indexing), a concept retrieval technique based on a probabilistic model (see, for example, Non-Patent Document 1), and a technique that extends PLSI to triples and applies it to a social bookmark service (see, for example, Non-Patent Document 2).
Non-Patent Document 1: "Probabilistic Latent Semantic Analysis", Thomas Hofmann, 1999, in Proc. of Uncertainty in Artificial Intelligence, UAI'99.
Non-Patent Document 2: "Exploring Social Annotations for the Semantic Web", Xian Wu et al., 2006, in WWW2006.

However, collaborative classification systems that accumulate classification results from end users only became feasible with the advent of the Web, and the problem addressed by the present invention only became clear once data had actually been accumulated and put to use. Techniques such as that of Document 1 above focus on the types of tags, but do not address the problem of distinguishing subjective tags from objective tags, a distinction that is more abstract but is considered highly useful for a service.

A simple approach would be to build a dictionary by hand and distinguish tag types on the basis of that dictionary. However, in a collaborative classification system end users keep classifying every hour and every minute, so with a dictionary-based method it is difficult to determine whether newly appearing tags or tags with unclear meaning (emoticons, pictograms, and the like) are really subjective tags. A mechanical method of distinction, or a technique that supports manual distinction, is therefore desired.

As described above, in collaborative classification systems and apparatuses in which a plurality of users classify and share information such as bookmarks, photos, videos, books, and papers, objective tags and subjective tags are mixed together, and as a result it is difficult for third parties to reuse the classification results.

The present invention has been made in view of the above points. Its object is to provide a collaborative classification method, apparatus, and program, and a computer-readable recording medium, that divide a collected set of tags into objective tags on which users agree and subjective tags that act as noise as classification axes, or that support making this distinction by hand, and thereby make it easier for the service provider to make use of classification axes (tags).

FIG. 1 is a diagram for explaining the principle of the present invention.

The present invention (claim 1) is a collaborative classification method in a system that has a user interface for inputting and outputting data, communication means for performing data communication, and data processing means for executing classification processing in the form of classification target browsing processing and classification target identifier registration processing, and in which users each classify and share information such as bookmarks, photos, videos, books, and papers, wherein the data processing means performs:
a feature vectorization step (steps 1 and 2) of converting tags, which are classification axes, into feature vectors by probabilistic clustering of co-occurrence information on the relations among a plurality of classification axes, classification targets, and users stored in a database; and
an analysis step (step 3) of distinguishing the classification axes arbitrarily assigned by users into objective tags and subjective tags on the basis of the feature vectors obtained in the feature vectorization step.

In the present invention (claim 2), the analysis step (step 3) further performs
an entropy calculation step of calculating the entropy value of each feature vector obtained by the feature vectorization,
and treats a tag whose entropy value is higher than a predetermined threshold as a subjective tag and a tag whose entropy value is lower than the threshold as an objective tag.
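The entropy rule of claim 2 can be written out as follows. This is only a sketch of one natural reading: it assumes the feature vector of tag t is a probability vector (p_{t,1}, ..., p_{t,K}) over K latent classes and that "entropy" means Shannon entropy. The symbols K, p_{t,k}, and θ (the threshold) are notation introduced here for illustration and do not appear in the original text.

```latex
H(t) = -\sum_{k=1}^{K} p_{t,k} \log p_{t,k},
\qquad \sum_{k=1}^{K} p_{t,k} = 1,
\qquad
t \text{ is }
\begin{cases}
\text{a subjective tag} & \text{if } H(t) > \theta,\\
\text{an objective tag} & \text{if } H(t) < \theta.
\end{cases}
```

Under this reading, a tag spread evenly over all latent classes reaches the maximum H(t) = log K, while a tag concentrated on a single class has H(t) = 0.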

In the present invention (claim 3), the analysis step (step 3)
acquires a set of correct classification axes from the database,
and treats neighboring tags that are close in distance to that set of correct classification axes as subjective tag candidates.

In the present invention (claim 4), the analysis step (step 3) further performs
an entropy calculation step of calculating the entropy value of each feature vector obtained in the feature vectorization step,
treats tags whose entropy value is higher than a predetermined threshold as correct subjective tags,
and treats neighboring tags that are close in distance to the correct subjective tags as subjective tag candidates.

FIG. 2 is a diagram for explaining the principle of the present invention.

The present invention (claim 5) is a collaborative classification apparatus in a system that has a user interface for inputting and outputting data, communication means for performing data communication, and data processing means for executing classification processing in the form of classification target browsing processing and classification target identifier registration processing, and in which users each classify and share information such as bookmarks, photos, videos, books, and papers,
wherein the data processing means has:
feature vectorization means 110 for converting into feature vectors, by probabilistic clustering, elements extracted from all or any of the information on the relations among a plurality of classification axes, classification targets, and users stored in the database 103; and
analysis means 120 for distinguishing the classification axes arbitrarily assigned by users into objective tags and subjective tags on the basis of the feature vectors obtained by the feature vectorization means 110.

In the present invention (claim 6), the analysis means 120 includes:
entropy calculation means for calculating the entropy value of each feature vector obtained by the feature vectorization means 110; and
means for treating a tag whose entropy value is higher than a predetermined threshold as a subjective tag and a tag whose entropy value is lower than the threshold as an objective tag.

In the present invention (claim 7), the analysis means 120 includes
means for acquiring a set of correct classification axes from the database and treating neighboring tags that are close in distance to that set as subjective tag candidates.

In the present invention (claim 8), the analysis means 120 includes:
entropy calculation means for calculating the entropy value of each feature vector obtained by the feature vectorization;
means for treating tags whose entropy value is higher than a predetermined threshold as correct subjective tags; and
means for treating neighboring tags that are close in distance to the correct subjective tags as subjective tag candidates.

The present invention (claim 9) is a collaborative classification program for causing a computer to function as each of the means constituting the collaborative classification apparatus according to any one of claims 5 to 8.

The present invention (claim 10) is a computer-readable recording medium storing the collaborative classification program according to claim 9.

As described above, according to the present invention, the set of tags collected in a collaborative classification system can be divided into objective tags on which users agree and subjective tags that act as noise as classification axes, or manual discrimination between them can be supported, which makes it easier for the service provider to make use of classification axes (tags). Subjective tags are highly useful to the user who assigned them but of little use to others, whereas objective tags can also be useful to others. By exploiting this property to classify tags mechanically into the two types, the service provider can offer more sophisticated classification information.

First, the terms used in this specification are explained.

・Collaborative classification system (Collaborative/Social Tagging System)
A system in which end users each freely classify information and can share the classification results. Examples include social bookmark services (del.icio.us, Hatena, etc.), photo sharing (flickr), video sharing, and paper sharing (citeulike).

・SBM (Social Bookmark Service)
Abbreviation for social bookmark service. The classification targets are resources on the Web, that is, URLs. It is a system in which users share their bookmarks over a network, and is a typical example of a collaborative classification system.

・Classification axis
Information used for classification in a collaborative classification system. Tags (keywords), images, or audio are used as classification axes. As shown in FIG. 3, classification targets are classified by classification axes (tags).

・Identifier of a classification target
To classify, the classification targets must be distinguishable. That classification targets can be distinguished means that they can be put in one-to-one correspondence with a set of uniquely determined identifiers, so it is equivalent to distinguishing them by identifier. For bookmarks, the URL can be used as the identifier. Accordingly, in this specification no particular problem arises from representing a classification target by its identifier.

・Data collected by the SBM
As shown in FIG. 4, this specification assumes triples U×R×T recording who (USER) classified which URL (RESOURCE) into which category (TAG). An actual SBM also collects other data such as comments, user-assigned scores, and timestamps.

・Tag
The classification axis in a social bookmark service. Each user can choose any word as a tag.

・Objective tags and subjective tags
This specification divides tags into two types. This division takes the seven types proposed by Golder et al. in Document 1 and groups them more coarsely into two, according to whether a tag is subjective or objective.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 5 is a block diagram outlining a system according to an embodiment of the present invention.

The collaborative classification system shown in the figure is provided as a network service, and users can use the service through a web browser or a client application.

The collaborative classification processing system comprises an application server 100 that performs the actual processing, a database server 200 that stores data, a feature vectorization engine 300 that vectorizes classification axes, classification targets, and user information, a vector analysis engine 400 that calculates the entropy values of feature vectors, sorts them, and acquires k-nearest neighbors, a LAN 500, and a router 600.

These processing units may be implemented within a single server, or may be distributed across multiple machines.

The following description takes a social bookmark system as the example of a collaborative classification system, and assumes that the processing is executed on the application server 100 acting as the collaborative classification apparatus.

FIG. 7 shows the configuration of the application server in an embodiment of the present invention.

The application server 100 is connected to a plurality of user terminals 2 via a network. Each user terminal 2 has a communication unit 21 that communicates with the application server 100 via the network, a storage unit 22, a data processing unit 23, and a user interface 24 with input and display functions.

The application server 100 comprises a communication interface 101 for communicating with the user terminals 2 via the network, a control unit 102 for performing collaborative classification processing, and a storage unit 103 that stores intermediate calculated values, thresholds, and the like.

The control unit 102 has a feature vectorization unit 110 and a vector analysis unit 120.

FIG. 6 shows an overview of the processing of the control unit 102. The feature vectorization unit 110 has the functions of the feature vectorization engine of FIG. 5 and performs three kinds of processing: element extraction, which extracts co-occurrence data from the triple database; pre-filtering, which removes noise from the co-occurrence data; and PLSI processing on the noise-removed co-occurrence data.

The feature vectorization unit 110 uses all or some of the information on classification targets, tags (classification axes), and users input from the database server 200 via an input device, extracts characteristic properties from it, and makes it possible to measure distances between items, for example tag to tag or classification target to tag. Various existing techniques can be used for this purpose, for example those of Document 1, Non-Patent Document 1, and Non-Patent Document 2 mentioned above. Typically, each item is represented as a feature vector, and the distance between elements is measured using the similarity between those feature vectors.

The feature vectorization unit 110 has an element extraction unit 111, a noise removal unit 112, and a co-occurrence data indexing unit 113.

The processing of the feature vectorization unit 110 is described in detail below.

FIG. 8 is a flowchart of the operation of the feature vectorization unit in an embodiment of the present invention.
Step 101) When raw text data (or a database) is input to the element extraction unit 111, the unit extracts the required elements from the raw data. The structure of raw data in social bookmarking depends on how the database schema is defined within each service. As a concrete example, the act of bookmarking can be modeled as "who (U) classified which URL (R), when (t), as what (T), and wrote some comment (C)". The raw data can then be regarded as 5-tuple co-occurrence data formed from the direct product of these elements (a subset of U×R×t×T×C). Document 1 adopts a probabilistic indexing technique built on pairs, so pair co-occurrence data (such as U×R or U×T) is selected from the 5-tuple co-occurrence data so that the indexing technique can be run. Non-Patent Document 1, Non-Patent Document 2, and Document 2 ("PLSIを用いたSBMユーザとタグの関連の可視化" (Visualization of the relations between SBM users and tags using PLSI), Takashi Menjo (毛受崇), Takeharu Eda, Masashi Yamamuro, 7th Web Intelligence and Interaction study group, 2006) can index triple co-occurrence data, so triple co-occurrence data such as (U×R×T) is selected. Information not selected in this processing is simply ignored.
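The element extraction of step 101 can be sketched as a simple projection from 5-tuples to triples. This is a minimal illustration only: the `BookmarkEvent` type, its field names, the function name, and the sample records are all hypothetical and are not taken from the original text.

```python
from typing import NamedTuple


class BookmarkEvent(NamedTuple):
    """One raw bookmarking record, modeled as a (U, R, t, T, C) 5-tuple."""
    user: str       # U: who
    resource: str   # R: which URL
    timestamp: str  # t: when
    tag: str        # T: classified as what
    comment: str    # C: free-text comment


def project_to_triples(events):
    """Keep only (user, resource, tag); timestamp and comment are ignored."""
    return [(e.user, e.resource, e.tag) for e in events]


# Hypothetical sample records.
events = [
    BookmarkEvent("u1", "http://example.com/a", "2008-08-01", "programming", "nice intro"),
    BookmarkEvent("u2", "http://example.com/a", "2008-08-02", "toread", ""),
]
triples = project_to_triples(events)
```

Selecting pair co-occurrence data such as U×R or U×T would work the same way, keeping two fields instead of three.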

Extending PLSI, the indexing technique of Document 1, from pair co-occurrence data to triples gives the technique of Non-Patent Document 1. Since the same discussion holds for both, the following explanation uses triple co-occurrence data.

Step 102) In PLSI, items that are registered infrequently in the co-occurrence data are known to become noise (Non-Patent Document 1). The noise removal unit 112 therefore removes noise from the co-occurrence data by applying a minimum-frequency condition to the elements in the co-occurrence data. The frequency of each item is explained below.

Suppose a concrete example of triple data is
{(u1,r1,t1), (u1,r1,t2), (u2,r2,t2), (u3,r2,t3)}.
The frequency of each item is the number of times the item appears in the data set as part of a triple. That is,
|u1| = 3, |u2| = 1, |u3| = 1
|r1| = 2, |r2| = 3
|t1| = 2, |t2| = 2, |t3| = 1
where |x| denotes the cardinality (count) of item x. If 2 is specified as the minimum frequency, u2, u3, and t3 are removed as noise, and the triple data that remains is
{(u1,r1,t1), (u1,r1,t2)}.
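A minimum-frequency filter of this kind can be sketched as follows. The sketch makes two assumptions that the text does not settle: frequencies are counted once on the unfiltered data, and the filter runs as a single pass rather than repeating until no low-frequency item remains. The function names and the sample triples are made up for illustration and are not the data set used in the example above.

```python
from collections import Counter


def item_frequencies(triples):
    """Count how often each user, resource, and tag appears in a triple."""
    users = Counter(u for u, _, _ in triples)
    resources = Counter(r for _, r, _ in triples)
    tags = Counter(t for _, _, t in triples)
    return users, resources, tags


def filter_low_frequency(triples, min_freq):
    """Single pass: drop every triple containing an item seen fewer than min_freq times."""
    users, resources, tags = item_frequencies(triples)
    return [
        (u, r, t)
        for u, r, t in triples
        if users[u] >= min_freq and resources[r] >= min_freq and tags[t] >= min_freq
    ]


# Hypothetical sample data.
sample = [
    ("alice", "url1", "python"),
    ("alice", "url1", "code"),
    ("bob", "url1", "python"),
    ("bob", "url2", "code"),
    ("carol", "url2", "misc"),
]
kept = filter_low_frequency(sample, min_freq=2)
```

Here "carol" and "misc" each appear only once, so the last triple is dropped. After that removal "url2" appears only once in what remains; a stricter variant would recount and filter again until the result is stable.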

Step 103) Using the PLSI described in Document 1 and Non-Patent Document 1, the co-occurrence data indexing unit 113 learns the co-occurrence structure of the N-tuple co-occurrence data and obtains each item of each element as a probability (distribution) vector. A probability vector is a vector normalized so that its values sum to 1, and it can be regarded as a probability distribution. Using KL divergence or JS divergence as the distance between probability vectors makes it possible to measure the distance between items accurately.
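The two divergences named in step 103 can be computed as in the sketch below, using their standard definitions with natural logarithms. The function names are illustrative, and the inputs are assumed to be probability vectors of equal length.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats. Requires q[i] > 0 wherever p[i] > 0;
    entries with p[i] == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and finite even when p and q
    have zeros in different places, because the mixture m covers both."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Two probability vectors with no overlap are maximally far apart: JS = log 2.
far_apart = js_divergence([1.0, 0.0], [0.0, 1.0])
```

KL divergence is asymmetric and undefined when q is zero where p is not, which is one practical reason to prefer the JS divergence when comparing sparse probability vectors.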

Next, the vector analysis unit 120 of the control unit 102 is described.

ベクトル分析部１２０では、図６に示すように、ＰＬＳＩ処理によりインデクシングされたＰＬＳＩベクトルについてエントロピー計算処理を行い、エントロピー値に基づいて降順にソートし、その結果に基づいて主観的タグと客観的タグに分類する。または、何らかの方法により正解主観的タグを取得して主観的タグ候補との距離を求め、当該距離に基づいて、主観的タグと客観的タグに分類するＫ近傍取得処理を行う。 As shown in FIG. 6, the vector analysis unit 120 performs entropy calculation processing on the PLSI vectors indexed by the PLSI processing, sorts them in descending order based on the entropy values, and based on the results, the subjective tag and the objective tag Classify into: Alternatively, a correct subjective tag is acquired by some method, a distance from the subjective tag candidate is obtained, and a K-neighbor acquisition process for classifying the subjective tag and the objective tag based on the distance is performed.

The vector analysis unit 120 acquires tags with high entropy from among the PLSI vectors, in descending order, as subjective tag candidates. There are two basic methods:
(1) Sort the tags by entropy and take the top k tags (or the tags whose entropy is at least a minimum value e) (naive method);
(2) Prepare correct subjective tags by some method and, using those tags as keys, expand to their k nearest neighbors (or to the tags within a distance d).
Method (1) can be used to select the correct tags required in (2). Method (1) needs no correct tags, whereas method (2) does. With (1), the subjective tag candidates depend strongly on the properties of the data set on which PLSI was run; with (2), the tags obtained are those used in a way similar to the correct tags. The choice between the two should be made according to the data set and the characteristics of the correct tags that can be prepared.

The vector analysis unit 120 comprises an entropy calculation unit 121, a sorting unit 122, and a correct subjective tag acquisition unit 123 that performs K-neighbor acquisition processing.

FIG. 9 is a flowchart (part 1) of the operation of the vector analysis unit in one embodiment of the present invention.

The flowchart in FIG. 9 corresponds to method (1) above.

Step 201) First, the full tag data set T, converted into feature vectors by PLSI, is prepared, and an output array A is initialized to empty. Either the number k of subjective tag candidates or a threshold e on the entropy value must also be given. Both k and e depend on the data set and the service and must be tuned appropriately.

The entropy calculation unit 121 calculates the entropy value of every PLSI vector in T. Entropy is a value calculated for each tag (= PLSI vector) by the following equation.
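The equation referred to here is not reproduced in this text. Assuming it is the standard Shannon entropy of the tag's PLSI vector, it would take the form below. The symbols are introduced only for illustration: t is a tag, z indexes the Z dimensions of the PLSI vector, and P(z | t) is the z-th component of the vector, with the usual convention that 0 log 0 = 0.

```latex
H(t) = -\sum_{z=1}^{Z} P(z \mid t)\,\log P(z \mid t),
\qquad \sum_{z=1}^{Z} P(z \mid t) = 1 .
```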

Step 202) The sorting unit 122 sorts T by the entropy values obtained above.

Step 203) The correct subjective tag acquisition unit 123 stores in array A (memory) the tags that satisfy the count k or the threshold e.

Step 204) The correct subjective tag acquisition unit 123 outputs array A as the subjective tag candidates.
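Steps 201 to 204 can be sketched as follows. This is a minimal illustration under stated assumptions: entropy is taken to be Shannon entropy with the natural logarithm, the function names and the dictionary layout of T are hypothetical, and the sample vectors are invented for the example rather than taken from the data behind FIG. 10.

```python
import math

def entropy(p):
    """Shannon entropy (natural log) of a probability vector; 0*log(0) is treated as 0."""
    return 0.0 - sum(x * math.log(x) for x in p if x > 0.0)

def subjective_candidates(tag_vectors, k=None, e=None):
    """Return tags in descending order of entropy, limited by count k and/or threshold e."""
    if k is None and e is None:
        raise ValueError("give the count k, the entropy threshold e, or both")
    # Step 201: compute the entropy of every PLSI vector in T.
    scored = [(entropy(vec), tag) for tag, vec in tag_vectors.items()]
    # Step 202: sort by entropy, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Step 203: keep the tags that satisfy the threshold e and/or the count k.
    if e is not None:
        scored = [pair for pair in scored if pair[0] >= e]
    if k is not None:
        scored = scored[:k]
    # Step 204: output array A as the subjective tag candidates.
    return [tag for _, tag in scored]

# Hypothetical PLSI vectors over four latent dimensions.
T = {
    "apple": [1.0, 0.0, 0.0, 0.0],       # single peak, entropy 0
    "mac": [0.5, 0.5, 0.0, 0.0],         # two equal peaks
    "toread": [0.25, 0.25, 0.25, 0.25],  # uniform, maximum entropy
}
top1 = subjective_candidates(T, k=1)
above = subjective_candidates(T, e=0.5)
```

Passing both k and e applies the threshold first and then truncates to k, which is one possible reading of "the count k or the threshold e".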

FIG. 10 shows examples of PLSI vectors (probability distributions) actually calculated in one embodiment of the present invention. It shows the PLSI vectors of several tags, calculated from real data by the processing described above. FIG. 10(A) shows the PLSI vectors of tags that have an entropy value of 0, that is, no ambiguity, and that are strongly characterized in the 75th dimension. All of these tags can be seen to be related to "Apple". FIG. 10(B), in contrast, shows the 16 tags with the highest entropy values, that is, the ambiguous tags. Of these 16 subjective tag candidates, 13 can in fact be judged to be subjective tags. The number of subjective tag candidates can be increased by adjusting the two parameters, the count k and the entropy value e, but note that precision falls as the number of candidate tags grows. FIG. 11 illustrates the relationship between entropy and the probability distribution. When the probability distribution has only one peak, the entropy value is 0, indicating that there is no ambiguity. Entropy increases as the number of peaks increases and, in theory, reaches its maximum for the uniform distribution. Since a uniform distribution does not occur in practice, a larger number of peaks can be understood as meaning higher entropy.
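The relationship between the number of peaks and the entropy can be made concrete with an idealized case. The following is a sketch assuming Shannon entropy and a distribution p over Z dimensions whose mass is split equally among m peaks; the symbols p, m, and Z are introduced here for illustration only.

```latex
p_i =
\begin{cases}
  1/m & \text{for the } m \text{ peak dimensions} \\
  0   & \text{otherwise}
\end{cases}
\quad\Longrightarrow\quad
H(p) = -\sum_{i=1}^{Z} p_i \log p_i
     = -m \cdot \frac{1}{m} \log \frac{1}{m}
     = \log m ,
\qquad 0 \le H(p) \le \log Z .
```

Under these assumptions a single peak (m = 1) gives H = 0, and the uniform distribution (m = Z) gives the maximum value log Z.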

Next, method (2) above will be described.

FIG. 12 is a flowchart (part 2) of the operation of the vector analysis unit in one embodiment of the present invention. It shows the processing performed when correct subjective tags are available.

As in the processing of FIG. 9, the PLSI vector data set T is prepared and the array A is initialized.

Step 301) First, correct subjective tags are prepared in advance using a dictionary, previous results, or the like. If they cannot be prepared, the process proceeds to step 302.

Step 302) If correct subjective tags cannot be prepared, a set of correct subjective tags is prepared using, for example, the processing of FIG. 9.

Step 303) K-neighbor tags are acquired for this set of correct subjective tags. K-neighbor tag acquisition is the operation of taking, from the tags that lie at a short distance from the tags in the set, the K closest ones. As the distance function between tags, one can use, for example, JS divergence, a distance between probability distributions whose effectiveness has been demonstrated in information retrieval. JS divergence is calculated as follows.
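The equation for JS divergence is not reproduced in this text. Assuming the standard definition of the Jensen-Shannon divergence between two probability vectors x and y, it would read as below, where m is the midpoint distribution and D_KL corresponds to the KL divergence that the text writes as Dk(x‖y). The index i and the symbol m are introduced here for illustration.

```latex
D_{\mathrm{JS}}(x \,\|\, y)
  = \tfrac{1}{2}\, D_{\mathrm{KL}}(x \,\|\, m)
  + \tfrac{1}{2}\, D_{\mathrm{KL}}(y \,\|\, m),
\qquad m = \tfrac{1}{2}(x + y),
\qquad D_{\mathrm{KL}}(x \,\|\, y) = \sum_{i} x_i \log \frac{x_i}{y_i} .
```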

Here, Dk(x‖y) is the KL divergence between x and y. All K-neighbor tags are stored in A and output as subjective tag candidates.
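Step 303 can be sketched as follows. This is a minimal illustration under stated assumptions: JS divergence is taken in its standard form (the average of the KL divergences to the midpoint distribution), the correct tags themselves are excluded from the output, and all function names and sample vectors are hypothetical.

```python
import math

def kl(p, q):
    """KL divergence D(p || q); terms with p_i = 0 contribute nothing."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def js(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the midpoint of p and q."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def k_neighbor_candidates(tag_vectors, correct_tags, k):
    """For each correct subjective tag, collect its k nearest tags by JS divergence."""
    A = []
    for key in correct_tags:
        distances = [
            (js(tag_vectors[key], vec), tag)
            for tag, vec in tag_vectors.items()
            if tag not in correct_tags  # assumption: the keys are not returned
        ]
        distances.sort()  # nearest first
        for _, tag in distances[:k]:
            if tag not in A:
                A.append(tag)
    return A

# Hypothetical PLSI vectors; "toread" plays the role of a correct subjective tag.
vectors = {
    "toread": [0.25, 0.25, 0.25, 0.25],
    "later": [0.30, 0.20, 0.25, 0.25],
    "apple": [1.0, 0.0, 0.0, 0.0],
    "mac": [0.9, 0.1, 0.0, 0.0],
}
A = k_neighbor_candidates(vectors, ["toread"], k=1)
```

Unlike KL divergence, the JS divergence is symmetric and stays finite when one vector has zero components, which makes it convenient for sparse PLSI vectors.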

As described above, the collaborative classification system of the present invention applies probabilistic clustering, a form of unsupervised learning, to the co-occurrence information of users, classification axes, and classification targets that results from the users' classification, and converts the tags serving as classification axes into feature vectors. Probabilistic clustering represents each tag as a probability distribution. The entropy value of each tag obtained as a probability distribution is calculated, and the tags are reordered in descending order of entropy. A high entropy value means that the probability distribution is ambiguous, and high ambiguity in the result of probabilistic clustering means that the tag cannot be assigned cleanly to any group. A tag fails to group cleanly because it is used differently from other tags. This stems from the fact that the usage of subjective tags differs greatly from user to user, so that they cannot be clustered well.
The present technique obtains subjective tag candidates in two ways: (i) it sorts the tags in descending order of the entropy values of their feature vectors and mechanically takes the top k tags, or the tags whose entropy value exceeds a threshold; and (ii) it uses correct subjective tags obtained by some means as keys and takes their neighboring tags, measured by a distance between probability distributions. It thus provides both a fully automatic method for distinguishing subjective tags and a method for supporting manual removal of subjective tags. The correct subjective tags may be taken from a dictionary prepared in advance, or tags with high entropy values may be used.

The operation of the control unit 102 shown in FIG. 7 can be implemented as a program, which can be installed on an application server (computer) used as the collaborative classification device and executed by its CPU, or distributed over a network.

The program can also be stored on, or distributed by means of, a hard disk, a portable storage medium such as a flexible disk or CD-ROM, an optical storage device, a magnetic storage device, or the like.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible within the scope of the claims.

The present invention is applicable to systems in which end users classify various kinds of information and share the classification results.

FIG. 1 is a diagram for explaining the principle of the present invention.
FIG. 2 is a diagram of the principle configuration of the present invention.
FIG. 3 is an example of classification in a collaborative classification system (SBM).
FIG. 4 is a complete tripartite graph of classification behavior.
FIG. 5 is a block diagram outlining the system in one embodiment of the present invention.
FIG. 6 is a diagram outlining the feature vectorization and vector analysis processing in one embodiment of the present invention.
FIG. 7 is a configuration diagram of the application server in one embodiment of the present invention.
FIG. 8 is a flowchart of the operation of the feature vectorization unit in one embodiment of the present invention.
FIG. 9 is a flowchart (part 1) of the operation of the vector analysis unit in one embodiment of the present invention.
FIG. 10 is an example of PLSI vectors (probability distributions) actually calculated in one embodiment of the present invention.
FIG. 11 shows PLSI vectors (probability distributions) and the magnitude of their entropy values in one embodiment of the present invention.
FIG. 12 is a flowchart (part 2) of the operation of the vector analysis unit in one embodiment of the present invention.

Explanation of symbols

1 Collaborative classification processing device (system)
2 Client
21 Communication unit
22 Storage unit
23 Data processing unit
24 User interface
100 Application server
101 Communication interface
102 Control unit
103 Storage unit
103 Database
110 Feature vectorization means, feature vectorization unit
111 Element extraction unit
112 Noise removal unit
113 Co-occurrence data indexing unit
120 Analysis means, vector analysis unit
121 Entropy calculation unit
122 Sorting unit
123 Correct subjective tag acquisition unit
124 K-neighbor tag acquisition unit
100 Application server
200 Database server
300 Feature vectorization engine
400 Vector analysis engine

Claims

1. A collaborative classification method in a system that comprises a user interface for inputting and outputting data, communication means for performing data communication, and data processing means for executing, as classification processing, classification-target browsing processing and classification-target identifier registration processing, and in which users each classify and share information such as videos, books, and articles, the method comprising:
a feature vectorization step in which the data processing means converts tags, which are classification axes, into feature vectors by probabilistic clustering of co-occurrence information, stored in a database, relating a plurality of classification axes, classification targets, and users; and
an analysis step of distinguishing, based on the feature vectors obtained in the feature vectorization step, the classification axes arbitrarily assigned by users into objective tags and subjective tags.

2. The collaborative classification method according to claim 1, wherein the analysis step comprises:
an entropy calculation step of calculating an entropy value of each feature vector obtained in the feature vectorization step; and
a step of treating tags whose entropy value is higher than a predetermined threshold as subjective tags and tags whose entropy value is lower than the threshold as objective tags.

3. The collaborative classification method according to claim 1, wherein the analysis step comprises:
obtaining a set of correct classification axes from the database; and
treating neighboring tags that are at a short distance from the set of correct classification axes as subjective tag candidates.

4. The collaborative classification method according to claim 1, wherein the analysis step comprises:
an entropy calculation step of calculating an entropy value of each feature vector obtained by the feature vectorization;
treating tags whose entropy value is higher than a predetermined threshold as correct subjective tags; and
treating neighboring tags that are at a short distance from the correct subjective tags as subjective tag candidates.

5. A collaborative classification device in a system that comprises a user interface for inputting and outputting data, communication means for performing data communication, and data processing means for executing, as classification processing, classification-target browsing processing and classification-target identifier registration processing, and in which users each classify and share information such as videos, books, and articles, wherein the data processing means comprises:
feature vectorization means for converting tags, which are classification axes, into feature vectors by probabilistic clustering of co-occurrence information, stored in a database, relating a plurality of classification axes, classification targets, and users; and
analysis means for distinguishing, based on the feature vectors obtained by the feature vectorization means, the classification axes arbitrarily assigned by users into objective tags and subjective tags.

6. The collaborative classification device according to claim 5, wherein the analysis means comprises:
entropy calculation means for calculating an entropy value of each feature vector obtained by the feature vectorization means; and
means for treating tags whose entropy value is higher than a predetermined threshold as subjective tags and tags whose entropy value is lower than the threshold as objective tags.

7. The collaborative classification device according to claim 5, wherein the analysis means comprises means for obtaining a set of correct classification axes from the database and treating neighboring tags that are at a short distance from the set of correct classification axes as subjective tag candidates.

8. The collaborative classification device according to claim 5, wherein the analysis means comprises:
entropy calculation means for calculating an entropy value of each feature vector obtained by the feature vectorization;
means for treating tags whose entropy value is higher than a predetermined threshold as correct subjective tags; and
means for treating neighboring tags that are at a short distance from the correct subjective tags as subjective tag candidates.

9. A collaborative classification program for causing a computer to function as each of the means constituting the collaborative classification device according to any one of claims 5 to 8.

10. A computer-readable recording medium storing the collaborative classification program according to claim 9.