JP2009217436A - Collaborative sorting apparatus and program

Info

Publication number: JP2009217436A
Application number: JP2008059190A
Authority: JP
Inventors: Takeharu Eda (江田毅晴); Toshibumi Enomoto (榎本俊文); Masashi Yamamuro (山室雅司)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-03-10
Filing date: 2008-03-10
Publication date: 2009-09-24
Anticipated expiration: 2028-03-10
Also published as: JP5112117B2

Abstract

PROBLEM TO BE SOLVED: To raise the recall of searches that specify a classification axis by making it easier to organize classification axes, which tend to diverge, in a collaborative classification system.

SOLUTION: Input text data is conceptually associated with reference to previously stored classification axes and classification targets, then indexed and grouped, and the result is stored in a storage means. A group index or a hierarchical index is sent from the storage means and presented to the user according to a bundle name input by the user, and this is repeated until the user inputs an instruction not to continue hierarchization. When a search query is input by the user, a normal search is performed based on the search query, and the storage means is also consulted based on the search query so that the query is expanded for the search.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a collaborative classification apparatus and program, and more particularly to a collaborative classification apparatus and program with which multiple users classify and share information such as bookmarks, photos, videos, books, and papers.

In recent years, collaborative classification systems (Collaborative Tagging Systems) have flourished. By sharing the results of each user's own organization and classification of information such as URLs (bookmarks), photos, videos, and papers, these systems make it possible to gather fresh, well-organized information. Because users may attach tags (classification axes) freely, such systems can lead users naturally from ordinary Internet use into an information sharing system; the barrier to joining the service is low, so they attract many users.

Classification by tags is called Folksonomy. Its positioning has been debated in various ways, and many research results have been published.

Existing collaborative classification systems often display the list of tags a user has used as a tag cloud. Tags entered by users are unconstrained, but the number of distinct tags tends to explode, and tags merely listed in Japanese syllabary order or frequency order do not work well as classification axes when looking for information. Listing only the frequent tags makes the more specific tags hard to find.

To address these problems, the social bookmarking services "del.icio.us" and "goo bookmark" provide a tag grouping function. These services offer an interface for dividing tags into groups, so that users can group tags as they see fit (see, for example, Non-Patent Document 1 and Non-Patent Document 2).
Non-Patent Document 1: http://del.icio.us
Non-Patent Document 2: http://bookmark.goo.ne.jp

However, with such conventional techniques, the number of distinct tags is so large that grouping or hierarchizing the tag list is far from easy. The user has to compare tags one by one to judge which are close to each other.

In addition, a user can search the collaborative classification system with a tag as the search query, but when the chosen tag is rarely used, little information is returned and the recall of the search drops.

The present invention has been made in view of the above points. It addresses two problems: deciding what grouping to apply when using a grouping (bundling) function that consolidates and organizes divergent tags, and obtaining too few results when searching with a rarely used tag. Its object is to provide a collaborative classification apparatus and program that make it easy to organize classification axes, which tend to diverge, in a collaborative classification system, and that raise the recall of searches that specify a classification axis.

FIG. 1 is a diagram showing the principle configuration of the present invention.

The present invention (claim 1) is a collaborative classification information processing apparatus with which a plurality of users classify and provide information such as bookmarks, photos, videos, books, and papers, comprising:
user interface means 10 serving as a data input/output unit;
communication means 80 for performing data communication;
a database 65 storing classification axes (including tags, images, and audio), classification targets, and user information, as classified by a plurality of users;
association processing means 20 for, when text data is input from a user's client device, referring to the contents of the database 65, conceptually associating the text data, and storing the result in an index database 15; and
indexing processing means 30 for indexing and grouping the associated data in the index database 15 and storing the result in the index database 15.

Further, in the present invention (claim 2), the association processing means 20 includes:
element extraction processing means for referring to the database 65 and extracting, from the text data, co-occurrence data of the characteristic properties;
filtering means for removing noise using a predetermined minimum frequency condition on the elements in the co-occurrence data; and
vector calculation means for obtaining a probability vector for each element of the co-occurrence data from which noise has been removed.

Further, in the present invention (claim 3), the indexing processing means 30 includes:
neighborhood index assigning means for indexing, as a neighborhood index, elements close to an element designated by the user;
group index assigning means for indexing, as a group index, elements belonging to the same group as the element designated by the user; and
hierarchical index assigning means for indexing, as a hierarchical index, the lower-level element group and the upper-level element of the element designated by the user.

Further, the present invention (claim 4) further includes bundling/hierarchization recommendation means that, based on a bundle name input by the user, transmits the group index or the hierarchical index in the index database 15 to the user's client device and presents it, repeating this process until the user inputs an instruction not to continue hierarchization.

Further, the present invention (claim 5) further includes:
search query expansion means 50 that, when a search query is input by the user, performs a normal search based on the search query and also refers to the index database 15 based on the search query to expand the search query and search with it; and
result switching display means 90 that switches, according to a user instruction, between displaying the normal search results and the results obtained with the expanded search query.

The present invention (claim 6) is a collaborative classification information processing program for causing a computer to function as each of the means constituting the collaborative classification information processing apparatus according to any one of claims 1 to 4.

The present invention makes it easy to organize classification axes, which tend to diverge, in a collaborative classification system, and makes it possible to raise the recall of searches that specify a classification axis.

First, the terms used in this specification are explained.

- Collaborative classification system: A system in which end users freely classify information on their own and share the classification results. Examples: social bookmarking services (del.icio.us, Hatena, etc.), photos (flickr), videos, papers (citeulike), etc.
- SBM: Abbreviation for social bookmarking service. The classification targets are URLs. A system in which each user's bookmarks are shared over a network. A typical example of a collaborative classification system.

- SBM data model: In this embodiment, a triple model recording who (USER) classified which URL (RESOURCE) under which category (TAG). U×R×T is assumed.

- Classification axis: The information used for classification in a collaborative classification system. Tags (keywords), images, and audio are used as classification axes.

- Classification target identifier: To classify, classification targets must be distinguishable. Being distinguishable means the targets can be put in one-to-one correspondence with a set of uniquely determined identifiers, so distinguishing targets is equivalent to distinguishing them by identifier. For bookmarks, the URL can serve as the identifier. In this embodiment, therefore, representing a classification target by its identifier causes no particular problem.

- Tag: The classification axis in SBM. Each user may choose any word.

- Tag bundling: Tying tags together under a keyword (the bundle name) (see del.icio.us (http://del.icio.us)). An upper/lower relationship between the bundle name and the tags is not necessarily required. Multiple membership is often allowed. One way of organizing tags.

- Tag hierarchization: Arranging tags hierarchically, or arranging them hierarchically by applying bundling recursively (see youtube (http://youtube.com)).

- Query: In this embodiment, a search tag matched against the assigned tags.

Embodiments of the present invention are described below with reference to the drawings.

The basic operation of the present invention is described first.

As shown in FIG. 2, in the collaborative classification system of the present invention a user classifies a classification target (URL) under a classification axis (tag).

In the present invention, the relations among users, classification axes, and classification targets, which result from the users' classifications in the collaborative classification system, are conceptually indexed in advance. When classification axes are organized, groupings and hierarchies expected to be appropriate are recommended; at search time, the search query is expanded based on concepts. The indexing for these recommendations is done in advance so that the information can be retrieved quickly. This solves the problems described above.

The basic operations of the present invention are as follows.

(1) When a user organizes classification axes, groups and hierarchical structures are recommended using the associations (grouping and hierarchization) computed in advance between classification axes.

(2) For a search query on a classification axis, the search results are expanded using the relatedness, computed in advance, of classification targets to classification axes.

FIG. 3 is a schematic block diagram of a system according to an embodiment of the present invention.

The collaborative classification system is provided to the client device 2 as a network service, and users use the service through a Web browser or a client application. The collaborative classification system (information sharing server) consists of an application server 10 that performs the actual processing, database servers 15 and 60 that store data, an association engine 20 that associates classification axes, classification targets, and user information, an indexing engine 30 that indexes the association information in advance to suit the application logic, and a query expansion engine 50.

These processing units may be implemented within a single server or distributed across multiple machines.

In the following, the information sharing server 1 shown in FIG. 3 is described as the collaborative classification apparatus.

FIG. 4 shows the configuration of the collaborative classification apparatus according to an embodiment of the present invention.

The collaborative classification apparatus shown in the figure comprises a communication interface (I/F) 10, an association processing unit 20, an indexing processing unit 30, a bundling/hierarchization recommendation processing unit 40, a query expansion unit 50, a classification axis DB 60, a classification target DB 70, a user DB 75, a communication unit 80, a search result display control unit 90, and an index DB 15.

The association processing unit 20 uses all or some of the classification target, tag (classification axis), and user information in the classification axis DB 60, the classification target DB 70, and the user DB 75, extracts characteristic properties from it, and makes distances such as tag-to-tag and classification-target-to-tag measurable. Existing techniques can be used for this processing, such as Document 1 ("Probabilistic Latent Semantic Analysis", Thomas Hofmann, 1999, In Proc. of Uncertainty in Artificial Intelligence, UAI'99), Document 2 ("Exploring Social Annotations for the Semantic Web", Xian Wu et al., 2006, In WWW2006), and Document 3 ("Visualization of the Relations between SBM Users and Tags Using PLSI", Takashi Menjo, Takeharu Eda, Masatoshi Yoshikawa, Masashi Yamamuro, DBWS, 2007). Typically, each item is represented as a feature vector, and the distance between elements is measured by the similarity between those vectors.

As shown in FIG. 5, the association processing unit 20 has an element extraction unit 21, a pre-filter processing unit 22, and a PLSI (Probabilistic Latent Semantic Indexing) processing unit 23; the result of each processing unit is stored in the index DB 15.

FIG. 6 is a flowchart of the association processing in an embodiment of the present invention.

First, when raw text data is input via the communication interface 10, the association processing unit 20 extracts the required co-occurrence data from the raw data in the element extraction unit 21 (step 101), and the pre-filter processing unit 22 removes noisy data from the extracted co-occurrence data (step 102). The PLSI processing unit 23 indexes the noise-free co-occurrence data (step 103) and outputs the resulting feature vector of each element to the index DB 15 (step 104).

The processing of the association processing unit 20 is described in detail below.

Raw text data is input to the element extraction unit 21. The structure of raw data in social bookmarking depends on how each service defines its database schema. As a concrete example, the act of bookmarking can be modeled as "who (U) classified which URL (R), when (t), under what (T), and wrote what comment (C)"; the raw data can then be regarded as 5-tuple co-occurrence data (a subset of the direct product U×R×t×T×C). The method of Document 1 is a probabilistic indexing technique defined on pairs, so pair co-occurrence data (U×R, R×T, and so on) is selected from the 5-tuple data so that the technique can be applied. Documents 2 and 3, and Document 4 ("Toward Building an Automatic Classification Scheme Using Folksonomy Tags", Takeharu Eda, Masatoshi Yoshikawa, Masashi Yamamuro, DBWS, 2007), can index triple co-occurrence data, so triples such as U×R×T (user, classification target, classification axis) are selected. This selection is the processing performed by the element extraction unit 21; information that is not selected is simply ignored. The element extraction unit 21 stores the selected co-occurrence data in the co-occurrence data area of the index DB 15.
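
The selection performed by the element extraction unit 21 can be pictured as a projection of each 5-tuple onto the chosen components. The following is a minimal Python sketch under that reading; the record type RawBookmark, the function extract, and the sample records are hypothetical names introduced here for illustration and do not appear in the specification.

```python
from typing import NamedTuple


class RawBookmark(NamedTuple):
    """One raw record (U, R, t, T, C); the field names are illustrative."""
    user: str      # U: who bookmarked
    url: str       # R: which URL
    time: str      # t: when
    tag: str       # T: tag assigned
    comment: str   # C: free-text comment


# Position of each component inside a RawBookmark.
POSITION = {"U": 0, "R": 1, "t": 2, "T": 3, "C": 4}


def extract(raw, modes):
    """Project every record onto `modes`; unselected components are ignored."""
    idx = [POSITION[m] for m in modes]
    return [tuple(rec[i] for i in idx) for rec in raw]


# Hypothetical sample data.
raw = [
    RawBookmark("u1", "http://example.com/a", "2008-01-01", "web", "nice"),
    RawBookmark("u2", "http://example.com/a", "2008-01-02", "blog", ""),
]

triples = extract(raw, ("U", "R", "T"))  # input for a triple-based method
pairs = extract(raw, ("R", "T"))         # input for a pair-based method
```

Keeping the result as a list rather than a set preserves repeated co-occurrences, which matters if frequencies are counted later.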

The method of Document 2 extends PLSI, the indexing technique of Document 1, from pair co-occurrence data to triples. The discussion holds in the same way for both, so triple co-occurrence data is used in the explanation below.

Next, the pre-filter processing unit 22 is described.

In PLSI, items that appear rarely in the co-occurrence data are known to act as noise (see Document 2). The pre-filter processing unit 22 therefore reads from the index DB 15 the co-occurrence data selected by the element extraction unit 21, removes noise from it using a minimum frequency condition on the elements in the co-occurrence data, and stores the result in the index DB 15. The item frequencies are explained next.

Consider the following concrete example of triple data:
{(u1, r1, t1), (u1, r1, t2), (u1, r2, t1), (u2, r2, t2), (u3, r2, t3)}
The frequency of each item is the number of triples in the data set in which the item appears. That is,
|u1| = 3, |u2| = 1, |u3| = 1
|r1| = 2, |r2| = 3
|t1| = 2, |t2| = 2, |t3| = 1
where |x| denotes the count of item x. If "2" is specified as the minimum frequency, u2, u3, and t3 are removed as noise, and the triple data that remains is
{(u1, r1, t1), (u1, r1, t2)}
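
The frequency count and the minimum frequency condition can be sketched in Python as follows. This is an illustration, not the specification's implementation: the function name filter_low_frequency is hypothetical, and the sketch applies the condition in a single pass. Under that single-pass assumption the triple (u1, r2, t1) also survives, since none of its items is among the removed u2, u3, and t3; the result listed in the text omits it, and the specification does not state whether the condition is meant to be reapplied after removal.

```python
from collections import Counter


def item_frequencies(triples):
    """Count, per position (user, resource, tag), how many triples contain each item."""
    return [Counter(t[i] for t in triples) for i in range(3)]


def filter_low_frequency(triples, min_freq):
    """Single pass: drop every triple that contains an item seen fewer than min_freq times."""
    counts = item_frequencies(triples)
    return [
        t for t in triples
        if all(counts[i][t[i]] >= min_freq for i in range(3))
    ]


triples = [
    ("u1", "r1", "t1"),
    ("u1", "r1", "t2"),
    ("u1", "r2", "t1"),
    ("u2", "r2", "t2"),
    ("u3", "r2", "t3"),
]

user_counts, resource_counts, tag_counts = item_frequencies(triples)
kept = filter_low_frequency(triples, min_freq=2)
```

Counting once before filtering keeps the pass order-independent: whether a triple is dropped depends only on the original frequencies, not on which triples were removed earlier in the loop.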

Next, the PLSI processing unit 23 is described.

The PLSI processing unit 23 reads from the index DB 15 the co-occurrence data from which the pre-filter processing unit 22 has removed noise, learns the co-occurrence structure of the N-tuple co-occurrence data using PLSI as proposed in Documents 1 and 2, obtains a probability vector for each item of each element type, and stores the vectors in the index DB 15. See Documents 1 and 2 for details of the processing. Here, a probability vector is a normalized vector whose components sum to 1. Using KL divergence or JS divergence as the distance between such vectors makes it possible to measure the distance between items accurately.
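
To make the distance measures concrete, the following Python sketch computes KL divergence and JS divergence between probability vectors. It does not implement PLSI itself; the vectors below are hypothetical stand-ins for the PLSI output, and the function names are chosen here for illustration.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to 1 (a probability vector)."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and finite even when p and q have zeros."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical probability vectors over two latent components.
web = normalize([8, 2])
blog = normalize([6, 4])
recipe = normalize([1, 9])

d_web_blog = js_divergence(web, blog)
d_web_recipe = js_divergence(web, recipe)
```

KL divergence is asymmetric and undefined when q has a zero where p does not, so JS divergence is often the more convenient choice when a symmetric, always-finite measure is wanted.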

Next, the indexing processing unit 30 is described with reference to FIG. 5.

As shown in FIG. 5, the indexing processing unit 30 consists of a neighborhood indexing unit 31 and a grouping/hierarchization unit 32; the results of each are stored in the index DB 15.

The indexing processing unit 30 performs the following three kinds of indexing.

1. The neighborhood indexing unit 31 reads from the index DB 15 the feature vectors (PLSI vectors) obtained by the association processing unit 20, indexes the elements that are close in distance to the element designated by the user, and stores the result as a neighborhood information index in the neighborhood index area of the index DB 15.

2. The grouping/hierarchization unit 32 reads the feature vectors (PLSI vectors) from the index DB 15, indexes the elements belonging to the same group as the element designated by the user, and stores the result as a group index in the group index area of the index DB 15.

3. The grouping/hierarchization unit 32 reads the feature vectors (PLSI vectors) from the index DB 15, indexes the lower-level element group of the element designated by the user, and stores the result as a hierarchical index in the hierarchical index area of the index DB 15.

The details follow techniques such as those in Documents 3 and 4 above.

Examples of the resulting indexes are shown below.

(1) "web2.0" --- "web", "インターネット" (Internet), "internet", "www", "html", "blog", "SBM"
(2) "web2.0" --- 5
- The "5" above is a group ID. A group ID is the number assigned to a cluster obtained by clustering tags on their feature values. Tags belonging to group "5" might be {"web", "インターネット", "internet", "www"}, for example.

(3) "web2.0" --- parent->, children->(blog, html, SBM,,,)
- For the root, the top tag candidates are returned as children. (How the top tag candidates are constructed is described in Document 4.)
* "" --- parent->"", children->(web, あとで読む (read later), tools, reference,,,)
Computing such indexes in advance in the indexing processing unit 30 speeds up query processing.
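
As a rough illustration of precomputing such indexes, the Python sketch below builds a neighborhood index and a group index from per-tag probability vectors. Several things here are assumptions rather than the specification's method: the vectors are made up, nearness is measured with JS divergence, and the group ID is simply the position of the largest vector component, used as a stand-in for the clustering step. All function names are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability vectors."""
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def build_neighborhood_index(vectors, k):
    """Map each tag to its k nearest other tags, nearest first."""
    index = {}
    for tag, vec in vectors.items():
        scored = sorted(
            (js_divergence(vec, other_vec), other)
            for other, other_vec in vectors.items()
            if other != tag
        )
        index[tag] = [other for _, other in scored[:k]]
    return index


def build_group_index(vectors):
    """Map each tag to a group ID (assumption: the index of its largest component)."""
    return {
        tag: max(range(len(vec)), key=vec.__getitem__)
        for tag, vec in vectors.items()
    }


# Hypothetical PLSI-style vectors over two latent components.
vectors = {
    "web2.0": [0.8, 0.2],
    "web": [0.7, 0.3],
    "blog": [0.6, 0.4],
    "recipe": [0.1, 0.9],
}

neighborhood_index = build_neighborhood_index(vectors, k=2)
group_index = build_group_index(vectors)
```

Because both dictionaries are built ahead of time, a lookup at query time is a single key access, which is the speed-up the text attributes to precomputation.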

Next, the bundling/hierarchization recommendation processing unit 40 is described.

First, the bundling recommendation processing is described. The bundling/hierarchization recommendation processing unit 40 typically uses the group index for bundling and the hierarchical index for hierarchization.

FIG. 7 is a flowchart of the bundling recommendation processing in an embodiment of the present invention.

First, a bundle name (name) and the whole tag set T are input from the user's client device 2 via the user interface 10 (step 201). In the bundling recommendation processing, the bundle name may be a tag already in use or an arbitrary keyword. If it is a tag (step 202, Y), the group index stored in the index DB 15 is used to present the list of tags in the same group to the user as bundle candidates (step 205). If it is an arbitrary keyword (step 202, N), the user is asked to select one tag (steps 203, 204), the group index of the selected tag is used to obtain the set Rt of bundle candidate tags (step 205), and the set is presented to the user (step 206). The same kind of recommendation is possible if the neighborhood information index or the hierarchical index is used in place of the group index. With the hierarchical index, retrieving information with the bundle name as the parent tag allows more intuitive bundle recommendations.
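
The branch on whether the bundle name is an existing tag can be sketched in Python as follows. This is a simplified illustration: the group index is modeled as a plain dictionary from tag to group ID, user interaction is replaced by a callback, and the names recommend_bundle_tags and choose_tag are hypothetical.

```python
def recommend_bundle_tags(name, all_tags, group_index, choose_tag):
    """Return the candidate tag set Rt for a bundle called `name`.

    name        -- bundle name entered by the user
    all_tags    -- the whole tag set T
    group_index -- dict mapping tag -> group ID (assumed precomputed)
    choose_tag  -- callback that asks the user to pick one tag from all_tags
    """
    if name in all_tags:
        seed = name                  # the bundle name is itself a tag
    else:
        seed = choose_tag(all_tags)  # arbitrary keyword: let the user pick a tag
    group_id = group_index[seed]
    # Candidates are all tags that share the seed tag's group.
    return sorted(t for t in all_tags if group_index.get(t) == group_id)


# Hypothetical group index; group 5 mirrors the example group ID in the text.
group_index = {"web": 5, "internet": 5, "www": 5, "recipe": 2}
all_tags = set(group_index)

from_tag = recommend_bundle_tags("web", all_tags, group_index, choose_tag=None)
from_keyword = recommend_bundle_tags(
    "my stuff", all_tags, group_index, choose_tag=lambda tags: "recipe"
)
```

Passing the user's choice in as a callback keeps the recommendation logic separate from the user interface, so the same function could serve a Web form or a client application.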

FIG. 8 shows the flow of the user interface during bundling in an embodiment of the present invention. The flow shown in the figure is the bundling procedure when the group index is used.

(1) On the client device 2, the user displays the bundle creation menu and enters a new bundle name.

(2) Next, the memory 33 is consulted based on the entered bundle name, and recommended bundle tags are displayed for entering the tags to include in the bundle.

(3) The user selects from the displayed recommended bundle tags, or enters arbitrary tags by hand.

(4) Tags are entered by hand.

Next, the hierarchization recommendation processing of the bundling/hierarchization recommendation processing unit 40 is described. FIG. 9 is a flowchart of the hierarchization recommendation processing in an embodiment of the present invention.

Hierarchization is normally a recursive, repeated operation, so the hierarchization recommendation processing of FIG. 9 lets the user choose whether to continue, allowing the hierarchy to be built recursively.

First, a parent candidate p (empty when selecting top tags) is input (step 301).

Using p as the key, the lower-level tag set Rt is obtained from the index DB 15 (step 302) and presented to the user. At this point a selection form that allows selection of arbitrary tags is presented (step 303). If the user continues hierarchization (step 304, Y), a parent candidate selection form is presented to the user (step 305), the parent candidate p selected by the user is input (step 306), and the above processing is repeated. If the user does not continue (step 304, N), the hierarchization result is output (step 307).
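
The loop of steps 301 to 307 can be sketched in Python as below. This is an illustrative reading, not the specification's implementation: the hierarchical index is modeled as a dictionary from parent tag to child tags with the empty string as the root key, the two forms shown to the user are replaced by callbacks, and all names are hypothetical.

```python
def hierarchize(hier_index, pick_children, pick_next_parent):
    """Build a hierarchy interactively, one parent at a time.

    hier_index       -- dict mapping parent tag -> list of lower-level tags
    pick_children    -- callback(parent, candidates) -> tags the user keeps
    pick_next_parent -- callback(result_so_far) -> next parent, or None to stop
    """
    result = {}
    parent = ""  # empty parent means top-tag selection
    while parent is not None:
        candidates = hier_index.get(parent, [])              # look up lower-level tags
        result[parent] = pick_children(parent, candidates)   # user selects among them
        parent = pick_next_parent(result)                    # continue, or None to finish
    return result


# Hypothetical hierarchical index.
hier_index = {
    "": ["web", "tools"],
    "web": ["blog", "html"],
}

# Scripted "user": accept every candidate, descend into "web", then stop.
scripted_parents = iter(["web", None])
hierarchy = hierarchize(
    hier_index,
    pick_children=lambda parent, candidates: list(candidates),
    pick_next_parent=lambda result: next(scripted_parents),
)
```

An explicit loop is used instead of recursion so that the user's decision to stop maps directly onto the loop condition.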

FIG. 10 shows the flow of the user interface during hierarchization in an embodiment of the present invention.

(1) A tag hierarchization menu is displayed on the user's client device 2, and the tag candidates one level below the top tag are displayed as recommended tags.

(2) Based on the tag selected by the user, the index DB 15 is consulted, the tag candidates one level below it are displayed, and the user's selection is accepted.

Manual input by the user can also be accepted in (1) and (2).

Next, the query expansion processing unit 50 and the search result display control unit 90, which operate at search time, are described.

FIG. 11 is a flowchart of the query expansion processing unit in one embodiment of the present invention.

The query expansion processing unit 50 receives a search query q and a search option o from the client device 2 as input (step 401). If the search option o is concept search (step 402, Y), it outputs a ranking result based on the feature vector distances stored in the index DB 15 (step 403). If o is not concept search (step 402, N) but is similar tag search (step 404, Y), it acquires the set qT consisting of q and the neighbor tags obtained from the neighborhood index stored in the index DB 15 (step 405), joins the tags of qT with OR, and has the search engine perform normal search processing (step 406); the search result display control unit 90 then outputs the search result for qT (step 406). If o is neither concept search nor similar tag search (step 402, N; step 404, N), the normal search result for q is output (step 408).
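The branching of FIG. 11 might be sketched as below. This is a minimal sketch, not the implementation of the query expansion processing unit 50: the neighborhood index is assumed to be a dictionary from a tag to its neighbor tags, the search engine and the concept ranking are passed in as functions, and the option strings `"concept"` and `"similar"` as well as the function name `expand_and_search` are hypothetical.

```python
from typing import Callable, Dict, List


def expand_and_search(
    q: str,
    option: str,
    neighbor_index: Dict[str, List[str]],
    concept_rank: Callable[[str], List[str]],
    search: Callable[[List[str]], List[str]],
) -> List[str]:
    """Dispatch a query q according to the search option (steps 401 to 408).

    `search` is assumed to accept a list of tags and to match any of them
    (OR semantics); `concept_rank` is assumed to return targets ranked by
    feature vector distance.
    """
    if option == "concept":                       # step 402, Y
        return concept_rank(q)                    # step 403
    if option == "similar":                       # step 404, Y
        neighbors = neighbor_index.get(q, [])
        q_t = [q] + [t for t in neighbors if t != q]  # step 405: set qT
        return search(q_t)                        # step 406: OR over qT
    return search([q])                            # step 408: normal search
```

Keeping the search engine behind a function parameter means that the same normal search routine serves both the expanded and the unexpanded query, as in the flowchart.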

As described above, the query processing performed internally by the query expansion processing unit 50 includes concept search and similar tag expansion search in addition to normal search processing.

Concept search ranks results by the distance between the feature vector of the tag and the feature vectors of the search targets. In general, it increases the number of results when a normal search returns few and reduces the number when a normal search returns too many.
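As an illustration of ranking by feature vector distance, the sketch below orders search targets by their distance from the vector of the query tag and keeps the closest ones. The specification does not say which distance is used, so Euclidean distance is an assumption here, as are the function name `concept_search` and the `top_k` cutoff.

```python
import math
from typing import Dict, List, Sequence


def concept_search(
    tag_vector: Sequence[float],
    target_vectors: Dict[str, Sequence[float]],
    top_k: int,
) -> List[str]:
    """Return the top_k target names closest to the tag's feature vector."""

    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        # Euclidean distance, chosen only for illustration.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    ranked = sorted(
        target_vectors,
        key=lambda name: distance(tag_vector, target_vectors[name]),
    )
    return ranked[:top_k]
```

Since the ranking does not depend on whether a target literally carries the query tag, a cutoff of this kind yields results even for rarely used tags and bounds the result count for very common ones, which matches the effect described above.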

Similar tag expansion search uses the neighborhood index to obtain the neighbor tags of the search tag, takes the union of the results for the neighbor tag set and the results for the search tag, and ranks that union.

An example comparing the prior art with the present invention is shown below.

FIG. 12 shows examples of operation menus for bundling tags, created with the prior art and with the present invention. FIG. 12(A) was created with the prior art: the list of tags shown when selecting or creating a bundle name is arranged in Japanese syllabary order or in frequency order, which makes it hard to grasp intuitively how a bundle name should be chosen. FIG. 12(B) was created with the present invention: the tags are grouped and listed according to relatedness obtained by analyzing their meaning, so a tag suitable as a bundle name can be picked out intuitively.

FIG. 13 shows examples of operation menus for hierarchizing tags, created with the prior art and with the present invention, and FIG. 14 shows examples of the same menus when a lower tag is selected. FIG. 13 shows the selection of the top-level tags from which hierarchization starts, and FIG. 14 shows the selection of lower tags of the tag "programming", which was designated as one of the top-level tags.

In the prior art shown in FIGS. 13(A) and 14(A), when selecting a tag at a given level, the user must either choose from a list of existing tags or specify a keyword freely. In the present invention shown in FIGS. 13(B) and 14(B), when top-level tags are selected, the tags judged to have both a high level of abstraction and a high frequency among all the tags the user uses are listed. When lower tags of a given tag are selected, tags that are less abstract than that tag and semantically close to it are recommended. This allows the user to arrange tags hierarchically with ease. To place a tag other than the recommended ones, or to use an arbitrary keyword, the conventional method is used.
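The two recommendation rules described for FIGS. 13(B) and 14(B) could be sketched as follows. The specification does not state here how abstraction is measured or which thresholds apply, so the `abstraction` score, the threshold parameters, the Euclidean distance, and all names in this sketch (`TagInfo`, `recommend_top_tags`, `recommend_lower_tags`) are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TagInfo:
    name: str
    abstraction: float          # hypothetical score; higher is more abstract
    frequency: int              # how often the user has applied the tag
    vector: Tuple[float, ...]   # feature vector of the tag


def _distance(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend_top_tags(
    tags: List[TagInfo], min_abstraction: float, min_frequency: int
) -> List[str]:
    """Top-level candidates: both abstract and frequently used."""
    picked = [
        t for t in tags
        if t.abstraction >= min_abstraction and t.frequency >= min_frequency
    ]
    return [t.name for t in sorted(picked, key=lambda t: -t.frequency)]


def recommend_lower_tags(
    parent: TagInfo, tags: List[TagInfo], max_distance: float
) -> List[str]:
    """Lower candidates: less abstract than the parent and close to it."""
    picked = [
        t for t in tags
        if t.abstraction < parent.abstraction
        and _distance(parent.vector, t.vector) <= max_distance
    ]
    picked.sort(key=lambda t: _distance(parent.vector, t.vector))
    return [t.name for t in picked]
```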

FIG. 15 shows an example of a normal search result and a query-expanded result.

The query is 'xquery'. When a tag is not used very frequently, the prior art has the problem that few results are returned. In contrast, the present invention can display tags close in meaning to 'xquery' and add results based on semantic relatedness, which contributes to improving the recall of the search results. Because noise also increases, this cannot simply be claimed as an improvement in recall in every respect; however, providing a user interface that shows both the normal result and the query-expanded result interactively gives the user more options for obtaining information.

The above embodiment has been described using a social bookmark system as a particular example of a collaborative classification system, but the invention is not limited to bookmarks and may be used with any collaborative classification system.

The operation of each component of the collaborative classification apparatus shown in FIG. 4 can be built as a program, which can be installed and executed on a computer (server) used as the collaborative classification apparatus, or distributed via a network.

The program so built can also be stored on a hard disk or on a portable storage medium such as a flexible disk or CD-ROM, and installed on a computer or distributed.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

The present invention is applicable to systems in which a plurality of users classify and provide information such as bookmarks, photographs, videos, books, and papers.

FIG. 1 is a diagram showing the principle configuration of the present invention.
FIG. 2 is a diagram showing the correspondence between the collaborative classification system of the present invention and SBM.
FIG. 3 is a block diagram outlining the system in one embodiment of the present invention.
FIG. 4 is a configuration diagram of the collaborative classification apparatus in one embodiment of the present invention.
FIG. 5 is a configuration diagram of the association processing unit and the indexing processing unit in one embodiment of the present invention.
FIG. 6 is a flowchart of the association process in one embodiment of the present invention.
FIG. 7 is a flowchart of the bundling recommendation process in one embodiment of the present invention.
FIG. 8 is a diagram showing the flow of the user interface during bundling in one embodiment of the present invention.
FIG. 9 is a flowchart of the hierarchization recommendation process in one embodiment of the present invention.
FIG. 10 is a diagram showing the flow of the user interface during hierarchization in one embodiment of the present invention.
FIG. 11 is a flowchart of the query expansion processing unit in one embodiment of the present invention.
FIG. 12 shows examples of operation menus for bundling tags, created with the prior art and with the present invention.
FIG. 13 shows examples of operation menus for hierarchizing tags, created with the prior art and with the present invention.
FIG. 14 shows examples of the operation menus for hierarchizing tags, created with the prior art and with the present invention, when a lower tag is selected.
FIG. 15 shows an example of a normal search result and a query-expanded result.

Explanation of symbols

1 Information sharing server, collaborative classification apparatus
2 Client device
10 User interface means, user interface
15 Index DB (database)
20 Association processing means, association processing unit, association engine
21 Element extraction unit
22 Pre-filter processing unit
23 PLSI processing unit
30 Indexing processing means, indexing processing unit, indexing engine
31 Neighborhood indexing unit
32 Grouping/hierarchization unit
40 Bundling/hierarchization recommendation means, bundling/hierarchization recommendation processing unit
50 Search query expansion means, query expansion unit, query expansion engine
60 Classification axis database (DB), database server
65 Database
70 Classification target database (DB), database server
75 User database (DB)
80 Communication means, communication unit
90 Result switching display means, search result display control unit

Claims

A collaborative classification information processing apparatus with which users each classify and provide information such as bookmarks, photos, videos, books, and papers, comprising:
User interface means serving as a data input/output unit;
Communication means for performing data communication;
A database that stores classification axes, including tags, images, and audio, classification targets, and user information classified by a plurality of users;
Association processing means for, when text data is input from the user's client device, conceptually associating the text data with reference to the contents of the database and storing the result in an index database; and
Indexing processing means for indexing and grouping the associated data in the index database and storing the result in the index database.

The collaborative classification information processing apparatus according to claim 1, wherein the association processing means includes:
Element extraction processing means for extracting characteristic co-occurrence data from the text data with reference to the database;
Filtering means for removing noise by applying a predetermined minimum frequency condition to the elements of the co-occurrence data; and
Vector calculation means for obtaining a probability vector for each element of the co-occurrence data from which noise has been removed.

The collaborative classification information processing apparatus according to claim 1, wherein the indexing processing means includes:
Neighborhood index assigning means for indexing, as a neighborhood index, elements close to an element designated by the user;
Group index assigning means for indexing, as a group index, elements belonging to the same group as the element designated by the user; and
Hierarchical index assigning means for indexing, as a hierarchical index, the lower element group and the upper element of the element designated by the user.

The collaborative classification information processing apparatus according to claim 1, further comprising bundling/hierarchization recommendation means for repeatedly transmitting, based on a bundle name input from the user, the group index or the hierarchical index of the index database to the user's client device until the user inputs an instruction not to continue the hierarchization processing.

The collaborative classification information processing apparatus according to claim 1, further comprising:
Search query expansion means for, when a search query is input from the user, performing a normal search based on the search query and also performing a search with the search query expanded by referring to the index database based on the search query; and
Means for switching, based on an instruction from the user, between displaying the normal search result and displaying the result of the search using the expanded search query.

A collaborative classification information processing selection program for causing a computer to function as each of the means constituting the collaborative classification information processing apparatus according to any one of claims 1 to 4.