JP2007249600A

JP2007249600A - Method for classifying target data into categories

Info

Publication number: JP2007249600A
Application number: JP2006071958A
Authority: JP
Inventors: Sumio Fujita; 澄男藤田
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2006-03-16
Filing date: 2006-03-16
Publication date: 2007-09-27
Anticipated expiration: 2026-03-16
Also published as: JP4891638B2

Abstract

[Problem] To provide a method that can automatically classify target data into categories with accuracy close to that of manual classification, and that can continuously acquire such highly reliable classification samples (learning examples) and use them to classify new target data from the next time onward.
[Solution] On a terminal device 20, a user searching for target data such as products stored on a server 10 follows links organized into a tree of categories to browse a group of target data of the desired type. The server 10 stores the association between target data and categories. When the server 10 receives data for which the classification category is to be learned, it assigns candidate categories. The user of the terminal device 20 selects one of the assigned categories to reach the desired data, and the server 10 stores the category selected at that time. When the server 10 then receives new target data, it classifies the data into the learned category.
[Selected drawing] Figure 1

Description

The present invention relates to a method, a server, and a program for classifying, into categories, target data to be browsed by a user of a terminal device.

Various methods have been provided for searching for information (target data) on the Internet. One of them is the so-called directory-type search engine, in which a user reaches the desired information (target data) by following categories arranged in a tree. This method searches target data that has usually been categorized by hand, allows information of the same type to be browsed together, and is used, for example, for searching product information.

To realize this method, category information must be stored in advance for information items such as products. Furthermore, for users' searches to be effective, the classification into categories must be highly accurate. Against this background, techniques for improving the accuracy of information classification are known.

For example, Patent Document 1 discloses a method that, in order to reduce misclassification caused by overlapping information between classes (categories), finds for each class the features that appear in the class of interest but rarely in other classes, and the features that appear in other classes but rarely in the class of interest, and uses that data to correct the similarity.
Patent Document 1: JP 2003-256441 A

However, even with the above method, classification accuracy depends on the accuracy of the learning examples (pairs of a category and the keywords or other features that characterize the target data) or on the accuracy of the similarity evaluation at classification time, so the possibility of misclassification remains. Consequently, if the assignment of category information to each item of target data, which is usually done by hand, is performed automatically, there is a concern that misclassifications contained in the automatically classified learning examples will accumulate and classification accuracy will gradually deteriorate.

Accordingly, an object of the present invention is to provide a method that can automatically classify target data into categories with accuracy close to that of manual classification, and that can continuously acquire such highly reliable classification samples (learning examples) and use them to classify new target data from the next time onward.

Specifically, the following are provided.

(1) A method in which a server connected to a terminal device via a communication network classifies, into categories, target data to be browsed by a user of the terminal device, the method including:
a step of storing the target data in association with a plurality of candidate category data;
a step of transmitting, in response to receiving from the terminal device data representing the user's selection of one of the plurality of candidate category data, the target data associated with the selected candidate category data;
a step of counting up, in response to receiving from the terminal device data representing the user's selection of the target data, selection count data for the candidate category data the user selected, and storing it in association with the selected target data and the selected candidate category data; and
a step of determining, based on the selection count data in a predetermined period, the category into which the target data is classified.

According to the invention of (1), the server stores the target data to be browsed by the user of the terminal device in association with a plurality of candidate category data. In response to receiving from the terminal device data representing the user's selection of one of the plurality of candidate category data, the server transmits the target data associated with the selected candidate category data. In response to receiving from the terminal device data representing the user's selection of the target data, the server counts up the selection count data for the candidate category data the user selected and stores it in association with the selected target data and the selected candidate category data. The server then determines the category into which the target data is classified based on the selection count data in a predetermined period.

As a result, the server can store, for the plurality of candidate categories, the actual access history of users (that is, from which category each user reached the target data). By selecting a category based on this access history, even if unsuitable categories are mixed in among the candidates, such categories are automatically excluded, and the target data may be classified automatically into a category that is close to what actual users have in mind (that is, an accurate one).
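The counting described in (1) can be sketched roughly as follows. This is a minimal illustration, not the implementation in the publication: the class name `SelectionLog`, its methods, and the in-memory event list are all assumptions, and the final decision rule (most-selected candidate wins) is only the simplest choice consistent with "based on the selection count data in a predetermined period".

```python
from collections import Counter
from datetime import datetime


class SelectionLog:
    """Records which candidate category users followed to reach each item."""

    def __init__(self, candidates):
        # candidates: item id -> candidate category names (hypothetical layout)
        self.candidates = {item: set(cats) for item, cats in candidates.items()}
        self.events = []  # (timestamp, item id, category)

    def record(self, item_id, category, when):
        # Ignore selections that did not come through one of the item's candidates.
        if category in self.candidates.get(item_id, set()):
            self.events.append((when, item_id, category))

    def counts(self, item_id, start, end):
        # Selection counts for one item within the period [start, end).
        return Counter(
            category
            for when, item, category in self.events
            if item == item_id and start <= when < end
        )

    def decide(self, item_id, start, end):
        # Simplest rule (an assumption): the most-selected candidate wins.
        tally = self.counts(item_id, start, end)
        return tally.most_common(1)[0][0] if tally else None


log = SelectionLog({"product A": {"classification 1", "classification 2"}})
log.record("product A", "classification 1", datetime(2006, 3, 1))
log.record("product A", "classification 2", datetime(2006, 3, 2))
log.record("product A", "classification 2", datetime(2006, 3, 3))
decided = log.decide("product A", datetime(2006, 3, 1), datetime(2006, 4, 1))
print(decided)
```

Keeping raw timestamped events rather than a single running counter is a design choice made here so that the "predetermined period" can be changed after the fact.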

(2) The method according to (1), wherein the category to be determined is one whose candidate category data has been selected at least a predetermined number of times.

According to the invention of (2), the server classifies the target data into categories whose number of selections is at least a predetermined number.

As a result, the server classifies the target data into categories with a proven record of being selected many times (at least the predetermined number), so it may be able to classify the data automatically into categories that actual users find easy to select.

(3) The method according to (1), wherein the category to be determined is one whose number of selections of the candidate category data ranks at or above a predetermined rank.

According to the invention of (3), the server classifies the target data into the categories whose number of selections ranks at or above a predetermined rank.

As a result, the server selects one or more frequently selected categories, so it may be able to classify the data automatically into the categories that many users would readily think of.
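The two decision rules in (2) and (3) could look like the following sketch. The function names, the `counts` mapping (category name to number of selections), and the alphabetical tie-break are assumptions made for illustration; the publication does not say how ties are handled.

```python
def categories_by_threshold(counts, min_count):
    """Rule in the style of (2): keep candidates selected at least min_count times."""
    return sorted(c for c, n in counts.items() if n >= min_count)


def categories_by_rank(counts, top_n):
    """Rule in the style of (3): keep the top_n most-selected candidates."""
    # Ties are broken alphabetically here; this is an assumption.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [c for c, _ in ranked[:top_n]]


counts = {"audio": 40, "pc": 12, "toys": 3}
print(categories_by_threshold(counts, 10))
print(categories_by_rank(counts, 1))
```

The threshold rule can return no category at all when traffic is low, while the rank rule always returns something as long as any candidate was selected.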

(4) The method according to (1), wherein the category to be determined is determined by relative evaluation of the numbers of selections of the candidate category data.

According to the invention of (4), the server determines the category into which the target data is classified by relative evaluation of the numbers of selections of the category data.

As a result, the server can select, from among the associated categories, for example a category whose share of selections stands out from the others, so it may be able to classify the data automatically into a category closer to users' preferences. A category can also be selected appropriately when no candidate category reaches the predetermined number of selections within the predetermined period, or when the numbers of selections differ greatly among the candidate categories within the predetermined rank.
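One possible reading of "relative evaluation" is a share-of-selections test, sketched below. The function name and the `min_share` parameter are hypothetical; the publication only gives "a category whose share of selections stands out" as an example and does not fix a formula.

```python
def categories_by_share(counts, min_share):
    """Keep candidates whose share of all selections is at least min_share."""
    total = sum(counts.values())
    if total == 0:
        return []  # no selections were recorded in the period
    return sorted(c for c, n in counts.items() if n / total >= min_share)


# Only 10 selections in total: a fixed threshold of, say, 20 would pick nothing,
# but the relative rule still singles out the dominant category.
counts = {"audio": 6, "pc": 3, "toys": 1}
print(categories_by_share(counts, 0.5))
```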

(5) The method according to any one of (1) to (4), further including a step of extracting keyword data contained in the target data and storing it in association with the determined category data.

According to the invention of (5), the server extracts keyword data contained in the target data and stores it in association with the determined category data.

As a result, the server stores the correspondence between the category data and the keyword data, which can later be used as a classification example.
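Storing keywords against the decided category, as in (5), might be sketched like this. The tokenizer is a deliberate simplification and an assumption: it splits lowercase ASCII words, whereas real product text (especially Japanese) would need proper keyword extraction. `LearnedExamples` and its methods are hypothetical names.

```python
import re
from collections import Counter, defaultdict


def extract_keywords(text):
    # Assumption: lowercase ASCII word tokens stand in for real keyword extraction.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


class LearnedExamples:
    """Keyword counts accumulated per decided category."""

    def __init__(self):
        self.by_category = defaultdict(Counter)

    def learn(self, category, name, description):
        # Merge the item's keywords into the profile of its decided category.
        self.by_category[category].update(extract_keywords(name + " " + description))


examples = LearnedExamples()
examples.learn("audio", "MP3 player", "Portable music player")
print(examples.by_category["audio"].most_common(1))
```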

(6) The method according to (5), further including:
a step of calculating, in response to accepting registration of new target data different from the target data, the similarity between the new target data and the keyword data; and
a step of determining the category into which the new target data is classified in accordance with an evaluation of categories based on the calculated similarity.

According to the invention of (6), in response to accepting registration of new target data different from the target data, the server calculates the similarity between the new target data and the keyword data and, based on the calculated similarity, determines the category into which the new target data is classified.

As a result, based on the stored association between the keyword data and the category data, the server can automatically classify target data with similar keywords into the corresponding category.
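The similarity-based classification in (6) could be sketched as below. Cosine similarity over keyword counts is an assumption chosen for illustration; the publication speaks only of "similarity" and does not name a measure. The function names and the shape of `category_keywords` are likewise hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two keyword-count mappings."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(new_keywords, category_keywords):
    """Return the stored category most similar to the new item, or None."""
    scores = {c: cosine(new_keywords, kw) for c, kw in category_keywords.items()}
    best = max(scores, key=scores.get, default=None)
    # An item sharing no keyword with any category is left unclassified.
    return best if best is not None and scores[best] > 0 else None


learned = {
    "audio": Counter(player=2, music=1),
    "pc": Counter(laptop=2, memory=1),
}
print(classify(Counter(music=1, player=1), learned))
```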

(7) The method according to (5) or (6), wherein the step of storing the target data in association with the plurality of category data calculates the similarity between the target data and the keyword data and selects the candidate category data in accordance with an evaluation of categories based on the calculated similarity.

According to the invention of (7), the server calculates the similarity between the target data and the keyword data and selects the candidate category data based on the calculated similarity.

As a result, based on the stored association between keyword data and category data, the server can associate with the target data to be learned (that is, data whose category is to be determined from the access history) a plurality of categories with similar keywords as classification candidates. The categories for which user access history is to be collected can therefore be extracted automatically.
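Where (6) picks a single category, (7) uses the same kind of similarity to propose several candidates for users to choose among. A rough sketch follows; the Jaccard overlap measure, the `max_candidates` parameter, and the function names are all assumptions, used here only to keep the example short.

```python
def jaccard(a, b):
    """Overlap between two keyword sets, from 0.0 (disjoint) to 1.0 (equal)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def candidate_categories(item_keywords, category_keywords, max_candidates=2):
    """Propose up to max_candidates categories that share keywords with the item."""
    scores = {c: jaccard(item_keywords, kws) for c, kws in category_keywords.items()}
    ranked = sorted((c for c in scores if scores[c] > 0), key=lambda c: (-scores[c], c))
    return ranked[:max_candidates]


learned = {
    "audio": {"music", "player", "portable"},
    "pc": {"laptop", "memory", "portable"},
    "toys": {"doll"},
}
print(candidate_categories({"portable", "music", "speaker"}, learned))
```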

(8) The method according to (6) or (7), wherein the step of extracting keyword data contained in the target data and storing it in association with the determined category data further stores, in association, selection probability data based on the number of selections of the category, and
the evaluation of the categories associated with the keyword data is based on the similarity and the selection probability data.

According to the invention of (8), the server further stores selection probability data based on the number of selections of the category in association with the determined category data, and evaluates the categories associated with the keyword data based on the similarity and the selection probability data.

As a result, the server can, for example, give priority, among the candidate categories extracted on the basis of similarity, to categories that users selected with high probability. A more natural classification in line with users' preferences may therefore be possible.
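One way to combine the two signals in (8) is to multiply them, as in the sketch below. The product is an assumption: the publication says only that the evaluation is "based on" the similarity and the selection probability. The function names and the input mappings are hypothetical.

```python
def selection_probabilities(counts):
    """Turn per-category selection counts into selection probabilities."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def rank_categories(similarities, probabilities):
    """Order categories by similarity weighted by selection probability."""
    # Multiplying the two scores is an illustrative choice, not the patent's formula.
    scored = {c: s * probabilities.get(c, 0.0) for c, s in similarities.items()}
    return sorted(scored, key=lambda c: (-scored[c], c))


probs = selection_probabilities({"audio": 1, "pc": 3})
# "audio" is slightly more similar, but users chose "pc" three times as often.
print(rank_categories({"audio": 0.6, "pc": 0.5}, probs))
```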

(9) A server that is connected to a terminal device via a communication network and classifies, into categories, target data to be browsed by a user of the terminal device, the server comprising:
means for storing the target data in association with a plurality of candidate category data;
means for transmitting, in response to receiving from the terminal device data representing the user's selection of one of the plurality of candidate category data, the target data associated with the selected candidate category data;
means for counting up, in response to receiving from the terminal device data representing the user's selection of the target data, selection count data for the candidate category data the user selected, and storing it in association with the selected target data and the selected candidate category data; and
means for determining, based on the selection count data in a predetermined period, the category into which the target data is classified.

According to the invention of (9), operating the server can be expected to provide the same effect as (1).

(10) The server according to (9), wherein the category to be determined is one whose candidate category data has been selected at least a predetermined number of times.

According to the invention of (10), operating the server can be expected to provide the same effect as (2).

(11) The server according to (9), wherein the category to be determined is one whose number of selections of the candidate category data ranks at or above a predetermined rank.

According to the invention of (11), operating the server can be expected to provide the same effect as (3).

(12) The server according to (9), wherein the category to be determined is determined by relative evaluation of the numbers of selections of the candidate category data.

According to the invention of (12), operating the server can be expected to provide the same effect as (4).

(13) The server according to any one of (9) to (12), further comprising means for extracting keyword data contained in the target data and storing it in association with the determined category data.

According to the invention of (13), operating the server can be expected to provide the same effect as (5).

(14) The server according to (13), further comprising:
means for calculating, in response to accepting registration of new target data different from the target data, the similarity between the new target data and the keyword data; and
means for determining the category into which the new target data is classified in accordance with an evaluation of categories based on the calculated similarity.

According to the invention of (14), operating the server can be expected to provide the same effect as (6).

(15) The server according to (13) or (14), wherein the means for storing the target data in association with the plurality of category data calculates the similarity between the target data and the keyword data and selects the candidate category data in accordance with an evaluation of categories based on the calculated similarity.

According to the invention of (15), operating the server can be expected to provide the same effect as (7).

(16) The server according to (14) or (15), wherein the means for extracting keyword data contained in the target data and storing it in association with the determined category data further stores, in association, selection probability data based on the number of selections of the category, and
the evaluation of the categories associated with the keyword data is based on the similarity and the selection probability data.

According to the invention of (16), operating the server can be expected to provide the same effect as (8).

(17) A program that causes a server connected to a terminal device via a communication network to classify, into categories, target data to be browsed by a user of the terminal device, the program causing the server to execute:
a step of storing the target data in association with a plurality of candidate category data;
a step of transmitting, in response to receiving from the terminal device data representing the user's selection of one of the plurality of candidate category data, the target data associated with the selected candidate category data;
a step of counting up, in response to receiving from the terminal device data representing the user's selection of the target data, selection count data for the candidate category data the user selected, and storing it in association with the selected target data and the selected candidate category data; and
a step of determining, based on the selection count data in a predetermined period, the category into which the target data is classified.

According to the invention of (17), executing the program on the server can be expected to provide the same effect as (1).

(18) The program according to (17), wherein the category to be determined is one whose candidate category data has been selected at least a predetermined number of times.

According to the invention of (18), executing the program on the server can be expected to provide the same effect as (2).

(19) The program according to (17), wherein the category to be determined is one whose number of selections of the candidate category data ranks at or above a predetermined rank.

According to the invention of (19), executing the program on the server can be expected to provide the same effect as (3).

(20) The program according to (17), wherein the category to be determined is determined by relative evaluation of the numbers of selections of the candidate category data.

According to the invention of (20), executing the program on the server can be expected to provide the same effect as (4).

(21) The program according to any one of (17) to (20), further including a step of extracting keyword data contained in the target data and storing it in association with the determined category data.

According to the invention of (21), executing the program on the server can be expected to provide the same effect as (5).

(22) The program according to (21), further including:
a step of calculating, in response to accepting registration of new target data different from the target data, the similarity between the new target data and the keyword data; and
a step of determining the category into which the new target data is classified in accordance with an evaluation of categories based on the calculated similarity.

According to the invention of (22), executing the program on the server can be expected to provide the same effect as (6).

(23) The program according to (21) or (22), wherein the step of storing the target data in association with the plurality of category data calculates the similarity between the target data and the keyword data and selects the candidate category data in accordance with an evaluation of categories based on the calculated similarity.

According to the invention of (23), executing the program on the server can be expected to provide the same effect as (7).

(24) The program according to (22) or (23), wherein the step of extracting keyword data contained in the target data and storing it in association with the determined category data further stores, in association, selection probability data based on the number of selections of the category, and
the evaluation of the categories associated with the keyword data is based on the similarity and the selection probability data.

According to the invention of (24), executing the program on the server can be expected to provide the same effect as (8).

According to the present invention, target data may be classified into categories automatically with accuracy close to that of manual classification. Furthermore, by continuously acquiring such highly reliable classification samples (learning examples) and using them to classify new target data from the next time onward, classification accuracy may be kept high.

An example of a preferred embodiment of the present invention is described below with reference to the drawings.

[Overall configuration]
FIG. 1 is a conceptual diagram of a data classification method according to an example of a preferred embodiment of the present invention.

The server 10, which performs category classification, is connected via a communication network 30 to the terminal device 20 that a user uses to browse target data, and operates an information providing service for the user of the terminal device 20.

On the terminal device 20, when the user searches for information items (target data) such as products stored on the server 10, the user follows links organized into a tree of categories to browse a group of target data (product data and the like) of the desired type. The server 10 stores the association between target data and categories.

When the server 10 receives data (product A) for which the classification category is to be learned, it assigns candidate categories (classification 1 and classification 2). The user of the terminal device 20 selects one of the assigned categories to reach the desired data (product A), and the server 10 stores the category selected at that time.

By storing this category selection history for a predetermined period, the server 10 can learn a category classification close to users' thinking (the processing is described in detail later). As a result, when the server 10 receives new target data (product B), it classifies the data into the learned category (for example, classification 2).

[Computer configuration]
FIG. 2 is a block diagram showing the configuration of each computer (the server 10 or the terminal device 20) making up the computer system according to an example of a preferred embodiment of the present invention.

The control unit 110, the storage unit 120, the input unit 130, the display unit 140, and the communication control unit 150 are connected via a bus 160.

The control unit 110 is an information processing unit (CPU) that performs computation and processing of information and controls the computer as a whole. By reading and executing the various programs stored in the storage unit 120 as needed, the control unit 110 cooperates with the hardware described above to realize the various functions of the present invention.

The storage unit 120 may include local memory used together with the control unit 110 to execute programs, large-capacity bulk memory, and cache memory used to search the bulk memory efficiently. Computer-readable media that implement the storage unit 120 may include electrical, magnetic, optical, and electromagnetic media. More specifically, they include semiconductor storage devices, magnetic tape, floppy (registered trademark) disks, random access memory (RAM), read-only memory (ROM), and optical discs including CD-ROM, CD-R/W, and DVD.

The input unit 130 accepts input from the user and may include a keyboard, a pointing device, and the like. The input unit 130 can be connected to the computer directly or via an intervening I/O controller.

The display unit 140 displays a screen that accepts data input from the user and a screen showing the results of processing by the computer, and includes a display device such as a cathode ray tube (CRT) or a liquid crystal display (LCD). The display unit 140 can be connected to the computer directly or via an intervening I/O controller.

The communication control unit 150 is a network adapter that allows the computer to be connected to another processing system or storage device via a private or public network. The communication control unit 150 may include a modem, a cable modem, and an Ethernet (registered trademark) adapter.

[Learning process flow]
FIG. 3 is a diagram showing the learning process flow for data classification according to an example of a preferred embodiment of the present invention.

In step S105, the server 10 accepts target data for learning. The target data may be accepted from the input unit 130, or it may be received from another processing system or storage device via the communication control unit 150. In the case of product data, for example, the target data includes information such as a product name and a product description, and the categories into which it should be classified are assigned through this learning process.

In step S110, the server 10 assigns a plurality of categories to the accepted target data as classification candidates. The categories may be assigned manually, but if already-classified samples exist, it is desirable for the server to assign them automatically (details are described later with reference to FIG. 4).

As a result of steps S105 and S110, in the case of product data, for example, the candidate categories, product name, and product description are stored in a product classification table 40 such as that shown in FIG. 5. Based on this data, the terminal device 20 displays the product data linked to its categories, and the information providing service for users is operated.

In step S115, the server 10 receives data representing a category selection from the terminal device 20. Specifically, the server 10 receives data indicating the categories (links) that the user of the terminal device 20 has followed and, when the user selects (views) the target data, identifies which of the candidate categories assigned in step S110 was selected.

In step S120, the server 10 increments and stores the number of times each candidate category has been selected. Specifically, it uses, for example, the access history table 50 shown in FIG. 6. For each candidate category (category field) assigned to the target data (product name field) in step S110, the value of the selection count field is incremented each time data representing a category selection is received in step S115.
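The counting in step S120 can be sketched as follows. This is only an illustrative in-memory stand-in for the access history table 50; the class name `AccessHistory`, its method names, and the sample product are assumptions and do not appear in the specification.

```python
class AccessHistory:
    """Hypothetical in-memory stand-in for the access history table 50."""

    def __init__(self):
        # Maps product name -> {candidate category -> selection count}.
        self._counts = {}

    def register(self, item, candidate_categories):
        """Store the candidate categories assigned in step S110, starting at zero."""
        row = self._counts.setdefault(item, {})
        for category in candidate_categories:
            row.setdefault(category, 0)

    def record_selection(self, item, category):
        """Increment the count when a category selection is received (step S115).

        Returns False if the category is not a candidate for this item.
        """
        row = self._counts.get(item, {})
        if category not in row:
            return False
        row[category] += 1
        return True

    def counts(self, item):
        """Return a copy of the selection counts for one item."""
        return dict(self._counts.get(item, {}))


history = AccessHistory()
history.register("leather bag", ["handbag", "tote bag"])
history.record_selection("leather bag", "handbag")
```

Selections for categories that were never assigned as candidates are ignored here; how a real server should treat them is not stated in the specification.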

In step S125, the server 10 decides whether to determine the category into which the target data is classified. Specifically, for example, the category may be determined when a predetermined period has elapsed since candidate categories were assigned to the target data and the data was made available for users to view, or it may be determined when the number of selections reaches a predetermined number.
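A minimal sketch of the decision in step S125 follows. The specification presents the elapsed period and the selection count as alternative triggers; combining them with a logical OR, and the default values of 30 days and 100 selections, are assumptions made only for illustration.

```python
from datetime import datetime, timedelta


def should_decide(published_at, now, total_selections,
                  period=timedelta(days=30), min_selections=100):
    """Return True when the classification category should be determined.

    period and min_selections are hypothetical thresholds; the specification
    only says "a predetermined period" and "a predetermined number".
    """
    period_elapsed = (now - published_at) >= period
    enough_selections = total_selections >= min_selections
    return period_elapsed or enough_selections


start = datetime(2006, 3, 16)
decide_now = should_decide(start, start + timedelta(days=31), total_selections=5)
```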

Alternatively, the server 10 may refrain from determining the category at this point and instead determine it later, based on the selection counts at the time new target data is accepted. In that case, the server 10 can store user tendencies over a longer period and use them for category classification. Steps S125 and S130 are then unnecessary.

In step S130, the server 10 determines the category into which the target data is classified. Specifically, the category can be determined, for example, by one of the following methods.

(1) Use the category with the largest number of selections. In this case, the server 10 selects, from the candidate categories assigned to the target data, the one selected most often and associates it with the target data.

(2) Use several of the top-ranked categories by number of selections. In this case, the server 10 selects a predetermined number of the candidate categories assigned to the target data, in descending order of selection count, and associates them with the target data.

(3) Use categories that are selected more often than the others. In this case, the server 10 calculates, for example, a deviation value (standard score) of the selection count for each candidate category, selects the categories whose deviation value is at or above a predetermined value, and associates them with the target data.

The method of determining the classification category is not limited to the above, but with such methods the server 10 can effectively classify the target data into the categories that many users tend to select. This also makes it possible to keep only the categories that users find easy to follow and to eliminate categories that are rarely selected (low-accuracy categories).
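The three determination methods of step S130 can be sketched as follows, taking a mapping from candidate category to selection count as input. The function names, the sample counts, and the threshold of 60 are assumptions. The deviation value is computed here as the conventional standard score 50 + 10 × (x − mean) / standard deviation, which is one reading of the term; the specification does not give a formula.

```python
from statistics import mean, pstdev


def top_category(counts):
    """Method (1): the candidate category selected most often."""
    return max(counts, key=counts.get)


def top_n_categories(counts, n):
    """Method (2): the n candidate categories with the highest counts."""
    return sorted(counts, key=counts.get, reverse=True)[:n]


def deviation_values(counts):
    """Standard score (50 + 10 * z) of each category's selection count."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All counts are equal, so no category stands out.
        return {category: 50.0 for category in counts}
    return {category: 50.0 + 10.0 * (value - mu) / sigma
            for category, value in counts.items()}


def categories_by_deviation(counts, threshold=60.0):
    """Method (3): categories whose deviation value is at or above threshold."""
    scores = deviation_values(counts)
    return [category for category, score in scores.items() if score >= threshold]


counts = {"handbag": 80, "tote bag": 15, "wallet": 5}
print(top_category(counts))
print(top_n_categories(counts, 2))
print(categories_by_deviation(counts))
```

With these sample counts, only "handbag" lies well above the mean, so method (3) keeps it and drops the rarely selected candidates.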

If the server 10 judges that the classification category still cannot be determined after the predetermined period has elapsed, it may display a message to that effect on the administrator's terminal and prompt the administrator to reset the candidate categories.

In step S135, the server 10 stores, in association with the category determined in step S130 (or with all candidate categories if steps S125 and S130 are not performed), the keywords contained in the target data and selection probability data for that category, calculated from the category's selection count. Specifically, it uses, for example, the classification learning table 60 shown in FIG. 7.

The classification learning table 60 stores, in association with one another, the name of the target data (for example, a product name), the keywords contained in the target data, the category determined in step S130, and the selection probability of the category. Here, a keyword is a word that characterizes the target data. In the case of product data, for example, keywords can be extracted by breaking the product description in the product classification table 40 (FIG. 5) into words through morphological analysis or the like and then applying a technique such as TF*IDF or a probabilistic language model.
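The two values stored per learning example can be sketched as follows. Several simplifications are assumed: whitespace splitting stands in for morphological analysis, the smoothed IDF formula is one common TF*IDF variant rather than the one the specification intends, and the selection probability is taken as a category's share of all selections for the item, which the specification does not spell out. All function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Stand-in for morphological analysis: lowercase whitespace split."""
    return text.lower().split()


def extract_keywords(description, corpus, top_k=3):
    """Return the top_k terms of description ranked by a TF*IDF score."""
    tokens = tokenize(description)
    term_freq = Counter(tokens)
    doc_terms = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus)
    scores = {}
    for term, freq in term_freq.items():
        doc_freq = sum(1 for terms in doc_terms if term in terms)
        # Smoothed inverse document frequency (an assumed variant).
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[term] = (freq / len(tokens)) * idf
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def selection_probabilities(counts):
    """Share of selections per candidate category (assumed definition)."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: count / total for category, count in counts.items()}


corpus = [
    "soft leather bag with strap",
    "canvas bag for shopping",
    "leather wallet with coin pocket",
]
print(extract_keywords(corpus[0], corpus, top_k=2))
print(selection_probabilities({"handbag": 3, "tote bag": 1}))
```

Terms that occur in fewer descriptions receive a higher score, so words shared by many products, such as "bag", rank below the words that distinguish one product.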

The storage means is not limited to a table; the data can also be stored as an inverted index file. Furthermore, each stored keyword may be stored in association with its location of occurrence, its number of occurrences, its frequency of occurrence across the other classification samples (learning examples), and the like. These can be used in the similarity calculation described later.

In this processing flow, step S135 is performed after the classification category is determined, but the timing is not limited to this. For example, the keywords may be stored when the target data is accepted in step S105, and the selection probability may be stored together with the selection count in step S120.

[Classification process flow]
FIG. 4 is a diagram showing the flow of classification into categories according to an example of a preferred embodiment of the present invention. The classification process classifies new target data based on the category classification learned in the learning process flow (FIG. 3), but it can also be used to assign candidate categories in step S110 of the learning process flow.

In step S205, the server 10 accepts new target data to be classified into categories. The new target data may be accepted from the input unit 130, or it may be received from another processing system or storage device via the communication control unit 150.

In step S210, the server 10 calculates the similarity between the new target data and the keyword group of each category, based on the classification learning table 60, inverted index file, or the like stored in step S135 of the learning process flow (FIG. 3). Specifically, the similarity can be calculated using a technique such as TF*IDF, the kNN method, or the Naive Bayes method. If data such as frequency of occurrence has been stored in advance in association with the keywords, it can be used here. In this way, the server 10 can extract the keyword groups that are highly similar to the new target data, together with their corresponding categories.
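One way to compute the similarity of step S210 is cosine similarity between term-count vectors, sketched below. The specification names TF*IDF, kNN, and Naive Bayes only as examples, so plain term counts with cosine similarity are a substituted simplification, and the dictionary layout of a learning example (`keywords`, `category`) is a hypothetical stand-in for a row of the classification learning table 60.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_learning_examples(new_text, examples):
    """Return (similarity, category) pairs, most similar first."""
    # Whitespace splitting stands in for morphological analysis.
    query = Counter(new_text.lower().split())
    scored = [
        (cosine_similarity(query, Counter(example["keywords"])), example["category"])
        for example in examples
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


examples = [
    {"keywords": ["bag", "leather"], "category": "handbag"},
    {"keywords": ["canvas", "bag"], "category": "tote bag"},
]
for similarity, category in rank_learning_examples("leather bag", examples):
    print(category, round(similarity, 3))
```

Stored occurrence counts or TF*IDF weights could replace the raw term counts without changing the ranking function.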

In step S215, the server 10 determines the classification category based on the similarity calculated in step S210. Specifically, the server 10 uses the similarity to evaluate, for each corresponding category, whether the target data should be classified into it. The server 10 then determines one or more classification categories, for example as follows.

(1) Classify into the category corresponding to the keyword group with the highest similarity, or into a predetermined number of categories in descending order of similarity (Naive Bayes method or the like). In this case, for example, target data whose product description contains "bag" and "nume leather" (vegetable-tanned leather) has, in the example of the classification learning table 60 (FIG. 7), a high similarity to the keywords of the learning example in the first row, and the server 10 classifies the target data into the "handbag" category.

In this way, the server 10 refers to the latest combinations of keywords and categories stored at the time the new target data is accepted and uses them to classify the new target data. As described above, candidate categories considered unnecessary have already been removed from the classification learning table 60 in step S130 of the learning process flow (FIG. 3), based on the actual selections made by users. The loss of accuracy caused by accumulated misclassifications, as occurs in Patent Document 1, can therefore also be suppressed.

(2) Decide the classification category by majority vote among an appropriate number of highly similar samples (kNN method). In this case, for example, if there are three learning examples whose keywords are similar to the target data, and their corresponding categories are "handbag" for two and "tote bag" for one, the server 10 classifies the target data as "handbag" by majority vote.

(3) Decide the classification category by majority vote among an appropriate number of samples after weighting by similarity (kNN method). In the example of (2) above, more learning examples with similar keywords belong to "handbag" than to "tote bag", but if the similarity of the "tote bag" example is higher, the weighting may lead the server 10 to rate "tote bag" more highly and classify the target data into it.

(4) Classify into the category whose associated keyword groups have the highest average similarity, or into a predetermined number of categories in descending order of that average. In this case, the server 10 may classify the target data into a category whose learning examples are consistently similar to the target data, even if none of them has the highest similarity, rather than into a category whose learning examples vary widely in similarity.

(5) Weight the similarity by the selection probability and then classify by one of methods (1) to (4) above. In this case, on the assumption that a higher selection probability indicates higher classification accuracy, the server 10 evaluates, for example, the product of the similarity and the selection probability in the classification learning table 60 (FIG. 7), and can thereby classify the target data into categories based on highly accurate learning examples.
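Methods (2) to (5) can be sketched as follows, assuming the nearest learning examples have already been retrieved as (similarity, category, selection probability) tuples. The tuple layout, function names, and sample numbers are illustrative assumptions. Ties are resolved in favor of the category seen first, a detail the specification does not address.

```python
from collections import defaultdict


def majority_vote(neighbors):
    """Method (2): each neighbor casts one vote for its category."""
    votes = defaultdict(int)
    for _similarity, category, _probability in neighbors:
        votes[category] += 1
    return max(votes, key=votes.get)


def weighted_vote(neighbors, use_selection_probability=False):
    """Method (3), or method (5) when use_selection_probability is True."""
    scores = defaultdict(float)
    for similarity, category, probability in neighbors:
        weight = similarity
        if use_selection_probability:
            # Method (5): multiply the similarity by the selection probability.
            weight *= probability
        scores[category] += weight
    return max(scores, key=scores.get)


def best_average(neighbors):
    """Method (4): the category with the highest mean similarity."""
    grouped = defaultdict(list)
    for similarity, category, _probability in neighbors:
        grouped[category].append(similarity)
    return max(grouped, key=lambda c: sum(grouped[c]) / len(grouped[c]))


neighbors = [
    (0.40, "handbag", 0.9),
    (0.45, "handbag", 0.9),
    (0.95, "tote bag", 0.5),
]
print(majority_vote(neighbors))
print(weighted_vote(neighbors))
print(weighted_vote(neighbors, use_selection_probability=True))
print(best_average(neighbors))
```

With these numbers the plain majority favors "handbag", similarity weighting favors "tote bag", and adding the selection probability moves the result back to "handbag", which mirrors the trade-offs described in (2), (3), and (5).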

As described above, according to the embodiment of the present invention, the categories into which target data is classified can be learned simply by operating the information providing service, and new target data can be classified automatically into the learned categories.

Learning results (learning examples) can be expected to be more accurate, and to reflect users' opinions better, the more numerous and the more recent they are. The learning process is therefore preferably performed periodically. For example, a plurality of categories may be assigned and learning performed even when new target data is classified automatically.

In this way, the server 10 can continuously acquire and use learning examples while operating the information providing service, and can keep the accuracy of category classification high.

Although an embodiment of the present invention has been described above, the present invention is not limited to the embodiment described. The effects described in the embodiment merely list the most preferable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiment.

A conceptual diagram of the data classification method according to an example of a preferred embodiment of the present invention.
A block diagram showing the configuration of each computer in the computer system according to an example of a preferred embodiment of the present invention.
A diagram showing the learning process flow for data classification according to an example of a preferred embodiment of the present invention.
A diagram showing the flow of classification into categories according to an example of a preferred embodiment of the present invention.
A diagram showing the product classification table according to an example of a preferred embodiment of the present invention.
A diagram showing the access history table according to an example of a preferred embodiment of the present invention.
A diagram showing the classification learning table according to an example of a preferred embodiment of the present invention.

Explanation of symbols

10 Server
20 Terminal device
30 Communication network
40 Product classification table
50 Access history table
60 Classification learning table
110 Control unit
120 Storage unit
130 Input unit
140 Display unit
150 Communication control unit
160 Bus

Claims

1. A method by which a server connected to a terminal device via a communication network classifies target data to be viewed by a user of the terminal device into a category, the method comprising:
storing the target data in association with a plurality of candidate category data;
transmitting, in response to receiving from the terminal device data representing the user's selection of one of the plurality of candidate category data, the target data associated with the selected candidate category data;
counting up, in response to receiving from the terminal device data representing the user's selection of the target data, the number of selections of the candidate category data selected by the user, and storing the number in association with the selected target data and the selected candidate category data; and
determining the category into which the target data is classified based on the selection count data for a predetermined period.

2. The method according to claim 1, wherein the category to be determined is one in which the candidate category data is selected a predetermined number of times or more.

3. The method according to claim 1, wherein the category to be determined is one whose candidate category data ranks at or above a predetermined rank in number of selections.

4. The method according to claim 1, wherein the category to be determined is determined by a relative evaluation of the number of selections of the candidate category data.

5. The method according to claim 1, further comprising a step of extracting and storing keyword data included in the target data in association with the determined category data.

6. The method according to claim 5, further comprising:
calculating, in response to accepting registration of new target data different from the target data, a similarity between the new target data and the keyword data; and
determining a category into which the new target data is classified in accordance with an evaluation of categories based on the calculated similarity.

7. The method according to claim 5 or 6, wherein the step of storing the target data in association with the plurality of candidate category data calculates a similarity between the target data and the keyword data, and selects the candidate category data in accordance with an evaluation of categories based on the calculated similarity.

8. The method according to claim 6 or 7, wherein the step of extracting and storing the keyword data included in the target data in association with the determined category data further stores selection probability data based on the number of selections of the category, and
the evaluation of the category associated with the keyword data is based on the similarity and the selection probability data.

9. A server that is connected to a terminal device via a communication network and classifies target data to be viewed by a user of the terminal device into a category, the server comprising:
means for storing the target data in association with a plurality of candidate category data;
means for transmitting, in response to receiving from the terminal device data representing the user's selection of one of the plurality of candidate category data, the target data associated with the selected candidate category data;
means for counting up, in response to receiving from the terminal device data representing the user's selection of the target data, the number of selections of the candidate category data selected by the user, and storing the number in association with the selected target data and the selected candidate category data; and
means for determining the category into which the target data is classified based on the selection count data for a predetermined period.

10. The server according to claim 9, wherein the category to be determined is one in which the candidate category data is selected a predetermined number of times or more.

11. The server according to claim 9, wherein the category to be determined is one whose candidate category data ranks at or above a predetermined rank in number of selections.

12. The server according to claim 9, wherein the category to be determined is determined by a relative evaluation of the number of selections of the candidate category data.

13. The server according to claim 9, further comprising means for extracting and storing keyword data included in the target data in association with the determined category data.

14. The server according to claim 13, further comprising:
means for calculating, in response to accepting registration of new target data different from the target data, a similarity between the new target data and the keyword data; and
means for determining a category into which the new target data is classified in accordance with an evaluation of categories based on the calculated similarity.

15. The server according to claim 13 or 14, wherein the means for storing the target data in association with the plurality of candidate category data calculates a similarity between the target data and the keyword data, and selects the candidate category data in accordance with an evaluation of categories based on the calculated similarity.

16. The server according to claim 14 or 15, wherein the means for extracting and storing the keyword data included in the target data in association with the determined category data further stores selection probability data based on the number of selections of the category, and
the evaluation of the category associated with the keyword data is based on the similarity and the selection probability data.

17. A program that causes a server connected to a terminal device via a communication network to classify target data to be viewed by a user of the terminal device into a category, the program causing the server to execute the steps of:
storing the target data in association with a plurality of candidate category data;
transmitting, in response to receiving from the terminal device data representing the user's selection of one of the plurality of candidate category data, the target data associated with the selected candidate category data;
counting up, in response to receiving from the terminal device data representing the user's selection of the target data, the number of selections of the candidate category data selected by the user, and storing the number in association with the selected target data and the selected candidate category data; and
determining the category into which the target data is classified based on the selection count data for a predetermined period.

18. The program according to claim 17, wherein the category to be determined is one in which the candidate category data is selected a predetermined number of times or more.

19. The program according to claim 17, wherein the category to be determined is one whose candidate category data ranks at or above a predetermined rank in number of selections.

20. The program according to claim 17, wherein the category to be determined is determined by a relative evaluation of the number of selections of the candidate category data.

21. The program according to claim 17, further comprising a step of extracting and storing keyword data included in the target data in association with the determined category data.

22. The program according to claim 21, further causing the server to execute the steps of:
calculating, in response to accepting registration of new target data different from the target data, a similarity between the new target data and the keyword data; and
determining a category into which the new target data is classified in accordance with an evaluation of categories based on the calculated similarity.

23. The program according to claim 21 or 22, wherein the step of storing the target data in association with the plurality of candidate category data causes the server to calculate a similarity between the target data and the keyword data, and to select the candidate category data in accordance with an evaluation of categories based on the calculated similarity.

24. The program according to claim 22 or 23, wherein the step of extracting and storing the keyword data included in the target data in association with the determined category data further causes the server to store selection probability data based on the number of selections of the category, and
the evaluation of the category associated with the keyword data is based on the similarity and the selection probability data.