JP7067884B2

JP7067884B2 - Classification device, classification method and classification program

Info

Publication number: JP7067884B2
Application number: JP2017177328A
Authority: JP
Inventors: 伸次池宮; 健田村; 琢郎森; 和也工藤; 麻里衣目
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2017-09-15
Filing date: 2017-09-15
Publication date: 2022-05-16
Anticipated expiration: 2037-09-15
Also published as: JP2019053519A

Description

The present invention relates to a classification device, a classification method, and a classification program.

In recent years, as communication networks have developed, a wide range of services have come to be provided over networks. Accordingly, various search techniques have been proposed for finding services on the network and for finding desired products and articles within those services.

For example, a search technique has been proposed that separates a modifier and a concept keyword from a search keyword sent from a user terminal, extracts each of them, and searches using a concept-keyword expansion data set generated from the extracted modifier and concept keyword.

Japanese Unexamined Patent Publication No. 2013-73626

However, with the above conventional technique, it is difficult to derive mutual relationships among diverse queries. Specifically, the conventional technique performs keyword expansion based on words, features, and the like that are common to a plurality of queries (keywords) entered by a user. Consequently, it cannot extract common features from diverse queries, for example queries that belong to different fields or categories, and as a result it may be unable to perform information processing such as keyword expansion or analysis of the relationships between queries.

The present application has been made in view of the above, and its object is to provide a classification device, a classification method, and a classification program capable of deriving mutual relationships among diverse queries.

The classification device according to the present application includes: an extraction unit that extracts a plurality of second queries related to a first query based on the degree of relevance between arbitrary queries; a generation unit that generates feature information characterizing the first query based on the plurality of second queries extracted by the extraction unit; and a classification unit that classifies a keyword corresponding to the first query based on the feature information generated by the generation unit.

According to one aspect of the embodiment, mutual relationships among diverse queries can be derived.

FIG. 1 is a diagram showing an example of a classification process according to an embodiment.
FIG. 2 is a diagram showing a configuration example of a classification system according to the embodiment.
FIG. 3 is a diagram showing a configuration example of the classification device according to the embodiment.
FIG. 4 is a diagram showing an example of a relevance information storage unit according to the embodiment.
FIG. 5 is a diagram showing an example of a feature information storage unit according to the embodiment.
FIG. 6 is a diagram showing an example of a classification information storage unit according to the embodiment.
FIG. 7 is a flowchart (1) showing a processing procedure performed by the classification device according to the embodiment.
FIG. 8 is a flowchart (2) showing a processing procedure performed by the classification device according to the embodiment.
FIG. 9 is a flowchart showing a processing procedure performed by the classification device according to a modified example.
FIG. 10 is a hardware configuration diagram showing an example of a computer that implements the functions of the classification device.

Hereinafter, modes for carrying out the classification device, classification method, and classification program according to the present application (hereinafter, "embodiments") will be described in detail with reference to the drawings. These embodiments do not limit the classification device, classification method, or classification program according to the present application. In the following embodiments, identical parts are given identical reference numerals, and duplicate descriptions are omitted.

[1. Example of the classification process]
First, an example of the classification process according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the classification process according to the embodiment. The example in FIG. 1 is a process in which the classification device 100 according to the embodiment classifies a plurality of keywords included in a keyword list L01 sent by a user U01.

The classification device 100 shown in FIG. 1 is a server device that generates feature information for each search query sent by users (hereinafter, "query") based on the degree of relevance between queries, and classifies (clusters) the queries into predetermined classes based on the generated feature information. In the embodiment, the classification device 100 also serves as a web server that provides a search site to users.

In the embodiment, a keyword used for a search on a search site or the like is referred to as a "query". However, a keyword and a query do not always coincide exactly. For example, a user may enter a phrase such as "AAA car" as a query ("AAA" being, for example, the name of an automobile manufacturer). In this case, "AAA car" is the query, and the words it contains, "AAA" and "car", can be keywords. The classification device 100 extracts keywords from a query as appropriate, for example by morphological analysis. In the following description, "query" and "keyword" are sometimes treated as synonymous, for example when a query matches a keyword exactly, or when morphological analysis of a query yields a single keyword.

The user group shown in FIG. 1 consists of a plurality of users who use a search site or the like and send queries to the classification device 100. The user U01 is an example of a user who sends a keyword list L01 containing a plurality of keywords to the classification device 100 and requests classification of the keywords in that list. Although not shown in FIG. 1, each user in the user group, as well as the user U01, has an information processing terminal (hereinafter, "user terminal 10") for using the search site and for exchanging various information with the classification device 100. In the following description, a user terminal 10 and the user who uses it may be treated as one and the same. For example, "the user U01 sends a query" actually means "the user terminal 10 used by the user U01 sends a query".

The queries a user sends when searching reflect that user's interests. By classifying queries, it is therefore possible, for example, to infer that users who frequently send queries belonging to the same class share the same interests. Such information can be useful in marketing, for example in advertisement distribution. Accordingly, if it can be determined which queries are similar to one another and the queries can be classified, information useful to, for example, an advertisement distributor can be obtained. One way to classify queries is to assign a category to each query and put queries belonging to similar categories into the same class.

However, the variety of queries sent to search sites is enormous, and assigning categories by hand is not realistic. Even when categories are assigned to queries automatically, for example by a program, a single query may carry several meanings, which makes appropriate automatic categorization difficult.

Therefore, the classification device 100 according to the embodiment calculates the degree of relevance between two queries based on the number of times the same user entered different queries, counted over all users subject to aggregation. The classification device 100 then calculates feature information for any query based on the degree of relevance, and classifies queries based on that feature information. Because the classification relies on quantitative information, namely the searches users actually performed, related queries can be sorted into appropriate classes without categorizing each query. Further, since no prior categorization of queries is required, the classification device 100 can calculate the relevance between queries and classify them whatever queries are entered. That is, the classification device 100 can perform classification that handles a wide variety of queries. An example of the classification process performed by the classification device 100 is described below, step by step, with reference to FIG. 1.
In the following description, of the two queries for which a degree of relevance is calculated, the query being processed is called the "first query", and a query related to the first query is called a "second query". Consequently, a given keyword may be a "first query" in one context and a "second query" in another.

In the example shown in FIG. 1, each user in the user group enters queries on a search site or the like (step S11). The classification device 100 acquires the queries sent by the users (step S12) and then calculates the degree of relevance between queries (step S13).

The classification device 100 calculates the degree of relevance between arbitrary queries based at least on the number of users who entered either of two different queries and the number of users who entered both of them. As an example, the classification device 100 calculates the relevance between a first query ("query A" in formula (1)) and a second query ("query B" in formula (1)) using the following formula (1).

In formula (1), "Score(A, B)" is the numerical degree of relevance between query A and query B. "Auser" is the number of users who searched for (entered) query A, and "Buser" is the number of users who searched for query B. "ALLuser" is the number of users who sent a query (that is, used the search) during a predetermined aggregation period. "Auser ∧ Buser" is the number of users who searched for both query A and query B.
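Formula (1) itself is not reproduced in this text, so its exact form cannot be recovered here. As a hypothetical stand-in built only from the four counts defined above, the sketch below computes a lift score: the number of users who searched both queries, divided by the number expected if searching A and searching B were independent. It also shows the log-transformed variant, pointwise mutual information (PMI). Neither is confirmed to be the patent's formula (1), and the function names are illustrative.

```python
import math


def lift_score(a_users: int, b_users: int, ab_users: int, all_users: int) -> float:
    """Hypothetical stand-in for Score(A, B); NOT confirmed to be formula (1).

    a_users:   Auser, users who searched query A
    b_users:   Buser, users who searched query B
    ab_users:  Auser ∧ Buser, users who searched both A and B
    all_users: ALLuser, users who searched anything in the aggregation period
    """
    if a_users == 0 or b_users == 0:
        return 0.0
    # Observed co-search rate over the rate expected under independence:
    # (ab / all) / ((a / all) * (b / all)) simplifies to ab * all / (a * b).
    return ab_users * all_users / (a_users * b_users)


def pmi_score(a_users: int, b_users: int, ab_users: int, all_users: int) -> float:
    """Pointwise mutual information: log2 of the lift (-inf if never co-searched)."""
    lift = lift_score(a_users, b_users, ab_users, all_users)
    return math.log2(lift) if lift > 0 else float("-inf")
```

For instance, with 10,000 searching users, 100 of whom searched A, 200 searched B, and 50 searched both, `lift_score(100, 200, 50, 10_000)` returns 25.0: the pair is co-searched 25 times more often than chance would predict. A lift of 1 (PMI of 0) would indicate no association.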

Using formula (1), the classification device 100 calculates the relevance for all queries searched by the user group. For example, the classification device 100 tallies the queries B searched by users who searched for query A, the first query. It then uses formula (1) to calculate the relevance of the first query to every second query, and stores the calculated relevance values in the storage unit. The classification device 100 may instead extract and store only the pairs of first and second queries whose relevance exceeds a predetermined threshold.
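The all-pairs tally described above can be sketched as follows, assuming the search log is available as `(user_id, query)` pairs for a single aggregation period. Because formula (1) is not reproduced here, the score is again a lift ratio used as a hypothetical stand-in; the function name and log format are illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import permutations


def relevance_table(logs, threshold=0.0):
    """Score every ordered (first query, second query) pair co-searched by some user.

    logs:      iterable of (user_id, query) pairs from one aggregation period
    threshold: keep only pairs whose score exceeds this value
    The score is lift, a hypothetical stand-in for formula (1).
    """
    queries_by_user = defaultdict(set)
    for user_id, query in logs:
        queries_by_user[user_id].add(query)  # repeated searches count once per user

    all_users = len(queries_by_user)  # ALLuser
    users_per_query = Counter()       # Auser / Buser for each query
    users_per_pair = Counter()        # Auser ∧ Buser for each ordered pair
    for queries in queries_by_user.values():
        users_per_query.update(queries)
        users_per_pair.update(permutations(queries, 2))

    table = {}
    for (first, second), both in users_per_pair.items():
        score = both * all_users / (users_per_query[first] * users_per_query[second])
        if score > threshold:
            table[(first, second)] = score
    return table
```

Only pairs that at least one user actually co-searched are scored, which keeps the table sparse; the per-user cost is still quadratic in the number of distinct queries that user sent, so a production system would likely cap or sample heavy users.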

By changing the aggregation period in formula (1), the classification device 100 can adjust the time span over which relevance between queries is measured. For example, if the aggregation period is set to several years, a user who searched for the first query and the second query at any point within those years is counted as "a user who searched for both the first query and the second query", so the calculated relevance reflects shifts in user interests over a relatively long span. This allows the classification device 100 to calculate relevance that reflects transitions in a user's life stage (for example, a user's queries changing from "pregnancy" to "childbirth"). Conversely, if the aggregation period is set to a few days, a user is not counted toward "the number of users who searched for both the first query and the second query" unless the user searched for both within those few days. This makes it easier for the classification device 100 to extract pairs of queries that are related over a short span, without a long time elapsing between the searches. The aggregation period may be set as appropriate, for example by an administrator of the classification device 100.
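The effect of the aggregation period on the "searched both" count can be illustrated with a small sketch. It assumes, hypothetically, that each log entry carries a timestamp as `(user_id, query, timestamp)`; the function name is illustrative.

```python
from datetime import datetime


def users_who_searched_both(logs, a, b, start, end):
    """Return ids of users who searched both query a and query b within [start, end).

    logs: iterable of (user_id, query, timestamp) tuples (hypothetical format)
    """
    seen = {}
    for user_id, query, ts in logs:
        if start <= ts < end and query in (a, b):
            seen.setdefault(user_id, set()).add(query)
    return {user for user, queries in seen.items() if queries == {a, b}}


logs = [
    ("u1", "pregnancy", datetime(2017, 1, 10)),
    ("u1", "childbirth", datetime(2017, 8, 20)),
]
# A two-year window captures the life-stage transition; a one-week window does not.
long_window = users_who_searched_both(
    logs, "pregnancy", "childbirth", datetime(2015, 9, 15), datetime(2017, 9, 15))
short_window = users_who_searched_both(
    logs, "pregnancy", "childbirth", datetime(2017, 8, 14), datetime(2017, 8, 21))
```

Here `long_window` is `{"u1"}` while `short_window` is empty, matching the trade-off described above: longer periods capture slow shifts in interest, shorter ones favor queries searched close together in time.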

In the example of FIG. 1, the classification device 100 stores the calculated relevance values in a database DB01. As shown in FIG. 1, the database DB01 stores the relevance of the first query "AAA" paired with each of the second queries "BBB", "CCC", and "DDD". For example, the information stored in DB01 indicates that the pair whose first query is "AAA" and whose second query is "BBB" has a relevance of "5.93".

Next, the classification device 100 generates feature information for each query (keyword). First, based on the calculated relevance, the classification device 100 extracts second queries related to the first query (step S14). Specifically, it extracts every second query whose relevance to the first query exceeds a predetermined threshold. For example, the classification device 100 performs the extraction using a predetermined search engine that, given a first query, returns the second queries whose relevance to it exceeds the threshold.
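The patent delegates step S14 to an unspecified search engine. As a hypothetical in-memory substitute, the sketch below indexes a relevance table (a dict mapping `(first_query, second_query)` to a score) by first query and then filters by threshold. The function names and the dict layout are illustrative assumptions.

```python
from collections import defaultdict


def build_index(relevance):
    """Group (second query, score) entries under their first query, highest score first."""
    index = defaultdict(list)
    for (first, second), score in relevance.items():
        index[first].append((second, score))
    for entries in index.values():
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return index


def related_queries(index, first_query, threshold):
    """Return the second queries whose relevance to first_query exceeds threshold."""
    # .get avoids inserting an empty list into the defaultdict for unknown queries.
    return [second for second, score in index.get(first_query, []) if score > threshold]
```

Building the index once and reusing it avoids rescanning the whole relevance table for every first query, which matters because step S14 is repeated for every query searched by the user group.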

The classification device 100 then generates feature information indicating the characteristics of the first query based on the extracted second queries (step S15). For example, the classification device 100 generates the feature information of the first query as a vector by expressing it as a word vector whose dimensions correspond to the extracted second queries. In this case, the classification device 100 may perform morphological analysis on each extracted second query and generate the word vector representing the first query from the resulting keywords.

As noted above, a query may contain several keywords. Suppose that a query "BBB used car" exists as a second query related to the first query. The classification device 100 performs morphological analysis on "BBB used car" and extracts the keywords "BBB", "used", and "car". By doing this for every extracted second query, the classification device 100 obtains the keywords that characterize the first query and the number of times each keyword appears. It then generates a vector in which each keyword is a dimension and the number of occurrences of that keyword is the value along that dimension. Suppose that processing the second queries related to the first query "AAA" yields the keyword "BBB" 18 times, "CCC" 15 times, "DDD" 9 times, "used" 25 times, and "car" 71 times. The feature information of the first query "AAA" is then the vector (BBB, CCC, DDD, used, car, ...) = (18, 15, 9, 25, 71, ...).
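The keyword-count vector can be sketched with a `Counter` as a sparse vector. Real morphological analysis (for Japanese, a tool such as MeCab) is replaced here by a plain whitespace split, which is a hypothetical simplification that only works for space-separated text; the function names are illustrative.

```python
from collections import Counter


def tokenize(query):
    # Hypothetical stand-in for morphological analysis: whitespace split only.
    return query.split()


def feature_vector(second_queries):
    """Count keyword occurrences across all second queries of one first query."""
    counts = Counter()
    for query in second_queries:
        counts.update(tokenize(query))
    return counts


def to_dense(counts, vocabulary):
    """Lay the counts out along a fixed keyword order; absent keywords get 0."""
    return [counts.get(word, 0) for word in vocabulary]


fv = feature_vector(["BBB used car", "CCC car", "BBB"])
dense = to_dense(fv, ["BBB", "CCC", "DDD", "used", "car"])
```

Here `fv` holds the counts `BBB: 2, car: 2, used: 1, CCC: 1`, and `dense` is `[2, 1, 0, 1, 2]`, the same (keyword, count) layout as the (BBB, CCC, DDD, used, car, ...) vector above. Keeping the sparse form avoids materializing one dimension per keyword in the whole vocabulary.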

The classification device 100 performs the above processing with every query searched by the user group as a first query, generating feature information for each. It then stores each first query in association with its feature information in the storage unit (database DB02 in the example of FIG. 1).

Later, at a given time, the user U01, who wishes to use the classification process of the classification device 100, sends a keyword list containing any number of keywords to the classification device 100 (step S16). In the example of FIG. 1, the user U01 sends a keyword list L01 whose keywords are the names of automobile manufacturers.

The classification device 100 receives the keyword list L01 and classifies (clusters) the keywords in it based on the similarity of their feature information (step S17).

For example, the classification device 100 refers to the database DB02 and retrieves the first query corresponding to each keyword in the keyword list L01. It then classifies the keywords in the keyword list L01 based on the relationships among the feature information of the retrieved first queries. Specifically, the classification device 100 classifies the keywords in the keyword list L01 using a non-hierarchical method such as k-means. The classification method is not limited to this example; the classification device 100 may use any method capable of classifying keywords based on feature information. For example, it may calculate the cosine similarity between feature vectors and place keywords whose cosine similarity exceeds a predetermined threshold in the same class. It may also use a hierarchical method such as the nearest-neighbor (single-linkage) method, or a learning-based classification method such as a support vector machine.
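The cosine-similarity option above can be sketched over sparse feature vectors (dicts mapping keyword to count). The text does not say how keywords are grouped when similarity is not transitive, so this sketch assumes that any chain of above-threshold pairs links keywords into one class (connected components via union-find, which is equivalent to single-linkage clustering cut at the threshold). Function names and sample vectors are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {keyword: weight} dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def threshold_classes(features, threshold):
    """Group keywords linked by chains of pairs whose cosine similarity exceeds threshold."""
    parent = {k: k for k in features}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    keys = list(features)
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if cosine(features[a], features[b]) > threshold:
                parent[find(a)] = find(b)  # merge the two classes

    classes = {}
    for k in keys:
        classes.setdefault(find(k), []).append(k)
    return list(classes.values())


features = {
    "AAA": {"car": 10, "used": 5},
    "BBB": {"car": 8, "used": 4},
    "DDD": {"truck": 7},
}
groups = threshold_classes(features, 0.9)
```

With these made-up vectors, "AAA" and "BBB" point in the same direction (similarity 1.0) and land in one class, while "DDD" shares no keywords with them and forms its own class. For k-means, a library implementation such as scikit-learn's `KMeans` on dense vectors would be the usual choice instead.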

In the example shown in FIG. 1, the classification device 100 classifies the keywords "AAA", "BBB", "CCC", and others in the keyword list L01 into the same class, CL01, and classifies the keyword "DDD" and others into class CL02. In the example of FIG. 1, the queries "AAA", "BBB", and so on denote the same words as the keywords "AAA", "BBB", and so on.

The classification device 100 stores the classification result in the storage unit (database DB03 in the example of FIG. 1). In this example, it can be inferred that "AAA", "BBB", and "CCC" share some property (for example, they are based in the same country, have capital ties, or make cars with similar characteristics and are therefore compared by users). It can likewise be inferred that "AAA", "BBB", and "CCC" differ from "DDD" in some property (for example, they are based in different countries, have no capital ties, or make cars that do not compete with one another). In this way, the classification device 100 clusters the keywords presented by the user U01 using feature information generated from users' search behavior, so the resulting classification reflects users' interests.

The classification device 100 then sends the result of classifying the keywords in the keyword list L01 to the user U01 (step S18). By referring to the result, the user U01 can learn which keywords are related to one another in terms of user behavior.

As described above with reference to FIG. 1, the classification device 100 according to the embodiment extracts a plurality of second queries related to a first query based on the degree of relevance between arbitrary queries. The classification device 100 then generates feature information characterizing the first query based on the extracted second queries, and classifies the keyword corresponding to the first query based on the generated feature information.

That is, the classification device 100 performs classification based on quantitative information, namely users' actual search behavior, rather than on qualitative information such as the meaning or nature of the queries users send. In other words, the classification device 100 classifies related keywords without categorizing each query. Because it can classify whatever queries users send, the classification device 100 can derive mutual relationships among a wide variety of queries. The classification device 100 that performs the above processing, and the classification system 1 that includes it, are described in detail below.

[2. Configuration of the classification system]
Next, the configuration of the classification system 1 according to the embodiment will be described with reference to FIG. 2. FIG. 2 is a diagram showing a configuration example of the classification system 1 according to the embodiment. As shown in FIG. 2, the classification system 1 includes a user terminal 10 and the classification device 100. The user terminal 10 and the classification device 100 are communicably connected, by wire or wirelessly, via a network N (for example, the Internet). The number of each device in the classification system 1 is not limited to that shown in FIG. 2; for example, the classification system 1 may include a plurality of user terminals 10.

The user terminal 10 is an information processing device used by a user, such as a desktop PC, a notebook PC, a mobile phone such as a smartphone, a tablet terminal, a PDA (Personal Digital Assistant), or a wearable device. For example, the user terminal 10 accesses a search site in response to a user operation and sends the query entered by the user to the server that provides the search site (the classification device 100 in the embodiment).

As described above, the classification device 100 is a server device that generates feature information characterizing a first query based on the degree of relevance between arbitrary queries, and classifies the keyword corresponding to the first query based on the generated feature information.

[3. Configuration of the classification device]
Next, the configuration of the classification device 100 according to the embodiment will be described with reference to FIG. 3. FIG. 3 is a diagram showing a configuration example of the classification device 100 according to the embodiment. As shown in FIG. 3, the classification device 100 includes a communication unit 110, a storage unit 120, and a control unit 130. The classification device 100 may also include an input unit (for example, a keyboard or mouse) that accepts operations from an administrator or other user of the classification device 100, and an output unit (for example, a liquid crystal display) for outputting various information.

(Communication unit 110)
The communication unit 110 is implemented by, for example, a NIC (Network Interface Card). The communication unit 110 is connected to the communication network by wire or wirelessly and transmits and receives information to and from the user terminal 10 via the communication network.

(Storage unit 120)
The storage unit 120 is implemented by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk. The storage unit 120 according to the embodiment includes a relevance information storage unit 121, a feature information storage unit 122, and a classification information storage unit 123. Each storage unit is described in turn below.

(Relevance information storage unit 121)
The relevance information storage unit 121 stores information on the degree of relevance between queries and corresponds to the database DB01 shown in FIG. 1. FIG. 4 is a diagram showing an example of the relevance information storage unit 121 according to the embodiment. As shown in FIG. 4, the relevance information storage unit 121 has items such as "aggregation period", "total number of search users", "first query", "second query", and "relevance".

The "aggregation period" indicates the period over which information on users' search behavior is aggregated. The "total number of search users" indicates the total number of users who searched during the aggregation period. Although FIG. 4 shows this value conceptually as "A01", in practice the item stores the actual number of unique users who used the search site during the aggregation period. The item may also store information such as the number of times queries were sent to the search site (that is, the number of searches performed) during the aggregation period, or the number of times each query was searched.

The "first query" indicates the first query. The "second query" indicates a query, different from the first query, that was searched by a user who also searched for the first query. The "relevance" indicates the degree of relevance between the first query and the second query, and is calculated for each query pair by, for example, equation (1) above.

That is, as an example of the information held by the relevance information storage unit 121, FIG. 4 shows that, in the aggregated data for the aggregation period "July 1, 2016 to June 30, 2017" with a total number of search users of "A01", the pair whose first query is "AAA" and whose second query is "BBB" has a degree of relevance of "5.93".

(Feature information storage unit 122)
The feature information storage unit 122 stores feature information of queries and corresponds to the database DB02 shown in FIG. 1. FIG. 5 is a diagram showing an example of the feature information storage unit 122 according to the embodiment. As shown in FIG. 5, the feature information storage unit 122 has items such as "first query", "extracted second query information", "morphological analysis information", and "feature information".

The "first query" corresponds to the item of the same name shown in FIG. 4. The "extracted second query information" indicates the second queries extracted for the first query because their degree of relevance to it exceeds a predetermined threshold. Although FIG. 5 shows this item conceptually as "B01", in practice it stores information identifying the plurality of extracted second queries. The classification device 100 may set the relevance threshold used for extracting second queries arbitrarily. Alternatively, the classification device 100 may extract a predetermined number (for example, 10 or 100) of second queries for the first query regardless of the degree of relevance.

The "morphological analysis information" indicates the result of morphological analysis of the extracted second queries. Although FIG. 5 shows this item conceptually as "C01", in practice it stores the keywords obtained by morphological analysis of the second queries, together with the number of occurrences of each keyword. The classification device 100 may store only those keywords in the analysis result that satisfy a predetermined condition; for example, for Japanese queries, it may store only nouns as keywords.

The "feature information" indicates the feature information of the first query. Although FIG. 5 shows this item conceptually as "R01", in practice it stores information composed of the keywords obtained by morphological analysis of the second queries and their numbers of occurrences. More specifically, this item stores a word vector in which each keyword is a dimension and the number of occurrences of that keyword is the value along that dimension.

That is, as an example of the information held by the feature information storage unit 122, FIG. 5 shows that the second query information extracted for the first query "AAA" is "B01", the morphological analysis information obtained from those second queries is "C01", and the feature information generated from that information is "R01".

(Classification information storage unit 123)
The classification information storage unit 123 stores the results of the classification process and corresponds to the database DB03 shown in FIG. 1. FIG. 6 is a diagram showing an example of the classification information storage unit 123 according to the embodiment. As shown in FIG. 6, the classification information storage unit 123 has items such as "keyword list ID", "class ID", and "keyword".

The "keyword list ID" indicates identification information identifying a keyword list. The "class ID" indicates identification information identifying a class. In this specification, identification information such as a keyword list ID is also used as the reference sign in the description; for example, the keyword list identified by the keyword list ID "L01" is referred to as "keyword list L01". The "keyword" indicates a keyword to be classified.

That is, as an example of the information held by the classification information storage unit 123, FIG. 6 shows that the keywords included in the keyword list L01, identified by the keyword list ID "L01", have been classified into classes identified by class IDs such as "CL01" and "CL02". FIG. 6 also shows, for example, that the keywords "AAA", "BBB", and "CCC" have been classified into class CL01.

(Control unit 130)
The control unit 130 is a controller and is implemented by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs stored in a storage device inside the classification device 100 (these programs correspond to an example of the classification program), using the RAM as a work area. The control unit 130 may alternatively be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 3, the control unit 130 according to the embodiment includes an acquisition unit 131, a calculation unit 132, an extraction unit 133, a generation unit 134, a reception unit 135, and a classification unit 136, and implements or executes the information processing functions and operations described below. The internal configuration of the control unit 130 is not limited to that shown in FIG. 3; any other configuration that performs the information processing described below may be used. Likewise, the connections between the processing units of the control unit 130 are not limited to those shown in FIG. 3.

(Acquisition unit 131)
The acquisition unit 131 acquires various types of information. For example, it acquires queries sent from users. Specifically, the acquisition unit 131 acquires, as a query, a keyword freely entered by a user for use in search processing on a search site or the like. A query may contain multiple keywords.

The acquisition unit 131 also acquires, for a predetermined aggregation period, the number of users who entered either of two different queries and the number of users who entered both of them. For example, the acquisition unit 131 can determine whether a user who searched for the first query also searched for the second query based on information identifying that user (such as a user ID in the service, a terminal ID uniquely identifying the terminal, or a browser cookie).
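
As a minimal sketch of this counting step, assume the search log is available as `(user_key, query)` pairs, where `user_key` is whatever identifier the service uses (user ID, terminal ID, or cookie). The per-query and per-pair unique-user counts can then be computed as below. The function name `count_users` and the log format are illustrative assumptions, not part of the source.

```python
from collections import defaultdict
from itertools import combinations


def count_users(events):
    """Count unique users per query and per unordered query pair.

    events: iterable of (user_key, query) tuples. user_key is any identifier
    for a user (service user ID, terminal ID, browser cookie, ...).
    Returns (users_per_query, users_per_pair, total_unique_users).
    """
    # Collapse the log into the set of distinct queries each user entered,
    # so repeated searches by the same user are counted once.
    queries_by_user = defaultdict(set)
    for user_key, query in events:
        queries_by_user[user_key].add(query)

    users_per_query = defaultdict(int)  # users who entered query q
    users_per_pair = defaultdict(int)   # users who entered both queries of a pair
    for queries in queries_by_user.values():
        for q in queries:
            users_per_query[q] += 1
        # Sorting gives each unordered pair a single canonical key (a, b) with a < b.
        for a, b in combinations(sorted(queries), 2):
            users_per_pair[(a, b)] += 1

    return dict(users_per_query), dict(users_per_pair), len(queries_by_user)
```

The number of users who entered either query of a pair follows from these counts by inclusion-exclusion: `n_a + n_b - n_ab`.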

The acquisition unit 131 also acquires the total number of searches and the total number of unique users in the predetermined aggregation period. The acquisition unit 131 may acquire this information from a single search site, or it may sum the information acquired from multiple search sites. In that case, it may acquire information on users' search behavior from a predetermined external server that provides the search site to users. The acquisition unit 131 is also not limited to search sites: it may acquire queries sent by users on a predetermined service site (for example, a shopping site or an auction site).

The acquisition unit 131 stores the acquired information in the storage unit 120. It may also retrieve from the storage unit 120, as appropriate, the information needed by each of the processing units described below.

(Calculation unit 132)
The calculation unit 132 calculates the degree of relevance between the queries acquired by the acquisition unit 131. For any two different queries, it calculates their degree of relevance based at least on the number of users who entered either of the two queries and the number of users who entered both. When a query contains multiple keywords, the calculation unit 132 may count, for each keyword, the number of users who entered that keyword.

The calculation unit 132 may also calculate the degree of relevance based on the number of users who entered both of two different queries within a predetermined period. For example, when an aggregation period is set, the calculation unit 132 calculates the degree of relevance based on the number of users who entered both queries during that period. Because the period over which co-searches are aggregated is variable, the calculation unit 132 can choose whether the degree of relevance reflects a relatively long period (such as one year or more) or a relatively short one.
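
The effect of a variable aggregation period can be illustrated with a hypothetical helper, `users_with_both`, that counts a user only when both queries fall inside the window `[start, end]`. The log format `(user_key, query, day)` is an assumption made for this sketch.

```python
from collections import defaultdict
from datetime import date


def users_with_both(events, q1, q2, start, end):
    """Return the users who entered both q1 and q2 within [start, end].

    events: iterable of (user_key, query, day) tuples, day being a datetime.date.
    """
    seen = defaultdict(set)
    for user_key, query, day in events:
        if start <= day <= end:  # ignore searches outside the aggregation period
            seen[user_key].add(query)
    return {u for u, queries in seen.items() if q1 in queries and q2 in queries}


log = [
    ("u1", "AAA", date(2016, 7, 1)),
    ("u1", "BBB", date(2017, 6, 30)),
    ("u2", "AAA", date(2016, 8, 1)),
    ("u2", "BBB", date(2017, 7, 1)),
]
# Aggregation period from FIG. 4: only u1 searched both queries inside it.
print(users_with_both(log, "AAA", "BBB", date(2016, 7, 1), date(2017, 6, 30)))  # {'u1'}
```

Widening the window to cover all of 2016 and 2017 would also count `u2`, while a window covering only 2017 would count nobody, which is how the period setting shifts the relevance toward long-term or short-term behavior.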

The calculation unit 132 may also divide the users who enter queries into groups and calculate the degree of relevance for each group. For example, it may classify users into predetermined groups based on at least one of the input history of predetermined queries, the usage history of predetermined services, or predetermined attributes, and calculate the degree of relevance per group. For instance, the calculation unit 132 may calculate the degree of relevance between queries within a group of users who have searched at least a predetermined number of times for queries related to life stages such as "pregnancy" or "childbirth". Alternatively, it may do so within a group of users who have used a shopping or auction service at least a predetermined number of times, or within a group of users who share an age range and gender. In this way, the calculation unit 132 can calculate query relevance not only from the search behavior of a large, unspecified population of users but also from the search behavior of users in a group with a particular interest.
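
A sketch of one grouping rule mentioned above (users who searched life-stage queries at least a predetermined number of times), assuming `(user_key, query)` logs. The function names and the English query strings are hypothetical. The filtered events can then be fed to the same per-pair counting used for the whole user population.

```python
from collections import Counter


def users_in_segment(events, trigger_queries, min_count):
    """Users who searched any of trigger_queries at least min_count times in total.

    events: iterable of (user_key, query) tuples.
    """
    hits = Counter(user for user, query in events if query in trigger_queries)
    return {user for user, n in hits.items() if n >= min_count}


def events_for_segment(events, segment):
    """Keep only the log entries of users in the segment."""
    return [(user, query) for user, query in events if user in segment]


log = [
    ("u1", "pregnancy"), ("u1", "childbirth"), ("u1", "stroller"),
    ("u2", "pregnancy"), ("u2", "laptop"),
    ("u3", "laptop"),
]
segment = users_in_segment(log, {"pregnancy", "childbirth"}, min_count=2)
segment_log = events_for_segment(log, segment)  # then count query pairs within the group
```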

The calculation unit 132 calculates the degree of relevance between a first query and a second query (two different queries) by substituting the corresponding values into equation (1) above. It then stores the pair of the first and second queries in the relevance information storage unit 121 in association with the calculated degree of relevance.
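
Equation (1) is not reproduced in this section, so the sketch below substitutes lift as a stand-in: the number of users who searched both queries divided by the number expected if the two queries were searched independently. This is an assumption, chosen only because lift uses the same inputs (per-query user counts, co-search count, total users); it should not be read as the patent's formula. Results are stored per directed (first query, second query) pair, mirroring the columns of FIG. 4.

```python
def lift(n_a, n_b, n_ab, n_total):
    """Stand-in for equation (1) (an assumption, not the patent's formula).

    n_a, n_b: users who entered query a / query b
    n_ab:     users who entered both
    n_total:  total unique search users in the aggregation period
    Returns the observed co-search count divided by the count expected if the
    two queries were searched independently (n_a * n_b / n_total).
    """
    if n_a == 0 or n_b == 0:
        return 0.0
    return (n_ab * n_total) / (n_a * n_b)


def relevance_table(users_per_query, users_per_pair, n_total):
    """Map each directed (first_query, second_query) pair to its relevance."""
    table = {}
    for (a, b), n_ab in users_per_pair.items():
        score = lift(users_per_query[a], users_per_query[b], n_ab, n_total)
        # Lift is symmetric, so both directions get the same score.
        table[(a, b)] = score
        table[(b, a)] = score
    return table
```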

(Extraction unit 133)
The extraction unit 133 extracts a plurality of second queries related to the first query based on the degree of relevance between arbitrary queries, for example, the degree of relevance calculated by the calculation unit 132.

Specifically, the extraction unit 133 extracts, from among the second queries for the first query, those whose degree of relevance exceeds a predetermined threshold. The threshold may be set arbitrarily by, for example, the administrator of the classification device 100, or calculated by a statistical method (for example, by using the average of all calculated degrees of relevance as the threshold). Alternatively, the extraction unit 133 may extract a predetermined number of second queries for the first query in descending order of relevance.
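
The two extraction rules described above (a threshold, which defaults here to the mean of all relevance values, or a fixed top-N) can be sketched as follows. The sketch assumes the relevance table is a dict keyed by `(first_query, second_query)`; `extract_second_queries` is a hypothetical name.

```python
from statistics import mean


def extract_second_queries(table, first_query, threshold=None, top_n=None):
    """Select second queries for first_query from a relevance table.

    table: dict mapping (first_query, second_query) -> relevance.
    top_n: if given, return the top_n most relevant second queries.
    threshold: otherwise keep those strictly above it; when None, the mean of
    all relevance values in the table is used as the threshold.
    """
    candidates = {b: s for (a, b), s in table.items() if a == first_query}
    if not candidates:
        return []
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    if top_n is not None:
        return [q for q, _ in ranked[:top_n]]
    if threshold is None:
        threshold = mean(table.values())
    return [q for q, s in ranked if s > threshold]
```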

(Generation unit 134)
The generation unit 134 generates feature information characterizing the first query based on the plurality of second queries extracted by the extraction unit 133.

For example, the generation unit 134 generates the feature information of the first query (the query to which the second queries are related) based on the elements constituting each second query and the number of occurrences of those elements. Specifically, the generation unit 134 may generate, as the feature information of the first query, information combining the keywords constituting the second queries with the number of occurrences of each keyword.

The generation unit 134 may also generate, as the feature information of the first query, a vector in which each element constituting the second queries is a dimension and the number of occurrences of that element is the value along that dimension. Specifically, the generation unit 134 generates a word vector whose dimensions are the keywords constituting the extracted second queries and whose value along each dimension is the number of occurrences of the corresponding keyword.
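
A minimal sketch of building this word vector as a sparse keyword-to-count mapping. Real Japanese queries would need a morphological analyzer (for example MeCab), and the device might keep only nouns. Here a whitespace tokenizer stands in for that step. This is an assumption made to keep the example self-contained.

```python
from collections import Counter


def tokenize(query):
    """Placeholder for morphological analysis: lowercase and split on whitespace."""
    return query.lower().split()


def feature_vector(second_queries, stopwords=frozenset()):
    """Word vector for a first query: keyword -> occurrences across its second queries."""
    counts = Counter()
    for q in second_queries:
        counts.update(t for t in tokenize(q) if t not in stopwords)
    return counts


vec = feature_vector(["cheap flights tokyo", "tokyo hotel", "Tokyo weather"])
print(vec.most_common(1))  # [('tokyo', 3)]
```

Keeping the vector sparse (only keywords that actually occur) avoids allocating a dimension for every keyword in the vocabulary.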

The generation unit 134 stores the first query in the feature information storage unit 122 in association with the generated feature information.

(Reception unit 135)
The reception unit 135 receives various requests. For example, it receives an arbitrary keyword from a predetermined user. Specifically, the reception unit 135 receives an arbitrary keyword, together with a request to classify it, from a user who wants the keyword classified, and passes the received keyword to the classification unit 136. For example, when the reception unit 135 receives a single keyword, the classification unit 136 determines whether that keyword belongs to any of the existing classes.

The reception unit 135 may also receive a keyword list containing any number of keywords. In this case, it passes the received keyword list to the classification unit 136, which classifies each keyword in the list.

(Classification unit 136)
The classification unit 136 classifies the keyword corresponding to the first query based on the feature information generated by the generation unit 134.

For example, when the feature information of the first query is a vector, the classification unit 136 classifies keywords based on the similarity (for example, cosine similarity) between the vectors generated by the generation unit 134.
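
Cosine similarity on such sparse count vectors can be computed directly from the dictionaries, without building a dense vector over the whole vocabulary. The sketch assumes vectors are `dict`s mapping keyword to count. Returning 0.0 for an empty vector is a choice made here, not something the source specifies.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {keyword: count} dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an empty vector as dissimilar to everything
    return dot / (norm_u * norm_v)


print(cosine({"tokyo": 3, "hotel": 4}, {"tokyo": 4, "hotel": 3}))  # 0.96
```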

When the reception unit 135 receives an arbitrary keyword, the classification unit 136 classifies that keyword based on its corresponding feature information. For example, when keywords already belonging to existing classes are present, the classification unit 136 determines which class the received keyword belongs to based on the similarity between the feature information of those keywords and the feature information of the received keyword.
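
The source does not say how similarity to a class is aggregated when the class has several members. The sketch below assumes the class score is the mean cosine similarity to the class's members, and it returns `None` when no class exceeds `min_similarity`. Both choices, and the names `assign_to_class` and `min_similarity`, are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse {keyword: count} vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def assign_to_class(vector, classes, features, min_similarity=0.0):
    """Return the class ID whose members are most similar to vector, or None.

    classes:  class_id -> list of member keywords (e.g. {"CL01": ["AAA", "BBB"]})
    features: keyword -> sparse feature vector
    Class score = mean cosine similarity to the members that have feature vectors.
    """
    best_id, best_score = None, min_similarity
    for class_id, members in classes.items():
        sims = [cosine(vector, features[m]) for m in members if m in features]
        if not sims:
            continue
        score = sum(sims) / len(sims)
        if score > best_score:
            best_id, best_score = class_id, score
    return best_id
```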

When the reception unit 135 receives a keyword list containing multiple keywords, the classification unit 136 classifies each keyword in the list based on that keyword's feature information. This allows the classification unit 136 to classify the keywords a user wants classified appropriately, regardless of their meanings or categories.

The classification unit 136 stores each keyword in the classification information storage unit 123 in association with the class into which it was classified, and sends the classification result to the user.

When the classification unit 136 has classified a keyword list received from a user, it may also provide the user with a graph or other visualization of the result. For example, it may plot the keywords in a distribution map and draw keywords classified into the same class (group) in the same color, so that the user can grasp the relationships between keywords at a glance.

[4. Processing procedure]
Next, the processing procedure of the classification device 100 according to the embodiment will be described with reference to FIGS. 7 and 8. First, the procedure for generating feature information will be described with reference to FIG. 7. FIG. 7 is a flowchart (1) showing a processing procedure of the classification device 100 according to the embodiment.

As shown in FIG. 7, the classification device 100 determines whether it has acquired a query sent from a user (step S101). If no query has been acquired (step S101; No), the classification device 100 waits until one is acquired.

If a query has been acquired (step S101; Yes), the classification device 100 selects, from the acquired queries, any first query to be processed (step S102). The classification device 100 then aggregates the second queries searched by the users who searched for the first query (step S103).

The classification device 100 then calculates the degree of relevance between the first query and each second query using, for example, equation (1) above (step S104). It then determines whether the degrees of relevance have been calculated for all search queries (step S105). If not (step S105; No), the classification device 100 repeats steps S102 to S104.

If the degrees of relevance have been calculated for all queries (step S105; Yes), the classification device 100 extracts, for any first query, a plurality of second queries whose degree of relevance exceeds a predetermined threshold (step S106). It then performs morphological analysis on the extracted second queries (step S107).

Next, the classification device 100 generates the feature information of the first query based on the morphemes and the number of occurrences of each morpheme (step S108), and stores the generated feature information in the storage unit 120 (step S109).

Next, the procedure of the classification process according to the embodiment will be described with reference to FIG. 8. FIG. 8 is a flowchart (2) showing a processing procedure of the classification device 100 according to the embodiment.

As shown in FIG. 8, the classification device 100 determines whether it has received a keyword list from a user (step S201). If not (step S201; No), it waits until one is received.

If a keyword list has been received (step S201; Yes), the classification device 100 identifies the feature information corresponding to each keyword in the list, for example, by referring to the feature information storage unit 122 (step S202).

The classification device 100 then calculates the similarity between the feature information of the keywords (step S203) and classifies the keywords based on the calculated similarities (step S204). It stores the classification result in the storage unit 120 (step S205) and sends the result to the user who sent the keyword list (step S206).
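
The source leaves the grouping algorithm of step S204 open. As one possible stand-in, the sketch links any two keywords whose cosine similarity reaches a threshold and treats the connected components as classes (single-linkage clustering, implemented with union-find). Classes are numbered in the `CL01`, `CL02`, ... style of FIG. 6. The threshold value, the function name, and the choice of single-linkage are assumptions. Keywords without stored feature information are skipped.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse {keyword: count} vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def classify_keyword_list(keywords, features, threshold=0.5):
    """Group keywords whose feature vectors are similar (single-linkage)."""
    # S202: keep keywords that have stored feature information.
    known = [k for k in keywords if k in features]
    parent = {k: k for k in known}

    def find(k):
        # Union-find root lookup with path halving.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    # S203/S204: link every pair whose similarity reaches the threshold.
    for a, b in combinations(known, 2):
        if cosine(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for k in known:
        groups.setdefault(find(k), []).append(k)
    # Number classes CL01, CL02, ... in order of first appearance.
    return {f"CL{i:02d}": members for i, members in enumerate(groups.values(), start=1)}
```

Single-linkage keeps the sketch short but can chain loosely related keywords into one class; another clustering method, such as hierarchical clustering with average linkage, could be substituted.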

[5. Modifications]
The classification system 1 according to the embodiment described above may be implemented in various forms other than that embodiment. Other embodiments of the devices included in the classification system 1 are therefore described below.

[5-1. Generating a keyword list]
In the embodiment above, the classification device 100 receives a keyword list from a user. Alternatively, the classification device 100 may generate a keyword list based on a single keyword received from the user and classify the keywords included in the generated list.

For example, when the classification device 100 receives an arbitrary keyword from a given user, it extracts a plurality of keywords whose relevance to that keyword exceeds a predetermined threshold. It then generates a keyword list containing the received keyword and the extracted keywords and classifies each keyword in the generated list. This process is described below with reference to FIG. 9, following the processing flow. FIG. 9 is a flowchart showing the processing procedure performed by the classification device according to the modification.

As shown in FIG. 9, the classification device 100 determines whether a keyword has been received from a user (step S301). If no keyword has been received (step S301; No), the classification device 100 waits until one is received.

When a keyword has been received (step S301; Yes), the classification device 100 refers to, for example, the relevance information storage unit 121 and extracts the keywords whose relevance to the received keyword exceeds a predetermined threshold (step S302).

The classification device 100 then generates a keyword list containing the received keyword and the keywords extracted in step S302 (step S303). It classifies the keywords in the generated list, for example following the flow shown in FIG. 8 (step S304).
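Steps S302 and S303 might look like the sketch below. It assumes the relevance information storage unit 121 can be read as a mapping from query pairs to relevance scores, with the pairs treated as unordered. The function name `build_keyword_list` and the data layout are hypothetical.

```python
def build_keyword_list(seed, relevance, threshold):
    """Steps S302-S303: the seed keyword plus every keyword whose relevance to it exceeds threshold.

    `relevance` is a hypothetical {(query_a, query_b): score} mapping standing in for
    the relevance information storage unit 121; pair order is ignored.
    """
    related = set()
    for (a, b), score in relevance.items():
        if score <= threshold:
            continue  # the score must strictly exceed the threshold
        if a == seed and b != seed:
            related.add(b)
        elif b == seed and a != seed:
            related.add(a)
    # The generated list holds the received keyword followed by the extracted ones.
    return [seed] + sorted(related)
```

The resulting list can then be passed to the same classification flow as a user-supplied keyword list (step S304).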

In this way, the classification device 100 may generate a keyword list from a keyword received from the user and classify the keywords included in the generated list. The user therefore does not need to compile a list. By sending a single keyword of interest to the classification device 100, the user obtains classification results for the group of keywords related to it. In other words, for the one keyword entered, the user receives a set of related keywords that has already been grouped to some extent. A user engaged in marketing, for example, can thus learn what needs exist around the entered keyword and which other keywords interest users who are interested in it.

[5-2. Morphological analysis of queries]
The embodiment above describes an example in which the classification device 100 calculates the relevance between queries based on the number of users who entered both the first query and the second query. As noted above, however, a query may consist not only of a single keyword but also of several keywords or a sentence. The classification device 100 may therefore perform morphological analysis on a query sent from a user and treat the keywords obtained from the analysis as first or second queries. In this case, the classification device 100 may use a known technique to extract the nouns and proper nouns contained in the query and use only the extracted keywords in the processing.
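The noun-filtering step can be sketched as below. The morphological analyzer itself is out of scope. The sketch assumes its output is a list of `(surface, part_of_speech)` pairs. The tag names in `KEEP_POS` are hypothetical, and real analyzers use their own tag sets.

```python
# Part-of-speech tags to keep; these names are assumptions and depend on the analyzer.
KEEP_POS = {"noun", "proper_noun"}


def query_keywords(tagged_tokens):
    """Keep the nouns and proper nouns of a morphologically analyzed query.

    `tagged_tokens` is assumed to be a list of (surface, pos) pairs produced by an
    external morphological analyzer. Order is preserved and duplicates are dropped.
    """
    keywords = []
    for surface, pos in tagged_tokens:
        if pos in KEEP_POS and surface not in keywords:
            keywords.append(surface)
    return keywords
```

Each returned keyword could then be treated as a separate first or second query when relevance is aggregated.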

[5-3. Search behavior]
When determining whether a user searched for both the first query and the second query, the classification device 100 does not have to require that both searches took place within the aggregation period. That is, the classification device 100 may set two separate periods. One is used to decide whether a user searched for both queries. The other is the period over which the number of users who searched for each query is aggregated. For example, when the same user sends the first query and the second query within 24 hours, the classification device 100 may treat that user as having searched for both. Likewise, the same user may send the first query and the second query within one session (for example, the series of actions from accessing a given search site until the access ends). In that case, too, the classification device 100 may treat the user as having searched for both. In this way, the classification device 100 can handle users' search behavior flexibly when performing the various kinds of processing.
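The separation between the aggregation period and the co-search window can be illustrated with the sketch below. The log format `(user_id, query, timestamp)`, the function name and the default 24-hour window are assumptions made for the example.

```python
from datetime import datetime, timedelta


def co_search_users(log, q1, q2, period_start, period_end, window=timedelta(hours=24)):
    """Users who searched both q1 and q2 within `window` of each other.

    `log` is a hypothetical list of (user_id, query, timestamp) tuples. Only events
    inside the aggregation period [period_start, period_end) are considered; the
    co-search window is configured independently of that period.
    """
    times = {}  # (user, query) -> timestamps of that user's searches for that query
    for user, query, ts in log:
        if query in (q1, q2) and period_start <= ts < period_end:
            times.setdefault((user, query), []).append(ts)

    users = set()
    for (user, query), stamps in times.items():
        if query != q1:
            continue
        other = times.get((user, q2), [])
        # The user counts if some q1 search and some q2 search are close enough in time.
        if any(abs(t1 - t2) <= window for t1 in stamps for t2 in other):
            users.add(user)
    return users
```

The session-based variant would follow the same pattern: the tuples would carry a session identifier, and matching would be on `(user, session)` rather than on a time difference.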

[5-4. Search site]
The embodiment above describes an example in which the search site is provided by the classification device 100. The search site may, however, be provided by a separate external server (for example, a web server that provides a search service). In that case, the classification device 100 may acquire, via the external server, the queries, user information and other data that users sent to the search site.

[6. Others]
Of the processes described in the embodiment above, all or part of a process described as being performed automatically may instead be performed manually. Conversely, all or part of a process described as being performed manually may instead be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the text and drawings may be changed as desired unless otherwise specified.

Each component of each illustrated device is a functional concept and need not be physically configured as illustrated. That is, the specific form in which each device is distributed or integrated is not limited to that shown in the drawings. All or part of each device may be functionally or physically distributed or integrated in any unit according to load, usage conditions and other factors.

For example, the relevance information storage unit 121, the feature information storage unit 122 and the classification information storage unit 123 shown in FIG. 3 may be held by an external storage server or the like rather than by the classification device 100. In this case, the classification device 100 obtains the relevance information, feature information and other data by accessing the storage server.

As another example, the classification device 100 described above may be split between a front-end server and a back-end server. The front-end server mainly handles communication with external devices, such as acquiring queries from the user terminals 10 and sending classification results to users. The back-end server performs processing such as classifying queries based on feature information.

[7. Hardware configuration]
The classification device 100 and the user terminal 10 according to the embodiment described above are implemented by, for example, a computer 1000 configured as shown in FIG. 10. The classification device 100 is used as the example below. FIG. 10 is a hardware configuration diagram showing an example of the computer 1000 that implements the functions of the classification device 100. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600 and a media interface (I/F) 1700.

The CPU 1100 operates according to programs stored in the ROM 1300 or the HDD 1400 and controls each unit. The ROM 1300 stores the boot program executed by the CPU 1100 when the computer 1000 starts up. It also stores programs that depend on the hardware of the computer 1000, among other things.

The HDD 1400 stores the programs executed by the CPU 1100 and the data used by those programs. The communication interface 1500 receives data from other devices via a communication network 500 (corresponding to the network N shown in FIG. 2) and passes it to the CPU 1100. It also sends data generated by the CPU 1100 to other devices via the communication network 500.

The CPU 1100 uses the input/output interface 1600 to control output devices, such as a display or printer, and input devices, such as a keyboard or mouse. The CPU 1100 acquires data from the input devices through the input/output interface 1600. It outputs the data it generates to the output devices through the same interface.

The media interface 1700 reads a program or data stored on a recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 into the RAM 1200 via the media interface 1700 and executes it. The recording medium 1800 may be any of the following, for example:
- an optical recording medium such as a DVD (Digital Versatile Disc) or PD (Phase change rewritable Disk);
- a magneto-optical recording medium such as an MO (Magneto-Optical disk);
- a tape medium;
- a magnetic recording medium;
- a semiconductor memory.

For example, when the computer 1000 functions as the classification device 100, its CPU 1100 implements the functions of the control unit 130 by executing the program loaded into the RAM 1200. The HDD 1400 stores the data held in the storage unit 120. The CPU 1100 of the computer 1000 reads these programs from the recording medium 1800 and executes them. As an alternative, it may acquire the programs from another device via the communication network 500.

[8. Effects]
As described above, the classification device 100 according to the embodiment includes an extraction unit 133, a generation unit 134 and a classification unit 136. The extraction unit 133 extracts a plurality of second queries related to a first query based on the relevance between arbitrary queries. The generation unit 134 generates feature information characterizing the first query based on the second queries extracted by the extraction unit 133. The classification unit 136 classifies the keyword corresponding to the first query based on the feature information generated by the generation unit 134.

Thus, the classification device 100 according to the embodiment does not classify queries by qualitative information, such as the meaning or nature of the queries users send. Instead, it uses quantitative information, namely users' actual search behavior. In other words, the classification device 100 groups related keywords without first assigning each query to a category. Because the classification reflects users' hobbies, preferences and interests, keywords that appear unrelated in meaning can still be classified into the same group (class). Since the classification device 100 can classify the wide variety of queries that users send, it can derive relationships among diverse queries.

The generation unit 134 generates the feature information of the first query, to which the second queries are related. It does so based on the elements that make up each second query and the number of times each element appears.

The classification device 100 according to the embodiment represents the characteristics of the first query by the elements of the second queries (for example, the keywords they contain) and by the number of times each element appears. This lets it capture those characteristics in detail. The classification device 100 can therefore classify the keyword corresponding to the first query appropriately.
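Generating the feature information from the extracted second queries amounts to counting elements. The sketch below assumes that elements are whitespace-separated keywords. The patent leaves the segmentation open, and the morphological analysis described in 5-2 would be one alternative. The name `feature_info` is hypothetical.

```python
from collections import Counter


def feature_info(second_queries):
    """Count how often each element appears across the extracted second queries.

    Elements are assumed to be whitespace-separated keywords.
    """
    counts = Counter()
    for query in second_queries:
        counts.update(query.split())
    return counts
```

For the made-up second queries `["baby stroller", "stroller reviews", "baby clothes"]`, both "baby" and "stroller" receive a count of 2. "reviews" and "clothes" each receive a count of 1.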

As the feature information of the first query, the generation unit 134 generates a vector. Each element making up the second queries is one dimension of the vector, and the number of occurrences of that element is the value along that dimension. The classification unit 136 classifies keywords based on the similarity between the vectors generated by the generation unit 134.

By expressing the elements of the second queries and their occurrence counts as vectors, the classification device 100 according to the embodiment can easily calculate the similarity between pieces of feature information.
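The vector representation can be made concrete as follows. The union of all elements defines the dimensions, each keyword's counts become a dense vector over those dimensions, and any vector similarity can then be applied. The sketch uses cosine similarity, which is an assumption on our part because the patent does not name a measure. The function names are hypothetical.

```python
import math


def to_vectors(features):
    """Align count-based feature information on one shared set of dimensions.

    `features` maps each keyword to {element: occurrence count}. Each distinct
    element becomes one dimension, and its count is the value on that dimension.
    """
    dims = sorted({e for counts in features.values() for e in counts})
    vectors = {kw: [counts.get(e, 0) for e in dims] for kw, counts in features.items()}
    return dims, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

For example, `{"stroller": {"baby": 3, "car seat": 2}, "diapers": {"baby": 4, "wipes": 1}}` yields the dimensions `["baby", "car seat", "wipes"]` and the vectors `[3, 2, 0]` and `[4, 0, 1]`.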

The classification device 100 according to the embodiment further includes a calculation unit 132 that calculates the relevance between arbitrary queries. The calculation is based at least on two counts: the number of users who entered either of two different queries, and the number of users who entered both of them. The extraction unit 133 extracts the second queries related to the first query based on the relevance calculated by the calculation unit 132.
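The paragraph names the inputs to the relevance calculation but not the formula. One plausible choice, assumed here purely for illustration, is the Jaccard index: the number of users who entered both queries divided by the number who entered either. The function name and data layout are hypothetical.

```python
def relevance(users_by_query, q1, q2):
    """Assumed relevance: |users who entered both| / |users who entered either| (Jaccard index).

    `users_by_query` maps each query to the set of user IDs that entered it.
    """
    a = users_by_query.get(q1, set())
    b = users_by_query.get(q2, set())
    either = a | b
    if not either:
        return 0.0
    return len(a & b) / len(either)
```

Normalizing by the "either" count keeps very popular queries from appearing related to everything merely because many users entered them.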

Because the classification device 100 according to the embodiment calculates relevance from users' search behavior logs, the relevance reflects users' actual interests rather than the meaning or category of the queries themselves. For example, the device can assign a high relevance to queries searched by users who behave similarly, so that the calculated relevance accurately reflects user behavior.

The calculation unit 132 calculates the relevance based on the number of users who entered both of two different queries within a predetermined period.

In this way, the classification device 100 according to the embodiment sets a predetermined period, captures the series of search actions that users perform within it, and calculates the relevance. Suppose, for example, that the period spans several years. The classification device 100 can then extract queries tied to changes in a user's life stage, such as pregnancy, childbirth and baby products, as related queries. The classification device 100 can thus perform classification that reflects a user's series of actions.

The classification device 100 according to the embodiment further includes a reception unit 135 that receives an arbitrary keyword from a given user. The classification unit 136 classifies the keyword received by the reception unit 135 based on the feature information corresponding to that keyword.

By classifying keywords received from users, the classification device 100 according to the embodiment can provide two kinds of information about a keyword the user cares about. The first is what interests the users who search for it have. The second is which other keywords it is related to.

The reception unit 135 receives a keyword list containing any plurality of keywords. The classification unit 136 classifies each keyword in the keyword list received by the reception unit 135 based on that keyword's feature information.

By classifying the keywords in a keyword list, the classification device 100 according to the embodiment can tell the user which keywords tend to be searched for by the same users.

When the reception unit 135 receives an arbitrary keyword from a given user, it extracts a plurality of keywords whose relevance to that keyword exceeds a predetermined threshold. It then generates a keyword list containing the received keyword and the extracted keywords. The classification unit 136 classifies each keyword in the keyword list generated by the reception unit 135.

In this way, the classification device 100 according to the embodiment may generate a keyword list from a keyword received from a user and classify the keywords in the generated list. The user does not have to compile a keyword list, yet still receives classification results for the keywords related to the received one.

Several embodiments of the present application have been described in detail above with reference to the drawings, but they are examples only. The present invention can also be carried out in other forms that include various modifications and improvements based on the knowledge of those skilled in the art. These include the aspects described in the disclosure-of-the-invention section.

The configuration of the classification device 100 described above can be changed flexibly. It may be implemented by a plurality of server computers. Depending on the function, it may also be implemented by calling an external platform or the like through an API (Application Programming Interface), network computing or similar means.

The term "unit" (section, module, unit) used in the claims can be read as "means", "circuit" or the like. For example, an acquisition unit can be read as an acquisition means or an acquisition circuit.

1 Classification system
10 User terminal
100 Classification device
110 Communication unit
120 Storage unit
121 Relevance information storage unit
122 Feature information storage unit
123 Classification information storage unit
130 Control unit
131 Acquisition unit
132 Calculation unit
133 Extraction unit
134 Generation unit
135 Reception unit
136 Classification unit

Claims

A classification device comprising:
an extraction unit that extracts a plurality of second queries related to a first query based on the degree of relevance between arbitrary queries;
a generation unit that generates feature information characterizing the first query based on the plurality of second queries extracted by the extraction unit; and
a classification unit that classifies a keyword corresponding to the first query based on the feature information generated by the generation unit,
wherein the generation unit
generates the feature information of the first query, to which the plurality of second queries are related, based on the elements constituting each of the plurality of second queries and the number of occurrences of those elements.

The classification device according to claim 1, wherein
the generation unit generates, as the feature information of the first query, a vector in which each element constituting the plurality of second queries is a dimension and the number of occurrences of that element is the value along that dimension, and
the classification unit classifies the keyword based on the similarity between the vectors generated by the generation unit.

The classification device according to claim 1 or 2, further comprising
a calculation unit that calculates the degree of relevance between arbitrary queries based at least on the number of users who entered either of two different queries among the arbitrary queries and the number of users who entered both of the two queries,
wherein the extraction unit extracts the plurality of second queries related to the first query based on the degree of relevance calculated by the calculation unit.

The classification device according to claim 3, wherein
the calculation unit calculates the degree of relevance based on the number of users who entered both of the two different queries within a predetermined period.

The classification device according to any one of claims 1 to 4, further comprising
a reception unit that receives an arbitrary keyword from a predetermined user,
wherein the classification unit classifies the keyword received by the reception unit based on the feature information corresponding to that keyword.

The classification device according to claim 5, wherein
the reception unit receives a keyword list containing an arbitrary plurality of keywords, and
the classification unit classifies each keyword included in the keyword list received by the reception unit based on the feature information of that keyword.

The classification device according to claim 5, wherein
the reception unit, upon receiving input of an arbitrary keyword from the predetermined user, extracts a plurality of keywords whose degree of relevance to the arbitrary keyword exceeds a predetermined threshold and generates a keyword list containing the arbitrary keyword and the extracted plurality of keywords, and
the classification unit classifies each keyword included in the keyword list generated by the reception unit.

A classification method executed by a computer, the method comprising:
an extraction step of extracting a plurality of second queries related to a first query based on the degree of relevance between arbitrary queries;
a generation step of generating feature information characterizing the first query based on the plurality of second queries extracted in the extraction step; and
a classification step of classifying a keyword corresponding to the first query based on the feature information generated in the generation step,
wherein the generation step
generates the feature information of the first query, to which the plurality of second queries are related, based on the elements constituting each of the plurality of second queries and the number of occurrences of those elements.

A classification program that causes a computer to execute:
an extraction procedure of extracting a plurality of second queries related to a first query based on the degree of relevance between arbitrary queries;
a generation procedure of generating feature information characterizing the first query based on the plurality of second queries extracted by the extraction procedure; and
a classification procedure of classifying a keyword corresponding to the first query based on the feature information generated by the generation procedure,
wherein the generation procedure
generates the feature information of the first query, to which the plurality of second queries are related, based on the elements constituting each of the plurality of second queries and the number of occurrences of those elements.