JP6590880B2

JP6590880B2 - Extraction apparatus, extraction method, and extraction program

Info

Publication number: JP6590880B2
Application number: JP2017168937A
Authority: JP
Inventors: 嘉人西川
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2017-09-01
Filing date: 2017-09-01
Publication date: 2019-10-16
Anticipated expiration: 2037-09-01
Also published as: JP2019046189A

Description

The present invention relates to an extraction device, an extraction method, and an extraction program.

Conventionally, techniques that make use of search queries have been proposed for setting appropriate keywords for search-linked advertisements and content-linked advertisements. For example, a technique has been proposed in which a plurality of groups of co-occurrence queries are acquired based on a search query, and a co-occurrence query common to the groups is extracted from the acquired groups as a common query.

Japanese Unexamined Patent Application Publication No. 2012-113486 (JP 2012-113486 A)

However, the conventional technique described above cannot necessarily extract the characteristics of a specific user group with high accuracy. Specifically, the conventional technique merely acquires a plurality of groups of co-occurrence queries based on a search query and extracts, from the acquired groups, a co-occurrence query common to the groups as a common query. It therefore cannot necessarily extract the characteristics of a specific user group with high accuracy.

The present application has been made in view of the above, and an object thereof is to provide an extraction device, an extraction method, and an extraction program that can extract the characteristics of a specific user group with high accuracy.

The extraction device according to the present application includes: an acquisition unit that acquires a first feature amount of a first user of a first group and a second feature amount, which is a feature amount of a second user of a second group and differs from the first feature amount; and an extraction unit that extracts a third feature amount of the first user of the first group, the third feature amount being obtained based on the first feature amount and the second feature amount acquired by the acquisition unit.

According to one aspect of the embodiment, the characteristics of a specific user group can be extracted with high accuracy.

FIG. 1 is a diagram illustrating an example of extraction processing according to the embodiment.
FIG. 2 is a diagram illustrating a configuration example of the extraction device according to the embodiment.
FIG. 3 is a diagram illustrating an example of a search query storage unit according to the embodiment.
FIG. 4 is a diagram illustrating an example of determination processing according to a modification.
FIG. 5 is a flowchart illustrating an extraction processing procedure according to the embodiment.
FIG. 6 is a hardware configuration diagram illustrating an example of a computer that realizes the functions of the extraction device.

Hereinafter, modes for carrying out the extraction device, the extraction method, and the extraction program according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. Note that the extraction device, the extraction method, and the extraction program according to the present application are not limited by these embodiments. In the following embodiments, the same parts are denoted by the same reference numerals, and redundant description is omitted.

[1. Example of extraction processing]
First, an example of extraction processing according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram illustrating an example of extraction processing according to the embodiment. FIG. 1 shows an example in which the extraction device 100 acquires, as the first feature amount of the first user of the first group, the weight of the appearance probability in the first group of a search query input by the first user of the first group (hereinafter referred to as a "first search query" as appropriate). It also shows an example in which the extraction device 100 acquires, as the second feature amount, which is a feature amount of the second user of the second group and differs from the first feature amount, the weight of the appearance probability in the second group of a search query input by the second user of the second group (hereinafter referred to as a "second search query" as appropriate). Finally, it shows an example in which the extraction device 100 takes the difference between the weight of the appearance probability of the first search query in the first group and the weight of the appearance probability of the second search query in the second group, and thereby extracts the difference in the weights of the appearance probabilities of each search query as the third feature amount of the first user of the first group.

As shown in FIG. 1, the extraction system 1 includes a search server 20, an advertisement distribution server 30, and an extraction device 100. These devices are communicably connected by wire or wirelessly via a network N (for example, the Internet). Note that the extraction system 1 shown in FIG. 1 may include a plurality of search servers 20 and a plurality of advertisement distribution servers 30. In the present embodiment, the search server 20, the advertisement distribution server 30, and the extraction device 100 are assumed to be managed by a business operator T1.

The search server 20 is an information processing device that accepts a search query input from a user and returns search results for content such as web pages. The search server 20 also stores information on the search queries input by users and on the search dates and times in a search history storage unit. In response to a request from the extraction device 100 described later, the search server 20 transmits the information on the users' search queries stored in the search history storage unit to the extraction device 100.

The advertisement distribution server 30 is an information processing device that distributes the advertisement content selected for distribution by the extraction device 100 described later to the distribution destination determined by the extraction device 100. The advertisement distribution server 30 also stores information on advertisement content and on the keywords set for the advertisement content in an advertisement information storage unit. In response to a request from the extraction device 100 described later, the advertisement distribution server 30 transmits the information on the keywords set for the advertisement content stored in the advertisement information storage unit to the extraction device 100.

The extraction device 100 is an information processing device that takes the difference between the weight of the appearance probability of a search query of a specific user group within that user group and the weight of the appearance probability of a search query of another user group within that other user group, and thereby extracts the difference in the weights of the appearance probabilities of the search queries of the specific user group. The extraction device 100 sends the search server 20 a request to transmit the search queries of users belonging to the specific user group and acquires those search queries from the search server 20. The extraction device 100 also sends the advertisement distribution server 30 a request to transmit information on the keywords set for advertisement content and acquires that information from the advertisement distribution server 30.

An example of the extraction processing is described below with reference to FIG. 1. In the example shown in FIG. 1, the extraction device 100 acquires the first search queries from the search server 20. The extraction device 100 acquires a search query Q1, a search query Q2, a search query Q3, a search query Q4, a search query Q5, and so on as the first search queries (step S1).

Subsequently, the extraction device 100 calculates the weight of the appearance probability of each first search query in the first group. The extraction device 100 inputs each first search query to a naive Bayes classifier and calculates a weight indicating whether the query has a high probability of being searched only by the first users of the first group (step S2). The extraction device 100 then acquires the weight of the appearance probability of each first search query in the first group as the first feature amount of the first user of the first group. For example, if the first group contains 100 first users and 90 of the 100 input the search query Q1, the appearance probability of the search query Q1 in the first group is 90%. The extraction device 100 then acquires 0.9 as the weight of the appearance probability of the search query Q1 in the first group.
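The worked example above (90 of 100 users giving a weight of 0.9) can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the description states that the queries are input to a naive Bayes classifier but gives no details of it, so the sketch assumes the weight is simply the fraction of users in the group who entered the query. The function name `appearance_weights` and the `(user_id, query)` log format are hypothetical.

```python
from collections import defaultdict


def appearance_weights(query_log, group_user_ids):
    """Return {query: fraction of the group's users who entered it}.

    query_log: iterable of (user_id, query) pairs (hypothetical format).
    group_user_ids: iterable of user IDs belonging to one group.
    """
    group = set(group_user_ids)
    if not group:
        return {}
    # Collect the distinct group members who entered each query,
    # so a user who repeats a query is counted only once.
    users_by_query = defaultdict(set)
    for user_id, query in query_log:
        if user_id in group:
            users_by_query[query].add(user_id)
    return {q: len(users) / len(group) for q, users in users_by_query.items()}


# 100 users in the first group, 90 of whom entered Q1 (values from the text).
first_group = [f"U{i}" for i in range(100)]
log = [(f"U{i}", "Q1") for i in range(90)]
weights = appearance_weights(log, first_group)
```

Counting distinct users rather than raw query counts is a design choice of this sketch; it matches the "90 out of 100 users" wording of the example.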

Subsequently, the extraction device 100 acquires the second search queries from the search server 20. The extraction device 100 acquires the search query Q1, the search query Q2, the search query Q3, the search query Q4, the search query Q5, and so on as the second search queries (step S3).

Subsequently, the extraction device 100 calculates the weight of the appearance probability of each second search query in the second group. The extraction device 100 inputs each second search query to the naive Bayes classifier and calculates a weight indicating whether the query has a high probability of being searched only by the second users of the second group (step S4). The extraction device 100 then acquires the weight of the appearance probability of each second search query in the second group as the second feature amount of the second user of the second group. For example, if the second group contains 1000 second users and 100 of the 1000 input the search query Q1, the appearance probability of the search query Q1 in the second group is 10%. The extraction device 100 then acquires 0.1 as the weight of the appearance probability of the search query Q1 in the second group.

Subsequently, the extraction device 100 subtracts the weight of the appearance probability of each search query in the second group from the weight of the appearance probability of that search query in the first group (step S5). For example, the extraction device 100 subtracts the weight 0.1 of the appearance probability of the search query Q1 in the second group from the weight 0.9 of the appearance probability of the search query Q1 in the first group.

Subsequently, the extraction device 100 extracts the difference in the weights of the appearance probabilities of each search query as the third feature amount of the first user of the first group (step S6). For example, by subtracting the weight 0.1 of the appearance probability of the search query Q1 in the second group from the weight 0.9 of the appearance probability of the search query Q1 in the first group, the extraction device 100 extracts 0.8 as the difference in the weights of the appearance probabilities of the search query Q1. Here, a large difference in the weights of the appearance probabilities of a search query means that the query has a high probability of being searched only by the first users of the first group. Accordingly, the larger the difference for a search query, the more likely it is to be searched only by the first users of the first group, so the extraction device 100 determines that such a search query is unique to the first users of the first group. The extraction device 100 then extracts the search queries unique to the first users of the first group as third search queries.
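Steps S5 and S6 can be sketched as a per-query subtraction followed by a selection of the queries with the largest differences. The following is only an illustration under stated assumptions: the description does not say how "large" a difference must be, so the `threshold` parameter, the function names, and the Q2 weights are hypothetical, and a query missing from one group is assumed to have weight 0.0 there. Only the Q1 values (0.9 and 0.1) come from the text.

```python
def weight_differences(first_weights, second_weights):
    """Subtract second-group weights from first-group weights per query."""
    queries = set(first_weights) | set(second_weights)
    # Assumption: a query absent from a group has weight 0.0 in that group.
    return {
        q: first_weights.get(q, 0.0) - second_weights.get(q, 0.0)
        for q in queries
    }


def characteristic_queries(first_weights, second_weights, threshold=0.5):
    """Return (query, difference) pairs at or above `threshold`, largest first."""
    diffs = weight_differences(first_weights, second_weights)
    selected = [(q, d) for q, d in diffs.items() if d >= threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Q1 uses the weights from the text; Q2 is a made-up contrast case.
first = {"Q1": 0.9, "Q2": 0.4}
second = {"Q1": 0.1, "Q2": 0.5}
third_queries = characteristic_queries(first, second)
```

With these inputs only Q1 is selected, with a difference of approximately 0.8 (subject to floating-point rounding), while Q2 is rejected because it is slightly more common in the second group.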

Subsequently, the extraction device 100 determines the distribution destination of advertisement content based on the extracted third search queries of the first users of the first group (step S7).

The extraction device 100 also determines advertisement content related to the extracted third search queries of the first users of the first group as the advertisement content to be distributed. In this way, the extraction device 100 determines the advertisement content to be distributed based on the extracted third search queries of the first users of the first group (step S8).

Subsequently, the extraction device 100 transmits the distribution destination and the advertisement content, both determined based on the third search queries of the first users of the first group, to the advertisement distribution server 30 (step S9).

As described above, the extraction device 100 acquires, as the first feature amount of the first user of the first group, the weight of the appearance probability in the first group of the search queries input by the first users of the first group (the first search queries). The extraction device 100 also acquires, as the second feature amount, which is a feature amount of the second user of the second group and differs from the first feature amount, the weight of the appearance probability in the second group of the search queries input by the second users of the second group (the second search queries). The extraction device 100 then subtracts the weight of the appearance probability of each search query in the second group from the weight of the appearance probability of that search query in the first group, and thereby extracts the difference in the weights of the appearance probabilities of each search query as the third feature amount of the first user of the first group. Finally, the extraction device 100 extracts the search queries with a large difference in the weights of their appearance probabilities as the third search queries.

In this way, by extracting the difference in the weights of the appearance probabilities of each search query, the extraction device 100 can extract third search queries that have a high probability of being searched only by the first users of the first group. The extraction device 100 can thus extract, with high accuracy, the third search queries, which are search queries unique to the first users of the first group. Because the third search queries are unique to the first users of the first group, they are information that reflects the characteristics of those users. Therefore, the extraction device 100 can extract the characteristics of a specific user group with high accuracy.

Note that the extraction system 1 of the example shown in FIG. 1 can be applied to a service based on the following premise. For example, suppose there is a business operator X that provides an outsourced CRM (Customer Relationship Management) service and a business operator Y that wishes to acquire high-value customers. In this case, the business operator X applies the extraction system 1 based on a customer list of the high-value customers of the business operator Y and a customer list of the existing customers of the business operator Y. Specifically, by subtracting the weight of the appearance probability of each search query among the existing customers of the business operator Y from the weight of the appearance probability of that search query among the high-value customers of the business operator Y, search queries unique to the high-value customers of the business operator Y are extracted as third search queries. The business operator X then determines, as the distribution destination of advertisement content, those existing customers of the business operator Y who have input search queries similar to the third search queries of the high-value customers, and proposes this destination to the business operator Y. Based on the third search queries of the high-value customers of the business operator Y, the business operator X also determines the advertisement content to be distributed to the users who have input such similar search queries, and proposes it to the business operator Y. In this way, the business operator X can use the extraction system 1 of the example shown in FIG. 1 to make proposals that are effective for acquiring high-value customers for the business operator Y. Accordingly, the business operator X can use the extraction system 1 to promote up-selling and cross-selling to the existing customers of the business operator Y.

The extraction system 1 of the example shown in FIG. 1 can also be applied to a service based on the following premise. Specifically, suppose there is a business operator X that provides an outsourced CRM service and a business operator Z that wishes to acquire new customers. In this case, the business operator X applies the extraction system 1 based on a customer list of the existing customers of the business operator Z and a list of general users who are not customers of the business operator Z. Specifically, by subtracting the weight of the appearance probability of each search query among the general users who are not customers of the business operator Z from the weight of the appearance probability of that search query among the existing customers of the business operator Z, search queries unique to the existing customers of the business operator Z are extracted as third search queries. The business operator X then determines, as the distribution destination of the advertisement content of the business operator Z, those general users who are not customers of the business operator Z and who have input search queries similar to the third search queries of the existing customers, and proposes this destination to the business operator Z. Based on the third search queries of the existing customers of the business operator Z, the business operator X also determines the advertisement content to be distributed to the general users who have input such similar search queries, and proposes it to the business operator Z. In this way, the business operator X can use the extraction system 1 of the example shown in FIG. 1 to make proposals that are effective for acquiring new customers for the business operator Z. Accordingly, the business operator X can use the extraction system 1 to promote the acquisition of new customers by the business operator Z.
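In both service scenarios, the distribution destination is the set of users in a candidate pool (existing customers of the business operator Y, or general users who are not customers of the business operator Z) who have input search queries like the third search queries. A minimal sketch of that selection is shown below. The description does not define how query similarity is judged, so this sketch assumes exact matching as a simplification; the function name and the `(user_id, query)` log format are hypothetical.

```python
def select_delivery_targets(query_log, candidate_user_ids, third_queries):
    """Return candidate users who entered at least one third search query.

    Assumption: "similar" is approximated here by an exact string match.
    """
    candidates = set(candidate_user_ids)
    wanted = set(third_queries)
    return {
        user_id
        for user_id, query in query_log
        if user_id in candidates and query in wanted
    }


# U9 entered Q1 but is outside the candidate pool, so it is not selected.
log = [("U1", "Q1"), ("U2", "Q7"), ("U3", "Q1"), ("U9", "Q1")]
targets = select_delivery_targets(log, ["U1", "U2", "U3"], ["Q1"])
```

A real system would likely replace the exact match with some similarity measure, which is outside what the description specifies.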

In the example shown in FIG. 1, the extraction device 100 extracts the characteristics of a specific user group based on, as the users' feature amounts, the weights of the appearance probabilities of the search queries input by users within the user group to which those users belong. However, the extraction device 100 may instead extract the characteristics of a specific user group based on the weights of the appearance probabilities, within the user group to which the users belong, of the users' purchased products, used services, search sites, or the like. The extraction device 100 may also extract the characteristics of a specific user group based on the weights of the appearance probabilities, within the user group to which the users belong, of the users' demographic attributes, psychographic attributes, behavioral attributes, or the like. Note that, instead of the weights of appearance probabilities, the extraction device 100 may extract the characteristics of a specific user group based on the users' search queries, purchased products, used services, search sites, demographic attributes, psychographic attributes, behavioral attributes, or the like themselves.

In the example shown in FIG. 1, the extraction device 100 uses the search queries input by users as they are as the users' feature amounts, but groups of search queries input by users may be used instead. For example, the extraction device 100 may treat the hiragana "すいか", the katakana "スイカ", and the kanji "西瓜" (all meaning "watermelon") as one group of search queries, and treat the search queries in the group as synonymous.
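Treating a group of spellings as one query amounts to mapping each spelling to a shared key before the appearance probabilities are computed. The sketch below assumes a hand-written lookup table; the table `QUERY_GROUPS`, the choice of the katakana spelling as the group key, and the function names are all hypothetical, and the description does not say how such groups are built.

```python
# Hypothetical synonym table: every spelling maps to one canonical key.
QUERY_GROUPS = {
    "すいか": "スイカ",
    "スイカ": "スイカ",
    "西瓜": "スイカ",
}


def canonical_query(query):
    """Map a query to its group key; unknown queries form their own group."""
    cleaned = query.strip()
    return QUERY_GROUPS.get(cleaned, cleaned)


def normalize_log(query_log):
    """Rewrite (user_id, query) pairs so synonymous queries share one key."""
    return [(user_id, canonical_query(query)) for user_id, query in query_log]


normalized = normalize_log([("U1", "すいか"), ("U2", "西瓜"), ("U3", "メロン")])
```

After normalization, U1 and U2 count toward the same query, while the unrelated query of U3 is left unchanged.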

In the example shown in FIG. 1, the extraction device 100 and the advertisement distribution server 30 are separate devices, but they may be integrated. For example, the extraction device 100 may have the functions of the advertisement distribution server 30 and perform both the advertisement distribution of the advertisement distribution server 30 and the extraction of the search queries of a specific user group, the determination of the distribution destination of advertisement content, and the determination of the advertisement content to be distributed.

[2. Configuration of extraction device]
Next, the extraction device 100 according to the embodiment will be described with reference to FIG. 2. FIG. 2 is a diagram illustrating a configuration example of the extraction device 100 according to the embodiment. As shown in FIG. 2, the extraction device 100 includes a communication unit 110, a storage unit 120, and a control unit 130.

(Communication unit 110)
The communication unit 110 is realized by, for example, a NIC (Network Interface Card). The communication unit 110 is connected to the network N by wire or wirelessly, and transmits and receives information to and from, for example, the search server 20 and the advertisement distribution server 30.

(Storage unit 120)
The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk. As shown in FIG. 2, the storage unit 120 includes a search query storage unit 121.

(Search query storage unit 121)
The search query storage unit 121 stores various types of information on the users' search queries acquired from the search server 20. FIG. 3 shows an example of the search query storage unit 121 according to the embodiment. In the example shown in FIG. 3, the search query storage unit 121 has the items "user ID", "search query", and "date and time".

In the example shown in FIG. 3, the first record indicates that the user identified by the user ID "U1" (user U1) searched with the search query "Q1" at the date and time "August 1, 2017, 10:00".
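The three items of the search query storage unit 121 can be modeled as a simple record type. The following sketch is illustrative only: the class name `SearchQueryRecord`, the field names, and the helper `queries_by_user` are hypothetical, and the description does not specify how the storage unit is actually implemented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SearchQueryRecord:
    """One row of the search query storage (field names are hypothetical)."""
    user_id: str           # "user ID" item
    query: str             # "search query" item
    searched_at: datetime  # "date and time" item


def queries_by_user(records):
    """Group the stored queries by user ID, preserving record order."""
    grouped = {}
    for record in records:
        grouped.setdefault(record.user_id, []).append(record.query)
    return grouped


# The first record described for FIG. 3.
first_record = SearchQueryRecord("U1", "Q1", datetime(2017, 8, 1, 10, 0))
grouped = queries_by_user([first_record])
```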

(Control unit 130)
The control unit 130 is a controller and is realized by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs (corresponding to an example of the generation program) stored in a storage device inside the extraction device 100, using the RAM as a work area. The control unit 130 is also a controller that may be realized by, for example, an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 2, the control unit 130 includes an acquisition unit 131, an extraction unit 132, and a determination unit 133, and realizes or executes the information processing operations described below. Note that the internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 2, and may be any other configuration that performs the information processing described later.

(Acquisition unit 131)
The acquisition unit 131 acquires a first feature amount of a first user of a first group and a second feature amount, which is a feature amount of a second user of a second group and differs from the first feature amount. The acquisition unit 131 acquires, as the first feature amount, the degrees of association between the first user of the first group and each element related to the first user, and acquires, as the second feature amount, the degrees of association between the second user of the second group and each element related to the second user. Specifically, the acquisition unit 131 acquires, as the first feature amount and the second feature amount, information on the search queries input by the first user of the first group and the second user of the second group. For example, as the first feature amount, the acquisition unit 131 acquires the weight of the appearance probability of each first search query in the first group, based on the number of times the first users of the first group input search queries, as the degree of association between the first user and each search query related to the first user. Likewise, as the second feature amount, the acquisition unit 131 acquires the weight of the appearance probability of each second search query in the second group, based on the number of times the second users of the second group input search queries.

In addition to the search queries input by the user, the acquisition unit 131 may acquire, as the elements related to the user, products purchased by the user, services used by the user, sites browsed by the user, and the like. For example, as the first feature amount, the acquisition unit 131 may acquire, based on the number of times the first user of the first group has purchased products, the weight of the purchase probability of each purchased product in the first group as the degree of association between the first user and each purchased product related to the first user. Similarly, based on the number of times the first user of the first group has used services, it may acquire the weight of the usage probability of each used service in the first group as the degree of association between the first user and each used service. Likewise, based on the number of times the first user of the first group has browsed sites, it may acquire the weight of the browsing probability of each browsed site in the first group as the degree of association between the first user and each browsed site. The acquisition unit 131 acquires the second feature amount in the same manner as the first feature amount.

Specifically, the acquisition unit 131 transmits to the search server 20 a request to send the first search query input by the first user of the first group and the second search query, which is a search query input by the second user of the second group and differs from the first search query. The acquisition unit 131 then acquires from the search server 20 the first search query input by the first user as each element related to the first user. The acquisition unit 131 also acquires from the search server 20 the second search query input by the second user as each element related to the second user.

Next, the acquisition unit 131 calculates the weight of the appearance probability of the first search query in the first group. Specifically, the acquisition unit 131 inputs the first search query to a naive Bayes classifier and calculates a weight indicating whether the query has a high probability of being searched only by the first user of the first group. The acquisition unit 131 then acquires the weight of the appearance probability of the first search query in the first group as the first feature amount of the first user of the first group.

Next, the acquisition unit 131 calculates the weight of the appearance probability of the second search query in the second group. Specifically, the acquisition unit 131 inputs the second search query to the naive Bayes classifier and calculates a weight indicating whether the query has a high probability of being searched only by the second user of the second group. The acquisition unit 131 then acquires the weight of the appearance probability of the second search query in the second group as the second feature amount of the second user of the second group.
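The weight calculation described above might be sketched as follows. The description names a naive Bayes classifier but gives no formula, so this sketch assumes the weight is the Laplace-smoothed log-probability of a query within one group's query log. The function name `appearance_weights`, the smoothing constant `alpha`, and the sample queries are all hypothetical.

```python
import math
from collections import Counter

def appearance_weights(query_log, vocabulary, alpha=1.0):
    """Return {query: log P(query | group)} with Laplace smoothing.

    query_log  -- list of queries entered by the users of one group
    vocabulary -- all queries seen in any group (keeps weights comparable)
    alpha      -- smoothing constant (assumed; not specified in the source)
    """
    counts = Counter(query_log)
    total = len(query_log) + alpha * len(vocabulary)
    return {q: math.log((counts[q] + alpha) / total) for q in vocabulary}

# Hypothetical query logs for the first and second groups.
first_log = ["running shoes", "marathon", "marathon", "protein"]
second_log = ["running shoes", "news", "weather", "news"]
vocab = set(first_log) | set(second_log)

first_weights = appearance_weights(first_log, vocab)
second_weights = appearance_weights(second_log, vocab)
```

Sharing one vocabulary across both groups means a query that never appears in a group still receives a finite, comparable weight, which matters once the two sets of weights are subtracted.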

The acquisition unit 131 acquires, as the second feature amount, the feature amount of users in a group that includes the first group. For example, suppose that the first group consists of good customers and the second group consists of existing customers, who include the good customers. In this case, the acquisition unit 131 acquires, as the first feature amount, the weight of the appearance probability of the good customers' search queries among the good customers. As the second feature amount, it acquires the weight of the appearance probability of the existing customers' search queries among the existing customers, including the good customers. As another example, suppose that the first group consists of existing customers and the second group consists of general users who are not customers. In this case, the acquisition unit 131 acquires, as the first feature amount, the weight of the appearance probability of the existing customers' search queries among the existing customers. As the second feature amount, it acquires the weight of the appearance probability of the non-customer general users' search queries among the non-customer general users.

The acquisition unit 131 acquires, as the second feature amount, the feature amount of the first user of the first group at a time different from the time at which the first feature amount was acquired. For example, the acquisition unit 131 acquires the weight of the appearance probability in the first group of the first search query of the first user of the first group in September 2017. It then acquires the weight of the appearance probability in the first group of the second search query of the first user of the first group in December 2017.

(Extraction unit 132)
The extraction unit 132 extracts a third feature amount of the first user of the first group, which is a feature amount obtained from the first feature amount and the second feature amount acquired by the acquisition unit 131. The extraction unit 132 extracts the third feature amount by taking the difference between the first feature amount and the second feature amount. Specifically, the extraction unit 132 takes the difference between the degrees of association of the first user of the first group with each element related to the first user and the degrees of association of the second user of the second group with each element related to the second user, and extracts the difference in the degree of association of each element as the third feature amount of the first user of the first group. The extraction unit 132 then extracts, as third elements, the elements for which this difference is large. In this way, the extraction apparatus 100 extracts elements specific to the first user of the first group as third elements.

For example, the extraction unit 132 takes the difference between the weight of the appearance probability of the first search query in the first group (the first feature amount) and the weight of the appearance probability of the second search query in the second group (the second feature amount), and extracts the difference in the weight of the appearance probability of each search query as the third feature amount of the first user of the first group. Next, the extraction unit 132 arranges the search queries in descending order of the extracted difference. The extraction unit 132 may also assign ranks to the search queries in descending order of the extracted difference. The extraction unit 132 then extracts, as third search queries, the search queries for which the difference in the weight of the appearance probability is large.

The larger a positive difference in the weight of the appearance probability is in absolute value, the higher the probability that the search query is searched only by the first user of the first group, so the extraction unit 132 determines that such a search query is specific to the first user of the first group. Accordingly, the extraction apparatus 100 extracts, as third search queries, the search queries whose difference in the weight of the appearance probability is positive and large in absolute value.

Conversely, the larger a negative difference in the weight of the appearance probability is in absolute value, the lower the probability that the search query is searched only by the first user of the first group, so the extraction unit 132 determines that such a search query is not specific to the first user of the first group. Accordingly, the extraction apparatus 100 does not extract, as third search queries, the search queries whose difference in the weight of the appearance probability is negative and large in absolute value.

When the difference in the weight of the appearance probability is 0, the probability that the search query is searched only by the first user of the first group equals the probability that it is searched only by the second user of the second group, so the extraction unit 132 determines that the search query is not specific to the first user of the first group. Accordingly, the extraction apparatus 100 does not extract, as third search queries, the search queries whose difference in the weight of the appearance probability is 0.
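The difference, ranking, and sign rules described above can be illustrated with a short sketch. It is only one possible reading: the function name `extract_third_queries`, the sample weights, and the choice to treat a query missing from one group as weight 0.0 are assumptions, not details given in the description.

```python
def extract_third_queries(first_weights, second_weights):
    """Rank queries by (first-group weight - second-group weight).

    Only queries with a positive difference are kept, mirroring the rule
    that negative or zero differences are not specific to the first group.
    A query missing from a group is treated as weight 0.0 (an assumption).
    """
    queries = set(first_weights) | set(second_weights)
    diffs = {
        q: first_weights.get(q, 0.0) - second_weights.get(q, 0.0)
        for q in queries
    }
    # Descending order of the difference, as in the ranking step.
    ranked = sorted(diffs.items(), key=lambda item: item[1], reverse=True)
    return [(q, d) for q, d in ranked if d > 0]

# Hypothetical weights for three queries.
first = {"Q11": 0.9, "Q12": 0.4, "Q13": 0.2}
second = {"Q11": 0.1, "Q12": 0.4, "Q13": 0.7}
third = extract_third_queries(first, second)
```

With these sample values only Q11 survives: Q12 has a difference of 0 and Q13 has a negative difference, so neither is treated as specific to the first group.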

For example, the extraction unit 132 takes the difference between the weight of the appearance probability of the good customers' search queries among the good customers and the weight of the appearance probability of the existing customers' search queries among the existing customers, both acquired by the acquisition unit 131, and thereby extracts the difference in the weight of the appearance probability of the good customers' search queries. The extraction unit 132 then extracts, as third search queries of the good customers, the search queries for which this difference is large.

As another example, the extraction unit 132 takes the difference between the weight of the appearance probability of the existing customers' search queries among the existing customers and the weight of the appearance probability of the non-customer general users' search queries among the non-customer general users, both acquired by the acquisition unit 131, and thereby extracts the difference in the weight of the appearance probability of the existing customers' search queries. The extraction unit 132 then extracts, as third search queries of the existing customers, the search queries for which this difference is large.

(Determination unit 133)
The determination unit 133 controls content distribution based on the third feature amount extracted by the extraction unit 132. Specifically, the determination unit 133 controls content distribution based on the third search query extracted on the basis of the third feature amount. For example, the determination unit 133 sends the advertisement distribution server 30 a request to send information on the keywords set for advertisement content. The determination unit 133 then acquires the information on the keywords set for the advertisement content from the advertisement distribution server 30. Finally, the determination unit 133 controls content distribution based on the keywords set for the advertisement content and the third search query extracted by the extraction unit 132.

The determination unit 133 determines the content to be distributed based on the third feature amount extracted by the extraction unit 132. Specifically, it determines the content to be distributed based on the third search query extracted on the basis of the third feature amount. Based on the second feature amount acquired by the acquisition unit 131 and the third feature amount extracted by the extraction unit 132, the determination unit 133 determines the content to be distributed to users of the second group who have a feature amount similar to the third feature amount. For example, based on the second search query acquired by the acquisition unit 131 and the third search query extracted by the extraction unit 132, the determination unit 133 determines the content to be distributed to second users of the second group who have input a search query similar to the third search query. For example, the determination unit 133 determines advertisement content for which a keyword similar to the third search query is set as the content to be distributed to the second users of the second group.

The determination unit 133 also determines the distribution destination of the content based on the third feature amount extracted by the extraction unit 132. Specifically, it determines the distribution destination based on the third search query extracted on the basis of the third feature amount. Based on the second feature amount acquired by the acquisition unit 131 and the third feature amount extracted by the extraction unit 132, the determination unit 133 determines users of the second group who have a feature amount similar to the third feature amount as the distribution destination of the content. For example, based on the second search query acquired by the acquisition unit 131 and the third search query extracted by the extraction unit 132, the determination unit 133 determines second users of the second group who have input a search query similar to the third search query as the distribution destination of the content.
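A rough sketch of the destination selection described above follows. The description does not define how similarity between queries is judged, so this example simplifies "similar" to a case-insensitive exact match. The function name `select_destinations`, the user IDs, and the queries are hypothetical.

```python
def select_destinations(second_group_queries, third_queries):
    """Return IDs of second-group users who entered a query matching a third query.

    second_group_queries -- {user_id: list of queries entered by that user}
    third_queries        -- queries extracted as specific to the first group
    "Similar" is simplified to a case-insensitive exact match (an assumption).
    """
    targets = {q.lower() for q in third_queries}
    return sorted(
        user_id
        for user_id, queries in second_group_queries.items()
        if any(q.lower() in targets for q in queries)
    )

# Hypothetical per-user query logs for the second group.
second_group_queries = {
    "U21": ["marathon", "weather"],
    "U22": ["news"],
    "U23": ["Protein", "news"],
}
destinations = select_destinations(second_group_queries, ["marathon", "protein"])
```

A real system would presumably replace the exact match with a proper similarity measure, for example one that groups synonymous queries.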

The determination unit 133 may also determine the advertisement content to be distributed based on those third search queries whose rank, when arranged in order of weight, is within a predetermined rank. For example, the determination unit 133 may determine, as the advertisement content to be distributed, advertisement content for which a keyword similar to a third search query ranked within the predetermined rank is set. The determination unit 133 may likewise determine the distribution destination of the advertisement content based on the third search queries ranked within the predetermined rank.

The determination unit 133 may also determine the advertisement content to be distributed based on those third search queries whose calculated weight is equal to or greater than a predetermined threshold. For example, the determination unit 133 may determine, as the advertisement content to be distributed, advertisement content for which a keyword similar to a third search query whose calculated weight is equal to or greater than the predetermined threshold is set. The determination unit 133 may likewise determine the distribution destination of the advertisement content based on the third search queries whose calculated weight is equal to or greater than the predetermined threshold.
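The two selection rules above (a rank cutoff and a weight threshold) can be sketched as a pair of small filters. The function names, the sample weights, and the cutoff values are hypothetical, and the input list is assumed to be sorted by weight in descending order already.

```python
def select_by_rank(ranked, max_rank):
    """Keep third queries whose rank (1-based) is within max_rank."""
    return [q for q, _ in ranked[:max_rank]]

def select_by_threshold(ranked, threshold):
    """Keep third queries whose weight is at or above threshold."""
    return [q for q, w in ranked if w >= threshold]

# Hypothetical third queries, already sorted by weight in descending order.
ranked = [("Q11", 0.8), ("Q14", 0.5), ("Q15", 0.1)]
top_two = select_by_rank(ranked, 2)
above = select_by_threshold(ranked, 0.6)
```

The rank rule always yields a fixed number of queries, while the threshold rule can yield none at all when no query is distinctive enough.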

The determination unit 133 may also determine the advertisement content to be distributed based on the third search queries of the good customers extracted by the extraction unit 132. For example, based on the third search queries of the good customers, the determination unit 133 may determine the advertisement content to be distributed to existing customers who have input search queries similar to those third search queries. The determination unit 133 may also determine the distribution destination of the advertisement content based on the third search queries of the good customers. For example, the determination unit 133 may determine, as the distribution destination of the advertisement content, those existing customers who have input search queries similar to the third search queries of the good customers.

The determination unit 133 may also determine the advertisement content to be distributed based on the third search queries of the existing customers extracted by the extraction unit 132. For example, based on the third search queries of the existing customers, the determination unit 133 may determine the advertisement content to be distributed to non-customer general users who have input search queries similar to those third search queries. The determination unit 133 may also determine the distribution destination of the advertisement content based on the third search queries of the existing customers. For example, the determination unit 133 may determine, as the distribution destination of the advertisement content, those non-customer general users who have input search queries similar to the third search queries of the existing customers.

[3. Extraction process flow]
Next, the procedure of the extraction process according to the embodiment will be described with reference to FIG. 5. FIG. 5 is a flowchart showing the extraction process procedure according to the embodiment.

As shown in FIG. 5, the extraction apparatus 100 acquires the search queries of the users of the first group (step S101).

Next, the extraction apparatus 100 calculates the weight of the appearance probability of each search query in the first group (step S102).

Next, the extraction apparatus 100 acquires the search queries of the users of the second group (step S103).

Next, the extraction apparatus 100 calculates the weight of the appearance probability of each search query in the second group (step S104).

Next, the extraction apparatus 100 subtracts the weight of the appearance probability of each search query in the second group from the weight of the appearance probability of that search query in the first group, and extracts the difference in the weight of the appearance probability of each search query (step S105).

[4. Modifications]
The extraction system 1 according to the embodiment described above may be implemented in various forms other than that embodiment. Other embodiments of the extraction system 1 are therefore described below.

[4-1. Determining target users based on the magnitude of the difference vector]
An example of the determination process according to a modification will be described with reference to FIG. 4. FIG. 4 is a diagram showing an example of the determination process according to the modification. FIG. 4 shows an example in which the extraction apparatus 100 determines target users based on the magnitude of the difference vector between the feature amount vector of the users of a given group and the feature amount vector of the users of another group.

In the example shown in FIG. 4, the weights of the appearance probabilities of the search queries of the users of the first to third groups within each group are represented as points in an N-dimensional feature amount vector space. Each of the vertical and horizontal axes shown in FIG. 4 may be an individual search query or a group of search queries treated as synonymous. For example, an axis may be the single search query "野球" (baseball), or it may be a group of search queries in which the hiragana "すいか", the katakana "スイカ", and the kanji "西瓜" (all meaning watermelon) are treated as synonymous.

The extraction apparatus 100 acquires, as the feature amount vector of the first user of the first group, the weights of the appearance probabilities of the search queries of the existing customers of business operator Z among those customers. The extraction apparatus 100 also acquires, as the feature amount vector of the second user of the second group, the weights of the appearance probabilities of the search queries of the existing customers of business operator W, a competitor of business operator Z, among those customers. In addition, the extraction apparatus 100 acquires, as the feature amount vector of the users of the third group, the weights of the appearance probabilities of the search queries of the existing customers of business operator V, another competitor of business operator Z, among those customers.

Next, the extraction apparatus 100 calculates a difference vector by taking the difference between the weights of the appearance probabilities of the search queries of the existing customers of business operator Z and those of the existing customers of business operator W. The extraction apparatus 100 likewise calculates a difference vector by taking the difference between the weights of the appearance probabilities of the search queries of the existing customers of business operator Z and those of the existing customers of business operator V.

As shown in FIG. 4, the difference vector between the weights for the existing customers of business operator Z (the first group) and the weights for the existing customers of business operator V (the third group) is smaller in magnitude than the difference vector between the weights for the existing customers of business operator Z and the weights for the existing customers of business operator W (the second group).

The extraction apparatus 100 therefore determines that the existing customers of business operator Z are more similar to the existing customers of business operator V than to the existing customers of business operator W. To acquire existing customers of a competitor of business operator Z as new customers, the extraction apparatus 100 then determines the existing customers of business operator V, who are more similar to the existing customers of business operator Z, as the target users.
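The comparison of difference vectors described above might look like the following sketch. The description speaks only of the "magnitude" of the difference vector, so the use of the Euclidean norm here is an assumption, as are the function names and the three-dimensional sample vectors.

```python
import math

def difference_magnitude(u, v):
    """Euclidean norm of the difference between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def closest_group(base, candidates):
    """Return the candidate group name whose vector is nearest to base."""
    return min(
        candidates,
        key=lambda name: difference_magnitude(base, candidates[name]),
    )

# Hypothetical 3-dimensional feature amount vectors (one axis per query).
z = [0.6, 0.3, 0.1]
competitors = {"W": [0.1, 0.2, 0.7], "V": [0.5, 0.3, 0.2]}
target = closest_group(z, competitors)
```

Here the vector for V lies much closer to Z than the vector for W does, so the customers of V would be chosen as the target users.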

[4-2. Determination based on the difference between the search queries themselves]
In the example shown in FIG. 1, the extraction apparatus 100 extracts the difference in the weight of the appearance probability for each search query, but it may instead extract the difference between the search queries themselves. Specifically, the extraction apparatus 100 acquires from the search server 20 the first search queries input by the first user of the first group. For example, the extraction apparatus 100 acquires search query Q11, search query Q12, ..., search query Q15, search query Q16, search query Q17, ..., search query Q20, ... from the search server 20 as the first search queries.

Next, the extraction apparatus 100 acquires from the search server 20 the second search queries input by the second user of the second group. For example, the extraction apparatus 100 acquires search query Q16, search query Q17, ..., search query Q20, ... from the search server 20 as the second search queries.

Next, the extraction apparatus 100 extracts the third search queries of the first user of the first group by removing, from the first search queries of the first user of the first group, the search queries shared with the second search queries of the second user of the second group. For example, the extraction apparatus 100 extracts search query Q11, search query Q12, ..., search query Q15, ... as the third search queries of the first user of the first group. In this way, the extraction apparatus 100 extracts the third search queries of the first user of the first group by taking the difference between the first search queries and the second search queries.
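Removing the shared queries amounts to an order-preserving set difference, as in the sketch below. The lists contain only the query identifiers actually named in the text; the elided ones are left out rather than guessed, and the variable names are illustrative.

```python
# Query identifiers named in the text (elided entries omitted).
first_queries = ["Q11", "Q12", "Q15", "Q16", "Q17", "Q20"]
second_queries = ["Q16", "Q17", "Q20"]

# Remove every query that also appears in the second group, keeping order.
shared = set(second_queries)
third_queries = [q for q in first_queries if q not in shared]
```

Unlike the weight-based approach, this variant treats a query as non-specific as soon as any second-group user has entered it, regardless of how often.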

Subsequently, the extraction device 100 determines the distribution destination of advertisement content based on the extracted third search queries of the first users of the first group.

The extraction device 100 also determines advertisement content related to the extracted third search queries of the first users of the first group as the advertisement content to be distributed. In this way, the extraction device 100 determines the advertisement content to be distributed based on the extracted third search queries of the first users of the first group.

Subsequently, the extraction device 100 transmits, to the advertisement distribution server 30, the distribution destination and the advertisement content determined based on the third search queries of the first users of the first group.

As described above, the extraction device 100 acquires, as the first feature amount of the first users of the first group, the search queries input by the first users of the first group. The extraction device 100 also acquires, as the second feature amount, which is a feature amount of the second users of the second group and differs from the first feature amount, the search queries input by the second users of the second group. The extraction device 100 then extracts the third search queries of the first users of the first group by taking the difference between the first search queries of the first users of the first group and the second search queries of the second users of the second group.

In this way, by removing from the first search queries of the first users of the first group the search queries that also appear among the second search queries of the second users of the second group, the extraction device 100 can extract with high accuracy the third search queries, which are search queries specific to the first users of the first group. The third search queries, being specific to the first users of the first group, are information that reflects the characteristics of those users. Therefore, the extraction device 100 can extract the characteristics of a specific user group with high accuracy.

[5. Effects]
As described above, the extraction device 100 according to the embodiment includes the acquisition unit 131 and the extraction unit 132. The acquisition unit 131 acquires a first feature amount of the first users of the first group and a second feature amount, which is a feature amount of the second users of the second group and differs from the first feature amount. The extraction unit 132 extracts a third feature amount of the first users of the first group, which is a feature amount obtained based on the first feature amount and the second feature amount acquired by the acquisition unit 131. The extraction unit 132 extracts the third feature amount by taking the difference between the first feature amount and the second feature amount acquired by the acquisition unit 131.

In this way, by removing from the first feature amount of the first users of the first group the feature amount it has in common with the second feature amount of the second users of the second group, the extraction device 100 according to the embodiment can extract with high accuracy the third feature amount specific to the first users of the first group. The third feature amount specific to the first users of the first group is information that reflects the characteristics of those users. Therefore, the extraction device 100 can extract the characteristics of a specific user group with high accuracy.

The acquisition unit 131 also acquires, as the first feature amount, the degree of association between the first users of the first group and each element related to the first users, and acquires, as the second feature amount, the degree of association between the second users of the second group and each element related to the second users. The extraction unit 132 extracts the third feature amount of the first users of the first group by taking the difference between each degree of association in the first group and each degree of association in the second group acquired by the acquisition unit 131.

In this way, by extracting the difference in the degree of association of each element between the groups, the extraction device 100 according to the embodiment can extract a third element that has a high degree of association only with the first users of the first group. The extraction device 100 can thereby extract with high accuracy the third element, which is an element specific to the first users of the first group. The third element, being specific to the first users of the first group, is information that reflects the characteristics of those users. Therefore, the extraction device 100 can extract the characteristics of a specific user group with high accuracy.
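A per-element difference of association degrees could look like the following Python sketch. It rests on stated assumptions: the degree of association is modelled as the relative frequency (appearance probability) of each search query within a group's query log, the elements are search queries, and all function names, the sample queries, and the `threshold` parameter are hypothetical.

```python
from collections import Counter


def appearance_probabilities(query_log):
    """Relative frequency of each query in a group's log (assumed association degree)."""
    counts = Counter(query_log)
    total = sum(counts.values())
    return {q: n / total for q, n in counts.items()} if total else {}


def association_difference(first_log, second_log):
    """Third feature amount: first-group degree minus second-group degree, per query."""
    p1 = appearance_probabilities(first_log)
    p2 = appearance_probabilities(second_log)
    return {q: p1.get(q, 0.0) - p2.get(q, 0.0) for q in set(p1) | set(p2)}


def characteristic_queries(first_log, second_log, threshold=0.0):
    """Queries whose difference exceeds the threshold, largest difference first."""
    diff = association_difference(first_log, second_log)
    picked = [(q, d) for q, d in diff.items() if d > threshold]
    return sorted(picked, key=lambda item: (-item[1], item[0]))


first_log = ["camping", "camping", "tent", "weather"]
second_log = ["weather", "weather", "news", "tent"]
print(characteristic_queries(first_log, second_log))  # [('camping', 0.5)]
```

In this toy data "tent" is equally frequent in both groups, so its difference is zero and it is not treated as characteristic of the first group, while "camping" appears only in the first group's log.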

The acquisition unit 131 also acquires, as the second feature amount, the feature amount of the users of a group that includes the first group.

The extraction device 100 can thereby extract with high accuracy the characteristics of the first users of the first group, which is included in the second group.

The acquisition unit 131 also acquires, as the second feature amount, the feature amount of the first users of the first group at a time different from the time for which the first feature amount was acquired.

The extraction device 100 can thereby extract with high accuracy, for the users of the same group, characteristics that reflect the trends of each period.
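The time-based variant can be illustrated with a short Python sketch. The assumptions are loud ones: the query history is a list of (date, query) pairs, the two periods are separated by a single split date, the feature amount is the share of each query within its period, and the function names and sample data are hypothetical.

```python
from collections import Counter
from datetime import date


def query_share(queries):
    """Share of each query within one period (assumed feature amount)."""
    counts = Counter(queries)
    total = sum(counts.values())
    return {q: n / total for q, n in counts.items()} if total else {}


def trend_features(history, split_day):
    """Difference between the recent period and the earlier period, per query."""
    earlier = [q for day, q in history if day < split_day]
    recent = [q for day, q in history if day >= split_day]
    old, new = query_share(earlier), query_share(recent)
    return {q: new.get(q, 0.0) - old.get(q, 0.0) for q in set(old) | set(new)}


# Hypothetical history of one group across two periods.
history = [
    (date(2017, 7, 1), "swimsuit"),
    (date(2017, 7, 2), "sunscreen"),
    (date(2017, 9, 1), "sunscreen"),
    (date(2017, 9, 2), "halloween costume"),
]
trend = trend_features(history, date(2017, 8, 1))
rising = sorted(q for q, d in trend.items() if d > 0)
print(rising)  # ['halloween costume']
```

Here the recent period plays the role of the first feature amount and the earlier period the role of the second, so a positive difference marks a query that has become more typical of the group over time.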

The extraction device 100 according to the embodiment further includes a determination unit 133 that controls the distribution of content based on the third feature amount extracted by the extraction unit 132. Specifically, the determination unit 133 determines the content to be distributed based on the third feature amount extracted by the extraction unit 132. The determination unit 133 also determines the distribution destination of the content based on the third feature amount extracted by the extraction unit 132. For example, based on the second feature amount acquired by the acquisition unit 131 and the third feature amount extracted by the extraction unit 132, the determination unit 133 determines the content to be distributed to those second users of the second group who have a feature amount similar to the third feature amount. Likewise, based on the second feature amount acquired by the acquisition unit 131 and the third feature amount extracted by the extraction unit 132, the determination unit 133 determines those second users of the second group who have a feature amount similar to the third feature amount as the distribution destination of the content.

The extraction device 100 can thereby determine, as target users, those second users of the second group who have characteristics similar to those of the first users of the first group. By targeting these users, the extraction device 100 can acquire new users for the first group from the second group. It can also encourage upselling and cross-selling from the second group to the first group.
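The targeting step can be sketched in Python as follows. This is an illustration under assumptions rather than the embodiment's method: similarity to the third feature amount is reduced to the simple rule used in the claims (a second-group user has a history of inputting an extracted search query), and the user IDs, the content index, and the function names are hypothetical.

```python
def choose_targets(second_group_history, extracted_queries):
    """Map each second-group user to the extracted queries found in that user's history.

    Users with no matching query are not chosen as distribution destinations.
    """
    wanted = set(extracted_queries)
    targets = {}
    for user_id, queries in second_group_history.items():
        matched = sorted(wanted.intersection(queries))
        if matched:
            targets[user_id] = matched
    return targets


def choose_contents(extracted_queries, content_index):
    """Collect the content items related to any extracted query (index is hypothetical)."""
    return sorted({c for q in extracted_queries for c in content_index.get(q, [])})


# Hypothetical second-group histories and a hypothetical query-to-content index.
second_group_history = {"U21": ["Q16", "Q11"], "U22": ["Q17", "Q20"]}
extracted = ["Q11", "Q12"]
content_index = {"Q11": ["AD-1"], "Q12": ["AD-1", "AD-2"]}

print(choose_targets(second_group_history, extracted))  # {'U21': ['Q11']}
print(choose_contents(extracted, content_index))        # ['AD-1', 'AD-2']
```

User U21 is selected because that user's history contains Q11, a query specific to the first group, whereas U22 has input none of the extracted queries.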

The acquisition unit 131 also acquires, as the first feature amount and the second feature amount, information on the search queries input by the first users of the first group and the second users of the second group.

Search queries reflect user trends with high accuracy. Therefore, the extraction device 100 can extract with high accuracy characteristics that reflect the trends of a specific user group.

[6. Hardware configuration]
The extraction device 100 according to the embodiment described above is realized by, for example, a computer 1000 configured as shown in FIG. 6. FIG. 6 is a hardware configuration diagram illustrating an example of a computer that realizes the functions of the extraction device 100. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates based on a program stored in the ROM 1300 or the HDD 1400 and controls each unit. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 starts up, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores programs executed by the CPU 1100, data used by those programs, and the like. The communication interface 1500 receives data from other devices via a predetermined communication network and sends it to the CPU 1100, and transmits data generated by the CPU 1100 to other devices via the predetermined communication network.

The CPU 1100 controls output devices such as a display and a printer, and input devices such as a keyboard and a mouse, via the input/output interface 1600. The CPU 1100 acquires data from the input devices via the input/output interface 1600. The CPU 1100 also outputs generated data to the output devices via the input/output interface 1600.

The media interface 1700 reads a program or data stored in the recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the extraction device 100 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 130 by executing a program or data loaded onto the RAM 1200. The CPU 1100 of the computer 1000 reads these programs or data from the recording medium 1800 and executes them; as another example, it may acquire these programs or data from another device via a predetermined communication network.

Some embodiments of the present application have been described above in detail with reference to the drawings. These are examples, and the present invention can be implemented in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention section.

[7. Others]
Of the processes described in the above embodiments and modifications, all or part of the processes described as being performed automatically can also be performed manually, and all or part of the processes described as being performed manually can also be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. For example, the various types of information shown in each drawing are not limited to the illustrated information.

The components of each illustrated device are functional and conceptual, and do not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

The embodiments and modifications described above can be combined as appropriate to the extent that the processing contents do not contradict each other.

Each "unit (section, module, unit)" described above can be read as "means", "circuit", or the like. For example, the extraction unit can be read as extraction means or an extraction circuit.

DESCRIPTION OF SYMBOLS
1 Extraction system
20 Search server
30 Advertisement distribution server
100 Extraction device
121 Search query storage unit
131 Acquisition unit
132 Extraction unit
133 Determination unit

Claims

An extraction device comprising:
an acquisition unit that acquires, from a search server via a communication unit, first search information, which is information on the history of search queries input by users of a first group, and second search information, which is information on the history of search queries input by users of a second group different from the first group, calculates a first feature amount of the users of the first group for each search query based on the acquired first search information, and calculates a second feature amount of the users of the second group for each search query based on the acquired second search information;
an extraction unit that extracts, for each search query, the difference between the first feature amount and the second feature amount calculated by the acquisition unit as a third feature amount of the users of the first group, the third feature amount being a feature amount obtained based on the first feature amount and the second feature amount, and extracts a search query indicating a feature of the users of the first group based on the value of the third feature amount extracted for each search query; and
a determination unit that, based on the second search information acquired by the acquisition unit, determines, among the users of the second group, a user having a history of inputting the search query extracted by the extraction unit as a content distribution destination, and determines that content related to the search query extracted by the extraction unit is to be distributed as the content to be distributed.

The extraction device according to claim 1, wherein
the acquisition unit acquires, as the first feature amount, the degree of association between the users of the first group and each search query related to those users, and acquires, as the second feature amount, the degree of association between the users of the second group and each search query related to those users, and
the extraction unit extracts the third feature amount of the users of the first group by taking the difference between each degree of association in the first group and each degree of association in the second group acquired by the acquisition unit.

The extraction device according to claim 1 or 2, wherein
the acquisition unit acquires, as the second feature amount, the feature amount of the users of a group that includes the first group.

The extraction device, wherein
the acquisition unit acquires, as the second feature amount, the feature amount of the users of the first group at a time different from the time for which the first feature amount was acquired.

The extraction device according to any one of claims 1 to 4, wherein
the acquisition unit acquires, from the search server via the communication unit, as the first search information, information on the history of search queries input by good customers, who are users listed in a good-customer list of a predetermined business operator, and, as the second search information, information on the history of search queries input by existing customers, who are users listed in an existing-customer list of the predetermined business operator, calculates a first feature amount of the good customers for each search query based on the acquired first search information, and calculates a second feature amount of the existing customers for each search query based on the acquired second search information,
the extraction unit extracts, for each search query, the difference between the first feature amount and the second feature amount calculated by the acquisition unit as a third feature amount of the good customers, the third feature amount being a feature amount obtained based on the first feature amount and the second feature amount, and extracts a search query indicating a feature of the good customers based on the value of the extracted third feature amount, and
the determination unit determines, among the existing customers, an existing customer having a history of inputting the search query extracted by the extraction unit as a content distribution destination, and determines that content related to the search query extracted by the extraction unit is to be distributed as the content to be distributed.

The extraction device according to any one of claims 1 to 5, wherein
the acquisition unit acquires, from the search server via the communication unit, as the first search information, information on the history of search queries input by existing customers, who are users listed in an existing-customer list of a predetermined business operator, and, as the second search information, information on the history of search queries input by general users, who are users listed in a general-user list of the predetermined business operator, calculates a first feature amount of the existing customers for each search query based on the acquired first search information, and calculates a second feature amount of the general users for each search query based on the acquired second search information,
the extraction unit extracts, for each search query, the difference between the first feature amount and the second feature amount calculated by the acquisition unit as a third feature amount of the existing customers, the third feature amount being a feature amount obtained based on the first feature amount and the second feature amount, and extracts a search query indicating a feature of the existing customers based on the value of the extracted third feature amount, and
the determination unit determines, among the general users, a general user having a history of inputting the search query extracted by the extraction unit as a content distribution destination, and determines that content related to the search query extracted by the extraction unit is to be distributed as the content to be distributed.

The extraction device, wherein
the acquisition unit acquires, as the first feature amount and the second feature amount, information on the search queries input by the users of the first group and the users of the second group.

An extraction method performed by a computer, the method comprising:
an acquisition step of acquiring, from a search server via a communication unit, first search information, which is information on the history of search queries input by users of a first group, and second search information, which is information on the history of search queries input by users of a second group different from the first group, calculating a first feature amount of the users of the first group for each search query based on the acquired first search information, and calculating a second feature amount of the users of the second group for each search query based on the acquired second search information;
an extraction step of extracting, for each search query, the difference between the first feature amount and the second feature amount calculated in the acquisition step as a third feature amount of the users of the first group, the third feature amount being a feature amount obtained based on the first feature amount and the second feature amount, and extracting a search query indicating a feature of the users of the first group based on the value of the extracted third feature amount; and
a determination step of determining, based on the second search information acquired in the acquisition step, among the users of the second group, a user having a history of inputting the search query extracted in the extraction step as a content distribution destination, and determining that content related to the search query extracted in the extraction step is to be distributed as the content to be distributed.

An extraction program that causes a computer to execute:
acquisition means for acquiring, from a search server via a communication unit, first search information, which is information on the history of search queries input by users of a first group, and second search information, which is information on the history of search queries input by users of a second group different from the first group, calculating a first feature amount of the users of the first group for each search query based on the acquired first search information, and calculating a second feature amount of the users of the second group for each search query based on the acquired second search information;
extraction means for extracting, for each search query, the difference between the first feature amount and the second feature amount calculated by the acquisition means as a third feature amount of the users of the first group, the third feature amount being a feature amount obtained based on the first feature amount and the second feature amount, and extracting a search query indicating a feature of the users of the first group based on the value of the extracted third feature amount; and
determination means for determining, based on the second search information acquired by the acquisition means, among the users of the second group, a user having a history of inputting the search query extracted by the extraction means as a content distribution destination, and determining that content related to the search query extracted by the extraction means is to be distributed as the content to be distributed.