JP6979986B2

JP6979986B2 - Information processing equipment, information processing methods and information processing programs

Info

Publication number: JP6979986B2
Application number: JP2019140358A
Authority: JP
Inventors: 氣範金; 知紘小川; 幸弘寺田; 雅弘橋本
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2019-07-30
Filing date: 2019-07-30
Publication date: 2021-12-15
Anticipated expiration: 2039-07-30
Also published as: JP2021022343A

Description

The present invention relates to an information processing apparatus, an information processing method, and an information processing program.

In recent years, with the rapid spread of the Internet, techniques for analysis using various kinds of information on the Internet have been provided. For example, a technique has been proposed that predicts trends in a predetermined field based on an analysis of search queries.

Japanese Translation of PCT International Application Publication No. 2015-534180

However, the above conventional technique cannot always appropriately extract a character string indicating an object belonging to a predetermined category. The conventional technique merely identifies innovators in a predetermined category and predicts trends in that category based on the search queries those innovators entered. In particular, when a character string input as a search query is a new term that has only just come into use, it is difficult to identify the object the term indicates. For example, even if one tries to look up the meaning of a new term in a dictionary or the like, the term may not yet be listed there. In such a case, because the object indicated by the new term is difficult to identify, the category to which that object belongs is also difficult to identify.

The present application has been made in view of the above, and its object is to provide an information processing apparatus, an information processing method, and an information processing program capable of appropriately extracting a character string indicating an object belonging to a predetermined category.

The information processing apparatus according to the present application includes: an acquisition unit that acquires a second learning model, which is generated using a first learning model that has learned the features of a plurality of search queries on the assumption that search queries input by the same user within a predetermined time have similar features, and which predicts, from a predetermined search query, the category to which that search query belongs; an estimation unit that uses the second learning model acquired by the acquisition unit to estimate the category to which the object indicated by a character string input as a search query belongs; and an extraction unit that, based on the category estimated by the estimation unit, extracts from the character strings a target character string indicating an extraction target belonging to a target field.

According to one aspect of the embodiment, a character string indicating an object belonging to a predetermined category can be appropriately extracted.

FIG. 1 is a diagram showing an example of information processing according to the embodiment.
FIG. 2 is a diagram showing a configuration example of the information processing system according to the embodiment.
FIG. 3 is a diagram showing a configuration example of the information processing apparatus according to the embodiment.
FIG. 4 is a diagram showing an example of the query information storage unit according to the embodiment.
FIG. 5 is a diagram showing an example of the unnecessary character string storage unit according to the embodiment.
FIG. 6 is a diagram showing an example of the unnecessary category storage unit according to the embodiment.
FIG. 7 is a diagram showing an example of the model information storage unit according to the embodiment.
FIG. 8 is a diagram showing an example of information processing according to the embodiment.
FIG. 9 is a flowchart showing the information processing procedure according to the embodiment.
FIG. 10 is a diagram showing an example of the generation process of the first learning model according to the embodiment.
FIG. 11 is a diagram showing an example of the generation process of the first learning model according to the embodiment.
FIG. 12 is a diagram showing an example of the generation process of the second learning model according to the embodiment.
FIG. 13 is a diagram showing a configuration example of the generation device according to the embodiment.
FIG. 14 is a diagram showing an example of the query information storage unit according to the embodiment.
FIG. 15 is a diagram showing an example of the vector information storage unit according to the embodiment.
FIG. 16 is a diagram showing an example of the classification definition storage unit according to the embodiment.
FIG. 17 is a diagram showing an example of the category information storage unit according to the embodiment.
FIG. 18 is a diagram showing an example of the model information storage unit according to the embodiment.
FIG. 19 is a diagram showing an example of the first learning model according to the embodiment.
FIG. 20 is a diagram showing an example of the second learning model according to the embodiment.
FIG. 21 is a flowchart showing the generation processing procedure of the first learning model according to the embodiment.
FIG. 22 is a flowchart showing the generation processing procedure of the second learning model according to the embodiment.
FIG. 23 is a diagram showing an example of the hardware configuration of a computer that executes the program.

Hereinafter, modes for implementing the information processing apparatus, the information processing method, and the information processing program according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. The embodiments do not limit the information processing apparatus, the information processing method, or the information processing program according to the present application. In the following embodiments, the same parts are denoted by the same reference numerals, and duplicate descriptions are omitted.

[1. Embodiment]
[1-1. An example of information processing]
First, an example of information processing according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of information processing according to the embodiment. The information processing according to the embodiment is performed by the information processing apparatus 100 shown in FIG. 1. In the example shown in FIG. 1, the information processing apparatus 100 extracts, from the miscellaneous character strings included in the group of inflow search queries that flowed into a fashion site, character strings indicating objects belonging to the fashion field (the category related to fashion), which is the target field.

Prior to the description of FIG. 1, the configuration of the information processing system according to the embodiment will be described with reference to FIG. 2. FIG. 2 is a diagram showing a configuration example of the information processing system according to the embodiment. As shown in FIG. 2, the information processing system 1 includes a generation device 50, an information processing apparatus 100, and a service server 200. The generation device 50, the information processing apparatus 100, and the service server 200 are communicably connected, by wire or wirelessly, via a predetermined network N. The information processing system 1 shown in FIG. 2 may include any number of generation devices 50, any number of information processing apparatuses 100, and any number of service servers 200.

The generation device 50 is an information processing device that generates the second learning model. The second learning model predicts, from a predetermined search query, the category to which that search query belongs, and is generated using a first learning model that has learned the features of a plurality of search queries on the assumption that search queries input by the same user within a predetermined time have similar features. Details of the generation process of the second learning model by the generation device 50 will be described later. In response to a request from the information processing apparatus 100, the generation device 50 transmits the model data MDT2 of the second learning model to the information processing apparatus 100.
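The premise behind the first learning model, that queries entered by the same user within a predetermined time are treated as similar, can be illustrated with a minimal sketch that builds such query pairs from a search log. The log layout `(user_id, timestamp, query)`, the function name `session_pairs`, and the 1800-second window are all assumptions made for illustration; the description does not specify them here, and this is not the training procedure of the generation device 50.

```python
from collections import defaultdict
from itertools import combinations


def session_pairs(log, window_seconds=1800):
    """Return pairs of distinct queries entered by the same user within the window.

    `log` is an iterable of (user_id, timestamp_in_seconds, query) tuples.
    The window length is a hypothetical value, not one taken from the patent.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, query in log:
        by_user[user_id].append((timestamp, query))

    pairs = []
    for entries in by_user.values():
        entries.sort()  # order each user's queries by time
        for (t1, q1), (t2, q2) in combinations(entries, 2):
            if t2 - t1 <= window_seconds and q1 != q2:
                pairs.append((q1, q2))
    return pairs


# Hypothetical log: only "shirt" and "blouse" fall inside one window.
log = [
    ("u1", 0, "shirt"),
    ("u1", 60, "blouse"),
    ("u1", 10_000, "ramen"),
    ("u2", 30, "sneakers"),
]
pairs = session_pairs(log)
```

Pairs produced this way could serve as positive examples when training a model that maps similar queries to nearby representations; how the first learning model actually uses them is described later in the document.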

The information processing apparatus 100 is a server device that extracts a target character string indicating an extraction target belonging to a target field. In the example shown in FIG. 1, the information processing apparatus 100 extracts, from the character strings included in the inflow search queries that flowed into a fashion site, character strings indicating objects belonging to the fashion field (hereinafter also referred to as the fashion category).

In general, the inflow search queries that flow into a site related to a predetermined category within a predetermined period include a wide variety of search queries. Specifically, they include general terms, site names, and the like, which differ from character strings indicating objects belonging to the predetermined category. Because the inflow search queries contain many such character strings, these character strings need to be removed.

For example, the inflow search queries that flowed into a fashion site include, as character strings other than those indicating objects belonging to the fashion category, general terms such as "ladies", "popularity", "coordination", and "size", as well as character strings indicating the name of the destination site. These general terms and site names can be registered in advance as unnecessary character strings and removed on a dictionary basis. On the other hand, when a character string input as an inflow search query contains a new term that has only just come into use (hereinafter also referred to as an unknown term), removing that term is difficult. For example, the inflow search queries that flowed into a fashion site also include character strings indicating personal names, brand names, store names, and so on. Because new personal names, new brand names, and new store names appear one after another, many of these character strings are unknown terms, and it is difficult to register all of them in advance. Conventionally, therefore, it has been difficult to remove from the inflow search queries the character strings other than those indicating objects belonging to the predetermined category.

The character string to be extracted may itself be an unknown term. For example, the inflow search queries that flowed into a fashion site include character strings that indicate objects belonging to the fashion category but are new terms that have only just come into use. Particularly in a field such as fashion, where new terms indicating new objects appear one after another, it is not unusual for the character string to be extracted to be an unknown term. One conceivable way to determine whether an unknown term is a character string to be extracted is to look up its meaning in a dictionary or the like, but an unknown term that has only just appeared may not yet be listed there. In such a case, because the object indicated by the unknown term is difficult to identify, the category to which that object belongs is also difficult to identify. That is, conventionally, it has been difficult to extract an unknown term as a character string belonging to the target field because the category to which the object indicated by the unknown term belongs is difficult to identify.

Therefore, the information processing apparatus 100 according to the present invention uses the second learning model, which predicts from a predetermined search query the category to which that search query belongs, to estimate the category to which the object indicated by a character string input as an inflow search query belongs. As a result, even when the character string input as an inflow search query is an unknown term, the information processing apparatus 100 can estimate the category to which the object indicated by the unknown term belongs, and can therefore accurately extract character strings indicating objects belonging to a predetermined category. Specifically, when a category indicating a non-target field, different from the target field, is estimated for an unknown term, the information processing apparatus 100 can infer that the unknown term is not a character string indicating an object belonging to the target category, and can remove it as an unnecessary character string. Conversely, when a category indicating the target field is estimated for an unknown term, the information processing apparatus 100 can infer that the unknown term is a target character string belonging to the target field, and can extract it as a target character string. The information processing apparatus 100 according to the present invention can thus appropriately extract character strings indicating objects belonging to a predetermined category.
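The keep-or-remove decision described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `decide`, the category labels, and the rule of following the single most probable category are hypothetical, and the criteria the apparatus actually applies are described later in the document.

```python
def decide(probabilities, target_category):
    """Return "extract" if the most probable category is the target field, else "remove".

    `probabilities` maps category names to estimated probabilities.
    Following the top-ranked category is an assumed rule for illustration only.
    """
    top_category = max(probabilities, key=probabilities.get)
    return "extract" if top_category == target_category else "remove"


# Hypothetical estimates for two unknown terms.
fashion_like = {"fashion": 0.9, "person_name": 0.1}
store_like = {"fashion": 0.2, "store_name": 0.8}

result_a = decide(fashion_like, "fashion")
result_b = decide(store_like, "fashion")
```

With these inputs the first term is kept as a target character string and the second is discarded as an unnecessary character string, mirroring the two cases in the paragraph above.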

The service server 200 is a server device that provides a site related to the target field. Specifically, the service server 200 stores information about the inflow search queries that flowed into the site related to the target field. In the example shown in FIG. 1, the service server 200 stores information about the inflow search queries that flowed into a site related to the fashion field. In response to a request from the information processing apparatus 100, the service server 200 transmits the information about the inflow search queries to the information processing apparatus 100. The service server 200 is not limited to the fashion field and may provide sites related to various fields.

The flow of information processing will now be described with reference to FIG. 1. In FIG. 1, the information processing apparatus 100 transmits to the service server 200 a request to acquire information about the inflow search queries that flowed into the fashion site. In response to the request, the service server 200 transmits the information about the inflow search queries to the information processing apparatus 100.

The information processing apparatus 100 acquires, from the service server 200, the information about the inflow search queries that flowed into the fashion site. Specifically, the information processing apparatus 100 acquires each character string input as an inflow search query (step S1).

In the example shown on the left side of FIG. 1, the information processing apparatus 100 acquires the character string "ladies unknown term L1" input as the inflow search query Q100. Here, "unknown term L1" is assumed to be a new term (for example, a fashion term) that has only just come into use and is not yet listed in a dictionary or the like.

The information processing apparatus 100 also acquires the character string "unknown term L1 M size store name T1" input as the inflow search query Q200. "Store name T1" is a phrase indicating the name of a specific store (for example, a specific apparel maker) and is assumed to be, for example, a new store name that has only just appeared.

The information processing apparatus 100 also acquires the character string "Y-shirt 20s" input as the inflow search query Q300.

The information processing apparatus 100 also acquires the character string "Y-shirt person name M1" input as the inflow search query Q400. "Person name M1" is a phrase indicating the name of a specific person (for example, a fashion model or an entertainer) and is assumed to be, for example, a new personal name that has only just appeared.

The information processing apparatus 100 also acquires the character string "unknown term L2 coordination" input as the inflow search query Q500. Like "unknown term L1", "unknown term L2" is assumed to be a new term (for example, a fashion term) that has only just come into use and is not yet listed in a dictionary or the like.

The information processing apparatus 100 also acquires the character string "unknown term L2 brand name B1 popularity" input as the inflow search query Q600. "Brand name B1" is a phrase indicating the name of a specific apparel brand and is assumed to be, for example, a new brand name that has only just appeared.

Next, having acquired the inflow search query group, the information processing apparatus 100 obtains first character strings by removing the unnecessary character strings, which have been registered in advance, from each character string input as a search query in the group. For example, the information processing apparatus 100 acquires a list of the registered unnecessary character strings. Referring to that list, it determines whether each character string input as a search query in the acquired group contains an unnecessary character string. When it determines that an unnecessary character string is contained, it obtains a first character string by removing the unnecessary character string from the character string input as the search query.

For example, referring to the list of unnecessary character strings, the information processing apparatus 100 determines that the character string "ladies unknown term L1" input as the inflow search query Q100 contains the unnecessary character string "ladies". It then removes "ladies" from "ladies unknown term L1" to obtain the first character string "unknown term L1" (first character string L1).

Likewise, referring to the list of unnecessary character strings, the information processing apparatus 100 determines that the character string "unknown term L1 M size store name T1" input as the inflow search query Q200 contains the unnecessary character string "M size". It then removes "M size" from that character string to obtain the first character string "unknown term L1 store name T1" (first character string L2).

Referring to the list of unnecessary character strings, the information processing apparatus 100 determines that the character string "Y-shirt 20s" input as the inflow search query Q300 contains the unnecessary character string "20s". It then removes "20s" from "Y-shirt 20s" to obtain the first character string "Y-shirt" (first character string L3).

Referring to the list of unnecessary character strings, the information processing apparatus 100 determines that the character string "unknown term L2 coordination" input as the inflow search query Q500 contains the unnecessary character string "coordination". It then removes "coordination" from "unknown term L2 coordination" to obtain the first character string "unknown term L2" (first character string L5).

Referring to the list of unnecessary character strings, the information processing apparatus 100 determines that the character string "unknown term L2 brand name B1 popularity" input as the inflow search query Q600 contains the unnecessary character string "popularity". It then removes "popularity" from that character string to obtain the first character string "unknown term L2 brand name B1" (first character string L6).
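The dictionary-based removal walked through above can be sketched as plain substring removal. The function name `remove_unnecessary`, the English stand-ins for the registered strings, and the short placeholders "L1", "L2", "T1", and "B1" for the unknown term, store name, and brand name are assumptions for illustration; real queries would need matching that suits the language of the query text.

```python
def remove_unnecessary(query, unnecessary):
    """Remove every registered unnecessary string from the query.

    Longer entries are removed first so that a short entry cannot
    break up a longer one before it is matched.
    """
    for entry in sorted(unnecessary, key=len, reverse=True):
        query = query.replace(entry, " ")
    # Collapse the whitespace left behind by the removals.
    return " ".join(query.split())


# Stand-ins for the registered unnecessary character strings in the example.
UNNECESSARY = ["ladies", "M size", "20s", "coordination", "popularity"]

first_q100 = remove_unnecessary("ladies L1", UNNECESSARY)
first_q200 = remove_unnecessary("L1 M size T1", UNNECESSARY)
first_q300 = remove_unnecessary("Y-shirt 20s", UNNECESSARY)
first_q600 = remove_unnecessary("L2 B1 popularity", UNNECESSARY)
```

Note that the unknown store name and brand name survive this step, which is why the category estimation that follows is needed.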

Next, having obtained the first character strings, the information processing apparatus 100 acquires the second learning model (second learning model M2), which predicts from a predetermined search query the category to which that search query belongs and which is generated using the first learning model that has learned the features of a plurality of search queries on the assumption that search queries input by the same user within a predetermined time have similar features. Using the acquired second learning model, the information processing apparatus 100 estimates the category to which the object indicated by each first character string belongs. The information processing apparatus 100 may estimate a plurality of categories for one first character string. Specifically, it outputs, for each category, the probability that the object indicated by the first character string belongs to that category. More specifically, the information processing apparatus 100 inputs the first character string to the second learning model M2 as input information and obtains, as output information of the second learning model M2, the per-category probabilities that the object indicated by the first character string belongs to each category (step S2).
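The input and output of step S2 can be pictured with a stand-in that maps a first character string to one probability per category. The class `LookupModel`, its method `predict_proba`, and the uniform fallback for strings it does not know are hypothetical; a fixed table replaces the trained second learning model M2 purely to show the shape of the data. The 70%/30% split is the one this description gives for the first character string L2.

```python
class LookupModel:
    """Toy stand-in for a category model: returns stored probabilities per category."""

    def __init__(self, table, categories):
        self._table = table
        self._categories = list(categories)

    def predict_proba(self, text):
        """Return a dict mapping every category to a probability for `text`."""
        known = self._table.get(text)
        if known is None:
            # Assumed fallback: spread probability evenly over all categories.
            share = 1.0 / len(self._categories)
            return {category: share for category in self._categories}
        return {category: known.get(category, 0.0) for category in self._categories}


# C1: fashion, C2: personal name, C4: store name (labels as used in the example).
CATEGORIES = ["C1", "C2", "C4"]
model = LookupModel({"unknown term L1 store name T1": {"C1": 0.7, "C4": 0.3}}, CATEGORIES)

probs = model.predict_proba("unknown term L1 store name T1")
fallback = model.predict_proba("some other string")
```

Returning a value for every category, including those with probability zero, keeps downstream filtering simple because each result has the same keys.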

In the example shown in the middle of FIG. 1, the information processing apparatus 100 inputs the first character string L1 to the second learning model M2 and obtains, as its output, the per-category probabilities that the object indicated by the first character string L1 belongs to each category. For example, the information processing apparatus 100 outputs a probability of 100% that the object indicated by the first character string L1 belongs to the fashion-related category C1 (hereinafter also referred to as the fashion category C1) and a probability of 0% that it belongs to any other category.

また、情報処理装置１００は、第１文字列Ｌ２を取得すると、第２学習モデルＭ２の入力情報として第１文字列Ｌ２を入力することにより、第２学習モデルＭ２の出力情報として第１文字列Ｌ２によって示される対象が各カテゴリに属する確率をカテゴリ毎に出力する。例えば、情報処理装置１００は、第１文字列Ｌ２によって示される対象がファッションカテゴリＣ１に属する確率を７０パーセント、店舗名に関するカテゴリＣ４に属する確率を３０パーセント、その他のカテゴリに属する確率を０％と出力する。 Similarly, upon acquiring the first character string L2, the information processing apparatus 100 inputs the first character string L2 to the second learning model M2 as input information and outputs, as output information of the second learning model M2, the probability that the object indicated by the first character string L2 belongs to each category. For example, the information processing apparatus 100 outputs a probability of 70% that the object indicated by the first character string L2 belongs to the fashion category C1, 30% that it belongs to the store-name category C4, and 0% that it belongs to any other category.

また、情報処理装置１００は、第１文字列Ｌ３を取得すると、第２学習モデルＭ２の入力情報として第１文字列Ｌ３を入力することにより、第２学習モデルＭ２の出力情報として第１文字列Ｌ３によって示される対象が各カテゴリに属する確率をカテゴリ毎に出力する。例えば、情報処理装置１００は、第１文字列Ｌ３によって示される対象がファッションカテゴリＣ１に属する確率を１００パーセント、その他のカテゴリに属する確率を０％と出力する。 Similarly, upon acquiring the first character string L3, the information processing apparatus 100 inputs the first character string L3 to the second learning model M2 as input information and outputs, as output information of the second learning model M2, the probability that the object indicated by the first character string L3 belongs to each category. For example, the information processing apparatus 100 outputs a probability of 100% that the object indicated by the first character string L3 belongs to the fashion category C1 and 0% that it belongs to any other category.

また、情報処理装置１００は、第１文字列Ｌ４を取得すると、第２学習モデルＭ２の入力情報として第１文字列Ｌ４を入力することにより、第２学習モデルＭ２の出力情報として第１文字列Ｌ４によって示される対象が各カテゴリに属する確率をカテゴリ毎に出力する。例えば、情報処理装置１００は、第１文字列Ｌ４によって示される対象がファッションカテゴリＣ１に属する確率を５０パーセント、人名に関するカテゴリＣ２に属する確率を５０パーセント、その他のカテゴリに属する確率を０％と出力する。 Similarly, upon acquiring the first character string L4, the information processing apparatus 100 inputs the first character string L4 to the second learning model M2 as input information and outputs, as output information of the second learning model M2, the probability that the object indicated by the first character string L4 belongs to each category. For example, the information processing apparatus 100 outputs a probability of 50% that the object indicated by the first character string L4 belongs to the fashion category C1, 50% that it belongs to the person-name category C2, and 0% that it belongs to any other category.

また、情報処理装置１００は、第１文字列Ｌ５を取得すると、第２学習モデルＭ２の入力情報として第１文字列Ｌ５を入力することにより、第２学習モデルＭ２の出力情報として第１文字列Ｌ５によって示される対象が各カテゴリに属する確率をカテゴリ毎に出力する。例えば、情報処理装置１００は、第１文字列Ｌ５によって示される対象がファッションカテゴリＣ１に属する確率を１００パーセント、その他のカテゴリに属する確率を０％と出力する。 Similarly, upon acquiring the first character string L5, the information processing apparatus 100 inputs the first character string L5 to the second learning model M2 as input information and outputs, as output information of the second learning model M2, the probability that the object indicated by the first character string L5 belongs to each category. For example, the information processing apparatus 100 outputs a probability of 100% that the object indicated by the first character string L5 belongs to the fashion category C1 and 0% that it belongs to any other category.

また、情報処理装置１００は、第１文字列Ｌ６を取得すると、第２学習モデルＭ２の入力情報として第１文字列Ｌ６を入力することにより、第２学習モデルＭ２の出力情報として第１文字列Ｌ６によって示される対象が各カテゴリに属する確率をカテゴリ毎に出力する。例えば、情報処理装置１００は、第１文字列Ｌ６によって示される対象がファッションカテゴリＣ１に属する確率を６０パーセント、ブランド名に関するカテゴリＣ３に属する確率を４０パーセント、その他のカテゴリに属する確率を０％と出力する。 Similarly, upon acquiring the first character string L6, the information processing apparatus 100 inputs the first character string L6 to the second learning model M2 as input information and outputs, as output information of the second learning model M2, the probability that the object indicated by the first character string L6 belongs to each category. For example, the information processing apparatus 100 outputs a probability of 60% that the object indicated by the first character string L6 belongs to the fashion category C1, 40% that it belongs to the brand-name category C3, and 0% that it belongs to any other category.

続いて、情報処理装置１００は、第１文字列によって示される対象が属するカテゴリを推定すると、推定されたカテゴリに基づいて、第１文字列の中から、対象分野であるファッションカテゴリＣ１に属する抽出対象を示す対象文字列を抽出する(ステップＳ３)。図１の右側に示す例では、情報処理装置１００は、第１文字列の中から、第１文字列「未知用語Ｌ１」（第１文字列Ｌ１）を対象文字列Ｗ１として抽出する。また、情報処理装置１００は、第１文字列の中から、第１文字列「Ｙシャツ」（第１文字列Ｌ３）を対象文字列Ｗ２として抽出する。また、情報処理装置１００は、第１文字列の中から、第１文字列「未知用語Ｌ２」（第１文字列Ｌ５）を対象文字列Ｗ３として抽出する。 Subsequently, when the information processing apparatus 100 estimates the category to which the object indicated by the first character string belongs, the information processing apparatus 100 extracts from the first character string to belong to the fashion category C1 which is the target field, based on the estimated category. The target character string indicating the target is extracted (step S3). In the example shown on the right side of FIG. 1, the information processing apparatus 100 extracts the first character string "unknown term L1" (first character string L1) from the first character string as the target character string W1. Further, the information processing apparatus 100 extracts the first character string "Y-shirt" (first character string L3) from the first character string as the target character string W2. Further, the information processing apparatus 100 extracts the first character string "unknown term L2" (first character string L5) from the first character string as the target character string W3.

具体的には、情報処理装置１００は、第１文字列によって示される対象が属するカテゴリを推定すると、推定したカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれるか否かを第１文字列毎に判定する。続いて、情報処理装置１００は、推定したカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれると判定した場合、その第１文字列を抽出する。すなわち、情報処理装置１００は、ファッションカテゴリＣ１に属する対象を示す第１文字列を抽出する。続いて、情報処理装置１００は、ファッションカテゴリＣ１に属する対象を示す第１文字列を抽出すると、推定したカテゴリの中に不要なカテゴリとして登録された不要カテゴリを含むか否かを抽出した第１文字列毎に判定する。続いて、情報処理装置１００は、推定したカテゴリの中に不要なカテゴリとして登録された不要カテゴリを含まないと判定した場合、その第１文字列を対象文字列として抽出する。すなわち、情報処理装置１００は、不要なカテゴリに属する対象を示す第１文字列以外の第１文字列を対象文字列として抽出する。 Specifically, after estimating the categories to which the object indicated by each first character string belongs, the information processing apparatus 100 determines, for each first character string, whether the estimated categories include the fashion category C1, which is the target field. When the information processing apparatus 100 determines that the estimated categories include the fashion category C1, it extracts that first character string. That is, the information processing apparatus 100 extracts the first character strings indicating objects that belong to the fashion category C1. Next, for each first character string extracted in this way, the information processing apparatus 100 determines whether the estimated categories include an unnecessary category, that is, a category registered as unnecessary. When the information processing apparatus 100 determines that the estimated categories do not include an unnecessary category, it extracts that first character string as a target character string. That is, the information processing apparatus 100 extracts, as target character strings, the first character strings other than those indicating objects that belong to an unnecessary category.

例えば、情報処理装置１００は、第１文字列Ｌ１について推定されたカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれるか否かを判定する。情報処理装置１００は、第１文字列Ｌ１について推定されたカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれると判定する。続いて、情報処理装置１００は、ファッションカテゴリＣ１が含まれると判定したので、第１文字列Ｌ１について推定されたカテゴリの中に、不要カテゴリを含むか否かを判定する。続いて、情報処理装置１００は、第１文字列Ｌ１について推定されたカテゴリの中に、不要カテゴリを含まないと判定する。続いて、情報処理装置１００は、不要カテゴリを含まないと判定したので、第１文字列Ｌ１を対象文字列Ｗ１として抽出する。 For example, the information processing apparatus 100 determines whether or not the fashion category C1 which is the target field is included in the category estimated for the first character string L1. The information processing apparatus 100 determines that the fashion category C1 which is the target field is included in the category estimated for the first character string L1. Subsequently, since the information processing apparatus 100 has determined that the fashion category C1 is included, it is determined whether or not the unnecessary category is included in the category estimated for the first character string L1. Subsequently, the information processing apparatus 100 determines that the category estimated for the first character string L1 does not include an unnecessary category. Subsequently, since the information processing apparatus 100 has determined that the unnecessary category is not included, the first character string L1 is extracted as the target character string W1.

また、情報処理装置１００は、第１文字列Ｌ２について推定されたカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれるか否かを判定する。情報処理装置１００は、第１文字列Ｌ２について推定されたカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれると判定する。続いて、情報処理装置１００は、ファッションカテゴリＣ１が含まれると判定したので、第１文字列Ｌ２について推定されたカテゴリの中に、不要カテゴリを含むか否かを判定する。続いて、情報処理装置１００は、第１文字列Ｌ２について推定されたカテゴリの中に、不要カテゴリである店舗名に関するカテゴリＣ４を含むと判定する。続いて、情報処理装置１００は、不要カテゴリを含むと判定したので、第１文字列Ｌ２を対象文字列として抽出しないことを決定する。 Further, the information processing apparatus 100 determines whether the categories estimated for the first character string L2 include the fashion category C1, which is the target field, and determines that they do. Because the fashion category C1 is included, the information processing apparatus 100 then determines whether the categories estimated for the first character string L2 include an unnecessary category. The information processing apparatus 100 determines that they include the store-name category C4, which is an unnecessary category. Because an unnecessary category is included, the information processing apparatus 100 decides not to extract the first character string L2 as a target character string.

また、情報処理装置１００は、第１文字列Ｌ３について推定されたカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれるか否かを判定する。情報処理装置１００は、第１文字列Ｌ３について推定されたカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれると判定する。続いて、情報処理装置１００は、ファッションカテゴリＣ１が含まれると判定したので、第１文字列Ｌ３について推定されたカテゴリの中に、不要カテゴリを含むか否かを判定する。続いて、情報処理装置１００は、第１文字列Ｌ３について推定されたカテゴリの中に、不要カテゴリを含まないと判定する。続いて、情報処理装置１００は、不要カテゴリを含まないと判定したので、第１文字列Ｌ３を対象文字列Ｗ２として抽出する。 Further, the information processing apparatus 100 determines whether or not the fashion category C1 which is the target field is included in the category estimated for the first character string L3. The information processing apparatus 100 determines that the fashion category C1 which is the target field is included in the category estimated for the first character string L3. Subsequently, since the information processing apparatus 100 has determined that the fashion category C1 is included, it is determined whether or not the unnecessary category is included in the category estimated for the first character string L3. Subsequently, the information processing apparatus 100 determines that the category estimated for the first character string L3 does not include an unnecessary category. Subsequently, since the information processing apparatus 100 has determined that the unnecessary category is not included, the first character string L3 is extracted as the target character string W2.

また、情報処理装置１００は、第１文字列Ｌ４について推定されたカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれるか否かを判定する。情報処理装置１００は、第１文字列Ｌ４について推定されたカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれると判定する。続いて、情報処理装置１００は、ファッションカテゴリＣ１が含まれると判定したので、第１文字列Ｌ４について推定されたカテゴリの中に、不要カテゴリを含むか否かを判定する。続いて、情報処理装置１００は、第１文字列Ｌ４について推定されたカテゴリの中に、不要カテゴリである人名に関するカテゴリＣ２を含むと判定する。続いて、情報処理装置１００は、不要カテゴリを含むと判定したので、第１文字列Ｌ４を対象文字列として抽出しないことを決定する。 Further, the information processing apparatus 100 determines whether or not the fashion category C1 which is the target field is included in the category estimated for the first character string L4. The information processing apparatus 100 determines that the fashion category C1 which is the target field is included in the category estimated for the first character string L4. Subsequently, since the information processing apparatus 100 has determined that the fashion category C1 is included, it is determined whether or not the unnecessary category is included in the category estimated for the first character string L4. Subsequently, the information processing apparatus 100 determines that the category C2 relating to the personal name, which is an unnecessary category, is included in the category estimated for the first character string L4. Subsequently, since the information processing apparatus 100 has determined that the unnecessary category is included, it is determined not to extract the first character string L4 as the target character string.

また、情報処理装置１００は、第１文字列Ｌ５について推定されたカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれるか否かを判定する。情報処理装置１００は、第１文字列Ｌ５について推定されたカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれると判定する。続いて、情報処理装置１００は、ファッションカテゴリＣ１が含まれると判定したので、第１文字列Ｌ５について推定されたカテゴリの中に、不要カテゴリを含むか否かを判定する。続いて、情報処理装置１００は、第１文字列Ｌ５について推定されたカテゴリの中に、不要カテゴリを含まないと判定する。続いて、情報処理装置１００は、不要カテゴリを含まないと判定したので、第１文字列Ｌ５を対象文字列Ｗ３として抽出する。 Further, the information processing apparatus 100 determines whether or not the fashion category C1 which is the target field is included in the category estimated for the first character string L5. The information processing apparatus 100 determines that the fashion category C1 which is the target field is included in the category estimated for the first character string L5. Subsequently, since the information processing apparatus 100 has determined that the fashion category C1 is included, it is determined whether or not the unnecessary category is included in the category estimated for the first character string L5. Subsequently, the information processing apparatus 100 determines that the category estimated for the first character string L5 does not include an unnecessary category. Subsequently, since the information processing apparatus 100 has determined that the unnecessary category is not included, the first character string L5 is extracted as the target character string W3.

また、情報処理装置１００は、第１文字列Ｌ６について推定されたカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれるか否かを判定する。情報処理装置１００は、第１文字列Ｌ６について推定されたカテゴリの中に、対象分野であるファッションカテゴリＣ１が含まれると判定する。続いて、情報処理装置１００は、ファッションカテゴリＣ１が含まれると判定したので、第１文字列Ｌ６について推定されたカテゴリの中に、不要カテゴリを含むか否かを判定する。続いて、情報処理装置１００は、第１文字列Ｌ６について推定されたカテゴリの中に、不要カテゴリであるブランド名に関するカテゴリＣ３を含むと判定する。続いて、情報処理装置１００は、不要カテゴリを含むと判定したので、第１文字列Ｌ６を対象文字列として抽出しないことを決定する。 Further, the information processing apparatus 100 determines whether or not the fashion category C1 which is the target field is included in the category estimated for the first character string L6. The information processing apparatus 100 determines that the fashion category C1 which is the target field is included in the category estimated for the first character string L6. Subsequently, since the information processing apparatus 100 has determined that the fashion category C1 is included, it is determined whether or not the unnecessary category is included in the category estimated for the first character string L6. Subsequently, the information processing apparatus 100 determines that the category C3 relating to the brand name, which is an unnecessary category, is included in the category estimated for the first character string L6. Subsequently, since the information processing apparatus 100 has determined that the unnecessary category is included, it is determined not to extract the first character string L6 as the target character string.
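The two-stage check walked through above for the first character strings L1 to L6 can be sketched in Python as follows. This is only an illustrative sketch, not the claimed implementation: the function name `extract_target_strings`, the dictionary layout, and the rule that a category counts as "estimated" when its probability is greater than zero are assumptions introduced here. The probabilities and category IDs are the ones given in the FIG. 1 example.

```python
# Category probabilities per first character string, as given in the FIG. 1 example.
# Categories with probability 0% are simply omitted from each dictionary.
ESTIMATED = {
    "L1": {"C1": 1.0},
    "L2": {"C1": 0.7, "C4": 0.3},  # C4: store name
    "L3": {"C1": 1.0},
    "L4": {"C1": 0.5, "C2": 0.5},  # C2: person name
    "L5": {"C1": 1.0},
    "L6": {"C1": 0.6, "C3": 0.4},  # C3: brand name
}

TARGET_CATEGORY = "C1"                       # fashion category (target field)
UNNECESSARY_CATEGORIES = {"C2", "C3", "C4"}  # categories registered as unnecessary


def extract_target_strings(estimated, target, unnecessary):
    """Return IDs of first character strings kept as target character strings."""
    targets = []
    for string_id, probabilities in estimated.items():
        # Assumption: a category is "estimated" if its probability is above zero.
        categories = {c for c, p in probabilities.items() if p > 0.0}
        if target not in categories:
            continue  # first check: the target field must be among the categories
        if categories & unnecessary:
            continue  # second check: no unnecessary category may be present
        targets.append(string_id)
    return targets


target_strings = extract_target_strings(
    ESTIMATED, TARGET_CATEGORY, UNNECESSARY_CATEGORIES
)
print(target_strings)  # ['L1', 'L3', 'L5']
```

The result matches the target character strings W1, W2 and W3 (first character strings L1, L3 and L5) in the example.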

上述したように、情報処理装置１００は、同一のユーザによって所定の時間内に入力された複数の検索クエリが類似する特徴を有するものとして複数の検索クエリが有する特徴を学習した第１学習モデルを用いて生成された第２学習モデルであって、所定の検索クエリから所定の検索クエリが属するカテゴリを予測する第２学習モデルを取得する。また、情報処理装置１００は、取得した第２学習モデルを用いて、検索クエリとして入力された文字列によって示される対象が属するカテゴリを推定する。そして、情報処理装置１００は、推定したカテゴリに基づいて、文字列の中から、対象分野に属する抽出対象を示す対象文字列を抽出する。 As described above, the information processing apparatus 100 acquires a second learning model that predicts, from a given search query, the category to which that search query belongs. The second learning model is generated using a first learning model that has learned the features of search queries on the assumption that a plurality of search queries input by the same user within a predetermined time have similar features. The information processing apparatus 100 then uses the acquired second learning model to estimate the category to which the object indicated by a character string input as a search query belongs. Finally, based on the estimated category, the information processing apparatus 100 extracts, from among the character strings, a target character string indicating an extraction target that belongs to the target field.

これにより、情報処理装置１００は、検索クエリとして入力された文字列が未知用語である場合であっても、未知用語によって示される対象が属するカテゴリを推定することができるため、所定のカテゴリに属する対象を示す文字列を精度よく抽出することができる。例えば、情報処理装置１００は、未知用語に対して対象分野とは異なる非対象分野を示すカテゴリが推定された場合には、その未知用語を不要な文字列として取り除くことができる。また、情報処理装置１００は、流入検索クエリとして入力された文字列が未知用語である場合であっても、未知用語に対して対象分野を示すカテゴリが推定された場合には、その未知用語を対象分野に属する対象文字列として抽出することができる。したがって、本願発明に係る情報処理装置１００は、所定のカテゴリに属する対象を示す文字列を適切に抽出することができる。 As a result, even when a character string input as a search query is an unknown term, the information processing apparatus 100 can estimate the category to which the object indicated by the unknown term belongs, and can therefore accurately extract character strings indicating objects that belong to a predetermined category. For example, when a category indicating a non-target field different from the target field is estimated for an unknown term, the information processing apparatus 100 can remove that unknown term as an unnecessary character string. Conversely, even when a character string input as an inflow search query is an unknown term, if a category indicating the target field is estimated for it, the information processing apparatus 100 can extract that unknown term as a target character string belonging to the target field. Therefore, the information processing apparatus 100 according to the present invention can appropriately extract character strings indicating objects that belong to a predetermined category.

〔１−２．情報処理装置の構成〕 [1-2. Configuration of the information processing apparatus]
次に、図３を用いて、実施形態に係る情報処理装置１００の構成について説明する。図３は、実施形態に係る情報処理装置１００の構成例を示す図である。図３に示すように、情報処理装置１００は、通信部１１０と、記憶部１２０と、制御部１３０とを有する。なお、情報処理装置１００は、情報処理装置１００の管理者等から各種操作を受け付ける入力部（例えば、キーボードやマウス等）や、各種情報を表示させるための表示部（例えば、液晶ディスプレイ等）を有してもよい。 Next, the configuration of the information processing apparatus 100 according to the embodiment will be described with reference to FIG. 3. FIG. 3 is a diagram showing a configuration example of the information processing apparatus 100 according to the embodiment. As shown in FIG. 3, the information processing apparatus 100 includes a communication unit 110, a storage unit 120, and a control unit 130. The information processing apparatus 100 may also include an input unit (for example, a keyboard or a mouse) that receives various operations from an administrator or the like of the information processing apparatus 100, and a display unit (for example, a liquid crystal display) for displaying various information.

（通信部１１０） (Communication unit 110)
通信部１１０は、例えば、ＮＩＣ（Network Interface Card）等によって実現される。そして、通信部１１０は、ネットワークと有線または無線で接続され、例えば、生成装置５０とサービスサーバ２００との間で情報の送受信を行う。 The communication unit 110 is realized by, for example, a NIC (Network Interface Card). The communication unit 110 is connected to the network by wire or wirelessly and, for example, transmits and receives information to and from the generation device 50 and the service server 200.

（記憶部１２０） (Storage unit 120)
記憶部１２０は、例えば、ＲＡＭ（Random Access Memory)、フラッシュメモリ（Flash Memory）等の半導体メモリ素子、または、ハードディスク、光ディスク等の記憶装置によって実現される。記憶部１２０は、図３に示すように、クエリ情報記憶部１２１と不要文字列記憶部１２２と不要カテゴリ記憶部１２３とモデル情報記憶部１２４を有する。 The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. As shown in FIG. 3, the storage unit 120 includes a query information storage unit 121, an unnecessary character string storage unit 122, an unnecessary category storage unit 123, and a model information storage unit 124.

（クエリ情報記憶部１２１） (Query information storage unit 121)
クエリ情報記憶部１２１は、対象分野に関するサイトに流入した検索クエリに関する各種の情報を記憶する。例えば、クエリ情報記憶部１２１は、ファッション系サイトに流入した検索クエリに関する各種の情報を記憶する。図４に、実施形態に係るクエリ情報記憶部の一例を示す。図４に示す例では、クエリ情報記憶部１２１は、「検索クエリＩＤ」、「日時」、「流入サイト名」、「文字列」といった項目を有する。 The query information storage unit 121 stores various information on search queries that have flowed into sites related to the target field. For example, the query information storage unit 121 stores various information on search queries that have flowed into fashion-related sites. FIG. 4 shows an example of the query information storage unit according to the embodiment. In the example shown in FIG. 4, the query information storage unit 121 has items such as "search query ID", "date and time", "inflow site name", and "character string".

「検索クエリＩＤ」は、対象分野に関するサイトに流入した検索クエリを識別する識別情報を示す。「日時」は、検索クエリが対象分野に関するサイトに流入した日時を示す。「流入サイト名」は、検索クエリが流入した流入先のサイト名を示す。「文字列」は、検索クエリとして入力された文字列を示す。 The "search query ID" indicates identification information that identifies the search query that has flowed into the site related to the target field. "Date and time" indicates the date and time when the search query entered the site related to the target field. The "inflow site name" indicates the site name of the inflow destination to which the search query has flowed. "Character string" indicates a character string entered as a search query.

図４の１レコード目に示す例では、検索クエリＩＤ「Ｑ１」で識別される検索クエリ（検索クエリＱ１）は、日時「ＤＴ１」に流入サイト名「サイト名Ｎ１」に流入した検索クエリであることを示す。また、文字列「サイト名Ｎ１コサッシュ」は、検索クエリＱ１として入力された文字列が「サイト名Ｎ１コサッシュ」であることを示す。 In the example shown in the first record of FIG. 4, the search query identified by the search query ID "Q1" (search query Q1) is a search query that flowed into the site with the inflow site name "site name N1" at the date and time "DT1". The character string "site name N1 cosash" indicates that the character string input as the search query Q1 is "site name N1 cosash".

（不要文字列記憶部１２２） (Unnecessary character string storage unit 122)
不要文字列記憶部１２２は、不要文字列に関する各種の情報を記憶する。図５に、実施形態に係る不要文字列記憶部の一例を示す。図５に示す例では、不要文字列記憶部１２２は、「不要文字列ＩＤ」、「不要文字列」といった項目を有する。 The unnecessary character string storage unit 122 stores various information on unnecessary character strings. FIG. 5 shows an example of the unnecessary character string storage unit according to the embodiment. In the example shown in FIG. 5, the unnecessary character string storage unit 122 has items such as "unnecessary character string ID" and "unnecessary character string".

「不要文字列ＩＤ」は、不要文字列を識別する識別情報を示す。「不要文字列」は、不要な文字列として登録された文字列を示す。 The "unnecessary character string ID" indicates identification information for identifying the unnecessary character string. "Unnecessary character string" indicates a character string registered as an unnecessary character string.

図５の１レコード目に示す例では、不要文字列ＩＤ「ＵＬ１１」で識別される不要文字列が「サイト名Ｎ１」であることを示す。例えば、「サイト名Ｎ１」は、ファッション系サイトのサイト名を示す。 In the example shown in the first record of FIG. 5, it is shown that the unnecessary character string identified by the unnecessary character string ID “UL11” is “site name N1”. For example, "site name N1" indicates the site name of a fashion-related site.
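To make the two tables in FIG. 4 and FIG. 5 more concrete, the following Python sketch models one query record and one unnecessary-string record. The English field names, the `strip_unnecessary` helper, and in particular the idea of removing a registered unnecessary character string from the query text by simple substring replacement are assumptions made for illustration; this section only describes what the two storage units hold. "DT1" is kept as the placeholder used in the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    """One row of the query information storage unit 121 (field names are hypothetical)."""
    query_id: str          # 検索クエリＩＤ
    date_time: str         # 日時 (placeholder such as "DT1")
    inflow_site_name: str  # 流入サイト名
    text: str              # 文字列 entered as the search query


# Unnecessary character string storage unit 122: ID -> unnecessary character string.
UNNECESSARY_STRINGS = {"UL11": "サイト名Ｎ１"}


def strip_unnecessary(text, unnecessary_strings):
    """Remove every registered unnecessary string from the query text (assumed rule)."""
    for unnecessary in unnecessary_strings.values():
        text = text.replace(unnecessary, "")
    return text.strip()


record = QueryRecord("Q1", "DT1", "サイト名Ｎ１", "サイト名Ｎ１コサッシュ")
remaining = strip_unnecessary(record.text, UNNECESSARY_STRINGS)
print(remaining)  # コサッシュ
```

Under these assumptions, the site name is dropped and the remaining string is what would be passed on for category estimation.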

（不要カテゴリ記憶部１２３） (Unnecessary category storage unit 123)
不要カテゴリ記憶部１２３は、不要カテゴリに関する各種の情報を記憶する。図６に、実施形態に係る不要カテゴリ記憶部の一例を示す。図６に示す例では、不要カテゴリ記憶部１２３は、「不要カテゴリＩＤ」、「不要カテゴリ」といった項目を有する。 The unnecessary category storage unit 123 stores various information on unnecessary categories. FIG. 6 shows an example of the unnecessary category storage unit according to the embodiment. In the example shown in FIG. 6, the unnecessary category storage unit 123 has items such as "unnecessary category ID" and "unnecessary category".

「不要カテゴリＩＤ」は、不要カテゴリを識別する識別情報を示す。「不要カテゴリ」は、不要なカテゴリとして登録されたカテゴリを示す。 The "unnecessary category ID" indicates identification information for identifying the unnecessary category. "Unnecessary category" indicates a category registered as an unnecessary category.

図６の１レコード目に示す例では、不要カテゴリ「人名」は、不要カテゴリＩＤ「Ｃ２」で識別される不要カテゴリが人名に関するカテゴリであることを示す。 In the example shown in the first record of FIG. 6, the unnecessary category "person name" indicates that the unnecessary category identified by the unnecessary category ID "C2" is a category related to a person name.

（モデル情報記憶部１２４） (Model information storage unit 124)
モデル情報記憶部１２４は、生成装置５０によって生成された学習モデルに関する各種の情報を記憶する。図７に、実施形態に係るモデル情報記憶部の一例を示す。図７に示す例では、モデル情報記憶部１２４は、「モデルＩＤ」、「モデルデータ」といった項目を有する。 The model information storage unit 124 stores various information on the learning models generated by the generation device 50. FIG. 7 shows an example of the model information storage unit according to the embodiment. In the example shown in FIG. 7, the model information storage unit 124 has items such as "model ID" and "model data".

「モデルＩＤ」は、生成装置５０によって生成された学習モデルを識別するための識別情報を示す。「モデルデータ」は、生成装置５０によって生成された学習モデルのモデルデータを示す。例えば、「モデルデータ」には、検索クエリを検索クエリが各カテゴリに属する確率に変換するためのデータが格納される。 The "model ID" indicates identification information for identifying the learning model generated by the generation device 50. The "model data" indicates the model data of the learning model generated by the generation device 50. For example, "model data" stores data for converting a search query into the probability that the search query belongs to each category.

図７の１レコード目に示す例では、モデルＩＤ「Ｍ１」で識別される学習モデルは、後述する第１モデルＭ１に対応する。また、モデルデータ「ＭＤＴ１」は、情報処理装置１００によって生成された第１モデルＭ１のモデルデータ（モデルデータＭＤＴ１）を示す。 In the example shown in the first record of FIG. 7, the learning model identified by the model ID “M1” corresponds to the first model M1 described later. Further, the model data "MDT1" indicates model data (model data MDT1) of the first model M1 generated by the information processing apparatus 100.

モデルデータＭＤＴ１は、検索クエリが入力される入力層と、出力層と、入力層から出力層までのいずれかの層であって出力層以外の層に属する第１要素と、第１要素と第１要素の重みとに基づいて値が算出される第２要素と、を含み、入力層に入力された検索クエリに応じて、入力層に入力された検索クエリの分散表現を出力層から出力するよう、情報処理装置１００を機能させてもよい。 The model data MDT1 may include an input layer to which a search query is input, an output layer, a first element belonging to any layer from the input layer to the output layer other than the output layer, and a second element whose value is calculated based on the first element and the weight of the first element, and may cause the information processing apparatus 100 to function so as to output, from the output layer, a distributed representation of the search query input to the input layer.

ここで、モデルデータＭＤＴ１が「y=a1*x1+a2*x2+・・・+ai*xi」で示す回帰モデルで実現されるとする。この場合、モデルデータＭＤＴ１が含む第１要素は、x1やx2等といった入力データ（xi）に対応する。また、第１要素の重みは、xiに対応する係数aiに対応する。ここで、回帰モデルは、入力層と出力層とを有する単純パーセプトロンと見做すことができる。各モデルを単純パーセプトロンと見做した場合、第１要素は、入力層が有するいずれかのノードに対応し、第２要素は、出力層が有するノードと見做すことができる。 Here, it is assumed that the model data MDT1 is realized by the regression model shown by "y = a1 * x1 + a2 * x2 + ... + ai * xi". In this case, the first element included in the model data MDT1 corresponds to input data (xi) such as x1 and x2. Further, the weight of the first element corresponds to the coefficient ai corresponding to xi. Here, the regression model can be regarded as a simple perceptron having an input layer and an output layer. When each model is regarded as a simple perceptron, the first element corresponds to any node of the input layer, and the second element can be regarded as the node of the output layer.
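The correspondence between the regression model and a simple perceptron can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `regression_output` and the numeric values are made up for illustration. Each input value x_i plays the role of a first element (an input-layer node), each coefficient a_i is its weight, and the returned y is the second element (the output-layer node).

```python
def regression_output(weights, inputs):
    """Compute y = a1*x1 + a2*x2 + ... + ai*xi for one output node."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(a * x for a, x in zip(weights, inputs))


# Hypothetical coefficients a_i and input data x_i.
coefficients = [0.5, -1.0, 2.0]
input_data = [2.0, 1.0, 0.5]

y = regression_output(coefficients, input_data)
print(y)  # 1.0
```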

また、モデルデータＭＤＴ１がＤＮＮ（Deep Neural Network）等、１つまたは複数の中間層を有するニューラルネットワークで実現されるとする。この場合、モデルデータＭＤＴ１が含む第１要素は、入力層または中間層が有するいずれかのノードに対応する。また、第２要素は、第１要素と対応するノードから値が伝達されるノードである次段のノードに対応する。また、第１要素の重みは、第１要素と対応するノードから第２要素と対応するノードに伝達される値に対して考慮される重みである接続係数に対応する。 Further, it is assumed that the model data MDT1 is realized by a neural network having one or a plurality of intermediate layers such as DNN (Deep Neural Network). In this case, the first element included in the model data MDT1 corresponds to either the node of the input layer or the intermediate layer. Further, the second element corresponds to the node of the next stage, which is the node to which the value is transmitted from the node corresponding to the first element. Further, the weight of the first element corresponds to a connection coefficient which is a weight considered for the value transmitted from the node corresponding to the first element to the node corresponding to the second element.
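For the neural-network case, the relationship between first elements, connection coefficients, and second elements can be sketched as a plain forward pass. The layer sizes, the weight values, the use of `tanh` as the activation function, and the absence of bias terms are all assumptions chosen for illustration; the description does not fix any of them.

```python
import math


def forward(layers, inputs):
    """Propagate values layer by layer.

    `layers` is a list of weight matrices. Row j of a matrix holds the
    connection coefficients from every node of the previous layer (first
    elements) to node j of the next layer (second element).
    """
    values = list(inputs)
    for weights in layers:
        values = [
            math.tanh(sum(w * v for w, v in zip(row, values)))
            for row in weights
        ]
    return values


# Hypothetical network: 2 input nodes -> 3 intermediate nodes -> 1 output node.
network = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # input layer -> intermediate layer
    [[1.0, -1.0, 0.5]],                    # intermediate layer -> output layer
]

output = forward(network, [0.3, -0.2])
print(output)
```

Every node in a non-output layer acts as a first element for the nodes it feeds, so the same node can be a second element with respect to the previous layer and a first element with respect to the next one.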

情報処理装置１００は、上述した回帰モデルやニューラルネットワーク等、任意の構造を有するモデルを用いて、分散表現の算出を行う。具体的には、モデルデータＭＤＴ１は、検索クエリが入力された場合に、分散表現を出力するように係数が設定される。情報処理装置１００は、このようなモデルデータＭＤＴ１を用いて、分散表現を算出する。 The information processing apparatus 100 calculates the distributed representation using a model having an arbitrary structure such as the regression model and the neural network described above. Specifically, the model data MDT1 is set with a coefficient so as to output a distributed representation when a search query is input. The information processing apparatus 100 uses such model data MDT1 to calculate a distributed representation.

なお、上記例では、モデルデータＭＤＴ１が、検索クエリが入力された場合に、検索クエリの分散表現を出力するモデル（以下、モデルＸ１という。）である例を示した。しかし、実施形態に係るモデルデータＭＤＴ１は、モデルＸ１にデータの入出力を繰り返すことで得られる結果に基づいて生成されるモデルであってもよい。例えば、モデルデータＭＤＴ１は、検索クエリを入力とし、モデルＸ１が出力する分散表現を出力とするよう学習されたモデル（以下、モデルＹ１という。）であってもよい。または、モデルデータＭＤＴ１は、検索クエリを入力とし、モデルＹ１の出力値を出力とするよう学習されたモデルであってもよい。 In the above example, the model data MDT1 is a model (hereinafter referred to as model X1) that outputs a distributed representation of the search query when the search query is input. However, the model data MDT1 according to the embodiment may be a model generated based on the result obtained by repeating the input / output of data to the model X1. For example, the model data MDT1 may be a model (hereinafter referred to as model Y1) trained to input a search query and output a distributed expression output by the model X1. Alternatively, the model data MDT1 may be a model trained to input a search query and output the output value of the model Y1.
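The idea of a model Y1 that is trained to reproduce the output of model X1 can be illustrated with a deliberately tiny example. Everything here is a stand-in: `model_x1` is a hypothetical scalar function rather than a model that emits distributed representations, and the student is fitted by ordinary least squares, which the description does not prescribe. The sketch only shows the data flow, in which inputs are fed to X1, its outputs are recorded, and a second model is fitted to those input/output pairs.

```python
def model_x1(x):
    """Hypothetical stand-in for model X1 (maps an input to an output value)."""
    return 3.0 * x - 2.0


def fit_student(teacher, xs):
    """Fit y = slope * x + intercept to the teacher's outputs by least squares."""
    ys = [teacher(x) for x in xs]  # repeated input/output on the teacher model
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Model Y1: trained only on what model X1 outputs for the sample inputs.
model_y1 = fit_student(model_x1, [0.0, 1.0, 2.0, 3.0])
print(abs(model_y1(10.0) - model_x1(10.0)) < 1e-9)  # True
```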

また、情報処理装置１００がＧＡＮ（Generative Adversarial Networks）を用いた推定処理を行う場合、モデルデータＭＤＴ１は、ＧＡＮの一部を構成するモデルであってもよい。 Further, when the information processing apparatus 100 performs estimation processing using GAN (Generative Adversarial Networks), the model data MDT1 may be a model constituting a part of GAN.

図７の２レコード目に示す例では、モデルＩＤ「Ｍ２」で識別される学習モデルは、図１に示した第２モデルＭ２に対応する。また、モデルデータ「ＭＤＴ２」は、情報処理装置１００によって生成された第２モデルＭ２のモデルデータ（モデルデータＭＤＴ２）を示す。 In the example shown in the second record of FIG. 7, the learning model identified by the model ID “M2” corresponds to the second model M2 shown in FIG. Further, the model data "MDT2" indicates model data (model data MDT2) of the second model M2 generated by the information processing apparatus 100.

モデルデータＭＤＴ２は、検索クエリが入力される入力層と、出力層と、入力層から出力層までのいずれかの層であって出力層以外の層に属する第１要素と、第１要素と第１要素の重みとに基づいて値が算出される第２要素と、を含み、入力層に入力された検索クエリに応じて、入力層に入力された検索クエリが各カテゴリに属する確率を出力層から出力するよう、情報処理装置１００を機能させてもよい。 The model data MDT2 may include an input layer to which a search query is input, an output layer, a first element belonging to any layer from the input layer to the output layer other than the output layer, and a second element whose value is calculated based on the first element and the weight of the first element, and may cause the information processing apparatus 100 to function so as to output, from the output layer, the probability that the search query input to the input layer belongs to each category.

ここで、モデルデータＭＤＴ２が「y=a1*x1+a2*x2+・・・+ai*xi」で示す回帰モデルで実現されるとする。この場合、モデルデータＭＤＴ２が含む第１要素は、x1やx2等といった入力データ（xi）に対応する。また、第１要素の重みは、xiに対応する係数aiに対応する。ここで、回帰モデルは、入力層と出力層とを有する単純パーセプトロンと見做すことができる。各モデルを単純パーセプトロンと見做した場合、第１要素は、入力層が有するいずれかのノードに対応し、第２要素は、出力層が有するノードと見做すことができる。 Here, it is assumed that the model data MDT2 is realized by the regression model shown by "y = a1 * x1 + a2 * x2 + ... + ai * xi". In this case, the first element included in the model data MDT2 corresponds to input data (xi) such as x1 and x2. Further, the weight of the first element corresponds to the coefficient ai corresponding to xi. Here, the regression model can be regarded as a simple perceptron having an input layer and an output layer. When each model is regarded as a simple perceptron, the first element corresponds to any node of the input layer, and the second element can be regarded as the node of the output layer.
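The regression form above can be sketched as a single weighted sum. This is a minimal illustration only: the function name `regression_output` and the numeric values are hypothetical and do not come from the embodiment.

```python
from typing import Sequence


def regression_output(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Compute y = a1*x1 + a2*x2 + ... + ai*xi."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Each x_i plays the role of a "first element" (an input-layer node),
    # each a_i is the weight of that first element, and the returned sum
    # plays the role of the "second element" (the output-layer node).
    return sum(a * x for a, x in zip(weights, inputs))


# Hypothetical values chosen only to make the arithmetic easy to follow.
y = regression_output([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
```

Viewed this way, the model is a simple perceptron with one output node and no intermediate layer.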

Alternatively, suppose the model data MDT2 is realized by a neural network having one or more intermediate layers, such as a DNN (Deep Neural Network). In this case, the first element included in the model data MDT2 corresponds to any node of the input layer or of an intermediate layer. The second element corresponds to a node in the next stage, that is, a node to which a value is transmitted from the node corresponding to the first element. The weight of the first element corresponds to the connection coefficient, which is the weight applied to the value transmitted from the node corresponding to the first element to the node corresponding to the second element.
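The relationship between first elements, second elements, and connection coefficients in a multi-layer network can be sketched as follows. The layer sizes, the weights, and the ReLU activation are assumptions made for illustration; the embodiment does not specify them.

```python
from typing import List, Sequence

# weights[j][i] is the connection coefficient applied to the value sent
# from node i of one layer (first element) to node j of the next layer
# (second element).
Matrix = List[List[float]]


def layer_forward(values: Sequence[float], weights: Matrix) -> List[float]:
    """Propagate values from one layer to the next stage."""
    outputs = []
    for row in weights:
        total = sum(w * v for w, v in zip(row, values))
        outputs.append(max(0.0, total))  # ReLU; the activation is an assumption
    return outputs


def forward(x: Sequence[float], layers: List[Matrix]) -> List[float]:
    """Run the input through every layer in order."""
    values = list(x)
    for weights in layers:
        values = layer_forward(values, weights)
    return values


# Hypothetical network: 2 inputs -> 2 intermediate nodes -> 1 output node.
layers = [
    [[1.0, -1.0], [0.5, 0.5]],
    [[2.0, -1.0]],
]
result = forward([2.0, 1.0], layers)
```

Every node except those in the output layer acts as a first element for the nodes it feeds, which is why the description excludes the output layer from the layers a first element may belong to.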

情報処理装置１００は、上述した回帰モデルやニューラルネットワーク等、任意の構造を有するモデルを用いて、検索クエリが各カテゴリに属する確率の算出を行う。具体的には、モデルデータＭＤＴ２は、検索クエリが入力された場合に、検索クエリが各カテゴリに属する確率を出力するように係数が設定される。情報処理装置１００は、このようなモデルデータＭＤＴ２を用いて、検索クエリが各カテゴリに属する確率を算出する。 The information processing apparatus 100 calculates the probability that the search query belongs to each category by using a model having an arbitrary structure such as the regression model and the neural network described above. Specifically, in the model data MDT2, when a search query is input, a coefficient is set so as to output the probability that the search query belongs to each category. The information processing apparatus 100 uses such model data MDT2 to calculate the probability that the search query belongs to each category.
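One common way to obtain a probability for each category from a model's raw scores is a softmax. The sketch below assumes that approach; the embodiment does not say how the probabilities are normalized, and the category labels and scores are hypothetical.

```python
import math
from typing import Dict


def category_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert raw per-category scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {category: math.exp(s - peak) for category, s in scores.items()}
    total = sum(exps.values())
    return {category: e / total for category, e in exps.items()}


# Hypothetical raw scores for one search query.
probs = category_probabilities({"C1": 2.0, "C2": 0.0, "C4": 1.0})
```

Whatever the model structure, the coefficients are set so that the outputs can be read as per-category membership probabilities for the input search query.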

In the example above, the model data MDT2 is a model (hereinafter, model X2) that outputs, when a search query is input, the probability that the search query belongs to each category. However, the model data MDT2 according to the embodiment may instead be a model generated from the results obtained by repeatedly inputting data to model X2 and collecting its outputs. For example, the model data MDT2 may be a model (hereinafter, model Y2) trained to take a search query as input and to output the probabilities that model X2 outputs. Alternatively, the model data MDT2 may be a model trained to take a search query as input and to output the output value of model Y2.

また、情報処理装置１００がＧＡＮ（Generative Adversarial Networks）を用いた推定処理を行う場合、モデルデータＭＤＴ２は、ＧＡＮの一部を構成するモデルであってもよい。 Further, when the information processing apparatus 100 performs estimation processing using GAN (Generative Adversarial Networks), the model data MDT2 may be a model constituting a part of GAN.

(Control unit 130)
Returning to the description of FIG. 3, the control unit 130 is a controller and is realized by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs (corresponding to an example of an information processing program) stored in a storage device inside the information processing apparatus 100, using a RAM as a work area. Further, the control unit 130 is a controller realized by, for example, an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

図３に示すように、制御部１３０は、取得部１３１と、処理部１３２と、推定部１３３と、抽出部１３４とを有し、以下に説明する情報処理の作用を実現または実行する。なお、制御部１３０の内部構成は、図３に示した構成に限られず、後述する情報処理を行う構成であれば他の構成であってもよい。 As shown in FIG. 3, the control unit 130 includes an acquisition unit 131, a processing unit 132, an estimation unit 133, and an extraction unit 134, and realizes or executes the information processing operation described below. The internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 3, and may be any other configuration as long as it is configured to perform information processing described later.

(Acquisition unit 131)
The acquisition unit 131 acquires various types of information. Specifically, the acquisition unit 131 acquires unnecessary character strings, which are character strings indicating objects different from those indicated by character strings for objects belonging to the target field. For example, the acquisition unit 131 acquires a list of unnecessary character strings registered as unnecessary, such as a list registered by the administrator of the information processing apparatus 100. On acquiring the unnecessary character strings, the acquisition unit 131 stores them in the unnecessary character string storage unit 122.

また、取得部１３１は、対象分野とは異なる非対象分野を示すカテゴリである不要カテゴリに関する情報を取得する。例えば、取得部１３１は、不要カテゴリとして登録された不要カテゴリのリストを取得する。例えば、取得部１３１は、情報処理装置１００の管理者によって登録された不要カテゴリのリストを取得する。取得部１３１は、不要カテゴリを取得すると、取得した不要カテゴリを不要カテゴリ記憶部１２３に記憶する。 In addition, the acquisition unit 131 acquires information about an unnecessary category, which is a category indicating a non-target field different from the target field. For example, the acquisition unit 131 acquires a list of unnecessary categories registered as unnecessary categories. For example, the acquisition unit 131 acquires a list of unnecessary categories registered by the administrator of the information processing apparatus 100. When the acquisition unit 131 acquires an unnecessary category, the acquisition unit 131 stores the acquired unnecessary category in the unnecessary category storage unit 123.

The acquisition unit 131 also acquires various information from external information processing apparatuses. Specifically, the acquisition unit 131 acquires, from the generation device 50, a second learning model that predicts, from a given search query, the category to which that search query belongs. The second learning model is generated using a first learning model that has learned the features of a plurality of search queries on the assumption that search queries input by the same user within a predetermined time have similar features. On acquiring the second learning model, the acquisition unit 131 stores it in the model information storage unit 124.

The acquisition unit 131 also acquires a second learning model that predicts, from a given search query, the category to which that search query belongs, the second learning model being generated using a first learning model that has learned the features of a plurality of search queries on the assumption that the distributed representations of a pair of search queries input by the same user within a predetermined time have similar features.

The acquisition unit 131 also acquires a second learning model generated using a first learning model that, when a given search query is input as input information, outputs a distributed representation of that search query as output information.

また、取得部１３１は、入力情報として検索クエリが第２学習モデルに入力された際に、出力情報として検索クエリがカテゴリに属する確率をカテゴリ毎に出力する第２学習モデルを取得する。 Further, the acquisition unit 131 acquires a second learning model that outputs the probability that the search query belongs to a category as output information when the search query is input to the second learning model as input information.

The acquisition unit 131 also acquires a second learning model that predicts, from a given search query, the category to which that search query belongs, the second learning model being generated using a first learning model that has learned the features of a plurality of search queries by treating search queries containing character strings separated by a predetermined delimiter as the plurality of search queries input by the same user within a predetermined time, and learning that such search queries have similar features.

The acquisition unit 131 also acquires a second learning model that predicts, from a given search query, the category to which that search query belongs, the second learning model being generated using a first learning model that has learned the features of a plurality of search queries on the assumption that search queries input by the same user within a predetermined time have similar features, and that has further learned those features by treating randomly extracted search queries as having dissimilar features.

The acquisition unit 131 also acquires a second learning model that predicts, from a given search query, the category to which that search query belongs, the second learning model being generated using a first learning model that has learned the features of a plurality of search queries on the assumption that search queries input by the same user within a predetermined time have similar features, and that has further learned those features by training so that the distributed representations of a randomly extracted pair of search queries differ from each other.
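The way training pairs for the first learning model might be assembled from a query log can be sketched as follows. The log layout `(user_id, timestamp_seconds, query)`, the function names, and the 60-second window are assumptions for illustration; the embodiment only states that queries from the same user within a predetermined time are treated as similar and randomly extracted queries as dissimilar.

```python
import random
from typing import List, Tuple

LogEntry = Tuple[str, float, str]  # (user_id, timestamp_seconds, query), hypothetical layout


def positive_pairs(log: List[LogEntry], window: float) -> List[Tuple[str, str]]:
    """Pair queries input by the same user within `window` seconds (treated as similar)."""
    entries = sorted(log, key=lambda e: (e[0], e[1]))
    pairs = []
    for i, (user, t1, q1) in enumerate(entries):
        for user2, t2, q2 in entries[i + 1:]:
            # Entries are sorted by user and time, so we can stop at the
            # first entry that is from another user or outside the window.
            if user2 != user or t2 - t1 > window:
                break
            if q1 != q2:
                pairs.append((q1, q2))
    return pairs


def negative_pairs(log: List[LogEntry], count: int, rng: random.Random) -> List[Tuple[str, str]]:
    """Draw random pairs of distinct queries (treated as dissimilar).

    A random pair may occasionally coincide with a positive pair; this
    sketch does not filter such cases.
    """
    queries = sorted({q for _, _, q in log})
    return [tuple(rng.sample(queries, 2)) for _ in range(count)]


log = [
    ("u1", 0, "a"), ("u1", 30, "b"), ("u1", 500, "c"),
    ("u2", 10, "d"), ("u2", 20, "e"),
]
similar = positive_pairs(log, window=60)
dissimilar = negative_pairs(log, count=3, rng=random.Random(0))
```

A training procedure would then pull the distributed representations of each similar pair together and push those of each dissimilar pair apart; that step is omitted here.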

The acquisition unit 131 also acquires a second learning model generated using a first learning model that has learned the features of a plurality of search queries on the assumption that search queries input by the same user within a predetermined time have similar features. This second learning model predicts, from a given search query, the category to which that search query belongs, having been trained so that, when a search query is input to the second learning model, the classification result of the distributed representation output by the second learning model corresponds to the category to which the search query belongs.

(Processing unit 132)
The processing unit 132 acquires inflow search queries that have flowed into a site related to the target field, and acquires a first character string obtained by removing unnecessary character strings, registered as unnecessary, from the character string input as each inflow search query. Specifically, the processing unit 132 acquires, from the service server 200, information on the inflow search queries that have flowed into the site related to the target field. More specifically, the processing unit 132 acquires each character string input as each inflow search query. On acquiring a character string input as an inflow search query, the processing unit 132 stores it in the query information storage unit 121.

図１に示す例では、処理部１３２は、サービスサーバ２００からファッション系サイトに流入した流入検索クエリに関する情報を取得する。具体的には、処理部１３２は、流入検索クエリとして入力された各文字列を取得する。 In the example shown in FIG. 1, the processing unit 132 acquires information regarding the inflow search query that has flowed into the fashion site from the service server 200. Specifically, the processing unit 132 acquires each character string input as an inflow search query.

図１の左側に示す例では、処理部１３２は、流入検索クエリＱ１００として入力された文字列「レディース未知用語Ｌ１」を取得する。ここで、「未知用語Ｌ１」は、世の中に登場したばかりの新しい用語(例えば、ファッション用語)であって、まだ辞書等に掲載されていない用語であるものとする。 In the example shown on the left side of FIG. 1, the processing unit 132 acquires the character string “ladies unknown term L1” input as the inflow search query Q100. Here, it is assumed that the "unknown term L1" is a new term (for example, a fashion term) that has just appeared in the world and has not yet been published in a dictionary or the like.

また、処理部１３２は、流入検索クエリＱ２００として入力された文字列「未知用語Ｌ１Ｍサイズ店舗名Ｔ１」を取得する。なお、「店舗名Ｔ１」は、特定の店舗(例えば、特定のアパレルメーカー)の名称を示す語句であって、例えば、世の中に登場したばかりの新しい店舗名を示す語句であるものとする。 Further, the processing unit 132 acquires the character string “unknown term L1 M size store name T1” input as the inflow search query Q200. The "store name T1" is a phrase indicating the name of a specific store (for example, a specific apparel maker), and is, for example, a phrase indicating a new store name that has just appeared in the world.

また、処理部１３２は、流入検索クエリＱ３００として入力された文字列「Ｙシャツ２０代」を取得する。 Further, the processing unit 132 acquires the character string "Y-shirt 20s" input as the inflow search query Q300.

また、処理部１３２は、流入検索クエリＱ４００として入力された文字列「Ｙシャツ人名Ｍ１」を取得する。なお、「人名Ｍ１」は、特定の人物(例えば、ファッションモデルや芸能人等)の名称を示す語句であって、例えば、世の中に登場したばかりの新しい人名を示す語句であるものとする。 Further, the processing unit 132 acquires the character string "Y-shirt person name M1" input as the inflow search query Q400. The "personal name M1" is a phrase indicating the name of a specific person (for example, a fashion model, an entertainer, etc.), and is, for example, a phrase indicating a new personal name that has just appeared in the world.

The processing unit 132 also acquires the character string "unknown term L2 coordination" input as the inflow search query Q500. Here, like "unknown term L1", "unknown term L2" is assumed to be a new term (for example, a fashion term) that has only just appeared and is not yet listed in dictionaries or the like.

The processing unit 132 also acquires the character string "unknown term L2 brand name B1 popularity" input as the inflow search query Q600. Here, "brand name B1" is a phrase indicating the name of a specific apparel brand and is assumed to be, for example, a new brand name that has only just appeared.

続いて、処理部１３２は、流入検索クエリ群を取得すると、取得した流入検索クエリ群に含まれる各検索クエリとして入力された各文字列から不要な文字列として登録された不要文字列を取り除いた第１文字列を取得する。例えば、処理部１３２は、不要文字列記憶部１２２を参照して、不要な文字列として登録された不要文字列のリストを取得する。続いて、処理部１３２は、取得した不要文字列のリストを参照して、取得した流入検索クエリ群に含まれる各検索クエリとして入力された各文字列の中に不要文字列が含まれるか否かを判定する。続いて、処理部１３２は、不要文字列が含まれると判定した場合には、検索クエリとして入力された文字列から不要文字列を取り除いた第１文字列を取得する。 Subsequently, when the inflow search query group is acquired, the processing unit 132 removes the unnecessary character string registered as an unnecessary character string from each character string input as each search query included in the acquired inflow search query group. Get the first character string. For example, the processing unit 132 refers to the unnecessary character string storage unit 122 to acquire a list of unnecessary character strings registered as unnecessary character strings. Subsequently, the processing unit 132 refers to the acquired list of unnecessary character strings, and determines whether or not the unnecessary character strings are included in each character string input as each search query included in the acquired inflow search query group. Is determined. Subsequently, when it is determined that the unnecessary character string is included, the processing unit 132 acquires the first character string obtained by removing the unnecessary character string from the character string input as the search query.

例えば、処理部１３２は、不要文字列のリストを参照して、流入検索クエリＱ１００として入力された文字列「レディース未知用語Ｌ１」に不要文字列である「レディース」が含まれると判定する。続いて、処理部１３２は、不要文字列が含まれると判定すると、流入検索クエリＱ１００として入力された文字列「レディース未知用語Ｌ１」から不要文字列である「レディース」を取り除いた第１文字列「未知用語Ｌ１」（第１文字列Ｌ１）を取得する。 For example, the processing unit 132 refers to the list of unnecessary character strings, and determines that the character string “ladies unknown term L1” input as the inflow search query Q100 includes the unnecessary character string “ladies”. Subsequently, when the processing unit 132 determines that the unnecessary character string is included, the first character string obtained by removing the unnecessary character string "ladies" from the character string "ladies unknown term L1" input as the inflow search query Q100. Acquire the "unknown term L1" (first character string L1).

また、処理部１３２は、不要文字列のリストを参照して、流入検索クエリＱ２００として入力された文字列「未知用語Ｌ１Ｍサイズ店舗名Ｔ１」に不要文字列である「Ｍサイズ」が含まれると判定する。続いて、処理部１３２は、不要文字列が含まれると判定すると、流入検索クエリＱ２００として入力された文字列「未知用語Ｌ１Ｍサイズ店舗名Ｔ１」から不要文字列である「Ｍサイズ」を取り除いた第１文字列「未知用語Ｌ１店舗名Ｔ１」（第１文字列Ｌ２）を取得する。 Further, the processing unit 132 refers to the list of unnecessary character strings, and the character string "unknown term L1 M size store name T1" input as the inflow search query Q200 includes the unnecessary character string "M size". Is determined. Subsequently, when the processing unit 132 determines that the unnecessary character string is included, the processing unit 132 removes the unnecessary character string "M size" from the character string "unknown term L1 M size store name T1" input as the inflow search query Q200. The first character string "unknown term L1 store name T1" (first character string L2) is acquired.

The processing unit 132 also refers to the list of unnecessary character strings and determines that the character string "Y-shirt 20s" input as the inflow search query Q300 contains the unnecessary character string "20s". Having determined that an unnecessary character string is contained, the processing unit 132 acquires the first character string "Y-shirt" (first character string L3) by removing the unnecessary character string "20s" from the character string "Y-shirt 20s".

The processing unit 132 also refers to the list of unnecessary character strings and determines that the character string "unknown term L2 coordination" input as the inflow search query Q500 contains the unnecessary character string "coordination". Having determined that an unnecessary character string is contained, the processing unit 132 acquires the first character string "unknown term L2" (first character string L5) by removing the unnecessary character string "coordination" from the character string "unknown term L2 coordination".

The processing unit 132 also refers to the list of unnecessary character strings and determines that the character string "unknown term L2 brand name B1 popularity" input as the inflow search query Q600 contains the unnecessary character string "popularity". Having determined that an unnecessary character string is contained, the processing unit 132 acquires the first character string "unknown term L2 brand name B1" (first character string L6) by removing the unnecessary character string "popularity" from the character string "unknown term L2 brand name B1 popularity".
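The removal of registered unnecessary character strings described above can be sketched as a plain substring removal. The function name `remove_unnecessary` is hypothetical, and the query strings are English stand-ins for the Japanese queries of FIG. 1 (which contain no spaces); the whitespace normalization is an assumption added for the English rendering.

```python
from typing import Iterable


def remove_unnecessary(query: str, unnecessary: Iterable[str]) -> str:
    """Return the first character string: the query minus registered unnecessary strings."""
    result = query
    # Remove longer entries first so a short entry cannot break up a longer one.
    for word in sorted(unnecessary, key=len, reverse=True):
        result = result.replace(word, "")
    # Collapse the whitespace left behind by the removals.
    return " ".join(result.split())


# Stand-in for the list held in the unnecessary character string storage unit 122.
UNNECESSARY = ["ladies", "M size", "20s", "coordination", "popularity"]

first_l1 = remove_unnecessary("ladies unknown term L1", UNNECESSARY)
first_l2 = remove_unnecessary("unknown term L1 M size store name T1", UNNECESSARY)
```

Plain substring matching can also remove an unnecessary string that happens to occur inside a longer word, so a production implementation would likely need tokenization or boundary checks.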

(Estimation unit 133)
The estimation unit 133 uses the second learning model acquired by the acquisition unit 131 to estimate the category to which the object indicated by a character string input as a search query belongs. The estimation unit 133 estimates the category to which the object indicated by the first character string acquired by the processing unit 132 belongs. Specifically, when the processing unit 132 acquires a first character string, the estimation unit 133 refers to the model information storage unit 124 and acquires the second learning model (second learning model M2). The estimation unit 133 then uses the second learning model to estimate the category to which the object indicated by the first character string belongs. The estimation unit 133 estimates a plurality of categories to which the object indicated by a character string belongs. For example, the estimation unit 133 estimates a plurality of categories to which the object indicated by the first character string belongs.

より具体的には、推定部１３３は、文字列によって示される対象が各カテゴリに属する確率をカテゴリ毎に出力する。例えば、推定部１３３は、第１文字列によって示される対象が各カテゴリに属する確率をカテゴリ毎に出力する。例えば、推定部１３３は、第２学習モデルＭ２の入力情報として第１文字列を入力することにより、第２学習モデルＭ２の出力情報として第１文字列によって示される対象が各カテゴリに属する確率をカテゴリ毎に出力する。 More specifically, the estimation unit 133 outputs the probability that the object indicated by the character string belongs to each category for each category. For example, the estimation unit 133 outputs the probability that the object indicated by the first character string belongs to each category for each category. For example, by inputting the first character string as the input information of the second learning model M2, the estimation unit 133 determines the probability that the target indicated by the first character string belongs to each category as the output information of the second learning model M2. Output for each category.

図１の真ん中に示す例では、推定部１３３は、第１文字列Ｌ１を取得すると、第２学習モデルＭ２の入力情報として第１文字列Ｌ１を入力することにより、第２学習モデルＭ２の出力情報として第１文字列Ｌ１によって示される対象が各カテゴリに属する確率をカテゴリ毎に出力する。例えば、推定部１３３は、第１文字列Ｌ１によって示される対象がファッションに関するカテゴリＣ１（以下、ファッションカテゴリＣ１ともいう）に属する確率を１００パーセント、その他のカテゴリに属する確率を０％と出力する。 In the example shown in the middle of FIG. 1, when the estimation unit 133 acquires the first character string L1, the estimation unit 133 inputs the first character string L1 as the input information of the second learning model M2, thereby outputting the second learning model M2. As information, the probability that the target indicated by the first character string L1 belongs to each category is output for each category. For example, the estimation unit 133 outputs the probability that the object represented by the first character string L1 belongs to the fashion category C1 (hereinafter, also referred to as fashion category C1) as 100%, and the probability that the object belongs to another category is 0%.

When the estimation unit 133 acquires the first character string L2, it inputs the first character string L2 to the second learning model M2 as input information and thereby outputs, as output information of the second learning model M2, the probability for each category that the object indicated by the first character string L2 belongs to that category. For example, the estimation unit 133 outputs a probability of 70% that the object indicated by the first character string L2 belongs to the fashion category C1, a probability of 30% that it belongs to the category C4 relating to store names, and a probability of 0% that it belongs to any other category.

また、推定部１３３は、第１文字列Ｌ３を取得すると、第２学習モデルＭ２の入力情報として第１文字列Ｌ３を入力することにより、第２学習モデルＭ２の出力情報として第１文字列Ｌ３によって示される対象が各カテゴリに属する確率をカテゴリ毎に出力する。例えば、推定部１３３は、第１文字列Ｌ３によって示される対象がファッションカテゴリＣ１に属する確率を１００パーセント、その他のカテゴリに属する確率を０％と出力する。 Further, when the estimation unit 133 acquires the first character string L3, the estimation unit 133 inputs the first character string L3 as the input information of the second learning model M2, thereby inputting the first character string L3 as the output information of the second learning model M2. The probability that the target indicated by is in each category is output for each category. For example, the estimation unit 133 outputs the probability that the object represented by the first character string L3 belongs to the fashion category C1 is 100%, and the probability that the object belongs to the other category is 0%.

When the estimation unit 133 acquires the first character string L4, it inputs the first character string L4 to the second learning model M2 as input information and thereby outputs, as output information of the second learning model M2, the probability for each category that the object indicated by the first character string L4 belongs to that category. For example, the estimation unit 133 outputs a probability of 50% that the object indicated by the first character string L4 belongs to the fashion category C1, a probability of 50% that it belongs to the category C2 relating to personal names, and a probability of 0% that it belongs to any other category.

また、推定部１３３は、第１文字列Ｌ５を取得すると、第２学習モデルＭ２の入力情報として第１文字列Ｌ５を入力することにより、第２学習モデルＭ２の出力情報として第１文字列Ｌ５によって示される対象が各カテゴリに属する確率をカテゴリ毎に出力する。例えば、推定部１３３は、第１文字列Ｌ５によって示される対象がファッションカテゴリＣ１に属する確率を１００パーセント、その他のカテゴリに属する確率を０％と出力する。 Further, when the estimation unit 133 acquires the first character string L5, the estimation unit 133 inputs the first character string L5 as the input information of the second learning model M2, so that the first character string L5 is used as the output information of the second learning model M2. The probability that the target indicated by is in each category is output for each category. For example, the estimation unit 133 outputs the probability that the object indicated by the first character string L5 belongs to the fashion category C1 as 100%, and the probability that the object belongs to another category is 0%.

When the estimation unit 133 acquires the first character string L6, it inputs the first character string L6 to the second learning model M2 as input information and thereby outputs, as output information of the second learning model M2, the probability for each category that the object indicated by the first character string L6 belongs to that category. For example, the estimation unit 133 outputs a probability of 60% that the object indicated by the first character string L6 belongs to the fashion category C1, a probability of 40% that it belongs to the category C3 relating to brand names, and a probability of 0% that it belongs to any other category.

(Extraction unit 134)
The extraction unit 134 extracts, from the character strings, target character strings indicating extraction targets belonging to the target field, based on the categories estimated by the estimation unit 133. Specifically, the extraction unit 134 extracts, as a target character string, a character string whose categories estimated by the estimation unit 133 include the category indicating the target field. For example, when the estimation unit 133 has estimated the categories to which the object indicated by each first character string belongs, the extraction unit 134 determines, for each first character string, whether the estimated categories include the category indicating the target field. When the extraction unit 134 determines that the estimated categories include the category indicating the target field, it extracts that first character string. That is, the extraction unit 134 extracts first character strings indicating objects that belong to the category indicating the target field.

また、抽出部１３４は、推定部１３３によって推定されたカテゴリの中に、不要なカテゴリとして登録された不要カテゴリを含まない文字列を対象文字列として抽出する。例えば、抽出部１３４は、不要カテゴリ記憶部１２３を参照して、不要なカテゴリとして登録された不要カテゴリのリストを取得する。続いて、抽出部１３４は、対象分野を示すカテゴリが含まれる第１文字列を抽出すると、取得した不要カテゴリのリストを参照して、推定部１３３によって推定されたカテゴリの中に、不要なカテゴリとして登録された不要カテゴリが含まれるか否かを抽出した第１文字列毎に判定する。続いて、抽出部１３４は、推定部１３３によって推定されたカテゴリの中に、不要カテゴリが含まれないと判定した場合、その第１文字列を対象文字列として抽出する。すなわち、抽出部１３４は、不要カテゴリに属する対象を示す第１文字列以外の第１文字列を対象文字列として抽出する。 Further, the extraction unit 134 extracts a character string that does not include the unnecessary category registered as an unnecessary category from the categories estimated by the estimation unit 133 as the target character string. For example, the extraction unit 134 refers to the unnecessary category storage unit 123 to acquire a list of unnecessary categories registered as unnecessary categories. Subsequently, when the extraction unit 134 extracts the first character string including the category indicating the target field, the extraction unit 134 refers to the acquired list of unnecessary categories, and the unnecessary categories are included in the categories estimated by the estimation unit 133. It is determined for each extracted first character string whether or not the unnecessary category registered as is included. Subsequently, when the extraction unit 134 determines that the unnecessary category is not included in the categories estimated by the estimation unit 133, the extraction unit 134 extracts the first character string as the target character string. That is, the extraction unit 134 extracts the first character string other than the first character string indicating the target belonging to the unnecessary category as the target character string.

In the example shown on the right side of FIG. 1, the information processing apparatus 100 extracts, from among the first character strings, the first character string "unknown term L1" (first character string L1) as the target character string W1, the first character string "Y-shirt" (first character string L3) as the target character string W2, and the first character string "unknown term L2" (first character string L5) as the target character string W3.

Specifically, the extraction unit 134 determines whether the categories estimated for the first character string L1 by the estimation unit 133 include the fashion category C1, which is the target field, and determines that they do. Because the fashion category C1 is included, the extraction unit 134 next determines whether the categories estimated for the first character string L1 include an unnecessary category, and determines that they do not. Because no unnecessary category is included, the extraction unit 134 extracts the first character string L1 as the target character string W1.

The extraction unit 134 likewise determines that the categories estimated for the first character string L2 include the fashion category C1. It then determines whether those categories include an unnecessary category, and determines that they include the category C3 relating to store names, which is an unnecessary category. Because an unnecessary category is included, the extraction unit 134 decides not to extract the first character string L2 as a target character string.

The extraction unit 134 determines that the categories estimated for the first character string L3 include the fashion category C1. It then determines that those categories do not include an unnecessary category, and therefore extracts the first character string L3 as the target character string W2.

The extraction unit 134 determines that the categories estimated for the first character string L4 include the fashion category C1. It then determines that those categories include the category C2 relating to personal names, which is an unnecessary category, and therefore decides not to extract the first character string L4 as a target character string.

The extraction unit 134 determines that the categories estimated for the first character string L5 include the fashion category C1. It then determines that those categories do not include an unnecessary category, and therefore extracts the first character string L5 as the target character string W3.

The extraction unit 134 determines that the categories estimated for the first character string L6 include the fashion category C1. It then determines that those categories include the category C3 relating to brand names, which is an unnecessary category, and therefore decides not to extract the first character string L6 as a target character string.
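The two-stage determination walked through for FIG. 1 can be sketched as a pair of set operations. This is only an illustrative sketch: the category labels, the placeholder texts for the first character strings L2, L4 and L6, and the function name `extract_target_strings` are assumptions introduced here, not part of the embodiment.

```python
# Hypothetical labels; the embodiment refers to categories such as C1, C2, C3.
TARGET_CATEGORY = "fashion"
UNNECESSARY_CATEGORIES = {"person_name", "store_name", "brand_name"}


def extract_target_strings(estimated, target=TARGET_CATEGORY,
                           unnecessary=UNNECESSARY_CATEGORIES):
    """Return the strings that pass both determinations.

    estimated maps each first character string to the set of categories
    estimated for it.
    """
    # Stage 1: keep strings whose categories include the target field.
    in_target_field = [s for s, cats in estimated.items() if target in cats]
    # Stage 2: drop strings whose categories include an unnecessary category.
    return [s for s in in_target_field if not (estimated[s] & unnecessary)]


# Category sets as described for FIG. 1 (texts for L2, L4, L6 are placeholders).
fig1_estimates = {
    "unknown term L1": {"fashion"},
    "first string L2": {"fashion", "store_name"},
    "Y-shirt": {"fashion"},
    "first string L4": {"fashion", "person_name"},
    "unknown term L2": {"fashion"},
    "first string L6": {"fashion", "brand_name"},
}

targets = extract_target_strings(fig1_estimates)
print(targets)
```

Under these assumptions the three surviving strings correspond to the target character strings W1, W2 and W3.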

Next, an example of the information processing according to the embodiment is described in more detail with reference to FIG. 8. FIG. 8 is a diagram showing an example of the information processing according to the embodiment. In the example shown in FIG. 8, the processing unit 132 refers to the query information storage unit 121 and acquires the character strings input as search queries that flowed into a fashion-related site, that is, the inflow search queries (step S1').

For example, the processing unit 132 acquires the following character strings input as search queries: "site name N1 cosash" (search query Q1); "site name N2 cosash person name M1" (search query Q2); "floral-pattern thermal ladies" (search query Q3); "floral-pattern thermal popular brand name B1" (search query Q4); "floral-pattern thermal coordination store name T1" (search query Q5); "mani denim M size" (search query Q6); "mani denim L size brand name B2" (search query Q7); "missing collar 20s" (search query Q8); and "missing collar 30s person name M2" (search query Q9).

Subsequently, having acquired the inflow search queries, the processing unit 132 refers to the unnecessary character string storage unit 122 and acquires first character strings by removing, from each character string input as a search query in the acquired group, any unnecessary character string, that is, any character string registered as unnecessary. Specifically, the processing unit 132 temporarily stores the table 121A, shown in the lower part of step S1' in FIG. 8, in the storage unit 120.

For example, the processing unit 132 removes the unnecessary character string "site name N1" (unnecessary character string UL11) from the character string "site name N1 cosash" input as the search query Q1 and acquires the first character string "cosash" (first character string L11). The processing unit 132 removes the unnecessary character string "site name N2" (unnecessary character string UL12) from the character string "site name N2 cosash person name M1" input as the search query Q2 and acquires the first character string "cosash person name M1" (first character string L12). The processing unit 132 removes the unnecessary character string "ladies" (unnecessary character string UL21) from the character string "floral-pattern thermal ladies" input as the search query Q3 and acquires the first character string "floral-pattern thermal" (first character string L13). The processing unit 132 removes the unnecessary character string "popular" (unnecessary character string UL22) from the character string "floral-pattern thermal popular brand name B1" input as the search query Q4 and acquires the first character string "floral-pattern thermal brand name B1" (first character string L14). The processing unit 132 removes the unnecessary character string "coordination" (unnecessary character string UL23) from the character string "floral-pattern thermal coordination store name T1" input as the search query Q5 and acquires the first character string "floral-pattern thermal store name T1" (first character string L15). The processing unit 132 removes the unnecessary character string "M size" (unnecessary character string UL31) from the character string "mani denim M size" input as the search query Q6 and acquires the first character string "mani denim" (first character string L16). The processing unit 132 removes the unnecessary character string "L size" (unnecessary character string UL32) from the character string "mani denim L size brand name B2" input as the search query Q7 and acquires the first character string "mani denim brand name B2" (first character string L17). The processing unit 132 removes the unnecessary character string "20s" (unnecessary character string UL41) from the character string "missing collar 20s" input as the search query Q8 and acquires the first character string "missing collar" (first character string L18). The processing unit 132 removes the unnecessary character string "30s" (unnecessary character string UL42) from the character string "missing collar 30s person name M2" input as the search query Q9 and acquires the first character string "missing collar person name M2" (first character string L19).
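The removal of registered unnecessary character strings in step S1' might look like the following sketch. It assumes the English renderings of the example queries, with words separated by spaces, and plain substring removal; the original Japanese queries would need a different matching strategy. The names `UNNECESSARY_STRINGS` and `strip_unnecessary` are hypothetical.

```python
# Unnecessary character strings UL11..UL42 from the FIG. 8 example,
# in their English renderings (an assumption of this sketch).
UNNECESSARY_STRINGS = [
    "site name N1", "site name N2", "ladies", "popular", "coordination",
    "M size", "L size", "20s", "30s",
]


def strip_unnecessary(query, unnecessary=UNNECESSARY_STRINGS):
    """Remove every registered unnecessary string and tidy the whitespace."""
    for fragment in unnecessary:
        query = query.replace(fragment, " ")
    # Collapse the gaps left behind by the removed fragments.
    return " ".join(query.split())


queries = [
    "site name N1 cosash",
    "site name N2 cosash person name M1",
    "floral-pattern thermal ladies",
    "floral-pattern thermal popular brand name B1",
    "floral-pattern thermal coordination store name T1",
    "mani denim M size",
    "mani denim L size brand name B2",
    "missing collar 20s",
    "missing collar 30s person name M2",
]

first_strings = [strip_unnecessary(q) for q in queries]
for query, first in zip(queries, first_strings):
    print(f"{query!r} -> {first!r}")
```

Plain substring removal is the simplest choice for a sketch; a production system would more likely match on token boundaries so that a registered string is not removed from the middle of a longer word.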

Subsequently, when the processing unit 132 has acquired the first character strings, the estimation unit 133 refers to the model information storage unit 124 and acquires the second learning model (second learning model M2). Using the second learning model, the estimation unit 133 then outputs, for each category, the probability that the target indicated by a first character string belongs to that category. In the example shown in FIG. 8, the estimation unit 133 inputs a first character string to the second learning model M2 as input information and obtains, as output information, the probability for each category that the target indicated by the first character string belongs to that category (step S2'). Specifically, the estimation unit 133 temporarily stores the information of the table 121B shown in step S2' of FIG. 8 in the storage unit 120.

For example, when the estimation unit 133 inputs the first character string "cosash" (first character string L11) to the second learning model M2, the model outputs a probability of 100% that the target indicated by the first character string L11 belongs to the fashion category C1, and 0% for every other category.

When the estimation unit 133 inputs the first character string "cosash person name M1" (first character string L12), the second learning model M2 outputs a probability of 50% that the target indicated by the first character string L12 belongs to the fashion category C1, 50% that it belongs to the category C2 relating to personal names, and 0% for every other category.

When the estimation unit 133 inputs the first character string "floral-pattern thermal" (first character string L13), the second learning model M2 outputs a probability of 100% that the target indicated by the first character string L13 belongs to the fashion category C1, and 0% for every other category.

When the estimation unit 133 inputs the first character string "floral-pattern thermal brand name B1" (first character string L14), the second learning model M2 outputs a probability of 60% that the target indicated by the first character string L14 belongs to the fashion category C1, 40% that it belongs to the category C3 relating to brand names, and 0% for every other category.

When the estimation unit 133 inputs the first character string "floral-pattern thermal store name T1" (first character string L15), the second learning model M2 outputs a probability of 70% that the target indicated by the first character string L15 belongs to the fashion category C1, 30% that it belongs to the category C4 relating to store names, and 0% for every other category.

When the estimation unit 133 inputs the first character string "mani denim" (first character string L16), the second learning model M2 outputs a probability of 100% that the target indicated by the first character string L16 belongs to the fashion category C1, and 0% for every other category.

When the estimation unit 133 inputs the first character string "mani denim brand name B2" (first character string L17), the second learning model M2 outputs a probability of 50% that the target indicated by the first character string L17 belongs to the fashion category C1, 50% that it belongs to the category C3 relating to brand names, and 0% for every other category.

When the estimation unit 133 inputs the first character string "missing collar" (first character string L18), the second learning model M2 outputs a probability of 100% that the target indicated by the first character string L18 belongs to the fashion category C1, and 0% for every other category.

When the estimation unit 133 inputs the first character string "missing collar person name M2" (first character string L19), the second learning model M2 outputs a probability of 80% that the target indicated by the first character string L19 belongs to the fashion category C1, 20% that it belongs to the category C2 relating to personal names, and 0% for every other category.

Subsequently, once the estimation unit 133 has output, for each category, the probability that the target indicated by each first character string belongs to that category, the extraction unit 134 extracts the first character strings whose estimated categories include the category indicating the target field. That is, the extraction unit 134 extracts the first character strings indicating targets that belong to the category indicating the target field. Having extracted these first character strings, the extraction unit 134 refers to the unnecessary category storage unit 123 and extracts, as target character strings, the first character strings whose estimated categories include no unnecessary category. That is, the extraction unit 134 extracts, as target character strings (also referred to as second character strings), the first character strings other than those indicating targets that belong to an unnecessary category (step S3'). Specifically, the estimation unit 133 stores the information of the table 121C shown in step S3' of FIG. 8 in the storage unit 120.

For example, because the categories estimated for the first character string "cosash" (first character string L11) include the fashion category C1 indicating the target field and include no unnecessary category, the extraction unit 134 extracts the first character string "cosash" (first character string L11) as the second character string W21.

The categories estimated for the first character string "cosash person name M1" (first character string L12) include the fashion category C1 but also include the category C2 relating to personal names, which is an unnecessary category. The extraction unit 134 therefore decides not to extract the first character string L12 as a second character string.

Because the categories estimated for the first character string "floral-pattern thermal" (first character string L13) include the fashion category C1 and include no unnecessary category, the extraction unit 134 extracts the first character string L13 as the second character string W22.

The categories estimated for the first character string "floral-pattern thermal brand name B1" (first character string L14) include the fashion category C1 but also include the category C3 relating to brand names, which is an unnecessary category. The extraction unit 134 therefore decides not to extract the first character string L14 as a second character string.

The categories estimated for the first character string "floral-pattern thermal store name T1" (first character string L15) include the fashion category C1 but also include the category C4 relating to store names, which is an unnecessary category. The extraction unit 134 therefore decides not to extract the first character string L15 as a second character string.

Because the categories estimated for the first character string "mani denim" (first character string L16) include the fashion category C1 and include no unnecessary category, the extraction unit 134 extracts the first character string L16 as the second character string W23.

The categories estimated for the first character string "mani denim brand name B2" (first character string L17) include the fashion category C1 but also include the category C3 relating to brand names, which is an unnecessary category. The extraction unit 134 therefore decides not to extract the first character string L17 as a second character string.

Because the categories estimated for the first character string "missing collar" (first character string L18) include the fashion category C1 and include no unnecessary category, the extraction unit 134 extracts the first character string L18 as the second character string W24.

The categories estimated for the first character string "missing collar person name M2" (first character string L19) include the fashion category C1 but also include the category C2 relating to personal names, which is an unnecessary category. The extraction unit 134 therefore decides not to extract the first character string L19 as a second character string.
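The text does not state how a per-category probability becomes "a category included in the estimated categories". The sketch below assumes a simple rule: a category counts as estimated when its probability exceeds a threshold, with a default of 0 that reproduces the FIG. 8 result. The threshold rule and the names `categories_above` and `extract_second_strings` are assumptions made for illustration.

```python
# Probabilities from the FIG. 8 example (table 121B); omitted categories are 0.
FIG8_PROBABILITIES = {
    "cosash": {"C1": 1.0},
    "cosash person name M1": {"C1": 0.5, "C2": 0.5},
    "floral-pattern thermal": {"C1": 1.0},
    "floral-pattern thermal brand name B1": {"C1": 0.6, "C3": 0.4},
    "floral-pattern thermal store name T1": {"C1": 0.7, "C4": 0.3},
    "mani denim": {"C1": 1.0},
    "mani denim brand name B2": {"C1": 0.5, "C3": 0.5},
    "missing collar": {"C1": 1.0},
    "missing collar person name M2": {"C1": 0.8, "C2": 0.2},
}
TARGET = "C1"                      # fashion category
UNNECESSARY = {"C2", "C3", "C4"}   # person, brand and store name categories


def categories_above(probabilities, threshold=0.0):
    """Treat a category as estimated when its probability exceeds threshold."""
    return {cat for cat, p in probabilities.items() if p > threshold}


def extract_second_strings(table, threshold=0.0):
    """Apply the step S3' filter to a table of per-category probabilities."""
    extracted = []
    for text, probabilities in table.items():
        cats = categories_above(probabilities, threshold)
        if TARGET in cats and not (cats & UNNECESSARY):
            extracted.append(text)
    return extracted


second_strings = extract_second_strings(FIG8_PROBABILITIES)
print(second_strings)
```

With the default threshold, the four extracted strings correspond to the second character strings W21 to W24. Raising the threshold changes the outcome: at 0.25, for instance, "missing collar person name M2" would also pass, because its 20% probability for C2 would no longer count as an estimated category.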

[1-3. Information processing flow]
Next, the information processing procedure according to the embodiment is described with reference to FIG. 9. FIG. 9 is a flowchart showing the information processing procedure according to the embodiment. In the example shown in FIG. 9, the information processing apparatus 100 acquires the search queries that flowed into a site related to the target field (step S101). Having acquired these search queries, the information processing apparatus 100 acquires first character strings by removing unnecessary character strings from the character strings input as the search queries (step S102). Having acquired the first character strings, the information processing apparatus 100 uses the second learning model to estimate the categories to which the target indicated by each first character string belongs (step S103). Having estimated the categories, the information processing apparatus 100 extracts, from among the first character strings, the second character strings indicating extraction targets, based on the estimated categories (step S104).

[2. Learning model generation process]
[2-1. First learning model generation process]
Next, the flow of the generation process of the first learning model is described with reference to FIG. 10. FIG. 10 is a diagram showing an example of the generation process of the first learning model according to the embodiment. In the example shown in FIG. 10, the generation device 50 extracts a pair of search queries consisting of the search query Q11 "Roppongi pasta" and the search query Q12 "Roppongi Italian", which were input consecutively by the same user U1 within a predetermined time (step S11).
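Step S11 can be illustrated with a short sketch that pairs consecutive queries from the same user when they fall within a time window. The log format, the five-minute window, the extra log rows and the function name `extract_query_pairs` are all assumptions introduced for this example; the text only says "within a predetermined time".

```python
from datetime import datetime, timedelta


def extract_query_pairs(log, window=timedelta(minutes=5)):
    """Pair consecutive queries by the same user entered within `window`.

    log is an iterable of (user_id, timestamp, query) tuples.
    """
    by_user = {}
    for user, timestamp, query in sorted(log, key=lambda row: (row[0], row[1])):
        by_user.setdefault(user, []).append((timestamp, query))

    pairs = []
    for entries in by_user.values():
        # Compare each query with the one the same user entered next.
        for (t1, q1), (t2, q2) in zip(entries, entries[1:]):
            if t2 - t1 <= window:
                pairs.append((q1, q2))
    return pairs


# Q11 and Q12 come from the FIG. 10 example; the other rows are made up.
query_log = [
    ("U1", datetime(2019, 7, 30, 12, 0), "Roppongi pasta"),
    ("U2", datetime(2019, 7, 30, 12, 1), "Akasaka bar"),
    ("U1", datetime(2019, 7, 30, 12, 2), "Roppongi Italian"),
    ("U1", datetime(2019, 7, 30, 13, 0), "weather"),
]

pairs = extract_query_pairs(query_log)
print(pairs)
```

Only the two Roppongi queries form a pair here: the third query from user U1 arrives almost an hour later, and user U2 entered a single query.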

Subsequently, the generation device 50 inputs the extracted search query Q11 to the first model M1 and outputs the vector BQV11, a distributed representation of the search query Q11. Here, the vector BQV11 is the distributed representation of the search query Q11 as output from the output layer of the first model M1 before feedback is applied to the first model M1 (before training). The generation device 50 likewise inputs the extracted search query Q12 to the first model M1 and outputs the vector BQV12, a distributed representation of the search query Q12. The vector BQV12 is the distributed representation of the search query Q12 as output from the output layer of the first model M1 before feedback is applied to the first model M1 (before training). In this way, the generation device 50 outputs the vector BQV11, the distributed representation of the search query Q11, and the vector BQV12, the distributed representation of the search query Q12 (step S12).

Subsequently, the generation device 50 trains the first model M1. The pair of search queries consisting of the search query Q11 ("Roppongi pasta") and the search query Q12 ("Roppongi Italian"), input consecutively by the same user U1 within a predetermined time, is presumed to have been input with a predetermined search intention (for example, the search intention of "looking for a restaurant in a certain place"). Treating the two search queries as having mutually similar features, the generation device 50 therefore trains the first model M1 so that the distributed representation of the search query Q11 (vector QV11) and the distributed representation of the search query Q12 paired with it (vector QV12) are similar in the distributed representation space. For example, let Θ be the angle between the vector BQV11, which is the distributed representation of the search query Q11 before feedback is applied to the first model M1 (before learning), and the vector BQV12, which is the corresponding distributed representation of the search query Q12. Further, let Φ be the angle between the vector QV11, which is the distributed representation of the search query Q11 after feedback is applied to the first model M1 (after learning), and the vector QV12, which is the corresponding distributed representation of the search query Q12. The generation device 50 trains the first model M1 so that Φ becomes smaller than Θ. For example, the generation device 50 calculates the cosine similarity between the vector BQV11 and the vector BQV12, and also calculates the cosine similarity between the vector QV11 and the vector QV12. The generation device 50 then trains the first model M1 so that the cosine similarity between the vector QV11 and the vector QV12 becomes larger than the cosine similarity between the vector BQV11 and the vector BQV12 (that is, so that the value approaches 1). In this way, the generation device 50 trains the first model M1 so that the two vectors forming the pair of distributed representations corresponding to a pair of search queries are similar in the distributed representation space, thereby generating the first model M1 that outputs a distributed representation from a search query (step S13). The generation device 50 is not limited to the cosine similarity and may calculate the similarity between distributed representations based on any index applicable as a distance measure between vectors. Likewise, the generation device 50 may train the first model M1 based on any index applicable as a distance measure between vectors. For example, the generation device 50 calculates the value of a predetermined distance function between distributed representations, such as the Euclidean distance, a distance in a non-Euclidean space such as a hyperbolic space, the Manhattan distance, or the Mahalanobis distance. The generation device 50 may then train the first model M1 so that the value of the predetermined distance function between the distributed representations (that is, the distance in the distributed representation space) becomes small.
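The relationship between the angle before learning (Θ) and the angle after learning (Φ) can be illustrated with a minimal Python sketch. It is not the embodiment's training procedure: the vectors are updated directly by one gradient-ascent step on the cosine similarity, standing in for feedback to the first model M1, and the function names and the learning rate `lr` are assumptions introduced for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def cosine_similarity(a: Vector, b: Vector) -> float:
    return dot(a, b) / (norm(a) * norm(b))


def angle_degrees(a: Vector, b: Vector) -> float:
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine_similarity(a, b)))))


def cosine_grad(a: Vector, b: Vector) -> Vector:
    """Gradient of cosine_similarity(a, b) with respect to a."""
    na, nb = norm(a), norm(b)
    cos = dot(a, b) / (na * nb)
    return [y / (na * nb) - cos * x / (na * na) for x, y in zip(a, b)]


def pull_together(a: Vector, b: Vector, lr: float = 0.1) -> Tuple[Vector, Vector]:
    """One gradient-ascent step on the cosine similarity, applied to both vectors.

    Hypothetical stand-in for one round of feedback to the first model M1.
    """
    ga, gb = cosine_grad(a, b), cosine_grad(b, a)
    return ([x + lr * g for x, g in zip(a, ga)],
            [y + lr * g for y, g in zip(b, gb)])


# Alternative distance measures named in the text (smaller means more similar).
def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))
```

Calling `pull_together` on a pair standing in for (BQV11, BQV12) returns a pair standing in for (QV11, QV12) whose cosine similarity is larger and whose angle is smaller than before the step.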

Next, the flow of the generation process of the first learning model will be described in more detail with reference to FIG. 11. In the description of FIG. 11, parts that overlap with the description of FIG. 10 are omitted as appropriate. FIG. 11 is a diagram showing the generation process of the first learning model according to the embodiment. The example shown in FIG. 11 illustrates how the distributed representations output by the first model M1 generated by the generation device 50 are mapped into the distributed representation space. The generation device 50 trains the first model M1 so that the distributed representation of a predetermined search query and the distributed representation of another search query paired with it are mapped close to each other in the distributed representation space.

In the example shown in the upper part of FIG. 11, the generation device 50 extracts four search queries input consecutively by the same user U1 within a predetermined time: the search query Q11 ("Roppongi pasta"), the search query Q12 ("Roppongi Italian"), the search query Q13 ("Akasaka pasta"), and the search query Q14 ("Azabu pasta"). The generation device 50 extracts four search queries for which the interval between the times at which the search queries were input by the same user U1 is within a predetermined time. The generation device 50 extracts a plurality of search queries for which the interval between the input times of each search query pair described below, input by the same user U1, is within a predetermined time. Arranged in the order of input, the four extracted search queries were input in the order of the search query Q11, the search query Q12, the search query Q13, and the search query Q14. Having extracted the four search queries, the generation device 50 treats each two chronologically adjacent search queries as a pair and extracts three pairs of search queries: (search query Q11, search query Q12), (search query Q12, search query Q13), and (search query Q13, search query Q14) (step S21-1). The generation device 50 may instead extract a plurality of search queries all of which were input by the same user U1 within a predetermined time. In that case, the generation device 50 may select two search queries from the extracted search queries regardless of whether they are chronologically adjacent, and extract the selected two search queries as a pair.
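The pair extraction of step S21-1 can be sketched in Python as follows. The log record layout `(user_id, timestamp_seconds, query)`, the function names, and the parameter `max_gap` are assumptions made for illustration; the embodiment only speaks of "a predetermined time" and does not specify a data format.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical log record: (user_id, timestamp_seconds, query)
LogEntry = Tuple[str, float, str]


def extract_sessions(log: List[LogEntry], max_gap: float) -> List[List[str]]:
    """Group each user's queries into runs whose consecutive gaps are <= max_gap."""
    sessions: List[List[str]] = []
    prev_user, prev_ts = None, None
    for user, ts, query in sorted(log, key=lambda e: (e[0], e[1])):
        if user == prev_user and ts - prev_ts <= max_gap:
            sessions[-1].append(query)   # continue the current run
        else:
            sessions.append([query])     # new user or gap too long: start a new run
        prev_user, prev_ts = user, ts
    return sessions


def adjacent_pairs(session: List[str]) -> List[Tuple[str, str]]:
    """Pairs of chronologically adjacent queries, as in step S21-1."""
    return list(zip(session, session[1:]))


def all_pairs(session: List[str]) -> List[Tuple[str, str]]:
    """Variant: any two queries from the same run, adjacent or not."""
    return list(combinations(session, 2))
```

`adjacent_pairs` corresponds to the main flow described above, while `all_pairs` corresponds to the variant in which two search queries are selected regardless of chronological adjacency.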

Subsequently, the generation device 50 inputs the extracted search queries Q1k (k = 1, 2, 3, 4) into the first model M1 and outputs vectors BQV1k (k = 1, 2, 3, 4), which are the distributed representations of the search queries Q1k. Here, each vector BQV1k is the distributed representation of the search query Q1k as just output from the output layer of the first model M1, that is, the distributed representation before feedback is applied to the first model M1 (before learning) (step S22-1).

Subsequently, the generation device 50 trains the first model M1. A pair of search queries input consecutively by the same user U1 within a predetermined time is presumed to have been input with a predetermined search intention (for example, the search intention of "looking for a restaurant in a certain place (near Minato-ku, Tokyo)"). Treating the paired search queries as having mutually similar features, the generation device 50 trains the first model M1 so that the distributed representation of the search query Q11 (vector QV11) and the distributed representation of the search query Q12 paired with it (vector QV12) are similar in the distributed representation space. The generation device 50 also trains the first model M1 so that the distributed representation of the search query Q12 (vector QV12) and the distributed representation of the search query Q13 paired with it (vector QV13) are similar in the distributed representation space. Further, the generation device 50 trains the first model M1 so that the distributed representation of the search query Q13 (vector QV13) and the distributed representation of the search query Q14 paired with it (vector QV14) are similar in the distributed representation space. In this way, the generation device 50 trains the first model M1 so that the two vectors forming the pair of distributed representations corresponding to a pair of search queries are similar in the distributed representation space, thereby generating the first model M1 that outputs a distributed representation from a search query (step S23-1).

As a result of the information processing shown in the upper part of FIG. 11, the vectors QV1k (k = 1, 2, 3, 4), which are the distributed representations of the search queries Q1k (k = 1, 2, 3, 4), are shown mapped to nearby positions in the distributed representation space as a cluster CL11. For example, the search queries Q1k are presumed to be a set of search queries searched by the user U1 under the search intention of "looking for a restaurant in a certain place (near Minato-ku, Tokyo)". That is, the search queries Q1k are presumed to have mutually similar features in that they were searched under this search intention. When a predetermined search query input with the search intention of "looking for a restaurant in a certain place (near Minato-ku, Tokyo)" is input into the first model M1, the generation device 50 can output a distributed representation that is mapped to the position of the cluster CL11. Accordingly, for example, by extracting the search queries corresponding to distributed representations mapped to the position of the cluster CL11, the generation device 50 can extract search queries that match the search intention of "looking for a restaurant in a certain place (near Minato-ku, Tokyo)". Therefore, the generation device 50 can make the meaning of a search query appropriately interpretable.
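Extracting the search queries mapped to the position of a cluster such as CL11 can be sketched as a distance test against the cluster centroid. This is only one possible reading, since the embodiment does not define how the "position of the cluster" is determined. The two-dimensional vectors in the usage note, the `radius` threshold, and the function names are assumptions for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of the given vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def queries_in_cluster(cluster_members: List[str],
                       embeddings: Dict[str, Vector],
                       radius: float) -> List[str]:
    """Return every query whose embedding lies within `radius` of the centroid
    of the known cluster members (Euclidean distance)."""
    center = centroid([embeddings[q] for q in cluster_members])
    return [q for q, v in embeddings.items() if math.dist(center, v) <= radius]
```

For example, with hypothetical embeddings in which the restaurant-related queries lie near `[1.0, 0.1]` and the refrigerator-related queries near `[0.0, 1.0]`, seeding the cluster with "Roppongi pasta" and "Roppongi Italian" and a radius of 0.3 would also pick up a nearby "Akasaka pasta" while leaving the refrigerator queries out.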

In the example shown in the lower part of FIG. 11, the generation device 50 extracts three search queries input consecutively by the same user U2 within a predetermined time: the search query Q21 ("refrigerator 400L"), the search query Q22 ("refrigerator medium size"), and the search query Q23 ("refrigerator medium size recommended"). Arranged in the order of input, the three extracted search queries were input in the order of the search query Q21, the search query Q22, and the search query Q23. Having extracted the three search queries, the generation device 50 treats each two chronologically adjacent search queries as a pair and extracts two pairs of search queries: (search query Q21, search query Q22) and (search query Q22, search query Q23) (step S21-2).

Subsequently, the generation device 50 inputs the extracted search queries Q2m (m = 1, 2, 3) into the first model M1 and outputs vectors BQV2m (m = 1, 2, 3), which are the distributed representations of the search queries Q2m. Here, each vector BQV2m is the distributed representation of the search query Q2m as just output from the output layer of the first model M1, that is, the distributed representation before feedback is applied to the first model M1 (before learning) (step S22-2).

Subsequently, the generation device 50 trains the first model M1. A pair of search queries input consecutively by the same user U2 within a predetermined time is presumed to have been input with a predetermined search intention (for example, the search intention of "looking into medium-sized refrigerators"). Treating the paired search queries as having mutually similar features, the generation device 50 trains the first model M1 so that the distributed representation of the search query Q21 (vector QV21) and the distributed representation of the search query Q22 paired with it (vector QV22) are similar in the distributed representation space. The generation device 50 also trains the first model M1 so that the distributed representation of the search query Q22 (vector QV22) and the distributed representation of the search query Q23 paired with it (vector QV23) are similar in the distributed representation space. In this way, the generation device 50 trains the first model M1 so that the two vectors forming the pair of distributed representations corresponding to a pair of search queries are similar in the distributed representation space, thereby generating the first model M1 that outputs a distributed representation from a search query (step S23-2).

As a result of the information processing shown in the lower part of FIG. 11, the vectors QV2m (m = 1, 2, 3), which are the distributed representations of the search queries Q2m (m = 1, 2, 3), are shown mapped to nearby positions in the distributed representation space as a cluster CL21. For example, the search queries Q2m are presumed to be a set of search queries searched by the user U2 under the search intention of "looking into medium-sized refrigerators". That is, the search queries Q2m are presumed to have mutually similar features in that they were searched under this search intention. When a predetermined search query input with the search intention of "looking into medium-sized refrigerators" is input into the first model M1, the generation device 50 can output a distributed representation that is mapped to the position of the cluster CL21. Accordingly, for example, by extracting the search queries corresponding to distributed representations mapped to the position of the cluster CL21, the generation device 50 can extract search queries that match the search intention of "looking into medium-sized refrigerators". Therefore, the generation device 50 can make the meaning of a search query appropriately interpretable.

Further, the generation device 50 according to the present invention trains the first model M1 on the assumption that a plurality of randomly extracted search queries have mutually different features in that they were searched under different search intentions. Specifically, the generation device 50 trains the first model M1 so that the distributed representation of a predetermined search query and the distributed representation of a search query extracted at random, independently of the predetermined search query, are mapped far from each other in the distributed representation space. In the example shown in FIG. 11, suppose that the generation device 50 extracts a search query at random, independently of the search query Q11, and that the search query Q21 is extracted. In this case, the generation device 50 trains the first model M1 so that the distributed representation of the search query Q11 (vector QV11) and the distributed representation of the randomly extracted search query Q21 (vector QV21) are mapped far from each other in the distributed representation space. As a result, the cluster CL11, which contains the vectors QV1k (k = 1, 2, 3, 4) representing the search queries Q1k searched under the search intention of "looking for a restaurant in a certain place (near Minato-ku, Tokyo)", and the cluster CL21, which contains the vectors QV2m (m = 1, 2, 3) representing the search queries Q2m searched under the search intention of "looking into medium-sized refrigerators", are mapped far from each other in the distributed representation space. That is, by training the first model M1 so that the distributed representations of a plurality of randomly extracted search queries differ, the generation device 50 according to the present invention can output the distributed representations of search queries with different search intentions at distant positions in the distributed representation space.
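Training that pulls paired search queries together while pushing randomly extracted search queries apart is commonly expressed as a contrastive objective. The embodiment does not state a loss function, so the formulation below is an assumption chosen for illustration, as are the `margin` parameter, the function names, and the choice to exclude queries of the same session from the random draw.

```python
import math
import random
from typing import Iterable, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def contrastive_loss(anchor: Vector, positive: Vector, negative: Vector,
                     margin: float = 0.0) -> float:
    """Assumed loss: small when the paired query is close to the anchor and the
    randomly extracted query has cosine similarity at most `margin`."""
    pull = 1.0 - cosine_similarity(anchor, positive)               # e.g. QV11 vs QV12
    push = max(0.0, cosine_similarity(anchor, negative) - margin)  # e.g. QV11 vs QV21
    return pull + push


def sample_negative(queries: List[str], exclude: Iterable[str],
                    rng: random.Random) -> str:
    """Draw a random query, skipping those in `exclude` (assumed here to be the
    anchor's own session; the text only says the draw is random)."""
    excluded = set(exclude)
    candidates = [q for q in queries if q not in excluded]
    return rng.choice(candidates)
```

Minimizing this quantity over many (anchor, positive, negative) triples would tend to produce the separation between clusters such as CL11 and CL21 described above.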

As a result of mapping the distributed representations output by the first model M1 generated by the generation device 50 into the distributed representation space, clusters other than the above-described cluster CL11 and cluster CL21 are also generated, such as a cluster CL12 and a cluster CL22, each of which is a set of distributed representations of a plurality of search queries input by the same user within a predetermined time.

As described above, the generation device 50 acquires search queries input by users. The generation device 50 then generates a first model that predicts the feature information of a predetermined search query from that search query, by learning on the assumption that, among the acquired search queries, a plurality of search queries input by the same user within a predetermined time have similar features. That is, the generation device 50 according to the present invention trains the first model on the assumption that a plurality of search queries input consecutively within a predetermined time have mutually similar features in that they were searched under a predetermined search intention. Specifically, the generation device 50 trains the first model so that the distributed representations of a plurality of search queries input by the same user within a predetermined time are similar, thereby generating a first model that outputs, from a predetermined search query, a distributed representation containing the feature information of that search query. In other words, by training the first model M1 so that the distributed representations of a plurality of search queries input consecutively within a predetermined time are similar, the generation device 50 according to the present invention can output the distributed representations of search queries searched under a predetermined search intention at nearby positions in the distributed representation space. This enables the generation device 50 to output (interpret) the meaning (search intention) of a search query in accordance with the context of the user who input it. Therefore, the generation device 50 can make the meaning of a search query appropriately interpretable. Furthermore, by extracting the search queries corresponding to distributed representations mapped in the vicinity of the distributed representation containing the feature information of a predetermined search query, the generation device 50 can extract search queries that match the search intention with which the predetermined search query was searched. That is, the generation device 50 makes it possible to analyze users' search trends while taking into account the search intention and context of the user who input a search query. Therefore, the generation device 50 can improve the accuracy of analyzing users' search trends. The first model M1 generated by the generation device 50 can also be made to function as part of a search system. Alternatively, the generation device 50 can provide the distributed representation of a search query output by the first model M1 as input information to another system (for example, a search engine) that uses the feature information of the search query predicted by the first model M1. This allows the search system to select the content to be output as search results based on the feature information of the search query predicted by the first model M1. That is, the search system can select the content to be output as search results while taking into account the search intention and context of the user who input the search query. Furthermore, based on the feature information of the search query predicted by the first model M1, the search system can calculate the similarity between the distributed representation of a character string included in content output as a search result and the distributed representation of the search query. The search system can then determine the display order of the content output as search results based on the calculated similarity. That is, the search system can determine the display order of the content output as search results while taking into account the search intention and context of the user who input the search query. Therefore, the generation device 50 can improve usability in a search service.
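Determining the display order of content from the calculated similarity can be sketched as a sort by cosine similarity. The content identifiers and vectors are hypothetical. The sketch assumes that the distributed representation of each content's character string has already been obtained, for example by passing that string through the same model, which the text does not spell out.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_contents(query_vec: Vector, content_vecs: Dict[str, Vector]) -> List[str]:
    """Return content identifiers ordered from most to least similar to the query."""
    return sorted(
        content_vecs,
        key=lambda cid: cosine_similarity(query_vec, content_vecs[cid]),
        reverse=True,
    )
```

A search system using such a ranking would place content whose text embeds close to the query's distributed representation at the top of the results.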

[2-2. Second learning model generation process]
Next, the flow of the generation process of the second learning model will be described with reference to FIG. 12. FIG. 12 is a diagram showing an example of the generation process of the second learning model according to the embodiment. In the following, the second learning model is referred to as the second model (or the second model M2) as appropriate. In the example shown in the upper part of FIG. 12, the generation device 50 extracts four search queries input consecutively by the same user U1 within a predetermined time: the search query Q11 ("Roppongi pasta"), the search query Q12 ("Roppongi Italian"), the search query Q13 ("Akasaka pasta"), and the search query Q14 ("Azabu pasta"). The generation device 50 extracts a plurality of search queries for which the interval between the times at which the search queries were input by the same user U1 is within a predetermined time. The generation device 50 also extracts a plurality of search queries for which the interval between the input times of each search query pair input by the same user U1 is within a predetermined time. Here, the four search queries are assumed to have been input by the user U1 in the order of the search query Q11, the search query Q12, the search query Q13, and the search query Q14, each within the predetermined time. Having extracted the four search queries, the generation device 50 treats each two chronologically adjacent search queries as a pair and extracts three pairs of search queries: (search query Q11, search query Q12), (search query Q12, search query Q13), and (search query Q13, search query Q14). Having extracted the three pairs, the generation device 50 inputs the extracted search queries Q1k (k = 1, 2, 3, 4) into the first model M1 (step S31). The generation device 50 may instead extract a plurality of search queries all of which were input by the same user U1 within a predetermined time. In that case, the generation device 50 may select two search queries from the extracted search queries regardless of whether they are chronologically adjacent, and extract the selected two search queries as a pair.

Subsequently, the generation device 50 outputs the vectors BQV1k (k = 1, 2, 3, 4), which are the distributed representations of the search queries Q1k (k = 1, 2, 3, 4), as the output data of the first model M1 (step S32). Here, each vector BQV1k is the distributed representation of the search query Q1k as just output from the output layer of the first model M1, that is, the distributed representation before feedback is applied to the first model M1 (before learning).

Here, the search queries Q1k (k = 1, 2, 3, 4) input in succession by the same user U1 within a predetermined time are presumed to be, for example, a set of search queries that the user U1 issued with the search intent of "finding a restaurant in a certain place (near Minato-ku, Tokyo)". In other words, the search queries Q1k are presumed to share similar features in that they were all issued under that search intent. The generation device 50 therefore generates a first model that predicts the feature information of a given search query from that search query by learning that search queries input in succession have similar features (step S33). Specifically, the generation device 50 generates the first model M1, which predicts the distributed representation of a given search query from that search query, by learning that the distributed representations of search queries input in succession are similar. For example, the generation device 50 trains the first model M1 so that the distributed representation of the search query Q11 (vector QV11) and the distributed representation of the search query Q12 paired with it (vector QV12) are similar in the distributed representation space. Likewise, the generation device 50 trains the first model M1 so that vector QV12 and the distributed representation of the search query Q13 paired with Q12 (vector QV13) are similar, and so that vector QV13 and the distributed representation of the search query Q14 paired with Q13 (vector QV14) are similar.
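The pairing of consecutive queries can be sketched as follows. This is only an illustration under stated assumptions: the patent does not name a similarity measure or a loss function, so cosine similarity, the `pair_loss` helper, and the toy two-dimensional vectors are all hypothetical.

```python
import math

def consecutive_pairs(queries):
    """Pair each query with the one entered immediately after it."""
    return list(zip(queries, queries[1:]))

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pair_loss(vectors, pairs):
    """Mean of (1 - cosine similarity) over the pairs; lower means more similar."""
    return sum(1.0 - cosine_similarity(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

# Queries entered in succession by user U1 (IDs taken from the text).
session = ["Q11", "Q12", "Q13", "Q14"]
pairs = consecutive_pairs(session)

# Toy vectors (assumption); a trained M1 would produce the real QV1k.
toy_vectors = {"Q11": [1.0, 0.0], "Q12": [0.9, 0.1], "Q13": [0.8, 0.2], "Q14": [0.7, 0.3]}
loss = pair_loss(toy_vectors, pairs)
```

Training would adjust the model so that this kind of loss decreases for consecutive pairs. A practical objective would usually also need dissimilar (negative) pairs to keep all vectors from collapsing to one point, which this sketch omits.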

The upper right part of FIG. 12 shows how, as the output of the trained first model M1, the vectors QV1k (k = 1, 2, 3, 4), which are the distributed representations of the search queries Q1k (k = 1, 2, 3, 4) input by the same user U1 within a predetermined time, are mapped as a cluster CL11 in the distributed representation space. In this way, the generation device 50 generates the first model M1, which has learned the features shared by a plurality of search queries input by the same user within a predetermined time.

After generating the first model M1, the generation device 50 acquires the generated first model M1 (the model data MDT1 of the first model M1). The generation device 50 then uses the acquired first model M1 to generate the second model M2. Specifically, the generation device 50 retrains the first model M1 to generate the second model M2, whose connection coefficients (the weights of the learning model) differ from those of the first model M1. More specifically, the generation device 50 uses the first model M1 to generate the second model M2, which predicts, from a given search query, the category to which that search query belongs (step S34).
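One way to read "retraining the first model M1" is a warm start: M2 begins as a copy of M1's weights and is then trained further. The sketch below assumes this reading. The dictionary layout, the `init_second_model` helper, and the added classifier head are hypothetical and are not described in the patent.

```python
import copy

def init_second_model(first_model, num_categories):
    """Start M2 from a copy of M1's weights plus a new zero-initialized classifier head."""
    encoder = copy.deepcopy(first_model["encoder"])  # M1 itself stays unchanged
    embedding_dim = len(encoder)  # one weight row per output dimension
    head = [[0.0] * embedding_dim for _ in range(num_categories)]
    return {"encoder": encoder, "head": head}

# Hypothetical M1 weights: 3 output dimensions, 2 input features.
m1 = {"encoder": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]}
m2 = init_second_model(m1, num_categories=4)

# Stand-in for a retraining update: M2's coefficients change, M1's do not.
m2["encoder"][0][0] += 0.05
```

After the update, M2 has connection coefficients that differ from those of M1, which matches the relationship between the two models described above.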

In the example shown in the lower part of FIG. 12, the generation device 50 generates the second model M2, which predicts to which of four categories a search query input to the second model M2 belongs: CAT11 ("find a restaurant"), CAT12 ("find a product"), CAT13 ("reserve a restaurant"), or CAT14 ("purchase a product"). Specifically, the generation device 50 generates the second model M2 so that, when a search query is input as input information, it outputs as output information, for each category, the probability that the search query belongs to that category. For example, the generation device 50 learns pairs of a search query and the category to which it belongs (one of CAT11 to CAT14) as correct answer data for the second model M2.

Note that a search query belonging to CAT11 ("find a restaurant") means that it was entered with the intent of finding a restaurant. Likewise, a search query belonging to CAT12 ("find a product") was entered with the intent of finding a product, one belonging to CAT13 ("reserve a restaurant") was entered with the intent of reserving a restaurant, and one belonging to CAT14 ("purchase a product") was entered with the intent of purchasing a product.

Specifically, the generation device 50 generates the second model M2, which predicts the category of a given search query from that search query, by training the learning model so that, when a search query is input, the classification result of the distributed representation output by the learning model corresponds to the category to which the search query belongs. The generation device 50 then generates the second model M2 so that, for example, when a search query is input as input information, it outputs as output information the probability that the search query belongs to each of the categories CAT11 to CAT14.

For example, when the search query Q11 ("Roppongi pasta") is input to the second model M2 as input information (step S35), the generation device 50 outputs the vector BQV11, which is the distributed representation of the search query Q11, as output information. Here, the vector BQV11 is the distributed representation of the search query Q11 as it has just been output from the output layer of the second model M2, that is, the distributed representation before feedback is applied to the second model M2 (before learning). Suppose that the correct category of the search query Q11 ("Roppongi pasta") is CAT11 ("find a restaurant"). In this case, the generation device 50 trains the second model M2 so that the probability that the vector BQV11 is classified into CAT11 exceeds a predetermined threshold.

The generation device 50 trains the second model using correct answer data prepared in advance. Alternatively, the generation device 50 may generate the correct answer data for the second model M2 itself and train the second model M2 using the generated data. Specifically, the generation device 50 determines the correct category of a search query based on what the users who issued that search query did after searching. More specifically, among the users who issued a given search query, the generation device 50 identifies the action for which the share of users who took it after searching exceeds a predetermined threshold, and treats that action as the one corresponding to the correct category. For example, suppose that, of the users who searched with the search query Q11 ("Roppongi pasta"), 90% went on to look for a restaurant, 0% went on to look for a product, 10% went on to reserve a restaurant, and 0% went on to purchase a product. In this case, because the share of users who looked for a restaurant satisfies the predetermined threshold (for example, 90%), the generation device 50 determines that looking for a restaurant is the action corresponding to the correct category of the search query Q11. Having made that determination, the generation device 50 sets the correct category of the search query Q11 ("Roppongi pasta") to CAT11 ("find a restaurant").
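The labeling rule based on post-search behavior can be sketched as below. The action names and the `decide_correct_category` helper are hypothetical. The text says the share must "exceed" the threshold while the example uses a 90% share against a 90% threshold, so this sketch assumes a share equal to the threshold also qualifies.

```python
# Mapping from post-search action to category ID (category IDs from the text,
# action keys are hypothetical names).
ACTION_TO_CATEGORY = {
    "find_restaurant": "CAT11",
    "find_product": "CAT12",
    "reserve_restaurant": "CAT13",
    "purchase_product": "CAT14",
}

def decide_correct_category(action_shares, threshold=0.9):
    """Return the category whose action share meets the threshold, or None if none does."""
    for action, share in action_shares.items():
        if share >= threshold:  # assumption: a share equal to the threshold counts
            return ACTION_TO_CATEGORY[action]
    return None

# Shares of users who took each action after searching with Q11 ("Roppongi pasta").
shares_q11 = {
    "find_restaurant": 0.9,
    "find_product": 0.0,
    "reserve_restaurant": 0.1,
    "purchase_product": 0.0,
}
label = decide_correct_category(shares_q11)
```

A query for which no action reaches the threshold gets no label in this sketch; how the patent's device handles that case is not stated in this section.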

For example, suppose that when the search query Q11 ("Roppongi pasta") is input to the second model M2 before learning, the model outputs an 80% probability that the vector BQV11 is classified into CAT11 ("find a restaurant"), 0% for CAT12 ("find a product"), 20% for CAT13 ("reserve a restaurant"), and 0% for CAT14 ("purchase a product"). In this case, the generation device 50 trains the second model M2 so that the probability that the vector BQV11 is classified into CAT11 exceeds a predetermined threshold (for example, 90%). At the same time, the generation device 50 trains the second model M2 so that the probability that the vector BQV11 is classified into the other category CAT13 is lowered to 10%.
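The effect of this training step can be illustrated with a toy update. The sketch assumes a softmax output and a cross-entropy gradient, neither of which the patent specifies, and it updates the output scores directly instead of model weights to keep the example short.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_until_threshold(logits, target, threshold=0.9, lr=1.0, max_steps=100):
    """Take cross-entropy gradient steps on the scores until p[target] > threshold."""
    logits = list(logits)
    for _ in range(max_steps):
        probs = softmax(logits)
        if probs[target] > threshold:
            break
        for i, p in enumerate(probs):
            grad = p - (1.0 if i == target else 0.0)  # d(cross-entropy)/d(logit_i)
            logits[i] -= lr * grad
    return softmax(logits)

# Scores chosen so the starting probabilities are roughly 80% / 0% / 20% / 0%
# for CAT11 / CAT12 / CAT13 / CAT14.
start = [math.log(0.8), -20.0, math.log(0.2), -20.0]
before = softmax(start)
after = train_until_threshold(start, target=0)
```

Because the probabilities sum to one, pushing CAT11 above the threshold necessarily pushes CAT13 below 10% in this toy setting.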

In this way, the generation device 50 trains the second model so that, when a given search query is input as input information, the probability that the distributed representation of that search query is classified into the correct category exceeds a predetermined threshold. When a given search query is input, the generation device 50 outputs, as the category of that search query, the category for which the probability that its distributed representation belongs to the category exceeds the predetermined threshold. For example, when the search query Q11 ("Roppongi pasta") is input to the trained second model M2, the probability that the vector BQV11 belongs to the category CAT11 ("find a restaurant") exceeds 90%, so the generation device 50 outputs CAT11 as the category of the search query (step S36). In this way, by learning pairs of a search query and its correct category, the generation device 50 generates the second model, which predicts the category of a given search query from that search query (step S37).
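The thresholded output at inference time can be sketched as follows. The softmax normalization and the raw score values are assumptions made for illustration. Only the category IDs and the 90% threshold come from the text.

```python
import math

CATEGORIES = ["CAT11", "CAT12", "CAT13", "CAT14"]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_category(scores, threshold=0.9):
    """Return (category, probabilities); category is None if no probability exceeds the threshold."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > threshold:
        return CATEGORIES[best], probs
    return None, probs

# Hypothetical raw scores for Q11 from a trained second model.
category, probs = predict_category([5.0, 0.0, 1.0, 0.0])
```

With these made-up scores the CAT11 probability is above 90%, so `category` is `"CAT11"`. With flatter scores no category would be output.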

In general, a user is thought to search several times with a particular intent, so it is reasonable to assume that search queries input in succession within a predetermined time have similar search intents. The generation device 50 according to the present invention therefore trains the first model M1 by treating a plurality of search queries input in succession within a predetermined time as having mutually similar features, in that they were issued under a given search intent. This allows the generation device 50 to have the first model M1 learn features of search queries that reflect search intent. By making use of the first model M1 trained in this way, the generation device 50 can efficiently generate the second model, which predicts the category of a given search query from that search query. As a result, the generation device 50 can classify search queries into categories that take into account the search intent of the user who entered them.

Conventionally, classifying search queries into categories with high accuracy has required a sufficient amount of correct answer data. However, search queries are highly diverse and long-tailed, so labeling correct categories for a large number of search queries is laborious and difficult. Instead of labeling correct categories, the generation device 50 can train the second model, which predicts the category of a search query, by using the user's search intent (the context of the user who entered the search query) as a kind of correct answer. The generation device 50 can thus train the second model without manually labeling the correct category of each search query. That is, the generation device 50 can achieve sufficient classification accuracy even when little correct answer data is available, and even higher classification accuracy when a large amount is available. The generation device 50 can therefore improve the accuracy of search query classification.

[2-3. Configuration of the information processing apparatus]
Next, the configuration of the generation device 50 according to the embodiment will be described with reference to FIG. 13. FIG. 13 is a diagram showing a configuration example of the generation device 50 according to the embodiment. As shown in FIG. 13, the generation device 50 includes a communication unit 51, a storage unit 53, and a control unit 52. The generation device 50 may also include an input unit (for example, a keyboard and a mouse) that accepts various operations from an administrator of the generation device 50, and a display unit (for example, a liquid crystal display) for displaying various kinds of information.

(Communication unit 51)
The communication unit 51 is realized by, for example, a NIC (Network Interface Card). The communication unit 51 is connected to a network by wire or wirelessly and exchanges information with, for example, the user terminal 10 and the search server 20.

(Storage unit 53)
The storage unit 53 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk. As shown in FIG. 13, the storage unit 53 includes a query information storage unit 531, a vector information storage unit 532, a classification definition storage unit 533, a category information storage unit 534, and a model information storage unit 535.

(Query information storage unit 531)
The query information storage unit 531 stores various kinds of information about search queries entered by users. FIG. 14 shows an example of the query information storage unit according to the embodiment. In the example shown in FIG. 14, the query information storage unit 531 has the items "user ID", "date and time", "search query", and "search query ID".

"User ID" is identification information for identifying the user who entered the search query. "Date and time" is the date and time at which the search server received the search query from the user. "Search query" is the search query entered by the user. "Search query ID" is identification information for identifying the search query entered by the user.

In the first record of FIG. 14, the search query identified by the search query ID "Q11" (search query Q11) corresponds to the search query Q11 shown in FIG. 10. The user ID "U1" indicates that the user who entered the search query Q11 is the user identified by the user ID "U1" (user U1). The date and time "2018/9/1 PM 17:00" indicates that the search server received the search query Q11 from the user U1 at 17:00 on September 1, 2018. The search query "Roppongi pasta" is the search query Q11 entered by the user U1. Specifically, it is a character string in which "Roppongi", a place name, and "pasta", a type of food, are separated by a space as a delimiter.
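Records of this shape are what the session-based training needs as input. The sketch below shows one possible way to group them into runs of queries entered by the same user within a time window. The `QueryRecord` class, the 30-minute window, and all records other than Q11 are hypothetical, since the patent only speaks of a "predetermined time".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class QueryRecord:
    user_id: str
    received_at: datetime
    query: str
    query_id: str

def split_sessions(records, window=timedelta(minutes=30)):
    """Group each user's queries into runs whose successive queries are at most `window` apart."""
    sessions = []
    latest_by_user = {}
    for rec in sorted(records, key=lambda r: (r.user_id, r.received_at)):
        current = latest_by_user.get(rec.user_id)
        if current and rec.received_at - current[-1].received_at <= window:
            current.append(rec)
        else:
            current = [rec]
            latest_by_user[rec.user_id] = current
            sessions.append(current)
    return sessions

records = [
    QueryRecord("U1", datetime(2018, 9, 1, 17, 0), "Roppongi pasta", "Q11"),
    QueryRecord("U1", datetime(2018, 9, 1, 17, 5), "hypothetical query A", "Q12"),
    QueryRecord("U1", datetime(2018, 9, 1, 20, 0), "hypothetical query B", "Q90"),
    QueryRecord("U2", datetime(2018, 9, 1, 17, 1), "hypothetical query C", "Q91"),
]
sessions = split_sessions(records)
```

Here the first two queries of U1 form one session, while the later query of U1 and the query of U2 each start their own.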

(Vector information storage unit 532)
The vector information storage unit 532 stores various kinds of information about the vectors that are the distributed representations of search queries. FIG. 15 shows an example of the vector information storage unit according to the embodiment. In the example shown in FIG. 15, the vector information storage unit 532 has the items "vector ID", "search query ID", and "vector information".

"Vector ID" is identification information for identifying a vector that is the distributed representation of a search query. "Search query ID" is identification information for identifying the search query corresponding to the vector. "Vector information" is the N-dimensional vector that is the distributed representation of the search query. This vector is, for example, a 128-dimensional vector.

In the first record of FIG. 15, the vector identified by the vector ID "QV11" (vector QV11) corresponds to the vector QV11, which is the distributed representation of the search query Q11 shown in FIG. 10. The search query ID "Q11" indicates that the search query corresponding to the vector QV11 is the search query Q11. The vector information "QVDT11" is the N-dimensional vector that is the distributed representation of the search query Q11.

(Classification definition storage unit 533)
The classification definition storage unit 533 stores various kinds of information about the definitions of the categories into which search queries are classified. FIG. 16 shows an example of the classification definition storage unit according to the embodiment. In the example shown in FIG. 16, the classification definition storage unit 533 has the items "major classification ID", "major classification", "minor classification ID", and "minor classification".

"Major classification" is the major classification of the categories into which search queries are classified. "Major classification ID" is identification information for identifying the major classification. In the example shown in FIG. 16, the major classification "purchasing behavior" corresponds to the major classification described in the example shown in the lower part of FIG. 1. The major classification "purchasing behavior" groups the categories that classify search queries based on the user's purchasing behavior. In the example shown in FIG. 16, it has four minor classifications. The major classification ID "CAT1" is identification information for identifying the major classification "purchasing behavior".

"Minor classification" is the minor classification of the categories into which search queries are classified. "Minor classification ID" is identification information for identifying the minor classification. In the example shown in FIG. 16, the minor classification "find a restaurant" belongs to the major classification "purchasing behavior" and indicates that a search query classified into it was entered by the user with the intent of finding a restaurant. The minor classification ID "CAT11" is identification information for identifying the minor classification "find a restaurant".

The minor classification "find a product" belongs to the major classification "purchasing behavior" and indicates that a search query classified into it was entered by the user with the intent of finding a product. The minor classification ID "CAT12" is identification information for identifying the minor classification "find a product".

The minor classification "reserve a restaurant" belongs to the major classification "purchasing behavior" and indicates that a search query classified into it was entered by the user with the intent of reserving a restaurant. The minor classification ID "CAT13" is identification information for identifying the minor classification "reserve a restaurant".

The minor classification "purchase a product" belongs to the major classification "purchasing behavior" and indicates that a search query classified into it was entered by the user with the intent of purchasing a product. The minor classification ID "CAT14" is identification information for identifying the minor classification "purchase a product".
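The two-level definition described for FIG. 16 maps naturally onto a nested mapping. The layout below and the `major_of` helper are hypothetical. The IDs and names are the ones given in the text.

```python
# Major classification -> its name and its minor classifications.
CLASSIFICATION = {
    "CAT1": {
        "name": "purchasing behavior",
        "minor": {
            "CAT11": "find a restaurant",
            "CAT12": "find a product",
            "CAT13": "reserve a restaurant",
            "CAT14": "purchase a product",
        },
    },
}

def major_of(minor_id):
    """Return the major classification ID that contains `minor_id`, or None."""
    for major_id, major in CLASSIFICATION.items():
        if minor_id in major["minor"]:
            return major_id
    return None
```

For example, `major_of("CAT13")` returns `"CAT1"`, while an unknown ID returns `None`.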

(Category information storage unit 534)
The category information storage unit 534 stores various kinds of information about the categories to which search queries belong. Specifically, it stores information about the categories that the trained second model outputs when a search query is input to it. FIG. 17 shows an example of the category information storage unit according to the embodiment. In the example shown in FIG. 17, the category information storage unit 534 has the items "search query ID", "major classification ID", "minor classification ID", and "probability (%)".

"Search query ID" is identification information for identifying the search query entered by the user. In the example shown in FIG. 17, the search query identified by the search query ID "Q11" (search query Q11) corresponds to the search query Q11 shown in FIG. 12.

"Major classification ID" is identification information for identifying the major classification. "Minor classification ID" is identification information for identifying the minor classification. "Probability (%)" is the probability that the trained second model outputs for each minor classification when a search query is input to it. In the example shown in FIG. 17, the probability (%) "90" indicates that the probability that the search query Q11 is classified into the category CAT11 is 90%.

(Model information storage unit 535)
The model information storage unit 535 stores various kinds of information about the learning models generated by the generation device 50. FIG. 18 shows an example of the model information storage unit according to the embodiment. In the example shown in FIG. 18, the model information storage unit 535 has the items "model ID" and "model data".

"Model ID" is identification information for identifying a learning model generated by the generation device 50. "Model data" is the model data of a learning model generated by the generation device 50. For example, "model data" stores the data for converting a search query into a distributed representation.

In the first record of FIG. 18, the learning model identified by the model ID "M1" corresponds to the first model M1 shown in FIG. 1. The model data "MDT1" is the model data of the first model M1 generated by the generation device 50 (model data MDT1).

The model data MDT1 may include an input layer to which a search query is input, an output layer, a first element belonging to any layer from the input layer to the output layer other than the output layer, and a second element whose value is calculated based on the first element and the weight of the first element, and may cause the generation device 50 to function so as to output from the output layer, in response to the search query input to the input layer, the distributed representation of that search query.

The generation device 50 calculates distributed representations using a model of any structure, such as the regression model or neural network described above. Specifically, the coefficients of the model data MDT1 are set so that a distributed representation is output when a search query is input. The generation device 50 uses such model data MDT1 to calculate the distributed representation.
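The relationship between a "first element", its weight, and a "second element" can be illustrated with a single weighted-sum layer. This is only a sketch: the coefficients, the numeric encoding of the query, and the `dense` helper are hypothetical, and the patent leaves the actual model structure open.

```python
def dense(inputs, weights):
    """Each output (a "second element") is a weighted sum of the inputs ("first elements")."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

# Hypothetical coefficients; in the patent these would be held in model data MDT1.
weights = [[0.5, -1.0], [2.0, 0.25]]

# Stand-in numeric encoding of a search query (assumption).
features = [1.0, 2.0]

vector = dense(features, weights)  # [-1.5, 2.5]
```

Stacking several such layers, with the last one producing the N-dimensional output, would give a model whose output layer emits the distributed representation.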

The above example describes the case where the model data MDT1 is a model that outputs the distributed representation of a search query when the search query is input (hereinafter referred to as model X1). However, the model data MDT1 according to the embodiment may be a model generated based on results obtained by repeatedly inputting data to and outputting data from the model X1. For example, the model data MDT1 may be a model trained using, as input, the distributed representation that the model X1 outputs for a search query (hereinafter referred to as model Y1). Alternatively, the model data MDT1 may be a model trained to take a search query as input and produce the output value of the model Y1 as output.

また、生成装置５０がＧＡＮ（Generative Adversarial Networks）を用いた推定処理を行う場合、モデルデータＭＤＴ１は、ＧＡＮの一部を構成するモデルであってもよい。 Further, when the generation device 50 performs estimation processing using GAN (Generative Adversarial Networks), the model data MDT1 may be a model constituting a part of GAN.

図１８の２レコード目に示す例では、モデルＩＤ「Ｍ２」で識別される学習モデルは、図１に示した第２モデルＭ２に対応する。また、モデルデータ「ＭＤＴ２」は、生成装置５０によって生成された第２モデルＭ２のモデルデータ（モデルデータＭＤＴ２）を示す。 In the example shown in the second record of FIG. 18, the learning model identified by the model ID “M2” corresponds to the second model M2 shown in FIG. 1. Further, the model data “MDT2” indicates the model data (model data MDT2) of the second model M2 generated by the generation device 50.

モデルデータＭＤＴ２は、検索クエリが入力される入力層と、出力層と、入力層から出力層までのいずれかの層であって出力層以外の層に属する第１要素と、第１要素と第１要素の重みとに基づいて値が算出される第２要素と、を含み、入力層に入力された検索クエリに応じて、入力層に入力された検索クエリが各カテゴリに属する確率を出力層から出力するよう、生成装置５０を機能させてもよい。 The model data MDT2 includes an input layer into which a search query is input, an output layer, a first element belonging to any layer from the input layer to the output layer other than the output layer, and a second element whose value is calculated based on the first element and the weight of the first element, and may cause the generation device 50 to function so as to output, from the output layer, the probability that the search query input to the input layer belongs to each category in accordance with that search query.

生成装置５０は、上述した回帰モデルやニューラルネットワーク等、任意の構造を有するモデルを用いて、検索クエリが各カテゴリに属する確率の算出を行う。具体的には、モデルデータＭＤＴ２は、検索クエリが入力された場合に、検索クエリが各カテゴリに属する確率を出力するように係数が設定される。生成装置５０は、このようなモデルデータＭＤＴ２を用いて、検索クエリが各カテゴリに属する確率を算出する。 The generation device 50 calculates the probability that the search query belongs to each category by using a model having an arbitrary structure such as the regression model and the neural network described above. Specifically, in the model data MDT2, when a search query is input, a coefficient is set so as to output the probability that the search query belongs to each category. The generation device 50 uses such model data MDT2 to calculate the probability that the search query belongs to each category.

なお、上記例では、モデルデータＭＤＴ２が、検索クエリが入力された場合に、検索クエリの分散表現を出力するモデル（以下、モデルＸ２という。）である例を示した。しかし、実施形態に係るモデルデータＭＤＴ２は、モデルＸ２にデータの入出力を繰り返すことで得られる結果に基づいて生成されるモデルであってもよい。例えば、モデルデータＭＤＴ２は、検索クエリを入力とした際に、モデルＸ２が出力した分散表現を入力して学習されたモデル（以下、モデルＹ２という。）であってもよい。または、モデルデータＭＤＴ２は、検索クエリを入力とし、モデルＹ２の出力値を出力とするよう学習されたモデルであってもよい。 In the above example, the model data MDT2 is a model (hereinafter referred to as model X2) that outputs a distributed representation of the search query when the search query is input. However, the model data MDT2 according to the embodiment may be a model generated based on the result obtained by repeating the input / output of data to the model X2. For example, the model data MDT2 may be a model (hereinafter referred to as model Y2) learned by inputting the distributed representation output by the model X2 when the search query is input. Alternatively, the model data MDT2 may be a model trained to input a search query and output the output value of the model Y2.

また、生成装置５０がＧＡＮ（Generative Adversarial Networks）を用いた推定処理を行う場合、モデルデータＭＤＴ２は、ＧＡＮの一部を構成するモデルであってもよい。 Further, when the generation device 50 performs estimation processing using GAN (Generative Adversarial Networks), the model data MDT2 may be a model constituting a part of GAN.

（制御部５２）
図１３の説明に戻って、制御部５２は、コントローラ（controller）であり、例えば、ＣＰＵ（Central Processing Unit）やＭＰＵ（Micro Processing Unit）等によって、生成装置５０内部の記憶装置に記憶されている各種プログラム（生成プログラムの一例に相当）がＲＡＭを作業領域として実行されることにより実現される。また、制御部５２は、コントローラであり、例えば、ＡＳＩＣ（Application Specific Integrated Circuit）やＦＰＧＡ（Field Programmable Gate Array）等の集積回路により実現される。 (Control unit 52)
Returning to the description of FIG. 13, the control unit 52 is a controller, and is realized by, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing various programs (corresponding to an example of a generation program) stored in a storage device inside the generation device 50, using the RAM as a work area. Further, the control unit 52 is a controller, and is realized by, for example, an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

また、制御部５２は、モデル情報記憶部５３５に記憶されている第１モデルＭ１（モデルデータＭＤＴ１）に従った情報処理により、入力層に入力された検索クエリに対し、出力層以外の各層に属する各要素を第１要素として、第１要素と第１要素の重みとに基づく演算を行うことにより、分散表現を出力層から出力するよう、コンピュータを機能させる。 Further, through information processing according to the first model M1 (model data MDT1) stored in the model information storage unit 535, the control unit 52 causes the computer to function so as to output a distributed representation from the output layer for the search query input to the input layer, by performing an operation based on the first element and the weight of the first element, with each element belonging to each layer other than the output layer taken as the first element.

また、制御部５２は、モデル情報記憶部５３５に記憶されている第２モデルＭ２（モデルデータＭＤＴ２）に従った情報処理により、入力層に入力された検索クエリに対し、出力層以外の各層に属する各要素を第１要素として、第１要素と第１要素の重みとに基づく演算を行うことにより、検索クエリが各カテゴリに属する確率を出力層から出力するよう、コンピュータを機能させる。 Further, through information processing according to the second model M2 (model data MDT2) stored in the model information storage unit 535, the control unit 52 causes the computer to function so as to output, from the output layer, the probability that the search query input to the input layer belongs to each category, by performing an operation based on the first element and the weight of the first element, with each element belonging to each layer other than the output layer taken as the first element.

図１３に示すように、制御部５２は、取得部５２１と、抽出部５２２と、生成部５２３を有し、以下に説明する情報処理の作用を実現または実行する。なお、制御部５２の内部構成は、図１３に示した構成に限られず、後述する情報処理を行う構成であれば他の構成であってもよい。 As shown in FIG. 13, the control unit 52 includes an acquisition unit 521, an extraction unit 522, and a generation unit 523, and realizes or executes the information processing operation described below. The internal configuration of the control unit 52 is not limited to the configuration shown in FIG. 13, and may be any other configuration as long as it is configured to perform information processing described later.

（取得部５２１）
取得部５２１は、種々の情報を取得する。具体的には、取得部５２１は、ユーザによって入力された検索クエリを検索サーバ２０から取得する。取得部５２１は、ユーザによって入力された検索クエリを取得すると、取得した検索クエリをクエリ情報記憶部５３１に格納する。また、取得部５２１は、検索クエリの分散表現であるベクトルに関するベクトル情報を取得する。取得部５２１は、ベクトル情報を取得すると、取得したベクトル情報をベクトル情報記憶部５３２に格納する。また、取得部５２１は、検索クエリと検索クエリが属するカテゴリの分類を定義する情報を取得する。取得部５２１は、検索クエリと検索クエリが属するカテゴリの分類を定義する分類定義情報を取得すると、取得した分類定義情報を分類定義記憶部５３３に格納する。また、取得部５２１は、検索クエリが属するカテゴリに関するカテゴリ情報を取得する。取得部５２１は、カテゴリ情報を取得すると、取得したカテゴリ情報をカテゴリ情報記憶部５３４に格納する。 (Acquisition unit 521)
The acquisition unit 521 acquires various information. Specifically, the acquisition unit 521 acquires the search query input by the user from the search server 20. When the acquisition unit 521 acquires the search query input by the user, the acquisition unit 521 stores the acquired search query in the query information storage unit 531. Further, the acquisition unit 521 acquires vector information regarding a vector which is a distributed representation of the search query. When the acquisition unit 521 acquires the vector information, the acquisition unit 521 stores the acquired vector information in the vector information storage unit 532. In addition, the acquisition unit 521 acquires information that defines the search query and the classification of the category to which the search query belongs. When the acquisition unit 521 acquires the search query and the classification definition information that defines the classification of the category to which the search query belongs, the acquisition unit 521 stores the acquired classification definition information in the classification definition storage unit 533. Further, the acquisition unit 521 acquires the category information regarding the category to which the search query belongs. When the acquisition unit 521 acquires the category information, the acquisition unit 521 stores the acquired category information in the category information storage unit 534.

（抽出部５２２）
抽出部５２２は、種々の情報を抽出する。具体的には、抽出部５２２は、取得部５２１によって取得された検索クエリのうち、同一のユーザによって所定の時間内に入力された複数の検索クエリを抽出する。例えば、抽出部５２２は、同一のユーザによって各検索クエリが入力された時間の間隔が所定の時間内である複数の検索クエリを抽出する。続いて、抽出部５２２は、同一のユーザによって所定の時間内に入力された複数の検索クエリのうち、同一のユーザによって所定の時間内に連続して入力された一対の検索クエリを抽出する。例えば、抽出部５２２は、同一のユーザによって各検索クエリのペアが入力された時間の間隔が所定の時間内である複数の検索クエリを抽出する。例えば、抽出部５２２は、取得部５２１によって取得された検索クエリのうち、同一のユーザＵ１によって所定の時間内に連続して入力された４個の検索クエリである検索クエリＱ１１（「六本木パスタ」）、検索クエリＱ１２（「六本木イタリアン」）、検索クエリＱ１３（「赤坂パスタ」）、検索クエリＱ１４（「麻布パスタ」）を抽出する。抽出部５２２は、検索クエリが入力された順番に並べると、検索クエリＱ１１、検索クエリＱ１２、検索クエリＱ１３、検索クエリＱ１４の順番で入力された４個の検索クエリを抽出する。続いて、抽出部５２２は、４個の検索クエリを抽出すると、時系列的に隣り合う２つの検索クエリを一対の検索クエリとして、３対の検索クエリのペアである（検索クエリＱ１１、検索クエリＱ１２）、（検索クエリＱ１２、検索クエリＱ１３）、（検索クエリＱ１３、検索クエリＱ１４）を抽出する。なお、抽出部５２２は、同一のユーザによって全ての検索クエリが所定の時間内に入力された複数の検索クエリを抽出してもよい。そして、抽出部５２２は、時系列的に隣り合うか否かに関わらず、抽出した複数の検索クエリの中から２つの検索クエリを選択して、選択した２つの検索クエリを一対の検索クエリとして抽出してもよい。 (Extraction unit 522)
The extraction unit 522 extracts various information. Specifically, the extraction unit 522 extracts, from the search queries acquired by the acquisition unit 521, a plurality of search queries input by the same user within a predetermined time. For example, the extraction unit 522 extracts a plurality of search queries for which the interval between the times at which the same user input each search query is within a predetermined time. Subsequently, from among those search queries, the extraction unit 522 extracts a pair of search queries input consecutively by the same user within the predetermined time. For example, the extraction unit 522 extracts a plurality of search queries for which the interval between the times at which the same user input each pair of search queries is within a predetermined time. For example, the extraction unit 522 extracts, from the search queries acquired by the acquisition unit 521, four search queries input consecutively by the same user U1 within a predetermined time: search query Q11 (“Roppongi pasta”), search query Q12 (“Roppongi Italian”), search query Q13 (“Akasaka pasta”), and search query Q14 (“Azabu pasta”). Arranged in the order of input, the four extracted search queries are search query Q11, search query Q12, search query Q13, and search query Q14. Subsequently, having extracted the four search queries, the extraction unit 522 treats each two search queries adjacent in chronological order as a pair and extracts three pairs of search queries: (search query Q11, search query Q12), (search query Q12, search query Q13), and (search query Q13, search query Q14). Note that the extraction unit 522 may instead extract a plurality of search queries all of which were input by the same user within a predetermined time. The extraction unit 522 may then select two search queries from the extracted search queries, regardless of whether they are adjacent in chronological order, and extract the two selected search queries as a pair of search queries.
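The pair extraction described for the extraction unit 522 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name extract_adjacent_pairs, the log layout (user ID, input time, query), the timestamps, and the five-minute window are all assumptions introduced here.

```python
from datetime import datetime, timedelta


def extract_adjacent_pairs(log, max_gap):
    """Return (earlier, later) query pairs that the same user entered
    back to back with at most max_gap between the two inputs."""
    by_user = {}
    for user_id, entered_at, query in log:
        by_user.setdefault(user_id, []).append((entered_at, query))

    pairs = []
    for entries in by_user.values():
        entries.sort(key=lambda entry: entry[0])  # chronological order
        for (t1, q1), (t2, q2) in zip(entries, entries[1:]):
            if t2 - t1 <= max_gap:
                pairs.append((q1, q2))
    return pairs


t0 = datetime(2019, 7, 30, 12, 0)  # hypothetical timestamps
log = [
    ("U1", t0, "Roppongi pasta"),                           # Q11
    ("U1", t0 + timedelta(minutes=1), "Roppongi Italian"),  # Q12
    ("U1", t0 + timedelta(minutes=2), "Akasaka pasta"),     # Q13
    ("U1", t0 + timedelta(minutes=3), "Azabu pasta"),       # Q14
    ("U2", t0 + timedelta(minutes=1), "weather"),           # other user, no pair
]
# The 5-minute window stands in for the "predetermined time" (assumed value).
pairs = extract_adjacent_pairs(log, max_gap=timedelta(minutes=5))
```

With the four queries of user U1, the sketch yields the three chronologically adjacent pairs (Q11, Q12), (Q12, Q13), and (Q13, Q14). The variant that pairs non-adjacent queries could instead use itertools.combinations over each user's entries.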

また、抽出部５２２は、取得部５２１によって取得された検索クエリのうち、所定の検索クエリと所定の検索クエリに無関係な他の検索クエリとを抽出する。例えば、抽出部５２２は、取得部５２１によって取得された検索クエリの中から、所定の検索クエリを抽出する。続いて、抽出部５２２は、取得部５２１によって取得された検索クエリの中から、所定の検索クエリとは無関係にランダムに他の検索クエリを抽出する。 Further, the extraction unit 522 extracts a predetermined search query and other search queries unrelated to the predetermined search query from the search queries acquired by the acquisition unit 521. For example, the extraction unit 522 extracts a predetermined search query from the search queries acquired by the acquisition unit 521. Subsequently, the extraction unit 522 randomly extracts other search queries from the search queries acquired by the acquisition unit 521, regardless of the predetermined search query.

（生成部５２３）
生成部５２３は、種々の情報を生成する。具体的には、生成部５２３は、取得部５２１によって取得された検索クエリのうち、同一のユーザによって所定の時間内に入力された複数の検索クエリが類似する特徴を有するものとして学習することで、所定の検索クエリから所定の検索クエリの特徴情報を予測する学習モデルを生成する。具体的には、生成部５２３は、同一のユーザによって所定の時間内に入力された複数の検索クエリの分散表現が類似するように学習モデルを学習させることで、所定の検索クエリから所定の検索クエリの特徴情報を予測する学習モデルを生成する。例えば、生成部５２３は、所定の時間内に続けて入力された一対の検索クエリの分散表現が類似するように学習することで、学習モデルを生成する。例えば、生成部５２３は、一対の検索クエリの学習前の分散表現の類似度の値を算出する。また、生成部５２３は、一対の検索クエリの学習後の分散表現の類似度の値を算出する。続いて、生成部５２３は、学習前の分散表現の類似度の値よりも、学習後の分散表現の類似度の値が大きくなるように学習モデルを学習させる。このように、生成部５２３は、一対の検索クエリに対応する一対の分散表現である２つのベクトルが分散表現空間上で類似するように学習モデルを学習させることで、検索クエリから分散表現を出力する学習モデルを生成する。より具体的には、生成部５２３は、ＲＮＮの一種であるＬＳＴＭを分散表現生成に用いたＤＳＳＭの技術を用いて、検索クエリから分散表現を出力する学習モデルを生成する。例えば、生成部５２３は、学習モデルの正解データとして、同一のユーザによって所定の時間内に入力された一対の検索クエリが類似する特徴を有するものとして、所定の検索クエリの分散表現と、所定の検索クエリと対となる他の検索クエリの分散表現とが、分散表現空間上で近くに存在するように学習する。また、生成部５２３は、第１学習モデルを生成すると、第１学習モデルを識別する識別情報と対応付けて、生成した第１学習モデル（モデルデータＭＤＴ１）をモデル情報記憶部５３５に格納する。 (Generation unit 523)
The generation unit 523 generates various information. Specifically, the generation unit 523 generates a learning model that predicts feature information of a given search query from that search query, by learning on the assumption that, among the search queries acquired by the acquisition unit 521, a plurality of search queries input by the same user within a predetermined time have similar features. Specifically, the generation unit 523 generates this learning model by training it so that the distributed representations of a plurality of search queries input by the same user within a predetermined time become similar. For example, the generation unit 523 generates the learning model by learning so that the distributed representations of a pair of search queries input consecutively within a predetermined time become similar. For example, the generation unit 523 calculates the similarity between the distributed representations of the pair of search queries before learning. The generation unit 523 also calculates the similarity between the distributed representations of the pair of search queries after learning. Subsequently, the generation unit 523 trains the learning model so that the similarity after learning is larger than the similarity before learning. In this way, the generation unit 523 generates a learning model that outputs a distributed representation from a search query, by training the learning model so that the two vectors that are the pair of distributed representations corresponding to a pair of search queries become similar in the distributed representation space. More specifically, the generation unit 523 generates the learning model that outputs a distributed representation from a search query by using DSSM technology in which an LSTM, a kind of RNN, is used to generate the distributed representation. For example, taking as correct-answer data the assumption that a pair of search queries input by the same user within a predetermined time have similar features, the generation unit 523 learns so that the distributed representation of a given search query and the distributed representation of the other search query paired with it lie close together in the distributed representation space. Further, upon generating the first learning model, the generation unit 523 stores the generated first learning model (model data MDT1) in the model information storage unit 535 in association with identification information that identifies the first learning model.

〔２−４．第１学習モデルの一例〕
ここで、図１９を用いて生成装置５０が生成する第１学習モデルの一例について説明する。図１９は、実施形態に係る第１学習モデルの一例を示す図である。図１９に示す例では、生成装置５０が生成する第１学習モデルＭ１は、３層のＬＳＴＭＲＮＮで構成されている。図１９に示す例では、抽出部５２２は、同一のユーザＵ１によって所定の時間内に連続して入力された「六本木パスタ」という検索クエリＱ１１と「六本木イタリアン」という検索クエリＱ１２とから成る一対の検索クエリを抽出する。生成部５２３は、抽出部５２２によって抽出されたた検索クエリＱ１１を第１学習モデルＭ１の入力層に入力する（ステップＳ４１）。 [2-4. An example of the first learning model]
Here, an example of the first learning model generated by the generation device 50 will be described with reference to FIG. 19. FIG. 19 is a diagram showing an example of the first learning model according to the embodiment. In the example shown in FIG. 19, the first learning model M1 generated by the generation device 50 is composed of a three-layer LSTM RNN. In the example shown in FIG. 19, the extraction unit 522 extracts a pair of search queries consisting of the search query Q11 "Roppongi pasta" and the search query Q12 "Roppongi Italian", input consecutively by the same user U1 within a predetermined time. The generation unit 523 inputs the search query Q11 extracted by the extraction unit 522 to the input layer of the first learning model M1 (step S41).

続いて、生成部５２３は、第１学習モデルＭ１の出力層から検索クエリＱ１１の分散表現である２５６次元のベクトルＢＱＶ１１を出力する。また、生成部５２３は、抽出部５２２によって抽出された検索クエリＱ１２を第１学習モデルＭ１の入力層に入力する。続いて、生成部５２３は、第１学習モデルＭ１の出力層から検索クエリＱ１２の分散表現である２５６次元のベクトルＢＱＶ１２を出力する（ステップＳ４２）。 Subsequently, the generation unit 523 outputs a 256-dimensional vector BQV11 which is a distributed representation of the search query Q11 from the output layer of the first learning model M1. Further, the generation unit 523 inputs the search query Q12 extracted by the extraction unit 522 to the input layer of the first learning model M1. Subsequently, the generation unit 523 outputs the 256-dimensional vector BQV12, which is a distributed representation of the search query Q12, from the output layer of the first learning model M1 (step S42).

続いて、生成部５２３は、連続して入力された２つの検索クエリの分散表現が類似するように学習することで、検索クエリから分散表現を出力する第１学習モデルＭ１を生成する（ステップＳ４３）。例えば、第１学習モデルＭ１にフィードバックをかける前（学習前）の検索クエリＱ１１の分散表現であるベクトルＢＱＶ１１と検索クエリＱ１２の分散表現であるベクトルＢＱＶ１２とのなす角度の大きさをΘとする。また、第１学習モデルＭ１にフィードバックをかけた後（学習後）の検索クエリＱ１１の分散表現であるベクトルＱＶ１１と検索クエリＱ１２の分散表現であるベクトルＱＶ１２とのなす角度の大きさをΦとする。この時、生成部５２３は、ΘよりもΦが小さくなるように、第１学習モデルＭ１を学習させる。例えば、生成部５２３は、ベクトルＢＱＶ１１とベクトルＢＱＶ１２のコサイン類似度の値を算出する。また、生成部５２３は、ベクトルＱＶ１１とベクトルＱＶ１２のコサイン類似度の値を算出する。続いて、生成部５２３は、ベクトルＢＱＶ１１とベクトルＢＱＶ１２のコサイン類似度の値よりも、ベクトルＱＶ１１とベクトルＱＶ１２のコサイン類似度の値が大きくなるように（値が１に近づくように）学習モデルＭ１を学習させる。このように、生成部５２３は、一対の検索クエリに対応する一対の分散表現である２つのベクトルが分散表現空間上で類似するように第１学習モデルＭ１を学習させることで、検索クエリから分散表現を出力する第１学習モデルＭ１を生成する。なお、生成部５２３は、コサイン類似度に限らず、ベクトル間の距離尺度として適用可能な指標であれば、どのような指標に基づいて分散表現の間の類似度を算出してもよい。また、生成部５２３は、ベクトル間の距離尺度として適用可能な指標であれば、どのような指標に基づいて学習モデルＭ１を学習させてもよい。例えば、生成部５２３は、分散表現同士のユークリッド距離や双曲空間等の非ユークリッド空間中での距離、マンハッタン距離、マハラノビス距離等といった所定の距離関数の値を算出する。続いて、生成部５２３は、分散表現同士の所定の距離関数の値（すなわち、分散表現空間における距離）が小さくなるように学習モデルＭ１を学習させてもよい。 Subsequently, the generation unit 523 generates the first learning model M1 that outputs the distributed expression from the search query by learning so that the distributed expressions of the two consecutively input search queries are similar (step S43). ). For example, let Θ be the size of the angle formed by the vector BQV11 which is the distributed expression of the search query Q11 before giving feedback to the first learning model M1 (before learning) and the vector BQV12 which is the distributed expression of the search query Q12. Further, let Φ be the size of the angle formed by the vector QV11 which is the distributed expression of the search query Q11 after giving feedback to the first learning model M1 (after learning) and the vector QV12 which is the distributed expression of the search query Q12. .. At this time, the generation unit 523 trains the first learning model M1 so that Φ is smaller than Θ. For example, the generation unit 523 calculates the value of the cosine similarity between the vector BQV11 and the vector BQV12. 
Further, the generation unit 523 calculates the value of the cosine similarity between the vector QV11 and the vector QV12. Subsequently, the generation unit 523 trains the learning model M1 so that the value of the cosine similarity between the vector QV11 and the vector QV12 is larger than the value of the cosine similarity between the vector BQV11 and the vector BQV12 (that is, so that the value approaches 1). In this way, the generation unit 523 generates the first learning model M1, which outputs a distributed representation from a search query, by training the first learning model M1 so that the two vectors that are the pair of distributed representations corresponding to a pair of search queries become similar in the distributed representation space. Note that the generation unit 523 is not limited to the cosine similarity and may calculate the similarity between distributed representations based on any index applicable as a distance measure between vectors. Likewise, the generation unit 523 may train the learning model M1 based on any index applicable as a distance measure between vectors. For example, the generation unit 523 calculates the value of a predetermined distance function such as the Euclidean distance between distributed representations, a distance in a non-Euclidean space such as a hyperbolic space, the Manhattan distance, or the Mahalanobis distance. The generation unit 523 may then train the learning model M1 so that the value of the predetermined distance function between the distributed representations (that is, the distance in the distributed representation space) becomes small.
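The relationship between the angles Θ and Φ and the cosine similarity can be checked with a small numerical sketch. The two-dimensional vectors below are made-up stand-ins for the 256-dimensional vectors BQV11, BQV12, QV11, and QV12, and the helper names are assumptions introduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angle(u, v):
    """Angle in radians between u and v (clamped against rounding error)."""
    return math.acos(max(-1.0, min(1.0, cosine_similarity(u, v))))


# Toy 2-dimensional stand-ins (assumed values, not from the embodiment).
bqv11, bqv12 = [1.0, 0.0], [0.0, 1.0]  # before learning
qv11, qv12 = [1.0, 0.2], [0.9, 0.4]    # after learning

theta = angle(bqv11, bqv12)  # angle before learning
phi = angle(qv11, qv12)      # angle after learning
```

Because the cosine decreases monotonically on the interval from 0 to π, requiring Φ to be smaller than Θ is the same condition as requiring the cosine similarity after learning to be larger than before learning.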

また、生成部５２３は、同一のユーザによって所定の時間内に入力された複数の検索クエリとして、所定の区切り文字で区切られた文字列を含む複数の検索クエリが類似する特徴を有するものとして学習することで、第１学習モデルを生成する。例えば、生成部５２３は、地名を示す「六本木」と食品の種類を示す「パスタ」の文字とが区切り文字であるスペースで区切られた検索クエリ「六本木パスタ」と、地名を示す「六本木」と料理の種類を示す「イタリアン」の文字とが区切り文字であるスペースで区切られた検索クエリ「六本木イタリアン」とが類似する特徴を有するものとして学習することで、第１学習モデルを生成する。 Further, the generation unit 523 generates the first learning model by learning on the assumption that, as the plurality of search queries input by the same user within a predetermined time, a plurality of search queries each containing character strings separated by a predetermined delimiter have similar features. For example, the generation unit 523 generates the first learning model by learning on the assumption that the search query "Roppongi pasta", in which "Roppongi" indicating a place name and "pasta" indicating a type of food are separated by a space as the delimiter, and the search query "Roppongi Italian", in which "Roppongi" indicating a place name and "Italian" indicating a type of cuisine are separated by a space as the delimiter, have similar features.

また、生成部５２３は、取得部５２１によって取得された検索クエリのうち、ランダムに抽出された複数の検索クエリが相違する特徴を有するものとして学習することで、第１学習モデルを生成する。具体的には、生成部５２３は、取得部５２１によって取得された検索クエリのうち、ランダムに抽出された一対の検索クエリの分散表現が相違するように学習することで、第１学習モデルを生成する。例えば、生成部５２３は、抽出部５２２によって抽出された所定の検索クエリの分散表現と、所定の検索クエリとは無関係にランダムに抽出された検索クエリの分散表現とが分散表現空間上で遠くにマッピングされるように第１学習モデルＭ１のトレーニングを行う。 Further, the generation unit 523 generates the first learning model by learning on the assumption that, among the search queries acquired by the acquisition unit 521, a plurality of randomly extracted search queries have different features. Specifically, the generation unit 523 generates the first learning model by learning so that the distributed representations of a pair of search queries randomly extracted from the search queries acquired by the acquisition unit 521 differ from each other. For example, the generation unit 523 trains the first learning model M1 so that the distributed representation of a given search query extracted by the extraction unit 522 and the distributed representation of a search query extracted at random, independently of the given search query, are mapped far apart in the distributed representation space.
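One way to combine the similar pairs with the randomly extracted pairs is sketched below. The margin-based loss is an assumption chosen for illustration, since the description only states that similar pairs should be close and random pairs far apart. The names sample_negative and margin_loss, the margin of 0.5, and the example queries other than those from the embodiment are hypothetical.

```python
import random


def sample_negative(queries, anchor, rng):
    """Pick a query at random, independently of the anchor query."""
    candidates = [q for q in queries if q != anchor]
    return rng.choice(candidates)


def margin_loss(sim_positive, sim_negative, margin=0.5):
    """Zero once the same-session pair is more similar than the random
    pair by at least the margin (an assumed loss, not from the source)."""
    return max(0.0, margin - sim_positive + sim_negative)


queries = ["Roppongi pasta", "Roppongi Italian", "used car price", "train timetable"]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
negative = sample_negative(queries, "Roppongi pasta", rng)

# The similarity values would come from the model; these are placeholders.
loss_when_separated = margin_loss(sim_positive=0.9, sim_negative=0.1)
loss_when_confused = margin_loss(sim_positive=0.2, sim_negative=0.6)
```

Minimizing such a loss pushes the similarity of the same-session pair up and the similarity of the random pair down, which matches the direction of training described above.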

また、生成部５２３は、第２学習モデルを生成する。具体的には、生成部５２３は、モデル情報記憶部５３５を参照して、生成部５２３によって生成された第１学習モデル（第１学習モデルＭ１のモデルデータＭＤＴ１）を取得する。続いて、生成部５２３は、取得した第１学習モデルを用いて、所定の検索クエリから所定の検索クエリが属するカテゴリを予測する第２学習モデルを生成する。生成部５２３は、第１モデルＭ１を取得すると、取得した第１モデルＭ１を用いて、第２学習モデルＭ２を生成する。生成部５２３は、第１モデルＭ１を再学習させることにより、第１モデルＭ１とは学習モデルの重みである接続係数が異なる第２モデルＭ２を生成する。具体的には、生成部５２３は、検索クエリが学習モデルに入力された際に、学習モデルが出力する分散表現の分類結果が、検索クエリが属するカテゴリに対応するように学習することで、所定の検索クエリから所定の検索クエリが属するカテゴリを予測する第２モデルＭ２を生成する。 Further, the generation unit 523 generates the second learning model. Specifically, the generation unit 523 refers to the model information storage unit 535 and acquires the first learning model (model data MDT1 of the first learning model M1) generated by the generation unit 523. Subsequently, the generation unit 523 uses the acquired first learning model to generate a second learning model that predicts, from a given search query, the category to which that search query belongs. Upon acquiring the first model M1, the generation unit 523 uses it to generate the second learning model M2. By retraining the first model M1, the generation unit 523 generates the second model M2, whose connection coefficients (the weights of the learning model) differ from those of the first model M1. Specifically, the generation unit 523 generates the second model M2, which predicts from a given search query the category to which that search query belongs, by learning so that, when a search query is input to the learning model, the classification result of the distributed representation output by the learning model corresponds to the category to which the search query belongs.

具体的には、生成部５２３は、検索クエリが学習モデルに入力された際に、学習モデルが出力する分散表現の分類結果が、検索クエリが属するカテゴリに対応するように学習することで、所定の検索クエリから所定の検索クエリが属するカテゴリを予測する第２学習モデルを生成する。生成部５２３は、入力情報として検索クエリが学習モデルに入力された際に、出力情報として検索クエリが属するカテゴリ毎の確率を出力する第２学習モデルを生成する。例えば、生成部５２３は、第１モデルＭ１を用いて、入力情報として所定の検索クエリが学習モデルに入力された際に、出力情報として検索クエリの分散表現がそのカテゴリに分類される確率をカテゴリ毎に出力する第２モデルＭ２を生成する。生成部５２３は、入力情報として所定の検索クエリが入力されると、出力情報として所定の検索クエリの分散表現が正解カテゴリに分類される確率が所定の閾値を超えるように第２モデルを学習させる。そして、生成部５２３は、入力情報として所定の検索クエリが入力された際に、所定の検索クエリの分散表現がそのカテゴリに属する確率が所定の閾値を超えるカテゴリを、所定の検索クエリのカテゴリとして出力する第２モデルＭ２を生成する。また、生成部５２３は、第２学習モデルを生成すると、第２学習モデルを識別する識別情報と対応付けて、生成した第２学習モデル（モデルデータＭＤＴ２）をモデル情報記憶部５３５に格納する。 Specifically, the generation unit 523 generates the second learning model, which predicts from a given search query the category to which that search query belongs, by learning so that, when a search query is input to the learning model, the classification result of the distributed representation output by the learning model corresponds to the category to which the search query belongs. The generation unit 523 generates a second learning model that, when a search query is input to the learning model as input information, outputs as output information the probability for each category to which the search query may belong. For example, using the first model M1, the generation unit 523 generates a second model M2 that, when a given search query is input to the learning model as input information, outputs for each category, as output information, the probability that the distributed representation of the search query is classified into that category. The generation unit 523 trains the second model so that, when a given search query is input as input information, the probability output for the distributed representation of that search query being classified into the correct category exceeds a predetermined threshold. The generation unit 523 then generates a second model M2 that, when a given search query is input as input information, outputs as the category of that search query the category for which the probability that the distributed representation of the search query belongs to it exceeds the predetermined threshold. Further, upon generating the second learning model, the generation unit 523 stores the generated second learning model (model data MDT2) in the model information storage unit 535 in association with identification information that identifies the second learning model.

例えば、生成部５２３は、図１８に示すモデル情報記憶部５３５を参照して、第１モデルＭ１（第１モデルＭ１のモデルデータＭＤＴ１）を取得する。続いて、生成部５２３は、図１６に示す分類定義記憶部５３３を参照して、検索クエリを分類するカテゴリの大分類を選択する。続いて、生成部５２３は、大分類を選択すると、第２モデルＭ２の学習データとして、検索クエリと検索クエリが属する小分類との組を学習する。 For example, the generation unit 523 refers to the model information storage unit 535 shown in FIG. 18 and acquires the first model M1 (model data MDT1 of the first model M1). Subsequently, the generation unit 523 refers to the classification definition storage unit 533 shown in FIG. 16 and selects a major classification of the categories into which search queries are classified. Having selected the major classification, the generation unit 523 then learns, as the learning data of the second model M2, sets each consisting of a search query and the minor classification to which that search query belongs.

例えば、検索クエリＱ１１（「六本木パスタ」）が属する正解カテゴリがＣＡＴ１１（「飲食店を探す」）であるとする。生成部５２３は、入力情報として検索クエリＱ１１（「六本木パスタ」）が第２モデルＭ２に入力された際に、第２モデルＭ２の出力層から検索クエリＱ１１（「六本木パスタ」）の分散表現であるベクトルＢＱＶ１１を出力する。ここで、ベクトルＢＱＶ１１は、第２モデルＭ２の出力層から出力されたばかりの検索クエリＱ１１の分散表現であって、第２モデルＭ２にフィードバックをかける前（学習前）の分散表現を示す。この場合、生成部５２３は、出力された検索クエリＱ１１（「六本木パスタ」）の分散表現であるベクトルＢＱＶ１１が正解カテゴリＣＡＴ１１（「飲食店を探す」）に分類される確率が所定の閾値を超えるように第２モデルＭ２を学習させる。 For example, assume that the correct category to which the search query Q11 (“Roppongi pasta”) belongs is CAT11 (“find a restaurant”). When the search query Q11 (“Roppongi pasta”) is input to the second model M2 as input information, the generation unit 523 outputs the vector BQV11, which is the distributed representation of the search query Q11 (“Roppongi pasta”), from the output layer of the second model M2. Here, the vector BQV11 is the distributed representation of the search query Q11 as just output from the output layer of the second model M2, that is, the distributed representation before feedback is applied to the second model M2 (before learning). In this case, the generation unit 523 trains the second model M2 so that the probability that the output vector BQV11, the distributed representation of the search query Q11 (“Roppongi pasta”), is classified into the correct category CAT11 (“find a restaurant”) exceeds a predetermined threshold.

例えば、生成部５２３は、学習前の第２モデルＭ２に検索クエリＱ１１（「六本木パスタ」）が入力された際に、分散表現であるベクトルＢＱＶ１１がＣＡＴ１１（「飲食店を探す」）に分類される確率を８０％、ＣＡＴ１２（「商品を探す」）に分類される確率を０％、ＣＡＴ１３（「飲食店を予約」）に分類される確率を２０％、ＣＡＴ１４（「商品を購入する」）に分類される確率を０％と出力したとする。この場合、生成部５２３は、分散表現であるベクトルＢＱＶ１１がＣＡＴ１１（「飲食店を探す」）に分類される確率を所定の閾値（例えば、９０％）を超えるように第２モデルＭ２を学習させる。また、生成部５２３は、分散表現であるベクトルＢＱＶ１１がＣＡＴ１１（「飲食店を探す」）に分類される確率が所定の閾値（例えば、９０％）を超えるように学習させるのに合わせて、分散表現であるベクトルＢＱＶ１１が他のカテゴリＣＡＴ１３（「飲食店を予約」）に分類される確率を１０％に下げるように第２モデルＭ２を学習させる。続いて、生成部５２３は、学習済みの第２モデルＭ２に入力情報として検索クエリＱ１１（「六本木パスタ」）が入力されると、検索クエリＱ１１（「六本木パスタ」）の分散表現であるベクトルＢＱＶ１１がカテゴリＣＡＴ１１（「飲食店を探す」）に属する確率が９０％を超えるので、出力情報として検索クエリが属するカテゴリをＣＡＴ１１（「飲食店を探す」）と出力する。 For example, suppose that when the search query Q11 (“Roppongi pasta”) is input to the second model M2 before learning, the generation unit 523 outputs a probability of 80% that the vector BQV11, which is the distributed representation, is classified into CAT11 (“find a restaurant”), 0% for CAT12 (“find a product”), 20% for CAT13 (“book a restaurant”), and 0% for CAT14 (“buy a product”). In this case, the generation unit 523 trains the second model M2 so that the probability that the vector BQV11 is classified into CAT11 (“find a restaurant”) exceeds a predetermined threshold (for example, 90%). In step with this, the generation unit 523 trains the second model M2 so that the probability that the vector BQV11 is classified into the other category CAT13 (“book a restaurant”) is lowered to 10%. Subsequently, when the search query Q11 (“Roppongi pasta”) is input as input information to the trained second model M2, the probability that the vector BQV11, which is the distributed representation of the search query Q11 (“Roppongi pasta”), belongs to the category CAT11 (“find a restaurant”) exceeds 90%, so the generation unit 523 outputs CAT11 (“find a restaurant”) as the category to which the search query belongs.
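The threshold rule in this example can be sketched as follows. The function name predict_category is hypothetical, the post-training probabilities (91% and 9%) are illustrative values chosen to be consistent with the example, and returning None when no category exceeds the threshold is an assumption, since the description does not state what is output in that case.

```python
def predict_category(probabilities, threshold):
    """Return the most probable category if its probability exceeds the
    threshold; otherwise return None (assumed fallback)."""
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] > threshold else None


# Probabilities for Q11 before training, as given in the example.
before = {"CAT11": 0.80, "CAT12": 0.00, "CAT13": 0.20, "CAT14": 0.00}
# Illustrative probabilities after training (assumed values).
after = {"CAT11": 0.91, "CAT12": 0.00, "CAT13": 0.09, "CAT14": 0.00}

category_before = predict_category(before, threshold=0.90)
category_after = predict_category(after, threshold=0.90)
```

Before training, no category clears the 90% threshold, so the sketch returns None. After training, CAT11 clears it and is returned as the category of the search query.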

なお、生成部５２３は、大分類として、任意の数の大分類を選択してもよい。そして、生成部５２３は、入力情報として検索クエリが第２モデルＭ２に入力された際に、出力情報として検索クエリが選択した任意の数の大分類に属する各小分類に属する確率を小分類毎に出力する第２モデルＭ２を生成してもよい。また、生成部５２３は、大分類として、全ての大分類を選択してもよい。そして、生成部５２３は、検索クエリが第２モデルＭ２に入力された際に、各小分類に属する確率を全ての小分類毎に出力する第２モデルＭ２を生成してもよい。 Note that the generation unit 523 may select any number of major classifications. The generation unit 523 may then generate a second model M2 that, when a search query is input to the second model M2 as input information, outputs as output information, for each minor classification, the probability that the search query belongs to each minor classification under the selected major classifications. The generation unit 523 may also select all of the major classifications. In that case, the generation unit 523 may generate a second model M2 that, when a search query is input to the second model M2, outputs the probability of belonging to each minor classification for all of the minor classifications.

〔２−５．第２学習モデルの一例〕 [2-5. An example of the second learning model]
ここで、図２０を用いて生成装置５０が生成する第２学習モデルの一例について説明する。図２０は、実施形態に係る第２学習モデルの一例を示す図である。図２０に示す例では、生成装置５０が生成する第２学習モデルＭ２は、第１学習モデルＭ１を用いて生成される。すなわち、生成装置５０は、第１学習モデルＭ１を再学習させることにより、第１学習モデルＭ１とは学習モデルの重みである接続係数が異なる第２学習モデルＭ２を生成する。 Here, an example of the second learning model generated by the generation device 50 will be described with reference to FIG. 20. FIG. 20 is a diagram showing an example of the second learning model according to the embodiment. In the example shown in FIG. 20, the second learning model M2 generated by the generation device 50 is generated using the first learning model M1. That is, the generation device 50 retrains the first learning model M1 to generate the second learning model M2, whose connection coefficients (the weights of the learning model) differ from those of the first learning model M1.

より具体的には、生成装置５０が生成する第２学習モデルＭ２は、第１学習モデルＭ１と同様に、３層のＬＳＴＭＲＮＮで構成されている。図２０に示す例では、抽出部５２２は、ユーザＵ１によって入力された「六本木パスタ」という検索クエリＱ１１を第２学習モデルＭ２の入力層に入力する（ステップＳ５１）。 More specifically, the second learning model M2 generated by the generation device 50 is composed of three layers of LSTM RNNs, like the first learning model M1. In the example shown in FIG. 20, the extraction unit 522 inputs the search query Q11 "Roppongi pasta" input by the user U1 to the input layer of the second learning model M2 (step S51).

続いて、生成部５２３は、第２学習モデルＭ２の出力層から検索クエリＱ１１の分散表現である２５６次元のベクトルＢＱＶ１１を出力する（ステップＳ５２）。 Subsequently, the generation unit 523 outputs a 256-dimensional vector BQV11 which is a distributed representation of the search query Q11 from the output layer of the second learning model M2 (step S52).

続いて、生成部５２３は、検索クエリＱ１１の分散表現であるベクトルＢＱＶ１１が各カテゴリに分類される確率を出力する（ステップＳ５３）。 Subsequently, the generation unit 523 outputs the probability that the vector BQV11, which is the distributed representation of the search query Q11, is classified into each category (step S53).

続いて、生成部５２３は、検索クエリＱ１１の分散表現であるベクトルＢＱＶ１１が正解カテゴリに分類される確率を高くするように第２学習モデルＭ２を学習することで、検索クエリから検索クエリのカテゴリを予測する第２モデルを生成する（ステップＳ５４）。 Subsequently, the generation unit 523 trains the second learning model M2 so as to increase the probability that the vector BQV11, which is the distributed representation of the search query Q11, is classified into the correct category, thereby generating a second model that predicts the category of a search query from the search query (step S54).

〔２−６．第１学習モデルの生成処理のフロー〕 [2-6. Flow of the generation process of the first learning model]
次に、図２１を用いて、実施形態に係る第１学習モデルの生成処理の手順について説明する。図２１は、実施形態に係る第１学習モデルの生成処理手順を示すフローチャートである。 Next, the procedure for generating the first learning model according to the embodiment will be described with reference to FIG. 21. FIG. 21 is a flowchart showing the procedure of the generation process of the first learning model according to the embodiment.

図２１に示す例では、生成装置５０は、ユーザによって入力された検索クエリを取得する（ステップＳ１００１）。 In the example shown in FIG. 21, the generation device 50 acquires the search query input by the user (step S1001).

続いて、生成装置５０は、同一のユーザによって所定の時間内に入力された複数の検索クエリを抽出する（ステップＳ１００２）。 Subsequently, the generation device 50 extracts a plurality of search queries input by the same user within a predetermined time (step S1002).

続いて、生成装置５０は、抽出した複数の検索クエリが類似する特徴を有するものとして学習することで、所定の検索クエリから所定の検索クエリの特徴情報を予測する第１学習モデルを生成する（ステップＳ１００３）。 Subsequently, the generation device 50 learns on the assumption that the extracted plurality of search queries have similar characteristics, thereby generating a first learning model that predicts the characteristic information of a predetermined search query from the predetermined search query (step S1003).
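
Steps S1001 to S1002 can be sketched as follows. This is an illustrative sketch only: the log format `(user_id, timestamp_seconds, query)`, the function name `extract_query_pairs`, the choice to pair consecutive queries, and the sample data are all assumptions, not details given in the source.

```python
from collections import defaultdict


def extract_query_pairs(log, window_seconds):
    """Return pairs of consecutive queries entered by the same user within the window.

    log: iterable of (user_id, timestamp_seconds, query) tuples (hypothetical format).
    """
    by_user = defaultdict(list)
    for user_id, ts, query in log:
        by_user[user_id].append((ts, query))

    pairs = []
    for entries in by_user.values():
        entries.sort()  # order each user's queries by time
        for (t1, q1), (t2, q2) in zip(entries, entries[1:]):
            # Keep only distinct queries that fall inside the time window.
            if t2 - t1 <= window_seconds and q1 != q2:
                pairs.append((q1, q2))
    return pairs


# Hypothetical log: U1 searches twice within a minute, then again much later.
sample_log = [
    ("U1", 0, "Roppongi pasta"),
    ("U2", 30, "Shinjuku ramen"),
    ("U1", 60, "Roppongi Italian"),
    ("U1", 5000, "Akasaka pasta"),
]
training_pairs = extract_query_pairs(sample_log, window_seconds=120)
```

Each returned pair would then be treated as having similar characteristics when training the first learning model in step S1003.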

〔２−７．第２学習モデルの生成処理のフロー〕 [2-7. Flow of the generation process of the second learning model]
次に、図２２を用いて、実施形態に係る第２学習モデルの生成処理の手順について説明する。図２２は、実施形態に係る第２学習モデルの生成処理の手順を示すフローチャートである。 Next, the procedure for generating the second learning model according to the embodiment will be described with reference to FIG. 22. FIG. 22 is a flowchart showing the procedure of the generation process of the second learning model according to the embodiment.

図２２に示す例では、生成装置５０は、第１学習モデル（第１学習モデルＭ１のモデルデータＭＤＴ１）を取得する（ステップＳ２００１）。 In the example shown in FIG. 22, the generation device 50 acquires the first learning model (model data MDT1 of the first learning model M1) (step S2001).

続いて、生成装置５０は、第１学習モデルを用いて、所定の検索クエリから所定の検索クエリのカテゴリを予測する第２学習モデルを生成する（ステップＳ２００２）。 Subsequently, the generation device 50 uses the first learning model to generate a second learning model that predicts a predetermined search query category from a predetermined search query (step S2002).
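
One way to read steps S2001 to S2002 is that the second learning model starts from a copy of the first learning model's weights and is then retrained for category prediction. The sketch below illustrates only that starting point. The dictionary layout, the added classification head, and the name `build_second_model` are assumptions made for illustration; the source does not describe how the model data MDT1 is stored.

```python
import copy
import random


def build_second_model(first_model, num_categories, seed=0):
    """Initialise a second model from the first model's weights (hypothetical layout)."""
    rng = random.Random(seed)
    hidden_size = first_model["hidden_size"]
    return {
        "hidden_size": hidden_size,
        # Copy rather than share, so that retraining M2 leaves M1 untouched.
        "encoder_weights": copy.deepcopy(first_model["encoder_weights"]),
        # Assumed extra layer mapping the distributed representation to category scores.
        "head_weights": [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
            for _ in range(num_categories)
        ],
    }


first_model = {"hidden_size": 4, "encoder_weights": [[0.1, 0.2], [0.3, 0.4]]}
second_model = build_second_model(first_model, num_categories=4)

# Stand-in for one retraining update: M2's weights now differ from M1's.
second_model["encoder_weights"][0][0] += 0.05
```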

〔３．効果〕 [3. Effects]
上述してきたように、実施形態に係る情報処理装置１００は、取得部１３１と推定部１３３と抽出部１３４を備える。取得部１３１は、同一のユーザによって所定の時間内に入力された複数の検索クエリが類似する特徴を有するものとして複数の検索クエリが有する特徴を学習した第１学習モデルを用いて生成された第２学習モデルであって、所定の検索クエリから所定の検索クエリが属するカテゴリを予測する第２学習モデルを取得する。推定部１３３は、取得部１３１によって取得された第２学習モデルを用いて、検索クエリとして入力された文字列によって示される対象が属するカテゴリを推定する。抽出部１３４は、推定部１３３によって推定されたカテゴリに基づいて、文字列の中から、対象分野に属する抽出対象を示す対象文字列を抽出する。 As described above, the information processing apparatus 100 according to the embodiment includes an acquisition unit 131, an estimation unit 133, and an extraction unit 134. The acquisition unit 131 acquires a second learning model that predicts, from a predetermined search query, the category to which the predetermined search query belongs, the second learning model being generated using a first learning model that has learned the characteristics of a plurality of search queries on the assumption that a plurality of search queries input by the same user within a predetermined time have similar characteristics. The estimation unit 133 uses the second learning model acquired by the acquisition unit 131 to estimate the category to which the target indicated by the character string input as the search query belongs. The extraction unit 134 extracts, from the character string and based on the category estimated by the estimation unit 133, a target character string indicating an extraction target belonging to the target field.

また、一般的に、ユーザはある意図を持って検索を複数回行うと考えられるため、所定の時間内に連続して入力された検索クエリは、検索意図が近いという仮定が成り立つ。そこで、本願発明に係る生成装置５０は、所定の時間内に連続して入力された複数の検索クエリは、所定の検索意図の下で検索された検索クエリであるという点で、相互に類似する特徴を有する検索クエリであるとみなして第１モデルＭ１を学習させる。これにより、生成装置５０は、検索意図を考慮した検索クエリの特徴を第１モデルＭ１に学習させることができる。そして、生成装置５０は、検索意図を考慮した検索クエリの特徴を学習した第１モデルＭ１を活用して、所定の検索クエリから所定の検索クエリのカテゴリを予測する第２モデルを効率的に生成することができる。これにより、生成装置５０は、検索クエリを入力したユーザの検索意図を考慮したカテゴリに検索クエリを分類することを可能にする。また、従来、検索クエリをカテゴリに分類し、高い分類精度を得るためには、十分な量の正解データを用意することが必要であった。しかしながら、検索クエリ自体、多種多様であり、ロングテイルな性質を持つものであるため、多数の検索クエリに対応する正解カテゴリをラベル付けするのは、非常に手間がかかり困難である。ここで、生成装置５０は、検索意図を考慮した検索クエリの特徴を学習させた第１モデルを出発点として第２モデルを学習させることにより、正解カテゴリをラベル付けする代わりに、ユーザの検索意図（検索クエリを入力したユーザのコンテクスト）を一種の正解として、検索クエリのカテゴリを予測する第２モデルを学習させることができる。これにより、生成装置５０は、人手で検索クエリの正解カテゴリをラベル付けすることなく、第２モデルを学習させることができる。すなわち、第２モデルは、正解データが少ないときでも、十分な分類精度を得られるようになる。また、第２モデルは、正解データが多いときであれば、さらに高い分類精度を得られるようになる。したがって、情報処理装置１００は、検索クエリの分類精度を高めることができる。 Further, since a user is generally considered to search several times with a certain intention, it can be assumed that search queries input consecutively within a predetermined time have similar search intentions. The generation device 50 according to the present invention therefore trains the first model M1 by regarding a plurality of search queries input consecutively within a predetermined time as search queries having mutually similar characteristics, in that they were searched under a given search intention. As a result, the generation device 50 can make the first model M1 learn the characteristics of search queries in a way that takes search intention into account. The generation device 50 can then use the first model M1, which has learned these characteristics, to efficiently generate a second model that predicts the category of a predetermined search query from the predetermined search query. This allows the generation device 50 to classify a search query into a category that takes into account the search intention of the user who entered it. Conventionally, classifying search queries into categories with high accuracy has required a sufficient amount of correct answer data. However, search queries are diverse and long-tailed, so labeling the correct categories for a large number of search queries is very laborious and difficult. Here, by training the second model starting from the first model, which has learned the characteristics of search queries in light of search intention, the generation device 50 can train the second model that predicts the category of a search query using the user's search intention (the context of the user who entered the search query) as a kind of correct answer, instead of labeling correct categories. As a result, the generation device 50 can train the second model without manually labeling the correct category of each search query. That is, the second model can obtain sufficient classification accuracy even when correct answer data is scarce, and still higher classification accuracy when correct answer data is plentiful. Therefore, the information processing apparatus 100 can improve the classification accuracy of search queries.

また、推定部１３３は、文字列によって示される対象が属する複数のカテゴリを推定する。また、推定部１３３は、文字列によって示される対象が各カテゴリに属する確率をカテゴリ毎に出力する。 Further, the estimation unit 133 estimates a plurality of categories to which the object indicated by the character string belongs. Further, the estimation unit 133 outputs the probability that the object indicated by the character string belongs to each category for each category.

これにより、情報処理装置１００は、文字列によって示される対象が属するカテゴリとして、対象分野を示すカテゴリと複数の非対象分野を示すカテゴリとを同時に推定することができる。 Thereby, the information processing apparatus 100 can simultaneously estimate the category indicating the target field and the category indicating the plurality of non-target fields as the category to which the target indicated by the character string belongs.

また、抽出部１３４は、推定部１３３によって推定されたカテゴリの中に、対象分野を示すカテゴリを含む文字列を対象文字列として抽出する。 Further, the extraction unit 134 extracts a character string including the category indicating the target field from the categories estimated by the estimation unit 133 as the target character string.

これにより、情報処理装置１００は、対象分野に属する対象を示す文字列を適切に抽出することができる。 As a result, the information processing apparatus 100 can appropriately extract a character string indicating a target belonging to the target field.

また、抽出部１３４は、推定部１３３によって推定されたカテゴリの中に、不要なカテゴリとして登録された不要カテゴリを含まない文字列を対象文字列として抽出する。 Further, the extraction unit 134 extracts a character string that does not include the unnecessary category registered as an unnecessary category from the categories estimated by the estimation unit 133 as the target character string.

これにより、情報処理装置１００は、非対象分野に属する対象を示す文字列を適切に取り除くことができる。 As a result, the information processing apparatus 100 can appropriately remove the character string indicating the target belonging to the non-target field.
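
The two extraction conditions described above (the estimated categories include the target field, and they include no registered unnecessary category) can be combined in a short sketch. The probability threshold used to decide which categories count as "estimated", the category names, and the function name `extract_target_strings` are assumptions for illustration; the source does not specify them.

```python
def extract_target_strings(estimates, target_category, unnecessary_categories, threshold=0.3):
    """Keep strings whose estimated categories include the target and no unnecessary one.

    estimates: dict mapping a character string to {category: probability}.
    threshold: assumed cut-off above which a category counts as estimated.
    """
    unnecessary = set(unnecessary_categories)
    extracted = []
    for text, probs in estimates.items():
        categories = {c for c, p in probs.items() if p >= threshold}
        if target_category in categories and not (categories & unnecessary):
            extracted.append(text)
    return extracted


# Hypothetical per-category probabilities for three query strings.
estimates = {
    "lip tint A": {"cosmetics": 0.8, "celebrity": 0.1, "place": 0.1},
    "actress B lipstick": {"cosmetics": 0.5, "celebrity": 0.45, "place": 0.05},
    "Roppongi pasta": {"cosmetics": 0.05, "celebrity": 0.05, "place": 0.9},
}
targets = extract_target_strings(estimates, "cosmetics", {"celebrity"})
```

In this sample, the second string is dropped because an unnecessary category is also estimated for it, and the third because the target field is not estimated at all.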

また、実施形態に係る情報処理装置１００は、処理部１３２をさらに備える。処理部１３２は、対象分野に関するサイトに流入した流入検索クエリを取得し、流入検索クエリとして入力された文字列から、不要な文字列として登録された不要文字列を取り除いた第１文字列を取得する。推定部１３３は、処理部１３２によって取得された第１文字列によって示される対象が属するカテゴリを推定する。 Further, the information processing apparatus 100 according to the embodiment further includes a processing unit 132. The processing unit 132 acquires an inflow search query that has flowed into a site related to the target field, and acquires a first character string obtained by removing, from the character string input as the inflow search query, any unnecessary character string registered as an unnecessary character string. The estimation unit 133 estimates the category to which the target indicated by the first character string acquired by the processing unit 132 belongs.

これにより、情報処理装置１００は、辞書ベースで取り除くことができる文字列については、あらかじめ取り除いておくことで、第２学習モデルに入力される入力情報の精度を高めることができる。したがって、情報処理装置１００は、第２学習モデルによって推定されるカテゴリの推定精度を高めることができる。 As a result, the information processing apparatus 100 can improve the accuracy of the input information input to the second learning model by removing the character strings that can be removed on a dictionary basis in advance. Therefore, the information processing apparatus 100 can improve the estimation accuracy of the category estimated by the second learning model.
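
The dictionary-based pre-processing can be sketched as below. The sketch assumes that removal works on whole delimiter-separated tokens and that the registered unnecessary strings are held in a set; both are illustrative assumptions, as are the function name `remove_unnecessary_strings` and the sample words.

```python
def remove_unnecessary_strings(query, unnecessary_strings, delimiter=" "):
    """Drop registered unnecessary tokens from a query and return the remainder."""
    tokens = [t for t in query.split(delimiter) if t]
    kept = [t for t in tokens if t not in unnecessary_strings]
    return delimiter.join(kept)


# Hypothetical registered strings that carry no information about the target itself.
unnecessary = {"price", "reviews"}
first_string = remove_unnecessary_strings("lipstick price reviews", unnecessary)
```

The resulting first character string is what would be passed to the second learning model for category estimation.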

一般的に、同一のユーザが短時間に続けて入力した２つの検索クエリは、検索意図が同一であるか、同一でなくとも検索意図が近いと考えられる。すなわち、所定の時間内に続けて入力された一対の検索クエリは、検索意図が同一であるか、同一でなくとも検索意図が近いと考えられる。すなわち、生成装置５０は、所定の時間内に続けて入力された一対の検索クエリの分散表現が類似するように学習させることにより、第１モデルの学習精度を向上させることができる。したがって、生成装置５０は、学習精度が向上した第１モデルを用いて第２モデルを生成することができるので、第２モデルの学習精度を向上させることができる。 In general, two search queries entered by the same user in succession in a short period of time are considered to have the same search intent or similar search intents even if they are not the same. That is, it is considered that the pair of search queries that are continuously input within a predetermined time have the same search intent, or the search intents are close even if they are not the same. That is, the generation device 50 can improve the learning accuracy of the first model by learning so that the distributed expressions of the pair of search queries continuously input within a predetermined time are similar. Therefore, since the generation device 50 can generate the second model using the first model with improved learning accuracy, the learning accuracy of the second model can be improved.

一般的に、単体の文字列からなる検索クエリよりも、複数の文字列を含む検索クエリのほうが、検索意図がより明確であると考えられる。すなわち、生成装置５０は、所定の区切り文字で区切られた文字列を含む検索クエリを用いて学習させることにより、第１モデルの学習精度を向上させることができる。したがって、生成装置５０は、学習精度が向上した第１モデルを用いて第２モデルを生成することができるので、第２モデルの学習精度を向上させることができる。 In general, a search query containing a plurality of character strings is considered to have a clearer search intent than a search query consisting of a single character string. That is, the generation device 50 can improve the learning accuracy of the first model by training using a search query including a character string delimited by a predetermined delimiter. Therefore, since the generation device 50 can generate the second model using the first model with improved learning accuracy, the learning accuracy of the second model can be improved.

また、取得部１３１は、同一のユーザによって所定の時間内に入力された複数の検索クエリが類似する特徴を有するものとして複数の検索クエリが有する特徴を学習した第１学習モデルであって、ランダムに抽出された複数の検索クエリが相違する特徴を有するものとして学習することで、複数の検索クエリが有する特徴を学習した第１学習モデルを用いて生成された第２学習モデルであって、所定の検索クエリから所定の検索クエリが属するカテゴリを予測する第２学習モデルを取得する。また、取得部１３１は、同一のユーザによって所定の時間内に入力された複数の検索クエリが類似する特徴を有するものとして複数の検索クエリが有する特徴を学習した第１学習モデルであって、ランダムに抽出された一対の検索クエリの分散表現が相違するように学習することで、複数の検索クエリが有する特徴を学習した第１学習モデルを用いて生成された第２学習モデルであって、所定の検索クエリから所定の検索クエリが属するカテゴリを予測する第２学習モデルを取得する。 Further, the acquisition unit 131 acquires a second learning model that predicts, from a predetermined search query, the category to which the predetermined search query belongs, the second learning model being generated using a first learning model that has learned the characteristics of a plurality of search queries on the assumption that a plurality of search queries input by the same user within a predetermined time have similar characteristics and that a plurality of randomly extracted search queries have different characteristics. The acquisition unit 131 likewise acquires such a second learning model generated using a first learning model that has learned the characteristics of a plurality of search queries by learning such that the distributed representations of a pair of randomly extracted search queries differ.

一般的に、ランダムに抽出された複数の検索クエリは、互いに無関係に入力された検索クエリであるため、検索意図が異なるか、検索意図が遠いと考えられる。そこで、本願発明に係る生成装置５０は、ランダムに抽出された複数の検索クエリは、異なる検索意図の下で検索された検索クエリであるという点で、相互に相違する特徴を有する検索クエリであるとみなして学習モデルＭ１を学習させる。これにより、学習モデルは、検索意図が近い検索クエリのペアである正解データに加えて、検索意図が遠い検索クエリのペアである不正解データを学習することができる。すなわち、生成装置５０は、第１モデルの学習精度を向上させることができる。したがって、生成装置５０は、学習精度が向上した第１モデルを用いて第２モデルを生成することができるので、第２モデルの学習精度を向上させることができる。 In general, a plurality of randomly extracted search queries are search queries that are input independently of each other, so it is considered that the search intents are different or the search intents are distant. Therefore, the generation device 50 according to the present invention is a search query having different characteristics from each other in that a plurality of randomly extracted search queries are search queries searched under different search intentions. Assuming that, the learning model M1 is trained. As a result, the learning model can learn correct answer data, which is a pair of search queries with similar search intentions, and incorrect answer data, which is a pair of search queries with distant search intentions. That is, the generation device 50 can improve the learning accuracy of the first model. Therefore, since the generation device 50 can generate the second model using the first model with improved learning accuracy, the learning accuracy of the second model can be improved.
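
The idea of learning from both similar pairs and randomly extracted dissimilar pairs can be sketched with a cosine-based pair loss. This is a generic formulation chosen for illustration, not the loss stated in the source; the names `cosine_similarity`, `pair_loss`, and `sample_random_pairs`, and the margin parameter, are hypothetical.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_loss(u, v, similar, margin=0.0):
    """Small when a similar pair is close or a dissimilar pair is far apart."""
    sim = cosine_similarity(u, v)
    if similar:
        return 1.0 - sim          # pull similar pairs together
    return max(0.0, sim - margin)  # push dissimilar pairs apart


def sample_random_pairs(queries, k, seed=0):
    """Draw k random pairs of distinct queries to serve as dissimilar examples."""
    rng = random.Random(seed)
    return [tuple(rng.sample(queries, 2)) for _ in range(k)]


negatives = sample_random_pairs(
    ["Roppongi pasta", "Shinjuku ramen", "lipstick", "train timetable"], k=3
)
```

Minimising this loss over both kinds of pair moves the distributed representations of same-session queries closer together and those of random pairs further apart. Random sampling can occasionally pick two queries with a similar intent, which the sketch does not guard against.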

また、取得部１３１は、同一のユーザによって所定の時間内に入力された複数の検索クエリが類似する特徴を有するものとして複数の検索クエリが有する特徴を学習した第１学習モデルを用いて生成された第２学習モデルであって、検索クエリが第２学習モデルに入力された際に、第２学習モデルが出力する分散表現の分類結果が、検索クエリが属するカテゴリに対応するように学習することで、所定の検索クエリから所定の検索クエリが属するカテゴリを予測する第２学習モデルを取得する。また、取得部１３１は、入力情報として所定の検索クエリが入力された際に、出力情報として所定の検索クエリの分散表現を出力する第１学習モデルを用いて生成された第２学習モデルを取得する。また、取得部１３１は、入力情報として検索クエリが第２学習モデルに入力された際に、出力情報として検索クエリがカテゴリに属する確率をカテゴリ毎に出力する第２学習モデルを取得する。 Further, the acquisition unit 131 acquires a second learning model that predicts, from a predetermined search query, the category to which the predetermined search query belongs, the second learning model being generated using a first learning model that has learned the characteristics of a plurality of search queries on the assumption that a plurality of search queries input by the same user within a predetermined time have similar characteristics, and being trained such that, when a search query is input to the second learning model, the classification result of the distributed representation output by the second learning model corresponds to the category to which the search query belongs. The acquisition unit 131 also acquires a second learning model generated using a first learning model that, when a predetermined search query is input as input information, outputs a distributed representation of the predetermined search query as output information. The acquisition unit 131 also acquires a second learning model that, when a search query is input to the second learning model as input information, outputs as output information, for each category, the probability that the search query belongs to that category.

これにより、生成装置５０は、検索意図を考慮した検索クエリの特徴を含む分散表現を活用して、検索クエリを入力したユーザの検索意図を考慮したカテゴリに検索クエリを分類する第２学習モデルを効率的に生成することができる。すなわち、生成装置５０は、検索クエリを入力したユーザの検索意図を考慮したカテゴリに検索クエリを分類することを可能にする。したがって、生成装置５０は、検索クエリの分類精度を高めることができる。 As a result, the generation device 50 utilizes a distributed expression including the characteristics of the search query considering the search intention, and classifies the search query into a category considering the search intention of the user who input the search query. It can be generated efficiently. That is, the generation device 50 makes it possible to classify the search query into a category considering the search intention of the user who has input the search query. Therefore, the generation device 50 can improve the classification accuracy of the search query.

〔４．ハードウェア構成〕 [4. Hardware configuration]
また、上述してきた実施形態に係る情報処理装置１００および実施形態に係る生成装置５０は、例えば図２３に示すような構成のコンピュータ１０００によって実現される。図２３は、情報処理装置１００および生成装置５０の機能を実現するコンピュータの一例を示すハードウェア構成図である。コンピュータ１０００は、ＣＰＵ１１００、ＲＡＭ１２００、ＲＯＭ１３００、ＨＤＤ１４００、通信インターフェイス（Ｉ／Ｆ）１５００、入出力インターフェイス（Ｉ／Ｆ）１６００、及びメディアインターフェイス（Ｉ／Ｆ）１７００を備える。 Further, the information processing apparatus 100 according to the above-described embodiment and the generation device 50 according to the embodiment are realized by, for example, a computer 1000 having the configuration shown in FIG. 23. FIG. 23 is a hardware configuration diagram showing an example of a computer that realizes the functions of the information processing apparatus 100 and the generation device 50. The computer 1000 includes a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

ＣＰＵ１１００は、ＲＯＭ１３００またはＨＤＤ１４００に格納されたプログラムに基づいて動作し、各部の制御を行う。ＲＯＭ１３００は、コンピュータ１０００の起動時にＣＰＵ１１００によって実行されるブートプログラムや、コンピュータ１０００のハードウェアに依存するプログラム等を格納する。 The CPU 1100 operates based on a program stored in the ROM 1300 or the HDD 1400, and controls each part. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 is started, a program depending on the hardware of the computer 1000, and the like.

ＨＤＤ１４００は、ＣＰＵ１１００によって実行されるプログラム、及び、かかるプログラムによって使用されるデータ等を格納する。通信インターフェイス１５００は、所定の通信網を介して他の機器からデータを受信してＣＰＵ１１００へ送り、ＣＰＵ１１００が生成したデータを所定の通信網を介して他の機器へ送信する。 The HDD 1400 stores a program executed by the CPU 1100, data used by such a program, and the like. The communication interface 1500 receives data from another device via a predetermined communication network and sends the data to the CPU 1100, and transmits the data generated by the CPU 1100 to the other device via the predetermined communication network.

ＣＰＵ１１００は、入出力インターフェイス１６００を介して、ディスプレイやプリンタ等の出力装置、及び、キーボードやマウス等の入力装置を制御する。ＣＰＵ１１００は、入出力インターフェイス１６００を介して、入力装置からデータを取得する。また、ＣＰＵ１１００は、生成したデータを入出力インターフェイス１６００を介して出力装置へ出力する。 The CPU 1100 controls an output device such as a display or a printer, and an input device such as a keyboard or a mouse via the input / output interface 1600. The CPU 1100 acquires data from the input device via the input / output interface 1600. Further, the CPU 1100 outputs the generated data to the output device via the input / output interface 1600.

メディアインターフェイス１７００は、記録媒体１８００に格納されたプログラムまたはデータを読み取り、ＲＡＭ１２００を介してＣＰＵ１１００に提供する。ＣＰＵ１１００は、かかるプログラムを、メディアインターフェイス１７００を介して記録媒体１８００からＲＡＭ１２００上にロードし、ロードしたプログラムを実行する。記録媒体１８００は、例えばＤＶＤ（Digital Versatile Disc）、ＰＤ（Phase change rewritable Disk）等の光学記録媒体、ＭＯ（Magneto-Optical disk）等の光磁気記録媒体、テープ媒体、磁気記録媒体、または半導体メモリ等である。 The media interface 1700 reads a program or data stored in the recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

例えば、コンピュータ１０００が情報処理装置１００または生成装置５０として機能する場合、コンピュータ１０００のＣＰＵ１１００は、ＲＡＭ１２００上にロードされたプログラムを実行することにより、制御部１３０または制御部５２の機能を実現する。コンピュータ１０００のＣＰＵ１１００は、これらのプログラムを記録媒体１８００から読み取って実行するが、他の例として、他の装置から所定の通信網を介してこれらのプログラムを取得してもよい。 For example, when the computer 1000 functions as the information processing device 100 or the generation device 50, the CPU 1100 of the computer 1000 realizes the functions of the control unit 130 or the control unit 52 by executing the program loaded on the RAM 1200. The CPU 1100 of the computer 1000 reads these programs from the recording medium 1800 and executes them, but as another example, these programs may be acquired from another device via a predetermined communication network.

以上、本願の実施形態のいくつかを図面に基づいて詳細に説明したが、これらは例示であり、発明の開示の欄に記載の態様を始めとして、当業者の知識に基づいて種々の変形、改良を施した他の形態で本発明を実施することが可能である。 Although some of the embodiments of the present application have been described in detail with reference to the drawings, these are examples, and various modifications are made based on the knowledge of those skilled in the art, including the embodiments described in the disclosure column of the invention. It is possible to carry out the present invention in other modified forms.

〔５．その他〕 [5. Others]
また、上記実施形態及び変形例において説明した各処理のうち、自動的に行われるものとして説明した処理の全部または一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。例えば、各図に示した各種情報は、図示した情報に限られない。 Further, among the processes described in the above embodiments and modifications, all or part of the processes described as being performed automatically can also be performed manually, and all or part of the processes described as being performed manually can also be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above document and drawings can be changed arbitrarily unless otherwise specified. For example, the various information shown in each figure is not limited to the illustrated information.

また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。 Further, each component of each illustrated device is a functional concept and does not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

また、上述してきた実施形態及び変形例は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 Further, the above-described embodiments and modifications can be appropriately combined as long as the processing contents do not contradict each other.

また、上述してきた「部（section、module、unit）」は、「手段」や「回路」などに読み替えることができる。例えば、抽出部は、抽出手段や抽出回路に読み替えることができる。 Further, the above-mentioned "section, module, unit" can be read as "means" or "circuit". For example, the extraction unit can be read as an extraction means or an extraction circuit.

１ 情報処理システム Information processing system
１０ ユーザ端末 User terminal
２０ 検索サーバ Search server
５０ 生成装置 Generation device
１００ 情報処理装置 Information processing apparatus
１１０ 通信部 Communication unit
１２０ 記憶部 Storage unit
１２１ クエリ情報記憶部 Query information storage unit
１２２ 不要文字列記憶部 Unnecessary character string storage unit
１２３ 不要カテゴリ記憶部 Unnecessary category storage unit
１２４ モデル情報記憶部 Model information storage unit
１３０ 制御部 Control unit
１３１ 取得部 Acquisition unit
１３２ 処理部 Processing unit
１３３ 推定部 Estimation unit
１３４ 抽出部 Extraction unit

Claims

An information processing apparatus comprising:
an acquisition unit that acquires a second learning model that predicts, from a predetermined search query, a category to which the predetermined search query belongs, the second learning model being generated using a first learning model that has learned characteristics of a plurality of search queries on the assumption that a plurality of search queries input by the same user within a predetermined time have similar characteristics;
an estimation unit that uses the second learning model acquired by the acquisition unit to estimate a category to which a target indicated by a character string input as a search query belongs; and
an extraction unit that extracts, from the character string and based on the category estimated by the estimation unit, a target character string indicating an extraction target belonging to a target field corresponding to the category.

The information processing apparatus according to claim 1, wherein the estimation unit estimates a plurality of categories to which the target indicated by the character string belongs.

The information processing apparatus according to claim 2, wherein the estimation unit outputs, for each category, the probability that the target indicated by the character string belongs to that category.

The information processing apparatus according to any one of claims 1 to 3, wherein the extraction unit extracts, as the target character string, a character string for which the categories estimated by the estimation unit include a category indicating the target field.

The information processing apparatus according to any one of claims 1 to 4, wherein the extraction unit extracts, as the target character string, a character string for which the categories estimated by the estimation unit do not include an unnecessary category registered as an unnecessary category.

The information processing apparatus according to any one of claims 1 to 5, further comprising a processing unit that acquires an inflow search query that has flowed into a site related to the target field, and acquires a first character string obtained by removing, from the character string input as the inflow search query, an unnecessary character string registered as an unnecessary character string, wherein the estimation unit estimates a category to which a target indicated by the first character string acquired by the processing unit belongs.

The information processing apparatus according to any one of claims 1 to 6, wherein the acquisition unit acquires the second learning model that predicts, from a predetermined search query, the category to which the predetermined search query belongs, the second learning model being generated using the first learning model that has learned the characteristics of the plurality of search queries by learning such that the distributed representations of a pair of search queries input by the same user within a predetermined time are similar.

The information processing apparatus according to any one of claims 1 to 7, wherein the acquisition unit acquires the second learning model that predicts, from a predetermined search query, the category to which the predetermined search query belongs, the second learning model being generated using the first learning model that has learned the characteristics of the plurality of search queries on the assumption that a plurality of search queries that are input by the same user within a predetermined time and that include character strings separated by a predetermined delimiter have similar characteristics.

The information processing apparatus according to any one of claims 1 to 8, wherein the acquisition unit acquires the second learning model that predicts, from a predetermined search query, the category to which the predetermined search query belongs, the second learning model being generated using the first learning model that has learned the characteristics of the plurality of search queries on the assumption that a plurality of search queries input by the same user within a predetermined time have similar characteristics and that a plurality of randomly extracted search queries have different characteristics.

The information processing apparatus according to any one of claims 1 to 9, wherein the acquisition unit acquires the second learning model that predicts, from a predetermined search query, the category to which the predetermined search query belongs, the second learning model being generated using the first learning model that has learned the characteristics of the plurality of search queries on the assumption that a plurality of search queries input by the same user within a predetermined time have similar characteristics and by learning such that the distributed representations of a pair of randomly extracted search queries differ.

The acquisition unit
acquires the second learning model, which is generated by using the first learning model that has learned the characteristics of a plurality of search queries on the assumption that a plurality of search queries input by the same user within a predetermined time have similar characteristics, and which has been trained so that, when a search query is input to the second learning model, the classification result of the distributed representation output by the second learning model corresponds to the category to which the search query belongs, the second learning model predicting, from a predetermined search query, the category to which the predetermined search query belongs. The information processing apparatus according to any one of claims 1 to 10.

The acquisition unit
acquires the second learning model generated by using the first learning model, which, when a predetermined search query is input as input information, outputs a distributed representation of the predetermined search query as output information. The information processing apparatus according to any one of claims 1 to 11.

The acquisition unit
acquires the second learning model, which, when a search query is input to the second learning model as input information, outputs as output information, for each category, the probability that the search query belongs to that category. The information processing apparatus according to any one of claims 1 to 12.
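
A per-category probability output of the kind recited above could look like the following sketch. It assumes a single linear layer followed by a softmax over the distributed representation, which is a common classifier head but is not specified by the claims; the names `softmax` and `category_probabilities`, and the weights, biases and category labels passed in, are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def category_probabilities(embedding, weights, biases, categories):
    """Map a query's distributed representation to one probability per category.

    embedding:  list of floats (the distributed representation of the query)
    weights:    one row of floats per category (assumed linear layer)
    biases:     one float per category
    categories: category labels, in the same order as the rows of `weights`
    """
    scores = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, biases)
    ]
    return dict(zip(categories, softmax(scores)))
```

For example, `category_probabilities([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0], ["restaurant", "fashion"])` assigns the higher probability to `"restaurant"`.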

An information processing method executed by a computer, the method comprising:
an acquisition step of acquiring a second learning model that is generated by using a first learning model, the first learning model having learned the characteristics of a plurality of search queries on the assumption that a plurality of search queries input by the same user within a predetermined time have similar characteristics, and that predicts, from a predetermined search query, the category to which the predetermined search query belongs;
an estimation step of estimating, using the second learning model acquired in the acquisition step, the category to which a target indicated by a character string input as a search query belongs; and
an extraction step of extracting, from the character string and based on the category estimated in the estimation step, a target character string indicating an extraction target that belongs to a target field corresponding to the category.

An information processing program that causes a computer to execute:
an acquisition procedure of acquiring a second learning model that is generated by using a first learning model, the first learning model having learned the characteristics of a plurality of search queries on the assumption that a plurality of search queries input by the same user within a predetermined time have similar characteristics, and that predicts, from a predetermined search query, the category to which the predetermined search query belongs;
an estimation procedure of estimating, using the second learning model acquired by the acquisition procedure, the category to which a target indicated by a character string input as a search query belongs; and
an extraction procedure of extracting, from the character string and based on the category estimated by the estimation procedure, a target character string indicating an extraction target that belongs to a target field corresponding to the category.
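
The acquire, estimate and extract flow recited in the method and program claims can be sketched as below. Everything here is illustrative: `acquire_model` returns a hard-coded lookup table standing in for a trained second learning model, the category labels are invented, and splitting the query on a delimiter to obtain candidate character strings is an assumption about how the extraction is applied.

```python
from typing import Callable, Dict, List

# A model maps a character string to a probability per category.
CategoryModel = Callable[[str], Dict[str, float]]


def acquire_model() -> CategoryModel:
    """Hypothetical stand-in for acquiring the trained second learning model."""
    table = {
        "tapioca": {"food": 0.9, "place": 0.1},
        "shibuya": {"food": 0.2, "place": 0.8},
    }
    unknown = {"other": 1.0}  # fallback for strings the toy table does not know
    return lambda text: table.get(text, unknown)


def estimate_category(model: CategoryModel, text: str) -> str:
    """Estimation step: pick the category with the highest probability."""
    probs = model(text)
    return max(probs, key=probs.get)


def extract_targets(
    model: CategoryModel, query: str, target_category: str, delimiter: str = " "
) -> List[str]:
    """Extraction step: keep the parts of the query whose estimated category
    matches the target field (assumes candidates are delimiter-separated)."""
    return [
        part
        for part in query.split(delimiter)
        if part and estimate_category(model, part) == target_category
    ]
```

With the toy model, `extract_targets(acquire_model(), "tapioca shibuya", "food")` keeps only the part of the query estimated to belong to the `"food"` category.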