JP6971103B2

JP6971103B2 - Information processing equipment, information processing methods, and programs

Info

Publication number: JP6971103B2
Application number: JP2017175549A
Authority: JP
Inventors: 文彦 ▲高▼橋
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2017-09-13
Filing date: 2017-09-13
Publication date: 2021-11-24
Anticipated expiration: 2037-09-13
Also published as: JP2019053386A

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

In the field of web search, methods are used that provide information on various products or services related to a keyword entered by a user (hereinafter, a "query"). Search methods using machine learning are also being developed so that information on the products or services a user intends can be provided with high accuracy.

Patent Document 1: Published Japanese Translation of PCT International Application No. 2002-530948

However, with conventional techniques, information on products or services that have low relevance to the query and that the user did not intend is sometimes provided. In addition, conventional machine-learning methods use a simple linear model and cannot take into account combinations of the words contained in a query. In the field of machine learning in general, research on methods that use encoder-decoder models in addition to linear models is under way, but applying them to web search has not been contemplated (see, for example, Patent Document 1).

The present invention has been made in view of these circumstances, and one of its objects is to provide an information processing apparatus, an information processing method, and a program capable of improving search accuracy.

One aspect of the present invention is an information processing apparatus including: an analysis unit that analyzes a query used for a search and divides the query into words; a conversion unit that converts the words divided by the analysis unit into feature vectors; and a first estimation unit that estimates the category associated with the query used for the search by learning the relationship between the feature vectors of the words contained in queries in training data and the categories associated with those queries in the training data.

According to one aspect of the present invention, search accuracy can be improved.

FIG. 1 is a diagram showing the usage environment of the search server 10 of the first embodiment.
FIG. 2 is a diagram showing an example of the category tree structure of the first embodiment.
FIG. 3 is a diagram showing an example of the flow of the learning process of the search server 10 of the first embodiment.
FIG. 4 is a diagram showing an example of the flow of the category estimation process of the search server 10 of the first embodiment.
FIG. 5 is a diagram schematically showing the category estimation process in the category estimation unit 36 of the first embodiment.
FIG. 6 is a diagram showing an example of the search result page of the first embodiment.
FIG. 7 is a diagram explaining the models used in the evaluation experiment.
FIG. 8 is a diagram showing the results of the evaluation experiment.
FIG. 9 is a diagram showing a comparison of match rates by number of clicks in the evaluation experiment.
FIG. 10 is a diagram showing examples of categories estimated using model 1 in the evaluation experiment.
FIG. 11 is a functional block diagram of the search server 11 of the second embodiment.
FIG. 12 is a diagram showing an example of the search log data of the second embodiment.
FIG. 13 is a diagram showing an example of the flow of the category estimation process of the search server 11 of the second embodiment.
FIG. 14 is a functional block diagram of the search server 12 of the third embodiment.

Hereinafter, embodiments of the information processing apparatus, the information processing method, and the program will be described with reference to the drawings.

<1. First Embodiment>
The first embodiment of the present invention is described below. The information processing apparatus of this embodiment divides a query used for a search into words, converts the divided words into feature vectors, and estimates the category associated with the query. In this embodiment, a "category" is information indicating the field to which the content (a product or service) provided as a search result for the query belongs. This embodiment describes an example in which the information processing apparatus is a search server that provides search results for product or service content.

<1-1. System configuration>
FIG. 1 is a diagram showing the usage environment of the search server 10 (information processing apparatus) of this embodiment. The search server 10 is connected to the terminal device T and other devices via a network NW. The network NW includes, for example, a WAN (Wide Area Network), a LAN (Local Area Network), the Internet, dedicated lines, wireless base stations, and providers.

The terminal device T is operated by a user of the search service provided by the search server 10. The terminal device T is a computer device such as a personal computer, a mobile phone such as a smartphone, a tablet terminal, or a PDA (Personal Digital Assistant). A browser or application program running on the terminal device T in response to user operations sends, for example, a query requesting information to the search server 10 and receives information associated with the query from the search server 10.

The search server 10 transmits to the terminal device T page information of the search results associated with the query input from the terminal device T. The search server 10 generates this page information using search data that associates each query with reference information for accessing the pages of sites associated with that query. The reference information includes, for example, a URL (Uniform Resource Locator).

The search server 10 includes, for example, a communication unit 30, a query analysis unit 32 (analysis unit), a query conversion unit 34 (conversion unit), a category estimation unit 36 (first estimation unit, learning unit), a search result generation unit 38 (generation unit), and a storage unit 40. The communication unit 30 communicates with the terminal device T and other devices via the network NW: it receives queries from the terminal device T and transmits search result page information to the terminal device T. The communication unit 30 includes a communication interface such as a NIC (Network Interface Card).

The query analysis unit 32 analyzes the query input from the terminal device T and divides it into words. For example, the query analysis unit 32 analyzes the query by morphological analysis and divides it into at least one word.
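
The patent names morphological analysis but no specific analyzer. As a rough illustration only, the sketch below substitutes greedy longest-match dictionary segmentation, a much simpler technique than a real morphological analyzer (which typically scores a lattice of candidate segmentations). The lexicon and the function name `segment` are hypothetical.

```python
from typing import List, Set


def segment(query: str, lexicon: Set[str]) -> List[str]:
    """Greedy longest-match segmentation; a simplified stand-in for morphological analysis."""
    max_len = max((len(w) for w in lexicon), default=1)
    words: List[str] = []
    i = 0
    while i < len(query):
        if query[i].isspace():  # skip ASCII and full-width spaces between words
            i += 1
            continue
        # Try the longest lexicon entry that starts at position i.
        for length in range(min(max_len, len(query) - i), 0, -1):
            candidate = query[i:i + length]
            if candidate in lexicon:
                words.append(candidate)
                i += length
                break
        else:
            # No entry matched: emit the single character so the loop always advances.
            words.append(query[i])
            i += 1
    return words


LEXICON = {"メンズ", "スニーカー", "Ａ社"}  # hypothetical lexicon
print(segment("メンズスニーカーＡ社", LEXICON))  # ['メンズ', 'スニーカー', 'Ａ社']
```

With this toy lexicon the FIG. 5 query splits into the same three words the text describes; words missing from the lexicon fall back to single characters.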

The query conversion unit 34 converts each word divided by the query analysis unit 32 into a feature vector, for example by using a tool (program) called word2vec. word2vec is a neural-network-based tool that converts the text contained in an input corpus into feature vectors representing the characteristics of that text. In this embodiment, it is assumed that distributed representations of queries have been learned in advance with word2vec.
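
A minimal sketch of the conversion step, assuming the word2vec vectors have already been trained and exported to a plain dictionary. The 3-dimensional values are made up for illustration, and mapping unknown words to a zero vector is an assumption, not something the patent specifies.

```python
from typing import Dict, List

Vector = List[float]


def to_feature_vectors(words: List[str], embeddings: Dict[str, Vector], dim: int) -> List[Vector]:
    """Look up a pre-trained vector per word; unknown words get a zero vector (an assumption)."""
    zero = [0.0] * dim
    # list(...) copies each vector so callers cannot mutate the shared table.
    return [list(embeddings.get(w, zero)) for w in words]


# Made-up 3-dimensional vectors standing in for trained word2vec output.
EMBEDDINGS: Dict[str, Vector] = {
    "メンズ": [0.9, 0.1, 0.0],
    "スニーカー": [0.2, 0.8, 0.1],
    "Ａ社": [0.0, 0.3, 0.7],
}
vectors = to_feature_vectors(["メンズ", "スニーカー", "未知語"], EMBEDDINGS, dim=3)
print(vectors)  # [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.0, 0.0]]
```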

The category estimation unit 36 estimates the category associated with the one or more feature vectors input from the query conversion unit 34. The category estimation unit 36 estimates the category using, for example, an encoder-decoder model based on a recurrent neural network (RNN). An example in which the category estimation unit 36 uses the encoder-decoder model is described below.

The category estimation unit 36 includes, for example, an encoder 42 and a decoder 44. The encoder 42 and the decoder 44 are trained on pairs, each consisting of the "word feature vectors" of a query used for a search and the "category" that the user selected on the search result page provided in response to that query.

The encoder 42 encodes the one or more feature vectors input from the query conversion unit 34 by using a model trained on the training data described above (hereinafter, the "encoding model"). When two or more feature vectors are input from the query conversion unit 34, the encoder 42 first encodes the first feature vector, then encodes the second feature vector together with the result of encoding the first. The encoder 42 repeats this process and inputs the resulting encoding output to the decoder 44.
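
The repeated encoding can be sketched as a simple recurrent update. The patent does not specify the RNN cell, so the sketch assumes a plain (Elman) cell, h_t = tanh(W_x x_t + W_h h_{t-1} + b) with h_0 = 0. The weights are hypothetical values, not trained parameters. Each returned state corresponds to one intermediate encoding result (R1, R2, ...), and the last one would be passed to the decoder.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_encode(xs: List[Vector], w_x: Matrix, w_h: Matrix, b: Vector) -> List[Vector]:
    """Plain (Elman) RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), starting from h_0 = 0.

    Returns the hidden state after each word (R1, R2, ...); the last state is the
    encoding that would be handed to the decoder.
    """
    h = [0.0] * len(b)
    states = []
    for x in xs:
        pre = [u + v + c for u, v, c in zip(matvec(w_x, x), matvec(w_h, h), b)]
        h = [math.tanh(z) for z in pre]
        states.append(h)
    return states


# Hypothetical weights: 3-dimensional word vectors, 2-dimensional hidden state.
W_X = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.5]]
W_H = [[0.1, 0.0], [0.0, 0.1]]
B = [0.0, 0.0]
words = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.3, 0.7]]  # "men's", "sneakers", "Company A"
r1, r2, r3 = rnn_encode(words, W_X, W_H, B)  # r3 plays the role of R3 in FIG. 5
```

Because each state depends on the previous one, the final state reflects the whole word sequence. This lets the model account for combinations of words, unlike a simple linear model.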

The decoder 44 uses a model trained on the training data described above (hereinafter, the "decoding model") to estimate and output, from the encoding result input from the encoder 42, the category to be associated with the words of the query. Categories have, for example, a tree structure that hierarchically defines the fields to which products or services belong. FIG. 2 is a diagram showing an example of the category tree structure. In the example in FIG. 2, "men's fashion", "ladies' fashion", and so on are defined as the first layer H1. Below "men's fashion" (first layer H1), "men's shoes", "men's bags", and so on are defined as the second layer H2. Below "men's shoes" (second layer H2), "sneakers", "business shoes", "sandals", "boots", and so on are defined as the third layer H3.
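
One way to hold the category tree of FIG. 2 is a parent-to-children mapping. The sketch below encodes only the categories named in the text; the `<root>` sentinel and the helper names are hypothetical. `path_to` turns a category label into the per-layer targets (H1, H2, H3) that top-down training and decoding need.

```python
from typing import Dict, List, Optional

ROOT = "<root>"  # hypothetical sentinel above the first layer H1
CATEGORY_TREE: Dict[str, List[str]] = {
    ROOT: ["Men's fashion", "Ladies' fashion"],
    "Men's fashion": ["Men's shoes", "Men's bags"],
    "Men's shoes": ["Sneakers", "Business shoes", "Sandals", "Boots"],
}


def children(category: str) -> List[str]:
    """Categories one layer below `category`; leaves have no entry and return []."""
    return CATEGORY_TREE.get(category, [])


def path_to(target: str) -> List[str]:
    """Root-to-target path (H1, H2, H3, ...) found by depth-first search; [] if absent."""
    def dfs(node: str, path: List[str]) -> Optional[List[str]]:
        for child in children(node):
            if child == target:
                return path + [child]
            found = dfs(child, path + [child])
            if found:
                return found
        return None

    return dfs(ROOT, []) or []


print(path_to("Sneakers"))  # ["Men's fashion", "Men's shoes", 'Sneakers']
```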

The decoder 44 learns the relationship between the encoding result (a multidimensional vector) input from the encoder 42 and the category tree structure shown in FIG. 2. Based on the encoding result, the decoder 44 estimates the categories to be associated with the words of the query and outputs them in order, starting from the top layer of the category tree.

The search result generation unit 38 generates a search result page based on the category estimated by the category estimation unit 36 and transmits it to the terminal device T via the communication unit 30.

Each functional unit of the search server 10 is implemented, for example, by a hardware processor such as a CPU (Central Processing Unit) executing a program (software) on a computer. Some or all of these components may instead be implemented by hardware (including circuitry) such as an LSI (Large Scale Integration), ASIC (Application Specific Integrated Circuit), FPGA (Field-Programmable Gate Array), or GPU (Graphics Processing Unit), or by software and hardware working together.

The storage unit 40 stores, among other things, search data D1, which associates each query with reference information for accessing the pages of sites associated with that query, and category data D2, which represents the category tree structure. The storage unit 40 is implemented, for example, by RAM (Random Access Memory), ROM (Read Only Memory), an HDD (Hard Disk Drive), flash memory, or a hybrid storage device combining several of these. Part or all of the storage unit 40 may be an external device accessible to the search server 10, such as a NAS or an external storage server.

<1-2. Learning process>
The learning process of the search server 10 is described below. FIG. 3 is a diagram showing an example of the flow of the learning process of the search server 10. The category estimation unit 36 learns the relationship between the feature vectors of the words contained in a query and the category associated with that query.

First, training data is input to the category estimation unit 36 of the search server 10 (S101). Log data recorded when the search service was used can serve as training data. For example, each training example pairs the "word feature vectors" of a query used for a search with the "category" that the user selected on the search result page provided in response to that query.

Next, the encoder 42 of the category estimation unit 36 encodes the "feature vectors of the words contained in the query" (S103), and the decoder 44 decodes the encoding result (S105). The parameters of the encoding model and the decoding model are adjusted so that this decoding outputs the "category" paired with the query (S107).

As a result of this decoding, the decoder 44 calculates, for example, a score for each category defined in the first layer indicating its degree of association with the words of the query. The decoder 44 adjusts the parameters of the decoding model so that the category paired with the query in the training data receives a high score. The category estimation unit 36 may adjust the parameters by, for example, backpropagation. The same processing is performed for the second and lower layers. This completes the processing of this flowchart.
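
The patent says only that parameters are adjusted, for example by backpropagation, so that the paired category scores highly. A common way to make that concrete, assumed here rather than taken from the patent, is a softmax cross-entropy loss for each layer. Its gradient with respect to the scores is p - one_hot(target). Backpropagation would push that gradient further into the decoder and encoder weights, which this sketch does not show. The score values are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def level_loss_and_grad(logits: Sequence[float], target: int) -> Tuple[float, List[float]]:
    """Cross-entropy for one layer and its gradient w.r.t. the scores: p - one_hot(target)."""
    p = softmax(logits)
    loss = -math.log(p[target])
    grad = [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]
    return loss, grad


def hierarchical_loss(per_layer_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Sum of per-layer losses for one (query, category path) pair, e.g. H1 + H2 + H3."""
    return sum(level_loss_and_grad(z, t)[0] for z, t in zip(per_layer_logits, targets))


# One gradient step on illustrative first-layer scores; index 0 is the category paired with the query.
logits = [0.2, 1.0, -0.5]
loss_before, grad = level_loss_and_grad(logits, target=0)
updated = [z - 0.5 * g for z, g in zip(logits, grad)]  # learning rate 0.5 (arbitrary)
loss_after, _ = level_loss_and_grad(updated, target=0)
```

After the step, `loss_after` is lower than `loss_before`. The paired category's score has risen relative to the others, which is the effect the paragraph above describes.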

The log data may contain data for multiple categories for a given query. Data for categories with few clicks (data whose click count, i.e., the number of times the category was selected, is at or below a predetermined threshold) may be excluded, and the remaining log data used as training data. This can improve the accuracy of the encoding model and the decoding model.
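
The optional click-count filter reduces to a single predicate. In this sketch the record layout and the threshold value are hypothetical. A record whose count equals the threshold is excluded, matching "at or below a predetermined threshold".

```python
from typing import List, NamedTuple


class LogRecord(NamedTuple):
    query: str
    category: str  # category the user selected on the search result page
    clicks: int    # how many times that category was selected for the query


def filter_training_log(records: List[LogRecord], threshold: int) -> List[LogRecord]:
    """Keep only (query, category) records whose click count exceeds `threshold`."""
    return [r for r in records if r.clicks > threshold]


log = [
    LogRecord("メンズスニーカーＡ社", "Sneakers", 120),
    LogRecord("メンズスニーカーＡ社", "Boots", 2),
    LogRecord("メンズスニーカーＡ社", "Sandals", 5),
]
train = filter_training_log(log, threshold=5)  # illustrative threshold
print([r.category for r in train])  # ['Sneakers']
```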

<1-3. Category estimation process>
The category estimation process of the search server 10 is described below. FIG. 4 is a diagram showing an example of the flow of the category estimation process of the search server 10.

First, the query analysis unit 32 acquires the query input from the terminal device T via the communication unit 30 (S201). Next, the query analysis unit 32 analyzes the query, for example by morphological analysis, and divides it into words (S203).

Next, the query conversion unit 34 converts each word divided by the query analysis unit 32 into a feature vector, for example by using word2vec (S205).

Next, the encoder 42 encodes the feature vector of the first word among the feature vectors produced by the query conversion unit 34 (S207). The encoder 42 then determines whether the feature vectors of all words have been encoded (S209).

If not all feature vectors have been encoded, the encoder 42 encodes again (S207), this time using the feature vector of the second word together with the encoding result for the first word's feature vector. It then again determines whether all feature vectors have been encoded (S209). This is repeated until the feature vectors of all words have been encoded. Once the encoder 42 determines that all have been encoded, it inputs the encoding result obtained from this repeated processing to the decoder 44.

Next, the decoder 44 decodes the encoding result input from the encoder 42 (S211). As a result of this decoding, the decoder 44 calculates, for each category defined in the first layer, a score indicating its degree of association with the words of the query. For example, the category with the highest score is estimated to be the category associated with the query.

Next, the decoder 44 determines whether the category in the lowest layer has been estimated (S213). If not, the decoder 44 decodes again (S211) using the information on the estimated category and the previous decoding result. It then again determines whether the category in the lowest layer has been estimated (S213). This is repeated until the category in the lowest layer has been estimated. Once it has, the decoder 44 inputs the information on the estimated categories to the search result generation unit 38.
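
The top-down loop of S211 and S213 can be sketched as greedy decoding over the tree. At each layer only the children of the previously estimated category are scored, and the loop ends at a leaf. The scoring and state-update functions are passed in as hypothetical callables because the patent does not detail the decoder's internals. The toy relevance values simply reproduce the FIG. 5 outcome.

```python
from typing import Callable, Dict, List, Sequence

ROOT = "<root>"  # hypothetical sentinel above the first layer
TREE: Dict[str, List[str]] = {
    ROOT: ["Men's fashion", "Ladies' fashion"],
    "Men's fashion": ["Men's shoes", "Men's bags"],
    "Men's shoes": ["Sneakers", "Business shoes", "Sandals", "Boots"],
}

State = List[float]
ScoreFn = Callable[[State, Sequence[str]], List[float]]
StepFn = Callable[[State, str], State]


def decode_hierarchy(encoding: State, score: ScoreFn, step: StepFn) -> List[str]:
    """Greedy top-down decoding: pick the best child at each layer until a leaf is reached."""
    state, node, path = encoding, ROOT, []
    while TREE.get(node):                  # S213: stop once the lowest layer is reached
        candidates = TREE[node]
        scores = score(state, candidates)  # S211: one score per candidate category
        node = candidates[max(range(len(candidates)), key=scores.__getitem__)]
        path.append(node)
        state = step(state, node)          # feed the estimated category back into the decoder
    return path


TOY_RELEVANCE = {"Men's fashion": 0.9, "Ladies' fashion": 0.1, "Men's shoes": 0.8,
                 "Men's bags": 0.2, "Sneakers": 0.7, "Business shoes": 0.1,
                 "Sandals": 0.1, "Boots": 0.1}
path = decode_hierarchy(
    encoding=[0.0],
    score=lambda state, cands: [TOY_RELEVANCE[c] for c in cands],
    step=lambda state, chosen: state,  # a real decoder would update its RNN state here
)
print(path)  # ["Men's fashion", "Men's shoes", 'Sneakers']
```

Restricting candidates to the children of the previous estimate guarantees that the output is always a valid root-to-leaf path in the category tree.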

FIG. 5 is a diagram schematically showing the category estimation process in the category estimation unit 36. In the example in FIG. 5, the query analysis unit 32 divides the query "men's sneakers Company A" (メンズスニーカーＡ社) input from the terminal device T into three words: "men's", "sneakers", and "Company A". The query conversion unit 34 converts each of these three words into a feature vector and inputs them to the category estimation unit 36.

First, the encoder 42 encodes the feature vector of the first word, "men's", and obtains a first encoding result R1. Next, the encoder 42 encodes using the feature vector of the second word, "sneakers", and the first encoding result R1, obtaining a second encoding result R2. Next, the encoder 42 encodes using the feature vector of the third word, "Company A", and the second encoding result R2, obtaining a third encoding result R3. The encoder 42 inputs the third encoding result R3 to the decoder 44.

Next, the decoder 44 decodes the third encoding result R3 input from the encoder 42. Here, the decoder 44 calculates, for each category in the top layer (first layer H1), a score indicating its degree of association with the three words "men's", "sneakers", and "Company A". The decoder 44 estimates the category with the highest of these scores to be the first category associated with the three words (in this example, "men's fashion").

Next, the decoder 44 decodes using the result of decoding the third encoding result R3 (first decoding result R4) and data indicating the estimated category "men's fashion". Here, the decoder 44 calculates, for each category in the second layer H2 below the estimated category "men's fashion" (first layer), a score indicating its degree of association with the three words "men's", "sneakers", and "Company A". The decoder 44 estimates the category with the highest score to be the second category associated with the three words (in this example, "men's shoes").

Next, the decoder 44 decodes using the result of decoding the first decoding result R4 (second decoding result R5) and data indicating the estimated category "men's shoes". Here, the decoder 44 calculates, for each category in the third layer H3 below the estimated category "men's shoes" (second layer), a score indicating its degree of association with the three words "men's", "sneakers", and "Company A". The decoder 44 estimates the category with the highest score to be the third category associated with the three words (in this example, "sneakers").

Through the processing shown in FIG. 5, the categories "men's fashion" (first layer), "men's shoes" (second layer), and "sneakers" (third layer) are obtained as the estimated categories associated with the three words "men's", "sneakers", and "Company A".

Next, the search result generation unit 38 generates a search result page based on the category estimates input from the decoder 44 and transmits it to the terminal device T (S215). FIG. 6 is a diagram showing an example of the search result page P1. In the example in FIG. 6, the search result generation unit 38 generates a search result page P1 in which the estimated third-layer category "sneakers" is placed at the most visible position on the page.

In the example in FIG. 6, the page shows the estimated category "sneakers" together with the other categories below the estimated second-layer category "men's shoes", such as "boots", "sandals", and "business shoes". These are displayed, for example, in descending order of score. The search result generation unit 38 transmits the generated search result page P1 to the terminal device T via the communication unit 30. By operating the terminal device T to select one of the categories on the search result page P1, the user can obtain search results that match the user's intent. This completes the processing of this flowchart.
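
The ordering described for FIG. 6 can be sketched as follows, assuming the decoder exposes a score for each sibling category. The scores below are invented for illustration, and the function name is hypothetical.

```python
from typing import Dict, List


def result_page_order(estimated: str, siblings: List[str], scores: Dict[str, float]) -> List[str]:
    """Estimated category first (most visible slot), then the other siblings by descending score."""
    others = [c for c in siblings if c != estimated]
    others.sort(key=lambda c: scores.get(c, 0.0), reverse=True)
    return [estimated] + others


# Children of the estimated second-layer category "Men's shoes", with invented scores.
siblings = ["Sneakers", "Business shoes", "Sandals", "Boots"]
scores = {"Sneakers": 0.70, "Boots": 0.12, "Sandals": 0.10, "Business shoes": 0.05}
order = result_page_order("Sneakers", siblings, scores)
print(order)  # ['Sneakers', 'Boots', 'Sandals', 'Business shoes']
```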

The example in FIG. 6 transmits to the terminal device T a search result page P1 on which the user selects an estimated category. Instead of such a page, however, the search result generation unit 38 may transmit to the terminal device T a page of products or services related to the estimated category.

<1-4. Evaluation experiment>
To evaluate the category estimation accuracy of the search server 10 of this embodiment, an evaluation experiment was conducted using the three models shown in FIG. 7. Model 1 is the encoder-decoder model used by the search server 10 of this embodiment. Model 2 is a conventional linear model using a linear classifier. Model 3 is a conventional log-based (track-record-based) model, which selects the category with the most clicks in the search log. In this experiment, for each query actually used for search, the category that users clicked most often on the search result pages provided in response to that query was taken as the "correct category". The queries actually used for search were input to the search server 10 performing category estimation with each model. The match rate between the category estimated by the search server 10 and the correct category was then calculated.

FIG. 8 is a diagram showing the results of this evaluation experiment. As shown in FIG. 8, in each of the first-, second-, and third-layer categories, the match rate was highest for model 3 (log-based model), second highest for model 1 (encoder-decoder model), and lowest for model 2 (linear model).

FIG. 9 compares the match rates of the evaluation experiment by number of clicks. As shown in FIG. 9, at every click count the match rate was highest with Model 3, second highest with Model 1, and lowest with Model 2. Model 3 had the highest match rate even when the number of clicks was small (that is, even for so-called tail queries).

Model 1, as used in this experiment, does not take click frequency into account. Categories with many clicks and categories with few clicks are therefore treated the same way, so Model 1 is presumed to be vulnerable to noise in the training data.
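Claim 6 of this document describes one way to reduce such noise: the learning unit excludes categories that users selected no more than a predetermined threshold number of times. The following is a minimal Python sketch of that filtering step. It assumes the click log is available as one `(query, category)` tuple per click and that counts are taken per query-category pair. Both the per-pair counting and the function name are illustrative assumptions, not the patent's stated implementation.

```python
from collections import Counter


def build_training_pairs(click_events, threshold):
    """Aggregate click events into training pairs, dropping rarely selected categories.

    click_events: iterable of (query, category) tuples, one tuple per click.
    threshold:    (query, category) pairs clicked this many times or fewer are
                  excluded, following the rule stated in claim 6.
    Returns a sorted list of (query, category, click_count) tuples.
    """
    counts = Counter(click_events)
    return sorted(
        (query, category, n)
        for (query, category), n in counts.items()
        if n > threshold
    )


# Hypothetical click events: "Games" was clicked only once for "wallet".
events = [("wallet", "Bags")] * 5 + [("wallet", "Games")]
print(build_training_pairs(events, threshold=1))  # → [('wallet', 'Bags', 5)]
```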

Some of the categories estimated by Model 1 differed from the correct category in the search log yet could reasonably be judged correct. FIG. 10 shows examples of categories estimated by Model 1. As shown in FIG. 10, for the query "newlywed gift", Model 1 estimated a kitchenware-related category, whereas the correct category was game-related. This discrepancy stems from the ambiguity of the query, so Model 1's estimate can reasonably be judged correct.

Similarly, for the query "large wallet that fits a bankbook", Model 1 estimated a category related to women's fashion, whereas the correct category was related to men's fashion. This stems from overlapping product categories, so Model 1's estimate can again be judged correct.

For the query "cute mobile phone strap", Model 1 estimated a smartphone-related category, whereas the correct category was related to women's fashion. This stems from the difficulty of choosing a single category for such a query, so Model 1's estimate can likewise be judged correct.

Cases in which Model 1's estimate was judged incorrect fell into two groups:

- Cases where the morphological analysis of the query by the query analysis unit 32 was inappropriate.
- Cases where a specific word in the query had a large influence on category estimation, so that its relationship with the other words was not properly evaluated.

Counting as correct those Model 1 estimates that differed from the "correct category" in the search log but can be judged correct, as described above, the first-level category match rate of Model 1 was 0.9238 (92.38%). On this basis, category estimation using the encoder-decoder model in the search server 10 of this embodiment can be judged to be highly accurate.

In estimation using Model 3, the coverage of the search log with respect to the queries was 0.3671 overall. Consequently, Model 3 may be unable to estimate an appropriate category when a query does not appear in the search log. In contrast, Model 1 can estimate a category appropriately even when the query does not appear in the search log.

For example, suppose the query "men's, sneakers, Company B" has not been learned. Model 1 can still estimate its category appropriately, provided two things have been learned: the query "men's, sneakers, Company A", and the fact that "Company B" and "Company A" are close words in the vector representation.
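The generalization argument above can be illustrated with a toy Python sketch. It has these assumptions:

- The word vectors are invented for illustration.
- A query is represented by the average of its word vectors. This is a deliberate simplification; the embodiment uses an encoder that processes words in sequence.

The sketch shows only that, if "CompanyA" and "CompanyB" lie close together, the unseen query lands near the learned one.

```python
import math

# Hypothetical 3-dimensional word vectors. "CompanyA" and "CompanyB" are assumed
# to have been learned as close words; "kitchen" is deliberately far from both.
EMBEDDINGS = {
    "mens": [1.0, 0.0, 0.0],
    "sneakers": [0.0, 1.0, 0.0],
    "CompanyA": [0.0, 0.1, 1.0],
    "CompanyB": [0.0, 0.2, 0.9],
    "kitchen": [-1.0, 0.0, 0.2],
}


def query_vector(words):
    """Average the word vectors (a crude stand-in for the encoder output)."""
    vecs = [EMBEDDINGS[w] for w in words]
    return [sum(component) / len(vecs) for component in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


seen = query_vector(["mens", "sneakers", "CompanyA"])    # learned query
unseen = query_vector(["mens", "sneakers", "CompanyB"])  # absent from the log
unrelated = query_vector(["kitchen", "CompanyA"])
print(round(cosine(seen, unseen), 3), round(cosine(seen, unrelated), 3))
```

A classifier that maps nearby representations to the same category would then assign the learned query's category to the unseen one.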

The search server 10 of this embodiment, described above, can appropriately estimate the category associated with a query and can thereby improve search accuracy. It can also estimate a category appropriately when the input query is unknown, that is, has never been used in a past search.

A determination unit may further be provided that detects session boundaries based on the encoding result of the encoder 42. A session here is a group of accesses to the search server 10 by the user operating the terminal device T. For example, when the position of the vector indicated by the encoding result changes significantly in the vector coordinate space, it may be determined that the user's access session has switched.
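The session-break rule above can be sketched in Python. The source says only that the vector position "changes significantly". Measuring that change as the Euclidean distance between consecutive encoder outputs, and comparing it to a fixed threshold, are assumptions made for illustration.

```python
import math


def session_breaks(encodings, threshold):
    """Return indices at which a new session is judged to start.

    encodings: encoder output vectors, one per access, in time order.
    threshold: Euclidean distance above which a jump counts as "significant".
    """
    breaks = []
    for i in range(1, len(encodings)):
        # math.dist (Python 3.8+) gives the Euclidean distance between two points.
        if math.dist(encodings[i - 1], encodings[i]) > threshold:
            breaks.append(i)
    return breaks


# Hypothetical 2-D encodings: the fourth access jumps far from the third.
enc = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
print(session_breaks(enc, threshold=1.0))  # → [3]
```

A cosine-distance criterion would be an equally plausible choice. The threshold would need tuning on real logs.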

<2. Second Embodiment>
A second embodiment of the present invention is described below. The information processing apparatus (search server) of this embodiment differs from that of the first embodiment in one respect: it performs both category estimation based on the encoder-decoder model and category estimation based on log data. For the configuration and other common points, the drawings and related descriptions of the first embodiment apply, and detailed description is omitted.

<2-1. System configuration>
FIG. 11 is a functional block diagram of the search server 11 of this embodiment. In addition to the components of the search server 10 of the first embodiment, the search server 11 includes, for example, a query determination unit 46 and a second category estimation unit 48 (second estimation unit).

The query determination unit 46 refers to the search log stored in the storage unit 40 and determines whether the query input from the terminal device T exists in the past search log. The search server 11 then proceeds as follows:

- If the query does not exist in the past search log, the search server 11 performs category estimation based on the encoder-decoder model, as in the first embodiment.
- If the query exists in the past search log, the search server 11 performs category estimation based on the log data.
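The routing performed by the query determination unit 46 can be sketched as a small Python dispatcher. Exact string matching against the log is an assumption; the source does not say whether queries are normalized before lookup. The two estimator callables are hypothetical stand-ins for the second category estimation unit 48 and the encoder-decoder path.

```python
def estimate_category(query, search_log, log_estimator, model_estimator):
    """Route a query to the log-based or the model-based estimator.

    search_log:      mapping from past queries to their click records.
    log_estimator:   callable standing in for the second category estimation unit 48.
    model_estimator: callable standing in for the encoder-decoder estimation path.
    """
    if query in search_log:           # query seen before: use the log data
        return log_estimator(query)
    return model_estimator(query)     # unseen query: use the encoder-decoder model


# Hypothetical usage with placeholder estimators.
log = {"ladies Company B": {"clicks": 256999}}
print(estimate_category("ladies Company B", log, lambda q: "log", lambda q: "model"))
```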

The second category estimation unit 48 estimates the category by referring, for example, to the search log data D3 stored in the storage unit 40. For example, it selects the category with the most clicks in D3. FIG. 12 shows an example of the search log data D3.

In FIG. 12, for the query "ladies Company B", the category with the most clicks (256,999) has these levels:

- First-level category 1: "Ladies' fashion"
- Second-level category 2: "Wallets & accessories"
- Third-level category 3: "Wallets"

Accordingly, for the query "ladies Company B", the second category estimation unit 48 estimates these three categories as the ones associated with the query.
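The log-based estimate can be sketched in Python as an argmax over click counts. The row layout mirrors the columns described for FIG. 12 (query, three category levels, clicks). Only the 256,999-click row is taken from the example above; the second row is hypothetical and exists only so there is something to compare against.

```python
# Rows shaped like the search log data D3 of FIG. 12:
# (query, category 1, category 2, category 3, clicks).
SEARCH_LOG_D3 = [
    ("ladies Company B", "Ladies' fashion", "Wallets & accessories", "Wallets", 256999),
    ("ladies Company B", "Ladies' fashion", "Bags", "Handbags", 12000),  # hypothetical row
]


def estimate_from_log(query, rows):
    """Return the (category 1, category 2, category 3) path with the most clicks."""
    candidates = [row for row in rows if row[0] == query]
    if not candidates:
        return None  # query not in the log; the encoder-decoder path would be used
    best = max(candidates, key=lambda row: row[4])
    return best[1:4]


print(estimate_from_log("ladies Company B", SEARCH_LOG_D3))
```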

<2-2. Category estimation process>
The category estimation process of the search server 11 is described below. FIG. 13 shows an example of the flow of this process.

First, the query analysis unit 32 acquires the query input from the terminal device T via the communication unit 30 (S301). Next, the query determination unit 46 refers to the search log stored in the storage unit 40 and determines whether that query exists in the past search log (S303).

If the query determination unit 46 determines that the query exists in the past search log, the second category estimation unit 48 estimates the category by referring to the search log data D3 stored in the storage unit 40 (S305). It then inputs the estimated category information to the search result generation unit 38.

If instead the query determination unit 46 determines that the query does not exist in the past search log, the query analysis unit 32 analyzes the query input from the terminal device T and divides it into words (S307). Next, the query conversion unit 34 converts each of these words into a feature vector (S309).
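Steps S307 and S309 can be sketched in Python. The following stand-ins are assumptions:

- Whitespace splitting replaces morphological analysis. A Japanese system would need a morphological analyzer, since Japanese queries are not space-delimited.
- The word-vector table is hypothetical.
- Mapping unknown words to a zero vector is an assumed policy that the source does not describe.

```python
# Hypothetical word-vector table standing in for learned embeddings.
WORD_VECTORS = {
    "mens": [0.9, 0.1],
    "sneakers": [0.2, 0.8],
}
DIM = 2


def analyze(query):
    """S307 stand-in: split the query into words on whitespace."""
    return query.split()


def to_feature_vectors(words):
    """S309 stand-in: look up each word; unknown words become zero vectors (assumed)."""
    return [WORD_VECTORS.get(w, [0.0] * DIM) for w in words]


print(to_feature_vectors(analyze("mens sneakers")))  # → [[0.9, 0.1], [0.2, 0.8]]
```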

Next, the encoder 42 encodes the feature vector of the first word among the feature vectors produced by the query conversion unit 34 (S311). The encoder 42 then determines whether the feature vectors of all words have been encoded (S313).

If encoding is not yet complete for all words, the encoder 42 encodes again (S311). This time it uses the feature vector of the second word together with the encoding result for the first word's feature vector. It then again determines whether all words have been encoded (S313). The same processing repeats until the feature vectors of all words have been encoded. Once encoding is complete, the encoder 42 inputs the encoding result obtained through these repetitions to the decoder 44.
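The S311/S313 loop can be sketched in Python as a simple recurrent encoder. The source does not specify the encoder's architecture. A plain tanh recurrent cell, h_t = tanh(W x_t + U h_{t-1}), is assumed here; an LSTM or GRU would fit the same description. The weights are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def encode(feature_vectors, W, U):
    """Fold word vectors left to right: h_t = tanh(W x_t + U h_{t-1}), with h_0 = 0.

    The first iteration encodes word 1 on its own (S311, since h_0 is zero).
    Every later iteration combines the next word's vector with the previous
    encoding result. The loop ends once every word is encoded (S313).
    """
    h = [0.0] * len(U)
    for x in feature_vectors:
        h = [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, h))]
    return h


# Hypothetical 2x2 identity weights, chosen to show that word order matters.
I2 = [[1.0, 0.0], [0.0, 1.0]]
print(encode([[1.0, 0.0], [0.0, 1.0]], I2, I2))
print(encode([[0.0, 1.0], [1.0, 0.0]], I2, I2))
```

Because each step depends on the previous state, the encoding reflects word combinations and order. A bag-of-words linear model does not capture this.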

Next, the decoder 44 decodes the encoding result input from the encoder 42 (S315). The decoder 44 then determines whether the lowest-level category has been estimated (S317). If not, the decoder 44 decodes again (S315) using the estimated category information and the decoding result, and again checks whether the lowest-level category has been estimated (S317). The same processing repeats until the lowest-level category has been estimated. Once it has, the decoder 44 inputs the estimated category information to the search result generation unit 38.
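The S315/S317 loop can be sketched in Python as greedy decoding down a category tree (claim 3 describes the categories as a tree). Several elements are assumptions made for illustration:

- The tree and the category vectors are hypothetical.
- Children are scored by a dot product with the decoder state.
- The chosen category is fed back through a tanh update.
- Candidates are restricted to the children of the previous choice.

```python
import math

# Hypothetical category tree; None is the root. Leaves map to empty lists.
TREE = {
    None: ["Ladies' fashion", "Kitchenware"],
    "Ladies' fashion": ["Wallets & accessories", "Bags"],
    "Kitchenware": ["Cookware"],
    "Wallets & accessories": ["Wallets"],
    "Bags": [], "Cookware": [], "Wallets": [],
}
# Hypothetical learned category vectors.
CATEGORY_VECTORS = {
    "Ladies' fashion": [1.0, 0.0], "Kitchenware": [-1.0, 0.0],
    "Wallets & accessories": [0.0, 1.0], "Bags": [0.0, -1.0],
    "Cookware": [0.5, 0.5], "Wallets": [0.3, 0.3],
}


def decode(encoding, tree, vectors):
    """Pick one category per level until a leaf (lowest level) is reached."""
    state, path, parent = list(encoding), [], None
    while tree.get(parent):  # S317: stop once the current category has no children
        best = max(tree[parent],
                   key=lambda c: sum(s * v for s, v in zip(state, vectors[c])))
        path.append(best)
        # S315 repeated: feed the chosen category back into the next decoding step.
        state = [math.tanh(s + v) for s, v in zip(state, vectors[best])]
        parent = best
    return path


print(decode([0.8, 0.6], TREE, CATEGORY_VECTORS))
```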

Next, the search result generation unit 38 generates a search result page from the category estimate input by the decoder 44 or the second category estimation unit 48, and transmits it to the terminal device T (S319). This completes the processing of this flowchart.

The search server 11 of this embodiment, described above, can appropriately estimate the category associated with a query and can thereby improve search accuracy. It selects the estimation method by whether the query input from the terminal device T exists in the past search log. If the query is absent, it performs category estimation based on the encoder-decoder model; if the query is present, it uses the log data. This can further improve category estimation accuracy.

<3. Third Embodiment>
A third embodiment of the present invention is described below. The information processing apparatus (search server) of this embodiment differs from that of the first embodiment in that it includes one encoder and a plurality of estimation units (decoders). For the configuration and other common points, the drawings and related descriptions of the first embodiment apply, and detailed description is omitted.

<3-1. System configuration>
FIG. 14 is a functional block diagram of the search server 12 of this embodiment. In place of the category estimation unit 36 of the search server 10 of the first embodiment, the search server 12 includes, for example, one encoder 50 and three estimation units: a category estimation unit 52, a price range estimation unit 54, and a brand estimation unit 56.

Using an encoding model trained with training data, the encoder 50 encodes at least one feature vector input from the query conversion unit 34. When two or more feature vectors are input, the encoder 50 first encodes the first feature vector. It then encodes using the encoding result of the first feature vector together with the second feature vector. The encoder 50 inputs the encoding result obtained by repeating this processing to the category estimation unit 52, the price range estimation unit 54, and the brand estimation unit 56.
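Sharing one encoder across several estimation units can be sketched in Python. Each head here is a single linear scoring step for brevity. The source calls these units decoders, and the category head could instead use the hierarchical decoding described for the first embodiment. All weights, labels, and price bands below are hypothetical.

```python
def linear_head(weights, labels):
    """Build a head that scores each label by a dot product with the shared encoding."""
    def head(encoding):
        scores = [sum(w * h for w, h in zip(row, encoding)) for row in weights]
        return labels[scores.index(max(scores))]
    return head


# Hypothetical heads standing in for units 52, 54, and 56.
HEADS = {
    "category": linear_head([[1.0, 0.0], [0.0, 1.0]], ["Ladies' fashion", "Kitchenware"]),
    "price_range": linear_head([[0.5, 0.5], [-0.5, -0.5]], ["5000+ yen", "under 5000 yen"]),
    "brand": linear_head([[0.0, 1.0], [1.0, 0.0]], ["Company A", "Company B"]),
}


def estimate_all(encoding, heads=HEADS):
    """Run every estimation head on the same output of the shared encoder."""
    return {name: head(encoding) for name, head in heads.items()}


print(estimate_all([0.9, 0.2]))
```

The design choice illustrated is that the query is encoded once and each head reads the same representation.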

Using a decoding model trained with training data, the category estimation unit 52 decodes the encoding result input from the encoder 50. It estimates the category to be associated with the words of the query and inputs it to the search result generation unit 38.

Using a decoding model trained with training data, the price range estimation unit 54 decodes the encoding result input from the encoder 50. It estimates the price range of the products or services to be associated with the words of the query and inputs it to the search result generation unit 38. As training data, the price range estimation unit 54 uses, for example, pairs of two items:

- The "word feature vectors" contained in a query used in a search.
- The "price of the product or service" that the user selected on the search result page returned for that query.
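Turning the (word feature vectors, selected price) pairs into price-range training labels can be sketched in Python. This treats price-range estimation as classification over fixed bands, which is an assumption; the band boundaries are hypothetical, since the source does not define them.

```python
import bisect

# Hypothetical price-band boundaries in yen (not specified in the source).
BOUNDARIES = [1000, 5000, 20000]
BANDS = ["<1000", "1000-4999", "5000-19999", "20000+"]


def price_band(price):
    """Map a price to its band; bisect_right puts a boundary value in the upper band."""
    return BANDS[bisect.bisect_right(BOUNDARIES, price)]


def price_training_pairs(events):
    """events: (query feature vectors, selected price) -> (feature vectors, band label)."""
    return [(vecs, price_band(price)) for vecs, price in events]


print(price_training_pairs([([[0.1, 0.2]], 3000), ([[0.3, 0.4]], 25000)]))
```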

Using a decoding model trained with training data, the brand estimation unit 56 decodes the encoding result input from the encoder 50. It estimates the brand of the products or services to be associated with the words of the query and inputs it to the search result generation unit 38. As training data, the brand estimation unit 56 uses, for example, pairs of two items:

- The "word feature vectors" contained in a query used in a search.
- The "brand of the product or service" that the user selected on the search result page returned for that query.

The search result generation unit 38 generates a search result page based on three estimates: the category from the category estimation unit 52, the price range from the price range estimation unit 54, and the brand from the brand estimation unit 56. It then transmits the page to the terminal device T.
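The source does not say how the search result generation unit 38 combines the three estimates. One simple possibility is sketched in Python below: rank candidate products by how many estimated attributes they match. The product records and the ranking rule are hypothetical.

```python
def rank_products(products, estimates):
    """Order products by the number of estimated attributes they match.

    products:  list of dicts with "name", "category", "price_range", "brand" keys.
    estimates: dict with the same attribute keys (the estimation units' outputs).
    Python's sort is stable, so ties keep their original input order.
    """
    keys = ("category", "price_range", "brand")

    def matches(product):
        return sum(product.get(k) == estimates.get(k) for k in keys)

    return sorted(products, key=matches, reverse=True)


# Hypothetical catalog and estimates.
products = [
    {"name": "p1", "category": "Kitchenware", "price_range": "<1000", "brand": "Company A"},
    {"name": "p2", "category": "Ladies' fashion", "price_range": "5000+", "brand": "Company B"},
    {"name": "p3", "category": "Ladies' fashion", "price_range": "<1000", "brand": "Company A"},
]
estimates = {"category": "Ladies' fashion", "price_range": "5000+", "brand": "Company B"}
print([p["name"] for p in rank_products(products, estimates)])  # → ['p2', 'p3', 'p1']
```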

The search server 12 of this embodiment, described above, can appropriately estimate the category associated with a query and can thereby improve search accuracy. Moreover, it generates the search result page from the category, price range, and brand estimates input by the category estimation unit 52, the price range estimation unit 54, and the brand estimation unit 56. It can therefore provide a search result page that better matches the user's intention.

An estimation unit that estimates other features may be provided in addition to, or in place of, the category estimation unit 52, the price range estimation unit 54, and the brand estimation unit 56.

According to the embodiments described above, search accuracy can be improved by providing three units:

- An analysis unit that analyzes a query used in a search and divides it into words.
- A conversion unit that converts the words divided by the analysis unit into feature vectors.
- A first estimation unit that estimates the category associated with the query used in the search. It does so by learning the relationship between the feature vectors of words contained in queries in training data and the categories associated with those queries.

Modes for carrying out the present invention have been described above using embodiments. However, the present invention is not limited to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.

10, 11, 12 … Search server (information processing apparatus)
30 … Communication unit
32 … Query analysis unit (analysis unit)
34 … Query conversion unit (conversion unit)
36 … Category estimation unit (first estimation unit, learning unit)
38 … Search result generation unit (generation unit)
40 … Storage unit
42 … Encoder
44 … Decoder
46 … Query determination unit (determination unit)
48 … Second category estimation unit (second estimation unit)
50 … Encoder
52 … Category estimation unit
54 … Price range estimation unit
56 … Brand estimation unit

Claims

An information processing apparatus comprising:
an analysis unit that analyzes a query used in a search and divides the query into words;
a conversion unit that converts the words divided by the analysis unit into feature vectors; and
a first estimation unit that estimates a category associated with the query used in the search by learning a relationship between feature vectors of words included in queries in training data and categories associated with those queries in the training data,
wherein the first estimation unit comprises:
an encoder that performs encoding processing on the feature vectors of the words; and
a decoder that performs decoding processing on a processing result of the encoder and outputs a category associated with the query used in the search, and
wherein the decoder performs a first decoding process on the processing result of the encoder to output a first category, and performs a second decoding process on a result of the first decoding process to output a second category.

The information processing apparatus according to claim 1, wherein the encoder performs a first encoding process on a first feature vector among the feature vectors of a plurality of words, and then performs encoding processing using a result of the first encoding process and a second feature vector.

The information processing apparatus according to claim 1 or 2, wherein
the categories have a tree structure,
the first category is a category defined at the highest level of the tree structure, and
the second category is a category defined below the first category.

The information processing apparatus according to any one of claims 1 to 3, further comprising a generation unit that generates a search result page for the query used in the search based on the category estimated by the first estimation unit.

The information processing apparatus according to any one of claims 1 to 4, wherein the analysis unit analyzes the query used in the search by morphological analysis and divides the query into words.

The information processing apparatus according to any one of claims 1 to 5, further comprising a learning unit that learns a relationship between feature vectors of words of queries used in searches and categories selected by users on search result pages for those queries,
wherein the learning unit excludes, from the training data, categories selected by users a number of times equal to or less than a predetermined threshold.

The information processing apparatus according to any one of claims 1 to 6, further comprising a second estimation unit that estimates a category associated with the query used in the search based on a log of categories selected by users on the search result page for the query used in the search.

The information processing apparatus according to claim 7, further comprising a determination unit that determines whether the query used in the search exists in the log,
wherein, when the determination unit determines that the query used in the search exists in the log, the second estimation unit estimates a category associated with the query used in the search, and
when the determination unit determines that the query used in the search does not exist in the log, the first estimation unit estimates a category associated with the query used in the search.

The information processing apparatus according to claim 1, wherein the first estimation unit comprises:
an encoder that performs encoding processing on the feature vectors of the words; and
a plurality of decoders that each perform decoding processing on a processing result of the encoder.

The information processing apparatus according to claim 1, further comprising:
an encoder that performs encoding processing on the feature vectors of the words; and
a determination unit that determines a break between sessions, each session being a set of accesses from a user to the information processing apparatus, by determining that the user's access session has switched when the position, in a vector coordinate space, of the vector indicated by a processing result of the encoder changes significantly.

An information processing method in which a computer:
analyzes a query used in a search and divides the query into words;
converts the divided words into feature vectors; and
estimates a category associated with the query used in the search by learning a relationship between feature vectors of words included in queries in training data and categories associated with those queries in the training data,
wherein, in estimating the category,
encoding processing is performed on the feature vectors of the words, and
decoding processing is performed on a result of the encoding processing to output a category associated with the query used in the search, and
wherein, in the decoding processing, a first decoding process is performed on the result of the encoding processing to output a first category, and a second decoding process is performed on a result of the first decoding process to output a second category.

A program that causes a computer to:
analyze a query used in a search and divide the query into words;
convert the divided words into feature vectors; and
estimate a category associated with the query used in the search by learning a relationship between feature vectors of words included in queries in training data and categories associated with those queries in the training data,
wherein, in estimating the category, the program causes the computer to
execute encoding processing on the feature vectors of the words, and
execute decoding processing on a result of the encoding processing to output a category associated with the query used in the search, and
wherein, in the decoding processing, a first decoding process is executed on the result of the encoding processing to output a first category, and a second decoding process is executed on a result of the first decoding process to output a second category.