JP7224392B2

JP7224392B2 - Information processing device, information processing method and program

Info

Publication number: JP7224392B2
Application number: JP2021089033A
Authority: JP
Inventors: ウイルソンジュリア
Original assignee: Rakuten Group Inc
Current assignee: Rakuten Group Inc
Priority date: 2021-04-09
Filing date: 2021-05-27
Publication date: 2023-02-17
Anticipated expiration: 2041-05-27
Also published as: JP2022161774A; JP2023055916A; JP7461524B2

Description

The present invention relates to an information processing device, an information processing method, and a program, and more particularly to a technique for providing search results in response to an input query.

A query is a search request that, when submitted to a search engine, causes the search engine to output desired search results.
For example, an EC (Electronic Commerce) site built on the web implements a search engine, accepts input of a query for searching for an item that a user wishes to purchase, causes the search engine to search for items tagged with tags that match the keywords in the query that serve as search keys, and presents the user with a list of the items obtained as search results.

Many search engines allow multi-word queries that include multiple keywords in a single query. A multi-word query makes it easier for the user to reflect his or her intention in the query from multiple viewpoints. On the other hand, if only search results that exactly match all of the search keywords are output, search results that should have been output are omitted, and the desired search results are not obtained.

For this reason, search engines that allow multi-word queries usually adopt partial match search, which outputs search results that match any of the key phrases or keywords that make up part of a query, thereby expanding the search range and outputting a larger number of search results.
However, the multiple key phrases and keywords included in a single query do not necessarily represent equally the user's intention as to what he or she actually wants to purchase. Partial match search may therefore output search results that deviate greatly from the user's intention, which may reduce search accuracy.

Patent Literature 1 discloses an information processing system that classifies a query into phrases or keywords and selects, from the classified phrases or keywords, a phrase or keyword determined to express the user's request.
Specifically, in the information processing system of Patent Literature 1, a query type classification unit classifies a query into phrases, keywords, or any combination data of these, and a query type determination unit searches a request dictionary database to obtain the query type and query type score corresponding to each classified phrase, keyword, or combination data, and selects the phrase, keyword, or combination data with a high obtained query type score.

A query type is a classification pattern, such as "guidance type" or "request type", that reflects the nature of a phrase or keyword and the request it expresses. A query type score is a score, predefined according to the query type, that quantifies the certainty of the estimated user request. The request dictionary database stores the corresponding query type and query type score for each phrase, keyword, or combination data.

Japanese Unexamined Patent Application Publication No. 2017-41030

According to the technique of Patent Literature 1, it is possible to select, from among the key phrases and keywords that make up a query, a key phrase or keyword with a high estimated certainty of expressing the user's request.
However, with the technique of Patent Literature 1, a query type and a query type score must be defined in a dictionary in advance for each candidate key phrase or keyword that may be included in a query as a constituent element.
As a result, the load of registering and maintaining the predefined entries increases, and a query type score cannot be obtained for query constituent elements that have not been predefined. The technique therefore lacks flexibility, and search accuracy may be reduced.

The present invention has been made to solve the above problems, and an object thereof is to provide an information processing device, an information processing method, and a program capable of identifying, with high accuracy and without predefining each of the constituent elements included in a query, the constituent element of the query that better matches the user's intention.

In order to solve the above problems, one aspect of an information processing device according to the present invention includes: a first query acquisition unit that acquires a first query; a query subset generation unit that divides the first query acquired by the first query acquisition unit into a plurality of tokens and generates a plurality of query subsets each consisting of one or more of the divided tokens; a second query acquisition unit that acquires a second query including a query subset generated by the query subset generation unit; a similarity calculation unit that calculates, based on a user's action history, a similarity of the second query to the first query; a score calculation unit that calculates a score of the query subset based on the similarity of the second query calculated by the similarity calculation unit; and a query subset identification unit that identifies, among the plurality of query subsets, a query subset for which the score calculated by the score calculation unit is higher.

The similarity calculation unit may calculate the similarity as the degree to which the search result set of the second query resembles the search result set of the first query.
The information processing device may further include a query model generation unit that generates a query model by connecting, based on the user's action history, the first query or the second query to attributes of that query with links and assigning weights to the links, and that stores the generated query model in a storage device, and the similarity calculation unit may calculate the similarity of the second query by referring to the query model stored in the storage device.

The similarity calculation unit may calculate the similarity of the second query by performing an operation on the weights assigned to the links connected to the attributes shared between the query model of the first query and the query model of the second query.

The similarity calculation unit may calculate the similarity of the second query by comparing, for each pair of links connected to an attribute shared between the query model of the first query and the query model of the second query, the weights assigned to the two links, and summing the smaller weight of each pair over the plurality of shared attributes.
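As an illustration of the compare-and-sum calculation described above, the following is a minimal Python sketch. It assumes that each query model is held as a dictionary mapping attribute names to link weights; the function name `similarity`, the attribute labels, and the weight values are hypothetical and do not come from the specification.

```python
def similarity(first_model: dict, second_model: dict) -> float:
    """Sum, over attributes shared by both query models, the smaller link weight.

    Each model is assumed (for illustration only) to map attribute -> weight.
    """
    shared_attributes = first_model.keys() & second_model.keys()
    return sum(min(first_model[a], second_model[a]) for a in shared_attributes)


# Hypothetical link weights, e.g. derived from operation counts.
first_query_model = {"genre:tumbler": 6, "brand:acme": 3, "genre:luggage": 1}
second_query_model = {"genre:tumbler": 5, "brand:acme": 4, "genre:kitchen": 1}

# Shared attributes are genre:tumbler (min 5) and brand:acme (min 3).
result = similarity(first_query_model, second_query_model)
```

Attributes linked to only one of the two queries contribute nothing, so a second query that shares no attributes with the first query receives a similarity of zero under this sketch.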

The similarity calculation unit may calculate the similarity of the second query by comparing, between the first query and the second query, information associated with items on which the user performed operations in the past.

The query model generation unit may generate the query model by setting the information associated with the items as the attributes and assigning the weights based on the frequency of the operations.

The score calculation unit may calculate the score of the query subset by calculating the geometric mean of the similarities of all the second queries.
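The geometric-mean scoring described above can be sketched as follows. This is an illustrative Python example only; the function name `subset_score` and the choice to return 0.0 when no second query exists are assumptions not stated in the specification.

```python
import math


def subset_score(similarities: list) -> float:
    """Geometric mean of the similarities of all second queries for one query subset."""
    if not similarities:
        # Assumption: a subset with no second queries is scored as 0.
        return 0.0
    return math.prod(similarities) ** (1.0 / len(similarities))


# Hypothetical similarities of two second queries that contain the same subset.
score = subset_score([4.0, 9.0])
```

One property of the geometric mean worth noting is that a single similarity of zero drives the whole score to zero, so a subset is rewarded only when every second query containing it resembles the first query to some degree.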

The second query acquisition unit may acquire the second query for each of the plurality of query subsets generated by the query subset generation unit, and the score calculation unit may calculate the score for each of the plurality of query subsets.

The query subset generation unit may generate the query subset from a plurality of tokens that are not consecutive in the character string of the first query.

The information processing device may further include an output control unit that controls output such that search results corresponding to the query subset identified by the query subset identification unit are presented preferentially.

One aspect of an information processing method according to the present invention is an information processing method executed by an information processing device, the method including the steps of: acquiring a first query; dividing the acquired first query into a plurality of tokens and generating a plurality of query subsets each consisting of one or more of the divided tokens; acquiring a second query including a generated query subset; calculating, based on a user's action history, a similarity of the second query to the first query; calculating a score of the query subset based on the similarity of the second query; and identifying, among the plurality of query subsets, a query subset for which the calculated score is higher.

One aspect of an information processing program according to the present invention is an information processing program for causing a computer to execute information processing, the program causing the computer to execute processing including: a first query acquisition process of acquiring a first query; a query subset generation process of dividing the first query acquired by the first query acquisition process into a plurality of tokens and generating a plurality of query subsets each consisting of one or more of the divided tokens; a second query acquisition process of acquiring a second query including a query subset generated by the query subset generation process; a similarity calculation process of calculating, based on a user's action history, a similarity of the second query to the first query; a score calculation process of calculating a score of the query subset based on the similarity of the second query calculated by the similarity calculation process; and a query subset identification process of identifying, among the plurality of query subsets, a query subset for which the score calculated by the score calculation process is higher.

According to the present invention, it is possible to identify, with high accuracy and without predefining each of the constituent elements included in a query, the constituent element of the query that better matches the user's intention.
The objects, aspects, and effects of the present invention described above, as well as those not described above, will be understood by those skilled in the art from the following embodiments for carrying out the invention by referring to the accompanying drawings and the claims.

FIG. 1 is a block diagram showing an example of the functional configuration of a search system including a keyword identification device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an example of a schematic processing procedure of keyword identification processing executed by the keyword identification device according to the present embodiment.
FIG. 3 is a conceptual diagram illustrating, in simplified form, an example of a query model that defines the relationships between queries and attributes and their weights, which is generated by the query model generation unit and stored in the query model storage unit of the keyword identification device according to the present embodiment.
FIG. 4 is a conceptual diagram illustrating an example of calculating the similarity for one of the queries in FIG. 3.
FIG. 5 is a conceptual diagram illustrating an example of calculating the similarity for another of the queries in FIG. 3.
FIG. 6 is a flowchart showing an example of a detailed processing procedure of shingle evaluation processing executed by the shingle evaluation unit of the keyword identification device according to the present embodiment.
FIG. 7 is a conceptual diagram illustrating an example of calculating, with respect to the original query, the similarities of other queries that share a certain shingle, and calculating the score of that shingle based on the calculated similarities.
FIG. 8 is a block diagram showing an example of the hardware configuration of the keyword identification device according to the present embodiment.

Embodiments for carrying out the present invention will be described in detail below with reference to the accompanying drawings. Among the constituent elements disclosed below, those having the same function are denoted by the same reference numerals, and their description is omitted. The embodiments disclosed below are examples of means for realizing the present invention and should be modified or changed as appropriate according to the configuration of the device to which the present invention is applied and various conditions; the present invention is not limited to the following embodiments. In addition, not all combinations of the features described in the present embodiment are necessarily essential to the solution of the present invention.

The keyword identification device according to the present embodiment divides a query to be submitted to a search engine into the individual tokens that make up the query, generates shingles each consisting of one or more tokens, and, for each generated shingle, generates queries that include that shingle.
The keyword identification device according to the present embodiment further refers to a query model generated based on the user's action history, calculates the similarity of each generated query to the original query, and calculates the score of the shingle from the calculated similarities, thereby identifying a shingle for which a higher score is calculated as a keyword that better matches the user's intention.

In the following, an example is described in which the present embodiment is applied to a search engine implemented on, for example, an EC (Electronic Commerce) site and is used, in a use case where a user enters a multi-word query in order to purchase an item, to select the keyword in the query that better matches the user's intention based on the user's action history. However, the present embodiment is not limited to this and is applicable to searches for any purpose.

<Functional Configuration of Keyword Identification Device>
FIG. 1 is a block diagram showing an example of the functional configuration of a keyword identification device 1 according to the present embodiment.
The keyword identification device 1 shown in FIG. 1 includes a query input unit 11, a query division unit 12, a query model generation unit 13, a shingle evaluation unit 14, and an output unit 15.
The keyword identification device 1 may be communicably connected via a network to a client device (not shown) such as a PC (Personal Computer). In this case, the keyword identification device 1 is implemented on a server, and the client device may provide a user interface through which the keyword identification device 1 inputs and outputs information to and from the outside, and may also include some or all of the components 11 to 15 of the keyword identification device 1.

Referring to FIG. 1, a search system is constituted by the keyword identification device 1, a user action history DB (database) 2, a search engine 3, and a query model storage unit 4. FIG. 1 shows an example in which the keyword identification device 1 accesses the user action history DB 2, generates a query model and stores it in the query model storage unit 4, and supplies the result of the keyword identification processing to the search engine 3, but the present embodiment is not limited to this. For example, the keyword identification device 1 may be implemented inside the search engine 3, and may acquire the user's action history directly from the search engine 3.

The query input unit 11 receives a query, which is a search request for causing the search engine 3 to execute a search, and supplies the query to the query division unit 12. The query input unit 11 may receive the query in real time from a client device connected via a network, or may acquire a query stored in advance in a storage device.
The query input unit 11 also accepts input of the various parameters necessary for the keyword identification device 1 to execute the keyword identification processing. The query input unit 11 may accept input of the various parameters via the user interface of a client device communicably connected to the keyword identification device 1.

The query division unit 12 divides the query supplied from the query input unit 11 into a plurality of tokens. To divide a multi-word query containing multiple keywords into a plurality of tokens, the query division unit 12 may detect the delimiters (spaces, commas, semicolons, and the like) between the keywords written as a character string in the query, or may perform morphological analysis on the query to decompose it into parts of speech.
In the present embodiment, the query division unit 12 further generates, from the divided tokens, a plurality of shingles to be evaluated (scored) by the shingle evaluation unit 14 described later, and supplies the generated shingles to the shingle evaluation unit 14.
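The delimiter-based division described above can be sketched as follows. This is a minimal Python illustration that covers only the delimiter approach (spaces, commas, semicolons) and not morphological analysis; the function name `tokenize` and the regular expression are assumptions made for the example.

```python
import re

# Delimiters named in the text: whitespace, commas, and semicolons.
_DELIMITERS = re.compile(r"[\s,;]+")


def tokenize(query: str) -> list:
    """Split a multi-word query into tokens at runs of delimiter characters."""
    return [token for token in _DELIMITERS.split(query.strip()) if token]


tokens = tokenize("travel coffee mug")
```

Languages that do not separate words with delimiters, such as Japanese, would instead need the morphological analysis route mentioned in the text, which this sketch does not attempt.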

A token is the smallest semantic unit constituting a key phrase or keyword included in a query, and one word corresponds to one token.
A shingle is a key phrase or keyword used for partial match search of a query, and consists of one or more tokens (words).

As a non-limiting example, assume that the query serving as the search request is "travel coffee mug". In this case, the query "travel coffee mug" is divided into three tokens, "travel", "coffee", and "mug". Each token consists of one word, which is the smallest unit.
From these three tokens "travel", "coffee", and "mug", the shingles "travel coffee", "travel mug", and "coffee mug" are generated as shingles consisting of two tokens (clusters of tokens), and the shingles "travel", "coffee", and "mug" are generated as shingles consisting of one token.

In the present embodiment, as can be seen from the shingle "travel mug", shingles are generated not only by combining words that are consecutive in the query, but also by combining words that are not consecutive (adjacent) in the character string of the query.
In this way, a shingle is a keyword or key phrase consisting of one token or any combination of multiple tokens. That is, a shingle is an arbitrary subset of the set of tokens, and thus an arbitrary subset of the query (a query subset). A plurality of shingles, that is, a plurality of query subsets, are generated from a multi-word query.
Hereinafter, keywords and key phrases are collectively referred to as "keywords". That is, a keyword consists of one or more words that serve as a search key. Generating shingles expands the search range so that more search results can be output.
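The shingle generation in the "travel coffee mug" example can be sketched in Python as follows. This is an illustrative sketch only: the function name `generate_shingles` and the `max_size` parameter are hypothetical, and the default of stopping one token short of the full query is an assumption chosen to reproduce the six shingles listed in the example.

```python
from itertools import combinations
from typing import Optional


def generate_shingles(tokens: list, max_size: Optional[int] = None) -> list:
    """Build shingles from every combination of tokens, keeping query order.

    combinations() also pairs tokens that are not adjacent in the query,
    which is how "travel mug" arises from "travel coffee mug".
    """
    if max_size is None:
        # Assumption: exclude the full query itself, as in the worked example.
        max_size = max(1, len(tokens) - 1)
    shingles = []
    for size in range(max_size, 0, -1):
        for combo in combinations(tokens, size):
            shingles.append(" ".join(combo))
    return shingles


shingles = generate_shingles(["travel", "coffee", "mug"])
```

Because the number of combinations grows quickly with the number of tokens, a practical implementation would likely cap `max_size` or the query length; the text does not specify such a limit.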

The query model generation unit 13 generates a query model based on the user's action history and outputs it to the query model storage unit 4. The query model storage unit 4 is a non-volatile storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and stores the query model generated by the query model generation unit 13.
A query model is a graph that models the semantic relationship between a query and attributes, and consists of a query, one or more attributes associated with the query, and links connecting the query to each attribute. Each link connecting the query to an attribute is assigned a weight indicating the strength of the relevance between the query and the attribute at the two ends of the link. A query model may be a graph in which multiple queries are connected via one or more attributes associated with those queries in common.

In the present embodiment, the attributes associated with a query may include, for example, information characterizing an item that has already been purchased, such as the item's genre (category), vendor (brand), item name or item ID, price range, condition (new, unused secondhand, used, and the like), and tags (sale item, non-sale item), but are not limited to these and may be any information that can be acquired in association with the query.

In the present embodiment, the query model generation unit 13 generates the query model based on the user's action history (behavior history) stored in the user action history DB 2, for example, the operations that the user performed on items in response to a submitted query and the frequency of those operations.
The operations that the user performs on an item include, for example, clicking an image of or a link to the item, tapping an image of the item to view it, marking the item as a "favorite", and purchasing the item, but are not limited to these and include any operation that can occur in the process of purchasing an item. Hereinafter, these operations are simply referred to as "operations".

Specifically, the query model generation unit 13 connects each submitted query with links to the items purchased in response to that query, generates a query model graph in which each link is weighted according to the number of purchases or the like, and stores the generated graph in the query model storage unit 4. By tracking user actions such as operations, the attributes of a purchased item can be linked to the query that was originally executed for the purchase, and the weight to be assigned to each link can be determined. The query model generated in this way models the user's intention associated with the purchase of an item.
The query model generation unit 13 may refer to the user action history DB 2 and generate the query model in advance from the queries submitted to the search engine 3 in the past and the operations that users performed in response to those queries. A query model may be generated individually for each user, or may be generated for a group of users grouped by age group, gender, occupation, past purchase amount, or the like. The details of this query model will be described later with reference to FIGS. 4 to 6.
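The construction of a query model from purchase history can be sketched as follows. This is a simplified Python illustration that assumes the action history is available as (query, attributes of the purchased item) pairs and that each link weight is the raw purchase count; the function name `build_query_model`, the log format, and the attribute labels are all hypothetical.

```python
from collections import Counter, defaultdict


def build_query_model(purchase_log) -> dict:
    """Link each query to the attributes of items purchased after it.

    The weight of a query-attribute link is the number of purchases in which
    that attribute appeared (an assumed weighting for this sketch).
    """
    model = defaultdict(Counter)
    for query, item_attributes in purchase_log:
        model[query].update(item_attributes)
    return {query: dict(weights) for query, weights in model.items()}


# Hypothetical purchase history: (submitted query, attributes of purchased item).
log = [
    ("travel coffee mug", ["genre:tumbler", "brand:acme"]),
    ("travel coffee mug", ["genre:tumbler"]),
    ("coffee mug", ["genre:tumbler", "genre:ceramic mug"]),
]
query_model = build_query_model(log)
```

In this representation two queries are connected in the graph whenever they share an attribute key, such as `genre:tumbler` above. Weights based on other operations (clicks, favorites) or normalized weights would fit the same structure.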

The shingle evaluation unit 14 refers to the query model stored in the query model storage unit 4 and evaluates each of the shingles supplied from the query division unit 12.
Specifically, based on the query input to the query input unit 11 that is the target of the keyword identification processing (hereinafter also referred to as the "original query") and the plurality of shingles generated from the original query, the shingle evaluation unit 14 evaluates the shingles by calculating, for each shingle, a score with respect to the original query.
The score calculated by the shingle evaluation unit 14 is an index indicating the degree to which each shingle matches the user's intention in the original query. The details of this shingle evaluation processing will be described later with reference to FIG. 3.
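To make the evaluation flow concrete, the following Python sketch scores each shingle against the original query using a stored query model. It is a hedged illustration, not the specification's implementation: it assumes the query model is a dictionary of query -> {attribute: weight}, treats a query as "containing" a shingle when it contains all of the shingle's tokens, excludes the original query itself, and combines the summed minimum shared weights with a geometric mean. The names `evaluate_shingles` and `_similarity` and all data values are hypothetical.

```python
import math


def _similarity(a: dict, b: dict) -> float:
    # Sum of the smaller link weight over attributes shared by both queries.
    return sum(min(a[k], b[k]) for k in a.keys() & b.keys())


def evaluate_shingles(original_query: str, shingles: list, query_model: dict) -> dict:
    """Score each shingle by the geometric mean of the similarities, to the
    original query, of the other known queries that contain the shingle."""
    original_attrs = query_model.get(original_query, {})
    scores = {}
    for shingle in shingles:
        shingle_tokens = set(shingle.split())
        sims = [
            _similarity(original_attrs, attrs)
            for query, attrs in query_model.items()
            if query != original_query and shingle_tokens <= set(query.split())
        ]
        scores[shingle] = math.prod(sims) ** (1.0 / len(sims)) if sims else 0.0
    return scores


# Hypothetical query model (query -> attribute -> link weight).
query_model = {
    "travel coffee mug": {"genre:tumbler": 8, "brand:acme": 2},
    "travel mug": {"genre:tumbler": 6, "brand:acme": 1},
    "steel travel mug": {"genre:tumbler": 4},
    "coffee": {"genre:coffee beans": 9, "brand:acme": 1},
    "coffee mug": {"genre:ceramic mug": 5, "genre:tumbler": 2},
}
scores = evaluate_shingles("travel coffee mug", ["travel mug", "coffee"], query_model)
best = max(scores, key=scores.get)
```

With this made-up data, queries containing "travel mug" lead to purchases that overlap strongly with those of the original query, so "travel mug" scores higher than "coffee".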

The output unit 15 outputs the one or more shingles for which the shingle evaluation unit 14 calculated a high score to an output device such as a display device, and also supplies them to the search engine 3. The output unit 15 may also output, as appropriate via an output device such as a display device, the various input data and processing results of the keyword identification processing executed by the keyword identification device 1, the query model stored in the query model storage unit 4, and the like.

<Processing Procedure of Keyword Identification Processing>
FIG. 2 is a flowchart showing an example of the processing procedure of the keyword identification processing executed by the keyword identification device 1 according to the present embodiment.
Each step in FIG. 2 is realized by the CPU reading out and executing a program stored in a storage device such as the HDD of the keyword identification device 1. At least part of the flowchart shown in FIG. 2 may also be realized by hardware. When realized by hardware, for example, a dedicated circuit may be automatically generated on an FPGA (Field Programmable Gate Array) from the program for realizing each step by using a predetermined compiler. A Gate Array circuit may also be formed in the same manner as the FPGA and realized as hardware, or the steps may be realized by an ASIC (Application Specific Integrated Circuit). The same applies to each step in FIG. 3, described later.

In S1, the query input unit 11 of the keyword identification device 1 receives input of a query, which is a search request for causing the search engine 3 to execute a search. The input query may be a multi-word query containing a plurality of keywords, or a single-word query containing a single keyword. The query input unit 11 may receive the query from a client device connected via a network, or may acquire a query stored in advance in a storage device.
In S2, the query division unit 12 of the keyword identification device 1 divides the query supplied from the query input unit 11 into a plurality of tokens. Each of the tokens is a single keyword.

In S3, the query division unit 12 of the keyword identification device 1 generates a plurality of singles from the tokens obtained by the division in S2. Each single generated in S3 from the query input in S1 is a subset of the query consisting of one or more tokens. Each single is evaluated by the single evaluation unit 14 and given a score.
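The token division of S2 and the single generation of S3 can be sketched as follows. This is a minimal Python sketch, not the implementation of the embodiment: it assumes whitespace tokenization, and it assumes that the singles are all non-empty proper subsets of the token set, which matches the six singles listed for a three-token query in the FIG. 7 example. The function names `tokenize` and `generate_singles` are hypothetical.

```python
from itertools import combinations


def tokenize(query: str) -> list[str]:
    """Split a query into tokens (assumption: whitespace-delimited keywords)."""
    return query.lower().split()


def generate_singles(tokens: list[str]) -> list[frozenset[str]]:
    """Return every non-empty proper subset of the tokens as a single.

    Assumption: the full token set itself is not emitted as a single,
    mirroring the six singles listed for {"a", "b", "c"} in FIG. 7.
    """
    unique = sorted(set(tokens))
    singles = []
    # Larger subsets first, then smaller ones, as in the FIG. 7 listing.
    for size in range(len(unique) - 1, 0, -1):
        for combo in combinations(unique, size):
            singles.append(frozenset(combo))
    return singles


singles = generate_singles(tokenize("a b c"))
print(len(singles))  # 6
```

Under these assumptions a single-word query yields no singles; how the device handles that case is not specified in this sketch.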

In S4, the single evaluation unit 14 of the keyword identification device 1 evaluates each of the singles generated in S3 by calculating a score for it.
Specifically, the single evaluation unit 14 refers to the query models stored in the query model storage unit 4 and calculates, for each single, a score with respect to the query input in S1, based on the action history of the user who input that query. The score calculated here for a single is an index of how well the single, within the query input in S1, matches the user's intention in purchasing an item, that is, how significant the single is in expressing the user's intention. The details of the single evaluation process executed by the single evaluation unit 14 will be described later with reference to FIG. 6.

In S5, the output unit 15 of the keyword identification device 1 can determine that, among the singles whose scores were calculated in S4, the one or more singles with high scores are the keywords that most prominently express the user's intention. In S5, the output unit 15 may supply the one or more singles to be output to the search engine 3.
The search engine 3 may, for example, sort the search results for the query so that search results corresponding to the singles supplied from the output unit 15 are presented to the user with higher priority. Alternatively, when the query input in S1 yields only a small number of search results or the search range is excessively narrow, the search engine 3 may execute the search again using, as the search key, another query that includes a single supplied from the output unit 15. This improves search accuracy and makes it possible to present search results that better match the user's intention.
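As an illustration of S5 and the re-sorting option, the sketch below picks the highest-scoring singles and moves matching search results to the front. Everything here is assumed for illustration: the function names `top_singles` and `prioritize`, the representation of a search result as a dictionary with a `tags` set, the rule that a result "corresponds" to a single when it carries all of the single's tokens as tags, and the score values themselves.

```python
def top_singles(scores, k=1):
    """Return the k singles with the highest scores (ties keep input order)."""
    ranked = sorted(scores, key=lambda s: scores[s], reverse=True)
    return ranked[:k]


def prioritize(results, keywords):
    """Stable-sort results so items tagged with every token of any keyword come first."""
    def matches(item):
        return any(kw <= item["tags"] for kw in keywords)
    # False sorts before True, so matching items are placed first.
    return sorted(results, key=lambda item: not matches(item))


# Hypothetical scores and search results.
scores = {
    frozenset({"a", "b"}): 0.67,
    frozenset({"c"}): 0.40,
}
results = [
    {"id": 1, "tags": {"c", "x"}},
    {"id": 2, "tags": {"a", "b", "y"}},
]
keywords = top_singles(scores, k=1)
print([item["id"] for item in prioritize(results, keywords)])  # [2, 1]
```

Because the sort is stable, results that do not match any keyword keep their original relative order.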

<Generation of query model and calculation of similarity between queries>
The generation of a query model and the calculation of similarity between queries will be described with reference to FIGS. 3 to 5. To simplify the explanation, FIGS. 3 to 5 illustrate an example in which each query is a single-word query consisting of a single keyword, but multi-word queries containing a plurality of keywords can be processed in the same way.

FIG. 3 is a simplified conceptual diagram illustrating an example of a query model that defines the relationships between queries and attributes and the weights of those relationships. Referring to FIG. 3, the query "pants" is connected by links to the attributes "kids", "womens", and "mens". The link between the query "pants" and the attribute "kids" is given a weight of 0.2, the link between the query "pants" and the attribute "womens" is given a weight of 0.3, and the link between the query "pants" and the attribute "mens" is given a weight of 0.5.
The query model of the query "pants" shown in FIG. 3 indicates that a user who submitted the query "pants" to the search engine has an action history, such as operations, on items in the category of the attribute "kids", items in the category of the attribute "womens", and items in the category of the attribute "mens". That is, each attribute is obtained by categorizing the search results for the submitted query, and the similarity between queries indicates the degree to which the search result set of the original query and the search result set of another query are similar.
The weight given to each link correlates with the frequency of operations. The weights may also be set according to the type of operation so that, for example, a purchase operation is given a higher weight than a click operation.
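One way to picture such a query model is as a mapping from each query to its attribute weights, built from logged operations. The sketch below is an illustration under stated assumptions only: the event tuple layout, the operation weights in `OPERATION_WEIGHT` (a purchase counted more heavily than a click), the normalization of each query's weights so that they sum to 1, and the sample log are all hypothetical choices and are not taken from the description.

```python
from collections import defaultdict

# Hypothetical per-operation weights: a purchase counts more than a click.
OPERATION_WEIGHT = {"click": 1.0, "purchase": 3.0}


def build_query_models(events):
    """Build {query: {attribute: weight}} from (query, attribute, operation) events.

    Assumption: link weights are operation-weighted frequencies,
    normalized per query so that they sum to 1.
    """
    raw = defaultdict(lambda: defaultdict(float))
    for query, attribute, operation in events:
        raw[query][attribute] += OPERATION_WEIGHT[operation]

    models = {}
    for query, links in raw.items():
        total = sum(links.values())
        models[query] = {attr: w / total for attr, w in links.items()}
    return models


# Made-up log: 2 clicks on kids, 3 clicks on womens, 2 clicks and 1 purchase on mens.
events = (
    [("pants", "kids", "click")] * 2
    + [("pants", "womens", "click")] * 3
    + [("pants", "mens", "click")] * 2
    + [("pants", "mens", "purchase")]
)
models = build_query_models(events)
print({attr: round(w, 2) for attr, w in models["pants"].items()})
# {'kids': 0.2, 'womens': 0.3, 'mens': 0.5}
```

The sample log was chosen so that the resulting weights coincide with those shown for "pants" in FIG. 3; the actual weighting scheme of the embodiment may differ.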

Referring to FIG. 3, the queries "dress", "jeans", and "shirts" are each connected by links to the attributes "kids", "womens", and "mens". These queries are other queries that share the attributes "kids", "womens", and "mens" with the query "pants". In FIG. 3, the query "pants" is assumed to be the original query input in S1 of FIG. 2, and the queries "dress", "jeans", and "shirts" are assumed to be other queries read from the query model storage unit 4 in order to calculate the score of a single.

The link between the query "dress" and the attribute "kids" is given a weight of 0.4, and the link between the query "dress" and the attribute "womens" is given a weight of 0.6. The query "dress" is not associated with the attribute "mens" and therefore has no link to it.
The link between the query "jeans" and the attribute "kids" is given a weight of 0.3, the link between the query "jeans" and the attribute "womens" is given a weight of 0.2, and the link between the query "jeans" and the attribute "mens" is given a weight of 0.5.
The link between the query "shirts" and the attribute "kids" is given a weight of 0.3, the link between the query "shirts" and the attribute "womens" is given a weight of 0.4, and the link between the query "shirts" and the attribute "mens" is given a weight of 0.3.

In FIG. 3, the query models of the other queries "dress", "jeans", and "shirts" each share attributes with the query model of the original query "pants", and are therefore each similar to the query model of the original query "pants".
FIG. 4 is a conceptual diagram in which the query model of the original query "pants" and the query model of the other query "shirts" are extracted from FIG. 3.

In this embodiment, the single evaluation unit 14 calculates the similarity by performing an operation on the weights given to the links from each query to the attributes shared between the original query and another query. For example, the single evaluation unit 14 may calculate the similarity by adding up, over the shared attributes, the minimum of the weights of each pair of links (the pairwise minimum).
Referring to FIG. 4, the minimum weight of the pair of links to the attribute "womens" is 0.3, the minimum weight of the pair of links to the attribute "kids" is 0.2, and the minimum weight of the pair of links to the attribute "mens" is 0.3. Therefore, the sum of these three minimum values, 0.8, is the similarity of the query "shirts" to the original query "pants".

FIG. 5, on the other hand, is a conceptual diagram in which the query model of the original query "pants" and the query model of the other query "jeans" are extracted from FIG. 3.
Referring to FIG. 5, the minimum weight of the pair of links to the attribute "womens" is 0.2, the minimum weight of the pair of links to the attribute "kids" is 0.2, and the minimum weight of the pair of links to the attribute "mens" is 0.5. Therefore, the sum of these three minimum values, 0.9, is the similarity of the query "jeans" to the original query "pants".
In the examples shown in FIGS. 3 to 5, the query "jeans" can thus be evaluated as having a higher similarity to the original query "pants" than the query "shirts" does.
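The pairwise-minimum calculation of FIGS. 4 and 5 can be reproduced with a few lines of Python. The link weights below are those given for FIG. 3; the dictionary layout and the function name `similarity` are assumptions made for this sketch, and the value printed for "dress" is simply what the same rule yields rather than a figure stated in the description.

```python
# Link weights as given for FIG. 3.
QUERY_MODELS = {
    "pants": {"kids": 0.2, "womens": 0.3, "mens": 0.5},
    "dress": {"kids": 0.4, "womens": 0.6},
    "jeans": {"kids": 0.3, "womens": 0.2, "mens": 0.5},
    "shirts": {"kids": 0.3, "womens": 0.4, "mens": 0.3},
}


def similarity(original: dict[str, float], other: dict[str, float]) -> float:
    """Sum, over shared attributes, of the smaller of the two link weights."""
    shared = original.keys() & other.keys()
    return sum(min(original[attr], other[attr]) for attr in shared)


for name in ("shirts", "jeans", "dress"):
    value = similarity(QUERY_MODELS["pants"], QUERY_MODELS[name])
    print(name, round(value, 2))
# shirts 0.8
# jeans 0.9
# dress 0.5
```

Queries with no shared attribute get a similarity of 0 under this rule, since the sum runs over an empty set.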

<Detailed processing procedure for single evaluation processing>
FIG. 6 is a flowchart showing an example of the detailed processing procedure of the single evaluation processing executed by the single evaluation unit 14 of the keyword identification device 1 according to this embodiment.
In S41, the single evaluation unit 14 of the keyword identification device 1 determines the attributes and weights of the original query input in S1 of FIG. 2.
Specifically, the single evaluation unit 14 refers to the query model storage unit 4, and if the query model of the original query input in S1 is already stored there, reads the query model of the original query from the query model storage unit 4, skips S42, and proceeds to S43.

On the other hand, if the query model of the original query is not stored in the query model storage unit 4, the single evaluation unit 14 may refer to the user action history DB 2, acquire the attributes associated with items purchased in the past by the user who input the original query in S1, and determine them as the attributes of the original query. Alternatively, it may estimate the attributes of the item the user intends to purchase from information on the EC site page being displayed when the original query was input. A predetermined initial value may be set as the weight for each attribute. Alternatively, the single evaluation unit 14 may, for example, read from the query model storage unit 4 the query model of the query that shares the most tokens with the original query, and determine or infer the attributes and weights of the original query from those of the read query model.
For a new user with no past action history, the initial query model of the original query may be inferred from the action histories and query models of other users or user groups in categories close to the user information that can be acquired, for example, at member registration.
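The fallback of borrowing the query model of the stored query that shares the most tokens with the original query might look like the following. This is a sketch under assumptions: queries are represented as whitespace-separated strings, ties are broken by first occurrence, the stored models are made-up sample data, and `infer_query_model` is a hypothetical name.

```python
def infer_query_model(original_query, stored_models):
    """Return a copy of the model of the stored query sharing the most tokens.

    Returns None when no stored query shares any token with the original.
    Assumption: tokens are whitespace-separated; ties keep the first match.
    """
    original_tokens = set(original_query.split())
    best_query, best_overlap = None, 0
    for query in stored_models:
        overlap = len(original_tokens & set(query.split()))
        if overlap > best_overlap:
            best_query, best_overlap = query, overlap
    if best_query is None:
        return None
    return dict(stored_models[best_query])


# Made-up stored query models.
stored = {
    "mens pants": {"mens": 0.8, "kids": 0.2},
    "kids shirts": {"kids": 1.0},
}
print(infer_query_model("mens slim pants", stored))  # {'mens': 0.8, 'kids': 0.2}
```

Returning a copy keeps later weight updates for the new query from silently modifying the model it was borrowed from.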

In S42, the single evaluation unit 14 of the keyword identification device 1 causes the query model generation unit 13 to generate a query model by associating the attributes and weights determined in S41 with the original query input in S1. The query model generation unit 13 may store the query model generated in S42 in the query model storage unit 4.

In S43, the single evaluation unit 14 of the keyword identification device 1 acquires from the query model storage unit 4, for each single generated in S3 of FIG. 2, the other queries that share that single, that is, all the tokens contained in the single, with the original query.
The single evaluation unit 14 may acquire other queries that share the single with the original query and also share at least one attribute with the original query. This is because, under the similarity calculation in the examples of FIGS. 4 and 5, the similarity between queries that share no attribute is 0.
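The retrieval condition of S43 can be sketched as a simple filter. The representation is assumed for illustration: stored queries are keyed by the frozenset of their tokens, attributes are named `attr1` and so on, the weights are sample values, and `find_other_queries` is a hypothetical name.

```python
def find_other_queries(single, original_attrs, stored_models):
    """Return stored queries that contain every token of the single
    and share at least one attribute with the original query."""
    matches = []
    for query_tokens, attrs in stored_models.items():
        contains_single = single <= query_tokens
        shares_attribute = bool(original_attrs.keys() & attrs.keys())
        if contains_single and shares_attribute:
            matches.append(query_tokens)
    return matches


# Sample stored query models (weights are illustrative).
stored = {
    frozenset({"a", "b", "c"}): {"attr1": 0.3, "attr2": 0.3, "attr3": 0.3},
    frozenset({"a", "b", "d"}): {"attr1": 0.2, "attr2": 0.5},
    frozenset({"a", "e"}): {"attr1": 1.0},          # lacks token "b"
    frozenset({"a", "b", "f"}): {"attr9": 1.0},     # shares no attribute
}
original = frozenset({"a", "b", "c"})
result = find_other_queries(frozenset({"a", "b"}), stored[original], stored)
print([sorted(q) for q in result])  # [['a', 'b', 'c'], ['a', 'b', 'd']]
```

As in the FIG. 7 walkthrough, the original query itself is returned when it is stored, because it trivially contains the single.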

In S44, the single evaluation unit 14 of the keyword identification device 1 calculates, for each of the other queries acquired in S43, its similarity to the original query.
Specifically, for the attributes shared by the query model of the original query and the query model of another query, the single evaluation unit 14 calculates the similarity of the other query to the original query by performing an operation on the weights given to the pair of links connecting the two queries to each shared attribute. As already described with reference to FIGS. 4 and 5, the single evaluation unit 14 may calculate the similarity by, for example, adding up the minimum of the weights of the pair of links for each shared attribute.

In S45, the single evaluation unit 14 of the keyword identification device 1 calculates the score of the single under evaluation based on the similarities calculated in S44 for all the other queries acquired in S43.
Specifically, the single evaluation unit 14 may calculate, as the score of the single under evaluation, an average, for example the geometric mean, of the similarities calculated in S44 for all the queries acquired in S43.
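Written out, the example operations of S44 and S45 amount to the following. The notation is introduced here for illustration and does not appear in the description: $A(q)$ is the set of attributes linked to query $q$, $w(q, a)$ is the weight of the link from $q$ to attribute $a$, $q_0$ is the original query, and $Q_s$ is the set of other queries acquired in S43 for the single $s$.

```latex
\mathrm{sim}(q, q_0) = \sum_{a \in A(q) \cap A(q_0)} \min\bigl(w(q, a),\, w(q_0, a)\bigr),
\qquad
\mathrm{score}(s) = \Bigl(\prod_{q \in Q_s} \mathrm{sim}(q, q_0)\Bigr)^{1/|Q_s|}
```

The pairwise minimum and the geometric mean are the examples named in the description; other operations on the weights and other averages are not excluded by it.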

In S46, the single evaluation unit 14 of the keyword identification device 1 determines whether all of the singles generated in S3 of FIG. 2 have been evaluated.
If an unprocessed single remains (S46: N), the process returns to S43 and repeats S43 through S46. If no unprocessed single remains (S46: Y), the process proceeds to S5 of FIG. 2, where the one or more singles with higher scores are output as keywords matching the user's intention.
In this way, rather than evaluating the individual tokens of the original query themselves, the keyword identification device 1 according to this embodiment evaluates the user action history associated with each of the other queries containing those tokens, and can thereby identify keywords without tokens or singles being defined in advance.

FIG. 7 is a conceptual diagram illustrating an example of calculating, with respect to the original query, the similarity of other queries that share a given single, and calculating the score of that single based on the calculated similarities.
Referring to FIG. 7, assume that the original query input in S1 of FIG. 2 is {“a”, “b”, “c”}. In this case, the original query is divided into three tokens {“a”}, {“b”}, and {“c”} in S2 of FIG. 2, and six singles {“a”, “b”}, {“a”, “c”}, {“b”, “c”}, {“a”}, {“b”}, and {“c”} are generated in S3 of FIG. 2.

For each of these six singles, all the other queries containing that single are acquired in S43 of FIG. 6. For example, for the single {“a”, “b”}, the queries {“a”, “b”, “d”} and {“a”, “b”, “c”} are acquired. The query {“a”, “b”, “c”} is identical to the original query.
For each of the queries {“a”, “b”, “d”} and {“a”, “b”, “c”} acquired for the single {“a”, “b”}, the similarity to the original query {“a”, “b”, “c”} is calculated in S44 of FIG. 6.

Referring to FIG. 7, the query {“a”, “b”, “d”} shares the attributes α₁ and α₂ with the original query {“a”, “b”, “c”}; the minimum of the weights given to the pair of links connected to the attribute α₁ is 0.2, and the minimum of the weights given to the pair of links connected to the attribute α₂ is 0.3. Therefore, the similarity of the query {“a”, “b”, “d”} to the original query {“a”, “b”, “c”} is calculated as 0.5, the sum of 0.2 and 0.3.
On the other hand, the query {“a”, “b”, “c”}, which is identical to the original query, shares all of the attributes α₁, α₂, and α₃ with the original query {“a”, “b”, “c”}, and the minimum of the weights given to the pair of links connected to each attribute is 0.3. Therefore, the similarity of the query {“a”, “b”, “c”} to the original query {“a”, “b”, “c”} is calculated as 0.9.
Accordingly, in S45 of FIG. 6, the score of the single {“a”, “b”} is calculated as 0.67, the geometric mean of 0.5 and 0.9.
By repeating the above processing for all of the singles generated in S3 of FIG. 2, the one or more singles with the highest scores can be identified.
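The FIG. 7 walkthrough for the single {“a”, “b”} can be checked numerically with the sketch below. The attribute names `alpha_1` to `alpha_3` stand in for α₁ to α₃, and the function names are hypothetical. The weight 0.5 assigned to `alpha_2` for the query {“a”, “b”, “d”} is an assumed value: the description only fixes the pairwise minima, and any weight of at least 0.3 gives the same result.

```python
import math

ORIGINAL = frozenset({"a", "b", "c"})

# Weights chosen so that the pairwise minima match the FIG. 7 walkthrough.
# The 0.5 for alpha_2 on {"a", "b", "d"} is an assumed value.
QUERY_MODELS = {
    frozenset({"a", "b", "c"}): {"alpha_1": 0.3, "alpha_2": 0.3, "alpha_3": 0.3},
    frozenset({"a", "b", "d"}): {"alpha_1": 0.2, "alpha_2": 0.5},
}


def similarity(original, other):
    """Sum of pairwise minimum link weights over shared attributes (S44)."""
    shared = original.keys() & other.keys()
    return sum(min(original[a], other[a]) for a in shared)


def score_single(single, original, models):
    """Geometric mean of the similarities of all stored queries containing the single (S45)."""
    sims = [
        similarity(models[original], attrs)
        for tokens, attrs in models.items()
        if single <= tokens
    ]
    if not sims:
        return 0.0
    return math.prod(sims) ** (1.0 / len(sims))


print(round(score_single(frozenset({"a", "b"}), ORIGINAL, QUERY_MODELS), 2))  # 0.67
```

With a geometric mean, a single candidate query of similarity 0 drives the score to 0, which is one reason S43 may restrict candidates to queries sharing at least one attribute.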

<Hardware configuration of keyword identification device>
FIG. 8 is a diagram showing a non-limiting example of the hardware configuration of the keyword identification device 1 according to this embodiment.
The keyword identification device 1 according to this embodiment can be implemented on any one or more computers, mobile devices, or any other processing platforms.
Although FIG. 8 shows an example in which the keyword identification device 1 is implemented on a single computer, the keyword identification device 1 according to this embodiment may be implemented on a computer system including a plurality of computers. The plurality of computers may be connected by a wired or wireless network so as to be able to communicate with each other.

As shown in FIG. 8, the keyword identification device 1 may include a CPU 81, a ROM 82, a RAM 83, an HDD 84, an input unit 85, a display unit 86, a communication I/F 87, and a system bus 88. The keyword identification device 1 may also include an external memory.
The CPU (Central Processing Unit) 81 comprehensively controls the operation of the keyword identification device 1, and controls each component (82 to 87) via the system bus 88, which is a data transmission path.

The ROM (Read Only Memory) 82 is a non-volatile memory that stores control programs and the like necessary for the CPU 81 to execute processing. The programs may instead be stored in a non-volatile memory such as the HDD (Hard Disk Drive) 84 or an SSD (Solid State Drive), or in an external memory such as a removable storage medium (not shown).
The RAM (Random Access Memory) 83 is a volatile memory and functions as the main memory, work area, and the like of the CPU 81. That is, when executing processing, the CPU 81 loads the necessary programs and the like from the ROM 82 into the RAM 83 and executes them to implement various functional operations.

The HDD 84 stores, for example, various data and information required when the CPU 81 performs processing using a program. The HDD 84 also stores, for example, various data and information obtained as a result of the CPU 81 performing processing using a program or the like.
The input unit 85 is composed of a keyboard and a pointing device such as a mouse.
The display unit 86 is composed of a monitor such as a liquid crystal display (LCD). The display unit 86 may provide a GUI (Graphical User Interface), which is a user interface for instructing and inputting to the keyword identification device 1 various parameters used in the keyword identification processing, communication parameters used in communication with other devices, and the like.

The communication I/F 87 is an interface that controls communication between the keyword identification device 1 and external devices.
The communication I/F 87 provides an interface with a network and communicates with external devices via the network. Various data, parameters, and the like are transmitted to and received from external devices via the communication I/F 87. In this embodiment, the communication I/F 87 may perform communication via a wired LAN (Local Area Network) conforming to a communication standard such as Ethernet (registered trademark), or via a dedicated line. However, the network usable in this embodiment is not limited to these and may be a wireless network. Such wireless networks include wireless PANs (Personal Area Networks) such as Bluetooth (registered trademark), ZigBee (registered trademark), and UWB (Ultra Wide Band); wireless LANs (Local Area Networks) such as Wi-Fi (Wireless Fidelity) (registered trademark); wireless MANs (Metropolitan Area Networks) such as WiMAX (registered trademark); and wireless WANs (Wide Area Networks) such as LTE/3G, 4G, and 5G. The network only needs to connect the devices so that they can communicate with each other, and the communication standard, scale, and configuration are not limited to the above.

The functions of at least some of the elements of the keyword identification device 1 shown in FIG. 1 can be implemented by the CPU 81 executing a program. However, the functions of at least some of the elements of the keyword identification device 1 shown in FIG. 1 may operate as dedicated hardware. In this case, the dedicated hardware operates under the control of the CPU 81.

As described above, according to this embodiment, the keyword identification device divides a query to be submitted to a search engine into the individual tokens that constitute the query, generates singles (shingles), each consisting of one or more tokens, as subsets of the query, and generates, for each generated single, queries containing that single.
The keyword identification device according to this embodiment further refers to the query models generated based on the user's action history, calculates the similarity of each generated query to the original query, and calculates the score of the single from the calculated similarities, thereby identifying a single with a higher score as a keyword that better matches the user's intention.

Therefore, without predefining each of the tokens or singles contained in a query, the keywords that better represent the user's intention can be identified flexibly and easily from among the singles (key phrases and keywords) that constitute the query.
As a result, the desired search results can be presented in response to the query, which contributes to improved search accuracy.

Although specific embodiments have been described above, these embodiments are merely examples and are not intended to limit the scope of the invention. The devices and methods described herein may be embodied in forms other than those described above. Omissions, substitutions, and modifications may also be made to the above-described embodiments as appropriate without departing from the scope of the invention. Forms with such omissions, substitutions, and modifications fall within the scope of what is described in the claims and their equivalents, and belong to the technical scope of the invention.

1: keyword identification device; 2: user action history DB; 3: search engine; 4: query model storage unit; 11: query input unit; 12: query division unit; 13: query model generation unit; 14: single evaluation unit; 15: output unit; 81: CPU; 82: ROM; 83: RAM; 84: HDD; 85: input unit; 86: display unit; 87: communication I/F; 88: bus

Claims

a first query acquisition unit that acquires a first query;
a query subset generation unit that divides the first query acquired by the first query acquisition unit into a plurality of tokens and generates a plurality of query subsets each composed of one or more divided tokens;
a second query acquisition unit that acquires a second query that shares the query subset generated by the query subset generation unit with the first query;
a similarity calculation unit that calculates a similarity of the second query to the first query based on an operation history of items on which a user has performed operations in the past in response to the first query or the second query;
a score calculation unit that calculates a score of the query subset based on the similarity of the second query calculated by the similarity calculation unit;
and a query subset identifying unit that identifies, as a keyword, a query subset having a higher score calculated by the score calculation unit among the plurality of query subsets.

The information processing device according to claim 1, wherein the similarity calculation unit calculates the similarity as a degree to which the search result set of the second query is similar to the search result set of the first query.

The information processing device according to claim 1 or 2, further comprising a query model generation unit that, based on the user's action history, generates a query model by connecting the first query or the second query to attributes of the query with links and assigning a weight to each link, and stores the generated query model in a storage device,
wherein the similarity calculation unit calculates the similarity of the second query by referring to the query model stored in the storage device.

The similarity calculation unit calculates the similarity of the second query by calculating the weight given to the link connected to the attribute shared between the query model of the first query and the query model of the second query.

The information processing device according to claim 4, wherein the similarity calculation unit calculates the similarity of the second query by comparing, for each attribute shared between the query model of the first query and the query model of the second query, the weights respectively given to the pair of links connected to that attribute, and adding the weight having the smaller value over a plurality of the attributes.
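
As an informal illustration of the weight comparison recited above (not part of the claims), the following minimal Python sketch assumes that each query model is represented as a dictionary mapping attribute names to link weights. The function name, the attribute labels and the sample weights are hypothetical. For every attribute shared by both query models, the smaller of the two link weights is taken, and these minima are added.

```python
def min_weight_similarity(model_a: dict[str, float], model_b: dict[str, float]) -> float:
    """Sum, over attributes shared by both query models, of the smaller link weight."""
    shared = model_a.keys() & model_b.keys()  # attributes linked from both queries
    return sum(min(model_a[attr], model_b[attr]) for attr in shared)


# Hypothetical query models: attribute -> weight of the link to that attribute.
first = {"genre:shoes": 0.5, "brand:acme": 0.25, "color:red": 0.25}
second = {"genre:shoes": 0.75, "brand:acme": 0.125, "size:26": 0.125}

similarity = min_weight_similarity(first, second)
print(similarity)
```

In this sketch, attributes that appear in only one of the two query models contribute nothing, so two queries with no shared attribute have a similarity of zero.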

The information processing device according to any one of claims 1 to 5, wherein the similarity calculation unit calculates the similarity of the second query by comparing, between the first query and the second query, information associated with an item on which the user has performed an operation in the past.

The information processing device according to any one of claims 3 to 5, wherein the query model generation unit generates the query model by setting information associated with the item as the attribute and assigning the weight based on the frequency of the operation.
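
As an informal illustration of a query model weighted by operation frequency (not part of the claims), the following minimal Python sketch assumes that the operation history is available as (query, attribute) pairs, one pair per user operation, where the attribute is some information associated with the operated item. Normalizing the counts to relative frequencies is an assumption of this sketch, and all names and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def build_query_models(history):
    """Build {query: {attribute: weight}} from (query, attribute) operation records.

    Each weight is the relative frequency of operations on items carrying that
    attribute (an assumed weighting scheme, chosen only for illustration).
    """
    counts = defaultdict(Counter)
    for query, attribute in history:
        counts[query][attribute] += 1  # one link per (query, attribute) pair
    models = {}
    for query, counter in counts.items():
        total = sum(counter.values())
        models[query] = {attr: n / total for attr, n in counter.items()}
    return models


# Hypothetical history: the query that was entered and an attribute of the operated item.
history = [
    ("red running shoes", "genre:shoes"),
    ("red running shoes", "genre:shoes"),
    ("red running shoes", "color:red"),
    ("red running shoes", "genre:shoes"),
    ("running shoes", "genre:shoes"),
]
models = build_query_models(history)
print(models["red running shoes"])
```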

The information processing device according to any one of claims 1 to 7, wherein the score calculation unit calculates the score of the query subset by calculating the geometric mean of the similarities of all the second queries.
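
As an informal illustration of the geometric-mean score (not part of the claims), the following minimal Python sketch takes the similarities of all second queries obtained for one query subset and returns their geometric mean. The function name is hypothetical, and returning zero for an empty list or for any non-positive similarity is an assumption of this sketch.

```python
import math


def geometric_mean_score(similarities):
    """Geometric mean of the similarities of all second queries for one subset."""
    values = list(similarities)
    if not values or any(v <= 0 for v in values):
        return 0.0  # assumption: no second query, or a zero similarity, yields score 0
    # exp of the mean of logs equals the n-th root of the product.
    return math.exp(sum(math.log(v) for v in values) / len(values))


score = geometric_mean_score([0.25, 1.0])
print(score)
```

Because a geometric mean is pulled down strongly by small values, a query subset scores highly in this sketch only when all of its second queries are similar to the first query.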

The information processing device according to any one of claims 1 to 8, wherein the second query acquisition unit acquires the second query for each of the plurality of query subsets generated by the query subset generation unit,
and the score calculation unit calculates the score for each of the plurality of query subsets.

The information processing device according to any one of claims 1 to 9, wherein the query subset generation unit generates the query subset from a plurality of discontinuous tokens in the character string of the first query.
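
As an informal illustration of generating query subsets that may consist of discontinuous tokens (not part of the claims), the following minimal Python sketch assumes whitespace tokenization and enumerates every order-preserving combination of one or more tokens. The function name and the optional `max_tokens` limit are hypothetical.

```python
from itertools import combinations


def generate_query_subsets(query, max_tokens=None):
    """Return all order-preserving token combinations of the query as tuples."""
    tokens = query.split()  # assumption: tokens are separated by whitespace
    limit = len(tokens) if max_tokens is None else min(max_tokens, len(tokens))
    subsets = []
    for size in range(1, limit + 1):
        # combinations() keeps the original token order and may skip tokens,
        # so subsets such as ("red", "shoes") are non-contiguous in the query.
        subsets.extend(combinations(tokens, size))
    return subsets


subsets = generate_query_subsets("red running shoes")
print(subsets)
```

For a query of n distinct tokens this enumeration produces 2^n - 1 subsets, so a practical implementation would likely bound the subset size, for example with the `max_tokens` parameter assumed here.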

The information processing device according to any one of claims 1 to 10, further comprising an output control unit that controls output so that search results corresponding to the query subset identified by the query subset identification unit are preferentially presented.

An information processing method executed by an information processing device, the method comprising:
obtaining a first query;
dividing the obtained first query into a plurality of tokens and generating a plurality of query subsets each composed of one or more divided tokens;
obtaining a second query that shares the generated query subset with the first query;
calculating a degree of similarity of the second query to the first query based on an operation history of items on which a user has performed operations in the past in response to the first query or the second query;
calculating a score for the query subset based on the similarity of the second query;
and specifying, as a keyword, a query subset having a higher calculated score among the plurality of query subsets.
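
As an informal end-to-end illustration of the method steps above (not part of the claims), the following minimal Python sketch assumes whitespace tokenization, query models given as attribute-to-weight dictionaries derived from the operation history, a similarity equal to the sum of the smaller weights over shared attributes, and a geometric-mean score. All names and sample data are hypothetical, and selecting the single highest-scoring subset is one possible reading of "a higher calculated score".

```python
import math
from itertools import combinations


def identify_keyword(first_query, models):
    """Return (best_subset, best_score) for first_query.

    models maps each known query to {attribute: weight}.
    """
    tokens = first_query.split()
    first_model = models.get(first_query, {})
    best_subset, best_score = None, -1.0
    for size in range(1, len(tokens) + 1):
        for subset in combinations(tokens, size):
            # Second queries: other known queries that contain every token of the subset.
            seconds = [q for q in models
                       if q != first_query and set(subset) <= set(q.split())]
            if not seconds:
                continue
            # Similarity: sum of the smaller link weight over shared attributes.
            sims = [sum(min(first_model[a], models[q][a])
                        for a in first_model.keys() & models[q].keys())
                    for q in seconds]
            # Score: geometric mean of the similarities (0 if any similarity is 0).
            if min(sims) <= 0:
                score = 0.0
            else:
                score = math.exp(sum(math.log(s) for s in sims) / len(sims))
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


models = {
    "red running shoes": {"genre:shoes": 0.75, "color:red": 0.25},
    "running shoes": {"genre:shoes": 1.0},
    "running socks": {"genre:socks": 1.0},
    "red dress": {"genre:dresses": 1.0},
    "shoes sale": {"genre:shoes": 0.5, "genre:bags": 0.5},
}
keyword, keyword_score = identify_keyword("red running shoes", models)
print(keyword, keyword_score)
```

With this sample data, the subset ("running", "shoes") is selected, because the only other query sharing it led to operations on the same kind of items as the first query, whereas "red" and "running" alone are also shared by queries with unrelated operation histories.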

An information processing program for causing a computer to execute information processing, the program causing the computer to execute:
a first query acquisition process for acquiring a first query;
a query subset generation process for dividing the first query acquired by the first query acquisition process into a plurality of tokens and generating a plurality of query subsets each composed of one or more divided tokens;
a second query acquisition process for acquiring a second query that shares the query subset generated by the query subset generation process with the first query;
a similarity calculation process for calculating a degree of similarity of the second query to the first query based on an operation history of items on which a user has performed operations in the past in response to the first query or the second query;
a score calculation process for calculating a score of the query subset based on the similarity of the second query calculated by the similarity calculation process;
and a query subset identification process for identifying, as a keyword, a query subset having a higher score calculated by the score calculation process among the plurality of query subsets.