JP7212728B1 - Information processing device, information processing method and information processing program

Info

Publication number: JP7212728B1
Application number: JP2021120022A
Authority: JP
Inventors: 知博真鍋; 圭吾町永; 云波朴; 宏太薛; 秀之社本
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2021-07-20
Filing date: 2021-07-20
Publication date: 2023-01-25
Anticipated expiration: 2041-07-20
Also published as: JP2023015933A

Abstract

[Problem] To provide an information processing device, an information processing method, and an information processing program capable of improving the quality of search results. [Solution] An information processing device according to the present application includes a classification unit, an identification unit, and a search unit. The classification unit classifies content vectors, each corresponding to a content item to be searched, into clusters. The identification unit identifies a cluster similar to a query vector corresponding to a search query input by a user. The search unit searches, as search targets, the content corresponding to the content vectors included in the identified cluster. [Selected drawing] FIG. 5

Description

The present invention relates to an information processing device, an information processing method, and an information processing program.

Conventionally, there are information processing devices that vectorize a query input by a user and the content to be searched, and perform search processing based on the similarity between the two vectors. For this type of device, a technique has been proposed that narrows the search range of the search targets by approximate nearest neighbor (ANN) search.

JP 2021-86573 A

However, the conventional technology leaves room for further improvement in the quality of search results.

The present application has been made in view of the above, and its object is to provide an information processing device, an information processing method, and an information processing program capable of improving the quality of search results.

An information processing device according to the present application includes a classification unit, an identification unit, and a search unit. The classification unit classifies content vectors, each corresponding to a content item to be searched, into clusters. The identification unit identifies a cluster similar to a query vector corresponding to a search query input by a user. The search unit searches, as search targets, the content corresponding to the content vectors included in the identified cluster.

According to one aspect of the embodiment, the quality of search results can be improved.

FIG. 1 is a diagram showing the first information processing according to the embodiment.
FIG. 2 is a diagram showing the second information processing according to the embodiment.
FIG. 3 is a diagram showing the third information processing according to the embodiment.
FIG. 4 is a diagram showing a configuration example of the information processing system according to the embodiment.
FIG. 5 is a diagram showing a configuration example of the information processing device according to the embodiment.
FIG. 6 is a diagram showing an example of content information.
FIG. 7 is a diagram showing an example of user information.
FIG. 8 is a flowchart showing the processing procedure of the first information processing executed by the information processing device according to the embodiment.
FIG. 9 is a flowchart showing the processing procedure of the second information processing executed by the information processing device according to the embodiment.
FIG. 10 is a flowchart showing the processing procedure of the third information processing executed by the information processing device according to the embodiment.
FIG. 11 is a diagram showing an example of a hardware configuration.

Hereinafter, modes for implementing the information processing device, information processing method, and information processing program according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. The information processing device, information processing method, and information processing program according to the present application are not limited by these embodiments. In each of the following embodiments, the same parts are denoted by the same reference numerals, and overlapping descriptions are omitted.

(Embodiment)
First, the information processing executed by the information processing device according to the embodiment will be described with reference to FIGS. 1 to 3. FIGS. 1 to 3 are diagrams showing the first to third information processing according to the embodiment, and each shows an operation example of an information processing system S including the information processing device 1 according to the embodiment. As shown in FIGS. 1 to 3, the information processing system S according to the embodiment includes the information processing device 1 and a user terminal 50.

First, the first information processing will be described with reference to FIG. 1. In the first information processing shown in FIG. 1, the search query input by the user and the filter condition specifying the search range of the search targets are each vectorized, the search query vector is corrected based on the filter condition vector, and search processing is then performed.

Specifically, the user terminal 50 according to the embodiment first transmits the search query input by the user and a filter condition specifying the search range of the search targets (step S1). The search query is text, an image, or the like. The filter condition is a condition that specifies a category of the search targets (product type, price range, and so on) and is specified, for example, by the user selecting from preset filter conditions. The search targets are various kinds of content such as products on a shopping site, documents, and images.

Next, the information processing device 1 according to the embodiment converts the acquired search query and filter condition into vectors in a predetermined metric space (step S2). In the following, the vector corresponding to the search query may be referred to as the "query vector", and the vector corresponding to the filter condition as the "filter vector". The predetermined metric space is a cosine similarity space, an inner product space, a Euclidean distance space, or the like.

Next, the information processing device 1 according to the embodiment generates a correction vector by correcting the query vector based on the filter vector (step S3). For example, the information processing device 1 generates a correction vector in which the direction and norm of the query vector are corrected based on the direction and norm of the filter vector.

In the example shown in FIG. 1, the information processing device 1 generates a correction vector in which the direction and norm of the query vector are brought closer to those of the filter vector. Various methods of generating the correction vector are conceivable; the details are described later.

Next, the information processing device 1 searches the search targets based on the generated correction vector (step S4). For example, the information processing device 1 calculates the distance (cosine similarity, inner product distance, or Euclidean distance) between the correction vector and the vector of each search target, and generates search results based on the calculation results. The information processing device 1 then transmits the search results to the user terminal 50 (step S5).
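The three measures named in step S4 can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation; the function names, the `top_k` parameter, and the choice to rank by inner product are assumptions made for the example.

```python
import math

def inner_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm (length) of a vector."""
    return math.sqrt(inner_product(a, a))

def cosine_similarity(a, b):
    """Inner product normalized by both norms; ignores vector length."""
    return inner_product(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_inner_product(correction_vector, content_vectors, top_k=3):
    """Return content IDs ordered by descending inner product (assumed scoring rule)."""
    scored = sorted(
        content_vectors.items(),
        key=lambda item: (-inner_product(correction_vector, item[1]), item[0]),
    )
    return [content_id for content_id, _ in scored[:top_k]]

# Hypothetical toy data: three content vectors and one correction vector.
contents = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 2.0]}
best = rank_by_inner_product([1.0, 1.0], contents, top_k=1)
```

With the inner product, the long vector "c" ranks first; with cosine similarity its length would not matter, which is why the choice of metric space affects the results.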

Thus, in the first information processing according to the embodiment, the query vector itself is corrected using the filter vector. Compared with the conventional method, which first extracts search targets close to the query vector and then selects those that match the filter condition, the search results contain more content, and each search target extracted with the correction vector already reflects the filter condition. That is, the first information processing according to the embodiment can improve the quality of search results.

Next, the second information processing will be described with reference to FIG. 2. In the second information processing shown in FIG. 2, content for which the norm difference between the query vector and the vector of the content to be searched (the content vector) is equal to or greater than a predetermined value is set as target content for inner product calculation, and search processing is then performed. In the second information processing shown in FIG. 2, the predetermined metric space described above is assumed to be an inner product space.

Specifically, the user terminal 50 according to the embodiment first transmits the search query input by the user (step S11). The user terminal 50 may transmit the filter condition described above in addition to the search query.

Next, the information processing device 1 according to the embodiment converts the search query and each content item to be searched into vectors in the inner product space (step S12). In the following, as above, the vector corresponding to the search query may be referred to as the "query vector", and the vector corresponding to each content item as the "content vector". This conversion maps the search query and each content item to vectors in the same space (with the same number of dimensions) such that their inner product is meaningful as a score. The vectors are produced by a conversion procedure acquired through machine learning based on the magnitude of the inner product values. A search procedure using such an inner product space includes a procedure that makes content with a larger inner product value more likely to remain in the search results.

FIG. 2 shows an example in which the content vectors are converted at the same timing as the query vector, but they may instead be converted in advance, for example before the search query is input (such as when the content is registered).

Next, the information processing device 1 according to the embodiment extracts, from among the content vectors, those whose norm is equal to or greater than a predetermined value, and sets the content corresponding to the extracted content vectors as target content for inner product calculation with the query vector corresponding to the search query (step S13).

In the example shown in FIG. 2, the content belonging to regions R2 and R3 is set as target content. Region R1 is the group of content that is similar to the query vector (that is, whose inner product satisfies a predetermined condition) in the later step S14; in other words, it is also target content for the inner product calculation.

Next, the information processing device 1 according to the embodiment performs search processing with the target content included in regions R1 to R3 as the search targets (step S14). For example, the information processing device 1 calculates the inner product of the content vector of each target content item and the query vector, and generates search results based on the calculation results.

Next, the information processing device 1 according to the embodiment transmits the generated search results to the user terminal 50 (step S15).

In other words, in the second information processing according to the embodiment, the content in regions R2 and R3, whose norm is equal to or greater than the predetermined value, is always included in the search targets regardless of the query vector. This is because, in an inner product space, a content vector whose norm is equal to or greater than the predetermined value can have a high similarity to the query vector (that is, be close to what the user wants as a search result). Thus, the information processing device 1 according to the embodiment can improve the quality of search results.
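Steps S13 and S14 can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the region R1 candidates are obtained, so `nearby_ids` is a hypothetical input (for example, the output of an approximate nearest neighbor step), and all function names and the toy data are invented for the example.

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(inner_product(a, a))

def select_targets(content_vectors, nearby_ids, norm_threshold):
    """Step S13 sketch: always include content whose norm is at or above the threshold.

    nearby_ids stands in for region R1 (assumption: produced elsewhere);
    the high-norm set stands in for regions R2 and R3.
    """
    high_norm_ids = {
        cid for cid, vec in content_vectors.items() if norm(vec) >= norm_threshold
    }
    return set(nearby_ids) | high_norm_ids

def search(query_vector, content_vectors, nearby_ids, norm_threshold, top_k=3):
    """Step S14 sketch: score every target by inner product and keep the top_k."""
    targets = select_targets(content_vectors, nearby_ids, norm_threshold)
    ranked = sorted(
        targets,
        key=lambda cid: (-inner_product(query_vector, content_vectors[cid]), cid),
    )
    return ranked[:top_k]

# Hypothetical toy data: "big" has a large norm and is not among the nearby candidates.
contents = {"a": [1.0, 0.0], "b": [0.0, 1.0], "big": [3.0, 3.0]}
result = search([1.0, 0.2], contents, nearby_ids=["a"], norm_threshold=4.0, top_k=2)
```

In this toy case "big" would be missed if only the nearby candidates were scored, yet it has the highest inner product with the query, which is the situation the high-norm rule is meant to cover.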

Next, the third information processing will be described with reference to FIG. 3. In the third information processing shown in FIG. 3, the content vectors corresponding to the content to be searched are clustered in advance, and search processing is performed after identifying a cluster similar to the query vector.

Specifically, the information processing device 1 according to the embodiment first classifies the content vectors in advance (before acquiring a search query) so that similar content vectors fall into the same cluster (step S21). In the example shown in FIG. 3, each of regions R1 to R9 is classified as a cluster.

Next, the user terminal 50 according to the embodiment transmits the search query input by the user (step S22). The user terminal 50 may transmit the filter condition described above in addition to the search query.

Next, the information processing device 1 according to the embodiment converts the search query into a query vector in a predetermined metric space (step S23).

Next, the information processing device 1 according to the embodiment identifies a cluster similar to the query vector (step S24). For example, the information processing device 1 calculates a representative vector of the content vectors included in each cluster and identifies the cluster based on the similarity between the representative vector and the query vector.

Next, the information processing device 1 according to the embodiment performs search processing with the content included in the identified cluster as the search targets (step S25). For example, the information processing device 1 calculates the inner product of each content vector included in the identified cluster and the query vector, and generates search results based on the calculation results.

Next, the information processing device 1 according to the embodiment transmits the generated search results to the user terminal 50 (step S26).

In other words, in the third information processing according to the embodiment, the content vectors are clustered in advance and a cluster similar to the query vector is identified. This narrows down the search while generating search results similar to those obtained when all content is scored (for example, by inner product calculation). Thus, the information processing device 1 according to the embodiment can improve the quality of search results.
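Steps S24 and S25 can be sketched as follows, assuming the clusters from step S21 are already given as a mapping from cluster ID to content IDs. The centroid as representative vector and the inner product scoring follow the text; the use of cosine similarity to pick the cluster, the function names, and the toy data are assumptions for illustration.

```python
import math

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )

def centroid(vectors):
    """Component-wise mean, used here as the representative vector of a cluster."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def search_in_similar_cluster(query_vector, clusters, content_vectors, top_k=3):
    """Pick the cluster whose representative is most similar to the query,
    then score only that cluster's members by inner product."""
    representatives = {
        cluster_id: centroid([content_vectors[cid] for cid in members])
        for cluster_id, members in clusters.items()
    }
    # Assumption: cosine similarity decides which cluster is "similar".
    best_cluster = max(
        representatives,
        key=lambda cluster_id: cosine_similarity(query_vector, representatives[cluster_id]),
    )
    ranked = sorted(
        clusters[best_cluster],
        key=lambda cid: (-inner_product(query_vector, content_vectors[cid]), cid),
    )
    return best_cluster, ranked[:top_k]

# Hypothetical toy data: two clusters of two content vectors each.
contents = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
clusters = {"R1": ["a", "b"], "R2": ["c", "d"]}
best_cluster, ranked = search_in_similar_cluster([1.0, 0.1], clusters, contents)
```

Only the members of the selected cluster are scored, so the per-query cost depends on the cluster size rather than on the total number of content items.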

Next, a configuration example of the information processing system S according to the embodiment will be described with reference to FIG. 4. FIG. 4 is a block diagram showing a configuration example of the information processing system S according to the embodiment. As shown in FIG. 4, in the information processing system S according to the embodiment, the information processing device 1 and a plurality of user terminals 50 are connected to a network N by wire or wirelessly. The network N is, for example, the Internet, a WAN (Wide Area Network), or a LAN (Local Area Network).

The information processing device 1 is a search server that provides search results for various kinds of content, and executes the information processing method. The search targets searched by the information processing device 1 include, for example, products on a shopping site, documents, and images.

The user terminal 50 is a terminal device possessed by a user who wants to search for content, and issues search requests to the information processing device 1. Any type of terminal device, such as a smartphone, desktop PC, notebook PC, or tablet PC, can be used as the user terminal 50.

Although FIG. 4 shows a configuration example of the information processing system S including the information processing device 1 and the user terminals 50, the information processing system S may additionally include, for example, a terminal device of a business operator that registers content.

Next, a configuration example of the information processing device 1 will be described with reference to FIG. 5.

FIG. 5 is a diagram showing a configuration example of the information processing device 1 according to the embodiment. As shown in FIG. 5, the information processing device 1 has a communication unit 2, a control unit 3, and a storage unit 4. The control unit 3 includes an acquisition unit 31, a conversion unit 32, a generation unit 33, a classification unit 34, an identification unit 35, a setting unit 36, a search unit 37, and an output unit 38. The storage unit 4 stores content information 41 and user information 42.

The communication unit 2 is realized by, for example, a NIC (Network Interface Card). The communication unit 2 is connected to the network by wire or wirelessly.

The control unit 3 is a controller and is realized, for example, by a processor such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs (corresponding to an example of the information processing program) stored in a storage device inside the information processing device 1, using a RAM or the like as a work area. The control unit 3 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a GPGPU (General Purpose Graphic Processing Unit).

The storage unit 4 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disc.

The content information 41 is information about each content item to be searched. FIG. 6 is a diagram showing an example of the content information 41. As shown in FIG. 6, the content information 41 includes items such as "content ID", "category", and "detailed information".

"Content ID" is identification information that identifies each content item. "Category" is information indicating the category of the content, for example the type of content (product type or document type), the price range, or the form of the information (character string or image). "Detailed information" is detailed information about the content, for example a product description, a product image, or the text of a document.

The user information 42 is information about users. FIG. 7 is a diagram showing an example of the user information 42. As shown in FIG. 7, the user information 42 includes items such as "user ID", "attribute information", and "behavior information".

"User ID" is identification information that identifies a user. "Attribute information" is information about the user's attributes and includes psychographic attributes, demographic attributes, and the like. "Behavior information" is information about the user's behavior and includes, for example, search behavior and purchase behavior.

Next, the functions of the control unit 3 of the information processing device 1 (the acquisition unit 31, conversion unit 32, generation unit 33, classification unit 34, identification unit 35, setting unit 36, search unit 37, and output unit 38) will be described.

The acquisition unit 31 acquires various kinds of information. The acquisition unit 31 acquires the search query and the filter condition from the user terminal 50. The acquisition unit 31 may also acquire past search queries and filter conditions from a server that collects search logs, or may generate search queries and filter conditions that a user is expected to input.

The conversion unit 32 converts the search query and the filter condition acquired by the acquisition unit 31 into a query vector and a filter vector, respectively. The conversion unit 32 also refers to the content information 41 and converts the content to be searched into content vectors.

As the vector conversion method, a known method can be used, for example a method that uses a correspondence table for converting tokens into vectors, such as word2vec, or a method such as Fasttext or BERT (Bidirectional Encoder Representations from Transformers).

The predetermined metric space in which the vectors are placed is a cosine similarity space, an inner product space, a Euclidean distance space, or the like. In the second information processing shown in FIG. 2, the vector conversion is performed using the inner product space among these metric spaces.

The filter vector may be obtained using table information that holds a vector for each filter condition. Alternatively, the filter vector may be obtained by converting the character string of the filter condition with one of the conversion methods described above.
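The table-based conversions mentioned above can be sketched as follows. This is a toy illustration, not word2vec itself: the tables, their two-dimensional values, the whitespace tokenization, and the averaging of token vectors are all assumptions chosen to keep the example small.

```python
# Hypothetical correspondence table from tokens to vectors (toy values).
TOKEN_VECTORS = {
    "red": [1.0, 0.0],
    "shoes": [0.0, 1.0],
    "running": [0.5, 0.5],
}

# Hypothetical table holding one vector per preset filter condition.
FILTER_VECTORS = {
    "category:shoes": [0.0, 1.0],
    "category:bags": [1.0, 0.0],
}

def text_to_vector(text, table=TOKEN_VECTORS):
    """Look up each known token and average the vectors (averaging is an assumption)."""
    vectors = [table[token] for token in text.lower().split() if token in table]
    if not vectors:
        raise ValueError("no known tokens in text")
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def filter_to_vector(condition, table=FILTER_VECTORS):
    """Return the stored vector for a preset filter condition."""
    return table[condition]

query_vector = text_to_vector("Red shoes")
filter_vector = filter_to_vector("category:shoes")
```

Because both tables map into the same two-dimensional space, the resulting query vector and filter vector can be combined or compared directly.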

The generation unit 33 generates a correction vector by correcting the query vector based on the filter vector produced by the conversion unit 32. For example, the generation unit 33 generates, as the correction vector, the output of a function that takes the filter vector and the query vector as variables.

Specifically, this function multiplies the filter vector and the query vector by respective predetermined weights and adds the results. Each weight may be, for example, a fixed value or a variable value.

A variable weight value is determined, for example, according to the number of search targets that match the filter condition. That is, the larger the number of search targets that match the filter condition, the larger the weight by which the filter vector is multiplied, and the smaller that number, the smaller the weight. This makes it possible to avoid, with high accuracy, a decrease in the number of content items included in the search results when search processing is performed using the correction vector.

When there are a plurality of filter conditions, the generation unit 33 may, for example, correct the query vector independently with each of the filter vectors corresponding to the filter conditions. Alternatively, the generation unit 33 may calculate a representative vector (for example, the centroid) of the plurality of filter vectors and correct the query vector with the representative vector.
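The weighted sum and the count-dependent weight described above can be sketched as follows. The weighted sum and the centroid follow the text; the linear scaling of the filter weight by the fraction of matching search targets, the `max_weight` parameter, and the function names are assumptions made for illustration.

```python
def correction_vector(query_vector, filter_vector, query_weight, filter_weight):
    """Weighted sum of the query vector and the filter vector."""
    return [
        query_weight * q + filter_weight * f
        for q, f in zip(query_vector, filter_vector)
    ]

def variable_filter_weight(match_count, total_count, max_weight=1.0):
    """Larger weight when more search targets match the filter condition.

    Assumption: the weight scales linearly with the matching fraction.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return max_weight * match_count / total_count

def representative_vector(filter_vectors):
    """Centroid of several filter vectors, for the multiple-condition case."""
    return [sum(column) / len(filter_vectors) for column in zip(*filter_vectors)]

# Hypothetical numbers: 250 of 1000 search targets match the filter condition.
weight = variable_filter_weight(250, 1000)
corrected = correction_vector([1.0, 0.0], [0.0, 1.0], 1.0, weight)
combined_filter = representative_vector([[0.0, 1.0], [1.0, 0.0]])
```

A filter that few items satisfy pulls the query only slightly, so the corrected vector stays close to the original query and the result list is less likely to come back nearly empty.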

The generation unit 33 may also take the user information 42 into account when generating the correction vector. For example, the generation unit 33 may convert the user's attribute information and behavior information into a user vector, and correct the query vector based on the user vector and the filter vector.

The generation unit 33 may also generate the correction vector using the user information of a user other than the one who input the search query. As the other user, a user whose attribute information or behavior information is similar is extracted.

The generation unit 33 may also estimate an attribute of the search query, convert it into an attribute vector, and generate the correction vector based on the attribute vector.

The classification unit 34 classifies the content vectors into clusters. For example, the classification unit 34 classifies content vectors whose direction and norm fall within a predetermined range into the same cluster. Alternatively, the classification unit 34 divides the vector direction and norm into regions at predetermined intervals and classifies the content vectors contained in each region into the same cluster.
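The interval-based variant can be sketched for two-dimensional vectors as follows. This is an illustration only: restricting to two dimensions so that the direction is a single angle, the 45-degree and 1.0 step sizes, and the function name are all assumptions; higher-dimensional vectors would need a different way to quantize direction.

```python
import math
from collections import defaultdict

def grid_cluster(content_vectors, angle_step_deg=45.0, norm_step=1.0):
    """Group 2-D vectors by (direction sector, norm band).

    Assumption: vectors are 2-D, so direction can be expressed as one angle.
    """
    sectors = int(round(360.0 / angle_step_deg))
    clusters = defaultdict(list)
    for content_id, (x, y) in content_vectors.items():
        angle = math.degrees(math.atan2(y, x)) % 360.0
        sector = int(angle // angle_step_deg) % sectors
        band = int(math.hypot(x, y) // norm_step)
        clusters[(sector, band)].append(content_id)
    return dict(clusters)

# Hypothetical toy data: "a" and "b" point the same way with similar length.
contents = {"a": [1.0, 0.1], "b": [1.2, 0.2], "c": [0.1, 2.5]}
clusters = grid_cluster(contents)
```

Each key is a (sector, band) pair, so vectors with a similar direction but a very different norm land in different clusters.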

The identification unit 35 identifies, from among the clusters produced by the classification unit 34, a cluster similar to the query vector. For example, the identification unit 35 calculates a representative vector of the content vectors included in each cluster and identifies the cluster by a similarity based on the distance between the representative vector and the query vector.

なお、代表ベクトルは、クラスタに含まれるコンテンツベクトルの重心や、平均値を用いることができる。 Note that the center of gravity of the content vectors included in the cluster or the average value can be used as the representative vector.
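As a rough illustration of identifying a cluster through its representative vector, the sketch below computes a centroid per cluster and picks the cluster whose centroid is closest to the query vector. Euclidean distance is an assumption here (the specification only says the similarity is based on a distance), and the function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def identify_cluster(query_vector, clusters):
    """Return the id of the cluster whose centroid is nearest the query.

    clusters maps a cluster id to the list of content vectors it contains.
    Euclidean distance is an assumed choice of metric.
    """
    best_id, best_dist = None, math.inf
    for cluster_id, vectors in clusters.items():
        dist = math.dist(query_vector, centroid(vectors))
        if dist < best_dist:
            best_id, best_dist = cluster_id, dist
    return best_id


clusters = {
    "x": [[1.0, 0.0], [3.0, 0.0]],
    "y": [[0.0, 4.0], [0.0, 6.0]],
}
nearest = identify_cluster([1.5, 0.5], clusters)
```

Comparing the query against one representative per cluster, rather than against every content vector, is what narrows the later inner product calculation to a single cluster.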

When the acquisition unit 31 acquires a filter condition, for example, the identification unit 35 may extract the clusters that contain a content vector matching the filter condition and perform the identification process described above on the extracted clusters.
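The filter-based pre-selection of clusters can be sketched as follows. The sketch assumes that each content item carries an attribute dictionary and that a filter condition can be expressed as a predicate over those attributes; both the data layout and the name clusters_matching_filter are hypothetical.

```python
def clusters_matching_filter(clusters, attributes, predicate):
    """Keep only clusters containing at least one content item that
    satisfies the filter condition.

    clusters:   cluster id -> list of content ids
    attributes: content id -> dict of attributes (assumed layout)
    predicate:  function(attribute dict) -> bool, the filter condition
    """
    return {
        cluster_id: ids
        for cluster_id, ids in clusters.items()
        if any(predicate(attributes[content_id]) for content_id in ids)
    }


clusters = {"c1": ["a", "b"], "c2": ["c"]}
attributes = {
    "a": {"category": "shoes"},
    "b": {"category": "bags"},
    "c": {"category": "bags"},
}
candidates = clusters_matching_filter(
    clusters, attributes, lambda attr: attr["category"] == "shoes"
)
```

The identification process would then run only over the clusters in candidates.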

The identification unit 35 may also identify a cluster when the norm or direction of its representative vector is similar to that of the query vector.

The setting unit 36 extracts, from the content vectors corresponding to the respective contents, content vectors whose norm is equal to or greater than a predetermined value, and sets the content corresponding to the extracted content vectors as target content for the inner product calculation with the query vector corresponding to the search query.

For example, the setting unit 36 sets, as target content, content for which the above-described norm difference is equal to or greater than a predetermined value and the norm of the content vector is equal to or greater than a predetermined value. That is, the setting unit 36 sets, as target content, the content of content vectors whose norm is far larger than that of the query vector. The setting unit 36 may also set, as target content, the content of content vectors whose norm is far smaller than that of the query vector.

The setting unit 36 is not limited to the case where the norm is equal to or greater than a predetermined value; for example, it may extract, as target content, content whose norm is in the top few percent of all search targets.
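The two selection rules for target content can be sketched as below. The sketch assumes the norm difference is the content vector norm minus the query vector norm, which matches the "far larger norm" case; the function names and the parameters min_diff, min_norm, and fraction are hypothetical.

```python
import math


def norm(vec):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def select_by_norm(query_vector, content_vectors, min_diff, min_norm):
    """Content whose norm exceeds the query norm by at least min_diff
    and is itself at least min_norm (both thresholds are assumed values)."""
    q = norm(query_vector)
    return [
        content_id
        for content_id, vec in content_vectors.items()
        if norm(vec) - q >= min_diff and norm(vec) >= min_norm
    ]


def select_top_fraction(content_vectors, fraction):
    """Content whose norm is within the top `fraction` of all content."""
    ranked = sorted(
        content_vectors, key=lambda cid: norm(content_vectors[cid]), reverse=True
    )
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


contents = {
    "a": [3.0, 4.0],   # norm 5
    "b": [0.6, 0.8],   # norm 1
    "c": [6.0, 8.0],   # norm 10
    "d": [0.0, 2.0],   # norm 2
}
by_threshold = select_by_norm([1.0, 0.0], contents, min_diff=3.0, min_norm=4.0)
by_rank = select_top_fraction(contents, fraction=0.25)
```

For the "far smaller norm" variant mentioned in the text, the comparison in select_by_norm would be reversed.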

The setting unit 36 may also cluster the content vectors by norm and, when the norm difference between the representative vector of a cluster and the query vector is equal to or greater than a predetermined value, set the content included in that cluster as target content.

In addition to using the norm difference, the setting unit 36 may cluster the content vectors by content category and set the content of the cluster that matches the category of the query vector as target content.

The setting unit 36 also sets the content included in a cluster similar to the query vector as target content. Specifically, when the norm and direction of the representative vector of a cluster are similar to the norm and direction of the query vector, the setting unit 36 sets the content of that cluster as target content.

The search unit 37 calculates, for the target content set by the setting unit 36, the distance to the query vector in a predetermined distance space. For example, the search unit 37 calculates the inner product of the query vector and the content vector of the target content.

The search unit 37 then generates search results based on a similarity corresponding to the inner product. For example, the search unit 37 generates search results in which content with a higher similarity is ranked higher.
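The inner product calculation and ranking can be sketched as follows. The sketch treats the inner product itself as the similarity, which is one reading of "a similarity corresponding to the inner product"; the function names and the top_k parameter are hypothetical.

```python
def inner_product(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_inner_product(query_vector, target_vectors, top_k=10):
    """Return (content id, score) pairs, highest inner product first.

    target_vectors maps content id -> content vector and holds only the
    target content chosen beforehand.
    """
    scored = [
        (content_id, inner_product(query_vector, vec))
        for content_id, vec in target_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


targets = {"a": [1.0, 0.0], "b": [0.0, 3.0], "c": [2.0, 1.0]}
results = rank_by_inner_product([1.0, 2.0], targets)
```

Because the inner product grows with the norm of the content vector, content with a large norm can rank highly even when its direction differs from the query, which is why the norm-based selection of target content matters.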

The output unit 38 outputs the search results generated by the search unit 37 to the user terminal 50.

Next, the processing procedures of information processing 1 to 3 executed by the information processing device 1 according to the embodiment will be described with reference to FIGS. 8 to 10. FIGS. 8 to 10 are flowcharts showing the processing procedures of information processing 1 to 3 executed by the information processing device 1 according to the embodiment.

First, the processing procedure of information processing 1 will be described with reference to FIG. 8.

As shown in FIG. 8, the control unit 3 first acquires a search query and a filter condition from the user terminal 50 (step S101). The control unit 3 then converts the search query and the filter condition into a query vector and a filter vector, respectively (step S102). Next, the control unit 3 generates a correction vector by correcting the query vector based on the filter vector (step S103). The control unit 3 then searches the search targets based on the correction vector (step S104). Finally, the control unit 3 outputs the search results to the user terminal 50 (step S105) and ends the process.

Next, the processing procedure of information processing 2 will be described with reference to FIG. 9.

As shown in FIG. 9, the control unit 3 first acquires a search query from the user terminal 50 (step S201). The control unit 3 then converts the search query and the contents into a query vector and content vectors in an inner product space (step S202). Next, the control unit 3 sets content whose norm difference from the query vector is equal to or greater than a predetermined value as target content (step S204). The control unit 3 then calculates the inner product of the content vector of each target content and the query vector (step S205). Next, the control unit 3 generates search results based on the calculation results (step S206). Finally, the control unit 3 outputs the search results to the user terminal 50 (step S207) and ends the process.

Next, the processing procedure of information processing 3 will be described with reference to FIG. 10.

As shown in FIG. 10, the control unit 3 first classifies the content vectors into clusters (step S301). The control unit 3 then acquires a search query and converts it into a query vector (step S302). Next, the control unit 3 identifies a cluster similar to the query vector (step S303). The control unit 3 then sets the content included in the identified cluster as target content (step S304). Next, the control unit 3 calculates the inner product of the content vector of each target content and the query vector (step S305). The control unit 3 then generates search results based on the calculation results (step S306). Finally, the control unit 3 outputs the search results to the user terminal 50 (step S307) and ends the process.
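The order of steps S301 to S306 can be sketched as a small index that is built once and then queried. This is only an illustration under several assumptions: the query is already a vector (the embedding in step S302 is left to the caller), clusters are formed by a crude hypothetical cell made of a norm bucket and the dominant component, and the nearest centroid by Euclidean distance stands in for "similar cluster". The names ClusterIndex, _default_key, and norm_step are not from the specification.

```python
import math
from collections import defaultdict


def _norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def _default_key(vec, norm_step=1.0):
    # Hypothetical cell: norm bucket plus index of the largest component,
    # a rough stand-in for regions of direction and norm.
    dominant = max(range(len(vec)), key=lambda i: abs(vec[i]))
    return (math.floor(_norm(vec) / norm_step), dominant)


class ClusterIndex:
    def __init__(self, content_vectors, key_fn=_default_key):
        # Step S301: classify content vectors into clusters before any query.
        self.vectors = dict(content_vectors)
        self.members = defaultdict(list)
        for content_id, vec in self.vectors.items():
            self.members[key_fn(vec)].append(content_id)
        # Representative vector (centroid) of each cluster.
        self.centroids = {}
        for key, ids in self.members.items():
            dim = len(self.vectors[ids[0]])
            self.centroids[key] = [
                sum(self.vectors[c][i] for c in ids) / len(ids) for i in range(dim)
            ]

    def search(self, query_vector, top_k=10):
        # Step S303: identify the cluster whose centroid is nearest the query.
        key = min(
            self.centroids,
            key=lambda k: math.dist(query_vector, self.centroids[k]),
        )
        # Step S304: the contents of that cluster become the target content.
        targets = self.members[key]
        # Step S305: inner product of each target content vector and the query.
        scored = [
            (sum(q * x for q, x in zip(query_vector, self.vectors[c])), c)
            for c in targets
        ]
        # Step S306: higher inner product ranks higher.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [content_id for _, content_id in scored[:top_k]]


index = ClusterIndex({
    "a": [1.0, 0.1],
    "b": [1.2, 0.2],
    "c": [0.1, 1.0],
    "d": [0.2, 1.3],
})
ranking = index.search([1.0, 0.0])
```

Building the clusters in the constructor reflects that step S301 precedes query acquisition in FIG. 10, so only steps S303 onward run per query.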

[Others]
Among the processes described in the above embodiment, some of the processes described as being performed automatically can also be performed manually. Alternatively, all or part of the processes described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. For example, the various information shown in each drawing is not limited to the illustrated information.

Each component of each illustrated device is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

For example, part or all of the storage unit 4 shown in FIG. 5 may be held in a storage server or the like instead of being held by each device. In this case, each device acquires various information by accessing the storage server.

[Hardware configuration]
The information processing device 1 according to the embodiment described above is implemented by, for example, a computer 1000 configured as shown in FIG. 11. FIG. 11 is a diagram showing an example of the hardware configuration. The computer 1000 is connected to an output device 1010 and an input device 1020, and has a configuration in which an arithmetic device 1030, a primary storage device 1040, a secondary storage device 1050, an output IF (Interface) 1060, an input IF 1070, and a network IF 1080 are connected by a bus 1090.

The arithmetic device 1030 operates based on programs stored in the primary storage device 1040 or the secondary storage device 1050, programs read from the input device 1020, and the like, and executes various processes. The primary storage device 1040 is a memory device, such as a RAM, that temporarily stores data used by the arithmetic device 1030 for various calculations. The secondary storage device 1050 is a storage device in which data used by the arithmetic device 1030 for various calculations and various databases are registered, and is implemented by a ROM (Read Only Memory), an HDD (Hard Disk Drive), a flash memory, or the like.

The output IF 1060 is an interface for transmitting information to be output to the output device 1010, which outputs various kinds of information, such as a monitor or a printer. It is implemented by a connector conforming to a standard such as USB (Universal Serial Bus), DVI (Digital Visual Interface), or HDMI (registered trademark) (High Definition Multimedia Interface). The input IF 1070 is an interface for receiving information from various input devices 1020 such as a mouse, a keyboard, and a scanner, and is implemented by, for example, USB.

The input device 1020 may be, for example, a device that reads information from an optical recording medium such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, a semiconductor memory, or the like. The input device 1020 may also be an external storage medium such as a USB memory.

The network IF 1080 receives data from other devices via the network N and sends the data to the arithmetic device 1030, and also transmits data generated by the arithmetic device 1030 to other devices via the network N.

The arithmetic device 1030 controls the output device 1010 and the input device 1020 via the output IF 1060 and the input IF 1070. For example, the arithmetic device 1030 loads a program from the input device 1020 or the secondary storage device 1050 onto the primary storage device 1040 and executes the loaded program.

For example, when the computer 1000 functions as the information processing device 1, the arithmetic device 1030 of the computer 1000 implements the functions of the control unit 3 by executing the program loaded onto the primary storage device 1040.

[Effects]
As described above, the information processing device 1 according to the embodiment includes the conversion unit 32, the generation unit 33, and the search unit 37. The conversion unit 32 converts a search query input by a user into a query vector and converts a filter condition specifying the search range of the search targets into a filter vector. The generation unit 33 generates a correction vector by correcting the query vector based on the filter vector. The search unit 37 searches the search targets based on the correction vector. The generation unit 33 generates, as the correction vector, the output of a function that takes the filter vector and the query vector as variables. The generation unit 33 generates, as the correction vector, the output of a function that multiplies the filter vector and the query vector by respective weights and adds the results. Each weight is set to a value that depends on the number of search targets matching the filter condition. The generation unit 33 generates the centroid of the filter vector and the query vector as the correction vector. The generation unit 33 generates the correction vector based on the filter vector and a user vector obtained by vectorizing user information about the user. The user information includes at least one of attribute information and behavior information of the user. The generation unit 33 generates the correction vector based on a vector obtained by vectorizing user information of another user similar to the user. The conversion unit 32 converts a plurality of filter conditions into a plurality of corresponding filter vectors. The generation unit 33 generates the correction vector based on the plurality of filter vectors. With this configuration, the quality of search results can be improved.
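The weighted-sum and centroid forms of the correction vector can be sketched as below. The weights are left as plain parameters because the specification states only that they depend on the number of search targets matching the filter condition, not how; the function names are hypothetical.

```python
def weighted_correction(query_vector, filter_vector, w_query, w_filter):
    """Correction vector as a weighted sum of query and filter vectors.

    w_query and w_filter are assumed inputs; how they are derived from the
    number of matching search targets is not specified here.
    """
    return [
        w_query * q + w_filter * f
        for q, f in zip(query_vector, filter_vector)
    ]


def centroid_correction(query_vector, filter_vectors):
    """Correction vector as the centroid of the query vector and one or
    more filter vectors."""
    vectors = [query_vector, *filter_vectors]
    dim = len(query_vector)
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


corrected = weighted_correction([1.0, 0.0], [0.0, 1.0], w_query=0.75, w_filter=0.25)
center = centroid_correction([1.0, 0.0], [[0.0, 1.0]])
```

With a single filter vector, the centroid form is the weighted sum with both weights equal to 0.5; passing several filter vectors corresponds to the case of a plurality of filter conditions.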

As also described above, the information processing device 1 according to the embodiment includes the conversion unit 32 and the setting unit 36. The conversion unit 32 converts the search query input by the user and each content to be searched into vectors in an inner product space. The setting unit 36 extracts, from the content vectors corresponding to the respective contents, content vectors whose norm is equal to or greater than a predetermined value, and sets the content corresponding to the extracted content vectors as target content for the inner product calculation with the query vector corresponding to the search query. The setting unit 36 clusters the content vectors according to their norm and sets the target content based on the norm difference between the query vector and the representative vector of the content vectors included in each cluster. The representative vector is the centroid of the content vectors included in the cluster. The setting unit 36 sets, as target content, content for which the norm difference is equal to or greater than a predetermined value and the norm of the content vector is equal to or greater than a predetermined value. The setting unit 36 sets, as target content, content for which the norm difference is equal to or greater than a predetermined value and the norm of the content vector is within a predetermined proportion from the top. The setting unit 36 classifies the content vectors into clusters for each content category and sets the target content from among the clusters that match the category of the search query. With this configuration, the quality of search results can be improved.

As also described above, the information processing device 1 according to the embodiment includes the classification unit 34, the identification unit 35, and the search unit 37. The classification unit 34 classifies the content vectors corresponding to the respective contents to be searched into clusters. The identification unit 35 identifies a cluster similar to the query vector corresponding to the search query input by the user. The search unit 37 searches, as search targets, the content corresponding to the content vectors included in the identified cluster. The identification unit 35 identifies a cluster for which the representative vector of the content vectors included in the cluster is similar to the query vector. The representative vector is the centroid of the content vectors included in the cluster. The identification unit 35 extracts clusters that match a filter condition specifying the search range of the search targets and identifies, from among the extracted clusters, a cluster similar to the query vector. The identification unit 35 identifies a cluster whose direction and norm are similar to those of the query vector. With this configuration, the quality of search results can be improved.

Some embodiments of the present application have been described above in detail with reference to the drawings, but these are examples. The present invention can be implemented in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention.

[Others]
Among the processes described in the above embodiment, all or part of the processes described as being performed automatically can also be performed manually, or all or part of the processes described as being performed manually can be performed automatically by known methods. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed arbitrarily unless otherwise specified. For example, the various information shown in each drawing is not limited to the illustrated information.

The processes described in the above embodiment can be combined as appropriate to the extent that the processing contents do not contradict each other.

The "unit" (section, module, unit) described above can be read as "means", "circuit", or the like. For example, the control unit 3 can be read as control means or a control circuit.

1 Information processing device
2 Communication unit
3 Control unit
4 Storage unit
31 Acquisition unit
32 Conversion unit
33 Generation unit
34 Classification unit
35 Identification unit
36 Setting unit
37 Search unit
38 Output unit
41 Content information
42 User information
50 User terminal
N Network
S Information processing system

Claims

1. An information processing device comprising:
a conversion unit that converts a search query input by a user and each content to be searched into vectors in an inner product distance space;
a classification unit that classifies content vectors corresponding to the respective contents into clusters;
an identification unit that identifies a cluster similar to a query vector corresponding to the search query;
a search unit that searches for content corresponding to the content vectors included in the identified cluster by calculating an inner product of each content vector and the query vector; and
a setting unit that sets, as target content for the inner product calculation in the search unit, content for which a norm difference between the query vector and the content vector is equal to or greater than a predetermined value.

2. The information processing device according to claim 1, wherein the identification unit identifies the cluster for which a representative vector of the content vectors included in the cluster is similar to the query vector.

3. The information processing device according to claim 2, wherein the representative vector is the centroid of the content vectors included in the cluster.

4. The information processing device according to any one of claims 1 to 3, wherein the identification unit extracts clusters that match a filter condition specifying a search range of the search targets and identifies, from among the extracted clusters, the cluster similar to the query vector.

5. The information processing device according to any one of claims 1 to 4, wherein the identification unit identifies the cluster whose direction and norm are similar to those of the query vector.

6. The information processing device according to any one of claims 1 to 5, further comprising:
a conversion unit that converts the search query into a query vector and converts a filter condition specifying a search range of the search targets into a filter vector; and
a generation unit that generates a correction vector by correcting the query vector based on the filter vector,
wherein the identification unit identifies the cluster similar to the correction vector.

7. An information processing method executed by a computer, comprising:
a conversion step of converting a search query input by a user and each content to be searched into vectors in an inner product distance space;
a classification step of classifying content vectors corresponding to the respective contents into clusters;
an identification step of identifying a cluster similar to a query vector corresponding to the search query;
a search step of searching, as search targets, content corresponding to the content vectors included in the identified cluster by inner product calculation with the query vector; and
a setting step of setting, as target content for the inner product calculation in the search step, content for which a norm difference between the query vector and the content vector is equal to or greater than a predetermined value.

8. An information processing program for causing a computer to execute:
a conversion procedure of converting a search query input by a user and each content to be searched into vectors in an inner product distance space;
a classification procedure of classifying content vectors corresponding to the respective contents into clusters;
an identification procedure of identifying a cluster similar to a query vector corresponding to the search query;
a search procedure of searching, as search targets, content corresponding to the content vectors included in the identified cluster by inner product calculation with the query vector; and
a setting procedure of setting, as target content for the inner product calculation in the search procedure, content for which a norm difference between the query vector and the content vector is equal to or greater than a predetermined value.