JP7309673B2

JP7309673B2 - Information processing device, information processing method, and program

Info

Publication number: JP7309673B2
Application number: JP2020141065A
Authority: JP
Inventors: 茂莉黒川; 慧米川
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-08-24
Filing date: 2020-08-24
Publication date: 2023-07-18
Anticipated expiration: 2040-08-24
Also published as: JP2022036713A

Description

The present invention relates to an information processing device, an information processing method, and a program.

When recommending items to a user, one known technique refers to the past evaluations of similar users, that is, users who have rated the same items in the same way as the target user to whom items are to be recommended, in order to estimate and fill in the items that the target user is likely to rate. Some such techniques are also used to approach the customers of another business, for example by recommending items from one's own business. There is also a technique that assigns a common cluster to customers shared among a plurality of businesses and associates data on the basis of that cluster (see, for example, Patent Document 1).

Patent Document 1: JP 2019-175419 A

The above technique presupposes that customers have already been classified into clusters; it is not a technique for improving clustering accuracy. However, because cluster classification accuracy affects estimation accuracy, fixing the cluster classification is considered to limit how far estimation accuracy can be improved. There is thus room for improvement in techniques that estimate missing information on the basis of similar information.

The present invention has been made in view of these points, and its object is to provide a technique for improving the accuracy with which missing information is estimated on the basis of similar information.

A first aspect of the present invention is an information processing device. The device comprises: a user group information acquisition unit that acquires (1) first user group information including a user identifier for identifying each user included in a first user group, a plurality of pieces of first attribute information included in a first attribute category for each user, and a plurality of pieces of second attribute information included in a second attribute category for each user, (2) second user group information including a user identifier for identifying each user included in a second user group and only a plurality of pieces of first attribute information included in the first attribute category for each user, and (3) third user group information including a user identifier for identifying each user included in a third user group and only a plurality of pieces of second attribute information included in the second attribute category for each user; a probability setting unit that sets belonging probabilities, which indicate the probability that each user included in the first user group belongs to each of a plurality of clusters, and attribute probabilities, which indicate the realization probability of each piece of first attribute information and second attribute information in each cluster; a complementary data generation unit that generates, based on the belonging probabilities and the attribute probabilities, a plurality of pieces of second attribute information for each user included in the second user group and a plurality of pieces of first attribute information for each user included in the third user group; and an updating unit that uses a generative adversarial network to update a discriminant function, the belonging probabilities, and the attribute probabilities, based on (a) a discriminant evaluation function determined from the discriminant function, which identifies whether the first attribute information and second attribute information of each user included in the first user group, the second user group, and the third user group include information generated by the complementary data generation unit, and (b) a generation evaluation function that evaluates the error between the generated first attribute information or second attribute information and the first attribute information or second attribute information of each user included in the first user group, the second user group, and the third user group. The complementary data generation unit generates, based on the belonging probabilities and attribute probabilities updated by the updating unit, a plurality of pieces of second attribute information for each user included in the second user group and a plurality of pieces of first attribute information for each user included in the third user group.

The probability setting unit may include: an initial probability setting unit that sets initial values of the attribute probabilities; a belonging probability updating unit that updates the belonging probabilities by the E step of an EM (Expectation-Maximization) algorithm; an attribute probability updating unit that updates the attribute probabilities by the M step of the EM algorithm; and an optimization unit that causes the belonging probability updating unit to update the belonging probabilities, and the attribute probability updating unit to update the attribute probabilities, until a predetermined convergence condition is satisfied.
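The alternation of E and M steps described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes a textbook Bernoulli mixture (each 0/1 attribute is independent within a cluster), equal and fixed cluster priors, random initial attribute probabilities, and a convergence test on the largest change in attribute probability. The function names `e_step`, `m_step`, and `fit`, the tolerance, and the sample rows are all hypothetical.

```python
import random


def e_step(rows, attr_prob, prior):
    """E step: belonging probability P(cluster | user) for every row."""
    belong = []
    for x in rows:
        joint = []
        for p_k, prior_k in zip(attr_prob, prior):
            value = prior_k
            for x_j, p in zip(x, p_k):
                # Independent 0/1 attributes within a cluster (assumption).
                value *= p if x_j == 1 else 1.0 - p
            joint.append(value)
        total = sum(joint)
        belong.append([v / total for v in joint])
    return belong


def m_step(rows, belong, eps=1e-6):
    """M step: attribute probability = weighted share of 1s per cluster."""
    n_clusters = len(belong[0])
    n_attrs = len(rows[0])
    attr_prob = []
    for k in range(n_clusters):
        weight = sum(b[k] for b in belong)
        probs = []
        for j in range(n_attrs):
            ones = sum(b[k] * x[j] for b, x in zip(belong, rows))
            # Clamp so that no probability becomes exactly 0 or 1.
            probs.append(min(max(ones / weight, eps), 1.0 - eps))
        attr_prob.append(probs)
    return attr_prob


def fit(rows, n_clusters=2, max_iter=200, tol=1e-8, seed=0):
    """Alternate E and M steps until the attribute probabilities settle."""
    rng = random.Random(seed)
    prior = [1.0 / n_clusters] * n_clusters  # equal priors, kept fixed
    attr_prob = [[rng.uniform(0.1, 0.9) for _ in rows[0]]
                 for _ in range(n_clusters)]
    for _ in range(max_iter):
        belong = e_step(rows, attr_prob, prior)
        new_prob = m_step(rows, belong)
        change = max(abs(a - b)
                     for old, new in zip(attr_prob, new_prob)
                     for a, b in zip(old, new))
        attr_prob = new_prob
        if change < tol:  # hypothetical convergence condition
            break
    return e_step(rows, attr_prob, prior), attr_prob


# Hypothetical first-user-group rows: (S1, S2, S3, T1, T2).
rows = [(1, 1, 1, 1, 0), (1, 0, 1, 1, 0), (0, 1, 0, 0, 1), (0, 0, 0, 0, 1)]
belong, attr_prob = fit(rows)
```

Here `belong[i][k]` plays the role of the belonging probability of user `i` for cluster `k`, and `attr_prob[k][j]` the attribute probability of attribute `j` in cluster `k`. Clamping the attribute probabilities is a design choice of this sketch to keep every likelihood strictly positive.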

The first attribute category and the second attribute category may each indicate a different category of products, and the first attribute information and the second attribute information may each be information indicating whether or not a product included in the first attribute category or the second attribute category, respectively, has been purchased. The information processing device may further include a recommended product selection unit that selects products of the second attribute category to recommend to each user of the second user group, based on the second attribute information generated by the complementary data generation unit for the second user group, and that selects products of the first attribute category to recommend to each user of the third user group, based on the first attribute information generated by the complementary data generation unit for the third user group.

A second aspect of the present invention is an information processing method. In this method, a processor executes: a step of acquiring (1) first user group information including a user identifier for identifying each user included in a first user group, a plurality of pieces of first attribute information included in a first attribute category for each user, and a plurality of pieces of second attribute information included in a second attribute category for each user, (2) second user group information including a user identifier for identifying each user included in a second user group and only a plurality of pieces of first attribute information included in the first attribute category for each user, and (3) third user group information including a user identifier for identifying each user included in a third user group and only a plurality of pieces of second attribute information included in the second attribute category for each user; a step of setting belonging probabilities, which indicate the probability that each user included in the first user group belongs to each of a plurality of clusters, and attribute probabilities, which indicate the realization probability of each piece of first attribute information and second attribute information in each cluster; a step of generating, based on the belonging probabilities and the attribute probabilities, a plurality of pieces of second attribute information for each user included in the second user group and a plurality of pieces of first attribute information for each user included in the third user group; a step of using a generative adversarial network to update a discriminant function, the belonging probabilities, and the attribute probabilities, based on (a) a discriminant evaluation function determined from the discriminant function, which identifies whether the first attribute information and second attribute information of each user included in the first user group, the second user group, and the third user group include generated information, and (b) a generation evaluation function that evaluates the error between the generated first attribute information or second attribute information and the first attribute information or second attribute information of each user included in the first user group, the second user group, and the third user group; and a step of generating, based on the updated belonging probabilities and attribute probabilities, a plurality of pieces of second attribute information for each user included in the second user group and a plurality of pieces of first attribute information for each user included in the third user group.

A third aspect of the present invention is a program. The program causes a computer to realize: a function of acquiring (1) first user group information including a user identifier for identifying each user included in a first user group, a plurality of pieces of first attribute information included in a first attribute category for each user, and a plurality of pieces of second attribute information included in a second attribute category for each user, (2) second user group information including a user identifier for identifying each user included in a second user group and only a plurality of pieces of first attribute information included in the first attribute category for each user, and (3) third user group information including a user identifier for identifying each user included in a third user group and only a plurality of pieces of second attribute information included in the second attribute category for each user; a function of setting belonging probabilities, which indicate the probability that each user included in the first user group belongs to each of a plurality of clusters, and attribute probabilities, which indicate the realization probability of each piece of first attribute information and second attribute information in each cluster; a function of generating, based on the belonging probabilities and the attribute probabilities, a plurality of pieces of second attribute information for each user included in the second user group and a plurality of pieces of first attribute information for each user included in the third user group; a function of using a generative adversarial network to update a discriminant function, the belonging probabilities, and the attribute probabilities, based on (a) a discriminant evaluation function determined from the discriminant function, which identifies whether the first attribute information and second attribute information of each user included in the first user group, the second user group, and the third user group include generated information, and (b) a generation evaluation function that evaluates the error between the generated first attribute information or second attribute information and the first attribute information or second attribute information of each user included in the first user group, the second user group, and the third user group; and a function of generating, based on the updated belonging probabilities and attribute probabilities, a plurality of pieces of second attribute information for each user included in the second user group and a plurality of pieces of first attribute information for each user included in the third user group.

To provide this program, or to update part of it, a computer-readable recording medium on which the program is recorded may be provided, or the program may be transmitted over a communication line.

Any combination of the above components, and any conversion of the expression of the present invention among methods, devices, systems, computer programs, data structures, recording media, and the like, are also valid as aspects of the present invention.

According to the present invention, the accuracy with which missing information is estimated on the basis of similar information can be improved.

FIG. 1 is a diagram for explaining an overview of the estimation processing executed by the information processing device according to the embodiment.
FIG. 2 is a diagram schematically showing the functional configuration of the information processing device according to the embodiment.
FIG. 3 is a diagram showing an example of the initial values of the belonging probabilities and attribute probabilities set by the probability setting unit.
FIG. 4 is a diagram showing the user database after complementation by the complementary data generation unit according to the embodiment.
FIG. 5 is a diagram showing the complemented user database after binarization processing.
FIG. 6 is a flowchart for explaining the flow of information processing executed by the information processing device according to the embodiment.

<Overview of Embodiment>
An overview of the embodiment is given first. The information processing device according to the embodiment estimates and fills in missing information on the basis of information related to that missing information. As a specific example, the device uses the purchase histories of a group of users who have purchase histories in both a group of items in one category (for example, fresh food) and a group of items in another category (for example, instant food). From these it estimates, for users who have purchase histories only for items in one of the categories, their tendency to purchase or not purchase items in the other category.

FIGS. 1(a) and 1(b) are diagrams for explaining an overview of the estimation processing executed by the information processing device according to the embodiment. Specifically, FIG. 1(a) is a diagram for explaining the relationship between user attributes and user groups, and FIG. 1(b) is a diagram for explaining the information to be estimated by the information processing device according to the embodiment.

In FIG. 1(a), the first user attribute is information indicating the purchase history of items included in a first category. The second user attribute indicates the purchase history of items included in a second category different from the first category. In FIG. 1(a), the first user group is the set of users who have both the first user attribute and the second user attribute.

The second user group is the set of users who have the first user attribute but not the second user attribute. That is, the users included in the second user group are users whose purchasing tendency for items in the second category cannot be grasped. The third user group, on the other hand, is the set of users who have the second user attribute but not the first user attribute. That is, the users included in the third user group are users whose purchasing tendency for items in the first category cannot be grasped.

FIG. 1(b) is a diagram schematically showing the data structure of a user database that stores, in association with each other, a user identifier for identifying each user and the attribute information of that user. The information processing device according to the embodiment manages users by assigning a unique user identifier to each of them. In FIG. 1(b), the first user group consists of the four users assigned the user identifiers UID0001, UID0002, UID0003, and UID0004. The second user group consists of the five users whose user identifiers are UID0005, UID0006, UID0007, UID0008, and UID0009, and the third user group consists of the two users whose user identifiers are UID0010 and UID0011.

In FIG. 1(b), the first attribute information includes three pieces of information denoted "S1", "S2", and "S3". Each indicates whether or not one of three different items included in the first category has been purchased: "1" indicates a purchase and "0" indicates no purchase. For example, "S1" is "1" for the user whose user identifier is UID0001, which shows that this user has purchased the item corresponding to S1 in the past. Similarly, "T1" and "T2" in the second attribute information indicate whether or not each of two different items included in the second category has been purchased. For example, the user whose user identifier is UID0004 has never purchased the item corresponding to T1.

As indicated by the hatching that rises to the right in FIG. 1(b), the second attribute information is blank for every user included in the second user group. This shows that the second attribute information is unknown for the users in the second user group. Likewise, as indicated by the hatching that falls to the right in FIG. 1(b), the first attribute information is blank for every user included in the third user group. This shows that the first attribute information is unknown for the users in the third user group. The purpose of the processing performed by the information processing device according to the embodiment is to estimate the hatched regions in FIG. 1(b), that is, the second attribute information of the second user group and the first attribute information of the third user group.
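One way to picture the user database of FIG. 1(b) is the following sketch, in which an unknown attribute block is represented by `None`. The class name `UserRecord`, the helper functions, and most of the attribute values are hypothetical placeholders. Only the facts that S1 is 1 for UID0001, that the second attribute information of UID0005 is unknown, and that the first attribute information of UID0010 is unknown come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class UserRecord:
    user_id: str
    first_attrs: Optional[Tuple[int, ...]]   # (S1, S2, S3), None if unknown
    second_attrs: Optional[Tuple[int, ...]]  # (T1, T2), None if unknown


def user_group(record: UserRecord) -> int:
    """Return 1, 2, or 3 according to which attribute blocks are known."""
    if record.first_attrs is not None and record.second_attrs is not None:
        return 1  # both categories known
    if record.first_attrs is not None:
        return 2  # only the first category known
    if record.second_attrs is not None:
        return 3  # only the second category known
    raise ValueError(f"{record.user_id} has no attribute information")


def category_to_estimate(record: UserRecord) -> Optional[str]:
    """Name the attribute block that has to be estimated, if any."""
    return {1: None, 2: "second", 3: "first"}[user_group(record)]


# Placeholder values, except S1 = 1 for UID0001.
records = [
    UserRecord("UID0001", (1, 0, 1), (1, 0)),
    UserRecord("UID0005", (1, 1, 1), None),
    UserRecord("UID0010", None, (0, 1)),
]
groups = [user_group(r) for r in records]
targets = [category_to_estimate(r) for r in records]
```

Keeping each attribute block as a single optional tuple mirrors the figure, where a whole block is either filled in or hatched for a given user.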

In FIGS. 1(a) and 1(b) the total number of users is 11 for convenience of explanation, but in general there are often more than 11 users, and each kind of attribute information often has more than three types. The information processing device according to the embodiment calculates the statistical relationship between the first attribute information and the second attribute information by analyzing the attribute information of the first user group. Suppose, for example, that users whose first attribute information is all "1" tend strongly to have "T1" equal to "1" in their second attribute information. In that case, users in the second user group whose first attribute information is all "1" are considered highly likely to purchase the product corresponding to T1 in the second attribute information.

Although the details are described later, the information processing device according to the embodiment sets two or more clusters and estimates the probability that each user belongs to each cluster while updating it with the known EM (Expectation-Maximization) algorithm. The device further uses the framework of known generative adversarial networks (GANs) to raise estimation accuracy, in the sense of making it hard to distinguish whether a given piece of information was present originally or was estimated. As a result, the device can use the calculated relationship to improve the accuracy with which it estimates the second attribute information of the second user group and the first attribute information of the third user group.

<Functional Configuration of Information Processing Device 1 According to Embodiment>
FIG. 2 is a diagram schematically showing the functional configuration of the information processing device 1 according to the embodiment. The information processing device 1 includes a storage unit 2 and a control unit 3. In FIG. 2, arrows indicate the main data flows, and there may be data flows that are not shown. Each functional block in FIG. 2 represents a unit of function, not a unit of hardware (device). The functional blocks shown in FIG. 2 may therefore be implemented within a single device or distributed across a plurality of devices. Data may be exchanged between functional blocks by any means, such as a data bus, a network, or a portable storage medium.

The storage unit 2 comprises a ROM (Read Only Memory) that stores the BIOS (Basic Input Output System) and the like of the computer implementing the information processing device 1, a RAM (Random Access Memory) that serves as the work area of the information processing device 1, and a mass storage device such as an HDD (Hard Disk Drive) or SSD (Solid State Drive) that stores the OS (Operating System), application programs, and various information referred to when the application programs are executed. The storage unit 2 also stores the user database shown in FIG. 1(b).

The control unit 3 is a processor of the information processing device 1, such as a CPU (Central Processing Unit) or GPU (Graphics Processing Unit). By executing a program stored in the storage unit 2, it functions as a user group information acquisition unit 30, a probability setting unit 31, a complementary data generation unit 32, an updating unit 33, and a recommended product selection unit 34. The probability setting unit 31 includes an initial probability setting unit 310, a belonging probability updating unit 311, an attribute probability updating unit 312, and an optimization unit 313.

FIG. 2 shows an example in which the information processing device 1 is a single device. However, the information processing device 1 may instead be realized by computational resources such as a plurality of processors and memories, as in a cloud computing system. In that case, each unit constituting the control unit 3 is realized by at least one of the plurality of different processors executing the program.

The user group information acquisition unit 30 refers to the user database stored in the storage unit 2 and acquires first user group information, which includes a user identifier for identifying each user included in the first user group, a plurality of pieces of first attribute information included in the first attribute category for each user, and a plurality of pieces of second attribute information included in the second attribute category for each user. The user group information acquisition unit 30 also refers to the user database and acquires second user group information, which includes a user identifier for identifying each user included in the second user group and only a plurality of pieces of first attribute information included in the first attribute category for each user. Further, the user group information acquisition unit 30 refers to the user database and acquires third user group information, which includes a user identifier for identifying each user included in the third user group and only a plurality of pieces of second attribute information included in the second attribute category for each user.

The probability setting unit 31 sets belonging probabilities, which indicate the probability that each user included in the first user group belongs to each of a plurality of clusters, and attribute probabilities, which indicate the realization probability of each piece of first attribute information and second attribute information in each cluster.

FIGS. 3(a) to 3(c) are diagrams showing examples of the belonging probabilities and attribute probabilities set by the probability setting unit 31. Specifically, FIG. 3(a) shows an example of the initial values of the attribute probabilities set by the probability setting unit 31, and FIG. 3(b) shows the belonging probabilities calculated from those initial values. The initial probability setting unit 310 of the probability setting unit 31 sets the initial values of the attribute probabilities using, for example, random numbers.

For example, the probability P(first cluster | ID=UID0005) that the user whose user identifier is UID0005 belongs to the first cluster can be calculated as follows.
P(first cluster | ID=UID0005) ∝ P(ID=UID0005, first cluster) = P(S1=1 | first cluster) × P(S2=1 | first cluster) × P(S3=1 | first cluster) = 0.89 × 0.0013 × 0.7499 = 0.000896.

Similarly, the probability P(second cluster | ID=UID0005) that the user whose user identifier is UID0005 belongs to the second cluster can be calculated as follows.
P(second cluster | ID=UID0005) ∝ P(ID=UID0005, second cluster) = P(S1=1 | second cluster) × P(S2=1 | second cluster) × P(S3=1 | second cluster) = 0.9856 × 0.0450 × 0.3888 = 0.01723.

Strictly speaking, the right-hand sides of P(first cluster | ID=UID00005) and P(second cluster | ID=UID00005) should each be multiplied by the prior probability of the corresponding cluster, P(first cluster) or P(second cluster). For simplicity, these factors are omitted here on the assumption that P(first cluster) = P(second cluster) = 0.5. Alternatively, P(first cluster) and P(second cluster) may be estimated rather than omitted.

From these values, the posterior probability P(first cluster | ID=UID00005) of the first cluster for the user whose user identifier is UID0005 can be calculated from P(ID=UID00005, first cluster) and P(ID=UID00005, second cluster) as follows.

P(first cluster | ID=UID00005) = P(ID=UID00005, first cluster) / (P(ID=UID00005, first cluster) + P(ID=UID00005, second cluster)) = 0.000896 / (0.000896 + 0.01723) = 0.0494.

On the other hand, the posterior probability P(second cluster | ID=UID00005) of the second cluster for the user whose user identifier is UID0005 can be calculated as follows.

P(second cluster | ID=UID00005) = 1 − P(first cluster | ID=UID00005) = 0.9506.
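The normalization described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `joint_likelihood` and `posteriors` are hypothetical, equal priors are assumed (as in the text), and the joint values are taken exactly as stated in the description.

```python
from math import prod

def joint_likelihood(attr_probs, observed):
    """Product of Bernoulli terms: P(observed attributes | cluster).

    With equal cluster priors this is proportional to the joint probability.
    """
    return prod(p if x == 1 else 1.0 - p for p, x in zip(attr_probs, observed))

def posteriors(joints):
    """Normalize per-cluster joint values so that they sum to 1."""
    total = sum(joints)
    return [j / total for j in joints]

# Joint values for the user in the worked example, as stated in the text.
joint_first, joint_second = 0.000896, 0.01723
post_first, post_second = posteriors([joint_first, joint_second])
print(round(post_first, 4), round(post_second, 4))  # 0.0494 0.9506
```

If unequal priors were estimated instead, each joint value would simply be multiplied by its cluster prior before normalization.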

Note that the process of generating the belonging probabilities shown in FIG. 3(b) from the attribute probabilities shown in FIG. 3(a) corresponds to the E step of the EM algorithm. The belonging probability updating unit 311 of the probability setting unit 31 updates the belonging probabilities by executing the E step of the EM algorithm.

The attribute probability updating unit 312 of the probability setting unit 31 updates the attribute probabilities by executing the M step of the EM algorithm based on the belonging probabilities shown in FIG. 3(b). The optimization unit 313 of the probability setting unit 31 causes the belonging probability updating unit 311 to update the belonging probabilities and the attribute probability updating unit 312 to update the attribute probabilities until a predetermined convergence condition is satisfied. As a result, the probability setting unit 31 finally generates the attribute probabilities shown in FIG. 3(c). Although not illustrated, the attribute probability updating unit 312 also calculates the final belonging probabilities based on the attribute probabilities shown in FIG. 3(c).
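The alternation between the E step and the M step until convergence can be sketched as follows. This is a minimal sketch under stated assumptions, not the embodiment's implementation: it assumes binary attributes, equal cluster priors, random initial attribute probabilities, and a convergence condition based on the largest change in any attribute probability. All function names, the tolerance, and the toy data are hypothetical.

```python
import random

def e_step(data, attr_probs):
    """Belonging probabilities of each user for each cluster (equal priors assumed)."""
    memberships = []
    for row in data:
        joints = []
        for probs in attr_probs:
            j = 1.0
            for x, p in zip(row, probs):
                j *= p if x == 1 else 1.0 - p
            joints.append(j)
        total = sum(joints)
        memberships.append([j / total for j in joints])
    return memberships

def m_step(data, memberships, n_clusters, eps=1e-6):
    """Attribute probabilities as membership-weighted averages, clamped away from 0 and 1."""
    attr_probs = []
    for c in range(n_clusters):
        weight = sum(m[c] for m in memberships)
        probs = []
        for a in range(len(data[0])):
            p = sum(row[a] * m[c] for row, m in zip(data, memberships)) / weight
            probs.append(min(max(p, eps), 1.0 - eps))
        attr_probs.append(probs)
    return attr_probs

def run_em(data, n_clusters=2, tol=1e-6, max_iter=200, seed=0):
    """Repeat E and M steps until the attribute probabilities stop changing."""
    rng = random.Random(seed)
    attr_probs = [[rng.uniform(0.1, 0.9) for _ in data[0]] for _ in range(n_clusters)]
    for _ in range(max_iter):
        memberships = e_step(data, attr_probs)
        updated = m_step(data, memberships, n_clusters)
        change = max(abs(u - p)
                     for us, ps in zip(updated, attr_probs)
                     for u, p in zip(us, ps))
        attr_probs = updated
        if change < tol:
            break
    return attr_probs, e_step(data, attr_probs)

# Hypothetical purchase flags for six fully observed users (not from the figures).
toy_data = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 0]]
final_attr_probs, final_memberships = run_em(toy_data)
```

The clamping in `m_step` is a design choice of this sketch: it keeps every joint value strictly positive so the normalization in `e_step` never divides by zero.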

The example in FIG. 3 shows a case in which the probability setting unit 31 sets two clusters, a first cluster and a second cluster. In FIG. 3(a), for example, when a user belongs to the first cluster, the probability that the user has a purchase history for "S3" of the first attribute information is 74.99% (0.7499). The probability setting unit 31 updates the attribute probabilities and the belonging probabilities by using the EM algorithm and the GAN framework.

FIG. 3(c) is a diagram showing an example of the attribute probabilities generated by the updating process of the probability setting unit 31. For example, P(T1=1 | first cluster) and P(T1=1 | second cluster) are calculated as follows.

P(T1=1 | first cluster) =
{P(T1=1 | ID=UID0001) × P(first cluster | ID=UID0001)
+ P(T1=1 | ID=UID0002) × P(first cluster | ID=UID0002)
+ P(T1=1 | ID=UID0003) × P(first cluster | ID=UID0003)
+ P(T1=1 | ID=UID0004) × P(first cluster | ID=UID0004)
+ P(T1=1 | ID=UID0010) × P(first cluster | ID=UID0010)
+ P(T1=1 | ID=UID0011) × P(first cluster | ID=UID0011)}
/ {P(first cluster | ID=UID0001) + P(first cluster | ID=UID0002) + P(first cluster | ID=UID0003) + P(first cluster | ID=UID0004) + P(first cluster | ID=UID0010) + P(first cluster | ID=UID0011)}
= {1 × 0.5293 + 1 × 0.9970 + 1 × 0.0075 + 0 × 0.0019 + 1 × 0.9558 + 0 × 0.1481} / {0.5293 + 0.9970 + 0.0075 + 0.0019 + 0.9558 + 0.1481}
= 0.9432

P(T1=1 | second cluster) =
{P(T1=1 | ID=UID0001) × P(second cluster | ID=UID0001)
+ P(T1=1 | ID=UID0002) × P(second cluster | ID=UID0002)
+ P(T1=1 | ID=UID0003) × P(second cluster | ID=UID0003)
+ P(T1=1 | ID=UID0004) × P(second cluster | ID=UID0004)
+ P(T1=1 | ID=UID0010) × P(second cluster | ID=UID0010)
+ P(T1=1 | ID=UID0011) × P(second cluster | ID=UID0011)}
/ {P(second cluster | ID=UID0001) + P(second cluster | ID=UID0002) + P(second cluster | ID=UID0003) + P(second cluster | ID=UID0004) + P(second cluster | ID=UID0010) + P(second cluster | ID=UID0011)}
= {1 × 0.4707 + 1 × 0.0030 + 1 × 0.9925 + 0 × 0.9981 + 1 × 0.0442 + 0 × 0.8519} / {0.4707 + 0.0030 + 0.9925 + 0.9981 + 0.0442 + 0.8519}
= 0.4495
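The two calculations above are membership-weighted averages of the observed T1 values, which can be checked with a short Python sketch. The function name `update_attribute_probability` is hypothetical; the numbers are those given in the text for users UID0001 to UID0004, UID0010, and UID0011.

```python
def update_attribute_probability(attr_values, memberships):
    """Membership-weighted average of a binary attribute for one cluster."""
    numerator = sum(x * w for x, w in zip(attr_values, memberships))
    denominator = sum(memberships)
    return numerator / denominator

# Observed T1 values and belonging probabilities, in the order used in the text.
t1_values = [1, 1, 1, 0, 1, 0]
first_cluster = [0.5293, 0.9970, 0.0075, 0.0019, 0.9558, 0.1481]
second_cluster = [0.4707, 0.0030, 0.9925, 0.9981, 0.0442, 0.8519]

p_t1_first = update_attribute_probability(t1_values, first_cluster)
p_t1_second = update_attribute_probability(t1_values, second_cluster)
print(round(p_t1_first, 4), round(p_t1_second, 4))  # 0.9432 0.4495
```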

Through the updating process executed by the probability setting unit 31, the initial values of the attribute probabilities shown in FIG. 3(a) are updated to the values shown in FIG. 3(c). When the updating process is repeated, the updated values shown in FIG. 3(c) are set as the values of FIG. 3(a), and the values shown in FIG. 3(c) are then updated again.

Based on the belonging probabilities and the attribute probabilities, the complementary data generation unit 32 generates a plurality of pieces of second attribute information for each user included in the second user group and a plurality of pieces of first attribute information for each user included in the third user group. As shown in FIG. 1(b), both the second attribute information for the users in the second user group and the first attribute information for the users in the third user group are missing information.

For example, the complementary data generation unit 32 can calculate the probability P(T1=1 | ID=UID0005) that "T1" is 1 for the user whose user identifier is UID0005 as follows.
P(T1=1 | ID=UID0005) = P(T1=1 | first cluster) × P(first cluster | ID=UID0005) + P(T1=1 | second cluster) × P(second cluster | ID=UID0005) = 0.9432 × 0.0494 + 0.4495 × 0.9506 = 0.4739.

Likewise, the probability P(T1=0 | ID=UID0005) that "T1" is 0 for the user whose user identifier is UID0005 can be calculated as follows.
P(T1=0 | ID=UID0005) = 1 − P(T1=1 | ID=UID0005) = 1 − 0.4739 = 0.5261.
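The complemented value is a mixture of the per-cluster attribute probabilities weighted by the user's belonging probabilities. A minimal Python sketch of that arithmetic, using the values from the text, might look like this (the function name `impute_probability` is hypothetical):

```python
def impute_probability(attr_prob_by_cluster, membership_by_cluster):
    """P(attribute=1 | user) as a membership-weighted sum over clusters."""
    return sum(a * m for a, m in zip(attr_prob_by_cluster, membership_by_cluster))

# P(T1=1 | cluster) and P(cluster | ID=UID0005) for the first and second clusters.
p_t1_given_cluster = [0.9432, 0.4495]
membership_uid0005 = [0.0494, 0.9506]

p_t1_is_1 = impute_probability(p_t1_given_cluster, membership_uid0005)
p_t1_is_0 = 1.0 - p_t1_is_1
print(round(p_t1_is_1, 4), round(p_t1_is_0, 4))  # 0.4739 0.5261
```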

FIG. 4 is a diagram showing the user database after complementation by the complementary data generation unit 32 according to the embodiment. As indicated by the underlines in FIG. 4, the information that is missing from the user database shown in FIG. 1(b) has been complemented by the complementary data generation unit 32.

FIG. 5 is a diagram showing the complemented user database after binarization. More specifically, each value in the database shown in FIG. 4 is binarized so that values below a threshold (for example, 0.5) become 0 and values at or above the threshold become 1. In FIG. 5, entries for which realized values are given in FIG. 1 are filled with the values from FIG. 1.
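The binarization rule can be illustrated with a short Python sketch. It assumes that a missing realized value is represented by `None`; that representation, the function name `binarize_row`, and the sample row are hypothetical and are not taken from the figures.

```python
def binarize_row(imputed, observed, threshold=0.5):
    """Keep realized values; otherwise map imputed probabilities to 0 or 1."""
    result = []
    for prob, actual in zip(imputed, observed):
        if actual is not None:
            result.append(actual)  # a realized value takes precedence
        else:
            result.append(1 if prob >= threshold else 0)
    return result

# Hypothetical row: the first entry is missing, the second was actually observed.
print(binarize_row([0.4739, 0.2], [None, 1]))  # [0, 1]
```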

Returning to the description of FIG. 2. The updating unit 33 updates the discriminant function, the belonging probabilities, and the attribute probabilities by using a generative adversarial network, based on (i) a discrimination evaluation function determined from a discriminant function that identifies whether the first attribute information and second attribute information of each user included in the first, second, and third user groups contain information generated by the complementary data generation unit 32, and (ii) a generation evaluation function that evaluates the error between the generated first or second attribute information and the first or second attribute information of each user included in the first, second, and third user groups.

The discriminant function identifies whether each item of the complemented data shown in FIG. 5 is a realized value (x_n ∈ X_n) or a complemented value (x_r ∈ X_r); it returns 1 if the input is true (a realized value) and 0 if it is false (a complemented value). A specific example of the discrimination evaluation function O_D can be expressed by equation (1) below. A larger value of the discrimination evaluation function is better.

When the input is an m-dimensional vector x_· = (x_·1, x_·2, ..., x_·m), where · stands for n (meaning a realized value) or r (meaning a complemented value), the discriminant function can be expressed, as one example, by equation (2) below.

In equation (2), w_1, w_2, ..., w_m are the coefficients of the discriminant function for each dimension of the vector, and exp(·)/Σexp(·) is called the softmax function. Passing values through this function places each output value in the range [0, 1] and makes the output values sum to 1.
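The two properties of the softmax function noted above can be checked with a short Python sketch. This illustrates only the softmax function itself, not equation (2); subtracting the maximum score is a common numerical-stability step assumed here, and the input scores are hypothetical.

```python
from math import exp

def softmax(scores):
    """exp(s_k) / sum_j exp(s_j), computed with a shift for numerical stability."""
    shift = max(scores)
    exps = [exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

outputs = softmax([2.0, -1.0, 0.5])  # hypothetical scores
print(outputs, sum(outputs))
```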

The discriminant function may also be learned for each cluster. In that case, D_c is created for each cluster c, and the discrimination evaluation function O_D is set as in equation (3) below.

As one example, the discriminant function for each cluster c can be expressed as in equation (4) below.

In equation (4), w_c1, w_c2, ..., w_cm are the coefficients of the discriminant function for each cluster c. Using a separate function for each cluster makes it possible to perform discrimination that accounts for differences between clusters in the ratio of realized values to complemented values. Here, the updating unit 33 repeats learning and updating a predetermined number of times so that the value of O_D increases.

The generation evaluation function O_G is used to evaluate the quality of complementary generation for the realized values (x_n ∈ X_n), that is, the reconstruction loss L(x_n), together with how difficult it is to discriminate the authenticity of the complemented values (x_r ∈ X_r). As shown in equation (5) below, the generation evaluation function O_G is set as a weighted sum of the reconstruction loss and the ease of authenticity discrimination.

In equation (5), λ is a hyperparameter that balances the reconstruction loss against the ease of authenticity discrimination.

When the discriminant function is learned for each cluster c, the generation evaluation function O_G is set as in equation (6) below.

The updating unit 33 repeats learning and updating a predetermined number of times so that the value of the generation evaluation function O_G decreases. The quality of complementary generation (reconstruction loss) L(x_n) for the realized values (x_n ∈ X_n) can be set as in equation (7) below. A smaller value of the generation evaluation function is better; the generative adversarial network optimizes the attribute probabilities and related quantities through a minimax game that makes the generation evaluation function smaller while making the discrimination evaluation function larger.

Σ log L(x_n) =
− Σ_{i∈ID_S} {I(attribute_S31=1, ID_S=i) log P(attribute_S31=1 | ID_S=i)
+ I(attribute_S31=0, ID_S=i) log P(attribute_S31=0 | ID_S=i)
+ I(attribute_S32=1, ID_S=i) log P(attribute_S32=1 | ID_S=i)
+ I(attribute_S32=0, ID_S=i) log P(attribute_S32=0 | ID_S=i)
+ I(attribute_S33=1, ID_S=i) log P(attribute_S33=1 | ID_S=i)
+ I(attribute_S33=0, ID_S=i) log P(attribute_S33=0 | ID_S=i)}
− Σ_{j∈ID_T} {I(attribute_T1=1, ID_T=j) log P(attribute_T1=1 | ID_T=j)
+ I(attribute_T1=0, ID_T=j) log P(attribute_T1=0 | ID_T=j)
+ I(attribute_T2=1, ID_T=j) log P(attribute_T2=1 | ID_T=j)
+ I(attribute_T2=0, ID_T=j) log P(attribute_T2=0 | ID_T=j)}   (7)

In equation (7), ID_S denotes the user identifiers of the second user group, and ID_T denotes the user identifiers of the third user group. Probability values such as P(attribute_S31=1 | ID_S=i) can be obtained by the updating unit 33 by referring to the complemented user database shown in FIG. 4.

Here, the attribute probabilities shown in FIG. 3(a) are the parameters updated through the generation evaluation function, and the complemented user database shown in FIG. 4 is recalculated by the complementary data generation unit 32 from the attribute information shown in FIG. 1(b) and the attribute probabilities shown in FIG. 3(a).

The above shows an example of a simple model that can be computed from the attribute information alone, but it may be replaced with a more complex model that has many parameters relating to the attribute information, such as a hierarchical Bayesian model. In addition, I(attribute S3=1, ID_S=i) is an indicator function that returns 1 if attribute S3 of ID_S is 1 and 0 if it is 0.
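Each term of equation (7) selects, via the indicator function, the log probability of the value that was actually observed, so for one user the loss reduces to a negative log-likelihood over that user's realized binary attributes. A minimal Python sketch under that reading follows; the function name `reconstruction_loss`, the clamping constant, and the sample values are hypothetical.

```python
from math import log

def reconstruction_loss(observed, predicted, eps=1e-12):
    """Negative log-likelihood of realized binary attributes.

    observed:  realized 0/1 values for one user
    predicted: P(attribute=1 | user) for the same attributes
    """
    loss = 0.0
    for x, p in zip(observed, predicted):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        # The indicator function picks exactly one of the two log terms.
        loss -= log(p) if x == 1 else log(1.0 - p)
    return loss

# Hypothetical user with three realized attributes and their predicted probabilities.
example_loss = reconstruction_loss([1, 0, 1], [0.9, 0.2, 0.7])
print(example_loss)
```

Summing this quantity over the users in ID_S and ID_T gives the overall form of equation (7); a smaller value means the predicted probabilities agree more closely with the realized values.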

Returning to the description of FIG. 2. Based on the belonging probabilities and attribute probabilities updated by the updating unit 33, the complementary data generation unit 32 generates a plurality of pieces of second attribute information for each user included in the second user group and a plurality of pieces of first attribute information for each user included in the third user group.

In this way, by using the EM algorithm, the information processing apparatus 1 according to the embodiment estimates the probability that each user belongs to each cluster based on the attribute information of the first user group, without fixing the cluster to which each user belongs. Compared with the case in which each user's cluster is fixed, this allows the cluster to which each user belongs to be estimated through statistical analysis of prior information.

The information processing apparatus 1 according to the embodiment further improves the accuracy of the estimated missing information by using the GAN framework, with the values estimated by the EM algorithm serving, so to speak, as initial values. As a result, the information processing apparatus 1 can improve the accuracy with which missing information is estimated from similar information.

<Usage scenario of the information processing apparatus 1 according to the embodiment>
The information processing apparatus 1 according to the embodiment can be used, for example, to select items to recommend to a user, based on that user's item purchase history, from among items that belong to a category different from the category of the items in the purchase history.

In this case, the first attribute category and the second attribute category are assumed to indicate mutually different categories of products. The first attribute information and the second attribute information are assumed to be information indicating whether products included in the first attribute category and the second attribute category, respectively, have been purchased.

The recommended product selection unit 34 selects products of the second attribute category to recommend to each user of the second user group, based on the second attribute information generated by the complementary data generation unit 32 for the second user group. Similarly, the recommended product selection unit 34 selects products of the first attribute category to recommend to each user of the third user group, based on the first attribute information generated by the complementary data generation unit 32 for the third user group. As a result, for a user who has a purchase history only for items in one of the two categories, the information processing apparatus 1 can select, from among the items in the other category, the items that the user is statistically most likely to purchase.
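One simple way to turn the generated attribute information into a recommendation is to rank the products of the other category by their complemented purchase probability. The description does not specify the selection rule, so the following Python sketch is only an assumption; the function name `select_recommendations`, the `top_k` parameter, and the probability for "T2" are hypothetical.

```python
def select_recommendations(imputed_probs, top_k=1):
    """Return the top_k product names with the highest imputed purchase probability."""
    ranked = sorted(imputed_probs.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Imputed probabilities for one user of the second user group.
# 0.4739 for "T1" is the value from the worked example; 0.8 for "T2" is made up.
imputed_for_user = {"T1": 0.4739, "T2": 0.8}
print(select_recommendations(imputed_for_user))  # ['T2']
```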

<Processing flow of the information processing method executed by the information processing apparatus 1>
FIG. 6 is a flowchart for explaining the flow of information processing executed by the information processing apparatus 1 according to the embodiment. The processing in this flowchart starts, for example, when the information processing apparatus 1 is started up.

The user group information acquisition unit 30 acquires (1) first user group information including a user identifier for specifying each user included in the first user group, a plurality of pieces of first attribute information included in the first attribute category for each user, and a plurality of pieces of second attribute information included in the second attribute category for each user; (2) second user group information including a user identifier for specifying each user included in the second user group and only a plurality of pieces of first attribute information included in the first attribute category for each user; and (3) third user group information including a user identifier for specifying each user included in the third user group and only a plurality of pieces of second attribute information included in the second attribute category for each user (step S2).

The probability setting unit 31 sets belonging probabilities indicating the probability that each user included in the first user group belongs to each of a plurality of clusters (step S4). The probability setting unit 31 also sets attribute probabilities indicating the realization probability of each item of first attribute information and second attribute information in each cluster (step S6).

Based on the belonging probabilities and the attribute probabilities, the complementary data generation unit 32 generates a plurality of pieces of second attribute information for each user included in the second user group and a plurality of pieces of first attribute information for each user included in the third user group (step S8).

The updating unit 33 updates the discriminant function, the belonging probabilities, and the attribute probabilities by using a generative adversarial network, based on (i) a discrimination evaluation function determined from a discriminant function that identifies whether the first attribute information and second attribute information of each user included in the first, second, and third user groups contain information generated by the complementary data generation unit 32, and (ii) a generation evaluation function that evaluates the error between the generated first or second attribute information and the first or second attribute information of each user included in the first, second, and third user groups (step S10).

<Effects of the information processing apparatus 1 according to the embodiment>
As described above, the information processing apparatus 1 according to the embodiment can improve the accuracy with which missing information is estimated from similar information.

Although the present invention has been described above using an embodiment, the technical scope of the present invention is not limited to the scope described in that embodiment, and various modifications and changes are possible within the scope of its gist. For example, all or part of the apparatus may be functionally or physically distributed or integrated in arbitrary units. New embodiments produced by any combination of a plurality of embodiments are also included in the embodiments of the present invention. The effects of a new embodiment produced by such a combination include the effects of the original embodiments.

Reference Signs List
1 ... Information processing apparatus
2 ... Storage unit
3 ... Control unit
30 ... User group information acquisition unit
31 ... Probability setting unit
310 ... Initial probability setting unit
311 ... Belonging probability updating unit
312 ... Attribute probability updating unit
313 ... Optimization unit
32 ... Complementary data generation unit
33 ... Updating unit
34 ... Recommended product selection unit

Claims

An information processing apparatus comprising:
a user group information acquisition unit that acquires (1) first user group information including a user identifier for specifying each user included in a first user group, a plurality of pieces of first attribute information included in a first attribute category for each user, and a plurality of pieces of second attribute information included in a second attribute category for each user, (2) second user group information including a user identifier for specifying each user included in a second user group and only a plurality of pieces of the first attribute information included in the first attribute category for each user, and (3) third user group information including a user identifier for specifying each user included in a third user group and only a plurality of pieces of the second attribute information included in the second attribute category for each user;
a probability setting unit that sets a belonging probability indicating a probability that each user included in the first user group belongs to each of a plurality of clusters, and an attribute probability indicating a realization probability of each of the first attribute information and the second attribute information in each cluster;
a complementary data generation unit that generates, based on the belonging probability and the attribute probability, a plurality of pieces of the second attribute information for each user included in the second user group and a plurality of pieces of the first attribute information for each user included in the third user group; and
an updating unit that updates a discriminant function, the belonging probability, and the attribute probability by using a generative adversarial network, based on a discrimination evaluation function determined from the discriminant function, which identifies whether the first attribute information and the second attribute information of each user included in the first user group, the second user group, and the third user group contain information generated by the complementary data generation unit, and on a generation evaluation function that evaluates an error between the generated first attribute information or second attribute information and the first attribute information or second attribute information of each user included in the first user group, the second user group, and the third user group,
wherein the complementary data generation unit generates, based on the belonging probability and the attribute probability updated by the updating unit, a plurality of pieces of the second attribute information for each user included in the second user group and a plurality of pieces of the first attribute information for each user included in the third user group.

The information processing apparatus according to claim 1, wherein the probability setting unit comprises:
an initial probability setting unit that sets an initial value of the attribute probability;
a belonging probability updating unit that updates the belonging probability by the E step of an EM (Expectation-Maximization) algorithm;
an attribute probability updating unit that updates the attribute probability by the M step of the EM algorithm; and
an optimization unit that causes the belonging probability updating unit to update the belonging probability and causes the attribute probability updating unit to update the attribute probability until a predetermined convergence condition is satisfied.

The information processing apparatus according to claim 1 or 2, wherein
the first attribute category and the second attribute category indicate mutually different categories of products,
the first attribute information and the second attribute information are information indicating whether products included in the first attribute category and the second attribute category, respectively, have been purchased, and
the information processing apparatus further comprises a recommended product selection unit that selects a product of the second attribute category to be recommended to each user of the second user group based on the second attribute information generated by the complementary data generation unit for the second user group, and selects a product of the first attribute category to be recommended to each user of the third user group based on the first attribute information generated by the complementary data generation unit for the third user group.

the processor
(1) A user identifier for specifying each user included in the first user group, a plurality of first attribute information included in the first attribute category for each user, and a second attribute category for each user. (2) a user identifier for identifying each user included in the second user group; and a plurality of items included in the first attribute category for each user (3) a user identifier for identifying each user included in the third user group, and a plurality of second attributes included in the second attribute category for each user obtaining third user group information including only two attribute information;
A belonging probability indicating a probability that each user included in the first user group belongs to each of a plurality of clusters, and an attribute probability indicating a realization probability of each of the first attribute information and the second attribute information in each cluster. and setting
Based on the belonging probability and the attribute probability, a plurality of the second attribute information regarding each of the users included in the second user group and a plurality of the first attribute information regarding each of the users included in the third user group. and generating
From an identification function that identifies whether generated information is included in the first attribute information and second attribute information of each user included in the first user group, the second user group, and the third user group A determined identification evaluation function, the generated first attribute information or the second attribute information, and the first attribute information of each user included in the first user group, the second user group, and the third user group or updating the discriminant function, the membership probabilities, and the attribute probabilities using a generative adversarial network, based on a generated evaluation function that evaluates the error of the second attribute information;
estimating, based on the updated belonging probability and the updated attribute probability, a plurality of pieces of the second attribute information for each user included in the second user group and a plurality of pieces of the first attribute information for each user included in the third user group;
An information processing method that performs the above steps.
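
The setting and generating steps can be illustrated with a minimal sketch. It assumes, purely for illustration, that a generated attribute value is the belonging-probability-weighted mixture of the per-cluster attribute probabilities, and that a belonging-probability row is available for every user whose attributes are generated. The claim does not state this formula. The array sizes, the random initial values, and the names `belonging`, `attr_first`, `attr_second`, and `generate` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_clusters = 3
n_first, n_second = 4, 5   # items in the first / second attribute category
n_users = 6                # users whose missing attributes are generated


def normalize_rows(x):
    """Scale each row so that it sums to 1."""
    return x / x.sum(axis=1, keepdims=True)


# Belonging probability: row u holds P(cluster k | user u).
belonging = normalize_rows(rng.random((n_users, n_clusters)))

# Attribute probability: row k holds the realization probability of each
# attribute item within cluster k (random initial values, an assumption).
attr_first = rng.random((n_clusters, n_first))
attr_second = rng.random((n_clusters, n_second))


def generate(belonging, attr_prob):
    """Assumed mixture: sum over clusters of P(k | user) * P(attribute | k)."""
    return belonging @ attr_prob


# Second attribute information for users who only have first attributes,
# and first attribute information for users who only have second attributes.
generated_second = generate(belonging, attr_second)
generated_first = generate(belonging, attr_first)
```

Because each row of `belonging` sums to 1 and every attribute probability lies in [0, 1], each generated value is a convex combination and also stays in [0, 1], so it can be read as a probability.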

A program that causes a computer to realize:
a function of acquiring (1) first user group information including a user identifier for identifying each user included in a first user group, a plurality of pieces of first attribute information included in a first attribute category for each user, and a plurality of pieces of second attribute information included in a second attribute category for each user, (2) second user group information including a user identifier for identifying each user included in a second user group and a plurality of pieces of first attribute information included in the first attribute category for each user, the second user group information including only the first attribute information out of the first attribute information and the second attribute information, and (3) third user group information including a user identifier for identifying each user included in a third user group and a plurality of pieces of second attribute information included in the second attribute category for each user, the third user group information including only the second attribute information out of the first attribute information and the second attribute information;
a function of setting a belonging probability indicating the probability that each user included in the first user group belongs to each of a plurality of clusters, and an attribute probability indicating the realization probability of each piece of the first attribute information and the second attribute information in each cluster;
a function of generating, based on the belonging probability and the attribute probability, a plurality of pieces of the second attribute information for each user included in the second user group and a plurality of pieces of the first attribute information for each user included in the third user group;
a function of updating the identification function, the belonging probability, and the attribute probability using a generative adversarial network, based on an identification evaluation function, which is determined from an identification function that identifies whether generated information is included in the first attribute information and the second attribute information of each user included in the first user group, the second user group, and the third user group, and on a generation evaluation function, which evaluates the error between the generated first attribute information or second attribute information and the first attribute information or second attribute information of each user included in the first user group, the second user group, and the third user group; and
a function of estimating, based on the updated belonging probability and the updated attribute probability, a plurality of pieces of the second attribute information for each user included in the second user group and a plurality of pieces of the first attribute information for each user included in the third user group.
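
The two evaluation functions used in the updating function can be sketched as follows. This is an illustrative assumption, not the form given in the claims: the identification function is modeled as a logistic function of an attribute vector, the identification evaluation function as a binary cross-entropy, and the generation evaluation function as a mean squared error. The names `identify`, `identification_evaluation`, `generation_evaluation`, `w`, and `b`, and the toy data, are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def identify(x, w, b):
    """Hypothetical identification function: probability that each row of x
    is observed attribute information rather than generated information."""
    return sigmoid(x @ w + b)


def identification_evaluation(observed, generated, w, b, eps=1e-12):
    """Assumed binary cross-entropy: small when observed rows score near 1
    and generated rows score near 0."""
    d_obs = identify(observed, w, b)
    d_gen = identify(generated, w, b)
    return float(-(np.log(d_obs + eps).mean() + np.log(1.0 - d_gen + eps).mean()))


def generation_evaluation(observed, generated):
    """Assumed mean squared error between generated and observed values."""
    return float(np.mean((observed - generated) ** 2))


# Toy data: 6 users, 4 binary attribute items (values are made up).
rng = np.random.default_rng(0)
observed = rng.integers(0, 2, size=(6, 4)).astype(float)
generated = rng.random((6, 4))

# An untrained identification function: zero weights score every row 0.5.
w = np.zeros(4)
b = 0.0

id_loss = identification_evaluation(observed, generated, w, b)
gen_loss = generation_evaluation(observed, generated)
```

In an adversarial update under these assumptions, the identification function's parameters would be adjusted to lower `id_loss`, while the belonging and attribute probabilities would be adjusted to raise it and to lower `gen_loss`. The optimizer and the weighting between the two terms are left open here.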