JP2022090562A

JP2022090562A - Information processing device and program

Info

Publication number: JP2022090562A
Application number: JP2020203042A
Authority: JP
Inventors: 俊彦山崎; Toshihiko Yamazaki; 軼威張; Yi Wei Zhang; 雪▲テイ▼ 汪; Xueting Wang
Original assignee: University of Tokyo NUC
Current assignee: University of Tokyo NUC
Priority date: 2020-12-07
Filing date: 2020-12-07
Publication date: 2022-06-17

Abstract

To provide an information processing device and a program that remain usable even for social media in which relations are sparse, by quantifying the relations between information items such as analysis targets and articles for sale. SOLUTION: An information processing device includes: a first machine learning processing unit that extracts, as a first analysis target group, analysis targets whose mutually set relations are judged to satisfy a predetermined first relation, and machine-learns distributed representation information for each of those analysis targets based on permutations or combinations of the group; and a second machine learning processing unit that sets, as a second analysis target group, analysis targets whose mutually set relations are judged to satisfy a predetermined second relation together with analysis targets whose relations are judged not to satisfy the second relation, and machine-learns the distributed representation information of the analysis targets based on the relation set for each analysis target in the second analysis target group. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing apparatus and a program that perform analysis processing on analysis targets.

In recent years, social media of various kinds, such as social networking services (SNS), have become indispensable to many people's lives. At the same time, the volume of information disseminated through SNS and similar services has become enormous, and it is becoming difficult to provide information appropriate to each analysis target.

Patent Document 1 discloses an example in which, among the analysis targets belonging to an analysis target group, purchase recommendation information is provided to an analysis target that has not purchased an item that other analysis targets have purchased.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2019-125359 (JP-A-2019-125359)

According to the above conventional example, information on products commonly purchased among the users of a social medium such as an SNS can be provided. However, as the scale of social media grows, the number of users to be analyzed also becomes large and their interests diversify, so the number of information items such as commonly purchased products or commonly used hashtags becomes small relative to the number of users.

In other words, as the number of analysis targets increases, the information representing relations between analysis targets, between analysis targets and the products they purchased, or between analysis targets and the hashtags (headwords) they used generally becomes sparse. It is known that methods such as the above conventional example are not effective for such sparse relations.

The present invention has been made in view of the above circumstances, and one of its objects is to provide an information processing apparatus and a program that can quantify the relations between information items, such as analysis targets and products, and make them available for use even for social media in which the relations are sparse.

One aspect of the present invention that solves the problems of the above conventional example is an information processing apparatus including: means for holding, as distributed representation information, a vector value set for each of a plurality of analysis targets for which mutual relations are set; first machine learning means for extracting, as a first analysis target group, some of the plurality of analysis targets whose mutually set relations are judged to satisfy a predetermined first relation, and machine-learning the distributed representation information of each analysis target in the first analysis target group based on a permutation or combination of the extracted group; second machine learning means for taking, as a second analysis target group, some of the plurality of analysis targets whose mutually set relations are judged to satisfy a predetermined second relation together with some of the plurality of analysis targets whose relations are judged not to satisfy the predetermined second relation, and machine-learning the distributed representation information of each analysis target in the second analysis target group based on the relation set for each of those analysis targets; and processing means for supplying the distributed representation information of each analysis target, as machine-learned by the first machine learning means and the second machine learning means, to predetermined processing.

According to the present invention, the relations between information items such as analysis targets and products can be quantified and made available for use even for social media in which the relations are sparse.

A block diagram showing a configuration example of the information processing apparatus according to an embodiment of the present invention.
A functional block diagram showing an example of the information processing apparatus according to the embodiment of the present invention.
An explanatory diagram showing an outline example of the graph generated by the information processing apparatus according to the embodiment of the present invention.
A flowchart showing an operation example of the information processing apparatus according to the embodiment of the present invention.
An explanatory diagram showing an example of the operation of the information processing apparatus according to the embodiment of the present invention.

An embodiment of the present invention will be described with reference to the drawings. As illustrated in FIG. 1, the information processing apparatus 1 according to the embodiment of the present invention includes a control unit 11, a storage unit 12, an operation unit 13, a display unit 14, and a communication unit 15. The control unit 11 is a program-controlled device such as a CPU and operates according to a program stored in the storage unit 12. In the present embodiment, for a plurality of analysis targets for which mutual relations are set, the control unit 11 holds a vector value set for each analysis target in the storage unit 12 as distributed representation information. Using the relations set between the analysis targets, the control unit 11 also extracts, as a first analysis target group, some of the plurality of analysis targets that are judged to satisfy a predetermined first relation. The control unit 11 then executes a first machine learning process that machine-learns the distributed representation information based on a permutation or combination of the extracted first analysis target group.

The control unit 11 also extracts analysis targets, separately from the first analysis target group, as a second analysis target group, and executes a second machine learning process that machine-learns the distributed representation information of each analysis target in the second analysis target group based on the relation set for each of the analysis targets included in the extracted second analysis target group.

In one example of the present embodiment, the control unit 11 extracts, from among the plurality of analysis targets, a second analysis target group that includes (i) analysis targets B, different from a given analysis target A (called other analysis targets; there may be more than one), whose relation with analysis target A is judged to satisfy a predetermined second relation, and (ii) other analysis targets B whose relation with analysis target A is judged not to satisfy the predetermined second relation. The control unit 11 then machine-learns the distributed representation information of each analysis target in the second analysis target group based on the relation set for each of the analysis targets included in the extracted group.

The control unit 11 then supplies the distributed representation information of each analysis target, obtained by the first machine learning process and the second machine learning process, to predetermined processing such as determining the similarity between analysis targets. The details of the processing performed by the control unit 11 are described later.

The storage unit 12 is a memory device, a disk device, or the like, and holds the program executed by the control unit 11. The program may be provided on a computer-readable, non-transitory recording medium and then stored in the storage unit 12. The storage unit 12 of the present embodiment also operates as a work memory for the control unit 11.

The operation unit 13 is a keyboard, a mouse, or the like; it accepts operations by the user of the information processing apparatus 1 and outputs information representing the content of each operation to the control unit 11. The display unit 14 is a display or the like, and displays information according to instructions input from the control unit 11.

The communication unit 15 is a network interface or the like. In one example of the present embodiment, the communication unit 15 accesses an SNS server S or the like according to instructions input from the control unit 11, receives from the server S information on the analysis targets, information posted by the analysis targets, and the like, and outputs it to the control unit 11. The communication unit 15 may also, according to instructions input from the control unit 11, send information over the network to a user-side terminal U so that it is presented to the user.

Next, the operation of the control unit 11 of the present embodiment will be described. By operating according to the program stored in the storage unit 12, the control unit 11 functionally realizes, as illustrated in FIG. 2, a configuration including an information acquisition unit 21, a graph generation unit 22, a first machine learning processing unit 23, a second machine learning processing unit 24, an integration processing unit 25, an evaluation processing unit 26, and an output processing unit 27.

In the following description, as an example, information identifying each analysis target is acquired, where the analysis targets are: the accounts of users of an SNS such as Twitter (registered trademark) or Instagram (registered trademark) (called general accounts for convenience); other accounts that those users follow (in the example below, limited to predetermined accounts other than general accounts, such as the official accounts of product brands; hereinafter called official accounts); and the tags (hashtags) included in posts by each user (whether a general account or an official account).

In this example, the pairing of an account with the tags included in that account's posts, and the pairing of information identifying a general account with information identifying the official accounts that the general account follows, correspond to the mutual relations between analysis targets in the present invention.

The information acquisition unit 21 acquires information identifying each of a plurality of analysis targets, including the analysis target to be evaluated, and at least one type of information item that has a predetermined relation to at least one of the analysis targets. In one example of the present embodiment, one SNS such as Twitter (registered trademark) or Instagram (registered trademark) is targeted, and the accounts of the users of that SNS are taken as analysis targets.

In this example, the information acquisition unit 21 may acquire, from the website of the SNS or from the operator of the SNS: a list of the accounts of the users of the SNS (not necessarily all accounts; the accounts to be analyzed may be limited by predetermined conditions); information on the tags (hashtags) included in posts from the accounts in that list; and, for each account in the list, information identifying the other accounts that the account follows (here limited to official accounts).

The graph generation unit 22 sets a node representing each of the analysis targets identified by the information acquired by the information acquisition unit 21, and generates a graph in which nodes having a predetermined relation are connected.

FIG. 3 conceptually shows the graph generated by the graph generation unit 22 when an SNS is targeted. In FIG. 3, a layer B containing the nodes that represent the general accounts, a layer A containing the nodes of the official accounts, and a layer C containing the nodes that represent the hashtags used by the accounts are drawn as separate layers.

The graph generation unit 22 sets edges between nodes that have a predetermined relation, both between layers and within a layer. Specifically, in the example of FIG. 3, edges are set according to the following rules:
- Between layer B and layer A, an edge is set between a node in layer B and each node in layer A representing an official account followed by the general account that the layer B node represents.
- Between layer B and layer C, an edge is set between a node in layer B and each node in layer C representing a hashtag contained in text posted by the general account that the layer B node represents.
- Within layer C, an edge is set between the nodes representing hashtags that appear together (that is, co-occur) in a single post.
In this example, the relation between an official account and the hashtags used by that official account is not considered.

Following the above rules, the graph generation unit 22 sets the edges between layer B and layer A, between layer B and layer C, between nodes within layer B, and between nodes within layer C. That is, the graph generation unit 22 selects the nodes of the general accounts belonging to layer B one at a time and sets an edge from the selected node to each node in layer A representing an official account followed by the general account that the selected node represents.

The graph generation unit 22 also sets an edge from the node selected from layer B to each node representing a hashtag included in the posts of the general account that the selected node represents.

The graph generation unit 22 performs this edge setting for every general account node belonging to layer B. In addition, the graph generation unit 22 may further connect a pair of nodes representing a pair of hashtags that have a predetermined relation, and a pair of nodes representing a pair of official accounts that have a predetermined relation.

Specifically, the graph generation unit 22 sets an edge between a pair of nodes in layer C that represent two different hashtags appearing together (that is, co-occurring) in a single post. For example, when a post contains the hashtags α, β, and γ, the graph generation unit 22 sets edges between α and β, between β and γ, and between α and γ.

The graph generation unit 22 may also assign to each edge between nodes within layer C a weight that depends on the number of times the co-occurrence took place. Likewise, for an edge between a general account node (a node in layer B) and a node of a hashtag included in posts from the user of that general account (a node in layer C), the graph generation unit 22 may assign a weight that depends on the number of times the general account used the hashtag.

Each edge set by the graph generation unit 22 may be an edge without direction. That is, the graph generated by the graph generation unit 22 may be an undirected graph.
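The edge rules and optional weights described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `build_graph`, the input shapes (`follows`, `posts`), the `("A" | "B" | "C", name)` node keys, and the choice to count each hashtag once per post are hypothetical and do not come from the specification.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(follows, posts):
    """Build an undirected, weighted three-layer graph (hypothetical helper).

    follows: dict mapping a general account to the official accounts it follows.
    posts:   list of (general_account, [hashtags]) pairs, one pair per post.
    Returns a dict-of-dicts adjacency map where graph[u][v] is the edge weight.
    """
    graph = defaultdict(dict)

    def add_edge(u, v):
        # Undirected: record the edge in both directions, accumulating weight.
        graph[u][v] = graph[u].get(v, 0) + 1
        graph[v][u] = graph[v].get(u, 0) + 1

    # Layer B to layer A: general account and each official account it follows.
    for user, officials in follows.items():
        for official in set(officials):
            add_edge(("B", user), ("A", official))

    for user, tags in posts:
        unique_tags = sorted(set(tags))
        # Layer B to layer C: weight = number of posts in which the account used the tag.
        for tag in unique_tags:
            add_edge(("B", user), ("C", tag))
        # Within layer C: weight = number of posts in which the two tags co-occur.
        for t1, t2 in combinations(unique_tags, 2):
            add_edge(("C", t1), ("C", t2))
    return graph


example_follows = {"u1": ["brandX"], "u2": ["brandX", "brandY"]}
example_posts = [
    ("u1", ["alpha", "beta", "gamma"]),
    ("u1", ["alpha"]),
    ("u2", ["beta", "gamma"]),
]
example_graph = build_graph(example_follows, example_posts)
```

Storing every edge in both directions is one simple way to represent the undirected graph; edges within layer B or within layer A are left out of this sketch.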

The first machine learning processing unit 23 uses the connection relations of the nodes in the graph generated by the graph generation unit 22, that is, the relations set between the analysis targets, to extract as a first analysis target group some of the plurality of analysis targets that are judged to satisfy a predetermined first relation.

As one example, the first machine learning processing unit 23 takes an arbitrary node of the graph generated by the graph generation unit 22 as a starting point and generates multiple lists (permutations or combinations) of the nodes (the first analysis target group) that can be reached from that starting point by following edges an arbitrarily determined number of times. Hereinafter, a node that satisfies the condition (the predetermined condition) of being reachable from the starting point by following edges no more than that number of times is called a connected node of the starting node, and the starting node and its connected nodes are said to be in a mutually connected relation.
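One common way to produce such node lists is a fixed-length random walk, as used in DeepWalk-style graph embedding. The sketch below assumes uniform random selection of the next edge, which the text does not specify; the names `random_walk` and `generate_walks` and their parameters are hypothetical.

```python
import random


def random_walk(graph, start, walk_length, rng):
    """Follow at most `walk_length` edges from `start`; return visited nodes in order."""
    walk = [start]
    for _ in range(walk_length):
        neighbors = sorted(graph[walk[-1]])
        if not neighbors:
            break  # isolated node: nothing further can be reached
        walk.append(rng.choice(neighbors))  # assumption: uniform choice, edge weights ignored
    return walk


def generate_walks(graph, walks_per_node, walk_length, seed=0):
    """Generate several ordered node lists, using every node as a starting point."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in sorted(graph):
            walks.append(random_walk(graph, node, walk_length, rng))
    return walks


# Tiny illustrative graph: node -> set of neighboring nodes.
toy_graph = {"u1": {"#a", "brandX"}, "#a": {"u1"}, "brandX": {"u1"}}
toy_walks = generate_walks(toy_graph, walks_per_node=2, walk_length=3)
```

Every node in a returned list is reachable from the first node within `walk_length` edge traversals, so each list is a set of connected nodes in the sense defined above.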

The first machine learning processing unit 23 then machine-learns the distributed representation information based on, for example, a permutation of the extracted first analysis target group. Specifically, the first machine learning processing unit 23 performs skip-gram learning with negative sampling on the nodes included in each generated list, and obtains vector information that serves as the distributed representation information of each node. This method is disclosed, for example, in T. Mikolov, et al., "Distributed Representations of Words and Phrases and their Compositionality", arXiv:1310.4546, so a detailed description of it is omitted here.

That is, the first machine learning processing unit 23 obtains vector information for each node, as the distributed representation information of each node of the generated graph, by a so-called graph embedding method.
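For illustration, skip-gram learning with negative sampling over node lists can be sketched in NumPy as below. This is a simplified sketch, not the referenced implementation: negatives are drawn uniformly (the cited paper uses a smoothed frequency distribution), a negative may occasionally coincide with the true context, and `train_skipgram` with its default hyperparameters is a hypothetical name and configuration.

```python
import numpy as np


def train_skipgram(walks, dim=8, window=2, negatives=3, epochs=5, lr=0.05, seed=0):
    """Learn one vector per node from ordered node lists (simplified SGNS)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({node for walk in walks for node in walk})
    index = {node: i for i, node in enumerate(vocab)}
    w_in = rng.normal(scale=0.1, size=(len(vocab), dim))  # node vectors (the output)
    w_out = np.zeros((len(vocab), dim))                   # context vectors

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for _ in range(epochs):
        for walk in walks:
            ids = [index[n] for n in walk]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    # One observed context (label 1) plus uniformly drawn negatives (label 0).
                    targets = [ids[ctx_pos]] + list(rng.integers(0, len(vocab), size=negatives))
                    labels = [1.0] + [0.0] * negatives
                    grad_center = np.zeros(dim)
                    for target, label in zip(targets, labels):
                        # Gradient of the logistic loss with respect to the dot product.
                        err = sigmoid(w_in[center] @ w_out[target]) - label
                        grad_center += err * w_out[target]
                        w_out[target] -= lr * err * w_in[center]
                    w_in[center] -= lr * grad_center
    return {node: w_in[index[node]].copy() for node in vocab}


node_vectors = train_skipgram([["u1", "#a", "u1", "brandX"], ["brandX", "u1", "#a"]], dim=4)
```

The returned dictionary maps each node to its learned vector, which plays the role of the per-node distributed representation information.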

The second machine learning processing unit 24 takes, as a second analysis target group, some of the plurality of analysis targets whose mutually set relations are judged to satisfy a predetermined relation together with some of the plurality of analysis targets whose relations are judged not to satisfy that predetermined relation, and machine-learns the distributed representation information of each analysis target in the second analysis target group based on the relation set for each of the analysis targets included in that group.

In one example of the present embodiment, the second machine learning processing unit 24 extracts the second analysis target group by treating nodes as being in the above predetermined relation when they are judged to be mutually connected under a predetermined condition that uses the graph information.

Specifically, the second machine learning processing unit 24 in this example performs the following processing while selecting each general account node belonging to layer B in turn as the node of interest Bn. Among the nodes belonging to layer C, the second machine learning processing unit 24 selects at least one node Cx that has an edge to the selected node of interest Bn. It also selects, among the nodes belonging to layer C, at least one node Cy that has no edge to the selected node of interest Bn. The nodes Cx and Cy selected here constitute the second analysis target group in the present invention.
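The selection of nodes Cx and Cy for one node of interest Bn can be sketched as follows. The helper name `sample_second_group`, the adjacency format (node mapped to a set of neighbors), and the use of uniform random sampling are assumptions made for illustration only.

```python
import random


def sample_second_group(graph, node_bn, layer_c_nodes, n_pos=1, n_neg=1, seed=0):
    """Pick layer C nodes with an edge to `node_bn` (Cx) and without one (Cy)."""
    rng = random.Random(seed)
    neighbors = graph.get(node_bn, ())
    linked = sorted(c for c in layer_c_nodes if c in neighbors)
    unlinked = sorted(c for c in layer_c_nodes if c not in neighbors)
    # Sample without replacement, capped by how many candidates exist.
    cx = rng.sample(linked, min(n_pos, len(linked)))
    cy = rng.sample(unlinked, min(n_neg, len(unlinked)))
    return cx, cy


toy_graph = {"u1": {"#a", "#b", "brandX"}}
cx_nodes, cy_nodes = sample_second_group(toy_graph, "u1", ["#a", "#b", "#c", "#d"])
```

Here `cx_nodes` is drawn from the hashtags that `u1` has used and `cy_nodes` from the hashtags it has not used.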

Here, for the first analysis target group, nodes are extracted that satisfy the condition of being reachable from the starting node by following edges no more than a predetermined number of times, whereas for the second analysis target group, nodes that have an edge to the starting node and nodes that have no such edge are selected. This is only one example. In another example of the present embodiment, the second analysis target group may likewise be formed by selecting nodes that satisfy the condition of being reachable from the starting node within a predetermined number of edge traversals and nodes that do not satisfy that condition.

The second machine learning processing unit 24 determines at least the distributed representation information of the node of interest Bn so that the distance between the distributed representation information of Bn and that of node Cx (each of them, if there is more than one) is smaller than the distance between the distributed representation information of Bn and that of node Cy (each of them, if there is more than one). The initial values of the distributed representation information of each node may be set randomly, or may be set to the vector values obtained by the first machine learning processing unit 23.

After initialization, let x_i (i = 1, 2, ...) be the distributed representation information of the i-th selected node of interest Bn, x_j (j = 1, 2, ...) that of the nodes Cx, and x_j' (j' = 1, 2, ...) that of the nodes Cy. The second machine learning processing unit 24 obtains the loss L_PARIG by equation (1).

[Equation (1), defining L_PARIG, is not reproduced in this text.]

The term containing Θ in equation (1) is a regularization term.

[The expression for the regularization term is not reproduced in this text.]

Here a is a gain and is set experimentally as appropriate, for example a = 1.

Equation (1) also contains an indicator function for a condition B.

[The symbol for the indicator function is not reproduced in this text.]

In the example of equation (1), given a predetermined m, the indicator function takes the value 1 if its condition holds and 0 otherwise.

[The inequality that forms the condition is not reproduced in this text.]

In other words, the loss function L_PARIG aims to determine at least the distributed representation information of the node of interest Bn so that the distance between the distributed representation information of Bn and that of node Cx (each of them, if there is more than one) is smaller than the distance between the distributed representation information of Bn and that of node Cy (each of them, if there is more than one). It is adjusted, however, so that the difference between these distances is taken into account only when the distance between the distributed representation information of Bn and that of a given node Cx is sufficiently smaller than the distance between the distributed representation information of Bn and that of a given node Cy.

The second machine learning processing unit 24 performs machine learning by updating the distributed representation information x_i of the node of interest Bn, the distributed representation information x_j (j = 1, 2, ...) of the nodes Cx, and the distributed representation information x_j' (j' = 1, 2, ...) of the nodes Cy so as to minimize the loss L_PARIG.

Through this processing, the second machine learning processing unit 24 obtains the distributed representation information xi (i = 1, 2, ...) of each node belonging to layer B. The machine learning processing by which the second machine learning processing unit 24 updates the distributed representation information so as to reduce the loss can use widely known techniques, so further detail is omitted here.

For the nodes representing the analysis targets, the integration processing unit 25 uses the distributed representation information obtained by the first machine learning processing unit 23 and that obtained by the second machine learning processing unit 24 to generate the distributed representation information that the evaluation processing unit 26 later uses.

Specifically, for each node Bi (i = 1, 2, ...) belonging to layer B, the integration processing unit 25 forms the distributed representation information Xi (hereinafter, integrated distributed representation information) from the distributed representation information yi obtained for Bi by the first machine learning processing unit 23 and the distributed representation information xi obtained by the second machine learning processing unit 24 as

Xi = yi ⊕ xi

where ⊕ denotes concatenation of vector elements. That is, if yi has dimension ny and xi has dimension nx, Xi has (ny + nx) components: the first ny are the components of yi and the remaining nx are the components of xi.
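The concatenation described above can be sketched in a few lines of Python. The function name `integrate` and the example vectors are hypothetical, chosen only for illustration; the text fixes just the ordering (components of yi first, then those of xi).

```python
def integrate(y_i, x_i):
    """Concatenate the first-stage vector y_i and the second-stage vector x_i.

    The result has len(y_i) + len(x_i) components: y_i's components first,
    then x_i's, matching the ordering stated in the text.
    """
    return list(y_i) + list(x_i)


# Hypothetical example: ny = 3, nx = 2, so the integrated vector has 5 components.
X_i = integrate([0.1, 0.2, 0.3], [0.9, 0.8])
```

Keeping the two parts side by side, rather than summing or averaging them, leaves both sources of information intact for the later distance computation.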

The evaluation processing unit 26 evaluates the similarity between personal accounts using the integrated distributed representation information obtained by the integration processing unit 25 for each node belonging to layer B. In this example, the evaluation processing unit 26 obtains distance information between the integrated distributed representation information of the nodes belonging to layer B. This distance information may be the Euclidean distance or the cosine similarity (the inner product of the two vectors divided by the product of their norms).

Then, for each personal-account node, the evaluation processing unit 26 selects as similar-account nodes those other nodes judged to be close on the basis of the distance information between the node and each other node: for the Euclidean distance, nodes whose distance falls below a predetermined threshold; for the cosine similarity, nodes whose similarity exceeds a predetermined threshold that is at least 0 and less than 1. The evaluation processing unit 26 then outputs information identifying the similar-account nodes selected for each personal-account node.
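As a minimal sketch of this selection rule, the following Python code computes both kinds of distance information and applies the corresponding threshold test. The function names, the dictionary layout of `embeddings`, and the threshold values are assumptions for illustration; vectors are assumed to be non-zero when the cosine similarity is used.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    # Inner product divided by the product of the norms (non-zero vectors assumed).
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similar_accounts(target, embeddings, metric="euclidean", threshold=1.0):
    """Return the nodes judged close to `target`.

    Euclidean distance: close means distance < threshold.
    Cosine similarity: close means similarity > threshold.
    """
    selected = []
    for node, vec in embeddings.items():
        if node == target:
            continue
        if metric == "euclidean":
            close = euclidean(embeddings[target], vec) < threshold
        else:
            close = cosine_similarity(embeddings[target], vec) > threshold
        if close:
            selected.append(node)
    return selected
```

Note that the comparison direction flips between the two metrics: a small Euclidean distance and a large cosine similarity both indicate closeness.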

The output processing unit 27 presents information such as the account name and profile of each personal account identified by the information input from the evaluation processing unit 26 to the user of the information processing apparatus 1, for example by displaying it on the display unit 14 or by sending it to the user's terminal U via the communication unit 15.

[Operation]
The present embodiment has the configuration described above and operates as follows. In the following example, the information processing apparatus 1 is implemented as a server apparatus managed by the operator of an SNS such as Twitter (registered trademark) or Instagram (registered trademark), or by a contractor engaged by that operator, and is accessible to users of the SNS.

The information processing apparatus 1 of this example treats as analysis targets the personal accounts of the users of the SNS, the hashtags contained in posted text, and the official accounts of companies and the like that have accounts on the SNS. The information processing apparatus 1 also accepts from a user, in advance, information identifying a personal account. Specifically, it accepts from a user who is one of the SNS users information identifying that user, as the information identifying the personal account to be evaluated.

In the following example of operation, other personal accounts similar to the account identified as the evaluation target by the accepted information are presented.

As illustrated in FIG. 4, the information processing apparatus 1 of this example acquires, from the website of the target SNS or the like, a list of personal accounts of the SNS users, the tags (hashtags) contained in posts from the personal accounts on the list, and, for each personal account on the list, information identifying the other accounts that the personal account follows (limited here to official accounts for the sake of explanation) (S11: information collection).

The personal accounts acquired here need not be the personal accounts of all SNS users, but they must include the personal account to be evaluated.

The information processing apparatus 1 virtually sets up a layer B containing nodes representing the acquired personal accounts, a layer A containing nodes for the official accounts, and a layer C containing nodes representing the hashtags used by the accounts.

The information processing apparatus 1 then selects, one at a time in a predetermined order, the account nodes belonging to layer B (the analysis targets, which include the evaluation target) and sets an edge from each selected node to the layer-A node of every official account followed by the account that the selected node represents.

The information processing apparatus 1 also sets an edge from the selected layer-B account node to the node of each hashtag contained in posts by the account that the node represents. In addition, among the hashtag nodes belonging to layer C, it sets an edge between hashtags that appear together (that is, co-occur) in a single post.

The information processing apparatus 1 further assigns to each edge between nodes within layer C a weight according to the number of times the hashtags represented by the pair of nodes co-occur. Likewise, for each edge between a personal-account node (a layer-B node) and the node of a hashtag contained in posts from that account (a layer-C node), it assigns a weight according to the number of times the account used the hashtag (S12: graph generation).
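The graph generation of step S12 might be sketched as follows. This is an illustrative sketch, not the apparatus's actual implementation: the input shapes (`follows` as a mapping from account to followed official accounts, `posts` as a list of (account, hashtags) pairs), the tagged-tuple node naming, the weight of 1 on follow edges, and counting a hashtag at most once per post are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(follows, posts):
    """Return {node: {neighbor: weight}} for an undirected three-layer graph.

    Nodes are tagged tuples: ("A", official), ("B", account), ("C", hashtag).
    """
    graph = defaultdict(dict)

    def add_edge(u, v, weight):
        graph[u][v] = weight
        graph[v][u] = weight

    # B-A edges: follow relations (weight 1 is an assumption; the text sets no weight).
    for account, officials in follows.items():
        for official in officials:
            add_edge(("B", account), ("A", official), 1)

    usage = defaultdict(int)         # (account, hashtag) -> number of posts using it
    cooccurrence = defaultdict(int)  # (tag1, tag2) -> number of posts containing both
    for account, tags in posts:
        unique = sorted(set(tags))   # sorted so each pair has one canonical key
        for tag in unique:
            usage[(account, tag)] += 1
        for t1, t2 in combinations(unique, 2):
            cooccurrence[(t1, t2)] += 1

    # B-C edges weighted by usage count, C-C edges weighted by co-occurrence count.
    for (account, tag), count in usage.items():
        add_edge(("B", account), ("C", tag), count)
    for (t1, t2), count in cooccurrence.items():
        add_edge(("C", t1), ("C", t2), count)
    return dict(graph)
```

An adjacency mapping of this shape makes the later random walk straightforward, since the neighbors of a node and their weights are available in one lookup.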

Next, the information processing apparatus 1 executes a first machine learning process that machine-learns the distributed representation information of the nodes on the basis of the connections among the nodes in the graph generated in step S12 (S13). Specifically, the information processing apparatus 1 obtains the distributed representation information of each node using a graph embedding technique.

That is, in step S13 the information processing apparatus 1 obtains the distributed representation information of each node using the DeepWalk technique (Bryan Perozzi, et al., DeepWalk: Online Learning of Social Representations, arXiv:1403.6652v2, 27 Jun 2014). More specifically, in step S13 the information processing apparatus 1 generates one random integer γ and resets a counter k to k = 0.

The information processing apparatus 1 selects one of the nodes V belonging to layer B as the initial node Vj(0) (where j = 1, 2, ... counts the repetitions of this process) and records it. Next, it randomly selects one of the edges connected to the initial node Vj(0), moves along the selected edge to the adjacent node Vj(1), and records Vj(1) by appending it to the initial node. It then increments the counter k by 1.

Thereafter, while k < γ, it repeats the following: from the current node Vj(k), randomly select one of the edges connected to Vj(k), move to the adjacent node Vj(k+1), and append Vj(k+1) to the record of nodes visited so far.

When randomly selecting an edge connected to the current node, the information processing apparatus 1 may take the edge weights into account. In that case, it controls the selection probability according to the ratio of the weights set on the edges. For example, suppose n edges Ei (i = 1, 2, ..., n) are connected to the current node Vj(k), each with a weight wi. (For an edge with no weight set, wi is determined from a predetermined value, from a statistic such as the mean, median, maximum, or minimum of the weights of the weighted edges connected to that node, or from a statistic of the weights of the weighted edges in the graph.) The information processing apparatus 1 then selects edge Ei with probability wi/Σwi, where Σwi is the sum of the weights wi. Various widely known methods are available for random selection according to such probabilities, so their description is omitted here.
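One of the widely known ways to draw an edge with probability wi/Σwi is Python's standard `random.choices`, as in the sketch below. The function name `choose_neighbor`, the use of `None` to mark an edge without a weight, and the fixed `default_weight` fallback (the "predetermined value" option from the text) are assumptions for illustration.

```python
import random


def choose_neighbor(neighbors, rng=random, default_weight=1.0):
    """Pick one neighbor with probability proportional to its edge weight.

    `neighbors` maps neighbor -> weight; a weight of None means no weight is
    set, in which case `default_weight` is used (an assumed fallback).
    """
    nodes = list(neighbors)
    weights = [default_weight if neighbors[n] is None else neighbors[n] for n in nodes]
    # random.choices normalizes the weights, so each node is drawn with w_i / sum(w_i).
    return rng.choices(nodes, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the walk reproducible, which helps when comparing embeddings across runs.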

As a result, the information processing apparatus 1 obtains a list of γ nodes (called a node list) Lj (j = 1, 2, ...):
Lj = [Vj(0), Vj(1), Vj(2), ..., Vj(γ)]

The information processing apparatus 1 selects each of the nodes Vi belonging to layer B in turn and executes the above process, obtaining random node lists whose lengths differ from one another.

The information processing apparatus 1 repeats the process of step S13 to generate a plurality of node lists Lj (j = 1, 2, ...), each obtained by following edges from a starting node Vi belonging to layer B, until the node lists obtained contain the node of the personal account to be evaluated at least a predetermined number of times (for example, once or more).

For example, when selecting starting nodes, the information processing apparatus 1 may extract all the nodes of layer B in turn. It may also count the number n of distinct layer-B nodes contained in the node lists obtained so far and determine whether n exceeds a predetermined threshold (called the first threshold for convenience). The first threshold may be, for example, floor(αN), the largest integer not exceeding a predetermined fraction α (0 < α ≤ 1) of the number N of nodes in layer B. If n is below the first threshold, or if the node of the account to be evaluated appears in the node lists obtained so far fewer than a predetermined second threshold number of times, the information processing apparatus 1 may repeat the processing from the selection of a starting node to generate further node lists.
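The stopping test described here could look like the following sketch. The function name, the default values, and the reading that the second threshold counts occurrences of the target node across all node lists are interpretations, not details confirmed by the text.

```python
import math


def need_more_walks(walks, layer_b, target, alpha=0.5, second_threshold=1):
    """Return True if further node lists should be generated.

    walks: node lists obtained so far; layer_b: set of layer-B nodes;
    target: node of the account to be evaluated.
    """
    # n: number of distinct layer-B nodes seen in the node lists so far.
    covered = {node for walk in walks for node in walk if node in layer_b}
    first_threshold = math.floor(alpha * len(layer_b))  # floor(alpha * N)
    # Assumed interpretation: count every occurrence of the target node.
    target_hits = sum(walk.count(target) for walk in walks)
    return len(covered) < first_threshold or target_hits < second_threshold
```

With four layer-B nodes and alpha = 0.5, for instance, walk generation continues until at least two distinct layer-B nodes and at least one occurrence of the target have been recorded.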

As an example, as illustrated in FIG. 5, suppose that layer A contains nodes V11, V12, V13, ..., layer B contains nodes V21, V22, ..., and layer C contains nodes V31, V32, V33, .... Suppose that an edge is set between the layer-B node V21 and the layer-A node V11, and that edges are set between V21 and the layer-C nodes V31 and V32. Suppose further that an edge is set between the layer-B node V22 and the layer-C node V32, and that edges are set within layer C between V31 and V32, between V32 and V33, and between V31 and V33.

Suppose now that a node list of five nodes is generated by the DeepWalk method with, for example, V21 as the initial node. In this example, a random walk from V21 moves as V21 → V11 → V21 → V31 → V33, yielding the sequence
L1 = [V21, V11, V21, V31, V33]

With the same V21 as the initial node, a different sequence such as
L2 = [V21, V31, V32, V22]
may also be generated as another node list.
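The walk over the FIG. 5 example can be reproduced with a short Python sketch. The edge list below is transcribed from the description of FIG. 5; the uniform weight of 1 on every edge, the function name `random_walk`, and the adjacency-dictionary layout are assumptions for illustration.

```python
import random

# Edges of the FIG. 5 example as described in the text (undirected).
FIG5_EDGES = [
    ("V21", "V11"), ("V21", "V31"), ("V21", "V32"), ("V22", "V32"),
    ("V31", "V32"), ("V32", "V33"), ("V31", "V33"),
]
FIG5 = {}
for u, v in FIG5_EDGES:
    FIG5.setdefault(u, {})[v] = 1  # weight 1 everywhere is an assumption
    FIG5.setdefault(v, {})[u] = 1


def random_walk(graph, start, steps, rng=random):
    """Return the node list [V(0), V(1), ...] after at most `steps` moves."""
    walk = [start]
    for _ in range(steps):
        neighbors = graph.get(walk[-1], {})
        if not neighbors:
            break  # dead end: stop early
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


# Four moves from V21 give a five-node list such as L1 in the text.
example = random_walk(FIG5, "V21", 4, random.Random(1))
```

Because the walk may step back along the edge it just used, a node can appear more than once in a list, as V21 does in L1.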

Next, for the nodes contained in each generated node list, the information processing apparatus 1 performs skip-gram learning, for example with negative sampling, to obtain vector information that serves as the distributed representation information of each node. In other words, the information processing apparatus 1 treats each generated node list as a "sentence" and each node in it as a "word", and generates the distributed representation information by a widely known method such as word2vec.
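To make the "sentence" and "word" analogy concrete, the sketch below enumerates the (center, context) training pairs that skip-gram learning would draw from one node list. It covers only the pair generation, not the embedding training itself; the function name and the window size are assumptions.

```python
def skipgram_pairs(node_list, window=2):
    """Return (center, context) pairs for every node within `window` positions."""
    pairs = []
    for i, center in enumerate(node_list):
        lo = max(0, i - window)
        hi = min(len(node_list), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, node_list[j]))
    return pairs


# The node list L2 from the text, with a window of 1 (assumed value).
pairs = skipgram_pairs(["V21", "V31", "V32", "V22"], window=1)
```

In practice these pairs would be fed to an existing word2vec implementation, which learns vectors so that nodes appearing in each other's context end up close together.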

The information processing apparatus 1 thereby obtains distributed representation information for the nodes representing the plurality of analysis targets, including the personal account to be evaluated.

The information processing apparatus 1 also forms a second analysis target group consisting of some of the analysis targets whose mutual relationship is deemed to be a predetermined relationship and some of the analysis targets whose mutual relationship is deemed not to be that predetermined relationship, and machine-learns the distributed representation information of each analysis target in the second analysis target group on the basis of the relationships set for those analysis targets (S14).

Specifically, at the outset (that is, if the distributed representation information of the nodes has not yet been computed by the process of step S14), the information processing apparatus 1 sets the initial values of the distributed representation information of each node at random.

The information processing apparatus 1 then selects each general-account node belonging to layer B in turn as the node of interest Bn. From the nodes belonging to layer C, it selects at least one node Cx that has an edge to the selected node of interest Bn (a positive-example analysis target). It also selects from the nodes belonging to layer C at least one node Cy that has no edge to Bn (a negative-example analysis target), choosing as many nodes Cy as nodes Cx.
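A minimal sketch of this positive and negative selection is shown below. The function name, the `graph_bc` mapping from a layer-B node to its connected layer-C nodes, and uniform random sampling are assumptions; the text only requires at least one node Cx and an equal number of nodes Cy.

```python
import random


def sample_examples(graph_bc, layer_c, node_b, k=1, rng=random):
    """Return (positives, negatives): k layer-C nodes with an edge to node_b
    and k layer-C nodes without one. k is reduced if too few nodes exist."""
    positives_all = sorted(graph_bc.get(node_b, ()))
    negatives_all = sorted(set(layer_c) - set(positives_all))
    k = min(k, len(positives_all), len(negatives_all))
    return rng.sample(positives_all, k), rng.sample(negatives_all, k)
```

If a node of interest has no hashtag edges at all, this sketch returns two empty lists, so a caller would need to skip such nodes to satisfy the "at least one" requirement.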

Taking xi (i = 1, 2, ...) as the distributed representation information of the i-th selected node of interest Bn, xj (j = 1, 2, ...) as that of the nodes Cx, and xj′ (j′ = 1, 2, ...) as that of the nodes Cy, the information processing apparatus 1, acting as the second machine learning processing unit 24, computes the loss LPARIG as

[equation not reproduced]

As already explained, the term containing Θ is a regularization term, given by

[equation not reproduced]

Further, the indicator function appearing here takes the value 1 if the condition

[equation not reproduced]

holds, and 0 otherwise.

Using this loss LPARIG, the information processing apparatus 1 machine-learns, by a gradient method, the distributed representation information xi of the node of interest Bn, xj (j = 1, 2, ...) of the nodes Cx, and xj′ (j′ = 1, 2, ...) of the nodes Cy so that the distance between xi and xj becomes smaller than the distance between xi and xj′.
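The exact form of LPARIG is not reproduced in this text, so the following sketch substitutes a generic hinge-style triplet loss, max(0, margin + |xi − xj|² − |xi − xj′|²), purely to illustrate what a gradient-method update of the three vectors looks like. The loss form, the `margin` and `lr` values, the function names, and the omission of the regularization term are all assumptions and should not be read as the patent's formula.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_step(x_i, x_pos, x_neg, margin=1.0, lr=0.1):
    """One gradient-descent step on a stand-in hinge loss (not LPARIG itself).

    Returns updated copies of (x_i, x_pos, x_neg).
    """
    loss = margin + sq_dist(x_i, x_pos) - sq_dist(x_i, x_neg)
    if loss <= 0:
        # Hinge inactive: the positive is already closer by at least the margin.
        return list(x_i), list(x_pos), list(x_neg)
    # Gradients of margin + |x_i - x_pos|^2 - |x_i - x_neg|^2.
    grad_i = [2 * (n - p) for p, n in zip(x_pos, x_neg)]
    grad_pos = [2 * (p - a) for a, p in zip(x_i, x_pos)]
    grad_neg = [2 * (a - n) for a, n in zip(x_i, x_neg)]

    def step(x, g):
        return [a - lr * b for a, b in zip(x, g)]

    return step(x_i, grad_i), step(x_pos, grad_pos), step(x_neg, grad_neg)
```

Each step pulls the node of interest toward its positive example and pushes it away from its negative example, which is the qualitative behavior the paragraph above describes.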

The information processing apparatus 1 thus generates the distributed representation information xi (i = 1, 2, ...) of the nodes used in the evaluation process, here the nodes belonging to layer B, independently of the process in step S13. Then, for each node, it concatenates the distributed representation information obtained in step S13 with that obtained in step S14 to obtain the integrated distributed representation information of the node (S15).

One characteristic feature of the present embodiment is that the distributed representation information obtained in step S13 is based on information obtained by following edges from the node (that is, local features of the graph), whereas the distributed representation information obtained in step S14 is computed using information on other nodes not connected to the node and is therefore based on global features of the graph.

Using integrated distributed representation information that contains both the distributed representation information based on local features and that based on global features makes it possible to perform evaluation on information that reflects both kinds of feature.

Using the integrated distributed representation information obtained for each node, the information processing apparatus 1 evaluates the relatedness between the nodes subject to evaluation. In this example, it evaluates the similarity between the node of the specified personal account and the nodes of other personal accounts (S16).

That is, as distance information between the integrated distributed representation information of the node of the personal account under evaluation and that of each other personal-account node, the information processing apparatus 1 computes, for example, the Euclidean distance between them.

Based on the obtained distance information, the information processing apparatus 1 then selects at least one other personal-account node judged to be relatively close, in terms of integrated distributed representation information, to the node of the personal account under evaluation. It outputs the information of the personal account represented by the selected node and presents it to the user (S17).

For example, the information processing apparatus 1 obtains information identifying those official-account nodes in layer A that are connected by an edge to the selected node but are not connected by an edge to the node of the personal account under evaluation.

If no such official-account node exists, that is, if every official-account node in layer A connected by an edge to the selected node is also connected by an edge to the node of the account under evaluation, the information processing apparatus 1 may select another of the account nodes judged to be close on the basis of the previously obtained distance information and obtain, for that node, information identifying the official-account nodes in layer A that are connected to it by an edge but not connected by an edge to the node of the account under evaluation.

The information processing apparatus 1 presents to the user information on the official accounts identified by the nodes obtained here, which are not connected by an edge to the node of the account under evaluation: for example, the account name and profile information of each such official account. This information may be presented, for example, by transmitting it to the terminal device used by the user and having it displayed there.
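The selection of official accounts to present, including the fallback to the next-closest account, can be sketched as a set difference. The function name and the input shapes (`ranked_similar` as a list of similar accounts ordered nearest first, `follows` as a mapping from account to the set of official accounts it follows) are assumptions for illustration.

```python
def recommend_officials(target, ranked_similar, follows):
    """Return official accounts followed by the nearest similar account that
    offers any the target does not already follow; [] if none does."""
    already_followed = follows.get(target, set())
    for account in ranked_similar:
        candidates = follows.get(account, set()) - already_followed
        if candidates:
            return sorted(candidates)
        # No new official account here: fall back to the next-closest account.
    return []
```

For example, if the closest account follows only official accounts that the target also follows, the function moves on to the second-closest account, mirroring the fallback described above.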

According to this example of the present embodiment, a graph network is formed from information such as the relationship between a user and the official accounts the user follows, the relationship between a user and the hashtags the user uses, and the relationship between hashtags that co-occur in a single post. Integrated distributed representation information for each user's personal account is then obtained using both the local and the global features of the graph (integrated distributed representation information may also be obtained for official accounts, hashtags, and so on), enabling quantitative evaluation of, for example, how close the relationship between users is. Consequently, even when users share few hashtag uses, that is, even when the relationships are sparse, the distance between users can be quantified from the standpoint of both local and global features.

[Other examples of how to set edges]
In the description so far, when generating the graph, the analysis targets and information items were divided by type into layers A, B, C, ..., and the edges between the nodes representing them were set according to predetermined rules.

That is, in the examples so far:
• Between layer B and layer A, an edge is set between a node in layer B and each node in layer A representing an official account followed by the account that the layer-B node represents.
• Between layer B and layer C, an edge is set between a node in layer B and each node in layer C representing a hashtag contained in text posted by the account that the layer-B node represents.
• Within layer C, an edge is set between the nodes representing hashtags that are contained together in a single post (that is, that co-occur).
However, the way edges are set is not limited to this example.

Alternatively, for example, edges may be set within layer B between the nodes of each pair of accounts that follow each other.

Edges may also be set within layer A between official accounts judged to be similar to each other by separately conducted investigation and analysis. For example, where the official accounts are brand accounts, an edge may be set between the official accounts of brands that have been analyzed as having similar customers.

[Weight normalization]
In the description so far, the information processing apparatus 1 assigned to each edge between nodes within layer C a weight according to the number of co-occurrences of the corresponding hashtags, and assigned to each edge between a layer-B node and a layer-C node representing a hashtag contained in posts from the corresponding personal account a weight according to the number of times that account used the hashtag.

この重みは次のように正規化されたものであってもよい。具体的に情報処理装置１は、層Ｂ－層Ｃ間のエッジ（個人アカウントとハッシュタグ間のエッジ）についての重みとして、対応する個人アカウント（ノードＶiとする）が、対応するハッシュタグ（ノードＶjとする）を使用した回数ｗ′ijを用いて、

というように正規化した重みを設定してもよい。 This weight may be normalized as follows. Specifically, as the weight for an edge between layer B and layer C (an edge between a personal account and a hashtag), the information processing apparatus 1 may set a normalized weight computed from the number of times w'ij that the corresponding personal account (node Vi) has used the corresponding hashtag (node Vj).

また情報処理装置１は、層Ｃ内の一対のノードＶi，Ｖj間のエッジについては次のように重みを設定する。すなわち、以下の例では情報処理装置１は、ハッシュタグの共起の回数が「１」など所定のしきい値以下となっているときには初期の重みをｗ′ij＝０とする。これはグラフの複雑さを低減して、機械学習に係る処理負荷を軽減するためである。また情報処理装置１は、当該所定のしきい値を超える回数だけ共起したハッシュタグ間の重みを初期的にｗ′ij＝Ｎc（ここでＮcはノードＶi，Ｖjに対応するハッシュタグが共起している投稿の数、すなわち共起回数）とする。 Further, the information processing apparatus 1 sets the weight of an edge between a pair of nodes Vi and Vj in layer C as follows. In the following example, when the number of co-occurrences of the hashtags is at or below a predetermined threshold such as 1, the information processing apparatus 1 sets the initial weight to w'ij = 0. This reduces the complexity of the graph and lightens the processing load of machine learning. For hashtags that co-occur more times than the predetermined threshold, the information processing apparatus 1 sets the initial weight to w'ij = Nc (where Nc is the number of posts in which the hashtags corresponding to nodes Vi and Vj co-occur, that is, the co-occurrence count).

そして情報処理装置１は、層Ｃ内の一対のノードＶi，Ｖj間のエッジの重みを、上記初期の重みｗ′ijを用いて、

と設定する。 Then, the information processing apparatus 1 sets the weight of the edge between the pair of nodes Vi and Vj in layer C to a value computed from the initial weight w'ij described above.

このように重みを正規化することで、例えばディープ・ウォークの処理を行う際に各ノードが比較的等確率で選択されるように調整される。 Normalizing the weights in this way adjusts the graph so that, for example, each node is selected with relatively equal probability when the DeepWalk processing is performed.
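The thresholding of co-occurrence counts described above can be sketched as follows. The normalization formulas themselves are not reproduced in this text, so the `row_normalize` function below is only an assumption (each weight divided by the total weight leaving its node), chosen as one common way to turn edge weights into random-walk transition probabilities. The function names and the input format are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def initial_cooccurrence_weights(posts, threshold=1):
    """Count hashtag co-occurrences per post and keep pairs above the threshold.

    posts: iterable of hashtag lists, one list per post.
    A pair whose count Nc is at or below the threshold gets initial weight 0,
    which is represented here by simply dropping the edge.
    """
    counts = Counter()
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n > threshold}


def row_normalize(weights):
    """ASSUMPTION: normalize each weight by the total weight leaving its node.

    This is an illustrative choice, not the formula of the publication.
    Returns {node: {neighbor: normalized_weight}} for an undirected graph.
    """
    adj = defaultdict(dict)
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w
    return {
        u: {v: w / sum(nbrs.values()) for v, w in nbrs.items()}
        for u, nbrs in adj.items()
    }


posts = [["a", "b"], ["a", "b"], ["a", "c"], ["a", "b", "c"]]
initial = initial_cooccurrence_weights(posts, threshold=1)
normalized = row_normalize(initial)
```

With the sample posts, the pair ("b", "c") co-occurs only once and is pruned, which is the graph simplification the text describes.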

［有向グラフとする例］ [Example using a directed graph]
また以上の説明では、生成するグラフは、方向のない無向グラフとしていたが、例えばエッジを設定する際に、層Ｂ内のノード間で、いずれか一方が他方をフォローしている一対のノード（相互フォローとなっていない場合を含む）間に、フォローをしているアカウントのノードから、フォローされているアカウントのノードへと、方向のあるエッジを設定することとしてもよい。この例では相互フォローとなっている一対のノードＶａ，Ｖｂ間には、ＶａからＶｂへの方向を有するエッジＥａｂと、ＶｂからＶａへの方向を有するエッジＥｂａとが（つまり双方向のエッジが）設定される。 In the above description, the generated graph is an undirected graph. However, when setting edges, for example, a directed edge may be set between a pair of nodes in layer B in which either account follows the other (including pairs that do not follow each other mutually), directed from the node of the following account to the node of the followed account. In this example, between a pair of nodes Va and Vb that follow each other, an edge Eab directed from Va to Vb and an edge Eba directed from Vb to Va (that is, bidirectional edges) are set.

またこの場合、層Ａ内の公式アカウントのノードと、層Ｂ内のアカウントのノードとの間では相互フォローでない場合でも、層Ａ内のノードＶｃと、層Ｂ内のノードＶｄ間には、Ｖｃの公式アカウントがＶｄのアカウントをフォローしていなくても、ＶｄのアカウントがＶｃの公式アカウントをフォローしていれば、ＶｃからＶｄへの方向を有するエッジＥｃｄと、ＶｄからＶｃへの方向を有するエッジＥｄｃと（つまり双方向のエッジ）を設定する。 In this case, bidirectional edges are set between a node of an official account in layer A and a node of an account in layer B even when the follow is not mutual. That is, between a node Vc in layer A and a node Vd in layer B, if the account of Vd follows the official account of Vc, an edge Ecd directed from Vc to Vd and an edge Edc directed from Vd to Vc (that is, bidirectional edges) are set, even if the official account of Vc does not follow the account of Vd.

層Ｂ，層Ｃ間においても同様に、双方向のエッジを設定するものとし、層Ｃ間でも一つの投稿内で共起しているハッシュタグに対応する一対のノード間には双方向のエッジ（上述のような２つのエッジでもよい）を設定する。 Similarly, bidirectional edges are set between layer B and layer C, and within layer C a bidirectional edge (which may be two edges as described above) is set between each pair of nodes corresponding to hashtags that co-occur in a single post.

このように有向グラフとした場合は、ディープ・ウォークを行う際に、エッジの方向に従ってのみ移動することとする。従って、移動先のノードにおいて、外向きのエッジが存在しない場合は、そこでリスト終了とする。 When a directed graph is used in this way, the DeepWalk random walk moves only in the direction of the edges. Therefore, if the node reached has no outgoing edge, the node list ends there.
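The termination rule for walks on the directed graph can be illustrated with a short sketch. It assumes the graph is stored as a dictionary of outgoing edges with weights; the function name `directed_walk` and the data layout are hypothetical and are not taken from the publication.

```python
import random


def directed_walk(out_edges, start, max_len, rng=random):
    """Generate one random walk that follows edge directions only.

    out_edges: {node: {successor: weight}} (assumed representation).
    The walk stops early when the current node has no outgoing edge,
    so the returned node list simply ends at that node.
    """
    walk = [start]
    while len(walk) < max_len:
        successors = out_edges.get(walk[-1], {})
        if not successors:
            break  # no outgoing edge: the list ends here
        nodes = list(successors)
        weights = [successors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


# b1 follows b2, but b2 follows nobody, so b2 has no outgoing edge.
one_way = {"b1": {"b2": 1.0}, "b2": {}}
# x and y follow each other, giving edges in both directions.
mutual = {"x": {"y": 1.0}, "y": {"x": 1.0}}
```

In the `one_way` graph a walk started at `b1` ends after reaching `b2`, whereas in the `mutual` graph the walk continues until `max_len` nodes have been collected.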

［第２機械学習処理部による演算の他の例］ [Other examples of computation by the second machine learning processing unit]
また本実施の形態のここまでの説明においては、第２機械学習処理部２４は、ｉ番目に選択した注目ノードＢｎの分散表現情報をｘi（ｉ＝１，２，…）、ノードＣｘの分散表現情報をｘj（ｊ＝１，２，…）、ノードＣｙの分散表現情報をｘj′（ｊ′＝１，２…）として、損失としてＬPARIGを（１）式により演算することとしていた。 In the description of the present embodiment so far, the second machine learning processing unit 24 computes LPARIG as the loss by equation (1), where xi (i = 1, 2, ...) is the distributed representation information of the i-th selected node of interest Bn, xj (j = 1, 2, ...) is the distributed representation information of node Cx, and xj' (j' = 1, 2, ...) is the distributed representation information of node Cy.

しかしながら、これは一例であり、損失ＬBPRを、（１）式のＬPARIGに代えて、次の（２）式により求めることとしてもよい。すなわち、第２機械学習処理部２４は、

として損失ＬBPRを求めてもよい。 However, this is only an example, and a loss LBPR obtained by the following equation (2) may be used in place of LPARIG of equation (1). That is, the second machine learning processing unit 24 may obtain the loss LBPR by equation (2).

この例においても、第２機械学習処理部２４は、この損失ＬBPRを最小とするように注目ノードＢｎの分散表現情報ｘiと、ノードＣｘの分散表現情報ｘj（ｊ＝１，２，…）、ノードＣｙの分散表現情報をｘj′（ｊ′＝１，２…）を更新して機械学習する（例えばSteffen Rendle, et.al., “BPR:Bayesian personalized ranking from implicit feedback”, UAI '09: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, June 2009 Pages 452-461等に記載されている処理と同様）。この更新の方法は、損失の演算が異なるだけで、既に述べた例と同様の方法で行うことができるのでその説明を省略する。 In this example as well, the second machine learning processing unit 24 performs machine learning by updating the distributed representation information xi of the node of interest Bn, the distributed representation information xj (j = 1, 2, ...) of node Cx, and the distributed representation information xj' (j' = 1, 2, ...) of node Cy so as to minimize this loss LBPR (similar to the processing described in, for example, Steffen Rendle, et al., “BPR: Bayesian personalized ranking from implicit feedback”, UAI '09: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, June 2009, pp. 452-461). Apart from the different loss computation, this update can be performed in the same way as in the example already described, so its description is omitted.
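Equation (2) is not reproduced in this text. As a rough illustration only, the sketch below computes the loss in the general form used in the cited BPR paper, that is, the negative log-sigmoid of the score difference between a positive pair (xi, xj) and a negative pair (xi, xj'), with an optional L2 penalty. Using the inner product as the score, and the names `bpr_loss` and `reg`, are assumptions made for this example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bpr_loss(triples, reg=0.0):
    """BPR-style loss over (x_i, x_j, x_j_neg) triples.

    x_i: vector of the node of interest, x_j: vector of a related node (Cx),
    x_j_neg: vector of an unrelated node (Cy). ASSUMPTION: scores are inner
    products and the penalty is a plain L2 term scaled by `reg`.
    """
    loss = 0.0
    for x_i, x_j, x_neg in triples:
        margin = dot(x_i, x_j) - dot(x_i, x_neg)
        # -ln(sigmoid(margin)) == ln(1 + exp(-margin)), computed stably.
        loss += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
        loss += reg * (dot(x_i, x_i) + dot(x_j, x_j) + dot(x_neg, x_neg))
    return loss


good = bpr_loss([([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])])  # positive scores higher
bad = bpr_loss([([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])])   # negative scores higher
```

The loss is smaller when the related node scores higher than the unrelated node, so minimizing it pulls xi toward xj and away from xj'.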

なお、この第２機械学習処理部２４においてグローバルな特徴を表す分散表現情報を生成する方法としては、ここまでで述べた方法に限られず、Yehuda Koren, et al., “MATRIX FACTORIZATION TECHNIQUES FOR RECOMMENDER SYSTEMS”, Computer, 42(8), pp.30-37, 2009に記載の方法や、Xiangunan He, et al., “Neural Collaborative Filtering”, arXiv:1708.05031v2, 26 Aug 2017に記載の方法などを利用してもよい。 The method by which the second machine learning processing unit 24 generates distributed representation information representing global features is not limited to the methods described so far. For example, the method described in Yehuda Koren, et al., “Matrix Factorization Techniques for Recommender Systems”, Computer, 42(8), pp. 30-37, 2009, or the method described in Xiangnan He, et al., “Neural Collaborative Filtering”, arXiv:1708.05031v2, 26 Aug 2017, may be used.

［第１機械学習処理部における演算の変形例］ [Modified example of computation in the first machine learning processing unit]
また、ローカルな特徴を表す分散表現情報を生成する第１機械学習処理部２３の処理も、ここで述べた例に限られない。第１機械学習処理部２３は、いわゆるnode2vec（Aditya Grover, et al., Scalable Feature Learning for Networks, DOI: http://dx.doi.org/10.1145/2939672.2939754）を用いてローカルな特徴を表す分散表現情報を生成してもよいし、William Hamilton, et al., ”Inductive Representation Learning on Large Graphs”, In Advances in neural information processing systems, pp.1024-1034 (31st Conference on Neural Information Processing Systems (NIPS 2017))に開示された方法を採用して分散表現情報を生成してもよい。 The processing of the first machine learning processing unit 23, which generates distributed representation information representing local features, is likewise not limited to the example described here. The first machine learning processing unit 23 may generate the distributed representation information representing local features using so-called node2vec (Aditya Grover, et al., Scalable Feature Learning for Networks, DOI: http://dx.doi.org/10.1145/2939672.2939754), or may generate it by adopting the method disclosed in William Hamilton, et al., "Inductive Representation Learning on Large Graphs", In Advances in Neural Information Processing Systems, pp. 1024-1034 (31st Conference on Neural Information Processing Systems (NIPS 2017)).

［中間的特徴を含める例］ [Example including intermediate features]
さらにここまでの説明では、第２機械学習処理部２４は、グラフの全体から注目ノードＢｎとの間に所定の関係（例えばエッジを所定回数以内の回数だけ辿って到達できる関係）にないノードＣｙを選択していたが、本実施の形態はこの例に限られない。例えば第２機械学習処理部２４は、グラフから得られる部分グラフのうち、注目ノードＢｎを含む部分グラフを生成し、当該部分グラフから注目ノードＢｎとの間に所定の関係にないノードＣｙを選択することとしてもよい。この例では、グローバルな特徴ではあるものの、ミッドレンジの特徴を表す分散表現情報が生成される。 In the description so far, the second machine learning processing unit 24 selects, from the entire graph, a node Cy that is not in a predetermined relationship with the node of interest Bn (for example, the relationship of being reachable by following edges no more than a predetermined number of times). However, the present embodiment is not limited to this example. For example, the second machine learning processing unit 24 may generate, from among the subgraphs obtainable from the graph, a subgraph that includes the node of interest Bn, and select from that subgraph a node Cy that is not in the predetermined relationship with the node of interest Bn. In this example, distributed representation information is generated that represents a feature which, while still global, is mid-range.

また別の例では、第２機械学習処理部２４は、注目ノードＢｎからエッジをα回以内の回数（ここでαは自然数）だけ辿って到達できる関係にあるノードＣｘと、注目ノードＢｎからエッジをβ（β＞α＋１、α，βは自然数）回以内の回数だけ辿って到達できる関係にないノードＣｙを選択して、当該ノードＣｘ，Ｃｙを用いて、注目ノードＢｎのグローバルな特徴を表す分散表現情報を生成するとともに、β＞γ＞αなる自然数γを用い、注目ノードＢｎからエッジをα＋１以上、γ回以下の回数だけ辿って到達できる関係にないノードＣｙ′を選択し、ノードＣｘとこのノードＣｙ′とを用いて、注目ノードＢｎのミッドレンジのグローバルな特徴を表す分散表現情報を得てもよい。後者の例では、α，β（及びγ）の範囲を制御することでどの程度グローバルな特徴を含めるかを調整できる。 In another example, the second machine learning processing unit 24 selects a node Cx that can be reached from the node of interest Bn by following edges at most α times (where α is a natural number) and a node Cy that cannot be reached from the node of interest Bn by following edges at most β times (β > α + 1, where α and β are natural numbers), and uses these nodes Cx and Cy to generate distributed representation information representing the global features of the node of interest Bn. In addition, using a natural number γ with β > γ > α, it may select a node Cy' that is not in the relationship of being reachable from the node of interest Bn by following edges at least α + 1 and at most γ times, and use node Cx and this node Cy' to obtain distributed representation information representing the mid-range global features of the node of interest Bn. In the latter example, the extent to which global features are included can be adjusted by controlling the ranges of α and β (and γ).
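The hop-count conditions on Cx and Cy can be sketched with a breadth-first search. The example below covers only the first selection (nodes within α hops as related candidates, nodes not reachable within β hops as unrelated candidates) and ignores which layer a node belongs to. The function names and the adjacency-list format are assumptions for illustration.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search returning {node: minimum hop count} from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def split_candidates(adj, bn, alpha, beta):
    """Return (related, unrelated) candidate sets for the node of interest bn.

    related:   nodes reachable from bn within alpha hops (candidates for Cx).
    unrelated: nodes not reachable from bn within beta hops (candidates for Cy),
               including nodes that cannot be reached at all.
    """
    if beta <= alpha + 1:
        raise ValueError("beta must be greater than alpha + 1")
    dist = hop_distances(adj, bn)
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    inf = float("inf")
    related = {v for v in nodes if v != bn and dist.get(v, inf) <= alpha}
    unrelated = {v for v in nodes if dist.get(v, inf) > beta}
    return related, unrelated


# Undirected path a-b-c-d-e-f plus an isolated node g.
path = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"], "g": [],
}
related, unrelated = split_candidates(path, "a", alpha=1, beta=3)
```

Raising β pushes the unrelated candidates farther from Bn, which corresponds to the adjustment of how global the learned features are.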

［他の例］ [Other examples]
また本実施の形態のここまでの説明ではＳＮＳ等のユーザに対して、フォローする公式アカウントを推薦する例を示したが、本実施の形態の情報処理装置１の利用分野は、この例に限られるものではない。 The description of the present embodiment so far has shown an example of recommending official accounts to follow to users of an SNS or the like, but the field of use of the information processing apparatus 1 of the present embodiment is not limited to this example.

例えば、公式アカウント側からは、自己をフォローしているユーザ（フォロー中ユーザと呼ぶ）に近いユーザであって、自己をフォローしていないユーザ（フォロー外ユーザと呼ぶ）を、ユーザの個人アカウントのノードの統合分散表現情報を利用して分析してもよい。 For example, on the official account side, users who do not follow the official account (called unfollowed users) but who are close to users who do follow it (called following users) may be analyzed using the integrated distributed representation information of the nodes of the users' personal accounts.

つまり、情報処理装置１は、公式アカウント側からの要求に応答して、当該要求を行った公式アカウントのフォロー中ユーザの個人アカウントに係る統合分散表現情報を得る。そして情報処理装置１は、当該得た統合分散表現情報のうち例えばＮ個（Ｎは１以上の自然数とする）をランダムに選択して、選択した統合分散表現情報と、当該公式アカウントに対するフォロー外ユーザの統合分散表現情報との距離情報を演算する。そして当該距離情報が表す一対の分散表現情報が互いに類似していると判断される場合（コサイン類似度の場合は０より大きく１以下の所定の値以上である場合）に、情報処理装置１は、当該フォロー外ユーザの個人アカウントの情報を要求元の公式アカウントに対して提示する。 That is, in response to a request from the official account side, the information processing apparatus 1 obtains the integrated distributed representation information of the personal accounts of the following users of the official account that made the request. The information processing apparatus 1 then randomly selects, for example, N items (N being a natural number of 1 or more) of the obtained integrated distributed representation information, and computes distance information between the selected integrated distributed representation information and the integrated distributed representation information of an unfollowed user of that official account. When the pair of distributed representation information represented by the distance information is judged to be similar (in the case of cosine similarity, when the similarity is at or above a predetermined value that is greater than 0 and at most 1), the information processing apparatus 1 presents the information of the personal account of that unfollowed user to the requesting official account.

なお、複数のフォロー中ユーザに係る統合分散表現情報を用いる場合（上記Ｎが２以上の場合）は、そのいずれにも類似していると判断される場合、あるいはあるフォロー外ユーザに係る統合分散表現情報に類似していると判断されるフォロー中ユーザに係る統合分散表現情報の数をｎとしたとき、ｎ／Ｎが所定の判定しきい値を超える場合に、当該フォロー外ユーザのアカウントの情報を要求元の公式アカウントに対して提示することとしてもよい。 When the integrated distributed representation information of a plurality of following users is used (when N is 2 or more), the information of the account of an unfollowed user may be presented to the requesting official account either when that user is judged to be similar to all of the selected following users, or when n / N exceeds a predetermined judgment threshold, where n is the number of following users whose integrated distributed representation information is judged to be similar to that of the unfollowed user.
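The n / N decision rule can be sketched as follows, using cosine similarity as the distance information. The function names, the plain-list vector format, and the concrete threshold values are assumptions chosen for this example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def should_present(candidate, follower_vecs, n_samples,
                   sim_threshold, ratio_threshold, rng=random):
    """Decide whether to present an unfollowed user to the official account.

    Randomly picks up to n_samples vectors of following users (N), counts how
    many are similar to the candidate (n), and checks n / N > ratio_threshold.
    """
    if not follower_vecs:
        return False
    sample = rng.sample(follower_vecs, min(n_samples, len(follower_vecs)))
    n = sum(1 for f in sample if cosine(candidate, f) >= sim_threshold)
    return n / len(sample) > ratio_threshold


followers = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
candidate = [1.0, 0.05]
present = should_present(candidate, followers, n_samples=3,
                         sim_threshold=0.8, ratio_threshold=0.5)
```

Here the candidate is similar to two of the three sampled following users, so n / N is 2/3 and the candidate is presented at a ratio threshold of 0.5 but not at 0.7.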

さらに本実施の形態の情報処理装置１の分析対象は、ＳＮＳに関連する情報である必要はない。例えば、結婚相手を紹介するシステムにおいては、登録者（結婚相手を探しているユーザ）をノードとし、登録者が興味ありとしている情報項目（属している興味グループを表す情報など）をまた別のノードとして、これらのノードの間をエッジで連結してネットワークを生成してもよい。 Further, the analysis target of the information processing apparatus 1 of the present embodiment does not need to be information related to SNS. For example, in a system that introduces a marriage partner, the registrant (user who is looking for a marriage partner) is set as a node, and the information items that the registrant is interested in (information indicating the interest group to which the registrant belongs, etc.) are different. As a node, a network may be created by connecting these nodes at an edge.

この場合、情報処理装置１は、当該ネットワークに関するローカルな特徴を表す分散表現情報を、例えばネットワークエンベディングの方法により登録者ごとに得るとともに、グローバルな特徴を表す分散表現情報を、Bayesian personalized ranking from implicit feedback等の情報によって登録者ごとに得て、これらを統合した統合分散表現情報を登録者ごとに生成する。 In this case, the information processing apparatus 1 obtains, for each registrant, distributed representation information representing local features of the network by, for example, a network embedding method, and distributed representation information representing global features by a technique such as Bayesian personalized ranking from implicit feedback, and generates, for each registrant, integrated distributed representation information that integrates the two.

そして情報処理装置１は、当該統合分散表現情報間の類似度により、登録者間の類似度を求め、この類似度の情報を、推奨する相手となる登録者を紹介する処理に供してもよい。この例においても、同じ登録者が共通して属している興味グループがある場合は、当該興味グループ間にもエッジを設定することとしてもよい。 The information processing apparatus 1 may then obtain the similarity between registrants from the similarity between their integrated distributed representation information, and use this similarity information in processing that introduces registrants as recommended partners. In this example as well, when the same registrant belongs to more than one interest group, an edge may also be set between those interest groups.

またこの例では、自己（登録者Ｐとする）に係る分散表現情報と類似度の高い分散表現情報となっている他の登録者Ｑが、既に交際を申し込んで断られている登録者Ｘが存在するときには、登録者Ｐに対して、登録者Ｘについては交際の可能性を低い相手として提示することも考えられる。 Also in this example, when there is a registrant X to whom another registrant Q, whose distributed representation information is highly similar to that of the registrant in question (registrant P), has already proposed a relationship and been refused, registrant X may be presented to registrant P as a partner with whom a relationship is unlikely.

このように本実施の形態のある例では、分析対象の一つであるＰの統合分散表現情報と類似度が比較的高い（例えばコサイン類似度の場合は０より大きく１以下の所定の値以上であるなど）として判断される統合分散表現情報に係る他の分析対象Ｑが存在し、かつ当該分析対象Ｑが他の分析対象Ｘとの間にエッジを設定するべきでない関係（いわば負例となる関係）がある場合に、情報処理装置１は、分析対象Ｐと、当該情報項目または分析対象Ｘとの間の関係も負例となり得ることを、情報処理装置１のユーザに提示することとしてもよい。またその確率として、当該分析対象Ｐと類似すると判断される統合分散表現情報に係る複数の他の分析対象Ｑ1，Ｑ2…の数Ｎに対し、当該分析対象Ｑのうち情報項目または分析対象Ｘとの間の関係が負例となっている分析対象Ｑの数ｎを求め、その割合ｎ／Ｎを示してもよい。 Thus, in one example of the present embodiment, when there is another analysis target Q whose integrated distributed representation information is judged to be relatively similar to that of P, one of the analysis targets (for example, in the case of cosine similarity, at or above a predetermined value that is greater than 0 and at most 1), and this analysis target Q has a relationship with another analysis target X for which no edge should be set (a negative-example relationship, so to speak), the information processing apparatus 1 may present to the user of the information processing apparatus 1 that the relationship between analysis target P and that information item or analysis target X may also be a negative example. As the probability of this, for the number N of other analysis targets Q1, Q2, ... whose integrated distributed representation information is judged to be similar to that of analysis target P, the information processing apparatus 1 may obtain the number n of those analysis targets Q whose relationship with the information item or analysis target X is a negative example, and present the ratio n / N.

［公式アカウントの分散情報の例］ [Example of distributed representation for official accounts]
また本実施の形態のある例では、情報処理装置１は、公式アカウント（層Ａ）のノードＶiの分散表現ａiを次のように求めてもよい。 In one example of the present embodiment, the information processing apparatus 1 may obtain the distributed representation ai of a node Vi of an official account (layer A) as follows.

すなわち、このノードＶiで表される公式アカウントＡｉをフォローする層Ｂの個人アカウントのノードVkのインデックスｋの集合をＦ(i)と書き、その要素の総数（公式アカウントＡｉをフォローする個人アカウントの総数）を｜Ｆ(i)｜として、情報処理装置１は、当該公式アカウントＡｉの分散表現ａiを、層ＢのノードＶkの統合分散表現情報Ｘkを用いて、

として求めてもよい。 That is, let F(i) denote the set of indices k of the nodes Vk of the layer B personal accounts that follow the official account Ai represented by this node Vi, and let |F(i)| denote the number of its elements (the total number of personal accounts following the official account Ai). The information processing apparatus 1 may then obtain the distributed representation ai of the official account Ai from the integrated distributed representation information Xk of the layer B nodes Vk with k in F(i).

この例では、個人アカウントのノードＶjと、この公式アカウントＡiのノードＶiとの類似度（スコア）を、ａiと、個人アカウントのノードＶjの統合分散表現情報Ｘjとのコサイン類似度により決定してもよい。 In this example, the similarity (score) between the node Vj of the personal account and the node Vi of this official account Ai is determined by the cosine similarity between ai and the integrated distributed representation information Xj of the node Vj of the personal account. May be good.
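The formula for ai is not reproduced in this text. Because the definition introduces both F(i) and its size |F(i)|, the sketch below assumes that ai is the arithmetic mean of the followers' integrated vectors Xk; this is an assumption made for illustration and may differ from the publication's formula. The scoring by cosine similarity follows the paragraph above, and all function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise arithmetic mean of equal-length vectors."""
    if not vectors:
        raise ValueError("at least one follower vector is required")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def official_account_score(follower_vecs, x_j):
    """Score a personal account vector x_j against an official account.

    ASSUMPTION: the official account vector a_i is the mean of the integrated
    vectors X_k of its followers (k in F(i)).
    """
    a_i = mean_vector(follower_vecs)
    return cosine(a_i, x_j)


follower_vecs = [[1.0, 0.0], [0.0, 1.0]]
a_i = mean_vector(follower_vecs)
aligned = official_account_score(follower_vecs, [1.0, 1.0])
orthogonal = official_account_score(follower_vecs, [1.0, -1.0])
```

Under this assumption an official account needs no embedding of its own: its vector is derived entirely from the personal accounts that follow it.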

１情報処理装置、１１制御部、１２記憶部、１３操作部、１４表示部、１５通信部、２１情報取得部、２２グラフ生成部、２３第１機械学習処理部、２４第２機械学習処理部、２５統合処理部、２６評価処理部、２７出力処理部。

1 Information processing device, 11 Control unit, 12 Storage unit, 13 Operation unit, 14 Display unit, 15 Communication unit, 21 Information acquisition unit, 22 Graph generation unit, 23 1st machine learning processing unit, 24 2nd machine learning processing unit , 25 integrated processing unit, 26 evaluation processing unit, 27 output processing unit.

Claims

An information processing apparatus comprising:
means for holding, for a plurality of analysis targets between which relationships are set, a vector value set for each analysis target as distributed expression information;
first machine learning means for extracting, as a first analysis target group, some of the plurality of analysis targets whose mutually set relationships are regarded as being in a predetermined first relationship, and for machine-learning the distributed expression information related to each analysis target included in the first analysis target group based on a permutation or combination of the extracted first analysis target group;
second machine learning means for taking, as a second analysis target group, some of the plurality of analysis targets whose mutually set relationships are regarded as being in a predetermined second relationship together with some of the plurality of analysis targets whose mutually set relationships are regarded as not being in the predetermined second relationship, and for machine-learning the distributed expression information related to each analysis target included in the second analysis target group based on the relationships set for those analysis targets; and
processing means for subjecting the distributed expression information of each analysis target machine-learned by the first machine learning means and the second machine learning means to predetermined processing.

The information processing apparatus according to claim 1.
The relationship set between the analysis targets is represented by graph information in which each of the analysis targets is a node and the nodes are connected.
The predetermined first relationship determined in advance is that it is determined that the relationship is connected to each other by a predetermined condition using the graph information.
The first machine learning means selects one of the analysis targets by a predetermined method, and a part of the plurality of analysis targets related to the node determined to be connected to the node related to the selected analysis target. Is an information processing device that extracts as a first analysis target group.

The information processing apparatus according to claim 1 or 2.
The relationship set between the analysis targets is represented by graph information in which each of the analysis targets is a node and the nodes are connected.
The predetermined second relationship is that the corresponding nodes represented by the graph information are connected to each other.
The second machine learning means selects one of the analysis targets by a predetermined method, and the analysis target related to the node connected to the node related to the selected analysis target and the node related to the selected analysis target. An information processing device that extracts an analysis target related to a node that is determined not to be connected to the second analysis target group.

The information processing apparatus according to claim 3.
The second machine learning means takes, among the analysis targets included in the second analysis target group, an analysis target related to a node determined to be connected to the node related to the selected analysis target as a positive example analysis target, and an analysis target related to a node determined not to be connected to the node related to the selected analysis target as a negative example analysis target, and machine-learns at least the distributed expression information related to the selected analysis target using the set of the selected analysis target, the positive example analysis target, and the negative example analysis target.

The information processing apparatus according to any one of claims 1 to 4.
The processing means evaluates the degree of similarity between at least a part of the analysis target by using the distributed expression information for each analysis target machine-learned by the first machine learning means and the second machine learning means. An information processing device that executes the processing to be performed.

A program for causing a computer to function as:
means for holding, for a plurality of analysis targets between which relationships are set, a vector value set for each analysis target as distributed expression information;
first machine learning means for extracting, as a first analysis target group, some of the plurality of analysis targets whose mutually set relationships are regarded as being in a predetermined first relationship, and for machine-learning the distributed expression information related to each analysis target included in the first analysis target group based on a permutation or combination of the extracted first analysis target group;
second machine learning means for taking, as a second analysis target group, some of the plurality of analysis targets whose mutually set relationships are regarded as being in a predetermined second relationship together with some of the plurality of analysis targets whose mutually set relationships are regarded as not being in the predetermined second relationship, and for machine-learning the distributed expression information related to each analysis target included in the second analysis target group based on the relationships set for those analysis targets; and
processing means for subjecting the distributed expression information of each analysis target machine-learned by the first machine learning means and the second machine learning means to predetermined processing.