JP6152073B2 - Group association apparatus, method, and program

Info

Publication number: JP6152073B2
Application number: JP2014105512A
Authority: JP
Inventors: 具治岩田; 努平尾; 健次福水; 元信金川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-05-21
Filing date: 2014-05-21
Publication date: 2017-06-21
Anticipated expiration: 2034-05-21
Also published as: JP2015219880A

Description

The present invention relates to a group association apparatus, method, and program, and more particularly to a group association apparatus, method, and program for associating groups between data.

Techniques for associating objects contained in different data have many applications, such as matching images with sentences, matching English words with Japanese words, and matching user IDs held in different databases. In some applications, the objects are grouped before they are associated. For example, images and sentences are classified by content and then matched, words are grouped by meaning and then matched, or users are grouped by the community they belong to and then matched.

For such association, methods have been proposed for the case where correspondence information about the objects is given in advance or where distances between different data can be calculated (for example, Non-Patent Document 1). Several methods have also been proposed that associate objects even when neither correspondence information nor a distance measure is available (for example, Non-Patent Documents 2, 3, and 4).

Non-Patent Document 1: Tomoharu Iwata, Shinji Watanabe, Hiroshi Sawada: "Fashion Coordinates Recommender System using Photographs from Fashion Magazines," The Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), 2262-2267, 2011.
Non-Patent Document 2: Tomoharu Iwata, Tsutomu Hirao, Naonori Ueda: "Unsupervised Cluster Matching via Probabilistic Latent Variable Models," The Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.
Non-Patent Document 3: Novi Quadrianto, Alex J. Smola, Le Song, Tinne Tuytelaars: "Kernelized Sorting," IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 32(10), pp. 1809-1821, 2010.
Non-Patent Document 4: Djuric, N., Grbovic, M., Vucetic, S.: "Convex Kernelized Sorting," AAAI Conference on Artificial Intelligence (AAAI), Toronto, Canada, 2012.

However, the technique of Non-Patent Document 1 cannot be applied when correspondence information or a distance measure is not given in advance. The techniques of Non-Patent Documents 2, 3, and 4 target ungrouped data and therefore cannot associate groups of objects.

The present invention has been made to solve the above problems, and its object is to provide a group association apparatus, method, and program capable of accurately associating groups between different data.

To achieve the above object, a group association apparatus according to the present invention receives as input a plurality of different data, each of which is a set of objects grouped into N groups, and associates groups among the plurality of data. The apparatus includes a kernel calculation unit that, for each of the plurality of data, calculates a kernel representing the relationship between two groups for each pair of groups among the N groups of that data, and a rearrangement unit that, based on the kernels calculated by the kernel calculation unit for each of the plurality of data, associates groups among the plurality of data by rearranging the N groups of the data so that the dependency between the plurality of data becomes high.

The kernel calculation unit may calculate the kernel for each pair of groups using the similarity between the distributions of the objects contained in the groups.

In the group association apparatus according to the present invention, the kernel may be any one of a linear kernel, a Gaussian kernel, and a polynomial kernel.

A group association method according to the present invention receives as input a plurality of different data, each of which is a set of objects grouped into N groups, and associates groups among the plurality of data. The method includes a step in which a kernel calculation unit, for each of the plurality of data, calculates a kernel representing the relationship between two groups for each pair of groups among the N groups of that data, and a step in which a rearrangement unit, based on the kernels calculated by the kernel calculation unit for each of the plurality of data, associates groups among the plurality of data by rearranging the N groups of the data so that the dependency between the plurality of data becomes high.

In the group association method according to the present invention, the step in which the kernel calculation unit calculates the kernel representing the relationship between the two groups may calculate the kernel for each pair of groups using the similarity between the distributions of the objects contained in the groups.

In the group association method according to the present invention, the kernel may be any one of a linear kernel, a Gaussian kernel, and a polynomial kernel.

A program according to the present invention is a program for causing a computer to function as each unit constituting the above group association apparatus.

According to the group association apparatus, method, and program of the present invention, for each of a plurality of different data, a kernel indicating the relationship between two groups is calculated for each pair of groups among the N groups, and the groups of the data are rearranged, based on the calculated kernels, so that the dependency between the plurality of data becomes high. This makes it possible to accurately associate groups between different data.

FIG. 1 is a block diagram showing the main configuration of the group association apparatus according to the embodiment of the present invention.
FIG. 2 is a flowchart showing the group association processing routine in the embodiment of the present invention.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

<Principle of the Embodiment of the Present Invention>

First, the principle of the embodiment of the present invention will be described.

Suppose that two pieces of data, $X = \{X_1, \ldots, X_N\}$ and $Y = \{Y_1, \ldots, Y_N\}$, each a set of grouped objects, are given as input. Here,

$$X_n = \{x_{ni}\}_{i=1}^{I_n} \tag{1}$$

is the set of objects contained in the n-th group of the first data, $x_{ni}$ is the i-th object of the n-th group, and $I_n$ is the number of objects contained in the n-th group. Similarly,

$$Y_n = \{y_{nj}\}_{j=1}^{J_n} \tag{2}$$

is the set of objects contained in the n-th group of the second data, $y_{nj}$ is the j-th object of the n-th group, and $J_n$ is the number of objects contained in the n-th group. That is, each of the data X and Y is a set of objects grouped into N groups. This description assumes that two pieces of data are given, but the method is equally applicable to three or more pieces of data by applying it to each pair of data.

As for the processing, the two given pieces of data are read first. Next, for each of the two pieces of data, a kernel representing the relationship between groups is calculated for each pair of groups among the N groups. The kernel takes a high value when the statistical properties of the two groups are similar.

Next, using the calculated kernel values, the groups of the two pieces of data are associated by rearranging the N groups so that the dependency between the two pieces of data becomes highest.

The association is expressed by a permutation matrix $\pi \in \Pi_N$, where $\Pi_N$ is given by equation (3).

Here, $1_N$ is the N-dimensional vector whose elements are all 1. The permutation matrix that maximizes the dependency between the data X and Y, which are sets of grouped objects, is then obtained by equation (4).

Here, $D(\cdot)$ is a measure of dependency. The Hilbert-Schmidt Independence Criterion (HSIC), described later, is used as the measure of dependency. Other measures of dependency, such as mutual information, can also be used.
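The formula images for equations (3) and (4) are not reproduced in this text. Assuming the usual kernelized-sorting convention (an assumption, not notation confirmed by this publication), they can be read roughly as follows, where $\pi(Y)$ is a hypothetical shorthand for the data Y with its groups reordered by $\pi$:

```latex
\Pi_N = \left\{ \pi \in \{0,1\}^{N \times N} \;\middle|\; \pi 1_N = 1_N,\ \pi^{\top} 1_N = 1_N \right\} \tag{3}

\hat{\pi} = \operatorname*{arg\,max}_{\pi \in \Pi_N} D\bigl(X, \pi(Y)\bigr) \tag{4}
```

The two constraints in (3) say that every row and every column of $\pi$ contains exactly one 1, so each group of one data is matched to exactly one group of the other.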

As described above, in the embodiment of the present invention, for each of a plurality of different data, a kernel is calculated for each pair of groups among the N groups, and the groups are associated among the plurality of data by rearranging the N groups so that the dependency between the plurality of data becomes high.

<Configuration of the Group Association Apparatus According to the Embodiment of the Present Invention>

Next, the configuration of the group association apparatus according to the embodiment of the present invention will be described. As shown in FIG. 1, the group association apparatus 100 according to the embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and a program for executing the group association processing routine described later. Functionally, as shown in FIG. 1, the group association apparatus 100 includes an input unit 10, a calculation unit 20, and an output unit 50.

The input unit 10 receives a plurality of different data, each of which is a set of objects grouped into N groups. In the present embodiment, the plurality of data are two pieces of data, X and Y.

The calculation unit 20 includes a data storage unit 30, a kernel calculation unit 32, and a rearrangement unit 34.

The data storage unit 30 stores the two pieces of data received by the input unit 10.

The kernel calculation unit 32 first reads the two pieces of data stored in the data storage unit 30. Then, for each of the two pieces of data, it calculates a kernel representing the relationship between two groups for each pair of groups among the N groups of that data.

The kernel calculation unit 32 treats each of the N groups of the data as a distribution and calculates the kernels using the statistical properties of the group distributions as the similarity. First, for every pair of objects contained in two groups of the data, it calculates a kernel, which is the similarity between the two objects. Then, based on these kernels between objects, it calculates the kernel between the distributions of each pair of groups among the N groups of the data. Here, by the kernel embedding method, a distribution can be represented as a single point in a reproducing kernel Hilbert space using the kernel between objects. The kernel between distributions in the reproducing kernel Hilbert space is regarded as the kernel between the two groups.

Any of a linear kernel, a Gaussian kernel, and a polynomial kernel can be used as the kernel between objects and as the kernel between distributions.

The linear kernel between the objects $x_{ni}$ and $x_{mj}$ can be calculated by equation (5).

The Gaussian kernel between the objects $x_{ni}$ and $x_{mj}$ can be calculated by equation (6).

The polynomial kernel between the objects $x_{ni}$ and $x_{mj}$ can be calculated by equation (7).
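The formula images for equations (5) to (7) are not reproduced in this text. The following is a minimal sketch of the three object-level kernels in their common textbook forms; the parameter names `gamma`, `c`, and `degree` and the exact parameterization are assumptions and may differ from the publication's equations.

```python
import math

def linear_kernel(x, y):
    """Linear kernel: inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian kernel exp(-gamma/2 * ||x - y||^2).
    gamma is an assumed bandwidth parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * gamma * sq_dist)

def polynomial_kernel(x, y, c=1.0, degree=2):
    """Polynomial kernel (x.y + c)^degree.
    c and degree are assumed hyperparameters."""
    return (linear_kernel(x, y) + c) ** degree
```

All three take two objects represented as equal-length numeric sequences and return a scalar similarity.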

The linear kernel between the groups $X_n$ and $X_m$ in the reproducing kernel Hilbert space can be calculated by equation (8).

The Gaussian kernel between the groups $X_n$ and $X_m$ can be calculated by equation (9).

The polynomial kernel between the groups $X_n$ and $X_m$ can be calculated by equation (10).

Then, based on the kernels between pairs of groups calculated according to equation (8), (9), or (10), the kernel calculation unit 32 obtains, for each of the two pieces of data, a matrix collecting the kernels between all groups. In the present embodiment, the matrix collecting the kernels between all groups of the first data X is denoted K. K is an N × N matrix whose (n, m) element is the kernel $K(X_n, X_m)$ between the n-th group and the m-th group. Similarly, the matrix collecting the kernels between all groups of the second data Y is denoted L.
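The formula images for equations (8) to (10) are likewise missing. As a sketch only, the group-level kernels can be built from empirical mean embeddings: the inner product of two groups' embeddings is the average object-level kernel over all cross pairs, and the Gaussian and polynomial variants are then applied to those embeddings. The function names and the parameters `lam`, `c`, and `degree` are hypothetical, and the publication's exact parameterization may differ.

```python
import math

def linear_kernel(x, y):
    """Object-level linear kernel (inner product)."""
    return sum(a * b for a, b in zip(x, y))

def mean_embedding_inner(group_a, group_b, k=linear_kernel):
    """Average of k over all cross pairs, i.e. the inner product of the
    two groups' empirical mean embeddings in the RKHS of k."""
    total = sum(k(x, y) for x in group_a for y in group_b)
    return total / (len(group_a) * len(group_b))

def group_gram_matrix(groups, k=linear_kernel, kind="linear",
                      lam=1.0, c=1.0, degree=2):
    """N x N matrix of group-level kernels for one piece of data.
    lam, c and degree are assumed hyperparameters."""
    n = len(groups)
    inner = [[mean_embedding_inner(groups[i], groups[j], k)
              for j in range(n)] for i in range(n)]
    if kind == "linear":
        return inner
    if kind == "gaussian":
        # inner[i][i] - 2*inner[i][j] + inner[j][j] is the squared RKHS
        # distance between the two mean embeddings.
        return [[math.exp(-0.5 * lam * (inner[i][i] - 2.0 * inner[i][j]
                                        + inner[j][j]))
                 for j in range(n)] for i in range(n)]
    if kind == "polynomial":
        return [[(inner[i][j] + c) ** degree for j in range(n)]
                for i in range(n)]
    raise ValueError(f"unknown kernel kind: {kind}")

# Two groups of 2-D objects (toy data for illustration only).
X = [[[1.0, 0.0], [3.0, 0.0]], [[0.0, 2.0]]]
K = group_gram_matrix(X)
```

Calling `group_gram_matrix` once on the groups of X and once on the groups of Y yields the matrices that the text calls K and L.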

Based on the kernels between pairs of groups calculated by the kernel calculation unit 32 for each of the two pieces of data, the rearrangement unit 34 rearranges the N groups of the data so that the dependency between the two pieces of data becomes high, thereby associating the groups of the two pieces of data.

For example, the rearrangement unit 34 first reads K for the data X and L for the data Y, both obtained by the kernel calculation unit 32. It then rearranges the N groups so that the dependency between the data X and Y becomes high. Here, the Hilbert-Schmidt Independence Criterion (HSIC) is used as the measure of dependency. The HSIC between the two pieces of data is given by equation (11).

Here, tr denotes the trace and $H = I - 1_N 1_N^{\top} / N$ is the centering matrix. Also, $\bar{K} = HKH$ and $\bar{L} = HLH$. The permutation matrix is obtained so as to maximize the HSIC according to equation (12).
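As an illustration of the quantities just defined, the sketch below centers a Gram matrix with $H$ and evaluates $\mathrm{tr}(\bar{K}\bar{L})$. The formula image for equation (11) is not available here, so any normalizing constant in front of the trace is omitted; a positive constant does not change which permutation maximizes the criterion. The function names are hypothetical.

```python
def center(gram):
    """Return H G H with H = I - 1 1^T / N (double centering)."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[gram[i][j] - row_mean[i] - col_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def hsic_trace(K, L):
    """tr(K_bar L_bar): the HSIC statistic without its normalizing constant."""
    Kc, Lc = center(K), center(L)
    n = len(K)
    return sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n))
```

A constant Gram matrix centers to the zero matrix, so `hsic_trace` returns 0 whenever one of the two data carries no group structure.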

Equation (12) is maximized using, for example, a method based on DC programming, a method that solves it as a constrained eigenvalue problem (Non-Patent Document 3), or a method that solves it as a convex problem (Non-Patent Document 4).
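For very small N, the maximization can be illustrated by exhaustive search over all permutations. This is a stand-in for illustration only: it is not DC programming and not the methods of Non-Patent Documents 3 and 4, and its O(N!) cost makes it impractical beyond a handful of groups. The objective assumed here is $\mathrm{tr}(\bar{K}\pi^{\top}\bar{L}\pi)$, the standard kernelized-sorting form, since the formula image for equation (12) is not available.

```python
from itertools import permutations

def center(gram):
    """Return H G H with H = I - 1 1^T / N (double centering)."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[gram[i][j] - row_mean[i] - col_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def best_permutation(K, L):
    """Return perm such that group n of X is matched to group perm[n] of Y,
    maximizing sum_{n,m} K_bar[n][m] * L_bar[perm[n]][perm[m]]."""
    Kc, Lc = center(K), center(L)
    n = len(K)

    def score(perm):
        return sum(Kc[i][j] * Lc[perm[i]][perm[j]]
                   for i in range(n) for j in range(n))

    return max(permutations(range(n)), key=score)

# Toy check: L is K with its groups relabeled by the mapping (2, 0, 1).
K = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]]
L = [[1.0, 0.3, 0.8], [0.3, 1.0, 0.1], [0.8, 0.1, 1.0]]
match = best_permutation(K, L)
```

In the toy check, `match[n]` gives the index of the group of Y that is associated with group n of X.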

<Operation of the Group Association Apparatus According to the Embodiment of the Present Invention>

Next, the operation of the group association apparatus 100 according to the embodiment of the present invention will be described. When the input unit 10 receives two pieces of data, each of which is a set of objects grouped into N groups, the data are stored in the data storage unit 30. The group association apparatus 100 then executes the group association processing routine shown in FIG. 2.

First, in step S100, the two pieces of data are acquired from the data storage unit 30.

Next, in step S102, for each of the two pieces of data acquired in step S100, the kernel between each pair of groups among the N groups of that data is calculated according to equation (8), (9), or (10). Then, for each of the two pieces of data, a matrix collecting the kernels between all groups is obtained.

Next, in step S106, based on the kernels between groups calculated for each of the two pieces of data in step S102, a permutation matrix that rearranges the N groups of the data is obtained according to equation (12) so that the dependency between the two pieces of data becomes highest. The N groups of the data are then rearranged based on the obtained permutation matrix, and the groups are associated between the two pieces of data. In step S108, the two pieces of data whose groups were associated in step S106 are output, and the processing ends.

<Experimental Results>

Next, the results of experiments conducted with the method according to the present embodiment will be described.

To evaluate the method according to the embodiment of the present invention, as a first experimental example, experiments were conducted on each of four labeled data sets. For the experiments, the features of each data set were randomly split, creating two pieces of data from each data set.
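The random feature split can be sketched as follows. The text does not state the split ratio or the random number generator, so the half-and-half split, the `seed` parameter, and the function name are assumptions made for illustration.

```python
import random

def split_features(objects, seed=0):
    """Randomly partition the feature indices into two halves and return
    two views of the same objects, one per half.
    The 50/50 ratio and the seed are assumptions."""
    dim = len(objects[0])
    idx = list(range(dim))
    random.Random(seed).shuffle(idx)
    half = dim // 2
    first, second = sorted(idx[:half]), sorted(idx[half:])
    view_x = [[obj[i] for i in first] for obj in objects]
    view_y = [[obj[i] for i in second] for obj in objects]
    return view_x, view_y
```

Each object appears in both views with disjoint features, so the two views share the same labeled groups but have no common feature space.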

In the first experimental example, KS-mean and KS-object were used as methods for comparison with the present embodiment. KS-mean represents the features of each group by their mean and associates the groups using convex kernelized sorting (Non-Patent Document 4), which is an object association method. KS-object first associates the objects using convex kernelized sorting, then calculates the probability that groups correspond from the number of associated objects, and associates the groups accordingly.

For the method according to the present embodiment in the first experimental example, equation (5) was used to calculate the kernels between objects and equation (8) was used to calculate the kernels between groups.

Table 1 shows the average accuracy and standard error of the results in the first experimental example. The method according to the embodiment of the present invention achieves the highest accuracy on all data sets, showing that it finds group associations effectively. Because KS-object requires a large amount of computation, it was not applied to the Satimage and Letter data sets, which contain many objects.

As a second experimental example, an experiment was conducted to associate the categories of documents in six languages (English, German, Finnish, French, Japanese) contained in Wikipedia (registered trademark).

In the second experimental example, KS-mean was used as the method for comparison with the present embodiment. For the method according to the present embodiment in the second experimental example, equation (6) was used to calculate the kernels between objects and equation (9) was used to calculate the kernels between groups.

Table 2 shows the average accuracy and standard error of the results in the second experimental example. Except for one language pair, the method according to the embodiment of the present invention achieves higher accuracy than KS-mean, which demonstrates its effectiveness.

As described above, according to the group association apparatus of the present embodiment, for each of two different pieces of data, the kernel between each pair of groups among the N groups is calculated, and the groups of the data are rearranged, based on the calculated kernels, so that the dependency between the two pieces of data becomes high. This makes it possible to accurately associate groups between different data.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

Description of Reference Signs

10 Input unit
20 Calculation unit
30 Data storage unit
32 Kernel calculation unit
34 Rearrangement unit
100 Group association apparatus

Claims

A group association apparatus that receives as input a plurality of different data, each of which is a set of objects grouped into N groups, and associates groups among the plurality of data, the apparatus comprising:
a kernel calculation unit that, for each of the plurality of data, calculates a kernel representing the relationship between two groups for each pair of groups among the N groups of the data; and
a rearrangement unit that, based on the kernels between pairs of groups calculated by the kernel calculation unit for each of the plurality of data, associates groups among the plurality of data by rearranging the N groups of the data so that the dependency between the plurality of data becomes high.

The group association apparatus according to claim 1, wherein the kernel calculation unit calculates the kernel for each pair of groups using the similarity between the distributions of the objects contained in the groups.

A group association method that receives as input a plurality of different data, each of which is a set of objects grouped into N groups, and associates groups among the plurality of data, the method comprising:
a step in which a kernel calculation unit, for each of the plurality of data, calculates a kernel representing the relationship between two groups for each pair of groups among the N groups of the data; and
a step in which a rearrangement unit, based on the kernels between pairs of groups calculated by the kernel calculation unit for each of the plurality of data, associates groups among the plurality of data by rearranging the N groups of the data so that the dependency between the plurality of data becomes high.

The group association method according to claim 3, wherein the step in which the kernel calculation unit calculates the kernel representing the relationship between the two groups calculates the kernel for each pair of groups using the similarity between the distributions of the objects contained in the groups.

A program for causing a computer to function as each unit constituting the group association apparatus according to claim 1 or 2.