JP6058065B2 - Tensor data calculation device, tensor data calculation method, and program

Info

Publication number: JP6058065B2
Application number: JP2015093088A
Authority: JP
Inventors: 達史松林; 宏澤田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-01-23
Filing date: 2015-04-30
Publication date: 2017-01-11
Anticipated expiration: 2035-04-30
Also published as: JP2016139391A

Description

The present invention relates to factorization techniques for extracting factor patterns from multiple kinds of attribute information, and in particular to non-negative tensor factorization.

Log data such as purchase logs and check-in logs can generally be represented as tensors. Because such data take positive real values, data represented as a tensor can be factor-analyzed using Nonnegative Tensor Factorization. Factorization is widely used as a log data analysis method, but it has the problem of requiring very long computation times. A general factorization technique is described, for example, in Non-Patent Document 1.

Consider an example in which users purchase multiple products and such data exist for a number of days. The data are generally represented as a third-order tensor of "user x product x day" and decomposed into R (rank) bases. With I users, J products, and K days, the tensor has size I x J x K, and the computational cost of the factorization is proportional to I x J x K x R. For example, if 1000 users purchase from 1000 kinds of items, the data are accumulated over 1000 days, and the tensor is decomposed into 100 bases, a single iteration requires on the order of 100 billion operations. Assuming the usual 1000 or so iterations, the total is on the order of 100 trillion operations, which takes several days on a recent general-purpose computer.
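The operation counts quoted above follow directly from the sizes in the example. The short sketch below only reproduces that arithmetic; the variable names are illustrative and are not taken from the patent.

```python
# Sizes taken from the example in the text.
I, J, K, R = 1000, 1000, 1000, 100
iterations = 1000  # "about 1000" iterations, as stated in the text

# The cost of one iteration is proportional to I x J x K x R.
ops_per_iteration = I * J * K * R
total_ops = ops_per_iteration * iterations

print(f"{ops_per_iteration:.0e} operations per iteration")
print(f"{total_ops:.0e} operations in total")
```

These are counts of loop iterations, not measured running times; the "several days" figure depends on the hardware and is not derived here.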

The memory needed to hold the tensor data is also on the order of I x J x K, which amounts to several GB even at the scale above. For larger data (more users or more products), the computation therefore becomes physically infeasible.
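As a rough check of the "several GB" figure, the dense storage can be estimated as follows. The element width of 4 bytes is an assumption made for this sketch; the text does not state the data type.

```python
I, J, K = 1000, 1000, 1000

# Assumption: each element is a 4-byte single-precision float.
# With 8-byte doubles the figure would be twice as large.
bytes_per_value = 4

dense_bytes = I * J * K * bytes_per_value
dense_gb = dense_bytes / 10**9
print(f"dense tensor: {dense_gb:.1f} GB")
```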

A calculation example of non-negative tensor factorization is described here. Non-negative tensor factorization decomposes tensor data into a tensor product of factor matrices while preserving non-negativity. For example, as shown in FIG. 1, the third-order tensor data X of "user [I] x product [J] x day [K]" can be decomposed into three factor matrices A, B, and C, and can be expressed as X ≈ A * B * C.

In the text of this specification, the tensor product is written as * for convenience. Likewise, the hat symbol "^" denoting an estimate is written immediately before the character rather than above it, for example "^X".

A, B, and C above are non-negative matrices of size I x R, J x R, and K x R, respectively, and the tensor product * is expressed as a product over each basis, as follows.
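The formula itself is not reproduced in this text. Under the usual reading of "a product over each basis", each element of the reconstruction is ^X_ijk = sum over r of a_ir * b_jr * c_kr. The sketch below implements that reading in plain Python; the function name `reconstruct` and the sample values are illustrative assumptions.

```python
def reconstruct(A, B, C):
    """Return ^X with ^X[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r].

    A is I x R, B is J x R, C is K x R (lists of rows).
    """
    R = len(A[0])
    return [[[sum(a[r] * b[r] * c[r] for r in range(R)) for c in C]
             for b in B]
            for a in A]

# Small non-negative factor matrices (illustrative values).
A = [[1.0, 0.0], [0.5, 2.0]]               # I = 2, R = 2
B = [[1.0, 1.0], [0.0, 3.0], [2.0, 0.0]]   # J = 3
C = [[1.0, 2.0]]                           # K = 1

X_hat = reconstruct(A, B, C)
print(X_hat[1][1][0])  # 0.5*0.0*1.0 + 2.0*3.0*2.0
```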

Tensor factorization is the technique of finding A, B, and C such that the tensor data X and A * B * C are approximately equal. Setting ^X = A * B * C, the distance D(X || ^X) is minimized. D(·) is a distance function; for the generalized KL divergence (gKL), it is expressed as follows.
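The formula is not reproduced in this text. The standard definition of the generalized KL divergence between X and its estimate, given here as a reconstruction rather than as the original typeset equation, is:

```latex
D_{\mathrm{gKL}}(X \,\|\, \hat{X})
  = \sum_{i,j,k} \left( X_{ijk} \log \frac{X_{ijk}}{\hat{X}_{ijk}} - X_{ijk} + \hat{X}_{ijk} \right)
```

Here \hat{X} corresponds to "^X" in the plain-text notation of this specification.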

When gKL is used as the distance function, the update rules for estimating the optimal factorization are as follows.
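The update rules are not reproduced in this text. The commonly used multiplicative update for gKL, which is consistent with the loop counts and the numerator/denominator structure discussed in this specification, has the following form for A (a reconstruction under that assumption, not the original equation):

```latex
a_{ir} \leftarrow a_{ir} \,
  \frac{\displaystyle \sum_{j,k} \frac{X_{ijk}}{\hat{X}_{ijk}} \, b_{jr} \, c_{kr}}
       {\displaystyle \sum_{j,k} b_{jr} \, c_{kr}},
\qquad
\hat{X}_{ijk} = \sum_{r} a_{ir} \, b_{jr} \, c_{kr}
```

The updates for b_{jr} and c_{kr} would be obtained by exchanging the roles of the indices in the same way.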

Applying the update rules repeatedly yields the factorized A, B, and C. FIG. 2 shows the update procedure for factor matrix A as an example. As shown in FIG. 2, updating a_ir requires a loop of I x R iterations over a_ir itself, and computing each value of a_ir requires a loop of J x K iterations. In total, I x J x K x R loop iterations are therefore required. The same number of iterations is required to compute the values of ^X_ijk, b_jr, and c_kr.
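The four-fold loop structure described above can be sketched as follows. This is an illustration of the loop nesting under the assumed standard gKL multiplicative update, not a transcription of FIG. 2; all function and variable names are hypothetical.

```python
def reconstruct(A, B, C):
    """^X[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r]."""
    R = len(A[0])
    return [[[sum(a[r] * b[r] * c[r] for r in range(R)) for c in C]
             for b in B]
            for a in A]

def update_A_dense(X, X_hat, A, B, C):
    """One dense update of A; also returns the number of inner-loop steps."""
    I, J, K, R = len(A), len(B), len(C), len(A[0])
    new_A = [row[:] for row in A]
    steps = 0
    for i in range(I):              # I x R loop over a_ir itself
        for r in range(R):
            num = den = 0.0
            for j in range(J):      # J x K loop to compute one a_ir
                for k in range(K):
                    num += X[i][j][k] / X_hat[i][j][k] * B[j][r] * C[k][r]
                    den += B[j][r] * C[k][r]
                    steps += 1
            new_A[i][r] = A[i][r] * (num / den)
    return new_A, steps

A = [[1.0, 2.0], [0.5, 1.0], [3.0, 1.0]]                    # I = 3, R = 2
B = [[1.0, 1.0], [2.0, 0.5]]                                # J = 2
C = [[1.0, 3.0], [2.0, 1.0], [0.5, 0.5], [1.0, 2.0]]        # K = 4

# Data that the factors already explain exactly, so A is a fixed point.
X = reconstruct(A, B, C)
new_A, steps = update_A_dense(X, reconstruct(A, B, C), A, B, C)
print(steps)  # I * J * K * R inner-loop steps
```

Because every element is visited regardless of its value, the step count does not depend on how many elements are actually populated.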

Liu, Weixiang, Tianfu Wang, and Siping Chen. "Nonnegative tensor factorization for clustering genes with time series microarrays from different conditions: A case study." Biomedical Engineering and Informatics (BMEI), 2010 3rd International Conference on. Vol. 6. IEEE, 2010.

In general, much tensor data has many elements with no value, so the data structure is often sparse. In a purchase log, for example, not every user buys every product, and the corresponding tensor elements are often treated as meaningless "0" values. Two concrete cases are distinguished. A "missing-value sparse tensor" treats such elements as missing values, for example when the data collection period differs between users or when part of the collected data is lost. A "zero-value sparse tensor" treats them as zero values, for example when a user deliberately does not buy a product (such as men not buying women's underwear). In either case, the computation can be made fast by omitting the elements that need no calculation in the factorization update rules and processing only the elements that do.

For example, with L denoting the number of log entries, the update rule for a missing-value sparse tensor is as follows, and the update procedure for factor matrix A is as shown in FIG. 3.

As shown in FIG. 3, updating a_ir still requires a loop of I x R iterations over a_ir itself. However, computing the value of a_ir does not require the J x K loop; it only needs as many steps as there are log entries L_i with indices (j, k) for the given i, so L x R loop iterations suffice in total. The same number of iterations suffices to compute the values of ^X_ijk, b_jr, and c_kr, so with a sparse data structure the number of computation steps is proportional to L x R. The amount of computation can therefore be reduced substantially. The physical memory requirement is also on the order of L, which makes processing on a general-purpose PC possible.
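The reduced loop structure can be sketched as below. The sketch assumes the standard gKL multiplicative update with both the numerator and the denominator restricted to the observed (j, k) pairs of each i; that restriction, the per-row list layout, and all names are assumptions made for illustration.

```python
def update_A_missing(rows, A, B, C):
    """rows[i] holds (j, k, x) for the observed elements of row i only."""
    R = len(A[0])
    new_A = [row[:] for row in A]
    steps = 0
    for i, entries in enumerate(rows):
        # ^X for the observed elements of this i (L_i x R work).
        x_hat = [sum(A[i][q] * B[j][q] * C[k][q] for q in range(R))
                 for j, k, _ in entries]
        for r in range(R):
            num = den = 0.0
            for (j, k, x), xh in zip(entries, x_hat):  # L_i steps, not J x K
                num += x / xh * B[j][r] * C[k][r]
                den += B[j][r] * C[k][r]
                steps += 1
            if den > 0.0:  # rows without observations are left unchanged
                new_A[i][r] = A[i][r] * (num / den)
    return new_A, steps

A = [[1.0, 2.0], [0.5, 1.0], [3.0, 1.0]]   # I = 3, R = 2
B = [[1.0, 1.0], [2.0, 0.5], [1.0, 4.0]]   # J = 3
C = [[1.0, 3.0], [2.0, 1.0], [0.5, 0.5]]   # K = 3

# L = 3 observed log entries in total, zero-based (j, k) indices.
rows = [[(0, 0, 2.0), (1, 2, 1.0)], [], [(2, 1, 3.0)]]
new_A, steps = update_A_missing(rows, A, B, C)
print(steps)  # L * R inner-loop steps
```

With L = 3 entries and R = 2 the inner loop runs 6 times, against 3 x 3 x 3 x 2 = 54 for the dense loop on the same sizes.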

Similarly, the update rule for a zero-value sparse tensor is as follows, and the update procedure for factor matrix A is as shown in FIG. 4.

As shown in FIG. 4, the update procedure for a zero-value sparse tensor processes the denominator in a loop separate from the numerator. That is, in the update of a_ir, the denominator does not depend on i, as shown in "Equation 8", so it can be moved outside the loop of I iterations over i. Because the denominator does depend on r, it is kept in an array of R values, which limits the cost of the denominator to something proportional to (J + K) x R. The denominator and numerator can be separated in the same way when computing b_jr and c_kr. As a result, the cost of the denominators is on the order of (I + J + K) x R, and the cost of the numerators is on the order of L x R. In general, I, J, and K are sufficiently small compared with the number of log entries L, so the cost of the denominators is negligible relative to the total, and the overall cost is comparable to that of the missing-value sparse tensor. The physical memory requirement is also on the order of L, which makes processing on a general-purpose PC possible.
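The (J + K) x R cost follows if the denominator has the form of a double sum of b_jr * c_kr over j and k, which factors into a product of two single sums. That form is an assumption based on the standard gKL update; "Equation 8" itself is not reproduced in this text. A minimal sketch comparing the factored and the naive computation:

```python
def denominators_factored(B, C):
    """One value per r, using J + K additions: (sum_j b_jr) * (sum_k c_kr)."""
    R = len(B[0])
    return [sum(b[r] for b in B) * sum(c[r] for c in C) for r in range(R)]

def denominators_naive(B, C):
    """Same quantity as an explicit J x K double sum, for comparison."""
    R = len(B[0])
    return [sum(b[r] * c[r] for b in B for c in C) for r in range(R)]

B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # J = 3, R = 2
C = [[1.0, 0.5], [2.0, 1.5]]               # K = 2

den = denominators_factored(B, C)
print(den, denominators_naive(B, C))
```

Since the result does not depend on i, it can be computed once per update of A and reused for every row.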

In either case, however, although sparse non-negative tensor factorization can in principle reduce the amount of computation, no effective speed-up method has been established for it, unlike for matrix factorization, because of the complexity of the data structure. In particular, simply storing the data structure as-is causes random memory accesses, which slow down large-scale processing.

The present invention has been made in view of the above points, and its object is to provide a technique that speeds up the processing for factorizing sparse tensors and reduces the amount of memory and the memory access traffic required during that processing.

According to an embodiment of the present invention, there is provided a tensor data calculation device that performs loop calculation over a plurality of indices of N-dimensional tensor data (N being an integer of 3 or more), the device comprising:
data arrangement processing means that orders the elements of the tensor data so that they follow the loop direction of each index, taking the indices in order from the lowest index of the calculation loop to the index one below the highest; that, in accordance with this ordering and for each non-empty element in the tensor data, arranges on a data storage unit the value of the element and the index values, from the lowest index to the index one below the highest, that indicate the position of the element in the tensor data; and that calculates the count of the non-empty elements and arranges it on the data storage unit, this processing being executed for each value of the highest index of the calculation loop, in order, as many times as that index has values; and
calculation processing means that performs loop calculation on the tensor data arranged on the data storage unit by the data arrangement processing means.

According to an embodiment of the present invention, there is also provided a tensor data calculation method executed by a tensor data calculation device that performs loop calculation over a plurality of indices of N-dimensional tensor data (N being an integer of 3 or more), the method comprising:
a data arrangement processing step of ordering the elements of the tensor data so that they follow the loop direction of each index, taking the indices in order from the lowest index of the calculation loop to the index one below the highest; arranging on a data storage unit, in accordance with this ordering and for each non-empty element in the tensor data, the value of the element and the index values, from the lowest index to the index one below the highest, that indicate the position of the element in the tensor data; and calculating the count of the non-empty elements and arranging it on the data storage unit, this processing being executed for each value of the highest index of the calculation loop, in order, as many times as that index has values; and
a calculation processing step of performing loop calculation on the tensor data arranged on the data storage unit by the data arrangement processing step.

According to the embodiment of the present invention, a technique is provided that speeds up the processing for factorizing sparse tensors and reduces the amount of memory and the memory access traffic required during that processing.

FIG. 1 is a diagram showing an example of non-negative tensor factorization.
FIG. 2 is a diagram showing the update procedure for factor matrix A for a dense tensor.
FIG. 3 is a diagram showing the update procedure for factor matrix A for a missing-value sparse tensor.
FIG. 4 is a diagram showing the update procedure for factor matrix A for a zero-value sparse tensor.
FIG. 5 is a configuration diagram of the tensor data calculation device according to an embodiment of the present invention.
FIG. 6 is a diagram showing an image of the data structure of tensor data.
FIG. 7 is a diagram showing the update procedure for factor matrix A of a dense tensor when the tensor data is held in a third-order array.
FIG. 8 is a diagram showing the update procedure for factor matrix A of a missing-value sparse tensor when the tensor data is held in a third-order array.
FIG. 9 is a diagram showing the update procedure for factor matrix A of a zero-value sparse tensor when the tensor data is held in a third-order array.
FIG. 10 is a diagram for explaining how, for factor matrix A, i is fixed and the matrix expansion is performed for each j.
FIG. 11 is a diagram showing the information required for user 1.
FIG. 12 is a diagram showing the data held for updating factor matrix A.
FIG. 13 is a diagram for explaining an example of how the axes are taken for factor matrix B and factor matrix C.
FIG. 14 is a diagram showing the data held for updating factor matrix B.
FIG. 15 is a diagram showing the data for updating each factor matrix.
FIG. 16 is a diagram for explaining the effect of preparing a sparse matrix for each axis.
FIG. 17 is a diagram showing the extension to higher-order tensors.
FIG. 18 is a diagram showing the data held for updating factor matrix Z_k.

Embodiments of the present invention are described below with reference to the drawings. The embodiment described below is only an example, and embodiments to which the present invention applies are not limited to it.

(Device configuration)
FIG. 5 shows the configuration of the tensor data calculation device 10 according to the embodiment of the present invention. The tensor data calculation device 10 receives tensor data as input, performs non-negative tensor factorization, and outputs the resulting factor matrix data. As shown in FIG. 5, the tensor data calculation device 10 includes a data input unit 11, a data arrangement processing unit 12, a data storage unit 13, a tensor factorization processing unit 14, and a data output unit 15.

The data input unit 11 receives the tensor data and passes it to the data arrangement processing unit 12. The data arrangement processing unit 12 converts the tensor data into sparse matrix form and stores (arranges) it in the data storage unit 13 (memory or the like) using the data structure described in detail later. The tensor factorization processing unit 14 performs non-negative tensor factorization on the tensor data arranged in the data storage unit 13 in that data structure and passes the resulting factor matrix data to the data output unit 15, which outputs the data.

The tensor data calculation device 10 according to the present embodiment can be realized by having a computer execute a program that describes the processing explained in this embodiment. That is, the functions of the tensor data calculation device 10 can be realized by executing a program corresponding to the processing performed by the device, using hardware resources built into the computer such as the CPU, memory, and hard disk. The program can be recorded on a computer-readable recording medium (portable memory or the like) for storage or distribution. The program can also be provided over a network, such as the Internet or by e-mail.

The processing in the tensor data calculation device 10 is described in more detail below, with particular attention to the processing executed by the data arrangement processing unit 12. The description first takes as an example the case shown in FIG. 1, in which third-order tensor data is decomposed into three factor matrices A, B, and C. An example of extending this to order N (N being an integer of 3 or more) is described afterwards.

(Tensor data structure)
First, the general tensor data structure handled by the tensor data calculation device 10 is described. In the tensor data calculation device 10, higher-order tensor data is stored and handled as an array in the program. For example, third-order tensor data X_ijk is held in a third-order array X[I][J][K]. An image of the data structure in this case is shown in FIG. 6.

With this data structure, the ordinary dense update for factorization holds the third-order tensor data in the third-order array X[I][J][K] as described above and, as shown in FIG. 7, processes it with a four-fold loop. As already noted, however, dense tensor processing has the problem that the amount of computation becomes large and the required memory area also becomes large.

The present embodiment targets sparse tensor data. In that case, as described above, the computation is made fast by omitting the elements that need no calculation in the factorization update rules and processing only the elements that do.

The update procedure for factor matrix A in this case is shown in FIGS. 8 and 9. The update is executed by the tensor factorization processing unit 14 of the tensor data calculation device 10. As shown in FIG. 8, for missing-value sparse data, updating a_ir involves, for example, a loop of I x R iterations over a_ir itself, while for the combinations of j and k the calculation runs a loop over only the L_i log entries (j, k) belonging to i. FIG. 9 shows the update procedure for factor matrix A for zero-value sparse data. As already described with reference to FIG. 4, for zero-value sparse data the loop calculations are separated according to the index dependencies in the update rule for estimating the optimal factorization. Specifically, the loop calculation for the denominator and the loop calculation for the numerator are executed separately. The most expensive step is the calculation of the numerator, which is performed in the same way as for the missing-value sparse tensor. That is, in the numerator calculation, updating a_ir involves, for example, a loop of I x R iterations over a_ir itself, while for the combinations of j and k the calculation runs a loop over only the L_i log entries (j, k) belonging to i. For the denominator, as shown in FIG. 9, a loop of J iterations and a loop of K iterations are each performed R times. As noted above, the cost of the denominator is negligible relative to the total.
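The separation of the two loops can be sketched as follows for the update of A. The sketch assumes the standard gKL multiplicative update, with the numerator summed over the non-zero entries only and the denominator taken over all (j, k); it illustrates the idea and is not a transcription of FIG. 9. All names and sample values are hypothetical.

```python
def update_A_zero_value(rows, A, B, C):
    """rows[i] holds (j, k, x) for the non-zero elements of row i."""
    R = len(A[0])
    # Denominator loop: independent of i, so computed once per r
    # with a loop over J and a loop over K.
    den = [sum(b[r] for b in B) * sum(c[r] for c in C) for r in range(R)]
    new_A = [row[:] for row in A]
    # Numerator loop: only the L_i non-zero entries of each i are visited.
    for i, entries in enumerate(rows):
        x_hat = [sum(A[i][q] * B[j][q] * C[k][q] for q in range(R))
                 for j, k, _ in entries]
        for r in range(R):
            num = sum(x / xh * B[j][r] * C[k][r]
                      for (j, k, x), xh in zip(entries, x_hat))
            new_A[i][r] = A[i][r] * (num / den[r])
    return new_A

A = [[1.0, 2.0], [0.5, 1.0]]   # I = 2, R = 2
B = [[1.0, 1.0], [2.0, 0.5]]   # J = 2
C = [[1.0, 3.0], [2.0, 1.0]]   # K = 2

# Non-zero entries per i, zero-based (j, k) indices; all other elements are 0.
rows = [[(0, 1, 4.0)], [(1, 0, 2.0), (0, 0, 1.0)]]
new_A = update_A_zero_value(rows, A, B, C)
print(new_A)
```

Zero elements contribute nothing to the numerator, which is why skipping them gives the same result as the dense loop.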

For this, the combination information of j, k, and l for each i must be held. In FIGS. 8 and 9, this information is arranged in the data storage unit 13 as the index data (index_j, index_k, index_l), as indicated by "A" in the figures. X_ijk and ^X_ijk are held in the data storage unit 13 as vectors or with a similar indexing. The same kind of data is arranged in the data storage unit 13 for factor matrices B and C.

In the present embodiment, the data arrangement processing unit 12 arranges this index data in the data storage unit 13 in a data structure that allows the update to be executed efficiently, which makes it possible for the tensor factorization processing unit 14 to perform the matrix updates for the sparse tensor at high speed.

The data arranged in the data storage unit 13 in the present embodiment is described in more detail below. The processing described below is common to the missing-value sparse tensor and the zero-value sparse tensor. For the missing-value sparse tensor, however, it applies to the calculation of both the denominator and the numerator, whereas for the zero-value sparse tensor it applies to the calculation of the numerator.

(Conversion of tensor data into sparse matrix form: explanation using factor matrix A as an example)
As shown in the update procedures of FIGS. 8 and 9, factor matrix A is updated element by element for each i. In the loop calculation of this update, i is the highest index.

In the present embodiment, the tensor data is converted into sparse matrix form by fixing i and expanding the matrix for each j. That is, the data arrangement processing unit 12 of the tensor data calculation device 10 takes tensor data such as that shown in FIG. 10(a) and, as shown in FIG. 10(b), fixes i and expands the data into a matrix for each j. In the present embodiment this is called conversion of the tensor data into sparse matrix form. The essential point is not that the expansion is performed for each j, but that i is fixed.

As shown in FIG. 10(b), the expansion yields, for i = 1, the vector
i = 1, [6, 0, 2, 4, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 2]
In a sparse tensor the 0 elements need no processing, so the data arrangement processing unit 12 arranges only the information of the necessary elements in the data storage unit 13. This information is shown as X_ijk in FIG. 11.

Because the j and k information is also needed, it is held separately. This information is shown as (j, k) in FIG. 11. For example, the log data value 6 of user 1 is shown to correspond to (j, k) = (1, 1). FIG. 11 also shows example values of ^X_ijk, which is information that must be held for the update. These values are updated successively as the factor matrices are updated.
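The extraction of the non-zero values and their (j, k) positions from the i = 1 vector above can be sketched as follows. The sketch assumes a 4 x 4 grid of (j, k), inferred from the 16 entries of the vector, with k varying fastest; both are assumptions, chosen so that the first value 6 falls at (j, k) = (1, 1) as stated in the text.

```python
# Expanded vector for i = 1, copied from the text.
row = [6, 0, 2, 4, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 2]

K = 4  # assumption: J = K = 4, with k as the fastest-varying index

values = []   # X_ijk of the non-zero elements
jk = []       # matching (j, k) pairs, 1-based as in the text
for pos, x in enumerate(row):
    if x != 0:
        j, k = divmod(pos, K)
        values.append(x)
        jk.append((j + 1, k + 1))

count = len(values)  # number of non-zero entries for i = 1
print(count, values, jk)
```

Under these assumptions six of the sixteen positions are kept, matching the count L_1 = 6 given later in the text.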

Here k is the lowest index of the calculation loop, and j is the index one below the highest.

As described above, factor matrix A is the factor for I and its update is performed with respect to i, so for factor matrix A the data arrangement processing unit 12 arranges in the data storage unit 13 the number of log entries for each i, the (j, k) combinations for each i, the values of X_ijk, and the values of ^X_ijk.

In the end, the data shown in FIGS. 12(a) to 12(c) is arranged in the data storage unit 13 for updating factor matrix A. The ^X_ijk in (c) is blank, which means that it is prepared in advance as a memory area for storing calculation results. The sparse matrix data that stores the (j, k) combinations ((b) in FIG. 12) is called the index sparse matrix.

For factor matrix A, the index sparse matrix and the other data described above are held, and the update is performed by the procedures shown in FIGS. 8 and 9. Replacing the sparse tensor with a sparse matrix representation in this way makes it possible to speed up the processing and reduce memory usage.

When performing the data arrangement described above, the data arrangement processing unit 12 orders the elements of the tensor data so that they follow the loop direction of each index, taking the indices in order from the lowest index of the calculation loop (e.g., k) to the index one below the highest (e.g., j). In accordance with this ordering, for each non-empty element in the tensor data, it arranges on the data storage unit 13 the value of the element (e.g., X_ijk) and the index values, from the lowest index to the index one below the highest, that indicate the position of the element in the tensor data (e.g., the value of (j, k)), and it also calculates the count of the non-empty elements (L_i) and arranges it on the data storage unit 13. This processing is executed for each value of the highest index of the calculation loop, in order (e.g., in the order i = 1, 2, 3, 4), as many times as that index has values (e.g., 4 times for i = 1, 2, 3, 4).

The phrase "ordering the elements so that they follow the loop direction of each index, taking the indices in order from the lowest index of the calculation loop (e.g., k) to the index one below the highest (e.g., j)" means, for example, ordering the (j, k) entries in each row i of FIGS. 10(b) and 12(b) as follows: for i = 1, six entries are arranged starting from j = 1, k = 1, since L_1 = 6; for i = 2, four entries are arranged starting from j = 1, k = 2, since L_2 = 4; and so on.

With this data arrangement, when the calculation loop for the update is executed, the indexes and element values need to be referenced only as many times as the element count (L_i), and empty elements need not be referenced at all. The calculation therefore becomes more efficient and memory is used efficiently. In addition, the values of non-empty elements can be accessed sequentially in memory, following the index loops from the lowest to the highest, which eliminates random access and increases memory access speed.
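The arrangement for the factor matrix A described above can be sketched as follows. This is only an illustrative sketch, not the procedure of FIGS. 8 to 12: the function names `arrange_for_mode_i` and `visit`, the dictionary input format, and the 0-based indexes are assumptions introduced here, and "loop direction order" is assumed to mean that j is the outer loop and k the inner loop within each i.

```python
from collections import defaultdict


def arrange_for_mode_i(entries, num_i):
    """Arrange non-empty elements for updating factor matrix A.

    entries: dict {(i, j, k): value} holding only non-empty elements
             (hypothetical input format, 0-based indexes).
    Returns (counts, indices, values):
      counts[i] is L_i, the number of non-empty elements with that i,
      indices   is the flat list of (j, k) pairs, grouped by i,
      values    is the flat list of X_ijk in the same order.
    """
    rows = defaultdict(list)
    for (i, j, k), x in entries.items():
        rows[i].append((j, k, x))

    counts, indices, values = [], [], []
    for i in range(num_i):
        row = sorted(rows[i])  # j is the outer loop, k the inner loop
        counts.append(len(row))
        indices.extend((j, k) for j, k, _ in row)
        values.extend(x for _, _, x in row)
    return counts, indices, values


def visit(counts, indices, values):
    """Walk the arranged data sequentially, touching only L_i entries per i."""
    pos = 0
    seen = []
    for i, count in enumerate(counts):
        for _ in range(count):
            j, k = indices[pos]
            seen.append((i, j, k, values[pos]))
            pos += 1  # strictly sequential access, no empty elements
    return seen
```

In this sketch the read position `pos` only ever advances by one, which corresponds to the sequential access described above, and the total number of loop iterations equals the number of non-empty elements rather than I x J x K.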

(Tensor data as sparse matrices: factor matrices B and C)
As described so far, for the factor matrix A, the rows of the sparse matrix are taken along the I direction. For the factor matrix B, the axes are changed as shown from FIG. 13(a) to FIG. 13(b) so that the J direction becomes the rows of the sparse matrix. Similarly, for the factor matrix C, the axes are changed as shown from FIG. 13(b) to FIG. 13(c) so that the K direction becomes the rows of the sparse matrix.

FIG. 14 shows the data held for the factor matrix B. As shown in FIG. 14, for the factor matrix B, L_j, X_ijk, and the index sparse matrix of (k, i) are held, and a memory area for ^X_ijk is reserved. Likewise, for the factor matrix C, L_k, X_ijk, and the index sparse matrix of (i, j) are held, and a memory area for ^X_ijk is reserved. FIG. 15 summarizes the data for updating the factor matrices A, B, and C.
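The three per-mode data sets, with (j, k) held for A, (k, i) for B, and (i, j) for C, can be sketched as below. The names `arrange_for_mode` and `arrange_all_modes`, the dictionary input, and the 0-based indexes are assumptions made for illustration, as is the choice to sort each stored index pair lexicographically. The reserved memory area for ^X_ijk is omitted.

```python
def arrange_for_mode(entries, mode, size):
    """Arrange non-empty elements with `mode` as the row axis.

    entries: dict {(i, j, k): value} of non-empty elements (0-based).
    mode:    0 for A (rows along I), 1 for B (rows along J), 2 for C.
    size:    number of rows along that axis.
    The two remaining axes are taken cyclically, so the stored pair is
    (j, k) for A, (k, i) for B, and (i, j) for C.
    """
    a, b = (mode + 1) % 3, (mode + 2) % 3
    ordered = sorted(
        entries.items(),
        key=lambda item: (item[0][mode], item[0][a], item[0][b]),
    )
    counts = [0] * size          # L_i, L_j, or L_k
    indices, values = [], []
    for idx, x in ordered:
        counts[idx[mode]] += 1
        indices.append((idx[a], idx[b]))
        values.append(x)
    return counts, indices, values


def arrange_all_modes(entries, shape):
    """Build one arrangement per factor matrix, keyed by 'A', 'B', 'C'."""
    return {
        name: arrange_for_mode(entries, mode, shape[mode])
        for mode, name in enumerate("ABC")
    }
```

Note that in this sketch the element values are stored once per mode, so memory use for the values is tripled in exchange for sequential access in every update.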

(Tensor data as sparse matrices: explanation of effects)
For example, if the update process for the factor matrix B, which is based on the J-axis elements, is performed on the X_ijk arranged for the factor matrix A, memory is accessed randomly, as shown in FIG. 16(a).

In contrast, when a sparse matrix is prepared for each axis corresponding to a factor matrix, as in the technique of the present embodiment, no random access occurs during the update process for that factor matrix, as shown in FIG. 16(b). The communication cost decreases and the calculation can be sped up.

That is, by holding a sparse matrix data structure for updating for each axis (mode), as in the present embodiment, the processing for each mode can be carried out as a simple loop. Beyond the simple loop processing, suppressing random memory access can be expected to yield speedups, for example through an improved cache hit rate, particularly when processing large-scale data. Suppressing random access is also very effective in parallel distributed processing such as on GPUs, where, for example, the referenced memory can be limited in advance to only the amount each thread needs to reference.
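The difference between the two access patterns can be illustrated with a small sketch that lists the storage positions touched while traversing the elements in the order needed for a given update. It does not measure actual speed; the function names `access_positions` and `count_jumps`, the 0-based indexes, and the representation of an arrangement as a tuple of axis numbers are all assumptions introduced here.

```python
def access_positions(keys, storage_order, visit_order):
    """Return the storage positions touched when visiting in visit_order.

    keys:          iterable of (i, j, k) index tuples of non-empty elements.
    storage_order: axes by which the stored array is sorted, e.g.
                   (0, 1, 2) for the arrangement prepared for A.
    visit_order:   axes by which the update loop runs, e.g.
                   (1, 2, 0) for updating B (j outer, then k, then i).
    """
    keys = list(keys)
    stored = sorted(keys, key=lambda idx: tuple(idx[m] for m in storage_order))
    position = {idx: p for p, idx in enumerate(stored)}
    visited = sorted(keys, key=lambda idx: tuple(idx[m] for m in visit_order))
    return [position[idx] for idx in visited]


def count_jumps(positions):
    """Count steps that do not move to the immediately following position."""
    return sum(1 for a, b in zip(positions, positions[1:]) if b != a + 1)
```

For the keys (0,0,0), (0,0,1), (0,1,2), (1,0,1), running the B update over the arrangement prepared for A touches positions 0, 1, 3, 2, whereas running it over an arrangement sorted for B touches 0, 1, 2, 3 with no jumps.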

(Extension to N-th order)
So far, factorization has been described using a third-order tensor as an example. However, the technique according to the present embodiment is not limited to third-order tensors and can be extended to N-th order tensors of order higher than three (N is an integer of 3 or more).

Consider a case where N-th order tensor data X_{i1…iN} with L log entries is decomposed into factor matrices Z_1 … Z_N, as shown in FIG. 17. The subscript of X is written as "i1…iN" for notational convenience in this specification, but "i_1…i_N" is intended.

The N-th order case is the same as the third-order case described above: for each of the factor matrices Z_1 … Z_N, the data arrangement processing unit 12 creates a sparse matrix expanded along the corresponding axis.

FIGS. 18(a) to 18(c) show the data for updating the factor matrix Z_k. As shown in FIGS. 18(a) to 18(c), when the factor matrix Z_k is updated, the number of logs related to k is L_k, and the tensor data X_{i1…iN} is also held in the structure shown in (a). For the index sparse matrix, as shown in (b), the combination of the elements other than k, (i_{k+1}, …, i_N, i_1, …, i_{k-1}), is held. This corresponds to "each index from the lowest index of the calculation loop to the one before the highest" described above. A memory area for ^X_{i1…iN} is also reserved. In this way, the technique can be extended to tensor factorization of any order N. That is, with the technique of the present embodiment, giving the index sparse matrix multiple elements makes extension to higher-order modes straightforward.
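The cyclic index combination (i_{k+1}, …, i_N, i_1, …, i_{k-1}) held for Z_k can be sketched for arbitrary N as follows. The function names `other_modes` and `arrange_for_mode_n`, the dictionary input, and the 0-based mode numbering are illustrative assumptions, and the lexicographic ordering of each stored index tuple within a row is likewise assumed.

```python
def other_modes(k, n_modes):
    """Modes other than k, in cyclic order k+1, ..., N-1, 0, ..., k-1 (0-based)."""
    return [(k + d) % n_modes for d in range(1, n_modes)]


def arrange_for_mode_n(entries, k, shape):
    """Arrange non-empty elements of an N-th order tensor for updating Z_k.

    entries: dict {(i_1, ..., i_N): value} of non-empty elements (0-based).
    k:       mode whose factor matrix is to be updated.
    shape:   sizes of the N axes.
    Returns (counts, indices, values), where counts[r] is the number of
    non-empty elements whose k-th index equals r, and indices holds the
    (N - 1)-element tuples of the remaining indexes in cyclic order.
    """
    modes = other_modes(k, len(shape))
    rows = [[] for _ in range(shape[k])]
    for idx, x in entries.items():
        rows[idx[k]].append((tuple(idx[m] for m in modes), x))

    counts, indices, values = [], [], []
    for row in rows:
        row.sort()  # order the stored index tuples within each row
        counts.append(len(row))
        indices.extend(key for key, _ in row)
        values.extend(x for _, x in row)
    return counts, indices, values
```

The only change from the third-order case in this sketch is that each stored index entry holds N - 1 elements instead of two.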

The present invention is not limited to the embodiments described above, and various modifications and applications are possible within the scope of the claims.

Description of symbols
10 Tensor data calculation device
11 Data input unit
12 Data arrangement processing unit
13 Data storage unit
14 Tensor factorization processing unit
15 Data output unit

Claims

1. A tensor data calculation device that performs loop calculation over a plurality of indexes of N-th order tensor data (N is an integer of 3 or more), comprising:
data arrangement processing means for executing, for each value of the highest index of the calculation loop in order and as many times as that index has values, a process of ordering the elements of the tensor data so that they follow the loop direction of each index, taking the indexes in order from the lowest index of the calculation loop to the index one before the highest, and, following the ordering, for each non-empty element in the tensor data, placing on a data storage unit the value of the element and the index values of the indexes, from the lowest index to the one before the highest, that indicate the position of the element in the tensor data, and calculating the count of the non-empty elements and placing the count on the data storage unit; and
calculation processing means for performing loop calculation on the tensor data placed on the data storage unit by the data arrangement processing means.

2. The tensor data calculation device according to claim 1, wherein the calculation processing means performs factorization of the tensor data by loop calculation on the tensor data.

3. The tensor data calculation device according to claim 2, wherein the calculation processing means performs the loop calculation separately according to index dependencies in an update expression for estimating an optimum value of the factorization.

4. A tensor data calculation method executed by a tensor data calculation device that performs loop calculation over a plurality of indexes of N-th order tensor data (N is an integer of 3 or more), comprising:
a data arrangement processing step of executing, for each value of the highest index of the calculation loop in order and as many times as that index has values, a process of ordering the elements of the tensor data so that they follow the loop direction of each index, taking the indexes in order from the lowest index of the calculation loop to the index one before the highest, and, following the ordering, for each non-empty element in the tensor data, placing on a data storage unit the value of the element and the index values of the indexes, from the lowest index to the one before the highest, that indicate the position of the element in the tensor data, and calculating the count of the non-empty elements and placing the count on the data storage unit; and
a calculation processing step of performing loop calculation on the tensor data placed on the data storage unit by the data arrangement processing step.

5. The tensor data calculation method according to claim 4, wherein, in the calculation processing step, the tensor data calculation device performs factorization of the tensor data by loop calculation on the tensor data.

6. The tensor data calculation method according to claim 5, wherein, in the calculation processing step, the tensor data calculation device performs the loop calculation separately according to index dependencies in an update expression for estimating an optimum value of the factorization.

7. A program for causing a computer to function as each means of the tensor data calculation device according to any one of claims 1 to 3.