JP2018128708A

JP2018128708A - Tensor factor decomposition processing apparatus, tensor factor decomposition processing method and tensor factor decomposition processing program

Info

Publication number: JP2018128708A
Application number: JP2017019202A
Authority: JP
Inventors: Ryota Imai; Tadashi Mori; Masaru Miyamoto
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-02-06
Filing date: 2017-02-06
Publication date: 2018-08-16
Anticipated expiration: 2037-02-06
Also published as: JP6535355B2

Abstract

PROBLEM TO BE SOLVED: To speed up non-negative tensor factorization and improve its calculation accuracy.
SOLUTION: In a tensor factorization processing apparatus 1 that performs factorization of a tensor, a tensor construction unit 11 replaces each missing value of a tensor, constructed with multiple pieces of attribute information of a specific event as elements, with a negative value. A matrix update unit 21 updates the elements of the factor matrices of the tensor processed by the tensor construction unit 11, using an update formula based on the distance between that tensor and the tensor calculated from the factor matrices. If an element of the tensor is negative, the element is judged to be a missing value of the tensor, and a correction value for that element is added to the update.
SELECTED DRAWING: Figure 1

Description

The present invention relates to data mining technology, in particular to factorization techniques for extracting factor patterns from multiple pieces of attribute information, and specifically to non-negative composite tensor factorization.

Techniques known as non-negative tensor factorization and non-negative composite tensor factorization extract factor patterns from multiple pieces of attribute information (Non-Patent Document 1). Speed-up techniques have been proposed for them that allow sparse tensors to be processed quickly (Patent Document 1).

Furthermore, as an application of non-negative tensor factorization, there is a technique called non-negative tensor completion, which fills in missing values when part of the attribute information is missing (Non-Patent Document 2). Although its purpose differs, non-negative tensor completion can be regarded as non-negative tensor factorization with added processing for missing values.

Non-Patent Document 1: Koh Takeuchi, Ryota Tomioka, Katsuhiko Ishiguro, Akisato Kimura, and Hiroshi Sawada, "Non-negative Multiple Tensor Factorization", ICDM, 2013, pp. 1199-1204
Non-Patent Document 2: Koh Takeuchi, Futoshi Naya, and Naonori Ueda, "Non-negative tensor completion using generalized KL divergence and its application to traffic flow analysis", Proceedings of DEIM Forum 2016, H7-2

Patent Document 1: Japanese Patent Laid-Open No. 2016-139391

The conventional speed-up technique (Patent Document 1) is applicable when the non-negative tensor is either a "zero-value sparse tensor" or a "missing-value sparse tensor". A "zero-value sparse tensor" is a non-negative tensor with no missing values. A "missing-value sparse tensor" is a non-negative tensor with missing values in which each missing value is represented as an element whose value is "0" (a zero element). With this technique, an observed value of "0" and a missing value cannot be represented together in a single tensor. Consequently, the following problems arise when trying to speed up the non-negative (composite) tensor factorization used inside non-negative (composite) tensor completion.

To apply the conventional speed-up technique to a tensor with missing values, the missing values must first be replaced with zero elements, and all zero elements must then be treated as missing values. If the original tensor contains zero elements that are not missing values, the calculation result becomes inaccurate.

On the other hand, performing an accurate calculation without the conventional speed-up technique takes longer than factorizing a tensor of comparable size that has no missing values.

In view of the above circumstances, an object of the present invention is to speed up non-negative tensor factorization and improve its calculation accuracy.

Accordingly, one aspect of the present invention is a tensor factorization processing apparatus that performs tensor factorization processing, comprising: tensor construction means for replacing missing values of a tensor, constructed with multiple pieces of attribute information of a specific event as elements, with negative values; and matrix update means for updating the elements of the factor matrices of the tensor processed by the tensor construction means, using an update formula based on the distance between that tensor and the tensor calculated from the factor matrices.

Another aspect of the present invention is a tensor factorization processing method executed by a tensor factorization processing apparatus that performs tensor factorization processing, comprising: a tensor construction step of replacing missing values of a tensor, constructed with multiple pieces of attribute information of a specific event as elements, with negative values; and a matrix update step of updating the elements of the factor matrices of the tensor processed in the tensor construction step, using an update formula based on the distance between that tensor and the tensor calculated from the factor matrices.

In one aspect of the matrix update means and the matrix update step, when an element of the tensor has a negative value, the element is judged to be a missing value of the tensor and a correction value for the element is added to the update.

In one aspect of the tensor construction means and the tensor construction step, the constructed tensor is converted, at the time of the replacement, into tensors of lower order than that tensor.

The present invention may also take the form of a program that causes a computer to function as each of the means constituting the above apparatus, or of a tensor factorization processing program that causes a computer to execute each step of the above method.

According to the present invention described above, non-negative tensor factorization is sped up and its calculation accuracy is improved.

FIG. 1: (a) is a block diagram of the tensor factorization processing apparatus according to an embodiment of the present invention; (b) is a block diagram of the tensor decomposition unit of the apparatus.
FIG. 2: (a) is a diagram showing an example of a tensor based on log data; (b) is a table illustrating the missing value information of that tensor.
FIG. 3: Flowchart of the tensor construction process.
FIG. 4: Explanatory diagram illustrating the compressed formats of a tensor.
FIG. 5: Flowchart of the factor matrix initialization process.
FIG. 6: Flowchart of the factor matrix update process.

Embodiments of the present invention are described below with reference to the drawings; however, the present invention is not limited to these embodiments.

[Overview]
The tensor factorization processing apparatus 1 illustrated in FIG. 1 extends the speed-up technique for non-negative tensor factorization by introducing a representation of missing values that can be distinguished from zero elements. Specifically, because the elements of the original tensor are all 0 or greater, missing values can be represented as negative-valued elements, which allows observed values and missing values to be processed together in a single data scan. For the denominator of the update formula, an approximate value that is inaccurate but fast to compute is calculated first, and a correction value is then calculated to ensure accuracy. The data scan for calculating the correction value is performed together with the calculation of the numerator of the update formula, so the correction value is also computed quickly.

[Explanation of technical terms]
Technical terms related to the present embodiment are explained first.

Attribute information represents a specific event as a combination of one or more attributes and a value corresponding to that combination. For example, the fact that people visited stores can be represented by three attributes (user ID, store ID, and day of the week) together with the number of visits or the length of stay corresponding to those attributes. Attribute information can be expressed as a tensor by regarding each attribute as a mode.

A tensor, in the present embodiment, is synonymous with a multidimensional array. For example, a third-order tensor can be expressed as a three-dimensional array. A non-negative tensor is a tensor whose elements are all 0 or greater.

A mode is an axis of a tensor. For example, a matrix can be regarded as a second-order tensor, and it has two modes: the row direction and the column direction.

A factor matrix is a matrix obtained by factorizing a non-negative tensor; there are as many factor matrices as there are modes.

A missing value is an element of a tensor whose value is unknown. A tensor may contain missing values when, for example, the states of multiple sensors are collected over a certain period, a particular sensor was out of order during part of that period, and the corresponding data values are therefore unknown. A missing value is distinguished from an observed value of "0". An observed value of "0" arises when, for example, a particular sensor was operating normally during a given period but observed nothing, and therefore output "0".

[Device configuration example]
The tensor factorization processing apparatus 1 includes an input data storage unit 10, a tensor construction unit 11, a tensor decomposition unit 12, a missing value estimation unit 13, and an output data storage unit 14.

The input data storage unit 10 stores the non-negative tensors that are to be factorized and whose missing values are to be estimated (hereinafter simply "tensors"), missing value information indicating the positions of the missing values in each tensor, and the parameters used in the factorization. This information is assumed to be stored in the input data storage unit 10 in advance.

FIG. 2 shows an example of a tensor stored in the input data storage unit 10 and its missing value information. The tensor in this example is a third-order tensor whose three modes correspond to the three attributes "user ID", "day of the week", and "store ID". Each element of the tensor holds the value corresponding to one combination of attributes. For example, the information that "user 2" visited "store 3" "4 times" on "Monday" can be expressed as one element of the tensor. There may be multiple tensors, as in the technique of Non-Patent Document 1. The missing value information indicates which elements of the corresponding tensor are missing.
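The example above can be sketched in a few lines of Python. This is only an illustration: the log records, the 0-based index convention, and the names `visits`, `missing`, and `build_tensor` are assumptions made here, not part of the publication.

```python
from collections import Counter

# Hypothetical visit log, one (user, day, store) record per visit, 0-based indices.
# "User 2 visited store 3 four times on Monday" becomes four records at (1, 0, 2).
visits = [(1, 0, 2), (1, 0, 2), (1, 0, 2), (1, 0, 2), (0, 3, 1)]

# Hypothetical missing value information: positions whose value is unknown.
missing = {(2, 5, 0)}


def build_tensor(visits, missing):
    """Return {(user, day, store): count} for observed non-zero cells.

    Cells that are absent from the result and not listed in `missing`
    are observed zeros; cells listed in `missing` are unknown.
    """
    counts = Counter(visits)
    return {idx: n for idx, n in counts.items() if idx not in missing}


tensor = build_tensor(visits, missing)
```

Keeping the missing positions in a separate set mirrors the way the input data storage unit holds the tensor and its missing value information side by side.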

The tensor construction unit (tensor construction means) 11 constructs, in advance, a tensor whose modes are the multiple pieces of attribute information of a specific event. This tensor is stored in the input data storage unit 10. The tensor construction unit 11 then reads the tensor from the input data storage unit 10 and converts it into a format that can be scanned efficiently during factorization. Specifically, it converts the tensor into partial tensors, which are tensors of lower order than the original, and loads them into main memory as lists of their non-zero elements. When an element of the tensor is a missing value, its value is replaced with an arbitrary negative value and the element is added to the corresponding partial tensor as a non-zero element. The detailed processing is described later.

The tensor decomposition unit 12 factorizes the tensor obtained by the tensor construction unit 11. The detailed processing is described later.

The missing value estimation unit 13 estimates the missing values of the tensor based on the factor matrices obtained by the tensor decomposition unit 12.

The output data storage unit 14 stores the estimates of the missing values obtained by the missing value estimation unit 13.

As illustrated in FIG. 1(b), the tensor decomposition unit 12 includes an initialization unit 20, a matrix update unit 21, and a calculation end evaluation unit 22.

The initialization unit 20 performs the initialization required to factorize the tensor obtained by the tensor construction unit 11. Specifically, it initializes the elements of the factor matrices of the tensor with random numbers. The detailed processing is described later.

The matrix update unit (matrix update means) 21 updates the elements of the initialized factor matrices using an update formula based on the distance between the tensor obtained by the tensor construction unit 11 and the tensor calculated from the factor matrices. The detailed processing is described later.

The calculation end evaluation unit 22 decides, based on the factor matrices updated by the matrix update unit 21, whether to continue the update. Specifically, for each tensor it calculates an estimate of the tensor from the corresponding factor matrices and then calculates the distance between the original tensor and the estimated tensor. Missing values cannot be compared and are therefore excluded from this calculation. The generalized KL divergence can be used as the distance between tensors. The update is terminated when this distance satisfies a preset termination condition or when the number of update iterations reaches a preset upper limit. Otherwise, the update is continued.
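As a rough illustration of the distance used by the calculation end evaluation unit, the sketch below computes the generalized KL divergence over observed cells only. It assumes dense nested lists for a third-order tensor, positive estimates, and the negative-value encoding of missing cells; the function name `generalized_kl` is hypothetical.

```python
import math


def generalized_kl(x, x_hat):
    """Generalized KL divergence between two dense 3rd-order tensors.

    x and x_hat are nested lists of the same shape. Cells of x with a
    negative value are treated as missing and skipped. Estimates in
    x_hat are assumed to be strictly positive.
    """
    total = 0.0
    for x_i, h_i in zip(x, x_hat):
        for x_ij, h_ij in zip(x_i, h_i):
            for value, estimate in zip(x_ij, h_ij):
                if value < 0:          # missing: cannot be compared
                    continue
                if value > 0:          # x * log(x / x_hat), taken as 0 when x == 0
                    total += value * math.log(value / estimate)
                total += estimate - value
    return total


x = [[[1.0, -1.0], [0.0, 2.0]]]
x_hat = [[[1.0, 5.0], [0.5, 2.0]]]
distance = generalized_kl(x, x_hat)
```

In this example only the observed zero at position (0, 1, 0) contributes to the distance, because the other observed cells are matched exactly and the missing cell is ignored whatever its estimate is.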

The functional units 10 to 14 and 20 to 22 of the tensor factorization processing apparatus 1 are realized by the hardware resources of a computer. That is, the apparatus 1 includes at least computer hardware resources such as an arithmetic unit (CPU), storage devices (memory, hard disk drive, etc.), and a communication interface. The functional units 10 to 14 and 20 to 22 are implemented by these hardware resources working together with software resources (OS, applications, etc.). The functional units 10 to 14 and 20 to 22 may also be implemented on separate computers.

[Process of tensor factorization of this embodiment]
The tensor factorization of the present embodiment consists of the following processes: "tensor construction (S100 to S104)", "factor matrix initialization (S200 to S202)", and "factor matrix update (S500 to S506)".

(Tensor construction)
The tensor construction step (S100 to S104) of this aspect is described with reference to FIG. 3. This process is executed by the tensor construction unit 11.

S100: A tensor is read from the input data storage unit 10. The following description assumes a single tensor, but depending on the problem setting of the tensor factorization handled by the apparatus, there may be multiple tensors. In that case, the processing from S100 onward is repeated once per tensor.

S101: The processing from S102 onward is performed for each mode of the tensor. When all modes have been processed, the process ends.

S102: An array with as many elements as the dimension of the target mode is prepared. For example, for a 10 × 8 × 6 third-order tensor, an array with 10 elements is prepared for the first mode, an array with 8 elements for the second mode, and an array with 6 elements for the third mode.

S103: The processing of S104 is performed for each index along the target mode. When all indices have been processed, the process returns to S101. For example, when the first mode of a 10 × 8 × 6 third-order tensor is the target mode, S104 is performed 10 times.

S104: The partial tensor corresponding to the target index is expressed in a compressed format that stores only non-zero elements, and a reference to it is set in the corresponding element of the array. The format known as coordinate list (COO) can be used as this compressed format. If the tensor is a third-order tensor, each partial tensor is a second-order tensor, that is, a matrix, so the format known as compressed row storage (CRS) can also be used. In this aspect, however, when an element of the tensor is a missing value, its value is replaced with an arbitrary negative value and the element is added to the compressed format as a non-zero element.

FIG. 4 illustrates the flow of the processing in S104. (a) is the third-order tensor to be processed. (b) shows (a) as a sequence of partial tensors, that is, matrices. (c) is the sequence of compressed representations of the partial tensors. (d) is an example of the COO and CRS formats when a partial tensor has no missing values. (e) is an example of the COO and CRS formats when one element of the partial tensor in (d) is a missing value, showing that the element is treated as a non-zero element with a negative value.
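Steps S102 to S104 can be sketched as follows for one mode, using a COO-style list per partial tensor. This is a minimal sketch under stated assumptions: the sentinel value `MISSING = -1.0`, the function name `build_mode_slices`, and the dictionary/set input layout are choices made here for illustration; the publication only requires that the sentinel be negative.

```python
MISSING = -1.0  # arbitrary negative sentinel chosen for this sketch


def build_mode_slices(shape, observed, missing, mode):
    """Return one COO list per index of `mode`.

    observed: {index tuple: value >= 0}; missing: set of index tuples.
    Each COO entry is (remaining indices, value). Observed zeros are not
    stored; missing cells are stored with the negative sentinel.
    """
    slices = [[] for _ in range(shape[mode])]       # S102: one slot per index
    for idx, value in observed.items():             # S103/S104 for observed cells
        if value != 0:
            rest = idx[:mode] + idx[mode + 1:]
            slices[idx[mode]].append((rest, float(value)))
    for idx in missing:                             # missing -> negative "non-zero"
        rest = idx[:mode] + idx[mode + 1:]
        slices[idx[mode]].append((rest, MISSING))
    return slices


shape = (2, 3, 2)
observed = {(0, 0, 0): 4, (0, 1, 1): 0, (1, 2, 0): 2}
missing = {(0, 2, 1)}
slices = build_mode_slices(shape, observed, missing, mode=0)
```

Here `slices[0]` holds the observed value 4.0 at `(0, 0)` and the sentinel at `(2, 1)`, while the observed zero at `(0, 1, 1)` stays implicit, which is exactly the distinction the zero-filled representation cannot make.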

(Factor matrix initialization)
The factor matrix initialization process (S200 to S202) is described with reference to FIG. 5. This process is executed by the initialization unit 20 of the tensor decomposition unit 12.

S200: The rank is read from the input data storage unit 10 as a parameter used in the factorization of the present embodiment.

S201: The processing of S202 is executed for each factor matrix corresponding to the tensor obtained by the tensor construction unit 11 (S100 to S104). When all factor matrices have been processed, the process ends.

S202: A random number greater than 0 is assigned to every element of the target factor matrix. The number of rows of each factor matrix is the size of the corresponding mode, and the number of columns is the rank read in S200.
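A minimal sketch of S200 to S202 is shown below. The function name `init_factor_matrices`, the `seed` parameter, and the choice of the interval (0, 1] are assumptions; the text only requires random numbers greater than 0.

```python
import random


def init_factor_matrices(shape, rank, seed=0):
    """Return one (mode size x rank) matrix per mode, filled with positive random numbers."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1.0 - rng.random() lies in (0, 1]
    # and can never be exactly zero.
    return [
        [[1.0 - rng.random() for _ in range(rank)] for _ in range(dim)]
        for dim in shape
    ]


factors = init_factor_matrices((10, 8, 6), rank=3)
```

Strictly positive starting values matter for multiplicative updates, because an element that starts at zero would remain zero after every update.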

(Factor matrix update)
The matrix update step (S500 to S506) of this aspect is described with reference to FIG. 6. This process is executed by the matrix update unit 21 of the tensor decomposition unit 12.

S500: The processing from S501 onward is performed for each element of the factor matrices initialized by the initialization unit 20. When all elements have been processed, the process ends.

S501: An approximate value of the denominator of the update formula for the factor matrix element is calculated. When the generalized KL divergence is used as the distance between tensors, the update formula is given by formulas (1) to (3).

For simplicity, a single third-order tensor is assumed, but there may be any number of tensors (one or more) of any order (two or higher). Details of this update formula are given in Non-Patent Document 2.

If the update formula is expanded under the assumption that the tensor has no missing values, that is, that all elements are observed values, formula (4) is obtained. Details of this update formula, including the case of multiple tensors, are given in Non-Patent Document 1.

If the tensor is assumed to have no missing values, the denominator in formula (1) can be calculated quickly by rearranging it as in formula (5). When there are missing values, however, the value obtained this way becomes less accurate as the number of missing values increases, so in this aspect it is called the approximate value.
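The formulas referred to as (1) to (6) are not reproduced in this text. As a hedged reconstruction, the block below gives the standard multiplicative update for the generalized KL divergence with missing values, which appears consistent with the description of the approximate value and its correction. The symbols are assumptions introduced here: a third-order tensor with elements x_{ijk} of size I × J × K, factor matrices with elements a_{ir}, b_{jr}, c_{kr}, rank R, and Ω_i as the set of observed positions (j, k) for a fixed i. The numbering of the original formulas may not correspond line by line.

```latex
% Estimate of the tensor from the factor matrices
\hat{x}_{ijk} = \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}

% Update of one factor matrix element, summing over observed positions only
a_{ir} \leftarrow a_{ir}\,
  \frac{\displaystyle\sum_{(j,k)\in\Omega_i} \frac{x_{ijk}}{\hat{x}_{ijk}}\, b_{jr}\, c_{kr}}
       {\displaystyle\sum_{(j,k)\in\Omega_i} b_{jr}\, c_{kr}}

% Denominator = fast approximate value minus correction over missing positions
\sum_{(j,k)\in\Omega_i} b_{jr}\, c_{kr}
  = \Bigl(\sum_{j=1}^{J} b_{jr}\Bigr)\Bigl(\sum_{k=1}^{K} c_{kr}\Bigr)
  - \sum_{(j,k)\notin\Omega_i} b_{jr}\, c_{kr}
```

Under this form, observed zeros contribute nothing to the numerator, so only the stored non-zero elements need to be scanned, and the product of the two column sums can be computed without visiting individual tensor elements.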

S502: The corresponding elements of the tensor are scanned, and the processing from S503 onward is performed for each of them. When all elements have been processed, the process proceeds to S506.

As can be seen from the structure of formula (1), the corresponding elements are the set of non-zero elements obtained by fixing the index of the mode that corresponds to the factor matrix being updated and letting the indices of all remaining modes vary freely. For example, when the first mode of a 10 × 8 × 6 third-order tensor is being processed, this set contains at most 8 × 6 = 48 non-zero elements.

S503: If the element is a missing value, the process proceeds to S504; otherwise, it proceeds to S505. Because the tensor construction unit 11 represents missing values as negative non-zero elements, whether an element is a missing value can be determined from the sign of its value.

S504: The denominator correction value for the element is calculated and added to the accumulated correction value for the denominator of the update formula. The correction value for the element is calculated by formula (6).

S505: The numerator term for the element is calculated and added to the numerator of the update formula. The numerator term for the element is given by formula (2).

S506: The value of the update formula is calculated and the factor matrix element is updated. The denominator of the update formula is obtained by subtracting the correction value accumulated in S504 from the approximate value calculated in S501. The numerator is the value accumulated in S505. Substituting these into formula (1) gives the value of the update formula.
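Steps S501 to S506 can be sketched for the first mode of a third-order tensor as follows. This is an illustrative sketch, not the publication's implementation: it assumes the standard generalized KL multiplicative update, a COO-style input in which negative values mark missing cells, and that each row's estimates use the values from before that row's update. The name `update_mode0` is hypothetical, and the sketch does not guard against a row whose cells are all missing (the denominator would then be zero).

```python
def update_mode0(slices, A, B, C):
    """Update factor matrix A in place for mode 0.

    slices[i] is a list of ((j, k), value) entries; value < 0 marks a
    missing cell, and observed zeros are not stored.
    """
    rank = len(A[0])
    # S501: column sums give the approximate denominator, exact only
    # when no cell is missing.
    col_b = [sum(row[r] for row in B) for r in range(rank)]
    col_c = [sum(row[r] for row in C) for r in range(rank)]
    for i, entries in enumerate(slices):
        old = list(A[i])                      # values before this row's update
        for r in range(rank):
            approx = col_b[r] * col_c[r]
            correction = 0.0
            numerator = 0.0
            for (j, k), value in entries:     # S502: single scan of stored entries
                weight = B[j][r] * C[k][r]
                if value < 0:                 # S503 -> S504: missing cell
                    correction += weight
                else:                         # S503 -> S505: observed non-zero cell
                    x_hat = sum(old[q] * B[j][q] * C[k][q] for q in range(rank))
                    numerator += value / x_hat * weight
            A[i][r] = old[r] * numerator / (approx - correction)   # S506
    return A


A = [[1.0, 1.0]]
B = [[1.0, 2.0], [3.0, 1.0]]
C = [[2.0, 1.0], [1.0, 4.0]]
slices = [[((0, 0), 3.0), ((1, 1), -1.0)]]    # one observed value, one missing cell
update_mode0(slices, A, B, C)
```

In this 1 × 2 × 2 example the approximate denominators are 12 and 15, the corrections for the missing cell are 3 and 4, and the corrected denominators 9 and 11 equal the sums over the three observed cells, so the single scan reproduces the result of summing over observed positions directly.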

The missing value estimation unit 13 estimates the missing values of the tensor based on the factor matrices obtained through the update processing by the tensor decomposition unit 12. The estimates are then stored in the output data storage unit 14.
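The text does not spell out how the estimate is formed. One natural reading, assumed here, is that each missing element is estimated by the corresponding element of the tensor reconstructed from the factor matrices. The function name `estimate_missing` and the third-order setting are illustrative.

```python
def estimate_missing(missing, A, B, C):
    """Return {(i, j, k): estimate} for each missing position of a 3rd-order tensor."""
    rank = len(A[0])
    return {
        (i, j, k): sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
        for (i, j, k) in missing
    }


A = [[1.0, 2.0]]
B = [[1.0, 0.5], [2.0, 1.0]]
C = [[3.0, 1.0]]
estimates = estimate_missing({(0, 1, 0)}, A, B, C)
```

Only the listed positions are evaluated, so the cost grows with the number of missing values and the rank rather than with the full size of the tensor.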

[Effect of this embodiment]
With the tensor factorization processing apparatus 1 described above, the tensor factorization used for non-negative (composite) tensor completion is sped up in a way that handles missing values of the tensor. As a result, while the accuracy of the calculation result is maintained, the increase in calculation time relative to speed-up-enabled non-negative (composite) factorization of a non-negative tensor without missing values is kept small.

In particular, an element of the tensor that has a negative value is judged to be a missing value of the tensor, and the correction value for that element is added to the update, which ensures the accuracy of the factorization result.

As described above, even where factorization of a tensor with missing values would otherwise be unavoidably slow, extending the conventional speed-up technique, which does not support missing values, allows non-negative tensor completion to be accelerated in the same way as factorization of a tensor without missing values.

[Other Embodiments of the Present Invention]
The present invention can be realized as a program that causes a computer to function as some or all of the means (functional units 10 to 14 and 20 to 22) constituting the tensor factorization processing apparatus 1, by having the computer execute that program. Alternatively, it can be realized as a program that causes a computer to execute some or all of steps S100 to S104, S200 to S202, and S500 to S506 of the tensor factorization processing method executed by the apparatus 1. These programs (tensor factorization processing programs) can be provided stored on a well-known computer-readable recording medium (for example, a hard disk, flexible disk, or CD-ROM). The programs can also be provided over a network, for example via the Internet or e-mail.

The invention described above is not limited to the above embodiments, and various modifications and applications are possible within the scope of the claims.

DESCRIPTION OF SYMBOLS
1 ... Tensor factorization processing apparatus
11 ... Tensor construction unit (tensor construction means)
12 ... Tensor decomposition unit
13 ... Missing value estimation unit
20 ... Initialization unit
21 ... Matrix update unit (matrix update means)
22 ... Calculation end evaluation unit

Claims

1. A tensor factorization processing apparatus that performs tensor factorization processing, comprising:
tensor construction means for replacing a missing value of a tensor, constructed with a plurality of pieces of attribute information of a specific event as elements, with a negative value; and
matrix update means for updating an element of a factor matrix of the tensor processed by the tensor construction means by an update formula based on a distance between the tensor and a tensor calculated from the factor matrix.

2. The tensor factorization processing apparatus according to claim 1, wherein, when an element of the tensor has a negative value, the matrix update means determines that the element is a missing value of the tensor and adds a correction value for the element to the update.

3. The tensor factorization processing apparatus according to claim 1, wherein the tensor construction means converts the constructed tensor into a tensor of lower order than that tensor at the time of the replacement.

4. A tensor factorization processing method executed by a tensor factorization processing apparatus that performs tensor factorization processing, comprising:
a tensor construction step of replacing a missing value of a tensor, constructed with a plurality of pieces of attribute information of a specific event as elements, with a negative value; and
a matrix update step of updating an element of a factor matrix of the tensor processed in the tensor construction step by an update formula based on a distance between the tensor and a tensor calculated from the factor matrix.

5. The tensor factorization processing method according to claim 4, wherein, in the matrix update step, when an element of the tensor has a negative value, the element is determined to be a missing value of the tensor and a correction value for the element is added to the update.

6. The tensor factorization processing method according to claim 4, wherein, in the tensor construction step, the constructed tensor is converted into a tensor of lower order than that tensor at the time of the replacement.

7. A tensor factorization processing program for causing a computer to function as each means constituting the tensor factorization processing apparatus according to any one of claims 1 to 3.