CN108170639B - Tensor CP decomposition implementation method based on distributed environment - Google Patents


Info

Publication number
CN108170639B
CN108170639B · CN201711426277.0A · CN201711426277A
Authority
CN
China
Prior art keywords: matrix, tensor, host, key, code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711426277.0A
Other languages
Chinese (zh)
Other versions
CN108170639A (en)
Inventor
周维
麦超
蔡莉
何靖
姚绍文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU filed Critical Yunnan University YNU
Priority to CN201711426277.0A priority Critical patent/CN108170639B/en
Publication of CN108170639A publication Critical patent/CN108170639A/en
Application granted granted Critical
Publication of CN108170639B publication Critical patent/CN108170639B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/15: Correlation function computation including computation of convolution operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a tensor CP decomposition implementation method based on a distributed environment. Based on the ALS algorithm, for the update of the factor matrix $A^{(n)}$ in each iteration, $Y = X_{(n)}(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})$ is first calculated by splitting the Khatri-Rao product; then $V = (A^{(N)T}A^{(N)} \ast \cdots \ast A^{(n+1)T}A^{(n+1)} \ast A^{(n-1)T}A^{(n-1)} \ast \cdots \ast A^{(1)T}A^{(1)})^{\dagger}$ is calculated by computing outer products in parallel; finally, the matrices Y and V are blocked, the corresponding sub-blocks of Y and V are distributed to the hosts of the Spark cluster using a Map operation, matrix multiplication is performed using a Reduce operation, and the multiplication results are sent to one host using a Map operation and merged using a Reduce operation to obtain $A^{(n)} = YV$. The method realizes tensor CP decomposition based on the MapReduce and Spark technologies and can effectively improve the efficiency of tensor CP decomposition.

Description

Tensor CP decomposition implementation method based on distributed environment
Technical Field
The invention belongs to the technical field of tensor decomposition, and particularly relates to a tensor CP decomposition implementation method based on a distributed environment.
Background
In recent years, data scale has been growing rapidly in fields such as social networks, computational advertising, and e-commerce. These fields need to describe complex relationships (for example, the friend relationships and the rich per-person features in a social network), and such data are naturally modeled in a high-dimensional space. The appearance of these high-order data makes the conventional approach of describing data two-dimensionally with a matrix increasingly inapplicable, so a tool capable of describing the high-order relationships in high-dimensional data is urgently needed.
The tensor, as the generalization of the matrix to a high-dimensional space, is a better tool for describing the high-order relationships among multiple variables. As early as the 1940s, tensors were proposed in psychometrics, and they were later widely used in theoretical fields such as physics, numerical analysis, signal processing, and theoretical computer science. Because a tensor is a high-dimensional array, tensor-based algorithms often have exponential time complexity and require many iterations, so early computers could not complete the calculations at all.
With the development of hardware and software technologies, large servers have gradually ceased to be the first choice in industry owing to factors such as cost and maintenance, and clusters built from ordinary PCs have gradually become the mainstream data processing platform. Following the developments in the theoretical domain, tensors have again received much attention in the engineering domain because of their ability to describe and analyze high-order data. The appearance of programming models such as MapReduce turned algorithms that used to run on a single machine into algorithms that run scattered across multiple machines, using the parallel computing capability of many machines to improve computational efficiency. The rise of big data technologies such as distributed storage and computation makes it possible to process large-scale data. At present, the commonly used distributed computing frameworks are Hadoop and Spark. Hadoop, based on the MapReduce programming model, is the most widely used distributed computing framework, but each MapReduce task in Hadoop must read from and write to disk before and after execution, and the large amount of disk I/O makes Hadoop unsuitable for scenarios with many iterations. The Resilient Distributed Dataset (RDD) in Spark is stored in memory, so the overhead of accessing the disk is avoided in each iteration, which greatly improves iteration efficiency.
The calculation of tensors is easy to parallelize, and problems that could not be processed in the early days can now be completed in a distributed manner. CP decomposition (CANDECOMP/PARAFAC decomposition), a key topic in tensor research, is also used more and more widely: it can extract the topics implicit in data, remove noise, and reduce data dimensionality. Conventional CP decomposition algorithms are single-machine; although programs can be made to process larger-scale data by upgrading the machine's configuration, such upgrading is, after all, limited.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a tensor CP decomposition implementation method based on a distributed environment, and the efficiency of tensor CP decomposition is improved based on MapReduce and Spark technologies.
In order to achieve the above purpose, the tensor CP decomposition implementation method based on the distributed environment proceeds as follows. For an N-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with rank R, initialize N factor matrices $A^{(n)}$ and alternately update $A^{(1)}, A^{(2)}, \ldots, A^{(N)}$ at each iteration, the other factor matrices being fixed during each calculation; repeat the iteration until the value of the objective function is zero or less than a given threshold. The N factor matrices $A^{(n)}$ are then the result of the CP decomposition of the tensor $\mathcal{X}$, where the update formula of the factor matrix $A^{(n)}$ is:

$$A^{(n)} = X_{(n)}\left(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)}\right)\left(A^{(N)T}A^{(N)} \ast \cdots \ast A^{(n+1)T}A^{(n+1)} \ast A^{(n-1)T}A^{(n-1)} \ast \cdots \ast A^{(1)T}A^{(1)}\right)^{\dagger}$$

where the superscript † denotes the pseudo-inverse, ⊙ denotes the Khatri-Rao product, and * denotes the Hadamard product.

The factor matrix $A^{(n)}$ is updated with the following method:

S1: Let set D = {1, 2, …, N} - {n}, arrange the elements in set D in ascending order, and let the j-th element be $d_j$, j = 1, 2, …, N-1. Let matrix $Y = X_{(n)}(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})$ and $V = (A^{(N)T}A^{(N)} \ast \cdots \ast A^{(n+1)T}A^{(n+1)} \ast A^{(n-1)T}A^{(n-1)} \ast \cdots \ast A^{(1)T}A^{(1)})^{\dagger}$;

S2: Calculate $Y = X_{(n)}(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})$ by splitting the Khatri-Rao product. The specific steps are:

S2.1: Initialize the rank index r = 1;

S2.2: Initialize j = 1 and $\mathcal{Y}_1^{(r)} = \mathcal{X}$;

S2.3: Map: split the tensor $\mathcal{Y}_j^{(r)}$ along mode-$d_j$: emit each element with the tuple of its remaining indices (all indices except the mode-$d_j$ index; the exact form of the key differs according to whether $n > d_j$ or not, since mode n has already been contracted away) as key and the element as value, so that the elements sharing a key constitute one mode-$d_j$ fiber. Performing this map operation distributes the fibers of the tensor $\mathcal{Y}_j^{(r)}$ to the hosts of the Spark cluster. At the same time, transpose the column vector $a_r^{(d_j)}$ of the factor matrix $A^{(d_j)}$ to $a_r^{(d_j)T}$ and distribute it to each host of the Spark cluster as a broadcast variable;

S2.4: Reduce: after receiving the key/value data and the column vector $a_r^{(d_j)T}$, each host of the Spark cluster assembles the values sharing a key into a fiber and calculates the inner product of the fiber and the column vector, which yields the elements of

$$\mathcal{Y}_{j+1}^{(r)} = \mathcal{Y}_j^{(r)} \times_{d_j} a_r^{(d_j)T}$$

(the index at which each inner product is stored again takes one of two forms according to whether $n > d_{j+1}$ or $n < d_{j+1}$);

S2.5: Judge whether j < N-1: if so, go to step S2.6, otherwise go to step S2.7;

S2.6: Set j = j+1 and return to step S2.3;

S2.7: Map: each host of the Spark cluster performs a map operation with $code_1$ as key and its elements of $\mathcal{Y}_N^{(r)}$ as value, $code_1$ being a preset code; Reduce: the host that receives $code_1$ combines all the elements into the vector $y_r$;

S2.8: Judge whether r < R: if so, go to step S2.9, otherwise go to step S2.10;

S2.9: Set r = r+1 and return to step S2.2 (so that the loop over j restarts from $\mathcal{Y}_1^{(r)} = \mathcal{X}$);

S2.10: Merge the vectors $y_r$ obtained from the R loop calculations, taking $y_r$ as the r-th column vector of the matrix Y, thereby obtaining the matrix Y;

S3: Calculate the matrix $V = (A^{(N)T}A^{(N)} \ast \cdots \ast A^{(n+1)T}A^{(n+1)} \ast A^{(n-1)T}A^{(n-1)} \ast \cdots \ast A^{(1)T}A^{(1)})^{\dagger}$ by parallel outer product calculation. The specific method is:

S3.1: Initialize j = 1;

S3.2: Calculate $V^{(d_j)} = A^{(d_j)T}A^{(d_j)}$ based on MapReduce:

1) Map: first split the matrix $A^{(d_j)}$ and distribute its row vectors to the hosts of the Spark cluster, i.e., perform a map operation with the row index i as key and the row vector $a_i^{(d_j)}$ as value;

2) Reduce: after receiving the key/value data $(i, a_i^{(d_j)})$, each host of the Spark cluster calculates the outer product $a_i^{(d_j)T} \circ a_i^{(d_j)}$, sums all the outer products calculated on that host, and records the result as $V_m^{(d_j)}$, m = 1, 2, …, M, where M denotes the number of hosts of the Spark cluster;

3) Map: each host of the Spark cluster performs a map operation with $code_2$ as key and $V_m^{(d_j)}$ as value, $code_2$ being a preset code;

4) Reduce: the host that receives $code_2$ adds up all the $V_m^{(d_j)}$ to obtain:

$$V^{(d_j)} = A^{(d_j)T}A^{(d_j)} = \sum_{m=1}^{M} V_m^{(d_j)}$$

S3.3: Judge whether j < N-1: if so, go to step S3.4, otherwise go to step S3.5;

S3.4: Set j = j+1 and return to step S3.2;

S3.5: Calculate the matrix V from the N-1 calculation results $V^{(d_j)}$. The specific process is:

1) Map: after calculating $V^{(d_j)}$, the host performs a map operation with $code_3$ as key and $V^{(d_j)}$ as value, $code_3$ being a preset code;

2) Reduce: the host that receives $code_3$ calculates the Hadamard product of all the $V^{(d_j)}$ and then the pseudo-inverse of the result, obtaining the matrix V;

S4: Block the matrix Y and the matrix V, distribute the corresponding sub-blocks of Y and V to the hosts of the Spark cluster using a Map operation, perform the matrix multiplications using a Reduce operation, then send the multiplication results to one host using a Map operation and merge them using a Reduce operation to obtain $A^{(n)} = YV$.
The tensor CP decomposition implementation method based on the distributed environment is based on the ALS algorithm. For the update of the factor matrix $A^{(n)}$ in each iteration, $Y = X_{(n)}(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})$ is first calculated by splitting the Khatri-Rao product; then $V = (A^{(N)T}A^{(N)} \ast \cdots \ast A^{(n+1)T}A^{(n+1)} \ast A^{(n-1)T}A^{(n-1)} \ast \cdots \ast A^{(1)T}A^{(1)})^{\dagger}$ is calculated by computing outer products in parallel; finally, the matrices Y and V are blocked, the corresponding sub-blocks of Y and V are distributed to the hosts of the Spark cluster using a Map operation, matrix multiplication is performed using a Reduce operation, and the multiplication results are sent to one host using a Map operation and merged using a Reduce operation to obtain $A^{(n)} = YV$. The method realizes tensor CP decomposition based on the MapReduce and Spark technologies and can effectively improve the efficiency of tensor CP decomposition.
Drawings
FIG. 1 is a flowchart of an embodiment of updating a factor matrix in a distributed environment-based tensor CP decomposition implementation method according to the present invention;
FIG. 2 is a flow chart of the present invention for splitting the Khatri-Rao product calculation matrix Y;
FIG. 3 is a flow chart of the present invention for computing matrix V by parallel outer product computation;
FIG. 4 is a schematic flow diagram of the MapReduce-based calculation of $A^{(d_j)T}A^{(d_j)}$ in the present invention;
FIG. 5 is a comparison of the running times of the present invention and the comparison method for different tensor sizes;
FIG. 6 is a comparison of the running times of the present invention and the comparison method for different tensor densities.
Detailed Description
The following describes specific embodiments of the present invention with reference to the accompanying drawings so that those skilled in the art can better understand the present invention. It is expressly noted that in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.
To better explain the technical solution of the present invention, the tensor CP decomposition and the principle on which the present invention is based will be briefly explained.
For an N-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with rank R, $I_n$ denoting the dimension of the n-th order, n = 1, 2, …, N, the goal is to compute a tensor $\hat{\mathcal{X}}$ that is nearest to $\mathcal{X}$ and whose rank is R, i.e., to calculate $\min_{\hat{\mathcal{X}}} \| \mathcal{X} - \hat{\mathcal{X}} \|$, where $\| \cdot \|$ denotes the norm and

$$\hat{\mathcal{X}} = \sum_{r=1}^{R} a_r^{(1)} \circ a_r^{(2)} \circ \cdots \circ a_r^{(N)} \quad (1)$$

wherein $A^{(1)}, \ldots, A^{(n-1)}, A^{(n)}, \ldots, A^{(N)}$ are the factor matrices of the tensor and $a_r^{(n)}$ denotes the r-th column of $A^{(n)}$.
The ALS (Alternating Least Squares) algorithm is a common algorithm for tensor CP decomposition at present. The method calculates the factor matrices $A^{(1)}, A^{(2)}, \ldots, A^{(N)}$ in turn, the other factor matrices being fixed while each one is calculated, so that each calculation is converted into the optimization problem shown by the following formula:

$$\min_{A^{(n)}} \left\| X_{(n)} - A^{(n)} \left( A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)} \right)^T \right\|$$

where $X_{(n)}$ denotes the mode-n matricization of the tensor $\mathcal{X}$, i.e., the matrix obtained by unfolding $\mathcal{X}$ along mode-n, the superscript T denotes the transpose, and ⊙ denotes the Khatri-Rao product.

When N factor matrices $A^{(1)}, \ldots, A^{(N)}$ are found that minimize the value of the objective function $\| X_{(n)} - A^{(n)} (A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})^T \|$, these N factor matrices are the result of the CP decomposition.
In engineering, when the CP decomposition of the tensor $\mathcal{X}$ is calculated with the ALS algorithm, N factor matrices $A^{(n)}$ are initialized and multiple iterative calculations are performed; each iteration updates $A^{(1)}, A^{(2)}, \ldots, A^{(N)}$ in turn with the other factor matrices fixed, and the iteration is repeated until the value of the objective function is zero or less than a given threshold. Each iteration requires updating all the factor matrices; each factor matrix $A^{(n)}$ is updated using the following formula:

$$A^{(n)} = X_{(n)}\left(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)}\right)\left(A^{(N)T}A^{(N)} \ast \cdots \ast A^{(n+1)T}A^{(n+1)} \ast A^{(n-1)T}A^{(n-1)} \ast \cdots \ast A^{(1)T}A^{(1)}\right)^{\dagger} \quad (2)$$

where the superscript † denotes the pseudo-inverse and * denotes the Hadamard product.
Thus, in each calculation the corresponding factor matrix is updated according to the above formula. Parallelizing and distributing the tensor decomposition algorithm therefore amounts to designing a distributed algorithm that completes the update of the factor matrix $A^{(n)}$.
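Before turning to the distributed design, formula (2) can be sanity-checked on a small dense tensor with a single-machine NumPy sketch. This is an illustration of the mathematics only, not the patent's distributed implementation, and the unfolding convention (NumPy's row-major order, paired with the matching ascending Khatri-Rao factor order) is an assumption of the sketch:

```python
import numpy as np

def unfold(X, n):
    # Mode-n matricization: rows indexed by i_n; with NumPy's row-major
    # layout, the columns run over the remaining modes in ascending order.
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def khatri_rao(mats):
    # Column-wise Kronecker product of a list of I_k x R matrices.
    R = mats[0].shape[1]
    out = mats[0]
    for M in mats[1:]:
        out = np.einsum('ir,jr->ijr', out, M).reshape(-1, R)
    return out

def als_update(X, factors, n):
    # One update of A^(n) per formula (2); ascending factor order matches
    # the unfolding above (the mirror image of the A^(N), ..., A^(1) order
    # that pairs with the Kolda-style unfolding used in the text).
    others = [factors[k] for k in range(X.ndim) if k != n]
    Y = unfold(X, n) @ khatri_rao(others)
    gram = np.ones((factors[n].shape[1],) * 2)
    for k, A in enumerate(factors):
        if k != n:
            gram *= A.T @ A          # Hadamard product of the Gram matrices
    return Y @ np.linalg.pinv(gram)  # A^(n) = Y V

# A few ALS sweeps on a random 4 x 5 x 6 tensor with R = 3.
rng = np.random.default_rng(0)
X = rng.random((4, 5, 6))
factors = [rng.random((I, 3)) for I in X.shape]
for _ in range(20):
    for n in range(X.ndim):
        factors[n] = als_update(X, factors, n)
```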
Examples
The invention is based on the ALS algorithm and improves the updating of the factor matrices in each iteration, realizing the update of the factor matrix in a distributed environment so as to improve the efficiency of tensor CP decomposition. Fig. 1 is a flowchart of an embodiment of updating a factor matrix in the distributed-environment-based tensor CP decomposition implementation method of the present invention. As shown in Fig. 1, the specific steps of updating a factor matrix in the method of the present invention are:

S101: Data arrangement:

Let set D = {1, 2, …, N} - {n} and arrange the elements in set D in ascending order, letting the j-th element be $d_j$; clearly there are N-1 elements in set D, i.e., j = 1, 2, …, N-1. Let matrix $Y = X_{(n)}(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})$ and $V = (A^{(N)T}A^{(N)} \ast \cdots \ast A^{(n+1)T}A^{(n+1)} \ast A^{(n-1)T}A^{(n-1)} \ast \cdots \ast A^{(1)T}A^{(1)})^{\dagger}$, so that $A^{(n)} = YV$.

S102: Splitting the Khatri-Rao product to calculate the matrix Y:

In formula (2), $Y = X_{(n)}(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})$ needs to be calculated first, and $A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)}$ is the Khatri-Rao product that leads to the surge in intermediate data. Calculating $X_{(n)}(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})$ is in fact calculating n-mode products of the tensor $\mathcal{X}$ with the column vectors of the factor matrices, so by the above analysis this calculation can be converted into the following algorithm:

Let $\mathcal{Y}_1 = \mathcal{X}$ and let r = 1, 2, …, R in turn, cyclically calculating:

$$\mathcal{Y}_{j+1} = \mathcal{Y}_j \times_{d_j} a_r^{(d_j)T}, \quad j = 1, 2, \ldots, N-1$$
$$y_r^T = \mathcal{Y}_N$$

where $a_r^{(d_j)}$ denotes the r-th column vector of the factor matrix $A^{(d_j)}$, $\times_{d_j}$ denotes the n-mode product with $n = d_j$, and $y_r^T$ denotes the transpose of the r-th column vector $y_r$ in the calculation result $Y = X_{(n)}(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})$.
The above method is illustrated with the following third-order tensor $\mathcal{X} \in \mathbb{R}^{2 \times 3 \times 2}$ with a rank of 2. Suppose the factor matrix to be updated this time is $A^{(1)}$ and the frontal slices of the third-order tensor $\mathcal{X}$ are respectively:

$$X_1 = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}, \qquad X_2 = \begin{bmatrix} 7 & 9 & 11 \\ 8 & 10 & 12 \end{bmatrix}$$

and that $A^{(2)}$ and $A^{(3)}$ were obtained in the last update (their concrete entries appear in the original figures; the numbers below follow the detailed worked example later in this description). Since the factor matrix $A^{(1)}$ needs to be updated, the set D = {2, 3}. Let $\mathcal{Y}_1 = \mathcal{X}$.

Let r = 1; then:

$$\mathcal{Y}_2 = \mathcal{Y}_1 \times_2 a_1^{(2)T} = \begin{bmatrix} 6 & 18 \\ 8 & 20 \end{bmatrix}, \qquad y_1^T = \mathcal{Y}_3 = \mathcal{Y}_2 \times_3 a_1^{(3)T} = (24, 28)$$

Let r = 2; the same two n-mode products with $a_2^{(2)}$ and $a_2^{(3)}$ then yield $y_2$ in the same way. Combining gives $Y = [y_1 \; y_2]$.
It can be seen that the above calculation process requires R loop iterations and that N-1 n-mode product calculations are performed in each iteration; the calculation of the n-mode product does not need a large amount of storage and can be conveniently realized with MapReduce.
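As a concrete check, the two n-mode products of the example above can be reproduced with np.tensordot. The vectors used here for $a_1^{(2)}$ and $a_1^{(3)}$ are an assumption: one choice consistent with the values 6, 8, 18, 20 and 24, 28 reported in the detailed worked example below, since the patent's own factor matrices appear only in its figures:

```python
import numpy as np

# Frontal slices of the example tensor; shape (I1, I2, I3) = (2, 3, 2).
X = np.zeros((2, 3, 2))
X[:, :, 0] = [[1, 3, 5], [2, 4, 6]]
X[:, :, 1] = [[7, 9, 11], [8, 10, 12]]

a2 = np.array([1.0, 0.0, 1.0])  # assumed a_1^(2), consistent with the example
a3 = np.array([1.0, 1.0])       # assumed a_1^(3), consistent with the example

# Mode-2 product contracts axis 1: Y2[i1, i3] = sum_j X[i1, j, i3] * a2[j].
Y2 = np.tensordot(X, a2, axes=([1], [0]))   # [[ 6., 18.], [ 8., 20.]]
# Mode-3 of the original tensor is now axis 1 of Y2.
y1 = np.tensordot(Y2, a3, axes=([1], [0]))  # [24., 28.]
print(Y2, y1)
```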
According to the above analysis, the specific process of splitting the Khatri-Rao product so as to calculate $Y = X_{(n)}(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})$ is obtained. Fig. 2 is a flow chart of splitting the Khatri-Rao product to calculate the matrix Y in the present invention. As shown in Fig. 2, the specific process is as follows:

S201: Initialize the rank index r = 1.

S202: Initialize j = 1 and $\mathcal{Y}_1 = \mathcal{X}$.

S203: Split the tensor:

Map: split the tensor $\mathcal{Y}_j$ along mode-$d_j$. When $n > d_j$, the key takes one form; otherwise it takes the other (the exact key expressions appear in the original figures); in both cases the key is formed from all the indices of an element except its mode-$d_j$ index, the corresponding element of $\mathcal{Y}_j$ is the value, and the elements with the same key constitute one mode-$d_j$ fiber of $\mathcal{Y}_j$. Performing this map operation distributes the fibers of the tensor $\mathcal{Y}_j$ to the hosts of the Spark cluster. At the same time, using Spark's shared-variable feature, the column vector $a_r^{(d_j)}$ of the factor matrix $A^{(d_j)}$ is transposed to $a_r^{(d_j)T}$ and distributed to each host of the Spark cluster as a broadcast variable.

Since the invention calculates with the Spark technique, all the tensors $\mathcal{Y}_j$ except $\mathcal{Y}_1 = \mathcal{X}$ are in fact already distributed over different hosts, so splitting such a tensor, that is, performing the Map operation, is actually completed jointly by several hosts.

S204: Calculate the inner product of fiber and column vector:

Reduce: after receiving the key/value data and the column vector $a_r^{(d_j)T}$, each host of the Spark cluster assembles the values with the same key into a fiber and calculates the inner product of the fiber and the column vector; each inner product is one element of

$$\mathcal{Y}_{j+1} = \mathcal{Y}_j \times_{d_j} a_r^{(d_j)T}$$

and the index at which it is stored takes one of two forms according to whether $n > d_{j+1}$ or $n < d_{j+1}$ (the exact expressions appear in the original figures). The different hosts can thus calculate the elements of $\mathcal{Y}_{j+1}$ from the data distributed to them.

S205: Judge whether j < N-1: if so, go to step S206; otherwise go to step S207.

S206: Set j = j+1 and return to step S203.

S207: Obtain $y_r$:

Map: each host of the Spark cluster performs a map operation with $code_1$ as key and its elements of $\mathcal{Y}_N$ as value, $code_1$ being a preset code; that is, the elements calculated by the hosts are all mapped to one host.

Reduce: the host that receives $code_1$ combines all the elements into the vector $y_r$.

S208: Judge whether r < R: if so, go to step S209; otherwise go to step S210.

S209: Set r = r+1 and return to step S202.

S210: Merge the vectors:

Merge the vectors $y_r$ obtained from the R loop calculations, taking $y_r$ as the r-th column vector of the matrix Y, thereby obtaining the matrix Y.
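Steps S201-S210 can be rendered quite compactly in PySpark. The following is a minimal sketch under two assumptions: the tensor is stored sparsely as an RDD of (index-tuple, value) entries, and instead of materializing whole fibers on a host, the fiber/column-vector inner product of S204 is folded into a keyed sum with reduceByKey, an equivalent formulation that produces the same elements of $\mathcal{Y}_{j+1}$. The concrete vectors are the same assumed columns as in the earlier NumPy sketch:

```python
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="split-khatri-rao-sketch")

def contract_mode(entries, a, d):
    # entries: RDD of (index_tuple, value) for the current tensor Y_j.
    # Dropping index d from the key reproduces S203's fiber keys; summing
    # value * a[i_d] per key is exactly the inner product of S204.
    a_b = sc.broadcast(a)
    return (entries
            .map(lambda kv: (kv[0][:d] + kv[0][d + 1:],
                             kv[1] * a_b.value[kv[0][d]]))
            .reduceByKey(lambda x, y: x + y))

# The example tensor as sparse (index, value) entries.
X = np.zeros((2, 3, 2))
X[:, :, 0] = [[1, 3, 5], [2, 4, 6]]
X[:, :, 1] = [[7, 9, 11], [8, 10, 12]]
entries = sc.parallelize([(idx, float(X[idx])) for idx in np.ndindex(X.shape)])

y = contract_mode(entries, np.array([1.0, 0.0, 1.0]), d=1)  # mode-2 product
y = contract_mode(y, np.array([1.0, 1.0]), d=1)             # mode-3, now axis 1
print(sorted(y.collect()))  # [((0,), 24.0), ((1,), 28.0)] -> y_1 = (24, 28)^T
```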
The process described above is illustrated with the same rank-2 third-order tensor $\mathcal{X}$ as before. Again suppose the factor matrix to be updated this time is $A^{(1)}$, that the frontal slices of the third-order tensor $\mathcal{X}$ are $X_1$ and $X_2$ as given above, and that $A^{(2)}$ and $A^{(3)}$ are the factor matrices obtained in the last update. Since the factor matrix $A^{(1)}$ needs to be updated, i.e., n = 1, the set D = {2, 3}.

Let r = 1, and initialize j = 1 and $\mathcal{Y}_1 = \mathcal{X}$.

Map: since $d_1 = 2$, $\mathcal{Y}_1$ is split by mode-2, yielding 4 fibers; since $\mathcal{Y}_1$ is of order 3, each fiber is in fact a vector. The key of each element is formed from $i_1$ and $i_3$. Table 1 shows the result of splitting $\mathcal{Y}_1$ by mode-2 in this example.

key 1+1: element values 1, 3, 5
key 1+2: element values 2, 4, 6
key 2+1: element values 7, 9, 11
key 2+2: element values 8, 10, 12

TABLE 1

Each row of element values in Table 1 constitutes one fiber. A map operation is performed with $i_1 + i_3$ as key and each element of $\mathcal{Y}_1$ as value, and the elements are distributed to the hosts of the Spark cluster, which completes the distribution of the fibers. The column vector $a_1^{(2)}$ of $A^{(2)}$ is transposed to $a_1^{(2)T}$ and distributed to each host of the Spark cluster as a broadcast variable.

Reduce: the hosts that obtained data each calculate the inner product of one of the 4 fibers with $a_1^{(2)}$, yielding the 4 values 6, 8, 18, 20. From these 4 values a 2nd-order tensor $\mathcal{Y}_2$ of size $I_1 \times I_3$ is formed, i.e., $\mathcal{Y}_2 = \mathcal{Y}_1 \times_2 a_1^{(2)T}$, each of whose elements is the inner product of the fiber keyed by $(i_1, i_3)$ with $a_1^{(2)}$.

Let j = 2.

Map: since $d_2 = 3$, $\mathcal{Y}_2$ must be split by mode-3. The elements of $\mathcal{Y}_2$, i.e., the calculation results of the last Reduce, are already dispersed over the hosts of the Spark cluster, so each host directly performs the Map operation: each host holding elements of $\mathcal{Y}_2$ uses $i_1$ as key and the element as value. Obviously, the fibers formed this time are (6, 18) and (8, 20). The column vector $a_1^{(3)}$ of $A^{(3)}$ is transposed to $a_1^{(3)T}$ and distributed to each host of the Spark cluster as a broadcast variable.

Reduce: the hosts that obtained data each calculate the inner product of one of the 2 fibers with $a_1^{(3)}$, yielding the 2 values 24 and 28. From these 2 values a 1st-order tensor $\mathcal{Y}_3$ of size $I_1$ is formed, i.e., $\mathcal{Y}_3 = \mathcal{Y}_2 \times_3 a_1^{(3)T} = (24, 28)$. Its elements are mapped to the same host and merged to obtain the vector $y_1 = (24, 28)^T$.

The vector $y_2$ can be calculated in a similar way; $y_1$ and $y_2$ are then combined to obtain the matrix $Y = [y_1 \; y_2]$.
S103: Calculate the matrix V by parallel outer product calculation:

Next, the calculation of the matrix $V = (A^{(N)T}A^{(N)} \ast \cdots \ast A^{(n+1)T}A^{(n+1)} \ast A^{(n-1)T}A^{(n-1)} \ast \cdots \ast A^{(1)T}A^{(1)})^{\dagger}$ is analyzed. Clearly, the key step is to be able to efficiently calculate $A^{(1)T}A^{(1)} \ast \cdots \ast A^{(n-1)T}A^{(n-1)} \ast A^{(n+1)T}A^{(n+1)} \ast \cdots \ast A^{(N)T}A^{(N)}$: because the result of this expression is a matrix of size R×R, and R is typically a small value, the computation of the pseudo-inverse is quite fast and easy. The expression can be calculated from left to right as the product of each matrix's transpose with itself: since each factor matrix has R columns, each result $A^{(d_j)T}A^{(d_j)}$ is a matrix of size R×R, and finally the Hadamard products of the N-1 matrices of size R×R are calculated, which completes the calculation of the expression. Calculating $A^{(d_j)T}A^{(d_j)}$ is just the process of calculating, for each row of $A^{(d_j)}$, the outer product of the transpose of that row with the row itself, and adding up the results of all the outer products. Thus the calculation of $A^{(d_j)T}A^{(d_j)}$ can be described by the following algorithm:

Initialize the R×R matrix $V^{(d_j)}$ as a zero matrix and, letting i = 1, 2, …, $I_{d_j}$ in turn, cyclically calculate:

$$V^{(d_j)} = V^{(d_j)} + a_i^{(d_j)T} \circ a_i^{(d_j)}$$

where $a_i^{(d_j)}$ denotes the i-th row vector of $A^{(d_j)}$ and $\circ$ denotes the outer product.

The above algorithm is illustrated with a matrix A of size 3×2 (its concrete entries appear in the original figures): the accumulator is initialized to the 2×2 zero matrix, and for i = 1, 2, 3 in turn the outer product $a_i^T \circ a_i$ of each row is added. Obviously, the result obtained with the above algorithm is the same as that of directly calculating $A^T A$.
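The row-wise accumulation is easy to verify in a few lines of NumPy; a random 3×2 matrix stands in for the example matrix, whose concrete entries appear only in the original figures:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((3, 2))            # stand-in for the 3 x 2 example matrix

V = np.zeros((2, 2))              # R x R accumulator
for i in range(A.shape[0]):
    V += np.outer(A[i], A[i])     # a_i^T o a_i, one outer product per row

assert np.allclose(V, A.T @ A)    # same result as computing A^T A directly
```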
In the above algorithm, the transpose of each row of the matrix $A^{(d_j)}$ must be multiplied, as an outer product, with that row, which requires multiple iterations. Observing the calculation in each iteration of the algorithm, i.e., $a_i^{(d_j)T} \circ a_i^{(d_j)}$, it can be found that the data used to compute each outer product is a single row vector of the matrix $A^{(d_j)}$, so each outer product calculation can be completed independently; meanwhile, the matrix obtained from each outer product has size R×R, which is very small and does not occupy a large amount of storage space. Based on this finding, the invention splits the matrix $A^{(d_j)}$, distributes its row vectors to the machines of the cluster, and executes the outer product calculations in parallel, thereby improving efficiency. Here the number of rows $I_{d_j}$ of the matrix $A^{(d_j)}$ must be considered: it may be very large, so the computation of the outer products cannot be completed by only one reducer. Partial outer products need to be merged in advance to reduce the pressure on the reducer and improve calculation efficiency; finally all the outer products are merged on one reducer, where the calculation of the pseudo-inverse is completed. The MapReduce implementation of this step is therefore divided into two MapReduce passes, namely calculating the outer products and merging all the outer products, each pass consisting of a Map step and a Reduce step.

From the above analysis, the specific process of calculating $V$ in the invention is obtained. Fig. 3 is a flow chart of calculating the matrix V by computing outer products in parallel in the present invention. As shown in Fig. 3, the specific steps of calculating the matrix V in the form of parallel outer products include:

S301: Initialize j = 1.

S302: Calculate $V^{(d_j)} = A^{(d_j)T}A^{(d_j)}$:

In the invention, two MapReduce passes are used for this calculation. Fig. 4 is a schematic flow diagram of the MapReduce-based calculation of $A^{(d_j)T}A^{(d_j)}$ in the present invention. As shown in Fig. 4, the specific method is:

1) Map: first split the matrix $A^{(d_j)}$ and distribute its row vectors to the hosts of the Spark cluster, i.e., perform a map operation with the row index i as key and the row vector $a_i^{(d_j)}$ as value.

2) Reduce: after receiving the key/value data $(i, a_i^{(d_j)})$, each host of the Spark cluster calculates the outer product $a_i^{(d_j)T} \circ a_i^{(d_j)}$, sums all the outer products calculated on that host, and records the result as $V_m^{(d_j)}$, m = 1, 2, …, M, where M denotes the number of hosts of the Spark cluster. This is because $I_{d_j}$ is usually large, so the outer products of more than one row vector are calculated on each host; each host therefore merges the partial outer products calculated on it in advance to reduce the subsequent workload.

3) Map: each host of the Spark cluster performs a map operation with $code_2$ as key and $V_m^{(d_j)}$ as value, $code_2$ being a preset code, i.e., the $V_m^{(d_j)}$ calculated by the hosts are all mapped to one host.

4) Reduce: the host that receives $code_2$ adds up all the $V_m^{(d_j)}$ to obtain:

$$V^{(d_j)} = A^{(d_j)T}A^{(d_j)} = \sum_{m=1}^{M} V_m^{(d_j)}$$

S303: Judge whether j < N-1: if so, go to step S304; otherwise go to step S305.

S304: Set j = j+1 and return to step S302.

S305: Calculate the matrix V:

From the N-1 calculation results $V^{(d_j)}$, the matrix V is calculated as follows:

1) Map: after calculating $V^{(d_j)}$, the host performs a map operation with $code_3$ as key and $V^{(d_j)}$ as value, $code_3$ being likewise a preset code.

2) Reduce: the host that receives $code_3$ calculates the Hadamard product of all the $V^{(d_j)}$ and then the pseudo-inverse of the result, obtaining the matrix V. According to the previous analysis, since each $V^{(d_j)}$ is an R×R matrix, the calculation is relatively simple and can therefore be completed with one Reduce.
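The two MapReduce passes of S302 and the merge of S305 map naturally onto mapPartitions and reduce in PySpark. The following is a minimal sketch assuming a local Spark context, with each partition playing the role of one host that pre-merges its partial outer products $V_m^{(d_j)}$:

```python
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="parallel-gram-sketch")

def gram_via_spark(A, num_partitions=4):
    # S302: every partition ("host") sums the outer products of the rows it
    # holds (the partial result V_m); a single reduce then adds the M
    # partial R x R matrices together, giving V^(d_j) = A^T A.
    R = A.shape[1]
    rows = sc.parallelize(list(A), num_partitions)
    partials = rows.mapPartitions(
        lambda it: [sum((np.outer(a, a) for a in it), np.zeros((R, R)))])
    return partials.reduce(lambda u, v: u + v)

# S305: Hadamard product of the N-1 Gram matrices, then the pseudo-inverse.
R = 10
factors = [np.random.rand(500, R), np.random.rand(600, R)]  # e.g. A^(2), A^(3)
hadamard = np.ones((R, R))
for A in factors:
    hadamard *= gram_via_spark(A)
V = np.linalg.pinv(hadamard)
```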
S104: computing based on distributed caches(n)
In calculating the matrix multiplication $A^{(n)} = YV$, one key factor to consider is whether a single machine's storage can accommodate both matrices and operate efficiently. $Y = X_{(n)}(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})$ is a matrix of size $I_n \times R$: R is usually a small number, but $I_n$ is often so large that a single machine's memory cannot accommodate the matrix. If a disk is used for auxiliary storage (for example, in the manner of a swap partition), a large amount of disk I/O is generated while the program runs, greatly affecting operating efficiency. The size of V is R×R; since R is a small number, this matrix can be stored entirely in a single machine's memory. From the above analysis it can be concluded that $A^{(n)} = YV$ is a large matrix multiplication with severe data skew, and common matrix multiplication methods cannot be applied to such matrices.

For these reasons, the invention blocks the matrix Y and the matrix V, distributes the sub-blocks of Y and V to the hosts of the Spark cluster using a Map operation, performs the matrix multiplications using a Reduce operation, then sends the multiplication results to one host using a Map operation and merges them using a Reduce operation to obtain $A^{(n)} = YV$, thereby enabling more efficient calculation.

The common ways of blocking a matrix are: by rows, by columns, and by both rows and columns. The research of the invention finds that, because of the small scale of the matrix V, also blocking V is not the optimal way. It is therefore preferable to block only the matrix Y; i.e., the blocking method for the matrix Y and the matrix V is: block the matrix Y by rows to obtain sub-block matrices with R columns, the row size of each sub-block being set according to actual needs, while the block of the matrix V is the matrix V itself, i.e., V is not blocked.
With this blocking, the specific process of calculating $A^{(n)}$ based on the distributed cache is:

1) Map: first split the matrix Y, performing a map operation with the row index k as key and the row vector $y_k$ as value, k = 1, 2, …, $I_n$, which distributes the row vectors $y_k$ to the hosts of the Spark cluster. At the same time, set the matrix V as a Spark broadcast variable and distribute it to each host of the Spark cluster.

2) Reduce: each host of the Spark cluster that receives a row vector $y_k$ and the matrix V calculates $A_k^{(n)} = y_k V$.

3) Map: after calculating $A_k^{(n)}$, the host performs a map operation with $code_4$ as key and $A_k^{(n)}$ as value, $code_4$ being a preset code.

4) Reduce: the host that receives the $I_n$ results $A_k^{(n)}$ takes each $A_k^{(n)}$ as the k-th row of $A^{(n)}$, obtaining the factor matrix $A^{(n)}$.
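A minimal PySpark sketch of this skew-aware multiplication, again under the assumption of a local Spark context: only the tall matrix Y is split, while V travels to every host as a broadcast variable, mirroring steps 1) to 4) above:

```python
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="skewed-yv-sketch")

def multiply_skewed(Y, V):
    # S104: split only the tall I_n x R matrix Y (row by row here); the
    # small R x R matrix V reaches every host as a broadcast variable, so
    # each host computes its rows A_k = y_k V locally.
    V_b = sc.broadcast(V)
    rows = sc.parallelize(list(enumerate(Y)))
    products = rows.map(lambda kv: (kv[0], kv[1] @ V_b.value))
    # Final merge: reassemble the rows of A^(n) in order on one host.
    return np.vstack([row for _, row in sorted(products.collect())])

Y = np.random.rand(10000, 10)   # I_n x R, with I_n typically huge
V = np.random.rand(10, 10)      # R x R, fits easily on every host
A_n = multiply_skewed(Y, V)
assert np.allclose(A_n, Y @ V)
```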
In order to better illustrate the technical effect of the invention, a specific example is used for experimental verification, and the results are compared with those of an existing tensor CP decomposition method. In the experimental verification, the Spark cluster contains 10 hosts, and the tensor data use the NELL data source of CMU (Carnegie Mellon University), which originates from CMU's "Read the Web" project and includes a large number of categories and relationships. Because real-world data are generally sparse, in order to test the performance of the tensor decomposition algorithm on data of different degrees of sparsity, in addition to the full NELL data, third-order tensors of different sizes and different densities were randomly generated for the experiment. Table 2 describes the data sets in this example.
TABLE 2: Data set description (the concrete tensor sizes, densities, and non-zero counts appear in the original figures)
The comparison method adopted in the experimental verification is a traditional tensor CP decomposition tool, the MATLAB Tensor Toolbox Version 2.6, implemented by Tamara G. Kolda of the Sandia National Laboratories in the USA. It provides CP decomposition, Tucker decomposition, and matrix calculation operations for dense, sparse, and structured tensors, but it does not support distributed tensor decomposition. In this experimental verification, the rank R of the tensor is set to 10 when performing the CP decomposition.
First, the tensor density is fixed and the running times of the invention and the comparison method are tested for different tensor sizes. The tensors in the experiment grow gradually from I = J = K = $10^3$ to I = J = K = $10^8$, with the number of non-zero elements being 10 × I. Fig. 5 compares the running times of the invention and the comparison method for different tensor sizes. As shown in Fig. 5, the running time of the comparison method increases with the scale of the tensor, and when the tensor size exceeds I = J = K = $10^6$, the comparison method cannot complete the CP decomposition of the tensor because of the CPU and memory limitations of a single machine (mainly the memory). For tensor sizes from I = J = K = $10^3$ to $10^6$ the running time of the invention is stable: when the tensor scale is not large, task scheduling and network data transmission occupy most of the program's running time, and this part is relatively stable. When the tensor size exceeds I = J = K = $10^6$, the running time of the invention begins to increase; when the tensor size reaches I = J = K = $10^8$, the increase in running time is an order of magnitude higher. At this point the memory occupation of the cluster reaches a peak: Spark starts to use the disk swap partition to store part of the data, and some temporarily unneeded RDDs are cleared from memory and recalculated from their lineage when needed later; these two factors significantly increase the running time. Although the running time of the invention increases when CP-decomposing large-scale tensors, it is acceptable for engineering applications.
Next, the size of the tensor is fixed and the running times of the invention and the comparison method are tested for different tensor densities. The tensor size used for the test is I = J = K = $10^5$. The density of the tensor is incremented from $10^{-9}$ to $10^{-5}$, i.e., the number of non-zero elements grows from $10^6$ to $10^{10}$. Fig. 6 compares the running times of the invention and the comparison method for different tensor densities. As shown in Fig. 6, as the density of the tensor increases from $10^{-9}$ to $10^{-7}$, the running time of the comparison method increases roughly linearly; when the density exceeds $10^{-7}$, the comparison method cannot complete the CP decomposition of the tensor. For tensor densities from $10^{-9}$ to $10^{-6}$ the running time of the invention grows stably; when the tensor density increases to $10^{-5}$, the program's running time increases significantly. Because the invention stores the tensor sparsely, the number of non-zero elements grows with the density and more memory is needed on Spark to store the RDDs; when memory runs short, the mechanisms of discarding temporarily unneeded RDDs and of using the disk swap partition begin to kick in, causing extra operating cost and increasing the running time. Although the running time of the invention increases when CP-decomposing high-density tensors, it is acceptable for engineering applications.
In conclusion, the tensor CP decomposition of the invention takes less running time than the traditional method, can break through the limitations of single-machine software and hardware conditions, realizes CP decomposition of large-scale and high-density tensors, and maintains good timeliness.
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the present invention, the present invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are permissible as long as they remain within the spirit and scope of the present invention as defined and determined by the appended claims; everything that makes use of the inventive concept falls within the protection of the invention.

Claims (2)

1. A tensor CP decomposition implementation method based on a distributed environment, for an N-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with rank R, $I_n$ denoting the dimension of the n-th order, n = 1, 2, …, N, with
$$\hat{\mathcal{X}} = \sum_{r=1}^{R} a_r^{(1)} \circ a_r^{(2)} \circ \cdots \circ a_r^{(N)}$$
wherein $A^{(1)}, \ldots, A^{(n-1)}, A^{(n)}, \ldots, A^{(N)}$ are the factor matrices of the tensor; initializing N factor matrices $A^{(n)}$, alternately updating $A^{(1)}, A^{(2)}, \ldots, A^{(N)}$ at each iteration, the other factor matrices being fixed during calculation, and repeating the iteration until the value of the objective function is zero or less than a given threshold, wherein the N factor matrices $A^{(n)}$ are the result of the CP decomposition of the tensor $\mathcal{X}$, and the update formula of the factor matrix $A^{(n)}$ is:
$$A^{(n)} = X_{(n)}\left(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)}\right)\left(A^{(N)T}A^{(N)} \ast \cdots \ast A^{(n+1)T}A^{(n+1)} \ast A^{(n-1)T}A^{(n-1)} \ast \cdots \ast A^{(1)T}A^{(1)}\right)^{\dagger}$$
wherein the superscript † denotes the pseudo-inverse, ⊙ denotes the Khatri-Rao product, and * denotes the Hadamard product;

characterized in that the factor matrix $A^{(n)}$ is updated with the following method:

S1: letting set D = {1, 2, …, N} - {n}, arranging the elements in set D in ascending order, and letting the j-th element be $d_j$, j = 1, 2, …, N-1; letting matrix $Y = X_{(n)}(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})$ and $V = (A^{(N)T}A^{(N)} \ast \cdots \ast A^{(n+1)T}A^{(n+1)} \ast A^{(n-1)T}A^{(n-1)} \ast \cdots \ast A^{(1)T}A^{(1)})^{\dagger}$;

S2: calculating $Y = X_{(n)}(A^{(N)} \odot \cdots \odot A^{(n+1)} \odot A^{(n-1)} \odot \cdots \odot A^{(1)})$ by splitting the Khatri-Rao product, the specific steps being:

S2.1: initializing the rank index r = 1;

S2.2: initializing j = 1 and $\mathcal{Y}_1^{(r)} = \mathcal{X}$;

S2.3: Map: splitting the tensor $\mathcal{Y}_j^{(r)}$ along mode-$d_j$, emitting each element with the tuple of its remaining indices (all indices except the mode-$d_j$ index, the exact form of the key depending on whether $n > d_j$ or not) as key and the element as value, the elements with the same key constituting one mode-$d_j$ fiber of $\mathcal{Y}_j^{(r)}$; performing this map operation distributes the fibers of the tensor $\mathcal{Y}_j^{(r)}$ to the hosts of the Spark cluster; at the same time, transposing the column vector $a_r^{(d_j)}$ of the factor matrix $A^{(d_j)}$ to $a_r^{(d_j)T}$ and distributing it to each host of the Spark cluster as a broadcast variable;

S2.4: Reduce: after receiving the key/value data and the column vector $a_r^{(d_j)T}$, each host of the Spark cluster assembling the values with the same key into a fiber and calculating the inner product of the fiber and the column vector, the inner products forming the elements of
$$\mathcal{Y}_{j+1}^{(r)} = \mathcal{Y}_j^{(r)} \times_{d_j} a_r^{(d_j)T}$$
the index at which each inner product is stored depending on whether $n > d_{j+1}$ or $n < d_{j+1}$;

S2.5: judging whether j < N-1: if so, going to step S2.6, otherwise going to step S2.7;

S2.6: setting j = j+1 and returning to step S2.3;

S2.7: Map: each host of the Spark cluster performing a map operation with $code_1$ as key and its elements of $\mathcal{Y}_N^{(r)}$ as value, $code_1$ being a preset code; Reduce: the host that receives $code_1$ combining all the elements into the vector $y_r$;

S2.8: judging whether r < R: if so, going to step S2.9, otherwise going to step S2.10;

S2.9: setting r = r+1 and returning to step S2.2;

S2.10: merging the vectors $y_r$ obtained from the R loop calculations, taking $y_r$ as the r-th column vector of the matrix Y, thereby obtaining the matrix Y;

S3: calculating the matrix $V = (A^{(N)T}A^{(N)} \ast \cdots \ast A^{(n+1)T}A^{(n+1)} \ast A^{(n-1)T}A^{(n-1)} \ast \cdots \ast A^{(1)T}A^{(1)})^{\dagger}$ by parallel outer product calculation, the specific method being:

S3.1: initializing j = 1;

S3.2: calculating $V^{(d_j)} = A^{(d_j)T}A^{(d_j)}$ based on MapReduce:

1) Map: first splitting the matrix $A^{(d_j)}$ and distributing its row vectors to the hosts of the Spark cluster, i.e., performing a map operation with the row index i as key and the row vector $a_i^{(d_j)}$ as value;

2) Reduce: after receiving the key/value data $(i, a_i^{(d_j)})$, each host of the Spark cluster calculating the outer product $a_i^{(d_j)T} \circ a_i^{(d_j)}$, where $\circ$ denotes the outer product, summing all the outer products calculated on that host, and recording the result as $V_m^{(d_j)}$, m = 1, 2, …, M, M denoting the number of hosts of the Spark cluster;

3) Map: each host of the Spark cluster performing a map operation with $code_2$ as key and $V_m^{(d_j)}$ as value, $code_2$ being a preset code;

4) Reduce: the host that receives $code_2$ adding up all the $V_m^{(d_j)}$ to obtain:
$$V^{(d_j)} = A^{(d_j)T}A^{(d_j)} = \sum_{m=1}^{M} V_m^{(d_j)}$$

S3.3: judging whether j < N-1: if so, going to step S3.4, otherwise going to step S3.5;

S3.4: setting j = j+1 and returning to step S3.2;

S3.5: calculating the matrix V from the N-1 calculation results $V^{(d_j)}$, the specific process being:

1) Map: after calculating $V^{(d_j)}$, the host performing a map operation with $code_3$ as key and $V^{(d_j)}$ as value, $code_3$ being a preset code;

2) Reduce: the host that receives $code_3$ calculating the Hadamard product of all the $V^{(d_j)}$ and then the pseudo-inverse of the result, obtaining the matrix V;

S4: blocking the matrix Y and the matrix V, distributing the corresponding sub-blocks of Y and V to the hosts of the Spark cluster using a Map operation, performing matrix multiplication using a Reduce operation, sending the multiplication results to one host using a Map operation, and merging them using a Reduce operation to obtain $A^{(n)} = YV$.
2. The tensor CP decomposition implementation method as claimed in claim 1, wherein the blocking method for the matrix Y and the matrix V in S4 is: blocking the matrix Y by rows to obtain sub-block matrices whose number of columns is R, the block of the matrix V being the matrix V itself.
CN201711426277.0A 2017-12-26 2017-12-26 Tensor CP decomposition implementation method based on distributed environment Active CN108170639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711426277.0A CN108170639B (en) 2017-12-26 2017-12-26 Tensor CP decomposition implementation method based on distributed environment

Publications (2)

Publication Number Publication Date
CN108170639A CN108170639A (en) 2018-06-15
CN108170639B true CN108170639B (en) 2021-08-17

Family

ID=62520749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711426277.0A Active CN108170639B (en) 2017-12-26 2017-12-26 Tensor CP decomposition implementation method based on distributed environment

Country Status (1)

Country Link
CN (1) CN108170639B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11567816B2 (en) 2017-09-13 2023-01-31 Hrl Laboratories, Llc Transitive tensor analysis for detection of network activities
CN109299725B (en) * 2018-07-27 2021-10-08 华中科技大学鄂州工业技术研究院 Prediction system and device for parallel realization of high-order principal eigenvalue decomposition by tensor chain
US10796225B2 (en) * 2018-08-03 2020-10-06 Google Llc Distributing tensor computations across computing devices
CN110362780B (en) * 2019-07-17 2021-03-23 北京航空航天大学 Large data tensor canonical decomposition calculation method based on Shenwei many-core processor
CN111276183B (en) * 2020-02-25 2023-03-21 云南大学 Tensor decomposition processing method based on parameter estimation
CN111461193B (en) * 2020-03-25 2023-04-18 中国人民解放军国防科技大学 Incremental tensor decomposition method and system for open source event correlation prediction
EP4185970A1 (en) * 2020-07-22 2023-05-31 HRL Laboratories, LLC Transitive tensor analysis for detection of network activities
CN112835552A (en) * 2021-01-26 2021-05-25 算筹信息科技有限公司 Method for solving inner product of sparse matrix and dense matrix by outer product accumulation
CN115146780B (en) * 2022-08-30 2023-07-11 之江实验室 Quantum tensor network transposition and contraction cooperative method and device


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260554A (en) * 2015-10-27 2016-01-20 武汉大学 GPU cluster-based multidimensional big data factorization method
CN107015946A (en) * 2016-01-27 2017-08-04 常州普适信息科技有限公司 Distributed high-order SVD and its incremental computations a kind of method
CN105913085A (en) * 2016-04-12 2016-08-31 中国科学院深圳先进技术研究院 Tensor model-based multi-source data classification optimizing method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Tensor Decomposition for Signal Processing and Machine Learning; Nicholas D. Sidiropoulos et al.; arXiv:1607.01668v2; 2016-12-14; pp. 1-44 *
Research on Tensor Decomposition Algorithms in a Distributed Environment [分布式环境下的张量分解算法研究]; adsuhviusa; http://www.doc88.com/p-3197463411177.html; 2017-12-04; pp. 6-49 *
Research on Data Structures for the Multi-core Era Based on Shared Memory [基于共享内存的多核时代数据结构研究]; Zhou Wei et al.; Journal of Software; 2016-04-30; Vol. 27, No. 4; pp. 1009-1025 *

Also Published As

Publication number Publication date
CN108170639A (en) 2018-06-15

Similar Documents

Publication Publication Date Title
CN108170639B (en) Tensor CP decomposition implementation method based on distributed environment
Lu et al. SpWA: An efficient sparse winograd convolutional neural networks accelerator on FPGAs
Smith et al. SPLATT: Efficient and parallel sparse tensor-matrix multiplication
Albericio et al. Cnvlutin: Ineffectual-neuron-free deep neural network computing
CN109328361B (en) Accelerator for deep neural network
Dang et al. CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations
JP2016119084A (en) Computer-implemented system and method for efficient sparse matrix representation and processing
Ma et al. Optimizing sparse tensor times matrix on GPUs
WO2012076379A2 (en) Data structure for tiling and packetizing a sparse matrix
CN109033030B (en) Tensor decomposition and reconstruction method based on GPU
US20200159810A1 (en) Partitioning sparse matrices based on sparse matrix representations for crossbar-based architectures
WO2012076377A2 (en) Optimizing output vector data generation using a formatted matrix data structure
Rungsawang et al. Fast pagerank computation on a gpu cluster
Conte et al. GPU-acceleration of waveform relaxation methods for large differential systems
D’Amore et al. Mathematical approach to the performance evaluation of matrix multiply algorithm
Gu et al. Efficient large scale distributed matrix computation with spark
US20180373677A1 (en) Apparatus and Methods of Providing Efficient Data Parallelization for Multi-Dimensional FFTs
US20220382829A1 (en) Sparse matrix multiplication in hardware
WO2022016261A1 (en) System and method for accelerating training of deep learning networks
Jain-Mendon et al. A hardware–software co-design approach for implementing sparse matrix vector multiplication on FPGAs
Wang et al. A novel parallel algorithm for sparse tensor matrix chain multiplication via tcu-acceleration
Wu et al. Optimizing dynamic programming on graphics processing units via data reuse and data prefetch with inter-block barrier synchronization
US9600446B2 (en) Parallel multicolor incomplete LU factorization preconditioning processor and method of use thereof
Caron et al. On the performance of parallel factorization of out-of-core matrices
CN114428936A (en) Allocating processing threads for matrix-matrix multiplication

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant