CN108170639B - Tensor CP decomposition implementation method based on distributed environment - Google Patents
- Publication number: CN108170639B (application CN201711426277A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classification: G06F17/15 (Correlation function computation including computation of convolution operations), under G06F17/10 (Complex mathematical operations), G06F17/00 (Digital computing or data processing equipment or methods, specially adapted for specific functions), G06F (Electric digital data processing), G06 (Computing; calculating or counting), G (Physics)
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a tensor CP decomposition implementation method based on a distributed environment. The method is based on the ALS algorithm. To update the factor matrix A(n) in each iteration, it first calculates Y = X(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1)) by splitting the Khatri-Rao product, then calculates V = (A(N)TA(N)*…*A(n+1)TA(n+1)*A(n-1)TA(n-1)*…*A(1)TA(1))† by computing outer products in parallel, and finally partitions the matrix Y and the matrix V into blocks, distributes the blocks of Y and V to the hosts of the Spark cluster with a Map operation, performs the matrix multiplication with a Reduce operation, sends the multiplication results to one host with a Map operation, and merges them with a Reduce operation to obtain A(n) = YV. The method implements tensor CP decomposition on top of the MapReduce and Spark technologies and can effectively improve the efficiency of tensor CP decomposition.
Description
Technical Field
The invention belongs to the technical field of tensor decomposition, and particularly relates to a tensor CP decomposition implementation method based on a distributed environment.
Background
In recent years, data scale has grown rapidly in fields such as social networks, computational advertising, and e-commerce. To describe complex relationships (for example, the friend relationships and personal attributes of each user in a social network), these fields increasingly model their abundant data in high-dimensional spaces. The appearance of such high-order data makes the conventional two-dimensional, matrix-based way of describing data increasingly inadequate, so a tool capable of describing the high-order relationships in high-dimensional data is urgently needed.
The tensor, as the generalization of the matrix to high-dimensional spaces, is a better tool for describing the high-order relationships among multiple variables. As early as 1940, tensors were proposed in psychometrics, and later tensors were widely used in theoretical fields such as physics, numerical analysis, signal processing, and theoretical computer science. However, because a tensor is a high-dimensional array, tensor-based algorithms are often exponential in time complexity and require many iterations, which early computers could not complete at all.
With the development of hardware and software technologies, large servers have gradually ceased to be the first choice in industry because of cost, maintenance, and similar factors, and clusters built from ordinary PCs have gradually become the mainstream data processing platform. Following the development of the theoretical domain, tensors have again received a great deal of attention in the engineering domain because of their ability to describe and analyze high-order data. With the appearance of programming models such as MapReduce, algorithms that used to run on a single machine are now run in a distributed way across multiple machines, using their parallel computing capability to improve efficiency. The rise of such big data technologies as distributed storage and computation makes it possible to process large-scale data. At present, the commonly used distributed computing frameworks are Hadoop and Spark. Hadoop, based on the MapReduce programming model, is the most widely used distributed computing framework, but each MapReduce task in Hadoop must read from and write to disk before and after execution, and the large amount of disk I/O makes Hadoop unsuitable for scenarios with many iterations. The resilient distributed dataset (RDD) in Spark is stored in memory, so each iteration avoids the overhead of accessing the disk, which greatly improves iteration efficiency.
Tensor calculations are easy to parallelize, so problems that could not be processed in the early days can now be completed in a distributed fashion. The CP decomposition (CANDECOMP/PARAFAC decomposition) of a tensor, a key topic in tensor research, is also used more and more widely: it can extract topics implicit in data, remove noisy data, and reduce data dimensionality. Conventional CP decomposition algorithms are single-machine; although a program can be made to process larger-scale data by upgrading the machine's configuration, such upgrading is limited after all.
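As background, CP decomposition writes a tensor as a sum of R rank-one terms (outer products of factor-matrix columns). The following single-machine NumPy sketch is purely illustrative; the sizes, seed, and variable names are arbitrary assumptions, and it is not the distributed method of the invention:

```python
import numpy as np

# Build a rank-R third-order tensor from its CP factors:
# X = sum over r of the outer product of A[:, r], B[:, r], C[:, r].
np.random.seed(0)
I, J, K, R = 4, 5, 6, 2
A = np.random.rand(I, R)  # hypothetical factor matrices
B = np.random.rand(J, R)
C = np.random.rand(K, R)

# Sum of R rank-one outer products, term by term.
X = np.zeros((I, J, K))
for r in range(R):
    X += np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r])

# The same tensor assembled in a single einsum call.
X2 = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.allclose(X, X2))  # True
```

Recovering factor matrices A, B, C from a given tensor X is the CP decomposition problem that the method below solves in a distributed way.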
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a tensor CP decomposition implementation method based on a distributed environment, improving the efficiency of tensor CP decomposition based on the MapReduce and Spark technologies.
In order to achieve the above purpose, the tensor CP decomposition implementation method based on the distributed environment proceeds as follows. For an N-order tensor 𝒳 of rank R, initialize N factor matrices A(n); in each iteration, update each factor matrix A(n) in turn while the other factor matrices are fixed, and repeat the iteration until the value of the objective function is zero or less than a given threshold; the N factor matrices A(n) are then the result of the CP decomposition of the tensor 𝒳. The update formula of the factor matrix A(n) is:

A(n) = X(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1))(A(N)TA(N)*…*A(n+1)TA(n+1)*A(n-1)TA(n-1)*…*A(1)TA(1))†

where † denotes the pseudo-inverse. The factor matrix A(n) is updated by the following method:
S1: let the set D = {1,2,…,N} - {n}, arrange the elements of D in ascending order, and let the j-th element be dj, j = 1,2,…,N-1; let the matrix Y = X(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1)) and the matrix V = (A(N)TA(N)*…*A(n+1)TA(n+1)*A(n-1)TA(n-1)*…*A(1)TA(1))†, so that A(n) = YV;
S2: calculating Y ═ X by splitting Khatri-Rao product(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1)) The method comprises the following specific steps:
S2.1: initialize the rank index r = 1;
S2.2: initialize j = 1;
S2.3: Map: split the tensor along mode-dj to obtain its mode-dj fibers, taking the index tuple formed by all indices except the mode-dj index as the key and the corresponding tensor element as the value (the exact composition of the key differs according to whether n > dj or n < dj); performing the map operation distributes the mode-dj fibers of the tensor to the hosts of the Spark cluster; at the same time, the r-th column vector ar(dj) of the factor matrix A(dj) is distributed to each host of the Spark cluster as a broadcast variable;
S2.4: Reduce: after receiving the keyed elements and the column vector ar(dj), each host of the Spark cluster assembles the elements with the same key into a fiber and calculates the inner product of each fiber with the column vector ar(dj);
S2.5: judge whether j < N-1; if so, go to step S2.6, otherwise go to step S2.7;
S2.6: let j = j+1 and return to step S2.3;
S2.7: Map: each host of the Spark cluster performs a map operation with code1 as the key and its computed results as the value, where code1 is a preset code, so that the results calculated by all hosts are sent to one host;
S2.8: judge whether r < R; if so, go to step S2.9, otherwise go to step S2.10;
S2.9: let r = r+1 and return to step S2.2;
S2.10: merge the vectors ŷr obtained in the R loop iterations, taking ŷr as the r-th column vector of the matrix Y, thereby obtaining the matrix Y;
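The splitting performed by step S2 can be sketched on a single machine as follows (an illustrative NumPy sketch under the assumptions of a dense third-order tensor with n = 1 and arbitrary sizes; the Map/Reduce distribution over hosts is replaced by an in-process loop). Each column of Y is obtained by contracting the tensor with one column of each remaining factor matrix, so the large Khatri-Rao intermediate matrix is never materialized:

```python
import numpy as np

np.random.seed(1)
I1, I2, I3, R = 4, 5, 6, 2
X = np.random.rand(I1, I2, I3)   # tensor to decompose (updating mode 1)
A2 = np.random.rand(I2, R)       # factor matrices of modes 2 and 3
A3 = np.random.rand(I3, R)

# Split form: R rounds of mode products; no Khatri-Rao matrix is built.
Y_split = np.stack(
    [np.einsum('ijk,j,k->i', X, A2[:, r], A3[:, r]) for r in range(R)],
    axis=1)

# Reference: the explicit Khatri-Rao product A3 (KR) A2, whose row
# (k, j), with j varying fastest, matches column (j, k) of the
# mode-1 unfolding X_(1).
kr = np.einsum('kr,jr->kjr', A3, A2).reshape(I3 * I2, R)
X1 = X.reshape(I1, I2 * I3, order='F')  # mode-1 unfolding of X
Y_ref = X1 @ kr
print(np.allclose(Y_split, Y_ref))  # True
```

The split form touches only one column vector per factor matrix per round, which is what makes the per-round data small enough to broadcast.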
S3: calculate the matrix V = (A(N)TA(N)*…*A(n+1)TA(n+1)*A(n-1)TA(n-1)*…*A(1)TA(1))† by computing outer products in parallel; the specific method is as follows:
S3.1: initialize j = 1;
S3.2: calculate A(dj)TA(dj) with one round of MapReduce:
1) Map: first split the matrix A(dj) and distribute its row vectors to the hosts of the Spark cluster, i.e. perform a map operation with the row index i as the key and the i-th row vector ai(dj) as the value;
2) Reduce: after receiving the (i, ai(dj)) data, each host calculates the outer product (ai(dj))T ai(dj), sums all the outer products calculated on that host, and records the result as Um(dj), m = 1,2,…,M, where M represents the number of hosts of the Spark cluster;
3) Map: each host of the Spark cluster performs a map operation with code2 as the key and Um(dj) as the value, where code2 is a preset code;
S3.3: judge whether j < N-1; if so, go to step S3.4, otherwise go to step S3.5;
S3.4: let j = j+1 and return to step S3.2;
S3.5: calculate the matrix V from the N-1 results A(dj)TA(dj); the specific process is as follows:
1) Map: after a host has obtained A(dj)TA(dj) by calculation, it performs a map operation with code3 as the key and A(dj)TA(dj) as the value, where code3 is a preset code;
2) Reduce: the host that receives all the matrices A(dj)TA(dj) calculates their Hadamard product and then the pseudo-inverse of the result, obtaining the matrix V;
S4: partition the matrix Y and the matrix V into blocks, distribute the blocks of Y and V to the hosts of the Spark cluster with a Map operation, perform the matrix multiplication with a Reduce operation, then send the multiplication results to one host with a Map operation and merge them with a Reduce operation, obtaining A(n) = YV.
As described above, the tensor CP decomposition implementation method based on the distributed environment is based on the ALS algorithm: in each iteration, the update of the factor matrix A(n) first calculates Y by splitting the Khatri-Rao product, then calculates V by computing outer products in parallel, and finally obtains A(n) = YV by blocked matrix multiplication distributed with Map and merged with Reduce operations. Implementing tensor CP decomposition on top of the MapReduce and Spark technologies in this way can effectively improve the efficiency of tensor CP decomposition.
Drawings
FIG. 1 is a flowchart of an embodiment of updating a factor matrix in a distributed environment-based tensor CP decomposition implementation method according to the present invention;
FIG. 2 is a flow chart of the present invention for splitting the Khatri-Rao product calculation matrix Y;
FIG. 3 is a flow chart of the present invention for computing matrix V by parallel outer product computation;
FIG. 4 is a schematic flow chart of calculating A(dj)TA(dj) based on MapReduce in the present invention;
FIG. 5 is a graph of the runtime contrast of the present invention and the contrast method for different tensor sizes;
FIG. 6 is a graph of the runtime comparison of the present invention and the comparison method at different tensor densities.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the present invention. It should be expressly noted that in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.
To better explain the technical solution of the present invention, the tensor CP decomposition and the principle on which the present invention is based will be briefly explained.
For an N-order tensor 𝒳 of rank R, In denotes the dimension of the n-th order, n = 1,2,…,N. The goal is to compute a tensor 𝒳̂ of rank R that is nearest to 𝒳, i.e. to minimize ‖𝒳 - 𝒳̂‖, where ‖·‖ denotes the norm and 𝒳̂ = Σ(r = 1..R) ar(1) ∘ ar(2) ∘ … ∘ ar(N), with A(1),…,A(n-1),A(n),…,A(N) being the factor matrices of the tensor (ar(n) is the r-th column of A(n) and ∘ denotes the outer product).
The ALS (Alternating Least Squares) algorithm is a common algorithm for tensor CP decomposition at present. The method calculates each factor matrix in turn, fixing the other factor matrices during each calculation, so that each calculation is converted into the optimization problem shown by the following formula:

min over A(n) of ‖X(n) - A(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1))T‖   (1)

where X(n) denotes the mode-n matricization of the tensor 𝒳, i.e. the matrix obtained by unfolding the tensor along mode-n, the superscript T indicates transposition, and ⊙ indicates the Khatri-Rao product.
When the N factor matrices A(1),…,A(N) that minimize the objective function ‖X(n) - A(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1))T‖ are found, these N factor matrices are the result of the CP decomposition.
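For illustration, the mode-n matricization used above can be sketched in NumPy as follows (an assumed convention in which mode-n fibers become columns and lower modes vary fastest; the function name `unfold` is ours, not the patent's):

```python
import numpy as np

def unfold(X, n):
    # Mode-n matricization: move mode n to the front, then reshape in
    # Fortran order so the remaining (lower) modes vary fastest.
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1, order='F')

X = np.arange(24).reshape(2, 3, 4)
print(unfold(X, 0).shape)  # (2, 12)
print(unfold(X, 1).shape)  # (3, 8)
print(unfold(X, 2).shape)  # (4, 6)
```

Each unfolding X(n) is an In-by-(product of the other dimensions) matrix, which is why the Khatri-Rao factor in formula (1) has one row per combination of the remaining indices.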
In engineering, the CP decomposition of a tensor is calculated using the ALS algorithm as follows: initialize the N factor matrices A(n) and perform multiple iterative calculations; in each iteration the factor matrices are updated in turn, the other factor matrices being fixed during each update, and the iteration is repeated until the value of the objective function is zero or less than a given threshold. Each iteration requires updating all the factor matrices; each factor matrix A(n) is updated using the following formula:

A(n) = X(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1))(A(N)TA(N)*…*A(n+1)TA(n+1)*A(n-1)TA(n-1)*…*A(1)TA(1))†   (2)

where † denotes the pseudo-inverse and * the Hadamard product. Thus, in each calculation, the corresponding factor matrix is updated according to formula (2). To parallelize and distribute the tensor decomposition algorithm, a distributed algorithm needs to be designed to complete the update of the factor matrix A(n).
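A single-machine NumPy sketch of one update by formula (2), for a third-order tensor, follows. It is illustrative only: the helper names `unfold`, `khatri_rao`, and `als_update` are ours, the tensor is dense, and no Spark distribution is involved; for an exactly rank-R tensor with the other factors held at their true values, the update recovers the remaining factor:

```python
import numpy as np

def unfold(X, n):
    # mode-n matricization: mode-n fibers become columns, lower modes fastest
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1, order='F')

def khatri_rao(A, B):
    # A (KR) B: (I_A * I_B) x R, row (i_A, i_B) with i_B varying fastest
    return np.einsum('ar,br->abr', A, B).reshape(-1, A.shape[1])

def als_update(X, factors, n):
    others = [factors[k] for k in range(len(factors)) if k != n]
    rev = others[::-1]                    # A(N) down to A(1), skipping n
    kr = rev[0]
    for M in rev[1:]:
        kr = khatri_rao(kr, M)
    # Pseudo-inverse of the Hadamard product of the small Gram matrices.
    V = np.linalg.pinv(np.multiply.reduce([M.T @ M for M in others]))
    return unfold(X, n) @ kr @ V          # formula (2)

np.random.seed(2)
factors = [np.random.rand(s, 2) for s in (3, 4, 5)]
X = np.einsum('ir,jr,kr->ijk', *factors)  # an exactly rank-2 tensor
print(np.allclose(als_update(X, factors, 0), factors[0]))  # True
```

The distributed method of the invention computes exactly these three pieces (the product with the Khatri-Rao factor, the matrix V, and their multiplication) but splits each across the cluster.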
Examples
The invention is based on the ALS algorithm: it improves the update of the factor matrix in each iteration, realizing the update in a distributed environment so as to improve the efficiency of tensor CP decomposition. FIG. 1 is a flowchart of an embodiment of updating a factor matrix in the distributed-environment-based tensor CP decomposition implementation method of the present invention. As shown in FIG. 1, the specific steps of updating a factor matrix in the method of the present invention are as follows:
S101: data arrangement:
Let the set D = {1,2,…,N} - {n} and arrange its elements in ascending order, letting the j-th element be dj; clearly D has N-1 elements, i.e. j = 1,2,…,N-1. Let the matrix Y = X(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1)) and the matrix V = (A(N)TA(N)*…*A(n+1)TA(n+1)*A(n-1)TA(n-1)*…*A(1)TA(1))†, so that A(n) = YV.
S102: splitting the Khatri-Rao product to calculate a matrix Y:
In equation (2), Y = X(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1)) needs to be calculated first, and A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1) is the Khatri-Rao product that leads to a surge in intermediate data. Calculating X(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1)) is in fact calculating n-mode products of the tensor with the column vectors of the factor matrices, so by the above analysis the calculation of X(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1)) can be converted into the following algorithm: for r = 1,2,…,R,

ŷrT = (𝒳 ×d1 ar(d1) ×d2 ar(d2) … ×dN-1 ar(dN-1))T

where ar(dj) denotes the r-th column vector of the factor matrix A(dj), ×dj denotes the dj-mode product, and ŷr is the r-th column vector of the calculation result Y = X(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1)).
The above method is illustrated below with a third-order tensor 𝒳 of rank 2. Suppose the factor matrix to be updated this time is A(1), that the frontal slices of the third-order tensor 𝒳 (of size 2 × 3 × 2, as in Table 1 below) are respectively:

X1 = [[1, 3, 5], [7, 9, 11]],  X2 = [[2, 4, 6], [8, 10, 12]]

and that the A(2) and A(3) obtained in the last update are respectively:
Let r be 1, then:
let r be 2, then:
It can be seen that the above calculation process requires R iterations, with N-1 n-mode product calculations performed in each iteration; the calculation of an n-mode product does not require a large amount of storage and can be conveniently realized using MapReduce.
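The Map/Reduce realization of one such n-mode product can be simulated in-process as follows (a toy sketch, not Spark code; keys and sizes are arbitrary assumptions): Map emits each tensor element keyed by its remaining indices, so elements sharing a key form one mode-dj fiber, and Reduce takes each fiber's inner product with the broadcast column vector:

```python
import numpy as np
from collections import defaultdict

np.random.seed(3)
X = np.random.rand(2, 3, 2)   # small 3rd-order tensor (hypothetical)
a = np.random.rand(3)         # broadcast column vector for mode 2

# Map: key = (i1, i3), value = (i2, element); same key = same fiber.
groups = defaultdict(dict)
for (i1, i2, i3), v in np.ndenumerate(X):
    groups[(i1, i3)][i2] = v

# Reduce: assemble each fiber and compute its inner product with a.
T = np.zeros((2, 2))
for (i1, i3), fiber in groups.items():
    T[i1, i3] = sum(fiber[i2] * a[i2] for i2 in fiber)

print(np.allclose(T, np.einsum('ijk,j->ik', X, a)))  # True
```

Only key/value pairs and one broadcast vector move between "hosts" here, which is the point of the splitting.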
Based on the above analysis, the specific process by which the invention splits the Khatri-Rao product to calculate Y = X(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1)) can be obtained. FIG. 2 is a flow chart of splitting the Khatri-Rao product to calculate the matrix Y in the present invention. As shown in FIG. 2, the specific process is as follows:
S201: initialize the rank index r = 1.
S202: initialize j = 1.
S203: split the tensor:
Map: split the tensor along mode-dj to obtain its mode-dj fibers, taking the index tuple formed by all indices except the mode-dj index as the key and the corresponding tensor element as the value (the exact composition of the key differs according to whether n > dj or n < dj). The elements with the same key constitute one mode-dj fiber of the tensor, so performing the map operation distributes the fibers of the tensor to the hosts of the Spark cluster. At the same time, using Spark's shared-variable feature, the r-th column vector ar(dj) of the factor matrix A(dj) is distributed to each host of the Spark cluster as a broadcast variable.
Since Spark is used for the calculation in the invention, all tensors except the original tensor 𝒳 are in fact already distributed across different hosts, so the tensor splitting, i.e. the Map operation, is actually completed jointly by multiple hosts.
S204: calculate the inner products of the fibers and the column vector:
Reduce: after receiving the keyed elements and the column vector ar(dj), each host of the Spark cluster assembles the elements with the same key into a fiber and calculates the inner product of the fiber with the column vector ar(dj). The different hosts can each perform this calculation from the data distributed to them.
S205: judge whether j < N-1; if so, go to step S206; otherwise, go to step S207.
S206: let j = j+1 and return to step S203.
S207: Map: each host of the Spark cluster performs a map operation with code1 as the key and its computed results as the value, where code1 is a preset code; that is, the results calculated by the hosts are mapped to one host.
S208: judge whether r < R; if so, go to step S209; otherwise, go to step S210.
S209: let r = r+1 and return to step S202.
S210: merge the vectors:
Merge the vectors ŷr obtained in the R loop iterations, taking ŷr as the r-th column vector of the matrix Y, thereby obtaining the matrix Y.
The above process is illustrated using the same rank-2 third-order tensor 𝒳 as before. Again assume that the factor matrix to be updated this time is A(1), that the frontal slices of the third-order tensor 𝒳 are as given above, and that A(2) and A(3) were obtained in the last update.
Since the factor matrix A(1) needs to be updated, i.e. n = 1, the set D = {2, 3}.
Map: since d1 = 2, the tensor 𝒳 is split by mode-2, yielding 4 fibers; since 𝒳 is of order 3, each fiber is actually a vector. The key of each element is i1+i3. Table 1 shows the result of splitting 𝒳 by mode-2 in this example.
key  | element values (fiber)
1+1  | 1, 3, 5
1+2  | 2, 4, 6
2+1  | 7, 9, 11
2+2  | 8, 10, 12
TABLE 1
Each row of element values in Table 1 constitutes one fiber. A map operation is performed for each element in Table 1 with i1+i3 as the key and the element value as the value, distributing the elements, and hence the fibers, to the hosts of the Spark cluster. The column vector a1(2) of A(2) is distributed to each host of the Spark cluster as a broadcast variable.
Reduce: the hosts that obtained data each calculate the inner product of one of the 4 fibers with a1(2), yielding the 4 values 6, 8, 18 and 20. These 4 values form a 2nd-order tensor of size I1 × I3, whose element with key i1+i3 is the corresponding inner product.
Let j = 2.
Map: since d2 = 3, the intermediate tensor must be split by mode-3. Its elements, i.e. the calculation results of the last Reduce, are already dispersed across the hosts of the Spark cluster, so each host directly performs the map operation, taking the remaining index as the key and the element as the value. Obviously, the fibers sent this time are [6 18] and [8 20]. The column vector a1(3) of A(3) is distributed to each host of the Spark cluster as a broadcast variable.
Reduce: the hosts that obtained data calculate the inner products of the 2 fibers with a1(3), yielding the 2 values 24 and 28. These 2 values form a 1st-order tensor of size I1, i.e. the vector ŷ1; the values are mapped to the same host and merged to obtain the vector ŷ1.
S103: calculate the matrix V by computing outer products in parallel:
Next the calculation of the matrix V = (A(N)TA(N)*…*A(n+1)TA(n+1)*A(n-1)TA(n-1)*…*A(1)TA(1))† is analyzed. Clearly the key step is to be able to efficiently calculate (A(1)TA(1)*...*A(n-1)TA(n-1)*...*A(N)TA(N)); because the result of this expression is a matrix of size R × R, and R is typically a small value, the computation of the pseudo-inverse is quite fast and easy. (A(1)TA(1)*...*A(n-1)TA(n-1)*...*A(N)TA(N)) can be calculated from left to right as, respectively, the product of the transpose of each matrix with itself; since each such result A(dj)TA(dj) is a matrix of size R × R, the Hadamard products of the N-1 matrices of size R × R are finally calculated, thus completing the calculation of the expression. Calculating A(dj)TA(dj) is just the process of calculating the outer product of the transpose of each row of the matrix A(dj) with that row and adding the results of all the outer products. Thus, the calculation of A(dj)TA(dj) can be described by the following algorithm:

A(dj)TA(dj) = Σi (ai(dj))T ai(dj)

where ai(dj) represents the i-th row vector of A(dj), and each outer product (ai(dj))T ai(dj) is an R × R matrix.
The above algorithm is illustrated by taking a matrix A of size 3 × 2 as an example:
When i = 1:
When i = 2:
When i = 3:
Obviously, the result obtained by using the above algorithm is the same as that of directly calculating ATA.
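The identity can be checked quickly in NumPy (an arbitrary 3 × 2 matrix is used here; the example values of the patent's matrix A are not reproduced):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])

# Sum of the outer products of each row's transpose with that row.
S = sum(np.outer(A[i], A[i]) for i in range(A.shape[0]))

print(np.array_equal(S, A.T @ A))  # True
```

Because each summand depends on a single row, the sum can be formed in any order, which is what permits the parallel, per-host accumulation described next.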
In the above algorithm, the outer product of the transpose of each row of the matrix A(dj) with that row must be calculated, thus requiring multiple iterations. Observing the calculation in each iteration, i.e. (ai(dj))T ai(dj), it can be found that the data used to compute each outer product is a single row vector of the matrix A(dj), so each outer product calculation can be completed independently; meanwhile, the matrix obtained from each outer product has size R × R, which is very small and does not occupy a large amount of storage space. Based on this finding, the invention splits the matrix A(dj), distributes its row vectors to the machines of the cluster, and executes the outer product calculations in parallel, thereby improving efficiency. Here it must be considered that the matrix A(dj) has Idj rows, which may be a large number, so the computation of the outer products cannot be completed by only one reducer: partial outer products need to be merged in advance to reduce the pressure on the reducer and improve computational efficiency, and finally all the outer products are merged on one reducer, where the calculation of the pseudo-inverse is also completed. The MapReduce implementation of this step is divided into two MapReduce jobs, computing the outer products and merging all the outer products, and each MapReduce job is divided into two phases, Map and Reduce respectively.
From the above analysis, the specific process by which the invention calculates the matrix V can be obtained. FIG. 3 is a flow chart of computing the matrix V by computing outer products in parallel in the present invention. As shown in FIG. 3, the specific steps of calculating the matrix V by parallel outer products include:
S301: initialize j = 1.
S302: calculate A(dj)TA(dj) using two MapReduce steps. FIG. 4 is a schematic flow chart of calculating A(dj)TA(dj) based on MapReduce in the present invention. As shown in FIG. 4, the specific method is:
1) Map: first split the matrix A(dj) and distribute its row vectors to the hosts of the Spark cluster, i.e. perform a map operation with the row index i as the key and the i-th row vector ai(dj) as the value.
2) Reduce: after receiving the (i, ai(dj)) data, each host calculates the outer product (ai(dj))T ai(dj), sums all the outer products calculated on that host, and records the result as Um(dj), m = 1,2,…,M, where M represents the number of hosts of the Spark cluster. Because Idj is usually large, the outer products of more than one row vector are calculated on each host, so each host merges the partial outer products calculated on it in advance to reduce the subsequent workload.
3) Map: each host of the Spark cluster performs a map operation with code2 as the key and Um(dj) as the value, where code2 is a preset code; that is, the partial sums Um(dj) calculated by the hosts are mapped to one host.
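The two-stage pattern of step S302 can be imitated in-process as follows (an illustrative sketch: the M "hosts" are simulated by splitting the rows into chunks, and all names and sizes are assumptions):

```python
import numpy as np

np.random.seed(4)
I, R, M = 10, 2, 3
A = np.random.rand(I, R)   # rows of A^(dj), hypothetical sizes

# Stage 1 (per-host Reduce): each simulated host pre-merges the outer
# products of the row vectors assigned to it.
partials = [sum(np.outer(row, row) for row in chunk)
            for chunk in np.array_split(A, M)]

# Stage 2 (final merge): one "reducer" adds the M partial R x R sums.
gram = sum(partials)
print(np.allclose(gram, A.T @ A))  # True
```

Pre-merging keeps the data sent to the final reducer at M small R × R matrices rather than Idj of them.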
S303: judge whether j < N-1; if so, go to step S304; otherwise, go to step S305.
S304: let j = j+1 and return to step S302.
S305: calculate the matrix V:
The matrix V is calculated from the N-1 results A(dj)TA(dj); the specific process is as follows:
1) Map: after a host has obtained A(dj)TA(dj) by calculation, it performs a map operation with code3 as the key and A(dj)TA(dj) as the value, where code3 is likewise a preset code.
2) Reduce: the host that receives all the matrices A(dj)TA(dj) calculates their Hadamard product and then the pseudo-inverse of the result, obtaining the matrix V. As analyzed above, since each A(dj)TA(dj) is an R × R matrix, the calculation is relatively simple and can therefore be completed with one Reduce.
S104: computing based on distributed caches(n):
When calculating the matrix multiplication A(n) = YV, one key factor to consider is whether the storage of a single machine can accommodate both matrices and operate efficiently. Y = X(n)(A(N)⊙…⊙A(n+1)⊙A(n-1)⊙…⊙A(1)) is a matrix of size In × R, where R is usually a small number but In is often so large that the memory of a single machine cannot accommodate the matrix. If a disk is used for auxiliary storage (for example, in a swap-partition mode), a large amount of disk I/O is generated while the program runs, greatly affecting operating efficiency. The matrix V is of size R × R; since R is a small number, this matrix can be stored entirely in the memory of a single machine. From the above analysis it can be concluded that A(n) = YV is a large matrix multiplication with severe data skew, and common matrix multiplication methods cannot be applied to it.
For these reasons, the invention partitions the matrix Y and the matrix V into blocks, distributes the blocks of Y and V to the hosts of the Spark cluster with a Map operation, performs the matrix multiplication with a Reduce operation, then sends the multiplication results to one host with a Map operation and merges them with a Reduce operation to obtain A(n) = YV, thereby enabling more efficient calculation.
For matrix blocking, the common ways are division by rows, division by columns, and division by both rows and columns. The research of the invention finds that, because of the small scale of the matrix V, also partitioning the matrix V is not the optimal way. Therefore, it is preferable to block only the matrix Y, i.e. the blocking method for the matrix Y and the matrix V is as follows:
partition the matrix Y by rows into block matrices with R columns, the row size of each block being set according to actual needs; the block matrix of the matrix V is the matrix V itself, i.e. V is not partitioned.
With this blocking, the specific process of calculating A(n) based on distributed caching is as follows:
1) Map: first split the matrix Y, performing a map operation with the row index k as the key and the row vector yk as the value, k = 1,2,…,In, and distributing the row vectors yk to the hosts of the Spark cluster. At the same time, set the matrix V as a Spark broadcast variable and distribute it to each host of the Spark cluster.
2) Reduce: each host of the Spark cluster that receives a row vector yk and the matrix V calculates Ak(n) = ykV.
3) Map: after a host has obtained Ak(n) by calculation, it performs a map operation with code4 as the key and Ak(n) as the value, where code4 is a preset code.
4) Reduce: the host that receives the In results Ak(n) takes each Ak(n) as the k-th row of A(n), obtaining the factor matrix A(n).
In order to better illustrate the technical effect of the invention, a specific example is used to verify the invention experimentally and to compare it with an existing tensor CP decomposition method. In the experimental verification, the Spark cluster contains 10 hosts, and the tensor data uses the NELL data source of CMU (Carnegie Mellon University), which originates from CMU's "Read the Web" project and includes a large number of categories and relationships. Because real-world data are generally sparse, in order to test the performance of the tensor decomposition algorithm on data of different degrees of sparsity, third-order tensors of different sizes and different densities are randomly generated in the experiment in addition to the NELL full data. Table 2 is the data set description in this example.
TABLE 2
The comparison method adopted in the experimental verification is a traditional tensor CP decomposition tool, the MATLAB (Matrix Laboratory) Tensor Toolbox Version 2.6. The Tensor Toolbox was implemented by Tamara G. Kolda of the Sandia National Laboratories in the United States and provides CP decomposition, Tucker decomposition, and matrix calculation operations for dense tensors, sparse tensors, and structured tensors, but it does not support distributed tensor decomposition. In this experimental verification, the rank R of the tensor is set to 10 when performing CP decomposition.
First, the tensor density is fixed and the running time of the invention and the comparison method is tested at different tensor sizes. The tensors in the experiment grow gradually from I = J = K = 10^3 to I = J = K = 10^8, with the number of non-zero elements being 10 × I. FIG. 5 is a graph comparing the running time of the invention and the comparison method at different tensor sizes. As shown in FIG. 5, the running time of the comparison method increases with the scale of the tensor, and when the tensor size exceeds I = J = K = 10^6, the comparison method cannot complete the CP decomposition of the tensor because of the CPU and memory limitations of a single machine (mainly the memory). The running time of the invention is stable for tensor sizes I = J = K = 10^3 to 10^6, because when the tensor scale is not large, task scheduling and network data transmission occupy most of the program's running time, and this part is relatively stable. When the tensor size exceeds I = J = K = 10^6, the running time of the invention begins to increase; when the tensor size reaches I = J = K = 10^8, the growth of the running time is an order of magnitude higher: at this point the memory occupation of the cluster reaches its peak, Spark begins to use the disk swap partition to store part of the data, and some temporarily unneeded RDDs are also cleared from memory and recalculated from their lineage when needed later; these two factors obviously increase the running time. Although the running time of the invention increases when CP-decomposing a large-scale tensor, it is acceptable for engineering applications.
Then, the size of the tensor is fixed, and the running time of the present invention and the comparison method is tested for different tensor densities. The tensor size used for the test is I = J = K = 10^5. The density of the tensor increases from 10^-9 to 10^-5, so the number of non-zero elements increases from 10^6 to 10^10. FIG. 6 is a comparison of the running time of the present invention and the comparison method at different tensor densities. As shown in FIG. 6, as the density of the tensor increases from 10^-9 to 10^-7, the running time of the comparison method increases substantially linearly; when the density exceeds 10^-7, the comparison method cannot complete the CP decomposition of the tensor. For tensor densities of 10^-9 to 10^-6, the running time of the present invention grows steadily; when the tensor density increases to 10^-5, the program execution time increases significantly. Because the present invention stores the tensor in sparse form, the number of non-zero elements grows with the density of the tensor, and more memory is needed on Spark to store the RDDs; if memory is insufficient, the mechanisms of discarding temporarily unneeded RDDs and using the disk swap partition are triggered, which causes extra operation cost and increases the running time. Although the running time of the present invention increases when CP-decomposing a high-density tensor, it is acceptable for engineering applications.
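The sparse-storage scheme mentioned above can be illustrated with a minimal coordinate-format (COO) sketch in plain Python. The class name and its fields are illustrative stand-ins, not the patent's actual implementation; the point is only that memory grows with the number of non-zero elements rather than with I × J × K:

```python
# Minimal coordinate-format (COO) sparse tensor: only non-zero entries
# are kept, so memory grows with the number of non-zeros, not with I*J*K.
# Illustrative sketch, not the patent's actual data structure.
class SparseTensor:
    def __init__(self, shape):
        self.shape = shape          # e.g. (I, J, K)
        self.data = {}              # (i, j, k) -> value

    def set(self, index, value):
        if value != 0.0:            # zeros are simply not stored
            self.data[index] = value

    def get(self, index):
        return self.data.get(index, 0.0)   # absent entries are zero

    def nnz(self):
        return len(self.data)              # number of non-zero elements

t = SparseTensor((10**5, 10**5, 10**5))
t.set((0, 1, 2), 3.5)
t.set((4, 4, 4), -1.0)
```

With a density of 10^-9 this tensor would hold about 10^6 entries, which fits in memory even though the dense tensor would have 10^15 cells.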
In conclusion, the tensor CP decomposition method of the present invention requires less running time than the traditional method, can break through the limitations of the software and hardware of a single machine, realizes CP decomposition of large-scale and high-density tensors, and maintains good timeliness.
Although illustrative embodiments of the present invention have been described above to facilitate understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are permissible as long as they remain within the spirit and scope of the present invention as defined by the appended claims, and all matter utilizing the inventive concept is protected.
Claims (2)
1. A tensor CP decomposition implementation method based on a distributed environment, for an N-order tensor 𝒳 ∈ ℝ^(I₁×I₂×…×I_N) with rank R, where I_n is the dimension of the n-th order, n = 1, 2, …, N, and 𝒳 ≈ ⟦A^(1), A^(2), …, A^(N)⟧, wherein A^(1), …, A^(n-1), A^(n), …, A^(N) are the factor matrices of the tensor; the N factor matrices A^(n) are initialized and then alternately updated: at each iteration one factor matrix A^(n) is updated while the other factor matrices are kept fixed, and iteration is repeated until the value of the objective function is zero or less than a given threshold, whereupon the N factor matrices A^(n) are the result of the CP decomposition of the tensor 𝒳, wherein the update formula of the factor matrix A^(n) is:

A^(n) = X_(n)(A^(N) ⊙ … ⊙ A^(n+1) ⊙ A^(n-1) ⊙ … ⊙ A^(1))(A^(N)ᵀA^(N) ∗ … ∗ A^(n+1)ᵀA^(n+1) ∗ A^(n-1)ᵀA^(n-1) ∗ … ∗ A^(1)ᵀA^(1))^†

wherein the superscript † indicates the pseudo-inverse, ⊙ indicates the Khatri-Rao product, and ∗ indicates the Hadamard product;
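For concreteness, for a third-order tensor (N = 3) and n = 1, the standard ALS update formula above specializes to:

```latex
A^{(1)} = X_{(1)}\left(A^{(3)} \odot A^{(2)}\right)
          \left(A^{(3)\mathsf{T}}A^{(3)} \ast A^{(2)\mathsf{T}}A^{(2)}\right)^{\dagger}
```

Here X_{(1)} is the mode-1 unfolding of the tensor, and the two bracketed factors are exactly the matrices Y and V computed by steps S2 and S3 below.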
characterized in that the factor matrix A^(n) is updated by the following method:
S1: let the set D = {1, 2, …, N} − {n}, arrange the elements of D in ascending order, and let the j-th element be d_j, j = 1, 2, …, N−1; let the matrix Y = X_(n)(A^(N) ⊙ … ⊙ A^(n+1) ⊙ A^(n-1) ⊙ … ⊙ A^(1)) and the matrix V = (A^(N)ᵀA^(N) ∗ … ∗ A^(n+1)ᵀA^(n+1) ∗ A^(n-1)ᵀA^(n-1) ∗ … ∗ A^(1)ᵀA^(1))^†;
S2: calculating Y = X_(n)(A^(N) ⊙ … ⊙ A^(n+1) ⊙ A^(n-1) ⊙ … ⊙ A^(1)) by splitting the Khatri-Rao product, the specific steps being as follows:
S2.1: initializing the rank serial number r to 1;
S2.3: Map: the tensor obtained from the previous splitting (initially the tensor 𝒳 itself) is split according to mode-d_j to obtain its mode-d_j fibers; the indices remaining after the mode-d_j index is removed serve as the key and the fiber elements serve as the values, and a map operation distributes the fibers of the tensor to the hosts of the Spark cluster; at the same time, the column vector a_r^(d_j) of the factor matrix A^(d_j) is distributed to every host of the Spark cluster as a broadcast variable;
S2.4: Reduce: after receiving the (key, value) fiber data and the column vector a_r^(d_j), each host of the Spark cluster assembles the values sharing the same key into a fiber and calculates the inner product of the fiber and the column vector according to the following formula:

⟨fiber, a_r^(d_j)⟩ = Σ_i fiber(i) · a_r^(d_j)(i),

which yields one element of the tensor used in the next round, keyed by the remaining indices;
S2.5: judging whether j is less than N-1, if so, entering a step S2.6, otherwise, entering a step S2.7;
S2.6: letting j = j + 1 and returning to step S2.3;
S2.7: Map: each host of the Spark cluster performs a map operation with code1 as the key and the vector y_r obtained from the calculations of steps S2.3–S2.4 as the value, where code1 is a preset code;
S2.8: judging whether R is less than R, if so, entering step S2.9, otherwise, entering step S2.10;
S2.9: letting r = r + 1 and returning to step S2.3;
S2.10: merging the vectors y_r obtained by the R rounds of the loop, taking y_r as the r-th column vector of the matrix Y, thereby obtaining the matrix Y;
S3: calculating the matrices (A^(d_j))ᵀA^(d_j) required by the matrix V in a parallel outer-product manner, the specific method being as follows:
S3.1: initializing j to 1;
S3.2: 1) Map: splitting the matrix A^(d_j) into its row vectors and distributing them to the hosts of the Spark cluster, that is, performing a map operation with the row serial number i as the key and the row vector a_i^(d_j) as the value;
2) Reduce: after receiving the (i, a_i^(d_j)) data, each host of the Spark cluster calculates a_i^(d_j) ∘ a_i^(d_j), where ∘ represents the outer product; each host sums all the outer products calculated on it and records the result as V_m^(d_j), m = 1, 2, …, M, where M represents the number of hosts of the Spark cluster;
3) Map: each host of the Spark cluster performs a map operation with code2 as the key and V_m^(d_j) as the value, where code2 is a preset code;
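Steps 1)–2) above can be sketched as follows: the rows of the factor matrix are scattered across M simulated hosts, each host sums the outer products of the rows it received, and the M partial sums add up to AᵀA. The function names and the modulo host assignment are illustrative assumptions:

```python
def outer(u, v):
    """Outer product of two vectors as a nested-list matrix."""
    return [[ui * vj for vj in v] for ui in u]

def mat_add(P, Q):
    """Elementwise sum of two equally-sized matrices."""
    return [[p + q for p, q in zip(pr, qr)] for pr, qr in zip(P, Q)]

def gram_by_outer_products(A, M=2):
    """Compute A^T A as a sum of per-host partial sums of row outer
    products, mimicking the Map (scatter rows keyed by row number i)
    and Reduce (each host sums outer products of its rows) steps."""
    R = len(A[0])
    partials = [[[0.0] * R for _ in range(R)] for _ in range(M)]
    for i, row in enumerate(A):          # Map: key = row serial number i
        host = i % M                     # illustrative host assignment
        partials[host] = mat_add(partials[host], outer(row, row))
    total = [[0.0] * R for _ in range(R)]
    for P in partials:                   # combine the M partial sums V_m
        total = mat_add(total, P)
    return total

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 x 2 factor matrix, R = 2
G = gram_by_outer_products(A)              # equals A^T A
```

Because the outer products are summed per host first, only M small R × R matrices cross the network instead of every row's outer product.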
S3.3: judging whether j is less than N−1: if so, entering step S3.4, otherwise entering step S3.5;
S3.4: letting j = j + 1 and returning to step S3.2;
S3.5: calculating the matrix V from the N−1 calculation results (A^(d_j))ᵀA^(d_j), the specific process being as follows:
1) Map: after obtaining its calculation result, each host performs a map operation with code3 as the key and the result as the value, where code3 is a preset code;
2) Reduce: the host receiving the data calculates the Hadamard product of all the matrices (A^(d_j))ᵀA^(d_j), and then calculates the pseudo-inverse of the Hadamard-product result, obtaining the matrix V;
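The elementwise part of this Reduce can be sketched as follows; this is a minimal sketch under illustrative names, and the final pseudo-inverse would be delegated to a linear-algebra routine, so it is omitted here:

```python
from functools import reduce

def hadamard(P, Q):
    """Elementwise (Hadamard) product of two equally-sized matrices."""
    return [[p * q for p, q in zip(pr, qr)] for pr, qr in zip(P, Q)]

# Two illustrative R x R Gram matrices (R = 2), e.g. A^(2)T A^(2)
# and A^(3)T A^(3) gathered on the reducing host.
G2 = [[35.0, 44.0], [44.0, 56.0]]
G3 = [[2.0, 1.0], [1.0, 2.0]]

H = reduce(hadamard, [G2, G3])   # Hadamard product of all the Grams
# The matrix V of the method would then be pinv(H), an R x R
# pseudo-inverse that is cheap because R is small (e.g. R = 10).
```

Note that H is only R × R regardless of the tensor size, which is why this final step can run on a single host.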
S4: partitioning the matrix Y and the matrix V, distributing the partitioned matrices corresponding to the matrix Y and the matrix V to the hosts of the Spark cluster by a Map operation, carrying out the matrix multiplications by a Reduce operation, sending the multiplication results to one host by a Map operation, and merging them by a Reduce operation to obtain A^(n) = YV.
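Step S4, together with the partitioning scheme of claim 2 (Y split by rows, V kept whole), can be sketched as a row-block multiply: each simulated host multiplies its block of Y by the broadcast matrix V, and the partial products are concatenated. The function names and block size are illustrative assumptions:

```python
def matmul(P, Q):
    """Plain triple-loop matrix product of nested-list matrices."""
    return [[sum(p * q for p, q in zip(row, col))
             for col in zip(*Q)] for row in P]

def blocked_multiply(Y, V, block_rows=2):
    """Map: split Y into row blocks and pair each block with V.
    Reduce: every host multiplies its block by the broadcast V.
    Map+Reduce: gather the partial products and concatenate them."""
    blocks = [Y[i:i + block_rows] for i in range(0, len(Y), block_rows)]
    partial = [matmul(B, V) for B in blocks]     # per-host products
    return [row for P in partial for row in P]   # merge into A^(n)

Y = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]   # I_n x R
V = [[1.0, 2.0], [3.0, 4.0]]               # R x R, a single block
A_n = blocked_multiply(Y, V)               # equals Y @ V
```

Row-blocking works here because V has only R columns, so every host's output block is independent of the others and no cross-block summation is needed.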
2. The tensor CP decomposition implementation method according to claim 1, wherein the partitioning method for the matrix Y and the matrix V in S4 is: the matrix Y is partitioned by rows into partitioned matrices each having R columns, and the matrix V as a whole serves as the corresponding partitioned matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711426277.0A CN108170639B (en) | 2017-12-26 | 2017-12-26 | Tensor CP decomposition implementation method based on distributed environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108170639A CN108170639A (en) | 2018-06-15 |
CN108170639B true CN108170639B (en) | 2021-08-17 |
Family
ID=62520749
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711426277.0A Active CN108170639B (en) | 2017-12-26 | 2017-12-26 | Tensor CP decomposition implementation method based on distributed environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108170639B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11567816B2 (en) | 2017-09-13 | 2023-01-31 | Hrl Laboratories, Llc | Transitive tensor analysis for detection of network activities |
CN109299725B (en) * | 2018-07-27 | 2021-10-08 | 华中科技大学鄂州工业技术研究院 | Prediction system and device for parallel realization of high-order principal eigenvalue decomposition by tensor chain |
US10796225B2 (en) * | 2018-08-03 | 2020-10-06 | Google Llc | Distributing tensor computations across computing devices |
CN110362780B (en) * | 2019-07-17 | 2021-03-23 | 北京航空航天大学 | Large data tensor canonical decomposition calculation method based on Shenwei many-core processor |
CN111276183B (en) * | 2020-02-25 | 2023-03-21 | 云南大学 | Tensor decomposition processing method based on parameter estimation |
CN111461193B (en) * | 2020-03-25 | 2023-04-18 | 中国人民解放军国防科技大学 | Incremental tensor decomposition method and system for open source event correlation prediction |
EP4185970A1 (en) * | 2020-07-22 | 2023-05-31 | HRL Laboratories, LLC | Transitive tensor analysis for detection of network activities |
CN112835552A (en) * | 2021-01-26 | 2021-05-25 | 算筹信息科技有限公司 | Method for solving inner product of sparse matrix and dense matrix by outer product accumulation |
CN115146780B (en) * | 2022-08-30 | 2023-07-11 | 之江实验室 | Quantum tensor network transposition and contraction cooperative method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105260554A (en) * | 2015-10-27 | 2016-01-20 | 武汉大学 | GPU cluster-based multidimensional big data factorization method |
CN105913085A (en) * | 2016-04-12 | 2016-08-31 | 中国科学院深圳先进技术研究院 | Tensor model-based multi-source data classification optimizing method and system |
CN107015946A (en) * | 2016-01-27 | 2017-08-04 | 常州普适信息科技有限公司 | Distributed high-order SVD and its incremental computations a kind of method |
Non-Patent Citations (3)
Title |
---|
Tensor Decomposition for Signal Processing and Machine Learning; Nicholas D. Sidiropoulos et al.; arXiv:1607.01668v2; 2016-12-14; pp. 1-44 *
Research on Tensor Decomposition Algorithms in a Distributed Environment (分布式环境下的张量分解算法研究); adsuhviusa; http://www.doc88.com/p-3197463411177.html; 2017-12-04; pp. 6-49 *
Research on Data Structures for the Multi-core Era Based on Shared Memory (基于共享内存的多核时代数据结构研究); Zhou Wei et al.; Journal of Software (软件学报); 2016-04; Vol. 27, No. 4; pp. 1009-1025 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108170639B (en) | Tensor CP decomposition implementation method based on distributed environment | |
Lu et al. | SpWA: An efficient sparse winograd convolutional neural networks accelerator on FPGAs | |
Smith et al. | SPLATT: Efficient and parallel sparse tensor-matrix multiplication | |
Albericio et al. | Cnvlutin: Ineffectual-neuron-free deep neural network computing | |
CN109328361B (en) | Accelerator for deep neural network | |
Dang et al. | CUDA-enabled Sparse Matrix–Vector Multiplication on GPUs using atomic operations | |
JP2016119084A (en) | Computer-implemented system and method for efficient sparse matrix representation and processing | |
Ma et al. | Optimizing sparse tensor times matrix on GPUs | |
WO2012076379A2 (en) | Data structure for tiling and packetizing a sparse matrix | |
CN109033030B (en) | Tensor decomposition and reconstruction method based on GPU | |
US20200159810A1 (en) | Partitioning sparse matrices based on sparse matrix representations for crossbar-based architectures | |
WO2012076377A2 (en) | Optimizing output vector data generation using a formatted matrix data structure | |
Rungsawang et al. | Fast pagerank computation on a gpu cluster | |
Conte et al. | GPU-acceleration of waveform relaxation methods for large differential systems | |
D’Amore et al. | Mathematical approach to the performance evaluation of matrix multiply algorithm | |
Gu et al. | Efficient large scale distributed matrix computation with spark | |
US20180373677A1 (en) | Apparatus and Methods of Providing Efficient Data Parallelization for Multi-Dimensional FFTs | |
US20220382829A1 (en) | Sparse matrix multiplication in hardware | |
WO2022016261A1 (en) | System and method for accelerating training of deep learning networks | |
Jain-Mendon et al. | A hardware–software co-design approach for implementing sparse matrix vector multiplication on FPGAs | |
Wang et al. | A novel parallel algorithm for sparse tensor matrix chain multiplication via tcu-acceleration | |
Wu et al. | Optimizing dynamic programming on graphics processing units via data reuse and data prefetch with inter-block barrier synchronization | |
US9600446B2 (en) | Parallel multicolor incomplete LU factorization preconditioning processor and method of use thereof | |
Caron et al. | On the performance of parallel factorization of out-of-core matrices | |
CN114428936A (en) | Allocating processing threads for matrix-matrix multiplication |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||