WO2016123808A1 - Data processing system, computing node and data processing method - Google Patents

Data processing system, computing node and data processing method

Info

Publication number
WO2016123808A1
Authority
WO
WIPO (PCT)
Prior art keywords: data, data block, intermediate result, node, computing
Prior art date
Application number: PCT/CN2015/072451
Other languages: English (en), French (fr)
Inventors: 黄国位 (Huang Guowei), 颜友亮 (Yan Youliang), 朱望斌 (Zhu Wangbin)
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to KR1020177022612A (KR101999639B1)
Priority to PCT/CN2015/072451 (WO2016123808A1)
Priority to EP15880764.4A (EP3239853A4)
Priority to CN201580001137.1A (CN106062732B)
Priority to JP2017541356A (JP6508661B2)
Publication of WO2016123808A1
Priority to US15/667,634 (US10567494B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/544 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices, for evaluating functions by calculation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L 67/1004 Server selection for load balancing
    • H04L 67/1023 Server selection for load balancing based on a hash applied to IP addresses or costs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 File systems; File servers
    • G06F 16/18 File system types
    • G06F 16/182 Distributed file systems
    • G06F 16/184 Distributed file systems implemented as replicated file system
    • G06F 16/1844 Management specifically adapted to replicated file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor, of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/901 Indexing; Data structures therefor; Storage structures
    • G06F 16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units, using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU], to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/505 Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals, considering the load
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L 67/1004 Server selection for load balancing
    • H04L 67/1012 Server selection for load balancing based on compliance of requirements or conditions with available server resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2207/00 Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 2207/38 Indexing scheme relating to groups G06F7/38 - G06F7/575
    • G06F 2207/48 Indexing scheme relating to groups G06F7/48 - G06F7/575
    • G06F 2207/4802 Special implementations
    • G06F 2207/4818 Threshold devices
    • G06F 2207/4824 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units, using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/70 Admission control; Resource allocation
    • H04L 47/83 Admission control; Resource allocation based on usage prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/104 Peer-to-peer [P2P] networks
    • H04L 67/1074 Peer-to-peer [P2P] networks for supporting data block transmission mechanisms
    • H04L 67/1078 Resource delivery mechanisms

Definitions

  • the present invention relates to the field of computer technology, and in particular, to a data processing system, a computing node, and a data processing method in the field of graph computing.
  • The objects studied in data mining and machine learning are usually a collection of objects and the relationships between those objects (for example, a social network).
  • Such research objects can be expressed as graphs in the mathematical sense.
  • A graph describes the relationships between objects. Viewed visually, a graph can be composed of small dots and lines connecting the dots; the dots are called the vertices of the graph, and the lines connecting the dots are called the edges.
  • To compute on a graph, a data structure must be chosen to represent it.
  • The adjacency list uses objects to represent vertices and pointers or references to represent edges; this data structure is not conducive to parallelizing graph computations.
  • The adjacency matrix (referred to simply as the matrix herein) uses a two-dimensional matrix to store the adjacency relationships between the vertices. This data structure lends itself to parallelization, and the amount of stored data is small when the matrix is used to store the data.
  • Matrix computations in graph computing can theoretically include matrix-vector multiplication operations and matrix-matrix multiplication operations.
  • An existing matrix-vector multiplication model is the generalized Iterated Matrix-Vector multiplication (GIMV) model.
  • In the existing model, the global merge is performed only after all pairwise merge operations between the matrix elements and the vector elements have completed; the results of the pairwise merge operations are then merged globally.
  • The embodiments of the invention provide a data processing system, a computing node, and a data processing method, which reduce the memory space occupied during data processing and shorten the computation time.
  • an embodiment of the present invention provides a data processing system, where the data processing system includes a management node and a first type of computing node.
  • The management node is configured to allocate a first processing task to at least two computing nodes, including FC_x, of the first type of computing nodes, where FC_x is the xth computing node of the at least two computing nodes and x is a positive integer; the at least two computing nodes of the first type process the first processing task allocated by the management node in parallel.
  • The computing node FC_x is configured to acquire a data block M_x and a data block V_1x in the data set to be processed according to the first processing task allocated by the management node, where the data block M_x is a matrix containing m rows and n columns of data, the data block V_1x is a vector containing n-dimensional data, m and n are positive integers, and the value of n is not less than 2.
  • The computing node FC_x is further configured to perform a merge (combine2) operation and a reduction (reduce2) operation on the data block M_x and the data block V_1x to obtain a first intermediate result V′_x, the first intermediate result V′_x being a vector containing m-dimensional data; in the combine2 operation, the elements of the data block M_x are merged with the elements of the data block V_1x, where j is a variable whose values run from 1 to n.
  • the management node is further configured to obtain, according to a first intermediate result obtained by at least two computing nodes of the first type of computing nodes, a processing result of the to-be-processed data set.
  • Optionally, the data processing system further includes a second type of computing node. The management node is further configured to allocate a second processing task to at least one computing node, including SC_y, of the second type of computing nodes according to the first intermediate results obtained by the at least two computing nodes of the first type, where SC_y is the yth computing node of the at least one computing node and y is a positive integer.
  • The computing node SC_y is configured to obtain, according to the second processing task, the first intermediate results obtained by the at least two computing nodes of the first type, and to perform a reduce2 operation on them to obtain a second intermediate result.
  • The management node is then configured to obtain the processing result of the to-be-processed data set according to the second intermediate results obtained by the at least one computing node of the second type.
  • an embodiment of the present invention provides another data processing system, where the data processing system includes a management node and a first type of computing node.
  • The management node is configured to allocate a first processing task to at least two computing nodes, including FC_x, of the first type of computing nodes, where FC_x is the xth computing node of the at least two computing nodes and x is a positive integer; the at least two computing nodes of the first type process the first processing task allocated by the management node in parallel.
  • The computing node FC_x is configured to acquire a data block M_1x and a data block M_2x in the data set to be processed according to the first processing task allocated by the management node, where the data block M_1x is a matrix containing m rows and n columns of data, the data block M_2x is a matrix containing n rows and p columns of data, m, n, and p are positive integers, and the value of n is not less than 2.
  • the management node is further configured to obtain, according to a first intermediate result obtained by at least two computing nodes of the first type of computing nodes, a processing result of the to-be-processed data set.
  • Optionally, the data processing system further includes a second type of computing node. The management node is further configured to allocate a second processing task to at least one computing node, including SC_y, of the second type of computing nodes according to the first intermediate results obtained by the at least two computing nodes of the first type, where SC_y is the yth computing node of the at least one computing node and y is a positive integer.
  • The computing node SC_y is configured to obtain, according to the second processing task, the first intermediate results obtained by the at least two computing nodes of the first type, and to perform a reduce2 operation on them to obtain a second intermediate result.
  • The management node is then configured to obtain the processing result of the to-be-processed data set according to the second intermediate results obtained by the at least one computing node of the second type.
  • an embodiment of the present invention provides a computing node, including:
  • a receiving module configured to receive a first processing task allocated by a management node in a data processing system, where the data processing system includes the computing node and the management node;
  • An acquiring module configured to acquire, according to the first processing task allocated by the management node and received by the receiving module, a data block M_x and a data block V_1x in a data set to be processed, where the data block M_x is a matrix comprising m rows and n columns of data, the data block V_1x is a vector containing n-dimensional data, m and n are positive integers, and the value of n is not less than 2;
  • A processing module configured to perform a merge (combine2) operation and a reduction (reduce2) operation on the data block M_x and the data block V_1x to obtain a first intermediate result V′_x, the first intermediate result V′_x being a vector containing m-dimensional data.
  • The computing node comprises a physical machine, a virtual machine, or a central processing unit (CPU).
  • an embodiment of the present invention provides another computing node, including:
  • a receiving module configured to receive a first processing task allocated by a management node in a data processing system, where the data processing system includes the computing node and the management node;
  • An obtaining module configured to acquire, according to the first processing task allocated by the management node and received by the receiving module, a data block M_1x and a data block M_2x in a data set to be processed, where the data block M_1x is a matrix comprising m rows and n columns of data, the data block M_2x is a matrix comprising n rows and p columns of data, m, n, and p are positive integers, and the value of n is not less than 2;
  • A processing module configured to perform a merge (combine2) operation and a reduction (reduce2) operation on the data block M_1x and the data block M_2x to obtain a first intermediate result M′_x, the first intermediate result M′_x being a matrix comprising m rows and p columns of data; the elements in the first intermediate result M′_x are m′_ij, where i and j are variables, the values of i are from 1 to m, and the values of j are from 1 to p.
  • An embodiment of the present invention provides a data processing method applied to a data processing system, where the data processing system includes a management node and a first type of computing node, and the method includes:
  • The management node allocates a first processing task to at least two computing nodes, including FC_x, of the first type of computing nodes, where FC_x is the xth computing node of the at least two computing nodes and x is a positive integer; the at least two computing nodes of the first type process the first processing task allocated by the management node in parallel.
  • The computing node FC_x acquires a data block M_x and a data block V_1x in the data set to be processed according to the first processing task allocated by the management node, where the data block M_x is a matrix containing m rows and n columns of data, the data block V_1x is a vector containing n-dimensional data, m and n are positive integers, and the value of n is not less than 2; the computing node FC_x performs a combine2 operation and a reduce2 operation on the data block M_x and the data block V_1x to obtain a first intermediate result.
  • the management node obtains a processing result of the to-be-processed data set according to a first intermediate result obtained by at least two of the first type of computing nodes.
  • Optionally, the data processing system further includes at least one computing node of a second type, and the method further includes:
  • The management node allocates a second processing task to at least one computing node, including SC_y, of the second type of computing nodes according to the first intermediate results obtained by the at least two computing nodes of the first type, where SC_y is the yth computing node of the at least one computing node and y is a positive integer.
  • The computing node SC_y obtains, according to the second processing task, the first intermediate results obtained by at least two computing nodes of the first type, where the first intermediate results obtained by SC_y are the first intermediate results obtained from data blocks located in the same row of the to-be-processed data set.
  • The computing node SC_y performs a reduce2 operation on the first intermediate results obtained by SC_y to obtain a second intermediate result V″_y, where the second intermediate result V″_y is a vector containing m-dimensional data.
  • The management node obtains a processing result of the to-be-processed data set according to the second intermediate results obtained by at least one computing node of the second type.
  • Optionally, the data set further includes a data block V_2x, where the data block V_2x is a vector containing m-dimensional data.
  • The method also includes:
  • The management node allocates a third processing task to the at least one computing node, including SC_y, of the second type of computing nodes according to the second intermediate results obtained by that at least one computing node.
  • The computing node SC_y acquires the data block V_2x in the data set according to the third processing task, and performs an assign operation on the second intermediate result V″_y obtained by SC_y and the data block V_2x, to obtain a processing result of the to-be-processed data set.
  • An embodiment of the present invention provides another data processing method applied to a data processing system, where the data processing system includes a management node and a first type of computing node, and the method includes:
  • The management node allocates a first processing task to at least two computing nodes, including FC_x, of the first type of computing nodes, where FC_x is the xth computing node of the at least two computing nodes and x is a positive integer; the at least two computing nodes of the first type process the first processing task allocated by the management node in parallel.
  • The computing node FC_x acquires the data block M_1x and the data block M_2x in the data set to be processed according to the first processing task allocated by the management node, where the data block M_1x is a matrix containing m rows and n columns of data, the data block M_2x is a matrix containing n rows and p columns of data, m, n, and p are positive integers, and the value of n is not less than 2; the computing node FC_x performs a combine2 operation and a reduce2 operation on the data block M_1x and the data block M_2x to obtain a first intermediate result.
  • the management node obtains a processing result of the to-be-processed data set according to a first intermediate result obtained by at least two of the first type of computing nodes.
  • the data processing system further includes a second type of computing node, where the method further includes:
  • The management node allocates a second processing task to at least one computing node, including SC_y, of the second type of computing nodes according to the first intermediate results obtained by the at least two computing nodes of the first type, where SC_y is the yth computing node of the at least one computing node and y is a positive integer.
  • The computing node SC_y obtains, according to the second processing task, the first intermediate results obtained by at least two computing nodes of the first type, where the first intermediate results obtained by SC_y are the first intermediate results obtained from the data blocks M_1x located in the same row and the data blocks M_2x located in the same column of the to-be-processed data set.
  • The computing node SC_y performs a reduce2 operation on the first intermediate results obtained by SC_y to obtain a second intermediate result M″_y, where the second intermediate result M″_y is a matrix containing m rows and p columns of data.
  • The management node obtains a processing result of the to-be-processed data set according to the second intermediate results obtained by at least one computing node of the second type.
  • Optionally, the data set further includes a data block M_3x, where the data block M_3x is a matrix containing m rows and p columns of data.
  • The method further includes:
  • The management node allocates a third processing task to the at least one computing node, including SC_y, of the second type of computing nodes according to the second intermediate results obtained by that at least one computing node.
  • The computing node SC_y acquires the data block M_3x in the data set according to the third processing task.
  • The computing node SC_y performs an assign operation on the second intermediate result M″_y obtained by SC_y and the data block M_3x, to obtain a processing result of the to-be-processed data set.
  • In the data processing system, the computing node, and the data processing method provided by the embodiments of the present invention, when the merge operation and the reduction operation are performed on a data block, it is not necessary to wait for all merge operations to complete before the reduction operation begins; instead, the merge operation and the reduction operation are performed alternately, thereby saving the memory space occupied by the computation and reducing the computation time.
  • FIG. 1 is a schematic diagram of a concept "graph" according to an embodiment of the present invention.
  • FIG. 2 is still another schematic diagram of a concept "graph" according to an embodiment of the present invention.
  • FIG. 3A is a schematic diagram showing an object relationship by a "directed graph"; and FIG. 3B is an adjacency matrix corresponding to the "directed graph" of FIG. 3A.
  • Figure 4 is a schematic diagram of a matrix vector multiplication operation.
  • Figure 5 is a schematic diagram of a matrix matrix multiplication operation.
  • FIG. 6A is a schematic diagram showing an object relationship by a "weighted directed graph"; and FIG. 6B is an adjacency matrix corresponding to the "weighted directed graph" of FIG. 6A.
  • Figure 7 is a schematic diagram of yet another matrix vector multiplication operation.
  • FIG. 8 is a schematic block diagram of a data processing system according to an embodiment of the present invention.
  • FIG. 9 is a schematic flowchart of data processing performed by a data processing system according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of a single-source shortest path (SSSP) algorithm according to an embodiment of the present invention.
  • FIG. 11 is a schematic block diagram of still another data processing system according to an embodiment of the present invention.
  • FIG. 12 is a schematic flowchart of data processing performed by still another data processing system according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of still another data processing system performing a probability propagation algorithm according to an embodiment of the present invention.
  • FIG. 14 is a schematic block diagram of a computing node according to an embodiment of the present invention.
  • FIG. 15 is a schematic block diagram of still another computing node according to an embodiment of the present invention.
  • FIG. 16 is a schematic block diagram of still another computing node according to an embodiment of the present invention.
  • FIG. 17 is a schematic block diagram of still another computing node according to an embodiment of the present invention.
  • FIG. 18 is a schematic flowchart of a method for data processing according to an embodiment of the present invention.
  • FIG. 19 is a schematic flowchart of still another method for data processing according to an embodiment of the present invention.
  • Graphs are used to describe the relationships between objects.
  • a graph consists of small dots and lines connecting the dots.
  • the dots are called the vertices of the graph, and the lines connecting the dots are called edges.
  • Edges can be divided into undirected edges, as shown in Figure 1, and directed edges, as shown in Figure 2.
  • Figure 1 contains six objects, numbered 1 to 6; the relationships between the six objects are represented by the undirected edges between them.
  • Figure 2 contains seven objects, numbered 0 to 6; the relationships between the seven objects are represented by the directed edges between them. When the relationships between objects are represented by directed edges, the graph may be referred to as a directed graph.
  • Multiplication here can mean generalized multiplication rather than only multiplication in the traditional sense.
  • The multiplication in matrix-vector multiplication refers to generalized multiplication; that is, the operation between an element of the matrix and the corresponding element of the vector is not necessarily traditional multiplication, but may be addition, subtraction, multiplication, division, summation, product, taking a maximum value, taking a minimum value, or other processing, which is not limited in this embodiment of the present invention.
  • the adjacency matrix uses a two-dimensional matrix to store the adjacencies between the vertices in the graph.
  • FIG. 3A is a directed graph with six vertices V_1 to V_6, whose directed edges represent the relationships between the six vertices; FIG. 3B is the corresponding adjacency matrix representation.
  • Representing the graph by an adjacency matrix allows computations on the graph to be parallelized.
  • Matrices can be divided into two types: dense matrices and sparse matrices.
  • Dense matrices are generally represented by a vector or a two-dimensional vector (a vector of vectors), in either row-major or column-major order; a sparse matrix (which stores no zero elements) generally has three storage formats: COO, CSR, and CSC.
  • The zero elements of the matrix (or, in the SSSP algorithm, the infinite-value elements) are not stored, so this representation of the matrix can reduce the amount of data storage, as the sketch below illustrates.
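The following is a minimal sketch of the CSR format mentioned above. It is illustrative only: the 4x4 matrix and all names are hypothetical rather than taken from the patent, and Scala is used because the text's own reduce example is written in Scala notation.

```scala
// CSR stores, per row, only the non-zero entries of the matrix.
final case class Csr(rowPtr: Array[Int], colIdx: Array[Int], values: Array[Double])

object CsrExample {
  // Dense form of the hypothetical matrix, for comparison (0 = no edge):
  //   [ 0 1 0 0 ]
  //   [ 0 0 1 1 ]
  //   [ 1 0 0 0 ]
  //   [ 0 0 0 0 ]
  val m = Csr(
    rowPtr = Array(0, 1, 3, 4, 4),     // row i occupies indices rowPtr(i) until rowPtr(i + 1)
    colIdx = Array(1, 2, 3, 0),        // column index of each stored element
    values = Array(1.0, 1.0, 1.0, 1.0) // the stored (non-zero) values
  )

  // Matrix-vector multiplication in which only the stored elements participate.
  def multiply(a: Csr, v: Array[Double]): Array[Double] = {
    val out = new Array[Double](a.rowPtr.length - 1)
    for (i <- out.indices; k <- a.rowPtr(i) until a.rowPtr(i + 1))
      out(i) += a.values(k) * v(a.colIdx(k))
    out
  }
}
```

Because absent entries are simply skipped, both the storage and the multiplication cost scale with the number of non-zero elements rather than with m*n.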
  • Most operations on graphs can be transformed into matrix-vector multiplication (i.e., matrix-and-vector multiplication) operations or matrix-matrix multiplication (i.e., matrix-and-matrix multiplication) operations.
  • Matrix-vector multiplication operations are a series of operations between a matrix and a vector. An example of graph computation by matrix-vector multiplication: finding all friends (out-neighbors) of V_2 in FIG. 3A can be accomplished by the matrix-vector multiplication of FIG. 4. First, construct the query vector: because the lookup concerns the out-neighbors of V_2, the second element of the vector is set to 1 and the other elements are set to 0. Second, because the lookup concerns out-neighbors, the adjacency matrix needs to be transposed. Finally, multiply the transposed adjacency matrix by the constructed vector (as shown in Figure 4) to obtain the result vector. The sixth element of the result vector is 1, which means the only friend of V_2 is V_6, as can be verified from FIG. 3A; a code sketch of the three steps follows.
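A minimal sketch of those three steps, assuming a boolean-valued adjacency matrix a with a(i)(j) == 1 meaning an edge from vertex i to vertex j (all names are illustrative):

```scala
object OutNeighborQuery {
  def transpose(a: Array[Array[Int]]): Array[Array[Int]] =
    Array.tabulate(a(0).length, a.length)((i, j) => a(j)(i))

  def multiply(a: Array[Array[Int]], v: Array[Int]): Array[Int] =
    a.map(row => row.zip(v).map { case (x, y) => x * y }.sum)

  // Step 1: one-hot query vector for `vertex`; step 2: transpose the matrix
  // (because the lookup concerns OUT-neighbors); step 3: multiply.
  def outNeighbors(a: Array[Array[Int]], vertex: Int): Seq[Int] = {
    val query = Array.tabulate(a.length)(i => if (i == vertex) 1 else 0)
    val result = multiply(transpose(a), query)
    result.indices.filter(result(_) > 0)
  }
}
```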
  • Matrix matrix multiplication operations are based on a series of operations between a matrix and a matrix.
  • An example of graph computation by matrix-matrix multiplication: calculating the number of common friends (common out-neighbor vertices) between two vertices in FIG. 3A can be realized by a matrix-matrix multiplication operation.
  • First, the adjacency matrix is constructed, as shown in FIG. 3B; this matrix is denoted A.
  • In the resulting matrix B, the value of the element b_ij indicates how many common out-neighbors the ith vertex and the jth vertex have.
  • For example, the value in row 3, column 1 indicates that vertex 3 has a common friend with vertex 1.
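A minimal sketch of this computation. It assumes B = A multiplied by the transpose of A (the formula is implied by the stated meaning of b_ij but not written out in the text):

```scala
// b(i)(j) = sum over k of a(i)(k) * a(j)(k), so b(i)(j) counts the vertices
// reachable in one hop from BOTH vertex i and vertex j.
def commonOutNeighbors(a: Array[Array[Int]]): Array[Array[Int]] =
  Array.tabulate(a.length, a.length) { (i, j) =>
    a(i).indices.map(k => a(i)(k) * a(j)(k)).sum
  }
```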
  • A cluster environment includes multiple computing units that perform the computation. Matrix-vector multiplication operations and matrix-matrix multiplication operations are performed based on matrix partitioning.
  • After partitioning, the matrix may be called a distributed matrix, and multiple distributed matrix blocks can be processed in parallel on the computing units.
  • M is a matrix with m rows and n columns;
  • M_ij is a matrix block obtained after M is partitioned;
  • V is an n-dimensional column vector;
  • V_j is a vector block obtained after V is partitioned;
  • V′ is an m-dimensional column vector.
  • The GIMV model extends traditional matrix-vector multiplication.
  • combine2 is a merge operation on a matrix element and a vector element; the types of the matrix element and the vector element may differ, and the operation may be addition, subtraction, multiplication, division, taking a maximum value, taking a minimum value, and so on, but this embodiment of the present invention is not limited thereto; combine2 returns an intermediate value x_j.
  • combineAll is a merge operation on multiple values, applied to the combine2 results x_1, ..., x_n of one row of the matrix; it is generally a function (for example, an accumulation operation) and returns an intermediate value.
  • The GIMV model takes the matrix M and the vector V as input and, after three operations, outputs the vector V′. Its three operators are combine2, combineAll, and assign: v′_i = assign(v_i, combineAll(x_1, ..., x_n)), where x_j = combine2(m_ij, v_j).
  • By choosing these operators appropriately, the GIMV model can represent more algorithms, as the sketch below illustrates.
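The following Scala sketch expresses the three operators as higher-order parameters. Element types are fixed to Double for illustration, although the model itself allows the matrix and vector element types to differ; all names are illustrative:

```scala
object Gimv {
  def gimv(
      m: Array[Array[Double]],              // n x n (assign forces a square matrix)
      v: Array[Double],                     // n-dimensional input vector
      combine2: (Double, Double) => Double, // merges a matrix element with a vector element
      combineAll: Seq[Double] => Double,    // merges the n results of one row
      assign: (Double, Double) => Double    // merges the old v_i with the row result
  ): Array[Double] =
    m.indices.map { i =>
      val xs = v.indices.map(j => combine2(m(i)(j), v(j)))
      assign(v(i), combineAll(xs))
    }.toArray

  // Ordinary matrix-vector multiplication is the special case below.
  def classic(m: Array[Array[Double]], v: Array[Double]): Array[Double] =
    gimv(m, v, _ * _, _.sum, (_, rowResult) => rowResult)
}
```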
  • FIG. 6A shows a weighted directed graph.
  • FIG. 6B is the corresponding adjacency matrix M.
  • The values in the matrix M are the distance weights between pairs of vertices.
  • The distance from a vertex to itself is 0, and unreachable pairs are represented by infinity.
  • The shortest distances from vertex 0 to the other vertices can be computed by iterated matrix-vector multiplication. Each iteration extends the shortest distances obtained so far by one hop (for example, going from vertex 1 to vertex 2 is 1 hop; going from vertex 1 through vertex 2 to vertex 3 is 2 hops).
  • The result vector V′ obtained by the first multiplication gives the shortest distances from vertex 0 to the vertices reachable within one hop.
  • The stopping condition of the SSSP iteration is: if the result vector V′ obtained by an iteration is unchanged compared with the initial vector V of that iteration, the algorithm terminates. If the stopping condition is not reached, the algorithm continues to iterate: the result vector V′ of this iteration is used as the initial vector V of the next iteration. The final result vector V′ gives the shortest distance from vertex 0 to each vertex.
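A minimal sketch of this iteration under the min-plus reading of generalized multiplication (combine2 is addition, the per-row reduction is the minimum, and assign keeps the smaller of the old and new distances); mT is assumed to be the transposed weighted adjacency matrix, with Double.PositiveInfinity for "unreachable" and 0 on the diagonal:

```scala
def ssspStep(mT: Array[Array[Double]], v: Array[Double]): Array[Double] =
  mT.indices.map { i =>
    val viaOneMoreHop = v.indices.map(j => mT(i)(j) + v(j)).min // min-plus row reduction
    math.min(v(i), viaOneMoreHop)                               // assign: keep the shorter distance
  }.toArray

// Iterate until the vector stops changing (the stopping condition in the text).
def sssp(mT: Array[Array[Double]], v0: Array[Double]): Array[Double] = {
  var v = v0
  var next = ssspStep(mT, v)
  while (!next.sameElements(v)) { v = next; next = ssspStep(mT, v) }
  next
}
```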
  • In the execution flow of the GIMV model, the combineAll operation on all the combine2 results of a row of the matrix can be performed only after all the combine2 operations are completed. Therefore, a matrix-sized intermediate memory space is occupied during the calculation; moreover, in a distributed environment, the system needs to perform a large amount of data transmission.
  • The assign operation operates on the intermediate vector obtained by combineAll and the initial vector V. Therefore, the two vector dimensions must be equal; from another perspective, the matrix M must be a square matrix. In addition, the vector used by assign can only be the initial vector V, which also limits the expressive range of the GIMV model.
  • The present invention addresses the disadvantages of the above schemes by improving the data processing method so as to reduce the intermediate memory occupation and the amount of data transmitted during the matrix-vector multiplication operation; further, based on the matrix-vector multiplication operation of the present invention, a matrix-matrix multiplication operation model is proposed on the same principle, so that more algorithms can be expressed.
  • FIG. 8 shows a schematic block diagram of a data processing system 100 in accordance with an embodiment of the present invention.
  • the data processing system 100 includes a management node 110 and a first type of computing node 120,
  • The management node 110 is configured to allocate a first processing task to at least two computing nodes, including FC_x 121, of the first type of computing nodes 120, where FC_x 121 is the xth computing node of the at least two computing nodes and x is a positive integer; at least two of the first type of computing nodes 120 process the first processing task allocated by the management node 110 in parallel.
  • The computing node FC_x 121 is configured to acquire, according to the first processing task allocated by the management node 110, the data block M_x and the data block V_1x in the data set to be processed, where the data block M_x is a matrix containing m rows and n columns of data, the data block V_1x is a vector containing n-dimensional data, m and n are positive integers, and the value of n is not less than 2.
  • the management node 110 is further configured to obtain a processing result of the to-be-processed data set according to the first intermediate result obtained by at least two computing nodes of the first type of computing node 120.
  • In the data processing system provided by this embodiment of the present invention, the reduction operation does not need to wait until all the merge operations are complete; instead, the merge operation and the reduction operation alternate, which saves the memory space occupied by the calculation and reduces the calculation time.
  • Data processing system 100 can be applied to big data processing. Because of the large amount of data involved, the data is usually divided into blocks, and different data blocks are distributed to different computing nodes for parallel computing, to increase the efficiency of the calculation.
  • Data processing system 100 includes a management node 110 and a first type of computing node 120.
  • the management node 110 is configured to receive a data processing task, and divide the data processing task into a plurality of processing tasks, and distribute the processing tasks to the computing node.
  • the management node 110 is further configured to receive an execution state of each processing node for its processing task to manage a process of data processing.
  • the computing node is configured to receive the processing task delivered by the management node 110, and obtain a data block according to the processing task to perform a corresponding processing task.
  • A computing node may execute a processing task on data blocks stored on that computing node, and may also acquire data blocks stored on other computing nodes to execute a processing task.
  • The computing nodes can be classified according to the category of their processing tasks. For example, the nodes that process the first processing task form the first type of node, and the nodes that process the second processing task form the second type of node.
  • The management node 110 is configured to allocate a first processing task to at least two computing nodes, including FC_x 121, of the first type of computing nodes 120, where FC_x 121 is the xth computing node of the at least two computing nodes and x is a positive integer.
  • At least two of the first type of computing nodes 120 process the first processing task assigned by the management node 110 in parallel.
  • After receiving the first processing task assigned by the management node 110, the computing node FC_x 121 acquires the data block M_x and the data block V_1x in the data set to be processed according to the first processing task, where the data block M_x is a matrix comprising m rows and n columns of data, the data block V_1x is a vector containing n-dimensional data, m and n are positive integers, and the value of n is not less than 2; the processing of the data block M_x and the data block V_1x can then be performed as a matrix-vector multiplication operation.
  • The computing node FC_x 121 performs a merge (combine2) operation and a reduction (reduce2) operation on the data block M_x and the data block V_1x to obtain a first intermediate result V′_x, the first intermediate result V′_x being a vector of m-dimensional data.
  • In the combine2 operation, the element in row i, column j of the data block M_x and the jth element of the data block V_1x are merged to obtain the intermediate result x_j corresponding to that pair of elements, where j is a variable whose values run from 1 to n.
  • For example, the combine2 and reduce2 operations may proceed as follows: first compute x_1 and x_2 and perform the reduce2 operation on x_1 and x_2; then compute x_3 and perform the reduce2 operation on x_3 and the result of the reduce2 operation on x_1 and x_2; and so on, until all the intermediate results x_j corresponding to the ith row of the data block M_x have been folded in by the reduce2 operation.
  • The reduce2 operation does not wait until all the combine2 operations have been completed; rather, the combine2 operation and the reduce2 operation alternate. In this way, an intermediate result x_j that has passed through the reduce2 operation can be deleted during the calculation, and not all results of the combine2 operation need to be stored in memory, thereby saving memory space.
  • The above process is essentially an update process: first the reduce2 operation is applied to two of the x_j to obtain an intermediate result, and then that intermediate result is continuously updated by applying the reduce2 operation to it and further x_j or other intermediate results.
  • the reduce2 operation may be addition, subtraction, multiplication, division, maximum, minimum, etc., but the embodiment of the present invention is not limited thereto.
  • The reduce2 operation processes the intermediate results (such as x_1, ..., x_n) corresponding to the elements of a row of the matrix, and it does not need to wait until all of x_1, ..., x_n have been calculated; instead, the reduce2 processing is gradually carried out while the intermediate results x_j are being calculated.
  • An advantage of the reduce2 operation is that the order in which elements enter the reduce2 operation does not matter: regardless of the order of the elements, the obtained result is unique.
  • For example, for an array it = Array(0, 1, 2, 3, 4, 5), summing the array can be expressed as it.reduce(_ + _).
  • The value obtained by adding the data from left to right equals the final result of summing with reduce2 in any other order.
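A minimal Scala sketch of the alternating combine2/reduce2 evaluation of one matrix row described above (all names are illustrative, not from the patent):

```scala
// Each x_j is folded into a running result as soon as it is produced, so the
// full list x_1 .. x_n never has to be held in memory at once.
def rowResult(
    row: Array[Double], v: Array[Double],
    combine2: (Double, Double) => Double,
    reduce2: (Double, Double) => Double
): Double = {
  var acc = combine2(row(0), v(0))               // x_1
  for (j <- 1 until row.length)
    acc = reduce2(acc, combine2(row(j), v(j)))   // fold in x_{j+1} immediately
  acc                                            // e.g. reduce2 = _ + _ gives the row sum
}
```

Because reduce2 is assumed associative and commutative (such as addition or math.min), the result is the same whichever order the x_j arrive in.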
  • The management node 110 is further configured to obtain a processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes of the first type of computing node 120. After the first type of computing node 120 completes the first processing task, the management node 110 is notified; the management node 110 then either obtains the processing result of the to-be-processed data set according to the first intermediate results obtained by at least two computing nodes of the first type of computing node 120, or uses the first intermediate results as input data for other processing tasks and sends the processing tasks that use the first intermediate results to the corresponding computing nodes.
  • When the data processing system provided by this embodiment of the present invention performs the merge operation and the reduction operation on a data block, it is not necessary to wait for all the merge operations to be completed before performing the reduction operation; instead, the merge operation and the reduction operation are performed alternately, thereby saving the memory space occupied by the calculation and reducing the calculation time.
  • Optionally, the data processing system 100 further includes a second type of computing node.
  • The management node 110 may allocate a second processing task to at least one computing node, including SC_y, of the second type of computing nodes according to the first intermediate results obtained by at least two computing nodes of the first type of computing node 120, where SC_y is the yth computing node of the at least one computing node and y is a positive integer.
  • The computing node SC_y is configured to obtain, according to the second processing task, the first intermediate results obtained by at least two computing nodes of the first type of computing node 120, where the first intermediate results obtained by SC_y are the first intermediate results obtained from data blocks located in the same row of the to-be-processed data set.
  • The computing node SC_y performs, according to the second processing task, a reduce2 operation on the first intermediate results obtained by SC_y to obtain a second intermediate result V″_y, where the second intermediate result V″_y is a vector containing m-dimensional data.
  • the management node 110 is specifically configured to obtain a processing result of the to-be-processed data set according to a second intermediate result obtained by at least one of the second type of computing nodes.
  • Optionally, the data set further includes a data block V_2x, where the data block V_2x is a vector containing m-dimensional data.
  • The computing node SC_y is further configured to perform an assign operation on the second intermediate result V″_y obtained by SC_y and the data block V_2x, to obtain the processing result of the to-be-processed data set.
  • That is, after the first intermediate results of the data blocks located in the same row of the to-be-processed data set have undergone the reduce2 operation to produce the second intermediate result, an assign operation may be performed to obtain the processing result of the to-be-processed data set.
  • The second intermediate result V″_y and the data block V_2x are both vectors containing m-dimensional data; the assign operation is performed on their corresponding elements to obtain the elements of the result vector, which is an m-dimensional column vector.
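A minimal sketch of this element-wise assign step; the names are illustrative, and assign is left as a parameter rather than a fixed operation:

```scala
// Combine the m-dimensional second intermediate result with the data block
// V_2x element by element (assign might be, e.g., math.min for SSSP).
def assignVec(vSecond: Array[Double], v2: Array[Double],
              assign: (Double, Double) => Double): Array[Double] =
  vSecond.zip(v2).map { case (a, b) => assign(a, b) }
```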
  • For example, the assign processing can be expressed as follows:
  • V_3 = α*M*V_1 + β*V_2 (Equation 4)
  • Equation 4 shows that a weight vector V_2 is introduced into the result vector V_3 obtained by the data processing system 100 of this embodiment of the present invention, so the assign processing is not limited to the vector used in the multiplication; the system can therefore also support matrix-vector multiplication operations on non-square matrices, which the existing GIMV model does not, expanding the range that the matrix-vector multiplication operation can represent.
  • In the PageRank algorithm, the vector to be added (corresponding to the vector V_2 in the above equation) is often set to (1-d)/N to participate in the operation.
  • The (1-d)/N vector can be used to adjust the PageRank value of each vertex in the corresponding graph, so that the PageRank value of each vertex better matches the real situation.
  • The PageRank vector can be expressed as R, specifically as Equation 5: R = d*M*R + (1-d)/N.
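As an illustration only (not the patent's code), one PageRank step written in the Equation-4 form V_3 = α*M*V_1 + β*V_2, with α = d, V_1 the current rank vector R, and V_2 the constant (1-d)/N vector; m is assumed to be the normalized, transposed link matrix:

```scala
def pageRankStep(m: Array[Array[Double]], r: Array[Double], d: Double): Array[Double] = {
  val n = r.length
  m.indices.map { i =>
    val linkMass = r.indices.map(j => m(i)(j) * r(j)).sum // combine2 = *, reduce2 = +
    d * linkMass + (1 - d) / n                            // assign adds the (1-d)/N term
  }.toArray
}
```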
  • Optionally, the data block V_1x and the data block V_2x may be the same data block; that is, the data block V_1x used in the operation with the data block M_x and the data block V_2x used in the assign operation are the same data block.
  • At least two computing nodes of the second type of computing nodes process the second processing task allocated by the management node in parallel.
  • When the data set to be processed is partitioned both by rows and by columns, at least two computing nodes of the second type are required to process the second processing task; when the data set is partitioned only by columns, one computing node of the second type is needed; when the data set is partitioned only by rows, no computing node of the second type is needed, that is, the second processing task need not be processed. A sketch of such row/column blocking follows.
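A minimal sketch of blocking an m x n matrix by rows and columns, keyed by block coordinates; the block sizes rb and cb are illustrative choices, not fixed by the text:

```scala
def partition(m: Array[Array[Double]], rb: Int, cb: Int)
    : Seq[((Int, Int), Array[Array[Double]])] =
  for {
    bi <- 0 until m.length by rb       // row offset of each block
    bj <- 0 until m(0).length by cb    // column offset of each block
  } yield ((bi / rb, bj / cb), m.slice(bi, bi + rb).map(_.slice(bj, bj + cb)))
```

With cb equal to the full width (row-only partitioning), every block spans all columns, so each first intermediate result already covers a complete row and no second-type node is needed, matching the case analysis above.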
  • The management node, the first type of computing node, and the second type of computing node may each be a physical machine, a virtual machine, or a central processing unit (CPU), which is not limited by this embodiment of the present invention.
  • FIG. 9 shows a schematic flow chart of data processing performed by data processing system 100 in accordance with an embodiment of the present invention. As shown in FIG. 9, data processing by data processing system 100 includes the following steps:
  • The data block V_2 is distributed: the data block V_2 is divided into blocks, and the divided blocks of V_2 are broadcast.
  • The data block V_1 is distributed: the data block V_1 is divided into blocks, and the divided blocks of V_1 are broadcast.
  • The partitioned matrix and vectors of S202 through S204 are correspondingly distributed to at least two computing nodes.
  • S205: Perform local combine2 processing and partial reduce2 processing on each computing node.
  • The data block M_x and the data block V_1x undergo combine2 processing, and before all the intermediate results corresponding to the data block M_x and the data block V_1x have been obtained, reduce2 processing is applied to the intermediate results obtained so far, yielding a first intermediate result; the first intermediate result and each newly obtained intermediate result then undergo reduce2 processing to yield a new first intermediate result.
  • The final first intermediate result is the result after all the intermediate results corresponding to the data block M_x and the data block V_1x have undergone reduce2 processing.
  • Each computing node performs global data transmission of the first intermediate result obtained in S205, so that the first intermediate results are concentrated on one computing node.
  • Because the method of this embodiment of the present invention performs data reduction before the global data transmission, the amount of transmitted data is greatly reduced compared with the existing matrix-vector multiplication operation.
  • The elements of the second intermediate result V″_y and the corresponding elements of the data block V_2x undergo the assign operation to give the elements of the result vector, thereby obtaining the result vector.
  • The matrix in the SSSP algorithm is the transpose of the adjacency matrix M in FIG. 6B, and the vector V_1 and the vector V_2 are both the vector V in FIG. 7.
  • The adjacency matrix M in FIG. 6B is first transposed and then partitioned, and the initial vector V is partitioned. Then, each block of the matrix and the corresponding block of V undergo combine2 processing and reduce2 processing.
  • The reduce2 processing is performed before all the combine2 results of a matrix block and the corresponding block of V have been obtained; that is, the combine2 processing and the reduce2 processing are performed alternately.
  • In this way, the first intermediate result corresponding to each matrix block is obtained.
  • The intermediate vector V″ is obtained by performing reduce2 processing on the first intermediate results corresponding to the matrix blocks located in the same row.
  • The intermediate vector V″ and the vector V then undergo the assign operation to obtain the result vector.
  • In the data processing system provided by this embodiment of the present invention, the reduction operation does not need to wait until all the merge operations are complete; instead, the merge operation and the reduction operation alternate, which saves the memory space occupied by the calculation and reduces the calculation time.
  • FIG. 11 shows a schematic block diagram of a data processing system 300 in accordance with another embodiment of the present invention.
  • the data processing system 300 includes a management node 310 and a first type of computing node 320.
  • The management node 310 is configured to allocate a first processing task to at least two computing nodes, including FC_x, of the first type of computing nodes 320, where FC_x is the xth computing node of the at least two computing nodes and x is a positive integer; at least two of the first type of computing nodes 320 process the first processing task allocated by the management node 310 in parallel.
  • The computing node FC_x is configured to acquire the data block M_1x and the data block M_2x in the data set to be processed according to the first processing task allocated by the management node 310, where the data block M_1x is a matrix containing m rows and n columns of data, the data block M_2x is a matrix containing n rows and p columns of data, m, n, and p are positive integers, and the value of n is not less than 2.
  • The computing node FC_x is further configured to perform a merge (combine2) operation and a reduction (reduce2) operation on the data block M_1x and the data block M_2x to obtain a first intermediate result M′_x, the first intermediate result M′_x being a matrix containing m rows and p columns of data.
  • the management node 310 is further configured to obtain a processing result of the to-be-processed data set according to the first intermediate result obtained by at least two computing nodes of the first type of computing node 320.
  • In the data processing system provided by this embodiment of the present invention, the reduction operation does not need to wait until all the merge operations are complete; instead, the merge operation and the reduction operation alternate, which saves the memory space occupied by the calculation and reduces the calculation time.
  • The data processing system 300 can be applied to big data processing. Because of the large amount of data involved, the data is usually divided into blocks, and different data blocks are distributed to different computing nodes for parallel computing, to increase the efficiency of the calculation.
  • the data processing system 300 includes a management node 310 and a first type of computing node 320.
  • the management node 310 is configured to receive a data processing task, and divide the data processing task into a plurality of processing tasks, and distribute the processing tasks to the computing node.
  • the management node 310 is further configured to receive an execution state of each processing node for its processing task to manage a process of data processing.
  • the computing node is configured to receive the processing task delivered by the management node 310, and obtain a data block according to the processing task to perform a corresponding processing task.
  • A computing node may execute a processing task on data blocks stored on that computing node, and may also acquire data blocks stored on other computing nodes to execute a processing task.
  • The computing nodes can be classified according to the category of their processing tasks. For example, the nodes that process the first processing task form the first type of node, and the nodes that process the second processing task form the second type of node.
  • The management node 310 is configured to allocate a first processing task to at least two computing nodes, including FC_x 321, of the first type of computing nodes 320, where FC_x 321 is the xth computing node of the at least two computing nodes and x is a positive integer.
  • At least two of the first type of computing nodes 320 process the first processing task assigned by the management node 310 in parallel.
  • After receiving the first processing task assigned by the management node 310, the computing node FC_x 321 acquires the data block M_1x and the data block M_2x in the data set to be processed according to the first processing task, where the data block M_1x is a matrix comprising m rows and n columns of data, the data block M_2x is a matrix containing n rows and p columns of data, m, n, and p are positive integers, and the value of n is not less than 2; the processing of the data block M_1x and the data block M_2x can be treated as a matrix-matrix multiplication operation.
  • The computing node FC_x 321 performs a merge (combine2) operation and a reduction (reduce2) operation on the data block M_1x and the data block M_2x to obtain a first intermediate result M′_x, the first intermediate result M′_x being a matrix containing m rows and p columns of data.
  • In the combine2 operation, the element in row i, column k of the data block M_1x and the element in row k, column j of the data block M_2x are merged to obtain an intermediate result x_ikj, where k is a variable whose values run from 1 to n.
  • By applying the reduce2 operation to the intermediate results x_ikj, the elements m′_ij of the first intermediate result M′_x are obtained, where the values of i are from 1 to m and the values of j are from 1 to p.
  • For example, the combine2 and reduce2 operations may proceed as follows: first compute x_i1j and x_i2j and perform the reduce2 operation on them; then compute x_i3j and perform the reduce2 operation on x_i3j and the result of the reduce2 operation on x_i1j and x_i2j; and so on, until all the intermediate results x_ikj corresponding to the ith row of the data block M_1x and the jth column of the data block M_2x have been folded in by the reduce2 operation.
  • The reduce2 operation does not wait until all the combine2 operations have been completed; rather, the combine2 operation and the reduce2 operation alternate. In this way, an intermediate result x_ikj that has passed through the reduce2 operation can be deleted during the calculation, and not all results of the combine2 operation need to be stored in memory, thereby saving memory space.
  • The above process is essentially an update procedure: first the reduce2 operation is applied to two of the x_ikj to obtain an intermediate result, and then that intermediate result is continuously updated by applying the reduce2 operation to it and further x_ikj or other intermediate results.
  • the reduce2 operation may be addition, subtraction, multiplication, division, maximum, minimum, etc., but the embodiment of the present invention is not limited thereto.
  • The reduce2 operation processes the intermediate results (such as x_i1j, ..., x_inj) corresponding to one row-column pair, and it does not need to wait until all of x_i1j, ..., x_inj have been calculated; instead, the reduce2 processing is gradually carried out while the intermediate results x_ikj are being calculated.
  • An advantage of the reduce2 operation is that the order in which elements enter the reduce2 operation does not matter: regardless of the order of the elements, the obtained result is unique.
  • For example, for an array it = Array(0, 1, 2, 3, 4, 5), summing the array can be expressed as it.reduce(_ + _).
  • The value obtained by adding the data from left to right equals the final result of summing with reduce2 in any other order.
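The same alternating evaluation, sketched for one element of the matrix-matrix first intermediate result (names illustrative, not from the patent):

```scala
// x_ikj = combine2(m1(i)(k), m2(k)(j)) is folded into a running value as soon
// as it is produced, so the n intermediate results never coexist in memory.
def elementResult(
    m1: Array[Array[Double]], m2: Array[Array[Double]], i: Int, j: Int,
    combine2: (Double, Double) => Double,
    reduce2: (Double, Double) => Double
): Double = {
  var acc = combine2(m1(i)(0), m2(0)(j))               // x_i1j
  for (k <- 1 until m2.length)                         // n = number of rows of m2
    acc = reduce2(acc, combine2(m1(i)(k), m2(k)(j)))   // fold in x_ikj immediately
  acc                                                  // this is m'_ij
}
```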
  • The management node 310 is further configured to obtain a processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes of the first type of computing node 320. After the first type of computing node 320 completes the first processing task, the management node 310 is notified; the management node 310 then either obtains the processing result of the to-be-processed data set according to the first intermediate results obtained by at least two computing nodes of the first type of computing node 320, or uses the first intermediate results as input data for other processing tasks and sends the processing tasks that use the first intermediate results to the corresponding computing nodes.
  • In the data processing system provided by this embodiment of the present invention, the reduction operation does not need to wait until all the merge operations are complete; instead, the merge operation and the reduction operation alternate, which saves the memory space occupied by the calculation and reduces the calculation time.
  • Optionally, the data processing system 300 further includes a second type of computing node.
  • The management node 310 may allocate a second processing task to at least one computing node, including SC_y, of the second type of computing nodes according to the first intermediate results obtained by at least two computing nodes of the first type of computing node 320, where SC_y is the yth computing node of the at least one computing node and y is a positive integer.
  • The computing node SCy is configured to acquire, according to the second processing task, the first intermediate results obtained by at least two computing nodes in the first type of computing node 320, where the first intermediate results acquired by SCy are those obtained from data blocks M1x located in the same row of the to-be-processed data set and data blocks M2x located in the same column of the to-be-processed data set.
  • The second processing task is to perform a reduce2 operation on the first intermediate results acquired by SCy to obtain a second intermediate result M″y, where the second intermediate result M″y is a matrix containing m rows and p columns of data.
  • Performing the reduce2 operation on the first intermediate results acquired by SCy is similar to the reduce2 operation described above: two of the first intermediate results obtained from the data blocks M1x in the same row and the data blocks M2x in the same column are first subjected to a reduce2 operation, and the result of that reduce2 operation is then subjected to reduce2 operations with the other first intermediate results.
  • The management node 310 is specifically configured to obtain the processing result of the to-be-processed data set according to the second intermediate results obtained by at least one computing node of the second type of computing nodes.
  • Optionally, as an embodiment, the data set further includes a data block M3x, where the data block M3x is a matrix containing m rows and p columns of data. The management node 310 is further configured to allocate, according to the second intermediate results obtained by the at least one computing node including SCy in the second type of computing node, a third processing task to the at least one computing node including SCy.
  • The computing node SCy is further configured to acquire the data block M3x in the data set according to the third processing task, and to perform an assignment assign operation on the second intermediate result M″y obtained by SCy and the data block M3x, to obtain the processing result of the to-be-processed data set. The second intermediate result here is the result obtained by performing the reduce2 operation on the first intermediate results from data blocks M1x located in the same row and data blocks M2x located in the same column of the to-be-processed data set.
  • The second intermediate result M″y and the data block M3x are each matrices containing m rows and p columns of data; performing the assign operation on the corresponding elements yields a result matrix, which is also a matrix containing m rows and p columns of data. The assignment processing here may be the assign processing introduced above, as sketched below.
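  A minimal Scala sketch of this element-wise assign step follows; the function name and the dense representation are assumptions for illustration, and assign is passed in as a parameter because the embodiment does not fix one particular operation:

    // Combines the second intermediate result (m x p) with data block M3x
    // (m x p) element by element to produce the m x p result matrix.
    def assignBlocks(second: Array[Array[Double]], m3x: Array[Array[Double]],
                     assign: (Double, Double) => Double): Array[Array[Double]] =
      second.zip(m3x).map { case (rowA, rowB) =>
        rowA.zip(rowB).map { case (a, b) => assign(a, b) }
      }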
  • The second type of computing node in the data processing system 300 is further configured to perform row processing on the r-th row of the processing result of the to-be-processed data set, the row processing being processing performed on the elements of the r-th row.
  • the result matrix D can be further processed, for example, the elements of the rth row of the result matrix D are subjected to reduction processing.
  • This can be expressed by the formula reduceRow(Di1, ..., Din), where the reduceRow processing may take the maximum value, take the minimum value, take the largest Q values, take the smallest Q values, sum the data of the row, and so on.
  • the embodiment of the present invention does not limit this.
  • The result obtained by the reduceRow processing may still be stored in the corresponding matrix form. For example, if maximum-value processing is performed on the i-th row of the result matrix D and the maximum value is Di1, then the first column of the i-th row of the stored matrix stores the value Di1, and the other columns store the value 0 (or the 0s are not stored).
  • Alternatively, the result obtained by the reduceRow processing may store only the value obtained after the processing. For example, if summation processing is performed on the i-th row of the result matrix D and the result of the summation is Y, then the value Y is stored. The embodiment of the present invention does not limit the storage method. Two reduceRow variants are sketched below.
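  Two of the reduceRow variants named above, sketched in Scala; the names are illustrative, and a dense Array[Array[Double]] result matrix is assumed:

    // Sum of each row of the result matrix D.
    def reduceRowSum(d: Array[Array[Double]]): Array[Double] =
      d.map(_.sum)

    // The largest Q values of each row, in descending order.
    def reduceRowTopQ(d: Array[Array[Double]], q: Int): Array[Array[Double]] =
      d.map(_.sorted(Ordering[Double].reverse).take(q))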
  • Similarly, the second type of computing nodes in data processing system 300 are also used for column processing of the c-th column of the processing result of the to-be-processed data set, the column processing being processing performed on the elements of the c-th column. To avoid repetition, details are not repeated here.
  • For example, the data block M1x is a matrix of 3 rows and 4 columns and the data block M2x is a matrix of 4 rows and 4 columns; performing the combine2 and reduce2 operations yields a matrix of 3 rows and 4 columns. That matrix of 3 rows and 4 columns and the data block M3x are then subjected to assign processing to obtain a result matrix, so the data block M1x and the data block M3x can be the same data block.
  • When n = m, the data block M2x and the data block M3x may be the same data block. It should be understood that, in the embodiment of the present invention, at least one of the data block M1x, the data block M2x, and the data block M3x participating in the operation may be transposed to satisfy the operation; therefore, the data block M2x and the data block M3x can be the same data block.
  • When at least two computing nodes are included in the second type of computing node, the at least two computing nodes of the second type of computing node process the second processing task allocated by the management node in parallel.
  • the management node, the first type of computing node, and the second type of computing node comprise a physical machine, a virtual machine, or a central processing unit CPU.
  • FIG. 12 shows a schematic flow diagram of a method 400 of data processing by data processing system 300 in accordance with an embodiment of the present invention. As shown in FIG. 12, the method 400 includes the following steps:
  • The data blocks M1 and M2 are partitioned into blocks: the data block M1 is divided into a plurality of data blocks M1x, where each data block M1x is a matrix containing m rows and n columns of data, and the data block M2 is divided into a plurality of data blocks M2x, where each data block M2x is a matrix containing n rows and p columns of data.
  • S404 Perform local combine2 processing and local reduce2 processing on each of the first type of computing nodes.
  • The data block M1x and the data block M2x are subjected to combine2 processing; before all the intermediate results corresponding to a row of the data block M1x and the corresponding column of the data block M2x have been obtained, reduce2 processing is performed on the intermediate results to obtain a first intermediate result, and reduce2 processing is then performed on the first intermediate result and each newly obtained intermediate result to obtain a new first intermediate result.
  • The final first intermediate result is the result after all the intermediate results corresponding to a certain row of the data block M1x and the corresponding column of the data block M2x have undergone reduce2 processing. The first intermediate results similarly obtained for all combinations of a row of the data block M1x and a column of the data block M2x form a first intermediate result matrix.
  • At least two first intermediate results, corresponding to data blocks in the same row of the data block M1 and data blocks in the same column of the data block M2, are subjected to reduce2 processing to obtain a second intermediate result.
  • a plurality of second intermediate results form an intermediate matrix X.
  • the intermediate matrix X is partitioned and distributed to at least one compute node.
  • The data block M3 is partitioned into blocks, and the data blocks M3x of the data block M3 are distributed to the computing nodes where the corresponding matrix blocks of the intermediate matrix X are located, where each data block M3x is a matrix containing m rows and p columns of data.
  • S409: Each matrix block of the result matrix D is subjected to reduceRow processing by rows.
  • S410: Perform data transmission: transmit the results obtained in S409, and perform reduceRow processing on the results of the matrix blocks corresponding to the same row to obtain a matrix Y.
  • S411: Each matrix block of the matrix Y is subjected to reduceCol processing by columns.
  • S412: Perform data transmission: transmit the results obtained in S411, and perform reduceCol processing on the results of the matrix blocks corresponding to the same column to obtain a matrix Z.
  • When performing the merge operation and the reduction operation on data blocks, the data processing system provided by the embodiment of the present invention does not need to wait for all the merge operations to complete before performing the reduction operation; instead, the merge operation and the reduction operation alternate, which saves the memory space occupied by calculation and reduces computation time.
  • Probability propagation is a type of recommendation algorithm: for a "user-item" interaction record database, several items that each user may be interested in are recommended. Probability propagation is based on global data and calculates the potential items of interest of all users at once. The algorithm has a solid theoretical basis: probability propagation is derived from the "law of conservation of energy" in physics. The matrix operation is similar to the propagation of energy between different materials; the sums of corresponding rows of the resulting interest matrix and the original matrix are exactly equal, and the sums of corresponding columns are also equal, reflecting the conservation of energy.
  • the probability propagation algorithm can be implemented by matrix operation.
  • In the existing matrix implementation of the probability propagation algorithm, a "movie-to-user attraction" matrix and an "inter-user similarity" matrix are first obtained; then, a matrix-matrix multiplication operation produces a "new movie-to-user attraction" matrix; then, the movies the user has already watched are screened out; finally, only the top-k unwatched movies are recommended for each user.
  • Because the probability propagation algorithm recommends only a limited number of movies for each user, the final result matrix is sparse (most elements are 0), and in general the amount of data is not large (in this scenario, 0 elements are not stored).
  • However, the "new movie-to-user attraction" matrix obtained by existing schemes during the calculation is often very dense and very large, which results in heavy intermediate memory usage and a large amount of data transfer in the system.
  • In the probability propagation algorithm implemented by the matrix-matrix multiplication operation of the embodiment of the present invention, the number of movies in the original data set is m, the number of users is n, and the top-k movies are recommended for each user.
  • the first matrix A is a "movie-to-user attraction" matrix of m rows and n columns
  • the second matrix B is an n-row, n-column "inter-user similarity" matrix
  • the third matrix C used for the assign operation is the same matrix as the first matrix A.
  • The formula for recommending the top-k movies that each user has not watched corresponds to the following specific calculation process: first, the elements of the first matrix A and the elements of the second matrix B are subjected to combine2 processing and reduce2 processing.
  • the reduce2 processing is performed when all intermediate results of the combine2 processing of the elements of the first matrix A and the elements of the second matrix B have not been obtained, that is, the combine2 processing and the reduce2 processing are alternately performed.
  • In this way, the first intermediate result corresponding to the first matrix A and the second matrix B can be obtained.
  • the first intermediate result and the third matrix C for assign are subjected to an assign process to obtain a result matrix D.
  • the reduceCol processing is performed on each column of the result matrix D.
  • The assign processing here acts as "screening": if the element at the corresponding position in the third matrix C used for the assign processing is not zero, the element is screened out (the element is set to 0). That is, if the user has not watched the movie, the data is retained; if the user has watched the movie, the data is filtered out (the element is set to 0) before the reduceCol processing is performed.
  • The main flow of the probability propagation algorithm implemented by the matrix-matrix multiplication operation of the embodiment of the present invention is shown in FIG. 13. Performing combine2 and reduce2 processing on one row of the first matrix A and one column of the second matrix B yields a value. Then, where the value at the corresponding position of the third matrix C used for the assign processing is 1, the corresponding element of the assign result matrix D is set to 0 (if 0 elements are not stored in the system, the corresponding combine2 and reduce2 processing may be skipped). Finally, top-k processing is performed on the obtained elements of the same column to obtain the movies with top-k attraction among the movies the user has not watched, as sketched below.
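  A compact Scala sketch of this screening-plus-top-k step; it assumes a dense movies-by-users representation and illustrative names, not the storage format of the embodiment:

    // d: "new attraction" matrix (movies x users); c: watched-record matrix
    // of the same shape. Watched entries (nonzero in c) are screened out,
    // and the k most attractive unwatched movies are kept per user.
    def recommend(d: Array[Array[Double]], c: Array[Array[Double]],
                  k: Int): Map[Int, Seq[Int]] = {
      val users = d.head.indices
      users.map { u =>
        val picks = d.indices
          .filter(movie => c(movie)(u) == 0.0) // keep only unwatched movies
          .sortBy(movie => -d(movie)(u))       // rank by attraction
          .take(k)
        u -> picks
      }.toMap
    }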
  • When the probability propagation algorithm uses the matrix-matrix multiplication operation to calculate the "new movie-to-user attraction", it can screen out the already-watched movies at the same time (records that need to be screened out need not even be counted), and simultaneously perform the recommendation of the top-k movies by user score ranking, thereby reducing the intermediate memory usage and the amount of data transferred in the system.
  • Optionally, an isCompute operator may be introduced in the matrix-vector multiplication operation and the matrix-matrix multiplication operation of the embodiment of the present invention to determine whether a row needs to be calculated. If a row does not need to be calculated, it is skipped and the next row is calculated; if it needs to be calculated, the combine2 operation and the reduce2 operation are performed according to the algorithm.
  • The isCompute operator in the matrix-vector multiplication operation may be a column vector whose dimension equals the number of rows of the data block Mx, and the isCompute operator in the matrix-matrix multiplication operation may be a matrix, which is not limited in the embodiment of the present invention. A sketch of the skipping behavior follows.
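  The skipping behavior of the isCompute operator, sketched in Scala for the matrix-vector case with an assumed Boolean column vector; the names are illustrative:

    // Rows flagged false are skipped outright, so no combine2/reduce2 work
    // is performed for them; rows flagged true are processed as usual.
    def rowResults(m: Array[Array[Double]], v: Array[Double],
                   isCompute: Array[Boolean],
                   combine2: (Double, Double) => Double,
                   reduce2: (Double, Double) => Double): Array[Option[Double]] =
      m.indices.map { i =>
        if (!isCompute(i)) None
        else Some(m(i).indices.map(j => combine2(m(i)(j), v(j))).reduce(reduce2))
      }.toArray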
  • The performance of the prior-art extended GIMV model and of the matrix-vector multiplication operation of the embodiment of the present invention was compared on the general parallel computing framework Spark.
  • the test environment is a cluster of 3 machines (3 RH2285, 12 cores, 24 threads, 192g memory, 100g configuration).
  • The test data is the wiki_talk data set; the test results indicate that the calculation time of the prior-art extended GIMV model exceeds 3600 s, while the matrix-vector multiplication operation of the embodiment of the present invention takes 340 s.
  • the data set size and test results of the test are shown in Table 1.
  • the test data are data sets of Interactive Network Television (IPTV) and Netflix (Nasdaq NFLX, NETFLIX). It can be seen from the test results that the embodiment of the present invention can effectively reduce the intermediate memory occupation and shorten the calculation time, so that a larger data set can be processed.
  • A data processing system according to an embodiment of the present invention has been described in detail above with reference to FIGS. 1 through 13; a computing node in a data processing system according to an embodiment of the present invention is described in detail below with reference to FIGS. 14 and 15.
  • FIG. 14 illustrates a computing node 500 that belongs to a data processing system, the data processing system further includes a management node, and the computing node 500 includes:
  • the receiving module 501 is configured to receive a first processing task allocated by the management node
  • the obtaining module 502 is configured to acquire the data block M x and the data block V 1x in the data set to be processed according to the first processing task allocated by the management node received by the receiving module 501, where the data block M x is included a matrix of m rows and n columns of data, the data block V 1x is a vector containing n-dimensional data, m and n are positive integers, and the value of n is not less than 2;
  • the processing module is configured to perform a merge combine2 operation and a reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, where the element v′i of V′x is obtained according to v′i,j = reduce2(v′i,j-1, combine2(mi,j, vj)), mi,j is an element of the data block Mx, vj is an element of the data block V1x, j is a variable, and the values of j are from 1 to n.
  • the computing node is a physical machine, a virtual machine, or a central processing unit CPU, which is not limited in this embodiment of the present invention.
  • When performing the merge operation and the reduction operation on data blocks, the computing node provided by the embodiment of the present invention does not need to wait for all the merge operations to complete before performing the reduction operation; instead, the merge operation and the reduction operation alternate, which can save the memory space occupied by calculation and reduce the calculation time.
  • Figure 15 illustrates a computing node 600 that is part of a data processing system, the data processing system further including a management node, the computing node comprising:
  • the receiving module 601 is configured to receive a first processing task allocated by the management node
  • the obtaining module 602 is configured to acquire the data block M1x and the data block M2x in the to-be-processed data set according to the first processing task allocated by the management node and received by the receiving module 601, where the data block M1x is a matrix containing m rows and n columns of data, the data block M2x is a matrix containing n rows and p columns of data, m, n and p are positive integers, and the value of n is not less than 2;
  • the processing module is configured to perform a merge combine2 operation and a reduction reduce2 operation on the data block M1x and the data block M2x to obtain a first intermediate result M′x, where the first intermediate result M′x is a matrix containing m rows and p columns of data whose elements m′i,j are obtained according to m′i,j,k = reduce2(m′i,j,k-1, combine2(m1[i,k], m2[k,j])), m1[i,k] is the element in row i and column k of the data block M1x, m2[k,j] is the element in row k and column j of the data block M2x, and k is a variable whose values are from 1 to n.
  • the computing node is a physical machine, a virtual machine, or a central processing unit CPU, which is not limited in this embodiment of the present invention.
  • When performing the merge operation and the reduction operation on data blocks, the computing node provided by the embodiment of the present invention does not need to wait for all the merge operations to complete before performing the reduction operation; instead, the merge operation and the reduction operation alternate, which can save the memory space occupied by calculation and reduce the calculation time.
  • an embodiment of the present invention further provides a computing node 700, which includes a processor 701, a memory 702, a bus system 703, and a transceiver 704.
  • The processor 701, the memory 702, and the transceiver 704 are connected through the bus system 703.
  • the memory 702 is for storing instructions
  • the processor 701 is for executing instructions stored by the memory 702.
  • the transceiver 704 is configured to receive the first processing task allocated by the management node, and to acquire, according to the first processing task, the data block Mx and the data block V1x in the to-be-processed data set, where the data block Mx is a matrix containing m rows and n columns of data and the data block V1x is a vector containing n-dimensional data;
  • the processor 701 is configured to perform the merge combine2 operation and the reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, the first intermediate result V′x being a vector containing m-dimensional data.
  • the computing node provided by the embodiment of the present invention does not need to wait for all the merge operations to complete the reduction operation after performing the merge operation and the reduction operation on the data block, but the merge operation and the reduction operation are alternated. This can save the memory space occupied by the calculation and reduce the calculation time.
  • The processor 701 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 702 can include read only memory and random access memory and provides instructions and data to the processor 701. A portion of the memory 702 can also include a non-volatile random access memory. For example, the memory 702 can also store information of the device type.
  • the bus system 703 may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. However, for clarity of description, various buses are labeled as bus system 703 in the figure.
  • each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 701 or an instruction in a form of software.
  • The steps of the method disclosed in the embodiments of the present invention may be directly embodied as being performed by a hardware processor, or performed by a combination of hardware and software modules in the processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory 702, and the processor 701 reads the information in the memory 702 and completes the steps of the above method in combination with its hardware. To avoid repetition, it will not be described in detail here.
  • the computing node 700 is a physical machine, a virtual machine, or a central processing unit CPU.
  • The computing node 700 may correspond to the execution body of the method in the embodiment of the present invention, and may also correspond to the computing node 500 according to an embodiment of the present invention; the foregoing and other operations and/or functions of the modules of the computing node 700 are respectively intended to implement the corresponding processes of the data processing method, and are not described here again for brevity.
  • When performing the merge operation and the reduction operation on data blocks, the computing node provided by the embodiment of the present invention does not need to wait for all the merge operations to complete before performing the reduction operation; instead, the merge operation and the reduction operation alternate, which can save the memory space occupied by calculation and reduce the calculation time.
  • an embodiment of the present invention further provides a computing node 800, which includes a processor 801, a memory 802, a bus system 803, and a transceiver 804.
  • The processor 801, the memory 802, and the transceiver 804 are connected through the bus system 803.
  • the memory 802 is for storing instructions
  • the processor 801 is for executing instructions stored by the memory 802.
  • the transceiver 804 is configured to receive the first processing task allocated by the management node, and to acquire, according to the first processing task, the data block M1x and the data block M2x in the to-be-processed data set, where:
  • the data block M 1x is a matrix including m rows and n columns of data
  • the data block M 2x is a matrix containing n rows and p columns of data
  • m, n and p are positive integers, and the value of n is not less than 2;
  • the processor 801 is configured to perform a merge combine2 operation and a reduction reduce2 operation on the data block M1x and the data block M2x to obtain a first intermediate result M′x, the first intermediate result M′x being a matrix containing m rows and p columns of data.
  • When performing the merge operation and the reduction operation on data blocks, the computing node provided by the embodiment of the present invention does not need to wait for all the merge operations to complete before performing the reduction operation; instead, the merge operation and the reduction operation alternate, which can save the memory space occupied by calculation and reduce the calculation time.
  • The processor 801 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 802 can include read only memory and random access memory and provides instructions and data to the processor 801. A portion of the memory 802 may also include a non-volatile random access memory. For example, the memory 802 can also store information of the device type.
  • the bus system 803 may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. However, for clarity of description, various buses are labeled as bus system 803 in the figure.
  • each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 801 or an instruction in a form of software.
  • The steps of the method disclosed in the embodiments of the present invention may be directly embodied as being performed by a hardware processor, or performed by a combination of hardware and software modules in the processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in memory 802, and processor 801 reads the information in memory 802 and, in conjunction with its hardware, performs the steps of the above method. To avoid repetition, it will not be described in detail here.
  • the computing node 800 is a physical machine, a virtual machine, or a central processing unit CPU.
  • The computing node 800 may correspond to the execution body of the method in the embodiment of the present invention, and may also correspond to the computing node 600 according to an embodiment of the present invention; the foregoing and other operations and/or functions of the modules of the computing node 800 are respectively intended to implement the corresponding processes of the data processing method, and are not described here again for brevity.
  • When performing the merge operation and the reduction operation on data blocks, the computing node provided by the embodiment of the present invention does not need to wait for all the merge operations to complete before performing the reduction operation; instead, the merge operation and the reduction operation alternate, which can save the memory space occupied by calculation and reduce the calculation time.
  • a data processing system and a computing node according to an embodiment of the present invention are described in detail above with reference to FIGS. 1 through 17, and a method of data processing according to an embodiment of the present invention will be described in detail below with reference to FIGS. 18 through 19.
  • FIG. 18 illustrates a method 900 of data processing in accordance with an embodiment of the present invention, where the method 900 is applied to a data processing system including a management node and a first type of computing node. The method 900 includes:
  • the management node allocates a first processing task to at least two computing nodes including FCx in the first type of computing node, where FCx is the x-th computing node of the at least two computing nodes, x is a positive integer, and the at least two computing nodes in the first type of computing node process the first processing task allocated by the management node in parallel;
  • the computing node FCx acquires the data block Mx and the data block V1x in the to-be-processed data set according to the first processing task allocated by the management node, where the data block Mx is a matrix containing m rows and n columns of data, the data block V1x is a vector containing n-dimensional data, m and n are positive integers, and the value of n is not less than 2;
  • the computing node FCx performs a merge combine2 operation and a reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, where the first intermediate result V′x is a vector containing m-dimensional data whose elements v′i (i from 1 to m) satisfy v′i = v′i,n, with v′i,n obtained according to v′i,j = reduce2(v′i,j-1, combine2(mi,j, vj)), mi,j being an element of the data block Mx, vj an element of the data block V1x, and j a variable from 1 to n;
  • the management node obtains a processing result of the to-be-processed data set according to the first intermediate result obtained by at least two computing nodes in the first type of computing node.
  • When performing the merge operation and the reduction operation on data blocks, the data processing method provided by the embodiment of the present invention does not need to wait for all the merge operations to complete before performing the reduction operation; instead, the merge operation and the reduction operation alternate, which saves the memory space occupied by calculation and reduces the calculation time.
  • the data processing system further includes at least one second type of computing node
  • the method 900 further includes:
  • the management node allocates a second processing task to the at least one computing node including the SC y in the second type of computing node according to the first intermediate result obtained by the at least two computing nodes of the first type of computing node, where SC y is the yth computing node of the at least one computing node, and y is a positive integer;
  • the computing node SC y obtains a first intermediate result obtained by at least two computing nodes of the first type of computing node according to the second processing task, where the first intermediate result obtained by the SC y is based on the to-be-processed data set The first intermediate result obtained by the data block located in the same row;
  • the computing node SC y performs a reduce2 operation on the first intermediate result obtained by the SC y to obtain a second intermediate result V′′ y , wherein the second intermediate result V′′ y is a vector containing m-dimensional data;
  • the management node obtains the processing result of the to-be-processed data set according to the second intermediate results obtained by at least one computing node in the second type of computing node.
  • the data set further includes a data block V 2x , where the data block V 2x is a vector including m-dimensional data
  • the method 900 further includes:
  • the management node allocates, according to the second intermediate result obtained by the at least one computing node including SCy in the second type of computing node, a third processing task to the at least one computing node including SCy in the second type of computing node;
  • the computing node SC y acquires the data block V 2x in the data set according to the third processing task
  • the computing node SCy performs an assignment assign operation on the second intermediate result V″y obtained by SCy and the data block V2x, to obtain the processing result of the to-be-processed data set.
  • Optionally, when m = n, the data block V1x and the data block V2x are the same data block.
  • When at least two computing nodes are included in the second type of computing node, the at least two computing nodes of the second type of computing node process the second processing task allocated by the management node in parallel.
  • the management node, the first type of computing node, and the second type of computing node comprise a physical machine, a virtual machine, or a central processing unit CPU.
  • When performing the merge operation and the reduction operation on data blocks, the data processing method provided by the embodiment of the present invention does not need to wait for all the merge operations to complete before performing the reduction operation; instead, the merge operation and the reduction operation alternate, which saves the memory space occupied by calculation and reduces the calculation time.
  • FIG. 19 illustrates a method 1000 of data processing in accordance with an embodiment of the present invention, where the method 1000 is applied to a data processing system including a management node and a first type of computing node. The method 1000 includes:
  • the management node allocates a first processing task to at least two computing nodes including FCx in the first type of computing node, where FCx is the x-th computing node of the at least two computing nodes, x is a positive integer, and the at least two computing nodes in the first type of computing node process the first processing task allocated by the management node in parallel;
  • the computing node FCx acquires the data block M1x and the data block M2x in the to-be-processed data set according to the first processing task allocated by the management node, where the data block M1x is a matrix containing m rows and n columns of data, the data block M2x is a matrix containing n rows and p columns of data, m, n and p are positive integers, and the value of n is not less than 2;
  • the computing node FCx performs a merge combine2 operation and a reduction reduce2 operation on the data block M1x and the data block M2x to obtain a first intermediate result M′x, where the first intermediate result M′x is a matrix containing m rows and p columns of data whose elements m′i,j (i from 1 to m, j from 1 to p) satisfy m′i,j = m′i,j,n, with m′i,j,n obtained according to m′i,j,k = reduce2(m′i,j,k-1, combine2(m1[i,k], m2[k,j])), m1[i,k] being the element in row i and column k of the data block M1x, m2[k,j] the element in row k and column j of the data block M2x, and k a variable from 1 to n;
  • the management node obtains a processing result of the to-be-processed data set according to a first intermediate result obtained by at least two computing nodes of the first type of computing nodes.
  • When performing the merge operation and the reduction operation on data blocks, the data processing method provided by the embodiment of the present invention does not need to wait for all the merge operations to complete before performing the reduction operation; instead, the merge operation and the reduction operation alternate, which saves the memory space occupied by calculation and reduces the calculation time.
  • the data processing system further includes at least one second type of computing node
  • the method 1000 further includes:
  • the management node allocates a second processing task to the at least one computing node including the SC y in the second type of computing node according to the first intermediate result obtained by the at least two computing nodes of the first type of computing node, where SC y is the yth computing node of the at least one computing node, and y is a positive integer;
  • the computing node SCy obtains, according to the second processing task, the first intermediate results obtained by at least two computing nodes of the first type of computing node, where the first intermediate results obtained by SCy are those obtained from the data blocks M1x located in the same row and the data blocks M2x located in the same column of the to-be-processed data set;
  • the computing node SC y performs a reduce2 operation on the first intermediate result obtained by the SC y to obtain a second intermediate result M′′ y , wherein the second intermediate result M′′ y is a matrix including m rows and p columns of data;
  • the management node obtains the processing result of the to-be-processed data set according to the second intermediate results obtained by at least one computing node in the second type of computing node.
  • the data set further includes a data block M 3x , where the data block M 3x is a matrix including m rows and p columns of data, the method 1000 further includes:
  • the management node allocates, according to the second intermediate result obtained by the at least one computing node including SCy in the second type of computing node, a third processing task to the at least one computing node including SCy in the second type of computing node;
  • the computing node SCy acquires the data block M3x in the data set according to the third processing task;
  • the computing node SCy performs an assignment assign operation on the second intermediate result M″y obtained by SCy and the data block M3x, to obtain the processing result of the to-be-processed data set.
  • Optionally, when n = m, the data block M2x and the data block M3x are the same data block.
  • Optionally, when n = p, the data block M1x and the data block M3x are the same data block.
  • When at least two computing nodes are included in the second type of computing node, the at least two computing nodes of the second type of computing node process the second processing task allocated by the management node in parallel.
  • the management node, the first type of computing node, and the second type of computing node comprise a physical machine, a virtual machine, or a central processing unit CPU.
  • When performing the merge operation and the reduction operation on data blocks, the data processing method provided by the embodiment of the present invention does not need to wait for all the merge operations to complete before performing the reduction operation; instead, the merge operation and the reduction operation alternate, which saves the memory space occupied by calculation and reduces the calculation time.
  • The foregoing storage medium may include various non-transitory machine-readable media that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a random access memory (RAM), a solid state disk (SSD), or a non-volatile memory. The foregoing descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto.


Abstract

A data processing system, a computing node, and a data processing method. The data processing system (100) includes a management node (110) and a first type of computing node (120). The management node (110) is configured to allocate a first processing task to the first type of computing node (120); at least two computing nodes (121) in the first type of computing node (120) process, in parallel, the first processing task allocated by the management node (110); a computing node (121) performs a merge combine2 operation and a reduction reduce2 operation on a data block Mx and a data block V1x to obtain a first intermediate result; and the management node (110) obtains a processing result of the to-be-processed data set according to the first intermediate results obtained by the first type of computing node (120). When the data processing system (100) performs merge operations and reduction operations on data blocks, it does not need to wait for all the merge operations to complete before performing the reduction operations; instead, the merge operations and the reduction operations alternate, which can save the memory space occupied by calculation and reduce calculation time.

Description

Data processing system, computing node and data processing method

Technical Field

The present invention relates to the field of computer technologies, and in particular to a data processing system, a computing node, and a data processing method in the field of graph computing.

Background

With the continuous development of Information Communication Technology (ICT), the data generated in information networks is growing explosively. Through data mining and machine learning on these data, a large amount of valuable information can be obtained. The research object of data mining and machine learning is usually a set of objects and the relationships between these objects (for example, a social network); such research objects can be represented in the mathematical form of a graph. A graph describes relationships between objects. Intuitively, a graph can consist of small dots and lines connecting the dots; a dot is called a vertex of the graph, and a line connecting dots is called an edge.

Therefore, data mining and machine learning algorithms can be transformed into operations on graphs, that is, graph computing. To operate on a graph, a data structure must be chosen to represent it. At present, there are two main ways of representing a graph: the adjacency list and the adjacency matrix. An adjacency list uses objects to represent vertices and pointers or references to represent edges; this data structure is not conducive to parallel processing of the graph. An adjacency matrix, referred to herein simply as a matrix, uses a two-dimensional matrix to store the adjacency relationships between vertices; with this data structure, the graph can be processed in parallel very well, and when data is stored as a matrix, the amount of stored data is small.

Matrix computation in graph computing can theoretically include matrix-vector multiplication operations and matrix-matrix multiplication operations. In an existing matrix-vector multiplication operation, for example Generalized Iterated Matrix-Vector multiplication (GIMV), the elements of a row of the matrix are pairwise merged with the elements of the vector, and only after all pairwise merge operations of that row's elements with the vector's elements are completed is a global merge operation performed on the results. Consequently, intermediate memory space of the size of a whole matrix is occupied during the calculation, which places higher requirements on system hardware; moreover, if matrix-vector multiplication is applied in a distributed environment, the system needs to perform a large amount of data transmission, so that the calculation takes a great deal of time.
Summary

Embodiments of the present invention provide a data processing system, a computing node, and a data processing method, which can make the memory space occupied during data processing smaller and reduce calculation time.

According to a first aspect, an embodiment of the present invention provides a data processing system, where the data processing system includes a management node and a first type of computing node.

The management node is configured to allocate a first processing task to at least two computing nodes including FCx in the first type of computing node, where FCx is the x-th computing node of the at least two computing nodes and x is a positive integer; the at least two computing nodes in the first type of computing node process the first processing task allocated by the management node in parallel.

The computing node FCx is configured to acquire, according to the first processing task allocated by the management node, a data block Mx and a data block V1x in a to-be-processed data set, where the data block Mx is a matrix containing m rows and n columns of data, the data block V1x is a vector containing n-dimensional data, m and n are positive integers, and the value of n is not less than 2; and to perform a merge combine2 operation and a reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, where the first intermediate result V′x is a vector containing m-dimensional data, the elements of V′x are v′i, i is a variable whose values are from 1 to m, v′i = v′i,n, and v′i,n is obtained according to v′i,j = reduce2(v′i,j-1, combine2(mi,j, vj)), where mi,j is an element of the data block Mx, vj is an element of the data block V1x, and j is a variable whose values are from 1 to n.

The management node is further configured to obtain a processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes in the first type of computing node.

With reference to the first aspect, in a first possible implementation of the first aspect, the data processing system further includes a second type of computing node, and the management node is specifically configured to allocate, according to the first intermediate results obtained by the at least two computing nodes in the first type of computing node, a second processing task to at least one computing node including SCy in the second type of computing node, where SCy is the y-th computing node of the at least one computing node and y is a positive integer.

The computing node SCy is configured to acquire, according to the second processing task, the first intermediate results obtained by the at least two computing nodes in the first type of computing node, where the first intermediate results acquired by SCy are the first intermediate results obtained from the data blocks located in the same row of the to-be-processed data set; and to perform the reduce2 operation on the first intermediate results acquired by SCy to obtain a second intermediate result V″y, where the second intermediate result V″y is a vector containing m-dimensional data.

The management node is specifically configured to obtain the processing result of the to-be-processed data set according to the second intermediate results obtained by the at least one computing node in the second type of computing node.

According to a second aspect, an embodiment of the present invention provides another data processing system, where the data processing system includes a management node and a first type of computing node.

The management node is configured to allocate a first processing task to at least two computing nodes including FCx in the first type of computing node, where FCx is the x-th computing node of the at least two computing nodes and x is a positive integer; the at least two computing nodes in the first type of computing node process the first processing task allocated by the management node in parallel.

The computing node FCx is configured to acquire, according to the first processing task allocated by the management node, a data block M1x and a data block M2x in a to-be-processed data set, where the data block M1x is a matrix containing m rows and n columns of data, the data block M2x is a matrix containing n rows and p columns of data, m, n and p are positive integers, and the value of n is not less than 2; and to perform a merge combine2 operation and a reduction reduce2 operation on the data block M1x and the data block M2x to obtain a first intermediate result M′x, where the first intermediate result M′x is a matrix containing m rows and p columns of data, the elements of M′x are m′i,j, i and j are variables with i from 1 to m and j from 1 to p, m′i,j = m′i,j,n, and m′i,j,n is obtained according to m′i,j,k = reduce2(m′i,j,k-1, combine2(m1[i,k], m2[k,j])), where m1[i,k] is the element in row i and column k of the data block M1x, m2[k,j] is the element in row k and column j of the data block M2x, and k is a variable whose values are from 1 to n.

The management node is further configured to obtain a processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes in the first type of computing node.

With reference to the second aspect, in a first possible implementation of the second aspect, the data processing system further includes a second type of computing node, and the management node is specifically configured to allocate, according to the first intermediate results obtained by the at least two computing nodes in the first type of computing node, a second processing task to at least one computing node including SCy in the second type of computing node, where SCy is the y-th computing node of the at least one computing node and y is a positive integer.

The computing node SCy is configured to acquire, according to the second processing task, the first intermediate results obtained by the at least two computing nodes in the first type of computing node, where the first intermediate results acquired by SCy are the first intermediate results obtained from the data blocks M1x located in the same row and the data blocks M2x located in the same column of the to-be-processed data set; and to perform the reduce2 operation on the first intermediate results acquired by SCy to obtain a second intermediate result M″y, where the second intermediate result M″y is a matrix containing m rows and p columns of data.

The management node is specifically configured to obtain the processing result of the to-be-processed data set according to the second intermediate results obtained by at least one computing node in the second type of computing node.

According to a third aspect, an embodiment of the present invention provides a computing node, including: a receiving module, configured to receive a first processing task allocated by a management node in a data processing system, where the data processing system includes the computing node and the management node; an obtaining module, configured to acquire, according to the first processing task allocated by the management node and received by the receiving module, a data block Mx and a data block V1x in a to-be-processed data set, where the data block Mx is a matrix containing m rows and n columns of data, the data block V1x is a vector containing n-dimensional data, m and n are positive integers, and the value of n is not less than 2; and a processing module, configured to perform a merge combine2 operation and a reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, where the first intermediate result V′x is a vector containing m-dimensional data whose elements v′i (i from 1 to m) satisfy v′i = v′i,n, with v′i,n obtained according to v′i,j = reduce2(v′i,j-1, combine2(mi,j, vj)), mi,j being an element of the data block Mx, vj an element of the data block V1x, and j a variable from 1 to n.

With reference to the third aspect, in a first possible implementation of the third aspect, the computing node includes a physical machine, a virtual machine, or a central processing unit (CPU).

According to a fourth aspect, an embodiment of the present invention provides another computing node, including: a receiving module, configured to receive a first processing task allocated by a management node in a data processing system, where the data processing system includes the computing node and the management node; an obtaining module, configured to acquire, according to the first processing task allocated by the management node and received by the receiving module, a data block M1x and a data block M2x in a to-be-processed data set, where the data block M1x is a matrix containing m rows and n columns of data, the data block M2x is a matrix containing n rows and p columns of data, m, n and p are positive integers, and the value of n is not less than 2; and a processing module, configured to perform a merge combine2 operation and a reduction reduce2 operation on the data block M1x and the data block M2x to obtain a first intermediate result M′x, where the first intermediate result M′x is a matrix containing m rows and p columns of data whose elements m′i,j (i from 1 to m, j from 1 to p) satisfy m′i,j = m′i,j,n, with m′i,j,n obtained according to m′i,j,k = reduce2(m′i,j,k-1, combine2(m1[i,k], m2[k,j])), m1[i,k] being the element in row i and column k of the data block M1x, m2[k,j] the element in row k and column j of the data block M2x, and k a variable from 1 to n.

According to a fifth aspect, an embodiment of the present invention provides a data processing method applied to a data processing system, where the data processing system includes a management node and a first type of computing node, and the method includes: allocating, by the management node, a first processing task to at least two computing nodes including FCx in the first type of computing node, where FCx is the x-th computing node of the at least two computing nodes, x is a positive integer, and the at least two computing nodes in the first type of computing node process the first processing task allocated by the management node in parallel; acquiring, by the computing node FCx according to the first processing task allocated by the management node, a data block Mx and a data block V1x in a to-be-processed data set, where the data block Mx is a matrix containing m rows and n columns of data, the data block V1x is a vector containing n-dimensional data, m and n are positive integers, and the value of n is not less than 2; performing, by the computing node FCx, a merge combine2 operation and a reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, where the first intermediate result V′x is a vector containing m-dimensional data whose elements v′i (i from 1 to m) satisfy v′i = v′i,n, with v′i,n obtained according to v′i,j = reduce2(v′i,j-1, combine2(mi,j, vj)), mi,j being an element of the data block Mx, vj an element of the data block V1x, and j a variable from 1 to n; and obtaining, by the management node, a processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes in the first type of computing node.

With reference to the fifth aspect, in a first possible implementation of the fifth aspect, the data processing system further includes at least one second type of computing node, and the method further includes: allocating, by the management node according to the first intermediate results obtained by the at least two computing nodes in the first type of computing node, a second processing task to at least one computing node including SCy in the second type of computing node, where SCy is the y-th computing node of the at least one computing node and y is a positive integer; acquiring, by the computing node SCy according to the second processing task, the first intermediate results obtained by the at least two computing nodes in the first type of computing node, where the first intermediate results acquired by SCy are the first intermediate results obtained from the data blocks located in the same row of the to-be-processed data set; performing, by the computing node SCy, the reduce2 operation on the first intermediate results acquired by SCy to obtain a second intermediate result V″y, where the second intermediate result V″y is a vector containing m-dimensional data; and obtaining, by the management node, the processing result of the to-be-processed data set according to the second intermediate results obtained by at least one computing node in the second type of computing node.

With reference to the first possible implementation of the fifth aspect, in a second possible implementation, the data set further includes a data block V2x, where the data block V2x is a vector containing m-dimensional data, and the method further includes: allocating, by the management node according to the second intermediate results obtained by the at least one computing node including SCy in the second type of computing node, a third processing task to the at least one computing node including SCy in the second type of computing node; acquiring, by the computing node SCy, the data block V2x in the data set according to the third processing task; and performing, by the computing node SCy, an assignment assign operation on the second intermediate result V″y obtained by SCy and the data block V2x, to obtain the processing result of the to-be-processed data set.

According to a sixth aspect, an embodiment of the present invention provides another data processing method applied to a data processing system, where the data processing system includes a management node and a first type of computing node, and the method includes: allocating, by the management node, a first processing task to at least two computing nodes including FCx in the first type of computing node, where FCx is the x-th computing node of the at least two computing nodes, x is a positive integer, and the at least two computing nodes in the first type of computing node process the first processing task allocated by the management node in parallel; acquiring, by the computing node FCx according to the first processing task allocated by the management node, a data block M1x and a data block M2x in a to-be-processed data set, where the data block M1x is a matrix containing m rows and n columns of data, the data block M2x is a matrix containing n rows and p columns of data, m, n and p are positive integers, and the value of n is not less than 2; performing, by the computing node FCx, a merge combine2 operation and a reduction reduce2 operation on the data block M1x and the data block M2x to obtain a first intermediate result M′x, where the first intermediate result M′x is a matrix containing m rows and p columns of data whose elements m′i,j (i from 1 to m, j from 1 to p) satisfy m′i,j = m′i,j,n, with m′i,j,n obtained according to m′i,j,k = reduce2(m′i,j,k-1, combine2(m1[i,k], m2[k,j])), m1[i,k] being the element in row i and column k of the data block M1x, m2[k,j] the element in row k and column j of the data block M2x, and k a variable from 1 to n; and obtaining, by the management node, a processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes in the first type of computing node.

With reference to the sixth aspect, in a first possible implementation, the data processing system further includes a second type of computing node, and the method further includes: allocating, by the management node according to the first intermediate results obtained by the at least two computing nodes in the first type of computing node, a second processing task to at least one computing node including SCy in the second type of computing node, where SCy is the y-th computing node of the at least one computing node and y is a positive integer; acquiring, by the computing node SCy according to the second processing task, the first intermediate results obtained by the at least two computing nodes in the first type of computing node, where the first intermediate results acquired by SCy are the first intermediate results obtained from the data blocks M1x located in the same row and the data blocks M2x located in the same column of the to-be-processed data set; performing, by the computing node SCy, the reduce2 operation on the first intermediate results acquired by SCy to obtain a second intermediate result M″y, where the second intermediate result M″y is a matrix containing m rows and p columns of data; and obtaining, by the management node, the processing result of the to-be-processed data set according to the second intermediate results obtained by at least one computing node in the second type of computing node.

With reference to the first possible implementation of the sixth aspect, in a second possible implementation, the data set further includes a data block M3x, where the data block M3x is a matrix containing m rows and p columns of data, and the method further includes: allocating, by the management node according to the second intermediate results obtained by the at least one computing node including SCy in the second type of computing node, a third processing task to the at least one computing node including SCy in the second type of computing node; acquiring, by the computing node SCy, the data block M3x in the data set according to the third processing task; and performing, by the computing node SCy, an assignment assign operation on the second intermediate result M″y obtained by SCy and the data block M3x, to obtain the processing result of the to-be-processed data set.

Based on the foregoing technical solutions, with the data processing system, the computing node, and the data processing method provided by the embodiments of the present invention, when merge operations and reduction operations are performed on data blocks, there is no need to wait for all the merge operations to complete before performing the reduction operations; instead, the merge operations and the reduction operations alternate, which can save the memory space occupied by calculation and reduce calculation time.
Brief Description of Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments of the present invention or the prior art. Apparently, the accompanying drawings described below are only drawings of some embodiments of the present invention.

FIG. 1 is a schematic diagram of the concept "graph" involved in embodiments of the present invention.

FIG. 2 is another schematic diagram of the concept "graph" involved in embodiments of the present invention.

FIG. 3A is a schematic diagram of representing object relationships with a "directed graph"; FIG. 3B is the adjacency matrix corresponding to the "directed graph" of FIG. 3A.

FIG. 4 is a schematic diagram of a matrix-vector multiplication operation.

FIG. 5 is a schematic diagram of a matrix-matrix multiplication operation.

FIG. 6A is a schematic diagram of representing object relationships with a "weighted directed graph"; FIG. 6B is the adjacency matrix corresponding to the "weighted directed graph" of FIG. 6A.

FIG. 7 is a schematic diagram of another matrix-vector multiplication operation.

FIG. 8 is a schematic block diagram of a data processing system according to an embodiment of the present invention.

FIG. 9 is a schematic flowchart of data processing performed by a data processing system according to an embodiment of the present invention.

FIG. 10 is a schematic diagram of a Single Source Shortest Path (SSSP) algorithm according to an embodiment of the present invention.

FIG. 11 is a schematic block diagram of another data processing system according to an embodiment of the present invention.

FIG. 12 is a schematic flowchart of data processing performed by another data processing system according to an embodiment of the present invention.

FIG. 13 is a schematic diagram of a probability propagation algorithm executed by another data processing system according to an embodiment of the present invention.

FIG. 14 is a schematic block diagram of a computing node according to an embodiment of the present invention.

FIG. 15 is a schematic block diagram of another computing node according to an embodiment of the present invention.

FIG. 16 is a schematic block diagram of another computing node according to an embodiment of the present invention.

FIG. 17 is a schematic block diagram of another computing node according to an embodiment of the present invention.

FIG. 18 is a schematic flowchart of a data processing method according to an embodiment of the present invention.

FIG. 19 is a schematic flowchart of another data processing method according to an embodiment of the present invention.
Detailed Description

The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are some rather than all of the embodiments of the present invention.

For ease of understanding, the concepts and related technologies involved in this specification are briefly described first.

Graph:

A graph describes relationships between objects. Intuitively, a graph consists of small dots and lines connecting the dots; a dot is called a vertex of the graph, and a line connecting dots is called an edge. Edges can be divided into undirected edges as shown in FIG. 1 and directed edges as shown in FIG. 2. FIG. 1 contains six objects 1-6, and the mutual relationships among the six objects are represented by undirected edges between them; FIG. 2 contains seven objects 0-6, and the one-way relationships among the seven objects are represented by directed edges between them. When the relationships between objects are represented by directed edges, the graph may be called a directed graph.

Multiplication:

Herein, multiplication may denote generalized multiplication or multiplication of two numbers. For example, the multiplication in matrix-vector multiplication refers to generalized multiplication; that is, the operation between an element of the matrix and the corresponding element of the vector is not numeric multiplication in the traditional sense, but may be addition, subtraction, multiplication, division, summation, product, taking the maximum, taking the minimum, or other processing, which is not limited in the embodiments of the present invention.

Adjacency matrix:

An adjacency matrix, referred to herein simply as a matrix, uses a two-dimensional matrix to store the adjacency relationships between the vertices of a graph. As shown in FIG. 3A and FIG. 3B, FIG. 3A is a directed graph with six vertices V1-V6, where directed edges represent the mutual relationships among the six vertices; FIG. 3B is the corresponding adjacency-matrix representation. Representing a graph by an adjacency matrix allows the graph to be processed in parallel very well. Matrices fall into two kinds: dense matrices and sparse matrices. A dense matrix is generally represented by a vector or a two-dimensional vector (a vector of vectors), in row-major or column-major order; a sparse matrix (which does not store 0 elements) generally has three storage formats: COO, CSR, and CSC. The 0 elements in the matrix representation (or the infinity-valued elements, as in the SSSP algorithm) need not be stored, so representing a graph by a matrix can reduce the amount of stored data. Under the adjacency-matrix representation, most operations on a graph can be transformed into matrix-vector multiplication (that is, matrix-by-vector) operations or matrix-matrix multiplication (that is, matrix-by-matrix) operations.

Matrix-vector multiplication operation:

A matrix-vector multiplication operation refers to a series of operations between a matrix and a vector. An example of graph computation expressed as matrix-vector multiplication: finding all friends of V2 in the directed graph of FIG. 3A can be done by the matrix-vector multiplication shown in FIG. 4. First, a query vector is constructed; because the out-degree vertices (OutNeighbors) of V2 are sought, the 2nd element of the vector is set to 1 and the other elements are set to 0. Second, because OutNeighbors are sought, the adjacency matrix needs to be transposed. Finally, the transposed adjacency matrix is multiplied by the constructed vector (as shown in FIG. 4) to obtain the result vector; the 6th bit of the result vector is 1, which means V6 is the only friend of V2, and this result can be verified from FIG. 3A.

Matrix-matrix multiplication operation:

A matrix-matrix multiplication operation refers to a series of operations between a matrix and a matrix. An example of graph computation expressed as matrix-matrix multiplication: counting the number of common friends (out-degree vertices) between every pair of vertices in FIG. 3A can be implemented by one matrix-matrix multiplication. First, the adjacency matrix is constructed as shown in FIG. 3B; here it is denoted A. Then B = (bij) = A * A^T is computed (as shown in FIG. 5); the value of element bij of matrix B indicates how many common out-degree vertices the i-th vertex and the j-th vertex have. For example, the value in row 3, column 1 is 1, indicating that vertex 3 and vertex 1 have one common friend.

Distributed matrix:

A cluster environment includes multiple computing units used for calculation. Both matrix-vector multiplication operations and matrix-matrix multiplication operations operate on matrix blocks. A partitioned matrix can be called a distributed matrix, and a distributed matrix can be processed in parallel on multiple computing units.
Next, the matrix-vector multiplication model of the prior art is briefly introduced.

Traditional matrix operations target only numeric types, and the specific operations are fixed. For example, in traditional matrix-vector multiplication as in Formula 1, the operation between Mij and Vj can only be multiplication, and the result of multiplying a row of the matrix with the vector (each product denoted xj = Mij Vj) can only be accumulated. This greatly limits the algorithms that matrix operations can express; the SSSP algorithm, for example, cannot be expressed by traditional matrix-vector multiplication.

V'i = sum over j of (Mij Vj)    (Formula 1)

where M is an m-row, n-column matrix and Mij is a matrix block of M after partitioning; V is an n-dimensional column vector and Vj a vector block of V after partitioning; V' is an m-dimensional column vector.

In view of the limitations of the existing matrix-vector multiplication model, PEGASUS, a big-data processing system based on matrix-vector operations, proposed the GIMV model. The GIMV model extends traditional matrix-vector multiplication as follows:

1. The processing of Mij and Vj is extended to a merge operation combine2. combine2 is a merge operation on a matrix element and a vector element, whose types may differ; the operation may be addition, subtraction, multiplication, division, taking the maximum, taking the minimum, and so on, but the embodiment of the present invention is not limited thereto. combine2 returns an intermediate value xj.

2. A merge operation combineAll is applied to the combine2 results x1, ..., xn of a row of the matrix. combineAll is a merge operation on multiple values or a record set, generally a function (for example, accumulation); combineAll returns an intermediate value x̄i.

3. An assignment operation assign is performed on the x̄i obtained in this calculation and the element Vi of the initial vector V, yielding the value V'i of the element of this calculation's result vector.

The GIMV model takes a matrix M and a vector V as input and outputs a vector V' after three operations; its three operators are:

V' = M ×G V, where V'i = assign(Vi, combineAll(x1, ..., xn)) and xj = combine2(Mij, Vj)    (Formula 2)

1. combine2(Mij, Vj): merge Mij and Vj to obtain the intermediate result xj.

2. combineAll(x1, ..., xn): merge the intermediate results x1, ..., xn of a row of the matrix to obtain the intermediate result x̄i.

3. assign(Vi, x̄i): perform the assignment operation on the element Vi of the initial vector and the corresponding intermediate value x̄i to obtain the value V'i of the element of the result vector.

The GIMV model can express more algorithms. For example, the SSSP algorithm corresponds to: the combine2 operation is "addition", that is, combine2(Mij, Vj) = Mij + Vj; the combineAll operation is "taking the minimum", that is, combineAll(x1, ..., xn) = min(x1, ..., xn); and the assign operation is also "taking the minimum", that is, assign(Vi, x̄i) = min(Vi, x̄i).

FIG. 6A shows a weighted directed graph, and FIG. 6B the corresponding adjacency matrix M, where the values in matrix M are the distance weights between two points, the distance from a vertex to itself is 0, and unreachability is represented by infinity. Finding the shortest distances from vertex 0 to all other vertices can be implemented by iterated matrix-vector multiplication, where each iteration gives the shortest distances obtained by increasing the hop count by 1 (for example, going from vertex 1 to vertex 2 is 1 hop; going from vertex 1 through vertex 2 to vertex 3 is 2 hops).

The algorithm first constructs the initial vector V; starting from vertex 0, the value at position 0 (the element of row 1) of the initial vector V is 0, as shown by V in FIG. 7. In the calculation, the adjacency matrix M of FIG. 6B is transposed and multiplied by the initial vector V, and the three operators of the SSSP algorithm are applied: combine2(Mij, Vj) = Mij + Vj, combineAll(x1, ..., xn) = min(x1, ..., xn), and assign(Vi, x̄i) = min(Vi, x̄i). The result vector V' of the first multiplication represents the shortest distances from vertex 0 to the other vertices within 1 hop. The stopping condition of the SSSP iteration is: if the result vector V' of an iteration is unchanged compared with the initial vector V of that iteration, the algorithm terminates. If the stopping condition is not met, the algorithm continues iterating: the result vector V' of this iteration is used as the initial vector V of the next iteration. The final result vector V' obtained by the algorithm gives the shortest distances from vertex 0 to every vertex.
The execution flow of the GIMV model is as follows: the combineAll operation on all the combine2 results of a row of the matrix can only proceed after all the combine2 operations are completed. Therefore, during the calculation, intermediate memory space of the size of a whole matrix is occupied, and in a distributed environment the system needs to perform a large amount of data transmission. The assign operation operates on the intermediate vector obtained by combineAll and the initial vector V; therefore, the dimensions of the two vectors must be equal (in other words, the matrix M must be square), and the vector used by assign can only be the initial vector V, which also limits the expressive range of the GIMV model.

The present invention improves the data processing method precisely against the drawbacks of the above solution, so as to reduce the intermediate memory occupation and the amount of transmitted data during the matrix-vector multiplication operation of data processing; furthermore, based on a principle similar to the matrix-vector multiplication operation of the present invention, a model of the matrix-matrix multiplication operation is proposed, so that more algorithms can be expressed.
FIG. 8 shows a schematic block diagram of a data processing system 100 according to an embodiment of the present invention. As shown in FIG. 8, the data processing system 100 includes a management node 110 and a first type of computing node 120.

The management node 110 is configured to allocate a first processing task to at least two computing nodes including FCx 121 in the first type of computing node 120, where FCx 121 is the x-th computing node of the at least two computing nodes and x is a positive integer; the at least two computing nodes in the first type of computing node 120 process the first processing task allocated by the management node 110 in parallel.

The computing node FCx 121 is configured to acquire, according to the first processing task allocated by the management node 110, the data block Mx and the data block V1x in the to-be-processed data set, where the data block Mx is a matrix containing m rows and n columns of data, the data block V1x is a vector containing n-dimensional data, m and n are positive integers, and the value of n is not less than 2; and to perform a merge combine2 operation and a reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, where the first intermediate result V′x is a vector containing m-dimensional data whose elements v′i (i from 1 to m) satisfy v′i = v′i,n, with v′i,n obtained according to v′i,j = reduce2(v′i,j-1, combine2(mi,j, vj)), mi,j being an element of the data block Mx, vj an element of the data block V1x, and j a variable from 1 to n.

The management node 110 is further configured to obtain the processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes in the first type of computing node 120.

Therefore, when the data processing system provided by the embodiment of the present invention performs merge operations and reduction operations on data blocks, it does not need to wait for all the merge operations to complete before performing the reduction operations; instead, the merge operations and the reduction operations alternate, which can save the memory space occupied by calculation and reduce calculation time.

Specifically, the data processing system 100 provided by the embodiment of the present invention can be applied to big data processing. Because the amount of data in big data processing is large, the data is usually partitioned into blocks, and different data blocks are distributed to different computing nodes for parallel calculation, so as to improve calculation efficiency. The data processing system 100 includes a management node 110 and a first type of computing node 120. The management node 110 is configured to receive a data processing task, split the data processing task into multiple processing tasks, and distribute the processing tasks to computing nodes; it is further configured to receive the execution status of each computing node's processing task, so as to manage the data processing process. A computing node is configured to receive a processing task delivered by the management node 110 and acquire data blocks according to the processing task so as to execute it; a computing node may acquire data blocks stored on itself or data blocks stored on other computing nodes. Computing nodes can be classified according to the category of the tasks they process; for example, nodes processing the first processing task are first-type nodes, and nodes processing the second processing task are second-type nodes.

In this embodiment of the present invention, after receiving the first processing task allocated by the management node 110, the computing node FCx 121 acquires, according to the first processing task, the data block Mx and the data block V1x in the to-be-processed data set; the processing of the data block Mx and the data block V1x can be regarded as a matrix-vector multiplication operation.

Specifically, a merge operation is performed on the element in row i, column j of the data block Mx and the element in row j of the data block V1x to obtain the corresponding intermediate result xj. The merge operation here may be the combine2 processing introduced above, expressed by the formula xj = combine2(mi,j, vj).

Then, the reduce2 operation is performed on the intermediate results xj corresponding to row i of the data block Mx to obtain the element v′i corresponding to row i of the data block Mx; with i from 1 to m, the first intermediate result V′x can be obtained. The combine2 and reduce2 operations may first calculate x1 and x2 and perform a reduce2 operation on them; then calculate x3 and perform a reduce2 operation on the result of the previous reduce2 operation and x3; and so on, until all the intermediate results xj corresponding to row i of the data block Mx have passed through the reduce2 operation. The reduce2 operation does not wait until all the combine2 operations are completed; instead, the combine2 operation and the reduce2 operation alternate. In this way, an intermediate result xj that has passed through the reduce2 operation can be deleted during the calculation, without storing all the combine2 results in memory, thereby saving memory space.

It should be understood that the above process is essentially an update procedure: a reduce2 operation on two xj first yields an intermediate result, which is then updated by further reduce2 operations with other xj or other intermediate results, continuously updating the intermediate result.

The reduce2 operation here may be addition, subtraction, multiplication, division, taking the maximum, taking the minimum, and so on, but the embodiment of the present invention is not limited thereto. The reduce2 operation processes the intermediate results (such as x1, ..., xn) corresponding to the elements of a row of the matrix without waiting until all of x1, ..., xn have been calculated; the reduce2 processing unfolds gradually while the intermediate results xj are being calculated.

It should be understood that the advantage of the reduce2 operation is that the order of the elements in the reduce2 operation does not matter during the calculation; whatever the order, the obtained result is unique. For example, in the Scala language, for an array it = Array(0,1,2,3,4,5), summing the array can be expressed as it.reduce(_+_). At the bottom of the calculation, adding the data from left to right and performing pairwise reduce2 summation until the final value yield the same result. The description above, in which the result of the reduce2 operation on x1 and x2 is further reduced with x3, is only one implementation; the execution order of reduce2 is not limited to following x1, ..., xn in sequence, and performing reduce2 on any two of x1, ..., xn and then with the other xj achieves the same result as sequential execution. The embodiment of the present invention does not limit the order of reduce2 operations.
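For illustration, the alternation just described can be sketched in Scala for one row of the data block Mx; the iterator keeps at most one intermediate xj alive at a time, and the names are illustrative assumptions rather than part of the embodiment:

    // Folds row i of Mx against V1x: each x_j = combine2(...) is produced
    // lazily and immediately absorbed by reduce2, so no x_j array is stored.
    def rowFold(row: Array[Double], v: Array[Double],
                combine2: (Double, Double) => Double,
                reduce2: (Double, Double) => Double): Double =
      row.indices.iterator
        .map(j => combine2(row(j), v(j)))
        .reduce(reduce2)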
The management node 110 is further configured to obtain the processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes in the first type of computing node 120. After completing the first processing task, the first type of computing node 120 notifies the management node 110, and the management node 110 obtains the processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes in the first type of computing node 120, or uses the first intermediate results as basic data for other processing tasks and delivers to the corresponding computing nodes the processing tasks that use the first intermediate results for calculation.

Therefore, when the data processing system provided by the embodiment of the present invention performs merge operations and reduction operations on data blocks, it does not need to wait for all the merge operations to complete before performing the reduction operations; instead, the merge operations and the reduction operations alternate, which can save the memory space occupied by calculation and reduce calculation time.

Optionally, as an embodiment, the data processing system 100 further includes a second type of computing node, and the management node 110 is specifically configured to allocate, according to the first intermediate results obtained by the at least two computing nodes in the first type of computing node 120, a second processing task to at least one computing node including SCy in the second type of computing node, where SCy is the y-th computing node of the at least one computing node and y is a positive integer.

The computing node SCy is configured to acquire, according to the second processing task, the first intermediate results obtained by the at least two computing nodes in the first type of computing node 120, where the first intermediate results acquired by SCy are the first intermediate results obtained from data blocks located in the same row of the to-be-processed data set; and to perform the reduce2 operation on the first intermediate results acquired by SCy to obtain a second intermediate result V″y, where the second intermediate result V″y is a vector containing m-dimensional data.

The management node 110 is specifically configured to obtain the processing result of the to-be-processed data set according to the second intermediate results obtained by the at least one computing node in the second type of computing node.

Specifically, after the foregoing processing is completed, other processing can also be performed on the first intermediate results obtained by the first type of computing node 120. For example, when the data blocks Mx processed by the at least two computing nodes in the first type of computing node 120 are data blocks located in the same row of the to-be-processed data set, the management node 110 may allocate the second processing task as described above. Performing the reduce2 operation on the first intermediate results acquired by SCy is similar to the reduce2 operation described above: a reduce2 operation is first performed on two first intermediate results obtained from data blocks in the same row of the to-be-processed data set, and the result is then subjected to reduce2 operations with the other first intermediate results.

Optionally, as an embodiment, the data set further includes a data block V2x, where the data block V2x is a vector containing m-dimensional data, and the management node 110 is specifically further configured to allocate, according to the second intermediate results obtained by the at least one computing node including SCy in the second type of computing node, a third processing task to the at least one computing node including SCy in the second type of computing node.

The computing node SCy is further configured to acquire the data block V2x in the data set according to the third processing task, and to perform an assignment assign operation on the second intermediate result V″y obtained by SCy and the data block V2x, to obtain the processing result of the to-be-processed data set.

Specifically, the second intermediate result obtained by performing the reduce2 operation on the first intermediate results from data blocks located in the same row of the to-be-processed data set may further undergo an assign operation to obtain the processing result of the to-be-processed data set. The second intermediate result V″y and the data block V2x are both vectors containing m-dimensional data; performing the assign operation on corresponding elements yields the result vector, which is an m-dimensional column vector. The assignment processing here may be the assign processing introduced above, expressed by the formula v3,i = assign(v″i, v2,i), where the m values v3,i form the result vector V3x.

The above is the process of the matrix-vector multiplication operation. In general, the matrix-vector multiplication operation of the embodiment of the present invention can be expressed as Formula 3, where v′i is obtained by alternately performing combine2 and reduce2 on row i of M and V1 as described above:

v3,i = assign(v′i, v2,i)    (Formula 3)

Compared with the existing GIMV model, the vector V2 is added in the assign processing of the embodiment of the present invention; therefore, the result vector V3 can be expressed as

V3 = α*M*V1 + β*V2    (Formula 4)

where α and β are numeric values. Formula 4 shows that, compared with the existing GIMV model, the weight V2 is introduced into the result vector V3 obtained by the data processing system 100 of the embodiment of the present invention, so that the assign processing is no longer restricted to the vector used for multiplication; non-square matrix multiplication operations can thus also be supported, which expands the range that matrix-vector multiplication operations can express.

A specific example illustrates the role of introducing the weight V2 into the matrix-vector multiplication operation. For example, in the PageRank algorithm for web page ranking, which is often used for performance testing, the "vector used for adding" (corresponding to the vector V2 above) is often set to (1-d)/N to participate in the operation. The (1-d)/N vector can be used to adjust the PageRank value of each vertex in the corresponding graph, so that the PageRank values of the vertices better match the real situation. The PageRank value can be expressed as R, specifically Formula 5:

R = d·M·R + ((1-d)/N)·1    (Formula 5)

For example, for certain official websites, the value of the vector at the vertex corresponding to the official website can be set larger, and the resulting PageRank value will then generally also be larger.
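For illustration, one PageRank-style iteration under this model can be sketched in Scala, with combine2 as multiplication, reduce2 as addition, and an assign step that adds the uniform (1-d)/N weight discussed above; the uniform weight and the function names are assumptions of this sketch:

    def pagerankStep(m: Array[Array[Double]], r: Array[Double],
                     d: Double): Array[Double] = {
      val n = r.length
      m.indices.map { i =>
        val contrib = r.indices.iterator
          .map(j => m(i)(j) * r(j)) // combine2
          .reduce(_ + _)            // reduce2
        d * contrib + (1 - d) / n   // assign with the "vector used for adding"
      }.toArray
    }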
Optionally, as an embodiment, when m = n, the data block V1x and the data block V2x are the same data block; that is, the data block V1x used for the operation with the data block Mx and the data block V2x used for the assign operation are the same data block.

Optionally, as an embodiment, when the second type of computing node includes at least two computing nodes, the at least two computing nodes in the second type of computing node process the second processing task allocated by the management node in parallel. When the to-be-processed data set is partitioned both by rows and by columns, at least two second-type computing nodes are needed to process the second processing task; when the to-be-processed data set is partitioned only by columns, one second-type computing node is needed; when the to-be-processed data set is partitioned only by rows, no second-type computing node is needed, that is, the second processing task does not need to be processed.

Optionally, as an embodiment, the management node, the first type of computing node, and the second type of computing node may be physical machines, virtual machines, central processing units (CPUs), or the like, which is not limited in the embodiment of the present invention.

The embodiment of the present invention is described in detail below with specific examples.

FIG. 9 shows a schematic flowchart of data processing performed by the data processing system 100 according to an embodiment of the present invention. As shown in FIG. 9, the data processing performed by the data processing system 100 includes the following steps:

S201: Perform preprocessing to acquire a data block M', a data block V1', and a data block V2', where the data block M' is a matrix and the data blocks V1' and V2' are vectors.

S202: Perform matrix distribution: partition the matrix M' into blocks and distribute the matrix M' by blocks to at least two computing nodes of the cluster, where the data block distributed to computing node FCx is Mx.

S203: Perform data block V2' distribution: partition the data block V2' and broadcast the partitioned data block V2'.

S204: Perform data block V1' distribution: partition the data block V1' and broadcast the partitioned data block V1'. The partitioned matrix and vectors in S202 to S204 are correspondingly distributed to at least two computing nodes, and these computing nodes are distributed.

S205: On each computing node, perform local combine2 processing and local reduce2 processing. The data block Mx and the data block V1x undergo combine2 processing; before all the intermediate results corresponding to the data block Mx and the data block V1x have been obtained, reduce2 processing is performed on the intermediate results to obtain a first intermediate result, and reduce2 processing is then performed on the first intermediate result and each newly obtained intermediate result to obtain a new first intermediate result. The final first intermediate result is the result after all the intermediate results corresponding to the data block Mx and the data block V1x have undergone reduce2 processing.

S206: Each computing node performs global data transmission on the first intermediate results obtained in S205, so as to gather the first intermediate results on one computing node. When the method of the embodiment of the present invention performs global data transmission, the data has already undergone reduction processing; compared with the existing matrix-vector multiplication operation, the amount of transmitted data is therefore greatly reduced.

S207: Perform reduce2 processing on the elements of the first intermediate results corresponding to the at least two matrix blocks located at the same horizontal position of the matrix M', to obtain the elements of the second intermediate result. That is, reduce2 processing is performed on the elements of the first intermediate results of each row of the matrix blocks located at the same horizontal position; the resulting elements form the second intermediate result V″y.
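A minimal sketch of this cross-block merge in Scala, assuming each first intermediate result is carried as a dense Array[Double] (an illustrative simplification of the block transfer in S206):

    // Merges the first intermediate results of blocks in the same block row
    // element-wise with reduce2 to form the second intermediate result.
    def mergeRowBlocks(firsts: Seq[Array[Double]],
                       reduce2: (Double, Double) => Double): Array[Double] =
      firsts.reduce((a, b) => a.zip(b).map { case (x, y) => reduce2(x, y) })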
S208: Perform assign processing on the elements of the second intermediate result V″y and the corresponding elements of the data block V2x to obtain the elements of the result vector, thereby obtaining the result vector.

S209: Judge whether the termination condition is met; if yes, end; if no, use the result vector as the data block V1' of the next iteration and perform S204 to S209.

The embodiment of the present invention is now described with the example, introduced above, of using the SSSP algorithm to determine the shortest distances from vertex 0 to the other vertices. The matrix in the SSSP algorithm is the transposed adjacency matrix M of FIG. 6B, and the vector V1 and the vector V2 are both the vector V of FIG. 7. As shown in FIG. 10, the adjacency matrix M of FIG. 6B is first transposed and partitioned, and the initial vector V is partitioned. Then combine2 processing and reduce2 processing are performed on the matrix blocks and the corresponding blocks of V. Here, the reduce2 processing is performed before all the first intermediate results of the combine2 processing of a matrix block and the corresponding block of V have been obtained; that is, the combine2 processing and the reduce2 processing are performed alternately. After the combine2 and reduce2 processing of each block, the first intermediate results corresponding to each matrix block are obtained. Performing further reduce2 processing on the first intermediate results corresponding to matrix blocks located in the same row yields the intermediate vector V″. The intermediate vector V″ and V then undergo assign processing to obtain the result vector, where the combine2 processing is "addition", combine2(Mij, Vj) = Mij + Vj; the reduce2 processing is "taking the minimum", reduce2(x1, x2) = min(x1, x2); and the assign processing is "taking the minimum", assign(V″i, Vi) = min(V″i, Vi).
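Under these three operators, one SSSP iteration of the flow above can be sketched in Scala (dense representation and illustrative names assumed; m is the transposed adjacency matrix with Double.PositiveInfinity for unreachable pairs):

    // combine2 is "+", reduce2 is min, and assign takes the minimum against
    // the current distance vector, exactly as stated above.
    def ssspStep(m: Array[Array[Double]], v: Array[Double]): Array[Double] =
      m.indices.map { i =>
        val relaxed = v.indices
          .map(j => m(i)(j) + v(j)) // combine2
          .reduce(math.min)          // reduce2
        math.min(relaxed, v(i))      // assign
      }.toArray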
Therefore, when the data processing system provided by the embodiment of the present invention performs merge operations and reduction operations on data blocks, it does not need to wait for all the merge operations to complete before performing the reduction operations; instead, the merge operations and the reduction operations alternate, which can save the memory space occupied by calculation and reduce calculation time.
图11示出了根据本发明另一实施例的数据处理系统300的示意性框图。如图11所示,该数据处理系统300包括管理节点310和第一类计算节点320,
该管理节点310用于:
向该第一类计算节点320中包括FCx在内的至少两个计算节点分配第一处理任务,其中,FCx为该至少两个计算节点中的第x个计算节点,x为正整数;
该第一类计算节点320中的至少两个计算节点并行处理该管理节点310分配的第一处理任务;
该计算节点FCx,用于根据该管理节点310分配的第一处理任务获取待处理的数据集中的数据块M1x和数据块M2x,其中,该数据块M1x为包含m行n列数据的矩阵,该数据块M2x为包含n行p列数据的矩阵,m、n和P为正整数,n的值不小于2;
对该数据块M1x以及该数据块M2x执行合并combine2操作和归约reduce2操作,以获得第一中间结果M′x,该第一中间结果M′x为包括m行p列数据的矩阵,该第一中间结果M′x中的元素为m′i,j,i和j为变量,i的取值分别从1到m,j的取值分别从1到p,其中,m′i,j=m′i,j,n,m′i,j,n根据m′i,j,k=reduce2(m′i,j,k-1,combine2(m1[i,k],m2[k,j]))获得,m1[i,k]为该数据块M1x中第i行第k列的元素,m2[k,j]为该数据块M2x中第k行第j列的元素,k为变量,k的取值分别从1到n;
该管理节点310,还用于根据该第一类计算节点320中的至少两个计算节点获得的第一中间结果获得该待处理数据集的处理结果。
因此,本发明实施例提供的数据处理系统,在对数据块进行合并操作和归约操作时,不需要等待所有的合并操作全部进行完成之后再进行归约操作,而是合并操作和归约操作交替进行,从而可以节省计算所占用的内存空 间,减少计算时间。
具体而言,本发明实施例提供的数据处理系统300可以应用于大数据处理,由于大数据处理的数据量较大,通常将数据进行分块,把不同的数据块分发给不同的计算节点进行并行计算,以提高计算的效率。该数据处理系统300包括管理节点310和第一类计算节点320。管理节点310用于接收数据处理任务,并将该数据处理任务切分为多个处理任务,将处理任务分发给计算节点。管理节点310还用于接收各计算节点对其处理任务的执行状态,以管理数据处理的进程。计算节点用于接收管理节点310下发的处理任务,根据处理任务获取数据块,以执行相应的处理任务。计算节点可以获取本计算节点中存储的数据块执行处理任务,也可以获取其它计算节点中存储的数据块执行处理任务。根据处理的任务的类别不同,可以将计算节点按照其处理任务的类别进行分类。例如,处理第一处理任务的为第一类节点,处理第二处理任务的为第二类节点。
在本发明实施例中,管理节点310用于向该第一类计算节点320中包括FCx321在内的至少两个计算节点分配第一处理任务,其中,FCx321为该至少两个计算节点中的第x个计算节点,x为正整数。该第一类计算节点320中的至少两个计算节点并行处理该管理节点310分配的第一处理任务。
计算节点FCx321在接收到管理节点310分配的第一处理任务后,根据第一处理任务,获取待处理的数据集中的数据块M1x和数据块M2x,其中,该数据块M1x为包含m行n列数据的矩阵,该数据块M2x为包含n行p列数据的矩阵,m、n和P为正整数,n的值不小于2,可以将对数据块M1x和数据块M2x的处理视为矩阵矩阵乘操作。
计算节点FCx321对该数据块M1x以及该数据块M2x执行合并combine2操作和归约reduce2操作,以获得第一中间结果M′x,该第一中间结果M′x为包括m行p列数据的矩阵。该第一中间结果M′x中的元素为m′ij,i和j为变量,i的取值分别从1到m,j的取值分别从1到p,其中,m′i,j=m′i,j,n,m′i,j,n根据m′i,j,k=reduce2(m′i,j,k-1,combine2(m1[i,k],m2[k,j]))获得,m1[i,k]为该数据块M1x中第i行第k列的元素,m2[k,j]为该数据块M2x中第k行第j列的元素,k为变量,k的取值分别从1到n。
Specifically, a merge operation is performed on the element in row i and column k of the data block M1x and the element in row k and column j of the data block M2x, to obtain the intermediate result xikj corresponding to those two elements. The merge operation here may be the combine2 operation described above, expressed as xikj = combine2(m1[i,k], m2[k,j]).
Then, the reduce2 operation is performed on the intermediate results xikj corresponding to row i of the data block M1x and column j of the data block M2x, to obtain the element m′i,j corresponding to row i of M1x and column j of M2x; with i running from 1 to m and j running from 1 to p, the first intermediate result M′x is obtained. The combine2 and reduce2 operations may proceed as follows: first compute xi1j and xi2j and perform the reduce2 operation on them; then compute xi3j and perform the reduce2 operation on the previous reduce2 result and xi3j; and so on, until all intermediate results xikj corresponding to row i of M1x and column j of M2x have undergone the reduce2 operation. The reduce2 operation does not wait until all combine2 operations are complete; rather, the combine2 and reduce2 operations are performed alternately. In this way, an intermediate result xikj that has undergone the reduce2 operation can be deleted during the computation, and the results of all combine2 operations need not be kept in memory, thereby saving memory space.
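A minimal Scala sketch of this per-element computation (illustrative names, dense blocks assumed) makes the interleaving concrete:

    object MatrixElement {
      // One element m'(i,j) of the first intermediate result M'x.
      def elementAt(m1: Array[Array[Double]],   // block M1x: m rows, n columns
                    m2: Array[Array[Double]],   // block M2x: n rows, p columns
                    i: Int, j: Int,
                    combine2: (Double, Double) => Double,
                    reduce2: (Double, Double) => Double): Double =
        // Each x_ikj is produced lazily and folded into the running
        // reduce2 value, so no x_ikj has to stay in memory afterwards.
        m1(i).indices.iterator
          .map(k => combine2(m1(i)(k), m2(k)(j)))
          .reduce(reduce2)
    }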
It should be understood that the foregoing process is essentially an update process: the reduce2 operation is first performed on two of the xikj to obtain an intermediate result, and then the reduce2 operation is performed on this intermediate result and other xikj or other intermediate results, continually updating the intermediate result.
The reduction reduce2 operation here may be addition, subtraction, multiplication, division, taking a maximum, taking a minimum, and so on, but this embodiment of the present invention is not limited thereto. Here, the reduce2 operation processes the intermediate results corresponding to the elements of a row of the matrix (such as xi1j, ..., xinj); it does not wait until all of xi1j, ..., xinj have been computed, but unfolds gradually during the computation of the intermediate results xikj.
It should be understood that an advantage of the reduce2 operation is that, during the computation, the order of the elements undergoing the reduce2 operation is of no concern: whatever the order of the elements in the reduce2 operation, the result obtained is unique. For example, in the scala language, given an array it = Array(0,1,2,3,4,5), summing the array can be expressed as it.reduce(_+_). At the bottom of the computation, adding the data from left to right yields the same value as pairwise reduce2 summation carried out until the final value is obtained. The foregoing description, in which the reduce2 result of xi1j and xi2j is further reduced with xi3j, is merely one implementation; the execution order of the reduce2 operation is not limited to the order xi1j, ..., xinj, and performing the reduce2 operation on any two of xi1j, ..., xinj and then on the other xikj achieves the same result as sequential execution. This embodiment of the present invention does not limit the order of the reduce2 operation.
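A runnable form of that scala illustration follows (the pairwise grouping shown is one arbitrary choice of reduce2 order):

    val it = Array(0, 1, 2, 3, 4, 5)
    val leftToRight = it.reduce(_ + _)                       // ((((0+1)+2)+3)+4)+5
    val pairwise    = it.grouped(2).map(_.sum).reduce(_ + _) // (0+1)+(2+3)+(4+5)
    assert(leftToRight == pairwise)                          // both equal 15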
The management node 310 is further configured to obtain the processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes 320. After completing the first processing task, a first-class computing node 320 notifies the management node 310; the management node 310 then obtains the processing result of the data set according to the first intermediate results obtained by the at least two computing nodes, or uses the first intermediate results as base data for other processing tasks and delivers, to the corresponding computing nodes, processing tasks that compute with those first intermediate results.
Therefore, with the data processing system provided in this embodiment of the present invention, when the merge operation and the reduction operation are performed on data blocks, the reduction operation does not wait until all merge operations are complete; instead, the merge operation and the reduction operation are performed alternately, which saves the memory space occupied by the computation and shortens the computation time.
Optionally, in an embodiment, the data processing system 300 further includes second-class computing nodes, and the management node 310 is specifically configured to:
assign, according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes 320, a second processing task to at least one computing node, including SCy, of the second-class computing nodes, where SCy is the y-th computing node of the at least one computing node and y is a positive integer.
The computing node SCy is configured to:
obtain, according to the second processing task, the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, where the first intermediate results obtained by SCy are the first intermediate results obtained from the data blocks M1x located in the same row and the data blocks M2x located in the same column of the to-be-processed data set;
perform the reduce2 operation on the first intermediate results obtained by SCy, to obtain a second intermediate result M″y, where the second intermediate result M″y is a matrix containing m rows and p columns of data.
The management node 310 is specifically configured to:
obtain the processing result of the to-be-processed data set according to the second intermediate results obtained by the at least one computing node of the second-class computing nodes.
Specifically, after the foregoing processing is completed, other processing may be performed on the first intermediate results obtained by the first-class computing nodes 320. For example, when the data blocks M1x processed by the at least two computing nodes of the first-class computing nodes 320 are data blocks located in the same row of the to-be-processed data set, and the data blocks M2x are data blocks located in the same column, the management node 310 may assign, according to the first intermediate results obtained by the at least two computing nodes, a second processing task to at least one computing node, including SCy, of the second-class computing nodes, where SCy is the y-th computing node of the at least one computing node and y is a positive integer.
The computing node SCy is configured to obtain, according to the second processing task, the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes 320, where the first intermediate results obtained by SCy are those obtained from the data blocks M1x located in the same row of the data set and the data blocks M2x located in the same column. The second processing task performs the reduce2 operation on the first intermediate results obtained by SCy, to obtain the second intermediate result M″y, a matrix containing m rows and p columns of data. Performing the reduce2 operation on these first intermediate results is similar to the reduce2 operation described above: the reduce2 operation is first performed on two of the first intermediate results, and then on the result of that operation and the other first intermediate results.
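This second processing task can be sketched in Scala as an element-wise fold of the gathered first intermediate results (the names partials and secondIntermediate are assumptions for illustration, not the patent's interface):

    object SecondPass {
      // partials: the first intermediate results gathered from the
      // first-class nodes, all m x p matrices; reduce2 as before.
      def secondIntermediate(partials: Seq[Array[Array[Double]]],
                             reduce2: (Double, Double) => Double): Array[Array[Double]] =
        partials.reduce { (a, b) =>
          // element-wise reduce2 of two m x p first intermediate results
          a.zip(b).map { case (ra, rb) =>
            ra.zip(rb).map { case (x, y) => reduce2(x, y) }
          }
        }
    }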
The management node 310 is specifically configured to obtain the processing result of the to-be-processed data set according to the second intermediate results obtained by the at least one computing node of the second-class computing nodes.
Optionally, in an embodiment, the data set further includes a data block M3x, the data block M3x being a matrix containing m rows and p columns of data, and the management node 310 is further configured to:
assign, according to the second intermediate results obtained by the at least one computing node, including SCy, of the second-class computing nodes, a third processing task to the at least one computing node, including SCy, of the second-class computing nodes.
The computing node SCy is further configured to:
obtain the data block M3x of the data set according to the third processing task;
perform an assignment assign operation on the second intermediate result M″y obtained by SCy and the data block M3x, to obtain the processing result of the to-be-processed data set.
Specifically, the second intermediate result, obtained by performing the reduce2 operation on the first intermediate results of the data blocks whose M1x lies in the same row and whose M2x lies in the same column of the data set, can further undergo the assign operation to obtain the processing result of the data set. The second intermediate result M″y and the data block M3x are both matrices containing m rows and p columns of data; performing the assign operation on their corresponding elements yields the result matrix, which contains m rows and p columns of data. The assignment processing here may be the assign processing described above.
The foregoing is the process of the matrix-matrix multiplication operation. Overall, the matrix-matrix multiplication operation of this embodiment of the present invention can be expressed as the following Formula 6:
D(i,j) = assign( m″(i,j), m3[i,j] ), where m″(i,j) = reduce2( combine2(m1[i,1], m2[1,j]), ..., combine2(m1[i,n], m2[n,j]) )   (Formula 6)
Optionally, in an embodiment, a second-class computing node of the data processing system 300 is further configured to perform row processing on row r of the processing result of the to-be-processed data set, the row processing being processing of the elements of row r.
Specifically, the result matrix D can be further processed, for example, by performing reduction processing on the elements of row r of the result matrix D. This can be expressed as reduceRow(Di1, ..., Din), where the reduceRow processing may be taking the maximum, taking the minimum, taking the largest Q values, taking the smallest Q values, summing the row, and so on; this embodiment of the present invention is not limited thereto. The result of the reduceRow processing may still be stored in the corresponding matrix form; for example, when the maximum of row i of the result matrix D is taken and the maximum is Di1, column 1 of row i of the stored matrix stores the value Di1 and the other columns store 0 (or the 0s are not stored). Alternatively, only the resulting values may be stored; for example, when row i of the result matrix D is summed and the sum is Y, the value Y is stored. This embodiment of the present invention does not limit the storage manner.
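As one example of such row processing, the following Scala sketch implements the "largest Q values" variant with the matrix-form storage described above (0 stored in the other columns); the name reduceRowTopQ is assumed for illustration:

    object RowReduce {
      def reduceRowTopQ(d: Array[Array[Double]], q: Int): Array[Array[Double]] =
        d.map { row =>
          // indices of the Q largest values in this row
          val topIdx = row.indices.sortBy(i => -row(i)).take(q).toSet
          // keep those entries in place and store 0 in the other columns
          row.indices.map(i => if (topIdx(i)) row(i) else 0.0).toArray
        }
    }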
Similarly, in an embodiment, a second-class computing node of the data processing system 300 is further configured to perform column processing on column c of the processing result of the to-be-processed data set, the column processing being processing of the elements of column c. To avoid repetition, details are not described again herein.
Optionally, in an embodiment, n = p, and the data block M1x and the data block M3x are the same data block. It should be understood that, for example, when the data block M1x is a matrix of 3 rows and 4 columns and the data block M2x is a matrix of 4 rows and 4 columns, the merge processing of the data block M1x and the data block M2x yields a matrix of 3 rows and 4 columns; performing assignment processing on this 3-row, 4-column matrix and the data block M3x yields the result matrix. Therefore, the data block M1x and the data block M3x can be the same data block.
Optionally, in an embodiment, n = m, and the data block M2x and the data block M3x are the same data block. It should be understood that, in this embodiment of the present invention, at least one of the data blocks M1x, M2x, and M3x taking part in the operation may be transposed or otherwise manipulated to meet the needs of the operation. Therefore, the data block M2x and the data block M3x can be the same data block.
Optionally, in an embodiment, when the second-class computing nodes include at least two computing nodes, the at least two computing nodes of the second-class computing nodes process, in parallel, the second processing tasks assigned by the management node.
Optionally, in an embodiment, the management node, the first-class computing nodes, and the second-class computing nodes include physical machines, virtual machines, or central processing units (CPUs).
The embodiments of the present invention are described in detail below with reference to a specific example.
FIG. 12 is a schematic flowchart of a method 400 for data processing performed by the data processing system 300 according to an embodiment of the present invention. As shown in FIG. 12, the method 400 includes the following steps:
S401. Perform preprocessing to obtain a data block M1, a data block M2, and a data block M3, each of which is a matrix.
S402. Partition the matrices: partition the data block M1 and the data block M2 into blocks. The data block M1 is divided into multiple data blocks M1x, each a matrix containing m rows and n columns of data; the data block M2 is divided into multiple data blocks M2x, each a matrix containing n rows and p columns of data.
S403. Distribute the matrices: distribute the data blocks M1x by row to at least one computing node and, correspondingly, distribute the data blocks M2x by column to at least one computing node.
S404. On each first-class computing node, perform local combine2 processing and local reduce2 processing. The data block M1x and the data block M2x undergo combine2 processing; before all intermediate results corresponding to a row of the data block M1x and the corresponding column of the data block M2x have been obtained, reduce2 processing is performed on the available intermediate results to obtain a first intermediate result; then reduce2 processing is performed on the first intermediate result and each newly obtained intermediate result, to obtain a new first intermediate result. The first intermediate result is the result obtained after all intermediate results corresponding to the row of M1x and the corresponding column of M2x have undergone reduce2 processing. The first intermediate results for all combinations of a row of M1x and a column of M2x are obtained similarly, and the first intermediate results can form a matrix.
S405. Each first-class computing node performs global data transmission on the first intermediate results obtained in S404, so that the first intermediate results are gathered onto one second-class computing node. In the method of this embodiment, the globally transmitted data has already undergone reduction processing, so the amount of transmitted data is small.
S406. Perform reduce2 processing on the first intermediate results corresponding to the at least two data blocks for the same row of the data block M1 and the same column of the data block M2, to obtain second intermediate results. The multiple second intermediate results constitute an intermediate matrix X. Partition the intermediate matrix X into blocks and distribute the blocks to at least one computing node.
S407. Partition and distribute the data block M3: distribute each data block M3x of the data block M3 to the computing node where the corresponding block of the intermediate matrix X resides, where the data block M3x is a matrix containing m rows and p columns of data.
S408. Perform assign processing on the elements of the blocks of the intermediate matrix X and the corresponding elements of the data blocks M3x, to obtain the elements of the blocks of the result matrix D, and thereby the blocks of the result matrix D.
S409. Perform reduceRow processing, block by block, on each block of the result matrix D.
S410. Perform data transmission on the results obtained in S409, and then perform reduceRow processing on the results of the blocks corresponding to the same row, to obtain a matrix Y.
S411. Perform reduceCol processing, by column, on each block of the matrix Y.
S412. Perform data transmission on the results obtained in S411, and then perform reduceCol processing on the results of the blocks corresponding to the same column, to obtain a matrix Z.
Therefore, with the data processing system provided in this embodiment of the present invention, when the merge operation and the reduction operation are performed on data blocks, the reduction operation does not wait until all merge operations are complete; instead, the merge operation and the reduction operation are performed alternately, which saves the memory space occupied by the computation and shortens the computation time.
Probability propagation is a recommendation algorithm: given a database of user-item interaction records, several items that each user may be interested in are to be recommended to that user. Probability propagation is based on global data and can compute the potential items of interest for all users at once. The algorithm has a solid theoretical foundation: it evolved from the law of conservation of energy in physics. The matrix operations resemble the propagation of energy between different substances; the row sums of the resulting interest-degree matrix are exactly equal to the corresponding row sums of the original matrix, as are the corresponding column sums, reflecting the conservation of energy.
The probability propagation algorithm can be implemented with matrix operations. The existing matrix implementation of the algorithm obtains a "movie-to-user attraction matrix" and an "inter-user interest similarity matrix"; then, through a matrix-matrix multiplication operation, a "new movie-to-user attraction matrix" is obtained; next, the movies each user has already watched are screened out; finally, only the topk unwatched movies are recommended to each user. Because the probability propagation algorithm recommends only a limited number of movies to each user, the resulting matrix is rather sparse (it has many 0 elements), and in general the data volume is not large (in this scenario, 0 elements need not be stored). However, the "new movie-to-user attraction" matrix obtained during the computation in the existing solution is often very dense and extremely large, which leads to heavy intermediate memory occupation and a large amount of data transmission in the system.
In the probability propagation algorithm implemented with the matrix-matrix multiplication operation of this embodiment of the present invention, the number of movies in the original data set is m, the number of users is n, and topk movies are recommended to each user. The first matrix A is the m-row, n-column "movie-to-user attraction" matrix; the second matrix B is the n-row, n-column "inter-user interest similarity" matrix; and the third matrix C used for the assign operation is the same matrix as the first matrix A. The formula for recommending topk unwatched movies to each user is:
D(i,j) = assign( C[i,j], x(i,j) ), where x(i,j) = reduce2( combine2(A[i,1], B[1,j]), ..., combine2(A[i,n], B[n,j]) ), and Z(j) = reduceCol( D[1,j], ..., D[m,j] )   (Formula 7)
The specific computation process is as follows. First, combine2 processing and reduce2 processing are performed on the elements of the first matrix A and the elements of the second matrix B. Here, the reduce2 processing is executed before all intermediate results of the combine2 processing of the elements of A and B have been obtained; that is, the combine2 processing and the reduce2 processing are performed alternately. After the combine2 and reduce2 processing of the elements of the first matrix A and the second matrix B, the first intermediate result corresponding to the first matrix A and the second matrix B is obtained. Assign processing is performed on the first intermediate result and the third matrix C used for the assign operation, yielding the result matrix D. Finally, reduceCol processing is performed on each column of the result matrix D.
Here, the combine2 processing is multiplication, combine2(Aij, Bjk) = Aij * Bjk; the reduce2 processing is addition, xik = reduce2(xi1k, xi2k) = xi1k + xi2k; and the assign processing is screening:
assign(Cij, xij) = 0 if Cij ≠ 0, and assign(Cij, xij) = xij if Cij = 0
That is, if the element at the corresponding position of the third matrix C used for the assign processing is not zero, that element is screened out (the element is set to 0); in other words, if the user has not watched the movie, the data is retained, and if the user has watched the movie, the data is screened out (the element is set to 0), for the subsequent reduceCol processing. The reduceCol processing is "taking topk": reduceCol(D1j, ..., Dmj) = (D1j, ..., Dmj).topk, that is, the k largest values of column j. In this example k is 1.
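The whole pipeline can be sketched in Scala as follows; small dense matrices are assumed, the name recommendTopk is illustrative, and a real implementation would partition the matrices across nodes as described above:

    object ProbPropagation {
      // a: m x n attraction matrix A; b: n x n similarity matrix B;
      // c: filter matrix C (equal to A here); returns, per user (column j),
      // the row indices of the k most attractive unseen movies.
      def recommendTopk(a: Array[Array[Double]],
                        b: Array[Array[Double]],
                        c: Array[Array[Double]],
                        k: Int): Seq[Seq[Int]] = {
        val m = a.length
        val n = b.length
        val d = Array.tabulate(m, n) { (i, j) =>
          // combine2 = *, reduce2 = +, interleaved via the iterator;
          // when c(i)(j) != 0 the element is screened out (and the
          // combine2/reduce2 work could be skipped entirely).
          if (c(i)(j) != 0.0) 0.0
          else (0 until n).iterator.map(x => a(i)(x) * b(x)(j)).reduce(_ + _)
        }
        // reduceCol = "top k" on each column j of the result matrix D
        (0 until n).map(j => (0 until m).sortBy(i => -d(i)(j)).take(k))
      }
    }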
The main flowchart of the probability propagation algorithm implemented with the matrix-matrix multiplication operation of this embodiment of the present invention is shown in FIG. 13. The combine2 and reduce2 processing is first performed on a row of the first matrix A and a column of the second matrix B, obtaining one value (shown as a formula image in the original publication). Then, because the value at the corresponding position of the third matrix C used for the assign processing is 1, the assign processing yields 0 for the corresponding element of the result matrix D (if 0 elements are not stored in the system, the corresponding combine2 and reduce2 processing can be skipped). Finally, topk processing is performed on the obtained elements of the same column, yielding the topk most attractive movies among those the user has not watched.
With the matrix-matrix multiplication operation of this embodiment of the present invention, while the probability propagation algorithm computes the "new movie-to-user attraction" matrix, the watched movies can be screened out at the same time (the records to be screened out may even be skipped entirely), and the recommendation of the topk movies by user score ranking can proceed simultaneously, thereby reducing intermediate memory occupation and the amount of data transmitted in the system.
It should be understood that an isCompute operator may be introduced into the matrix-vector multiplication and matrix-matrix multiplication operations of the embodiments of the present invention to determine whether a row needs to be computed. If not, the row is skipped and the computation continues with the next row; if so, the combine2 and reduce2 operations proceed according to the algorithm. Typically, the isCompute operator in the matrix-vector multiplication may be a column vector whose length equals the number of rows of the data block Mx, and the isCompute operator in the matrix-matrix multiplication may be a matrix; this embodiment of the present invention is not limited thereto.
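A sketch of the isCompute operator for the matrix-vector case follows; the boolean flag vector, the skip value, and the function names are assumptions for illustration:

    object IsComputePass {
      def localPassWithIsCompute(mx: Array[Array[Double]],
                                 v1x: Array[Double],
                                 isCompute: Array[Boolean],  // one flag per row of Mx
                                 skipValue: Double,          // placeholder for skipped rows
                                 combine2: (Double, Double) => Double,
                                 reduce2: (Double, Double) => Double): Array[Double] =
        mx.indices.map { i =>
          if (!isCompute(i)) skipValue                 // skip this row entirely
          else mx(i).indices.iterator                  // otherwise combine2/reduce2 as usual
            .map(j => combine2(mx(i)(j), v1x(j)))
            .reduce(reduce2)
        }.toArray
    }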
The performance of the matrix-vector multiplication and matrix-matrix multiplication operations of the embodiments of the present invention is described below.
The extended GIMV model of the prior art and the matrix-vector multiplication operation of this embodiment of the present invention were compared on the general parallel computing framework Spark. The test environment was a cluster of 3 machines (3 RH2285, 12 cores and 24 threads, 192 GB of memory, 100 GB configured). The test data was the wiki_talk data set. The test results show that the computation time of the extended GIMV model of the prior art exceeded 3600 s, whereas the matrix-vector multiplication operation of this embodiment of the present invention took 340 s.
Likewise, the matrix-matrix multiplication operation of this embodiment of the present invention and the prior-art implementation of the "recommend movies" operation were compared on Spark. The sizes of the tested data sets and the test results are shown in Table 1; the test data was IPTV (interactive network television) and NETFLIX (Nasdaq: NFLX) data sets. The test results show that this embodiment of the present invention can effectively reduce intermediate memory occupation and shorten the computation time, and can therefore process larger data sets.
Table 1 (the data set sizes and test results are presented as an image in the original publication)
The data processing system according to the embodiments of the present invention has been described above in detail with reference to FIG. 1 to FIG. 13. The computing nodes of the data processing system according to the embodiments of the present invention are described below in detail with reference to FIG. 14 and FIG. 15.
FIG. 14 shows a computing node 500 according to an embodiment of the present invention. The computing node 500 belongs to a data processing system that further includes a management node. The computing node 500 includes:
a receiving module 501, configured to receive a first processing task assigned by the management node;
an obtaining module 502, configured to obtain, according to the first processing task received by the receiving module 501 and assigned by the management node, a data block Mx and a data block V1x of a to-be-processed data set, where the data block Mx is a matrix containing m rows and n columns of data, the data block V1x is a vector containing n-dimensional data, m and n are positive integers, and n is not less than 2;
a processing module 503, configured to perform a merge combine2 operation and a reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, where the first intermediate result V′x is a vector containing m-dimensional data, the elements of the first intermediate result V′x are v′i, i is a variable, and i runs from 1 to m, where v′i = v′i,n, and v′i,n is obtained according to v′i,j = reduce2(v′i,j-1, combine2(mi,j, vj)), where mi,j is an element of the data block Mx, vj is an element of the data block V1x, j is a variable, and j runs from 1 to n.
Optionally, in an embodiment, the computing node is a physical machine, a virtual machine, or a central processing unit (CPU); this embodiment of the present invention is not limited thereto.
Therefore, with the computing node provided in this embodiment of the present invention, when the merge operation and the reduction operation are performed on data blocks, the reduction operation does not wait until all merge operations are complete; instead, the merge operation and the reduction operation are performed alternately, which saves the memory space occupied by the computation and shortens the computation time.
FIG. 15 shows a computing node 600 according to an embodiment of the present invention. The computing node 600 belongs to a data processing system that further includes a management node. The computing node includes:
a receiving module 601, configured to receive a first processing task assigned by the management node;
an obtaining module 602, configured to obtain, according to the first processing task received by the receiving module 601 and assigned by the management node, a data block M1x and a data block M2x of a to-be-processed data set, where the data block M1x is a matrix containing m rows and n columns of data, the data block M2x is a matrix containing n rows and p columns of data, m, n, and p are positive integers, and n is not less than 2;
a processing module 603, configured to perform a merge combine2 operation and a reduction reduce2 operation on the data block M1x and the data block M2x to obtain a first intermediate result M′x, where the first intermediate result M′x is a matrix containing m rows and p columns of data, the elements of the first intermediate result M′x are m′i,j, i and j are variables, i runs from 1 to m, and j runs from 1 to p, where m′i,j = m′i,j,n, and m′i,j,n is obtained according to m′i,j,k = reduce2(m′i,j,k-1, combine2(m1[i,k], m2[k,j])), where m1[i,k] is the element in row i and column k of the data block M1x, m2[k,j] is the element in row k and column j of the data block M2x, k is a variable, and k runs from 1 to n.
Optionally, in an embodiment, the computing node is a physical machine, a virtual machine, or a central processing unit (CPU); this embodiment of the present invention is not limited thereto.
Therefore, with the computing node provided in this embodiment of the present invention, when the merge operation and the reduction operation are performed on data blocks, the reduction operation does not wait until all merge operations are complete; instead, the merge operation and the reduction operation are performed alternately, which saves the memory space occupied by the computation and shortens the computation time.
As shown in FIG. 16, an embodiment of the present invention further provides a computing node 700. The computing node 700 includes a processor 701, a memory 702, a bus system 703, and a transceiver 704; the processor 701, the memory 702, and the transceiver 704 are connected through the bus system 703. The memory 702 is configured to store instructions, and the processor 701 is configured to execute the instructions stored in the memory 702. The transceiver 704 is configured to:
receive a first processing task assigned by the management node;
obtain, according to the first processing task assigned by the management node, a data block Mx and a data block V1x of a to-be-processed data set, where the data block Mx is a matrix containing m rows and n columns of data, the data block V1x is a vector containing n-dimensional data, m and n are positive integers, and n is not less than 2.
The processor 701 is configured to:
perform a merge combine2 operation and a reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, where the first intermediate result V′x is a vector containing m-dimensional data, the elements of the first intermediate result V′x are v′i, i is a variable, and i runs from 1 to m, where v′i = v′i,n, and v′i,n is obtained according to v′i,j = reduce2(v′i,j-1, combine2(mi,j, vj)), where mi,j is an element of the data block Mx, vj is an element of the data block V1x, j is a variable, and j runs from 1 to n.
Therefore, with the computing node provided in this embodiment of the present invention, when the merge operation and the reduction operation are performed on data blocks, the reduction operation does not wait until all merge operations are complete; instead, the merge operation and the reduction operation are performed alternately, which saves the memory space occupied by the computation and shortens the computation time.
It should be understood that, in this embodiment of the present invention, the processor 701 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
The memory 702 may include a read-only memory and a random access memory, and provides instructions and data to the processor 701. A part of the memory 702 may further include a nonvolatile random access memory. For example, the memory 702 may further store device-type information.
In addition to a data bus, the bus system 703 may include a power bus, a control bus, a status signal bus, and the like. However, for clarity of description, the various buses are all marked as the bus system 703 in the figure.
During implementation, the steps of the foregoing method may be completed by an integrated logic circuit of the hardware in the processor 701 or by instructions in the form of software. The steps of the method disclosed with reference to the embodiments of the present invention may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or an electrically erasable programmable memory, or a register. The storage medium is located in the memory 702; the processor 701 reads the information in the memory 702 and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here again.
Optionally, in an embodiment, the computing node 700 is a physical machine, a virtual machine, or a central processing unit (CPU).
It should be understood that the computing node 700 according to this embodiment of the present invention may correspond to the body that executes the method in the embodiments of the present invention, and may also correspond to the computing node 500 according to the embodiments of the present invention; the foregoing and other operations and/or functions of the modules of the computing node 700 are intended to implement the corresponding procedures of the data processing method. For brevity, details are not described here again.
Therefore, with the computing node provided in this embodiment of the present invention, when the merge operation and the reduction operation are performed on data blocks, the reduction operation does not wait until all merge operations are complete; instead, the merge operation and the reduction operation are performed alternately, which saves the memory space occupied by the computation and shortens the computation time.
As shown in FIG. 17, an embodiment of the present invention further provides a computing node 800. The computing node 800 includes a processor 801, a memory 802, a bus system 803, and a transceiver 804; the processor 801, the memory 802, and the transceiver 804 are connected through the bus system 803. The memory 802 is configured to store instructions, and the processor 801 is configured to execute the instructions stored in the memory 802. The transceiver 804 is configured to:
receive a first processing task assigned by the management node;
obtain, according to the first processing task assigned by the management node, a data block M1x and a data block M2x of a to-be-processed data set, where the data block M1x is a matrix containing m rows and n columns of data, the data block M2x is a matrix containing n rows and p columns of data, m, n, and p are positive integers, and n is not less than 2.
The processor 801 is configured to:
perform a merge combine2 operation and a reduction reduce2 operation on the data block M1x and the data block M2x to obtain a first intermediate result M′x, where the first intermediate result M′x is a matrix containing m rows and p columns of data, the elements of the first intermediate result M′x are m′i,j, i and j are variables, i runs from 1 to m, and j runs from 1 to p, where m′i,j = m′i,j,n, and m′i,j,n is obtained according to m′i,j,k = reduce2(m′i,j,k-1, combine2(m1[i,k], m2[k,j])), where m1[i,k] is the element in row i and column k of the data block M1x, m2[k,j] is the element in row k and column j of the data block M2x, k is a variable, and k runs from 1 to n.
Therefore, with the computing node provided in this embodiment of the present invention, when the merge operation and the reduction operation are performed on data blocks, the reduction operation does not wait until all merge operations are complete; instead, the merge operation and the reduction operation are performed alternately, which saves the memory space occupied by the computation and shortens the computation time.
It should be understood that, in this embodiment of the present invention, the processor 801 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
The memory 802 may include a read-only memory and a random access memory, and provides instructions and data to the processor 801. A part of the memory 802 may further include a nonvolatile random access memory. For example, the memory 802 may further store device-type information.
In addition to a data bus, the bus system 803 may include a power bus, a control bus, a status signal bus, and the like. However, for clarity of description, the various buses are all marked as the bus system 803 in the figure.
During implementation, the steps of the foregoing method may be completed by an integrated logic circuit of the hardware in the processor 801 or by instructions in the form of software. The steps of the method disclosed with reference to the embodiments of the present invention may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or an electrically erasable programmable memory, or a register. The storage medium is located in the memory 802; the processor 801 reads the information in the memory 802 and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here again.
Optionally, in an embodiment, the computing node 800 is a physical machine, a virtual machine, or a central processing unit (CPU).
It should be understood that the computing node 800 according to this embodiment of the present invention may correspond to the body that executes the method in the embodiments of the present invention, and may also correspond to the computing node 600 according to the embodiments of the present invention; the foregoing and other operations and/or functions of the modules of the computing node 800 are intended to implement the corresponding procedures of the data processing method. For brevity, details are not described here again.
Therefore, with the computing node provided in this embodiment of the present invention, when the merge operation and the reduction operation are performed on data blocks, the reduction operation does not wait until all merge operations are complete; instead, the merge operation and the reduction operation are performed alternately, which saves the memory space occupied by the computation and shortens the computation time.
The data processing system and the computing nodes according to the embodiments of the present invention have been described above in detail with reference to FIG. 1 to FIG. 17. The data processing methods according to the embodiments of the present invention are described below in detail with reference to FIG. 18 and FIG. 19.
FIG. 18 shows a data processing method 900 according to an embodiment of the present invention. The method 900 is applied to a data processing system, and the data processing system includes a management node and first-class computing nodes. The method 900 includes:
S901. The management node assigns a first processing task to at least two computing nodes, including FCx, of the first-class computing nodes, where FCx is the x-th computing node of the at least two computing nodes and x is a positive integer, and the at least two computing nodes of the first-class computing nodes process, in parallel, the first processing tasks assigned by the management node.
S902. The computing node FCx obtains, according to the first processing task assigned by the management node, a data block Mx and a data block V1x of a to-be-processed data set, where the data block Mx is a matrix containing m rows and n columns of data, the data block V1x is a vector containing n-dimensional data, m and n are positive integers, and n is not less than 2.
S903. The computing node FCx performs a merge combine2 operation and a reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, where the first intermediate result V′x is a vector containing m-dimensional data, the elements of the first intermediate result V′x are v′i, i is a variable, and i runs from 1 to m, where v′i = v′i,n, and v′i,n is obtained according to v′i,j = reduce2(v′i,j-1, combine2(mi,j, vj)), where mi,j is an element of the data block Mx, vj is an element of the data block V1x, j is a variable, and j runs from 1 to n.
S904. The management node obtains the processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes.
Therefore, with the data processing method provided in this embodiment of the present invention, when the merge operation and the reduction operation are performed on data blocks, the reduction operation does not wait until all merge operations are complete; instead, the merge operation and the reduction operation are performed alternately, which saves the memory space occupied by the computation and shortens the computation time.
Optionally, in an embodiment, the data processing system further includes at least one second-class computing node, and the method 900 further includes:
The management node assigns, according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, a second processing task to at least one computing node, including SCy, of the second-class computing nodes, where SCy is the y-th computing node of the at least one computing node and y is a positive integer.
The computing node SCy obtains, according to the second processing task, the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, where the first intermediate results obtained by SCy are the first intermediate results obtained from the data blocks located in the same row of the to-be-processed data set.
The computing node SCy performs the reduce2 operation on the first intermediate results obtained by SCy, to obtain a second intermediate result V″y, where the second intermediate result V″y is a vector containing m-dimensional data.
The management node obtains the processing result of the to-be-processed data set according to the second intermediate results obtained by the at least one computing node of the second-class computing nodes.
Optionally, in an embodiment, the data set further includes a data block V2x, the data block V2x being a vector containing m-dimensional data, and the method 900 further includes:
The management node assigns, according to the second intermediate results obtained by the at least one computing node, including SCy, of the second-class computing nodes, a third processing task to the at least one computing node, including SCy, of the second-class computing nodes.
The computing node SCy obtains the data block V2x of the data set according to the third processing task.
The computing node SCy performs an assignment assign operation on the second intermediate result V″y obtained by SCy and the data block V2x, to obtain the processing result of the to-be-processed data set.
Optionally, in an embodiment, m = n, and the data block V1x and the data block V2x are the same data block.
Optionally, in an embodiment, when the second-class computing nodes include at least two computing nodes, the at least two computing nodes of the second-class computing nodes process, in parallel, the second processing tasks assigned by the management node.
Optionally, in an embodiment, the management node, the first-class computing nodes, and the second-class computing nodes include physical machines, virtual machines, or central processing units (CPUs).
Therefore, with the data processing method provided in this embodiment of the present invention, when the merge operation and the reduction operation are performed on data blocks, the reduction operation does not wait until all merge operations are complete; instead, the merge operation and the reduction operation are performed alternately, which saves the memory space occupied by the computation and shortens the computation time.
FIG. 19 shows a data processing method 1000 according to an embodiment of the present invention. The method 1000 is applied to a data processing system, and the data processing system includes a management node and first-class computing nodes. The method 1000 includes:
S1001. The management node assigns a first processing task to at least two computing nodes, including FCx, of the first-class computing nodes, where FCx is the x-th computing node of the at least two computing nodes and x is a positive integer, and the at least two computing nodes of the first-class computing nodes process, in parallel, the first processing tasks assigned by the management node.
S1002. The computing node FCx obtains, according to the first processing task assigned by the management node, a data block M1x and a data block M2x of a to-be-processed data set, where the data block M1x is a matrix containing m rows and n columns of data, the data block M2x is a matrix containing n rows and p columns of data, m, n, and p are positive integers, and n is not less than 2.
S1003. The computing node FCx performs a merge combine2 operation and a reduction reduce2 operation on the data block M1x and the data block M2x to obtain a first intermediate result M′x, where the first intermediate result M′x is a matrix containing m rows and p columns of data, the elements of the first intermediate result M′x are m′i,j, i and j are variables, i runs from 1 to m, and j runs from 1 to p, where m′i,j = m′i,j,n, and m′i,j,n is obtained according to m′i,j,k = reduce2(m′i,j,k-1, combine2(m1[i,k], m2[k,j])), where m1[i,k] is the element in row i and column k of the data block M1x, m2[k,j] is the element in row k and column j of the data block M2x, k is a variable, and k runs from 1 to n.
S1004. The management node obtains the processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes.
Therefore, with the data processing method provided in this embodiment of the present invention, when the merge operation and the reduction operation are performed on data blocks, the reduction operation does not wait until all merge operations are complete; instead, the merge operation and the reduction operation are performed alternately, which saves the memory space occupied by the computation and shortens the computation time.
Optionally, in an embodiment, the data processing system further includes at least one second-class computing node, and the method 1000 further includes:
The management node assigns, according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, a second processing task to at least one computing node, including SCy, of the second-class computing nodes, where SCy is the y-th computing node of the at least one computing node and y is a positive integer.
The computing node SCy obtains, according to the second processing task, the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, where the first intermediate results obtained by SCy are the first intermediate results obtained from the data blocks M1x located in the same row and the data blocks M2x located in the same column of the to-be-processed data set.
The computing node SCy performs the reduce2 operation on the first intermediate results obtained by SCy, to obtain a second intermediate result M″y, where the second intermediate result M″y is a matrix containing m rows and p columns of data.
The management node obtains the processing result of the to-be-processed data set according to the second intermediate results obtained by the at least one computing node of the second-class computing nodes.
Optionally, in an embodiment, the data set further includes a data block M3x, the data block M3x being a matrix containing m rows and p columns of data, and the method 1000 further includes:
The management node assigns, according to the second intermediate results obtained by the at least one computing node, including SCy, of the second-class computing nodes, a third processing task to the at least one computing node, including SCy, of the second-class computing nodes.
The computing node SCy obtains the data block M3x of the data set according to the third processing task.
The computing node SCy performs an assignment assign operation on the second intermediate result M″y obtained by SCy and the data block M3x, to obtain the processing result of the to-be-processed data set.
Optionally, in an embodiment, n = m, and the data block M2x and the data block M3x are the same data block.
Optionally, in an embodiment, n = p, and the data block M1x and the data block M3x are the same data block.
Optionally, in an embodiment, when the second-class computing nodes include at least two computing nodes, the at least two computing nodes of the second-class computing nodes process, in parallel, the second processing tasks assigned by the management node.
Optionally, in an embodiment, the management node, the first-class computing nodes, and the second-class computing nodes include physical machines, virtual machines, or central processing units (CPUs).
Therefore, with the data processing method provided in this embodiment of the present invention, when the merge operation and the reduction operation are performed on data blocks, the reduction operation does not wait until all merge operations are complete; instead, the merge operation and the reduction operation are performed alternately, which saves the memory space occupied by the computation and shortens the computation time.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of functions. Whether the functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation shall not be regarded as going beyond the scope of the present invention.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described here again.
A person skilled in the art can clearly understand that the techniques in the embodiments of the present invention can be implemented by software plus a necessary general-purpose hardware platform. Based on such an understanding, the technical solutions of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium may include various non-transitory machine-readable media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a random-access memory (RAM), a solid state disk (SSD), or a nonvolatile memory. The foregoing are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto.

Claims (30)

  1. A data processing system, wherein the data processing system comprises a management node and first-class computing nodes,
    wherein the management node is configured to:
    assign a first processing task to at least two computing nodes, comprising FCx, of the first-class computing nodes, wherein FCx is the x-th computing node of the at least two computing nodes and x is a positive integer;
    wherein the at least two computing nodes of the first-class computing nodes process, in parallel, the first processing tasks assigned by the management node;
    wherein the computing node FCx is configured to obtain, according to the first processing task assigned by the management node, a data block Mx and a data block V1x of a to-be-processed data set, wherein the data block Mx is a matrix containing m rows and n columns of data, the data block V1x is a vector containing n-dimensional data, m and n are positive integers, and n is not less than 2;
    and to perform a merge combine2 operation and a reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, wherein the first intermediate result V′x is a vector comprising m-dimensional data, the elements of the first intermediate result V′x are v′i, i is a variable, and i runs from 1 to m, wherein v′i = v′i,n, v′i,n is obtained according to v′i,j = reduce2(v′i,j-1, combine2(mi,j, vj)), mi,j is an element of the data block Mx, vj is an element of the data block V1x, j is a variable, and j runs from 1 to n;
    and wherein the management node is further configured to obtain a processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes.
  2. The data processing system according to claim 1, wherein the data processing system further comprises second-class computing nodes, and the management node is specifically configured to:
    assign, according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, a second processing task to at least one computing node, comprising SCy, of the second-class computing nodes, wherein SCy is the y-th computing node of the at least one computing node and y is a positive integer;
    wherein the computing node SCy is configured to:
    obtain, according to the second processing task, the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, wherein the first intermediate results obtained by SCy are first intermediate results obtained from the data blocks located in a same row of the to-be-processed data set;
    perform the reduce2 operation on the first intermediate results obtained by SCy, to obtain a second intermediate result V″y, wherein the second intermediate result V″y is a vector containing m-dimensional data;
    and wherein the management node is specifically configured to:
    obtain the processing result of the to-be-processed data set according to the second intermediate results obtained by the at least one computing node of the second-class computing nodes.
  3. The data processing system according to claim 2, wherein the data set further comprises a data block V2x, the data block V2x is a vector containing m-dimensional data, and the management node is further configured to:
    assign, according to the second intermediate results obtained by the at least one computing node, comprising SCy, of the second-class computing nodes, a third processing task to the at least one computing node, comprising SCy, of the second-class computing nodes;
    wherein the computing node SCy is further configured to:
    obtain the data block V2x of the data set according to the third processing task;
    perform an assignment assign operation on the second intermediate result V″y obtained by SCy and the data block V2x, to obtain the processing result of the to-be-processed data set.
  4. The data processing system according to claim 3, wherein m = n, and the data block V1x and the data block V2x are a same data block.
  5. The data processing system according to any one of claims 2 to 4, wherein, when the second-class computing nodes comprise at least two computing nodes, the at least two computing nodes of the second-class computing nodes process, in parallel, the second processing tasks assigned by the management node.
  6. The data processing system according to any one of claims 2 to 5, wherein the management node, the first-class computing nodes, and the second-class computing nodes comprise physical machines, virtual machines, or central processing units (CPUs).
  7. A data processing system, wherein the data processing system comprises a management node and first-class computing nodes,
    wherein the management node is configured to:
    assign a first processing task to at least two computing nodes, comprising FCx, of the first-class computing nodes, wherein FCx is the x-th computing node of the at least two computing nodes and x is a positive integer;
    wherein the at least two computing nodes of the first-class computing nodes process, in parallel, the first processing tasks assigned by the management node;
    wherein the computing node FCx is configured to obtain, according to the first processing task assigned by the management node, a data block M1x and a data block M2x of a to-be-processed data set, wherein the data block M1x is a matrix containing m rows and n columns of data, the data block M2x is a matrix containing n rows and p columns of data, m, n, and p are positive integers, and n is not less than 2;
    and to perform a merge combine2 operation and a reduction reduce2 operation on the data block M1x and the data block M2x to obtain a first intermediate result M′x, wherein the first intermediate result M′x is a matrix comprising m rows and p columns of data, the elements of the first intermediate result M′x are m′i,j, i and j are variables, i runs from 1 to m, and j runs from 1 to p, wherein m′i,j = m′i,j,n, m′i,j,n is obtained according to m′i,j,k = reduce2(m′i,j,k-1, combine2(m1[i,k], m2[k,j])), m1[i,k] is the element in row i and column k of the data block M1x, m2[k,j] is the element in row k and column j of the data block M2x, k is a variable, and k runs from 1 to n;
    and wherein the management node is further configured to obtain a processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes.
  8. The data processing system according to claim 7, wherein the data processing system further comprises second-class computing nodes, and the management node is specifically configured to:
    assign, according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, a second processing task to at least one computing node, comprising SCy, of the second-class computing nodes, wherein SCy is the y-th computing node of the at least one computing node and y is a positive integer;
    wherein the computing node SCy is configured to:
    obtain, according to the second processing task, the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, wherein the first intermediate results obtained by SCy are first intermediate results obtained from the data blocks M1x located in a same row and the data blocks M2x located in a same column of the to-be-processed data set;
    perform the reduce2 operation on the first intermediate results obtained by SCy, to obtain a second intermediate result M″y, wherein the second intermediate result M″y is a matrix containing m rows and p columns of data;
    and wherein the management node is specifically configured to:
    obtain the processing result of the to-be-processed data set according to the second intermediate results obtained by the at least one computing node of the second-class computing nodes.
  9. The data processing system according to claim 8, wherein the data set further comprises a data block M3x, the data block M3x is a matrix containing m rows and p columns of data, and the management node is further configured to:
    assign, according to the second intermediate results obtained by the at least one computing node, comprising SCy, of the second-class computing nodes, a third processing task to the at least one computing node, comprising SCy, of the second-class computing nodes;
    wherein the computing node SCy is further configured to:
    obtain the data block M3x of the data set according to the third processing task;
    perform an assignment assign operation on the second intermediate result M″y obtained by SCy and the data block M3x, to obtain the processing result of the to-be-processed data set.
  10. The data processing system according to claim 9, wherein n = m, and the data block M2x and the data block M3x are a same data block.
  11. The data processing system according to claim 9, wherein n = p, and the data block M1x and the data block M3x are a same data block.
  12. The data processing system according to any one of claims 8 to 11, wherein, when the second-class computing nodes comprise at least two computing nodes, the at least two computing nodes of the second-class computing nodes process, in parallel, the second processing tasks assigned by the management node.
  13. The data processing system according to any one of claims 7 to 12, wherein the management node, the first-class computing nodes, and the second-class computing nodes comprise physical machines, virtual machines, or central processing units (CPUs).
  14. A computing node, comprising:
    a receiving module, configured to receive a first processing task assigned by a management node of a data processing system;
    an obtaining module, configured to obtain, according to the first processing task received by the receiving module and assigned by the management node, a data block Mx and a data block V1x of a to-be-processed data set, wherein the data block Mx is a matrix containing m rows and n columns of data, the data block V1x is a vector containing n-dimensional data, m and n are positive integers, and n is not less than 2;
    a processing module, configured to perform a merge combine2 operation and a reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, wherein the first intermediate result V′x is a vector comprising m-dimensional data, the elements of the first intermediate result V′x are v′i, i is a variable, and i runs from 1 to m, wherein v′i = v′i,n, v′i,n is obtained according to v′i,j = reduce2(v′i,j-1, combine2(mi,j, vj)), mi,j is an element of the data block Mx, vj is an element of the data block V1x, j is a variable, and j runs from 1 to n.
  15. The computing node according to claim 14, wherein the computing node comprises a physical machine, a virtual machine, or a central processing unit (CPU).
  16. A computing node, comprising:
    a receiving module, configured to receive a first processing task assigned by a management node of a data processing system, wherein the data processing system comprises the computing node and the management node;
    an obtaining module, configured to obtain, according to the first processing task received by the receiving module and assigned by the management node, a data block M1x and a data block M2x of a to-be-processed data set, wherein the data block M1x is a matrix containing m rows and n columns of data, the data block M2x is a matrix containing n rows and p columns of data, m, n, and p are positive integers, and n is not less than 2;
    a processing module, configured to perform a merge combine2 operation and a reduction reduce2 operation on the data block M1x and the data block M2x to obtain a first intermediate result M′x, wherein the first intermediate result M′x is a matrix comprising m rows and p columns of data, the elements of the first intermediate result M′x are m′i,j, i and j are variables, i runs from 1 to m, and j runs from 1 to p, wherein m′i,j = m′i,j,n, m′i,j,n is obtained according to m′i,j,k = reduce2(m′i,j,k-1, combine2(m1[i,k], m2[k,j])), m1[i,k] is the element in row i and column k of the data block M1x, m2[k,j] is the element in row k and column j of the data block M2x, k is a variable, and k runs from 1 to n.
  17. The computing node according to claim 16, wherein the computing node comprises a physical machine, a virtual machine, or a central processing unit (CPU).
  18. A data processing method, wherein the method is applied to a data processing system, the data processing system comprises a management node and first-class computing nodes, and the method comprises:
    assigning, by the management node, a first processing task to at least two computing nodes, comprising FCx, of the first-class computing nodes, wherein FCx is the x-th computing node of the at least two computing nodes, x is a positive integer, and the at least two computing nodes of the first-class computing nodes process, in parallel, the first processing tasks assigned by the management node;
    obtaining, by the computing node FCx according to the first processing task assigned by the management node, a data block Mx and a data block V1x of a to-be-processed data set, wherein the data block Mx is a matrix containing m rows and n columns of data, the data block V1x is a vector containing n-dimensional data, m and n are positive integers, and n is not less than 2;
    performing, by the computing node FCx, a merge combine2 operation and a reduction reduce2 operation on the data block Mx and the data block V1x to obtain a first intermediate result V′x, wherein the first intermediate result V′x is a vector comprising m-dimensional data, the elements of the first intermediate result V′x are v′i, i is a variable, and i runs from 1 to m, wherein v′i = v′i,n, v′i,n is obtained according to v′i,j = reduce2(v′i,j-1, combine2(mi,j, vj)), mi,j is an element of the data block Mx, vj is an element of the data block V1x, j is a variable, and j runs from 1 to n;
    obtaining, by the management node, a processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes.
  19. The method according to claim 18, wherein the data processing system further comprises at least one second-class computing node, and the method further comprises:
    assigning, by the management node according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, a second processing task to at least one computing node, comprising SCy, of the second-class computing nodes, wherein SCy is the y-th computing node of the at least one computing node and y is a positive integer;
    obtaining, by the computing node SCy according to the second processing task, the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, wherein the first intermediate results obtained by SCy are first intermediate results obtained from the data blocks located in a same row of the to-be-processed data set;
    performing, by the computing node SCy, the reduce2 operation on the first intermediate results obtained by SCy, to obtain a second intermediate result V″y, wherein the second intermediate result V″y is a vector containing m-dimensional data;
    obtaining, by the management node, the processing result of the to-be-processed data set according to the second intermediate results obtained by the at least one computing node of the second-class computing nodes.
  20. The method according to claim 19, wherein the data set further comprises a data block V2x, the data block V2x is a vector containing m-dimensional data, and the method further comprises:
    assigning, by the management node according to the second intermediate results obtained by the at least one computing node, comprising SCy, of the second-class computing nodes, a third processing task to the at least one computing node, comprising SCy, of the second-class computing nodes;
    obtaining, by the computing node SCy, the data block V2x of the data set according to the third processing task;
    performing, by the computing node SCy, an assignment assign operation on the second intermediate result V″y obtained by SCy and the data block V2x, to obtain the processing result of the to-be-processed data set.
  21. The method according to claim 20, wherein m = n, and the data block V1x and the data block V2x are a same data block.
  22. The method according to any one of claims 19 to 21, wherein, when the second-class computing nodes comprise at least two computing nodes, the at least two computing nodes of the second-class computing nodes process, in parallel, the second processing tasks assigned by the management node.
  23. The method according to any one of claims 18 to 22, wherein the management node, the first-class computing nodes, and the second-class computing nodes comprise physical machines, virtual machines, or central processing units (CPUs).
  24. A data processing method, wherein the method is applied to a data processing system, the data processing system comprises a management node and first-class computing nodes, and the method comprises:
    assigning, by the management node, a first processing task to at least two computing nodes, comprising FCx, of the first-class computing nodes, wherein FCx is the x-th computing node of the at least two computing nodes, x is a positive integer, and the at least two computing nodes of the first-class computing nodes process, in parallel, the first processing tasks assigned by the management node;
    obtaining, by the computing node FCx according to the first processing task assigned by the management node, a data block M1x and a data block M2x of a to-be-processed data set, wherein the data block M1x is a matrix containing m rows and n columns of data, the data block M2x is a matrix containing n rows and p columns of data, m, n, and p are positive integers, and n is not less than 2;
    performing, by the computing node FCx, a merge combine2 operation and a reduction reduce2 operation on the data block M1x and the data block M2x to obtain a first intermediate result M′x, wherein the first intermediate result M′x is a matrix comprising m rows and p columns of data, the elements of the first intermediate result M′x are m′i,j, i and j are variables, i runs from 1 to m, and j runs from 1 to p, wherein m′i,j = m′i,j,n, m′i,j,n is obtained according to m′i,j,k = reduce2(m′i,j,k-1, combine2(m1[i,k], m2[k,j])), m1[i,k] is the element in row i and column k of the data block M1x, m2[k,j] is the element in row k and column j of the data block M2x, k is a variable, and k runs from 1 to n;
    obtaining, by the management node, a processing result of the to-be-processed data set according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes.
  25. The method according to claim 24, wherein the data processing system further comprises second-class computing nodes, and the method further comprises:
    assigning, by the management node according to the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, a second processing task to at least one computing node, comprising SCy, of the second-class computing nodes, wherein SCy is the y-th computing node of the at least one computing node and y is a positive integer;
    obtaining, by the computing node SCy according to the second processing task, the first intermediate results obtained by the at least two computing nodes of the first-class computing nodes, wherein the first intermediate results obtained by SCy are first intermediate results obtained from the data blocks M1x located in a same row and the data blocks M2x located in a same column of the to-be-processed data set;
    performing, by the computing node SCy, the reduce2 operation on the first intermediate results obtained by SCy, to obtain a second intermediate result M″y, wherein the second intermediate result M″y is a matrix containing m rows and p columns of data;
    obtaining, by the management node, the processing result of the to-be-processed data set according to the second intermediate results obtained by the at least one computing node of the second-class computing nodes.
  26. The method according to claim 25, wherein the data set further comprises a data block M3x, the data block M3x is a matrix containing m rows and p columns of data, and the method further comprises:
    assigning, by the management node according to the second intermediate results obtained by the at least one computing node, comprising SCy, of the second-class computing nodes, a third processing task to the at least one computing node, comprising SCy, of the second-class computing nodes;
    obtaining, by the computing node SCy, the data block M3x of the data set according to the third processing task;
    performing, by the computing node SCy, an assignment assign operation on the second intermediate result M″y obtained by SCy and the data block M3x, to obtain the processing result of the to-be-processed data set.
  27. The method according to claim 26, wherein n = m, and the data block M2x and the data block M3x are a same data block.
  28. The method according to claim 26, wherein n = p, and the data block M1x and the data block M3x are a same data block.
  29. The method according to any one of claims 25 to 28, wherein, when the second-class computing nodes comprise at least two computing nodes, the at least two computing nodes of the second-class computing nodes process, in parallel, the second processing tasks assigned by the management node.
  30. The method according to any one of claims 24 to 29, wherein the management node, the first-class computing nodes, and the second-class computing nodes comprise physical machines, virtual machines, or central processing units (CPUs).
PCT/CN2015/072451 2015-02-06 2015-02-06 Data processing system, computing node and data processing method WO2016123808A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
KR1020177022612A KR101999639B1 (ko) 2015-02-06 2015-02-06 Data processing system, computing node and data processing method
PCT/CN2015/072451 WO2016123808A1 (zh) 2015-02-06 2015-02-06 Data processing system, computing node and data processing method
EP15880764.4A EP3239853A4 (en) 2015-02-06 2015-02-06 Data processing system, calculation node and data processing method
CN201580001137.1A CN106062732B (zh) 2015-02-06 2015-02-06 Data processing system, computing node and data processing method
JP2017541356A JP6508661B2 (ja) 2015-02-06 2015-02-06 Data processing system, computing node and data processing method
US15/667,634 US10567494B2 (en) 2015-02-06 2017-08-03 Data processing system, computing node, and data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/072451 WO2016123808A1 (zh) 2015-02-06 2015-02-06 Data processing system, computing node and data processing method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/667,634 Continuation US10567494B2 (en) 2015-02-06 2017-08-03 Data processing system, computing node, and data processing method

Publications (1)

Publication Number Publication Date
WO2016123808A1 true WO2016123808A1 (zh) 2016-08-11

Family

ID=56563349

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/072451 WO2016123808A1 (zh) 2015-02-06 2015-02-06 Data processing system, computing node and data processing method

Country Status (6)

Country Link
US (1) US10567494B2 (zh)
EP (1) EP3239853A4 (zh)
JP (1) JP6508661B2 (zh)
KR (1) KR101999639B1 (zh)
CN (1) CN106062732B (zh)
WO (1) WO2016123808A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764490A (zh) * 2018-08-28 2018-11-06 合肥本源量子计算科技有限责任公司 Quantum virtual machine

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11748625B2 (en) 2016-12-30 2023-09-05 Intel Corporation Distributed convolution for neural networks
US10169296B2 (en) * 2016-12-30 2019-01-01 Intel Corporation Distributed matrix multiplication for neural networks
US10540398B2 (en) * 2017-04-24 2020-01-21 Oracle International Corporation Multi-source breadth-first search (MS-BFS) technique and graph processing system that applies it
CN107273339A (zh) * 2017-06-21 2017-10-20 郑州云海信息技术有限公司 Task processing method and apparatus
CN107590254B (zh) * 2017-09-19 2020-03-17 华南理工大学 Big data support platform with a merge processing method
US11636327B2 (en) * 2017-12-29 2023-04-25 Intel Corporation Machine learning sparse computation mechanism for arbitrary neural networks, arithmetic compute microarchitecture, and sparsity for training mechanism
CN113678124A (zh) * 2019-02-01 2021-11-19 光子智能股份有限公司 Matrix operations for processing-rate-limited systems
CN110489448A (zh) * 2019-07-24 2019-11-22 西安理工大学 Hadoop-based method for mining association rules in big data
CN110727836B (zh) * 2019-12-17 2020-04-07 南京华飞数据技术有限公司 Spark GraphX-based social network analysis system and implementation method thereof
CN112667679B (zh) * 2020-12-17 2024-02-13 中国工商银行股份有限公司 Method, apparatus, and server for determining data relationships
CN113872752B (zh) * 2021-09-07 2023-10-13 哲库科技(北京)有限公司 Security engine module, security engine apparatus, and communication device


Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6282583B1 (en) * 1991-06-04 2001-08-28 Silicon Graphics, Inc. Method and apparatus for memory access in a matrix processor computer
JPH05181895A (ja) * 1991-12-27 1993-07-23 Hitachi Ltd Matrix solving method by parallel computation
JPH05324700A (ja) * 1992-05-19 1993-12-07 N T T Data Tsushin Kk Matrix multiplication apparatus
JPH06175986A (ja) * 1992-12-10 1994-06-24 Nippon Telegr & Teleph Corp <Ntt> Parallel processing method for matrix operations
US7072960B2 (en) * 2002-06-10 2006-07-04 Hewlett-Packard Development Company, L.P. Generating automated mappings of service demands to server capacities in a distributed computer system
US20080147821A1 (en) * 2006-12-19 2008-06-19 Dietrich Bradley W Managed peer-to-peer content backup service system and method using dynamic content dispersal to plural storage nodes
JP2009087282A (ja) * 2007-10-03 2009-04-23 Fuji Xerox Co Ltd Parallel computing system and parallel computing method
US7925842B2 (en) * 2007-12-18 2011-04-12 International Business Machines Corporation Allocating a global shared memory
US7921261B2 (en) * 2007-12-18 2011-04-05 International Business Machines Corporation Reserving a global address space
US9552206B2 (en) * 2010-11-18 2017-01-24 Texas Instruments Incorporated Integrated circuit with control node circuitry and processing circuitry
US9489176B2 (en) * 2011-09-15 2016-11-08 Exxonmobil Upstream Research Company Optimized matrix and vector operations in instruction limited algorithms that perform EOS calculations
GB2500444B (en) * 2012-03-21 2015-09-16 Broadcom Corp Data processing
US20140331014A1 (en) * 2013-05-01 2014-11-06 Silicon Graphics International Corp. Scalable Matrix Multiplication in a Shared Memory System
US20150095747A1 (en) * 2013-09-30 2015-04-02 Itzhak Tamo Method for data recovery
US9916188B2 (en) * 2014-03-14 2018-03-13 Cask Data, Inc. Provisioner for cluster management system
WO2016037351A1 (en) * 2014-09-12 2016-03-17 Microsoft Corporation Computing system for training neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120078991A1 (en) * 2010-09-27 2012-03-29 Fujitsu Limited Ordering generating method and storage medium, and shared memory scalar parallel computer
CN103136244A (zh) * 2011-11-29 2013-06-05 中国电信股份有限公司 Parallel data mining method and system based on a cloud computing platform
CN102831102A (zh) * 2012-07-30 2012-12-19 北京亿赞普网络技术有限公司 Method and system for performing matrix product operations on a computer cluster
CN103345514A (zh) * 2013-07-09 2013-10-09 焦点科技股份有限公司 Streaming data processing method in a big data environment
CN103544328A (zh) * 2013-11-15 2014-01-29 南京大学 Hadoop-based parallel k-means clustering method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KANG, U. ET AL.: "PEGASUS: Mining Peta-scale Graphs", KNOWLEDGE AND INFORMATION SYSTEMS, vol. 27, no. 2, 31 May 2009 (2009-05-31), pages 305, XP008185792 *
See also references of EP3239853A4 *


Also Published As

Publication number Publication date
CN106062732B (zh) 2019-03-01
EP3239853A4 (en) 2018-05-02
EP3239853A1 (en) 2017-11-01
JP6508661B2 (ja) 2019-05-08
CN106062732A (zh) 2016-10-26
US20170331886A1 (en) 2017-11-16
KR20170103949A (ko) 2017-09-13
JP2018508887A (ja) 2018-03-29
KR101999639B1 (ko) 2019-07-12
US10567494B2 (en) 2020-02-18


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15880764

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2015880764

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2017541356

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20177022612

Country of ref document: KR

Kind code of ref document: A