CN107273339A - Task processing method and device - Google Patents

Task processing method and device

Info

Publication number
CN107273339A
CN107273339A
Authority
CN
China
Prior art keywords
data node
matrix
data set
parallel
sub-data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710483201.5A
Other languages
Chinese (zh)
Inventor
刘姝
黄雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710483201.5A priority Critical patent/CN107273339A/en
Publication of CN107273339A publication Critical patent/CN107273339A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/543User-generated data transfer, e.g. clipboards, dynamic data exchange [DDE], object linking and embedding [OLE]

Abstract

The present invention provides a task processing method and device. Multiple data nodes read their corresponding matrix sub-data sets in parallel, and the computing task of each matrix sub-data set is divided into multiple subtasks, i.e. multiple sub-matrix blocks. Each data node distributes its sub-matrix blocks to the corresponding compute nodes, which perform the computation in parallel (parallel reading of files and parallel execution of tasks), realizing efficient, low-cost task processing and greatly improving the user experience.

Description

Task processing method and device
Technical field
The present invention belongs to the field of high-performance computing, and in particular relates to a task processing method and device.
Background technology
With the continuing development of the big data era, large-scale data plays a crucial role in scientific computing and scientific statistics, and matrix multiplication is one of the most widely used algorithms in large-scale scientific computing. As data sets grow and computational complexity rises, the performance demanded of algorithms in scientific applications keeps increasing, and improving algorithm performance is vital to the progress of engineering and research projects. Constrained by computer hardware resources such as memory and computing speed, completing some large-scale matrix operations on a single machine often takes tens of days or even months; and on computing platforms with little memory, the capacity is often insufficient to hold a large-scale data set, which in severe cases causes the program to crash, limiting the feasibility of large-scale data-set computation.
Traditional matrix algorithms are generally implemented on a single compute node by summing the products of the matrix elements one after another, using a serial computation model: data are read in order, and each computation must finish before the next one starts. If the matrices to be multiplied have tens of thousands of rows and columns, the time complexity and cost of multiplying two such matrices serially are easy to imagine.
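The serial baseline just described can be sketched as a plain triple loop. This is an illustrative sketch only (the function name is ours; the patent contains no code):

```python
# Serial baseline (illustration, not from the patent): every element of
# the product is a running sum of element products, computed one after
# another - nothing proceeds until the previous step finishes.

def matmul_serial(A, B):
    n, m, p = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "inner dimensions must match"
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):          # one output row at a time
        for j in range(p):      # one output column at a time
            s = 0.0
            for k in range(m):  # serial accumulation
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul_serial(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

For matrices with tens of thousands of rows and columns the innermost statement runs on the order of 10^12 times, which is the cost the background section alludes to.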
Therefore, there is an urgent need for an efficient, low-cost task processing scheme that solves the above technical problems.
Summary of the invention
The present invention provides a task processing method and device to solve the above problems.
An embodiment of the present invention provides a task processing method comprising the following steps: reading corresponding matrix sub-data sets in parallel through multiple data nodes, and dividing the computing task of each matrix sub-data set into multiple subtasks, i.e. multiple sub-matrix blocks;
distributing, by the data nodes, the sub-matrix blocks to the corresponding compute nodes, the compute nodes performing the computation in parallel.
An embodiment of the present invention provides a task processing device comprising a reading module, a division module, and a distribution computing module, where the reading module is connected to the distribution computing module through the division module;
the reading module is used for reading corresponding matrix sub-data sets in parallel through multiple data nodes;
the division module is used for dividing the computing task of each matrix sub-data set into multiple subtasks, i.e. multiple sub-matrix blocks;
the distribution computing module is used for distributing the sub-matrix blocks to the corresponding compute nodes, the compute nodes performing the computation in parallel.
Through the above scheme, in which corresponding matrix sub-data sets are read in parallel through multiple data nodes, the computing task of each matrix sub-data set is divided into multiple subtasks (i.e. multiple sub-matrix blocks), and the data nodes distribute the sub-matrix blocks to the corresponding compute nodes, which perform the computation in parallel (parallel reading of files and parallel execution of tasks), efficient, low-cost task processing is realized, greatly improving the user experience.
Through the above scheme, each sub-matrix block contains more than one row with consecutive row numbers, so that a compute node can send its computed result to the data node in a single message, reducing the number of communications and thus the communication overhead.
Brief description of the drawings
The accompanying drawings described herein are provided to give a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their description are used to explain the present invention and do not constitute an undue limitation of the present invention. In the drawings:
Fig. 1 shows the flow chart of the task processing method of embodiment 1 of the present invention;
Fig. 2 shows the implementation architecture designed in embodiment 2 of the present invention;
Fig. 3 shows the flow chart of the master/slave process task distribution of embodiment 3 of the present invention;
Fig. 4 shows the structure of the task processing device of embodiment 4 of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another.
Fig. 1 shows the flow chart of the task processing method of embodiment 1 of the present invention, which comprises the following steps:
Step 101: reading corresponding matrix sub-data sets in parallel through multiple data nodes, and dividing the computing task of each matrix sub-data set into multiple subtasks, i.e. multiple sub-matrix blocks;
Further, the process of reading the corresponding matrix sub-data sets in parallel through multiple data nodes is as follows:
each of the multiple data nodes opens the file storing the matrix through its corresponding data-reading master process, which returns a file handle;
the offset of each data-reading master process into the matrix file is computed, obtaining the logical position in the matrix file of the matrix sub-data set that each data-reading master process is to read;
each data-reading master process then reads its corresponding matrix sub-data set according to the logical position of that matrix sub-data set in the matrix file.
Preferably, after the multiple data nodes have read the corresponding matrix sub-data sets in parallel, the sub-data sets are stored in memory.
Further, the data node divides the matrix sub-data set evenly into multiple sub-matrix blocks according to the number of compute nodes and sends them to the corresponding compute nodes.
Preferably, each sub-matrix block contains more than one row, with consecutive row numbers.
If the number of rows of the matrix sub-data set read by the data node is M and the number of compute nodes is child_process, each compute node is assigned per_rank = M / child_process rows;
the remaining extra = M % child_process rows are distributed evenly to compute nodes 1 through extra, each of whose processes receives one additional row.
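The row distribution above can be checked with a short sketch; `distribute_rows` is our illustrative name, not part of the patent:

```python
def distribute_rows(M, child_process):
    """Rows assigned to each of child_process compute nodes, following
    the scheme in the description: per_rank = M // child_process rows
    each, plus one extra row for nodes 1..extra where
    extra = M % child_process."""
    per_rank = M // child_process
    extra = M % child_process
    return [per_rank + 1 if rank < extra else per_rank
            for rank in range(child_process)]

counts = distribute_rows(10, 4)
print(counts)              # [3, 3, 2, 2]
assert sum(counts) == 10   # every row is assigned exactly once
```

The extra rows go to the lowest-numbered nodes so that no node's load differs from another's by more than one row.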
Step 102: the data node distributes the sub-matrix blocks to the corresponding compute nodes, and the compute nodes perform the computation in parallel.
Further, after a compute node finishes computing its sub-matrix block, it sends the computed result to the corresponding data node.
Further, one data node is selected from among the data nodes as the statistics data node;
the statistics data node collects and stores the computed results obtained from the other data nodes.
On the one hand, the method proposed in the embodiments of the present invention divides a large-scale data set into numerous matrix sub-data sets by reading the file in parallel, solving the memory limitation problem; on the other hand, by dividing the computing task of each matrix sub-data set into multiple subtasks, multiple computing subtasks are executed in parallel, improving the computational efficiency of the algorithm on large-scale data sets.
(1) For the processing of large-scale data sets, parallel file reading based on parallel file I/O technology is used. When the matrix is stored in a file, several nodes of the multi-node cluster are selected as data nodes; by specifying explicit offsets into the file, each reads data from a different position, dividing the large matrix data set so that each data node obtains a subset of the whole matrix data set from the matrix file. Since multiple data nodes read data in parallel, these nodes can simultaneously act as MPI master processes that distribute and manage the data. On the one hand this approach solves the problem that the matrix is too large for a single node's memory to store; at the same time, reading via parallel file offset addresses lets all data nodes read in parallel, improving file access efficiency.
(2) To address the inefficiency of serial matrix computation, multiple-subtask division is used. In a multi-node cluster system, MPI multi-process communication technology divides the computing task of each matrix sub-data set read in parallel in (1) into multiple subtasks, i.e. the matrix is divided into multiple sub-matrix blocks that are handed to different processes. The node hosting a process that handles a sub-matrix block is a compute node (the node hosting a master process is a data node; the node hosting a slave process is a compute node), and the computing tasks on the compute nodes are executed in parallel.
The embodiments of the present invention provide an effective master/slave process coordination scheme: the master process completes the division of the task and the distribution and collection of the data, while the slave processes complete the parallel computation of the sub-data sets. If multiple data nodes read data in parallel, there are multiple master processes, each responsible for multiple slave processes (i.e. each data node is responsible for multiple compute nodes).
(3) The matrix multiplication optimization method proposed in the embodiments of the present invention can run on large-scale cluster platforms. Using MPI, large-scale data sets are read through parallel file reading by multiple data nodes: by setting different offset addresses, different data nodes simultaneously obtain sub-data sets from different positions in the file, which both improves file reading efficiency and solves the problem that a single node's limited memory cannot store the whole data set. To improve computational efficiency, MPI multi-process cooperation divides the computing task of a matrix data set into the computation of multiple sub-matrix blocks; each slave process is responsible for one sub-matrix block, and the slave processes complete the computation of their blocks in parallel.
(4) The distinguishing feature of the embodiments of the present invention is that, in a large-scale cluster system where a single node cannot store the whole large-scale data set, multiple data nodes are set to read the corresponding sub-data sets in parallel; the matrix data on each data node is divided into multiple sub-matrix blocks, which are distributed in parallel to multiple compute nodes; the nodes cooperate to process the different sub-matrix blocks, and finally the data nodes uniformly manage the communication and the collation of the result data.
A specific description follows.
In the optimization method for large-scale matrix multiplication based on a cluster system, the nodes are divided into data nodes and compute nodes. When the memory of a single node cannot store the data set, multiple nodes use parallel file I/O: by specifying explicit offsets into the file, different nodes read data from different positions; these nodes are called data nodes. Unlike a traditional MPI program, the MPI program realized by the embodiments of the present invention has multiple "master processes": each process on a data node that reads the file in parallel is called a master process. To improve computational efficiency, each master process further divides the sub-data set it has obtained (i.e. the matrix sub-data set on the data node is divided into multiple sub-matrix blocks) and distributes the blocks to the corresponding compute nodes (the nodes hosting the slave processes are called compute nodes). The multiple compute nodes then compute their sub-matrix blocks in parallel, replacing the serial computation used before the matrix was divided. Through two levels of parallelism (parallel data reading and parallel computation) and two levels of division (the large-scale data set into sub-data sets, and the matrix blocks into sub-matrix blocks), the algorithm optimization of large-scale matrix multiplication on a cluster platform is realized. The implementation architecture of the embodiments of the present invention is shown in Fig. 2: each data node obtains a matrix sub-data set (sub_dataset) from the database; each data node is responsible for different compute nodes (a compute node obtains its corresponding sub-matrix block, e.g. sub1, and returns the computed result res to its data node); finally the computed results are uniformly collected (res_dataset) and stored by one of the data nodes.
The embodiment in which multiple data nodes (i.e. multiple data-reading master processes) read the file in parallel is as follows. First, MPI_File_open(comm, filename, ...) is called; each process opens the file storing the matrix and receives a file handle, through which all of that process's file operations are performed. The offset of each process into its part of the matrix file is computed, giving the logical position in the matrix file of the data each process needs, so each process (data node) obtains a part of the matrix data set (a sub-data set). Each data-reading process then calls MPI_File_read_at() to read a different part of the matrix in the file, and each data node stores the sub-matrix data set it has obtained in its own memory for the next computation step.
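The offset arithmetic behind this parallel read can be sketched as follows. The helper below is our illustration under the assumption that the matrix is stored row-major as 8-byte doubles; the MPI calls named in the description appear only as comments:

```python
DOUBLE_SIZE = 8  # assumption: matrix stored row-major as 8-byte doubles

def read_plan(total_rows, cols, num_data_nodes):
    """For each data-reading master process, compute the (byte_offset,
    row_count) of the matrix sub-data set it should read; the byte
    offset is what would be passed to MPI_File_read_at()."""
    base = total_rows // num_data_nodes
    extra = total_rows % num_data_nodes
    plan, row = [], 0
    for rank in range(num_data_nodes):
        rows = base + (1 if rank < extra else 0)
        plan.append((row * cols * DOUBLE_SIZE, rows))
        # In the MPI implementation each process would now do roughly:
        #   MPI_File_open(comm, filename, ..., &fh);
        #   MPI_File_read_at(fh, byte_offset, buf, rows*cols, MPI_DOUBLE, &st);
        row += rows
    return plan

print(read_plan(10, 4, 3))  # [(0, 4), (128, 3), (224, 3)]
```

Because each process computes a disjoint byte range, all data nodes can issue their reads simultaneously without coordination, which is the source of the file-access speedup the description claims.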
As noted above, the method realized by the embodiments of the present invention uses MPI multi-process technology to divide the data set twice. The first division is that multiple data nodes (master processes) read the sub-data sets of the large-scale data set in parallel and store them in memory. The second division is that each data node divides its matrix sub-data set again into multiple sub-matrix blocks; the computation of each sub-matrix block is assigned to a corresponding compute node (slave process), and the compute nodes compute the blocks in parallel, converting the serial computation of one large-scale data set into parallel computation of sub-matrix blocks by multiple compute nodes. With multiple data nodes reading the data set in parallel and each data node corresponding to multiple compute nodes, the data node acts as a master process that evenly divides the matrix sub-data set into multiple sub-matrix blocks and sends them to the corresponding compute nodes (the slave processes). After the compute nodes finish computing the sub-matrix blocks in parallel, the computed results are uniformly sent to the corresponding data nodes; each data node completes the distribution and collection of data, and finally one data node is selected to collect and store the result data.
The flow in which the data node (master process) divides the sub-matrix blocks and the compute nodes (slave processes) compute them in parallel is shown in Fig. 3 and mainly includes the following key points:
the master process divides the sub-matrix blocks evenly and sends several consecutive rows of data to each slave process;
the slave processes compute their sub-matrix blocks in parallel and send the computed results to the master process;
the master process receives the feedback data from the slave processes, and the results are collected and stored.
In addition, during the division and transmission of the sub-matrix blocks, to guarantee contiguous memory reads, the data the master process sends to each slave process must be several consecutive rows of the matrix; after receiving its data from the master process, each slave process computes and then sends the computed result to the master process in a single message, which reduces the number of communications and thus the communication overhead. For the handling of the remainder: if the current matrix has M rows and there are child_process slave processes, each slave process first obtains per_rank = M / child_process rows; to keep the load of the processes balanced, the remaining extra = M % child_process rows are distributed evenly to slave processes 1 through extra, each of which receives one additional row. That is, the master process sends M / child_process + 1 rows of data to each of slave processes 1 through extra, and M / child_process rows of data to slave processes extra + 1 through child_process.
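The master/slave flow of Fig. 3 can be sketched as follows (our illustration, not the patent's code): the master sends each slave a block of consecutive rows of A, each slave multiplies its block by B, and the whole block result comes back in a single message. MPI_Send/MPI_Recv are replaced by plain function calls so the row bookkeeping can be checked:

```python
def split_consecutive(A, child_process):
    """Contiguous row blocks: slaves 1..extra get one extra row."""
    M = len(A)
    per_rank, extra = divmod(M, child_process)
    blocks, start = [], 0
    for rank in range(child_process):
        rows = per_rank + (1 if rank < extra else 0)
        blocks.append(A[start:start + rows])   # consecutive rows only
        start += rows
    return blocks

def slave_compute(block, B):
    """Each slave multiplies its sub-matrix block by B, then would send
    the whole block result back to the master in one message."""
    p = len(B[0])
    return [[sum(a[k] * B[k][j] for k in range(len(B))) for j in range(p)]
            for a in block]

def master(A, B, child_process):
    blocks = split_consecutive(A, child_process)
    results = [slave_compute(b, B) for b in blocks]   # in MPI: in parallel
    return [row for res in results for row in res]    # collect in row order

A = [[1, 0], [0, 1], [2, 2]]
B = [[3, 4], [5, 6]]
print(master(A, B, 2))  # [[3, 4], [5, 6], [16, 20]]
```

Because each slave's rows are consecutive, the master can reassemble the final product simply by concatenating the returned blocks in rank order.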
Fig. 4 shows the structure of the task processing device of embodiment 4 of the present invention, comprising a reading module, a division module, and a distribution computing module, where the reading module is connected to the distribution computing module through the division module;
the reading module is used for reading corresponding matrix sub-data sets in parallel through multiple data nodes;
the division module is used for dividing the computing task of each matrix sub-data set into multiple subtasks, i.e. multiple sub-matrix blocks;
the distribution computing module is used for distributing the sub-matrix blocks to the corresponding compute nodes, the compute nodes performing the computation in parallel.
Further, each sub-matrix block contains more than one row, with consecutive row numbers.
Through the above scheme, in which corresponding matrix sub-data sets are read in parallel through multiple data nodes, the computing task of each matrix sub-data set is divided into multiple subtasks (i.e. multiple sub-matrix blocks), and the data nodes distribute the sub-matrix blocks to the corresponding compute nodes, which perform the computation in parallel (parallel reading of files and parallel execution of tasks), efficient, low-cost task processing is realized, greatly improving the user experience.
Through the above scheme, each sub-matrix block contains more than one row with consecutive row numbers, so that a compute node can send its computed result to the data node in a single message, reducing the number of communications and thus the communication overhead.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (10)

1. A task processing method, characterized in that it comprises the following steps:
reading corresponding matrix sub-data sets in parallel through multiple data nodes, and dividing the computing task of each matrix sub-data set into multiple subtasks, i.e. multiple sub-matrix blocks;
distributing, by the data nodes, the sub-matrix blocks to the corresponding compute nodes, the compute nodes performing the computation in parallel.
2. The method according to claim 1, characterized in that the process of reading the corresponding matrix sub-data sets in parallel through multiple data nodes is:
each of the multiple data nodes opens the file storing the matrix through its corresponding data-reading master process, which returns a file handle;
the offset of each data-reading master process into the matrix file is computed, obtaining the logical position in the matrix file of the matrix sub-data set that each data-reading master process is to read;
each data-reading master process reads its corresponding matrix sub-data set according to the logical position of that matrix sub-data set in the matrix file.
3. The method according to claim 1, characterized in that after the multiple data nodes have read the corresponding matrix sub-data sets in parallel, the sub-data sets are stored in memory.
4. The method according to claim 1, characterized in that the data node divides the matrix sub-data set evenly into multiple sub-matrix blocks according to the number of compute nodes and sends them to the corresponding compute nodes.
5. The method according to claim 1, characterized in that each sub-matrix block contains more than one row, with consecutive row numbers.
6. The method according to claim 5, characterized in that if the number of rows of the matrix sub-data set read by the data node is M and the number of compute nodes is child_process, each compute node is assigned per_rank = M / child_process rows,
and the remaining extra = M % child_process rows are distributed evenly to compute nodes 1 through extra, each of whose processes receives one additional row.
7. The method according to claim 1, characterized in that after a compute node finishes computing its sub-matrix block, it sends the computed result to the corresponding data node.
8. The method according to claim 7, characterized in that one data node is selected from among the data nodes as the statistics data node;
the statistics data node collects and stores the computed results obtained from the other data nodes.
9. A task processing device, characterized in that it comprises a reading module, a division module, and a distribution computing module, where the reading module is connected to the distribution computing module through the division module;
the reading module is used for reading corresponding matrix sub-data sets in parallel through multiple data nodes;
the division module is used for dividing the computing task of each matrix sub-data set into multiple subtasks, i.e. multiple sub-matrix blocks;
the distribution computing module is used for distributing the sub-matrix blocks to the corresponding compute nodes, the compute nodes performing the computation in parallel.
10. The device according to claim 9, characterized in that each sub-matrix block contains more than one row, with consecutive row numbers.
CN201710483201.5A 2017-06-21 2017-06-21 A kind of task processing method and device Pending CN107273339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710483201.5A CN107273339A (en) 2017-06-21 2017-06-21 A kind of task processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710483201.5A CN107273339A (en) 2017-06-21 2017-06-21 A kind of task processing method and device

Publications (1)

Publication Number Publication Date
CN107273339A true CN107273339A (en) 2017-10-20

Family

ID=60068355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710483201.5A Pending CN107273339A (en) 2017-06-21 2017-06-21 A kind of task processing method and device

Country Status (1)

Country Link
CN (1) CN107273339A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764490A (en) * 2018-08-28 2018-11-06 合肥本源量子计算科技有限责任公司 A kind of quantum virtual machine
CN109189732A (en) * 2018-08-03 2019-01-11 成都四方伟业软件股份有限公司 A kind of median analysis method and device
CN109669772A (en) * 2018-12-28 2019-04-23 第四范式(北京)技术有限公司 Calculate the parallel execution method and apparatus of figure
CN113254078A (en) * 2021-06-23 2021-08-13 北京睿芯高通量科技有限公司 Data stream processing method for efficiently executing matrix addition on GPDPU simulator
CN113568736A (en) * 2021-06-24 2021-10-29 阿里巴巴新加坡控股有限公司 Data processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040193841A1 (en) * 2003-03-31 2004-09-30 Fujitsu Limited Matrix processing device in SMP node distributed memory type parallel computer
CN102831102A (en) * 2012-07-30 2012-12-19 北京亿赞普网络技术有限公司 Method and system for carrying out matrix product operation on computer cluster
CN104461466A (en) * 2013-09-25 2015-03-25 广州中国科学院软件应用技术研究所 Method for increasing computing speed through parallel computing based on MPI and OpenMP hybrid programming model
CN104461467A (en) * 2013-09-25 2015-03-25 广州中国科学院软件应用技术研究所 Method for increasing calculation speed of SMP cluster system through MPI and OpenMP in hybrid parallel mode
CN105260342A (en) * 2015-09-22 2016-01-20 浪潮(北京)电子信息产业有限公司 Solving method and system for symmetric positive definite linear equation set
CN106062732A (en) * 2015-02-06 2016-10-26 华为技术有限公司 Data processing system, calculation node and data processing method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040193841A1 (en) * 2003-03-31 2004-09-30 Fujitsu Limited Matrix processing device in SMP node distributed memory type parallel computer
CN102831102A (en) * 2012-07-30 2012-12-19 北京亿赞普网络技术有限公司 Method and system for carrying out matrix product operation on computer cluster
CN104461466A (en) * 2013-09-25 2015-03-25 广州中国科学院软件应用技术研究所 Method for increasing computing speed through parallel computing based on MPI and OpenMP hybrid programming model
CN104461467A (en) * 2013-09-25 2015-03-25 广州中国科学院软件应用技术研究所 Method for increasing calculation speed of SMP cluster system through MPI and OpenMP in hybrid parallel mode
CN106062732A (en) * 2015-02-06 2016-10-26 华为技术有限公司 Data processing system, calculation node and data processing method
CN105260342A (en) * 2015-09-22 2016-01-20 浪潮(北京)电子信息产业有限公司 Solving method and system for symmetric positive definite linear equation set

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHOU CAN (周灿): "Research on Parallel Algorithms for Matrix Operations Based on MPI", China Master's Theses Full-text Database, Information Science and Technology *
LI XIAOWEI (李小卫) et al.: "Parallel I/O Methods Based on MPI", Microcomputer & Its Applications (《微型机与应用》) *
XU YANQIN (许彦芹) et al.: "Research and Implementation of an MPI+CUDA Model Based on SMP Clusters", Computer Engineering and Design (《计算机工程与设计》) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109189732A (en) * 2018-08-03 2019-01-11 成都四方伟业软件股份有限公司 A kind of median analysis method and device
CN108764490A (en) * 2018-08-28 2018-11-06 合肥本源量子计算科技有限责任公司 A kind of quantum virtual machine
CN109669772A (en) * 2018-12-28 2019-04-23 第四范式(北京)技术有限公司 Calculate the parallel execution method and apparatus of figure
CN111522640A (en) * 2018-12-28 2020-08-11 第四范式(北京)技术有限公司 Parallel execution method and equipment of computational graph
CN113254078A (en) * 2021-06-23 2021-08-13 北京睿芯高通量科技有限公司 Data stream processing method for efficiently executing matrix addition on GPDPU simulator
CN113254078B (en) * 2021-06-23 2024-04-12 北京中科通量科技有限公司 Data stream processing method for efficiently executing matrix addition on GPDPU simulator
CN113568736A (en) * 2021-06-24 2021-10-29 阿里巴巴新加坡控股有限公司 Data processing method and device

Similar Documents

Publication Publication Date Title
CN107273339A (en) A kind of task processing method and device
CN108875958A (en) Use the primary tensor processor of outer product unit
US8751556B2 (en) Processor for large graph algorithm computations and matrix operations
US7680765B2 (en) Iterate-aggregate query parallelization
CN103049241B (en) A kind of method improving CPU+GPU isomery device calculated performance
CN108875956B (en) Primary tensor processor
CN106951926A (en) The deep learning systems approach and device of a kind of mixed architecture
CN103324765B (en) A kind of multi-core synchronization data query optimization method based on row storage
CN104239144A (en) Multilevel distributed task processing system
CN103019728A (en) Effective complex report parsing engine and parsing method thereof
Tantalaki et al. Pipeline-based linear scheduling of big data streams in the cloud
WO2014052942A1 (en) Random number generator in a parallel processing database
CN110929884A (en) Classification method and device for distributed machine learning optimization based on column division
CN106371924B (en) A kind of method for scheduling task minimizing MapReduce cluster energy consumption
CN105786619B (en) Virtual machine distribution method and device
CN107085562A (en) A kind of neural network processor and design method based on efficient multiplexing data flow
CN107491416A (en) Reconfigurable Computation structure and calculating dispatching method and device suitable for Arbitrary Dimensions convolution demand
CN105677763A (en) Image quality evaluating system based on Hadoop
CN107402926A (en) A kind of querying method and query facility
CN106412124A (en) Task allocation system and task allocation method for parallel ordering cloud service platform
Nicol et al. Efficient aggregation of multiple PLs in distributed memory parallel simulations
CN106844320A (en) A kind of financial statement integration method and equipment
CN105608138B (en) A kind of system of optimization array data base concurrency data loading performance
CN104156505B (en) A kind of Hadoop cluster job scheduling method and devices based on user behavior analysis
CN107436865A (en) A kind of word alignment training method, machine translation method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20171020