CN107273339A - Task processing method and device - Google Patents
Task processing method and device (Download PDF)
- Publication number
- CN107273339A (application CN201710483201.5A / CN201710483201A)
- Authority
- CN
- China
- Prior art keywords
- data node
- matrix
- dataset
- parallel
- sub-dataset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/543—User-generated data transfer, e.g. clipboards, dynamic data exchange [DDE], object linking and embedding [OLE]
Abstract
The present invention provides a task processing method and device. Multiple data nodes read their corresponding matrix sub-datasets in parallel, and each divides the computing task for its sub-dataset into multiple subtasks, that is, multiple sub-matrix blocks. Each data node then distributes its sub-matrix blocks to the corresponding compute nodes, which perform the computation in parallel. This combination of parallel file reading and parallel task execution realizes efficient, low-cost task processing and greatly improves the user experience.
Description
Technical field
The invention belongs to the field of high-performance computing, and in particular relates to a task processing method and device.
Background art
With the continuing development of the big-data era, large-scale data plays a crucial role in scientific computing and scientific statistics, and matrix multiplication is one of the most widely used algorithms in large-scale scientific computing. As dataset sizes and computational complexity grow, the performance demanded of algorithms keeps rising, and improving algorithm performance is vital to the progress of engineering and research projects. Limited by computer hardware resources such as memory and computing speed, some large-scale matrix operations take tens of days or even months to complete on a single machine. For computing platforms with smaller memory capacity, the available memory is often insufficient to hold a large-scale dataset; in severe cases the program crashes, preventing the large-scale computation from being carried out at all.
A traditional matrix-multiplication implementation sums the products of the matrix elements one after another on a single compute node, reading data serially: each read waits for the previous computation to finish before the next step begins. If the matrices to be multiplied have tens of thousands of rows and columns, the time complexity and cost of such a serial implementation of the two-matrix product are easy to imagine.
Therefore, there is an urgent need for an efficient, low-cost task processing scheme that solves the above technical problems.
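The serial baseline criticized above can be sketched as follows (illustrative code, not from the patent): a naive triple loop sums each element product in sequence on a single node, giving O(n^3) strictly sequential scalar operations with no overlap between steps.

```python
def matmul_serial(A, B):
    """Naive single-node matrix product: every element product is summed in turn."""
    n, k, m = len(A), len(A[0]), len(B[0])
    assert len(B) == k, "inner dimensions must match"
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):  # strictly sequential: O(n^3) scalar operations
                C[i][j] += A[i][p] * B[p][j]
    return C

print(matmul_serial([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

For matrices with tens of thousands of rows and columns, this loop nest is exactly what the scheme below replaces with parallel reading and parallel computation.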
Summary of the invention
The present invention provides a task processing method and device to solve the above problems.
An embodiment of the present invention provides a task processing method comprising the following steps: multiple data nodes read their corresponding matrix sub-datasets in parallel, and each divides the computing task for its sub-dataset into multiple subtasks, that is, multiple sub-matrix blocks;
the data nodes distribute the sub-matrix blocks to the corresponding compute nodes, which perform the computation in parallel.
An embodiment of the present invention provides a task processing device comprising a reading module, a division module, and a distribution-and-computation module, where the reading module is connected to the distribution-and-computation module through the division module.
The reading module reads the corresponding matrix sub-datasets in parallel through multiple data nodes;
the division module divides the computing task of each matrix sub-dataset into multiple subtasks, that is, multiple sub-matrix blocks;
the distribution-and-computation module distributes the sub-matrix blocks to the corresponding compute nodes, which perform the computation in parallel.
Through the above scheme, multiple data nodes read their corresponding matrix sub-datasets in parallel and each divides the computing task of its sub-dataset into multiple subtasks, that is, multiple sub-matrix blocks; the data nodes distribute the sub-matrix blocks to the corresponding compute nodes, which execute them in parallel. This parallel file reading and parallel task execution realize efficient, low-cost task processing and greatly improve the user experience.
Also through this scheme, each sub-matrix block contains more than one row, with consecutive row numbers, so a compute node can send its result back to the data node in a single message, reducing the number of communications and hence the communication overhead.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and constitute a part of this application. The schematic embodiments of the present invention and their descriptions explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the task processing method of embodiment 1 of the present invention;
Fig. 2 is the architecture diagram designed in embodiment 2 of the present invention;
Fig. 3 is a flowchart of the master/worker task distribution of embodiment 3 of the present invention;
Fig. 4 is a structure diagram of the task processing device of embodiment 4 of the present invention.
Embodiment
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. Note that, provided they do not conflict, the embodiments in this application and the features within them may be combined with one another.
Fig. 1 is a flowchart of the task processing method of embodiment 1 of the present invention, comprising the following steps:
Step 101: multiple data nodes read their corresponding matrix sub-datasets in parallel, and each divides the computing task for its sub-dataset into multiple subtasks, that is, multiple sub-matrix blocks.
Further, the process of reading the corresponding matrix sub-datasets in parallel through multiple data nodes is as follows:
each data node opens the file storing the matrix through its corresponding data-reading master process, which returns a file handle;
the offset of each data-reading master process within the matrix file is computed, giving the logical position within the file of the matrix sub-dataset that the process is to read;
each data-reading master process reads its corresponding matrix sub-dataset according to that logical position.
Preferably, after the multiple data nodes read their corresponding matrix sub-datasets in parallel, the sub-datasets are stored in memory.
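The offset computation can be illustrated with the following sketch. It rests on assumptions the text does not fix (row-major storage of 8-byte elements, rows split as evenly as possible), and the function name is illustrative; it returns, for each data-reading master process, the byte offset and row count that define its sub-dataset's logical position in the file:

```python
ITEM_SIZE = 8  # bytes per element; float64, row-major layout is an assumption

def read_offsets(M, N, num_readers):
    """(byte offset, row count) for each data-reading master process.

    Rows of the M x N matrix are split as evenly as possible across readers.
    """
    base, extra = divmod(M, num_readers)
    offsets, row = [], 0
    for r in range(num_readers):
        rows = base + (1 if r < extra else 0)  # first `extra` readers get one more row
        offsets.append((row * N * ITEM_SIZE, rows))
        row += rows
    return offsets

print(read_offsets(M=10, N=4, num_readers=3))
```

With 10 rows, 4 columns, and 3 readers, the processes read 4, 3, and 3 rows starting at byte offsets 0, 128, and 224 respectively, so each obtains a disjoint portion of the matrix file.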
Further, each data node divides its matrix sub-dataset evenly into multiple sub-matrix blocks according to the number of compute nodes and sends them to the corresponding compute nodes.
Preferably, each sub-matrix block contains more than one row, with consecutive row numbers.
If the matrix sub-dataset read by the data node has M rows and the number of compute nodes is child_process, then each compute node is assigned per_rank = M / child_process rows.
The remaining extra = M % child_process rows are distributed evenly to compute nodes 1 through extra, each of which receives one additional row.
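In code, this distribution rule can be sketched as follows (the names per_rank, extra, and child_process follow the text; the helper name is illustrative):

```python
def distribute_rows(M, child_process):
    """Rows assigned to each compute node.

    Each node gets per_rank = M // child_process rows; the remaining
    extra = M % child_process rows go one each to the first `extra` nodes.
    """
    per_rank, extra = divmod(M, child_process)
    return [per_rank + 1 if rank < extra else per_rank
            for rank in range(child_process)]

counts = distribute_rows(M=11, child_process=4)
print(counts)             # [3, 3, 3, 2]
assert sum(counts) == 11  # every row is assigned exactly once
```

For M = 11 and 4 compute nodes, per_rank = 2 and extra = 3, so the first three nodes each receive 3 rows and the last receives 2, keeping the load balanced.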
Step 102: the data node distributes the sub-matrix blocks to the corresponding compute nodes, which perform the computation in parallel.
Further, after a compute node finishes computing its sub-matrix block, it sends the result to the corresponding data node.
Further, one data node is selected from among the data nodes to serve as the statistics node;
this node collects and stores the results obtained from the other data nodes.
The method proposed in this embodiment, on the one hand, divides a large-scale dataset into many matrix sub-datasets through parallel file reading, solving the memory-limit problem; on the other hand, by dividing the computing task of each matrix sub-dataset into multiple subtasks, it executes those subtasks in parallel, improving the computational efficiency of the algorithm on large-scale datasets.
(1) For large-scale datasets, parallel file reading is adopted using parallel file I/O. When the matrix is stored in a file, several nodes of the cluster are selected as data nodes; by specifying explicit offsets, each reads data at a different position in the file, partitioning the large matrix dataset so that each data node obtains a portion of the whole matrix dataset from the file. With multiple data nodes reading data in parallel, these nodes can simultaneously act as MPI master processes that distribute and manage the data. This approach both solves the problem of a single node's memory being too small to hold the whole matrix and, by reading from parallel offset addresses, lets the data nodes read in parallel, improving file-access efficiency.
(2) To address the inefficiency of serial matrix computation, multiple-subtask division is adopted. In a multi-node cluster, MPI multi-process communication divides the computing task of each matrix sub-dataset read in parallel in (1) into multiple subtasks, that is, the matrix is divided into multiple sub-matrix blocks handed to different processes. The node hosting a process that handles a sub-matrix block is a compute node (the node hosting a master process is a data node; the node hosting a worker process is a compute node), and the computing tasks on the compute nodes execute in parallel.
This embodiment provides an effective master/worker cooperative management scheme: the master process divides the tasks and distributes and collects the data, while the worker processes perform the parallel computation on the sub-datasets. When multiple data nodes read data in parallel, there are correspondingly multiple master processes, each responsible for multiple worker processes (that is, each data node is responsible for multiple compute nodes).
(3) The matrix-multiplication optimization proposed in this embodiment can run on large cluster platforms using MPI. For large-scale datasets, multiple data nodes read the file in parallel: by setting different offset addresses, different data nodes obtain sub-datasets at different positions of the file simultaneously, which both improves file-reading efficiency and again solves the problem of a single node's limited memory being unable to hold the whole dataset. Meanwhile, to improve computational efficiency, MPI multi-process cooperation divides the computing task of a matrix dataset into the computation of multiple sub-matrix blocks; each worker process is responsible for one sub-matrix block, and the worker processes complete their corresponding blocks in parallel.
(4) The distinguishing feature of this embodiment is that, in a large cluster system where a single node cannot store the whole large-scale dataset, multiple data nodes are set up to read the corresponding sub-datasets in parallel; the matrix data on each data node is divided into multiple sub-matrix blocks that are distributed in parallel to multiple compute nodes, which cooperate to process the different blocks; finally, the data nodes manage the communication and organize the result data in a unified way.
The scheme is described in detail below.
In this optimization of large-scale matrix multiplication on a cluster system, the nodes are divided into data nodes and compute nodes. When a single node's memory cannot hold the dataset, multiple nodes use parallel file I/O: by specifying explicit offsets, different nodes read data at different positions in the file; these nodes are called data nodes. Unlike a traditional MPI program, the MPI program of this embodiment has multiple "master processes": each process on a data node that reads the file in parallel is called a master process. To improve computational efficiency, each master process further divides the sub-dataset it obtained (that is, the matrix sub-dataset on the data node is divided into multiple sub-matrix blocks) and distributes the blocks to the corresponding compute nodes (the nodes hosting the worker processes). The compute nodes then compute the sub-matrix blocks in parallel. Replacing the earlier serial computation with two levels of parallelism (parallel data reading and parallel computation) and two levels of division (the large-scale dataset into sub-datasets, and each matrix block into sub-matrix blocks) realizes the optimization of large-scale matrix multiplication on a cluster platform. The implementation architecture of this embodiment is shown in Fig. 2: each data node obtains a matrix sub-dataset (sub_dataset) from the overall dataset; each data node is responsible for different compute nodes (a compute node obtains its corresponding sub-matrix block, for example sub1, and returns its result res to the corresponding data node); finally one of the data nodes performs unified aggregation (res_dataset) and storage of the results.
An embodiment in which multiple data nodes (i.e. multiple data-reading master processes) read the file in parallel is as follows. First, MPI_File_open(comm, filename, ...) is called: each process opens the file storing the matrix and receives a file handle, through which all of its file operations are performed. The offset of each process within the matrix file is computed, giving the logical position of the data each process needs, so each process (data node) can obtain a portion of the matrix dataset (its sub-dataset). Each data-reading process then calls MPI_File_read_at() to read its part of the matrix from the file, and each data node stores the sub-matrix dataset it obtained in its own memory for the next computation step.
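This read pattern can be mimicked without MPI in a small sketch: a plain seek/read stands in for MPI_File_read_at(), and the row-major layout of 8-byte elements is an assumption, not something the text specifies. Each "data node" reads a disjoint row range starting at its own byte offset:

```python
import os
import struct
import tempfile

ITEM = 8  # assumed float64 elements, row-major

# Write a 4x3 matrix to a temporary file (stand-in for the shared matrix file).
M, N = 4, 3
values = [float(i) for i in range(M * N)]
path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
with open(path, "wb") as f:
    f.write(struct.pack(f"{M * N}d", *values))

def read_at(path, row_start, rows, N):
    """Plain-file analogue of MPI_File_read_at: read `rows` rows at a byte offset."""
    with open(path, "rb") as f:
        f.seek(row_start * N * ITEM)  # explicit offset, as in the embodiment
        data = struct.unpack(f"{rows * N}d", f.read(rows * N * ITEM))
    return [list(data[r * N:(r + 1) * N]) for r in range(rows)]

# Two "data nodes" each read a disjoint half of the rows into their own memory.
sub0 = read_at(path, 0, 2, N)
sub1 = read_at(path, 2, 2, N)
print(sub0, sub1)
```

In the real program each read happens in a different MPI process on a different node, so the two reads proceed concurrently rather than one after the other.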
As noted above, the method implemented here with MPI multi-process techniques divides the dataset twice. The first division is the multiple data nodes (master processes) reading sub-datasets of the large-scale dataset in parallel and storing them in memory. The second division is each data node further dividing its matrix sub-dataset into multiple sub-matrix blocks, whose computation is assigned to the corresponding compute nodes (worker processes); the compute nodes compute the sub-matrix blocks in parallel, converting the serial computation over one large-scale dataset into parallel computation over sub-matrix blocks on multiple compute nodes. With multiple data nodes reading the dataset in parallel and each data node corresponding to multiple compute nodes, a data node, acting as a master process, divides its matrix sub-dataset evenly into sub-matrix blocks and sends them to the corresponding compute nodes (the worker processes). After the compute nodes finish computing their sub-matrix blocks in parallel, each sends its result to the corresponding data node; each data node completes the distribution and collection of data, and finally one data node is selected to aggregate and store the result data.
The flow in which a data node (master process) divides the sub-matrix blocks and the compute nodes (worker processes) compute them in parallel is shown in Fig. 3 and mainly includes the following key points:
the master process divides the sub-matrix blocks evenly and sends several consecutive rows of data to each worker process;
the worker processes compute their sub-matrix blocks in parallel and send the results to the master process;
the master process receives the feedback from the worker processes, then collects and stores the results.
In addition, during the division and transmission of the sub-matrix blocks, to ensure that memory is read contiguously, the data the master process sends to each worker process must be several consecutive rows of the matrix. After receiving its data from the master process, each worker process computes and then sends its result back to the master process in a single message, which reduces the number of communications and hence the communication overhead. As for the remainder: if the current matrix has M rows and there are child_process worker processes, each worker process first gets per_rank = M / child_process rows; to balance the load across processes, the remaining extra = M % child_process rows are distributed evenly to worker processes 1 through extra, each of which receives one more row. That is, the master process sends M / child_process + 1 rows of data to each of worker processes 1 through extra, and M / child_process rows of data to worker processes extra + 1 through child_process.
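A compact single-process sketch of this master/worker flow follows (function names are illustrative, and the MPI sends and receives are replaced by ordinary calls; in the real program each worker's result arrives as one message):

```python
def split_rows(M, child_process):
    """Contiguous (start, end) row ranges per worker; extra rows go to the first workers."""
    per_rank, extra = divmod(M, child_process)
    ranges, start = [], 0
    for rank in range(child_process):
        rows = per_rank + (1 if rank < extra else 0)
        ranges.append((start, start + rows))
        start += rows
    return ranges

def worker(block, B):
    """Worker: multiply its contiguous block of A's rows by B; return one result."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in block]

A = [[1, 0], [0, 1], [2, 3], [4, 5], [6, 7]]  # M = 5 rows
B = [[1, 2], [3, 4]]

ranges = split_rows(len(A), child_process=2)       # master divides the task
results = [worker(A[s:e], B) for s, e in ranges]   # workers compute (in parallel in MPI)
C = [row for part in results for row in part]      # master collects results in order
print(ranges, C)
```

For M = 5 and child_process = 2, the workers receive the contiguous ranges (0, 3) and (3, 5), and concatenating their results reproduces the full product of A and B.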
Fig. 4 is a structure diagram of the task processing device of embodiment 4 of the present invention, comprising a reading module, a division module, and a distribution-and-computation module, where the reading module is connected to the distribution-and-computation module through the division module.
The reading module reads the corresponding matrix sub-datasets in parallel through multiple data nodes;
the division module divides the computing task of each matrix sub-dataset into multiple subtasks, that is, multiple sub-matrix blocks;
the distribution-and-computation module distributes the sub-matrix blocks to the corresponding compute nodes, which perform the computation in parallel.
Further, each sub-matrix block contains more than one row, with consecutive row numbers.
Through the above scheme, multiple data nodes read their corresponding matrix sub-datasets in parallel and each divides the computing task of its sub-dataset into multiple subtasks, that is, multiple sub-matrix blocks; the data nodes distribute the sub-matrix blocks to the corresponding compute nodes, which execute them in parallel. This parallel file reading and parallel task execution realize efficient, low-cost task processing and greatly improve the user experience.
Also through this scheme, each sub-matrix block contains more than one row, with consecutive row numbers, so a compute node can send its result back to the data node in a single message, reducing the number of communications and hence the communication overhead.
The above are only the preferred embodiments of the present invention and are not intended to limit it; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of its protection.
Claims (10)
1. A task processing method, characterized by comprising the following steps:
reading corresponding matrix sub-datasets in parallel through multiple data nodes, and dividing the computing task of each matrix sub-dataset into multiple subtasks, that is, multiple sub-matrix blocks; and
the data nodes distributing the sub-matrix blocks to corresponding compute nodes, which perform the computation in parallel.
2. The method according to claim 1, characterized in that the process of reading the corresponding matrix sub-datasets in parallel through multiple data nodes is:
each data node opening the file storing the matrix through its corresponding data-reading master process and returning a file handle;
computing the offset of each data-reading master process within the matrix file to obtain the logical position, within the file, of the matrix sub-dataset that the process is to read; and
each data-reading master process reading its corresponding matrix sub-dataset according to that logical position.
3. The method according to claim 1, characterized in that after the multiple data nodes read their corresponding matrix sub-datasets in parallel, the sub-datasets are stored in memory.
4. The method according to claim 1, characterized in that each data node divides its matrix sub-dataset evenly into multiple sub-matrix blocks according to the number of compute nodes and sends them to the corresponding compute nodes.
5. The method according to claim 1, characterized in that each sub-matrix block contains more than one row, with consecutive row numbers.
6. The method according to claim 5, characterized in that if the matrix sub-dataset read by the data node has M rows and the number of compute nodes is child_process, each compute node is assigned per_rank = M / child_process rows, and the remaining extra = M % child_process rows are distributed evenly to compute nodes 1 through extra, each of which receives one additional row.
7. The method according to claim 1, characterized in that after a compute node completes the computation of its sub-matrix block, it sends the result to the corresponding data node.
8. The method according to claim 7, characterized in that one data node is selected from among the data nodes to serve as the statistics node, and this node collects and stores the results obtained from the other data nodes.
9. A task processing device, characterized by comprising a reading module, a division module, and a distribution-and-computation module, wherein the reading module is connected to the distribution-and-computation module through the division module;
the reading module is configured to read corresponding matrix sub-datasets in parallel through multiple data nodes;
the division module is configured to divide the computing task of each matrix sub-dataset into multiple subtasks, that is, multiple sub-matrix blocks; and
the distribution-and-computation module is configured to distribute the sub-matrix blocks to corresponding compute nodes, which perform the computation in parallel.
10. The device according to claim 9, characterized in that each sub-matrix block contains more than one row, with consecutive row numbers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710483201.5A CN107273339A (en) | 2017-06-21 | 2017-06-21 | A kind of task processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710483201.5A CN107273339A (en) | 2017-06-21 | 2017-06-21 | A kind of task processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107273339A true CN107273339A (en) | 2017-10-20 |
Family
ID=60068355
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710483201.5A Pending CN107273339A (en) | 2017-06-21 | 2017-06-21 | A kind of task processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107273339A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040193841A1 (en) * | 2003-03-31 | 2004-09-30 | Fujitsu Limited | Matrix processing device in SMP node distributed memory type parallel computer |
CN102831102A (en) * | 2012-07-30 | 2012-12-19 | 北京亿赞普网络技术有限公司 | Method and system for carrying out matrix product operation on computer cluster |
CN104461466A (en) * | 2013-09-25 | 2015-03-25 | 广州中国科学院软件应用技术研究所 | Method for increasing computing speed through parallel computing based on MPI and OpenMP hybrid programming model |
CN104461467A (en) * | 2013-09-25 | 2015-03-25 | 广州中国科学院软件应用技术研究所 | Method for increasing calculation speed of SMP cluster system through MPI and OpenMP in hybrid parallel mode |
CN106062732A (en) * | 2015-02-06 | 2016-10-26 | 华为技术有限公司 | Data processing system, calculation node and data processing method |
CN105260342A (en) * | 2015-09-22 | 2016-01-20 | 浪潮(北京)电子信息产业有限公司 | Solving method and system for symmetric positive definite linear equation set |
Non-Patent Citations (3)
Title |
---|
Zhou Can: "Research on MPI-Based Parallel Algorithms for Matrix Operations", China Masters' Theses Full-text Database, Information Science and Technology Series *
Li Xiaowei et al.: "MPI-Based Parallel I/O Methods", Microcomputer & Its Applications *
Xu Yanqin et al.: "Research and Implementation of an MPI+CUDA Model on SMP Clusters", Computer Engineering and Design *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109189732A (en) * | 2018-08-03 | 2019-01-11 | 成都四方伟业软件股份有限公司 | A kind of median analysis method and device |
CN108764490A (en) * | 2018-08-28 | 2018-11-06 | 合肥本源量子计算科技有限责任公司 | A kind of quantum virtual machine |
CN109669772A (en) * | 2018-12-28 | 2019-04-23 | 第四范式(北京)技术有限公司 | Calculate the parallel execution method and apparatus of figure |
CN111522640A (en) * | 2018-12-28 | 2020-08-11 | 第四范式(北京)技术有限公司 | Parallel execution method and equipment of computational graph |
CN113254078A (en) * | 2021-06-23 | 2021-08-13 | 北京睿芯高通量科技有限公司 | Data stream processing method for efficiently executing matrix addition on GPDPU simulator |
CN113254078B (en) * | 2021-06-23 | 2024-04-12 | 北京中科通量科技有限公司 | Data stream processing method for efficiently executing matrix addition on GPDPU simulator |
CN113568736A (en) * | 2021-06-24 | 2021-10-29 | 阿里巴巴新加坡控股有限公司 | Data processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107273339A (en) | Task processing method and device | |
CN108875958A (en) | Use the primary tensor processor of outer product unit | |
US8751556B2 (en) | Processor for large graph algorithm computations and matrix operations | |
US7680765B2 (en) | Iterate-aggregate query parallelization | |
CN103049241B (en) | A kind of method improving CPU+GPU isomery device calculated performance | |
CN108875956B (en) | Primary tensor processor | |
CN106951926A (en) | The deep learning systems approach and device of a kind of mixed architecture | |
CN103324765B (en) | A kind of multi-core synchronization data query optimization method based on row storage | |
CN104239144A (en) | Multilevel distributed task processing system | |
CN103019728A (en) | Effective complex report parsing engine and parsing method thereof | |
Tantalaki et al. | Pipeline-based linear scheduling of big data streams in the cloud | |
WO2014052942A1 (en) | Random number generator in a parallel processing database | |
CN110929884A (en) | Classification method and device for distributed machine learning optimization based on column division | |
CN106371924B (en) | A kind of method for scheduling task minimizing MapReduce cluster energy consumption | |
CN105786619B (en) | Virtual machine distribution method and device | |
CN107085562A (en) | A kind of neural network processor and design method based on efficient multiplexing data flow | |
CN107491416A (en) | Reconfigurable Computation structure and calculating dispatching method and device suitable for Arbitrary Dimensions convolution demand | |
CN105677763A (en) | Image quality evaluating system based on Hadoop | |
CN107402926A (en) | A kind of querying method and query facility | |
CN106412124A (en) | Task allocation system and task allocation method for parallel ordering cloud service platform | |
Nicol et al. | Efficient aggregation of multiple PLs in distributed memory parallel simulations | |
CN106844320A (en) | A kind of financial statement integration method and equipment | |
CN105608138B (en) | A kind of system of optimization array data base concurrency data loading performance | |
CN104156505B (en) | A kind of Hadoop cluster job scheduling method and devices based on user behavior analysis | |
CN107436865A (en) | A kind of word alignment training method, machine translation method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20171020 |