CN107256203A - Method and device for implementing matrix-vector multiplication - Google Patents
- Publication number: CN107256203A
- Application number: CN201710506697.3A
- Authority: CN (China)
- Prior art keywords
- matrix
- data block
- vector
- row
- column
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Abstract
The embodiments of the invention disclose a method for implementing matrix-vector multiplication. The method includes: under the Open Computing Language (OpenCL) framework, vectorizing the first matrix and the second matrix of the multiplication respectively; and performing parallel operations on the multiple sub-matrices obtained after the vectorization. The embodiments of the invention also disclose a device for implementing matrix-vector multiplication. Through this scheme, the method can be deployed on high-performance computing platforms, makes full use of computer hardware resources, greatly shortens computation time, and improves operation efficiency.
Description
Technical field
Embodiments of the present invention relate to the field of high-performance computing, and in particular to a method and device for implementing matrix-vector multiplication.
Background
Human society is producing an explosion of data: the volume of information keeps growing, and so do the demands placed on data-processing capability. In fields such as artificial intelligence, weather forecasting, aerospace and national defense, finance and economics, oil exploration, and scientific research, the demand for high-performance computing grows by the day, and high-performance matrix-vector multiplication is one of its most important foundations.
However, the scheme currently used for matrix-vector multiplication is serial computation on a CPU (Central Processing Unit): one element of the matrix is multiplied at a time, and only then is the next element processed. The computation time is long and the efficiency is low, far short of today's ever-growing requirements on data-processing speed.
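For reference, the serial CPU baseline that the patent contrasts against can be sketched in plain Python (an illustration added here, not part of the patent text): one multiply-accumulate at a time, with no parallelism.

```python
def serial_matvec(matrix, vector):
    """Serial CPU-style matrix-vector multiply: elements are
    multiplied one at a time, one row after another."""
    result = []
    for row in matrix:
        acc = 0.0
        for a, b in zip(row, vector):  # one multiplication per step
            acc += a * b
        result.append(acc)
    return result
```

For example, `serial_matvec([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])` yields `[3.0, 7.0]`; every product is computed sequentially, which is exactly the bottleneck the OpenCL scheme below removes.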
Summary of the invention
To solve the above problems, embodiments of the present invention propose a method and device for implementing matrix-vector multiplication that can be deployed on high-performance computing platforms, make full use of computer hardware resources, greatly shorten computation time, and improve operation efficiency.
To achieve the above object, an embodiment of the present invention proposes a method for implementing matrix-vector multiplication. The method includes:
under the Open Computing Language (OpenCL) framework, vectorizing the first matrix and the second matrix of the multiplication respectively;
performing parallel operations on the multiple sub-matrices obtained after the vectorization.
Optionally, vectorizing the first matrix and the second matrix of the multiplication under the Open Computing Language (OpenCL) framework includes:
taking each row vector of the first matrix as a row data block, and each column vector of the second matrix as a column data block;
passing one row data block and one column data block into each kernel function, thereby obtaining multiple kernel functions, where each kernel function corresponds one-to-one with the row data block and column data block passed into it;
vectorizing the row data block and the column data block in each kernel function, so as to obtain multiple row-vector sub-matrices and multiple column-vector sub-matrices.
Optionally, vectorizing the row data block and the column data block in each kernel function includes:
using the OpenCL vector (Vector) data types, vectorizing every n floating-point values in the row data block to obtain multiple row-vector sub-matrices, and vectorizing every n floating-point values in the column data block to obtain multiple column-vector sub-matrices, where n is a positive integer.
Optionally, performing parallel operations on the multiple sub-matrices obtained after the vectorization includes:
in each kernel function, multiplying each row-vector sub-matrix with its corresponding column-vector sub-matrix in parallel.
Optionally, n = 4.
To achieve the above object, an embodiment of the present invention also proposes a device for implementing matrix-vector multiplication. The device includes a processing module and a computing module:
the processing module is configured to, under the Open Computing Language (OpenCL) framework, vectorize the first matrix and the second matrix of the multiplication respectively;
the computing module is configured to perform parallel operations on the multiple sub-matrices obtained after the vectorization.
Optionally, the processing module vectorizing the first matrix and the second matrix of the multiplication under the OpenCL framework includes:
taking each row vector of the first matrix as a row data block, and each column vector of the second matrix as a column data block;
passing one row data block and one column data block into each kernel function, thereby obtaining multiple kernel functions, where each kernel function corresponds one-to-one with the row data block and column data block passed into it;
vectorizing the row data block and the column data block in each kernel function, so as to obtain multiple row-vector sub-matrices and multiple column-vector sub-matrices.
Optionally, the processing module vectorizing the row data block and the column data block in each kernel function includes:
using the OpenCL Vector data types, vectorizing every n floating-point values in the row data block to obtain multiple row-vector sub-matrices, and vectorizing every n floating-point values in the column data block to obtain multiple column-vector sub-matrices, where n is a positive integer.
Optionally, the computing module performing parallel operations on the multiple sub-matrices obtained after the vectorization includes:
in each kernel function, multiplying each row-vector sub-matrix with its corresponding column-vector sub-matrix in parallel.
Optionally, n = 4.
The scheme of the embodiments of the present invention includes: under the Open Computing Language (OpenCL) framework, vectorizing the first matrix and the second matrix of the multiplication respectively; and performing parallel operations on the multiple sub-matrices obtained after the vectorization. Through this scheme, the method can be deployed on high-performance computing platforms, makes full use of computer hardware resources, greatly shortens the computation time, and improves operation efficiency.
Brief description of the drawings
The accompanying drawings are provided for a further understanding of the embodiments of the present invention; together with the specification they serve to explain the embodiments and do not limit their scope of protection.
Fig. 1 is a flowchart of the matrix-vector multiplication method of an embodiment of the present invention;
Fig. 2 is a flowchart of vectorizing the first matrix and the second matrix of the multiplication, according to an embodiment;
Fig. 3 is a schematic diagram of dividing the first matrix and the second matrix into data blocks, according to an embodiment;
Fig. 4 is an OpenCL block diagram of the matrix-vector multiplication of an embodiment;
Fig. 5 is a schematic diagram of vectorizing the row data block and the column data block inside a kernel function, according to an embodiment;
Fig. 6 is a block diagram of the matrix-vector multiplication device of an embodiment.
Detailed description of the embodiments
To facilitate understanding by those skilled in the art, the embodiments of the present invention are further described below with reference to the accompanying drawings; this description does not limit the scope of protection of the embodiments.
An embodiment of the present invention proposes a method for implementing matrix-vector multiplication. As shown in Fig. 1, the method may include steps S101-S102:
S101: under the Open Computing Language (OpenCL) framework, vectorize the first matrix and the second matrix of the multiplication respectively.
In the embodiments of the present invention, to solve the slow speed and low efficiency of current matrix multiplication methods, an OpenCL-based implementation of matrix-vector multiplication is proposed. OpenCL (Open Computing Language) is an open, cross-platform parallel-programming framework for general-purpose computing on heterogeneous systems. Given hardware capable of parallel acceleration, it can make full use of computer hardware resources and improve the operation efficiency of the matrix-vector multiplication. The computer hardware referred to here is any hardware platform that supports OpenCL; for example, such a platform may consist of CPUs, GPUs (Graphics Processing Units), or other types of processors.
In the embodiments of the present invention, the OpenCL-based method divides the operand matrices into blocks, with the number of blocks equal to the number of rows (or columns) of the matrix, thereby realizing parallel processing of the data. Specifically, this can be realized through the following scheme.
Optionally, as shown in Fig. 2, vectorizing the first matrix and the second matrix of the multiplication under the Open Computing Language (OpenCL) framework may include steps S201-S203:
S201: take each row vector of the first matrix as a row data block, and each column vector of the second matrix as a column data block.
In the embodiments of the present invention, initialization is first performed under the OpenCL framework: the necessary components, such as the device (Device), context (Context), and program (Program), are defined and assigned. The first matrix and the second matrix of the multiplication are then divided into blocks. Specifically, each row vector of the first matrix is taken as a row data block, and each column vector of the second matrix is taken as a column data block, as in the row data blocks Hblock1, Hblock2, Hblock3, ..., Hblockn and the column data blocks Lblock1, Lblock2, Lblock3, ..., Lblockn shown in Fig. 3.
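The partitioning of S201 can be sketched in plain Python (an illustration, not the patent's OpenCL host code; the names `hblocks`/`lblocks` mirror the Hblock/Lblock labels of Fig. 3):

```python
def partition_blocks(first_matrix, second_matrix):
    """Each row vector of the first matrix becomes a row data block
    (Hblock1..Hblockn); each column vector of the second matrix
    becomes a column data block (Lblock1..Lblockn)."""
    hblocks = [list(row) for row in first_matrix]
    lblocks = [list(col) for col in zip(*second_matrix)]
    return hblocks, lblocks
```

With `first_matrix = [[1, 2], [3, 4]]` and `second_matrix = [[5, 6], [7, 8]]`, the row blocks are `[1, 2]` and `[3, 4]`, and the column blocks are `[5, 7]` and `[6, 8]`.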
S202: pass one row data block and one column data block into each kernel function, obtaining multiple kernel functions; each kernel function corresponds one-to-one with the row data block and column data block passed into it.
In the embodiments of the present invention, as shown in Fig. 4, during the computation an arbitrary row data block and a column data block are passed into each kernel function: for example, Hblock1 and Lblock1 into kernel1, Hblock2 and Lblock1 into kernel2, Hblock3 and Lblock1 into kernel3, and so on. Different combinations of row data blocks and column data blocks are placed into different kernel functions, so that each kernel function performs the multiplication of its own row data block and column data block in parallel with the others.
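The one-pair-per-kernel dispatch of S202 can be emulated on the host with a thread pool (a sketch only — in the patent the pairs go to independent OpenCL kernel instances, not Python threads, and `kernel` here stands for whatever per-pair computation the kernel performs):

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch_pairs(hblocks, lblock, kernel):
    """Give each (row data block, column data block) pair to its own
    independent worker, mirroring one kernel instance per pair.
    Workers share nothing and need no communication."""
    with ThreadPoolExecutor(max_workers=len(hblocks)) as pool:
        futures = [pool.submit(kernel, hb, lblock) for hb in hblocks]
        return [f.result() for f in futures]
```

Because each worker touches only its own pair, the results can be collected in order without any synchronization between workers — the same independence property the patent relies on for scalability.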
S203: vectorize the row data block and the column data block in each kernel function, so as to obtain multiple row-vector sub-matrices and multiple column-vector sub-matrices.
In the embodiments of the present invention, after the row data blocks and column data blocks have been passed into their respective kernel functions, each kernel function can further vectorize its row data block and column data block, dividing them into smaller sub-vectors or sub-matrices. Specifically, this can be realized through the following scheme.
Optionally, vectorizing the row data block and the column data block in each kernel function includes:
using the OpenCL vector (Vector) data types, vectorizing every n floating-point values in the row data block to obtain multiple row-vector sub-matrices, and vectorizing every n floating-point values in the column data block to obtain multiple column-vector sub-matrices, where n is a positive integer.
In the embodiments of the present invention, as shown in Fig. 5, within the computation of each kernel function, the data contained in each row data block and column data block can be further vectorized using the OpenCL Vector data types: each block is divided into multiple sub-matrices, each containing n floating-point values.
In the embodiments of the present invention, the value of n can be chosen according to the computing capability of the current platform: if the platform's computing capability is strong, n can be set smaller; if it is weak, n can be set larger. Optionally, n = 4. For example, every 4 adjacent single-precision floating-point values in the row data block form one row-vector sub-matrix, and correspondingly every 4 adjacent single-precision floating-point values in the column data block form one column-vector sub-matrix, as in the row-vector sub-matrices HVector1, HVector2, ..., HVectorn and the column-vector sub-matrices LVector1, LVector2, ..., LVectorn shown in Fig. 5.
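The splitting into n-wide sub-vectors can be sketched as follows (plain Python standing in for OpenCL's float4/floatn vector types — an illustration, not the patent's kernel code):

```python
def split_into_subvectors(block, n=4):
    """Divide a data block into sub-vectors of n floating-point
    values each, mirroring OpenCL vector types (float4 when n = 4)."""
    return [block[i:i + n] for i in range(0, len(block), n)]
```

An 8-element block with n = 4 yields two sub-vectors of 4 values each — the HVector1, HVector2, ... pieces of Fig. 5.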
S102: perform parallel operations on the multiple sub-matrices obtained after the vectorization.
In the embodiments of the present invention, once the data blocks in each kernel function have been further vectorized, the resulting sub-matrices can be operated on in parallel using asynchronous threads.
Optionally, performing parallel operations on the multiple sub-matrices obtained after the vectorization includes:
in each kernel function, multiplying each row-vector sub-matrix with its corresponding column-vector sub-matrix in parallel.
In the embodiments of the present invention, for example, HVector1 is multiplied with LVector1, HVector2 with LVector2, ..., and HVectorn with LVectorn, and all of these multiplications are carried out in parallel. With n = 4, each thread processes 4 single-precision floating-point operations at a time, so the vector operations further improve operation efficiency. The computing units are independent of one another during the computation and require no communication, which gives the scheme good scalability and balances parallel granularity against operation efficiency.
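Inside one kernel, the vectorized pairing can be sketched as follows (an illustration in plain Python; each n-wide step stands in for one float4 multiply that OpenCL would issue as a single vector operation):

```python
def kernel_vector_dot(hblock, lblock, n=4):
    """Pair HVector_k with LVector_k, multiply component-wise n
    values at a time (n = 4 -> one float4-style operation per step),
    and accumulate the partial products."""
    total = 0.0
    for i in range(0, len(hblock), n):
        hv = hblock[i:i + n]  # row-vector sub-matrix
        lv = lblock[i:i + n]  # column-vector sub-matrix
        total += sum(a * b for a, b in zip(hv, lv))
    return total
```

With n = 4, a length-8 block takes only two vector steps instead of eight scalar multiplications, which is the efficiency gain the patent attributes to vectorization.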
In the embodiments of the present invention, the high-performance computing units operate independently, with no communication and no mutual waiting, realizing fast high-performance computation. The scheme also has good platform portability: owing to the cross-platform nature of OpenCL, it can easily be ported to any heterogeneous high-performance computing platform that supports OpenCL. Compared with traditional serial CPU matrix-vector multiplication, the use of data parallelism and vectorization greatly improves computational efficiency.
To achieve the above object, an embodiment of the present invention also proposes a device 1 for implementing matrix-vector multiplication. It should be noted that any of the method embodiments above applies to this device embodiment, and the details are not repeated here. As shown in Fig. 6, the device may include a processing module 11 and a computing module 12:
the processing module 11 is configured to, under the Open Computing Language (OpenCL) framework, vectorize the first matrix and the second matrix of the multiplication respectively;
the computing module 12 is configured to perform parallel operations on the multiple sub-matrices obtained after the vectorization.
Optionally, the processing module 11 vectorizing the first matrix and the second matrix of the multiplication under the OpenCL framework includes:
taking each row vector of the first matrix as a row data block, and each column vector of the second matrix as a column data block;
passing one row data block and one column data block into each kernel function, thereby obtaining multiple kernel functions, where each kernel function corresponds one-to-one with the row data block and column data block passed into it;
vectorizing the row data block and the column data block in each kernel function, so as to obtain multiple row-vector sub-matrices and multiple column-vector sub-matrices.
Optionally, the processing module 11 vectorizing the row data block and the column data block in each kernel function includes:
using the OpenCL Vector data types, vectorizing every n floating-point values in the row data block to obtain multiple row-vector sub-matrices, and vectorizing every n floating-point values in the column data block to obtain multiple column-vector sub-matrices, where n is a positive integer.
Optionally, the computing module 12 performing parallel operations on the multiple sub-matrices obtained after the vectorization includes:
in each kernel function, multiplying each row-vector sub-matrix with its corresponding column-vector sub-matrix in parallel.
Optionally, n = 4.
The scheme of the embodiments of the present invention includes: under the Open Computing Language (OpenCL) framework, vectorizing the first matrix and the second matrix of the multiplication respectively; and performing parallel operations on the multiple sub-matrices obtained after the vectorization. Through this scheme, the method can be deployed on high-performance computing platforms, makes full use of computer hardware resources, greatly shortens the computation time, and improves operation efficiency.
It should be noted that the embodiments described above are provided only to facilitate understanding by those skilled in the art and do not limit the scope of protection of the embodiments of the present invention. Without departing from the inventive concept of the embodiments, any obvious substitutions, improvements, and the like made by those skilled in the art fall within the scope of protection of the embodiments of the present invention.
Claims (10)
1. A method for implementing matrix-vector multiplication, characterized in that the method comprises:
under the Open Computing Language (OpenCL) framework, vectorizing a first matrix and a second matrix of the multiplication respectively;
performing parallel operations on a plurality of sub-matrices obtained after the vectorization.
2. The method for implementing matrix-vector multiplication of claim 1, characterized in that vectorizing the first matrix and the second matrix of the multiplication under the Open Computing Language (OpenCL) framework comprises:
taking each row vector of the first matrix as a row data block, and each column vector of the second matrix as a column data block;
passing one row data block and one column data block into each kernel function to obtain a plurality of kernel functions, wherein each kernel function corresponds one-to-one with the row data block and the column data block passed into it;
vectorizing the row data block and the column data block in each kernel function to obtain a plurality of row-vector sub-matrices and a plurality of column-vector sub-matrices.
3. The method for implementing matrix-vector multiplication of claim 2, characterized in that vectorizing the row data block and the column data block in each kernel function comprises:
using the OpenCL vector (Vector) data types, vectorizing every n floating-point values in the row data block to obtain a plurality of row-vector sub-matrices, and vectorizing every n floating-point values in the column data block to obtain a plurality of column-vector sub-matrices, wherein n is a positive integer.
4. The method for implementing matrix-vector multiplication of claim 3, characterized in that performing parallel operations on the plurality of sub-matrices obtained after the vectorization comprises:
in each kernel function, multiplying each row-vector sub-matrix with its corresponding column-vector sub-matrix in parallel.
5. The method for implementing matrix-vector multiplication of claim 3, characterized in that n = 4.
6. A device for implementing matrix-vector multiplication, characterized in that the device comprises a processing module and a computing module:
the processing module is configured to, under the Open Computing Language (OpenCL) framework, vectorize a first matrix and a second matrix of the multiplication respectively;
the computing module is configured to perform parallel operations on a plurality of sub-matrices obtained after the vectorization.
7. The device for implementing matrix-vector multiplication of claim 6, characterized in that the processing module vectorizing the first matrix and the second matrix of the multiplication under the Open Computing Language (OpenCL) framework comprises:
taking each row vector of the first matrix as a row data block, and each column vector of the second matrix as a column data block;
passing one row data block and one column data block into each kernel function to obtain a plurality of kernel functions, wherein each kernel function corresponds one-to-one with the row data block and the column data block passed into it;
vectorizing the row data block and the column data block in each kernel function to obtain a plurality of row-vector sub-matrices and a plurality of column-vector sub-matrices.
8. The device for implementing matrix-vector multiplication of claim 7, characterized in that the processing module vectorizing the row data block and the column data block in each kernel function comprises:
using the OpenCL vector (Vector) data types, vectorizing every n floating-point values in the row data block to obtain a plurality of row-vector sub-matrices, and vectorizing every n floating-point values in the column data block to obtain a plurality of column-vector sub-matrices, wherein n is a positive integer.
9. The device for implementing matrix-vector multiplication of claim 8, characterized in that the computing module performing parallel operations on the plurality of sub-matrices obtained after the vectorization comprises:
in each kernel function, multiplying each row-vector sub-matrix with its corresponding column-vector sub-matrix in parallel.
10. The device for implementing matrix-vector multiplication of claim 8, characterized in that n = 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710506697.3A CN107256203A (en) | 2017-06-28 | 2017-06-28 | The implementation method and device of a kind of matrix-vector multiplication |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107256203A true CN107256203A (en) | 2017-10-17 |
Family
ID=60024258
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710506697.3A Pending CN107256203A (en) | 2017-06-28 | 2017-06-28 | The implementation method and device of a kind of matrix-vector multiplication |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107256203A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102411558A (en) * | 2011-10-31 | 2012-04-11 | 中国人民解放军国防科学技术大学 | Vector processor oriented large matrix multiplied vectorization realizing method |
CN103631761A (en) * | 2012-08-29 | 2014-03-12 | 睿励科学仪器(上海)有限公司 | Method for matrix operation and rigorous wave coupling analysis through parallel processing architecture |
CN105426344A (en) * | 2015-11-09 | 2016-03-23 | 南京大学 | Matrix calculation method of distributed large-scale matrix multiplication based on Spark |
US20160140084A1 (en) * | 2014-11-14 | 2016-05-19 | Advanced Micro Devices, Inc. | Efficient sparse matrix-vector multiplication on parallel processors |
- 2017-06-28 (CN): application CN201710506697.3A filed; publication CN107256203A, status Pending
Non-Patent Citations (1)
Title |
---|
刘文志 et al., *OpenCL Heterogeneous Parallel Computing: Principles, Mechanisms and Optimization Practice* (《OpenCL异构并行计算 原理、机制与优化实践》), 机械工业出版社 (China Machine Press), 31 January 2016 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109726357A (en) * | 2017-10-27 | 2019-05-07 | 阿里巴巴集团控股有限公司 | Matrix multiplication calculation method and calculating equipment |
CN109726357B (en) * | 2017-10-27 | 2023-04-07 | 阿里巴巴集团控股有限公司 | Matrix multiplication computing method and computing device |
CN111339490A (en) * | 2020-02-18 | 2020-06-26 | 三星(中国)半导体有限公司 | Matrix multiplication computing method and device |
CN111339490B (en) * | 2020-02-18 | 2024-04-19 | 三星(中国)半导体有限公司 | Matrix multiplication calculation method and device |
CN112632464A (en) * | 2020-12-28 | 2021-04-09 | 上海壁仞智能科技有限公司 | Processing device for processing data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20171017 |