CN107256203A - Implementation method and apparatus for matrix-vector multiplication - Google Patents

Implementation method and apparatus for matrix-vector multiplication

Info

Publication number
CN107256203A
CN107256203A
Authority
CN
China
Prior art keywords
matrix
data block
vector
row
column
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710506697.3A
Other languages
Chinese (zh)
Inventor
谢启凯
吴韶华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710506697.3A priority Critical patent/CN107256203A/en
Publication of CN107256203A publication Critical patent/CN107256203A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 - Complex mathematical operations
    • G06F 17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

Embodiments of the invention disclose a method for implementing matrix-vector multiplication. The method includes: under the Open Computing Language (OpenCL) framework, performing vectorization on a first matrix and a second matrix to be multiplied; and performing parallel operations on the multiple sub-matrices obtained after vectorization. Embodiments of the invention also disclose an apparatus for implementing matrix-vector multiplication. The disclosed scheme can be deployed on high-performance computing platforms, makes full use of computer hardware resources, greatly shortens computation time, and improves operational efficiency.

Description

Implementation method and apparatus for matrix-vector multiplication
Technical field
Embodiments of the present invention relate to the field of high-performance computing, and in particular to an implementation method and apparatus for matrix-vector multiplication.
Background
Human society is experiencing an explosion of data. The volume of information grows continuously, and so do the demands placed on data-processing capability. In fields such as artificial intelligence, weather forecasting, aerospace and defense, finance and economics, oil exploration, and scientific research, the need for high-performance computing grows daily, and high-performance matrix-vector multiplication is one of its important cornerstones. However, current matrix-vector multiplication schemes compute serially on a CPU (Central Processing Unit): the next element is multiplied only after the multiplication of the previous element in the matrix has finished. This takes a long time, is inefficient, and falls far short of today's ever-growing data-processing speed requirements.
Summary of the invention
To solve the above problems, embodiments of the present invention propose an implementation method and apparatus for matrix-vector multiplication that can be deployed on high-performance computing platforms, make full use of computer hardware resources, greatly shorten computation time, and improve operational efficiency.
To achieve the above objective, an embodiment of the present invention proposes an implementation method for matrix-vector multiplication, the method including:
under the Open Computing Language (OpenCL) framework, performing vectorization on a first matrix and a second matrix to be multiplied;
performing parallel operations on the multiple sub-matrices obtained after vectorization.
Optionally, performing vectorization on the first matrix and the second matrix to be multiplied under the OpenCL framework includes:
taking each row vector of the first matrix as a row data block, and taking each column vector of the second matrix as a column data block;
passing one row data block and one column data block into each kernel function, thereby obtaining multiple kernel functions, where each kernel function corresponds one-to-one with the row data block and column data block passed into it;
performing vectorization on the row data block and column data block in each kernel function to obtain multiple row-vector sub-matrices and multiple column-vector sub-matrices.
Optionally, performing vectorization on the row data block and column data block in each kernel function includes:
using the OpenCL Vector data types, vectorizing every n floating-point values in the row data block to obtain multiple row-vector sub-matrices, and vectorizing every n floating-point values in the column data block to obtain multiple column-vector sub-matrices, where n is a positive integer.
Optionally, performing parallel operations on the multiple sub-matrices obtained after vectorization includes:
in each kernel function, multiplying each row-vector sub-matrix with its corresponding column-vector sub-matrix in parallel.
Optionally, n = 4.
To achieve the above objective, an embodiment of the present invention also proposes an apparatus for implementing matrix-vector multiplication, the apparatus including a processing module and a computing module;
the processing module is configured to, under the Open Computing Language (OpenCL) framework, perform vectorization on a first matrix and a second matrix to be multiplied;
the computing module is configured to perform parallel operations on the multiple sub-matrices obtained after vectorization.
Optionally, the processing module performing vectorization on the first matrix and the second matrix to be multiplied under the OpenCL framework includes:
taking each row vector of the first matrix as a row data block, and taking each column vector of the second matrix as a column data block;
passing one row data block and one column data block into each kernel function, thereby obtaining multiple kernel functions, where each kernel function corresponds one-to-one with the row data block and column data block passed into it;
performing vectorization on the row data block and column data block in each kernel function to obtain multiple row-vector sub-matrices and multiple column-vector sub-matrices.
Optionally, the processing module performing vectorization on the row data block and column data block in each kernel function includes:
using the OpenCL Vector data types, vectorizing every n floating-point values in the row data block to obtain multiple row-vector sub-matrices, and vectorizing every n floating-point values in the column data block to obtain multiple column-vector sub-matrices, where n is a positive integer.
Optionally, the computing module performing parallel operations on the multiple sub-matrices obtained after vectorization includes:
in each kernel function, multiplying each row-vector sub-matrix with its corresponding column-vector sub-matrix in parallel.
Optionally, n = 4.
The scheme of the embodiments of the present invention includes: under the Open Computing Language (OpenCL) framework, performing vectorization on a first matrix and a second matrix to be multiplied; and performing parallel operations on the multiple sub-matrices obtained after vectorization. With this scheme, the method can be deployed on high-performance computing platforms, makes full use of computer hardware resources, greatly shortens computation time, and improves operational efficiency.
Brief description of the drawings
The drawings described below are provided to aid understanding of the embodiments of the present invention and, together with the description, serve to explain them; they do not limit the scope of protection of the embodiments.
Fig. 1 is a flowchart of the matrix-vector multiplication implementation method of an embodiment of the present invention;
Fig. 2 is a flowchart of vectorizing the first matrix and the second matrix to be multiplied according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of dividing the first matrix and the second matrix into data blocks according to an embodiment of the present invention;
Fig. 4 is an OpenCL block diagram of the matrix-vector multiplication of an embodiment of the present invention;
Fig. 5 is a schematic diagram of vectorizing a row data block and a column data block within a kernel function according to an embodiment of the present invention;
Fig. 6 is a block diagram of the matrix-vector multiplication apparatus of an embodiment of the present invention.
Detailed description
To facilitate understanding by those skilled in the art, the embodiments of the present invention are described further below with reference to the drawings; this description is not to be used to limit the scope of protection of the embodiments of the present invention.
An embodiment of the present invention proposes an implementation method for matrix-vector multiplication. As shown in Fig. 1, the method may include S101-S102:
S101: under the Open Computing Language (OpenCL) framework, perform vectorization on a first matrix and a second matrix to be multiplied.
In this embodiment, to overcome the slowness and inefficiency of existing matrix multiplication methods, a matrix-vector multiplication implementation based on OpenCL is proposed. OpenCL (Open Computing Language) is an open, cross-platform parallel programming framework for general-purpose computation on heterogeneous systems. Where hardware parallel acceleration is available, it can make full use of computer hardware resources and improve the operational efficiency of matrix-vector multiplication. The computer hardware referred to here is any hardware platform that supports OpenCL; such a platform may, for example, consist of CPUs, GPUs (Graphics Processing Units), or other types of processors.
In this embodiment, the OpenCL-based matrix-vector multiplication method processes the operand matrices in blocks, with the number of blocks equal to the number of rows of the first matrix (and, correspondingly, columns of the second), thereby enabling parallel processing of the data. Specifically, this can be achieved by the following scheme.
Optionally, as shown in Fig. 2, performing vectorization on the first matrix and the second matrix to be multiplied under the OpenCL framework may include S201-S203:
S201: take each row vector of the first matrix as a row data block, and take each column vector of the second matrix as a column data block.
In this embodiment, initialization is first performed under the OpenCL framework: the necessary components such as the device (Device), context (Context), and program (Program) are defined and assigned. The first matrix and the second matrix to be multiplied are then divided into blocks. Specifically, each row vector of the first matrix can be taken as one row data block, and each column vector of the second matrix as one column data block, giving the row data blocks Hblock1, Hblock2, Hblock3, ..., Hblockn and column data blocks Lblock1, Lblock2, Lblock3, ..., Lblockn shown in Fig. 3.
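The block division of S201 can be sketched in a few lines of Python (an illustration only, not the patent's OpenCL host code; the helper names row_blocks and col_blocks are ours):

```python
def row_blocks(matrix):
    """Each row vector of the first matrix becomes one row data block (Hblock_i)."""
    return [list(row) for row in matrix]

def col_blocks(matrix):
    """Each column vector of the second matrix becomes one column data block (Lblock_j)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [[matrix[i][j] for i in range(n_rows)] for j in range(n_cols)]

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]        # first matrix (2 x 3)
B = [[7.0, 8.0],
     [9.0, 10.0],
     [11.0, 12.0]]           # second matrix (3 x 2)

H = row_blocks(A)            # Hblock1, Hblock2
L = col_blocks(B)            # Lblock1, Lblock2
```

Each Hblock_i paired with each Lblock_j then yields one element of the product matrix.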
S202: pass one row data block and one column data block into each kernel function, thereby obtaining multiple kernel functions, where each kernel function corresponds one-to-one with the row data block and column data block passed into it.
In this embodiment, as shown in Fig. 4, during computation an arbitrary row data block and an arbitrary column data block are passed into each kernel function; for example, Hblock1 and Lblock1 are passed into kernel1, Hblock2 and Lblock1 into kernel2, Hblock3 and Lblock1 into kernel3, and so on. Each combination of a row data block and a column data block is placed into a different kernel function, so that in each kernel function the corresponding row data block and column data block can be multiplied in parallel.
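The assignment of S202 simply enumerates every combination of a row data block with a column data block, one combination per kernel function. A short Python sketch of that mapping (our illustration; the enumeration order follows the Hblock1/Lblock1, Hblock2/Lblock1, ... example above):

```python
def kernel_assignments(num_row_blocks, num_col_blocks):
    """One kernel instance per (Hblock, Lblock) combination.

    Consecutive kernels share the same column block, mirroring the order
    kernel1 <- (Hblock1, Lblock1), kernel2 <- (Hblock2, Lblock1), ...
    """
    return [(i, j) for j in range(num_col_blocks) for i in range(num_row_blocks)]

pairs = kernel_assignments(3, 2)
# kernel k multiplies row block pairs[k][0] with column block pairs[k][1]
```

Because every pair goes to its own kernel function, all pairs can be processed concurrently with no data shared between kernels.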
S203: perform vectorization on the row data block and column data block in each kernel function to obtain multiple row-vector sub-matrices and multiple column-vector sub-matrices.
In this embodiment, after the row data blocks and column data blocks have been passed into their kernel functions, each kernel function can further vectorize its row data block and column data block, subdividing them into smaller sub-vectors or sub-matrices. Specifically, this can be achieved by the following scheme.
Optionally, performing vectorization on the row data block and column data block in each kernel function includes:
using the OpenCL Vector data types, vectorizing every n floating-point values in the row data block to obtain multiple row-vector sub-matrices, and vectorizing every n floating-point values in the column data block to obtain multiple column-vector sub-matrices, where n is a positive integer.
In this embodiment, as shown in Fig. 5, within a kernel function's computation the data contained in each row data block and column data block can be further vectorized using the OpenCL Vector data types; each row data block and column data block is divided into multiple sub-matrices, each containing n floating-point values.
In this embodiment, the value of n can be chosen according to the computing capability of the current platform: if the platform's computing capability is strong, n can be set smaller; if it is weak, n can be set larger. Optionally, n = 4. For example, every 4 adjacent single-precision floating-point values in a row data block form one row-vector sub-matrix, and correspondingly every 4 adjacent single-precision floating-point values in a column data block form one column-vector sub-matrix, giving the row-vector sub-matrices HVector1, HVector2, ..., HVectorn and column-vector sub-matrices LVector1, LVector2, ..., LVectorn shown in Fig. 5.
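The n = 4 subdivision corresponds to cutting each data block into groups of four consecutive values, the shape an OpenCL float4 holds. A minimal Python sketch (split4 is a hypothetical helper of ours, not part of the patent):

```python
def split4(block, n=4):
    """Split a data block into sub-vectors of n consecutive floats (HVector_k / LVector_k)."""
    return [block[k:k + n] for k in range(0, len(block), n)]

hblock = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
hvectors = split4(hblock)    # HVector1 = [1..4], HVector2 = [5..8]
```

The same cut applied to a column data block yields the LVector sub-matrices.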
S102: perform parallel operations on the multiple sub-matrices obtained after vectorization.
In this embodiment, once the data blocks in each kernel function have been further vectorized, the vectorized sub-matrices can be operated on in parallel using asynchronous threads.
Optionally, performing parallel operations on the multiple sub-matrices obtained after vectorization includes:
in each kernel function, multiplying each row-vector sub-matrix with its corresponding column-vector sub-matrix in parallel.
In this embodiment, for example, HVector1 is multiplied with LVector1, HVector2 with LVector2, ..., and HVectorn with LVectorn, and these multiplications are carried out in parallel. With n = 4, each thread processes 4 single-precision floating-point values at a time, so vector operations further improve efficiency. The computing units are independent of one another during the computation and require no communication, which also gives the scheme good scalability; it thus balances parallel granularity and operational efficiency.
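The per-kernel arithmetic, multiplying each HVector_k with its LVector_k and accumulating, is an ordinary dot product evaluated four lanes at a time. The following Python sketch shows the arithmetic sequentially; on an OpenCL device each four-wide multiply-add would execute as a single float4 operation (blocked_dot is our illustrative name, not the patent's code):

```python
def blocked_dot(hblock, lblock, n=4):
    """Dot product of a row data block and a column data block, n lanes at a time."""
    assert len(hblock) == len(lblock)
    total = 0.0
    for k in range(0, len(hblock), n):
        hv = hblock[k:k + n]                              # HVector_k
        lv = lblock[k:k + n]                              # LVector_k
        total += sum(a * b for a, b in zip(hv, lv))       # one float4-style multiply-add
    return total

h = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
l = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
result = blocked_dot(h, l)   # matches the plain dot product of h and l
```

The accumulated sum over the sub-vector products is exactly the element of the result that the corresponding kernel function produces.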
In this embodiment, the high-performance computing units operate independently, without communication or mutual waiting, achieving fast high-performance computation. The scheme also has good platform portability: thanks to OpenCL's cross-platform nature, it can easily be ported to any heterogeneous high-performance computing platform that supports OpenCL. Compared with traditional serial CPU matrix-vector multiplication, the use of data parallelism and vectorization greatly improves computational efficiency.
To achieve the above objective, an embodiment of the present invention also proposes an apparatus 1 for implementing matrix-vector multiplication. It should be noted that any of the above method embodiments applies equally to this apparatus embodiment and is not repeated here. As shown in Fig. 6, the apparatus may include a processing module 11 and a computing module 12;
the processing module 11 is configured to, under the Open Computing Language (OpenCL) framework, perform vectorization on a first matrix and a second matrix to be multiplied;
the computing module 12 is configured to perform parallel operations on the multiple sub-matrices obtained after vectorization.
Optionally, the processing module 11 performing vectorization on the first matrix and the second matrix to be multiplied under the OpenCL framework includes:
taking each row vector of the first matrix as a row data block, and taking each column vector of the second matrix as a column data block;
passing one row data block and one column data block into each kernel function, thereby obtaining multiple kernel functions, where each kernel function corresponds one-to-one with the row data block and column data block passed into it;
performing vectorization on the row data block and column data block in each kernel function to obtain multiple row-vector sub-matrices and multiple column-vector sub-matrices.
Optionally, the processing module 11 performing vectorization on the row data block and column data block in each kernel function includes:
using the OpenCL Vector data types, vectorizing every n floating-point values in the row data block to obtain multiple row-vector sub-matrices, and vectorizing every n floating-point values in the column data block to obtain multiple column-vector sub-matrices, where n is a positive integer.
Optionally, the computing module 12 performing parallel operations on the multiple sub-matrices obtained after vectorization includes:
in each kernel function, multiplying each row-vector sub-matrix with its corresponding column-vector sub-matrix in parallel.
Optionally, n = 4.
The scheme of the embodiments of the present invention includes: under the Open Computing Language (OpenCL) framework, performing vectorization on a first matrix and a second matrix to be multiplied; and performing parallel operations on the multiple sub-matrices obtained after vectorization. With this scheme, the method can be deployed on high-performance computing platforms, makes full use of computer hardware resources, greatly shortens computation time, and improves operational efficiency.
It should be noted that the embodiments described above are provided only to facilitate understanding by those skilled in the art and are not intended to limit the scope of protection of the embodiments of the present invention. Without departing from the inventive concept of the embodiments of the present invention, any obvious substitutions and improvements made by those skilled in the art to these embodiments fall within their scope of protection.

Claims (10)

1. An implementation method for matrix-vector multiplication, characterized in that the method comprises:
under the Open Computing Language (OpenCL) framework, performing vectorization on a first matrix and a second matrix to be multiplied;
performing parallel operations on the multiple sub-matrices obtained after vectorization.
2. The implementation method for matrix-vector multiplication according to claim 1, characterized in that performing vectorization on the first matrix and the second matrix to be multiplied under the OpenCL framework comprises:
taking each row vector of the first matrix as a row data block, and taking each column vector of the second matrix as a column data block;
passing one row data block and one column data block into each kernel function, thereby obtaining multiple kernel functions, wherein each kernel function corresponds one-to-one with the row data block and the column data block passed into it;
performing vectorization on the row data block and column data block in each kernel function to obtain multiple row-vector sub-matrices and multiple column-vector sub-matrices.
3. The implementation method for matrix-vector multiplication according to claim 2, characterized in that performing vectorization on the row data block and column data block in each kernel function comprises:
using the OpenCL Vector data types, vectorizing every n floating-point values in the row data block to obtain multiple row-vector sub-matrices, and vectorizing every n floating-point values in the column data block to obtain multiple column-vector sub-matrices, wherein n is a positive integer.
4. The implementation method for matrix-vector multiplication according to claim 3, characterized in that performing parallel operations on the multiple sub-matrices obtained after vectorization comprises:
in each kernel function, multiplying each row-vector sub-matrix with its corresponding column-vector sub-matrix in parallel.
5. The implementation method for matrix-vector multiplication according to claim 3, characterized in that n = 4.
6. An apparatus for implementing matrix-vector multiplication, characterized in that the apparatus comprises a processing module and a computing module;
the processing module is configured to, under the Open Computing Language (OpenCL) framework, perform vectorization on a first matrix and a second matrix to be multiplied;
the computing module is configured to perform parallel operations on the multiple sub-matrices obtained after vectorization.
7. The apparatus for implementing matrix-vector multiplication according to claim 6, characterized in that the processing module performing vectorization on the first matrix and the second matrix to be multiplied under the OpenCL framework comprises:
taking each row vector of the first matrix as a row data block, and taking each column vector of the second matrix as a column data block;
passing one row data block and one column data block into each kernel function, thereby obtaining multiple kernel functions, wherein each kernel function corresponds one-to-one with the row data block and the column data block passed into it;
performing vectorization on the row data block and column data block in each kernel function to obtain multiple row-vector sub-matrices and multiple column-vector sub-matrices.
8. The apparatus for implementing matrix-vector multiplication according to claim 7, characterized in that the processing module performing vectorization on the row data block and column data block in each kernel function comprises:
using the OpenCL Vector data types, vectorizing every n floating-point values in the row data block to obtain multiple row-vector sub-matrices, and vectorizing every n floating-point values in the column data block to obtain multiple column-vector sub-matrices, wherein n is a positive integer.
9. The apparatus for implementing matrix-vector multiplication according to claim 8, characterized in that the computing module performing parallel operations on the multiple sub-matrices obtained after vectorization comprises:
in each kernel function, multiplying each row-vector sub-matrix with its corresponding column-vector sub-matrix in parallel.
10. The apparatus for implementing matrix-vector multiplication according to claim 8, characterized in that n = 4.
CN201710506697.3A 2017-06-28 2017-06-28 Implementation method and apparatus for matrix-vector multiplication Pending CN107256203A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710506697.3A CN107256203A (en) 2017-06-28 2017-06-28 Implementation method and apparatus for matrix-vector multiplication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710506697.3A CN107256203A (en) 2017-06-28 2017-06-28 Implementation method and apparatus for matrix-vector multiplication

Publications (1)

Publication Number Publication Date
CN107256203A true CN107256203A (en) 2017-10-17

Family

ID=60024258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710506697.3A Pending CN107256203A (en) Implementation method and apparatus for matrix-vector multiplication

Country Status (1)

Country Link
CN (1) CN107256203A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726357A (en) * 2017-10-27 2019-05-07 阿里巴巴集团控股有限公司 Matrix multiplication calculation method and calculating equipment
CN111339490A (en) * 2020-02-18 2020-06-26 三星(中国)半导体有限公司 Matrix multiplication computing method and device
CN112632464A (en) * 2020-12-28 2021-04-09 上海壁仞智能科技有限公司 Processing device for processing data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411558A (en) * 2011-10-31 2012-04-11 中国人民解放军国防科学技术大学 Vector processor oriented large matrix multiplied vectorization realizing method
CN103631761A (en) * 2012-08-29 2014-03-12 睿励科学仪器(上海)有限公司 Method for matrix operation and rigorous wave coupling analysis through parallel processing architecture
CN105426344A (en) * 2015-11-09 2016-03-23 南京大学 Matrix calculation method of distributed large-scale matrix multiplication based on Spark
US20160140084A1 (en) * 2014-11-14 2016-05-19 Advanced Micro Devices, Inc. Efficient sparse matrix-vector multiplication on parallel processors

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411558A (en) * 2011-10-31 2012-04-11 中国人民解放军国防科学技术大学 Vector processor oriented large matrix multiplied vectorization realizing method
CN103631761A (en) * 2012-08-29 2014-03-12 睿励科学仪器(上海)有限公司 Method for matrix operation and rigorous wave coupling analysis through parallel processing architecture
US20160140084A1 (en) * 2014-11-14 2016-05-19 Advanced Micro Devices, Inc. Efficient sparse matrix-vector multiplication on parallel processors
CN105426344A (en) * 2015-11-09 2016-03-23 南京大学 Matrix calculation method of distributed large-scale matrix multiplication based on Spark

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘文志 et al.: "《OpenCL异构并行计算 原理、机制与优化实践》" (OpenCL Heterogeneous Parallel Computing: Principles, Mechanisms, and Optimization Practice), 31 January 2016, 机械工业出版社 (China Machine Press) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726357A (en) * 2017-10-27 2019-05-07 阿里巴巴集团控股有限公司 Matrix multiplication calculation method and calculating equipment
CN109726357B (en) * 2017-10-27 2023-04-07 阿里巴巴集团控股有限公司 Matrix multiplication computing method and computing device
CN111339490A (en) * 2020-02-18 2020-06-26 三星(中国)半导体有限公司 Matrix multiplication computing method and device
CN111339490B (en) * 2020-02-18 2024-04-19 三星(中国)半导体有限公司 Matrix multiplication calculation method and device
CN112632464A (en) * 2020-12-28 2021-04-09 上海壁仞智能科技有限公司 Processing device for processing data

Similar Documents

Publication Publication Date Title
CN110245751B (en) GEMM operation method and device
EP3179415B1 (en) Systems and methods for a multi-core optimized recurrent neural network
CN107145939A (en) A kind of Neural network optimization and device
CN103970720B (en) Based on extensive coarseness imbedded reconfigurable system and its processing method
CN106407158A (en) GPU accelerated method for performing batch processing of isomorphic sparse matrixes multiplied by full vectors
CN105373517A (en) Spark-based distributed matrix inversion parallel operation method
CN107256203A (en) The implementation method and device of a kind of matrix-vector multiplication
US20200117988A1 (en) Networks for distributing parameters and data to neural network compute cores
CN115880132B (en) Graphics processor, matrix multiplication task processing method, device and storage medium
CN107590106A (en) A kind of computational methods for being applied to symmetrical matrix and vector multiplication
CN109284824A (en) A kind of device for being used to accelerate the operation of convolution sum pond based on Reconfiguration Technologies
CN110069444A (en) A kind of computing unit, array, module, hardware system and implementation method
CN108595379A (en) A kind of parallelization convolution algorithm method and system based on multi-level buffer
CN108197075B (en) Multi-core implementation method of Inceptation structure
CN104484234A (en) Multi-front load flow calculation method and system based on GPU (graphics processing unit)
CN109472734A (en) A kind of target detection network and its implementation based on FPGA
CN114201287A (en) Method for cooperatively processing data based on CPU + GPU heterogeneous platform
CN209708122U (en) A kind of computing unit, array, module, hardware system
Fuketa et al. Image-classifier deep convolutional neural network training by 9-bit dedicated hardware to realize validation accuracy and energy efficiency superior to the half precision floating point format
CN104572588B (en) Matrix inversion process method and apparatus
Wang et al. Accelerating ap3m-based computational astrophysics simulations with reconfigurable clusters
CN109615061A (en) A kind of convolution algorithm method and device
CN107220702B (en) Computer vision processing method and device of low-computing-capacity processing equipment
Yu et al. GPU-based JFNG method for power system transient dynamic simulation
Alias et al. Parallel performance comparison of alternating group explicit method between parallel virtual machine and matlab distributed computing for solving large sparse partial differential equations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171017