CN107506173A - Acceleration method, device and system for singular value decomposition operations - Google Patents

Info

Publication number
CN107506173A
Authority
CN
China
Prior art keywords
singular value
value decomposition
computing
pending
operational order
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710765950.7A
Other languages
Chinese (zh)
Inventor
李磊
王洪伟
李雪雷
丁良奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710765950.7A priority Critical patent/CN107506173A/en
Publication of CN107506173A publication Critical patent/CN107506173A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Power Sources (AREA)

Abstract

An embodiment of the invention discloses an acceleration method, device and system for singular value decomposition operations. The method includes: receiving singular value decomposition data to be processed and a dot-product operation instruction sent by a processor; invoking, according to the dot-product operation instruction, a dot-product operation algorithm implemented by an FPGA hardware circuit, and processing the data with that algorithm to obtain a first operation result; and returning the first operation result to the processor. The embodiment implements the dot-product operation of the singular value decomposition on an FPGA parallel computing platform: after receiving the data and the dot-product operation instruction from the processor, it invokes the dot-product algorithm implemented by the FPGA hardware circuit to perform the corresponding calculation. While achieving parallel operation and maintaining computational efficiency, the embodiment reduces power consumption during the calculation and thereby reduces the computing cost.

Description

Acceleration method, device and system for singular value decomposition operations
Technical Field
The embodiments of the present invention relate to the field of singular value decomposition, and in particular to an acceleration method, device and system for singular value decomposition operations.
Background
Singular value decomposition is an important matrix factorization in linear algebra; in matrix analysis it generalizes the unitary diagonalization of normal matrices. It has important applications in many fields, such as signal processing and statistics. With the growth of big data, the massive data operations involved in a singular value decomposition can take a very long time and occupy substantial resources. The core operations of a singular value decomposition are vector accumulation and dot products, both of which are inherently parallel. To exploit this parallelism and reduce computation time, the prior art typically performs the singular value decomposition on a GPU. Although a GPU can carry out the computation in parallel, its high power consumption makes the computing cost high.
Therefore, how to provide an acceleration method, device and system for singular value decomposition operations that solve the above technical problem is an issue that those skilled in the art currently need to address.
Summary of the Invention
The purpose of the embodiments of the present invention is to provide an acceleration method, device and system for singular value decomposition operations that reduce power consumption during computation, and thereby the computing cost, while achieving parallel operation and maintaining computational efficiency.
To solve the above technical problem, an embodiment of the present invention provides an acceleration method for singular value decomposition operations, including:
receiving singular value decomposition data to be processed and an operation instruction sent by a processor;
invoking, according to the operation instruction, the corresponding operation algorithm implemented by an FPGA hardware circuit, and processing the singular value decomposition data to be processed according to the operation algorithm to obtain an operation result;
returning the operation result to the processor;
wherein the operation instruction includes a dot-product operation instruction, the operation algorithm includes a dot-product operation algorithm, and the operation result includes a first operation result.
Optionally, the operation instruction further includes a vector accumulation operation instruction, the operation algorithm further includes a vector accumulation operation algorithm, and the operation result further includes a second operation result.
Optionally, the process of receiving the singular value decomposition data to be processed and the operation instruction sent by the processor is specifically:
receiving the singular value decomposition data to be processed and the operation instruction sent by the processor;
caching the singular value decomposition data to be processed.
An embodiment of the present invention further provides an acceleration device for singular value decomposition operations, including:
a receiving module, configured to receive singular value decomposition data to be processed and an operation instruction sent by a processor;
a computing module, configured to invoke, according to the operation instruction, the corresponding operation algorithm implemented by an FPGA hardware circuit, and to process the singular value decomposition data to be processed according to the operation algorithm to obtain an operation result;
a return module, configured to return the operation result to the processor;
wherein the operation instruction includes a dot-product operation instruction, the computing module includes a dot-product operation unit, and the operation result includes a first operation result.
Optionally, the operation instruction further includes a vector accumulation operation instruction, the computing module further includes a vector accumulation operation unit, and the operation result further includes a second operation result.
Optionally, the receiving module includes:
a receiving unit, configured to receive the singular value decomposition data to be processed and the operation instruction sent by the processor;
a cache unit, configured to cache the singular value decomposition data to be processed.
Optionally, the cache unit includes DDR3 memory.
An embodiment of the present invention further provides an acceleration system for singular value decomposition operations, including a processor and the acceleration device for singular value decomposition operations described above.
The embodiments of the present invention provide an acceleration method, device and system for singular value decomposition operations, which include: receiving singular value decomposition data to be processed and a dot-product operation instruction sent by a processor; invoking, according to the dot-product operation instruction, the dot-product operation algorithm implemented by an FPGA hardware circuit, and processing the data according to that algorithm to obtain a first operation result; and returning the first operation result to the processor.
The embodiments of the present invention implement the dot-product operation of the singular value decomposition on an FPGA parallel computing platform. After the data to be processed and the dot-product operation instruction are received from the processor, the dot-product algorithm implemented by the FPGA hardware circuit is invoked to perform the corresponding calculation. While achieving parallel operation and maintaining computational efficiency, the embodiments reduce power consumption during the calculation and thereby reduce the computing cost.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an acceleration method for singular value decomposition operations provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an acceleration device for singular value decomposition operations provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of another acceleration device for singular value decomposition operations provided by an embodiment of the present invention.
Detailed Description
The embodiments of the present invention provide an acceleration method, device and system for singular value decomposition operations that reduce power consumption during computation, and thereby the computing cost, while achieving parallel operation and maintaining computational efficiency.
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an acceleration method for singular value decomposition operations provided by an embodiment of the present invention.
The method includes:
S11: receiving singular value decomposition data to be processed and an operation instruction sent by a processor;
S12: invoking, according to the operation instruction, the corresponding operation algorithm implemented by an FPGA hardware circuit, and processing the singular value decomposition data to be processed according to the operation algorithm to obtain an operation result;
S13: returning the operation result to the processor;
wherein the operation instruction includes a dot-product operation instruction, the operation algorithm includes a dot-product operation algorithm, and the operation result includes a first operation result.
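The control flow of steps S11 to S13 can be sketched in software as a lookup from instruction to algorithm. This is only an illustrative host-side model, not the patented circuit: the instruction name `DOT_PRODUCT`, the table `ALGORITHMS`, and the function `accelerate` are hypothetical names chosen here, and the dot product is computed in plain Python rather than in FPGA hardware.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical instruction name; the patent does not define an encoding.
DOT_PRODUCT = "dot_product"


def dot_product(data: Tuple[Sequence[float], Sequence[float]]) -> float:
    """Software stand-in for the dot-product circuit on the FPGA."""
    a, b = data
    return sum(x * y for x, y in zip(a, b))


# Maps each operation instruction to the algorithm it selects (used in S12).
ALGORITHMS: Dict[str, Callable] = {DOT_PRODUCT: dot_product}


def accelerate(data, instruction: str):
    # S11: the data and the instruction arrive from the processor (the arguments).
    algorithm = ALGORITHMS[instruction]  # S12: select the matching algorithm
    result = algorithm(data)             # S12: process the data
    return result                        # S13: hand the result back to the caller


first_result = accelerate(([1, 2, 3], [4, 5, 6]), DOT_PRODUCT)
```

A table lookup keeps the dispatch step separate from the algorithms themselves, so a further instruction could be supported by adding one entry.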
It should be noted that the embodiments of the present invention implement the singular value decomposition operation on an FPGA parallel computing platform, which can accelerate the computation to some extent and improve computing speed.
Specifically, the relevant operations of the singular value decomposition (such as the dot-product operation) are ported in advance, using OpenCL (Open Computing Language), to an FPGA (Field-Programmable Gate Array) board, that is, to the FPGA kernel side. The data to be processed can then be computed by invoking the FPGA hardware circuit that implements the corresponding operation algorithm. During the computation, parallelism can be achieved with the #pragma unroll x directive, where x is the unroll factor, whose value can be chosen according to the PCIe bandwidth.
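The effect of an unroll factor x can be illustrated with a small software model. Python cannot express hardware unrolling, so the sketch below only mirrors the loop structure: each iteration of the main loop handles x element pairs, which on an FPGA would correspond to x multipliers working side by side. The function name `dot_unrolled` and the default x = 4 are assumptions made for illustration.

```python
def dot_unrolled(a, b, x=4):
    """Dot product that handles x element pairs per main-loop iteration."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if x < 1:
        raise ValueError("unroll factor must be at least 1")
    n = len(a)
    main = n - n % x  # largest multiple of x that fits in n
    total = 0.0
    for i in range(0, main, x):
        # In hardware, these x products would be computed in parallel.
        partial = [a[i + k] * b[i + k] for k in range(x)]
        total += sum(partial)
    for i in range(main, n):
        # Remainder loop for lengths that are not a multiple of x.
        total += a[i] * b[i]
    return total
```

A larger x trades more hardware resources for fewer loop iterations, which is presumably why the description ties the choice of x to the available PCIe bandwidth.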
Specifically, the dot-product operation of the singular value decomposition can be ported in advance to the FPGA board using OpenCL, where the dot-product operation satisfies the following relations:
X1 = m1*a1 + m2*a2 + ... + mN*aN; X2 = m1*b1 + m2*b2 + ... + mN*bN; X3, X4, and so on follow by analogy.
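The relations above amount to taking the dot product of one vector m with each of several vectors a, b, and so on. A minimal sketch, assuming m and every column have the same length N (the function name `dot_products` is hypothetical):

```python
def dot_products(m, columns):
    """Return [X1, X2, ...], where Xj is the dot product of m with column j."""
    return [sum(mi * ci for mi, ci in zip(m, col)) for col in columns]


m = [1, 2, 3]
a = [1, 0, 1]  # X1 = 1*1 + 2*0 + 3*1
b = [2, 2, 2]  # X2 = 1*2 + 2*2 + 3*2
xs = dot_products(m, [a, b])
```

Each Xj depends only on m and its own column, so the entries can be computed independently of one another, which is the property the FPGA implementation exploits.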
It should also be noted that the first operation result is the result produced by the corresponding operation algorithm. For example, when the operation instruction is a dot-product operation instruction, the dot-product operation algorithm implemented by the FPGA hardware circuit is invoked according to that instruction, and the singular value decomposition data to be processed is computed with this algorithm to obtain the first operation result.
Further, the operation instruction also includes a vector accumulation operation instruction, the operation algorithm also includes a vector accumulation operation algorithm, and the operation result also includes a second operation result.
Specifically, the vector accumulation operation algorithm of the singular value decomposition can also be ported in advance to the FPGA board using OpenCL, where the vector accumulation operation satisfies the following relations:
u1 = X1*a1 + X1*b1 + ... + X1*size1; u2 = X2*a2 + X2*b2 + ... + X2*size2; u3, u4, and so on follow by analogy.
It should be noted that the received operation instruction may also include a vector accumulation operation instruction. In that case, the vector accumulation operation algorithm implemented by the FPGA hardware circuit is invoked according to that instruction, and the singular value decomposition data to be processed is computed with this algorithm to obtain the corresponding second operation result.
In addition, when both the dot-product operation and the vector accumulation operation need to be performed on the singular value decomposition data to be processed, the two operations can be organized as a pipeline.
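One way to picture the pipelined arrangement is with two chained generators, where the accumulation stage consumes each dot-product result as soon as it is produced instead of waiting for the whole batch. This is a software analogy only. The accumulation stage follows the literal form of the relation given in the description (each result multiplied by every coefficient in its own list, then summed), which is an interpretation, and all function names are hypothetical.

```python
def dot_stage(m, columns):
    """Stage 1: yield one dot-product result per column."""
    for col in columns:
        yield sum(mi * ci for mi, ci in zip(m, col))


def accumulate_stage(xs, weights):
    """Stage 2: for each incoming X_k, yield u_k = sum of X_k * w over weights[k]."""
    for x_k, w_k in zip(xs, weights):
        yield sum(x_k * w for w in w_k)


def pipeline(m, columns, weights):
    # Stage 2 pulls values from stage 1 one at a time, so neither stage
    # needs the full intermediate list before it can start working.
    return list(accumulate_stage(dot_stage(m, columns), weights))


u = pipeline([1, 2], [[1, 1], [2, 0]], [[1, 2], [3]])
```

In hardware the two stages would run concurrently on different data items; the generators reproduce only the data dependency, not the timing.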
Specifically, the process in S11 above of receiving the singular value decomposition data to be processed and the operation instruction sent by the processor is:
receiving the singular value decomposition data to be processed and the operation instruction sent by the processor;
caching the singular value decomposition data to be processed.
It should be noted that when the volume of singular value decomposition data to be processed is large, the data can first be cached, and the calculation is then performed on the cached data.
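The caching step can be modeled as a buffer that stores incoming data in blocks and is drained by the computation. In this sketch the class `InputBuffer`, the function `buffered_dot`, and the block size are all assumptions made for illustration; on the board, the buffer would be on-board memory rather than a Python deque.

```python
from collections import deque


class InputBuffer:
    """Software stand-in for the cache that holds data awaiting processing."""

    def __init__(self):
        self._blocks = deque()

    def store(self, block):
        self._blocks.append(block)

    def drain(self):
        # Hand out cached blocks in arrival order until the buffer is empty.
        while self._blocks:
            yield self._blocks.popleft()


def buffered_dot(a, b, block_size=2):
    """Cache the input in fixed-size blocks, then compute on the cached data."""
    buf = InputBuffer()
    for i in range(0, len(a), block_size):
        buf.store((a[i:i + block_size], b[i:i + block_size]))
    return sum(x * y for xa, xb in buf.drain() for x, y in zip(xa, xb))
```

Splitting the input into blocks lets the compute stage work on data that is already resident while later blocks are still being received.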
The embodiments of the present invention provide an acceleration method for singular value decomposition operations, which includes: receiving singular value decomposition data to be processed and a dot-product operation instruction sent by a processor; invoking, according to the dot-product operation instruction, the dot-product operation algorithm implemented by an FPGA hardware circuit, and processing the data according to that algorithm to obtain a first operation result; and returning the first operation result to the processor.
The embodiments of the present invention implement the dot-product operation of the singular value decomposition on an FPGA parallel computing platform. After the data to be processed and the dot-product operation instruction are received from the processor, the dot-product algorithm implemented by the FPGA hardware circuit is invoked to perform the corresponding calculation. While achieving parallel operation and maintaining computational efficiency, the embodiments reduce power consumption during the calculation and thereby reduce the computing cost.
Correspondingly, an embodiment of the present invention provides an acceleration device for singular value decomposition operations, as shown in Fig. 2, which is a schematic structural diagram of such a device. On the basis of the above embodiment:
The device includes:
receiving module 1, configured to receive singular value decomposition data to be processed and an operation instruction sent by a processor;
computing module 2, configured to invoke, according to the operation instruction, the corresponding operation algorithm implemented by an FPGA hardware circuit, and to process the singular value decomposition data to be processed according to the operation algorithm to obtain an operation result;
return module 3, configured to return the operation result to the processor;
wherein the operation instruction includes a dot-product operation instruction, computing module 2 includes a dot-product operation unit, and the operation result includes a first operation result.
It should be noted that the dot-product operation unit in the computing module is specifically a first FPGA hardware circuit that implements the dot-product operation algorithm.
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of another acceleration device for singular value decomposition operations provided by an embodiment of the present invention.
On the basis of the above embodiment, the operation instruction further includes a vector accumulation operation instruction, computing module 2 further includes a vector accumulation operation unit, and the operation result further includes a second operation result.
It should be noted that the vector accumulation operation unit in the computing module is specifically a second FPGA hardware circuit that implements the vector accumulation operation algorithm.
Optionally, receiving module 1 includes:
a receiving unit, configured to receive the singular value decomposition data to be processed and the operation instruction sent by the processor;
a cache unit, configured to cache the singular value decomposition data to be processed.
Specifically, the cache unit may include DDR3 memory; it may of course include other types of memory, which the embodiments of the present invention do not limit.
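The module structure described above (receiving module with a cache unit, computing module with two operation units, return module) can be mirrored by a small set of classes. This is a structural sketch only: the class and method names, the instruction strings, and the data layout are hypothetical, and the vector accumulation unit follows the literal form of the relation in the description.

```python
class ReceivingModule:
    """Receiving unit plus cache unit."""

    def __init__(self):
        self.cache = []  # stand-in for the cache unit

    def receive(self, data, instruction):
        self.cache.append(data)  # cache the data to be processed
        return instruction


class ComputeModule:
    """Holds one operation unit per supported instruction."""

    def __init__(self):
        self.units = {
            "dot_product": self._dot,
            "vector_accumulate": self._accumulate,
        }

    @staticmethod
    def _dot(data):
        a, b = data
        return sum(x * y for x, y in zip(a, b))

    @staticmethod
    def _accumulate(data):
        x, weights = data
        return sum(x * w for w in weights)

    def run(self, instruction, data):
        return self.units[instruction](data)


class ReturnModule:
    def send(self, result):
        return result  # stand-in for the transfer back to the processor


class Accelerator:
    def __init__(self):
        self.receiving = ReceivingModule()
        self.compute = ComputeModule()
        self.returning = ReturnModule()

    def handle(self, data, instruction):
        op = self.receiving.receive(data, instruction)
        cached = self.receiving.cache.pop(0)  # compute on the cached data
        return self.returning.send(self.compute.run(op, cached))
```

For example, `Accelerator().handle(([1, 2], [3, 4]), "dot_product")` routes the request through all three modules and returns the first operation result.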
It should be noted that the embodiments of the present invention implement the dot-product operation of the singular value decomposition on an FPGA parallel computing platform. After the data to be processed and the dot-product operation instruction are received from the processor, the dot-product algorithm implemented by the FPGA hardware circuit is invoked to perform the corresponding calculation. While achieving parallel operation and maintaining computational efficiency, the embodiments reduce power consumption during the calculation and thereby reduce the computing cost.
In addition, for details of the acceleration method for singular value decomposition operations involved in this embodiment, refer to the method embodiment above; they are not repeated here.
An embodiment of the present invention further provides an acceleration system for singular value decomposition operations, including a processor and the acceleration device for singular value decomposition operations described above.
It should be noted that the embodiments of the present invention implement the dot-product operation of the singular value decomposition on an FPGA parallel computing platform. After the data to be processed and the dot-product operation instruction are received from the processor, the dot-product algorithm implemented by the FPGA hardware circuit is invoked to perform the corresponding calculation. While achieving parallel operation and maintaining computational efficiency, the embodiments reduce power consumption during the calculation and thereby reduce the computing cost. In addition, for details of the acceleration method for singular value decomposition operations involved in this embodiment, refer to the method embodiment above; they are not repeated here.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the parts that are the same or similar can be cross-referenced between embodiments. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to be non-exclusive, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes it.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module can reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

  1. An acceleration method for singular value decomposition operations, characterized by including:
    receiving singular value decomposition data to be processed and an operation instruction sent by a processor;
    invoking, according to the operation instruction, the corresponding operation algorithm implemented by an FPGA hardware circuit, and processing the singular value decomposition data to be processed according to the operation algorithm to obtain an operation result;
    returning the operation result to the processor;
    wherein the operation instruction includes a dot-product operation instruction, the operation algorithm includes a dot-product operation algorithm, and the operation result includes a first operation result.
  2. The acceleration method for singular value decomposition operations according to claim 1, characterized in that the operation instruction further includes a vector accumulation operation instruction, the operation algorithm further includes a vector accumulation operation algorithm, and the operation result further includes a second operation result.
  3. The acceleration method for singular value decomposition operations according to claim 2, characterized in that the process of receiving the singular value decomposition data to be processed and the operation instruction sent by the processor is specifically:
    receiving the singular value decomposition data to be processed and the operation instruction sent by the processor;
    caching the singular value decomposition data to be processed.
  4. An acceleration device for singular value decomposition operations, characterized by including:
    a receiving module, configured to receive singular value decomposition data to be processed and an operation instruction sent by a processor;
    a computing module, configured to invoke, according to the operation instruction, the corresponding operation algorithm implemented by an FPGA hardware circuit, and to process the singular value decomposition data to be processed according to the operation algorithm to obtain an operation result;
    a return module, configured to return the operation result to the processor;
    wherein the operation instruction includes a dot-product operation instruction, the computing module includes a dot-product operation unit, and the operation result includes a first operation result.
  5. The acceleration device for singular value decomposition operations according to claim 4, characterized in that the operation instruction further includes a vector accumulation operation instruction, the computing module further includes a vector accumulation operation unit, and the operation result further includes a second operation result.
  6. The acceleration device for singular value decomposition operations according to claim 5, characterized in that the receiving module includes:
    a receiving unit, configured to receive the singular value decomposition data to be processed and the operation instruction sent by the processor;
    a cache unit, configured to cache the singular value decomposition data to be processed.
  7. The acceleration device for singular value decomposition operations according to claim 6, characterized in that the cache unit includes DDR3 memory.
  8. An acceleration system for singular value decomposition operations, characterized by including a processor and the acceleration device for singular value decomposition operations according to any one of claims 4-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710765950.7A CN107506173A (en) 2017-08-30 2017-08-30 Acceleration method, device and system for singular value decomposition operations

Publications (1)

Publication Number Publication Date
CN107506173A true CN107506173A (en) 2017-12-22

Family

ID=60694435

Country Status (1)

Country Link
CN (1) CN107506173A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109976809A (en) * 2017-12-28 2019-07-05 北京中科寒武纪科技有限公司 Dispatching method and relevant apparatus
CN111722834A (en) * 2020-07-23 2020-09-29 哈尔滨工业大学 Robot-oriented EKF-SLAM algorithm acceleration method
CN111722834B (en) * 2020-07-23 2021-12-28 哈尔滨工业大学 Robot-oriented EKF-SLAM algorithm acceleration method
CN112596701A (en) * 2021-03-05 2021-04-02 之江实验室 FPGA acceleration realization method based on unilateral Jacobian singular value decomposition
CN112596701B (en) * 2021-03-05 2021-06-01 之江实验室 FPGA acceleration realization method based on unilateral Jacobian singular value decomposition
CN116382617A (en) * 2023-06-07 2023-07-04 之江实验室 Singular value decomposition accelerator with parallel ordering function based on FPGA
CN116382617B (en) * 2023-06-07 2023-08-29 之江实验室 Singular value decomposition accelerator with parallel ordering function based on FPGA

Similar Documents

Publication Publication Date Title
CN107506173A (en) A kind of accelerated method, the apparatus and system of singular value decomposition computing
US12014272B2 (en) Vector computation unit in a neural network processor
CN103970718B (en) Device and method for implementing fast Fourier transform
GB2583594A (en) Performing kernel striding in hardware
CN109284817A (en) Depthwise separable convolutional neural network processing architecture/method/system and medium
GB2600031A (en) Batch processing in a neural network processor
CN106776466A (en) FPGA heterogeneous accelerated computing apparatus and system
CN103984560B (en) Large-scale coarse-grained embedded reconfigurable system and processing method thereof
CN110210610A (en) Convolutional calculation accelerator, convolutional calculation method and convolutional calculation equipment
CN104699624B (en) Conflict-free storage access method for FFT parallel computation
CN109409511A (en) Convolution operation data stream scheduling method for dynamic reconfigurable array
CN105808795A (en) FPGA chip global placement optimization method based on temporal constraint
KR20210074992A (en) Accelerating 2d convolutional layer mapping on a dot product architecture
CN104268122A (en) Point-changeable floating point FFT (fast Fourier transform) processor
CN103365731B (en) Method and system for reducing processor soft error rate
CN109993293A (en) Deep learning accelerator suitable for stacked hourglass networks
CN105359142B (en) Hash join method and device
CN107402905A (en) Computing method and device based on neural network
CN103544111B (en) Mixed-radix FFT method based on real-time processing
CN102567283A (en) Method for small matrix inversion by using GPU (graphic processing unit)
CN109740115A (en) Method, device and equipment for implementing matrix multiplication
CN103294646B (en) Digital signal processing method and digital signal processor
CN110377877A (en) Data processing method, device, equipment and storage medium
CN110163793A (en) Convolutional calculation acceleration method and device
CN109308327A (en) Graph computing method, apparatus, medium and device based on a subgraph model compatible with the vertex-centric model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171222