CN107506173A - Acceleration method, apparatus and system for singular value decomposition computation - Google Patents
Acceleration method, apparatus and system for singular value decomposition computation
- Publication number
- CN107506173A (application CN201710765950.7A)
- Authority
- CN
- China
- Prior art keywords
- singular value
- value decomposition
- computing
- pending
- operational order
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/57—Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Physics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Power Sources (AREA)
Abstract
An embodiment of the invention discloses an acceleration method, apparatus and system for singular value decomposition (SVD) computation. The method comprises: receiving to-be-processed SVD data and a dot-product operation instruction sent by a processor; calling, according to the dot-product operation instruction, a dot-product algorithm implemented by an FPGA hardware circuit, and processing the to-be-processed SVD data according to that algorithm to obtain a first operation result; and returning the first operation result to the processor. The embodiment implements the dot-product operations of SVD on an FPGA parallel computing platform: after receiving the to-be-processed SVD data and the dot-product operation instruction from the processor, it calls the dot-product algorithm implemented by the FPGA hardware circuit to perform the corresponding calculation. The embodiment thereby achieves parallel operation and maintains computational efficiency while reducing power consumption during computation, which lowers the computing cost.
Description
Technical field
The embodiments of the present invention relate to the field of singular value decomposition, and in particular to an acceleration method, apparatus and system for singular value decomposition computation.
Background art
Singular value decomposition is an important matrix factorization in linear algebra; it generalizes the unitary diagonalization of normal matrices in matrix analysis. It has important applications in many fields, such as signal processing and statistics. With the development of big data, the massive data operations involved in singular value decomposition computation can take a very long time and occupy a large amount of resources. The core operations in singular value decomposition computation are vector accumulation and dot products, and these operations are inherently parallel. To exploit this parallelism and reduce computation time, the prior art generally performs singular value decomposition computation on a GPU. Although running on a GPU enables parallel computation, the GPU's relatively high power consumption makes the computing cost high.
Therefore, how to provide an acceleration method, apparatus and system for singular value decomposition computation that solve the above technical problem is a problem that those skilled in the art currently need to solve.
Summary of the invention
The purpose of the embodiments of the present invention is to provide an acceleration method, apparatus and system for singular value decomposition computation that achieve parallel operation and maintain computational efficiency while reducing power consumption during computation, thereby lowering the computing cost.
To solve the above technical problem, an embodiment of the present invention provides an acceleration method for singular value decomposition computation, comprising:
receiving to-be-processed singular value decomposition data and an operation instruction sent by a processor;
calling, according to the operation instruction, the corresponding operation algorithm implemented by an FPGA hardware circuit, and processing the to-be-processed singular value decomposition data according to the operation algorithm to obtain an operation result;
returning the operation result to the processor;
wherein the operation instruction includes a dot-product operation instruction, the operation algorithm includes a dot-product algorithm, and the operation result includes a first operation result.
Optionally, the operation instruction further includes a vector accumulation operation instruction, the operation algorithm further includes a vector accumulation algorithm, and the operation result further includes a second operation result.
Optionally, the process of receiving the to-be-processed singular value decomposition data and the operation instruction sent by the processor is specifically:
receiving the to-be-processed singular value decomposition data and the operation instruction sent by the processor;
caching the to-be-processed singular value decomposition data.
An embodiment of the present invention further provides an acceleration apparatus for singular value decomposition computation, comprising:
a receiving module, configured to receive to-be-processed singular value decomposition data and an operation instruction sent by a processor;
a computing module, configured to call, according to the operation instruction, the corresponding operation algorithm implemented by an FPGA hardware circuit, and to process the to-be-processed singular value decomposition data according to the operation algorithm to obtain an operation result;
a return module, configured to return the operation result to the processor;
wherein the operation instruction includes a dot-product operation instruction, the computing module includes a dot-product operation unit, and the operation result includes a first operation result.
Optionally, the operation instruction further includes a vector accumulation operation instruction, the computing module further includes a vector accumulation operation unit, and the operation result further includes a second operation result.
Optionally, the receiving module includes:
a receiving unit, configured to receive the to-be-processed singular value decomposition data and the operation instruction sent by the processor;
a cache unit, configured to cache the to-be-processed singular value decomposition data.
Optionally, the cache unit includes DDR3 memory.
An embodiment of the present invention further provides an acceleration system for singular value decomposition computation, comprising a processor and the acceleration apparatus for singular value decomposition computation described above.
The embodiments of the present invention provide an acceleration method, apparatus and system for singular value decomposition computation, comprising: receiving to-be-processed singular value decomposition data and a dot-product operation instruction sent by a processor; calling, according to the dot-product operation instruction, the dot-product algorithm implemented by an FPGA hardware circuit, and processing the to-be-processed singular value decomposition data according to the dot-product algorithm to obtain a first operation result; and returning the first operation result to the processor.
The embodiments of the present invention implement the dot-product operations of singular value decomposition computation on an FPGA parallel computing platform. After receiving the to-be-processed singular value decomposition data and the dot-product operation instruction sent by the processor, the dot-product algorithm implemented by the FPGA hardware circuit is called to perform the corresponding calculation. The embodiments thereby achieve parallel operation and maintain computational efficiency while reducing power consumption during computation, which lowers the computing cost.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an acceleration method for singular value decomposition computation provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an acceleration apparatus for singular value decomposition computation provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of another acceleration apparatus for singular value decomposition computation provided by an embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention provide an acceleration method, apparatus and system for singular value decomposition computation that achieve parallel operation and maintain computational efficiency while reducing power consumption during computation, thereby lowering the computing cost.
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
Refer to Fig. 1, which is a schematic flowchart of an acceleration method for singular value decomposition computation provided by an embodiment of the present invention.
The method includes:
S11: receiving to-be-processed singular value decomposition data and an operation instruction sent by a processor;
S12: calling, according to the operation instruction, the corresponding operation algorithm implemented by an FPGA hardware circuit, and processing the to-be-processed singular value decomposition data according to the operation algorithm to obtain an operation result;
S13: returning the operation result to the processor;
wherein the operation instruction includes a dot-product operation instruction, the operation algorithm includes a dot-product algorithm, and the operation result includes a first operation result.
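The flow of steps S11 to S13 can be sketched in software as follows. This is only an illustrative model, not the patented hardware: the instruction name "DOT", the table ALGORITHMS and the function accelerate are hypothetical names introduced here, and a plain Python function stands in for the dot-product circuit on the FPGA.

```python
from typing import Callable, Dict, List, Sequence


def dot_product(m: Sequence[float], a: Sequence[float]) -> float:
    """Software stand-in for the dot-product circuit on the FPGA."""
    if len(m) != len(a):
        raise ValueError("vectors must have equal length")
    return sum(mi * ai for mi, ai in zip(m, a))


# Hypothetical instruction table: operation instruction -> implementing routine.
ALGORITHMS: Dict[str, Callable[..., float]] = {"DOT": dot_product}


def accelerate(instruction: str, data: List[Sequence[float]]) -> float:
    """S11: receive data and instruction; S12: call the matching
    algorithm on the data; S13: return the result to the caller."""
    try:
        algorithm = ALGORITHMS[instruction]
    except KeyError:
        raise ValueError(f"unsupported operation instruction: {instruction}") from None
    return algorithm(*data)


# The caller plays the role of the processor.
first_result = accelerate("DOT", [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(first_result)  # 32.0
```

A lookup table keeps the dispatch open to further instructions, such as the vector accumulation instruction described later, without changing the control flow.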
It should be noted that the embodiment of the present invention implements singular value decomposition computation on an FPGA parallel computing platform, which can accelerate the calculation to a certain extent and improve computing speed.
Specifically, the relevant operations in singular value decomposition (for example, the dot-product operation) are ported in advance, via OpenCL (Open Computing Language), to an FPGA (Field-Programmable Gate Array) board, that is, to the FPGA kernel side. The FPGA hardware circuit that implements the corresponding operation algorithm can then be called to process the to-be-processed data. During the calculation, parallel computation can be achieved with the #pragma unroll x directive, where x is the unroll factor; its specific value can be determined according to the PCIe bandwidth.
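To make the effect of an unroll factor concrete, the following sketch writes out by hand what unrolling a dot-product loop by a factor of x amounts to. It is a plain Python model under stated assumptions, not OpenCL kernel code: the function name dot_unrolled and the use of x independent partial sums are illustrative choices, and the lanes run sequentially here, whereas replicated hardware could evaluate them in parallel.

```python
from typing import Sequence


def dot_unrolled(m: Sequence[float], a: Sequence[float], x: int = 4) -> float:
    """Dot product with the inner loop unrolled by a factor of x.

    Each of the x partial sums models one multiply-accumulate lane.
    """
    if len(m) != len(a):
        raise ValueError("vectors must have equal length")
    if x < 1:
        raise ValueError("unroll factor must be at least 1")
    partial = [0.0] * x
    n = len(m)
    main = n - n % x  # largest multiple of x not exceeding n
    for base in range(0, main, x):
        for lane in range(x):  # the body that unrolling would replicate
            partial[lane] += m[base + lane] * a[base + lane]
    total = sum(partial)
    for i in range(main, n):  # leftover elements when x does not divide n
        total += m[i] * a[i]
    return total


print(dot_unrolled([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0, 1.0, 1.0], x=2))  # 15.0
```

A larger x replicates more multiply-accumulate logic per iteration, so the useful upper bound depends on how fast operands can be delivered, which is consistent with the text tying x to the PCIe bandwidth.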
Specifically, the dot-product operation in singular value decomposition computation can be ported to the FPGA board in advance via OpenCL, where the dot-product operation is given by the following relation:
X1 = m1*a1 + m2*a2 + ... + mN*aN; X2 = m1*b1 + m2*b2 + ... + mN*bN; X3, X4 and so on follow by analogy.
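As a small worked example of this relation, the sketch below computes X1 and X2 for one vector m against two vectors a and b. The function name dot_products and the sample values are assumptions made for illustration only.

```python
def dot_products(m, vectors):
    """Return [X1, X2, ...], where Xk is the dot product of m with the k-th vector."""
    results = []
    for v in vectors:
        if len(v) != len(m):
            raise ValueError("every vector must have the same length as m")
        results.append(sum(mi * vi for mi, vi in zip(m, v)))
    return results


m = [1, 2, 3]
a = [4, 5, 6]
b = [7, 8, 9]
# X1 = 1*4 + 2*5 + 3*6 = 32, X2 = 1*7 + 2*8 + 3*9 = 50
print(dot_products(m, [a, b]))  # [32, 50]
```

Each Xk depends only on m and its own vector, so the results can be computed independently of one another, which is the parallelism the method relies on.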
It should also be noted that the first operation result is the operation result corresponding to the respective operation algorithm. For example, when the operation instruction is a dot-product operation instruction, the dot-product algorithm implemented by the FPGA hardware circuit is called according to that instruction, and the to-be-processed singular value decomposition data are processed with that algorithm to obtain the first operation result.
Further, the operation instruction also includes a vector accumulation operation instruction, the operation algorithm also includes a vector accumulation algorithm, and the operation result also includes a second operation result.
Specifically, the vector accumulation algorithm in singular value decomposition computation can also be ported to the FPGA board in advance via OpenCL, where the vector accumulation algorithm is given by the following relation:
u1 = X1*a1 + X1*b1 + ... + X1*size1, u2 = X2*a2 + X2*b2 + ... + X2*size2; u3, u4 and so on follow by analogy.
It should be noted that the received operation instruction may also include a vector accumulation operation instruction. In that case, the vector accumulation algorithm implemented by the FPGA hardware circuit is called according to that instruction, and the to-be-processed singular value decomposition data are processed with that algorithm to obtain the corresponding second operation result.
In addition, when both the dot-product operation and the vector accumulation operation need to be performed on the to-be-processed singular value decomposition data, the two operations can be implemented in a pipelined manner.
Specifically, the process in S11 of receiving the to-be-processed singular value decomposition data and the operation instruction sent by the processor is:
receiving the to-be-processed singular value decomposition data and the operation instruction sent by the processor;
caching the to-be-processed singular value decomposition data.
It should be noted that when the volume of to-be-processed singular value decomposition data is large, the data can first be cached, and the calculation is then performed on the cached data.
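The cache-then-compute idea can be sketched as block-wise processing: incoming data are held in a buffer and consumed one block at a time. This is a software illustration only; the names buffered_blocks and dot_from_cache and the block size are hypothetical, and a Python list stands in for the on-board memory used as the cache.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Pair = Tuple[float, float]


def buffered_blocks(pairs: Iterable[Pair], block_size: int) -> Iterator[List[Pair]]:
    """Cache incoming (m_i, a_i) pairs and release them in fixed-size blocks."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    buffer: List[Pair] = []
    for pair in pairs:
        buffer.append(pair)
        if len(buffer) == block_size:
            yield buffer
            buffer = []
    if buffer:  # final, possibly shorter block
        yield buffer


def dot_from_cache(m: Sequence[float], a: Sequence[float], block_size: int = 1024) -> float:
    """Accumulate a dot product one cached block at a time."""
    if len(m) != len(a):
        raise ValueError("vectors must have equal length")
    total = 0.0
    for block in buffered_blocks(zip(m, a), block_size):
        total += sum(mi * ai for mi, ai in block)
    return total


print(dot_from_cache([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], block_size=2))  # 32.0
```

Working block by block bounds the amount of data the compute stage must hold at once, regardless of the total input size.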
The embodiment of the present invention provides an acceleration method for singular value decomposition computation, comprising: receiving to-be-processed singular value decomposition data and a dot-product operation instruction sent by a processor; calling, according to the dot-product operation instruction, the dot-product algorithm implemented by an FPGA hardware circuit, and processing the to-be-processed singular value decomposition data according to the dot-product algorithm to obtain a first operation result; and returning the first operation result to the processor.
The embodiment of the present invention implements the dot-product operations of singular value decomposition computation on an FPGA parallel computing platform. After receiving the to-be-processed singular value decomposition data and the dot-product operation instruction sent by the processor, the dot-product algorithm implemented by the FPGA hardware circuit is called to perform the corresponding calculation. The embodiment thereby achieves parallel operation and maintains computational efficiency while reducing power consumption during computation, which lowers the computing cost.
Correspondingly, an embodiment of the present invention provides an acceleration apparatus for singular value decomposition computation. As shown in Fig. 2, which is a schematic structural diagram of an acceleration apparatus for singular value decomposition computation provided by an embodiment of the present invention, and on the basis of the above embodiment:
The apparatus includes:
a receiving module 1, configured to receive to-be-processed singular value decomposition data and an operation instruction sent by a processor;
a computing module 2, configured to call, according to the operation instruction, the corresponding operation algorithm implemented by an FPGA hardware circuit, and to process the to-be-processed singular value decomposition data according to the operation algorithm to obtain an operation result;
a return module 3, configured to return the operation result to the processor;
wherein the operation instruction includes a dot-product operation instruction, the computing module 2 includes a dot-product operation unit, and the operation result includes a first operation result.
It should be noted that the dot-product operation unit in the computing module is specifically a first FPGA hardware circuit that implements the dot-product algorithm.
Refer to Fig. 3, which is a schematic structural diagram of another acceleration apparatus for singular value decomposition computation provided by an embodiment of the present invention.
On the basis of the above embodiment, the operation instruction further includes a vector accumulation operation instruction, the computing module 2 further includes a vector accumulation operation unit, and the operation result further includes a second operation result.
It should be noted that the vector accumulation operation unit in the computing module is specifically a second FPGA hardware circuit that implements the vector accumulation algorithm.
Optionally, the receiving module 1 includes:
a receiving unit, configured to receive the to-be-processed singular value decomposition data and the operation instruction sent by the processor;
a cache unit, configured to cache the to-be-processed singular value decomposition data.
Specifically, the cache unit may include DDR3 memory; it may of course also include other types of memory, and the embodiment of the present invention places no limitation on this.
It should be noted that the embodiment of the present invention implements the dot-product operations of singular value decomposition computation on an FPGA parallel computing platform. After receiving the to-be-processed singular value decomposition data and the dot-product operation instruction sent by the processor, the dot-product algorithm implemented by the FPGA hardware circuit is called to perform the corresponding calculation. The embodiment thereby achieves parallel operation and maintains computational efficiency while reducing power consumption during computation, which lowers the computing cost.
In addition, for details of the acceleration method for singular value decomposition computation involved in this embodiment, refer to the method embodiment above; they are not repeated here.
An embodiment of the present invention further provides an acceleration system for singular value decomposition computation, comprising a processor and the acceleration apparatus for singular value decomposition computation described above.
It should be noted that this embodiment likewise implements the dot-product operations of singular value decomposition computation on an FPGA parallel computing platform. After receiving the to-be-processed singular value decomposition data and the dot-product operation instruction sent by the processor, the dot-product algorithm implemented by the FPGA hardware circuit is called to perform the corresponding calculation, which achieves parallel operation and maintains computational efficiency while reducing power consumption during computation and thus the computing cost. For details of the acceleration method for singular value decomposition computation involved in this embodiment, refer to the method embodiment above; they are not repeated here.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises it.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (8)
- 1. An acceleration method for singular value decomposition computation, characterized by comprising: receiving to-be-processed singular value decomposition data and an operation instruction sent by a processor; calling, according to the operation instruction, the corresponding operation algorithm implemented by an FPGA hardware circuit, and processing the to-be-processed singular value decomposition data according to the operation algorithm to obtain an operation result; returning the operation result to the processor; wherein the operation instruction includes a dot-product operation instruction, the operation algorithm includes a dot-product algorithm, and the operation result includes a first operation result.
- 2. The acceleration method for singular value decomposition computation according to claim 1, characterized in that the operation instruction further includes a vector accumulation operation instruction, the operation algorithm further includes a vector accumulation algorithm, and the operation result further includes a second operation result.
- 3. The acceleration method for singular value decomposition computation according to claim 2, characterized in that the process of receiving the to-be-processed singular value decomposition data and the operation instruction sent by the processor is specifically: receiving the to-be-processed singular value decomposition data and the operation instruction sent by the processor; caching the to-be-processed singular value decomposition data.
- 4. An acceleration apparatus for singular value decomposition computation, characterized by comprising: a receiving module, configured to receive to-be-processed singular value decomposition data and an operation instruction sent by a processor; a computing module, configured to call, according to the operation instruction, the corresponding operation algorithm implemented by an FPGA hardware circuit, and to process the to-be-processed singular value decomposition data according to the operation algorithm to obtain an operation result; a return module, configured to return the operation result to the processor; wherein the operation instruction includes a dot-product operation instruction, the computing module includes a dot-product operation unit, and the operation result includes a first operation result.
- 5. The acceleration apparatus for singular value decomposition computation according to claim 4, characterized in that the operation instruction further includes a vector accumulation operation instruction, the computing module further includes a vector accumulation operation unit, and the operation result further includes a second operation result.
- 6. The acceleration apparatus for singular value decomposition computation according to claim 5, characterized in that the receiving module includes: a receiving unit, configured to receive the to-be-processed singular value decomposition data and the operation instruction sent by the processor; a cache unit, configured to cache the to-be-processed singular value decomposition data.
- 7. The acceleration apparatus for singular value decomposition computation according to claim 6, characterized in that the cache unit includes DDR3 memory.
- 8. An acceleration system for singular value decomposition computation, characterized by comprising a processor and the acceleration apparatus for singular value decomposition computation according to any one of claims 4 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710765950.7A CN107506173A (en) | 2017-08-30 | 2017-08-30 | Acceleration method, apparatus and system for singular value decomposition computation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710765950.7A CN107506173A (en) | 2017-08-30 | 2017-08-30 | Acceleration method, apparatus and system for singular value decomposition computation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107506173A (en) | 2017-12-22 |
Family
ID=60694435
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710765950.7A Pending CN107506173A (en) | 2017-08-30 | 2017-08-30 | Acceleration method, apparatus and system for singular value decomposition computation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107506173A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109976809A (en) * | 2017-12-28 | 2019-07-05 | 北京中科寒武纪科技有限公司 | Dispatching method and relevant apparatus |
CN111722834A (en) * | 2020-07-23 | 2020-09-29 | 哈尔滨工业大学 | Robot-oriented EKF-SLAM algorithm acceleration method |
CN112596701A (en) * | 2021-03-05 | 2021-04-02 | 之江实验室 | FPGA acceleration realization method based on unilateral Jacobian singular value decomposition |
CN116382617A (en) * | 2023-06-07 | 2023-07-04 | 之江实验室 | Singular value decomposition accelerator with parallel ordering function based on FPGA |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228238A (en) * | 2016-07-27 | 2016-12-14 | 中国科学技术大学苏州研究院 | The method and system of degree of depth learning algorithm is accelerated on field programmable gate array platform |
CN106528490A (en) * | 2016-11-30 | 2017-03-22 | 郑州云海信息技术有限公司 | FPGA (Field Programmable Gate Array) heterogeneous accelerated computing device and system |
CN107027036A (en) * | 2017-05-12 | 2017-08-08 | 郑州云海信息技术有限公司 | A kind of FPGA isomeries accelerate decompression method, the apparatus and system of platform |
-
2017
- 2017-08-30 CN CN201710765950.7A patent/CN107506173A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228238A (en) * | 2016-07-27 | 2016-12-14 | 中国科学技术大学苏州研究院 | The method and system of degree of depth learning algorithm is accelerated on field programmable gate array platform |
CN106528490A (en) * | 2016-11-30 | 2017-03-22 | 郑州云海信息技术有限公司 | FPGA (Field Programmable Gate Array) heterogeneous accelerated computing device and system |
CN107027036A (en) * | 2017-05-12 | 2017-08-08 | 郑州云海信息技术有限公司 | A kind of FPGA isomeries accelerate decompression method, the apparatus and system of platform |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109976809A (en) * | 2017-12-28 | 2019-07-05 | 北京中科寒武纪科技有限公司 | Dispatching method and relevant apparatus |
CN111722834A (en) * | 2020-07-23 | 2020-09-29 | 哈尔滨工业大学 | Robot-oriented EKF-SLAM algorithm acceleration method |
CN111722834B (en) * | 2020-07-23 | 2021-12-28 | 哈尔滨工业大学 | Robot-oriented EKF-SLAM algorithm acceleration method |
CN112596701A (en) * | 2021-03-05 | 2021-04-02 | 之江实验室 | FPGA acceleration realization method based on unilateral Jacobian singular value decomposition |
CN112596701B (en) * | 2021-03-05 | 2021-06-01 | 之江实验室 | FPGA acceleration realization method based on unilateral Jacobian singular value decomposition |
CN116382617A (en) * | 2023-06-07 | 2023-07-04 | 之江实验室 | Singular value decomposition accelerator with parallel ordering function based on FPGA |
CN116382617B (en) * | 2023-06-07 | 2023-08-29 | 之江实验室 | Singular value decomposition accelerator with parallel ordering function based on FPGA |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107506173A (en) | Acceleration method, apparatus and system for singular value decomposition computation | |
US12014272B2 (en) | Vector computation unit in a neural network processor | |
CN103970718B (en) | Device and method is realized in a kind of fast Fourier transform | |
GB2583594A (en) | Performing kernel striding in hardware | |
CN109284817A (en) | Depth separates convolutional neural networks processing framework/method/system and medium | |
GB2600031A (en) | Batch processing in a neural network processor | |
CN106776466A (en) | A kind of FPGA isomeries speed-up computation apparatus and system | |
CN103984560B (en) | Based on extensive coarseness imbedded reconfigurable system and its processing method | |
CN110210610A (en) | Convolutional calculation accelerator, convolutional calculation method and convolutional calculation equipment | |
CN104699624B (en) | Lothrus apterus towards FFT parallel computations stores access method | |
CN109409511A (en) | A kind of convolution algorithm data stream scheduling method for dynamic reconfigurable array | |
CN105808795A (en) | FPGA chip global placement optimization method based on temporal constraint | |
KR20210074992A (en) | Accelerating 2d convolutional layer mapping on a dot product architecture | |
CN104268122A (en) | Point-changeable floating point FFT (fast Fourier transform) processor | |
CN103365731B (en) | A kind of method and system reducing processor soft error rate | |
CN109993293A (en) | A kind of deep learning accelerator suitable for stack hourglass network | |
CN105359142B (en) | Hash connecting method and device | |
CN107402905A (en) | Computational methods and device based on neutral net | |
CN103544111B (en) | A kind of hybrid base FFT method based on real-time process | |
CN102567283A (en) | Method for small matrix inversion by using GPU (graphic processing unit) | |
CN109740115A (en) | A kind of method, device and equipment for realizing matrix multiplication operation | |
CN103294646B (en) | Digital signal processing method and digital signal processor | |
CN110377877A (en) | A kind of data processing method, device, equipment and storage medium | |
CN110163793A (en) | Convolutional calculation acceleration method and device | |
CN109308327A (en) | Figure calculation method device medium apparatus based on the compatible dot center's model of subgraph model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171222 |