CN107506173A

CN107506173A - Acceleration method, apparatus and system for singular value decomposition operations

Info

Publication number: CN107506173A
Application number: CN201710765950.7A
Authority: CN
Inventors: 李磊; 王洪伟; 李雪雷; 丁良奎
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2017-08-30
Filing date: 2017-08-30
Publication date: 2017-12-22

Abstract

The embodiments of the invention disclose an acceleration method, apparatus and system for singular value decomposition operations. The method includes: receiving singular value decomposition data to be processed and a dot product (point multiplication) instruction sent by a processor; according to the dot product instruction, invoking a dot product algorithm implemented by an FPGA hardware circuit, and processing the data to be processed according to the dot product algorithm to obtain a first operation result; and returning the first operation result to the processor. The embodiments implement the dot product in the singular value decomposition on an FPGA parallel computing platform. After the data to be processed and the dot product instruction sent by the processor are received, the dot product algorithm implemented by the FPGA hardware circuit is invoked to perform the corresponding computation. While achieving parallel computation and ensuring computational efficiency, the embodiments reduce power consumption during computation and thereby reduce computing cost.

Description

Acceleration method, apparatus and system for singular value decomposition operations

Technical field

The embodiments of the present invention relate to the field of singular value decomposition, and in particular to an acceleration method, apparatus and system for singular value decomposition operations.

Background

Singular value decomposition is an important matrix decomposition in linear algebra; it generalizes the unitary diagonalization of normal matrices in matrix analysis. It has important applications in many fields, such as signal processing and statistics. With the development of big data, the massive data operations involved in singular value decomposition can take a very long time and occupy substantial resources. The main operations in singular value decomposition are vector accumulation and dot products, both of which are inherently parallel. To exploit this parallelism and reduce computation time, the prior art typically performs singular value decomposition on GPUs. Although a GPU can compute in parallel, its high power consumption makes the computation costly.

Therefore, providing an acceleration method, apparatus and system for singular value decomposition operations that solve the above technical problem is a problem that those skilled in the art currently need to solve.

Summary of the invention

The purpose of the embodiments of the present invention is to provide an acceleration method, apparatus and system for singular value decomposition operations that reduce power consumption during computation, and thus computing cost, while achieving parallel computation and ensuring computational efficiency.

To solve the above technical problem, the embodiments of the invention provide an acceleration method for singular value decomposition operations, including:

receiving singular value decomposition data to be processed and an operation instruction sent by a processor;

invoking, according to the operation instruction, a corresponding operation algorithm implemented by an FPGA hardware circuit, and processing the singular value decomposition data to be processed according to the operation algorithm to obtain an operation result; and

returning the operation result to the processor;

wherein the operation instruction includes a dot product instruction, the operation algorithm includes a dot product algorithm, and the operation result includes a first operation result.

Optionally, the operation instruction further includes a vector accumulation instruction, the operation algorithm further includes a vector accumulation algorithm, and the operation result further includes a second operation result.

Optionally, the process of receiving the singular value decomposition data to be processed and the operation instruction sent by the processor specifically includes:

caching the singular value decomposition data to be processed.

The embodiments of the present invention further provide an acceleration apparatus for singular value decomposition operations, including:

a receiving module, configured to receive singular value decomposition data to be processed and an operation instruction sent by a processor;

a computing module, configured to invoke, according to the operation instruction, a corresponding operation algorithm implemented by an FPGA hardware circuit, and to process the singular value decomposition data to be processed according to the operation algorithm to obtain an operation result;

a return module, configured to return the operation result to the processor;

wherein the operation instruction includes a dot product instruction, the computing module includes a dot product unit, and the operation result includes a first operation result.

Optionally, the operation instruction further includes a vector accumulation instruction, the computing module further includes a vector accumulation unit, and the operation result further includes a second operation result.

Optionally, the receiving module includes:

a receiving unit, configured to receive the singular value decomposition data to be processed and the operation instruction sent by the processor;

a buffer unit, configured to cache the singular value decomposition data to be processed.

Optionally, the buffer unit includes a DDR3 memory.

The embodiments of the present invention further provide an acceleration system for singular value decomposition operations, including a processor and the acceleration apparatus for singular value decomposition operations described above.

The embodiments of the invention provide an acceleration method, apparatus and system for singular value decomposition operations. The method includes: receiving singular value decomposition data to be processed and a dot product instruction sent by a processor; according to the dot product instruction, invoking a dot product algorithm implemented by an FPGA hardware circuit, and processing the data to be processed according to the dot product algorithm to obtain a first operation result; and returning the first operation result to the processor.

The embodiments of the present invention implement the dot product in the singular value decomposition on an FPGA parallel computing platform. After the singular value decomposition data to be processed and the dot product instruction sent by the processor are received, the dot product algorithm implemented by the FPGA hardware circuit is invoked to perform the corresponding computation. While achieving parallel computation and ensuring computational efficiency, the embodiments reduce power consumption during computation and thereby reduce computing cost.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of an acceleration method for singular value decomposition operations provided by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of an acceleration apparatus for singular value decomposition operations provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of another acceleration apparatus for singular value decomposition operations provided by an embodiment of the present invention.

Detailed description of the embodiments

The embodiments of the invention provide an acceleration method, apparatus and system for singular value decomposition operations that reduce power consumption during computation, and thus computing cost, while achieving parallel computation and ensuring computational efficiency.

To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Refer to Fig. 1, which is a schematic flowchart of an acceleration method for singular value decomposition operations provided by an embodiment of the present invention.

The method includes:

S11: receive the singular value decomposition data to be processed and the operation instruction sent by the processor;

S12: according to the operation instruction, invoke the corresponding operation algorithm implemented by the FPGA hardware circuit, and process the data to be processed according to the operation algorithm to obtain an operation result;

S13: return the operation result to the processor;

wherein the operation instruction includes a dot product instruction, the operation algorithm includes a dot product algorithm, and the operation result includes a first operation result.
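To make the instruction-to-algorithm dispatch of S11 to S13 concrete, the flow can be sketched in software. This is a minimal Python sketch, assuming the FPGA hardware circuit is replaced by an ordinary Python function; the instruction code `"DOT"`, the data layout (a vector m plus a list of vectors), and the names `accelerate` and `dot_product_algorithm` are hypothetical and not taken from the patent.

```python
from typing import Callable, Dict, List, Tuple

# Data layout assumed for illustration: a vector m and a list of vectors a, b, ...
Data = Tuple[List[float], List[List[float]]]


def dot_product_algorithm(data: Data) -> List[float]:
    """Software stand-in for the FPGA dot product circuit: X_j = m . v_j for each v_j."""
    m, vectors = data
    return [sum(mi * vi for mi, vi in zip(m, v)) for v in vectors]


# Hypothetical instruction table mapping operation instructions to algorithms.
ALGORITHMS: Dict[str, Callable[[Data], List[float]]] = {
    "DOT": dot_product_algorithm,
}


def accelerate(data: Data, instruction: str) -> List[float]:
    # S11: the data and the instruction from the processor arrive as arguments.
    if instruction not in ALGORITHMS:
        raise ValueError(f"unsupported operation instruction: {instruction}")
    # S12: invoke the algorithm that corresponds to the instruction.
    result = ALGORITHMS[instruction](data)
    # S13: hand the operation result back to the processor.
    return result
```

For example, `accelerate(([1.0, 2.0], [[3.0, 4.0], [5.0, 6.0]]), "DOT")` returns `[11.0, 17.0]`. A lookup table keeps the dispatch open to further instructions, such as the vector accumulation instruction described later.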

It should be noted that the embodiments of the present invention implement the singular value decomposition operation on an FPGA parallel computing platform, which can accelerate the computation to some extent and improve calculation speed.

Specifically, the relevant operations in the singular value decomposition (such as the dot product) are ported in advance, via OpenCL (Open Computing Language), onto an FPGA (Field-Programmable Gate Array) board, that is, onto the FPGA kernel side. The data to be processed can then be computed by invoking the FPGA hardware circuit that implements the corresponding operation algorithm. During the computation, parallelism can be obtained with the `#pragma unroll x` directive, where x is the unroll factor; its specific value can be determined according to the PCIe bandwidth.
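The effect of `#pragma unroll x` can be illustrated in software. The following Python sketch is an illustration only, not the patent's OpenCL kernel, and the name `unrolled_dot` is hypothetical. It splits the dot-product loop into x replicated copies of the loop body, each with its own partial sum; on an FPGA, the compiler would turn such copies into x multiply-accumulate units working in parallel.

```python
from typing import List


def unrolled_dot(m: List[float], v: List[float], x: int) -> float:
    """Dot product whose loop body is replicated x times per iteration,
    mimicking what '#pragma unroll x' asks an OpenCL compiler to do."""
    if x < 1:
        raise ValueError("unroll factor must be >= 1")
    if len(m) != len(v):
        raise ValueError("m and v must have the same length")
    n = len(m)
    lanes = [0.0] * x          # one partial sum per replicated copy of the body
    main = n - n % x           # part of the loop that divides evenly by x
    for base in range(0, main, x):
        for lane in range(x):  # in hardware these x multiplies would run concurrently
            lanes[lane] += m[base + lane] * v[base + lane]
    total = sum(lanes)
    for i in range(main, n):   # leftover iterations when n is not a multiple of x
        total += m[i] * v[i]
    return total
```

For example, `unrolled_dot([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], 2)` returns `15.0`. Because the lanes reorder the additions, floating-point results can differ from a strictly sequential loop in the last bits.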

Specifically, the dot product in the singular value decomposition can be ported to the FPGA board in advance via OpenCL, where the dot product is given by:

$$X_1 = m_1 a_1 + m_2 a_2 + \cdots + m_N a_N, \qquad X_2 = m_1 b_1 + m_2 b_2 + \cdots + m_N b_N,$$

and X₃, X₄, and so on are obtained analogously.
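Collecting these dot products, the relation can be restated as a single matrix-vector product. This is an equivalent restatement of the definitions of X₁, X₂, …, not necessarily the form of the formula in the original drawing:

```latex
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \end{bmatrix}
=
\begin{bmatrix}
a_1 & a_2 & \cdots & a_N \\
b_1 & b_2 & \cdots & b_N \\
\vdots & \vdots & & \vdots
\end{bmatrix}
\begin{bmatrix} m_1 \\ m_2 \\ \vdots \\ m_N \end{bmatrix}
```

Each row's product depends only on m and that row, so the rows can be evaluated independently, which is what makes the operation suitable for parallel hardware.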

It should also be noted that the first operation result is the result corresponding to the invoked operation algorithm. For example, when the operation instruction is a dot product instruction, the dot product algorithm implemented by the FPGA hardware circuit is invoked according to that instruction, and the singular value decomposition data to be processed are computed with that algorithm to obtain the first operation result.

Further, the operation instruction also includes a vector accumulation instruction, the operation algorithm also includes a vector accumulation algorithm, and the operation result also includes a second operation result.

Specifically, the vector accumulation algorithm in the singular value decomposition can also be ported to the FPGA board in advance via OpenCL, where the vector accumulation is given by:

$$u_1 = X_1 a_1 + X_1 b_1 + \cdots + X_1\,\mathrm{size}_1, \qquad u_2 = X_2 a_2 + X_2 b_2 + \cdots + X_2\,\mathrm{size}_2,$$

and u₃, u₄, and so on are obtained analogously.
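Because the original formula drawing is lost, the sketch below reads the vector accumulation literally as written, taking `size` to denote the last vector in the sequence a, b, …; this interpretation and the function name `vector_accumulate` are assumptions. On the FPGA, the additions across the vectors are the part that would be evaluated in parallel.

```python
from typing import List


def vector_accumulate(X: List[float], vectors: List[List[float]]) -> List[float]:
    """Literal reading of u_i = X_i*a_i + X_i*b_i + ... + X_i*size_i.

    `vectors` holds a, b, ..., size in order. The sum is computed in the
    factored form X_i * (a_i + b_i + ... + size_i).
    """
    if any(len(v) < len(X) for v in vectors):
        raise ValueError("each vector needs at least len(X) entries")
    return [x * sum(v[i] for v in vectors) for i, x in enumerate(X)]
```

For example, `vector_accumulate([2.0, 3.0], [[1.0, 2.0], [3.0, 4.0]])` returns `[8.0, 18.0]`, since u₁ = 2·(1 + 3) and u₂ = 3·(2 + 4).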

It should be noted that the received operation instruction may also include a vector accumulation instruction. In this case, the vector accumulation algorithm implemented by the FPGA hardware circuit is invoked according to that instruction, and the singular value decomposition data to be processed are computed with that algorithm to obtain the corresponding second operation result.

In addition, when both a dot product and a vector accumulation need to be performed on the singular value decomposition data to be processed, the two operations can be implemented as a pipeline.
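In a pipelined arrangement, the vector accumulation stage starts consuming dot product results while later dot products are still being computed. The Python generators below only model that data dependency; they run sequentially. On an FPGA the two stages would be separate hardware blocks, connected for example by an OpenCL pipe or a vendor-specific channel, a choice the patent does not specify. The accumulation step follows a literal reading of the u formula given earlier, and the names `dot_stage`, `accumulate_stage` and `pipelined` are hypothetical.

```python
from typing import Iterable, Iterator, List


def dot_stage(m: List[float], vectors: Iterable[List[float]]) -> Iterator[float]:
    # Stage 1: emit X_j as soon as the dot product with vector j is finished.
    for v in vectors:
        yield sum(mi * vi for mi, vi in zip(m, v))


def accumulate_stage(xs: Iterable[float], col_sums: List[float]) -> Iterator[float]:
    # Stage 2: turn each X_j into u_j = X_j * (a_j + b_j + ... + size_j)
    # without waiting for stage 1 to process the remaining vectors.
    for j, x in enumerate(xs):
        yield x * col_sums[j]


def pipelined(m: List[float], vectors: List[List[float]]) -> List[float]:
    # The column sums depend only on the input vectors, so they are precomputed.
    col_sums = [sum(v[j] for v in vectors) for j in range(len(vectors))]
    return list(accumulate_stage(dot_stage(m, vectors), col_sums))
```

For example, `pipelined([1.0, 1.0], [[1.0, 2.0], [3.0, 4.0]])` first yields X = [3.0, 7.0] from stage 1 and then returns `[12.0, 42.0]`.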

Specifically, the process in S11 of receiving the singular value decomposition data to be processed and the operation instruction sent by the processor specifically includes:

caching the singular value decomposition data to be processed.

It should be noted that when the volume of singular value decomposition data to be processed is large, the data can also be cached, and the cached data are then computed.
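When the data are too large to process at once, a buffer (such as the DDR3 memory named in the apparatus embodiment) can hold them while the compute circuit works through them piece by piece. The Python sketch below models this with fixed-size blocks; the block size, the names `blocks` and `buffered_dot`, and the block-wise scheme itself are illustrative assumptions, since the patent only states that the data are cached.

```python
from typing import Iterator, List, Tuple


def blocks(m: List[float], v: List[float], block: int) -> Iterator[Tuple[List[float], List[float]]]:
    # Hypothetical buffer: release the cached data to the compute stage one block at a time.
    for start in range(0, len(m), block):
        yield m[start:start + block], v[start:start + block]


def buffered_dot(m: List[float], v: List[float], block: int = 4) -> float:
    if len(m) != len(v):
        raise ValueError("m and v must have the same length")
    if block < 1:
        raise ValueError("block size must be >= 1")
    total = 0.0
    for m_blk, v_blk in blocks(m, v, block):
        # In hardware, only the current block would need to sit in on-chip memory.
        total += sum(a * b for a, b in zip(m_blk, v_blk))
    return total
```

For example, `buffered_dot([float(i) for i in range(10)], [1.0] * 10, 3)` returns `45.0`, the same value an unbuffered dot product gives.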

The embodiments of the invention provide an acceleration method for singular value decomposition operations, including: receiving singular value decomposition data to be processed and a dot product instruction sent by a processor; according to the dot product instruction, invoking a dot product algorithm implemented by an FPGA hardware circuit, and processing the data to be processed according to the dot product algorithm to obtain a first operation result; and returning the first operation result to the processor.

Correspondingly, an embodiment of the invention provides an acceleration apparatus for singular value decomposition operations, as shown in Fig. 2, which is a schematic structural diagram of an acceleration apparatus for singular value decomposition operations provided by an embodiment of the present invention. On the basis of the above embodiment:

The apparatus includes:

a receiving module 1, configured to receive singular value decomposition data to be processed and an operation instruction sent by a processor;

a computing module 2, configured to invoke, according to the operation instruction, a corresponding operation algorithm implemented by an FPGA hardware circuit, and to process the data to be processed according to the operation algorithm to obtain an operation result;

a return module 3, configured to return the operation result to the processor;

wherein the operation instruction includes a dot product instruction, the computing module 2 includes a dot product unit, and the operation result includes a first operation result.

It should be noted that the dot product unit in the computing module is specifically a first FPGA hardware circuit that implements the dot product algorithm.

Refer to Fig. 3, which is a schematic structural diagram of another acceleration apparatus for singular value decomposition operations provided by an embodiment of the present invention.

On the basis of the above embodiment, the operation instruction further includes a vector accumulation instruction, the computing module 2 further includes a vector accumulation unit, and the operation result further includes a second operation result.

It should be noted that the vector accumulation unit in the computing module is specifically a second FPGA hardware circuit that implements the vector accumulation algorithm.

Optionally, the receiving module 1 includes:

a buffer unit, configured to cache the singular value decomposition data to be processed.

Specifically, the buffer unit may include a DDR3 memory, and may of course include memories of other types; the embodiments of the present invention impose no specific limitation.

It should be noted that the embodiments of the present invention implement the dot product in the singular value decomposition on an FPGA parallel computing platform. After the singular value decomposition data to be processed and the dot product instruction sent by the processor are received, the dot product algorithm implemented by the FPGA hardware circuit is invoked to perform the corresponding computation. While achieving parallel computation and ensuring computational efficiency, the embodiments reduce power consumption during computation and thereby reduce computing cost.

In addition, for a detailed description of the acceleration method for singular value decomposition operations involved in the embodiments of the present invention, refer to the method embodiment above; it is not repeated here.

The embodiments of the present invention further provide an acceleration system for singular value decomposition operations, including a processor and the acceleration apparatus for singular value decomposition operations described above.

It should be noted that the embodiments of the present invention implement the dot product in the singular value decomposition on an FPGA parallel computing platform. After the singular value decomposition data to be processed and the dot product instruction sent by the processor are received, the dot product algorithm implemented by the FPGA hardware circuit is invoked to perform the corresponding computation. While achieving parallel computation and ensuring computational efficiency, the embodiments reduce power consumption during computation and thereby reduce computing cost. In addition, for a detailed description of the acceleration method for singular value decomposition operations involved in the embodiments of the present invention, refer to the method embodiment above; it is not repeated here.

The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be referred to mutually. Since the apparatus disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for relevant details, refer to the description of the method.

It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.

Those skilled in the art may further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the compositions and steps of the examples have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An acceleration method for singular value decomposition operations, characterized by including:

receiving singular value decomposition data to be processed and an operation instruction sent by a processor;

invoking, according to the operation instruction, a corresponding operation algorithm implemented by an FPGA hardware circuit, and processing the singular value decomposition data to be processed according to the operation algorithm to obtain an operation result;

returning the operation result to the processor;

wherein the operation instruction includes a dot product instruction, the operation algorithm includes a dot product algorithm, and the operation result includes a first operation result.
2. The acceleration method for singular value decomposition operations according to claim 1, characterized in that the operation instruction further includes a vector accumulation instruction, the operation algorithm further includes a vector accumulation algorithm, and the operation result further includes a second operation result.
3. The acceleration method for singular value decomposition operations according to claim 2, characterized in that the process of receiving the singular value decomposition data to be processed and the operation instruction sent by the processor specifically includes:

receiving the singular value decomposition data to be processed and the operation instruction sent by the processor;

caching the singular value decomposition data to be processed.
4. An acceleration apparatus for singular value decomposition operations, characterized by including:

a receiving module, configured to receive singular value decomposition data to be processed and an operation instruction sent by a processor;

a computing module, configured to invoke, according to the operation instruction, a corresponding operation algorithm implemented by an FPGA hardware circuit, and to process the singular value decomposition data to be processed according to the operation algorithm to obtain an operation result;

a return module, configured to return the operation result to the processor;

wherein the operation instruction includes a dot product instruction, the computing module includes a dot product unit, and the operation result includes a first operation result.
5. The acceleration apparatus for singular value decomposition operations according to claim 4, characterized in that the operation instruction further includes a vector accumulation instruction, the computing module further includes a vector accumulation unit, and the operation result further includes a second operation result.
6. The acceleration apparatus for singular value decomposition operations according to claim 5, characterized in that the receiving module includes:

a receiving unit, configured to receive the singular value decomposition data to be processed and the operation instruction sent by the processor;

a buffer unit, configured to cache the singular value decomposition data to be processed.
7. The acceleration apparatus for singular value decomposition operations according to claim 6, characterized in that the buffer unit includes a DDR3 memory.
8. An acceleration system for singular value decomposition operations, characterized by including a processor and the acceleration apparatus for singular value decomposition operations according to any one of claims 4 to 7.