Summary of the invention
The object of the present invention is to provide a graph data compression method for a graph computation accelerator, and a graph computation accelerator, capable of improving the parallelism and efficiency of graph computation.
To achieve the above object, the main technical solutions adopted by the present invention include:
The graph data compression method for a graph computation accelerator provided by the present invention comprises:
S1, a preprocessing circuit of the graph computation accelerator converts graph data to be processed, represented as a sparse adjacency matrix, into graph data in the Compressed Sparse Column Independently (CSCI) format; the graph data of each independently compressed column includes a column-identifier data pair and nonzero-element data pairs, each data pair consisting of an index and a value, where the two highest bits of the index indicate the meaning of the remaining index bits and of the value;
S2, the preprocessing circuit of the graph computation accelerator stores the converted CSCI-format graph data in a memory of the graph computation accelerator.
As a further improvement of the present invention, step S1 includes:
independently compressing, column by column, the graph data represented by the sparse adjacency matrix into data pairs;
each data pair has the structure: an index and a value;
a data pair whose two highest index bits are "01" or "10" is a column identifier (ioc);
the data pairs following a column-identifier data pair are the data pairs corresponding to the nonzero elements of all rows of that column.
As a further improvement of the present invention, when the two highest bits of the index are "01", the remaining index bits indicate the column index, and the value indicates the number of nonzero elements of that column in the sparse adjacency matrix;
when the two highest bits of the index are "10", the remaining index bits indicate the column index, the column is the last column of the sparse adjacency matrix, and the value indicates the number of nonzero elements of that column in the sparse adjacency matrix;
when the two highest bits of the index are "00", the remaining index bits indicate the row index, and the value indicates the corresponding nonzero element value in the sparse adjacency matrix.
As a further improvement of the present invention, the bit widths of the index and the value are determined according to the amount of data in the sparse adjacency matrix.
On the other hand, the present invention also provides a graph computation accelerator, including a preprocessing circuit and a memory;
the preprocessing circuit performs conversion processing on the sparse adjacency matrix data according to any one of the compression methods of claims 1 to 4.
As a further improvement of the present invention, the accelerator further includes:
a control circuit, a data access unit, a scheduler, a mixed-granularity processing unit and a result generating unit;
wherein the preprocessing circuit is also used to store a copy of the column identifiers of the CSCI in the memory;
the control circuit is used to receive the conversion-ready indication signal sent by the preprocessing circuit after storage in the memory is finished, to control the operation of the data access unit, the mixed-granularity processing unit and the result generating unit according to the graph computing application type sent by the host, and to send the root vertex index of application type one or the source vertex index of application type two sent by the host to the data access unit;
the data access unit is used to read the graph data and the column identifiers of the CSCI from the memory, to calculate the physical address in the memory of the vertex specified by the root vertex index, the source vertex index, or the active vertex index sent by the result generating unit so as to perform data access, and to transfer the read data to the scheduler;
the scheduler is used to buffer the numbers of nonzero elements indicated by the column identifiers of the CSCI and, according to the status signals of the processing elements in the mixed-granularity processing unit, to assign the buffered data to processing elements in the mixed-granularity processing unit for processing;
the mixed-granularity processing unit is used to process in parallel the data buffered in the scheduler, according to the application type from the control circuit and the active vertex data from the result generating unit, and to transfer the processed intermediate data to the result generating unit;
the result generating unit is used to process the intermediate data according to the application type from the control circuit, to send the active vertex indices produced during processing to the data access unit, and to store the processed final result.
As a further improvement of the present invention, the control circuit includes: a host interface component and a control logic component;
the host interface component is used to receive the application type sent by the host, the root vertex index of application type one, and the source vertex index of application type two;
the control logic component is used to receive the conversion-ready indication signal sent by the preprocessing circuit, to send the root vertex index or the source vertex index to the data access unit, to send the application type to the mixed-granularity processing unit and the result generating unit, and to start each module in the graph computation accelerator;
wherein application type one is the breadth-first search (BFS) application type, and application type two is the single-source shortest path (SSSP) application type.
As a further improvement of the present invention, the data access unit includes: a user logic component, an address calculation module and a column identifier buffer;
the column identifier buffer is used to store the column identifiers of the graph data in the CSCI;
the address calculation module is used to calculate, according to the vertex index input by the control circuit and the result generating unit, in combination with the number of nonzero elements of each column in the column identifier buffer and the number of data bytes stored per memory row, the physical address in the memory of the data corresponding to the current active vertex i;
the user logic component is used to read the column identifiers from the memory and buffer them in the column identifier buffer; to read the data corresponding to the active vertex from the memory according to the address calculated by the address calculation module and send the read data to the scheduler; and, after receiving a pause-read signal sent by the scheduler, to stop reading data from the memory;
the user logic component is also used to resume reading data after the pause-read signal sent by the scheduler is deasserted.
As a further improvement of the present invention, the scheduler includes: a buffer allocation module, a task scheduling module and a double-buffer module;
the buffer allocation module is used to parse the column identifier of the column data transferred from the data access unit to obtain the corresponding data pairs, and, according to the buffer status information sent by the double-buffer module, to send the graph data corresponding to the column identifier to the double-buffer module; when all buffers in the double-buffer module are occupied, it sends a stop-read signal to the data access unit;
the task scheduling module is used to send the unscheduled data in all buffers to an idle processing element whose computing capacity meets the requirements, according to the processing element status signals transferred by the mixed-granularity processing unit and the buffer status information sent by the double-buffer module;
the double-buffer module consists of multiple groups of front/back double buffers with different buffering capacities;
the double-buffer module is used to notify the task scheduling module to schedule the graph data cached in a buffer when that buffer's state is set to "full", to set the state of a buffer whose data scheduling is completed to "empty", and to send the buffer states to the buffer allocation module.
As a further improvement of the present invention, the mixed-granularity processing unit includes: an auxiliary circuit module and a processing element array;
the auxiliary circuit module is used to transfer, according to the state of each processing element in the processing element array, the active vertex data pair input by the result generating unit and the corresponding CSCI data input by the scheduler to a corresponding idle processing element in the processing element array;
the processing element array consists of multiple processing elements (PEs) of different capacities, the multiple processing elements working in parallel;
after receiving the active vertex data pair and the CSCI data input by the auxiliary circuit module, each processing element performs calculation on the active vertex data pair and the CSCI data according to the application type transferred by the control circuit.
As a further improvement of the present invention, each processing element is specifically used for:
when the application type transferred by the control circuit is breadth-first search, adding 1 to the value of the active vertex data pair and assigning the result to the value of each data pair in the CSCI;
when the application type is single-source shortest path, adding the value of the active vertex data pair to the value of each data pair in the CSCI, the sum being used to update the value of each data pair in the CSCI;
the calculated results of the processing elements are output to the result generating unit; the maximum number of data pairs a calculated result includes equals the number of data pairs each processing element handles simultaneously.
As a further improvement of the present invention, the result generating unit includes: an operation module, a comparator and an on-chip result buffer;
the operation module is constituted by an 8-way, 4-stage pipeline tree composed of 15 operation cells;
the computing capacity of each operation cell equals the maximum number of data pairs the input data may include;
each operation cell is used to calculate the intermediate data input by the mixed-granularity processing unit according to the application type input by the control circuit;
the comparator is used to, for the data input by the operation module, read from the on-chip result buffer, one by one according to the row index of each data pair, the previous value corresponding to the row index processed by the operation cell, and to compare it with the current input value; if the current value is not smaller than the previous value, no operation is executed and the calculation for the next row index proceeds directly; if the current value is smaller, the value of that row index in the on-chip result buffer is updated, the vertex corresponding to the row index is marked as an active vertex, the row index is output to the data access unit, and the row-index/value data pair is output to the mixed-granularity processing unit;
the on-chip result buffer is used to buffer the depth/distance of each vertex.
As a further improvement of the present invention, each operation cell is specifically used for: the calculations performed for application type one and application type two are both comparison operations, i.e., the values of the two input data pairs with the same row index are compared, and the smaller value is output to the next stage as the new value corresponding to that row index, until output from the last-stage op_cell, which is then input to the comparator.
As a further improvement of the present invention, the address calculation module is also used to determine the physical address PhyAddr from the base address BaseAddr of the CSCI in the memory according to formula one;
Formula one: PhyAddr = BaseAddr + (nnz_c0 + nnz_c1 + ... + nnz_ci)/RowSize;
nnz_ci indicates the number of nonzero elements of the corresponding column in the sparse adjacency matrix, i being the column index;
RowSize indicates the number of bytes of data stored per memory row.
The beneficial effects of the present invention are:
The graph data compressed by the graph data compression method of the invention is applied to the graph computation accelerator, so that the graph computation accelerator efficiently implements the two graph computing applications BFS and SSSP, improving effective bandwidth and parallelism and accelerating the processing.
Specific embodiment
In order to better explain the present invention and facilitate understanding, the present invention is described in detail below through specific embodiments with reference to the accompanying drawings.
Currently, the management of large-scale graph data can use a variety of data models, which can be divided into the simple graph model and the hypergraph model according to the number of vertices an edge can connect. The present invention is oriented to the simple graph model, i.e., an edge can only connect two vertices, and loops may exist. Graphs in the real world usually have an average degree, i.e., the ratio of the number of edges to the number of vertices, of only several to several hundred, which appears extremely sparse compared with scales easily reaching tens of millions or even hundreds of millions of vertices, and the degrees follow a power-law distribution.
The simple graph model can be expressed in the form of a sparse adjacency matrix. Since its scale is large, graph data is mostly stored in memory in a compressed format; compressed formats include CSC (Compressed Sparse Column), CSR (Compressed Sparse Row), COO (Coordinate List), DCSC (Doubly Compressed Sparse Column), and the CSCI (Compressed Sparse Column Independently) referred to in the present invention.
Breadth-first search (BFS) is a basic graph search algorithm and the basis of many important graph algorithms. BFS starts from a given vertex, referred to as the root, and iteratively searches all reachable vertices in the given graph, computing the depth, i.e., the minimum number of edges, from the root vertex to each reachable vertex. At initialization, the depth of the root vertex is set to 0 and the root is marked as active; the depth of every other vertex is set to infinity. In the t-th iteration, the depth of a vertex v adjacent to an active vertex is calculated by the following formula. If the depth of a vertex is updated from infinity to t+1, the vertex is marked as active and used for the next iteration. This repeats until the search terminates.
depth(v) = min(depth(v), t+1)
Single-source shortest path (SSSP) is used to calculate the shortest path distances from a specified source vertex to all reachable vertices in a given graph. At initialization, the distance of the source vertex is set to 0 and the source is marked as active; the distance of every other vertex is set to infinity. In the t-th iteration, assuming the weight of the edge from vertex u to vertex v is w(u, v), the shortest path distance from the source vertex to vertex v is calculated by the following formula. If the distance of a vertex is updated, the vertex is marked as active and used for the next iteration. This repeats until all reachable vertices are finished.
distance(v) = min(distance(v), distance(u) + w(u, v))
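The two iterative updates above can be sketched in software as a frontier-based reference model; this is a minimal illustration of the algorithms, not of the accelerator circuit, and the adjacency-list representation `adj[u] = [(v, w), ...]` is assumed for convenience:

```python
INF = float("inf")

def bfs_depths(adj, root):
    # adj[u] = list of (v, w) out-edges; edge weights are ignored for BFS
    depth = {u: INF for u in adj}
    depth[root] = 0
    active, t = {root}, 0
    while active:
        nxt = set()
        for u in active:
            for v, _w in adj[u]:
                if depth[v] > t + 1:       # depth(v) = min(depth(v), t+1)
                    depth[v] = t + 1
                    nxt.add(v)             # updated vertices become active
        active, t = nxt, t + 1
    return depth

def sssp_distances(adj, source):
    dist = {u: INF for u in adj}
    dist[source] = 0
    active = {source}
    while active:
        nxt = set()
        for u in active:
            for v, w in adj[u]:
                if dist[u] + w < dist[v]:  # distance(v) = min(distance(v), distance(u)+w(u,v))
                    dist[v] = dist[u] + w
                    nxt.add(v)
        active = nxt
    return dist
```

In both cases the set of active vertices produced in one round drives the next round, which is exactly the role of the active vertex indices circulated by the result generating unit below.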
The graph computation accelerator of the invention is a circuit structure for graph computation that employs independent sparse column compression and parallel acceleration by a mixed-granularity processing unit, and that can carry out the two graph computing applications BFS and SSSP. The structure and working principle of the graph computation accelerator of the invention are described in detail below with reference to Figures 1A to 7.
As shown in Figure 1A, the graph computation accelerator structure of the embodiment of the present invention includes a preprocessing circuit (CSCIU, Compressed Sparse Column Independently Unit), a control circuit (CTR, ConTRoller), a data access unit (DAU, Data Accessing Unit), a scheduler (SCD, SCheDuler), a mixed-granularity processing unit (MGP, Mixed-Granularity Processing unit) and a result generating unit (RGU, Result Generating Unit).
The memory in the graph computation accelerator structure can be regarded as a general-purpose memory for storing graph-computation-related data.
The preprocessing circuit (CSCIU) converts the input sparse adjacency matrix graph data into the independent sparse column compressed format (CSCI) and stores it in the memory, while also storing in the memory a copy of the column identifiers of the CSCI, that is, a copy of the number of nonzero elements of each column in the column identifiers (ioc).
The input of the preprocessing circuit is the original input from outside the structure. In addition, since there are many representations of a simple graph, the preprocessing circuit in the present application converts graph data in the adjacency matrix form.
In this embodiment, in the compressed CSCI format of the graph data, the index of a data pair (index, value) can be represented by 32 bits and the value by 16 bits; the concrete meanings of index and value are shown in Table 1, wherein a data pair whose index[31:30] is "01" or "10" is a column identifier (ioc). The copy of the number of nonzero elements of each column in the column identifiers is stored in the memory column by column.
Table 1. Meanings of the data pairs in the CSCI format
After the storage is completed, the preprocessing circuit issues a conversion-ready indication signal to the control circuit.
In this embodiment, the independent sparse column compression CSCI format independently compresses, column by column, the sparse adjacency matrix representing the graph data into data pairs (index, value).
For convenience of explanation, it is assumed here that index is represented by 32 bits and value by 16 bits; a concrete application can determine the bit widths of index and value according to the actual scale of the graph data. The concrete meanings of index and value are shown in Table 1, wherein a data pair whose index[31:30] is "01" or "10" is a column identifier (ioc).
A data pair whose two highest index bits are "01" or "10" is a column identifier ioc (indicator of column); the data pairs following each column-identifier data pair are the data pairs corresponding to the nonzero elements of all rows of that column;
when the two highest bits of index are "01", the remaining index bits indicate the column index, and value indicates the number of nonzero elements of that column in the sparse adjacency matrix;
when the two highest bits of index are "10", the remaining index bits indicate the column index, the column is the last column of the sparse adjacency matrix, and value indicates the number of nonzero elements of that column in the sparse adjacency matrix;
when the two highest bits of index are "00", the remaining index bits indicate the row index, and value indicates the corresponding nonzero element value in the sparse adjacency matrix.
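Under the 32-bit index / 16-bit value assumption above, the three kinds of data pairs can be packed and decoded as sketched below; the tag constants and function names are illustrative, not part of the specification:

```python
# index[31:30] tag values per Table 1
TAG_NNZ, TAG_COL, TAG_LAST_COL = 0b00, 0b01, 0b10

def pack_pair(tag, idx30, value16):
    # index[31:30] = tag, index[29:0] = row/column index, 16-bit value
    assert idx30 < (1 << 30) and value16 < (1 << 16)
    return ((tag << 30) | idx30, value16)

def unpack_pair(index, value):
    tag = (index >> 30) & 0b11
    idx30 = index & ((1 << 30) - 1)
    if tag == TAG_COL:
        return ("column", idx30, value)       # value = nonzero count
    if tag == TAG_LAST_COL:
        return ("last_column", idx30, value)  # value = nonzero count
    return ("nonzero", idx30, value)          # value = element value
```

For instance, the column identifier of the 1st column in the worked example below packs to index 0100_..._0001 (0x40000001) with value 2.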
The CSCI compressed format is exemplified below. For convenience of explanation, take the simple graph shown in Figure 1B, which includes the six vertices A, B, C, D, E, F; the weight of each edge is marked on Figure 1B. The sparse adjacency matrix M corresponding to Figure 1B is expressed as follows. Since Figure 1B includes six vertices, the adjacency matrix is a 6 × 6 matrix, with the row and column indices of the matrix starting from 1; a dash therein indicates no connection between the corresponding two vertices, i.e., weight 0;
Compressing the 1st column: this column is not the last column of the matrix and has 2 nonzero elements, located in the 3rd and 4th rows, so the column is compressed as:
(0100_0000_0000_0000_0000_0000_0000_0001,0000_0000_0000_0010)
(0000_0000_0000_0000_0000_0000_0000_0011,0000_0000_0000_0011)
(0000_0000_0000_0000_0000_0000_0000_0100,0000_0000_0000_0010)
Compressing the 2nd column: this column is not the last column of the matrix and has 1 nonzero element, located in the 1st row, so the column is compressed as:
(0100_0000_0000_0000_0000_0000_0000_0010,0000_0000_0000_0001)
(0000_0000_0000_0000_0000_0000_0000_0001,0000_0000_0000_0001)
Compressing the 3rd column: this column is not the last column of the matrix and has 1 nonzero element, located in the 5th row, so the column is compressed as:
(0100_0000_0000_0000_0000_0000_0000_0011,0000_0000_0000_0001)
(0000_0000_0000_0000_0000_0000_0000_0101,0000_0000_0000_0001)
Compressing the 4th column: this column is not the last column of the matrix and has 1 nonzero element, located in the 2nd row, so the column is compressed as:
(0100_0000_0000_0000_0000_0000_0000_0100,0000_0000_0000_0001)
(0000_0000_0000_0000_0000_0000_0000_0010,0000_0000_0000_0011)
Compressing the 5th column: this column is not the last column of the matrix and has 2 nonzero elements, located in the 1st and 4th rows, so the column is compressed as:
(0100_0000_0000_0000_0000_0000_0000_0101,0000_0000_0000_0010)
(0000_0000_0000_0000_0000_0000_0000_0001,0000_0000_0000_0010)
(0000_0000_0000_0000_0000_0000_0000_0100,0000_0000_0000_0100)
Compressing the 6th column: this column is the last column of the matrix and has 2 nonzero elements, located in the 3rd and 5th rows, so the column is compressed as:
(1000_0000_0000_0000_0000_0000_0000_0110,0000_0000_0000_0010)
(0000_0000_0000_0000_0000_0000_0000_0011,0000_0000_0000_0001)
(0000_0000_0000_0000_0000_0000_0000_0101,0000_0000_0000_0011)
After each column is compressed, the columns are stored into the memory sequentially in column-major order.
The compression process can be described as follows:
The sparse adjacency matrix is processed column by column independently, starting from the first column. When each column is compressed:
1) count the number of nonzero elements of the column and generate the column-identifier data pair; if the column is not the last column, the column identifier's index[31:30] is "01", otherwise "10"; the column identifier's index[29:0] indicates the column index, and the column identifier's value[15:0] indicates the number of nonzero elements of the column;
2) generate data pairs for all nonzero elements of the column in row order, with index[31:30] being "00", index[29:0] indicating the row index of each nonzero element, and value[15:0] indicating the numerical value of each nonzero element.
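Steps 1) and 2) can be sketched end to end as follows. This is a reference model only; the test matrix is reconstructed from the compressed columns listed above (1-based row/column indices, as in the matrix M):

```python
def compress_csci(matrix):
    """Compress a dense adjacency matrix (list of rows, 0 = no edge) into a
    list of CSCI (index, value) pairs; row/column indices are 1-based."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    pairs = []
    for c in range(n_cols):
        nnz = [(r + 1, matrix[r][c]) for r in range(n_rows) if matrix[r][c] != 0]
        tag = 0b10 if c == n_cols - 1 else 0b01          # "10" marks the last column
        pairs.append(((tag << 30) | (c + 1), len(nnz)))  # column-identifier pair
        for row_idx, val in nnz:                         # "00"-tagged nonzero pairs
            pairs.append((row_idx, val))
    return pairs
```

Running this on the 6 × 6 matrix of Figure 1B reproduces the 15 data pairs of the worked example, in the same column-major order.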
Referring to Fig. 2, the control circuit (CTR) consists of two parts: a host interface component (host_IF) and a control logic component (Ctr_Logic).
The host interface (host_IF) receives and buffers the application type sent by the host and the vertex index corresponding to the application type.
The application types in this embodiment include: application type one for the breadth-first search (BFS) application, and application type two for the single-source shortest path (SSSP) application.
The vertex index of the breadth-first search (BFS) application is the root vertex index, and the vertex index of the single-source shortest path (SSSP) application is the source vertex index.
After the control logic component (Ctr_Logic) receives the conversion-ready indication signal sent by the preprocessing circuit, it sends the vertex index to the data access unit (DAU), sends the application type to the mixed-granularity processing unit (MGP) and the result generating unit (RGU), and starts the accelerator.
Referring to Fig. 3, the data access unit (DAU) consists of a user logic component (UI), a column identifier buffer (ioc_ram), and an address calculation module (addr_cal).
The user logic component mainly completes three functions:
1) reading the column identifiers from the memory and sending them to the column identifier buffer (ioc_ram) for buffering;
2) reading from the memory the data corresponding to the active vertex according to the address calculated by the address calculation module (addr_cal), and determining the number of data to read according to the value of the column-identifier data pair of that vertex;
the address calculation module can calculate according to formula one below; in addition, the active vertex index participates in the address calculation, so the address is one calculated for an active vertex, and the "data corresponding to the active vertex" in 2) belongs to the vertex to which the address corresponds.
An active vertex is a vertex updated by the algorithm in each round of iteration, serving as an active vertex for the next round. The root vertex/source vertex specified at the start is the active vertex of the first round; the active vertices of each subsequent round are produced by the calculation starting from the root vertex/source vertex.
3) sending the read vertex data to the scheduler (SCD), stopping reading data from the memory according to the pause-read signal sent by the scheduler (SCD) while saving the current state, so that reading resumes after the pause-read signal sent by the scheduler is deasserted.
The column identifier buffer (ioc_ram) is used to buffer the column identifiers of the graph data in the CSCI format.
The address calculation module (addr_cal) calculates, according to the vertex index input by the control circuit and the result generating unit, in combination with the number of nonzero elements of each column provided by the column identifier buffer and the number RowSize of data bytes stored per memory row, the physical address PhyAddr in the memory of the data corresponding to the current active vertex i. Assuming the base address of the CSCI-format graph data in the memory is BaseAddr, PhyAddr may be expressed as:
Formula one: PhyAddr = BaseAddr + (nnz_c0 + nnz_c1 + ... + nnz_ci)/RowSize.
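Formula one can be sketched directly; accumulating the nonzero counts from the column identifiers and using floor division for the "/RowSize" term are assumptions of this illustration, and any scaling from data-pair counts to bytes is left implicit, as in the formula itself:

```python
def phy_addr(base_addr, nnz_per_col, i, row_size):
    """Formula one: PhyAddr = BaseAddr + (nnz_c0 + ... + nnz_ci) // RowSize.
    nnz_per_col[k] is the nonzero count of column k, taken from the column
    identifiers (ioc); i is the column index of the current active vertex."""
    return base_addr + sum(nnz_per_col[: i + 1]) // row_size
```

With the Figure 1B example, nnz_per_col would be [2, 1, 1, 1, 2, 2], read straight from the six column identifiers.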
Referring to Fig. 4, the scheduler (SCD) consists of a buffer allocation module (buf_assign), a task scheduling module (task_sch), and a double-buffer module (double_buffer).
The buffer allocation module (buf_assign) analyzes the column identifier of a column of data (the column data sent in by the data access unit) in the CSCI-format graph data sent in by the data access unit, to learn the number of data pairs of the column data to be processed, i.e., the value of the column-identifier data pair, and, according to the buffer status information sent by the double-buffer module (double_buffer), sends the column data to be processed to the double-buffer module (double_buffer); when all buffers are occupied, i.e., "full", it sends a stop-read signal to the data access unit (DAU);
the task scheduling module (task_sch) sends the unscheduled data in all buffers (data not yet sent into the mixed-granularity processing unit) to an idle processing element whose computing capacity meets the requirements, according to the processing element status signals sent by the mixed-granularity processing unit (MGP) and the buffer status information sent by the double-buffer module (double_buffer);
the double-buffer module (double_buffer) consists of 16 groups of front/back double buffers with different buffering capacities; the capacity of each buffer is described below.
In the buffer names, f and b denote the front buffer and back buffer, 0~7 denote buffers No. 0 to No. 7, and 8~11 and 12~13 have similar meanings.
When a buffer receives CSCI-format graph data, the buffer into which the column data is stored is determined according to the number of data pairs of the column data. Each buffer is provided as a front/back pair used alternately in ping-pong fashion. Initially, all buffer states are "empty" and data is stored into the front buffers, i.e., buf*_f; when all front buffers of the same capacity are occupied, newly received data is stored into the back buffers of the current capacity range, i.e., buf*_b; if both the front and back buffers of a small capacity are occupied while a larger-capacity buffer is idle, the small amount of data can be stored into the large-capacity buffer.
For example, data with no more than 64 data pairs is stored into buf0_f to buf7_f in turn: if buf0_f already holds data that has not been read out, the data is stored into buf1_f; if buf1_f is occupied, into buf2_f, and so on. When buf0_f to buf7_f are all occupied, data is stored successively into buf0_b to buf7_b in order; if buf0_f to buf7_f and buf0_b to buf7_b are all occupied, data with no more than 64 data pairs can be temporarily stored into buf8_f to buf11_f, and so on.
Since the memory data bandwidth is limited, when a column of the CSCI-format graph data exceeds the memory data bandwidth, it needs to be stored into a buffer in several batches until the column data is completely buffered; that buffer is then set to "full", and the task scheduling module (task_sch) is notified that the column data can be scheduled. After data scheduling is completed, the buffer is set to "empty" and its state is sent to the buffer allocation module (buf_assign). When a column of the graph data has more than 1024 data pairs, i.e., more than the maximum buffer capacity, it can be processed in batches.
buf0~7_f, buf0~7_b: 64 data pairs;
buf8~11_f, buf8~11_b: 128 data pairs;
buf12~13_f, buf12~13_b: 256 data pairs;
buf14_f, buf14_b: 512 data pairs;
buf15_f, buf15_b: 1024 data pairs.
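The capacity classes above imply a simple sizing rule: a column is assigned to the smallest class whose capacity is at least its number of data pairs. A sketch of this selection under that assumption (the fall-through to a larger class when a small class is entirely occupied, described above, is omitted here for brevity):

```python
# (capacity in data pairs, buffer names) per class, smallest to largest
CLASSES = [(64,   "buf0_f..buf7_f / buf0_b..buf7_b"),
           (128,  "buf8_f..buf11_f / buf8_b..buf11_b"),
           (256,  "buf12_f..buf13_f / buf12_b..buf13_b"),
           (512,  "buf14_f / buf14_b"),
           (1024, "buf15_f / buf15_b")]

def pick_class(n_pairs):
    """Return the capacity of the smallest class that fits a column of
    n_pairs data pairs; columns above 1024 pairs are processed in batches."""
    for cap, _names in CLASSES:
        if n_pairs <= cap:
            return cap
    return 1024  # batch in chunks of the maximum buffer capacity
```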
Referring to Fig. 5, the mixed-granularity processing unit (MGP) consists of an auxiliary circuit module (aux_cell) and a processing element array (PEA).
The auxiliary circuit module (aux_cell) sends, according to the processing element states, the active vertex data pair input by the result generating unit (RGU) and the corresponding CSCI-format graph data input by the scheduler (SCD) to a corresponding idle processing element in the processing element array.
The above active vertex data pair can be understood as follows: the calculated result of BFS/SSSP generates a value for each vertex, representing the depth or distance of the vertex in the graph; the intermediate results are of the same form, except that the value can be updated during successive iterations. Hence they are referred to here as active vertex data pairs.
The processing element array (PEA) consists of 16 processing elements (PEs) of different capacities; the 16 processing elements can work in parallel. The computing capacities of the processing elements are as follows (the data pairs handled here exclude the column-identifier data pairs).
After receiving the active vertex data pair and the CSCI-format graph data input by the auxiliary circuit module (aux_cell), a processing element (PE) calculates them (the input active vertex data pair and CSCI-format graph data) according to the application type sent in by the control circuit (CTR).
When the application type is breadth-first search (BFS), the processing element (PE) adds 1 to the value of the active vertex data pair and assigns the result to the value of each data pair of the CSCI graph data;
when the application type is single-source shortest path (SSSP), the processing element (PE) adds the value of the active vertex data pair to the value of each data pair of the CSCI graph data, the sum being used to update the value of each data pair of the CSCI-format graph data;
the calculated result is output to the result generating unit (RGU); the maximum number of data pairs a calculated result includes equals the number of data pairs each processing element (PE) can handle simultaneously.
PE0~7: can handle 64 data pairs simultaneously;
PE8~11: can handle 128 data pairs simultaneously;
PE12~13: can handle 256 data pairs simultaneously;
PE14: can handle 512 data pairs simultaneously;
PE15: can handle 1024 data pairs simultaneously.
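The per-PE computation described above amounts to an element-wise update over one column's nonzero pairs. A software sketch, with the string tags "BFS"/"SSSP" standing in for the application-type signal from the control circuit:

```python
def pe_process(app_type, active_pair, column_pairs):
    """active_pair = (vertex index, current depth-or-distance value);
    column_pairs = the (row index, value) nonzero pairs of one CSCI column
    (column identifier excluded), as delivered by the scheduler."""
    _idx, active_val = active_pair
    if app_type == "BFS":
        # candidate depth for every neighbour: active depth + 1
        return [(row, active_val + 1) for row, _v in column_pairs]
    # SSSP: candidate distance = active distance + edge weight (pair value)
    return [(row, active_val + v) for row, v in column_pairs]
```

The min() of these candidates against the stored depths/distances is taken downstream, in the result generating unit.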
Referring to Fig. 6, the result generating unit (RGU) consists of an operation module (OPC), a comparator (CMP), and an on-chip result buffer (cur_rlt).
In conjunction with Fig. 7 and Fig. 6, the operation module (OPC) is constituted by an 8-way, 4-stage pipeline tree composed of 15 operation cells (op_cell).
The computing capacity of each op_cell equals the maximum number of data pairs the input data may include;
each op_cell calculates the intermediate data input by the MGP according to the application type input by the control circuit (CTR); the calculations carried out for breadth-first search (BFS) and single-source shortest path (SSSP) are both comparison operations, i.e., the values of the two input data pairs with the same row index are compared, and the smaller value is output to the next stage as the new value corresponding to that row index, until output from the last-stage op_cell, which is then input to the comparator (CMP);
for the data input by the operation module (OPC), the comparator (CMP) reads, one by one according to the row index of each data pair, the previous value corresponding to that row index from the on-chip result buffer (cur_rlt) and compares it with the current input value. If the current value is not smaller than the previous value, no operation is executed and the calculation for the next row index proceeds directly; if the current value is smaller, the value of the row index in the on-chip result buffer (cur_rlt) is updated, the vertex corresponding to the row index is marked as an active vertex, the row index is output to the data access unit (DAU), and the row-index/value data pair is output to the mixed-granularity processing unit (MGP). The on-chip result buffer (cur_rlt) is used to buffer the depth (for breadth-first search, BFS) or distance (for single-source shortest path, SSSP) of each vertex.
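The comparator's update rule is the final min-reduction against the on-chip result buffer. A sketch, with the buffer modeled as a dict from row index to the current depth/distance (infinity when unset), which is an assumption of this illustration:

```python
def compare_and_update(cur_rlt, pairs):
    """cur_rlt: on-chip result buffer, row index -> current depth/distance.
    pairs: (row index, candidate value) pairs from the operation module.
    Returns the row indices newly marked active (sent on to the DAU)."""
    new_active = []
    for row, value in pairs:
        if value < cur_rlt.get(row, float("inf")):  # update on strictly smaller
            cur_rlt[row] = value
            new_active.append(row)                   # vertex becomes active
    return new_active
```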
The present invention has been used in the "mixed-granularity parallel graph computation accelerator" project. Actual verification shows that the functions of the circuit meet the targets and that it works reliably, achieving the object of the invention.
It is to be understood that the above description of specific embodiments of the present invention is made simply to illustrate the technical route and features of the invention, so that those skilled in the art can understand the content of the invention and implement it accordingly; the present invention is not limited to the above specific embodiments. All changes and modifications made within the scope of the claims shall be covered by the scope of protection of the present invention.