CN110598175B - Sparse matrix column vector comparison device based on graph computation accelerator - Google Patents


Info

Publication number
CN110598175B
Authority
CN
China
Prior art keywords
comparison
vector
operation circuit
output module
comparison operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910877555.7A
Other languages
Chinese (zh)
Other versions
CN110598175A (en
Inventor
邓军勇
田璞
杨博文
赵一迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Posts and Telecommunications
Original Assignee
Xian University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Posts and Telecommunications
Priority to CN201910877555.7A
Publication of CN110598175A
Application granted
Publication of CN110598175B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Abstract

The invention discloses a sparse matrix column vector comparison device based on a graph computation accelerator, comprising N comparison operation circuits. All non-zero elements of the comparison vector are input to the first comparison operation circuit, and the non-zero elements of the target vector are input to the N comparison operation circuits respectively. Each comparison operation circuit includes an operation circuit, a direct output module and an intermediate output module. Each operation circuit compares all or part of the non-zero elements of the comparison vector with the one non-zero element of the target vector that is input to that comparison operation circuit. The N comparison operation circuits are connected in sequence through their intermediate output modules and direct output modules: the results of the intermediate output module and the direct output module of the (n-1)th comparison operation circuit are input to the nth comparison operation circuit. The last comparison operation circuit outputs the result vector of the sparse matrix column vector comparison. The device overcomes the lack, in the prior art, of a comparison device for two sparse matrix column vectors.

Description

Sparse matrix column vector comparison device based on graph computation accelerator
Technical Field
The invention relates to a graph computation accelerator technology, in particular to a sparse matrix column vector comparison device based on a graph computation accelerator.
Background
With the rise of new internet applications such as social networks and the spread of electronic devices of all kinds, graph computation, and large-scale graph computation in particular, has become a research hotspot in academia and industry. From the perspectives of technology, applications and independent intellectual property rights, research and development of graph computation accelerators is imperative. In designing a graph computation accelerator, the efficient comparison of sparse matrix column vectors compressed in the CSCI (independent compressed sparse column) format must be considered. Here, each non-zero element of each sparse matrix column vector is compressed into a data pair according to the CSCI format, and each data pair has the structure (index, value).
When the number of non-zero elements in a matrix is far smaller than the total number of matrix elements and the non-zero elements are irregularly distributed, the matrix is called a sparse matrix; conversely, if non-zero elements make up the majority, the matrix is called a dense matrix. To operate on dense matrix column vectors, all elements of the matrix must be stored, and an operation on two dense column vectors is applied directly to the elements in the same row.
However, the prior art provides no means for efficiently performing a comparison operation on two sparse matrix column vectors.
Disclosure of Invention
The invention aims to provide a sparse matrix column vector comparison device based on a graph computation accelerator, which is used for overcoming the defect that a comparison device for two sparse matrix column vectors does not exist in the prior art.
In order to achieve the purpose, the invention adopts the main technical scheme that:
In a first aspect, the present invention provides a sparse matrix column vector comparison device based on a graph computation accelerator, the sparse matrix column vectors comprising at least one comparison vector and at least one target vector, wherein the device comprises:
N comparison operation circuits, wherein the number of comparison operation circuits is larger than the maximum number of non-zero elements of the column vectors in the sparse matrix;
for each comparison vector, inputting all non-zero elements of the comparison vector into a first comparison operation circuit;
for each target vector, inputting all non-zero elements of the target vector to N comparison operation circuits respectively;
each of the comparison operation circuits includes an operation circuit, a direct output module and an intermediate output module; the operation circuit of each comparison operation circuit is used for comparing all or part of the non-zero elements of the comparison vector with the one non-zero element of the target vector input to that comparison operation circuit;
the N comparison operation circuits are connected in sequence through the intermediate output modules and the direct output modules; specifically, the results of the intermediate output module and the direct output module of the (n-1)th comparison operation circuit are respectively input to the nth comparison operation circuit, where n ranges over the N circuits;
the operation circuit of the nth comparison operation circuit transparently transmits the result of the direct output module of the (n-1) th comparison operation circuit, and the operation circuit of the nth comparison operation circuit processes the result of the intermediate output module of the (n-1) th comparison operation circuit;
and the last comparison operation circuit outputs a result vector after sparse matrix column vector comparison.
Optionally, the operation circuit of the first comparison operation circuit compares all non-zero elements of the comparison vector with the first non-zero element of the target vector based on a comparison strategy;
the comparison strategy is as follows: non-zero elements of the comparison vector whose row index is smaller than that of the first non-zero element of the target vector become the output of the direct output module; non-zero elements of the comparison vector whose row index is larger than that of the first non-zero element of the target vector become the output of the intermediate output module; and when a non-zero element of the comparison vector has the same row index as the first non-zero element of the target vector, the two non-zero elements with the same index are operated on;
and the operation circuit of the nth comparison operation circuit compares, based on the comparison strategy, the non-zero elements output by the intermediate output module of the (n-1)th comparison operation circuit with the non-zero element of the target vector that is input to the nth comparison operation circuit.
Optionally, the operation circuit comprises N1 grouping comparison units; all the grouping comparison units are connected in sequence;
for the operation circuit of the first comparison operation circuit, the comparison vector is divided into N1 groups in the order of its non-zero elements;
the first grouping comparison unit compares the first group of non-zero elements of the grouped comparison vector with the first non-zero element of the target vector based on the comparison strategy;
the n1-th grouping comparison unit compares the n1-th group of non-zero elements of the grouped comparison vector with the first non-zero element of the target vector based on the comparison strategy;
wherein n1 belongs to N1, and all of these quantities are natural numbers larger than 1.
Optionally, if the comparison vector and the target vector each have 64 non-zero elements and N and N1 are 8, each group of the grouped comparison vector has 8 non-zero elements, and the operation circuit has 8 grouping comparison units CU0 to CU7;
for the first grouping comparison unit CU0, if a non-zero element in the first group of the grouped comparison vector has the same row index as the first non-zero element of the target vector, a comparison operation is performed on the two non-zero elements with the same row index, and the result of the comparison operation is transmitted to the direct output module;
the other non-zero elements whose row index is smaller than that of the first non-zero element of the target vector are transmitted to the direct output module, and the non-zero elements whose row index is larger than that of the first non-zero element of the target vector are transmitted to the intermediate output module;
the remaining grouping comparison units CU1 to CU7 perform no further comparison and transmit their uncompared non-zero elements of the comparison vector to the intermediate output module connected to the operation circuit.
Optionally, the compression format of the sparse matrix is a CSCI compression format.
Optionally, the comparison operation of each grouping comparison unit in the operation circuit is determined by the application of the graph computation accelerator to which the sparse matrix column vector comparison device belongs.
Optionally, the comparison operation performed by the first grouping comparison unit CU0 on two non-zero elements with the same row index includes taking the smaller of the two values, based on the breadth-first search (BFS) algorithm.
In a second aspect, the present invention provides a graph computation accelerator, including the sparse matrix column vector comparing apparatus for a graph computation accelerator according to any one of the first aspect.
The invention has the beneficial effects that:
the device of the application is suitable for comparison operation of all sparse matrix column vectors, can improve operation efficiency, can be applied to different graph calculation applications, and improves applicability.
In addition, the device of the present invention is identical across the graph computation applications of any graph computation accelerator; only the operation applied to the values after the row index comparison depends on the particular graph computation application.
Drawings
FIG. 1 is a schematic diagram of a comparison structure of two sparse vectors according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a comparison operation circuit according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an internal circuit of a comparison unit according to an embodiment of the present invention;
FIGS. 4 to 13 are schematic diagrams illustrating a specific process of the sparse matrix column vector comparison device according to an embodiment of the present invention.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.
In order to better understand the solution of the embodiment of the present invention, the following outlines the apparatus of the embodiment of the present invention.
The sparse matrix in the embodiment of the invention is stored in the independent sparse column compression (CSCI) format, in which only the non-zero elements of the sparse matrix are stored in order to save storage space. As a result, when two column vectors are to be operated on, the elements in the same row must be found first; the operation cannot be performed directly. The CSCI graph data compression format is implemented as follows: each independently compressed column of graph data includes a column identification data pair and non-zero element data pairs, and each data pair includes an index and a value. The index of the column identification data pair gives the column number, and its value gives the number of non-zero elements in that column; the index of a non-zero element data pair gives the row number of the non-zero element, and its value gives the value of the non-zero element. In addition, in the embodiments of the present application, a sparse matrix column vector may be abbreviated as a sparse vector.
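As an illustration of the data-pair layout just described, the following is a minimal Python sketch. The function name compress_column, the zero-based row numbering and the use of tuples are assumptions made for this example; the text does not specify the bit-level encoding.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (index, value)


def compress_column(col_no: int, dense_col: List[int]) -> List[Pair]:
    """Compress one dense column into CSCI-style data pairs.

    Hypothetical helper: the first pair is the column identification pair
    (index = column number, value = count of non-zero elements), followed by
    one pair per non-zero element (index = row number, value = element value).
    Rows are numbered from 0 here, which is an assumption of this sketch.
    """
    nonzeros = [(row, v) for row, v in enumerate(dense_col) if v != 0]
    return [(col_no, len(nonzeros))] + nonzeros


column = compress_column(2, [0, 5, 0, 0, 7, 0])
print(column)  # [(2, 2), (1, 5), (4, 7)]
```

Only two of the six entries are stored, which is why matching row indexes must be searched for before two such columns can be combined.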
In graph computation applications such as BFS (Breadth-First Search) and SSSP (Single-Source Shortest Path), operations must be performed on the column vectors of a sparse matrix. When two column vectors are operated on, it must first be determined whether their non-zero elements lie in the same row, so the row indexes of the non-zero elements of the sparse column vectors have to be compared. After the row indexes have been compared, the values are operated on as the particular graph computation application requires.
The sparse matrix in this application is compressed in the CSCI format, and the device implements operations on two sparse matrix column vectors. The implementation is shown in FIG. 1. Assume the two column vectors are a comparison vector and a target vector. The circuit consists of n comparison operation circuits; the n non-zero elements of the comparison vector are all input to the first comparison operation circuit, and the n non-zero elements of the target vector are input to the n comparison operation circuits, one per circuit. First, the first non-zero element of the target vector is compared with all non-zero elements of the comparison vector. After this comparison, the comparison vector elements whose row index is smaller than that of the first target element are output directly and take no part in the next stage, while those whose row index is larger are passed on as the input of the second comparison operation circuit. There they are compared with the second non-zero element of the target vector: elements with a smaller row index are again output directly, and elements with a larger row index become the input of the next comparison operation circuit, and so on. Each successive comparison operation circuit therefore receives fewer non-zero elements, so each target vector element finds the comparison vector element with the same index more quickly.
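The cascade described above can be modelled in software. The following is a behavioural sketch only, not the hardware design: the names compare_stage and compare_vectors are hypothetical, the value operation defaults to min (the BFS example given in the text), and two points are assumptions of this sketch: an unmatched target element is sent to the direct output, and elements still pending after the last stage are appended unchanged.

```python
def compare_stage(candidates, target, op=min):
    """Model one comparison operation circuit.

    candidates: (index, value) pairs of the comparison vector still in play,
                sorted by row index.
    target:     the single (index, value) pair assigned to this stage.
    Returns (direct, intermediate) as two lists of pairs.
    """
    t_idx, t_val = target
    direct = [(i, v) for i, v in candidates if i < t_idx]
    intermediate = [(i, v) for i, v in candidates if i > t_idx]
    match = [v for i, v in candidates if i == t_idx]
    if match:
        # Same row index: operate on the two values.
        direct.append((t_idx, op(t_val, match[0])))
    else:
        # Assumption: an unmatched target element is output directly.
        direct.append(target)
    return direct, intermediate


def compare_vectors(comparison, target_vec, op=min):
    """Chain one stage per target element, as in the cascade of FIG. 1."""
    result, pending = [], list(comparison)
    for target in target_vec:
        direct, pending = compare_stage(pending, target, op)
        result.extend(direct)   # direct outputs pass through later stages
    result.extend(pending)      # assumption: leftovers are output unchanged
    return result


comparison = [(1, 4), (3, 9), (6, 2)]
target_vec = [(3, 5), (4, 8)]
print(compare_vectors(comparison, target_vec))
# [(1, 4), (3, 5), (4, 8), (6, 2)]
```

In the example, the second stage receives only one pending element instead of three, which is the shrinking-input effect the paragraph describes.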
Example one
The embodiment of the invention provides a sparse matrix column vector comparison device based on a graph computation accelerator, the sparse matrix column vectors comprising at least one comparison vector and at least one target vector, wherein the device comprises:
N comparison operation circuits, wherein the number of comparison operation circuits is larger than the maximum number of non-zero elements of the column vectors in the sparse matrix, and N is a natural number greater than 1, as shown in FIG. 1.
For each comparison vector, inputting all non-zero elements of the comparison vector into a first comparison operation circuit;
for each target vector, inputting all non-zero elements of the target vector to N comparison operation circuits respectively;
each of the comparison operation circuits includes an operation circuit, a direct output module and an intermediate output module; the operation circuit of each comparison operation circuit is used for comparing all or part of the non-zero elements of the comparison vector with the one non-zero element of the target vector input to that comparison operation circuit;
the N comparison operation circuits are connected in sequence through the intermediate output modules and the direct output modules; specifically, the results of the intermediate output module and the direct output module of the (n-1)th comparison operation circuit are respectively input to the nth comparison operation circuit, where n ranges over the N circuits;
the operation circuit of the nth comparison operation circuit transparently transmits the result of the direct output module of the (n-1) th comparison operation circuit, and the operation circuit of the nth comparison operation circuit processes the result of the intermediate output module of the (n-1) th comparison operation circuit;
and the last comparison operation circuit outputs a result vector after sparse matrix column vector comparison.
In this embodiment, the operation circuit of the first comparison operation circuit compares all non-zero elements of the comparison vector with the first non-zero element of the target vector based on a comparison strategy;
the comparison strategy is as follows: non-zero elements of the comparison vector whose row index is smaller than that of the first non-zero element of the target vector become the output of the direct output module; non-zero elements of the comparison vector whose row index is larger than that of the first non-zero element of the target vector become the output of the intermediate output module; and when a non-zero element of the comparison vector has the same row index as the first non-zero element of the target vector, the two non-zero elements with the same index are operated on;
and the operation circuit of the nth comparison operation circuit compares, based on the comparison strategy, the non-zero elements output by the intermediate output module of the (n-1)th comparison operation circuit with the non-zero element of the target vector that is input to the nth comparison operation circuit.
It should be noted that the number of zero elements in the sparse matrix is much larger than the number of non-zero elements. When two column vectors in a sparse matrix are operated on, for convenience of explanation, the two column vectors are referred to as a target vector and a comparison vector, respectively, in the present application. The number of non-zero elements in the two vectors is uncertain and may be equal or unequal.
With N comparison operation circuits, the above device can be used for operations on two sparse column vectors whenever the number of non-zero elements of a column vector is less than N. The number of comparison operation circuits can of course be expanded to suit the number of non-zero elements of the column vectors.
In this embodiment, the last comparison operation circuit of the device may omit the intermediate output module.
In a specific implementation, the foregoing operation circuit may include N1 grouping comparison units; all the grouping comparison units are connected in sequence;
for the operation circuit of the first comparison operation circuit, the comparison vector is divided into N1 groups in the order of its non-zero elements;
the first grouping comparison unit compares the first group of non-zero elements of the grouped comparison vector with the first non-zero element of the target vector based on the comparison strategy;
the n1-th grouping comparison unit compares the n1-th group of non-zero elements of the grouped comparison vector with the first non-zero element of the target vector based on the comparison strategy;
wherein n1 belongs to N1.
For better understanding, take the following example: if the comparison vector and the target vector each have 64 non-zero elements and N and N1 are 8, each group of the grouped comparison vector has 8 non-zero elements, and the operation circuit has 8 grouping comparison units CU0 to CU7;
for the first grouping comparison unit CU0, if a non-zero element in the first group of the grouped comparison vector has the same row index as the first non-zero element of the target vector, a comparison operation is performed on the two non-zero elements with the same row index, and the result of the comparison operation is transmitted to the direct output module. It should be noted that different graph computation applications require different operations on the values. The invention takes the breadth-first search (BFS) algorithm as an example, in which the smaller of the two values is taken.
The other non-zero elements whose row index is smaller than that of the first non-zero element of the target vector are transmitted to the direct output module, and the non-zero elements whose row index is larger than that of the first non-zero element of the target vector are transmitted to the intermediate output module;
the remaining grouping comparison units CU1 to CU7 perform no further comparison and transmit their uncompared non-zero elements of the comparison vector to the intermediate output module connected to the operation circuit.
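The 64-element example above can be traced with a short Python sketch. The helper split_into_groups, the sample row indexes and the sample values are all made up for illustration; only the grouping into 8 groups of 8 and the routing of CU0 versus CU1 to CU7 follow the text.

```python
def split_into_groups(pairs, n_groups=8):
    """Split sorted (index, value) pairs into n_groups equal, ordered groups."""
    if len(pairs) % n_groups != 0:
        raise ValueError("this sketch assumes the pairs divide evenly")
    size = len(pairs) // n_groups
    return [pairs[g * size:(g + 1) * size] for g in range(n_groups)]


# 64 made-up comparison-vector pairs with even row indexes 0, 2, ..., 126.
comparison = [(row, 100 + row) for row in range(0, 128, 2)]
groups = split_into_groups(comparison)   # groups[0] plays the role of CU0
target = (6, 1)                          # row 6 lies inside the first group

cu0 = groups[0]
# CU0: smaller indexes go to the direct output, the match is operated on
# (min, as in the BFS example), larger indexes go to the intermediate output.
direct = [(i, v) for i, v in cu0 if i < target[0]]
direct.append((target[0], min(target[1], dict(cu0)[target[0]])))
intermediate = [(i, v) for i, v in cu0 if i > target[0]]

# CU1..CU7: no comparison needed, forward everything to the intermediate output.
for group in groups[1:]:
    intermediate.extend(group)

print(direct)             # [(0, 100), (2, 102), (4, 104), (6, 1)]
print(len(intermediate))  # 60
```

Four pairs leave through the direct output and the remaining 60 are handed to the next comparison operation circuit.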
In this embodiment, the compression format of the sparse matrix is a CSCI compression format.
In particular, the comparison operation of each grouping comparison unit in the operation circuit is determined by the application of the graph computation accelerator to which the sparse matrix column vector comparison device belongs.
For example, the comparison operation performed by the first grouping comparison unit CU0 on two non-zero elements with the same row index includes taking the smaller of the two values, based on the breadth-first search (BFS) algorithm.
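One way to picture an application-dependent value operation is a small lookup table, sketched below in Python. Only the BFS entry (take the smaller value) comes from the text; the table VALUE_OPS, the function combine and the "example_sum" entry are hypothetical placeholders that show how another operation could be plugged in.

```python
import operator

# Value operation applied once two row indexes match, keyed by application.
VALUE_OPS = {
    "bfs": min,                   # from the text: take the smaller value
    "example_sum": operator.add,  # hypothetical placeholder, not from the text
}


def combine(app, target_value, comparison_value):
    """Apply the value operation selected for the given application."""
    return VALUE_OPS[app](target_value, comparison_value)


print(combine("bfs", 5, 9))          # 5
print(combine("example_sum", 5, 9))  # 14
```

The index-comparison logic stays the same in every case; only the function looked up in the table changes.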
The device of the application is suitable for comparison operation of all sparse matrix column vectors, can improve operation efficiency, can be applied to different graph calculation applications, and improves applicability.
Example two
This embodiment explains the comparison method of the foregoing device, assuming that each sparse vector has n non-zero elements.
The operation circuit groups the elements of the sparse vector to be compared and then compares their indexes with the index of the target vector element. If an element has the same index, the operation on the values is carried out, and the index and the resulting value are concatenated, for example as (index, value), to form the output. Meanwhile, the non-zero elements of the comparison vector whose indexes are smaller than the output index can be output directly and need not take part in comparisons with the subsequent elements of the target vector.
The direct output module transmits the non-zero elements that each stage's operation circuit determines can be output directly, and also passes through the vector elements that an earlier stage's comparison has already marked for output.
The intermediate output module outputs the remaining non-zero elements of the comparison vector, which must go to the next stage to be compared with the next non-zero element of the target vector, as the input of the next stage.
For example, the operation circuit may include 8 grouping comparison units.
The grouping comparison unit compares one target vector element with 8 elements of the comparison vector. The comparison vector is divided sequentially into groups of 8 non-zero elements. The index of the target vector element is first compared with the index of the first element of the group. If it is less than or equal to that index, the comparison finishes: the corresponding result is output directly, and the other elements of the comparison vector are output to the next stage for comparison. If it is greater than the first element's index, it is compared with the index of the last element of the group. If the two are equal, the corresponding result is output, the comparison at this stage finishes, and the comparison vector elements with smaller indexes are sent to the direct output module. If the target index is greater than the last element's index, the target element goes on to be compared with the next group of the comparison vector. If it is smaller than the last element's index, the comparison continues by bisection until the result for this stage is obtained; finally, the vector elements that can be output directly are sent to the direct output module, and the comparison vector elements that must be compared further in the next stage are sent to the intermediate output module.
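The behaviour of one grouping comparison unit can be sketched in Python as follows. This is a software model under stated assumptions: the function group_compare and its status strings are hypothetical, the interior search uses the standard library's bisect_left rather than the specific probe order of the hardware, and an unmatched target element is placed in the direct output.

```python
from bisect import bisect_left


def group_compare(group, target, op=min):
    """Model one grouping comparison unit (CU).

    group:  sorted (index, value) pairs of one group of the comparison vector.
    target: one (index, value) pair of the target vector.
    Returns (status, direct, intermediate); status is "done" when this stage
    is finished, or "next_group" when the target must try the next group.
    """
    t_idx, t_val = target
    first_idx, last_idx = group[0][0], group[-1][0]

    if t_idx < first_idx:
        # Target precedes the whole group: output it, keep the group pending.
        return "done", [target], list(group)
    if t_idx > last_idx:
        # Whole group precedes the target: output the group, try the next one.
        return "next_group", list(group), []

    # Target lies within the group's index range: binary search for its slot.
    indices = [i for i, _ in group]
    pos = bisect_left(indices, t_idx)
    if indices[pos] == t_idx:
        merged = (t_idx, op(t_val, group[pos][1]))
        return "done", group[:pos] + [merged], group[pos + 1:]
    # No element with the same index (assumption: target is output directly).
    return "done", group[:pos] + [target], group[pos:]


group = [(2 * k, 10 + k) for k in range(8)]   # row indexes 0, 2, ..., 14
print(group_compare(group, (6, 1)))
# ('done', [(0, 10), (2, 11), (4, 12), (6, 1)], [(8, 14), (10, 15), (12, 16), (14, 17)])
```

Checking the first and last indexes before searching lets a unit dispose of a whole group with at most two comparisons when the target lies outside it.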
The foregoing apparatus and operation will be described in detail with reference to fig. 1 to 3.
As shown in FIG. 1, assuming that the target sparse vector has n non-zero elements, the sparse vector comparison implemented by the present invention consists of n comparison operation circuits. Each comparison operation circuit is assigned one non-zero element of the target vector, and all non-zero elements of the comparison vector are the input of the first comparison operation circuit. After the first comparison operation circuit, the vector elements that can be output directly and the vector elements that must go on to be compared with the second non-zero element of the target vector are output separately to the second comparison operation circuit, and so on until the last non-zero element of the target vector has been compared.
As shown in FIG. 2, the comparison operation circuit of this embodiment includes an operation circuit, an intermediate output module and a direct output module.
The operation circuit groups the non-zero elements of the comparison vector 8 to a group, giving 8 grouping comparison units (CUs), each of which performs index comparison. The intermediate output module temporarily stores the data to be output to the next comparison operation circuit for comparison with the next target vector element. The direct output module temporarily stores the vector elements determined to be directly outputtable; each subsequent comparison operation circuit then passes them through transparently until the final output.
As shown in FIG. 3, the grouping comparison unit CU compares one non-zero element of the target vector with 8 non-zero elements of the comparison vector. Here target denotes a target vector element and com denotes a comparison vector element; the code 00 means that the index of the target vector element equals the index of the comparison vector element, 01 means it is less, and 10 means it is greater. The comparison steps are as follows:
The first step: the target element index target_index is compared with com0_index.
(1) If the comparison result is equal (00), the target value and the com0 value are combined by the operation that the graph computation application requires; for example, based on the breadth-first search (BFS) algorithm, the smaller of the two values is taken.
The result is then concatenated with the index and temporarily stored in the direct output module, after which each subsequent comparison operation circuit passes it through transparently. com0's successors com1 to com7 are temporarily stored in the intermediate output module and output to the next stage for comparison with the next non-zero element of the target vector. The operation of this stage's comparison operation circuit then ends.
(2) If the comparison result is less than (01), the target element is temporarily stored directly in the direct output module and then passed through transparently by each subsequent comparison operation circuit. com0 to com7 are temporarily stored in the intermediate output module and output to the next stage for comparison with the next non-zero element of the target vector. The operation of this stage's comparison operation circuit then ends.
(3) If the comparison result is greater than (10), jump to the second step.
The second step: the target element index target_index is compared with com7_index.
(1) If the comparison result is equal (00), the smaller of the target value and the com7 value is taken, concatenated with the index and temporarily stored in the direct output module, then passed through transparently by each subsequent comparison operation circuit. com0 to com6 are likewise temporarily stored in the direct output module and passed through transparently by each subsequent comparison operation circuit. The operation of this stage's comparison operation circuit then ends.
(2) If the comparison result is greater than (10), com0 to com7 are temporarily stored in the direct output module and then passed through transparently by each subsequent comparison operation circuit, and the target element must be compared with the next group of comparison vector elements.
(3) If the comparison result is less than (01), jump to the third step.
The third step: the target element index target_index is compared with com6_index.
(1) If the comparison result is equal (00), the smaller of the target value and the com6 value is taken, concatenated with the index and temporarily stored in the direct output module, then passed through transparently by each subsequent comparison operation circuit. com0 to com5 are likewise temporarily stored in the direct output module and passed through transparently by each subsequent comparison operation circuit. The operation of this stage's comparison operation circuit then ends.
(2) If the comparison result is greater than (10), the target element and com0 to com5 are temporarily stored in the direct output module and then passed through transparently by each subsequent comparison operation circuit, and com7 goes through the intermediate output module to be compared with the next element of the target vector. The operation of this stage's comparison operation circuit then ends.
(3) If the comparison result is less than (01), jump to the fourth step.
The fourth step: the target element index target_index is compared with com3_index.
(1) If the comparison result is equal (00), the target value and the com3 value are compared and the smaller value is taken; the smaller value is spliced with the index, temporarily stored in the direct output module, and then passed transparently through each subsequent stage of comparison operation circuit. com0 to com2 are likewise temporarily stored in the direct output module and passed through each subsequent stage. The operation of this stage's comparison operation circuit is then finished.
(2) If the comparison result is less than (01), jump to the fifth (1) step.
(3) If the comparison result is greater than (10), jump to the fifth (2) step.
The fifth (1) step: the target element index target_index is compared with com2_index.
(1) If the comparison result is equal (00), the target value and the com2 value are compared and the smaller value is taken; the smaller value is spliced with the index, temporarily stored in the direct output module, and then passed transparently through each subsequent stage of comparison operation circuit. com0 and com1 are likewise temporarily stored in the direct output module and passed through each subsequent stage. com3 to com7 are temporarily stored in the intermediate output module and output to the next-stage comparison operation circuit for comparison with the next non-zero element of the target vector. The operation of this stage's comparison operation circuit is then finished.
(2) If the comparison result is greater than (10), the target and com0 to com2 are temporarily stored in the direct output module and then passed transparently through each subsequent stage of comparison operation circuit; com3 to com7 are temporarily stored in the intermediate output module and output to the next-stage comparison operation circuit for comparison with the next non-zero element of the target vector. The operation of this stage's comparison operation circuit is then finished.
(3) If the comparison result is less than (01), jump to the sixth (1) step.
The fifth (2) step: the target element index target_index is compared with com4_index.
(1) If the comparison result is equal (00), the target value and the com4 value are compared and the smaller value is taken; the smaller value is spliced with the index, temporarily stored in the direct output module, and then passed transparently through each subsequent stage of comparison operation circuit. com0 to com3 are likewise temporarily stored in the direct output module and passed through each subsequent stage. com5 to com7 are temporarily stored in the intermediate output module and output to the next-stage comparison operation circuit for comparison with the next non-zero element of the target vector. The operation of this stage's comparison operation circuit is then finished.
(2) If the comparison result is less than (01), the target and com0 to com3 are temporarily stored in the direct output module and then passed transparently through each subsequent stage of comparison operation circuit; com4 to com7 are temporarily stored in the intermediate output module and output to the next-stage comparison operation circuit for comparison with the next non-zero element of the target vector. The operation of this stage's comparison operation circuit is then finished.
(3) If the comparison result is greater than (10), jump to the sixth (2) step.
The sixth (1) step: the target element index target_index is compared with com1_index.
(1) If the comparison result is equal (00), the target value and the com1 value are compared and the smaller value is taken; the smaller value is spliced with the index, temporarily stored in the direct output module, and then passed transparently through each subsequent stage of comparison operation circuit. com0 is likewise temporarily stored in the direct output module and passed through each subsequent stage. com2 to com7 are temporarily stored in the intermediate output module and output to the next-stage comparison operation circuit for comparison with the next non-zero element of the target vector. The operation of this stage's comparison operation circuit is then finished.
(2) If the comparison result is less than (01), the target and com0 are temporarily stored in the direct output module and then passed transparently through each subsequent stage of comparison operation circuit; com1 to com7 are temporarily stored in the intermediate output module and output to the next-stage comparison operation circuit for comparison with the next non-zero element of the target vector. The operation of this stage's comparison operation circuit is then finished.
(3) If the comparison result is greater than (10), the target, com0 and com1 are temporarily stored in the direct output module and then passed transparently through each subsequent stage of comparison operation circuit; com2 to com7 are temporarily stored in the intermediate output module and output to the next-stage comparison operation circuit for comparison with the next non-zero element of the target vector. The operation of this stage's comparison operation circuit is then finished.
The sixth (2) step: the target element index target_index is compared with com5_index.
(1) If the comparison result is equal (00), the target value and the com5 value are compared and the smaller value is taken; the smaller value is spliced with the index (for example, (index, value)), temporarily stored in the direct output module, and then passed transparently through each subsequent stage of comparison operation circuit. com0 to com4 are likewise temporarily stored in the direct output module and passed through each subsequent stage. com6 and com7 are temporarily stored in the intermediate output module and output to the next-stage comparison operation circuit for comparison with the next non-zero element of the target vector. The operation of this stage's comparison operation circuit is then finished.
(2) If the comparison result is less than (01), the target and com0 to com4 are temporarily stored in the direct output module and then passed transparently through each subsequent stage of comparison operation circuit; com5 to com7 are temporarily stored in the intermediate output module and output to the next-stage comparison operation circuit for comparison with the next non-zero element of the target vector. The operation of this stage's comparison operation circuit is then finished.
(3) If the comparison result is greater than (10), the target and com0 to com5 are temporarily stored in the direct output module and then passed transparently through each subsequent stage of comparison operation circuit; com6 and com7 are temporarily stored in the intermediate output module and output to the next-stage comparison operation circuit for comparison with the next non-zero element of the target vector. The operation of this stage's comparison operation circuit is then finished.
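The probe order of the steps above (com0, com7, com6, com3, then com2 or com4, then com1 or com5) can be sketched in software. The following Python sketch is illustrative only and is not the circuit itself. The names `code`, `compare_group`, `op` and `carry` are assumptions, elements are assumed to be (index, value) pairs sorted by row index, and the direct output is listed in index order. The equal branch of the first step is assumed to merge the target with com0 and send com1 to com7 to the intermediate output. In the greater-than branch of the third step the sketch also routes com6 to the direct output, because its index is below the target's; that too is an assumption of the sketch.

```python
# Illustrative sketch of the probe order for one group of 8 comparison
# elements. All names here are hypothetical, not taken from the patent.
EQ, LT, GT = "00", "01", "10"  # 2-bit result codes used in the steps above


def code(target_index, com_index):
    """Return the 2-bit code for target_index relative to com_index."""
    if target_index == com_index:
        return EQ
    return LT if target_index < com_index else GT


def compare_group(target, com, op=min):
    """Route one target element against 8 index-sorted (index, value) pairs.

    Returns (direct, intermediate, carry, probes). carry is the target when
    it is larger than every element of the group and must meet the next group.
    """
    com = list(com)
    t_idx, t_val = target
    probes = []

    def probe(k):
        probes.append(k)
        return code(t_idx, com[k][0])

    def settle(k, result):
        # Elements below the target go to direct output, elements above it
        # go to intermediate output; an equal index is merged with op.
        if result == EQ:
            merged = (t_idx, op(t_val, com[k][1]))
            return com[:k] + [merged], com[k + 1:], None, probes
        cut = k if result == LT else k + 1
        return com[:cut] + [target], com[cut:], None, probes

    r = probe(0)                      # first step
    if r != GT:
        return settle(0, r)
    r = probe(7)                      # second step
    if r == EQ:
        return settle(7, r)
    if r == GT:                       # target belongs to a later group
        return com, [], target, probes
    r = probe(6)                      # third step
    if r != LT:
        return settle(6, r)
    r = probe(3)                      # fourth step
    if r == EQ:
        return settle(3, r)
    if r == LT:
        r = probe(2)                  # fifth (1) step
        if r != LT:
            return settle(2, r)
        return settle(1, probe(1))    # sixth (1) step
    r = probe(4)                      # fifth (2) step
    if r != GT:
        return settle(4, r)
    return settle(5, probe(5))        # sixth (2) step
```

Passing `op` as a parameter reflects that the operation applied to equal indices depends on the application; `min` corresponds to the BFS case described in this document.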
EXAMPLE III
As shown in Fig. 4, in CU0 the first non-zero element of the target vector is compared with the first group of 8 non-zero elements of the grouped comparison vector.
If the minimum row index (compare0_index) of the non-zero elements in the first group of the comparison vector is greater than the row index (target0_index) of the first non-zero element of the target vector, the first non-zero element of the target vector can be output directly to the direct output module, and the grouping comparison units CU0 to CU7 perform no further operation. The non-zero elements of the comparison vector that are no longer compared are output to the intermediate output module connected to the operation circuit and then serve as the input comparison vector of the second comparison operation circuit, where comparison continues with the second non-zero element of the target vector, as shown at ① in Fig. 4.
If a non-zero element in the first group of the comparison vector has the same row index as the first non-zero element of the target vector, the corresponding operation is performed on the values of the two non-zero elements and the result is transmitted to the direct output module. The other non-zero elements of the comparison vector whose indices are smaller than the index of the first non-zero element of the target vector are also transmitted to the direct output module, and the remaining non-zero elements, whose indices are larger than that index, are transmitted to the intermediate output module. The remaining grouping comparison units CU1 to CU7 perform no further operation and output the non-zero elements of the comparison vector that are no longer compared to the intermediate output module connected to the operation circuit; these then serve as the input comparison vector of the second comparison operation circuit, where comparison continues with the second non-zero element of the target vector, as shown at ② in Fig. 4.
If the row indices of the non-zero elements in the first group of the comparison vector are all smaller than the row index of the first non-zero element of the target vector, the first group of non-zero elements of the comparison vector is output to the direct output module, and the row index of the first non-zero element of the target vector is then compared with the indices of the 8 non-zero elements of the second group of the comparison vector in the second grouping comparison unit, and so on with the same comparison method, as shown at ③ in Fig. 4.
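The three cases above can be illustrated at the level of whole groups. The following Python sketch is an assumption-laden illustration, not the circuit: `dispatch_groups` and `op` are hypothetical names, each group stands for the input of one grouping comparison unit, groups hold (index, value) pairs sorted by row index, and the group size is left free although this document uses 8.

```python
def dispatch_groups(target, groups, op=min):
    """Route one target element against index-sorted groups of (index, value).

    Returns (direct, intermediate) for one comparison operation circuit.
    """
    t_idx, t_val = target
    direct, intermediate = [], []
    for g, group in enumerate(groups):
        if not group:
            continue
        if group[-1][0] < t_idx:
            # Third case: every index in the group is smaller, so the whole
            # group goes to direct output and the next unit takes over.
            direct.extend(group)
            continue
        # First and second cases: the target's position lies before or
        # inside this group.
        for idx, val in group:
            if idx < t_idx:
                direct.append((idx, val))
            elif idx == t_idx:
                t_val = op(t_val, val)   # same row index: combine the values
            else:
                intermediate.append((idx, val))
        direct.append((t_idx, t_val))
        # Later units stay idle; their elements pass to intermediate output.
        for rest in groups[g + 1:]:
            intermediate.extend(rest)
        return direct, intermediate
    # The target is larger than every comparison element (sketch assumption:
    # it is then appended to the direct output).
    direct.append(target)
    return direct, intermediate
```

A linear scan inside the group is used here for brevity; it yields the same split as the step-by-step probing described earlier but does not model the probe order.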
In addition, assume there are two sparse column vectors, as shown in Fig. 5. In each data pair, the number before the comma is the row index of the non-zero element and the number after the comma is its value. The target vector and the comparison vector are assigned arbitrarily. As shown in Fig. 5, the target vector (target) has 7 non-zero elements and the comparison vector (compare) has 8 non-zero elements; they are mapped onto the structure of the invention as shown in Fig. 6, where mid denotes an intermediate output result and imm denotes a direct output result. There are 8 comparison operation circuits, and the element of the comparison vector input to the last comparison operation circuit is set to 0.
According to the comparison operation of the BFS in the foregoing example, if there is a comparison vector element with the same index as the target element, the numerical values of the two elements are compared, and the smaller value is taken; if not, the target element is directly output.
Comparison process of comparison operation circuit 0: the index of the first non-zero element (4,2) of the target vector is compared with the indices of all non-zero elements (15,2), (18,3), (24,1), (26,3), (28,2), (35,2), (44,1), (52,3) of the comparison vector. The row index of (4,2) is smaller than the row index of every non-zero element of the comparison vector, so (4,2) can be output directly. The result is as follows:
Figure GDA0002685595850000151
Comparison process of comparison operation circuit 1: the index of the second non-zero element (15,3) of the target vector is compared with the indices of the elements (15,2), (18,3), (24,1), (26,3), (28,2), (35,2), (44,1), (52,3) of the intermediate output result mid from comparison operation circuit 0. The row index of (15,3) is the same as that of element (15,2) in mid, so the values of the two elements are compared and the smaller one is taken. The comparison result (15,2) and the direct output result (4,2) passed on from the previous stage then form the direct output result of comparison operation circuit 1, as follows:
Figure GDA0002685595850000152
Comparison process of comparison operation circuit 2: the index of the third non-zero element (20,1) of the target vector is compared with the indices of the elements (18,3), (24,1), (26,3), (28,2), (35,2), (44,1), (52,3) of the intermediate output result mid from comparison operation circuit 1. The row index of the target vector element (20,1) is larger than that of element (18,3) and smaller than that of element (24,1), so the intermediate output result and the direct output result of this stage are as follows:
Figure GDA0002685595850000161
Comparison process of comparison operation circuit 3: the index of the fourth non-zero element (26,3) of the target vector is compared with the indices of the elements (24,1), (26,3), (28,2), (35,2), (44,1), (52,3) of the intermediate output result mid from comparison operation circuit 2. The row index of the target element (26,3) is greater than that of element (24,1) and equal to that of element (26,3), so the output result of this stage is as follows:
Figure GDA0002685595850000162
Comparison process of comparison operation circuit 4: the index of the fifth non-zero element (27,2) of the target vector is compared with the indices of the elements (28,2), (35,2), (44,1), (52,3) of the intermediate output result mid from comparison operation circuit 3. The row index of the target element (27,2) is smaller than the row indices of all these elements, so the output result of this stage is as follows:
Figure GDA0002685595850000163
Comparison process of comparison operation circuit 5: the index of the sixth non-zero element (44,2) of the target vector is compared with the indices of the elements (28,2), (35,2), (44,1), (52,3) of the intermediate output result mid from comparison operation circuit 4, and the results are as follows:
Figure GDA0002685595850000171
Comparison process of comparison operation circuit 6: the index of the seventh non-zero element (52,1) of the target vector is compared with the index of the element (52,3) of the intermediate output result mid from comparison operation circuit 5. The comparison ends at this point; there is no intermediate output result, so the intermediate output result can be set to 0. The results are as follows:
Figure GDA0002685595850000172
Comparison operation circuit 7 performs no further operation and directly passes on the direct output result imm from comparison operation circuit 6 as the final output result; the last comparison operation circuit has no intermediate output result.
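The stage-by-stage walkthrough above can be reproduced with a short software model. The following Python sketch is a hedged illustration rather than the hardware: `compare_stage` and `compare_vectors` are hypothetical names, each loop iteration stands for one comparison operation circuit, the vectors are the ones quoted in the walkthrough, and the idle circuit 7 is not modeled because it only passes imm through.

```python
def compare_stage(target, mid_in, imm_in, op=min):
    """Model one comparison operation circuit; returns (imm_out, mid_out)."""
    t_idx, t_val = target
    imm_out = list(imm_in)            # direct results pass through unchanged
    mid_out = []
    for idx, val in mid_in:
        if idx < t_idx:
            imm_out.append((idx, val))    # smaller index: direct output
        elif idx == t_idx:
            t_val = op(t_val, val)        # BFS case: keep the smaller value
        else:
            mid_out.append((idx, val))    # larger index: intermediate output
    imm_out.append((t_idx, t_val))
    return imm_out, mid_out


def compare_vectors(target_vec, compare_vec, op=min):
    """Chain the stages: stage n consumes the mid and imm of stage n-1."""
    imm, mid = [], list(compare_vec)
    for stage, target in enumerate(target_vec):
        imm, mid = compare_stage(target, mid, imm, op)
        print(f"circuit {stage}: imm={imm} mid={mid}")
    return imm, mid


target_vec = [(4, 2), (15, 3), (20, 1), (26, 3), (27, 2), (44, 2), (52, 1)]
compare_vec = [(15, 2), (18, 3), (24, 1), (26, 3),
               (28, 2), (35, 2), (44, 1), (52, 3)]
result, leftover = compare_vectors(target_vec, compare_vec)
```

Run as written, the mid lists printed for circuits 0 to 5 match the element lists quoted in the walkthrough, and circuit 6 leaves an empty intermediate output. The function returns any leftover intermediate elements to the caller rather than deciding what to do with them, since this example has none.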
With respect to Fig. 3, the following explanation is given for ease of understanding. Assume that the target vector element is (5,2); the row index of this element is 5 (i.e., the element is in the fifth row of the target vector) and its value is 2. The 8 non-zero elements of the first group of the comparison vector are (3,1), (5,1), (8,3), (11,2), (15,3), (19,2), (20,3), (27,5), and may also be represented as
Figure GDA0002685595850000181
on number axes, as shown in Fig. 7.
According to the comparison method of Fig. 3, the row index of the target vector element (target) (5,2) is first compared with that of the first non-zero element (com0) (3,1) of the comparison vector, as shown in Fig. 8. The row index of target (5,2) is clearly greater than that of com0 (3,1), so com0 (3,1) can be sent to the direct output circuit.
Referring next to Figs. 9 to 13, the row indices of target (5,2) and com7 (27,5) are compared, as shown in Fig. 9: the row index of target is 5 and that of com7 is 27, so the row index of target is clearly less than that of com7. The row index of target is then compared with that of com6, as shown in Fig. 10, and is again less. The row indices of com1 to com5 have not yet been compared with that of target, so a binary search is used: the row index of target is compared with that of com3, as shown in Fig. 11, and is still less. The next step compares the row index of target with that of com2, as shown in Fig. 12, and the row index of target is less than that of com2. The next step compares it with the row index of com1, as shown in Fig. 13, and the row index of target is equal to that of com1. The smaller of the values of target and com1 is then taken: the value of target is 2 and the value of com1 is 1, so the smaller value is 1, and the result of comparing target with com1 is (5,1).
Note that splicing (concatenation) means combining the index 5 and the numerical comparison result 1 into a data pair, e.g., (5,1).
The comparison between target and com0 to com7 is now complete, and the directly output vector elements are
Figure GDA0002685595850000182
The vector elements which are indirectly output and need to be output to the next comparison operation unit are
Figure GDA0002685595850000191
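The outcome of this walkthrough can be checked in software with a standard binary search. The sketch below swaps in Python's `bisect_left` for the fixed probe order of Fig. 3, so it illustrates only the resulting split, not the circuit's comparison sequence; `split_group` and `op` are hypothetical names.

```python
from bisect import bisect_left


def split_group(target, com, op=min):
    """Split index-sorted (index, value) pairs around one target element.

    Returns (direct, intermediate). On an index match the two values are
    combined with op and spliced with the index into one data pair.
    """
    t_idx, t_val = target
    indices = [idx for idx, _ in com]
    pos = bisect_left(indices, t_idx)     # first position with index >= t_idx
    if pos < len(com) and indices[pos] == t_idx:
        spliced = (t_idx, op(t_val, com[pos][1]))
        return com[:pos] + [spliced], com[pos + 1:]
    return com[:pos] + [target], com[pos:]


com = [(3, 1), (5, 1), (8, 3), (11, 2), (15, 3), (19, 2), (20, 3), (27, 5)]
direct, intermediate = split_group((5, 2), com)
print(direct)        # [(3, 1), (5, 1)]
print(intermediate)  # [(8, 3), (11, 2), (15, 3), (19, 2), (20, 3), (27, 5)]
```

One difference from the described circuit: when the target index exceeds every index in the group, this sketch appends the target to the direct output, whereas the circuit carries the target on to the next group of comparison vector elements.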
The above description of the embodiments is provided to illustrate the technical concepts and features of the present invention, so that those skilled in the art can understand and implement it, but the present invention is not limited to the above specific embodiments. All changes and modifications that fall within the scope of the appended claims are intended to be covered.

Claims (8)

1. A sparse matrix column vector comparison apparatus for a graph-based computation accelerator, the sparse matrix column vector comprising: at least one comparison vector and at least one target vector, wherein the apparatus comprises:
N comparison operation circuits, wherein the number of the comparison operation circuits is larger than the maximum number of non-zero elements of column vectors in the sparse matrix;
for each comparison vector, inputting all non-zero elements of the comparison vector into a first comparison operation circuit;
for each target vector, inputting all non-zero elements of the target vector to N comparison operation circuits respectively;
each of the comparison operation circuits includes: the device comprises an operating circuit, a direct output module and an intermediate output module; the operation circuit of each comparison operation circuit is used for comparing all or part of non-zero elements of the comparison vector with one non-zero element of the target vector input to the comparison operation circuit;
the N comparison operation circuits are sequentially connected through the intermediate output module and the direct output module; specifically, the results of the intermediate output module and the direct output module of the (n-1)-th comparison operation circuit are respectively input to the n-th comparison operation circuit, where n ∈ N;
the operation circuit of the nth comparison operation circuit transparently transmits the result of the direct output module of the (n-1) th comparison operation circuit, and the operation circuit of the nth comparison operation circuit processes the result of the intermediate output module of the (n-1) th comparison operation circuit;
and the last comparison operation circuit outputs a result vector after sparse matrix column vector comparison.
2. The sparse matrix column vector comparison device of claim 1,
comparing, for an operational circuit of a first comparison operational circuit, all non-zero elements of a comparison vector with a first non-zero element of a target vector based on a comparison policy;
the comparison strategy comprises the following steps: taking the non-zero elements of the comparison vector whose row index is smaller than the row index of the first non-zero element of the target vector as the output result of the direct output module; taking the non-zero elements of the comparison vector whose row index is larger than the row index of the first non-zero element of the target vector as the output result of the intermediate output module; and, where a non-zero element of the comparison vector has the same row index as the first non-zero element of the target vector, performing the operation on the two non-zero elements with the same index;
and aiming at the operation circuit of the nth comparison operation circuit, comparing the nonzero element output by the intermediate output module of the (n-1) th comparison operation circuit with the nonzero element input to the nth comparison operation circuit by the target vector based on a comparison strategy.
3. The sparse matrix column vector comparison device of claim 1, wherein the operation circuit comprises: N1 grouping comparison units; all the grouping comparison units are connected in sequence;
for the operation circuit of the first comparison operation circuit, the comparison vector is divided into N1 groups in order of its non-zero elements;
a first grouping comparison unit compares the first group of non-zero elements of the grouped comparison vector with the first non-zero element of the target vector based on the comparison strategy;
the n1-th grouping comparison unit compares the n1-th group of non-zero elements of the grouped comparison vector with the first non-zero element of the target vector based on the comparison strategy;
wherein n1 ∈ N1.
4. The sparse matrix column vector comparison device of claim 2,
if the number of the non-zero elements in the comparison vector and the target vector is 64, N and N1 are both 8, each group of the grouped comparison vector has 8 non-zero elements, and the operation circuit comprises 8 grouping comparison units CU0 to CU7;
for the first grouping comparison unit CU0, if the row index of the first group of non-zero elements of the grouped comparison vector is the same as the row index of the first non-zero element of the target vector, performing a comparison operation on the non-zero elements with the same row index; the result after the comparison operation is transmitted to a direct output module;
other non-zero elements with row indexes smaller than the index of the first non-zero element of the target vector are transmitted to the direct output module, and non-zero elements with row indexes larger than the first non-zero element of the target vector are transmitted to the intermediate output module;
the remaining group comparison units CU 1-CU 7 do not perform any further comparisons and transmit the non-zero elements of the comparison vectors that do not perform further comparisons to the direct output module to which the operational circuitry is connected.
5. The sparse matrix column vector comparison device of any one of claims 1 to 4,
the compression format of the sparse matrix adopts a CSCI compression format.
6. The sparse matrix column vector comparison device of claim 2,
and the comparison operation of each group comparison unit in the operation circuit is determined according to the application of a graph calculation accelerator to which the sparse matrix column vector comparison device belongs.
7. The sparse matrix column vector comparison device of claim 4,
the first group comparison unit CU0 performs a comparison operation on two non-zero elements with the same row index, including: and taking the smaller value of the two numerical values based on the breadth-first search algorithm BFS.
8. A graph computation accelerator comprising the sparse matrix column vector comparison apparatus of any of claims 1 to 7.
CN201910877555.7A 2019-09-17 2019-09-17 Sparse matrix column vector comparison device based on graph computation accelerator Active CN110598175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910877555.7A CN110598175B (en) 2019-09-17 2019-09-17 Sparse matrix column vector comparison device based on graph computation accelerator

Publications (2)

Publication Number Publication Date
CN110598175A CN110598175A (en) 2019-12-20
CN110598175B true CN110598175B (en) 2021-01-01

Family

ID=68860273

Country Status (1)

Country Link
CN (1) CN110598175B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant