KR101929847B1

KR101929847B1 - Apparatus and method for computing a sparse matrix

Info

Publication number: KR101929847B1
Application number: KR1020180055243A
Authority: KR
Inventors: 기안도
Original assignee: 주식회사 퓨쳐디자인시스템
Priority date: 2018-05-15
Filing date: 2018-05-15
Publication date: 2018-12-17

Abstract

The present invention relates to an apparatus and method for computing a sparse matrix that can ultimately increase the speed of matrix computation. The apparatus comprises: a memory that stores sparse matrices in which information indicating the number of consecutive zeros in the same row or column direction is represented by the mantissa bits of the floating-point numbers representing the values of the matrix elements; and one or more matrix operation units that compute the sparse matrices accessed from the memory. Each matrix operation unit includes: a multiplication and accumulator (MAC) that matrix-multiplies element values sequentially input over different paths; and a controller that sequentially selects the values of the elements making up each sparse matrix accessed from the memory and outputs them to a multiplier of the MAC, and that, when the mantissa bits of the floating-point number for a selected element are run-length encoded, outputs a value that makes the output of the multiplier zero for the number of elements indicated by the run-length-encoded value, without selecting the values of those elements.

Description

APPARATUS AND METHOD FOR COMPUTING A SPARSE MATRIX

The present invention relates to matrix operations, and more particularly to an apparatus and method for computing a sparse matrix.

As background on matrix computation, FIG. 1 schematically illustrates GEMM (General Matrix Multiplication), and FIG. 2 schematically illustrates the general matrix multiplication method.

Matrix operations play a key role in scientific and engineering programs, image processing programs, and artificial intelligence (deep learning) programs, and they mainly use the GEMM (General Matrix Multiplication) routine of BLAS (Basic Linear Algebra Subprograms) shown in FIG. 1.

The core of GEMM is multiplying matrix A by matrix B to compute matrix C. As shown in FIG. 2, each element of a row of matrix A is multiplied by the corresponding element of a column of matrix B, and the products are summed to compute one element of matrix C.

The calculation of FIG. 2 can be expressed by the following equation.
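Assuming the usual definition of the matrix product, with A of size M×K and B of size K×N, and with the index names i, j and k chosen here only for illustration, the calculation of FIG. 2 takes the form:

```latex
C_{ij} = \sum_{k=1}^{K} A_{ik}\, B_{kj}, \qquad 1 \le i \le M,\quad 1 \le j \le N
```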

A processor performs this calculation with the following FOR loop. When matrix A is of size M×K and matrix B is of size K×N, this requires a very large number of memory references, multiplications, and additions.
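A minimal sketch of such a loop nest, written in Python for illustration. The function name `gemm_naive` and the two counters are assumptions added here to make the cost visible; they are not taken from the patent.

```python
def gemm_naive(a, b):
    """Multiply an M x K matrix by a K x N matrix with three nested loops.

    Returns the product together with the number of element references
    and multiplications performed, so the M*N*K growth is visible.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    refs = 0
    mults = 0
    for i in range(m):              # rows of A
        for j in range(n):          # columns of B
            acc = 0.0
            for k in range(k_dim):  # shared dimension
                acc += a[i][k] * b[k][j]  # two references, one multiply, one add
                refs += 2
                mults += 1
            c[i][j] = acc
    return c, refs, mults


if __name__ == "__main__":
    product, refs, mults = gemm_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, refs, mults)
```

Every product term costs two element references even when one operand is zero, which is the overhead the rest of the description targets.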

If matrix multiplication is performed in hardware rather than in software, the MAC (Multiplication and Accumulator) shown in FIG. 3 can be used.
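As a rough software model of a multiply-accumulate unit, the sketch below multiplies two inputs per step and adds the product to a running sum. The class name `Mac` and its methods are hypothetical, and FIG. 3 may differ in detail.

```python
class Mac:
    """Toy multiply-accumulate unit: acc <- acc + x * y on each step."""

    def __init__(self):
        self.acc = 0.0

    def step(self, x, y):
        # The multiplier output is added to the accumulator.
        self.acc += x * y
        return self.acc

    def clear(self):
        self.acc = 0.0


def dot_with_mac(row, col):
    """Compute one element of C by streaming a row of A and a column of B."""
    mac = Mac()
    for x, y in zip(row, col):
        mac.step(x, y)
    return mac.acc
```

One element of C is then `dot_with_mac(row_of_A, column_of_B)`.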

Meanwhile, a sparse matrix is a matrix in which many of the elements are '0', and methods for reducing the memory space needed to store such a matrix have been proposed and are in use. Representative methods are one that uses keys and one that uses linked lists.

The matrix values (element values) used in matrix multiplication are usually floating-point numbers. As shown in FIG. 4, floating-point representations are distinguished by their number of bits, in accordance with the IEEE 754 standard.

In FIG. 4, S indicates whether the mantissa is positive or negative, E is the exponent, and M is the mantissa. These floating-point formats include the following special representations.

S=0, E=all 1, M=all 0 → Infinity.

S=0, E=all 1, M=not all 0 → +NaN.

S=1, E=all 1, M=all 0 → -Infinity.

S=1, E=all 1, M=not all 0 → -NaN.
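The four cases above can be checked on 32-bit patterns (1 sign bit, 8 exponent bits, 23 mantissa bits). The sketch below is illustrative only; the helper names `split_float32`, `classify` and `float_to_bits` are assumptions, not names from the patent.

```python
import struct


def split_float32(bits):
    """Split a 32-bit pattern into the S (1 bit), E (8 bits) and M (23 bits) fields."""
    s = (bits >> 31) & 0x1
    e = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    return s, e, m


def classify(bits):
    """Name the special representation, or 'ordinary' for zero, subnormal and normal values."""
    s, e, m = split_float32(bits)
    sign = "-" if s else "+"
    if e == 0xFF:
        return sign + ("Infinity" if m == 0 else "NaN")
    return "ordinary"


def float_to_bits(x):
    """Return the IEEE 754 single-precision bit pattern of an ordinary Python float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]
```

The patterns are handled as integers rather than as Python floats so that the mantissa payload of a NaN is never altered by a conversion.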

As mentioned above, when matrix multiplication is performed in hardware, and especially when sparse matrices are multiplied, unnecessary operations and memory references are repeated, which limits how far the operation speed can be improved.

Korean Registered Patent Publication No. 10-1400577
Korean Laid-Open Patent Publication No. 10-2017-0089389

The present invention was devised to solve the above problem. Its main object is to provide a sparse matrix computation method and apparatus that reduce unnecessary operations on sparse matrices and the number of memory or matrix data references during matrix computation, thereby ultimately improving the speed of matrix computation.

To solve the above technical problem, a sparse matrix computation method according to an embodiment of the present invention comprises:

accessing, from a memory, sparse matrices in which information indicating the number of consecutive 'zeros' in the same row or column direction is represented by the mantissa bits of a floating-point number; and

sequentially selecting the values of the elements constituting each accessed sparse matrix and outputting them to a matrix operation unit for the matrix operation, wherein, when the mantissa bits of the floating-point number representing the value of a selected element are information indicating the number of consecutive 'zeros', 'zero' is output for that number of elements without selecting their values.

Furthermore, before the sparse matrix is accessed from the memory, the method further comprises

receiving a sparse matrix in which the value of each element is expressed as a floating-point number and storing it in the memory, wherein each element that has the value 'zero' consecutively in the same row or column direction

is converted, before being stored, so that its value is the run-length-encoded number of consecutive 'zeros' following it, expressed in the mantissa bits of a floating-point number.

In the sparse matrix computation method of the above embodiment, a further feature is that the sparse matrix accessed from the memory is obtained by dividing the sparse matrix stored in the memory into units of m×n (m and n being natural numbers), so that the matrix operation is processed on those units.

Among the values of the elements constituting the sparse matrix, the value of an element that has the value 'zero' consecutively in the same row or column direction

is expressed as a floating-point number consisting of a sign (S) bit indicating row or column priority, exponent (E) bits that are all "1", and mantissa (M) bits indicating the number of consecutive 'zeros' in the row or column, the number of 'zeros' being run-length encoded.

A sparse matrix computation apparatus according to another embodiment of the present invention comprises:

a memory for storing sparse matrices in which information indicating the number of consecutive 'zeros' in the same row or column direction is represented by the mantissa bits of the floating-point numbers representing the values of the elements constituting the matrix; and

one or more matrix operation units for performing a matrix operation on the sparse matrices accessed from the memory, wherein each matrix operation unit includes:

a MAC (Multiplication and Accumulator) for performing the matrix operation on element values sequentially input over different paths; and

a controller that sequentially selects the values of the elements constituting each sparse matrix accessed from the memory and outputs them to the multiplier of the MAC, and that, when the mantissa bits of the floating-point number for a selected element are run-length encoded, outputs a value for making the output of the multiplier 'zero' for the number of elements indicated by the run-length-encoded value, without selecting the values of those elements.

A further feature of the above sparse matrix computation apparatus is that it additionally includes an external processor that generates a sparse matrix in which information indicating the number of consecutive 'zeros' in the same row or column direction is represented by the mantissa bits of the floating-point numbers representing the values of the elements constituting the matrix, and transfers it to the memory.

In the sparse matrix computation apparatus configured as above, among the values of the elements constituting the sparse matrix, the value of an element that has the value 'zero' consecutively in the same row or column direction is expressed as a floating-point number consisting of a sign (S) bit indicating row or column priority, exponent (E) bits that are all "1", and mantissa (M) bits indicating the number of consecutive 'zeros' in the row or column, the number of 'zeros' being run-length encoded.

When a matrix operation is performed with the method or apparatus according to an embodiment of the present invention, and consecutive elements in the same row or column direction have the value 'zero', the operation can proceed without referencing the values of that many elements. This reduces the number of matrix data references and ultimately improves the speed of the matrix operation. Moreover, the larger the sparse matrix, the greater the number of matrix data references, so the effect of the present invention is correspondingly greater.

In particular, matrix operations in fields related to science and technology, image processing, and artificial intelligence (deep learning) require a very large amount of computation. In such matrix operations, simply not accessing row and column data that consecutively have the value 'zero', or reducing the number of times such data is referenced, reduces the amount of matrix computation, which improves computing performance and contributes to higher speed.

FIG. 1 is a schematic diagram of GEMM (General Matrix Multiplication).
FIG. 2 is an example of matrix multiplication.
FIG. 3 is an example configuration of a general MAC (Multiplier and Accumulator) used for matrix operations.
FIG. 4 is an example representation of floating-point numbers.
FIG. 5 is an example of the peripheral configuration of a matrix computation apparatus according to an embodiment of the present invention.
FIG. 6 is an example configuration of the matrix computation apparatus of FIG. 5.
FIGS. 7 and 8 are diagrams for explaining a method of representing floating-point numbers according to an embodiment of the present invention.
FIG. 9 is a diagram for explaining a process of dividing and accessing a sparse matrix according to an embodiment of the present invention.
FIGS. 10 and 11 are diagrams for further explaining the effects of an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In describing the present invention, detailed descriptions of related known functions or configurations are omitted where they could unnecessarily obscure the gist of the invention.

For reference, among the terms used below, 'element' is defined as an entry constituting a matrix, and 'value of an element' is defined as the matrix value of such an element. Because the matrix values used in matrix multiplication, that is, the element values, are usually expressed as floating-point numbers, the 'value of an element' is likewise expressed as a floating-point number. The term 'subsequent' is defined to mean 'continuing in the same row or column direction, including the element itself'. In the following, embodiments of the present invention are described using sparse matrix multiplication as an example of a sparse matrix operation.

FIG. 5 illustrates the peripheral configuration of a matrix computation apparatus (400) according to an embodiment of the present invention, and FIG. 6 illustrates the configuration of the matrix computation apparatus of FIG. 5.

Referring to FIG. 5, the external processor (100) is connected through a system bus (120) to a hardware board (for example, an FPGA board) on which the matrix computation apparatus (400) according to an embodiment of the present invention is mounted. The external processor (100) reads from the main memory (110) the sparse matrix data needed for a sparse matrix operation, for example sparse matrix multiplication, and transfers it to the hardware board.

The value of each element of the transferred sparse matrix is expressed as a floating-point number, as described above. Depending on the implementation, the external processor (100) may simply transfer to the hardware board a sparse matrix whose element values are expressed as floating-point numbers in the usual way, or, according to an embodiment of the present invention, it may generate and transfer a sparse matrix in which information indicating the number of consecutive 'zeros' in the same row or column direction is represented by the mantissa (M) bits of the floating-point numbers representing the element values. In other words, an external processor (100) that generates a sparse matrix in which the number of consecutive 'zeros' in the same row or column direction is represented by the mantissa bits of floating-point numbers may also be regarded as a component of the matrix computation apparatus (400).

The hardware board, which exists to offload the external processor (100), may carry a bus I/F (interface) (210), an FPGA internal system bus (220), a memory (300), and the matrix computation apparatus (400), as shown in FIG. 5. Although the memory (300) and the matrix computation apparatus (400) are shown separately in FIG. 5, the memory (300) together with the one or more matrix operation units constituting the matrix computation apparatus (400) may also be referred to collectively as a single matrix computation apparatus (400).

The memory (300) stores sparse matrices in which information indicating the number of consecutive 'zeros' in the same row or column direction is represented by the mantissa bits of the floating-point numbers representing the values of the elements constituting the matrix. These sparse matrices may be received from the external processor (100).

When a sparse matrix expressed as floating-point numbers in the usual way is received from the external processor (100), the controller in the matrix computation apparatus (400), described below, may convert the received sparse matrix so that information indicating the number of consecutive 'zeros' in the same row or column direction is represented by the mantissa bits of floating-point numbers, and then use it for the matrix operation.

The matrix computation apparatus (400) includes one or more matrix operation units for performing a matrix operation on the sparse matrices accessed from the memory (300). As one example, shown in FIG. 6, each matrix operation unit may include:

a MAC (Multiplication and Accumulator) (430) for matrix-multiplying element values sequentially input over different paths; and

a controller (420) that sequentially selects the values of the elements constituting each sparse matrix accessed from the memory (300) and outputs them to the multiplier of the MAC (430), and that, when the mantissa bits of the floating-point number for a selected element are run-length encoded, outputs a value for making the output of the multiplier 'zero' for the number of elements indicated by the run-length-encoded value, without selecting the values of those elements.

The controller (420) can be connected to the system bus (220) through a bus I/F (410); it accesses the sparse matrix in the memory (300) by DMA and passes it to the MAC (430) in the following stage.

Each of a plurality of matrix operation units configured as shown in FIG. 6 may be connected to the system bus (220) so that matrix multiplication is processed in parallel. To this end, the controller (420) may access the sparse matrix stored in the memory (300) by dividing it into units of m×n (m and n being natural numbers). The natural numbers m and n should, of course, be smaller than the row and column lengths of the stored sparse matrix. The MAC (430) shown in FIG. 6 has the configuration shown in FIG. 3, so its detailed description is omitted.
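One way to picture the m×n division is the sketch below, which cuts a matrix held as a list of rows into tiles keyed by the position of their top-left element. The function name `split_into_tiles` and the dictionary layout are assumptions made for illustration; the patent does not specify how the divided units are addressed.

```python
def split_into_tiles(matrix, m, n):
    """Divide a matrix (list of rows) into tiles of at most m rows by n columns.

    Returns a dict mapping (first_row, first_col) to the tile. Tiles on the
    right and bottom edges are smaller when the dimensions are not multiples
    of m and n.
    """
    rows, cols = len(matrix), len(matrix[0])
    tiles = {}
    for r0 in range(0, rows, m):
        for c0 in range(0, cols, n):
            tiles[(r0, c0)] = [row[c0:c0 + n] for row in matrix[r0:r0 + m]]
    return tiles
```

Each tile could then be handed to a different matrix operation unit so that the units work in parallel.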

The operation of the matrix computation apparatus (400), which includes the memory (300) and one or more matrix operation units (more specifically, matrix multiplication units), is described in more detail below with reference to FIGS. 7 to 11.

FIGS. 7 and 8 are diagrams for explaining the method of representing floating-point numbers according to an embodiment of the present invention. More specifically, FIG. 7 shows an example in which row-major RLE (Run Length Encoding) is applied, and FIG. 8 shows an example in which column-major RLE is applied.

In this embodiment, NaN (Not a Number), one of the representations that is not used as an input value for matrix operations, is used for RLE, as follows.

RRLE: S=0, E=all 1, M=number of 'zeros' → row-major run length encoding

CRLE: S=1, E=all 1, M=number of 'zeros' → column-major run length encoding

That is, in the row-major case, S (the sign bit) is '0', E (the exponent bits) are all '1', and the number of consecutive 'zeros' in the row direction is written in M (the mantissa bits). In the column-major case, S is '1', E is all '1', and the number of consecutive 'zeros' in the same column direction is written in M.

To elaborate, assuming FIG. 7 is a row-major matrix, if the element values in row 1 are consecutively 'zero' from column 2 through column 7, the value of the element at row 1, column 2 has S=0, E=all 1, M=5. With the count run-length encoded in the mantissa bits M, the value of that element can be expressed as the floating-point number {0,1111_1111,000_0000_0000_0000_0000_0101}. Likewise, the element at row 1, column 3 has S=0, E=all 1, M=4, so its value can be expressed as {0,1111_1111,000_0000_0000_0000_0000_0100}. When there is only one 'zero', it is either left in its original representation, {0,0000_0000,000_0000_0000_0000_0000_0000}, or expressed as {0,1111_1111,000_0000_0000_0000_0000_0001}.
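The bit patterns quoted above can be reproduced with a few lines of Python. This is only a sketch for 32-bit floats; `pack_rle` and `show` are hypothetical helper names. The count is required to be non-zero because, under IEEE 754, an all-ones exponent with an all-zero mantissa is Infinity rather than NaN.

```python
EXP_ALL_ONES = 0xFF   # the 8 exponent bits of a 32-bit float, all set
MANTISSA_BITS = 23


def pack_rle(count, column_major=False):
    """Build the 32-bit marker: S selects RRLE (0) or CRLE (1), E is all ones, M is the count."""
    if not 1 <= count < (1 << MANTISSA_BITS):
        raise ValueError("count must be non-zero and fit in the mantissa")
    s = 1 if column_major else 0
    return (s << 31) | (EXP_ALL_ONES << MANTISSA_BITS) | count


def show(bits):
    """Format a 32-bit pattern as {S,EEEE_EEEE,MMM_MMMM_..._MMMM}, the grouping used in the text."""
    raw = format(bits, "032b")
    s, e, m = raw[0], raw[1:9], raw[9:]
    e_txt = e[:4] + "_" + e[4:]
    m_txt = "_".join([m[:3]] + [m[i:i + 4] for i in range(3, MANTISSA_BITS, 4)])
    return "{" + s + "," + e_txt + "," + m_txt + "}"


if __name__ == "__main__":
    print(show(pack_rle(5)))                     # row-major marker with M=5
    print(show(pack_rle(4, column_major=True)))  # column-major marker with M=4
```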

In the row-major case, every RLE run starts afresh in each row, as in rows 2 and 3 of FIG. 7.
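A row-major encoder that restarts at every row might look like the sketch below. It assumes that each zero stores the number of zeros from itself to the end of its run (one reading of 'subsequent' as defined in this description) and that a lone zero is written with M=1, which is one of the two options the text allows. The function names are illustrative.

```python
import struct


def float_to_bits(x):
    """IEEE 754 single-precision bit pattern of an ordinary (non-NaN) value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def rrle_marker(count):
    """Row-major marker: S=0, E=all ones, M=count."""
    return (0xFF << 23) | count


def encode_row(row):
    """Replace each zero in a row by a marker counting the zeros left in its run."""
    out = []
    i = 0
    while i < len(row):
        if row[i] != 0.0:
            out.append(float_to_bits(row[i]))  # non-zero values are stored unchanged
            i += 1
            continue
        j = i
        while j < len(row) and row[j] == 0.0:  # find the end of the zero run
            j += 1
        run = j - i
        for offset in range(run):
            out.append(rrle_marker(run - offset))  # run, run-1, ..., 1
        i = j
    return out


def encode_matrix_row_major(matrix):
    """Encode row by row, so a run never continues from one row into the next."""
    return [encode_row(row) for row in matrix]
```

Because `encode_row` is applied to each row separately, zeros at the end of one row and at the start of the next are counted as two independent runs.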

In the same way, the values of the elements constituting a column-major matrix can be converted, as shown in FIG. 8, so that information indicating the number of consecutive 'zeros' in the same column direction is represented by the mantissa (M) bits of floating-point numbers.

Using the method described above, the external processor (100), or the controller (420) of a matrix multiplication unit, converts the sparse matrix so that information indicating the number of consecutive 'zeros' in the same row or column direction is represented by the mantissa (M) bits of floating-point numbers. The sparse matrices converted in this way are stored in the memory (300) and are accessed by the controller (420) in one or more matrix multiplication units, described below, for use in matrix multiplication.

As shown in FIG. 9, the controller (420) may also divide the matrix in the row or column direction according to the number of matrix multiplication units and access it in those divisions.

For reference, FIGS. 10 and 11 are diagrams for further explaining the effects of an embodiment of the present invention.

First, the controller (420), having accessed from the memory (300) the sparse matrices needed for matrix multiplication, sequentially selects the values of the elements constituting each accessed sparse matrix and outputs them to the multiplier of the MAC (Multiplication and Accumulator) (430). When the mantissa (M) bits of the floating-point number representing the value of a selected element are information indicating the number of subsequent 'zeros', the controller outputs, for that number of elements and without selecting their values, a value (for example, logic level zero) that makes the output of the multiplier 'zero'.
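The controller behaviour described above can be approximated in software as follows. This is a sketch under stated assumptions, not the patented circuit: the row of A is held as 32-bit patterns with row-major markers, the column of B as plain numbers, each marker counts the zeros from itself to the end of its run, and forcing the multiplier output to zero is modelled by leaving the accumulator unchanged. The function names and the reference counter are hypothetical.

```python
import struct


def bits_to_float(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def is_rle_marker(bits):
    """True when E is all ones and M is non-zero, i.e. the NaN patterns used for RLE."""
    return ((bits >> 23) & 0xFF) == 0xFF and (bits & 0x7FFFFF) != 0


def dot_skipping_zero_runs(encoded_row, column):
    """Accumulate one element of C, skipping the positions covered by each marker.

    Returns the accumulated sum and the number of elements of `column`
    that were actually referenced.
    """
    acc = 0.0
    column_refs = 0
    k = 0
    while k < len(encoded_row):
        word = encoded_row[k]
        if is_rle_marker(word):
            run = word & 0x7FFFFF
            # The next `run` products are zero, so neither the remaining
            # markers of this run nor the matching column values are read.
            k += run
            continue
        acc += bits_to_float(word) * column[k]
        column_refs += 1
        k += 1
    return acc, column_refs
```

For a row with two non-zero values around a run of five zeros, only two of the seven column values are referenced.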

To elaborate with reference to FIGS. 10 and 11: when sparse matrix A and sparse matrix B are multiplied as shown in FIG. 10, the calculation for row 1 of matrix A does not require the multiplications involving the elements of matrix B that are hatched in FIG. 11. This means that the values of those hatched elements of matrix B need not be referenced.

Accordingly, when the mantissa bits of the floating-point number representing the value of an element selected for output to the MAC (430) are information indicating the number of subsequent 'zeros', the controller (420) outputs, for that number of elements and without selecting their values, a value that makes the output of the multiplier 'zero', and the matrix multiplication proceeds on that basis.

In the same way, when column 2 of matrix B in FIG. 10 is multiplied, values 1 to 4 of matrix A need not be referenced.

Therefore, when matrix multiplication is performed with the method or apparatus according to an embodiment of the present invention, and consecutive elements in the same row or column direction have the value 'zero', the multiplication can proceed without referencing the values of that many elements. This reduces the number of matrix data references and ultimately improves the speed of the matrix operation. Moreover, the larger the sparse matrix, the greater the number of matrix data references, so the effect of the present invention is correspondingly greater.

In particular, matrix operations in fields related to science and technology, image processing, and artificial intelligence (deep learning) require a very large amount of computation. In such matrix operations, simply not accessing row and column data that consecutively have the value 'zero', or reducing the number of times such data is referenced, reduces the amount of matrix computation and thereby improves computing performance.

The above has been described with reference to the embodiments shown in the drawings, but these are merely illustrative, and those of ordinary skill in the art will understand that various modifications and equivalent other embodiments are possible. For example, although matrix multiplication is illustrated in one embodiment of the present invention, the technical idea of the present invention can equally be applied to perform matrix addition or subtraction.

Claims

1. A method for computing a sparse matrix, comprising:
accessing, from a memory, sparse matrices in which information indicating the number of consecutive 'zeros' in the same row or column direction is represented by the mantissa bits of a floating-point number; and
sequentially selecting the values of the elements constituting each accessed sparse matrix and outputting them to a matrix operation unit for a matrix operation, wherein, when the mantissa bits of the floating-point number representing the value of a selected element are information indicating the number of consecutive 'zeros', 'zero' is output to the matrix operation unit for that number of elements without selecting their values,
wherein, before the sparse matrix is accessed from the memory, the method further comprises receiving a sparse matrix in which the value of each element is expressed as a floating-point number and storing it in the memory, and
wherein each element that has the value 'zero' consecutively in the same row or column direction is converted, before being stored, so that its value is the run-length-encoded number of consecutive 'zeros' following it, expressed in the mantissa bits of a floating-point number.

2. The method of claim 1, wherein the sparse matrix accessed from the memory is obtained by dividing the sparse matrix stored in the memory into units of m×n (m and n being natural numbers).

3. The method of claim 1, wherein, among the values of the elements constituting the sparse matrix, the value of an element that has the value 'zero' consecutively in the same row or column direction
is expressed as a floating-point number consisting of a sign (S) bit indicating row or column priority, exponent (E) bits that are all "1", and mantissa (M) bits indicating the number of consecutive 'zeros' in the row or column, the number of 'zeros' being run-length encoded.

4. A method for computing a sparse matrix, comprising:
accessing, from a memory, sparse matrices in which information indicating the number of consecutive 'zeros' in the same row or column direction is represented by the mantissa bits of a floating-point number; and
sequentially selecting the values of the elements constituting each accessed sparse matrix and outputting them to a matrix operation unit for a matrix operation, wherein, when the mantissa bits of the floating-point number representing the value of a selected element are information indicating the number of consecutive 'zeros', 'zero' is output to the matrix operation unit for that number of elements without selecting their values,
wherein, among the values of the elements constituting the sparse matrix, the value of an element that has the value 'zero' consecutively in the same row or column direction
is expressed as a floating-point number consisting of a sign (S) bit indicating row or column priority, exponent (E) bits that are all "1", and mantissa (M) bits indicating the number of consecutive 'zeros' in the row or column, the number of 'zeros' being run-length encoded.

5. An apparatus for computing a sparse matrix, comprising:
a memory for storing sparse matrices in which information indicating the number of consecutive 'zeros' in the same row or column direction is represented by the mantissa bits of the floating-point numbers representing the values of the elements constituting the matrix; and
one or more matrix operation units for performing a matrix operation on the sparse matrices accessed from the memory, wherein each matrix operation unit includes:
a MAC (Multiplication and Accumulator) for matrix-multiplying element values sequentially input over different paths; and
a controller that sequentially selects the values of the elements constituting each sparse matrix accessed from the memory and outputs them to the multiplier of the MAC, and that, when the mantissa bits of the floating-point number for a selected element are run-length encoded, outputs a value for making the output of the multiplier 'zero' for the number of elements indicated by the run-length-encoded value, without selecting the values of those elements.

6. The apparatus of claim 5, further comprising an external processor that generates a sparse matrix in which information indicating the number of consecutive 'zeros' in the same row or column direction is represented by the mantissa bits of the floating-point numbers representing the values of the elements constituting the matrix, and transfers the sparse matrix to the memory.

7. The apparatus of claim 5 or 6, wherein, among the values of the elements constituting the sparse matrix, the value of an element that has the value 'zero' consecutively in the same row or column direction
is expressed as a floating-point number consisting of a sign (S) bit indicating row or column priority, exponent (E) bits that are all "1", and mantissa (M) bits indicating the number of consecutive 'zeros' in the row or column, the number of 'zeros' being run-length encoded.