KR102572429B1

KR102572429B1 - Method, apparatus and storage for storing a program for multi-dimensional matrix multiplication

Info

Publication number: KR102572429B1
Application number: KR1020210043533A
Authority: KR
Inventors: 이경하; 성원경; 김은희
Original assignee: Korea Institute of Science and Technology Information (한국과학기술정보연구원)
Priority date: 2021-04-02
Filing date: 2021-04-02
Publication date: 2023-09-01
Also published as: KR20220137443A

Abstract

Embodiments relate to a multidimensional matrix multiplication method, an arithmetic device, and a program for improving the performance of multiplication on large-scale multidimensional matrices. A result matrix may be derived through multiplication between compressed bit vectors.

Description

Compressed multidimensional matrix multiplication method, arithmetic device, and storage medium for storing programs

Embodiments relate to a matrix multiplication method, an arithmetic device, and a program, and may be applied specifically to a compressed multidimensional matrix multiplication method, arithmetic device, and program.

Interest in and use of artificial intelligence and deep learning have recently been growing, and with them interest in and use of deep neural networks (DNNs) for deep learning.

Training and inference of a deep neural network require matrix multiplication, in which an input vector, matrix, or tensor is multiplied by a weight matrix or tensor. As deep neural networks grow deeper and larger, this operation becomes a key factor in determining computation speed, and techniques for quantizing the individual values of the matrices are applied to it.

However, multiplication of quantized matrices is difficult to handle efficiently with existing sparse-matrix techniques. Moreover, matrices in deep learning are often not sparse, so such techniques are inefficient in both storage space and amount of computation.

Matrix multiplication that converts a matrix or tensor into bit vectors has a different problem: it needs one bit vector per distinct value in the data, so representing deep-learning matrices, which contain many different values, requires a large number of bit vectors. Furthermore, even when the bit vectors are compressed, the multiplication cannot be performed on the compressed bit vectors directly; they must first be decompressed.

A matrix multiplication method that can apply the compressed data representation directly to the multiplication is therefore required.

A multidimensional matrix multiplication method according to embodiments may include: quantizing a plurality of input matrices or tensors and outputting a plurality of binarized matrices or tensors; compressing the binarized matrices or tensors into a plurality of compressed bit vectors; calculating the bit at a specific position on the compressed bit vectors; multiplying the compressed bit vectors with each other and outputting a result matrix; and transforming the result matrix according to the position of a specific bit on the compressed bit vectors.

Converting into a plurality of compressed bit vectors may include: unfolding the binarized matrices or tensors into one-dimensional bit vectors; and compressing each converted bit vector by recording the values of the bits it contains.

Each value of a compressed bit vector may include a flag (0 or 1) indicating whether it is compressed, the bit value (0 or 1), and the number of times that 0 or 1 is repeated.

The plurality of input matrices may include a first matrix and a second matrix to be multiplied, and outputting the result matrix may further include: grouping the bits of the compressed bit vectors into logical blocks whose size is at least one of the number of columns of the first matrix and the number of rows of the second matrix; performing a preset operation on the grouped bits; counting the 1s in the new bit vector obtained through the preset operation; and outputting the result matrix by assigning that count of 1s to the corresponding position.

The preset operation may be an XNOR (exclusive NOR) operation followed by a bitcount operation.

The bits grouped into logical blocks may include a first group and a second group, and outputting the result matrix may include outputting the result matrix for the second group once the result matrix for the first group has been output.

When the result matrix has been output for all bits grouped into logical blocks, the method may further include: circularly shifting the bit vector contained in the compressed bit vector to the left by the size of one logical block; grouping the bits of the compressed bit vector into blocks whose size is at least one of the number of columns of the first matrix and the number of rows of the second matrix; performing the preset operation on the grouped bits; counting the 1s in the new bit vector obtained through the preset operation; and outputting the result matrix by assigning that count of 1s to the corresponding position.
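The grouping, XNOR/bitcount, and circular-shift steps can be illustrated with a small sketch. It is not the patented implementation and makes several assumptions not stated in the text: entries are binarized to +1/-1 and stored as bits (+1 as 1, -1 as 0), the rows of the first matrix and the columns of the second are concatenated so that each logical block holds k = (columns of the first matrix) bits, the result is square (n x n) so that each shift pairs every row block with a column block, bits are kept uncompressed, and the count of 1s is mapped back to a dot product as 2 * ones - k. The function name `binary_matmul_shift` is hypothetical.

```python
def binary_matmul_shift(A, B):
    """Multiply {-1,+1} matrices A (n x k) and B (k x n) via block XNOR/bitcount and circular shifts.

    Hypothetical sketch: bits are kept uncompressed and the result is assumed square (n x n).
    """
    n, k = len(A), len(A[0])
    assert len(B) == k and len(B[0]) == n, "sketch assumes an n x n result"

    def to_bit(v):
        return 1 if v > 0 else 0  # assumed encoding: +1 -> 1, -1 -> 0

    a_vec = [to_bit(v) for row in A for v in row]                  # block i = row i of A
    b_vec = [to_bit(B[r][c]) for c in range(n) for r in range(k)]  # block j = column j of B
    C = [[0] * n for _ in range(n)]
    for shift in range(n):
        # Circular left shift by `shift` blocks: block i of `rotated` is column (i + shift) % n.
        rotated = b_vec[shift * k:] + b_vec[:shift * k]
        xnor = [1 - (x ^ y) for x, y in zip(a_vec, rotated)]
        for i in range(n):
            ones = sum(xnor[i * k:(i + 1) * k])   # bitcount of block i
            C[i][(i + shift) % n] = 2 * ones - k  # matches minus mismatches = {-1,+1} dot product
    return C


print(binary_matmul_shift([[1, -1], [1, 1]], [[1, 1], [-1, 1]]))
```

Each shift fills one "diagonal" of the result matrix, so n shifts cover all n x n entries; this mirrors the text's sequence of processing all groups, then shifting by one block size and repeating.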

A multidimensional matrix multiplication operation apparatus according to embodiments may include: a binarization module that quantizes a plurality of input matrices and outputs a plurality of binarized matrices; a bit vector compression module that converts the binarized matrices into a plurality of compressed bit vectors; a cursor module that calculates the position of a specific bit on the compressed bit vectors; and a multiplication operation module that multiplies the compressed bit vectors and outputs a result matrix.

The bit vector compression module may unfold the binarized matrices or tensors into bit vectors and compress each converted bit vector by recording the values of the bits it contains.

The plurality of input matrices may include a first matrix and a second matrix, and the multiplication operation module may group the bits of the compressed bit vectors into logical blocks whose size is at least one of the number of columns of the first matrix and the number of rows of the second matrix, perform a preset operation on the grouped bits, count the 1s in the new bit vector obtained through the preset operation, and output the result matrix by assigning that count of 1s to the corresponding position.

The bits grouped into logical blocks may include a first group and a second group, and the multiplication operation module may output the result matrix for the second group once the result matrix for the first group has been output.

When the multiplication operation module has output the result matrix for all bits grouped into logical blocks, it may circularly shift the bit vector contained in the compressed bit vector by the size of one logical block, group the bits of the compressed bit vector into logical blocks whose size is at least one of the number of columns of the first matrix and the number of rows of the second matrix, perform the preset operation on the grouped bits, count the 1s in the new bit vector obtained through the preset operation, and output the result matrix by assigning that count of 1s to the corresponding position.

A storage medium according to embodiments may store a multidimensional matrix multiplication program that: quantizes a plurality of input matrices or tensors and outputs a plurality of binarized matrices; converts the binarized matrices into a plurality of compressed bit vectors; calculates the bit value at a specific position on the compressed bit vectors; multiplies the compressed bit vectors with each other and outputs a result matrix; and transforms the result matrix according to the position of a specific bit on the compressed bit vectors.

By improving the performance of multiplication on large-scale multidimensional matrices and tensors, the compressed multidimensional matrix multiplication method, arithmetic device, and storage medium storing a program according to embodiments can improve both processing speed and storage efficiency when training deep neural network models and running inference with them.

The multidimensional matrix multiplication method, arithmetic device, and storage medium storing a program according to embodiments allow deep learning models to be processed efficiently on embedded and mobile devices that lack high-performance computing hardware.

The multidimensional matrix multiplication method, arithmetic device, and storage medium storing a program according to embodiments can greatly reduce the size of the input matrices to be computed by representing them as compressed bit vectors.

The multidimensional matrix multiplication method, arithmetic device, and storage medium storing a program according to embodiments can greatly shorten computation time on low-performance devices by replacing matrix multiplication with bit-string operations on compressed bit vectors.

The multidimensional matrix multiplication method, arithmetic device, and storage medium storing a program according to embodiments enable efficient inference with deep learning models on low-performance devices.

The multidimensional matrix multiplication method, arithmetic device, and storage medium storing a program according to embodiments can perform matrix multiplication more effectively in neural network models made lightweight through model quantization.

FIG. 1 shows a structural diagram of a neural network according to embodiments.
FIG. 2 illustrates matrix multiplication according to embodiments.
FIG. 3 illustrates an example of performing matrix multiplication using bit vectors according to embodiments.
FIG. 4 is a flowchart of a multidimensional matrix multiplication method according to embodiments.
FIG. 5 is a diagram schematically showing that binarizing the input matrices and multiplying them according to embodiments can approximate the result of multiplying the original input matrices directly.
FIG. 6 is a diagram schematically illustrating compression of a bit vector generated by unfolding a matrix or tensor according to embodiments.
FIG. 7 is a diagram schematically illustrating how the position of a specific bit is calculated by a cursor according to embodiments.
FIG. 8 is a diagram schematically illustrating matrix multiplication performed over compressed bit vectors according to embodiments.
FIG. 9 is a diagram schematically showing the configuration of a multidimensional matrix multiplication operation device according to embodiments.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. Identical or similar components are given the same reference numerals regardless of figure number, and duplicate descriptions of them are omitted. Where a detailed description of related known technology could obscure the gist of the embodiments, that description is omitted. The accompanying drawings are provided only to aid understanding of the embodiments and should not be construed as limiting the technical idea.

It will also be understood that when an element such as a layer, region, or substrate is referred to as being "on" another element, it may be directly on the other element, or intervening elements may be present.

The multidimensional matrix multiplication described through the embodiments can be used in any deep-learning-based service for low-performance devices with limited computing power, memory, and battery, such as mobile and embedded devices that need to run deep learning models. A person skilled in the art will readily recognize that it can also be applied to any computation method or apparatus that uses matrix multiplication, including product types developed in the future.

FIG. 1 shows a structural diagram of a neural network according to embodiments.

As shown in FIG. 1, a neural network 100 according to embodiments may include an input layer 101, a plurality of hidden layers 102, and an output layer 103. Because the neural network 100 includes multiple layers that can extract useful information, it can process complex data sets effectively. Although FIG. 1 shows one input layer 101, three hidden layers 102, and one output layer 103, the neural network 100 is not limited to this and may include fewer or more layers. Furthermore, the neural network 100 may include layers with various structures other than those shown in FIG. 1.

Each of the layers 101, 102, and 103 of the neural network 100 according to embodiments may include a plurality of artificial nodes, also known as neurons, processing elements (PEs), units, or similar terms. For example, as shown in FIG. 1, the input layer 101, each hidden layer 102, and the output layer 103 may each include six nodes. However, this is not limiting, and each of the layers 101, 102, and 103 may include any number of nodes.

The nodes in the layers 101, 102, and 103 are connected to each other, and each node can pass its computation result to the nodes of the next layer. For example, a node may receive data from nodes in the previous layer, perform a computation, and pass the result to nodes in the next layer. Each node may determine its output value based on the weights applied to the outputs received from the nodes of the previous layer and on an activation function.

To compute the neural network 100, matrix multiplication that multiplies input vectors by weights may be performed. Because matrix multiplication is applied to the inputs of every layer and repeated for every input vector, it is the most important basic operator determining overall computation speed as the depth and size of the neural network grow.
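To make the role of matrix multiplication in a layer concrete, here is a minimal sketch of one fully connected layer in plain Python. The function name `dense_forward` and the sigmoid activation are illustrative assumptions; the text does not specify an activation function. Each output node is one row of the weight matrix multiplied by the input vector, which is why this product dominates the cost of every layer.

```python
import math


def dense_forward(x, W, b):
    """One fully connected layer: activation(W x + b), with a sigmoid activation assumed."""
    outputs = []
    for weights, bias in zip(W, b):
        # One row of the matrix-vector product: weighted sum of the previous layer's outputs.
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid activation (assumption)
    return outputs


print(dense_forward([1.0, 1.0], [[1.0, -1.0], [0.0, 0.0]], [0.0, 0.0]))
```

A layer with m outputs and n inputs costs m * n multiply-adds per input vector, repeated for every layer and every input.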

FIG. 2 illustrates matrix multiplication according to embodiments.

FIG. 2 illustrates an example of performing matrix multiplication using compressed sparse matrices according to embodiments.

A sparse matrix according to embodiments may be a matrix in which most entries are 0 and only a small fraction are non-zero. Storing a sparse matrix as-is wastes a great deal of space on zero values, so matrix multiplication according to embodiments may use a compression scheme that records only the positions of the non-zero entries.

FIG. 2(a) shows the CSR scheme for matrix multiplication according to embodiments.

Matrix multiplication according to embodiments may use the CSR (Compressed Sparse Row) scheme. For example, as shown in FIG. 2(a), the non-zero values of an existing matrix 210 may be stored in a data vector 211 row by row, starting from the top row and moving left to right within each row. The data vector 211 therefore stores the values 8, 2, 5, 7, 1, 2, 9. An indices vector 212 stores, in the same order, the column position of each non-zero value in the matrix 210: the non-zero values 8, 2, 5, 7, 1, 2, 9 lie in columns 0, 2, 2, 2, 3, 4, 3, which are stored as 0, 2, 2, 2, 3, 4, 3. An index pointer vector 213 records, as ranges, which entries of the data vector belong to each row. For example, the first row's data lies in the range [0:2], so the first row has two non-zero values, namely the first two entries of the data vector 211 (8 and 2). The next row holds the 5 indicated by the range [2:3], and the row after that has no non-zero values, so it is written as [3:3].
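The CSR layout just described can be sketched in a few lines of Python. FIG. 2(a) is not reproduced here, so the 5 x 5 matrix below is hypothetical: its first three rows follow the text, and rows 3 and 4 are an assumption chosen to match the quoted data and indices vectors. `to_csr` and `csr_matvec` are illustrative names, not an API from the source.

```python
def to_csr(matrix):
    """Build CSR vectors (data, indices, indptr) from a dense list-of-lists matrix."""
    data, indices, indptr = [], [], [0]
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)   # non-zero values, row by row, left to right
                indices.append(col)  # column of each non-zero value
        indptr.append(len(data))     # row i occupies data[indptr[i]:indptr[i + 1]]
    return data, indices, indptr


def csr_matvec(data, indices, indptr, x):
    """Multiply a CSR matrix by a dense vector, touching only the stored non-zeros."""
    return [sum(data[p] * x[indices[p]] for p in range(indptr[i], indptr[i + 1]))
            for i in range(len(indptr) - 1)]


# Hypothetical 5 x 5 matrix: rows 0-2 follow the text; rows 3-4 are assumed.
A = [
    [8, 0, 2, 0, 0],
    [0, 0, 5, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 7, 1, 2],
    [0, 0, 0, 9, 0],
]
data, indices, indptr = to_csr(A)
# data    -> [8, 2, 5, 7, 1, 2, 9]
# indices -> [0, 2, 2, 2, 3, 4, 3]
# indptr  -> [0, 2, 3, 3, 6, 7]
```

The index pointer vector here is written as boundaries, so the ranges [0:2], [2:3], and [3:3] from the text correspond to consecutive pairs of `indptr`.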

FIG. 2(b) shows the CSC scheme for matrix multiplication according to embodiments.

Matrix multiplication according to embodiments may also use the CSC (Compressed Sparse Column) scheme, which works like CSR but records data column by column instead of row by row. For example, as shown in FIG. 2(b), the non-zero values of an existing matrix 220 may be stored in a data vector 221 column by column, starting from the leftmost column and moving top to bottom within each column. The data vector 221 therefore stores the values 8, 2, 5, 7, 1, 9, 2. An indices vector 222 stores, in the same order, the row position of each non-zero value in the matrix 220, here 0, 0, 1, 4, 4, 6, 4. An index pointer vector 223 records, as ranges, which entries of the data vector belong to each column and may, for example, hold the values 0, 1, 1, 4, 6, 7.
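A matching CSC sketch follows. Because FIG. 2(b) is not reproduced, the 7 x 5 matrix below is a hypothetical reconstruction: one matrix consistent with the quoted data (8, 2, 5, 7, 1, 9, 2), row indices (0, 0, 1, 4, 4, 6, 4), and index pointers (0, 1, 1, 4, 6, 7), read here as column boundaries. `to_csc` is an illustrative name.

```python
def to_csc(matrix):
    """Build CSC vectors (data, indices, indptr) from a dense list-of-lists matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    data, indices, indptr = [], [], [0]
    for col in range(n_cols):
        for row in range(n_rows):  # top to bottom within each column
            value = matrix[row][col]
            if value != 0:
                data.append(value)
                indices.append(row)  # row of each non-zero value
        indptr.append(len(data))     # column j occupies data[indptr[j]:indptr[j + 1]]
    return data, indices, indptr


# Hypothetical 7 x 5 matrix reconstructed to match the vectors quoted in the text.
M = [
    [8, 0, 2, 0, 0],
    [0, 0, 5, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 7, 1, 2],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 9, 0],
]
data, indices, indptr = to_csc(M)
# data    -> [8, 2, 5, 7, 1, 9, 2]
# indices -> [0, 0, 1, 4, 4, 6, 4]
# indptr  -> [0, 1, 1, 4, 6, 7]
```

The empty second column shows up as the repeated pointer value 1, just as the empty row does in CSR.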

However, although compressed sparse-matrix representations work well for sparse matrices, matrices in deep learning are often not sparse, so these representations are inefficient in storage space and amount of computation. A computation method that can efficiently represent the wide range of values found in neural networks is therefore required.

FIG. 3 illustrates an example of performing matrix multiplication using bit vectors according to embodiments.

FIG. 3 illustrates an example of performing matrix multiplication using compressed bit vectors according to embodiments.

Matrix multiplication using compressed bit vectors according to embodiments may include converting data 301 into bit vectors 302 and converting the bit vectors 302 into compressed bit vectors 303.

The bit vectors 302 are generated one per distinct value in the data 301: the positions at which a given value occurs in the data are marked with 1 in the bit vector representing that value. For example, the first value in the data 301 is 1, so it is represented by a 1 in the first position of the bit vector 302 for the value 1. Likewise, the second value in the data 301 is 3, so it is represented by a 1 in the second position of the bit vector 302 for the value 3.
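The one-bit-vector-per-distinct-value representation can be sketched as follows. Only the first two data values (1 and 3) come from the text; the rest of the data list and the name `bitmap_index` are illustrative assumptions.

```python
def bitmap_index(data):
    """Return one bit vector per distinct value; bit i is 1 where data[i] equals that value."""
    return {value: [1 if d == value else 0 for d in data] for value in sorted(set(data))}


data = [1, 3, 2, 1, 3]  # hypothetical; only the first two values (1, 3) come from the text
bv = bitmap_index(data)
# bv[1] -> [1, 0, 0, 1, 0]
# bv[2] -> [0, 0, 1, 0, 0]
# bv[3] -> [0, 1, 0, 0, 1]
```

The number of vectors grows with the number of distinct values, which is exactly the drawback the text raises for deep-learning matrices with many different values.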

The compressed bit vectors 303 may use RLE (Run Length Encoding), which compresses repeated 0s or 1s into a smaller representation. Specifically, for a value that repeats, the length of the run is recorded; for example, as shown in FIG. 3, it can be recorded that a group of eight 0s (8*0) appears twice.
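Plain run-length encoding of a bit vector can be sketched with the standard library's `itertools.groupby`. The bit vector below is hypothetical; it starts with sixteen 0s, which a byte-oriented scheme would describe as eight 0s appearing twice, while plain RLE records a single run of length 16.

```python
from itertools import groupby


def run_length_encode(bits):
    """Encode a bit list as (bit value, run length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


def run_length_decode(runs):
    """Expand (bit value, run length) pairs back into a bit list."""
    return [bit for bit, length in runs for _ in range(length)]


bits = [0] * 16 + [1, 0, 1] + [0] * 5  # hypothetical bit vector
runs = run_length_encode(bits)
# runs -> [(0, 16), (1, 1), (0, 1), (1, 1), (0, 5)]
```

Long runs compress well, but short alternating patterns (the 1, 0, 1 above) cost one pair per bit, which is why word-aligned variants mix run words with literal words.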

However, bit-vector representations need as many bit vectors as there are distinct values in the data, so representing deep-learning matrices with many different values requires a large number of bit vectors. In addition, operations cannot be performed directly on compressed bit vectors; they must first be decompressed. A computation method that performs matrix multiplication more efficiently in neural networks is therefore required.

FIG. 4 is a flowchart of a multidimensional matrix multiplication method according to embodiments.

As shown in FIG. 4, the multidimensional matrix multiplication method according to embodiments may include outputting a plurality of input matrices as a plurality of binarized matrices (S401).

The plurality of input matrices according to embodiments are the matrices to be computed, that is, the input matrices on which matrix multiplication is performed. They may include multidimensional matrices, for example three-dimensional matrices, and may be output as binarized matrices through quantization. Here, quantization means representing the element values of the original matrix with fewer bits, for example by training so that the neural network model keeps an accuracy as close as possible to that obtained with the original values while expressing those values with a shorter bit length. The plurality of input matrices may include a first matrix and a second matrix.
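The text does not specify the quantizer used in step S401. As one common choice, here is a minimal sketch of sign binarization with a single scaling factor (the mean absolute value, as in XNOR-Net-style binarization); the function name `binarize` and this particular scheme are assumptions, not the source's method.

```python
def binarize(matrix):
    """Sign binarization (assumed scheme): W is approximated by alpha * sign(W)."""
    flat = [abs(v) for row in matrix for v in row]
    alpha = sum(flat) / len(flat)  # scaling factor: mean absolute value
    signs = [[1 if v >= 0 else -1 for v in row] for row in matrix]  # each entry becomes +1 or -1
    return alpha, signs


alpha, B = binarize([[0.5, -1.5], [2.0, -1.0]])
# alpha -> 1.25
# B     -> [[1, -1], [1, -1]]
```

The binarized matrix keeps the input's shape, and alpha * B serves as a coarse approximation of the original, which is the kind of approximation FIG. 5 refers to.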

A binarized matrix according to embodiments may be a matrix whose values have been binarized through quantization, or whose values can be decomposed into binarized values. A binarized matrix has the same shape as the input matrix, but its entries are binarized values.

As shown in FIG. 4, the multidimensional matrix multiplication method according to embodiments may include compressing the plurality of binarized matrices and converting them into a plurality of compressed bit vectors (S402).

According to embodiments, a binarized matrix may be unfolded and converted into a one-dimensional bit vector. The converted bit vector may be compressed using RLE (Run Length Encoding), specifically by recording the values of the bits it contains.
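Unfolding a binarized matrix or tensor into a one-dimensional bit vector can be sketched as a row-major flatten followed by a mapping of +1 to 1 and -1 to 0. The row-major order, the bit mapping, and the names `unfold` and `to_bit_vector` are assumptions; the text does not fix an unfolding order.

```python
def unfold(tensor):
    """Flatten a nested list (matrix or higher-order tensor) in row-major order."""
    if not isinstance(tensor, list):
        return [tensor]
    out = []
    for item in tensor:
        out.extend(unfold(item))
    return out


def to_bit_vector(binarized):
    """Map binarized entries to bits: +1 -> 1, -1 -> 0 (assumed encoding)."""
    return [1 if v > 0 else 0 for v in unfold(binarized)]


T = [[[1, -1], [-1, -1]], [[1, 1], [-1, 1]]]  # hypothetical 2 x 2 x 2 binarized tensor
# to_bit_vector(T) -> [1, 0, 0, 0, 1, 1, 0, 1]
```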

A compressed bit vector according to embodiments may contain, as bits, the values 0 and 1 and the number of times a 0 or 1 is repeated. Specifically, each value of the compressed bit vector may include a flag (0 or 1) indicating whether it is compressed, the bit value (0 or 1), and the number of times that 0 or 1 is repeated. As a result, the multidimensional matrix multiplication method according to embodiments can greatly reduce the size of the input matrices through compressed bit vectors.
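The word format described (compression flag, fill bit, repeat count) resembles word-aligned hybrid (WAH) bitmap compression. The sketch below assumes a WAH-style layout with 8-bit words for readability; the word width, bit positions, and names are all assumptions, not the patent's specification. Bit 7 is the flag (1 means a compressed fill word), bit 6 is the fill value, and bits 0 to 5 count how many 7-bit groups the fill covers; a literal word (flag 0) stores one 7-bit group verbatim.

```python
GROUP = 7                # payload bits per literal word (8-bit word minus the flag bit); assumption
MAX_RUN = (1 << 6) - 1   # 6 bits remain for the group count in a fill word


def wah_encode(bits):
    """Encode a bit list into 8-bit words: literal (flag 0) or fill (flag 1, fill bit, count)."""
    assert len(bits) % GROUP == 0, "pad the vector to a multiple of GROUP first"
    words = []
    for start in range(0, len(bits), GROUP):
        group = bits[start:start + GROUP]
        if all(b == group[0] for b in group):  # all 0s or all 1s -> fill word
            fill = group[0]
            last = words[-1] if words else None
            if (last is not None and last >> 7 == 1 and (last >> 6) & 1 == fill
                    and last & MAX_RUN < MAX_RUN):
                words[-1] = last + 1  # extend the previous fill run by one group
            else:
                words.append((1 << 7) | (fill << 6) | 1)
        else:
            literal = 0
            for b in group:
                literal = (literal << 1) | b
            words.append(literal)  # flag bit 7 stays 0
    return words


def wah_decode(words):
    """Expand fill and literal words back into a bit list."""
    bits = []
    for w in words:
        if w >> 7:  # fill word
            bits.extend([(w >> 6) & 1] * (GROUP * (w & MAX_RUN)))
        else:       # literal word: unpack GROUP bits, most significant first
            bits.extend((w >> (GROUP - 1 - i)) & 1 for i in range(GROUP))
    return bits


bits = [0] * 14 + [1, 0, 1, 0, 0, 0, 0] + [1] * 7
words = wah_encode(bits)
# words -> [130, 80, 193]  (0-fill x2, literal 1010000, 1-fill x1)
```

Real WAH implementations typically use 32- or 64-bit words so that whole machine words can be processed at once; 8 bits is used here only to keep the numbers small.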

As shown in FIG. 4, the method according to the embodiments may include calculating the bit at a specific position on the plurality of compressed bit vectors (s403).

According to embodiments, a cursor may be used on a compressed bit vector to find the position of a specific bit or to read the value of the bit at a specific position. The cursor is a virtual cursor. It maps the position of a bit in the compressed bit vector back to that bit's position in the uncompressed bit vector. To obtain the uncompressed position, the cursor may track two things: its position within the currently visited compressed word, and the corresponding visited position in the compressed representation.
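The position restoration performed by the cursor can be sketched as follows. The sketch assumes the 32-bit word layout described for FIG. 6:

- Bit 31 is a flag. A value of 1 marks a compressed run of all-zero 31-bit groups, and the run length is stored in the low 31 bits.
- A value of 0 marks a literal word that holds 31 raw bits.
- Group offset t is stored in bit t.

The helper names are hypothetical. "Bits remaining" is interpreted here as the number of bits after the cursor within the current word's span; this interpretation is also an assumption. The example vector is built by hand.

```python
GROUP = 31        # payload bits per 32-bit word (assumed layout)
FLAG = 1 << 31    # set on compressed words that encode a run of zero groups


def span(word):
    """Number of uncompressed bits covered by one compressed word."""
    if word & FLAG:
        return GROUP * (word & (FLAG - 1))   # run length times group size
    return GROUP


def locate(words, pos):
    """Cursor state for uncompressed position pos:
    (word index, is-compressed flag, offset within the word, bits remaining after it)."""
    base = 0                                  # uncompressed position where the current word starts
    for index, word in enumerate(words):
        width = span(word)
        if pos < base + width:
            offset = pos - base
            return index, bool(word & FLAG), offset, width - offset - 1
        base += width
    raise IndexError("position is past the end of the bit vector")


def restore_position(words, index, offset):
    """Map a cursor (word index, offset) back to the uncompressed bit position."""
    return sum(span(w) for w in words[:index]) + offset


def bit_at(words, pos):
    """Read the bit at uncompressed position pos without decompressing the vector."""
    index, compressed, offset, _ = locate(words, pos)
    return 0 if compressed else (words[index] >> offset) & 1


words = [1 << 5, FLAG | 256, 1 << 3, 0]    # literal, run of 256 zero groups, literal, tail
print(locate(words, 7970))                  # (2, False, 3, 27)
print(restore_position(words, 2, 3))        # 7970
print(bit_at(words, 7970), bit_at(words, 100))  # 1 0
```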

The bits of the vector are divided into units of a fixed size. When such a unit contains only 0s, the cursor lets the method skip that region instead of examining it, which shortens processing time. Specifically, the cursor may skip visit units until it reaches a bit whose value is 1. It may also be moved directly to the position of a specific bit.

As shown in FIG. 4, the method according to the embodiments may include multiplying the plurality of compressed bit vectors and outputting a result matrix (s404). This multiplication may use an XNOR operator together with a bitcount operation, which counts the bits whose value is 1.

According to embodiments, compressed bit vectors may be logically grouped for computation. Specifically, the bits of a compressed bit vector may be divided into groups of logical blocks. The size of each block, in bits, may equal the number of columns of the first input matrix. Alternatively, it may equal the number of rows of the second input matrix.

According to embodiments, a preset operation may determine whether grouped bits match or mismatch one another. Specifically, for the groups of bits being operated on, the preset operation may determine whether matches or mismatches predominate. The preset operation may be exclusive NOR (XNOR). The number of 1s in the grouped bits may then be counted, for example by applying the bitcount function to the XNOR result.
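The XNOR-and-bitcount step can be sketched for one pair of grouped blocks. When +1/-1 values are encoded as bits, XNOR sets a 1 wherever the two inputs agree. The dot product is then matches minus mismatches, which equals 2 × matches − n. This identity is the standard one used with binarized networks. The source only states that the 1s are counted, so the conversion from count to dot product is an assumption, as are the helper names.

```python
def pack(signs):
    """Pack a +/-1 vector into an int: element t -> bit t, +1 -> 1, -1 -> 0."""
    value = 0
    for t, s in enumerate(signs):
        if s > 0:
            value |= 1 << t
    return value


def xnor_bitcount_dot(a, b, n):
    """Dot product of two packed n-element +/-1 vectors via XNOR and bitcount."""
    mask = (1 << n) - 1
    matches = bin(~(a ^ b) & mask).count("1")   # Bitcount(XNOR(a, b))
    return 2 * matches - n                       # matches - mismatches


x = [1, -1, 1, 1]
y = [1, 1, -1, 1]
print(xnor_bitcount_dot(pack(x), pack(y), 4))  # 0, the same as sum(p * q for p, q in zip(x, y))
```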

According to embodiments, a bit position in the output result matrix may be set based on the calculated number of 1s. That is, the number of 1s determines the position in the result matrix at which each bit value is entered.

That is, the method according to the embodiments may multiply the bit vectors while they remain compressed, without decompressing them, and output the result matrix.

As shown in FIG. 4, the method according to the embodiments may include transforming the result matrix according to the position of a specific bit on the compressed bit vector (s405).

According to embodiments, the output result matrix may be converted into an uncompressed result matrix. The conversion is based on the position information, obtained by the cursor, of specific bits in the uncompressed bit vector.

Here, according to embodiments, a preset operation may determine whether logically grouped bits match or mismatch one another. Specifically, for the groups of bits being operated on, the preset operation may determine whether matches or mismatches predominate. The preset operation may be XNOR. The number of 1s in the grouped bits may be counted, for example by applying the bitcount function to the XNOR result. The calculated number of 1s then determines the position in the output result matrix at which each bit value is entered.

When the operation has been completed for the bits in the first group of the compressed bit vector, the method according to the embodiments may repeat the operation for the bits in the second group. In this case, step s403 may be performed for the second group.

Steps s401 to s403 may be performed for all of the grouped block units, including the first group and the second group. When they have been, the method according to the embodiments may circularly shift the bit vector to the left by the block-unit size and then perform step s404.

The following describes in detail how an input matrix is output as a binarized matrix.

FIG. 5 is a diagram schematically showing how, according to embodiments, binarizing the input matrices and multiplying the binarized matrices can approximate the result of multiplying the original input matrices directly.

As shown in FIG. 5, the input matrix 510 may include a plurality of multidimensional matrices or tensors to be computed. Examples of the input matrix 510 are the input layer described with reference to FIG. 1, the data described with reference to FIGS. 2 and 3, and the input matrix described with reference to FIG. 4. FIG. 5 shows the input matrix 510 in three dimensions, but the method is not limited to three dimensions and may support any number of dimensions. The input matrix 510 may include, for example, a first matrix I and a second matrix W.

As shown in FIG. 5, the values in the input matrix 510 may be binarized and output as a binarized matrix 520 (e.g., the binarized matrix described with reference to FIG. 4). Specifically, the values may be quantized and binarized. Alternatively, they may be output as a binarized matrix 520 composed of values that can be decomposed into binarized values. When the input matrix can be decomposed in this way, reference numeral 521 denotes a matrix K or a constant α used to recover the values before decomposition.

For example, the value 0.2 in the first matrix I may become 1 when binarized, and the value -5 in the second matrix W may become -1. The binarized matrix 520 has a shorter bit length than the input matrix 510, yet its bits can carry values equal or close to those of the input matrix 510. The binarized matrix 520 may then be converted into a bit vector.
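The binarization step of FIG. 5 can be sketched as sign binarization with a single scaling constant α equal to the mean absolute value of the entries. This is one common choice in binarized networks (XNOR-Net style). The source does not say how α or K is computed, so that choice is an assumption. Mapping 0 to +1 and the function name are also assumptions.

```python
def binarize(matrix):
    """Return (signs, alpha): signs[i][j] in {+1, -1} and alpha = mean(|x|),
    so that alpha * signs approximates the original matrix."""
    flat = [v for row in matrix for v in row]
    alpha = sum(abs(v) for v in flat) / len(flat)
    signs = [[1 if v >= 0 else -1 for v in row] for row in matrix]
    return signs, alpha


signs, alpha = binarize([[0.2, -5.0], [1.5, -0.3]])
print(signs)   # [[1, -1], [1, -1]]: 0.2 becomes 1 and -5 becomes -1, as in FIG. 5
```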

The following describes in detail how a binarized matrix that has been converted into a bit vector is compressed into a compressed bit vector.

FIG. 6 is a diagram schematically illustrating, according to embodiments, the compression of a bit vector generated by unfolding a matrix or tensor.

The method according to the embodiments (e.g., the matrix multiplication described with reference to FIGS. 1 to 3, or the multidimensional matrix multiplication method described with reference to FIGS. 4 and 5) may include representing a binarized matrix as a compressed bit vector. This corresponds, for example, to step s402 described with reference to FIG. 4. The binarized matrix is, for example, the one described with reference to FIGS. 4 and 5, and the compressed bit vector is, for example, the one described with reference to FIG. 4.

FIG. 6(a) shows a bit vector obtained by unfolding a binarized matrix or tensor. The unfolded bit vector consists of, for example, 8,000 bits, each of which can only be 0 or 1. For convenience of description, it is assumed below that one word in an arithmetic device, including a computing device, consists of 32 bits. The word size is not limited to this.

As shown in FIG. 6(b), because one word is 32 bits and one bit of each word is used as a flag, the bits may be grouped 31 at a time. For example, 7,998 of the 8,000 bits form 258 groups (258 × 31 = 7,998), leaving 2 bits. The 258 groups may be divided into groups that contain a 1 and groups that do not. For example, 2 groups may contain a 1 and 256 groups may not. In this case, as shown in FIG. 6(b), the two kinds of groups may be separated and expressed as 31 bits and 256 × 31 bits, respectively.

As shown in FIG. 6(c), a group that contains a 1 may be left uncompressed. In this case, the first bit of the group is set to 0.

As shown in FIG. 6(c), groups that contain no 1 may be compressed. In this case, the first bit is set to 1. The number of times the group is repeated (the run length, here 256) may be recorded in the remaining bits. For example, it may be recorded as 100000000, which is 256 in binary.

That is, as shown in FIG. 6(c), each group is compressed or left as is depending on whether it contains a 1, and the resulting compressed bit vector is output.
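The grouping and compression of FIG. 6 can be sketched in a style similar to word-aligned hybrid (WAH) encoding. A literal word has flag 0 and holds 31 raw bits. A compressed word has flag 1 and holds the number of consecutive all-zero 31-bit groups. The following details are assumptions, as are the function names:

- Group offset t is stored in bit t.
- The final partial group is kept as a literal.
- Run lengths above 2^31 - 1 are not handled.

The usage reproduces the 8,000-bit case. For illustration only, the two groups that contain a 1 are placed at the start and at the end.

```python
GROUP = 31        # payload bits per 32-bit word; the remaining bit is the flag
FLAG = 1 << 31    # flag = 1: compressed run of all-zero groups; flag = 0: literal group


def compress(bits):
    """Compress a list of 0/1 values into 32-bit words in the style of FIG. 6."""
    words = []
    for start in range(0, len(bits), GROUP):
        group = bits[start:start + GROUP]
        if len(group) == GROUP and not any(group):
            # group without a 1: extend the preceding run word or open a new one
            if words and words[-1] & FLAG:
                words[-1] += 1
            else:
                words.append(FLAG | 1)
        else:
            # group containing a 1 (or the final partial group): keep it uncompressed
            literal = 0
            for t, b in enumerate(group):
                literal |= b << t
            words.append(literal)
    return words, len(bits)


def decompress(words, nbits):
    """Inverse of compress(), used here only to check the round trip."""
    bits = []
    for w in words:
        if w & FLAG:
            bits.extend([0] * (GROUP * (w & (FLAG - 1))))
        else:
            bits.extend((w >> t) & 1 for t in range(GROUP))
    return bits[:nbits]


bits = [0] * 8000
bits[5] = 1                  # a 1 in group 0
bits[257 * GROUP + 3] = 1    # a 1 in group 257, the last full group
words, nbits = compress(bits)
print(len(words))                         # 4: literal, run, literal, 2-bit tail
print(bin(words[1] & (FLAG - 1))[2:])     # 100000000 (run length 256)
```

The 8,000-bit vector shrinks to four 32-bit words. The run word records all 256 empty groups at once.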

The following describes in detail how the position of a specific bit on a compressed bit vector is calculated.

FIG. 7 is a diagram schematically illustrating, according to embodiments, how a cursor calculates the position of a specific bit.

The method according to the embodiments (e.g., the matrix multiplication described with reference to FIGS. 1 to 3, or the multidimensional matrix multiplication method described with reference to FIGS. 4 to 6) may include obtaining the position information of a specific bit through a cursor. The cursor is, for example, the cursor or virtual cursor described with reference to FIG. 4, and this step corresponds, for example, to step s403 described with reference to FIG. 4. Specifically, for a compressed bit vector (e.g., the one described with reference to FIGS. 4 and 6), the method may obtain the information needed to map a bit's position back to its position in the uncompressed state.

As shown in FIG. 7(a), the cursor according to the embodiments may skip bit-unit visits until the next 1 bit appears. For example, the cursor may jump from one unit containing a 1 to the next unit containing a 1, skipping any units that contain only 0s.

As shown in FIG. 7(b), the cursor according to the embodiments may also be moved to the position of a specific bit, for example a bit whose value is 1.
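The skipping behavior of FIG. 7(a) can be sketched as a generator over the compressed words. It assumes the same hypothetical word layout as in the FIG. 6 description: bit 31 is the flag, a flagged word holds the count of zero 31-bit groups, and group offset t is stored in bit t. A whole run of zero groups is skipped in one step. Inside a literal word, the generator jumps directly from one set bit to the next. The function name and the example words are illustrative assumptions.

```python
GROUP = 31        # payload bits per 32-bit word (assumed layout)
FLAG = 1 << 31    # set on compressed words that encode a run of zero groups


def one_positions(words, nbits):
    """Yield the uncompressed positions of all 1 bits, as a skipping cursor would visit them."""
    base = 0                                   # uncompressed position of the current word's first bit
    for w in words:
        if w & FLAG:
            base += GROUP * (w & (FLAG - 1))   # FIG. 7(a): jump over the whole zero run at once
            continue
        payload = w
        while payload:
            lowest = payload & -payload        # isolate the lowest set bit
            pos = base + lowest.bit_length() - 1
            if pos >= nbits:                   # ignore padding in the final partial group
                return
            yield pos
            payload ^= lowest                  # clear it and move on to the next 1
        base += GROUP


words = [1 << 5, FLAG | 256, 1 << 3, 0]        # literal, run of 256 zero groups, literal, tail
print(list(one_positions(words, 8000)))       # [5, 7970]
```

The run word is handled with a single addition. The cost therefore depends on the number of compressed words and 1 bits, not on the 8,000 uncompressed positions.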

The following describes in detail matrix multiplication using compressed bit vectors.

FIG. 8 is a diagram schematically illustrating, according to embodiments, matrix multiplication performed on compressed bit vectors.

The method according to the embodiments (e.g., the matrix multiplication described with reference to FIGS. 1 to 3, or the multidimensional matrix multiplication method described with reference to FIGS. 4 to 6) may include outputting a result matrix 830. The result matrix 830 is, for example, the output layer described with reference to FIG. 1 or the result matrix described with reference to FIG. 4, and this step corresponds, for example, to steps s404 and s405 described with reference to FIG. 4.

Specifically, the method may represent the input matrix 810 as an unfolded vector 820 and output the result matrix 830 from that vector. The input matrix 810 is, for example, the input layer described with reference to FIG. 1, the data described with reference to FIGS. 2 and 3, or the input matrix described with reference to FIGS. 4 and 5. The unfolded vector 820 is, for example, the unfolded bit vector described with reference to FIG. 6.

The input matrix 810 according to embodiments may include an input matrix I and a weight matrix W. The quantization technique described with reference to FIGS. 4 and 5 may convert I and W into matrices whose entries take the value 0 or 1. The numbers 1 to 15 shown in the input matrices 810 of FIG. 8 label the positions of the entries so that those positions are easy to follow. That is, quantization may convert the input matrix 810 into a binarized matrix (e.g., the binarized matrix described with reference to FIGS. 4 to 6).

The unfolding process described with reference to FIGS. 4 and 6 may convert the binarized matrices according to embodiments into unfolded vectors 820. Through unfolding, the entries of the input matrix I and of the weight matrix W are laid out in the unfolded vectors unfold(I) and unfold(W). Matching numbers in unfold(I) and unfold(W) indicate entries that are to be multiplied together. For example, the 1 in unfold(I) is multiplied by the 1 in unfold(W).

Unfolding removes the distinction between rows and columns. To compensate, the method according to the embodiments may group the bits into logical blocks whose size equals the number of columns of the input matrix I or the number of rows of the weight matrix W. This grouping forms the grouped bits (e.g., the grouped bits described with reference to FIG. 4).

Within each grouped interval, the method may then perform XNOR and/or bitcount to obtain a result value, for example as Bitcount(XNOR(I, W)).

As shown in FIG. 8, once the result matrix 831 has been output for the unfolded vector 821, a left circular shift moves the computation to the next interval of the vector, and the operation is repeated there. The circular shift is, for example, the one described with reference to FIG. 4. That is, the operation may be performed on the vector 822, represented as a bit vector, and the result matrix 832 output accordingly.

The result matrix 830 may record the result values in order at positions ①, ②, ③, ④, ⑤, and ⑥. This minimizes the number of bit-level operations and allows efficient computation between multidimensional matrices.
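The FIG. 8 schedule can be sketched on packed but uncompressed bit vectors, which is how FIG. 8 itself is drawn. The sketch makes the following assumptions:

- Entries are +1/-1.
- I is unfolded row-major and W column-major.
- The block size k equals the number of columns of I (the number of rows of W).
- Vectors are packed most-significant-bit first, so a left circular shift by k brings the next column block of W into alignment.
- Each result entry is 2 × matches − k.

Pairing row i with column (i + s) mod n after s shifts is this sketch's scheduling choice. The ordering ① to ⑥ in FIG. 8 may differ. All function names are hypothetical.

```python
def pack_msb_first(bits):
    """Pack a bit list into an int with the first bit in the most significant position."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def rotl(value, shift, length):
    """Circularly shift a length-bit integer left by shift bits."""
    mask = (1 << length) - 1
    return ((value << shift) | (value >> (length - shift))) & mask


def block(value, index, k, length):
    """Return the index-th k-bit block, counting from the most significant end."""
    return (value >> (length - (index + 1) * k)) & ((1 << k) - 1)


def binary_matmul(I, W):
    """Multiply an m x k matrix I by a k x n matrix W, both with +/-1 entries."""
    m, k, n = len(I), len(W), len(W[0])
    i_vec = pack_msb_first([1 if v > 0 else 0 for row in I for v in row])  # unfold(I), row-major
    w_vec = pack_msb_first([1 if W[r][c] > 0 else 0
                            for c in range(n) for r in range(k)])       # unfold(W), column-major
    i_len, w_len, mask = m * k, n * k, (1 << k) - 1
    result = [[0] * n for _ in range(m)]
    for s in range(n):                        # one pass per circular shift
        for i in range(m):
            j = (i + s) % n                   # column block currently aligned with row i
            a = block(i_vec, i, k, i_len)
            b = block(w_vec, i % n, k, w_len)
            matches = bin(~(a ^ b) & mask).count("1")   # Bitcount(XNOR(I, W))
            result[i][j] = 2 * matches - k
        w_vec = rotl(w_vec, k, w_len)         # left circular shift by one block
    return result


I = [[1, -1, 1], [-1, -1, 1]]
W = [[1, -1], [1, 1], [-1, 1]]
print(binary_matmul(I, W))   # [[-1, -1], [-3, 1]]
```

In the method described here, the same schedule would run on the compressed words through the cursor instead of on fully packed vectors.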

For convenience of description, FIG. 8 shows the vectors computed with XNOR in the uncompressed state so that the positions of the matrix elements are visible. In actual processing, the operations shown in FIG. 8 may be performed on the compressed bit vectors, as shown in FIGS. 6 and 7.

FIG. 9 is a diagram schematically showing the configuration of a multidimensional matrix multiplication operation device according to embodiments.

The multidimensional matrix multiplication operation device 900 according to embodiments may include a binarization module 910, a bit vector compression module 920, a cursor module 930, and a multiplication operation module 940.

The binarization module 910 according to embodiments may quantize a plurality of input matrices and output a plurality of binarized matrices. The input matrices are, for example, the input layer described with reference to FIG. 1, the data described with reference to FIGS. 2 and 3, or the input matrices described with reference to FIGS. 4, 5, and 8. The binarized matrices are, for example, those described with reference to FIGS. 4 to 6 and 8. The binarization module 910 may perform, for example, step s401 described with reference to FIG. 4.

The bit vector compression module 920 according to embodiments may convert a plurality of binarized matrices into a plurality of compressed bit vectors (e.g., the compressed bit vectors described with reference to FIGS. 4, 6, and 7). It may perform, for example, step s402 described with reference to FIG. 4. Specifically, the module may unfold the binarized matrices into bit vectors and compress those bit vectors by recording the values of the bits they contain. The values of a compressed bit vector may include 0, 1, and the number of times 0 and 1 are repeated.

The cursor module 930 according to embodiments may calculate the position of a specific bit, or the value of the bit at a specific position, on the plurality of compressed bit vectors. It may obtain a specific bit (e.g., the position of a 1) using a cursor, such as the cursor or virtual cursor described with reference to FIGS. 4 and 5. The cursor module 930 may perform, for example, step s403 described with reference to FIG. 4. As shown in FIG. 7, it may also skip computation, so that 0 values are not examined until the next bit whose value is 1.

The multiplication operation module 940 according to embodiments may multiply the plurality of compressed bit vectors and output a result matrix (e.g., the result matrix described with reference to FIGS. 4 and 8). It may perform, for example, steps s404 and s405 described with reference to FIG. 4.

Specifically, the multiplication operation module 940 may group the bits of a compressed bit vector into blocks. The block size equals at least one of the number of columns and the number of rows of the plurality of input matrices. The module may then perform a preset operation on the grouped bits, and the preset operation may be exclusive NOR (XNOR). For example, the module may count the number of 1s in the grouped bits by applying the bitcount function to the XNOR result. It may then output the result matrix by setting the number of 1s as the position.

Once the result matrix for the first group has been output, the multiplication operation module 940 may multiply the compressed bit vectors for the next group. When multiplication has been performed on all of the grouped bits, the module may circularly shift the bit vector in the compressed bit vector by the block-unit size (e.g., the circular shift described with reference to FIGS. 4 and 8). It then continues multiplying the compressed bit vectors.

The multidimensional matrix multiplication method, arithmetic device, and storage medium storing a program according to the embodiments speed up matrix multiplication by using compressed bit vectors. As a result, deep learning models can be processed efficiently on low-performance PCs, embedded devices, and mobile devices that lack high-performance computing hardware such as GPUs.

Terms such as "first" and "second" may be used to describe various components of the embodiments. However, these terms do not limit the components; they are used only to distinguish one component from another. For example, a first user input signal may be referred to as a second user input signal, and a second user input signal may likewise be referred to as a first user input signal, without departing from the scope of the various embodiments. The first and second user input signals are both user input signals. Unless the context clearly indicates otherwise, however, they do not denote the same user input signal.

The terms used to describe the embodiments are intended to describe specific embodiments, not to limit them. As used in the description of the embodiments and in the claims, singular forms include the plural unless the context clearly indicates otherwise. The expression "and/or" covers all possible combinations of the listed terms. The expression "include" indicates that features, numbers, steps, elements, and/or components are present. It does not exclude additional features, numbers, steps, elements, and/or components. Conditional expressions such as "if" and "when" in the description of the embodiments are not limited to optional cases. They mean that, when a specific condition is satisfied, the related operation is performed or the related definition applies in response to that condition.

The above description merely illustrates the technical idea. Those of ordinary skill in the art to which the embodiments belong may make various modifications and variations without departing from the essential characteristics of the embodiments.

Therefore, the embodiments disclosed above are intended to explain, not to limit, the technical idea of the present invention, and they do not limit the scope of that technical idea.

The scope of protection of the present invention should be interpreted according to the claims below. All technical ideas within the range of their equivalents should be interpreted as falling within the scope of the present invention.

810: input matrix
820: unfolded vector
830: result matrix

Claims

performing quantization on a plurality of input matrices and outputting a plurality of binarized matrices;
compressing the plurality of binarized matrices and converting them into a plurality of compressed bit vectors;
restoring the position information of a specific bit through a cursor that includes, for the plurality of compressed bit vectors, whether each word constituting the bit vector is compressed, the position of the word currently visited by the cursor, the position of the bit indicated by the cursor within the currently visited word, and the number of bits remaining in that word;
outputting a resulting matrix by performing multiplication between the plurality of compressed bit vectors; and
transforming the resulting matrix in correspondence with positional information of the specific bit on the compressed bit vector; including,
Multidimensional matrix multiplication method.

2. The multidimensional matrix multiplication method of claim 1, wherein converting into the plurality of compressed bit vectors comprises:
converting the plurality of binarized matrices into bit vectors by unfolding them; and
compressing the converted bit vectors by recording the values of the bits they contain.

3. The multidimensional matrix multiplication method of claim 1, wherein the compressed bit vector is represented by 0s and 1s.

4. The multidimensional matrix multiplication method of claim 1, wherein the plurality of input matrices include a first matrix and a second matrix, and outputting the result matrix further comprises:
grouping the bits included in the compressed bit vector into logical block units, each containing as many bits as at least one of the number of columns of the first matrix and the number of rows of the second matrix;
performing a predetermined operation on the grouped bits;
calculating the number of 1s included in the new bit vector obtained through the predetermined operation; and
outputting the result matrix by setting the number of 1s as a position.

5. The multidimensional matrix multiplication method of claim 4, wherein the predetermined operation comprises an XNOR (exclusive NOR) operation and a bitcount operation.

6. The multidimensional matrix multiplication method of claim 4, wherein the bits grouped in logical block units include a first group and a second group, and outputting the result matrix comprises outputting a result matrix for the second group following the output of the result matrix for the first group.

7. The multidimensional matrix multiplication method of claim 6, further comprising, when the result matrix has been output for all of the bits grouped in logical block units:
circularly shifting a bit vector included in the compressed bit vector by the size of the logical block unit;
grouping the bits included in the compressed bit vector into logical block units, each containing as many bits as at least one of the number of columns of the first matrix and the number of rows of the second matrix;
performing the predetermined operation on the grouped bits;
calculating the number of 1s included in the new bit vector obtained through the predetermined operation; and
outputting the result matrix by setting the number of 1s as a position.

8. A multidimensional matrix multiplication unit, comprising:
a binarization module that performs quantization on a plurality of input matrices and outputs a plurality of binarized matrices;
a bit vector compression module that converts the plurality of binarized matrices into a plurality of compressed bit vectors;
a cursor module that restores positional information of a specific bit on the plurality of compressed bit vectors through a cursor that includes whether each word constituting the bit vector is compressed, the position of the word currently visited by the cursor, the position of the bit indicated by the cursor within that word, and the number of bits remaining in that word; and
a multiplication operation module that outputs a result matrix by performing multiplication of the plurality of compressed bit vectors.

9. The multidimensional matrix multiplication unit of claim 8, wherein the bit vector compression module unfolds the plurality of binarized matrices to convert them into bit vectors, and compresses the converted bit vectors by recording the values of the bits they contain.

10. The multidimensional matrix multiplication unit of claim 8, wherein the bit vector is represented by 0s and 1s.

11. The multidimensional matrix multiplication unit of claim 8, wherein the plurality of input matrices include a first matrix and a second matrix, and the multiplication operation module:
groups the bits included in the compressed bit vector into logical block units, each containing as many bits as at least one of the number of columns of the first matrix or the number of rows of the second matrix;
performs a predetermined operation on the grouped bits;
calculates the number of 1s included in the new bit vector obtained through the predetermined operation; and
outputs the result matrix by setting the number of 1s as a position.

12. The multidimensional matrix multiplication unit of claim 11, wherein the predetermined operation comprises an XNOR operation and a bitcount operation.

13. The multidimensional matrix multiplication unit of claim 11, wherein the bits grouped in logical block units include a first group and a second group, and the multiplication operation module outputs a result matrix for the second group following the output of the result matrix for the first group.

14. The multidimensional matrix multiplication unit of claim 13, wherein, when the multiplication operation module has output a result matrix for all of the bits grouped in logical block units, the multiplication operation module:
cyclically shifts a bit vector included in the compressed bit vector by the size of the logical block unit;
groups the bits included in the compressed bit vector into logical block units, each containing as many bits as at least one of the number of columns of the first matrix or the number of rows of the second matrix;
performs the predetermined operation on the grouped bits;
calculates the number of 1s included in the new bit vector obtained through the predetermined operation; and
outputs the result matrix by setting the number of 1s as a position.

15. A storage medium storing a multidimensional matrix multiplication program that executes:
performing quantization on a plurality of input matrices and outputting a plurality of binarized matrices;
converting the plurality of binarized matrices into a plurality of compressed bit vectors;
returning positional information of a specific bit on the plurality of compressed bit vectors through a cursor that includes whether each word constituting the bit vector is compressed, the position of the word currently visited by the cursor, the position of the bit indicated by the cursor within that word, and the number of bits remaining in that word;
outputting a result matrix by performing multiplication between the plurality of compressed bit vectors; and
transforming the result matrix in response to the positional information of the specific bit on the compressed bit vector.
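The claims describe grouping bits into logical blocks, applying XNOR and bitcount to each pair of blocks, and circularly shifting a bit vector by one block to pair the next groups. The following Python sketch illustrates that idea on uncompressed bits only; it does not model the compressed format or the cursor. It assumes that k equals both the number of columns of the first matrix and the number of rows of the second, that the second matrix is unfolded column by column, that the result is square (so one full rotation visits every row/column pair), and that bit 1 encodes +1 and bit 0 encodes -1, the usual convention in binarized neural networks, under which a length-k dot product equals 2 * count - k. All function names are hypothetical.

```python
from typing import List

def to_groups(bit_vector: List[int], k: int) -> List[int]:
    """Split a bit vector into k-bit logical blocks, each packed MSB-first into an int."""
    groups = []
    for start in range(0, len(bit_vector), k):
        value = 0
        for b in bit_vector[start:start + k]:
            value = (value << 1) | b
        groups.append(value)
    return groups

def xnor_popcount(a: int, b: int, k: int) -> int:
    """XNOR two k-bit blocks and count the 1s (positions where the bits agree)."""
    mask = (1 << k) - 1
    return bin(~(a ^ b) & mask).count("1")

def binary_matmul_shifted(a_bits: List[int], b_bits: List[int], k: int) -> List[List[int]]:
    """Bit counts for every (row of A, column of B) pair, visited by circular shifts.

    a_bits: first matrix unfolded row by row (one k-bit block per row).
    b_bits: second matrix unfolded column by column (one k-bit block per column).
    """
    a_groups = to_groups(a_bits, k)
    b_groups = to_groups(b_bits, k)
    m, n = len(a_groups), len(b_groups)
    if m != n:
        raise ValueError("this rotation sketch assumes as many rows in A as columns in B")
    counts = [[0] * n for _ in range(m)]
    for shift in range(n):
        # Rotating B's vector left by `shift` blocks (shift * k bits)
        # lines up row i of A with column (i + shift) % n of B.
        rotated = b_groups[shift:] + b_groups[:shift]
        for i in range(m):
            counts[i][(i + shift) % n] = xnor_popcount(a_groups[i], rotated[i], k)
    return counts

def to_signed(counts: List[List[int]], k: int) -> List[List[int]]:
    """Map bit counts to +/-1 dot products: dot = 2 * count - k (assumed encoding)."""
    return [[2 * c - k for c in row] for row in counts]

counts = binary_matmul_shifted([1, 0, 1, 0, 1, 0], [1, 0, 1, 1, 1, 0], k=3)
print(counts)                # [[3, 1], [0, 2]]
print(to_signed(counts, 3))  # [[3, -1], [-3, 1]]
```

In this example the first matrix is 2x3 with rows [1, 0, 1] and [0, 1, 0], and the second is 3x2 with columns [1, 0, 1] and [1, 1, 0]; the signed output matches the ordinary product of the corresponding +/-1 matrices. A non-square result would need padding or a direct loop over all (i, j) pairs instead of the rotation.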