KR20220137443A

KR20220137443A - Method, apparatus and storage for storing a program for multi-demensional matrix multiplication

Info

Publication number: KR20220137443A
Application number: KR1020210043533A
Authority: KR
Inventors: 이경하; 성원경; 김은희
Original assignee: 한국과학기술정보연구원
Priority date: 2021-04-02
Filing date: 2021-04-02
Publication date: 2022-10-12
Also published as: KR102572429B1

Abstract

The embodiments relate to a multidimensional matrix multiplication method, an arithmetic device, and a program that improve multiplication performance for large multidimensional matrices by deriving the result matrix through multiplication between compressed bit vectors. The method comprises: quantizing a plurality of input matrices and outputting them as a plurality of binarized matrices; converting the binarized matrices into a plurality of compressed bit vectors; calculating the position of a specific bit on the compressed bit vectors; outputting a result matrix by multiplying the compressed bit vectors; and transforming the result matrix.

Description

A storage medium for storing a compressed multidimensional matrix multiplication method, an arithmetic unit and a program {METHOD, APPARATUS AND STORAGE FOR STORING A PROGRAM FOR MULTI-DEMENSIONAL MATRIX MULTIPLICATION}

The embodiments relate to a matrix multiplication method, an arithmetic device, and a program, and more specifically to a compressed multidimensional matrix multiplication method, arithmetic device, and program.

Recently, interest in and use of artificial intelligence and deep learning have been increasing. Accordingly, interest in and use of deep neural networks for deep learning are also increasing.

Training and inference of a deep neural network require matrix multiplication operations that multiply an input vector, matrix, or tensor by a weight matrix or tensor. As the depth and size of the network grow, matrix multiplication becomes a key factor determining computation speed, and it involves techniques for quantizing each value in the matrix.

However, multiplication of quantized matrices is difficult to handle effectively with existing sparse matrix techniques, and because matrices in deep learning are often not sparse, those techniques are not effective in terms of storage space or amount of computation.

In addition, matrix multiplication that converts a matrix or tensor into bit vectors requires as many bit vectors as there are distinct values in the data, so a large number of bit vectors is needed to represent deep learning matrices, which contain many different values. Furthermore, even when the bit vectors are compressed, the multiplication cannot be performed on the compressed bit vectors directly; they must first be decompressed.

Accordingly, there is a need for a matrix multiplication method that can apply the compressed data representation directly to the matrix multiplication operation.

A multidimensional matrix multiplication method according to embodiments may include: performing quantization on a plurality of input matrices or tensors and outputting them as a plurality of binarized matrices or tensors; compressing the plurality of binarized matrices or tensors and converting them into a plurality of compressed bit vectors; calculating a bit at a specific position on the plurality of compressed bit vectors; outputting a result matrix by performing multiplication between the plurality of compressed bit vectors; and transforming the result matrix according to the position of the specific bit on the compressed bit vectors.

In addition, the converting into a plurality of compressed bit vectors may include: unfolding the plurality of binarized matrices or tensors into one dimension and converting them into bit vectors; and compressing the converted bit vectors by recording the values of the bits they contain.

In addition, a value of the compressed bit vector may include whether the vector is compressed (0 or 1), the bit value (0 or 1), and the number of times that 0 or 1 is repeated.

In addition, the plurality of input matrices may include a first matrix and a second matrix to be multiplied, and the outputting of the result matrix may further include: grouping the bits of the compressed bit vectors into logical blocks whose size is at least one of the number of columns of the first matrix or the number of rows of the second matrix; performing a preset operation on the grouped bits; counting the number of 1s in the new bit vector obtained through the preset operation; and outputting the result matrix by setting the number of 1s as a position.

In addition, the preset operation may be an XNOR (exclusive NOR) operation and a bitcount operation.
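The XNOR and bitcount pair can be illustrated with a minimal sketch. It assumes the encoding commonly used for binarized networks, in which bit 1 stands for +1 and bit 0 for -1; the text does not state this mapping, and the names `pack` and `xnor_popcount_dot` are hypothetical.

```python
def pack(values):
    """Pack a list of +1/-1 values into an integer, first value in the highest bit.
    Assumption: bit 1 encodes +1 and bit 0 encodes -1."""
    bits = 0
    for v in values:
        bits = (bits << 1) | (1 if v > 0 else 0)
    return bits


def xnor_popcount_dot(a_bits, b_bits, k):
    """Dot product of two k-element +1/-1 vectors given as packed bits."""
    mask = (1 << k) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # 1 wherever the two bits agree
    matches = bin(xnor).count("1")     # bitcount
    # Each agreeing pair contributes +1, each disagreeing pair -1.
    return 2 * matches - k


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
dot = xnor_popcount_dot(pack(a), pack(b), len(a))
```

Under this encoding, the bitcount of the XNOR result is all that is needed to recover the ordinary dot product of one row and one column.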

In addition, the bits grouped into logical blocks may include a first group and a second group, and the outputting of the result matrix may include outputting the result matrix for the second group once the result matrix has been output for the first group.

In addition, when the result matrix has been output for all of the bits grouped into logical blocks, the method may further include: circularly shifting the bit vector included in the compressed bit vector to the left by the size of the logical block; grouping the bits of the compressed bit vector into blocks whose size is at least one of the number of columns of the first matrix or the number of rows of the second matrix; performing the preset operation on the grouped bits; counting the number of 1s in the new bit vector obtained through the preset operation; and outputting the result matrix by setting the number of 1s as a position.
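One possible reading of the block-wise procedure with circular shifts is sketched below. It is an interpretation, not the claimed implementation: it works on uncompressed bit lists for clarity, assumes an n×k first matrix and a k×n second matrix with +1/-1 entries (bit 1 for +1, bit 0 for -1), unfolds the first matrix by rows and the second by columns, and uses the hypothetical name `xnor_matmul`.

```python
def xnor_matmul(A, B):
    """Multiply A (n x k) by B (k x n), entries +1/-1, block by block."""
    n, k = len(A), len(A[0])
    # Unfold A row by row and B column by column; each logical block has k bits.
    a_bits = [1 if v > 0 else 0 for row in A for v in row]
    b_bits = [1 if B[r][c] > 0 else 0 for c in range(n) for r in range(k)]
    C = [[0] * n for _ in range(n)]
    for shift in range(n):
        for i in range(n):
            a_blk = a_bits[i * k:(i + 1) * k]
            b_blk = b_bits[i * k:(i + 1) * k]
            # XNOR followed by bitcount: number of agreeing bit pairs.
            ones = sum(1 for x, y in zip(a_blk, b_blk) if x == y)
            # After `shift` rotations, block i of b_bits is column (i + shift) % n.
            C[i][(i + shift) % n] = 2 * ones - k
        # Circular left shift by one logical block.
        b_bits = b_bits[k:] + b_bits[:k]
    return C


A = [[1, -1, 1], [-1, -1, 1]]
B = [[1, -1], [1, 1], [-1, 1]]
C = xnor_matmul(A, B)
```

Each pass pairs every block of the first vector with one block of the second, and n passes with a shift between them cover all n×n entries of the result.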

A multidimensional matrix multiplication apparatus according to embodiments may include: a binarization module that performs quantization on a plurality of input matrices and outputs a plurality of binarized matrices; a bit vector compression module that converts the plurality of binarized matrices into a plurality of compressed bit vectors; a cursor module that calculates the position of a specific bit on the plurality of compressed bit vectors; and a multiplication module that multiplies the plurality of compressed bit vectors and outputs a result matrix.

In addition, the bit vector compression module may unfold the plurality of binarized matrices or tensors, convert them into bit vectors, and compress the converted bit vectors by recording the values of the bits they contain.

In addition, the plurality of input matrices may include a first matrix and a second matrix, and the multiplication module may group the bits of the compressed bit vectors into logical blocks whose size is at least one of the number of columns of the first matrix or the number of rows of the second matrix, perform a preset operation on the grouped bits, count the number of 1s in the new bit vector obtained through the preset operation, and output the result matrix by setting the number of 1s as a position.

In addition, the bits grouped into logical blocks may include a first group and a second group, and the multiplication module may output the result matrix for the second group once the result matrix has been output for the first group.

In addition, when the multiplication module has output the result matrix for all of the bits grouped into logical blocks, the multiplication module may circularly shift the bit vector included in the compressed bit vector by the size of the logical block, group the bits of the compressed bit vector into logical blocks whose size is at least one of the number of columns of the first matrix or the number of rows of the second matrix, perform the preset operation on the grouped bits, count the number of 1s in the new bit vector obtained through the preset operation, and output the result matrix by setting the number of 1s as a position.

A storage medium storing a multidimensional matrix multiplication program according to embodiments may store a program that: performs quantization on a plurality of input matrices or tensors and outputs them as a plurality of binarized matrices; converts the plurality of binarized matrices into a plurality of compressed bit vectors; calculates a bit value at a specific position on the plurality of compressed bit vectors; performs multiplication between the plurality of compressed bit vectors and outputs a result matrix; and transforms the result matrix according to the position of the specific bit on the compressed bit vectors.

The compressed multidimensional matrix multiplication method, arithmetic device, and storage medium storing the program according to embodiments can improve multiplication performance for large multidimensional matrices and tensors, and thereby improve efficiency in both processing speed and storage space during training and inference of deep neural network models.

The multidimensional matrix multiplication method, arithmetic device, and storage medium storing the program according to embodiments can process deep learning models efficiently on embedded and mobile devices that lack high-performance computing hardware.

The multidimensional matrix multiplication method, arithmetic device, and storage medium storing the program according to embodiments can greatly reduce the size of the input matrices to be computed by using compressed bit vectors.

The multidimensional matrix multiplication method, arithmetic device, and storage medium storing the program according to embodiments can greatly shorten computation time on low-performance devices by replacing the matrix multiplication operation with bit string operations on compressed bit vectors.

The multidimensional matrix multiplication method, arithmetic device, and storage medium storing the program according to embodiments can make inference with deep learning models efficient on low-performance devices.

The multidimensional matrix multiplication method, arithmetic device, and storage medium storing the program according to embodiments can perform matrix multiplication more effectively in neural network models made lightweight through model quantization.

FIG. 1 illustrates a structural diagram of a neural network according to embodiments.
FIG. 2 illustrates matrix multiplication according to embodiments.
FIG. 3 illustrates an example of performing matrix multiplication using bit vectors according to embodiments.
FIG. 4 is a flowchart of a multidimensional matrix multiplication method according to embodiments.
FIG. 5 schematically illustrates that binarizing the input matrices and multiplying them according to embodiments can approximate the result of multiplying the original input matrices as they are.
FIG. 6 schematically illustrates compression of a bit vector generated by unfolding a matrix or tensor according to embodiments.
FIG. 7 schematically illustrates how the position of a specific bit is calculated by a cursor according to embodiments.
FIG. 8 schematically illustrates matrix multiplication performed through compressed bit vectors according to embodiments.
FIG. 9 schematically illustrates the configuration of a multidimensional matrix multiplication apparatus according to embodiments.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. The same or similar components are given the same reference numerals regardless of the figure, and redundant descriptions are omitted. In describing the embodiments, detailed descriptions of related known technology are omitted where they could obscure the gist of the embodiments. The accompanying drawings are provided only to aid understanding of the embodiments, and the technical idea should not be construed as limited by them.

In addition, when an element such as a layer, region, or substrate is referred to as being "on" another component, it may be directly on the other element or intervening elements may be present between them.

The multidimensional matrix multiplication described through the embodiments can be used in any deep learning based service for low-performance devices with constraints on computing power, memory, and battery, such as mobile and embedded devices that need to adopt deep learning models. Those skilled in the art will readily appreciate that it may also be applied to any computation method or apparatus that uses matrix multiplication, including product forms developed in the future.

FIG. 1 illustrates a structural diagram of a neural network according to embodiments.

As shown in FIG. 1, a neural network 100 according to embodiments may include an input layer 101, a plurality of hidden layers 102, and an output layer 103. Because the neural network 100 includes more layers from which valid information can be extracted, it can process complex data sets effectively. Although FIG. 1 shows one input layer 101, three hidden layers 102, and one output layer 103, the embodiments are not limited thereto, and the neural network 100 may include fewer or more layers. Furthermore, the neural network 100 may include layers with various structures different from those shown in FIG. 1.

Each of the layers 101, 102, and 103 included in the neural network 100 according to embodiments may include a plurality of artificial nodes, also known as neurons, processing elements (PEs), units, or similar terms. For example, as shown in FIG. 1, the input layer 101, the plurality of hidden layers 102, and the output layer 103 may each include six nodes. However, the embodiments are not limited thereto, and each of the layers 101, 102, and 103 may include any number of nodes.

The nodes included in each of the layers 101, 102, and 103 according to embodiments may be connected to one another so that the computation result of a node is passed to nodes of the next layer. For example, a node may receive data from nodes of the previous layer, perform a computation, and pass the result to nodes of the next layer. Each node according to embodiments may determine its own output value based on the weight values applied to the output values received from nodes in the previous layer and on an activation function.

To compute the neural network 100 according to embodiments, matrix multiplication that multiplies an input vector by weights may be performed. This matrix multiplication is applied to the input of every layer and is repeated for every input vector, so as the depth and size of the neural network increase it may be the most important basic operator determining the overall computation speed.
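As a minimal illustration of why matrix multiplication dominates the cost, the sketch below computes one layer's output as a weight matrix times an input vector followed by an activation function. The function names, the ReLU activation, and the sample values are illustrative assumptions, not taken from the source.

```python
def relu(z):
    """Example activation function (an assumption; any activation could be used)."""
    return max(0.0, z)


def dense_layer(x, W, activation):
    """Each output node applies the activation to a weighted sum of the inputs.
    W has one row of weights per output node."""
    return [activation(sum(w * xi for w, xi in zip(row, x))) for row in W]


x = [1.0, 2.0]                      # outputs of the previous layer
W = [[0.5, -1.0], [1.0, 1.0]]       # hypothetical weights for two nodes
y = dense_layer(x, W, relu)
```

The nested sum over `row` and `x` is exactly one matrix-vector product, and it is repeated for every layer and every input.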

FIG. 2 illustrates matrix multiplication according to embodiments.

FIG. 2 illustrates an example of performing matrix multiplication using compressed sparse matrices according to embodiments.

A sparse matrix according to embodiments may refer to a matrix in which most entries are 0 and only a small fraction are non-zero. Storing a sparse matrix as is wastes a large amount of space on zero values. Accordingly, matrix multiplication according to embodiments may include a compression scheme that records the positions of the non-zero entries.

FIG. 2(a) illustrates the CSR scheme as matrix multiplication according to embodiments.

Matrix multiplication according to embodiments may use the Compressed Sparse Row (CSR) scheme. For example, as shown in FIG. 2(a), the non-zero values of the original matrix 210 may be stored in the data vector 211, scanning from the top row downward and from left to right within each row. That is, the data vector 211 may store the values 8, 2, 5, 7, 1, 2, 9. The indices vector 212 may store, in the same order, the column positions of the non-zero values in the original matrix 210. For example, the non-zero values 8, 2, 5, 7, 1, 2, and 9 are located in columns 0, 2, 2, 2, 3, 4, and 3, so the indices vector stores 0, 2, 2, 2, 3, 4, 3. The index pointer vector 213 may record, as ranges, which entries of the data vector belong to each row. For example, the data of the first row lies in the range [0:2], so the row holds two values, namely the first two entries of the data vector 211, 8 and 2. The next row holds 5, indicated by the range [2:3], and the row after that has no non-zero values, so it may be written as [3:3].
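The CSR layout described above can be reproduced with a short sketch. The 7×5 matrix below is not copied from FIG. 2; it is reconstructed from the data, indices, and index pointer values quoted in the text, so its exact shape is an assumption, and `to_csr` is a hypothetical helper name.

```python
def to_csr(matrix):
    """Return (data, indices, indptr) for a dense matrix given as a list of rows."""
    data, indices, indptr = [], [], [0]
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)      # non-zero value
                indices.append(col)     # its column position
        indptr.append(len(data))        # end of this row's range in data
    return data, indices, indptr


# Assumed matrix, consistent with the vectors quoted in the text.
matrix = [
    [8, 0, 2, 0, 0],
    [0, 0, 5, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 7, 1, 2],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 9, 0],
]
data, indices, indptr = to_csr(matrix)
```

Row `r` occupies `data[indptr[r]:indptr[r + 1]]`, which matches the ranges [0:2], [2:3], and [3:3] given for the first three rows.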

FIG. 2(b) illustrates the CSC scheme as matrix multiplication according to embodiments.

Matrix multiplication according to embodiments may use the Compressed Sparse Column (CSC) scheme. The CSC scheme works the same way as the CSR scheme but records the data by column rather than by row. For example, as shown in FIG. 2(b), the non-zero values of the original matrix 220 may be stored in the data vector 221, scanning from the leftmost column rightward and from top to bottom within each column. That is, the data vector 221 may store the values 8, 2, 5, 7, 1, 9, 2. The indices vector 222 may store, in the same order, the row positions of the non-zero values in the original matrix 220, for example 0, 0, 1, 4, 4, 6, 4. The index pointer vector 223 may record, as ranges, which entries of the data vector belong to each column, and may have, for example, the values 0, 1, 1, 4, 6, 7.
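The column-oriented variant can be sketched the same way. As before, the 7×5 matrix is an assumption reconstructed from the values quoted in the text rather than taken from the figure, and `to_csc` is a hypothetical helper name.

```python
def to_csc(matrix):
    """Return (data, indices, indptr) scanning the dense matrix column by column."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    data, indices, indptr = [], [], [0]
    for col in range(n_cols):
        for row in range(n_rows):
            value = matrix[row][col]
            if value != 0:
                data.append(value)      # non-zero value
                indices.append(row)     # its row position
        indptr.append(len(data))        # end of this column's range in data
    return data, indices, indptr


# Assumed matrix, consistent with the vectors quoted in the text.
matrix = [
    [8, 0, 2, 0, 0],
    [0, 0, 5, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 7, 1, 2],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 9, 0],
]
csc_data, csc_indices, csc_indptr = to_csc(matrix)
```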

However, although the compressed sparse matrix representation is effective for sparse matrices, matrices in deep learning are often not sparse, so the representation is not effective in terms of storage space or amount of computation. Accordingly, a computation method that can represent the wide range of values found in a neural network is required.

FIG. 3 illustrates an example of performing matrix multiplication using bit vectors according to embodiments.

FIG. 3 illustrates an example of performing matrix multiplication using compressed bit vectors according to embodiments.

A matrix multiplication scheme using compressed bit vectors according to embodiments may include converting data 301 into bit vectors 302 and converting the bit vectors 302 into compressed bit vectors 303.

The bit vectors 302 according to embodiments may be generated one per distinct value of the data 301, and the positions at which a particular value occurs in the data may be stored as 1s in the bit vector representing that value. For example, the first value in the data 301 is 1, so it is represented by a 1 in the first position of the bit vector 302 for the value 1. Likewise, the second value in the data 301 is 3, so it is represented by a 1 in the second position of the bit vector 302 for the value 3.
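The one-bit-vector-per-distinct-value encoding can be sketched as follows. The sample data is hypothetical apart from its first two values (1, then 3), which follow the example in the text; `to_bit_vectors` is an illustrative name.

```python
def to_bit_vectors(data):
    """Build one bit vector per distinct value; bit i is 1 where data[i] equals that value."""
    vectors = {value: [0] * len(data) for value in sorted(set(data))}
    for position, value in enumerate(data):
        vectors[value][position] = 1
    return vectors


data = [1, 3, 2, 3, 1]   # hypothetical data; only the first two values come from the text
vectors = to_bit_vectors(data)
```

The number of vectors equals the number of distinct values, which is why this representation grows quickly for matrices whose entries take many different values.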

The compressed bit vector 303 according to embodiments may be produced with Run Length Encoding (RLE), which compresses repeated 0s or 1s into a shorter representation. Specifically, the length of each run of a repeated value may be recorded; for example, as shown in FIG. 3, it may be recorded that 8*0 occurs twice.
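A minimal run-length encoder and decoder for a bit list is sketched below. It records plain (bit, run length) pairs, which is a simplification of the word-based record format shown in FIG. 3; the function names are illustrative.

```python
from itertools import groupby


def rle_encode(bits):
    """Collapse each run of identical bits into a (bit, run_length) pair."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original bit list."""
    bits = []
    for bit, length in runs:
        bits.extend([bit] * length)
    return bits


runs = rle_encode([0, 0, 0, 1, 1, 0])
```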

However, the compressed bit vector representation requires as many bit vectors as there are distinct values in the data, so a large number of bit vectors is needed to represent deep learning matrices, which contain many different values. In addition, operations cannot be performed directly on the compressed bit vector representation; it must first be decompressed. Accordingly, there is a need for a computation method that can perform matrix multiplication more effectively in a neural network.

FIG. 4 is a flowchart of a multidimensional matrix multiplication method according to embodiments.

As shown in FIG. 4, the multidimensional matrix multiplication method according to embodiments may include outputting a plurality of input matrices as a plurality of binarized matrices (S401).

The plurality of input matrices according to embodiments are the matrices to be computed and may include the input matrices on which matrix multiplication is performed. The plurality of input matrices may include multidimensional matrices and may be, for example, three-dimensional matrices. The plurality of input matrices may be output as binarized matrices through quantization. Here, quantization converts the element values of the original matrix into values with a shorter bit length; for example, it may be a process of representing the original values with a shorter bit length while training so that the accuracy of the neural network model remains equal or as close as possible to the accuracy obtained with the original values. The plurality of input matrices may include a first matrix and a second matrix.

A binarized matrix according to embodiments may be a matrix whose values are binarized through quantization of the values in the input matrix, or are values that can be decomposed into binarized values. The binarized matrix has the same shape as the input matrix, but the values it contains are binarized.
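The source does not specify how binarization is carried out, so the sketch below substitutes one widely used scheme purely for illustration: each value is replaced by its sign, and a single scale factor equal to the mean absolute value is kept so the original matrix can be approximated. The names `binarize` and `approximate`, and the sample weights, are hypothetical.

```python
def binarize(matrix):
    """Assumed scheme: sign bits plus one scale factor (mean absolute value)."""
    magnitudes = [abs(v) for row in matrix for v in row]
    alpha = sum(magnitudes) / len(magnitudes)
    bits = [[1 if v >= 0 else 0 for v in row] for row in matrix]  # same shape as the input
    return alpha, bits


def approximate(alpha, bits):
    """Rebuild an approximation of the original matrix from the bits and the scale."""
    return [[alpha if b else -alpha for b in row] for row in bits]


W = [[0.5, -1.5], [2.0, -1.0]]   # hypothetical weights
alpha, bits = binarize(W)
W_approx = approximate(alpha, bits)
```

The bit matrix keeps the shape of the input, as the text requires, while every entry shrinks to a single bit.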

As shown in FIG. 4, the multidimensional matrix multiplication method according to embodiments may include compressing the plurality of binarized matrices and converting them into a plurality of compressed bit vectors (S402).

A binarized matrix according to embodiments may be unfolded and converted into a one-dimensional bit vector. The converted bit vector may be compressed using Run Length Encoding (RLE). Specifically, the converted bit vector may be compressed by recording the values of the bits it contains.

A compressed bit vector according to embodiments may include, as bits, the values 0 and 1 and the number of times 0 or 1 is repeated. Specifically, a value of the compressed bit vector may include whether the vector is compressed (0 or 1), the bit value (0 or 1), and the number of times that 0 or 1 is repeated. Accordingly, the multidimensional matrix multiplication method according to embodiments can greatly reduce the size of the input matrices through compressed bit vectors.

도 4에 도시한 바와 같이, 실시예들에 따른 다차원 행렬 곱셈 방법은, 복수 개의 압축된 비트 벡터 상에서 특정 위치의 비트를 계산하는 단계(s403)를 포함할 수 있다.As shown in FIG. 4 , the multidimensional matrix multiplication method according to embodiments may include calculating a bit at a specific position on a plurality of compressed bit vectors ( S403 ).

According to embodiments, a cursor may be used to locate a specific bit within the compressed bit vector or to read the bit value at a specific position. Here, the cursor is a virtual cursor that maps the position of a specific bit in the compressed bit vector back to its position in the uncompressed bit vector. To obtain the position of the specific bit in the uncompressed bit vector, the cursor may obtain the current position in units of compressed words and the visited position, in compressed form, that corresponds to it.

In the multidimensional matrix multiplication method according to embodiments, when a size unit into which the bits are divided contains 0s, the cursor may skip the region containing the 0s during inspection, thereby reducing processing time. Specifically, the cursor may skip bit visit units until a bit with the value 1 appears, or may move to the position of a specific bit.

도 4에 도시한 바와 같이, 실시예들에 따른 다차원 행렬 곱셈 방법은, 복수 개의 압축된 비트 벡터 간 곱셈을 수행하여 결과 행렬을 출력하는 단계(s404)를 포함할 수 있다. 이때 압축된 비트 벡터 간 곱셈을 수행하기 위하여 XNOR 연산자와 1 값을 가지는 비트들의 개수를 계산하는 bitcount 연산을 포함할 수 있다.As shown in FIG. 4 , the multidimensional matrix multiplication method according to embodiments may include performing multiplication between a plurality of compressed bit vectors to output a result matrix ( S404 ). In this case, in order to perform multiplication between compressed bit vectors, an XNOR operator and a bitcount operation for calculating the number of bits having a value of 1 may be included.

실시예들에 따라 압축된 비트 벡터는 계산을 위해 논리적으로 그루핑(grouping, 그룹화) 될 수 있다. 구체적으로, 압축된 비트 벡터에 포함되는 비트들은 논리적인 블록 단위로 그룹이 나누어지는 그루핑이 될 수 있다. 압축된 비트 벡터는, 입력 행렬인 제 1 행렬의 열의 개수에 대응하는 비트의 개수를 포함하는 논리적인 블록 단위로 그루핑 될 수 있다. 또한, 압축된 비트 벡터는, 입력 행렬인 제 2 행렬의 행의 개수에 대응하는 비트의 개수를 포함하는 논리적인 블록 단위로 그루핑 될 수 있다. According to embodiments, the compressed bit vector may be logically grouped for calculation. Specifically, the bits included in the compressed bit vector may be grouping in which groups are divided into logical block units. The compressed bit vector may be grouped in logical block units including the number of bits corresponding to the number of columns of the first matrix, which is the input matrix. Also, the compressed bit vector may be grouped in logical block units including the number of bits corresponding to the number of rows of the second matrix, which is the input matrix.

According to embodiments, a preset operation may determine whether the grouped bits match or mismatch each other. Specifically, for the plurality of grouped bits subject to the operation, the preset operation may determine whether matches or mismatches are more frequent. The preset operation may be an exclusive NOR (XNOR). That is, the number of 1s may be calculated for the grouped bits through the preset operation, for example an XNOR operation combined with a bitcount function.

실시예들에 따라 출력된 결과 행렬은, 계산된 1의 개수에 기초하여 비트의 위치(position)가 설정될 수 있다. 구체적으로, 출력된 결과 행렬에서, 각 비트의 값이 입력되는 위치는 1의 개수에 기초하여 설정될 수 있다.In the result matrix output according to embodiments, a position of a bit may be set based on the calculated number of ones. Specifically, in the output result matrix, a position at which a value of each bit is input may be set based on the number of 1's.

That is, the multidimensional matrix multiplication method according to embodiments may output the result matrix by multiplying the compressed bit vectors without decompressing them, that is, while the bit vectors remain compressed.

도 4에 도시한 바와 같이, 실시예들에 따른 다차원 행렬 곱셈 방법은, 압축된 비트 벡터 상에서 특정 비트의 위치에 대응하여 결과 행렬을 변환하는 단계(s405)를 포함할 수 있다. As shown in FIG. 4 , the multidimensional matrix multiplication method according to embodiments may include transforming the result matrix corresponding to the position of a specific bit on a compressed bit vector ( S405 ).

실시예들에 따라 출력된 결과 행렬은, 커서에 의해 획득한 압축되지 않은 비트 벡터에 대한 특정 비트의 위치 정보에 기초하여, 압축되지 않은 결과 행렬로 변환될 수 있다.A result matrix output according to embodiments may be converted into an uncompressed result matrix based on position information of a specific bit with respect to an uncompressed bit vector obtained by a cursor.

Here, according to embodiments, a preset operation may determine whether the logically grouped bits match or mismatch each other. Specifically, for the plurality of grouped bits subject to the operation, the preset operation may determine whether matches or mismatches are more frequent. The preset operation may be XNOR. That is, the number of 1s may be calculated for the grouped bits through the preset operation, for example an XNOR operation combined with a bitcount function. The position of a bit may be set based on the calculated number of 1s. Specifically, in the result matrix to be output, the position at which each bit value is written may be set based on the number of 1s.

실시예들에 따른 다차원 행렬 곱셈 방법은, 압축된 비트 벡터의 제 1 그룹에 속하는 비트들에 대하여 연산이 종료된 경우, 제 2 그룹에 속하는 비트들에 대하여 연산을 반복 수행할 수 있다. 이 경우, 제 2 그룹은, s403 단계를 수행할 수 있다.In the multidimensional matrix multiplication method according to the embodiments, when the operation on the bits belonging to the first group of the compressed bit vector is finished, the operation may be repeatedly performed on the bits belonging to the second group. In this case, the second group may perform step s403.

In the multidimensional matrix multiplication method according to embodiments, when steps s401 to s403 have all been performed for all of the block units divided into groups (including the first group and the second group), step s404 may be performed by cyclically shifting the bit vector to the left by the block unit size.

이하에서는, 입력 행렬을 이진화 된 행렬로 출력하는 것에 대하여 상술한다.Hereinafter, outputting the input matrix as a binarized matrix will be described in detail.

도 5는 실시예들에 따라 입력 행렬을 이진화하고 이를 행렬 곱셈 하는 것이 기존 입력행렬들을 그대로 행렬 곱셈한 결과와 근사화될 수 있음을 개략적으로 나타낸 도면이다.FIG. 5 is a diagram schematically illustrating that binarizing an input matrix and multiplying it by a matrix according to embodiments can approximate a result of matrix multiplication of existing input matrices.

As shown in FIG. 5, the input matrix 510 (for example, the input layer described in FIG. 1, the data described in FIGS. 2 to 3, and the input matrix described in FIG. 4) may include a plurality of multidimensional matrices or tensors to be computed. Although FIG. 5 shows the input matrix 510 in three dimensions, the embodiments are not limited thereto and any number of dimensions may be supported. The input matrix 510 may include, for example, a first matrix (I) and a second matrix (W).

As shown in FIG. 5, the values included in the input matrix 510 may be binarized and output as a binarized matrix 520 (for example, the binarized matrix described in FIG. 4). Specifically, the values included in the input matrix 510 may be quantized and output as a binarized matrix 520 whose values are binarized or are decomposable into binarized values. Here, reference numeral 521 may denote a matrix (K) or a constant (α) used to return to the values before decomposition when the input matrix is decomposable into binarized values. For example, the value 0.2 of the first matrix (I) included in the input matrix 510 becomes 1 when binarized, and the value -5 of the second matrix (W) included in the input matrix 510 becomes -1 when binarized. The binarized matrix 520 is configured to have a shorter bit length than the input matrix 510 while containing bits that can represent values equal or similar to those of the input matrix 510. The binarized matrix 520 may be converted into a bit vector.
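The binarization described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method of the embodiments: it uses sign-based binarization, takes the constant α to be the mean absolute value of the matrix (a common choice in binarized neural networks that the text does not confirm), and encodes +1 as bit 1 and -1 as bit 0. The names `binarize` and `to_bits` and the sample values are hypothetical.

```python
def binarize(matrix):
    """Return (signs, alpha) for a 2-D list of real values.

    Assumptions: signs are +1 for values >= 0 and -1 otherwise;
    alpha is the mean absolute value of all elements.
    """
    flat = [v for row in matrix for v in row]
    alpha = sum(abs(v) for v in flat) / len(flat)
    signs = [[1 if v >= 0 else -1 for v in row] for row in matrix]
    return signs, alpha


def to_bits(signs):
    """Encode +1 as bit 1 and -1 as bit 0 (assumed encoding)."""
    return [[1 if s > 0 else 0 for s in row] for row in signs]


# Hypothetical input: 0.2 becomes +1 and -5 becomes -1, as in the example above
signs, alpha = binarize([[0.2, -5.0], [3.0, -0.8]])
bits = to_bits(signs)
```

Multiplying the sign matrix by `alpha` gives a coarse approximation of the original values, which is the role the text assigns to the constant (α).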

이하에서는, 비트 벡터로 변환된 이진화 된 행렬을 압축하여 압축된 비트 벡터로 변환하는 것에 대하여 상술한다.Hereinafter, the compression of the binarized matrix converted into a bit vector and conversion into a compressed bit vector will be described in detail.

도 6은 실시예들에 따라 행렬 및 텐서를 풀어 생성한 비트 벡터를 압축하는 것을 개략적으로 나타낸 도면이다.6 is a diagram schematically illustrating compression of a bit vector generated by unraveling a matrix and a tensor according to embodiments.

The multidimensional matrix multiplication method according to embodiments (for example, the matrix multiplication described in FIGS. 1 to 3 and the multidimensional matrix multiplication method described in FIGS. 4 to 5) may include representing a binarized matrix (for example, the binarized matrix described in FIGS. 4 to 5) as a compressed bit vector (for example, the compressed bit vector described in FIG. 4); this step corresponds, for example, to s402 described in FIG. 4.

FIG. 6(a) shows a bit vector obtained by unfolding a binarized matrix or tensor. The unfolded bit vector consists of, for example, 8,000 bits, each of which has only the value 0 or 1. For convenience of description, it is assumed below that one word consists of 32 bits in an arithmetic device including a computing device, but the embodiments are not limited thereto.

As shown in FIG. 6(b), since one word is 32 bits, the bits may be grouped 31 at a time. For example, of the 8,000 bits, 7,998 bits are grouped into 258 groups and 2 bits remain. The 258 groups may be divided into groups that contain a 1 and groups that do not. For example, 2 groups may contain a 1 and 256 groups may contain no 1. In this case, as shown in FIG. 6(b), the groups containing a 1 and the groups not containing a 1 may be separated and expressed as 31 bits and 256*31 bits.

도 6의 (c)에 도시한 바와 같이, 1을 포함하는 그룹에 대하여는 압축을 수행하지 않을 수 있다. 이 경우, 그룹의 첫 비트는 0으로 표현될 수 있다.As shown in (c) of FIG. 6 , compression may not be performed on a group including 1. In this case, the first bit of the group may be expressed as 0.

As shown in FIG. 6(c), compression may be performed on groups that do not contain a 1. In this case, the first bit of the word may be set to 1. The number of times the group repeats (the run length, 256 in this example) may be recorded in the remaining bits, for example as 100000000 in binary.

즉, 도 6의 (c)에 도시한 바와 같이, 각 그룹들에 대하여 1을 포함하는지 여부에 따라 압축을 수행함으로써, 압축된 비트 벡터를 출력할 수 있다.That is, as shown in (c) of FIG. 6 , by performing compression according to whether 1 is included in each group, a compressed bit vector may be output.
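The grouping and run-length scheme of FIG. 6 can be sketched in Python as below. The sketch assumes 32-bit words with a 1-bit flag and 31 payload bits, zero-pads the final partial group, and does not handle run lengths that overflow 31 bits; none of these details are stated in the text. The names `compress`, `GROUP_BITS`, and `FILL_FLAG` are hypothetical.

```python
WORD_BITS = 32
GROUP_BITS = WORD_BITS - 1   # 31 payload bits per word
FILL_FLAG = 1 << GROUP_BITS  # top bit set marks a run of all-zero groups


def compress(bits):
    """Compress a list of 0/1 values into a list of 32-bit words."""
    words = []
    for start in range(0, len(bits), GROUP_BITS):
        group = bits[start:start + GROUP_BITS]
        group = group + [0] * (GROUP_BITS - len(group))  # pad last partial group (assumption)
        if any(group):
            literal = 0
            for b in group:
                literal = (literal << 1) | b
            words.append(literal)          # flag bit 0: group stored uncompressed
        elif words and words[-1] & FILL_FLAG:
            words[-1] += 1                 # extend the current run of zero groups
        else:
            words.append(FILL_FLAG | 1)    # flag bit 1: start a run of length 1
    return words


# 8,000 bits: a 1 in group 0, 256 all-zero groups, a 1 in group 257, 2 leftover bits
bits = [0] * 8000
bits[0] = 1
bits[257 * GROUP_BITS] = 1
words = compress(bits)
run_length = words[1] & (FILL_FLAG - 1)
```

In this example the 8,000-bit vector is stored in four words, and the second word carries the run length 256 in its payload bits.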

이하에서는, 압축된 비트 벡터에 상에서 특정 비트의 위치를 계산하는 것에 대하여 상술한다.Hereinafter, calculating the position of a specific bit on the compressed bit vector will be described in detail.

도 7은 실시예들에 따라 커서에 의해 특정 비트의 위치가 계산되는 것을 개략적으로 나타낸 도면이다.7 is a diagram schematically illustrating that a position of a specific bit is calculated by a cursor according to embodiments.

The multidimensional matrix multiplication method according to embodiments (for example, the matrix multiplication described in FIGS. 1 to 3 and the multidimensional matrix multiplication method described in FIGS. 4 to 6) may include obtaining position information of a specific bit through a cursor (for example, the cursor or virtual cursor described in FIG. 4); this step corresponds, for example, to s403 described in FIG. 4. Specifically, for a compressed bit vector (for example, the compressed bit vector described in FIGS. 4 and 6), the method may obtain the information needed to map the position of a specific bit back to its position in the uncompressed state.

As shown in FIG. 7(a), the cursor according to embodiments may skip bit-by-bit visits until the next bit with the value 1 appears. For example, the cursor may skip unit bits that contain only 0s and move from one unit bit containing a 1 to the next unit bit containing a 1.

실시예들에 따른 커서는, 도 7의 b)에 도시한 것처럼, 특정 비트의 위치로 커서를 이동할 수 있다. 예를 들어, 커서는 '1'을 갖는 특정 비트의 위치로 이동할 수 있다.As shown in b) of FIG. 7 , the cursor according to embodiments may move the cursor to a position of a specific bit. For example, the cursor may move to the position of a specific bit with a '1'.
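The cursor behaviour can be illustrated with the following Python sketch. It assumes the word layout used for illustration in connection with FIG. 6 (flag bit plus 31 payload bits, with a set flag marking a run of all-zero groups). The function `next_one` is hypothetical, and it rescans from the first word on each call, whereas a practical cursor would presumably keep its word position between calls.

```python
GROUP_BITS = 31
FILL_FLAG = 1 << GROUP_BITS  # top bit set marks a run of all-zero groups (assumed layout)


def next_one(words, start=0):
    """Return the uncompressed index of the first 1 bit at or after `start`, or -1."""
    base = 0  # uncompressed index of the first bit covered by the current word
    for word in words:
        if word & FILL_FLAG:
            # Skip the whole run of zero groups in one step
            base += (word & (FILL_FLAG - 1)) * GROUP_BITS
            continue
        for offset in range(GROUP_BITS):
            index = base + offset
            if index >= start and (word >> (GROUP_BITS - 1 - offset)) & 1:
                return index
        base += GROUP_BITS
    return -1


# Hypothetical compressed vector: literal, run of 256 zero groups, literal, run of 1
words = [1 << 30, FILL_FLAG | 256, 1 << 30, FILL_FLAG | 1]
first = next_one(words)
second = next_one(words, first + 1)
```

The second call jumps over 256 × 31 zero bits without inspecting them individually, which is the skip that FIG. 7 describes.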

이하에서는, 압축된 비트 벡터를 이용한 행렬 곱셈에 대하여 상술한다.Hereinafter, matrix multiplication using a compressed bit vector will be described in detail.

도 8은 실시예들에 따라 압축된 비트 벡터를 통해 행렬 곱셈을 행하는 것을 개략적으로 나타낸 도면이다.8 is a diagram schematically illustrating performing matrix multiplication through a compressed bit vector according to embodiments.

The multidimensional matrix multiplication method according to embodiments (for example, the matrix multiplication described in FIGS. 1 to 3 and the multidimensional matrix multiplication method described in FIGS. 4 to 6) may include outputting a result matrix 830 (for example, the output layer described in FIG. 1 and the result matrix described in FIG. 4); this step corresponds, for example, to s404 to s405 described in FIG. 4. Specifically, the method may represent the input matrix 810 (for example, the input layer described in FIG. 1, the data described in FIGS. 2 to 3, and the input matrix described in FIGS. 4 to 5) as an unfolded vector 820 (for example, the unfolded matrix described in FIG. 6) and output the result matrix 830 from the unfolded vector.

실시예들에 따른 입력 행렬(810)은 입력 행렬 I 와 가중치 행렬 W를 포함할 수 있다. 입력 행렬 I 와 가중치 행렬 W는, 도 4 내지 도 5에서 설명한 양자화 기법에 의해 실수값 0 또는 1의 값을 갖는 행렬로 변환될 수 있다. 도 8에 도시한 입력 행렬(810)들이 가지는 1 내지 15의 값은, 입력 행렬(810)들이 가지는 값의 위치를 알기 쉽게 숫자로 나타낸 것이다. 즉, 입력 행렬(810)은, 양자화 기법을 통해 이진화 된 행렬(예를 들어, 도 4 내지 도 6에서 설명한 이진화 된 행렬)로 변환될 수 있다.The input matrix 810 according to embodiments may include an input matrix I and a weight matrix W. The input matrix I and the weight matrix W may be transformed into a matrix having a real value of 0 or 1 by the quantization technique described with reference to FIGS. 4 to 5 . Values of 1 to 15 of the input matrices 810 shown in FIG. 8 are numerically indicating positions of values of the input matrices 810 in an easy to understand manner. That is, the input matrix 810 may be transformed into a binarized matrix (eg, the binarized matrix described with reference to FIGS. 4 to 6 ) through a quantization technique.

실시예들에 따른 이진화 된 행렬은 도 4 및 도 6에서 설명한 언폴드 과정을 통해 언폴드 된 벡터(820)로 변환될 수 있다. 언폴드 과정을 통해, 입력 벡터 I의 벡터 값들 및 가중치 벡터 W의 벡터 값들은, 언폴드 된 벡터인 unfold(I) 및 unfold(W) 상에 나열될 수 있다. 이때, unfold(I)와 unfold(W)에 대응되는 숫자들, 예를 들어 unfold(I)에 포함되는 1과 unfold(W)에 대응되는 1은 서로 곱해져야 하는 값들을 나타낼 수 있다. The binarized matrix according to embodiments may be converted into an unfolded vector 820 through the unfolding process described with reference to FIGS. 4 and 6 . Through the unfolding process, the vector values of the input vector I and the vector values of the weight vector W can be arranged on unfold(I) and unfold(W), which are the unfolded vectors. In this case, numbers corresponding to unfold(I) and unfold(W), for example, 1 included in unfold(I) and 1 corresponding to unfold(W) may represent values that should be multiplied by each other.

이때, 실시예들에 따라 언폴드 과정에서 동일 행 또는 동일 열의 구분이 없어지는 것에 대응하여, 실시예들에 따른 다차원 행렬 곱셈 방법은 입력 행렬 I의 열의 개수 또는 가중치 행렬 W의 행의 개수만큼 비트들을 논리적인 블록 단위로 그룹짓는 그루핑을 행할 수 있고, 이에 따라, 그루핑 된 비트(예를 들어, 도 4에서 설명한 그루핑 된 비트)가 형성될 수 있다. In this case, in response to the loss of distinction between the same row or the same column in the unfolding process according to the embodiments, the multidimensional matrix multiplication method according to the embodiments includes bits as many as the number of columns of the input matrix I or the number of rows of the weighting matrix W. Grouping may be performed by grouping them into logical block units, and accordingly, a grouped bit (eg, the grouped bit described with reference to FIG. 4 ) may be formed.

Here, the multidimensional matrix multiplication method according to embodiments may obtain a result value by performing XNOR and/or bitcount within each grouped interval, for example as Bitcount(XNOR(I,W)).
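A minimal Python sketch of Bitcount(XNOR(I,W)) on one grouped block is given here. It assumes each block is packed into a Python integer of known width. The conversion to a signed dot product assumes that bit 1 encodes +1 and bit 0 encodes -1, which the text does not state; the function names are hypothetical.

```python
def xnor_bitcount(a, b, width):
    """Count the bit positions where two `width`-bit integers agree."""
    mask = (1 << width) - 1
    return bin(~(a ^ b) & mask).count("1")


def signed_dot(a, b, width):
    """Dot product of the +1/-1 vectors encoded by a and b (assumed encoding)."""
    return 2 * xnor_bitcount(a, b, width) - width


# Hypothetical 4-bit blocks
matches = xnor_bitcount(0b1011, 0b1101, 4)
dot = signed_dot(0b1011, 0b1101, 4)
```

Each agreeing position contributes +1 and each disagreeing position contributes -1, so the signed result follows from the match count alone.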

도 8에 도시한 바와 같이, 언폴드 된 행렬(821)에 대하여 결과 행렬(831)을 출력하고 나면, 왼쪽 순환 시프트(예를 들어, 도 4에서 설명한 순환 시프트)를 통해 벡터 내 다음 간격에 대하여 연산을 반복할 수 있다. 즉, 비트 벡터로 표현되는 행렬(822)에 대하여 연산을 수행하고, 이에 따라 결과 행렬(832)을 출력할 수 있다.As shown in FIG. 8 , after outputting the result matrix 831 with respect to the unfolded matrix 821 , for the next interval in the vector through a left cyclic shift (eg, the cyclic shift described in FIG. 4 ) The operation can be repeated. That is, an operation may be performed on the matrix 822 expressed as a bit vector, and thus the result matrix 832 may be output.

이때, 결과 행렬(830)은, 결과값을 순서대로 ①, ②, ③, ④, ⑤, ⑥ 상에 기록할 수 있다. 이를 통해 비트 단위로 적용되는 연산의 횟수를 최소화 하여 다차원 행렬 간 효율적인 연산을 수행할 수 있다.In this case, the result matrix 830 may record the result values on ①, ②, ③, ④, ⑤, and ⑥ in order. Through this, efficient operation between multidimensional matrices can be performed by minimizing the number of operations applied in units of bits.
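One possible reading of the block-wise procedure of FIG. 8 is sketched in Python below. It is an interpretation, not a confirmed implementation: it assumes unfold(I) lists the rows of I as blocks of k bits, unfold(W) lists the columns of W as blocks of k bits, and each cyclic left shift by one block aligns every row block with a new column block. For readability it works on uncompressed bit lists and returns match counts (Bitcount of XNOR) rather than signed products. All names are hypothetical.

```python
def rotate_left(bits, count):
    """Cyclically shift a bit list to the left by `count` positions."""
    count %= len(bits)
    return bits[count:] + bits[:count]


def xnor_matmul(unfold_i, unfold_w, m, k, n):
    """unfold_i: m blocks of k bits (rows of I); unfold_w: n blocks of k bits (columns of W)."""
    result = [[0] * n for _ in range(m)]
    w = list(unfold_w)
    for shift in range(n):
        for i in range(m):
            j = i % n  # block of the rotated W vector aligned with row i
            row = unfold_i[i * k:(i + 1) * k]
            col = w[j * k:(j + 1) * k]
            # Equivalent to Bitcount(XNOR(row, col)) on this block
            matches = sum(1 for x, y in zip(row, col) if x == y)
            result[i][(j + shift) % n] = matches
        w = rotate_left(w, k)  # cyclic left shift by one block
    return result


# Hypothetical I (2 x 3) and W (3 x 2, stored column by column)
unfold_i = [1, 0, 1, 0, 1, 1]
unfold_w = [1, 1, 0, 0, 0, 1]
result = xnor_matmul(unfold_i, unfold_w, m=2, k=3, n=2)
```

After n shifts every row block has been compared with every column block once, so all m × n result positions are filled.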

도 8에서는 설명의 편의를 위하여 벡터들이 압축되지 않은 상태에서 XNOR 연산을 통해 계산되어 행렬 원소의 위치를 표시하도록 도시하고 있으나, 실제 처리에서는 도 6 내지 도 7에서 도시한 바와 같이, 압축된 비트 벡터를 대상으로 도 8에서 도시한 연산을 수행할 수 있다.In FIG. 8, for convenience of explanation, vectors are calculated through XNOR operation in an uncompressed state to indicate the positions of matrix elements, but in actual processing, as shown in FIGS. 6 to 7, compressed bit vectors The operation shown in FIG. 8 may be performed with respect to .

도 9는 실시예들에 따른 다차원 행렬 곱셈 연산 장치의 구성을 개략적으로 나타낸 도면이다.9 is a diagram schematically illustrating a configuration of a multidimensional matrix multiplication operation apparatus according to embodiments.

실시예들에 따른 다차원 행렬 곱셈 연산 장치(900)는 이진화 모듈(910), 비트 벡터 압축 모듈(920), 커서 모듈(930) 및 곱셈 연산 모듈(940)을 포함할 수 있다.The multidimensional matrix multiplication operation apparatus 900 according to embodiments may include a binarization module 910 , a bit vector compression module 920 , a cursor module 930 , and a multiplication operation module 940 .

실시예들에 따른 이진화 모듈(910)은, 복수 개의 입력 행렬(예를 들어, 도 1에서 설명한 입력 레이어, 도 2 내지 도 3에서 설명한 데이터 및 도 4 내지 도 5 및 도 8에서 설명한 입력 행렬)에 대해 양자화를 수행하여 복수 개의 이진화 된 행렬(예를 들어, 도 4 내지 도 6 및 도 8에서 설명한 이진화 된 행렬)을 출력할 수 있다. 이진화 모듈(910)은, 예를 들어, 도 4에서 설명한 s401을 수행할 수 있다.The binarization module 910 according to embodiments may include a plurality of input matrices (eg, the input layer described in FIG. 1 , the data described in FIGS. 2 to 3 , and the input matrix described in FIGS. 4 to 5 and 8 ). A plurality of binarized matrices (eg, the binarized matrices described with reference to FIGS. 4 to 6 and 8 ) may be output by performing quantization. The binarization module 910 may, for example, perform s401 described with reference to FIG. 4 .

The bit vector compression module 920 according to embodiments may convert the plurality of binarized matrices into a plurality of compressed bit vectors (for example, the compressed bit vectors described in FIGS. 4 and 6 to 7). The bit vector compression module 920 may, for example, perform s402 described in FIG. 4. Specifically, the bit vector compression module 920 may unfold the plurality of binarized matrices to convert them into bit vectors, and may compress each converted bit vector by recording the values of the bits it contains. The values of the compressed bit vector may include 0, 1, and the number of times 0 or 1 is repeated.

The cursor module 930 according to embodiments may calculate the position of a specific bit, or the bit value at a specific position, on the plurality of compressed bit vectors. The cursor module 930 may obtain a specific bit (for example, the position of a 1) using a cursor (for example, the cursor or virtual cursor described in FIGS. 4 to 5). The cursor module 930 may, for example, perform s403 described in FIG. 4. As shown in FIG. 7, the cursor module 930 may also skip computation so that other 0 values are not inspected until the next bit with the value 1 appears.

The multiplication operation module 940 according to embodiments may multiply the plurality of compressed bit vectors and output a result matrix (for example, the result matrix described in FIGS. 4 and 8). The multiplication operation module 940 may, for example, perform s404 and s405 described in FIG. 4. Specifically, the multiplication operation module 940 may group the bits included in the compressed bit vector into block units whose size equals at least one of the number of columns and the number of rows of the plurality of input matrices. The multiplication operation module 940 may perform a preset operation on the grouped bits. The preset operation may be an exclusive NOR (XNOR). That is, the multiplication operation module 940 may calculate the number of 1s included in the grouped bits through the preset operation, for example an XNOR operation combined with a bitcount function. The multiplication operation module 940 may output the result matrix by setting the number of 1s as the position. When the result matrix has been output for the first group, the multiplication operation module 940 may multiply the compressed bit vectors for the next group. When the multiplication has been performed for all of the grouped bits, the multiplication operation module 940 may cyclically shift the bit vector included in the compressed bit vector by the block unit size (for example, the cyclic shift described in FIGS. 4 and 8) and then multiply the compressed bit vectors.

The multidimensional matrix multiplication method, the arithmetic device, and the storage device including the program according to embodiments can process deep learning models efficiently on low-performance PCs and on embedded and mobile devices that lack high-performance computing hardware such as a GPU, by accelerating matrix multiplication with compressed bit vectors.

Terms such as first and second may be used to describe various components of the embodiments. However, the interpretation of the various components according to the embodiments should not be limited by these terms. These terms are used only to distinguish one component from another. For example, a first user input signal may be referred to as a second user input signal. Similarly, the second user input signal may be referred to as the first user input signal. Such use of these terms should be interpreted as not departing from the scope of the various embodiments. The first user input signal and the second user input signal are both user input signals, but they do not mean the same user input signal unless the context clearly indicates otherwise.

The terminology used to describe the embodiments is used for the purpose of describing specific embodiments and is not intended to limit the embodiments. As used in the description of the embodiments and in the claims, the singular is intended to include the plural unless the context clearly indicates otherwise. The expression 'and/or' is used to include all possible combinations of the listed terms. The expression 'include' states that features, numbers, steps, elements, and/or components are present, and does not mean that additional features, numbers, steps, elements, and/or components are excluded. Conditional expressions such as 'if' and 'when' used to describe the embodiments are not to be interpreted as limited to optional cases. They are intended to mean that, when a specific condition is satisfied, a related operation is performed or a related definition is interpreted in response to that condition.

이상의 설명은 기술 사상을 예시적으로 설명한 것에 불과한 것으로서, 실시예들이 속하는 기술 분야에서 통상의 지식을 가진 자라면 실시예들의 본질적인 특성에서 벗어나지 않는 범위에서 다양한 수정 및 변형이 가능할 것이다. The above description is merely illustrative of the technical idea, and various modifications and variations will be possible without departing from the essential characteristics of the embodiments by those of ordinary skill in the art to which the embodiments belong.

따라서, 이상에서 개시된 실시예들은 본 발명의 기술 사상을 한정하기 위한 것이 아니라 설명하기 위한 것이고, 이러한 본 발명의 실시예에 의하여 기술 사상의 범위가 한정되는 것은 아니다.Therefore, the embodiments disclosed above are not intended to limit the technical spirit of the present invention, but to explain, and the scope of the technical spirit is not limited by the embodiments of the present invention.

본 발명의 보호 범위는 아래의 청구 범위에 의하여 해석되어야 하며, 그와 동등한 범위 내에 있는 모든 기술 사상은 본 발명의 권리 범위에 포함되는 것으로 해석되어야 할 것이다.The protection scope of the present invention should be interpreted by the following claims, and all technical ideas within the scope equivalent thereto should be construed as being included in the scope of the present invention.

810: input matrix
820: unfolded vector
830: result matrix

Claims

1. A multidimensional matrix multiplication method, comprising:
performing quantization on a plurality of input matrices and outputting a plurality of binarized matrices;
converting the plurality of binarized matrices into a plurality of compressed bit vectors;
calculating a position of a specific bit on the plurality of compressed bit vectors;
outputting a result matrix by performing multiplication between the plurality of compressed bit vectors; and
transforming the result matrix according to the position of the specific bit on the compressed bit vectors.

The method of claim 1,
wherein converting into the plurality of compressed bit vectors comprises:
converting the plurality of binarized matrices into bit vectors by unfolding them; and
compressing the converted bit vectors by recording the bit values included in the converted bit vectors.

The method of claim 2,
wherein the values of the bit vector include 0, 1, and the number of times each 0 and 1 is repeated.
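
As an illustration of the unfolding and compression recited in claims 2 and 3, the following minimal Python sketch flattens a binarized matrix row by row and records each run of identical bits as a (value, repeat count) pair. The row-major unfolding order, the pair layout, and the function names `unfold`, `rle_compress`, and `rle_decompress` are assumptions made for this example and are not taken from the embodiments.

```python
from itertools import groupby


def unfold(matrix):
    """Flatten a binarized matrix (a list of rows) into one bit list, row by row."""
    return [bit for row in matrix for bit in row]


def rle_compress(bits):
    """Record every run of identical bits as a (bit value, repeat count) pair."""
    return [(value, len(list(run))) for value, run in groupby(bits)]


def rle_decompress(runs):
    """Expand (bit value, repeat count) pairs back into the original bit list."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical 3x3 binarized matrix used only for illustration.
binarized = [[1, 1, 0],
             [0, 0, 0],
             [1, 1, 1]]
bits = unfold(binarized)      # [1, 1, 0, 0, 0, 0, 1, 1, 1]
runs = rle_compress(bits)     # [(1, 2), (0, 4), (1, 3)]
```

Long runs of equal bits, which are common in sparse binarized matrices, collapse into a single pair under this scheme, while `rle_decompress` recovers the unfolded bit vector exactly.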

The method of claim 1,
wherein the plurality of input matrices includes a first matrix and a second matrix, and
wherein outputting the result matrix comprises:
grouping bits included in the compressed bit vector into logical blocks by at least one of the number of columns of the first matrix and the number of rows of the second matrix;
performing a preset operation on the grouped bits;
calculating the number of 1s included in the new bit vector obtained through the preset operation; and
outputting the result matrix by setting the number of 1s as positions.

The method of claim 4,
wherein the preset operation is an XNOR (exclusive NOR) operation and a bitcount operation.
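
The XNOR and bitcount operations named in claim 5 can be sketched in Python as follows. The sketch treats each bit vector as an integer of a known width. The function names and the convention that bit 1 encodes +1 and bit 0 encodes -1 are assumptions for this example, and the embodiments may use a different encoding.

```python
def xnor_bitcount(a: int, b: int, width: int) -> int:
    """Return how many of the low `width` bit positions hold equal bits in a and b."""
    mask = (1 << width) - 1          # keep only the low `width` bits
    agree = ~(a ^ b) & mask          # XNOR: 1 wherever the two bits match
    return bin(agree).count("1")     # bitcount (population count)


def signed_dot(a: int, b: int, width: int) -> int:
    """Dot product under the assumed encoding: bit 1 means +1, bit 0 means -1."""
    # Each matching position contributes +1 and each mismatch contributes -1.
    return 2 * xnor_bitcount(a, b, width) - width


matches = xnor_bitcount(0b1011, 0b1101, 4)   # bits agree at two positions
dot = signed_dot(0b1011, 0b1101, 4)          # 2 * 2 - 4 = 0
```

Under that encoding, one XNOR followed by one bitcount replaces `width` separate multiply-and-add steps.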

The method of claim 4,
wherein the bits grouped into logical blocks include a first group and a second group, and
wherein outputting the result matrix comprises:
outputting a result matrix for the second group as the result matrix is output for the first group.

The method of claim 6,
further comprising, when the result matrix has been output for all bits grouped into the logical blocks:
circularly shifting a bit vector included in the compressed bit vector by the size of the logical block;
grouping bits included in the compressed bit vector into logical blocks by at least one of the number of columns of the first matrix and the number of rows of the second matrix;
performing a preset operation on the grouped bits;
calculating the number of 1s included in the new bit vector obtained through the preset operation; and
outputting the result matrix by setting the number of 1s as positions.
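
One possible reading of the block grouping and circular shift in claims 4 to 7 is sketched below in Python. It is only an interpretation under stated assumptions: the first matrix (m x k) is unfolded row by row, the second matrix (k x n) is unfolded column by column, the logical block size equals k, and the sketch works on uncompressed bit lists for readability. The names `split_blocks`, `agree_count`, and `binary_matmul` are hypothetical.

```python
def split_blocks(bits, size):
    """Group a bit list into consecutive logical blocks of `size` bits."""
    return [bits[i:i + size] for i in range(0, len(bits), size)]


def agree_count(x, y):
    """XNOR followed by bitcount: number of positions where two blocks agree."""
    return sum(1 for p, q in zip(x, y) if p == q)


def binary_matmul(a_bits, b_bits, m, k, n):
    """Fill an m x n matrix of agreement counts between row and column blocks.

    a_bits: m*k bits of the first matrix, unfolded row by row (assumed layout).
    b_bits: k*n bits of the second matrix, unfolded column by column (assumed layout).
    """
    a_blocks = split_blocks(a_bits, k)
    result = [[0] * n for _ in range(m)]
    for shift in range(n):
        b_blocks = split_blocks(b_bits, k)
        for i in range(m):
            j = (i + shift) % n   # column currently aligned with row block i
            result[i][j] = agree_count(a_blocks[i], b_blocks[i % n])
        # Circularly shift the second bit vector by one logical block.
        b_bits = b_bits[k:] + b_bits[:k]
    return result


# A = [[1, 0], [1, 1]] unfolded by rows; B = [[1, 1], [0, 1]] unfolded by columns.
counts = binary_matmul([1, 0, 1, 1], [1, 0, 1, 1], 2, 2, 2)   # [[2, 1], [1, 2]]
```

Each pass pairs every row block with one column block, and the circular shift brings the next column block into alignment, so after n passes every position of the result matrix has been written.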

A multidimensional matrix multiplication operation unit, comprising:
a binarization module for outputting a plurality of binarized matrices by performing quantization on a plurality of input matrices;
a bit vector compression module for converting the plurality of binarized matrices into a plurality of compressed bit vectors;
a cursor module for calculating a bit at a specific position on the plurality of compressed bit vectors; and
a multiplication operation module for performing multiplication of the plurality of compressed bit vectors and outputting a result matrix.

The operation unit of claim 8,
wherein the bit vector compression module:
unfolds the plurality of binarized matrices and converts them into bit vectors; and
compresses the converted bit vectors by recording the values of the bits included in the converted bit vectors.

The operation unit of claim 9,
wherein the values of the bit vector include 0, 1, and the number of times each 0 and 1 is repeated.

The operation unit of claim 8,
wherein the plurality of input matrices includes a first matrix and a second matrix, and
wherein the multiplication operation module:
groups bits included in the compressed bit vector into logical blocks by at least one of the number of columns of the first matrix or the number of rows of the second matrix;
performs a preset operation on the grouped bits;
calculates the number of 1s included in the new bit vector obtained through the preset operation; and
outputs the result matrix by setting the number of 1s to positions.

The operation unit of claim 11,
wherein the preset operation is an XNOR operation and a bitcount operation.

The operation unit of claim 11,
wherein the bits grouped into logical blocks include a first group and a second group, and
wherein the multiplication operation module
outputs a result matrix for the second group as the result matrix is output for the first group.

The operation unit of claim 13,
wherein, when the multiplication operation module has output a result matrix for all of the bits grouped into the logical blocks,
the multiplication operation module:
circularly shifts a bit vector included in the compressed bit vector by the size of the logical block;
groups bits included in the compressed bit vector into logical blocks by at least one of the number of columns of the first matrix or the number of rows of the second matrix;
performs a preset operation on the grouped bits;
calculates the number of 1s included in the new bit vector obtained through the preset operation; and
outputs the result matrix by setting the number of 1s as positions.

A storage medium storing a multidimensional matrix multiplication program for:
performing quantization on a plurality of input matrices and outputting them as a plurality of binarized matrices;
converting the plurality of binarized matrices into a plurality of compressed bit vectors;
calculating a position of a specific bit on the plurality of compressed bit vectors;
performing multiplication between the plurality of compressed bit vectors to output a result matrix; and
transforming the result matrix according to the position of the specific bit on the compressed bit vector.