KR20170089389A

KR20170089389A - Memory allocation apparatus and method for large-scale sparse matrix multiplication on a single machine

Info

Publication number: KR20170089389A
Application number: KR1020160090956A
Authority: KR
Inventors: 김상욱; 조용연; 이규환; 장명환
Original assignee: 한양대학교 산학협력단
Priority date: 2016-01-26
Filing date: 2016-07-18
Publication date: 2017-08-03
Also published as: KR101858593B1

Abstract

Disclosed are an apparatus and a method for allocating memory for large-scale sparse matrix multiplication on a single machine. The memory allocation method comprises: identifying the entire region of a memory; determining the size of a first region in the memory required to load a first sparse matrix of two sparse matrices, the size of a second region in the memory required to load a second sparse matrix of the two sparse matrices, and the size of a third region in the memory for recording the result of multiplying the two sparse matrices; and allocating regions of the memory according to the method used to multiply the two sparse matrices, based on the identified entire region of the memory and the sum of the sizes of the first region, the second region, and the third region.

Description

MEMORY ALLOCATION APPARATUS AND METHOD FOR LARGE-SCALE SPARSE MATRIX MULTIPLICATION ON A SINGLE MACHINE

The present invention relates to a memory allocation apparatus and method for large-scale sparse matrix multiplication on a single machine, and more specifically to a memory allocation apparatus and method that allocate the memory regions needed to multiply two sparse matrices efficiently on a single machine.

Graphs such as social networks and the web are growing rapidly in size, creating a need for methods that can analyze them effectively. Methods for analyzing large-scale graphs on distributed processing systems have been proposed. Multiplication of two sparse matrices is a core operation in graph computation.

However, multiplication of two sparse matrices is particularly inefficient on a distributed processing system, for the following reasons. First, each node of the distributed processing system must store the data it processes. Multiplying two sparse matrices can require each node to hold redundant copies of data from both matrices, which for large data sets can demand a large amount of storage.

Second, communication between nodes is needed to produce the final result. Each node may produce incomplete results, so these results must be merged, and merging incurs communication between nodes. Moreover, because multiplying two sparse matrices can produce a very large number of results, the communication cost can grow further.

Third, the elements of sparse matrices that represent graphs such as social networks or the web, as used in graph computation, follow a power-law degree distribution. This distribution makes it difficult to distribute the workload evenly across nodes, and therefore difficult to fully utilize the distributed processing system.

Accordingly, a need has arisen for single-machine multiplication of two sparse matrices that solves these problems.

The present invention provides an apparatus and method that can multiply two sparse matrices effectively under each multiplication method by allocating different memory regions according to the method used to multiply the two sparse matrices.

A memory allocation method according to an embodiment of the present invention may include: identifying the entire region of a memory; determining the size of a first region in the memory needed to load a first sparse matrix of two sparse matrices, the size of a second region in the memory needed to load a second sparse matrix, and the size of a third region in the memory for recording the result of multiplying the two sparse matrices; and allocating regions of the memory according to the multiplication method of the two sparse matrices, based on the identified entire region of the memory and the sum of the determined sizes of the first region, the second region, and the third region.

In the allocating step, when the two sparse matrices are multiplied based on the inner product, a minimum region may be allocated to each of the second region and the third region of the memory, and the remaining region may be allocated to the first region.

In the allocating step, the second region and the third region may be allocated regions of the same size.

In the allocating step, when the two sparse matrices are multiplied based on the row-row product, a minimum region may be allocated to the first region of the memory, and the remaining region may be divided between the second region and the third region.

In the allocating step, when the two sparse matrices are multiplied based on the outer product, a region corresponding to half of the memory may be allocated to the third region of the memory, and the remaining region may be divided between the first region and the second region.

In the allocating step, the first region and the second region may be allocated regions of the same size.
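The three allocation rules described above can be sketched as a single function. This is a minimal illustration, not the patented implementation: the function name `allocate_regions`, the `minimum` parameter, and the use of abstract size units are hypothetical, and the equal split of the remaining memory in the row-row case is an assumption (the text only says the remainder is divided between the second and third regions).

```python
def allocate_regions(total, method, minimum):
    """Return (first, second, third) region sizes for A, B, and the result C.

    total   -- size of the entire memory region (arbitrary units)
    method  -- "inner", "row-row", or "outer"
    minimum -- hypothetical smallest usable region size
    """
    if method == "inner":
        # B and C each get the minimum region; A gets everything that is left.
        second = third = minimum
        first = total - second - third
    elif method == "row-row":
        # A gets the minimum region; the rest is divided between B and C.
        # Dividing it equally is an assumption of this sketch.
        first = minimum
        second = (total - first) // 2
        third = total - first - second
    elif method == "outer":
        # C gets half of the memory; A and B share the other half equally.
        third = total // 2
        first = (total - third) // 2
        second = total - third - first
    else:
        raise ValueError(f"unknown multiplication method: {method}")
    if min(first, second, third) < minimum:
        raise ValueError("memory is too small for the requested minimum region")
    return first, second, third


print(allocate_regions(1000, "inner", 10))    # (980, 10, 10)
print(allocate_regions(1000, "row-row", 10))  # (10, 495, 495)
print(allocate_regions(1000, "outer", 10))    # (250, 250, 500)
```

Keeping the policy in one function makes the contrast visible: the inner product favors the first region, the row-row product favors the second and third regions, and the outer product reserves half of the memory for results.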

The method may further include: loading a submatrix of the first sparse matrix into the allocated first region and a submatrix of the second sparse matrix into the allocated second region; multiplying the two loaded submatrices; and recording the result of multiplying the submatrices in the allocated third region for processing.

The loading step may further include: identifying the submatrices of the first sparse matrix and of the second sparse matrix that have not yet been loaded; and loading the remaining submatrices into the first region and the second region, respectively.

A memory allocation apparatus according to an embodiment of the present invention may include: a memory into which submatrices of two sparse matrices are loaded in order to multiply the two sparse matrices; and a processor that identifies the entire region of the memory, determines the size of a first region in the memory needed to load a first sparse matrix of the two sparse matrices, the size of a second region in the memory needed to load a second sparse matrix, and the size of a third region in the memory for recording the result of multiplying the two sparse matrices, and allocates regions of the memory according to the multiplication method of the two sparse matrices, based on the identified entire region of the memory and the sum of the determined sizes of the first region, the second region, and the third region.

When the two sparse matrices are multiplied based on the inner product, the processor may allocate a minimum region to each of the second region and the third region of the memory and allocate the remaining region to the first region.

When the two sparse matrices are multiplied based on the row-row product, the processor may allocate a minimum region to the first region of the memory and divide the remaining region between the second region and the third region.

When the two sparse matrices are multiplied based on the outer product, the processor may allocate a region corresponding to half of the memory to the third region of the memory and divide the remaining region between the first region and the second region.

The processor may load a submatrix of the first sparse matrix into the allocated first region, load a submatrix of the second sparse matrix into the allocated second region, multiply the two loaded submatrices, and then record the result of that multiplication in the allocated third region for processing.

When the allocated third region is full, the processor may store the result recorded in the third region in a storage device and, when the result recorded in the third region was generated by the row-row product or the outer product, may merge the results stored in the storage device.

According to an embodiment of the present invention, two sparse matrices can be multiplied effectively under each multiplication method by allocating different memory regions according to the method used to multiply them.

FIG. 1 is a diagram illustrating a memory allocation apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating three different methods of multiplying two sparse matrices according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of memory allocation for each of the three different methods of multiplying two sparse matrices according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating how a memory allocation apparatus according to an embodiment of the present invention multiplies two sparse matrices.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a memory allocation apparatus according to an embodiment of the present invention.

The memory allocation apparatus 100 may include a processor 110, a memory 120, and a storage device 130. To multiply two sparse matrices, the processor 110 allocates different memory regions within the memory 120 according to the multiplication method, so that the multiplication can be performed effectively under each method. The memory allocation apparatus 100 may be implemented as a single machine, and the multiplication method may include at least one of the inner product, the row-row product, and the outer product.

Specifically, the memory allocation apparatus 100 of the present invention assumes that the two sparse matrices cannot both be loaded into the memory 120 at once. In this case, the memory allocation apparatus 100 may first decompose the two sparse matrices into submatrices, taking into account the size of the entire region of the memory 120.
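One way to picture this decomposition is to group consecutive rows into submatrices so that each submatrix fits in a fixed budget. The sketch below is illustrative only: the source does not specify how submatrices are formed, so the row-wise grouping, the function name `split_rows`, and measuring capacity in nonzero elements are all assumptions.

```python
def split_rows(rows, capacity):
    """Group consecutive rows into submatrices holding at most `capacity` nonzeros.

    rows     -- list of sparse row vectors, each a dict {column index: value}
    capacity -- hypothetical region size, counted in nonzero elements
    Returns a list of submatrices, each a list of (row index, row) pairs.
    """
    blocks, current, used = [], [], 0
    for i, row in enumerate(rows):
        size = len(row)
        if size > capacity:
            raise ValueError(f"row {i} alone does not fit in the region")
        # Start a new submatrix when adding this row would exceed the budget.
        if current and used + size > capacity:
            blocks.append(current)
            current, used = [], 0
        current.append((i, row))
        used += size
    if current:
        blocks.append(current)
    return blocks


rows = [{0: 1.0, 1: 2.0}, {2: 3.0}, {0: 4.0, 3: 5.0}, {1: 6.0}]
blocks = split_rows(rows, capacity=3)
print([[i for i, _ in block] for block in blocks])  # [[0, 1], [2, 3]]
```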

The memory allocation apparatus 100 may then divide the memory 120, according to the multiplication method, into regions for loading the submatrices of each of the two sparse matrices and a region for recording the result produced by multiplying those submatrices.

The memory allocation apparatus 100 may load the submatrices of the two sparse matrices into the regions allocated for them and perform the multiplication according to the chosen multiplication method. The memory allocation apparatus 100 may then record the multiplication result in the region allocated for results and, when that region becomes full, store the recorded result in the storage device 130.
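The record-then-spill behavior of the result region can be sketched as a small buffer class. This is a hedged illustration: the class name `ResultBuffer`, the capacity measured in entries, and the use of a Python list as a stand-in for the storage device 130 are all assumptions.

```python
class ResultBuffer:
    """Holds result elements in memory and spills them to storage when full."""

    def __init__(self, capacity, storage):
        self.capacity = capacity  # hypothetical size of the third region, in entries
        self.storage = storage    # a list standing in for the storage device
        self.entries = []

    def record(self, i, k, value):
        """Record one result element C[i][k]; spill when the region is full."""
        self.entries.append((i, k, value))
        if len(self.entries) >= self.capacity:
            self.flush()

    def flush(self):
        """Move everything currently buffered to storage as one chunk."""
        if self.entries:
            self.storage.append(list(self.entries))
            self.entries.clear()


storage = []
buf = ResultBuffer(capacity=2, storage=storage)
for entry in [(0, 0, 12), (0, 1, 4), (1, 0, 15)]:
    buf.record(*entry)
buf.flush()  # write out whatever is left at the end
print(storage)  # [[(0, 0, 12), (0, 1, 4)], [(1, 0, 15)]]
```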

FIG. 2 is a diagram illustrating three different methods of multiplying two sparse matrices according to an embodiment of the present invention.

Multiplication of two sparse matrices is an operation that multiplies two sparse matrices (a first sparse matrix and a second sparse matrix), as in Equation 1 below, to produce a new third result matrix. The operation is carried out through repeated multiplication and addition of matrix elements. Depending on the order in which the element-wise multiplications and additions are performed and the order in which the elements of the two sparse matrices are accessed, methods of multiplying two sparse matrices can be classified as (1) inner product, (2) row-row product, and (3) outer product.

[Equation 1]

$$\text{third result matrix} = \text{first sparse matrix} \times \text{second sparse matrix}$$

More specifically, among these methods, (1) multiplying two sparse matrices using the inner product is expressed by Equation 2 below.

[Equation 2]

$$C[i][k] = \sum_{j} A[i][j] \times B[j][k]$$

Here, C[i][k] denotes the third result matrix, and A[i][j] and B[j][k] denote the first sparse matrix and the second sparse matrix, respectively. Each element of a row vector (A[i][]) of the first sparse matrix A may be multiplied by the one corresponding element of a column vector (B[][k]) of the second sparse matrix B. The products of the elements of the row vector (A[i][]) and the corresponding elements of the column vector (B[][k]) are summed to produce one result element (C[i][k]) of the third result matrix C.

For two elements to be multiplied, the column index of the element A[i][j] in the first sparse matrix A must equal the row index of the element B[j][k] in the second sparse matrix B. As a result, as shown in FIG. 2(a), one row vector of the first sparse matrix A is computed against the n column vectors of the second sparse matrix B to produce one row vector of the third result matrix C, and repeating this for the n row vectors of the first sparse matrix A produces the final third result matrix C.

To perform inner-product-based multiplication of large sparse matrices, the first sparse matrix A must be stored in memory so that its row vectors can be accessed sequentially, and the second sparse matrix B must be stored in memory so that its column vectors can be accessed sequentially. This is because the first sparse matrix A is accessed one row vector at a time and the second sparse matrix B is accessed one column vector at a time.
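The inner product scheme described above can be illustrated with a small in-memory sketch. The representation is an assumption made for illustration: A is a list of sparse row vectors and B a list of sparse column vectors, each a dict from index to value, and the function name `inner_product_multiply` is hypothetical.

```python
def inner_product_multiply(a_rows, b_cols):
    """Multiply A (stored by rows) with B (stored by columns).

    a_rows -- list of dicts {j: A[i][j]}, one per row i of A
    b_cols -- list of dicts {j: B[j][k]}, one per column k of B
    Returns {(i, k): C[i][k]} containing only entries that had a matching j.
    """
    c = {}
    for i, row in enumerate(a_rows):
        for k, col in enumerate(b_cols):
            # Scan the shorter vector and probe the longer one for matching j.
            small, large = (row, col) if len(row) <= len(col) else (col, row)
            total, matched = 0, False
            for j, value in small.items():
                if j in large:
                    total += value * large[j]
                    matched = True
            if matched:
                c[(i, k)] = total  # each C[i][k] is complete as soon as it is produced
    return c


# A = [[1, 0, 2], [0, 3, 0]] and B = [[0, 4], [5, 0], [6, 0]]
a_rows = [{0: 1, 2: 2}, {1: 3}]
b_cols = [{1: 5, 2: 6}, {0: 4}]
print(inner_product_multiply(a_rows, b_cols))  # {(0, 0): 12, (0, 1): 4, (1, 0): 15}
```

Every result element is final when it is written, which is why this scheme needs no later merge step.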

Next, (2) multiplying two sparse matrices using the row-row product is expressed by Equation 3 below.

[Equation 3]

$$C[i][] = \sum_{j} A[i][j] \times B[j][]$$

Here, C[i][] denotes the third result matrix, and A[i][j] and B[j][] denote the first sparse matrix and the second sparse matrix, respectively. One row vector (A[i][]) of the first sparse matrix A is multiplied with multiple row vectors (B[k][], B[l][], …) of the second sparse matrix B to produce one row vector (C[i][]) of the third result matrix C. Each element (A[i][j]) of the row vector of the first sparse matrix A is computed against every element in the one corresponding row vector (B[j][]) of the second sparse matrix B, producing one incomplete row vector (C[i][]) of the third result matrix C.

As a result, the multiple incomplete row vectors of the third result matrix C, produced by computing one row vector of the first sparse matrix A against multiple row vectors of the second sparse matrix B, are combined into one complete row vector of the third result matrix C. Repeating this for the n row vectors of the first sparse matrix A produces the final third result matrix C.

To perform row-row-product-based multiplication of large sparse matrices, both the first sparse matrix A and the second sparse matrix B must be stored in memory so that their row vectors can be accessed sequentially. This is because both matrices are accessed one row vector at a time.
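A minimal in-memory sketch of the row-row product follows, assuming both matrices are held as lists of sparse row vectors (dicts from column index to value). The function name `row_row_multiply` is hypothetical, and the sketch ignores the submatrix loading that the text describes.

```python
def row_row_multiply(a_rows, b_rows):
    """Multiply A and B when both are stored by rows.

    a_rows -- list of dicts {j: A[i][j]}, one per row i of A
    b_rows -- list of dicts {k: B[j][k]}, one per row j of B
    Returns {i: {k: C[i][k]}} for the non-empty rows of C.
    """
    c = {}
    for i, a_row in enumerate(a_rows):
        c_row = {}
        for j, a_ij in a_row.items():
            # Scaling row B[j][] by A[i][j] gives one incomplete row of C[i][].
            for k, b_jk in b_rows[j].items():
                # Adding the incomplete rows together completes C[i][].
                c_row[k] = c_row.get(k, 0) + a_ij * b_jk
        if c_row:
            c[i] = c_row
    return c


# A = [[1, 0, 2], [0, 3, 0]] and B = [[0, 4], [5, 0], [6, 0]]
a_rows = [{0: 1, 2: 2}, {1: 3}]
b_rows = [{1: 4}, {0: 5}, {0: 6}]
print(row_row_multiply(a_rows, b_rows))  # {0: {1: 4, 0: 12}, 1: {0: 15}}
```

When B does not fit in memory, only some of the rows B[j][] are available at a time, so the partial sums for a row of C may be spread over several passes. That is the source of the incomplete intermediate results the text refers to.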

Finally, (3) multiplying two sparse matrices using the outer product is expressed by Equation 4 below.

[Equation 4]

$$C[][] = \sum_{j} A[][j] \times B[j][]$$

Here, C[][] denotes the third result matrix, and A[][j] and B[j][] denote the first sparse matrix and the second sparse matrix, respectively. One column vector (A[][j]) of the first sparse matrix A is multiplied by one row vector (B[j][]) of the second sparse matrix B to produce an incomplete version of the entire third result matrix C. Each element (A[i][j]) of that column vector is computed against every element in the row vector (B[j][]) of the second sparse matrix B, producing one incomplete row vector (C[i][]) of the incomplete third result matrix C.

As a result, computing all the elements of one column vector of the first sparse matrix A against one row vector (B[j][]) of the second sparse matrix B produces multiple incomplete row vectors of the incomplete third result matrix C. In this way, every column vector of the first sparse matrix A is paired with the row vector of the second sparse matrix B that has the same index, each pair produces multiple incomplete row vectors of the incomplete third result matrix C, and merging these incomplete row vectors yields the final third result matrix C.
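The outer product scheme can be sketched in the same style, assuming A is held as a list of sparse column vectors and B as a list of sparse row vectors. The function name `outer_product_multiply` and the dict-based representation are assumptions made for illustration.

```python
def outer_product_multiply(a_cols, b_rows):
    """Multiply A (stored by columns) with B (stored by rows).

    a_cols -- list of dicts {i: A[i][j]}, one per column j of A
    b_rows -- list of dicts {k: B[j][k]}, one per row j of B
    Returns {(i, k): C[i][k]}.
    """
    c = {}
    # Column j of A is paired with row j of B (the same index j).
    for a_col, b_row in zip(a_cols, b_rows):
        for i, a_ij in a_col.items():
            for k, b_jk in b_row.items():
                # Each pair contributes a partial value to C[i][k];
                # the contributions from all j are merged by addition.
                c[(i, k)] = c.get((i, k), 0) + a_ij * b_jk
    return c


# A = [[1, 0, 2], [0, 3, 0]] and B = [[0, 4], [5, 0], [6, 0]]
a_cols = [{0: 1}, {1: 3}, {0: 2}]
b_rows = [{1: 4}, {0: 5}, {0: 6}]
print(outer_product_multiply(a_cols, b_rows))  # {(0, 1): 4, (1, 0): 15, (0, 0): 12}
```

Because any pair (A[][j], B[j][]) can touch any element of C, partial results can land anywhere in the result matrix, which is consistent with the text reserving a large share of memory for the third region under this scheme.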

FIG. 3 is a diagram illustrating an example of memory allocation for each of the three different methods of multiplying two sparse matrices according to an embodiment of the present invention.

The performance of multiplying two sparse matrices varies with the sizes of the regions allocated for loading the submatrices of the two sparse matrices and the size of the region allocated for recording the result of multiplying them. It is therefore necessary to understand the characteristics of each multiplication method and to allocate memory accordingly.

(1) Inner-product-based multiplication of large sparse matrices behaves much like joining two relations with a block nested loop join, a technique well known in the database field. First, the memory allocation apparatus 100 may decompose the first sparse matrix A into submatrices, each consisting of multiple row vectors, and load one of them into the memory 120. The memory allocation apparatus 100 may then decompose the entire second sparse matrix B into submatrices and load them into the memory 120 one at a time, multiplying every pair of elements that can be multiplied. Because this process is repeated until every submatrix of the first sparse matrix A has been processed, the entire second sparse matrix B is loaded as many times as there are submatrices of the first sparse matrix A. The memory allocation apparatus 100 may keep the computed results in the memory 120 and move them to the storage device 130 when the memory region for results becomes full.

Here, the sizes of the memory regions allocated to the first sparse matrix A, the second sparse matrix B, and the third result matrix C affect the performance of (1) loading submatrices of the two sparse matrices into memory and (2) storing the third result matrix C, produced by multiplying those submatrices, in the storage device 130. To optimize both processes, the memory allocation apparatus 100 applies the memory allocation method that minimizes I/O cost in a block nested loop join.

That is, the memory allocation apparatus 100 may allocate a minimum region to the second region of the memory 120, which is needed to load the second sparse matrix, and to the third region of the memory 120, which is needed to record the third result matrix C produced by multiplying the two sparse matrices. The memory allocation apparatus 100 may then allocate all of the memory left over after the second and third regions to the first region of the memory 120, which is needed to load the first sparse matrix.

In this way, the memory allocation apparatus 100 reduces the number of pieces in which the first sparse matrix is loaded, and thereby reduces the number of times the entire second sparse matrix is loaded. The amount of the third result matrix C produced by multiplying the two sparse matrices is constant regardless of the size of the memory region, so minimizing the region allocated to it does not change the amount written. Therefore, when multiplying two sparse matrices using the inner product, the memory allocation apparatus 100 can minimize I/O cost by minimizing the second region, which holds the second sparse matrix B, and the third region, which holds the third result matrix C, and allocating all remaining memory to the first region, which holds the first sparse matrix A.
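The effect of the first region's size on read I/O can be made concrete with a simple cost model. This is a sketch under stated assumptions, following the usual block nested loop join accounting: A is read once in total, B is read once per submatrix of A, all sizes are in the same arbitrary unit, and result writes are left out because the text states that their volume does not depend on the allocation. The function name `inner_product_read_cost` is hypothetical.

```python
import math


def inner_product_read_cost(size_a, size_b, first_region):
    """Estimate the data read from storage for inner-product multiplication.

    size_a, size_b -- sizes of the sparse matrices A and B
    first_region   -- memory allocated to submatrices of A (same unit)
    """
    # Number of submatrices A is split into when each must fit the first region.
    passes = math.ceil(size_a / first_region)
    # A is read once overall; B is read in full once per submatrix of A.
    return size_a + passes * size_b


print(inner_product_read_cost(1000, 800, first_region=100))  # 9000
print(inner_product_read_cost(1000, 800, first_region=500))  # 2600
```

Under this model, enlarging the first region from 100 to 500 units cuts the number of full scans of B from ten to two, which is the motivation for giving the first region all of the memory that the second and third regions do not strictly need.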

(2) Row-row-product-based multiplication of large sparse matrices, like inner-product-based multiplication, behaves much like joining two relations with a block nested loop join. When large sparse matrices are multiplied using the row-row product, the two sparse matrices are loaded in the same way as for the inner product: both are divided into submatrices, loaded into memory, and multiplied. Unlike the inner product, however, multiplying submatrices of the two sparse matrices yields incomplete intermediate results, so applying only the block nested loop join method is not enough to optimize performance. The present invention therefore proposes a method that optimizes performance by analyzing the characteristics of the row-row product.

The sizes of the memory regions allocated to the first sparse matrix A, the second sparse matrix B, and the third result matrix C affect the performance of (1) loading submatrices of the two sparse matrices into the memory 120, (2) sorting and merging intermediate results in the memory 120, (3) storing the sorted intermediate results in the storage device 130, and (4) merging the intermediate results in the storage device 130 into the final result. The method used for the inner product can improve the performance of loading submatrices into memory. However, because that method does not minimize the total execution time, memory must be allocated with the performance of the remaining processes in mind.

The performance of sorting and merging intermediate results in the memory 120 is affected by the size of the third region of the memory 120 allocated to the third result matrix C. The larger the third region, the more elements must be sorted at once, which increases the sorting cost.

There are two ways to improve the performance of writing the sorted intermediate results to the storage device 130. First, the third region allocated to the third result matrix C can be enlarged so that the storage device 130 is accessed sequentially. Second, the number of intermediate results merged in the memory 120 can be increased so that less data is written to the storage device 130.

When the third region allocated to the third result matrix C is larger, more intermediate results can be held before they are stored in the storage device 130, so it is more likely that some of them can be merged with each other. In addition, when the second region allocated to the second sparse matrix B is larger, more rows of the second sparse matrix B can be processed consecutively against one row of the first sparse matrix A, so intermediate results that can be merged with each other are placed in the third region consecutively. Therefore, the larger the second region allocated to the second sparse matrix B and the third region allocated to the third result matrix C, the more intermediate results are merged in the memory 120.

The performance of merging the intermediate results in the storage device 130 into the final result improves as the number of elements to be merged and the number of files decrease. In other words, it is improved by the same measures that improve the process of storing the sorted intermediate results in the storage device 130.

Because the sizes of the memory regions allocated to the first sparse matrix A, the second sparse matrix B, and the third result matrix C each affect the performance of different processes of the multiplication, memory must be allocated with these effects in mind.

The size of the first region allocated to the first sparse matrix A affects only the performance of loading the partial matrices of the first and second sparse matrices into the memory 120. Allocating a large memory region to the first sparse matrix A, as is done for the inner product, improves only this process and is therefore inefficient. Accordingly, the memory allocation apparatus 100 may map the second sparse matrix B to the outer relation of a nested loop join and allocate a minimum memory region to the first sparse matrix A.

If most of the remaining memory is then allocated to the second region for loading partial matrices of the second sparse matrix B, the third region allocated to the third result matrix C is small and fewer intermediate results are merged. Therefore, to increase the number of merged intermediate results, the memory remaining after the first region for loading a partial matrix of the first sparse matrix A has been allocated needs to be divided between the second region allocated to the second sparse matrix B and the third region allocated to the third result matrix C.

The intermediate results obtained by multiplying one row vector of the first sparse matrix A by a plurality of row vectors of the second sparse matrix B can all be merged into one row vector of the third result matrix C. That is, because multiplying one element of the first sparse matrix A by one row vector of the second sparse matrix B yields an intermediate result for one row vector of the third result matrix C, multiplying one row vector of the first sparse matrix A by k row vectors of the second sparse matrix B yields at most k row vectors for the third result matrix C. Therefore, to secure space that can hold all mergeable intermediate results, the third region allocated to the third result matrix C should be at least as large as the second region allocated to the second sparse matrix B.
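The bound of at most k intermediate row vectors can be illustrated with a small Python sketch. The dictionary-based row layout and the name `row_times_rows` are assumptions made for this example, not structures defined in the specification.

```python
def row_times_rows(a_row, b_rows):
    """Multiply one sparse row of A by the loaded rows of B (row-row product).

    a_row:  {k: a_ik} for a single row i of A
    b_rows: {k: {j: b_kj}} for the rows of B currently in memory
    Returns the merged row {j: c_ij} and the number of intermediate row vectors.
    """
    partials = []
    for k, a_ik in a_row.items():
        if k in b_rows:
            # One element of A times one row of B gives one intermediate row of C.
            partials.append({j: a_ik * b_kj for j, b_kj in b_rows[k].items()})
    merged = {}
    for partial in partials:
        for j, value in partial.items():
            merged[j] = merged.get(j, 0.0) + value
    return merged, len(partials)

a_row = {0: 2.0, 2: 1.0}                                       # row i of A
b_rows = {0: {1: 3.0}, 1: {0: 5.0}, 2: {1: 4.0, 3: 1.0}}       # k = 3 rows of B
merged_row, num_partials = row_times_rows(a_row, b_rows)
```

With three rows of B loaded, only two intermediate row vectors are produced (row 1 of B has no matching element in the row of A), and both collapse into a single row of C.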

As described above, the larger the second region allocated to the second sparse matrix B, the better the performance of loading the partial matrices of the first and second sparse matrices into the memory 120, and the smaller the third region allocated to the third result matrix C, the better the performance of sorting intermediate results in the memory 120. The memory allocation apparatus 100 may therefore divide the memory remaining after the first region for loading a partial matrix of the first sparse matrix A has been allocated equally between the second region allocated to the second sparse matrix B and the third region allocated to the third result matrix C.

(3) The outer-product-based method for multiplying large sparse matrices operates similarly to the sort merge join, which is well known in the database field. First, the memory allocation apparatus 100 may compare the index of a column vector of the first sparse matrix A with the index of a row vector of the second sparse matrix B and perform the multiplication when the two are equal. Because the column-vector indexes of the first sparse matrix A and the row-vector indexes of the second sparse matrix B contain no duplicates, the method operates similarly to a sort merge join whose join attribute has no duplicate values.

Here, the memory allocation apparatus 100 may load the first sparse matrix A into the first region of the memory 120 in units of partial matrices each consisting of a plurality of column vectors, and load the second sparse matrix B into the second region of the memory 120 in units of partial matrices each consisting of a plurality of row vectors. The memory allocation apparatus 100 may perform the multiplication using the partial matrices of the first sparse matrix A and the second sparse matrix B loaded in the memory 120, hold the resulting third result matrices C in the third region of the memory 120, and store the third result matrix C in the storage device 130 when the memory space corresponding to the third region becomes full.

As with the row-row product, the sizes of the memory regions allocated to the first sparse matrix A, the second sparse matrix B, and the result matrix C determine the performance of four processes: (1) loading partial matrices of the two sparse matrices into memory, (2) sorting and merging intermediate results in the memory 120, (3) storing the sorted intermediate results in the storage device 130, and (4) merging the intermediate results in the storage device 130 into the final result.

Owing to the nature of the sort merge join, the first sparse matrix A and the second sparse matrix B are each read exactly once regardless of the sizes of the first region allocated to the first sparse matrix A and the second region allocated to the second sparse matrix B. However, the larger the first region and the second region of the memory 120, the more sequential the access to the storage device 130 becomes.

The performance of storing the sorted intermediate results in the storage device 130 improves as the third region allocated to the third result matrix C becomes larger. This is because not only is the storage device 130 accessed sequentially, but the amount of intermediate results written is also reduced. When the incomplete third result matrix C produced by multiplying one column vector of the first sparse matrix A by one row vector of the second sparse matrix B is written to the third region and the third region is not yet full, the result of the multiplication for the next pair of vectors can also be held there. The intermediate results for elements at the same position in the incomplete third result matrices C are then sorted in the third region of the memory 120 and merged into a single value. In addition, because many elements are written to one file, the number of files storing intermediate results decreases. The performance of the external sort used to merge the intermediate results in the storage device 130 can therefore be improved.
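The effect of the third region's size on the number of files and on the amount written can be seen in a minimal Python sketch. Several simplifications are assumed here: the buffer is a dictionary that merges by position on insertion (instead of sorting and then merging), its `capacity` is counted in distinct positions rather than bytes, and each flushed "file" is just a sorted list. The name `outer_product_runs` is hypothetical.

```python
def outer_product_runs(a_cols, b_rows, capacity):
    """Outer-product multiplication with a bounded result buffer.

    a_cols: {k: {i: a_ik}} columns of A; b_rows: {k: {j: b_kj}} rows of B.
    Returns the sorted runs that would be written to storage.
    """
    buffer_c, runs = {}, []
    for k in sorted(a_cols.keys() & b_rows.keys()):   # multiply only matching indexes
        for i, a_ik in a_cols[k].items():
            for j, b_kj in b_rows[k].items():
                key = (i, j)
                if key not in buffer_c and len(buffer_c) >= capacity:
                    runs.append(sorted(buffer_c.items()))   # buffer full: flush a run
                    buffer_c = {}
                buffer_c[key] = buffer_c.get(key, 0.0) + a_ik * b_kj
    if buffer_c:
        runs.append(sorted(buffer_c.items()))
    return runs

a_cols = {0: {0: 1.0, 1: 2.0}, 1: {0: 3.0}}
b_rows = {0: {0: 1.0, 1: 1.0}, 1: {0: 2.0, 1: 1.0}}
small = outer_product_runs(a_cols, b_rows, capacity=2)
large = outer_product_runs(a_cols, b_rows, capacity=4)
```

In this toy input the small buffer produces three runs holding six entries in total, while the larger buffer merges everything in memory and produces one run of four entries.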

For example, for the same total number of elements to be sorted, an m-way merge sort performs better when there are fewer internally sorted files. On the other hand, the performance of sorting intermediate results in the third region of the memory 120 degrades as the third region allocated to the third result matrix C becomes larger. If many intermediate results are merged in the third region of the memory 120, the amount of intermediate results in the storage device 130 decreases, which reduces the cost of sorting and merging them.
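The final m-way merge of sorted intermediate files can be sketched in Python with the standard library's `heapq.merge`. The sketch assumes each file has already been read back as a list of ((row, column), value) pairs sorted by position; the name `merge_runs` and the in-memory lists standing in for files are illustrative only.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def merge_runs(runs):
    """m-way merge of sorted runs, summing values that share a position."""
    stream = heapq.merge(*runs, key=itemgetter(0))   # lazily merges m sorted inputs
    return [
        (position, sum(value for _, value in group))
        for position, group in groupby(stream, key=itemgetter(0))
    ]

# Three hypothetical sorted runs of intermediate results.
runs = [
    [((0, 0), 1.0), ((0, 1), 1.0)],
    [((1, 0), 2.0), ((1, 1), 2.0)],
    [((0, 0), 6.0), ((0, 1), 3.0)],
]
final_c = merge_runs(runs)
```

Fewer and shorter runs mean fewer inputs to the merge and fewer elements passing through it, which is the reason the text favors merging as much as possible in memory first.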

However, because the positions of the elements in a sparse matrix are irregular, it is difficult to predict the amount of intermediate results that will be merged in the third region of the memory 120, so the memory allocation that yields the best performance was determined experimentally. Based on that result, the memory allocation apparatus 100 may allocate to the third result matrix C, generated by multiplying the two sparse matrices, a third region corresponding to half of the entire memory 120, and divide the memory remaining after the third region has been allocated equally between the first region allocated to the first sparse matrix A and the second region allocated to the second sparse matrix B.

FIG. 4 is a flowchart illustrating a method by which a memory allocation apparatus according to an embodiment of the present invention multiplies two sparse matrices.

In step 410, the memory allocation apparatus 100 may identify the size of the entire area of the memory 120.

In step 420, the memory allocation apparatus 100 may determine the size of the first region of the memory 120 needed to load the first sparse matrix, the size of the second region of the memory 120 needed to load the second sparse matrix, and the size of the third region of the memory 120 for recording the third result matrix generated by multiplying the first sparse matrix and the second sparse matrix.

In step 430, the memory allocation apparatus 100 may allocate regions of the memory 120 according to the multiplication scheme of the first sparse matrix and the second sparse matrix, based on the identified size of the entire area of the memory 120 and the sum of the determined sizes of the first region, the second region, and the third region.

The present invention assumes a situation in which the first sparse matrix and the second sparse matrix cannot be loaded into the memory 120 all at once. Accordingly, when the identified size of the entire area of the memory 120 is smaller than the sum of the determined sizes of the first region, the second region, and the third region, the memory allocation apparatus 100 may allocate regions of the memory 120 according to the multiplication scheme of the first sparse matrix and the second sparse matrix.

Specifically, when multiplying large sparse matrices based on the inner product, the memory allocation apparatus 100 may allocate a minimum area to the second region of the memory 120 needed to load the second sparse matrix and to the third region of the memory 120 needed to record the third result matrix generated by multiplying the two sparse matrices, and may allocate to the first region of the memory 120 needed to load the first sparse matrix all of the area remaining after the second region and the third region have been allocated.

When multiplying large sparse matrices based on the row-row product, the memory allocation apparatus 100 may allocate a minimum area to the first region of the memory 120 needed to load the first sparse matrix, and may divide the area of the memory 120 remaining after the first region has been allocated equally between the second region allocated to the second sparse matrix and the third region allocated to the third result matrix.

When multiplying large sparse matrices based on the outer product, the memory allocation apparatus 100 may allocate to the third result matrix, generated by multiplying the first sparse matrix and the second sparse matrix, a third region corresponding to half of the entire memory 120, and may divide the memory remaining after the third region has been allocated equally between the first region allocated to the first sparse matrix A and the second region allocated to the second sparse matrix B.
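The three allocation policies described for step 430 can be summarized in a short Python sketch. It is an illustration only: the names `Regions` and `allocate_regions`, the scheme labels, and the `minimum` parameter are assumptions, since the specification does not state how large the "minimum area" is or in what unit regions are measured.

```python
from typing import NamedTuple

class Regions(NamedTuple):
    first: int    # region for partial matrices of the first sparse matrix A
    second: int   # region for partial matrices of the second sparse matrix B
    third: int    # region for the (intermediate) result matrix C

def allocate_regions(total: int, scheme: str, minimum: int) -> Regions:
    """Split `total` units of memory according to the multiplication scheme."""
    if scheme == "inner":
        # Minimum for B and C; everything that remains goes to A.
        return Regions(total - 2 * minimum, minimum, minimum)
    if scheme == "row-row":
        # Minimum for A; the remainder is split equally between B and C.
        rest = total - minimum
        second = rest // 2
        return Regions(minimum, second, rest - second)   # third >= second
    if scheme == "outer":
        # Half of all memory for C; the remainder is split equally between A and B.
        third = total // 2
        rest = total - third
        first = rest // 2
        return Regions(first, rest - first, third)
    raise ValueError(f"unknown scheme: {scheme}")

inner = allocate_regions(1000, "inner", minimum=50)
row_row = allocate_regions(1000, "row-row", minimum=50)
outer = allocate_regions(1000, "outer", minimum=50)
```

For the row-row case the odd unit, if any, is given to the third region so that it is never smaller than the second region, in line with the requirement stated for the row-row product.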

In step 440, the memory allocation apparatus 100 may load a partial matrix of the first sparse matrix into the allocated first region and load a partial matrix of the second sparse matrix into the allocated second region.

In step 450, the memory allocation apparatus 100 may multiply the loaded partial matrices of the first sparse matrix and the second sparse matrix. The memory allocation apparatus 100 may then record the third result matrix generated by the multiplication in the third region of the memory 120 and, when the third region becomes full, store the third result matrix recorded in the third region in the storage device 130.

Unlike multiplication of the partial matrices of the first sparse matrix and the second sparse matrix using the inner product, multiplication using the row-row product or the outer product can yield incomplete intermediate results. Therefore, when the partial matrices of the first sparse matrix and the second sparse matrix are multiplied using the row-row product or the outer product, the memory allocation apparatus 100 may generate the final third result matrix by integrating the third result matrices stored in the storage device 130.

In step 460, the memory allocation apparatus 100 may determine whether any partial matrix of the first sparse matrix or the second sparse matrix remains unloaded. If such a partial matrix remains, the memory allocation apparatus 100 may repeat step 440 and step 450 so that the multiplication is performed on all partial matrices of the first sparse matrix and the second sparse matrix.

If, on the other hand, no partial matrix of the first sparse matrix or the second sparse matrix remains unloaded, the memory allocation apparatus 100 may end the multiplication of the two sparse matrices.
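The loop formed by steps 440 to 460 can be sketched in Python for the row-row case. This is a simplified illustration under stated assumptions: matrices are dictionaries of sparse rows, block sizes are counted in rows rather than bytes, the result is kept entirely in one dictionary instead of being flushed to storage, and the names `blocks` and `multiply_blocked` are hypothetical.

```python
def blocks(rows, size):
    """Yield partial matrices holding at most `size` rows each."""
    keys = sorted(rows)
    for start in range(0, len(keys), size):
        yield {k: rows[k] for k in keys[start:start + size]}

def multiply_blocked(a_rows, b_rows, a_block, b_block):
    """Row-row product over partial matrices; B plays the outer relation."""
    c = {}
    for b_part in blocks(b_rows, b_block):          # step 440: load a block of B
        for a_part in blocks(a_rows, a_block):      # step 440: load a block of A
            for i, a_row in a_part.items():         # step 450: multiply the blocks
                for k, a_ik in a_row.items():
                    for j, b_kj in b_part.get(k, {}).items():
                        c[(i, j)] = c.get((i, j), 0.0) + a_ik * b_kj
        # step 460: the generators end once no partial matrix remains unloaded
    return c

a_rows = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
b_rows = {0: {0: 4.0}, 1: {0: 5.0, 1: 6.0}}
c_small_blocks = multiply_blocked(a_rows, b_rows, a_block=1, b_block=1)
c_whole = multiply_blocked(a_rows, b_rows, a_block=2, b_block=2)
```

Both block sizes give the same product; only the number of load iterations differs, which is what the region sizes control in the described apparatus.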

The methods according to embodiments of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software.

Although the present invention has been described above with reference to limited embodiments and drawings, the present invention is not limited to these embodiments, and a person of ordinary skill in the art to which the present invention pertains can make various modifications and variations based on this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims set forth below and by their equivalents.

100: Memory allocation apparatus
110: Processor
120: Memory
130: Storage device

Claims

1. A memory allocation method comprising:
identifying an entire area of a memory;
determining a size of a first region in the memory needed to load a first sparse matrix, a size of a second region in the memory needed to load a second sparse matrix, and a size of a third region in the memory for recording a third result matrix generated by multiplying the first sparse matrix and the second sparse matrix; and
allocating regions of the memory according to a multiplication scheme of the first sparse matrix and the second sparse matrix, based on the identified entire area of the memory and a sum of the determined size of the first region, size of the second region, and size of the third region.

2. The method of claim 1,
wherein the allocating comprises:
when the first sparse matrix and the second sparse matrix are multiplied based on the inner product,
allocating a minimum area to the second region and the third region of the memory, and allocating the remaining area to the first region.

3. The method of claim 2,
wherein the second region and the third region are allocated areas of the same size.

4. The method of claim 1,
wherein the allocating comprises:
when the first sparse matrix and the second sparse matrix are multiplied based on the row-row product,
allocating a minimum area to the first region of the memory, and allocating the remaining area to the second region and the third region.

5. The method of claim 4,
wherein the second region and the third region are allocated areas of the same size.

6. The method of claim 1,
wherein the allocating comprises:
when the first sparse matrix and the second sparse matrix are multiplied based on the outer product,
allocating an area corresponding to half the size of the memory to the third region of the memory, and allocating the remaining area to the first region and the second region.

7. The method of claim 6,
wherein the first region and the second region are allocated areas of the same size.

8. The method of claim 1, further comprising:
loading a partial matrix of the first sparse matrix into the allocated first region and loading a partial matrix of the second sparse matrix into the allocated second region;
multiplying the loaded partial matrix of the first sparse matrix and the loaded partial matrix of the second sparse matrix; and
recording, in the allocated third region, the third result matrix generated by multiplying the partial matrix of the first sparse matrix and the partial matrix of the second sparse matrix, and processing the recorded third result matrix.

9. The method of claim 8,
wherein the loading comprises:
identifying partial matrices of the first sparse matrix and the second sparse matrix that remain unloaded; and
loading the remaining partial matrices into the first region and the second region, respectively.

10. A memory allocation apparatus comprising:
a memory into which a partial matrix of a first sparse matrix and a partial matrix of a second sparse matrix are loaded in order to multiply the first sparse matrix and the second sparse matrix; and
a processor configured to identify an entire area of the memory, determine a size of a first region in the memory needed to load the first sparse matrix, a size of a second region in the memory needed to load the second sparse matrix, and a size of a third region in the memory for recording a third result matrix generated by multiplying the first sparse matrix and the second sparse matrix, and allocate regions of the memory according to a multiplication scheme of the first sparse matrix and the second sparse matrix, based on the identified entire area of the memory and a sum of the determined size of the first region, size of the second region, and size of the third region.

11. The apparatus of claim 10,
wherein the processor,
when the first sparse matrix and the second sparse matrix are multiplied based on the inner product,
allocates a minimum area to the second region and the third region of the memory, and allocates the remaining area to the first region.

12. The apparatus of claim 10,
wherein the processor,
when the first sparse matrix and the second sparse matrix are multiplied based on the row-row product,
allocates a minimum area to the first region of the memory, and allocates the remaining area to the second region and the third region.

13. The apparatus of claim 10,
wherein the processor,
when the first sparse matrix and the second sparse matrix are multiplied based on the outer product,
allocates an area corresponding to half the size of the memory to the third region of the memory, and allocates the remaining area to the first region and the second region.

14. The apparatus of claim 10,
wherein the processor
loads a partial matrix of the first sparse matrix into the allocated first region, loads a partial matrix of the second sparse matrix into the allocated second region, records, in the allocated third region, the third result matrix generated by multiplying the loaded partial matrix of the first sparse matrix and the loaded partial matrix of the second sparse matrix, and processes the recorded third result matrix.

15. The apparatus of claim 14,
wherein the processor
stores the result recorded in the third region in a storage device when the allocated third region is full, and
integrates the results stored in the storage device when the result recorded in the third region was generated by the row-row product or the outer product.

16. A method comprising:
identifying an entire area of a memory;
determining a size of a first region in the memory needed to load a first sparse matrix of two sparse matrices, a size of a second region in the memory needed to load a second sparse matrix of the two sparse matrices, and a size of a third region in the memory for recording a result of multiplying the two sparse matrices;
allocating regions of the memory according to a multiplication scheme of the two sparse matrices, based on the identified entire area of the memory and the determined size of the first region, size of the second region, and size of the third region;
loading a partial matrix of the first sparse matrix into the allocated first region and loading a partial matrix of the second sparse matrix into the allocated second region;
multiplying the loaded partial matrices of the two sparse matrices;
recording, in the allocated third region, a result of multiplying the partial matrices of the two sparse matrices;
merging the results recorded in the allocated third region;
storing the results recorded in the third region in a storage device when the allocated third region is full; and
integrating the results stored in the storage device.

17. The method of claim 16,
wherein the allocating comprises:
when the two sparse matrices are multiplied based on the inner product,
allocating a minimum area to the second region and the third region of the memory, and allocating the remaining area to the first region;
when the two sparse matrices are multiplied based on the row-row product,
allocating a minimum area to the first region of the memory, and allocating the remaining area to the second region and the third region; and
when the two sparse matrices are multiplied based on the outer product,
allocating an area corresponding to half of the memory to the third region of the memory, and allocating the remaining area to the first region and the second region.