KR102268813B1

KR102268813B1 - Method and System for design of field programmable gate array for deep learning algorithm

Info

Publication number: KR102268813B1
Application number: KR1020200174243A
Authority: KR
Inventors: 전해룡
Original assignee: Mobilint Inc. (주식회사 모빌린트)
Priority date: 2020-12-14
Filing date: 2020-12-14
Publication date: 2021-06-25
Also published as: WO2022131389A1

Abstract

Provided are a method and system for designing a field programmable gate array (FPGA) optimized for a deep learning algorithm. The method of the present invention comprises: a module architecture design step of setting, for each layer according to the layer structure of the ResNet algorithm, a module architecture including at least one of a convolution operation, an addition operation, a pooling operation, and an activation operation; a control architecture design step of connecting input data and weight data in parallel to the one or more module architectures and setting data paths according to the operations within those module architectures; and an FPGA design step of determining, based on the resource size of the FPGA, where the overall architecture, comprising the one or more module architectures and their control architecture, is placed within the FPGA.

Description

FPGA design method and system for deep learning algorithm {Method and System for design of field programmable gate array for deep learning algorithm}

The present invention relates to a method and system for designing an FPGA (field programmable gate array) for a deep learning algorithm.

Deep learning is a field of artificial intelligence that uses artificial neural networks modeled on human neurons. It is a machine learning method whose strength lies in progressively learning meaningful representations through successive layers and extracting representations and features from data. Given sufficient training, deep learning achieves much higher classification and prediction performance than conventional algorithms, and various approaches for applying it to diverse fields have been proposed.

However, deep learning algorithms are inherently computation-heavy and therefore require a dedicated computing device. Research on such dedicated devices is accordingly very active, and one approach being actively pursued is to build the deep learning operator on a field programmable gate array (FPGA).

Korean Laid-open Patent Publication No. 10-2018-0125843, 2018.11.26

The object of the present invention is to provide an FPGA design method and system optimized for a deep learning algorithm.

The objects of the present invention are not limited to the one mentioned above; other objects not mentioned will be clearly understood by those skilled in the art from the following description.

An FPGA (field programmable gate array) design method for a deep learning algorithm according to one aspect of the present invention, for solving the above-described problems, comprises: designing one or more module architectures including operators for respectively performing a convolution operation, an addition (add) operation, a pooling operation, and an activation operation for each layer according to the layer structure of a ResNet neural network; designing a control architecture by setting paths for the flow of the operation data to be processed by the operators in the designed one or more module architectures; and designing the overall architecture by determining, based on the resource size of the FPGA, where the overall architecture including the designed one or more module architectures and the designed control architecture is to be placed within the FPGA. In the module architecture design step, when a module architecture identical to the one or more module architectures is added, the added module architecture is connected in parallel to the loop computed by the one or more module architectures. In the control architecture design step, a module architecture on which no operation is being performed is detected among the designed one or more module architectures, an intermediate operation sequence is provided to the detected module architecture, and the paths are set so that, after the corresponding operation is performed, the result is merged into the designed one or more module architectures.
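The control-architecture behaviour in this claim (detect a module that is not computing, hand it an intermediate operation sequence, merge its result back) can be illustrated with a small cycle-level simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the `Module` class, the `run_control` function, the FIFO task queue, and the fixed `cycles_per_task` latency are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Module:
    """Hypothetical model of one replicated module architecture."""
    module_id: int
    task: Optional[str] = None
    remaining: int = 0  # cycles left on the current task

    def is_idle(self) -> bool:
        return self.task is None


def run_control(tasks: Sequence[str], num_modules: int,
                cycles_per_task: int = 1) -> Tuple[List[Tuple[int, str]], int]:
    """Dispatch queued operation sequences to idle modules and merge results.

    Returns (merged results in completion order, total cycles used).
    """
    queue = deque(tasks)
    modules = [Module(i) for i in range(num_modules)]
    merged: List[Tuple[int, str]] = []
    cycle = 0
    while queue or any(not m.is_idle() for m in modules):
        # 1) Detect idle modules and give each the next intermediate sequence.
        for m in modules:
            if m.is_idle() and queue:
                m.task = queue.popleft()
                m.remaining = cycles_per_task
        # 2) Advance one cycle; a module that finishes merges its result back.
        cycle += 1
        for m in modules:
            if not m.is_idle():
                m.remaining -= 1
                if m.remaining == 0:
                    merged.append((m.module_id, m.task))
                    m.task = None
    return merged, cycle
```

For example, `run_control(["t0", "t1", "t2", "t3", "t4"], num_modules=2)` finishes in 3 cycles instead of the 5 a single module would need, because the second module never sits idle while work is queued.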

In addition, the module architecture design step sets the convolution operation to be performed in one cycle, and sets the addition operation so that, each time the input data to the one or more module architectures passes through two layers, it is summed with the input data from before those two layers.

In addition, the convolution operation is a 3×3 multiplication operation; the addition operation sums the results of the 3×3 multiplication operation; the pooling operation extracts the largest of the summed values; and the activation operation adds a nonlinear characteristic to the extracted value.

(Deleted)

In addition, the overall architecture design step may include: calculating the total resource size of the FPGA and the resource size of each block; determining, based on the calculated resource sizes, the block type and number of blocks for one designed module architecture; and designing the overall architecture by determining, based on the determined block type and number of blocks, where the overall architecture including the designed one or more module architectures and the designed control architecture is placed within the overall structure of the FPGA.

In addition, the overall architecture design step determines, based on the determined block type and number of blocks, where the overall architecture is placed within the FPGA so that the maximum number of module architectures is placed in the FPGA.
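The resource-driven placement steps (compute total and per-block resources, decide how many module blocks fit, place as many modules as possible) can be sketched as a simple greedy fill. This is an illustrative assumption, not the patent's algorithm: the resource names `"DSP"` and `"LUT"`, the per-module and control costs, and reserving the control architecture in a single `control_block` are all hypothetical, and real placement would also account for routing.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, int]  # e.g. {"DSP": 40, "LUT": 20000} (hypothetical units)


def total_resources(blocks: List[Resources]) -> Resources:
    """Total resource size of the FPGA, summed over all blocks."""
    total: Resources = {}
    for block in blocks:
        for name, amount in block.items():
            total[name] = total.get(name, 0) + amount
    return total


def modules_per_block(block: Resources, module_cost: Resources) -> int:
    """How many whole copies of one module architecture fit in one block."""
    return min(block.get(name, 0) // need for name, need in module_cost.items())


def place_modules(blocks: List[Resources], module_cost: Resources,
                  control_cost: Resources,
                  control_block: int = 0) -> List[Tuple[int, int]]:
    """Return (block index, module count) pairs that maximise the module count.

    The control architecture is reserved first in `control_block`; every
    block is then filled with as many whole modules as its resources allow.
    """
    placement: List[Tuple[int, int]] = []
    for index, block in enumerate(blocks):
        free = dict(block)
        if index == control_block:
            for name, need in control_cost.items():
                free[name] = free.get(name, 0) - need
                if free[name] < 0:
                    raise ValueError("control architecture does not fit in its block")
        count = modules_per_block(free, module_cost)
        if count > 0:
            placement.append((index, count))
    return placement
```

With three blocks of `{"DSP": 40, "LUT": 20000}`, `{"DSP": 40, "LUT": 20000}` and `{"DSP": 20, "LUT": 8000}`, a module costing `{"DSP": 9, "LUT": 3000}`, and control logic costing `{"LUT": 4000}`, the sketch places 4 + 4 + 2 = 10 modules.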

An FPGA design system for a deep learning algorithm according to one aspect of the present invention, for solving the above-described problems, comprises: a module architecture design unit that designs one or more module architectures including operators for respectively performing a convolution operation, an addition (add) operation, a pooling operation, and an activation operation for each layer according to the layer structure of a ResNet neural network; a control architecture design unit that designs a control architecture by setting paths for the flow of the operation data to be processed by the operators in the designed one or more module architectures; and an FPGA design unit that designs the overall architecture by determining, based on the resource size of the FPGA (field programmable gate array), where the overall architecture including the designed one or more module architectures and the designed control architecture is to be placed within the FPGA. When a module architecture identical to the one or more module architectures is added, the module architecture design unit connects the added module architecture in parallel to the loop computed by the one or more module architectures. The control architecture design unit detects a module architecture on which no operation is being performed among the designed one or more module architectures, provides an intermediate operation sequence to the detected module architecture, and sets the paths so that, after the corresponding operation is performed, the result is merged into the designed one or more module architectures.

In addition, the convolution operation is a 3×3 multiplication operation; the addition operation sums the results of the 3×3 multiplication operation; the pooling operation extracts the largest of the summed values; and the activation operation adds a nonlinear characteristic to the extracted value.

In addition, the FPGA design unit includes: a resource size calculation unit that calculates the total resource size of the FPGA and the resource size of each block; a module block determination unit that determines, based on the calculated resource sizes, the block type and number of blocks for one designed module architecture; and a block placement unit that determines, based on the determined block type and number of blocks, where the overall architecture including the designed one or more module architectures and the designed control architecture is placed within the overall structure of the FPGA.

A computer program according to yet another aspect of the present invention, for solving the above-described problems, may be combined with a computer and stored in a computer-readable recording medium to execute the FPGA design method for a deep learning algorithm described above.

Other specific details of the invention are included in the detailed description and drawings.

According to the present invention as described above, the efficiency, degree of optimization, and flexibility of FPGA design for a deep learning algorithm (in particular, the ResNet algorithm) can be increased.

In addition, according to the present invention, modularizing the architecture that performs the deep learning operations allows the modular architecture to be arranged appropriately according to the amount of resources in the FPGA. As a result, an architecture adaptively optimized for the size of the FPGA can be implemented quickly and flexibly.

The effects of the present invention are not limited to those mentioned above; other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a diagram briefly illustrating the basic concept of an artificial neural network.
FIG. 2 is a diagram schematically illustrating the basic residual connection structure of ResNet.
FIG. 3 is a flowchart of an FPGA design method for a deep learning algorithm according to an example of the present invention.
FIG. 4 is a diagram briefly illustrating a module architecture according to an example of the present invention.
FIG. 5 is a diagram showing examples of activation functions applicable to the present invention.
FIG. 6 is a diagram schematically illustrating an architecture structure according to an example of the present invention.
FIGS. 7A to 7C are diagrams briefly illustrating FPGA design steps applicable to the present invention.
FIG. 8 is a diagram showing the configuration of an FPGA design system according to an example of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the present invention pertains; the present invention is defined only by the scope of the claims.

The terminology used herein is for the purpose of describing the embodiments and is not intended to limit the present invention. As used herein, the singular also includes the plural unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more components other than those stated. Like reference numerals refer to like elements throughout, and "and/or" includes each of the recited elements and every combination of one or more of them. Although "first", "second", etc. are used to describe various elements, these elements are of course not limited by these terms, which are used only to distinguish one element from another. Accordingly, a first element mentioned below may also be a second element within the technical spirit of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meaning commonly understood by those of ordinary skill in the art to which this invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or overly formal sense unless explicitly and specifically so defined.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The present invention discloses a method and system for designing a field programmable gate array (FPGA) for a deep learning algorithm. In other words, it discloses a method for implementing the deep learning algorithm on an FPGA with limited resources, and a system therefor.

Before the description, the meanings of the terms used herein are briefly explained. However, these explanations are intended only to aid understanding of this specification and, unless explicitly described as limiting the present invention, should not be read as limiting its technical idea.

First, a deep learning algorithm is a type of machine learning algorithm: a modeling technique developed from artificial neural networks, which imitate the human neural network. As shown in FIG. 1, an artificial neural network may be configured as a multi-layer hierarchical structure.

FIG. 1 is a diagram briefly illustrating the basic concept of an artificial neural network.

As shown in FIG. 1, an artificial neural network (ANN) may be configured as a hierarchical structure including an input layer, an output layer, and at least one intermediate layer (or hidden layer) between the input layer and the output layer. Based on this multi-layer structure, a deep learning algorithm learns by optimizing the weights associated with the activation functions between layers, and can thereby produce highly reliable results.
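A forward pass through such a layered network can be sketched in a few lines. This is an illustrative assumption rather than anything specified in the patent: fully connected layers, plain Python lists instead of a tensor library, and hand-picked weights; training (the weight optimization mentioned above) is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[Sequence[Sequence[float]], Sequence[float], Callable[[float], float]]


def relu(v: float) -> float:
    return max(0.0, v)


def identity(v: float) -> float:
    return v


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float], activation: Callable[[float], float]) -> Vector:
    """One fully connected layer; weights[j][i] connects input i to neuron j."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def forward(x: Sequence[float], layers: Sequence[Layer]) -> Vector:
    """Pass the input through hidden layers to the output layer in order."""
    out = list(x)
    for weights, bias, activation in layers:
        out = dense(out, weights, bias, activation)
    return out


# Input layer of 2 values -> hidden layer of 2 ReLU neurons -> 1 linear output.
network: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[2.0, 1.0]], [0.5], identity),
]
print(forward([1.0, 2.0], network))  # hidden = [0.0, 1.5], output = [2.0]
```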

Deep learning algorithms applicable to the present invention may include a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), and the like.

A DNN is basically characterized by greatly increasing the number of intermediate (hidden) layers in a conventional ANN model in order to improve learning results. For example, a DNN performs its learning process using two or more intermediate layers. The computer can thus derive an optimal output value by repeatedly generating classification labels on its own, distorting the space, and separating the data.

Unlike techniques in which learning is performed by extracting knowledge from existing data, a CNN has a structure that extracts features of the data and identifies patterns among those features. A CNN operates through a convolution process and a pooling process; in other words, it may include an algorithm in which convolutional layers and pooling layers are combined. The convolutional layer extracts features of the data (the convolution process): it examines the neighboring components of each component in the data to identify features and condenses the identified features into a single map. As a form of compression, this process can effectively reduce the number of parameters. The pooling layer reduces the size of the layer produced by the convolution process (the pooling process). Pooling can reduce the data size, suppress noise, and provide consistent features at fine levels of detail. CNNs may be used in various fields such as information extraction, sentence classification, and face recognition.
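The convolution and pooling processes described above can be sketched as follows. This is a minimal single-channel illustration under assumed settings (stride 1, no padding, 2×2 max pooling with stride 2); as in most CNN libraries, the "convolution" here is computed as a cross-correlation, i.e. the kernel is not flipped.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d(image: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]]) -> Matrix:
    """Valid (no padding), stride-1 convolution of one channel.

    Each output value combines a pixel's neighbourhood into a single number,
    so the result is one feature map.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                  for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool2d(fmap: Sequence[Sequence[float]], size: int = 2) -> Matrix:
    """Non-overlapping max pooling: keep the largest value in each window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]  # 6x6 input
kernel = [[1.0] * 3 for _ in range(3)]                            # 3x3 kernel
feature_map = conv2d(image, kernel)   # 4x4 feature map
pooled = max_pool2d(feature_map)      # 2x2 after pooling
print(pooled)  # [[126.0, 144.0], [234.0, 252.0]]
```

The 6×6 input shrinks to a 4×4 feature map after the 3×3 convolution and to 2×2 after pooling, which is the size reduction the paragraph describes.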

An RNN is a type of artificial neural network specialized for learning from repetitive, sequential data and is characterized by an internal recurrent structure. Using this recurrent structure, an RNN applies weights to what it learned in the past and reflects it in current learning, which links current and past learning and makes the network time-dependent. The RNN addresses limitations of earlier approaches to learning continuous, repetitive, sequential data and can be used, for example, to analyze speech waveforms or to identify the preceding and following components of a text.
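The recurrent structure can be illustrated with the common Elman-style update h_t = tanh(w_x·x_t + w_h·h_(t-1) + b). This is a minimal scalar sketch; the update rule is the standard textbook form and is assumed here, since the patent does not specify one.

```python
import math
from typing import List, Sequence


def rnn_forward(xs: Sequence[float], w_x: float, w_h: float, b: float,
                h0: float = 0.0) -> List[float]:
    """Scalar RNN: each hidden state mixes the current input with the previous state."""
    h = h0
    states: List[float] = []
    for x in xs:
        # The recurrent weight w_h carries information from earlier steps forward.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
# The inputs at steps 2 and 3 are zero, yet their states are non-zero:
# they still "remember" the first input through the recurrence.
print(states)
```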

In particular, the FPGA design method according to the present invention is described in detail below based on ResNet, one type of CNN. However, the technical configuration disclosed in the present invention is not limited to ResNet and may also be applied to various deep learning algorithms similar to ResNet.

FIG. 2 is a diagram schematically illustrating the basic residual connection structure of ResNet.

As shown in FIG. 2, in the residual connection used in ResNet, an input value x passes through a certain number of weight layers (e.g., two weight layers) and through the ReLU function, a nonlinear activation function. Denoting the result as f(x), the residual connection adds the input itself (the identity), x, to the activation value f(x), and f(x)+x is used as the input to the next activation function. The residual connection thus lets the gradient of the original input x flow directly through the network and, by adding the identity to the output of the nonlinear function, prevents the original gradient from being lost.
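A residual block can be sketched with plain lists. This follows the usual ResNet formulation, y = ReLU(F(x) + x) with F(x) = W2·ReLU(W1·x), which is an assumption about detail the paragraph leaves open; biases and batch normalization are omitted, and the matrices are arbitrary examples.

```python
from typing import List, Sequence

Vector = List[float]


def matvec(w: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu_vec(v: Sequence[float]) -> Vector:
    return [max(0.0, a) for a in v]


def residual_block(x: Sequence[float], w1: Sequence[Sequence[float]],
                   w2: Sequence[Sequence[float]]) -> Vector:
    """Two weight layers on the residual branch plus an identity shortcut."""
    f = matvec(w2, relu_vec(matvec(w1, x)))           # residual branch F(x)
    return relu_vec([fi + xi for fi, xi in zip(f, x)])  # add x back, then ReLU


zeros = [[0.0, 0.0], [0.0, 0.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
print(residual_block([1.0, 2.0], zeros, zeros))  # [1.0, 2.0]: input passes straight through
print(residual_block([1.0, -3.0], eye, eye))     # [2.0, 0.0]
```

Even when the weight layers contribute nothing (all-zero weights), the block still passes its input through the shortcut. Because the derivative of F(x) + x with respect to x contains an identity term, part of the gradient always reaches earlier layers, which is the property the paragraph describes.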

A ResNet built with such residual connections may contain a continuous sequence of structures that follow a regular computational pattern. Given this property, the operator can be built by modularizing ResNet's basic architecture. One or more copies of this basic architecture may be applied to the FPGA, depending on the FPGA's number of resources (or amount of resources, where the resources may include digital signal processing (DSP) blocks, look-up tables (LUTs), and the like). Because only the control architecture that controls the modules needs to be modified or supplemented according to their number, the modular basic architecture can be applied to FPGAs of various sizes. Moreover, since the structure of the algorithm itself is very simple, placement and routing errors can be minimized when it is implemented on the FPGA.

Accordingly, the present invention describes in detail an FPGA design method and system that take these characteristics of deep learning algorithms (in particular, the ResNet algorithm) into account.

An FPGA is a semiconductor device containing configurable logic elements and programmable internal circuitry. Its internal operator arrangement and its storage usage and interconnection structure can be changed flexibly according to the user's configuration. However, designing the operator architecture is very complex, and the design must differ depending on the resources (e.g., DSPs, LUTs) available in the FPGA.

However, as described above, if a deep learning algorithm consisting of a consistent, repeating structure is organized as a basic module architecture, copies of that module can easily be added according to the resource size of the FPGA. In other words, with a modularized basic module architecture, deep learning computation performance can be improved simply by increasing the number of basic module architectures. In addition, the control architecture that controls the modules can be implemented with simple modifications according to their number, so it can be applied to FPGAs of each size. Furthermore, because the basic module architecture itself is very simple, placement and routing errors can be minimized when it is inserted into the FPGA.
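The claim that performance scales with the number of modules can be made concrete with an idealized cycle count. This is a back-of-envelope sketch under stated assumptions, not a figure from the patent: each module is assumed to finish one 3×3 window (for one input/output channel pair) per cycle, the work is assumed to split evenly across modules, and memory bandwidth and control overhead are ignored. The function name `conv_cycles` is hypothetical.

```python
def conv_cycles(out_h: int, out_w: int, in_ch: int, out_ch: int,
                num_modules: int) -> int:
    """Idealised cycles for one 3x3 conv layer spread over `num_modules` modules."""
    # One 3x3 window per (output position, input channel, output channel).
    windows = out_h * out_w * in_ch * out_ch
    # Ceiling division: a partly used final cycle still counts as a cycle.
    return -(-windows // num_modules)


# Example: a 56x56 output with 64 input and 64 output channels.
for n in (1, 2, 4):
    print(n, conv_cycles(56, 56, 64, 64, n))
```

Under these assumptions the cycle count falls in proportion to the module count (12,845,056 cycles with one module, 3,211,264 with four), which is why adding modules, together with a correspondingly adjusted control architecture, is the main performance lever.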

Based on these characteristics, an FPGA design method and system for a deep learning algorithm are described in detail below.

FIG. 3 is a flowchart of an FPGA design method for a deep learning algorithm according to an example of the present invention.

As shown in FIG. 3, the FPGA design method for a deep learning algorithm (e.g., ResNet) according to the present invention may include a module architecture design step (S310), a control architecture design step (S320), and an FPGA design step (S330). In the present invention, the FPGA design method may be carried out by an FPGA design system, or by a computer program stored in a computer-readable recording medium to execute the method. Accordingly, the method may receive input/configuration information from the user and provide or display to the user an optimal FPGA design for the corresponding deep learning algorithm.

In one embodiment applicable to the present invention, in step S310 the FPGA design system may design the operator configuration of each layer according to the layer structure of the ResNet algorithm. More specifically, in step S310 the FPGA design system may set up, for each layer according to the hierarchical structure of the ResNet algorithm, a module architecture including at least one of a convolution operation, an addition (add) operation, a pooling operation, and an activation operation. The computational (hierarchical) structure of the ResNet algorithm or other deep learning algorithm may be obtained from data input by the user or from preset data.

In one example applicable to the present invention, the convolution operation may be set to be performed in one cycle, and the addition operation may be set so that, each time the input data passes through two layers, it is summed with the input data from before those two layers. More specifically, a multiply-accumulate (MAC) architecture may be configured so that a repeatedly used convolution operation (e.g., a 3×3 convolution) is performed in one cycle, and the addition operation may follow a pattern in which, each time the input data passes through two layers, it is added to the input data from before those two layers. In addition, the pooling and activation operations may be placed in specific layers (e.g., layers identified in an initial layer search step) to perform their operations.
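The per-layer operator pattern described above (a 3×3 convolution in every layer, an addition every second layer using the input from two layers earlier, pooling and activation only where selected) can be sketched as a small planner plus a toy execution. The names `layer_plan` and `run_with_shortcuts`, and passing the selected layers in as arguments in place of the initial layer search step, are assumptions for illustration.

```python
from typing import Callable, Iterable, List, Sequence


def layer_plan(num_layers: int, pool_layers: Iterable[int] = (),
               act_layers: Iterable[int] = ()) -> List[List[str]]:
    """Operator list per layer: conv everywhere, add on every second layer,
    activation/pooling only on layers chosen by a (hypothetical) layer search."""
    pool_set, act_set = set(pool_layers), set(act_layers)
    plan: List[List[str]] = []
    for i in range(num_layers):
        ops = ["conv3x3"]
        if i % 2 == 1:  # this layer closes a pair: add the pair's input
            ops.append("add")
        if i in act_set:
            ops.append("relu")
        if i in pool_set:
            ops.append("maxpool")
        plan.append(ops)
    return plan


def run_with_shortcuts(x: float, layer_fns: Sequence[Callable[[float], float]]) -> float:
    """Apply layers in order; after every second layer, add the input from
    before those two layers (the identity shortcut)."""
    pair_input = x
    for i, fn in enumerate(layer_fns):
        x = fn(x)
        if i % 2 == 1:
            x = x + pair_input
            pair_input = x  # becomes the input of the next pair
    return x


print(layer_plan(4, pool_layers=[3], act_layers=range(4)))
print(run_with_shortcuts(1.0, [lambda v: 2 * v] * 4))  # 2, 4+1=5, 10, 20+5=25
```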

FIG. 4 is a diagram briefly illustrating a module architecture according to an example of the present invention.

As shown in FIG. 4, the module architecture according to an example of the present invention may include a 3*3 multiplier, an accumulator, a pooling operation, and a ReLU operation. In other words, the module architecture may include an operator that performs all nine (3*3) multiplications at once and an architecture that accumulates and sums their results. For example, the convolution operation may perform the 3*3 multiplication, the addition operation may sum the results of the convolution operation, the pooling operation may extract the largest of the summed values, and the activation operation may add a non-linear characteristic to the extracted value. The module architecture may additionally include a data selector that outputs a selected data value from the outputs of the accumulator, the pooling operation, and the ReLU operation. In a further embodiment, when another such module architecture is added, it may be connected in parallel to the loop operated by the existing module architecture, so that the overall module architecture can process two ResNet operations simultaneously.
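The following is a software model of the FIG. 4 pipeline, not RTL. It assumes a stride-1, unpadded 3*3 convolution and a 2x2 max-pooling window (the patent does not state the pooling window size); all function names are hypothetical. `mac3x3` stands in for the nine parallel multipliers plus accumulator, and the `select` argument plays the role of the data selector.

```python
from typing import List

Matrix = List[List[float]]


def mac3x3(window: Matrix, kernel: Matrix) -> float:
    """Nine multiplications (one per kernel tap) followed by accumulation."""
    products = [window[r][c] * kernel[r][c] for r in range(3) for c in range(3)]
    return sum(products)


def conv3x3(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding, stride 1) 3x3 convolution built from mac3x3."""
    h, w = len(image), len(image[0])
    return [
        [mac3x3([row[c:c + 3] for row in image[r:r + 3]], kernel) for c in range(w - 2)]
        for r in range(h - 2)
    ]


def max_pool2x2(fmap: Matrix) -> Matrix:
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


def relu(fmap: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in fmap]


def module_forward(image: Matrix, kernel: Matrix, select: str = "relu") -> Matrix:
    """Run the module pipeline; `select` models the data selector at the output."""
    acc = conv3x3(image, kernel)
    pooled = max_pool2x2(acc)
    outputs = {"accumulator": acc, "pool": pooled, "relu": relu(pooled)}
    return outputs[select]


img = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]  # values 1..16
ones = [[1.0] * 3 for _ in range(3)]
print(module_forward(img, ones, "accumulator"))  # [[54.0, 63.0], [90.0, 99.0]]
print(module_forward(img, ones))                 # [[99.0]]
```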

FIG. 5 is a diagram showing examples of activation functions applicable to the present invention.

As shown in FIG. 5, the activation operation in the module architecture may apply an activation function to add a non-linear characteristic to its input data (e.g., a value extracted by the pooling operation). Any one of the activation functions shown in FIG. 5 may be used. For example, the present invention discloses an example in which the ReLU function is used as the activation function, but depending on the embodiment, a Sigmoid, Leaky ReLU, or ELU function may be used instead.
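For reference, the four activation functions named above can be written as scalar functions. The `alpha` defaults (0.01 for Leaky ReLU, 1.0 for ELU) are common conventions assumed here, not values from the patent, and `activate` is a hypothetical helper that applies the configured function to the pooled values.

```python
import math
from typing import Callable, Dict, List


def relu(x: float) -> float:
    return max(0.0, x)


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    return x if x > 0 else alpha * x


def elu(x: float, alpha: float = 1.0) -> float:
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def sigmoid(x: float) -> float:
    # Split by sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "relu": relu,
    "leaky_relu": leaky_relu,
    "elu": elu,
    "sigmoid": sigmoid,
}


def activate(values: List[float], name: str = "relu") -> List[float]:
    """Apply the configured activation function to each pooled value."""
    fn = ACTIVATIONS[name]
    return [fn(v) for v in values]


print(activate([-2.0, 3.0]))  # [0.0, 3.0]
```

Keeping the functions in one table means swapping ReLU for another function is a configuration change rather than a change to the pipeline.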

In an embodiment applicable to the present invention, in step S320 the FPGA design system may connect input data and weight data in parallel to one or more module architectures and set a data path according to the operations within those module architectures. More specifically, in step S320 the FPGA design system may bind the one or more module architectures designed as described above into one unit and set an appropriate data flow for them.

More specifically, the FPGA design system may connect input data and weight data for one module architecture and set a data path according to the operations within that module architecture. Then, when one or more module architectures are added, the FPGA design system may connect the input data and weight data held in the existing storage in parallel to the added module architectures, and detect any module architecture among them that is not currently performing an operation (e.g., a convolution, addition, pooling, or activation operation). The FPGA design system may then set the control architecture so that an intermediate sequence of operations is provided to the idle module architecture and the result is merged after the operation is performed. In other words, in step S320 the FPGA design system detects, among the one or more module architectures, a module architecture that is not performing an operation, and sets the data path so that an intermediate sequence of operations is provided to the detected module architecture, the corresponding operation is performed, and the result is merged.
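A cycle-level sketch of the idle-module detection and merge described above, under simplifying assumptions: the intermediate sequence is pre-split into chunks, every chunk costs a fixed number of cycles, and each module's `busy_until` field stands in for its current workload. `ModuleState` and `schedule_intermediate_sequences` are hypothetical names; the patent does not describe a specific scheduling policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class ModuleState:
    module_id: int
    busy_until: int = 0  # cycle at which this module becomes idle

    def is_idle(self, cycle: int) -> bool:
        return self.busy_until <= cycle


def schedule_intermediate_sequences(
    chunks: Sequence[List[float]],
    modules: List[ModuleState],
    op: Callable[[List[float]], List[float]],
    cycles_per_chunk: int = 1,
) -> Tuple[List[float], Dict[int, int]]:
    """Give each chunk to the first idle module, then merge results in chunk order."""
    if not modules or cycles_per_chunk < 1:
        raise ValueError("need at least one module and a positive chunk cost")
    results: List[List[float]] = [[] for _ in chunks]
    assignment: Dict[int, int] = {}  # chunk index -> module id
    next_chunk, cycle = 0, 0
    while next_chunk < len(chunks):
        for m in modules:
            if next_chunk >= len(chunks):
                break
            if m.is_idle(cycle):  # detect a module that is not performing an operation
                # The result is computed when issued; busy_until only models occupancy.
                results[next_chunk] = op(list(chunks[next_chunk]))
                assignment[next_chunk] = m.module_id
                m.busy_until = cycle + cycles_per_chunk
                next_chunk += 1
        cycle += 1
    merged = [v for part in results for v in part]  # merge step
    return merged, assignment


modules = [ModuleState(0, busy_until=3), ModuleState(1), ModuleState(2)]
merged, assignment = schedule_intermediate_sequences(
    [[1.0], [2.0], [3.0], [4.0]], modules, lambda c: [2.0 * v for v in c])
print(merged)      # [2.0, 4.0, 6.0, 8.0]
print(assignment)  # {0: 1, 1: 2, 2: 1, 3: 2}
```

Because module 0 is occupied until cycle 3, every chunk goes to modules 1 and 2, and the merged output still follows the original chunk order.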

FIG. 6 is a diagram schematically illustrating an architecture structure according to an example of the present invention.

Through the module architecture design step and the control architecture design step described above, the overall architecture, including one or more module architectures and the control architecture for them, can be designed as shown in FIG. 6.

For example, as shown in FIG. 6, the module architecture may include one or more computational architectures and a data merging architecture that merges the output values of the computational architectures. The control architecture may be set to provide the incoming input data and weight data to each computational architecture and to the data merging architecture so as to implement a specific deep learning algorithm (e.g., the ResNet algorithm). The data merging architecture may then output result data according to that deep learning algorithm.
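One plausible way to realize this dataflow in software, assuming the work is split across computational architectures by output filter: the control architecture broadcasts the same input to every unit and gives each a contiguous slice of the weights, and the merging architecture concatenates the partial outputs. The partitioning scheme and all function names are assumptions for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def compute_architecture(inputs: Vector, filters: Sequence[Vector]) -> Vector:
    """One computational architecture: a dot product per assigned filter."""
    return [sum(x * w for x, w in zip(inputs, f)) for f in filters]


def split_weights(filters: Sequence[Vector], n_units: int) -> List[List[Vector]]:
    """Control architecture: give each unit a contiguous slice of the filters."""
    if n_units < 1:
        raise ValueError("need at least one computational architecture")
    if not filters:
        return []
    size = math.ceil(len(filters) / n_units)
    return [list(filters[i:i + size]) for i in range(0, len(filters), size)]


def merge_outputs(partials: Sequence[Vector]) -> Vector:
    """Data merging architecture: concatenate partial outputs in unit order."""
    return [v for part in partials for v in part]


def run_overall(inputs: Vector, filters: Sequence[Vector], n_units: int) -> Vector:
    slices = split_weights(filters, n_units)
    # The same input data is broadcast to every computational architecture.
    partials = [compute_architecture(inputs, s) for s in slices]
    return merge_outputs(partials)


print(run_overall([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_units=2))  # [1.0, 2.0, 3.0]
```

Splitting by filter keeps the merge a simple concatenation, so the result is independent of how many units are used.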

In an embodiment applicable to the present invention, in step S330 the FPGA design system may determine, based on the resource size of the FPGA, where in the FPGA the overall architecture, including the one or more module architectures and their control architecture, is placed.

More specifically, in step S330 the FPGA design system may set up the module architecture for the deep learning algorithm (e.g., the ResNet algorithm) designed by the method described above, together with the control architecture that controls it, so that they can be inserted into and implemented on an actual FPGA.

In general, to insert a specific architecture into an FPGA, a computer system provides automatic placement and routing of the operators. However, this approach takes a long time and is inefficient, because failures are detected only very late in the process. Moreover, when the FPGA size changes, the placement and routing must be performed again.

In contrast, when a modular architecture (e.g., the module architecture) is used as in the present invention, the FPGA area (e.g., block area) required by the modular architecture may be set as a P-block. Here, a P-block may refer to a floorplan region or basic unit on the FPGA. By checking whether the P-block can be placed at a specific location in the FPGA, the probability of failure or error for the same P-block is reduced, and placement and routing in the FPGA can consequently be completed successfully. When the verified module architecture is then placed across the entire area of the FPGA, the time needed to insert a ResNet operation architecture optimized for an FPGA of each size can be greatly reduced, enabling a practical, optimized design.
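A minimal sketch of the P-block feasibility check, assuming the device is modeled as a grid of tiles that each carry one resource type (e.g., "LUT" or "DSP"). This is a simplification; the grid model and the function names `pblock_tiles` and `pblock_fits` are hypothetical and do not correspond to any vendor tool API.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Tile = Tuple[int, int]  # (column, row)


def pblock_tiles(x: int, y: int, w: int, h: int) -> List[Tile]:
    """All tiles covered by a w-by-h P-block whose corner is at column x, row y."""
    return [(x + i, y + j) for j in range(h) for i in range(w)]


def pblock_fits(
    grid: List[List[str]],
    occupied: Set[Tile],
    x: int, y: int, w: int, h: int,
    required: Dict[str, int],
) -> bool:
    """True if the P-block stays on the device, avoids occupied tiles,
    and contains at least the required count of each resource type."""
    rows, cols = len(grid), len(grid[0])
    if w <= 0 or h <= 0 or x < 0 or y < 0 or x + w > cols or y + h > rows:
        return False
    tiles = pblock_tiles(x, y, w, h)
    if any(t in occupied for t in tiles):
        return False
    available = Counter(grid[ty][tx] for tx, ty in tiles)
    return all(available[res] >= n for res, n in required.items())


# Three rows of a device whose third column is DSP.
grid = [["LUT", "LUT", "DSP", "LUT"] for _ in range(3)]
print(pblock_fits(grid, set(), 1, 0, 2, 2, {"DSP": 2, "LUT": 2}))  # True
print(pblock_fits(grid, set(), 0, 0, 2, 2, {"DSP": 2, "LUT": 2}))  # False (no DSP column)
```

Once a P-block shape passes this check, the same check can be repeated at other positions without re-running placement and routing for the whole design.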

FIGS. 7A to 7C are diagrams briefly illustrating FPGA design steps applicable to the present invention.

For the above operation, step S330 (the FPGA design step) may include: a resource size calculation step of calculating the total resource size of the FPGA (e.g., DSP and LUT resources) and the resource size of each block (FIG. 7A); a module block determination step of determining, based on the calculated resource size, the block shape and number of blocks for one module architecture (FIG. 7B); and a block placement step of determining, based on the determined block shape and number of blocks for one module architecture, where in the overall structure of the FPGA the overall architecture, including the one or more module architectures and their control architecture, is placed (FIG. 7C).
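The three steps of FIGS. 7A to 7C can be sketched as follows, assuming the device is divided into identical blocks in a single row and that the control architecture occupies a fixed number of leading blocks. The resource figures in the usage lines are made up for illustration, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Resources = Dict[str, int]


def total_resources(blocks: Sequence[Resources]) -> Resources:
    """Step 1 (FIG. 7A): sum resources over all blocks of the device."""
    total: Resources = {}
    for block in blocks:
        for res, n in block.items():
            total[res] = total.get(res, 0) + n
    return total


def blocks_per_module(block: Resources, module_need: Resources) -> int:
    """Step 2 (FIG. 7B): how many identical blocks one module architecture needs."""
    counts = []
    for res, need in module_need.items():
        have = block.get(res, 0)
        if have == 0:
            raise ValueError(f"block provides no {res}")
        counts.append(math.ceil(need / have))
    if not counts:
        raise ValueError("module_need must list at least one resource")
    return max(counts)  # the scarcest resource decides the block count


def place_modules(n_blocks: int, k: int, control_blocks: int) -> List[range]:
    """Step 3 (FIG. 7C): reserve leading blocks for the control architecture,
    then give each module k consecutive blocks."""
    if k < 1:
        raise ValueError("k must be at least 1")
    start, placements = control_blocks, []
    while start + k <= n_blocks:
        placements.append(range(start, start + k))
        start += k
    return placements


blocks = [{"DSP": 40, "LUT": 8000} for _ in range(10)]
k = blocks_per_module(blocks[0], {"DSP": 90, "LUT": 12000})
print(total_resources(blocks))                               # {'DSP': 400, 'LUT': 80000}
print(k)                                                     # 3
print(len(place_modules(len(blocks), k, control_blocks=1)))  # 3
```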

Here, the block placement step may include determining, based on the determined block shape and number of blocks for one module architecture, where the overall architecture is placed so that the maximum number of module architectures fit in the FPGA. In other words, the FPGA design system may provide the user with a placement result, within the FPGA, of the overall architecture that includes the maximum number of module architectures that can be placed in that FPGA and the control architecture for them.
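To illustrate why the block shape matters for the maximum module count, the sketch below tiles a homogeneous grid greedily and tries the module block in both orientations. Treating the grid as homogeneous (so a block can be rotated freely) is a strong simplifying assumption, since real FPGA columns differ by resource type; the function names are hypothetical.

```python
from typing import List, Tuple

Placement = Tuple[int, int, int, int]  # (x, y, w, h)


def tile_grid(cols: int, rows: int, w: int, h: int, reserved_cols: int) -> List[Placement]:
    """Greedy row-major tiling of w-by-h module blocks, skipping the first
    `reserved_cols` columns, which are kept for the control architecture."""
    placements = []
    for y in range(0, rows - h + 1, h):
        for x in range(reserved_cols, cols - w + 1, w):
            placements.append((x, y, w, h))
    return placements


def max_module_placement(cols: int, rows: int, w: int, h: int,
                         reserved_cols: int = 0) -> List[Placement]:
    """Try the module block in both orientations and keep the tiling with more modules."""
    candidates = [tile_grid(cols, rows, w, h, reserved_cols)]
    if w != h:
        candidates.append(tile_grid(cols, rows, h, w, reserved_cols))
    return max(candidates, key=len)


best = max_module_placement(cols=6, rows=10, w=4, h=2)
print(len(best), best[0])  # 6 (0, 0, 2, 4)
```

On this 6-by-10 grid the 4x2 orientation fits only five blocks, while the rotated 2x4 orientation fits six.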

As another example, in the block placement step, the FPGA design system may determine, based on the determined block shape and number of blocks for one module architecture, where the overall architecture is placed so as to minimize the probability of failure and/or error in the FPGA. In other words, the FPGA design system may provide the user with a placement result, within the FPGA, of the overall architecture (the one or more module architectures and their control architecture) that minimizes the probability of implementation failure or error of the module architectures in that FPGA.
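A sketch of selecting among candidate placements by estimated risk. The patent does not say how failure probability is estimated, so the utilization-based `estimated_risk` below is purely a hypothetical proxy (higher utilization is assumed to mean higher routing risk); a real flow would need a calibrated estimate. `Candidate` and `lowest_risk` are likewise illustrative names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    used: int      # resources the placed modules consume in the region
    capacity: int  # resources available in the region


def estimated_risk(c: Candidate) -> float:
    """Hypothetical proxy: risk grows with utilization; over capacity is infeasible."""
    if c.capacity <= 0 or c.used > c.capacity:
        return float("inf")
    return c.used / c.capacity


def lowest_risk(candidates: Sequence[Candidate]) -> Candidate:
    """Return the candidate placement with the smallest estimated risk."""
    return min(candidates, key=estimated_risk)


candidates = [Candidate("A", 90, 100), Candidate("B", 60, 100), Candidate("C", 120, 100)]
print(lowest_risk(candidates).name)  # B
```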

In the present invention, the above process may be carried out by a script that runs a series of algorithms, so that the same algorithm can be applied to FPGAs of various sizes. In this way, the FPGA design system according to the present invention can search for and place blocks of an optimized shape for the module architecture.
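The point of the script is that one routine runs unchanged on different devices. The sketch below shows this for a resource-bound module count; the device names and resource figures in `DEVICES` are invented for illustration and do not describe real parts, and the function names are hypothetical.

```python
from typing import Dict

Resources = Dict[str, int]

# Hypothetical device descriptions; real resource counts come from the vendor datasheet.
DEVICES: Dict[str, Resources] = {
    "small": {"DSP": 240, "LUT": 50_000},
    "medium": {"DSP": 900, "LUT": 200_000},
    "large": {"DSP": 2_500, "LUT": 500_000},
}


def modules_that_fit(device: Resources, module_need: Resources, control_need: Resources) -> int:
    """Resource-bound module count after reserving resources for the control architecture."""
    counts = []
    for res, need in module_need.items():
        if need <= 0:
            raise ValueError(f"module requirement for {res} must be positive")
        free = device.get(res, 0) - control_need.get(res, 0)
        counts.append(max(0, free) // need)
    return min(counts)  # the scarcest resource limits the count


def run_for_all_devices(module_need: Resources, control_need: Resources) -> Dict[str, int]:
    """Apply the same routine to every device size."""
    return {name: modules_that_fit(dev, module_need, control_need) for name, dev in DEVICES.items()}


print(run_for_all_devices({"DSP": 90, "LUT": 12_000}, {"LUT": 4_000}))
# {'small': 2, 'medium': 10, 'large': 27}
```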

FIG. 8 is a diagram showing the configuration of an FPGA design system for a deep learning algorithm according to an example of the present invention.

As shown in FIG. 8, the FPGA design system may include a module architecture design unit 810, a control architecture design unit 820, an FPGA design unit 830, and an input unit 840.

In the present invention, the module architecture design unit 810 may perform the operations related to module architecture design described above. The control architecture design unit 820 may perform the operations related to control architecture design described above, and the FPGA design unit 830 may perform the operations related to FPGA design described above. In particular, for the FPGA design step described above, the FPGA design unit 830 may include: a resource size calculation unit that calculates the total resource size of the FPGA and the resource size of each block; a module block determination unit that determines, based on the calculated resource size, the block shape and number of blocks for one module architecture; and a block placement unit that determines, based on the determined block shape and number of blocks, where in the overall structure of the FPGA the overall architecture, including the one or more module architectures and their control architecture, is placed.

Additionally, the input unit 840 may obtain user inputs and settings for the FPGA design system, convert them into appropriate data (e.g., the layer structure of the deep learning algorithm, input data, and weight data), and provide that data to the FPGA design system. Through this, the FPGA design system can provide the user with an optimal FPGA design for the deep learning algorithm the user has specified.
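As an illustration of the conversion performed by the input unit, the sketch below turns a hypothetical JSON description of ResNet stages into a per-layer structure in which each basic block has two 3*3 convolution layers and the second carries the residual addition. The JSON keys (`stages`, `stem_pool`) and the function name are assumptions; handling of weight and input data is omitted.

```python
import json
from typing import Dict, List


def parse_user_config(text: str) -> List[Dict[str, object]]:
    """Convert a hypothetical JSON description into a per-layer structure.
    Each basic block contributes two conv layers; the second carries the residual add."""
    cfg = json.loads(text)
    # Stem convolution, optionally followed by pooling.
    layers: List[Dict[str, object]] = [
        {"index": 0, "op": "conv", "pool": bool(cfg.get("stem_pool", True)), "add": False}
    ]
    for blocks in cfg["stages"]:
        for _ in range(int(blocks)):
            for position in (0, 1):
                layers.append({
                    "index": len(layers),
                    "op": "conv",
                    "pool": False,
                    "add": position == 1,  # sum with the input from two layers earlier
                })
    return layers


layers = parse_user_config('{"stages": [2, 2]}')
print(len(layers))  # 9
```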

The steps of a method or algorithm described in connection with an embodiment of the present invention may be implemented directly in hardware, as a software module executed by hardware, or as a combination of both. A software module may reside in random access memory (RAM), read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those skilled in the art will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

800: FPGA design system
810: module architecture design unit
820: control architecture design unit
830: FPGA design unit
840: input unit

Claims

A method performed by a computer, comprising:
designing one or more module architectures including operators that perform a convolution operation, an addition operation, a pooling operation, and an activation operation for each layer according to the layer structure of a ResNet neural network;
designing a control architecture by setting a path for the flow of operation data to be processed by the operators in the designed one or more module architectures; and
designing an overall architecture by determining, based on the resource size of a field programmable gate array (FPGA), the location in the FPGA at which the overall architecture, including the designed one or more module architectures and the designed control architecture, is to be placed;
wherein the module architecture design step comprises,
when a module architecture identical to the one or more module architectures is added, connecting the added module architecture in parallel to a loop in which the one or more module architectures operate, and
the control architecture design step comprises,
detecting, among the designed one or more module architectures, a module architecture in which no operation is being performed, and
providing an intermediate sequence of operations to the detected module architecture to set the path so that, after the corresponding operation is performed, the result is merged into the designed one or more module architectures,
FPGA design method for a deep learning algorithm.

The method of claim 1,
wherein the module architecture design step comprises:
setting the convolution operation to be performed in one cycle; and
setting the addition operation so that, each time input data input to the one or more module architectures passes through two layers, the input data is summed with the input data from before the two layers,
FPGA design method for a deep learning algorithm.

The method of claim 1,
wherein the convolution operation is a 3*3 multiplication operation,
the addition operation is a summation of the result values of the 3*3 multiplication operation,
the pooling operation is an operation that extracts the largest value among the summed result values, and
the activation operation is an operation that adds a non-linear characteristic to the extracted value,
FPGA design method for a deep learning algorithm.

delete

The method of claim 1,
wherein the overall architecture design step comprises:
calculating the total resource size of the FPGA and the resource size of each block;
determining, based on the calculated resource size, a block shape and a number of blocks for one designed module architecture; and
designing the overall architecture by determining, based on the determined block shape and number of blocks, the location in the overall structure of the FPGA at which the overall architecture, including the designed one or more module architectures and the designed control architecture, is placed,
FPGA design method for a deep learning algorithm.

The method of claim 5,
wherein the overall architecture design step comprises,
determining, based on the determined block shape and number of blocks, the location at which the overall architecture is placed in the FPGA so that the maximum number of module architectures is placed in the FPGA,
FPGA design method for a deep learning algorithm.

a module architecture design unit that designs one or more module architectures including operators that perform a convolution operation, an addition operation, a pooling operation, and an activation operation for each layer according to the layer structure of a ResNet neural network;
a control architecture design unit that designs a control architecture by setting a path for the flow of operation data to be processed by the operators in the designed one or more module architectures; and
an FPGA design unit that designs an overall architecture by determining, based on the resource size of a field programmable gate array (FPGA), the location in the FPGA at which the overall architecture, including the designed one or more module architectures and the designed control architecture, is placed;
wherein the module architecture design unit, when a module architecture identical to the one or more module architectures is added, connects the added module architecture in parallel to a loop in which the one or more module architectures operate, and
the control architecture design unit detects, among the designed one or more module architectures, a module architecture in which no operation is being performed, provides an intermediate sequence of operations to the detected module architecture, and sets the path so that, after the corresponding operation is performed, the result is merged into the designed one or more module architectures,
FPGA design system for a deep learning algorithm.

The system of claim 7,
wherein the convolution operation is a 3*3 multiplication operation,
the addition operation is a summation of the result values of the 3*3 multiplication operation,
the pooling operation is an operation that extracts the largest value among the summed result values, and
the activation operation is an operation that adds a non-linear characteristic to the extracted value,
FPGA design system for a deep learning algorithm.

The system of claim 7,
wherein the FPGA design unit comprises:
a resource size calculation unit that calculates the total resource size of the FPGA and the resource size of each block;
a module block determination unit that determines, based on the calculated resource size, a block shape and a number of blocks for one designed module architecture; and
a block placement unit that determines, based on the determined block shape and number of blocks, the location in the overall structure of the FPGA at which the overall architecture, including the designed one or more module architectures and the designed control architecture, is placed,
FPGA design system for a deep learning algorithm.

A recording medium storing a program that, in combination with a computer as hardware, executes the method of any one of claims 1 to 6.