KR20190117103A

KR20190117103A - Method and Apparatus for High-Speed Low-Power Processing in Large-Scale Deep Neural Network

Info

Publication number: KR20190117103A
Application number: KR1020180040202A
Authority: KR
Inventors: 장성준; 이상설; 최병호
Original assignee: 전자부품연구원
Priority date: 2018-04-06
Filing date: 2018-04-06
Publication date: 2019-10-16
Also published as: KR102453370B1

Abstract

Provided are a method and an apparatus for high-speed, low-power processing in a large-scale deep neural network. According to an embodiment of the present invention, the method divides a feature map, re-divides the divided feature map, inputs the re-divided feature map into a neural network, performs an operation using the feature map, and outputs a result of the operation. As a result, hardware for pipeline processing of a deep convolutional neural network having a large number of channels can be implemented on an FPGA, an ASIC, or the like, and high-speed, low-power processing also becomes possible.

Description

Method and Apparatus for High-Speed Low-Power Processing in Large-Scale Deep Neural Network

The present invention relates to artificial intelligence technology and, more particularly, to a method for accelerating a deep neural network.

FIG. 1 is a conceptual diagram of a deep convolutional neural network. As shown in FIG. 1, a deep convolutional neural network generates an output feature map by performing a convolution (multiply-and-accumulate) operation on an input feature map using a convolution kernel.
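
The multiply-and-accumulate operation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a single channel, stride 1, no padding, and no kernel flip (as is usual in convolutional neural networks), and the function name `conv2d_valid` and the sample values are hypothetical.

```python
def conv2d_valid(feature_map, kernel):
    """Single-channel 2D convolution (stride 1, no padding) as explicit MACs."""
    h, w = len(feature_map), len(feature_map[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(h - kh + 1):
        row = []
        for c in range(w - kw + 1):
            acc = 0  # accumulator for one output value
            for i in range(kh):
                for j in range(kw):
                    # multiply one input value by one kernel weight, then accumulate
                    acc += feature_map[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 3x4 input feature map and 2x2 kernel.
fmap = [[1, 2, 3, 0],
        [4, 5, 6, 1],
        [7, 8, 9, 2]]
kernel = [[1, 0],
          [0, 1]]
result = conv2d_valid(fmap, kernel)
print(result)  # [[6, 8, 4], [12, 14, 8]]
```

Each output value needs a full kernel-sized window of input values, which is why a hardware implementation has to buffer several input lines at once.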

FIGS. 2 to 4 show methods of processing a deep convolutional neural network.

Specifically, FIG. 2 shows a sequential processing method for a deep convolutional neural network, which can be implemented on a CPU, and FIG. 3 shows a parallel processing method, which can be implemented on a GPU, an FPGA, an ASIC, or the like.

FIG. 4 shows a pipeline processing method for a deep convolutional neural network, which can be implemented on an FPGA, an ASIC, or the like. Independent convolution calculators are configured for each layer, and pipelining is applied between layers. The operation of the next layer requires the output values of all channels of the previous layer.

Of the processing methods shown in FIGS. 2 to 4, the most effective is the pipeline processing method. FIG. 5 shows the configuration of a hardware device for pipeline processing of a deep convolutional neural network.

As can be inferred from the figure, the pipeline processing method has the problem that the buffer memory grows as the number of channels increases.

Recent neural networks require up to 512 channels, and the buffer size needed for this cannot be implemented on an FPGA, an ASIC, or the like, where memory resources are limited.
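
A rough sense of how the buffer scales can be had from the following sketch. The sizing model is an assumption and is not taken from the patent: it supposes a line buffer that holds `kernel_height - 1` full rows for every channel, and the width (224), kernel height (3), and 2 bytes per value are hypothetical figures chosen only for illustration.

```python
def line_buffer_bytes(width, channels, kernel_height, bytes_per_value):
    """Bytes needed to hold (kernel_height - 1) full rows of every channel.

    Assumed model: a classic line buffer between two pipelined layers.
    """
    rows_held = kernel_height - 1
    return rows_held * width * channels * bytes_per_value


# Hypothetical layer: 224 columns, 3x3 kernel, 16-bit values.
few_channels = line_buffer_bytes(224, 64, 3, 2)
many_channels = line_buffer_bytes(224, 512, 3, 2)
# Same layer with the processed width cut to a quarter (56 columns).
narrow = line_buffer_bytes(56, 512, 3, 2)

print(few_channels, many_channels, narrow)
```

Under this model the buffer grows linearly with both the channel count and the buffered width, so reducing the width that is processed at one time is one way to offset a large channel count.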

The present invention has been devised to solve the above problem, and an object of the present invention is to provide a method that allows hardware for pipeline processing of a deep convolutional neural network having a large number of channels to be implemented on an FPGA, an ASIC, or the like.

To achieve the above object, a neural network processing method according to an embodiment of the present invention includes: dividing a feature map; re-dividing the divided feature map; inputting the re-divided feature map into a neural network; performing, in the neural network, an operation using the input feature map; and outputting a result of the operation.

In the dividing step, the feature map may be divided horizontally.

In the re-dividing step, the divided feature map may be divided vertically.

In the re-dividing step, the divided feature map may be re-divided such that portions on either side of each boundary line overlap.

The buffer in which the re-divided feature map is stored and the buffer in which the overlapping region around the boundary line is stored may be separate.

The inputting step, the operation step, and the outputting step may be processed in a pipeline.

The neural network may be a deep convolutional neural network.

Meanwhile, a neural network processing apparatus according to another embodiment of the present invention includes: a buffer that divides a feature map and re-divides the divided feature map; and a neural network that receives the re-divided feature map from the buffer, performs an operation, and outputs a result.

As described above, according to embodiments of the present invention, hardware for pipeline processing of a deep convolutional neural network having a large number of channels can be implemented on an FPGA, an ASIC, or the like, and high-speed, low-power processing also becomes possible.

FIG. 1 is a conceptual diagram of a deep convolutional neural network;
FIG. 2 shows a sequential processing method for a deep convolutional neural network;
FIG. 3 shows a parallel processing method for a deep convolutional neural network;
FIG. 4 shows a pipeline processing method for a deep convolutional neural network;
FIG. 5 shows a hardware device for pipeline processing of a deep convolutional neural network;
FIG. 6 is a conceptual diagram of a deep convolutional neural network processing method according to an embodiment of the present invention;
FIG. 7 shows the concept of overlap processing of a feature map; and
FIG. 8 shows a hardware configuration for deep convolutional neural network processing according to another embodiment of the present invention.

Hereinafter, the present invention is described in more detail with reference to the drawings.

FIG. 6 is a conceptual diagram of a deep convolutional neural network processing method according to an embodiment of the present invention. The method according to this embodiment is a pipeline processing method for a deep convolutional neural network having a large number of channels.

In the deep convolutional neural network processing method according to the embodiment of the present invention, as shown in FIG. 6, the input feature map 100 is not only divided horizontally but also re-divided vertically.

The purpose is to reduce the width of the pipeline buffer through horizontal and vertical division of the input feature map 100.

Specifically, during pipeline processing the two-dimensional feature map is buffered line by line, and, to reduce the size of the pipeline buffer, the feature map is divided vertically and the resulting parts are processed sequentially.

Because the width of the feature map being processed is reduced, the width of the pipeline buffer that stores it is reduced as well.

As a result, the feature map 100 is partitioned into a plurality of blocks 110, and the feature map 100 is therefore input to the deep convolutional neural network 200 block by block rather than line by line.
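
The two-stage division into blocks might look like the sketch below. It assumes that "horizontal division" cuts the map into bands of rows and "vertical re-division" cuts each band into strips of columns; the function name `split_into_blocks`, its parameters, and the 4x6 sample map are hypothetical, and overlap between neighbouring blocks is deliberately left out here.

```python
def split_into_blocks(feature_map, band_height, strip_width):
    """Divide a 2D feature map into bands of rows, then each band into column strips."""
    blocks = []
    for top in range(0, len(feature_map), band_height):
        band = feature_map[top:top + band_height]      # horizontal division
        for left in range(0, len(band[0]), strip_width):
            # vertical re-division of the band into one block
            blocks.append([row[left:left + strip_width] for row in band])
    return blocks


# Hypothetical 4x6 feature map whose values encode their position.
fmap = [[r * 6 + c for c in range(6)] for r in range(4)]
blocks = split_into_blocks(fmap, band_height=2, strip_width=3)
for block in blocks:
    print(block)
```

With these sizes the map yields four 2x3 blocks, and a buffer that holds one block only needs to be 3 values wide instead of 6.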

The deep convolutional neural network 200 performs a convolution operation using the feature map and outputs the operation result 120.

Meanwhile, when the horizontally divided feature map 100 is divided vertically, portions on either side of each boundary line are made to overlap. Specifically, as shown in FIG. 7, the feature map 100 is divided with an overlap of appropriate width, determined from the width of the convolution window.

This is to eliminate the errors that arise at the boundary lines when the feature map 100 is divided and processed, and thereby to minimize performance degradation. Accordingly, in addition to the pipeline buffer, a separate buffer is needed to store the feature map values at the previous boundary line.
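
The effect of the overlap can be checked with a small sketch. It assumes stride 1, no padding, and an overlap of `kernel_width - 1` columns, which is one natural reading of "based on the width of the convolution window" but is not stated in the patent; the names `conv2d_valid` and `conv2d_by_strips` and the sample data are hypothetical.

```python
def conv2d_valid(fm, kernel):
    """Single-channel 2D convolution, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(fm[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(fm[0]) - kw + 1)]
            for r in range(len(fm) - kh + 1)]


def conv2d_by_strips(fm, kernel, strip_out_width):
    """Convolve column strips that overlap by (kernel width - 1) input columns."""
    kw = len(kernel[0])
    out_w = len(fm[0]) - kw + 1
    out = [[] for _ in range(len(fm) - len(kernel) + 1)]
    for start in range(0, out_w, strip_out_width):
        stop = min(start + strip_out_width, out_w)
        # Input columns start .. stop + kw - 2; the last kw - 1 of them
        # are read again by the next strip (the overlap region).
        strip = [row[start:stop + kw - 1] for row in fm]
        for r, part_row in enumerate(conv2d_valid(strip, kernel)):
            out[r].extend(part_row)
    return out


# Hypothetical 4x7 feature map and 3x3 kernel.
fmap = [[(3 * r + 5 * c) % 7 for c in range(7)] for r in range(4)]
kernel = [[1, 0, -1],
          [2, 0, -2],
          [1, 0, -1]]
tiled = conv2d_by_strips(fmap, kernel, strip_out_width=2)
full = conv2d_valid(fmap, kernel)
print(tiled == full)  # True
```

Under these assumptions the strip-wise result matches the undivided convolution exactly; without the overlapping columns, the output values whose windows straddle a boundary could not be computed.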

FIG. 8 shows a hardware configuration for deep convolutional neural network processing according to another embodiment of the present invention.

The hardware for deep convolutional neural network processing according to the embodiment of the present invention is hardware for pipeline processing of a deep convolutional neural network having a large number of channels, and can be implemented on an FPGA, an ASIC, or the like.

As shown, the hardware for deep convolutional neural network processing according to the embodiment of the present invention includes an input feature map buffer 210, a boundary value buffer 215, deep convolutional neural network layer-1 220, a feature map buffer 230, and a boundary value buffer 235.

The input feature map buffer 210 temporarily stores the input feature map, divides the stored feature map horizontally, and re-divides it vertically. The boundary value buffer 215 stores the feature map values around each dividing boundary line, because they are needed for the operation on the next block.
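
One way to picture the interplay of the two buffers is the software model below. It is an interpretation rather than the patent's circuit: it assumes the feature map buffer receives only the new columns of each strip, while a separate boundary buffer keeps the last `kernel_width - 1` columns of the previous strip. The variable names `feature_buffer` and `boundary_buffer` are hypothetical stand-ins for buffers 210 and 215.

```python
def conv2d_valid(fm, kernel):
    """Single-channel 2D convolution, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(fm[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(fm[0]) - kw + 1)]
            for r in range(len(fm) - kh + 1)]


def process_with_boundary_buffer(fm, kernel, strip_width):
    """Process column strips in order, carrying boundary columns between strips."""
    kw = len(kernel[0])
    boundary_buffer = [[] for _ in fm]  # empty before the first strip
    out = [[] for _ in range(len(fm) - len(kernel) + 1)]
    for left in range(0, len(fm[0]), strip_width):
        # Only the new columns of this strip are loaded.
        feature_buffer = [row[left:left + strip_width] for row in fm]
        # The calculator reads the boundary columns followed by the new columns.
        joined = [b + f for b, f in zip(boundary_buffer, feature_buffer)]
        for r, part_row in enumerate(conv2d_valid(joined, kernel)):
            out[r].extend(part_row)
        # Keep the last kw - 1 columns for the next strip.
        boundary_buffer = [row[-(kw - 1):] if kw > 1 else [] for row in joined]
    return out


# Hypothetical 4x7 feature map and 3x3 kernel.
fmap = [[(2 * r + 3 * c) % 5 for c in range(7)] for r in range(4)]
kernel = [[0, 1, 0],
          [1, -4, 1],
          [0, 1, 0]]
streamed = process_with_boundary_buffer(fmap, kernel, strip_width=3)
print(streamed == conv2d_valid(fmap, kernel))  # True
```

In this model the feature buffer never has to be wider than one strip, and the boundary buffer stays at `kernel_width - 1` columns regardless of the feature map width.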

The convolution calculators constituting deep convolutional neural network layer-1 220 perform convolution operations using the feature map data stored in the input feature map buffer 210 and the boundary value buffer 215, and store the operation result 120 in the feature map buffer 230.

The boundary value buffer 235 stores the feature map values around the boundary lines of the feature map stored in the feature map buffer 230. The feature map data stored in the feature map buffer 230 and the boundary value buffer 235 are input to deep convolutional neural network layer-2 (not shown) and processed.

This process is performed for all layers constituting the deep convolutional neural network 200, and the operation result of the last layer becomes the output of the deep convolutional neural network 200.

So far, a method and an apparatus for high-speed, low-power processing of a large-scale deep neural network have been described in detail with reference to preferred embodiments.

The above embodiments presented a pipeline processing method for a deep convolutional neural network based on feature map division and a hardware device for it, together with an overlap processing method for eliminating boundary errors and a buffer structure to support it.

The deep convolutional neural network mentioned in the above embodiments is cited as one kind of neural network. The technical idea of the present invention can of course also be applied when it is replaced by another kind of neural network.

The method and apparatus for processing a large-scale deep neural network according to embodiments of the present invention are an acceleration technology for deep artificial neural networks and can be applied to vehicle intelligence technologies such as ADAS (Advanced Driver Assistance Systems) and autonomous driving.

Furthermore, the method and apparatus for processing a large-scale deep neural network according to embodiments of the present invention can also be applied to intelligent image processing for security and surveillance cameras, and of course to other technical fields as well.

Meanwhile, the technical idea of the present invention can of course also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to the present embodiments. The technical ideas according to various embodiments of the present invention may also be implemented as computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data. For example, it may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical disk, a hard disk drive, or the like. In addition, computer-readable code or a program stored on a computer-readable recording medium may be transmitted over a network connecting computers.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications can of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

100: input feature map
200: deep convolutional neural network
210, 230: feature map buffers
215, 235: boundary value buffers
220: deep convolutional neural network layer

Claims

1. A neural network processing method comprising:
dividing a feature map;
re-dividing the divided feature map;
inputting the re-divided feature map into a neural network;
performing, in the neural network, an operation using the input feature map; and
outputting a result of the operation.

2. The method according to claim 1,
wherein the dividing step
divides the feature map horizontally.

3. The method according to claim 2,
wherein the re-dividing step
divides the divided feature map vertically.

4. The method according to claim 3,
wherein the re-dividing step
re-divides the divided feature map such that portions on either side of a boundary line overlap.

5. The method according to claim 4,
wherein a buffer in which the re-divided feature map is stored and a buffer in which the overlapping region around the boundary line is stored are separate.

6. The method according to claim 1,
wherein the inputting step, the operation step, and the outputting step
are processed in a pipeline.

7. The method according to claim 1,
wherein the neural network
is a deep convolutional neural network.

8. A neural network processing apparatus comprising:
a buffer that divides a feature map and re-divides the divided feature map; and
a neural network that receives the re-divided feature map from the buffer, performs an operation, and outputs a result.