KR20180101978A

KR20180101978A - Neural network device, Neural network processor and method of operating neural network processor

Info

Publication number: KR20180101978A
Application number: KR1020170041160A
Authority: KR
Inventors: 이세환; 김동영; 유승주
Original assignee: 삼성전자주식회사; 서울대학교산학협력단
Priority date: 2017-03-06
Filing date: 2017-03-30
Publication date: 2018-09-14
Also published as: KR102390379B1

Abstract

Provided are a method for selectively performing a convolution operation on an input feature map and a weight map based on input feature information indicating whether each of a plurality of input features of the input feature map has a non-zero value and weight information indicating whether each of a plurality of weights of the weight map has a non-zero value, and a processor therefor. Accordingly, the present invention can reduce an operation amount and an operation time in the convolution operation.

Description

The present disclosure relates to a neural network processor, a method of operating a neural network processor, and a neural network device.

A neural network refers to a computational architecture that models a biological brain. With recent advances in neural network technology, research is actively under way, in various kinds of electronic systems, on analyzing input data and extracting valid information using neural network devices.

A neural network device requires a large amount of computation on complex input data. For a neural network device to analyze high-quality input in real time and extract information, a technique capable of processing neural network operations efficiently is required.

Provided are a neural network processor, a method of operating a neural network processor, and a neural network device. The technical problems to be solved by the present embodiments are not limited to those described above, and other technical problems may be inferred from the following embodiments.

According to one aspect, a neural network processor may include: a fetch controller that obtains input feature information indicating whether each of a plurality of input features of an input feature map has a non-zero value and weight information indicating whether each of a plurality of weights of a weight map has a non-zero value, and that determines, based on the obtained input feature information and weight information, an input feature and a weight, among the plurality of input features and the plurality of weights, on which a convolution operation is to be performed; and a data operation circuit that performs the convolution operation on the determined weight and input feature to generate an output feature map.

In addition, the data operation circuit may selectively perform the convolution operation only on the weight and input feature determined from among the plurality of input features and the plurality of weights.

In addition, the fetch controller may perform an operation on the input feature information and the weight information to detect an input feature and a weight having non-zero values, and the data operation circuit may perform the convolution operation on the detected input feature and weight.

In addition, the input feature information may include an input feature vector in which a feature having a zero value is represented by 0 and a feature having a non-zero value is represented by 1, and the weight information may include a weight vector in which a weight having a zero value is represented by 0 and a weight having a non-zero value is represented by 1.

In addition, when the determined input features and weights are a first input feature and a first weight, and a second input feature and a second weight, the data operation circuit may, in a current cycle, read the first input feature and the first weight from the input feature map and the weight map and perform the convolution operation, and, in a next cycle, read the second input feature and the second weight from the input feature map and the weight map and perform the convolution operation.

According to another aspect, a method of operating a neural network processor may include: obtaining input feature information indicating whether each of a plurality of input features of an input feature map has a non-zero value and weight information indicating whether each of a plurality of weights of a weight map has a non-zero value; determining, based on the obtained input feature information and weight information, an input feature and a weight, among the plurality of input features and the plurality of weights, on which a convolution operation is to be performed; and performing the convolution operation on the determined weight and input feature to generate an output feature map.

According to still another aspect, a neural network device may include: a processor array including a plurality of neural network processors; a memory that stores an input feature map and a plurality of weight maps; and a controller that allocates the input feature map and the plurality of weight maps to the processor array, wherein the controller may group the plurality of weight maps into a plurality of weight groups based on the ratio of weights having non-zero values in each weight map, and may allocate each of the plurality of weight groups to the processor array.

In addition, the controller may group the plurality of weight maps into the plurality of weight groups such that the ratios of weights having non-zero values are even across the weight maps included in a weight group.
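The grouping described above can be illustrated with a minimal Python sketch. The disclosure states only that weight maps are grouped so that the non-zero ratios within a group are even; the sort-then-chunk strategy, the flattened-list representation of a weight map, and the names `nonzero_ratio` and `group_by_density` are assumptions made here for illustration.

```python
def nonzero_ratio(weight_map):
    """Fraction of weights in a flattened, non-empty weight map that are non-zero."""
    return sum(1 for w in weight_map if w != 0) / len(weight_map)


def group_by_density(weight_maps, group_size):
    """Group weight-map indices so that maps of similar density share a group.

    Assumed strategy: sort the indices by non-zero ratio, then cut the sorted
    order into consecutive chunks of `group_size`.
    """
    order = sorted(range(len(weight_maps)),
                   key=lambda i: nonzero_ratio(weight_maps[i]))
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]
```

With four hypothetical weight maps whose non-zero ratios are 1.0, 0.25, 0.75, and 0.25, `group_by_density(maps, 2)` places the two sparse maps in one group and the two dense maps in the other, so processors working on the same group would finish their selective convolutions at roughly the same time.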

In addition, the controller may group the plurality of neural network processors into a plurality of processor groups and may sequentially allocate each of the plurality of weight groups to each of the plurality of processor groups.

In addition, the controller may provide the processor array with input feature information indicating whether each of the input features of the input feature map has a non-zero value and weight information indicating whether each of the weights of the plurality of weight maps has a non-zero value, and the processor array may perform a convolution operation on the input feature map and the plurality of weight maps based on the input feature information and the weight information to generate an output feature map.

According to still another aspect, there is provided a computer-readable recording medium on which a program for implementing the method of operating a neural network processor is recorded.

According to the present embodiments, the neural network processor can selectively perform the convolution operation on input features and weights having non-zero values, and can thereby reduce the amount of computation and the computation time of the convolution operation on the input features and weights.

In addition, according to the present embodiments, the neural network device sequentially allocates to the processor array each of a plurality of weight groups grouped based on the ratio of weights having non-zero values, and can thereby improve the convolution operation speed of the processor array.

FIG. 1 shows an example of a neural network structure.
FIG. 2 is a block diagram illustrating a neural network processor according to one embodiment.
FIG. 3 shows a specific embodiment in which the neural network processor performs selective convolution.
FIG. 4 is a diagram for explaining a method of operating a neural network processor according to one embodiment.
FIG. 5 shows an embodiment of a neural network device.
FIG. 6 shows an embodiment in which a neural network device groups a plurality of weight maps into a plurality of weight groups.
FIGS. 7A and 7B show an embodiment in which a neural network device processes an input feature map and a weight map.
FIG. 8 is a diagram for explaining a method of operating a neural network device according to one embodiment.
FIG. 9 is a block diagram illustrating an electronic system according to an embodiment of the present disclosure.

Hereinafter, embodiments provided solely for illustration will be described in detail with reference to the accompanying drawings. The following embodiments are intended only to make the technical content concrete and do not restrict or limit the scope of rights. Anything that a person skilled in the art can easily infer from the detailed description and the embodiments is construed as falling within the scope of rights.

Terms such as "consists of" or "includes" used in this specification should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may not be included, or additional components or steps may be further included.

In addition, terms including ordinals, such as "first" or "second", used in this specification may be used to describe various components, but the components should not be limited by those terms. The terms are used only to distinguish one component from another.

The present embodiments relate to a neural network processor, a method of operating a neural network processor, and a neural network device, and detailed descriptions of matters widely known to those of ordinary skill in the art to which the following embodiments belong are omitted.

FIG. 1 shows an example of a neural network structure.

FIG. 1 shows the structure of a convolutional neural network as an example of a neural network structure. According to one embodiment, FIG. 1 shows only a convolution layer 10 of the convolutional neural network, but the convolutional neural network may further include, following the convolution layer 10, a pooling layer, a fully connected layer, and the like.

In the convolution layer 10, a first feature map FM1 may be an input feature map and a second feature map FM2 may be an output feature map. A feature map is data in which various features of the input data are expressed. The feature maps FM1 and FM2 may have the form of a two-dimensional or three-dimensional matrix. Feature maps FM1 and FM2 having such a multidimensional matrix form may be referred to as feature tensors. The input feature map may also be referred to as an activation. The feature maps FM1 and FM2 have a width W (also called columns), a height H (also called rows), and a depth D, which may correspond to the x-axis, y-axis, and z-axis of a coordinate system, respectively. The depth D may be referred to as the number of channels.

In the convolution layer 10, a convolution operation may be performed on the first feature map FM1 and a weight map WM, and the second feature map FM2 may be generated as a result. The weight map WM may filter the first feature map FM1 and may be referred to as a filter or a kernel. The depth of the weight map WM, that is, its number of channels, is equal to the depth of the first feature map FM1, and the same channels of the weight map WM and the first feature map FM1 may be convolved with each other. The weight map WM is shifted so as to traverse the first feature map FM1 as a sliding window. The amount of each shift may be referred to as the "stride length" or the "stride". During each shift, each of the weights included in the weight map WM may be multiplied by the feature value it overlaps in the first feature map FM1, and the products may be summed. As the first feature map FM1 and the weight map WM are convolved, one channel of the second feature map FM2 may be generated. Although FIG. 1 shows one weight map WM, in practice a plurality of weight maps may be convolved with the first feature map FM1 to generate a plurality of channels of the second feature map FM2. In other words, the number of channels of the second feature map FM2 may correspond to the number of weight maps.
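The sliding-window convolution of the convolution layer 10 can be sketched in plain Python as follows. This is an illustrative reference model rather than the circuit of the disclosure: it assumes no padding, nested lists indexed as [channel][row][column], and the hypothetical function names `conv2d_single_filter` and `conv2d`.

```python
def conv2d_single_filter(fm, wm, stride=1):
    """Convolve feature map fm[D][H][W] with one weight map wm[D][KH][KW].

    Returns one output channel as a list of rows. No padding is applied.
    """
    depth, height, width = len(fm), len(fm[0]), len(fm[0][0])
    kh, kw = len(wm[0]), len(wm[0][0])
    if len(wm) != depth:
        raise ValueError("weight map depth must equal feature map depth")
    out_h = (height - kh) // stride + 1
    out_w = (width - kw) // stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        for ox in range(out_w):
            acc = 0
            # Multiply each weight by the feature it overlaps and sum the products.
            for c in range(depth):
                for ky in range(kh):
                    for kx in range(kw):
                        acc += fm[c][oy * stride + ky][ox * stride + kx] * wm[c][ky][kx]
            out[oy][ox] = acc
    return out


def conv2d(fm, weight_maps, stride=1):
    """Each weight map yields one channel of the output feature map."""
    return [conv2d_single_filter(fm, wm, stride) for wm in weight_maps]
```

Passing two weight maps to `conv2d` returns an output with two channels, which mirrors the statement that the number of channels of FM2 corresponds to the number of weight maps.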

In addition, the second feature map FM2 of the convolution layer 10 may be the input feature map of another layer. For example, the second feature map FM2 may be the input feature map of a pooling layer.

FIG. 2 is a block diagram illustrating a neural network processor according to one embodiment.

The neural network processor 100 may be implemented with hardware circuits. For example, the neural network processor 100 may be implemented as an integrated circuit. The neural network processor 100 may include, but is not limited to, at least one of a central processing unit (CPU), a multi-core CPU, an array processor, a vector processor, a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA), an application-specific integrated circuit (ASIC), programmable logic circuitry, a video processing unit (VPU), and a graphics processing unit (GPU).

The neural network processor 100 may include a fetch controller 112 and a data operation circuit 114. Only the components of the neural network processor 100 that relate to the present embodiment are shown in FIG. 2. Accordingly, those of ordinary skill in the art related to the present embodiment will understand that other general-purpose components may be included in addition to those shown in FIG. 2.

The fetch controller 112 may obtain input feature information indicating whether each of a plurality of input features of an input feature map has a zero value, and weight information indicating whether each of a plurality of weights of a weight map has a zero value. According to one embodiment, the fetch controller 112 may receive the input feature information and the weight information from outside. According to another embodiment, the fetch controller 112 may generate the input feature information from the input feature map and the weight information from the weight map.

Based on the input feature information and the weight information, the fetch controller 112 may determine the input feature and weight, among the plurality of input features and the plurality of weights, on which the convolution operation is to be performed. According to one embodiment, the fetch controller 112 may use the input feature information and the weight information to detect an input feature and a weight that both have non-zero values at mutually corresponding positions among the plurality of input features and the plurality of weights, and may determine the detected input feature and weight as those on which the convolution operation is to be performed. For example, when each of the input feature information and the weight information is a bit vector in which a feature or weight having a zero value is represented by 0 and a feature or weight having a non-zero value is represented by 1, the fetch controller 112 may perform an AND operation on the input feature information and the weight information to determine the input feature and weight on which the convolution operation is to be performed. According to one embodiment, the fetch controller 112 may include an arithmetic operation circuit.
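The bit-vector AND described for the fetch controller 112 can be modeled in a few lines of Python. The sketch below is a software illustration under stated assumptions, not the hardware itself: features and weights are given as flat lists of equal length, and the helper names `nonzero_bit_vector` and `select_pairs` are hypothetical.

```python
def nonzero_bit_vector(values):
    """Return 1 for each non-zero value and 0 for each zero value."""
    return [1 if v != 0 else 0 for v in values]


def select_pairs(feature_bits, weight_bits):
    """Return the positions where both bit vectors hold 1 (bitwise AND)."""
    if len(feature_bits) != len(weight_bits):
        raise ValueError("bit vectors must have the same length")
    return [i for i, (f, w) in enumerate(zip(feature_bits, weight_bits)) if f & w]
```

For hypothetical features `[0, 2, 5, 0]` and weights `[1, 3, 0, 0]`, the bit vectors are `[0, 1, 1, 0]` and `[1, 1, 0, 0]`, and only position 1 is selected for the convolution operation.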

The data operation circuit 114 may perform the convolution operation on the input feature and weight determined by the fetch controller 112. In other words, among the plurality of input features of the input feature map and the plurality of weights of the weight map, the data operation circuit 114 may selectively perform the convolution operation only on the input feature and weight determined by the fetch controller 112. Specifically, the data operation circuit 114 may selectively perform the convolution operation only on an input feature and a weight that both have non-zero values at mutually corresponding positions. For example, the data operation circuit 114 may perform the convolution operation by multiplying an input feature value by a weight value. According to one embodiment, the data operation circuit 114 may be implemented as an arithmetic operation circuit. The data operation circuit 114 may also perform the convolution operation on the input feature and weight to generate an output feature value. Accordingly, the data operation circuit 114 may perform selective convolution on the input feature map and the weight map with respect to the input features and weights that have non-zero values at mutually corresponding positions, and may generate an output feature map as a result.

According to one embodiment, the neural network processor 100 may further include an internal memory. The internal memory may be a cache memory of the neural network processor 100. The internal memory may be static random access memory (SRAM). However, the internal memory is not limited thereto and may be implemented as a simple buffer, a cache memory, or another kind of memory of the neural network processor 100. The internal memory may store data generated by the operations performed in the data operation circuit 114, for example, output feature values, an output feature map, or various kinds of data generated during the operations.

The neural network processor 100 may store the output feature map generated by the data operation circuit 114 in the internal memory or output it to the outside.

Thus, according to the present disclosure, the neural network processor 100 can selectively perform the convolution operation on input features and weights having non-zero values, and can therefore omit meaningless operations that do not affect the output features. Accordingly, the neural network processor 100 can reduce the amount of computation and the computation time of the convolution operation on the input features and weights.

FIG. 3 shows a specific embodiment in which the neural network processor performs selective convolution.

The fetch controller 112 may obtain an input feature vector indicating whether each of a plurality of input features of the input feature map has a zero value and a weight vector indicating whether each of a plurality of weights of the weight map has a zero value. The input feature vector is a bit vector in which each input feature is represented by 1 if its value is non-zero and by 0 if its value is zero, and the weight vector is likewise a bit vector in which each weight is represented by 1 if its value is non-zero and by 0 if its value is zero. In other words, as shown in FIG. 3, the input feature vector is a vector in which the 0th, 3rd, and 4th of the five input features are represented by 1 because they have non-zero values, and the 1st and 2nd input features are represented by 0 because they have zero values. Likewise, the weight vector is a vector in which the 0th, 1st, and 3rd of the five weights are represented by 1 because they have non-zero values, and the 2nd and 4th weights are represented by 0 because they have zero values. Although FIG. 3 shows an input feature map and input feature vector for five input features and a weight map and weight vector for five weights, the number is not limited thereto. In addition, the input feature map and weight map shown in FIG. 3 may be portions of an entire input feature map and an entire weight map.

Through an AND operation on the input feature vector and the weight vector, the fetch controller 112 may determine the input features and weights on which the convolution operation is to be performed. Specifically, through the AND operation, the fetch controller 112 may detect input features and weights that both have non-zero values at mutually corresponding positions among the plurality of input features and the plurality of weights. As shown in FIG. 3, the fetch controller 112 may perform the AND operation on the input feature vector and the weight vector and detect the positions at which the result is 1, namely the 0th input feature and weight and the 3rd input feature and weight. The fetch controller 112 may then determine the detected input features and weights as those on which the convolution operation is to be performed.

The data operation circuit 114 may perform the convolution operation on the input features and weights determined by the fetch controller 112. The data operation circuit 114 may sequentially read input features and weights from the input feature map and the weight map and perform the convolution operation. In other words, the data operation circuit 114 may read the n-th input feature of the input feature map and the n-th weight of the weight map and perform the convolution operation in a current cycle, and then read the (n+1)-th input feature and the (n+1)-th weight and perform the convolution operation in a next cycle. In doing so, the data operation circuit 114 may perform the convolution operation on the input features and weights determined by the fetch controller 112 and may omit the convolution operation on the input features and weights not determined by the fetch controller 112. As shown in FIG. 3, the data operation circuit 114 may perform the convolution operation on the 0th input feature and the 0th weight in the current cycle and, in the next cycle, perform the convolution operation on the 3rd input feature and the 3rd weight while omitting the convolution operations on the 1st input feature and 1st weight and on the 2nd input feature and 2nd weight. In other words, the data operation circuit 114 may perform the convolution operation only on the input features and weights that have non-zero values at mutually corresponding positions in the input feature map and the weight map.
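The cycle saving in the FIG. 3 example can be checked with a small Python model. It assumes one multiply-accumulate per cycle, as the "current cycle" and "next cycle" wording suggests; the function name `selective_mac` and the concrete feature and weight values are hypothetical, and only their zero/non-zero pattern follows FIG. 3.

```python
def selective_mac(features, weights):
    """Multiply-accumulate only where feature and weight are both non-zero.

    Returns (accumulated_sum, cycles_used), assuming one MAC per cycle.
    """
    acc = 0
    cycles = 0
    for f, w in zip(features, weights):
        # Equivalent to the AND of the two bit vectors being 1 at this position.
        if f != 0 and w != 0:
            acc += f * w
            cycles += 1
    return acc, cycles


# Zero/non-zero pattern of FIG. 3: feature bits 1,0,0,1,1 and weight bits 1,1,0,1,0.
features = [3, 0, 0, 7, 2]
weights = [5, 1, 0, 4, 0]
result, cycles = selective_mac(features, weights)
dense = sum(f * w for f, w in zip(features, weights))
```

Here `result` equals `dense` (43) while `cycles` is 2 rather than 5, because only positions 0 and 3 survive the AND; the skipped products are all zero and so do not change the output feature value.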

The data operation circuit 114 may perform the convolution operation on the input features and weights determined by the fetch controller 112 to generate an output feature map. In addition, the data operation circuit 114 may generate the output feature map by accumulating the convolution result of FIG. 3 onto an output feature map previously generated through a convolution operation on the weight map and another input feature map.

FIG. 4 is a diagram for explaining a method of operating a neural network processor according to one embodiment.

The method shown in FIG. 4 may be performed by the components of the neural network processor 100 of FIG. 2, and redundant descriptions are omitted.

In step S410, the neural network processor 100 may obtain input feature information indicating whether each of a plurality of input features of an input feature map has a non-zero value and weight information indicating whether each of a plurality of weights of a weight map has a non-zero value. According to one embodiment, the neural network processor 100 may receive the input feature information and the weight information from outside. According to another embodiment, the neural network processor 100 may generate the input feature information from the input feature map and the weight information from the weight map.

In step s420, the neural network processor 100 may determine, based on the input feature information and the weight information obtained in s410, the input features and weights on which a convolution operation is to be performed, from among the plurality of input features and the plurality of weights. According to one embodiment, the neural network processor 100 may perform an operation on the input feature information and the weight information to detect input features and weights having non-zero values, and may determine the detected input features and weights as those on which the convolution operation is to be performed.

In step s430, the neural network processor 100 may perform a convolution operation on the weights and input features determined in s420 to generate an output feature map. In other words, from among the plurality of input features of the input feature map and the plurality of weights of the weight map, the neural network processor 100 may selectively perform the convolution operation only on the input features and weights determined in s420. Specifically, the neural network processor 100 may selectively perform the convolution operation only on input features and weights that both have non-zero values at mutually corresponding positions. The neural network processor 100 may also perform the convolution operation on an input feature and a weight to generate an output feature value. Thus, the neural network processor 100 may perform selective convolution on the input feature map and the weight map, based on the input features and weights having non-zero values at mutually corresponding positions, and thereby generate the output feature map.
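Steps s410 to s430 can be sketched for a single output feature value as follows. This is an illustrative sketch only: the function names are hypothetical, and the use of a bitwise AND to combine the input feature information and the weight information is an assumption, since the description states only that "an operation" is performed on the two.

```python
def nonzero_info(values):
    """s410: one bit per element, 1 if the element is non-zero, otherwise 0."""
    return [1 if v != 0 else 0 for v in values]

def select_positions(feature_info, weight_info):
    """s420: positions where both bits are set (assumed here to be a bitwise AND)."""
    return [i for i, (f, w) in enumerate(zip(feature_info, weight_info)) if f & w]

def selective_mac(features, weights):
    """s430: multiply-accumulate only over the selected positions."""
    positions = select_positions(nonzero_info(features), nonzero_info(weights))
    return sum(features[i] * weights[i] for i in positions), len(positions)

# Hypothetical flattened 3x3 window of input features and a 3x3 weight map.
features = [0, 3, 0, 2, 5, 0, 0, 1, 4]
weights = [1, 0, 0, 2, -1, 3, 0, 0, 2]

value, multiplications = selective_mac(features, weights)
dense_value = sum(f * w for f, w in zip(features, weights))
print(value, multiplications, dense_value)  # 7 3 7
```

In this example only three of the nine positions hold a non-zero input feature and a non-zero weight, so three multiplications produce the same output feature value as the nine multiplications of a dense computation.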

FIG. 5 shows an embodiment of a neural network device.

The neural network device 1000 may include a controller 1010, a processor array 1020, and a memory 1030. The components 1010, 1020, and 1030 of the neural network device 1000 may communicate through a system bus. In an embodiment, the neural network device 1000 may be implemented as a single semiconductor chip, for example, as a system-on-chip (SoC). However, the disclosure is not limited thereto, and the neural network device 1000 may be implemented as a plurality of semiconductor chips.

The controller 1010 may be implemented as a central processing unit (CPU), a microprocessor, or the like, and may control the overall operation of the neural network device 1000. The controller 1010 may control the operation of the processor array 1020 and the memory 1030. For example, the controller 1010 may set and manage parameters so that the processor array 1020 can correctly execute the layers of the neural network. According to one embodiment, the controller 1010 may also include a Rectified Linear Unit (ReLU) module.

The processor array 1020 may include a plurality of neural network processors, which may be arranged in the form of an array. According to one embodiment, each of the neural network processors included in the processor array 1020 may be the neural network processor 100 of FIGS. 2 and 3. The neural network processors may be implemented to operate simultaneously in parallel. Furthermore, each of the neural network processors may operate independently. For example, each neural network processor may be implemented as a core circuit capable of executing instructions.

The memory 1030 may be implemented as random access memory (RAM), for example, dynamic RAM (DRAM) or SRAM. The memory 1030 may store various programs and data. According to one embodiment, the memory 1030 may store weight maps or input feature maps provided from an external device, such as a server or an external memory.

The controller 1010 may assign the input feature map and the weight map stored in the memory 1030 to the processor array 1020.

The controller 1010 may also generate, from the input feature map, input feature information indicating whether each of a plurality of input features of the input feature map has a non-zero value. Likewise, the controller 1010 may generate, from the weight map, weight information indicating whether each of a plurality of weights of the weight map has a non-zero value. The controller 1010 may provide the input feature information and the weight information to the processor array 1020. According to another embodiment, the controller 1010 may receive the input feature information and the weight information from the outside.

Each neural network processor of the processor array 1020 may perform a convolution operation on its assigned input feature map and weight map to generate an output feature map. The processor array 1020 may perform this convolution operation based on the input feature information and the weight information. Specifically, the processor array 1020 may selectively perform convolution on input features and weights having non-zero values to generate the output feature map.

The controller 1010 may divide the input feature map along its spatial dimensions. For example, the controller 1010 may divide the input feature map based on the size of the weight map. The controller 1010 may then assign the divided input feature maps to the processor array 1020.

The controller 1010 may group the plurality of neural network processors of the processor array 1020 into a plurality of processor groups. In other words, the controller 1010 may group a preset number of neural network processors into one processor group and thereby determine a plurality of processor groups. For example, when there are 100 neural network processors, the controller 1010 may group every 10 neural network processors into one processor group and thereby determine 10 processor groups.
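The grouping in this example amounts to splitting a list of processors into fixed-size chunks. The sketch below is only one way to express it; the function name `group_processors` and the use of integer processor identifiers are assumptions made for illustration.

```python
def group_processors(processor_ids, group_size):
    """Split the processor identifiers into consecutive groups of a preset size."""
    return [processor_ids[i:i + group_size]
            for i in range(0, len(processor_ids), group_size)]

# 100 neural network processors, 10 per processor group, as in the example above.
processor_groups = group_processors(list(range(100)), 10)
print(len(processor_groups))  # 10
```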

The controller 1010 may assign each of the divided input feature maps to a respective processor group of the processor array 1020. The controller 1010 may also assign a plurality of weight maps to each processor group of the processor array 1020. Accordingly, the neural network processors included in one processor group of the processor array 1020 may each be assigned the same input feature map and different weight maps.

According to one embodiment, the memory 1030 may buffer the weight maps corresponding to the layer to be executed by the processor array 1020. When the processor array 1020 performs an operation using a weight map, that weight map may be output from an external memory and stored in the internal memory of a neural network processor of the processor array 1020. The memory 1030 may temporarily store the weight maps output from the external memory before they are provided to the internal memory of the neural network processor. According to one embodiment, the memory 1030 may also temporarily store the output feature map output from the processor array 1020.

According to one embodiment, the controller 1010 may group a plurality of weight maps into a plurality of weight groups based on the ratio of non-zero weights in each weight map. Specifically, the controller 1010 may group the weight maps so that the weight maps included in a weight group have uniform ratios of non-zero weights. For example, the controller 1010 may sort the weight maps in ascending order of the ratio of non-zero weights among all the weights of each weight map, and may group the sorted weight maps in order, a preset number at a time, to determine the weight groups. According to one embodiment, the controller 1010 may group the weight maps into weight map groups based on the number of neural network processors included in one processor group of the processor array 1020. For example, when there are 381 weight maps and one processor group of the processor array 1020 includes 40 neural network processors, the controller 1010 may group the weight maps into 10 groups. In other words, the controller 1010 may determine nine weight map groups each including 40 weight maps and one weight map group including 21 weight maps.
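The sort-and-chunk grouping can be sketched as below. The function names and the randomly generated 3x3 weight maps are hypothetical and serve only to reproduce the 381-map, 40-processor example; the sketch returns indices of weight maps rather than the maps themselves.

```python
import random

def nonzero_ratio(weight_map):
    """Fraction of the weights in a weight map that are non-zero."""
    flat = [w for row in weight_map for w in row]
    return sum(1 for w in flat if w != 0) / len(flat)

def group_weight_maps(weight_maps, group_size):
    """Sort weight-map indices by ascending non-zero ratio, then chunk them in order."""
    order = sorted(range(len(weight_maps)),
                   key=lambda i: nonzero_ratio(weight_maps[i]))
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]

# Hypothetical data: 381 sparse 3x3 weight maps, 40 processors per processor group.
rng = random.Random(0)
weight_maps = [[[rng.choice([0, 0, 1, -1]) for _ in range(3)] for _ in range(3)]
               for _ in range(381)]

weight_groups = group_weight_maps(weight_maps, 40)
group_sizes = [len(group) for group in weight_groups]
print(group_sizes)  # [40, 40, 40, 40, 40, 40, 40, 40, 40, 21]
```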

The controller 1010 may assign the plurality of weight groups to the processor array 1020. Specifically, the controller 1010 may assign the weight groups one after another to the processor groups of the processor array 1020. For example, the controller 1010 may assign the weight maps included in the zeroth weight group to each of the processor groups of the processor array 1020. After the processor array 1020 completes the convolution operation for the zeroth weight group, the controller 1010 may assign the weight maps included in the first weight group to each of the processor groups of the processor array 1020.

Thus, according to the present disclosure, the neural network device 1000 sequentially assigns to the processor array 1020 each of the weight groups, which are grouped based on the ratio of non-zero weights, and can thereby improve the convolution speed of the processor array 1020. In other words, because the weight maps assigned to each processor group have uniform ratios of non-zero weights, the convolution speeds of the neural network processors within a processor group can be equalized, which in turn improves the convolution speed of the processor array 1020.
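A small worked example may help show why uniform groups are faster. It rests on a simplified cost model that is an assumption and not part of the disclosure: each processor's time is taken to be proportional to the number of non-zero weights in its weight map, and a weight group is finished only when its slowest processor finishes. The function name and the counts are hypothetical.

```python
def total_time(nonzero_counts, group_size):
    """Sum, over weight groups processed one after another, of the slowest workload."""
    groups = [nonzero_counts[i:i + group_size]
              for i in range(0, len(nonzero_counts), group_size)]
    return sum(max(group) for group in groups)

# Hypothetical non-zero weight counts of eight weight maps, two processors per group.
counts = [9, 1, 8, 2, 7, 3, 8, 1]

unsorted_time = total_time(counts, 2)        # groups [9,1] [8,2] [7,3] [8,1]
sorted_time = total_time(sorted(counts), 2)  # groups [1,1] [2,3] [7,8] [8,9]
print(unsorted_time, sorted_time)  # 32 21
```

Under this model, the unsorted grouping leaves one processor of each pair idle for most of the time, whereas the sorted grouping pairs weight maps of similar density and shortens the total from 32 to 21 units.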

FIG. 6 shows an embodiment in which a neural network device groups a plurality of weight maps into a plurality of weight groups.

According to one embodiment, graph 610 shows the ratio of non-zero weights in each of the weight maps stored in the memory 1030. As in graph 610, the controller 1010 may detect the ratio of non-zero weights for each of the weight maps stored in the memory 1030 and, as in graph 620, may sort the weight maps by that ratio. Graph 620 shows the weight maps sorted in ascending order of the ratio of non-zero weights. The controller 1010 may group the sorted weight maps into a plurality of weight map groups. For example, as indicated in graph 620, the controller 1010 may group the sorted weight maps in order, a preset number at a time, to determine ten weight map groups. In other words, the controller 1010 may determine weight map groups 0 to 9. Accordingly, the weight maps included in each of the ten weight map groups may have uniform ratios of non-zero weights.

FIGS. 7A and 7B show an embodiment in which a neural network device processes an input feature map and a weight map.

The controller 1010 may divide an input feature map having a width W, a height H, and C channels along its spatial dimensions, and may sequentially assign the divided input feature maps to the processor groups of the processor array 1020. According to one embodiment, as indicated by the arrows in FIG. 7B, the controller 1010 may assign the divided input feature maps to the processor groups of the processor array 1020 in a zig-zag order.
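The spatial division and assignment order can be sketched as follows. Several points are assumptions rather than statements of the disclosure: the tile size is a free parameter, every tile is taken to keep all C channels, and "zig-zag" is read here as a serpentine order in which every other band of tiles is traversed in reverse. The arrows of FIG. 7B may denote a different traversal. The function name is hypothetical.

```python
def split_spatially(height, width, tile_h, tile_w):
    """Return tiles as (row_start, row_end, col_start, col_end) in an assumed zig-zag order."""
    tiles = []
    for band, r in enumerate(range(0, height, tile_h)):
        row = [(r, min(r + tile_h, height), c, min(c + tile_w, width))
               for c in range(0, width, tile_w)]
        # Assumed serpentine order: reverse every other band of tiles.
        tiles.extend(row if band % 2 == 0 else row[::-1])
    return tiles

# Hypothetical 4x4 spatial extent split into 2x2 tiles.
tiles = split_spatially(height=4, width=4, tile_h=2, tile_w=2)
print(tiles)  # [(0, 2, 0, 2), (0, 2, 2, 4), (2, 4, 2, 4), (2, 4, 0, 2)]
```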

Referring to FIG. 7A, the controller 1010 may assign the divided input feature maps A0 and A1 to the zeroth processor group and the first processor group of the processor array 1020, respectively.

The controller 1010 may also sequentially assign the plurality of weight groups to the processor groups of the processor array 1020. For example, the controller 1010 may assign the weight maps K0 and K1 of the zeroth weight group to each of the zeroth processor group and the first processor group of the processor array 1020. When the processor array 1020 completes the convolution operations on the zeroth weight group and all of the divided input feature maps, the controller 1010 may assign the first weight group to the processor array 1020.

A processor group of the processor array 1020 may perform a convolution operation on its assigned input feature map and weight maps to generate output feature maps. For example, the zeroth neural network processor of the zeroth processor group may perform a convolution operation on the input feature map A0 and the weight map K0 to generate an output feature map Psum0. In other words, the zeroth neural network processor may generate the output feature map Psum0 for the input feature map A0, which corresponds to a part of the entire input feature map. The first neural network processor of the zeroth processor group may perform a convolution operation on the input feature map A0 and the weight map K1 to generate an output feature map Psum1. Similarly, the zeroth neural network processor of the first processor group may perform a convolution operation on the input feature map A1 and the weight map K0 to generate an output feature map Psum0, and the first neural network processor of the first processor group may perform a convolution operation on the input feature map A1 and the weight map K1 to generate an output feature map Psum1.

Next, referring to FIG. 7B, after the convolution operations on the divided input feature maps A0 and A1 are completed, the controller 1010 may assign the other divided input feature maps A2 and A3 to the zeroth processor group and the first processor group of the processor array 1020, respectively. The zeroth processor group of the processor array 1020 may then perform convolution operations on the previously assigned weight maps K0 and K1 and the input feature map A2 to generate output feature maps Psum0 and Psum1. Likewise, the first processor group of the processor array 1020 may perform convolution operations on the previously assigned weight maps K0 and K1 and the input feature map A3 to generate output feature maps Psum0 and Psum1.
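The assignment order walked through for FIGS. 7A and 7B can be written out as a short schedule generator. This is a sketch under assumptions: the function name and tuple layout are hypothetical, tiles and weight maps are represented only by their labels, and the weight maps of one weight group are taken to stay resident while the divided input feature maps are streamed through the processor groups.

```python
def schedule(tiles, weight_groups, num_processor_groups):
    """List (processor_group, processor, tile, weight_map) assignments in order."""
    steps = []
    for weight_group in weight_groups:
        # One weight group stays assigned while every tile is processed.
        for start in range(0, len(tiles), num_processor_groups):
            batch = tiles[start:start + num_processor_groups]
            for group_index, tile in enumerate(batch):
                # Processors in one group share the tile but use different weight maps.
                for processor_index, weight_map in enumerate(weight_group):
                    steps.append((group_index, processor_index, tile, weight_map))
    return steps

steps = schedule(["A0", "A1", "A2", "A3"], [["K0", "K1"]], num_processor_groups=2)
for step in steps:
    print(step)
```

The first four entries pair A0 and A1 with K0 and K1 as in FIG. 7A, and the last four pair A2 and A3 with the same weight maps as in FIG. 7B.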

FIG. 8 is a diagram for explaining a method of operating a neural network device according to an embodiment.

The method shown in FIG. 8 may be performed by the components of the neural network device 1000 of FIG. 5, and redundant descriptions are omitted.

In step s810, the neural network device 1000 may group a plurality of weight maps into a plurality of weight groups based on the ratio of non-zero weights in each weight map. Specifically, the neural network device 1000 may group the weight maps so that the weight maps included in a weight group have similar ratios of non-zero weights.

The neural network device 1000 may also group a plurality of neural network processors into a plurality of processor groups. In other words, the neural network device 1000 may group a preset number of neural network processors into one processor group and thereby determine a plurality of processor groups.

In step s820, the neural network device 1000 may assign each of the weight groups and an input feature map to the plurality of neural network processors. Specifically, the neural network device 1000 may sequentially assign each of the weight groups to each of the processor groups.

Accordingly, each of the processor groups may perform a convolution operation on its assigned weight group and input feature map to generate an output feature map.

FIG. 9 is a block diagram illustrating an electronic system according to an embodiment of the present disclosure.

The electronic system 1 according to an embodiment of the present disclosure may analyze input data in real time based on a neural network to extract valid information, and may make a situational determination based on the extracted information or control components of the electronic device in which the electronic system 1 is mounted. For example, the electronic system 1 may be applied to robot devices such as a drone or an Advanced Driver Assistance System (ADAS), a smart TV, a smartphone, a medical device, a mobile device, an image display device, a measurement device, an Internet of Things (IoT) device, and the like, and may be mounted in any of various other kinds of electronic devices.

Referring to FIG. 9, the electronic system 1 may include a central processing unit (CPU) 110, a random access memory (RAM) 120, a neural network device 130, a memory 140, a sensor module 150, and a communication module 160. The electronic system 1 may further include an input/output module, a security module, a power control device, and the like. In an embodiment, some of the components of the electronic system 1 (the CPU 110, the RAM 120, the neural network device 130, the memory 140, the sensor module 150, and the communication module 160) may be mounted on a single semiconductor chip. The neural network device 130 may correspond to the neural network device 1000 of FIG. 5.

The CPU 110 controls the overall operation of the electronic system 1. The CPU 110 may include a single processor core (single-core) or a plurality of processor cores (multi-core). The CPU 110 may process or execute programs and/or data stored in the memory 140. In one embodiment, the CPU 110 may control the functions of the neural network device 130 by executing programs stored in the memory 140.

The RAM 120 may temporarily store programs, data, or instructions. For example, programs and/or data stored in the memory 140 may be temporarily stored in the RAM 120 under the control of the CPU 110 or according to boot code. The RAM 120 may be implemented as a memory such as dynamic RAM (DRAM) or static RAM (SRAM).

The neural network device 130 may perform neural network operations based on received input data and may generate an information signal based on the result. The neural network may include, but is not limited to, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Deep Belief Networks, and Restricted Boltzmann Machines.

The information signal may include one of various kinds of recognition signals, such as a speech recognition signal, an object recognition signal, an image recognition signal, and a biometric information recognition signal. For example, the neural network device 130 may receive frame data included in a video stream as input data and may generate, from the frame data, a recognition signal for an object included in the image that the frame data represents. However, the disclosure is not limited thereto; depending on the type or function of the electronic device in which the electronic system 1 is mounted, the neural network device 130 may receive various kinds of input data and may generate a recognition signal according to the input data.

The memory 140 is a storage location for data and may store an operating system (OS), various programs, and various data. In an embodiment, the memory 140 may store intermediate results generated while the neural network device 130 performs operations, for example, an output feature map in the form of an output feature list or an output feature matrix. In an embodiment, a compressed output feature map may be stored in the memory 140. The memory 140 may also store various parameters used by the neural network device 130, for example, a weight map or a weight list.

The memory 140 may be a DRAM, but is not limited thereto. The memory 140 may include at least one of a volatile memory and a nonvolatile memory. The nonvolatile memory includes ROM (Read Only Memory), PROM (Programmable ROM), EPROM (Electrically Programmable ROM), EEPROM (Electrically Erasable and Programmable ROM), flash memory, PRAM (Phase-change RAM), MRAM (Magnetic RAM), RRAM (Resistive RAM), FRAM (Ferroelectric RAM), and the like. The volatile memory includes DRAM (Dynamic RAM), SRAM (Static RAM), SDRAM (Synchronous DRAM), PRAM (Phase-change RAM), MRAM (Magnetic RAM), RRAM (Resistive RAM), FeRAM (Ferroelectric RAM), and the like. In an embodiment, the memory 140 may include at least one of an HDD (Hard Disk Drive), an SSD (Solid State Drive), CF (compact flash), SD (secure digital), Micro-SD (micro secure digital), Mini-SD (mini secure digital), xD (extreme digital), and a Memory Stick.

The sensor module 150 may collect information about the surroundings of the electronic device in which the electronic system 1 is mounted. The sensor module 150 may sense or receive a signal (for example, an image signal, a voice signal, a magnetic signal, a biometric signal, or a touch signal) from outside the electronic device and may convert the sensed or received signal into data. To this end, the sensor module 150 may include at least one of various kinds of sensing devices, such as a microphone, an imaging device, an image sensor, a LIDAR (light detection and ranging) sensor, an ultrasonic sensor, an infrared sensor, a biosensor, and a touch sensor.

The sensor module 150 may provide the converted data to the neural network device 130 as input data. For example, the sensor module 150 may include an image sensor, may capture the external environment of the electronic device to generate a video stream, and may provide successive data frames of the video stream to the neural network device 130 in order as input data. However, the disclosure is not limited thereto, and the sensor module 150 may provide various kinds of data to the neural network device 130.

The communication module 160 may include various wired or wireless interfaces capable of communicating with an external device. For example, the communication module 160 may include a communication interface connectable to a wired local area network (LAN), a wireless local area network (WLAN) such as Wi-Fi (Wireless Fidelity), a wireless personal area network (WPAN) such as Bluetooth, Wireless USB (Wireless Universal Serial Bus), Zigbee, NFC (Near Field Communication), RFID (Radio-frequency identification), PLC (Power Line Communication), or a mobile cellular network such as 3G (3rd Generation), 4G (4th Generation), or LTE (Long Term Evolution).

In an embodiment, the communication module 160 may receive a weight map from an external server. The external server may perform training based on a large amount of training data and provide a weight map containing the trained weights to the electronic system 1. The received weight map may be stored in the memory 140.

The apparatus according to the embodiments described above may include a processor, a memory that stores and executes program data, permanent storage such as a disk drive, a communication port for communicating with an external device, and user interface devices such as a touch panel, keys, and buttons. Methods implemented as software modules or algorithms may be stored on a computer-readable recording medium as computer-readable code or program instructions executable on the processor. Examples of the computer-readable recording medium include magnetic storage media (e.g., ROM (read-only memory), RAM (random-access memory), floppy disks, and hard disks) and optically readable media (e.g., CD-ROM and DVD (Digital Versatile Disc)). The computer-readable recording medium may be distributed over network-connected computer systems so that computer-readable code is stored and executed in a distributed manner. The medium is readable by a computer, may be stored in a memory, and may be executed on a processor.

The present embodiment may be described in terms of functional block components and various processing steps. Such functional blocks may be implemented by any number of hardware and/or software components that perform specific functions. For example, the embodiment may employ integrated circuit components, such as memory, processing, logic, and look-up tables, that can carry out various functions under the control of one or more microprocessors or other control devices. Just as the components may be implemented with software programming or software elements, the present embodiment may be implemented in a programming or scripting language such as C, C++, Java, or assembler, with the various algorithms implemented as a combination of data structures, processes, routines, or other programming constructs. Functional aspects may be implemented as algorithms executed on one or more processors. In addition, the present embodiment may employ conventional techniques for electronic configuration, signal processing, and/or data processing. Terms such as "mechanism", "element", "means", and "configuration" are used broadly and are not limited to mechanical or physical components. These terms may include the meaning of a series of software routines in conjunction with a processor or the like.

The specific implementations described in the present embodiment are examples and do not limit the technical scope in any way. For brevity of the specification, descriptions of conventional electronic components, control systems, software, and other functional aspects of such systems may be omitted. In addition, the connecting lines or connecting members between components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual device they may be replaced by, or supplemented with, various other functional, physical, or circuit connections.

In this specification (particularly in the claims), the term "the" and similar referring terms may cover both the singular and the plural. When a range is stated, it includes each individual value falling within the range (unless stated otherwise), and is equivalent to listing each individual value of the range in the detailed description. Finally, unless an order is explicitly stated for the steps of a method or stated otherwise, the steps may be performed in any suitable order and are not necessarily limited to the order in which they are described. The use of any examples or exemplary terms (for example, "etc.") is intended merely to describe the technical idea in detail, and the scope is not limited by those examples or terms unless limited by the claims. Those skilled in the art will also appreciate that various modifications, combinations, and changes may be made according to design conditions and factors within the scope of the appended claims or their equivalents.

Claims

A neural network processor comprising:
a fetch controller configured to obtain input feature information indicating whether each of a plurality of input features of an input feature map has a non-zero value and weight information indicating whether each of a plurality of weights of a weight map has a non-zero value, and to determine, based on the obtained input feature information and weight information, an input feature and a weight to be convolved among the plurality of input features and the plurality of weights; and
a data operation circuit configured to perform a convolution operation on the determined weight and input feature to generate an output feature map.

The neural network processor of claim 1,
wherein the data operation circuit is configured to
selectively perform a convolution operation only on the determined weight and input feature among the plurality of input features and the plurality of weights.

The neural network processor of claim 1,
wherein the fetch controller is configured to
perform an operation on the input feature information and the weight information to detect input features and weights having non-zero values, and
the data operation circuit is configured to
perform a convolution operation on the detected input features and weights.

The neural network processor of claim 1,
wherein the input feature information includes
an input feature vector in which a feature having a zero value is represented by 0 and a feature having a non-zero value is represented by 1, and
the weight information includes
a weight vector in which a weight having a zero value is represented by 0 and a weight having a non-zero value is represented by 1.

The neural network processor of claim 1,
wherein, when the determined input features and weights are a first input feature and a first weight, and a second input feature and a second weight,
the data operation circuit is configured to:
read, in a current cycle, the first input feature and the first weight from the input feature map and the weight map and perform a convolution operation; and
read, in a next cycle, the second input feature and the second weight from the input feature map and the weight map and perform a convolution operation.
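
As an illustration only, the selection recited in the preceding processor claims can be sketched in Python. The sketch assumes that the unspecified "operation" on the two bit vectors is an element-wise AND, and that the features and weights of one convolution window have been flattened so that position i of each list forms a pair. The function names (`nonzero_bits`, `select_pairs`, `sparse_mac`) are hypothetical and do not appear in the claims.

```python
from typing import List, Sequence, Tuple


def nonzero_bits(values: Sequence[int]) -> List[int]:
    """Bit vector: 1 where the value is non-zero, 0 where it is zero."""
    return [1 if v != 0 else 0 for v in values]


def select_pairs(features: Sequence[int], weights: Sequence[int]) -> List[int]:
    """Indices where both the feature and the weight are non-zero.

    Assumption: the fetch controller's operation is a bitwise AND of the
    input feature vector and the weight vector.
    """
    f_bits = nonzero_bits(features)
    w_bits = nonzero_bits(weights)
    return [i for i, (f, w) in enumerate(zip(f_bits, w_bits)) if f & w]


def sparse_mac(features: Sequence[int], weights: Sequence[int]) -> Tuple[int, int]:
    """Multiply-accumulate over the selected pairs only.

    One selected pair is read per modelled cycle, so the second return
    value is the number of cycles spent instead of len(features).
    """
    acc = 0
    cycles = 0
    for i in select_pairs(features, weights):
        acc += features[i] * weights[i]  # read one pair, then multiply-accumulate
        cycles += 1
    return acc, cycles
```

With five aligned positions of which only two have a non-zero feature and a non-zero weight, `sparse_mac` spends two modelled cycles rather than five, which is the reduction in operation count that the abstract refers to.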

A method of operating a neural network processor, the method comprising:
obtaining input feature information indicating whether each of a plurality of input features of an input feature map has a non-zero value and weight information indicating whether each of a plurality of weights of a weight map has a non-zero value;
determining, based on the obtained input feature information and weight information, an input feature and a weight to be convolved among the plurality of input features and the plurality of weights; and
performing a convolution operation on the determined weight and input feature to generate an output feature map.

The method of claim 6,
wherein the generating comprises
selectively performing a convolution operation only on the determined weight and input feature among the plurality of input features and the plurality of weights.

The method of claim 6,
wherein the determining comprises
performing an operation on the input feature information and the weight information to detect input features and weights having non-zero values, and
the generating comprises
performing a convolution operation on the detected input features and weights.

The method of claim 6,
wherein the input feature information includes
an input feature vector in which a feature having a zero value is represented by 0 and a feature having a non-zero value is represented by 1, and
the weight information includes
a weight vector in which a weight having a zero value is represented by 0 and a weight having a non-zero value is represented by 1.

The method of claim 6,
wherein, when the determined input features and weights are a first input feature and a first weight, and a second input feature and a second weight,
the generating comprises:
reading, in a current cycle, the first input feature and the first weight from the input feature map and the weight map and performing a convolution operation; and
reading, in a next cycle, the second input feature and the second weight from the input feature map and the weight map and performing a convolution operation.

A neural network device comprising:
a processor array including a plurality of neural network processors;
a memory configured to store an input feature map and a plurality of weight maps; and
a controller configured to assign the input feature map and the plurality of weight maps to the processor array,
wherein the controller is configured to,
based on a ratio of weights having a non-zero value in each weight map, group the plurality of weight maps into a plurality of weight groups and assign each of the plurality of weight groups to the processor array.

The neural network device of claim 11,
wherein the controller is configured to
group the plurality of weight maps into the plurality of weight groups so that the ratios of the weight maps included in each weight group are equalized.
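
The grouping in the preceding claim can be read in more than one way. The Python sketch below takes one possible reading, namely that weight maps with similar non-zero ratios are placed in the same weight group. It sorts the maps by ratio and cuts the sorted order into fixed-size groups. The fixed `group_size` parameter, the flattened list representation of a weight map, and the function names are all assumptions made for illustration.

```python
from typing import List, Sequence


def nonzero_ratio(weight_map: Sequence[float]) -> float:
    """Fraction of weights in a (flattened) weight map that are non-zero."""
    if not weight_map:
        return 0.0
    return sum(1 for w in weight_map if w != 0) / len(weight_map)


def group_by_ratio(
    weight_maps: Sequence[Sequence[float]], group_size: int
) -> List[List[int]]:
    """Group weight-map indices so each group holds maps of similar density.

    Assumed strategy: sort by non-zero ratio, then take consecutive runs
    of `group_size` maps. Returns lists of indices into `weight_maps`.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    order = sorted(
        range(len(weight_maps)), key=lambda i: nonzero_ratio(weight_maps[i])
    )
    return [order[s:s + group_size] for s in range(0, len(order), group_size)]
```

Under this reading, processors that work on the same weight group have roughly the same number of non-zero weights to process, so they would be expected to finish at about the same time.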

The neural network device of claim 11,
wherein the controller is configured to
group the plurality of neural network processors into a plurality of processor groups and sequentially assign each of the plurality of weight groups to each of the plurality of processor groups.
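
As a minimal sketch of the sequential assignment in the preceding claim, the Python function below hands out weight groups to processor groups in round-robin order. Round-robin is only one assumed interpretation of "sequentially", and the function name `assign_weight_groups` is hypothetical.

```python
from typing import List


def assign_weight_groups(
    num_weight_groups: int, num_processor_groups: int
) -> List[List[int]]:
    """Return, for each processor group, the weight-group indices it receives.

    Assumption: weight groups are handed out in order, wrapping around to
    the first processor group once every processor group has received one.
    """
    if num_processor_groups <= 0:
        raise ValueError("num_processor_groups must be positive")
    schedule: List[List[int]] = [[] for _ in range(num_processor_groups)]
    for g in range(num_weight_groups):
        schedule[g % num_processor_groups].append(g)
    return schedule
```

For five weight groups and two processor groups, the first processor group would receive weight groups 0, 2, and 4, and the second would receive 1 and 3.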

The neural network device of claim 11,
wherein the controller is configured to
provide, to the processor array, input feature information indicating whether each of the input features of the input feature map has a non-zero value and weight information indicating whether each of the weights of the plurality of weight maps has a non-zero value, and
the processor array is configured to
perform a convolution operation on the input feature map and the plurality of weight maps based on the input feature information and the weight information to generate an output feature map.

A computer-readable recording medium storing a program for causing a computer to execute the method according to any one of claims 6 to 10.