KR102277644B1

KR102277644B1 - Conv-xp pruning apparatus of convolutional neural network suitable for an acceleration circuit

Info

Publication number: KR102277644B1
Application number: KR1020190012914A
Authority: KR
Inventors: 강형주; 우용근
Original assignee: Korea University of Technology and Education Industry-University Cooperation Foundation (한국기술교육대학교 산학협력단)
Priority date: 2019-01-31
Filing date: 2019-01-31
Publication date: 2021-07-14
Also published as: KR20200095163A

Abstract

A Conv-XP pruning device of a convolutional neural network suitable for an acceleration circuit is disclosed. In an arithmetic circuit of a convolutional neural network, the device includes a mux 100 that selects an X array or a + array from input data 10; a multiplier 200 that multiplies the output of the mux 100 by a weight of the neural network; and an adder 300 that adds the outputs of the multiplier 200. Reducing the number of multipliers 200 makes it possible to build a high-speed circuit and reduces the area of the acceleration circuit within the arithmetic circuit.

Description

Conv-XP Pruning Apparatus for Convolutional Neural Networks Suitable for Acceleration Circuits {CONV-XP PRUNING APPARATUS OF CONVOLUTIONAL NEURAL NETWORK SUITABLE FOR AN ACCELERATION CIRCUIT}

The present invention relates to a Conv-XP pruning device of a convolutional neural network suitable for an acceleration circuit, and more particularly, to a Conv-XP pruning device that reduces the number of multipliers in the acceleration circuit.

In a convolutional neural network, input data are multiplied by weights in multipliers, and the products are summed by an adder to produce the output. The arithmetic circuit of a convolutional neural network therefore consists of multipliers that multiply the input data by the weights and an adder that sums the products. Examples of such arithmetic circuits are described below.

FIG. 1 is an exemplary diagram illustrating an arithmetic circuit of a conventional convolutional neural network.

In this conventional arithmetic circuit, nine multipliers multiply 3×3 input data by 3×3 weights, and an adder sums the products. Because multipliers are complex, having nine of them increases the area of the entire arithmetic circuit.
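As a point of reference, the conventional datapath of FIG. 1 can be modeled behaviorally in a few lines of Python. This is a sketch, not RTL; the function name `dense_pe` and the row-major flattening of the 3×3 window and kernel are assumptions made for illustration.

```python
from typing import Sequence


def dense_pe(window: Sequence[int], kernel: Sequence[int]) -> int:
    """Behavioral model of the conventional PE of FIG. 1 (hypothetical name).

    window, kernel: 3x3 input data and 3x3 weights, flattened row-major.
    """
    if len(window) != 9 or len(kernel) != 9:
        raise ValueError("expected 9 input data and 9 weights")
    # Nine multipliers operate in parallel, one per kernel position,
    # even for positions whose weight happens to be zero.
    products = [a * w for a, w in zip(window, kernel)]
    # A single adder (tree) sums the nine products.
    return sum(products)


if __name__ == "__main__":
    window = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    kernel = [0, 1, 0, 1, -4, 1, 0, 1, 0]
    print(dense_pe(window, kernel))
```

Every kernel position owns a multiplier regardless of its weight value, which is the area cost the paragraph above describes.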

FIG. 2 is an exemplary diagram illustrating another example of an arithmetic circuit of a conventional convolutional neural network.

This conventional arithmetic circuit has five (9-to-1) muxes that multiplex the 3×3 input data, five multipliers that multiply the five mux outputs by five weights, and one adder. Although it needs only five multipliers, it also needs five muxes, which increases the complexity of the arithmetic circuit.
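The conventional sparse circuit of FIG. 2 can be sketched the same way: each of the five lanes stores a weight plus a 9-to-1 mux index that picks its operand from the window. The helper names `compress` and `sparse_pe`, the row-major indexing, and padding unused lanes with zero weights are assumptions for this illustration only.

```python
from typing import List, Sequence, Tuple

LANES = 5  # multipliers in the conventional sparse PE of FIG. 2


def compress(kernel: Sequence[int], lanes: int = LANES) -> Tuple[List[int], List[int]]:
    """Pack the non-zero weights of a flattened 3x3 kernel into `lanes` slots.

    Returns (weights, indices); indices are the 9-to-1 mux selects (0..8).
    """
    nonzero = [(i, w) for i, w in enumerate(kernel) if w != 0]
    if len(nonzero) > lanes:
        # Unstructured pruning puts no bound on the non-zero count.
        raise ValueError(f"{len(nonzero)} non-zero weights exceed {lanes} multipliers")
    nonzero += [(0, 0)] * (lanes - len(nonzero))  # idle lanes multiply by 0
    return [w for _, w in nonzero], [i for i, _ in nonzero]


def sparse_pe(window: Sequence[int], weights: Sequence[int], indices: Sequence[int]) -> int:
    """Each lane: a 9-to-1 mux picks window[idx], a multiplier applies the weight."""
    total = 0
    for w, idx in zip(weights, indices):
        if not 0 <= idx < 9:
            raise ValueError("mux select out of range")
        total += window[idx] * w
    return total


if __name__ == "__main__":
    window = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    kernel = [0, 2, 0, 0, 3, 0, 1, 0, 0]
    print(sparse_pe(window, *compress(kernel)))
```

The index list is the per-lane select information that a real accelerator would have to store next to the weights.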

Korean Patent Publication No. 10-2018-0123846, "Reconfigurable Computational Accelerator with a Logical Three-Dimensional Structure for Convolutional Neural Networks"

An object of the present invention, made to solve the above problems, is to provide a Conv-XP pruning device of a convolutional neural network suitable for an acceleration circuit that reduces the number of multipliers.

To achieve the above object, the present invention provides, in an arithmetic circuit of a convolutional neural network, a mux 100 that selects an X array or a + array from input data 10; a multiplier 200 that multiplies the output of the mux 100 by a weight of the neural network; and an adder 300 that adds the outputs of the multiplier 200.

The input data 10 form a 3×3 array. The X array consists of the input data 10 on the two diagonals and their intersection, and the + array consists of the input data 10 on the cross arms and the cross point. There are four 2-input, 1-output muxes 100, and there are five multipliers 200 and five weights.

The mux 100 switches its output for the input data 10 according to the values of the diagonal input data 10 or the cross input data 10. The mux 100 may switch its output according to the level of the sums of the diagonal input data 10 and the cross input data 10. In addition, the output of the convolutional neural network for the input data 10 may be fed back to train the weights; in this case, the weights may differ depending on whether the X array or the + array of the input data 10 is selected.

The invention also provides a convolutional neural network in which the Conv-XP pruning device is used to build an acceleration circuit serving as the arithmetic circuit.

With the Conv-XP pruning device of a convolutional neural network suitable for an acceleration circuit according to the present invention, the number of multipliers 200 can be reduced, making it possible to build a high-speed circuit.

It also reduces the area of the acceleration circuit within the arithmetic circuit.

FIG. 1 is an exemplary diagram illustrating an arithmetic circuit of a conventional convolutional neural network.
FIG. 2 is an exemplary diagram illustrating another example of an arithmetic circuit of a conventional convolutional neural network.
FIG. 3 is an exemplary diagram illustrating the X array and the + array.
FIG. 4 is an exemplary diagram illustrating an arithmetic circuit of a convolutional neural network according to the present invention.

Terms used in this specification may be interpreted more broadly than their original meanings, and words with different meanings may be combined into a single compound term with a new, broader meaning. For example, "convolution" and "neural network" combine to form "convolutional neural network". The embodiments presented in this specification are preferred embodiments; in some cases, other components may be added or existing components omitted. An embodiment may be implemented as including or having a given component.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 3 is an exemplary diagram illustrating the X array and the + array.

In the 3×3 input data 10, the X array has non-zero input data 10 along the diagonals, and the + array has non-zero input data 10 along the cross. In other words, an X array or a + array is formed by the non-zero input data 10 lying on the diagonals or on the cross. The 3×3 input data 10 specified as an X array or a + array are fed to the arithmetic circuit of the convolutional neural network, which adapts its multiplication and addition operations to the X array or the + array.
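The two patterns of FIG. 3 can be written down as position sets over a row-major 3×3 window. A small sketch, assuming that indexing convention; the names `X_POSITIONS`, `PLUS_POSITIONS`, `pattern_mask`, and `render` are illustrative and do not come from the patent.

```python
# Row-major positions of a 3x3 window:
#   0 1 2
#   3 4 5
#   6 7 8
X_POSITIONS = (0, 2, 4, 6, 8)     # both diagonals and their intersection
PLUS_POSITIONS = (1, 3, 4, 5, 7)  # cross arms and the cross point


def pattern_mask(positions):
    """Return a 3x3 0/1 mask with 1 at the kept (non-zero) positions."""
    return [[1 if 3 * r + c in positions else 0 for c in range(3)] for r in range(3)]


def render(mask):
    """Draw kept cells as '#' and pruned cells as '.' (gray/white in FIG. 3)."""
    return "\n".join(" ".join("#" if v else "." for v in row) for row in mask)


if __name__ == "__main__":
    print(render(pattern_mask(X_POSITIONS)))
    print()
    print(render(pattern_mask(PLUS_POSITIONS)))
```

Running it draws the X pattern (corners and center kept) and the + pattern (edge midpoints and center kept). Both keep exactly five positions, and the center is the only position they share.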

Turning the 3×3 input data 10 into an X array or a + array is a pruning technique. Pruning keeps only a specific arrangement non-zero and excludes the zero values from computation, which is essential for accelerating the arithmetic circuit of a convolutional neural network. Pruning therefore allows the convolutional neural network to be implemented as a high-speed circuit.

The Conv-XP pruning device prunes some of the nine coefficients in one kernel, which is the unit processed simultaneously in FIGS. 1 and 2. In conventional pruning, neither the number of coefficients to be pruned nor which coefficients become zero is fixed. The Conv-XP pruning device, in contrast, prunes using only the two patterns shown in FIG. 3. In FIG. 3, white cells denote coefficients pruned to zero, and gray cells denote coefficients that remain non-zero after pruning. Because the two patterns resemble the letter X and the symbol +, the technique is called Conv-XP pruning.

Unlike conventional pruning, the Conv-XP pruning device prunes the convolutional neural network so that it suits the acceleration circuit. For every 3×3 kernel processed simultaneously in the acceleration circuit, it always prunes exactly four coefficients to zero. This guarantees that five multipliers can always process a 3×3 kernel in one cycle. Moreover, because only the X array and the + array are used, each multiplier can receive at most two different input data as its operand. FIG. 4 shows an arithmetic circuit of a convolutional neural network that exploits these properties.
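The "at most two operands per multiplier" property can be checked by pairing the five X positions with the five + positions, one pair per multiplier lane. The pairing below is hypothetical (FIG. 4 is not reproduced here); what matters is that the shared center position gets a lane of its own.

```python
# Row-major 3x3 positions: 0 1 2 / 3 4 5 / 6 7 8
X_POSITIONS = (0, 2, 4, 6, 8)
PLUS_POSITIONS = (1, 3, 4, 5, 7)

# Hypothetical lane wiring: multiplier lane k takes X_POSITIONS[k] when the
# kernel uses the X pattern and PLUS_POSITIONS[k] when it uses the + pattern.
# Keeping the shared center (position 4) on its own lane is what lets four
# muxes serve five multipliers.


def lane_candidates():
    """Window positions that can reach each multiplier lane."""
    return [sorted({x, p}) for x, p in zip(X_POSITIONS, PLUS_POSITIONS)]


def muxes_needed():
    """A lane needs a 2-to-1 mux only if it has two distinct candidates."""
    return sum(1 for cands in lane_candidates() if len(cands) == 2)


if __name__ == "__main__":
    for k, cands in enumerate(lane_candidates()):
        print(f"multiplier {k}: candidates {cands}")
    print("multipliers:", len(lane_candidates()), "muxes:", muxes_needed())
```

With this pairing, four lanes each choose between two positions and the center lane is hard-wired, matching the four 2-input muxes and five multipliers stated for the device.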

The Conv-XP pruning device is applied to each 3×3 kernel. For each kernel, it computes the sum of the absolute values of the coefficients in the X array and in the + array, selects the array with the larger sum, and sets the coefficients outside that array to zero. After all 3×3 kernels have been pruned, the network is retrained while the zeroed coefficients are held at zero.
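The per-kernel rule and the masked retraining step can be sketched in plain Python. This is a minimal sketch under stated assumptions: kernels are flattened row-major, ties between the two sums go to X, and retraining is shown as one plain SGD step. The text specifies neither the tie rule nor the optimizer; the default learning rate of 0.001 is the retraining setting reported with Table 1. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

X_POSITIONS = (0, 2, 4, 6, 8)     # row-major 3x3: corners + center
PLUS_POSITIONS = (1, 3, 4, 5, 7)  # row-major 3x3: edge midpoints + center


def prune_kernel(kernel: Sequence[float]) -> Tuple[List[float], List[int], int]:
    """Conv-XP pruning of one flattened 3x3 kernel.

    Returns (pruned kernel, 0/1 keep mask, select bit: 0 = X, 1 = +).
    """
    x_score = sum(abs(kernel[i]) for i in X_POSITIONS)
    plus_score = sum(abs(kernel[i]) for i in PLUS_POSITIONS)
    select = 1 if plus_score > x_score else 0  # tie -> X (assumption)
    keep = PLUS_POSITIONS if select else X_POSITIONS
    mask = [1 if i in keep else 0 for i in range(9)]
    return [w * m for w, m in zip(kernel, mask)], mask, select


def prune_layer(kernels):
    """Apply prune_kernel to every 3x3 kernel of a layer."""
    return [prune_kernel(k) for k in kernels]


def masked_sgd_step(kernel, grad, mask, lr=0.001):
    """One plain SGD update during retraining; pruned coefficients stay at 0."""
    return [(w - lr * g) * m for w, g, m in zip(kernel, grad, mask)]


if __name__ == "__main__":
    layer = [
        [1.0, 0.1, 2.0, 0.1, 5.0, 0.1, 3.0, 0.1, 4.0],  # X sum 15 > + sum 5.4
        [0.0, 9.0, 0.0, 9.0, 1.0, 9.0, 0.0, 9.0, 0.0],  # + sum 37 > X sum 1
    ]
    for pruned, mask, sel in prune_layer(layer):
        print("X" if sel == 0 else "+", pruned)
```

Besides the pruned weights, the only per-kernel metadata produced is the single select bit, which is what the hardware later uses to drive the shared muxes.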

FIG. 4 is an exemplary diagram illustrating an arithmetic circuit of a convolutional neural network according to the present invention.

In the arithmetic circuit of a convolutional neural network, the Conv-XP pruning device of a convolutional neural network suitable for an acceleration circuit includes a mux 100 that selects an X array or a + array from input data 10; a multiplier 200 that multiplies the output of the mux 100 by a weight of the neural network; and an adder 300 that adds the outputs of the multiplier 200.

The input data 10 form a 3×3 array. The X array consists of the input data 10 on the two diagonals and their intersection, and the + array consists of the input data 10 on the cross arms and the cross point. There are four 2-input, 1-output muxes 100, and there are five multipliers 200 and five weights.
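The structure just described (four 2-input muxes, five multipliers, one adder, one select bit per kernel) can be modeled behaviorally as below. The lane wiring is a hypothetical choice, since FIG. 4 is not reproduced here, and `mux2` and `conv_xp_pe` are illustrative names. The five weights are assumed to be stored in lane order.

```python
from typing import Sequence

# Row-major 3x3 positions: 0 1 2 / 3 4 5 / 6 7 8
X_POSITIONS = (0, 2, 4, 6, 8)
PLUS_POSITIONS = (1, 3, 4, 5, 7)
# Hypothetical wiring: lane k reads X_POSITIONS[k] or PLUS_POSITIONS[k].
# Lane 2 always reads the center, so only lanes 0, 1, 3 and 4 need a mux.


def mux2(a: int, b: int, sel: int) -> int:
    """2-input, 1-output mux 100: sel=0 passes a (X), sel=1 passes b (+)."""
    return b if sel else a


def conv_xp_pe(window: Sequence[int], weights: Sequence[int], sel: int) -> int:
    """Behavioral model of a FIG. 4-style PE: 4 muxes, 5 multipliers, 1 adder."""
    if len(window) != 9 or len(weights) != 5 or sel not in (0, 1):
        raise ValueError("expected 9 inputs, 5 weights and a 1-bit select")
    acc = 0
    for k, (xp, pp) in enumerate(zip(X_POSITIONS, PLUS_POSITIONS)):
        operand = window[xp] if xp == pp else mux2(window[xp], window[pp], sel)
        acc += operand * weights[k]  # multiplier 200 for lane k
    return acc  # adder 300


if __name__ == "__main__":
    window = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(conv_xp_pe(window, [1, 2, 3, 4, 5], sel=0))  # X pattern
    print(conv_xp_pe(window, [1, 2, 3, 4, 5], sel=1))  # + pattern
```

For a kernel pruned to either pattern, this five-multiplier datapath gives the same result as the nine-multiplier dense dot product over the pruned 3×3 kernel.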

The mux 100 switches its output for the input data 10 according to the values of the diagonal input data 10 or the cross input data 10. The mux 100 may switch its output according to the level of the sums of the diagonal input data 10 and the cross input data 10. In addition, the output of the convolutional neural network for the input data 10 may be fed back to train the weights; in this case, the weights may differ depending on whether the X array or the + array of the input data 10 is selected.

Using the Conv-XP pruning device, a convolutional neural network is obtained whose acceleration circuit serves as the arithmetic circuit.

Another advantage of the Conv-XP pruning device is that it reduces memory area. In conventional pruning, to select the input data to be multiplied by each non-zero coefficient, the select signal of the 9-to-1 MUX in front of each multiplier must be stored along with the coefficient. Each MUX needs a 4-bit select signal, so with five multipliers a total of 20 select bits are required.

With the Conv-XP pruning device, however, the choice is only between the X array and the + array, so a 2-to-1 mux in front of each multiplier suffices, and a 1-bit select signal is enough for it. Because the muxes in front of the five multipliers share the same select signal, the five non-zero coefficients share a single select bit. The select signal for five multipliers therefore shrinks from 20 bits to 1 bit, which reduces the size of the memory that stores the select signals.
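The select-bit arithmetic above can be reproduced directly: an n-to-1 mux needs ceil(log2 n) select bits, which for integers is `(n - 1).bit_length()`. The 64 × 64-kernel layer in the usage block is a hypothetical size chosen only to show how the per-kernel saving scales.

```python
def select_bits(num_inputs: int) -> int:
    """Width of the select signal of a num_inputs-to-1 mux (ceil(log2 n))."""
    return (num_inputs - 1).bit_length()


def index_bits_per_kernel(scheme: str, lanes: int = 5) -> int:
    """Stored select bits per 3x3 kernel for each pruning scheme."""
    if scheme == "conventional":
        return lanes * select_bits(9)  # one 9-to-1 select per multiplier
    if scheme == "conv-xp":
        return select_bits(2)          # one 2-to-1 select shared by all lanes
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    kernels = 64 * 64  # hypothetical layer: 64 input x 64 output channels
    for scheme in ("conventional", "conv-xp"):
        bits = index_bits_per_kernel(scheme)
        print(f"{scheme}: {bits} bit(s)/kernel, {bits * kernels} bits/layer")
```

For the hypothetical layer this prints 20 bits/kernel (81920 bits) for conventional pruning and 1 bit/kernel (4096 bits) for Conv-XP.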

Table 1
                VGG16     ResNet-50
baseline        88.44%    91.14%
Conv-XP         89.74%    91.30%

Table 1 compares the accuracy of VGG16 and ResNet-50 before applying the Conv-XP pruning device (‘baseline’) and after applying it and retraining (‘Conv-XP’). Retraining ran for 12 epochs with a learning rate of 0.001. The results in Table 1 show that applying the Conv-XP pruning device does not degrade the performance of the convolutional neural network.

To compare the reduction in acceleration-circuit complexity achieved with the Conv-XP pruning device, three circuits were designed at the register transfer level (RTL), synthesized, and compared. The first is the conventional structure of FIG. 1, which assumes an unpruned convolutional neural network; the second is the structure of FIG. 2, a sparse acceleration circuit for a convolutional neural network pruned with a conventional pruning technique; and the third is the structure of FIG. 4, a sparse acceleration circuit for a convolutional neural network pruned with the Conv-XP pruning device. In Table 2 these are labeled ‘No sparsity’, ‘Conventional sparsity’, and ‘Conv-XP’, respectively. Synthesis used a GlobalFoundries 65 nm process library. In Table 2, a processing element (PE) is a block of 9 or 5 multipliers 200 that processes one 3×3 kernel simultaneously, and 16 such PEs are assumed.

Table 2
Design                  16 PEs    Weight Buffer   Index Buffer   Activation MUX   Miscellaneous   Total
No sparsity             642826    507042          -              -                3199            1153067
Conventional sparsity   357640    281691          81895          25078            56              746360
Conv-XP                 357364    281691          5916           6157             47              651175

Compared with a general acceleration circuit that ignores pruning (‘No sparsity’ in Table 2), the sparse acceleration circuit for conventional pruning (‘Conventional sparsity’ in Table 2) reduces the number of multipliers 200 from nine to five and the number of required coefficients from nine to five, so the PE area and the weight (coefficient) buffer area decrease. However, it adds an Activation MUX block, which gathers the 9-to-1 MUXs shown in FIG. 2, and an Index Buffer that stores the MUX select signals.

Pruning the convolutional neural network with the Conv-XP pruning device shrinks the Activation MUX and the Index Buffer of the sparse acceleration circuit. Conventional pruning follows no fixed pattern, so it needs a 9-to-1 MUX that selects one of all nine input data 10; Conv-XP prunes to one of only two patterns, so a 2-to-1 MUX suffices. The Activation MUX block therefore shrinks to about 25% of its conventional size. The MUX select signal also narrows from 4 bits to 1 bit, and because the five coefficients share one select signal, the select storage per kernel falls from 20 bits to 1 bit. As a result, the Index Buffer shrinks to about 7.2% of its conventional size. Owing to these effects, the sparse acceleration circuit for the Conv-XP pruning device is about 12.8% smaller than the conventional sparse acceleration circuit.
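The percentages quoted above follow directly from Table 2. A quick arithmetic check; the dictionary keys are shorthand introduced here, the figures are copied from Table 2 (which does not state units), and its ‘-’ entries are taken as 0.

```python
TABLE2 = {
    "No sparsity": {"pe": 642826, "weight": 507042, "index": 0,
                    "mux": 0, "misc": 3199, "total": 1153067},
    "Conventional sparsity": {"pe": 357640, "weight": 281691, "index": 81895,
                              "mux": 25078, "misc": 56, "total": 746360},
    "Conv-XP": {"pe": 357364, "weight": 281691, "index": 5916,
                "mux": 6157, "misc": 47, "total": 651175},
}


def check_totals(table) -> bool:
    """Each row's components should add up to its Total column."""
    return all(
        sum(v for k, v in row.items() if k != "total") == row["total"]
        for row in table.values()
    )


conv, xp = TABLE2["Conventional sparsity"], TABLE2["Conv-XP"]
index_ratio = xp["index"] / conv["index"]      # Index Buffer, Conv-XP vs conventional
mux_ratio = xp["mux"] / conv["mux"]            # Activation MUX, Conv-XP vs conventional
area_saving = 1 - xp["total"] / conv["total"]  # total sparse-accelerator area saved

if __name__ == "__main__":
    print("rows consistent:", check_totals(TABLE2))
    print(f"Index Buffer: {index_ratio:.1%} of conventional")
    print(f"Activation MUX: {mux_ratio:.1%} of conventional")
    print(f"Total area saving: {area_saving:.1%}")
```

This yields 7.2% for the Index Buffer, 24.6% (the "about 25%" above) for the Activation MUX, and a 12.8% total area saving, and every row of Table 2 sums to its Total column.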

Although the present invention has been described with reference to preferred embodiments, those skilled in the art will understand that it can be variously modified and changed without departing from the spirit and scope of the invention as set forth in the claims below.

100: mux
200: multiplier
300: adder

Claims

A Conv-XP pruning device of a convolutional neural network suitable for an acceleration circuit, in an arithmetic circuit of a convolutional neural network, the device comprising:
a mux 100 that selects an X array or a + array from input data 10;
a multiplier 200 that multiplies the output of the mux 100 by a weight of the neural network; and
an adder 300 that adds the outputs of the multiplier 200,
wherein the input data 10 form a 3×3 array, the X array consists of the input data 10 on the diagonals and their intersection, the + array consists of the input data 10 on the cross arms and the cross point, there are four 2-input, 1-output muxes 100, and there are five multipliers 200 and five weights.

delete