KR20200041154A

KR20200041154A - Method and Apparatus for Detecting Fault of Multi-Core in Multi-Layer Perceptron Structure with Dropout

Info

Publication number: KR20200041154A
Application number: KR1020180121247A
Authority: KR
Inventors: Sungho Kang (강성호); Dongsu Lee (이동수)
Original assignee: Industry-Academic Cooperation Foundation, Yonsei University (연세대학교 산학협력단)
Priority date: 2018-10-11
Filing date: 2018-10-11
Publication date: 2020-04-21
Also published as: KR102134339B1

Abstract

Embodiments of the present invention provide a computing system that, during the dropout operation used to mitigate overfitting, verifies whether an error has occurred in each core by comparing the original result computed on an active streaming processor (to which dropout is not applied) with the result computed on a deactivated streaming processor.

Description

Method and Apparatus for Detecting Fault of Multi-Core in Multi-Layer Perceptron Structure with Dropout

The technical field of this embodiment relates to a method and apparatus for detecting errors in the computing cores of a computing system in which multiple computing cores process data in parallel.

The contents of this section merely provide background information for this embodiment and do not constitute prior art.

An artificial neural network composed of many artificial neurons learns by repeatedly performing computations on large amounts of data and storing the resulting values in each neuron. Such large-scale, repetitive computation takes a long time to train on a CPU (Central Processing Unit), which has only a few computational units; to overcome this, a GPU (Graphics Processing Unit) with a large number of computational units is required.

As the effectiveness of training artificial neural networks on GPUs has been demonstrated, many types of artificial neural networks are now trained on GPUs. However, GPUs offer few means of ensuring the reliability of their computation results.

Currently, GPUs provide only correction mechanisms such as ECC (Error Correction Code), which checks for errors at the memory input/output stage. ECC corrects errors that occur when data are read from or written to memory, but it cannot detect or correct errors that arise during computation in the many computing cores.

Errors that occur during computation in individual cores have been checked using methods such as DMR (Dual Modular Redundancy). However, DMR doubles resource consumption; when a neural network is trained with limited resources, this delays training and is therefore inefficient.

Korean Patent Publication No. 10-2016-0099587 (2016.08.22.)

The main object of embodiments of the present invention is to verify, during the dropout operation used to mitigate overfitting, whether an error has occurred in each core by comparing the original result computed on an active streaming processor to which dropout is not applied with the result computed on a streaming processor deactivated by dropout.

Other objects of the present invention not explicitly stated here may be further considered within the scope that can easily be inferred from the following detailed description and its effects.

According to one aspect of the present embodiment, there is provided a method for detecting an error in a computing core in a multi-layer perceptron structure, performed by a computing system in which multiple computing cores process data in parallel. The method comprises: when a graphics processing unit (GPU) performs dropout at a predetermined rate, outputting a first result value by performing an operation defined according to the multi-layer perceptron structure in an active computing-core block, among the multiple computing cores included in the GPU, to which the dropout is not applied, and outputting a second result value by performing an operation defined according to the multi-layer perceptron structure in an inactive computing-core block, among the multiple computing cores, to which the dropout is applied; and comparing, by a central processing unit (CPU) connected to the GPU, the first result value with the second result value to detect a faulty computing core among the computing cores included in the GPU.

According to another aspect of the present embodiment, there is provided a computing system in which multiple computing cores process data in parallel, comprising: a graphics processing unit that, when dropout is performed at a predetermined rate, outputs a first result value by performing an operation defined according to a multi-layer perceptron structure in an active computing-core block, among the multiple computing cores, to which the dropout is not applied, and outputs a second result value by performing an operation defined according to the multi-layer perceptron structure in an inactive computing-core block, among the multiple computing cores, to which the dropout is applied; and a central processing unit, connected to the graphics processing unit, that compares the first result value with the second result value to detect a faulty computing core among the computing cores included in the graphics processing unit.

As described above, embodiments of the present invention make it possible, during the dropout operation used to mitigate overfitting, to verify whether an error has occurred in each core by comparing the original result computed on an active streaming processor to which dropout is not applied with the result computed on a deactivated streaming processor.

Even if not explicitly mentioned here, the effects described in the following specification that are expected from the technical features of the present invention, and their potential effects, are treated as if described in the specification of the present invention.

FIG. 1 is a block diagram illustrating a computing system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a multi-layer perceptron structure applied to a computing system according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the operation of the multiple computing cores of a computing system according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method for detecting an error in a computing core according to another embodiment of the present invention.
FIG. 5 is a graph illustrating the error detection rate versus the dropout rate for embodiments of the present invention.

In describing the present invention below, detailed descriptions of related known functions that are obvious to those skilled in the art are omitted where they could unnecessarily obscure the subject matter of the present invention, and some embodiments of the present invention are described in detail with reference to the exemplary drawings.

FIG. 1 is a block diagram illustrating a computing system. During the dropout operation used to mitigate overfitting, the computing system verifies whether an error has occurred in each core by comparing the original result computed on an active streaming processor to which dropout is not applied with the result computed on a deactivated streaming processor. The computing system detects errors that may occur while an artificial neural network is trained on a GPU, can identify the causes of malfunctions and training-time delays that such errors may cause, and thereby enables highly reliable artificial neural networks to be built.

Referring to FIG. 1, the computing system 10 includes a central processing unit (CPU) 100 and a graphics processing unit (GPU) 200. The computing system 10 may omit some of the components illustrated by way of example in FIG. 1 or may further include other components. For example, the computing system 10 may further include a storage unit or a northbridge. The northbridge is an integrated circuit that connects and controls high-speed devices such as the CPU, memory, BIOS ROM, GPU, and southbridge over a bus. The southbridge controls the data flow of peripheral devices and manages power.

The central processing unit (CPU) 100 receives information from outside, stores it, interprets and executes the instructions of computer programs, and outputs the results. That is, the CPU controls the overall operation of the computer while exchanging information with the other computer components.

The graphics processing unit (GPU) 200 performs computations not only for computer graphics but also for general applications. By connecting programmable stages and high-precision arithmetic to the graphics pipeline, it can perform stream processing on data. The GPU executes one kernel at a time, in parallel, over many records in a stream. A stream is simply a collection of records that require similar computation, and streams make data parallelism possible. A kernel is a function applied to each element of a stream. The GPU contains a large number of cores and has its own internal memory. A plurality of GPUs may be connected to the CPU 100 or to the northbridge.
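The stream/kernel model described above can be illustrated with a minimal CPU-side Python sketch (not GPU code): a stream is modelled as a list of records and a kernel as a function applied independently to each record. The names `kernel` and `run_kernel` and the arithmetic inside `kernel` are hypothetical.

```python
from typing import Callable, List


def kernel(record: float) -> float:
    """Hypothetical kernel: the same computation applied to every record."""
    return 2.0 * record + 1.0


def run_kernel(k: Callable[[float], float], stream: List[float]) -> List[float]:
    # Records are independent of each other, so on a GPU each call could run
    # on its own thread; here they are simply evaluated one after another.
    return [k(r) for r in stream]


if __name__ == "__main__":
    print(run_kernel(kernel, [0.0, 1.0, 2.5]))  # [1.0, 3.0, 6.0]
```

Because no record depends on another, the order of evaluation does not affect the result, which is what makes the data parallelism possible.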

The GPU 200 includes a plurality of streaming multiprocessors (SMs). The GPU 200 may include a streaming multiprocessor controller that controls the operation of the SMs. Each SM has its own instruction scheduler and can execute multiple threads simultaneously. A streaming processor (SP) performs basic logical and arithmetic operations. A special function unit (SFU) is used for operations such as transcendental functions and pixel attribute interpolation, and also includes a floating-point multiplier. When multiple threads execute simultaneously on an SM, the same instruction is broadcast to a predetermined number of SPs and a predetermined number of SFUs; each unit (SP or SFU) executes the same instruction, but its registers and memory addresses are managed separately. Shared memory has a predetermined capacity and is used to exchange data between threads running on the same SM. The load/store (LD/ST) units execute read and write instructions.
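The instruction broadcast described above (one instruction issued to many units, each with its own registers) can be sketched as a toy Python model. `Lane`, `broadcast`, `fma`, and the register names `r0`, `r1`, `r2` are hypothetical, and a real SM schedules threads in hardware rather than looping over them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Lane:
    """One SP lane with its own private register file (hypothetical model)."""
    regs: Dict[str, float] = field(default_factory=dict)


def broadcast(lanes: List[Lane], instr: Callable[[Dict[str, float]], None]) -> None:
    # The scheduler issues a single instruction; every lane executes it
    # against its own registers.
    for lane in lanes:
        instr(lane.regs)


def fma(regs: Dict[str, float]) -> None:
    # r2 = r0 * r1 + r2: same operation in every lane, different operand values.
    regs["r2"] = regs["r0"] * regs["r1"] + regs["r2"]


if __name__ == "__main__":
    lanes = [Lane({"r0": float(i), "r1": 2.0, "r2": 1.0}) for i in range(4)]
    broadcast(lanes, fma)
    print([lane.regs["r2"] for lane in lanes])  # [1.0, 3.0, 5.0, 7.0]
```

Every lane runs the identical `fma`, yet each produces a different value because its registers differ; this is the property the dropout-based check relies on when two lanes are given identical operands.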

When the GPU 200 performs dropout at a predetermined rate, it outputs a first result value by performing the operation defined by the multi-layer perceptron structure in the active computing-core block, among the multiple computing cores, to which dropout is not applied. The GPU 200 also outputs a second result value by performing the operation defined by the multi-layer perceptron structure in the inactive computing-core block, among the multiple computing cores, to which dropout is applied.

The CPU 100 is connected to the GPU 200 and compares the first result value with the second result value to detect a faulty computing core among the computing cores included in the GPU.
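A minimal sketch of the CPU-side comparison, assuming each inactive core's result can be matched to the active core whose computation it repeated. The function name `compare_results`, the dictionary layout, and the optional tolerance `tol` are assumptions for illustration; the text only states that the two result values are compared.

```python
from typing import Dict, List, Tuple


def compare_results(first: Dict[int, float], second: Dict[int, float],
                    pairs: Dict[int, int], tol: float = 0.0) -> List[Tuple[int, int]]:
    """Return (active_core, inactive_core) pairs whose results disagree.

    first:  first result value of each active core, keyed by core id
    second: second result value of each inactive core, keyed by core id
    pairs:  inactive core id -> active core id whose computation it repeated
    tol:    allowed absolute difference (0.0 means an exact match is required)
    """
    mismatches = []
    for inactive, active in pairs.items():
        if abs(first[active] - second[inactive]) > tol:
            mismatches.append((active, inactive))
    return mismatches


if __name__ == "__main__":
    # Core 3 repeated core 2's computation but produced a different value.
    print(compare_results({0: 1.5, 2: 0.25}, {1: 1.5, 3: 0.75}, {1: 0, 3: 2}))  # [(2, 3)]
```

A mismatch implicates both cores of the pair; the text does not detail how the single faulty core within a pair is singled out.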

FIG. 2 is a diagram illustrating a multi-layer perceptron structure applied to a computing system.

Using the structure of a multi-layer perceptron (MLP), the computing system compares the result derived from a deactivated core with the result derived from the original computing core.

In the multi-layer perceptron structure, the weighted computation performed before a value is input to a neuron's activation function is expressed as Equation 1.

Here, y is the output value, x is the input value, w is the weight, and n is the number of neurons.
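Equation 1 itself does not survive in this text. From the variable definitions above and the description that follows (inputs multiplied by weights and summed before the activation function), it is presumably the weighted sum below; the summation index i over the n neurons is an assumption.

```latex
y = \sum_{i=1}^{n} w_i x_i
```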

In a multi-layer perceptron, the input values fed to the n artificial neurons are multiplied by the weights of the corresponding neurons and summed, and the result is passed to the activation function, which adjusts the value before it is output.

The multi-layer perceptron structure includes nodes in one or more layers connected as a network. The layers include an input layer, a hidden layer, and an output layer; each layer has weights and learns them. Each layer may include parameters. A layer's parameters include a set of learnable filters. The parameters include weights and/or biases between nodes. A plurality of parameters are learned, and some parameters may be shared. The layers of the neural network, the initial weight values, the biases applied to nodes, the activation function used at each node, the learning rate, the loss function that measures the total error, the algorithm that minimizes the network's error, and the algorithm that regularizes overfitting may each be set to suitable values and functions depending on the design being implemented.
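A minimal forward-pass sketch of the structure described above: each layer computes the weighted sum of Equation 1 plus a bias and applies an activation function. The layer sizes, weights, biases, and the choice of ReLU for the hidden layer are illustrative assumptions; the text leaves them design-dependent.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: float) -> float:
    return max(0.0, v)


def dense(x: Vector, w: Matrix, b: Vector, act: Callable[[float], float]) -> Vector:
    # Output neuron j computes act(sum_i w[j][i] * x[i] + b[j]).
    return [act(sum(wji * xi for wji, xi in zip(row, x)) + bj) for row, bj in zip(w, b)]


def mlp_forward(x: Vector, layers: List[tuple]) -> Vector:
    # layers: (weights, biases, activation) for each hidden and output layer;
    # the input layer is the vector x itself.
    for w, b, act in layers:
        x = dense(x, w, b, act)
    return x


if __name__ == "__main__":
    hidden = ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu)
    output = ([[1.0, 2.0]], [0.5], lambda v: v)  # identity activation at the output
    print(mlp_forward([2.0, 1.0], [hidden, output]))  # [4.5]
```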

When the output values of the artificial neurons are computed for each layer of an artificial neural network and the weights are adjusted through backpropagation, overfitting can occur: accuracy during training is high, but accuracy in actual use decreases. The dropout technique was proposed to solve this problem; an example of an operation in which dropout is applied to arbitrary neurons is expressed as Equation 2.

Equation 2 represents the case in which dropout is applied with a probability of 50% to an artificial neural network whose hidden layer consists of four neurons. Expanding Equation 2 yields Equation 3; by reducing the computational load during training, dropout improves training speed and prevents overfitting. The dropout rate may be set in the range 0.1 to 0.9, where the maximum is 1, and when the dropout rate is 0.5 or higher, the error detection rate is 100%.
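Equations 2 and 3 are not reproduced in this text. As a hedged illustration of the setting they describe (a four-neuron hidden layer with a 50% dropout rate), the sketch below builds a dropout mask and zeroes the outputs of the dropped neurons. For simplicity it drops exactly round(rate * n) neurons, whereas standard dropout drops each neuron independently with the given probability, and it omits the usual rescaling of the surviving activations. `dropout_mask` and `apply_dropout` are hypothetical names.

```python
import random
from typing import List, Optional


def dropout_mask(n: int, rate: float, rng: Optional[random.Random] = None) -> List[int]:
    """Return a 0/1 mask of length n in which exactly round(rate * n) entries are 0."""
    rng = rng or random.Random()
    dropped = set(rng.sample(range(n), round(rate * n)))
    return [0 if i in dropped else 1 for i in range(n)]


def apply_dropout(h: List[float], mask: List[int]) -> List[float]:
    # A dropped neuron contributes 0 instead of its computed output.
    return [hi * mi for hi, mi in zip(h, mask)]


if __name__ == "__main__":
    mask = dropout_mask(4, 0.5, random.Random(0))
    print(mask.count(0))  # 2 of the 4 hidden neurons are dropped
    print(apply_dropout([0.8, 0.1, 0.5, 0.3], [1, 0, 1, 0]))  # [0.8, 0.0, 0.5, 0.0]
```

The zeroed slots are exactly the idle computing capacity that the method reuses for checking.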

Instead of the zero computation, the computing system has each dropped neuron perform the same computation as an adjacent neuron and compares the two values, thereby verifying errors that may occur when computations are executed on the GPU's computing cores. In other words, in the multi-layer perceptron structure, a node that dropout has set not to perform computation instead performs the same computation as an adjacent node.
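The text states that a dropped node repeats the computation of an adjacent node but does not give the pairing rule. The sketch below assumes a simple greedy rule: each dropped neuron takes the nearest-by-index active neuron that no other dropped neuron has claimed yet. With the mask [1, 0, 1, 0] this yields the 0-based pairing {1: 0, 3: 2}, that is, the second neuron repeats the first and the fourth repeats the third. `pair_dropped_with_active` is a hypothetical name.

```python
from typing import Dict, List


def pair_dropped_with_active(mask: List[int]) -> Dict[int, int]:
    """Map each dropped neuron (mask 0) to an active neuron (mask 1) whose work it repeats.

    Greedy assumption: the nearest unclaimed active neuron by index, ties going
    to the lower index. Dropped neurons left over once every active neuron is
    claimed stay idle.
    """
    active = [i for i, m in enumerate(mask) if m == 1]
    claimed = set()
    pairs: Dict[int, int] = {}
    for d in (i for i, m in enumerate(mask) if m == 0):
        free = [a for a in active if a not in claimed]
        if not free:
            break
        a = min(free, key=lambda j: (abs(j - d), j))
        claimed.add(a)
        pairs[d] = a
    return pairs


if __name__ == "__main__":
    print(pair_dropped_with_active([1, 0, 1, 0]))  # {1: 0, 3: 2}
    print(pair_dropped_with_active([1, 0, 0, 0]))  # {1: 0}
```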

The computation that the computing system performs using the neurons to which dropout has been applied is expressed as Equation 4.

Expanding Equation 4 yields Equation 5, in which the first and second terms of the matrix are identical, as are the third and fourth terms. In the graphics processing units, for the streaming processors (SPs) that share control logic and an instruction cache within a streaming multiprocessor (SM), the corresponding number of nodes in the multi-layer perceptron structure is designated as a block, and the threads generated in this way each perform their computation on a streaming processor.

To map Equation 5 onto the GPU's streaming processors, a number of neurons equal to the number of SPs in an SM group is designated as a block, so that the generated threads can perform the computation on the respective SPs. Whether an error has occurred in each core is then verified by comparing the original result computed on an SP to which dropout is not applied with the result computed on an SP deactivated by dropout.
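A hedged sketch of the block mapping described above, assuming one thread per neuron, a block size equal to the number of SPs per SM, and thread `lane` of a block running on SP `lane`. Real GPUs do not guarantee such a fixed thread-to-SP placement, so this mapping is an assumption. `assign_neurons_to_sps` and `suspect_sps` are hypothetical helpers that translate mismatching neuron pairs into suspect (block, SP) locations.

```python
from typing import Dict, List, Tuple


def assign_neurons_to_sps(n_neurons: int, sps_per_sm: int) -> Dict[int, Tuple[int, int]]:
    """Map neuron i to (block, lane) with block = i // sps_per_sm, lane = i % sps_per_sm."""
    return {i: (i // sps_per_sm, i % sps_per_sm) for i in range(n_neurons)}


def suspect_sps(mismatches: List[Tuple[int, int]],
                mapping: Dict[int, Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Turn mismatching (active, inactive) neuron pairs into sorted suspect (block, SP) locations."""
    return sorted({mapping[n] for pair in mismatches for n in pair})


if __name__ == "__main__":
    mapping = assign_neurons_to_sps(6, 4)
    print(mapping)  # {0: (0, 0), 1: (0, 1), 2: (0, 2), 3: (0, 3), 4: (1, 0), 5: (1, 1)}
    print(suspect_sps([(2, 3)], mapping))  # [(0, 2), (0, 3)]
```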

An example of performing the computation of Equation 5 on the GPU is shown in FIG. 3.

The streaming processor to which dropout is not applied outputs the first result value, and the streaming processor to which dropout is applied outputs the second result value. The computing system compares the first result value with the second result value to detect a faulty computing core among the computing cores included in the GPU.

Although the components of the computing system are shown separately in FIG. 1, a plurality of components may be combined and implemented as at least one module. The components are connected to a communication path that links the software or hardware modules inside the device, and they operate in coordination with one another. These components communicate over one or more communication buses or signal lines.

The computing system may be implemented in logic circuits by hardware, firmware, software, or a combination thereof, or may be implemented using a general-purpose or special-purpose computer. The device may be implemented using a hardwired device, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or the like. The device may also be implemented as a system on chip (SoC) including one or more processors and controllers.

The computing system may be installed, in the form of software, hardware, or a combination thereof, on a computing device equipped with hardware elements. Such a computing device may be any of various devices that include, in whole or in part, a communication device such as a modem for communicating with other devices or with wired/wireless networks, a memory that stores data for executing programs, and a microprocessor that executes programs to perform computations and issue commands.

FIG. 4 is a flowchart illustrating a method for detecting an error in a computing core according to another embodiment of the present invention. The method may be performed by the computing system. Descriptions that duplicate the detailed description of the operations performed by the computing system are omitted.

In step S410, when the graphics processing unit (GPU) performs dropout at a predetermined rate, the active computing-core block, among the multiple computing cores included in the GPU, to which dropout is not applied performs the operation defined by the multi-layer perceptron structure and outputs a first result value, and the inactive computing-core block to which dropout is applied performs the operation defined by the multi-layer perceptron structure and outputs a second result value. That is, the streaming processor to which dropout is not applied outputs the first result value, and the streaming processor to which dropout is applied outputs the second result value.

In step S420, the central processing unit (CPU) connected to the GPU compares the first result value with the second result value to detect a faulty computing core among the computing cores included in the GPU.
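Steps S410 and S420 can be simulated end to end in plain Python with a fault injected into one simulated core. Everything here is an assumption for illustration: one neuron per core, the pairing {1: 0, 3: 2} for the four-neuron, 50% case (the second neuron repeats the first, the fourth repeats the third), a fault modelled as adding 1.0 to a core's output, and the helper names `run_core`, `s410`, and `s420`.

```python
from typing import Dict, List, Optional, Tuple


def weighted_sum(x: List[float], w_row: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w_row, x))


def run_core(core: int, x: List[float], w_row: List[float], faulty_core: Optional[int]) -> float:
    # Simulated computing core; the designated faulty core corrupts its result.
    y = weighted_sum(x, w_row)
    return y + 1.0 if core == faulty_core else y


def s410(x: List[float], w: List[List[float]], mask: List[int], pairs: Dict[int, int],
         faulty_core: Optional[int] = None) -> Tuple[Dict[int, float], Dict[int, float]]:
    # Active cores compute their own neuron (first result values); each dropped
    # core recomputes its partner's neuron with the same inputs and weights
    # (second result values).
    first = {a: run_core(a, x, w[a], faulty_core) for a, m in enumerate(mask) if m == 1}
    second = {d: run_core(d, x, w[a], faulty_core) for d, a in pairs.items()}
    return first, second


def s420(first: Dict[int, float], second: Dict[int, float], pairs: Dict[int, int]) -> List[int]:
    # CPU side: any disagreeing pair makes both of its cores suspects.
    return sorted({c for d, a in pairs.items() if first[a] != second[d] for c in (a, d)})


if __name__ == "__main__":
    x = [1.0, 2.0]
    w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    mask, pairs = [1, 0, 1, 0], {1: 0, 3: 2}
    print(s420(*s410(x, w, mask, pairs, faulty_core=2), pairs))  # [2, 3]
    print(s420(*s410(x, w, mask, pairs), pairs))  # []
```

Because a mismatch only shows that the two cores of a pair disagree, `s420` reports both as suspects.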

FIG. 5 is a graph showing the error detection rate when a multi-layer perceptron is trained on a GPU with MNIST, a representative data set, using different dropout rates. When the dropout rate is 50% or higher, the deactivated streaming processors can cover all of the active streaming processors, so the error detection rate is 100%.
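The 50% threshold follows from simple counting, sketched below under the assumption that exactly round(rate * n) neurons are dropped and that each dropped neuron can duplicate at most one distinct active neuron. The covered fraction of active neurons is then min(dropped, active) / active, which reaches 1.0 once dropped >= active, i.e. at a rate of 0.5 or more. This idealized count is not the measured MNIST curve of FIG. 5; `coverage` is a hypothetical helper.

```python
def coverage(n: int, rate: float) -> float:
    """Fraction of active neurons whose computation is repeated by some dropped neuron."""
    dropped = round(rate * n)
    active = n - dropped
    if active == 0:
        return 1.0  # nothing left to check
    return min(dropped, active) / active


if __name__ == "__main__":
    for rate in (0.2, 0.3, 0.5, 0.7):
        print(rate, round(coverage(10, rate), 3))
```

With n = 10 this gives 0.25, 0.429, 1.0, and 1.0 for the four rates, consistent with the statement that detection is complete from a 0.5 dropout rate upward.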

According to the present embodiments, errors occurring in multi-layer perceptron computations on a GPU can be identified, and the causes of non-ideal execution times during training can be determined. This makes it possible to ensure the reliability of neural-network training results and to design artificial neural networks more accurately.

Although FIG. 4 describes the steps as being executed sequentially, this is merely illustrative. Those skilled in the art may apply various modifications and variations, such as changing the order described in FIG. 4, executing one or more steps in parallel, or adding other steps, without departing from the essential characteristics of the embodiments of the present invention.

The operations according to the present embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. A computer-readable medium is any medium that participates in providing instructions to a processor for execution. It may include program instructions, data files, data structures, or combinations thereof; examples include magnetic media, optical recording media, and memory. The computer program may be distributed over networked computer systems so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present embodiments can easily be inferred by programmers in the technical field to which the embodiments belong.

The present embodiments are intended to illustrate the technical spirit of this embodiment, and the scope of that technical spirit is not limited by them. The scope of protection of this embodiment should be construed according to the claims below, and all technical ideas within their equivalent scope should be construed as falling within the scope of rights of this embodiment.

10: computing system
100: central processing unit
200: graphics processing unit

Claims

1. A method for detecting an error in a computing core in a multi-layer perceptron structure, performed by a computing system in which multiple computing cores process data in parallel, the method comprising:
when a graphics processing unit (GPU) performs dropout at a predetermined rate, outputting a first result value by performing an operation defined according to the multi-layer perceptron structure in an active computing-core block, among the multiple computing cores included in the GPU, to which the dropout is not applied, and outputting a second result value by performing an operation defined according to the multi-layer perceptron structure in an inactive computing-core block, among the multiple computing cores, to which the dropout is applied; and
comparing, by a central processing unit (CPU) connected to the GPU, the first result value with the second result value to detect a faulty computing core among the computing cores included in the GPU.

2. The method of claim 1, wherein the dropout rate is set within a range of 0.1 to 0.9, where the maximum value is 1, and when the dropout rate is 0.5 or higher, the error detection rate is 100%.

3. The method of claim 1, wherein the multi-layer perceptron structure includes nodes in a plurality of layers connected as a network, the plurality of layers including an input layer, a hidden layer, and an output layer, each layer having weights and learning the weights.

4. The method of claim 1, wherein, in the multi-layer perceptron structure, a node set by the dropout not to perform an operation performs the same operation as an adjacent node, and
for streaming processors (SPs) that share control logic and an instruction cache within a streaming multiprocessor (SM) of the GPU, threads generated by designating the corresponding number of nodes in the multi-layer perceptron structure as a block perform operations on the respective streaming processors.

5. The method of claim 4, wherein the streaming processor to which the dropout is not applied outputs the first result value, and
the streaming processor to which the dropout is applied outputs the second result value.

6. A computing system in which multiple computing cores process data in parallel, comprising:
a graphics processing unit that, when dropout is performed at a predetermined rate, outputs a first result value by performing an operation defined according to a multi-layer perceptron structure in an active computing-core block, among the multiple computing cores, to which the dropout is not applied, and outputs a second result value by performing an operation defined according to the multi-layer perceptron structure in an inactive computing-core block, among the multiple computing cores, to which the dropout is applied; and
a central processing unit, connected to the graphics processing unit, that compares the first result value with the second result value to detect a faulty computing core among the computing cores included in the graphics processing unit.

7. The computing system of claim 6, wherein the dropout rate is set within a range of 0.1 to 0.9, where the maximum value is 1, and when the dropout rate is 0.5 or higher, the error detection rate is 100%.

8. The computing system of claim 6, wherein the multi-layer perceptron structure includes nodes in a plurality of layers connected as a network, the plurality of layers including an input layer, a hidden layer, and an output layer, each layer having weights and learning the weights.

9. The computing system of claim 6, wherein, in the multi-layer perceptron structure, a node set by the dropout not to perform an operation performs the same operation as an adjacent node, and
for streaming processors (SPs) that share control logic and an instruction cache within a streaming multiprocessor (SM) of the plurality of graphics processing units, threads generated by designating the corresponding number of nodes in the multi-layer perceptron structure as a block perform operations on the respective streaming processors.

10. The computing system of claim 9, wherein the streaming processor to which the dropout is not applied outputs the first result value, and
the streaming processor to which the dropout is applied outputs the second result value.