KR102157441B1

KR102157441B1 - Learning method for neural network using relevance propagation and service providing apparatus

Info

Publication number: KR102157441B1
Application number: KR1020190064861A
Authority: KR
Inventors: 강제원; 유채화; 김나영
Original assignee: 이화여자대학교 산학협력단
Priority date: 2019-05-31
Filing date: 2019-05-31
Publication date: 2020-09-17

Abstract

A neural network training method using relevance propagation comprises the steps of: determining, by a training device, a first loss function by performing a forward propagation process on a neural network using training data; calculating, by the training device, a loss function for each layer based on the feature map of each layer and backward relevance information obtained in a backward relevance propagation process on the neural network; determining, by the training device, a final loss function by summing the first loss function and a second loss function generated by summing the per-layer loss functions; and updating, by the training device, the weights of each layer while performing a backward propagation process using the final loss function.

Description

Neural network training method and service providing apparatus using relevance propagation {LEARNING METHOD FOR NEURAL NETWORK USING RELEVANCE PROPAGATION AND SERVICE PROVIDING APPARATUS}

The technology described below relates to a technique for training an artificial neural network model.

Artificial neural networks are being applied in various industries on the strength of their high accuracy. However, an artificial neural network is a black-box model whose operation is complex, and it does not provide information about why the network made a specific prediction. It is difficult to know by what criteria the network makes decisions internally. Because of this, neural networks cannot easily be used in fields such as medicine, autonomous driving, and the military, where the basis for a network's decision is essential. Research on interpreting neural networks is under way to overcome this limitation.

Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, Wojciech Samek, "On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation", PLoS ONE, 10(7): e0130140, 2015

The technology described below aims to provide a new neural network training method that applies a technique for interpreting artificial neural networks. Specifically, it aims to provide a neural network training method that uses the result of layer-wise relevance propagation.

In the neural network training method using relevance propagation, a training device determines a first loss function by performing a forward propagation process on a neural network using training data; the training device calculates a loss function for each layer based on the feature map of each layer and backward relevance information obtained in a backward relevance propagation process on the neural network; the training device determines a final loss function by summing the first loss function and a second loss function generated by summing the per-layer loss functions; and the training device updates the weights of each layer while performing a backward propagation process using the final loss function.

A service device that uses a neural network trained using relevance propagation includes an input device that receives input data, a storage device that stores a neural network model trained using relevance propagation, and a computing device that inputs the input data into the neural network model and generates specific service information using the result output by the model. The neural network model is trained based on a final loss function generated using the per-layer loss functions produced in the backward relevance propagation process.

The technology described below trains a neural network by taking into account how the decision information in the inner layers affects the network's inference. The weights are therefore updated so that performance on the given problem improves while the locations at which the feature maps activate become specific. Furthermore, the technology is general-purpose and applicable to general deep learning models.

FIG. 1 is an example of a DNN forward propagation process.
FIG. 2 is an example of a DNN backpropagation process.
FIG. 3 is an example of a layer-wise relevance propagation process.
FIG. 4 is an example of a DNN training process.
FIG. 5 is an example of a flowchart of a DNN training process.
FIG. 6 is another example of a flowchart of a DNN training process.
FIG. 7 is an example of a service device including a DNN.

The technology described below may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the technology to a specific embodiment, and it should be understood to cover all modifications, equivalents, and substitutes included in the spirit and technical scope of the technology described below.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another. For example, without departing from the scope of rights of the technology described below, a first component may be named a second component, and similarly a second component may be named a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of the plurality of related listed items.

As used in this specification, singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "includes" mean that the stated features, numbers, steps, operations, components, parts, or combinations thereof are present, and should not be understood to exclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before the drawings are described in detail, it should be made clear that the division of components in this specification is merely a division by the main function each component is responsible for. That is, two or more components described below may be combined into one component, or one component may be divided into two or more components by more finely subdivided functions. In addition to its own main function, each component described below may additionally perform some or all of the functions of other components, and some of the main functions of each component may be performed exclusively by another component.

In addition, when a method or an operating method is performed, the steps of the method may occur in an order different from the stated order unless a specific order is clearly described in context. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The technology described below relates to a neural network training method.

A neural network, or artificial neural network, is a statistical learning algorithm that mimics biological neural networks. Various neural network models are being studied, and deep learning networks (DNNs) have recently attracted attention.

A DNN is an artificial neural network model with several hidden layers between an input layer and an output layer. Like a general artificial neural network, a DNN can model complex non-linear relationships. Various types of DNN models have been studied, for example the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Generative Adversarial Network (GAN), and Relation Networks (RL).

Training methods for neural networks, including DNNs, are also being studied. The technology described below can be applied to training techniques based on backward propagation (backpropagation). Weight adjustment by backpropagation is used in most neural network models. The technology is therefore not limited to a specific neural network model and is applicable to various neural network models that are trained by backpropagation.

The neural network training process includes forward propagation and backward propagation. The forward direction is the direction from the input layer to the output layer. Backward propagation is also called backpropagation for short.

The neural network training process is first described using a DNN as the reference.

FIG. 1 is an example of a DNN forward propagation process. A neural network is composed of a plurality of layers, typically an input layer, a plurality of hidden layers, and an output layer. At the start of training, the weights of each layer are initialized to random values.

In a neural network, each layer determines a feature map by performing an operation on its input data and its weights. The feature map generated by one layer is used as the input to the next layer. For example, the feature map output from the first layer (input layer) in FIG. 1 is obtained by applying f, the function that the layer performs, to the input and the weight of that layer. The input layer and the hidden layers repeat this process, and finally the output layer calculates a cost function based on the final result. The output layer is also called the decision layer.
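The layer-by-layer forward pass described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the fully connected layer with ReLU standing in for f, the function names dense and forward, and the toy weights are all assumptions.

```python
def dense(x, w):
    """Hypothetical layer function f: a fully connected layer followed by ReLU."""
    return [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in w]

def forward(x, weights):
    """Return the feature map of every layer; each map is the input of the next layer."""
    feature_maps = []
    a = x
    for w in weights:
        a = dense(a, w)
        feature_maps.append(a)
    return feature_maps

# Toy weights (assumed values): layer 1 maps 2 inputs to 2 units, layer 2 maps 2 units to 1 output.
weights = [
    [[1.0, -1.0], [0.5, 0.5]],
    [[1.0, 1.0]],
]
maps = forward([2.0, 1.0], weights)
```

Keeping every intermediate feature map, rather than only the final output, matters later because the per-layer losses compare these maps with the propagated relevance.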

The cost function can be selected based on factors such as the type of learning (supervised learning, unsupervised learning, reinforcement learning, and so on), the type of neural network model, and the activation function. For example, when supervised learning is performed on a multi-class classification problem, the activation function and the cost function are generally the softmax function and the cross-entropy function, respectively. The definition of each function is shown in FIG. 1.

The output layer outputs d_1, d_2, ..., d_K, ..., d_N. Applying the activation function to d_1, d_2, ..., d_K, ..., d_N gives p_1, p_2, ..., p_K, ..., p_N. Assume that p_K is the largest of the output-layer values after the activation function is applied. Then p_K is the prediction made by the neural network. The cost function calculates the loss L between the prediction made by the neural network and the actually expected result (label).
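As a small worked example of the softmax and cross-entropy pairing mentioned above, the sketch below turns output-layer values d into probabilities p, picks the largest p_K as the prediction, and computes the loss against a label. The input values and the function names are illustrative assumptions.

```python
import math

def softmax(d):
    """Softmax over the output-layer values, shifted by the maximum for numerical stability."""
    m = max(d)
    exps = [math.exp(v - m) for v in d]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, label):
    """Cross-entropy loss for a single example whose true class index is `label`."""
    return -math.log(p[label])

d = [2.0, 1.0, 0.1]                                   # assumed output-layer values
p = softmax(d)
prediction = max(range(len(p)), key=lambda k: p[k])   # index K of the largest p_K
loss = cross_entropy(p, label=0)
```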

FIG. 2 is an example of a DNN backpropagation process.

A neural network (DNN) is a structure in which many nonlinear activation functions are intricately entangled across several layers. Optimizing the weights of the layers is therefore a very difficult non-convex problem.

The weights of a typical neural network are adjusted (updated) by stochastic gradient descent. With the stochastic gradient descent rule shown in FIG. 2, the weights are updated so that they converge to suitable values. The weight of each layer is updated according to how much that weight influenced the cost function determined at the last layer through forward propagation.

Because of the network's complex structure, the influence of an intermediate layer's weights cannot be calculated directly, so it is calculated using the chain rule. FIG. 2 illustrates the backpropagation process. In FIG. 2, w_i is the weight of the i-th layer and t+1 is the training step at which the weight is updated.

When the loss is calculated at the output layer (for example, layer n), the amount of weight change with respect to this loss, that is, the gradient, is calculated. The gradient for the layer immediately before the output layer (layer n-1) is obtained by multiplying in a chain the gradient of the loss obtained at the output layer (layer n) and the gradient of layer n with respect to the weights of layer n-1. The weights of each layer are updated according to this gradient, and the gradient is passed on to the preceding layer in turn. That is, the weight update proceeds from the output layer toward the input layer. This weight update process is called backpropagation. FIG. 2 shows the backpropagation process and the equations needed for the calculation.
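The chain-rule computation and the gradient-descent update can be illustrated on a deliberately tiny network. The sketch below assumes a scalar two-layer network with a ReLU hidden unit, a squared-error loss, and a learning rate named lr; none of these choices come from FIG. 2, they only make the chain of products visible.

```python
def forward(x, w1, w2):
    h = max(0.0, w1 * x)   # hidden layer (ReLU)
    y = w2 * h             # output layer
    return h, y

def loss_fn(y, target):
    return 0.5 * (y - target) ** 2

def backward(x, w1, w2, target):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    h, y = forward(x, w1, w2)
    dL_dy = y - target                    # gradient at the output
    dL_dw2 = dL_dy * h                    # output-layer weight
    dL_dh = dL_dy * w2                    # passed back to the previous layer
    dh_dz = 1.0 if w1 * x > 0 else 0.0    # ReLU derivative
    dL_dw1 = dL_dh * dh_dz * x            # previous-layer weight
    return dL_dw1, dL_dw2

def sgd_step(x, w1, w2, target, lr=0.1):
    """One update w(t+1) = w(t) - lr * gradient for both weights."""
    g1, g2 = backward(x, w1, w2, target)
    return w1 - lr * g1, w2 - lr * g2

x, target = 1.0, 1.0
w1, w2 = 0.5, 0.5
before = loss_fn(forward(x, w1, w2)[1], target)
w1, w2 = sgd_step(x, w1, w2, target)
after = loss_fn(forward(x, w1, w2)[1], target)
```

In this toy setting the loss after one step is smaller than before, which is the behavior the update rule is meant to produce.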

FIG. 3 is an example of a layer-wise relevance propagation (LRP) process.

A heatmap is a graphic technique that represents the values of an array in two dimensions using color. Assigning different colors according to the magnitude of the values makes the distribution of values in the array visible. In the field of artificial intelligence, a heatmap can express the degree to which each pixel of an input image contributes to the decision of a neural network. A representative method for generating heatmaps from a neural network is the layer-wise relevance propagation (LRP) algorithm. LRP defines the signal passed backward through the network as relevance and proceeds on the assumption that this relevance is conserved, keeping the same total value, as it passes through each layer.

The relevance of the output layer is calculated from the result of the forward propagation itself. As in backpropagation, relevance is passed on starting from the last layer. While the total relevance is preserved, it is propagated based on the weighted sum of the learned weights of the layer it passes through and the input feature map of that layer, as in the equation of FIG. 3. In the equation of FIG. 3, c_i is the number of kernels of the i-th layer, the function f is the same as the operation function of that layer, and w_i is the weight of the i-th layer. This process reveals how much each pixel of the input feature map of each layer contributed to the decision of the network. Once the relevance has passed through to the input layer, a heatmap is obtained that has the same size as the input image and expresses the contribution of each input pixel to the decision of the network.
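To make the conservation property concrete, the sketch below redistributes relevance through one fully connected layer in proportion to each input's share of the weighted sum. It uses the basic LRP redistribution rule with a small stabilizer eps, which is an assumption here; the exact equation of FIG. 3 may differ, and the function name and toy values are illustrative.

```python
def lrp_dense(a, w, relevance_out, eps=1e-9):
    """Redistribute relevance_out (one value per output unit) onto the inputs a.

    w[k][j] connects input j to output k. Each input receives the share
    a[j] * w[k][j] / sum_j(a[j] * w[k][j]) of the relevance of output k.
    """
    relevance_in = [0.0] * len(a)
    for k, r_k in enumerate(relevance_out):
        z = [w[k][j] * a[j] for j in range(len(a))]
        z_sum = sum(z)
        denom = z_sum + (eps if z_sum >= 0 else -eps)   # avoid division by zero
        for j in range(len(a)):
            relevance_in[j] += z[j] / denom * r_k
    return relevance_in

a = [1.0, 2.0]                     # input feature map of the layer
w = [[1.0, 0.5], [0.0, 1.0]]       # learned weights (assumed values)
relevance_out = [2.0, 2.0]         # here equal to the layer's own outputs
relevance_in = lrp_dense(a, w, relevance_out)
```

The total relevance entering the layer (4.0) equals the total leaving it, up to the stabilizer, and the second input receives the larger share because it contributes more to both outputs.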

The following describes a neural network training process that applies the heatmap-generating algorithm described above. FIG. 4 is an example of a DNN training process. The training process shown in FIG. 4 reflects, in determining the cost function, loss information about the feature maps of the intermediate layers together with the conventional loss information. The training process consists of three main stages: forward propagation, relevance propagation, and backward propagation. That is, unlike conventional techniques, relevance propagation is added, and its result is used to determine the final loss information.

In FIG. 4, the feature map output by a layer during forward propagation is denoted a⁺, and because the LRP calculation proceeds in the backward direction of the network, the calculated relevance is denoted a⁻.

(i) First, forward propagation is performed to calculate the loss L_T between the network's decision on the problem it handles and the expected result.

(ii) Next, LRP is performed to calculate the relevance, which indicates the degree to which each per-layer feature map influences the decision of the network.

Taking the i-th layer of the neural network as the reference, the loss calculated at the i-th layer is the L1 loss between two quantities: the feature map generated by the (i-1)-th layer, which is the input feature map of the i-th layer, and the relevance of the layer, to which the decision made by the network after forward propagation has been propagated.

Here the input feature map of the very first layer is defined as the actual input of the neural network. For a neural network with K layers, this loss is calculated for every layer, and the sum L of the resulting K losses and L_T is the final loss of the neural network.
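The combination of the per-layer L1 losses with L_T can be sketched as below. The mean reduction inside l1_loss, the function names, and the toy maps are assumptions; the only things taken from the text are that each layer compares its input feature map with a relevance map of the same size, that the first input map is the network input, and that the K layer losses are added to L_T (optionally averaged over layers).

```python
def l1_loss(feature_map, relevance):
    """Mean absolute difference between two equally sized maps (reduction is an assumption)."""
    assert len(feature_map) == len(relevance)
    return sum(abs(a - r) for a, r in zip(feature_map, relevance)) / len(feature_map)

def final_loss(task_loss, input_maps, relevances, average=False):
    """Add the per-layer L1 losses (summed, or averaged over layers) to the task loss L_T."""
    layer_losses = [l1_loss(a, r) for a, r in zip(input_maps, relevances)]
    extra = sum(layer_losses)
    if average:
        extra /= len(layer_losses)
    return task_loss + extra, layer_losses

# Toy two-layer example: input_maps[0] is the network input itself.
input_maps = [[1.0, 2.0], [0.5, 1.5]]
relevances = [[1.0, 3.0], [0.5, 0.5]]
total, layer_losses = final_loss(0.4, input_maps, relevances)
```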

(iii) When the LRP process is complete, backpropagation is performed based on the final loss L, and training proceeds by updating the weights of the neural network.

The process described with FIG. 4 is defined with equations below, using a CNN as the reference.

In the forward process, the set of forward activation maps for the i-th layer is defined as in Equation 1 below. C_i is the channel size of the i-th layer, each channel has its own convolutional kernel, and f is the function of the layer.

In the backward process, the set of backward relevances for the i-th layer is defined as in Equation 2 below.

At this point the kernel is renormalized by a ratio of two quantities. As shown in FIG. 4, each i-th layer has an inward activation map and a corresponding relevance.

The average of the new per-layer loss functions, each of which uses forward information and backward information, can be defined as in Equation 3 below. This term is loss information that uses the relationship between the forward feature maps and the backward relevance of each layer. K is the number of layers in the neural network, and the l1 loss function computes the absolute difference between its two input terms.

The total loss function can be defined as in Equation 4 below. It includes the loss function calculated in the forward process of the neural network, that is, the cross-entropy loss between the network output and the actual label value. This is the value denoted L_T in FIG. 4.
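One way to write the averaged per-layer term and the total loss consistently with the surrounding description is sketched below. The symbols a⁺ and a⁻ follow FIG. 4, and L_T and K follow the text. The name L_R for the averaged term, the index placement (the input feature map of layer i paired with the relevance map of the same size), and the unweighted sum in the second line are assumptions rather than the patent's own notation.

```latex
\begin{aligned}
L_R &= \frac{1}{K}\sum_{i=1}^{K} \ell_1\!\left(a^{+}_{i-1},\, a^{-}_{i-1}\right), \qquad a^{+}_{0} = x,\\
L &= L_T + L_R .
\end{aligned}
```

Here x is the network input and \ell_1 is the absolute-difference loss between its two arguments.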

Through this process, the loss function is newly defined. The process described above can therefore be regarded as a regularization technique using relevance propagation.

The neural network can now be trained using the final loss function L.

FIG. 5 is an example of a flowchart of the DNN training process (100). The neural network training process is performed by a computing device or computer device capable of processing input data. Because the computer device performs the neural network training, it may also be called a training device. For convenience, the following description assumes that a computer device performs the neural network training.

The computer device receives training data (110). The training data may be image data or the like. Initially, the weights of each layer may be set to random values.

The computer device performs the forward propagation process using the training data (120). Through the forward process, each layer generates a feature map and the output layer outputs a decision on the specific problem. The output layer then calculates the loss L_T by comparing the actual label value with the output decision.

The computer device performs the layer-wise relevance propagation (LRP) described above (130). In this process, the L1 loss function between the forward feature map and the backward relevance information is calculated at each layer. The computer device then determines the final loss function L by adding L_T to the sum of the L1 loss functions of all layers (140). Alternatively, as described for Equation 3, the computer device may determine the final loss function L by adding L_T to the average of the L1 loss functions of all layers.

The computer device may update the weights of each layer by performing backward propagation based on the final loss function L (150).
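The order of steps 120 to 150 can be sketched as a single function that receives the model-specific pieces as arguments. Everything here is hypothetical scaffolding: the callables forward, lrp, l1, and backprop stand in for whatever framework is used, and the stub values below only exercise the control flow, with the averaged variant of the per-layer term chosen for the example.

```python
def train_step(x, label, forward, lrp, l1, backprop):
    """One training iteration in the order of FIG. 5. All four callables are hypothetical."""
    maps, l_t = forward(x, label)              # 120: per-layer feature maps and L_T
    relevances = lrp(maps)                     # 130: one relevance map per layer input
    layer_inputs = [x] + maps[:-1]             # the input feature map of each layer
    layer_losses = [l1(a, r) for a, r in zip(layer_inputs, relevances)]
    loss = l_t + sum(layer_losses) / len(layer_losses)   # 140: final loss L
    backprop(loss)                             # 150: weight update from L
    return loss

# Stub callables with fixed return values, used only to show the flow.
updates = []
loss = train_step(
    x=[3.0],
    label=0,
    forward=lambda x, label: ([[1.0], [2.0]], 0.5),
    lrp=lambda maps: [[0.0], [1.0]],
    l1=lambda a, r: sum(abs(p - q) for p, q in zip(a, r)),
    backprop=updates.append,
)
```

In a real setting the backprop callable would differentiate the final loss with respect to the weights; that requires the relevance computation to be part of the differentiable graph, which this sketch does not attempt.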

FIG. 6 is another example of a flowchart of the DNN training process (200).

If the relevance calculation is performed early in training, relevance information is computed from low-accuracy decision information. In that case inaccurate information is reflected in the weight updates, and there is a risk that the weights of the neural network will not be optimized. Therefore, training may first be performed according to a conventional neural network training method, and once the accuracy of the network's decisions has reached a certain level, the weights may be updated according to the method described above. FIG. 6 is an example of such a process.

The computer device receives training data (210). The training data may be image data or the like. Initially, the weights of each layer may be set to random values.

The computer device performs the forward propagation process using the training data (220). Through the forward process, each layer generates a feature map and the output layer outputs a decision on the specific problem. The output layer then calculates the loss L_T1 by comparing the actual label value with the output decision. The computer device may update the weights of each layer by performing backward propagation based on the loss function L_T1 (230).

The computer device determines whether the training accuracy, measured from the difference between the actual label values and the current result of the network's output layer, has reached a predetermined reference value (240). When the training accuracy is below the reference value (NO at 240), the computer device may repeat the forward propagation and the backward propagation that uses its result.

When the training accuracy is greater than or equal to the reference value (YES at 240), the computer device may perform layer-wise relevance propagation (250). Assume that the loss function determined by the output layer at this point is L_Tk. In this process, the computer device calculates, at each layer, the L1 loss function between the forward feature map and the backward relevance information.

The computer device determines the final loss function L by adding L_Tk to the sum of the L1 loss functions of all layers (250). Alternatively, as described for Equation 3, the computer device may determine the final loss function L by adding L_Tk to the average of the L1 loss functions of all layers.

The computer device may update the weights of each layer by performing backpropagation based on the final loss function L (260).
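The control flow of steps 220 to 260 can be sketched as follows. This is an illustrative skeleton under stated assumptions: the four callables (`forward`, `backprop`, `accuracy`, `relevance_term`) are hypothetical stand-ins for the actual network operations, the cap `max_initial_steps` is not part of the source, and a single relevance-based update is shown because the text does not state how many such updates are performed.

```python
from typing import Callable, List


def two_phase_training(
    forward: Callable[[], float],         # returns the output-layer loss (hypothetical)
    backprop: Callable[[float], None],    # updates weights from a loss (hypothetical)
    accuracy: Callable[[], float],        # current learning accuracy (hypothetical)
    relevance_term: Callable[[], float],  # combined per-layer L1 term (hypothetical)
    reference: float,
    max_initial_steps: int = 1000,        # safety cap, an assumption of this sketch
) -> List[str]:
    """Run initial training, then one update with the relevance-based loss."""
    log: List[str] = []
    for _ in range(max_initial_steps):
        task_loss = forward()            # step 220: forward propagation
        backprop(task_loss)              # step 230: update with L_T1
        log.append("initial")
        if accuracy() >= reference:      # step 240: accuracy check
            break
    else:
        return log                       # reference never reached within the cap
    task_loss = forward()                # loss L_Tk at this point
    total = task_loss + relevance_term() # step 250: final loss L
    backprop(total)                      # step 260: update with L
    log.append("relevance")
    return log
```

The returned log only records which phase each update belonged to, which makes the flow easy to inspect in a toy setting.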

FIG. 7 is an example of a service device 300 that includes a DNN. The service device 300 is a device that provides a given application service using a neural network model prepared by the neural network training method described above. The service device 300 may be the computer device that trained the neural network, or it may be a device separate from that computer device.

The service device 300 includes a storage device 310, a memory 320, a computing device 330, an interface device 340, and a communication device 350.

The storage device 310 stores programs or code for the operation of the service device 300. The storage device 310 stores the neural network model prepared by the training method described above. The storage device 310 may also store programs or code for the neural network training process described above.

The memory 320 may temporarily store data and information generated during the operation of the service device 300.

The interface device 340 receives commands and data from the outside. The interface device 340 may receive information from a physically connected input device or a physical interface (keypad, touch panel, etc.). The interface device 340 may receive a neural network model, information for training the neural network model, training data, and the like. The interface device 340 may also receive input data to be fed into the neural network model.

The communication device 350 transmits and receives information over a wireless network. The communication device 350 may receive a neural network model, information for training the neural network model, training data, and the like. The communication device 350 may receive input data to be fed into the neural network model. The communication device 350 may transmit a service result (determination result) obtained with the neural network model to an external object.

The interface device 340 and the communication device 350 may receive information and data from a user or an external object. The interface device 340 and the communication device 350 may therefore be referred to collectively as an input device.

The computing device 330 controls the operation of the service device 300 using the programs or code stored in the storage device 310. The computing device 330 may perform a determination process using the trained neural network model. The computing device 330 may input the input data into the trained neural network model to generate a determination result. For example, the computing device 330 may generate an object detection result for an image, a determination result for a specific input, a control command for a specific input, and the like.

Furthermore, the computing device 330 may update the neural network model by feeding back the computed result. In doing so, the computing device 330 may update the weights of the neural network model through the training process described above. The computing device 330 may be a device that processes data and performs computations, such as a processor, an AP, or a chip with an embedded program.
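The division of roles between the storage device 310, the input device, and the computing device 330 can be sketched as below. The class `ServiceDevice`, its `serve` method, and the choice of returning the highest-scoring output as the determination result are all assumptions made for illustration. The source does not prescribe any particular interface or output format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# A trained model is represented here as a plain callable (an assumption).
Model = Callable[[Sequence[float]], Sequence[float]]


@dataclass
class ServiceDevice:
    """Toy stand-in for service device 300."""

    model: Model  # plays the role of the model held in storage device 310

    def serve(self, input_data: Sequence[float]) -> Dict[str, float]:
        """Feed input data to the model and report the top-scoring output."""
        scores = self.model(input_data)  # role of computing device 330
        if len(scores) == 0:
            raise ValueError("model returned no scores")
        best = max(range(len(scores)), key=lambda i: scores[i])
        return {"label": best, "score": scores[best]}
```

The input passed to `serve` corresponds to data received through the interface device 340 or the communication device 350, and the returned dictionary corresponds to the service result that may be sent to an external object.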

In addition, the DNN or neural network training method described above may be implemented as a program (or application) that includes an executable algorithm that can be run on a computer. The program may be stored and provided on a non-transitory computer-readable medium.

A non-transitory readable medium is a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only briefly, such as a register, cache, or memory. Specifically, the applications or programs described above may be stored and provided on a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB, memory card, or ROM.

This embodiment and the drawings attached to this specification merely illustrate part of the technical idea included in the technology described above. It is evident that all modifications and specific embodiments that a person skilled in the art could readily infer within the scope of the technical idea included in the specification and drawings fall within the scope of rights of the technology described above.

Claims

A neural network learning method using relevance propagation, comprising:
determining, by a learning device, a first loss function by performing a forward propagation process on a neural network using training data;
calculating, by the learning device, a per-layer loss function based on the feature map of each layer and backward relevance information obtained in a backward relevance propagation process for the neural network;
determining, by the learning device, a final loss function by summing the first loss function and a second loss function generated by summing the per-layer loss functions; and
updating, by the learning device, the weights of each layer while performing a backpropagation process using the final loss function.

The method of claim 1,
further comprising initially training, by the learning device, the neural network by performing a forward propagation process and a backpropagation process using the training data,
wherein the calculating of the per-layer loss function is performed when the determination result of the initially trained neural network has an accuracy higher than a reference value.

The method of claim 1,
wherein the per-layer loss function of the i-th layer of the neural network is a loss function between the feature map generated in the (i-1)-th layer and the backward relevance in the i-th layer.

The method of claim 3,
wherein the backward relevance in the i-th layer is determined by the backward activation map below.

[Equation image not reproduced in the extracted text]

(The first symbol lost with the equation image denotes the backward relevance of the i-th layer, C_i is the channel size of the i-th layer, w is the weight, and the second lost symbol denotes the forward activation map of the i-th layer.)

The method of claim 1,
wherein the second loss function is the average of the per-layer loss functions over all layers.

A computer-readable recording medium on which a program for executing, on a computer, the neural network learning method using relevance propagation according to any one of claims 1 to 5 is recorded.

A service device using a neural network trained using relevance propagation, comprising:
an input device that receives input data;
a storage device that stores a neural network model trained using relevance propagation; and
a computing device that inputs the input data into the neural network model and generates specific service information using the result output from the neural network model,
wherein the neural network model is trained based on a final loss function generated using per-layer loss functions generated in a backward relevance propagation process.

The service device of claim 7,
wherein the final loss function is the sum of a first loss function, determined by performing a forward propagation process on the neural network model based on training data, and a second loss function, generated using per-layer loss functions determined based on the feature map of each layer and backward relevance information in a backward relevance propagation process for the neural network model.

The service device of claim 7,
wherein the per-layer loss function of the i-th layer of the neural network is a loss function between the feature map generated in the (i-1)-th layer and the backward relevance in the i-th layer.

The service device of claim 9,
wherein the backward relevance in the i-th layer is determined by the backward activation map below.

[Equation image not reproduced in the extracted text]

(The surviving legend identifies one symbol of the equation as the forward activation map of the i-th layer.)