KR100241359B1

KR100241359B1 - Adaptive learning rate and limited error signal

Info

Publication number: KR100241359B1
Application number: KR1019970024952A
Authority: KR
Inventors: 오상훈; 이경준; 이헌
Original assignee: 정선종; 한국전자통신연구원
Priority date: 1997-06-16
Filing date: 1997-06-16
Publication date: 2000-02-01
Also published as: KR19990001578A

Abstract

The present invention relates to a method of varying the learning rate and limiting the error-signal magnitude for training neural networks. Neural networks widely used for learning pattern recognition problems can be trained efficiently with a generalized cross-entropy error function, but the learning performance is then sensitive to the order of the error function. The invention addresses this by providing a variable learning rate and a method of limiting the magnitude of the error signal used for weight updates, so that learning performance with the generalized cross-entropy error function is no longer strongly affected by the order of the error function.

Description

Adaptive Learning Rate and Limited Error Signal

The present invention relates to a method of varying the learning rate and limiting the error-signal magnitude for training neural networks.

In general, training a neural network often takes a long time, or fails entirely for some patterns.

To solve this problem, an error function that generalizes the cross-entropy error function has been proposed.

However, the learning performance of a neural network trained with this generalized cross-entropy error function varies greatly with the order of the error function.

To solve this problem, the present invention aims to provide a variable learning rate and a method of limiting the magnitude of the error signal used for weight updates, so that the difficult task of choosing an appropriate order for the error function becomes unnecessary.

FIG. 1 is a structural diagram of the multilayer perceptron neural network of the present invention.

FIG. 2 illustrates the sigmoid activation function of the present invention.

FIG. 3 is a flowchart of general backpropagation learning of a multilayer perceptron according to the present invention.

FIG. 4 compares the error signal obtained with the conventional mean-square error and the error signal obtained with the generalized cross-entropy error.

FIG. 5 compares simulation results for a handwritten digit recognition problem using the conventional fixed learning rate.

FIG. 6 compares simulation results for a handwritten digit recognition problem using the variable learning rate and limited error signal of the present invention.

* Explanation of reference numerals for main parts of the drawings

100: synaptic weight
201: training pattern
202: forward computation
203: output-layer error signal calculation
204: error signal backpropagation
205: learning by weight change

To achieve the above object, the present invention varies the learning rate according to the ratio of the expected value of the squared difference between the target value and the output value to the expected value of the squared error signal, and, to prevent the weight change from becoming excessively large, clips the error signal of an output node when it exceeds a fixed fraction of the square root of the expected value of its square.

The present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a structural diagram of the multilayer perceptron neural network of the present invention, in which nodes, representing neurons, and the synaptic weights (100) connecting the nodes are arranged hierarchically.

The "multilayer perceptron" is a neural network model that mimics biological information processing. Each node of the multilayer perceptron receives as input the "weighted sum" of the state values of the nodes in the layer below and their connection weights, and outputs the value obtained by passing this input through the sigmoid function, as shown in FIG. 2.

The sigmoid function is divided into saturation regions on both sides, where the slope is small, and an active region in the center, where the slope is large.
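To make the saturation and active regions concrete, the following minimal sketch assumes a bipolar sigmoid with outputs in (-1, 1). That choice is an assumption suggested by the later remark that output values approach -1 or +1; the exact function of FIG. 2 is not given in this text, and the names `sigmoid` and `sigmoid_slope` are hypothetical.

```python
import math


def sigmoid(a: float) -> float:
    """Assumed bipolar sigmoid with outputs in (-1, 1); equal to tanh(a / 2)."""
    return math.tanh(a / 2.0)


def sigmoid_slope(a: float) -> float:
    """Derivative of the assumed sigmoid: (1 - x)(1 + x) / 2 with x = sigmoid(a)."""
    x = sigmoid(a)
    return (1.0 - x) * (1.0 + x) / 2.0


# Large |a| lies in a saturation region (small slope);
# a near 0 lies in the active region (large slope).
for a in (-8.0, -2.0, 0.0, 2.0, 8.0):
    print(f"a={a:+.1f}  output={sigmoid(a):+.4f}  slope={sigmoid_slope(a):.4f}")
```

Under this assumption the slope peaks at 0.5 for a weighted sum of zero and falls toward zero on both sides, which is the behavior the description attributes to the two saturation regions.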

"Training patterns" are patterns collected arbitrarily in order to learn a pattern recognition problem.

"Test patterns" are patterns collected arbitrarily to serve as a reference for testing how well the pattern recognition problem has been learned.

These patterns can be divided into several "classes", and "pattern recognition" means determining which class an input pattern belongs to.

In a neural network, the class to which an input pattern belongs is indicated by the states of the final-layer nodes.

"Backpropagation learning" is a method of training this multilayer perceptron: after a training pattern is input, the weights connected to the final-layer nodes are changed according to the error signal so that the outputs of the final-layer nodes approach the desired target values, and the nodes in the lower layers change their connection weights according to the error signals backpropagated from the layer above.

The "error function" is a function that measures the distance between the output values of the multilayer perceptron and the target values.

The "error signal" is the partial derivative of the error function multiplied by -1.

"Saturation of a node" means that the weighted-sum input of the node lies in a region where the slope of the sigmoid function is small.

If a node lies in the same saturation region as its target value, this is called "proper saturation"; if it lies in the opposite saturation region, it is called "improper saturation".

The details of the "backpropagation learning algorithm" for the multilayer perceptron are shown in FIG. 3.

First, when a training pattern (201) $\mathbf{x}^{(0)}$ is input, the multilayer perceptron consisting of $L$ layers determines, by forward computation (202), the state of the $j$-th node of layer $l$ as

$x_j^{(l)} = f(\hat{x}_j^{(l)})$,

where

$\hat{x}_j^{(l)} = w_{j0}^{(l)} + \sum_{i=1}^{N_{l-1}} w_{ji}^{(l)} x_i^{(l-1)}$.

Here $f(\cdot)$ is the sigmoid function of FIG. 2, $w_{ji}^{(l)}$ is the connection weight between $x_i^{(l-1)}$ and $x_j^{(l)}$, $w_{j0}^{(l)}$ is the bias of $x_j^{(l)}$, and $N_{l-1}$ is the number of nodes in layer $(l-1)$.
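The forward computation (202) can be sketched as follows. This is only an illustration under stated assumptions: the bipolar sigmoid `tanh(a / 2)` stands in for the function of FIG. 2, and the function name `forward` and the nested-list layout of `weights` and `biases` are hypothetical.

```python
import math
from typing import List


def sigmoid(a: float) -> float:
    # Assumed bipolar sigmoid with outputs in (-1, 1).
    return math.tanh(a / 2.0)


def forward(pattern: List[float],
            weights: List[List[List[float]]],
            biases: List[List[float]]) -> List[List[float]]:
    """Return the node states of every layer; states[0] is the input pattern.

    weights[l][j][i] connects node i of the layer below to node j of the
    layer above, and biases[l][j] is the bias of node j.
    """
    states = [list(pattern)]
    for W, b in zip(weights, biases):
        below = states[-1]
        layer = []
        for j in range(len(W)):
            # Weighted sum of the lower-layer states plus the bias.
            a_j = b[j] + sum(W[j][i] * below[i] for i in range(len(below)))
            layer.append(sigmoid(a_j))
        states.append(layer)
    return states


# Hypothetical 2-2-1 network.
weights = [[[0.1, -0.2], [0.3, 0.4]], [[0.5, -0.5]]]
biases = [[0.0, 0.0], [0.0]]
print(forward([1.0, -1.0], weights, biases)[-1])
```

Keeping the states of every layer, rather than only the final output, is convenient because the backpropagation step needs the lower-layer states when it updates each weight.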

Once the states $x_k^{(L)}$ of the nodes in the final layer $L$ have been obtained as above, the error function of the multilayer perceptron is defined, in relation to the target pattern $\mathbf{t} = [t_1, t_2, \ldots, t_{N_L}]$ for the input pattern, as

$E_m = \frac{1}{2} \sum_{k=1}^{N_L} (t_k - x_k^{(L)})^2$.

Error signals are generated so as to reduce the value of this error function, and each weight is changed according to these error signals.

That is, the output-layer error signal calculation (203) gives

$\delta_k^{(L)} = -\frac{\partial E_m}{\partial \hat{x}_k^{(L)}} = (t_k - x_k^{(L)}) f'(\hat{x}_k^{(L)})$,

where $\hat{x}_k^{(L)}$ is the weighted-sum input of output node $k$ and $f'$ is the slope of the sigmoid function.

By error signal backpropagation (204), the error signals of each lower layer ($l \le L-1$) are calculated as

$\delta_j^{(l)} = f'(\hat{x}_j^{(l)}) \sum_{k=1}^{N_{l+1}} w_{kj}^{(l+1)} \delta_k^{(l+1)}$.

Then, in learning by weight change (205), the weights of each layer are changed according to

$\Delta w_{ji}^{(l)} = \eta \, \delta_j^{(l)} x_i^{(l-1)}$,

where $\eta$ is the learning rate. This completes learning for one training pattern (201).

Performing this procedure once for all training patterns (201) is counted as one sweep.
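Steps 202 to 205 for a single training pattern can be sketched as below. The sketch assumes the mean-square error with a factor of 1/2, a bipolar sigmoid `tanh(a / 2)`, and a bias treated as a weight on a constant input of 1; the function names and the nested-list weight layout are hypothetical, not taken from the patent.

```python
import math


def sigmoid(a):
    # Assumed bipolar sigmoid with outputs in (-1, 1).
    return math.tanh(a / 2.0)


def slope(x):
    # Slope of the assumed sigmoid, written in terms of the node state x.
    return (1.0 - x) * (1.0 + x) / 2.0


def train_step(pattern, target, weights, biases, eta):
    """One backpropagation update for one pattern; returns the output before the update."""
    # Forward computation (202).
    states = [list(pattern)]
    for W, b in zip(weights, biases):
        below = states[-1]
        states.append([sigmoid(b[j] + sum(w * x for w, x in zip(W[j], below)))
                       for j in range(len(W))])

    # Output-layer error signals (203): (target - output) * slope.
    deltas = [(t - x) * slope(x) for t, x in zip(target, states[-1])]

    for l in range(len(weights) - 1, -1, -1):
        W, below = weights[l], states[l]
        lower = None
        if l > 0:
            # Error signal backpropagation (204), using the weights before they change.
            lower = [slope(below[i]) * sum(W[j][i] * deltas[j] for j in range(len(W)))
                     for i in range(len(below))]
        # Learning by weight change (205).
        for j in range(len(W)):
            for i in range(len(below)):
                W[j][i] += eta * deltas[j] * below[i]
            biases[l][j] += eta * deltas[j]
        deltas = lower
    return states[-1]


# Hypothetical 2-2-1 network trained repeatedly on a single pattern.
weights = [[[0.1, -0.2], [0.3, 0.4]], [[0.5, -0.5]]]
biases = [[0.0, 0.0], [0.0]]
for sweep in range(3):
    out = train_step([1.0, -1.0], [1.0], weights, biases, eta=0.5)
    print(f"sweep {sweep}: output before update = {out[0]:+.4f}")
```

Calling `train_step` once for every training pattern corresponds to one sweep in the terminology of the description.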

In the backpropagation algorithm described above, the output-layer error signal $\delta_k^{(L)}$ is the difference between the target value and the actual output multiplied by the slope of the sigmoid activation function.

If the output $x_k^{(L)}$ is close to -1 or +1, $\delta_k^{(L)}$ becomes very small because of the slope term.

That is, as the conventional-method curve in FIG. 4 shows, when $t_k = 1$ and $x_k^{(L)} \approx -1$, or vice versa, the node $x_k^{(L)}$ cannot generate an error signal $\delta_k^{(L)}$ strong enough to adjust the weights connected to it.

Such improper saturation of output nodes delays the minimization of $E_m$ in backpropagation learning and prevents some training patterns (201) from being learned.
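A small numerical illustration of this effect follows, assuming the mean-square error signal takes the form (t - x)(1 - x)(1 + x)/2, which corresponds to a bipolar sigmoid with outputs in (-1, 1). Both the form and the function name `mse_error_signal` are assumptions made for this sketch.

```python
def mse_error_signal(t: float, x: float) -> float:
    """Assumed mean-square error signal: (target - output) times the sigmoid slope."""
    return (t - x) * (1.0 - x) * (1.0 + x) / 2.0


# Target +1: the output is improperly saturated near -1,
# in the active region near 0, and properly saturated near +1.
for x in (-0.999, -0.9, 0.0, 0.9, 0.999):
    print(f"x={x:+.3f}  error signal={mse_error_signal(1.0, x):.6f}")
```

Although the output near -1 is as far from the target +1 as it can be, its error signal is far smaller than the one produced in the active region, which is why such a node is corrected only slowly.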

To solve these problems, a generalized cross-entropy error function of order $n$ has been proposed [defining equation missing from the source text].

The error signal of an output node under this error function becomes [equation missing from the source text].

The other learning equations are the same as in the backpropagation algorithm using $E_m$.

In the backpropagation algorithm using this error function, an improperly saturated output node generates a strong error signal, whereas an output node saturated in the same direction as its target value generates a weak error signal. This reduces improper saturation of the output nodes and at the same time prevents overtraining on the training patterns.

FIG. 4 compares the output-node error signal for different values of $n$ at a fixed target value.

As the Gen-CE curves in FIG. 4 show, the error signal near the target value $t_k$ decreases as $n$ increases.

That is, as $n$ increases, improper saturation of the output nodes is reduced, but learning takes longer.
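The dependence on n can be illustrated numerically. The equation for the generalized cross-entropy error signal is not reproduced in this text, so the sketch below assumes the form t^(n+1) (t - x)^n / 2^(n-1) for targets of -1 or +1, recalled from the published literature on this error function; treat that form, and the name `gen_ce_error_signal`, as assumptions rather than as the patent's own equation.

```python
def gen_ce_error_signal(t: float, x: float, n: int) -> float:
    """Assumed generalized cross-entropy error signal of order n (t is -1.0 or +1.0)."""
    return t ** (n + 1) * (t - x) ** n / 2.0 ** (n - 1)


for n in (1, 2, 3, 4):
    improper = gen_ce_error_signal(1.0, -0.999, n)  # output saturated opposite the target
    near = gen_ce_error_signal(1.0, 0.9, n)         # output already close to the target
    print(f"n={n}  improperly saturated: {improper:.4f}  near target: {near:.6f}")
```

Under the assumed form, the signal for an improperly saturated node stays strong for every n, while the signal near the target shrinks rapidly as n grows, matching the trade-off described above.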

To check whether the problem noted above occurs when learning a real task, 18,468 handwritten digits were used to train a multilayer perceptron with 144 input nodes, 30 hidden nodes, and 10 output nodes. The results are shown in FIG. 5 together with the misclassification rate on 2,213 test patterns; the learning rate was set to 0.001 × (n+1).

As FIG. 5 shows, the misclassification rates on both the training patterns and the test patterns vary greatly with n.

To solve this problem, the present invention proposes a variable learning rate.

That is, the learning rate η, one of the parameters that determine the amount of weight change in each layer in learning by weight change (205), is changed at every sweep to a value η(s) set according to the ratio of the expected value of the squared difference between the target value and the output value to the expected value of the squared error signal.

Here, s is the sweep number.

Then, when an output-node value approaches its target value t_k, η(s) takes a large value even though the error signal is very small, which remedies the drawback that learning with the generalized cross-entropy error function is delayed as n increases.
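One way to read this rule is sketched below. The exact equation for η(s) is missing from this text, so the sketch assumes that η(s) equals a base rate times the ratio of the two expectations, and that each expectation is estimated as a mean over the output-node values collected during the previous sweep. The base rate `eta0` and the function name are hypothetical.

```python
from typing import Sequence


def variable_learning_rate(targets: Sequence[float],
                           outputs: Sequence[float],
                           error_signals: Sequence[float],
                           eta0: float) -> float:
    """Assumed rule: eta(s) = eta0 * E[(t - x)^2] / E[delta^2]."""
    mean_sq_diff = sum((t - x) ** 2 for t, x in zip(targets, outputs)) / len(targets)
    mean_sq_delta = sum(d * d for d in error_signals) / len(error_signals)
    if mean_sq_delta == 0.0:
        # Assumption of this sketch: the ratio is undefined when every error signal is zero.
        raise ValueError("all error signals are zero")
    return eta0 * mean_sq_diff / mean_sq_delta


# Small error signals relative to the remaining output error give a larger rate.
print(variable_learning_rate([1.0, 1.0], [0.0, 0.0], [0.5, 0.5], eta0=0.001))
print(variable_learning_rate([1.0, 1.0], [0.0, 0.0], [0.1, 0.1], eta0=0.001))
```

The second call uses smaller error signals for the same output error and therefore returns a larger rate, which is the compensation the description aims for when n is large.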

Because this method prevents the slowdown in learning by making η(s) large, η(s) may take a very large value.

In that case, learning becomes unstable, or overtraining on the training patterns (201) is aggravated.

Therefore, to solve this problem, the present invention limits the maximum value of the output-node error signal in the error signal calculation (203).

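That is, if the magnitude of the error signal of an output node exceeds a fixed fraction of the square root of the expected value of the squared error signal, the error signal is clipped so that its magnitude does not exceed that limit.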
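The limiting step might be sketched as follows. The condition and the assignment equations are missing from this text, so the sketch assumes that the expectation is estimated as a mean over the given error signals and that a clipped signal keeps its sign; the parameter `ratio` stands for the unspecified fixed fraction, and both it and the function name are hypothetical.

```python
import math
from typing import List, Sequence


def limit_error_signals(error_signals: Sequence[float], ratio: float) -> List[float]:
    """Clip each error signal to +/- ratio * sqrt(mean of squared error signals)."""
    rms = math.sqrt(sum(d * d for d in error_signals) / len(error_signals))
    bound = ratio * rms
    # Signals within the bound pass through unchanged; larger ones are cut to the bound.
    return [max(-bound, min(bound, d)) for d in error_signals]


print(limit_error_signals([0.1, -0.2, 0.1, 3.0], ratio=1.0))
```

Only the outlying signal is reduced in this example, so the typical weight update is unchanged while the largest update is bounded.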

The simulation results obtained by applying the present invention to learning are shown in FIG. 6.

The variation of the learning results with n is much smaller than in FIG. 5.

As described above, the present invention has the effect that, in learning with the generalized cross-entropy error function, the learning performance is not strongly affected by the order of the error function.

Claims

A method of varying a learning rate and limiting an error-signal magnitude in training a neural network using a generalized cross-entropy error function, wherein

the learning rate is varied according to the ratio of the expected value of the squared difference between the target value and the output value to the expected value of the squared error signal, and

to prevent the weight change from becoming too large, the error signal of an output node is clipped when it exceeds a fixed fraction of the square root of the expected value of its square.

The method of claim 1,

wherein the error signal of the output node is the value obtained by partially differentiating the error function with respect to the weighted sum of the node.