KR101819857B1

KR101819857B1 - A sphericalizing penalization method and apparatus to improve training accuracy of artificial neural networks

Info

Publication number: KR101819857B1
Application number: KR1020170054113A
Authority: KR
Inventors: 김강일
Original assignee: Konkuk University Industry-Academic Cooperation Foundation (건국대학교 산학협력단)
Priority date: 2017-04-27
Filing date: 2017-04-27
Publication date: 2018-01-17

Abstract

Disclosed are a method and an apparatus for training an artificial neural network with an added sphericalizing penalty. According to an embodiment of the present invention, the training method comprises: generating hidden vectors in layers of the artificial neural network based on input information; generating a cost function based on the hidden vectors; and adding the sphericalizing penalty to the cost function.

Description

Field of the Invention [0001] The present invention relates to a method and apparatus for improving the training performance of an artificial neural network.

The following embodiments relate to a sphericalizing penalty method and apparatus for improving the training performance of an artificial neural network.

Artificial neural networks have been studied steadily since the 1990s. Since 2006 they have attracted worldwide attention and, under the name deep learning, have become the basic representation used in intelligent systems.

An artificial neural network is trained by defining a cost function that evaluates how well the network predicts the output for a given input. Typical cost functions are the error with respect to the correct value or the probability of exactly reproducing the correct value.

Training often uses a method called penalization, which adds a penalty to the cost function. Penalization means adding an extra penalty term to the cost function to achieve a specific goal. One example is regularization, which makes the model learned by the network more general.

Among these penalization methods, sparsity-inducing penalties have been proposed that minimize the degree to which the neurons of an artificial neural network are activated. L1-norm and logarithmic penalty functions are typical examples. In general, these activation-reducing methods keep the distribution of hidden vectors from spreading too widely. This reduces the saddle points at which training can stall and helps training converge to a local optimum.

However, these methods reduce activation uniformly, which can cause several problems. First, shrinking the activation values too much increases the interference between the hidden vectors of the network, which makes convergence to a local optimum difficult. Second, the penalty becomes too large relative to the error-based cost function during training, which lowers the quality of the local optimum that is reached.

The embodiments add a sphericalizing penalty to improve the training performance of an artificial neural network. This penalty addresses two problems: increased interference between hidden vectors makes convergence to a local optimum difficult, and the local optimum that is reached is of lower quality.

A method of training an artificial neural network according to an embodiment includes the following steps: generating hidden vectors in layers of the artificial neural network based on input information; generating a cost function based on the hidden vectors; and adding a sphericalizing penalty to the cost function.

Adding the sphericalizing penalty may include: partitioning the hidden vector space into a first hidden vector space and a second hidden vector space based on hidden vector sparsity; and applying different penalties to the first hidden vector space and the second hidden vector space.

The partitioning may include: creating an n-dimensional spherical surface centered at the origin of the hidden vector space; setting the interior of the spherical surface as the first hidden vector space; and setting the exterior of the spherical surface as the second hidden vector space.

Applying the penalties may include: applying a penalty of zero to the first hidden vector space; and applying to the second hidden vector space a penalty that increases linearly with the Euclidean distance from the origin.

The Euclidean distance may be bounded by the square root of the number of dimensions.

Adding the penalty may include adding a penalty to at least one of the layers that generate the hidden vectors.

Adding the sphericalizing penalty may include adding a penalty based on the following equation.

[Equation]

Here, the symbols in the equation denote, respectively: the penalty function; the entire set of layers constituting the artificial neural network; each layer constituting the artificial neural network; the hidden vector generated at that layer; a constant of each layer; and an overall constant.

The penalty function may be determined based on the following equation.

[Equation]

Here, the symbols denote, respectively: the overall constant; the constant of each layer; k, a tuning constant; and each layer constituting the artificial neural network.

An apparatus for training an artificial neural network according to an embodiment includes an input unit and a controller. The input unit receives input information. The controller generates hidden vectors in layers of the artificial neural network based on the input information, generates a cost function based on the hidden vectors, and adds a penalty to the cost function.

The controller may include the following modules: a vector generation module that generates hidden vectors in the layers of the artificial neural network based on the input information; a cost function generation module that generates a cost function based on the hidden vectors; and a penalty adding module that adds a sphericalizing penalty to the cost function.

The penalty adding module may include: a distinguishing module that partitions the hidden vector space into a first hidden vector space and a second hidden vector space based on hidden vector sparsity; and a penalty application module that applies different penalties to the first and second hidden vector spaces.

The distinguishing module may include: a sphericalizing module that creates an n-dimensional spherical surface centered at the origin of the hidden vector space; and a space distribution module that sets the interior of the spherical surface as the first hidden vector space and the exterior as the second hidden vector space.

The penalty application module may apply a penalty of zero to the first hidden vector space. To the second hidden vector space, it may apply a penalty that increases linearly with the Euclidean distance from the origin.

The Euclidean distance may be bounded by the square root of the number of dimensions.

The penalty adding module may add a penalty to at least one of the layers that generate the hidden vectors.

The penalty adding module may add a penalty based on the following equation.

[Equation]

Here, the symbols in the equation denote, respectively: the penalty function; the entire set of layers constituting the artificial neural network; each layer constituting the artificial neural network; the hidden vector generated at that layer; a constant of each layer; and an overall constant.

[Equation]

Here, the symbols denote, respectively: the overall constant; the constant of each layer; k, a tuning constant; and each layer constituting the artificial neural network.

FIG. 1 is a schematic block diagram of an artificial neural network training apparatus according to an embodiment.
FIG. 2 is a schematic block diagram of the controller shown in FIG. 1.
FIG. 3 is a schematic block diagram of the penalty adding module shown in FIG. 2.
FIG. 4 is a schematic block diagram of the distinguishing module shown in FIG. 3.
FIG. 5 is a graph comparing the penalty added by the artificial neural network training apparatus of FIG. 1 with existing penalties.
FIG. 6A is a graph illustrating the training error performance of the artificial neural network training apparatus shown in FIG. 1.
FIG. 6B is a graph illustrating the validation error performance of the artificial neural network training apparatus shown in FIG. 1.

The specific structural or functional descriptions of the embodiments disclosed herein are given only to describe embodiments according to the concept of the present invention. Those embodiments may be implemented in various forms and are not limited to the embodiments described herein.

Embodiments according to the concept of the present invention can be modified in various ways and take various forms, so they are illustrated in the drawings and described in detail herein. However, this is not intended to limit the embodiments to the specific disclosed forms. They include all changes, equivalents, and alternatives falling within the spirit and technical scope of the present invention.

Terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms serve only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be called a second component, and a second component may likewise be called a first component.

When a component is described as "connected" or "coupled" to another component, it may be directly connected or coupled to that component, or other components may exist in between. In contrast, when a component is described as "directly connected" or "directly coupled" to another component, no other component exists in between. Other expressions that describe relationships between components should be interpreted in the same way, for example "between" versus "directly between", or "adjacent to" versus "directly adjacent to".

The terminology used herein describes particular embodiments only and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" specify the presence of stated features, numbers, steps, operations, components, parts, or combinations thereof. They do not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted consistently with their meaning in the context of the relevant art. Unless explicitly defined herein, they are not to be interpreted in an idealized or overly formal sense.

In this specification, a module may mean any of the following: hardware that performs the functions and operations named herein; computer program code that performs specific functions and operations; or an electronic recording medium, for example a processor or microprocessor, loaded with computer program code that performs specific functions and operations.

In other words, a module may mean a functional and/or structural combination of hardware for carrying out the technical idea of the present invention and/or software for driving that hardware.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, the scope of the patent application is not restricted or limited by these embodiments. Like reference numerals in the drawings denote like elements.

FIG. 1 is a schematic block diagram of an artificial neural network training apparatus according to an embodiment.

Referring to FIG. 1, the artificial neural network training apparatus 10 may train an artificial neural network based on input information.

The apparatus 10 may be implemented as a printed circuit board (PCB) such as a motherboard, an integrated circuit (IC), or a system on chip (SoC). For example, the apparatus 10 may be implemented as an application processor.

The apparatus 10 may also be implemented in a personal computer (PC), a data server, or a portable device.

The portable device may be implemented as any of the following: a laptop computer, a mobile phone, a smartphone, a tablet PC, a mobile internet device (MID), a personal digital assistant (PDA), an enterprise digital assistant (EDA), a digital still camera, a digital video camera, a portable multimedia player (PMP), a personal or portable navigation device (PND), a handheld game console, an e-book reader, or a smart device. The smart device may be implemented as a smart watch, a smart band, or a smart ring.

The apparatus 10 includes an input unit 30 and a controller 50.

The input unit 30 may receive input information and output it to the controller 50. For example, the input information may be in vector form.

The controller 50 may generate hidden vectors in the layers of the artificial neural network based on the input information, and then generate a cost function based on those hidden vectors. By adding a penalty to the generated cost function, the controller 50 may also improve the training performance of the network. For example, the penalty added by the controller 50 may include a sphericalizing penalty.

The controller 50 may use the hidden vectors to extract features for another neural network or to predict a final result. The artificial neural network used by the apparatus 10, for example by the controller 50, may be a long short-term memory (LSTM) network, which is a kind of recurrent neural network (RNN).

In an artificial neural network, the geometric sparsity of the hidden vectors can be an important factor in how difficult training is. When the controller 50 updates the weight and bias parameters of the network, each activation can be expressed as a linear combination of the input vector. In other words, each activation corresponds to a hyperplane in the input vector space. The controller 50 may train the network by searching, for each layer, for a desirable arrangement of these hyperplanes.

Where hidden vectors are dense, even a slight rotation or translation of a hyperplane can change the activations derived from other hidden vectors. This can make training the network difficult for the controller 50.
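The density argument can be made concrete with a toy 2-D sketch. It is not taken from the patent, and the data, the hyperplane, and the shift size are all hypothetical. The sketch translates a hyperplane w·x + b = 0 by a small amount and counts how many points change side, that is, how many activations flip sign. It compares a dense cluster near the hyperplane with points spread across [-1, 1]^2.

```python
import random

def side(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 the point x lies."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= 0 else -1

def flipped_fraction(points, w, b, shift):
    """Fraction of points whose side changes when the bias is shifted by `shift`."""
    flips = sum(side(w, b, p) != side(w, b + shift, p) for p in points)
    return flips / len(points)

rng = random.Random(0)
# Hypothetical hidden vectors: a dense cluster around the origin vs. points spread over [-1, 1]^2.
dense = [[rng.gauss(0.0, 0.05), rng.gauss(0.0, 0.05)] for _ in range(1000)]
spread = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(1000)]

w, b = [1.0, 0.0], 0.0   # hyperplane x_1 = 0, passing through the dense cluster
shift = 0.05             # a small translation of the hyperplane

dense_frac = flipped_fraction(dense, w, b, shift)
spread_frac = flipped_fraction(spread, w, b, shift)
print(f"dense: {dense_frac:.3f}  spread: {spread_frac:.3f}")
```

With these settings, roughly a third of the dense cluster flips, while only a few percent of the spread-out points do. This is the interference effect described above.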

Where hidden vectors are sparse, training by the controller 50 can be severely limited by saddle points that act as local optima. In such a distribution, a wider range of hyperplane positions and angles produces near-zero gradients. The optimal sparsity is not known in advance and may depend on the problem to be solved.

Penalization refers to adding to the cost function a penalty multiplied by a scale factor λ so that the network is trained effectively. A penalty is generally added to regularize the cost function, but it may also be added to induce sparsity.

The L1-norm penalty and the logarithmic penalty have previously been used as sparsity-inducing penalty functions. These penalties increase the activation sparsity of the neurons that make up the hidden vectors, which lets the network be used more efficiently.
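The sketch below shows why such penalties act uniformly. The L1 penalty is the sum of |h_j|. The patent does not give the logarithmic penalty's formula, so the sketch assumes the form sum_j log(1 + h_j^2), which is one common choice. The L1 subgradient has the same magnitude for every nonzero unit. As a result, even hidden vectors very close to the origin keep being pushed toward zero.

```python
import math

def l1_penalty(h):
    """L1-norm sparsity penalty: sum of absolute activations."""
    return sum(abs(x) for x in h)

def l1_subgradient(h):
    """Subgradient of the L1 penalty: magnitude 1 for every nonzero unit."""
    return [math.copysign(1.0, x) if x != 0 else 0.0 for x in h]

def log_penalty(h):
    """Logarithmic penalty in the assumed form sum_j log(1 + h_j^2)."""
    return sum(math.log1p(x * x) for x in h)

def log_gradient(h):
    """Gradient of the assumed logarithmic penalty."""
    return [2.0 * x / (1.0 + x * x) for x in h]

near_origin = [0.01, -0.02]
print(l1_penalty(near_origin), l1_subgradient(near_origin))
print(log_penalty(near_origin), log_gradient(near_origin))
```

Both penalties are positive for any nonzero hidden vector, however close it lies to the origin. This is the uniform activation-shrinking pressure that the background section identifies as a problem.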

To find the optimal sparsity, conventional penalization methods condense the entire distribution of vectors. They apply pressure that reduces activation until a hyperplane produces an extremely large error. However, the sparsity is biased by the hyperplane's position, so this pressure has to be finely tuned.

The controller 50 may determine the hidden vector space according to the dimension of a hidden layer. In practice, the hidden vector space may be a hypercube bounded by the range of the activation function. Consider updating a hyperplane that passes through the cube. Because of the cube's shape, the hyperplane may be allowed a wider change of angle near the corners of the hypercube than near its center.

The larger the dimension, the farther the hidden vectors at the corners lie from the center. As the dimension increases, the hyperplane therefore gains more freedom to change, and the angle through which it can move without altering other activations also grows. The corner regions of the hypercube, where hidden vectors are sparse, may therefore need strong pressure, while the central region may need little.
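Assume a tanh-like activation with range [-1, 1], so that the hidden vector space is the hypercube [-1, 1]^n. The short sketch below computes two standard geometric quantities for this cube. The first is the distance from the origin to a corner, which is sqrt(n). The second is the fraction of the cube's volume that lies inside the inscribed unit ball. That fraction shrinks rapidly as n grows, so most of the volume lies toward the corners. The corner distance sqrt(n) is presumably also the source of the sqrt(n) bound on the Euclidean distance stated earlier, although the patent does not say so explicitly.

```python
import math

def corner_distance(n):
    """Euclidean distance from the origin to the corner (1, ..., 1) of [-1, 1]^n."""
    corner = [1.0] * n
    return math.sqrt(sum(x * x for x in corner))

def inscribed_ball_fraction(n):
    """Fraction of the volume of [-1, 1]^n that lies inside the unit ball."""
    unit_ball_volume = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return unit_ball_volume / 2.0 ** n

for n in (2, 3, 10, 50):
    print(f"n={n:2d}  corner distance={corner_distance(n):.3f}  "
          f"ball fraction={inscribed_ball_fraction(n):.2e}")
```

For n = 10, the inscribed ball holds only about 0.25% of the cube's volume.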

The controller 50 may improve the training performance of the network by applying different penalties to two regions: the central region, where hidden vectors are dense, and the corner regions, where hidden vectors are sparse.

FIG. 2 is a schematic block diagram of the controller shown in FIG. 1.

Referring to FIG. 2, the controller 50 includes a vector generation module 100, a cost function generation module 200, and a penalty adding module 300.

The vector generation module 100 may generate a hidden vector in each layer of the artificial neural network based on the input information. For example, in an LSTM network, the vector generation module 100 may generate the hidden vector based on the following equations.

[Equation]

Here, i denotes the input gate vector, o the output gate vector, and f the forget gate vector. The equations also use symbols for the cell vector, the hidden vector, the sigmoid function, and the input vector. W denotes the weights applied to the input, U the weights applied to the context, and b a bias vector.

In other words, the vector generation module 100 may generate the hidden vector by combining the gate vectors and the cell vector computed from the input vector.
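The LSTM equations themselves are not reproduced in this text. The sketch below therefore assumes the standard LSTM formulation without peephole connections, using the symbols defined above: i, f, and o are the gates, W multiplies the input, U multiplies the previous hidden vector (the context), and b is the bias. The name g for the candidate cell update, the parameter layout, and the function names are hypothetical. Pure Python lists keep the sketch dependency-free.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell without peephole connections (assumed form).

    params maps each of 'i', 'f', 'o', 'g' to (W, U, b): W multiplies the input x,
    U multiplies the previous hidden vector (the context), b is the bias.
    'g' is the candidate cell update.
    """
    def pre_activation(gate):
        W, U, b = params[gate]
        return [wx + uh + bias for wx, uh, bias in zip(matvec(W, x), matvec(U, h_prev), b)]

    i = [sigmoid(z) for z in pre_activation('i')]    # input gate
    f = [sigmoid(z) for z in pre_activation('f')]    # forget gate
    o = [sigmoid(z) for z in pre_activation('o')]    # output gate
    g = [math.tanh(z) for z in pre_activation('g')]  # candidate cell update
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]  # cell vector
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]                    # hidden vector
    return h, c

def random_params(n_in, n_hidden, rng):
    """Hypothetical small random initialization, for illustration only."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {gate: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden)
            for gate in ('i', 'f', 'o', 'g')}

rng = random.Random(42)
params = random_params(n_in=3, n_hidden=4, rng=rng)
h, c = [0.0] * 4, [0.0] * 4
for x in ([1.0, 0.0, -1.0], [0.5, 0.5, 0.5]):   # a toy input sequence
    h, c = lstm_step(x, h, c, params)
print(h)
```

In this form h = o * tanh(c) with o in (0, 1), so every component of the hidden vector lies in (-1, 1). That is the hypercube-shaped hidden vector space discussed above.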

The cost function generation module 200 may generate a cost function based on the hidden vectors.

The cost function generation module 200 may generate the cost function from an error. That error is the difference between the output value the apparatus 10 produces from the hidden vectors and the correct value of the problem the network is meant to solve. For example, the module 200 may use as the cost function either the error between the output value and the correct value, or the probability of exactly reproducing the correct value.
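The two kinds of cost named above are commonly realized as a squared error and as the negative log of a softmax probability. The sketch below shows both. These are standard choices assumed for illustration; the patent does not specify them.

```python
import math

def squared_error(output, target):
    """Half the sum of squared differences between the output and the correct values."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(output, target))

def negative_log_likelihood(logits, target_index):
    """-log of the softmax probability assigned to the correct class."""
    m = max(logits)  # subtract the max for numerical stability
    log_normalizer = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_normalizer - logits[target_index]

print(squared_error([0.8, 0.1], [1.0, 0.0]))
print(negative_log_likelihood([2.0, 0.5, -1.0], target_index=0))
```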

The cost function generation module 200 may output the cost function to the penalty adding module 300.

The penalty adding module 300 may add a penalty to the cost function, that is, it may perform penalization. The penalty added by the module 300 may include a sphericalizing penalty. For example, the module 300 may multiply the penalty by a scale factor λ and add the result to the unpenalized cost function.
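The scaled addition described above can be sketched as follows, assuming the base cost and the penalty have already been computed. The parameter name `scale` stands for λ. Because the combination is linear, gradients combine the same way during backpropagation.

```python
def penalized_cost(base_cost, penalty, scale):
    """Training objective: unpenalized cost plus the penalty multiplied by the scale factor (lambda)."""
    return base_cost + scale * penalty

def penalized_gradient(cost_grad, penalty_grad, scale):
    """The same combination applied elementwise to gradients."""
    return [gc + scale * gp for gc, gp in zip(cost_grad, penalty_grad)]

total = penalized_cost(base_cost=1.5, penalty=2.0, scale=0.1)
print(total)
```

A larger λ lets the penalty dominate the error term. That is the imbalance the background section warns about, so λ is usually kept small.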

FIG. 3 is a schematic block diagram of the penalty adding module shown in FIG. 2.

Referring to FIG. 3, the penalty adding module 300 may partition the hidden vector space and apply a different penalty to each resulting region.

The penalty adding module 300 may add a penalty to at least one of the layers in which hidden vectors are generated. For example, the module 300 may add a different sphericalizing penalty to each layer of the network.

The penalty adding module 300 may include a distinguishing module 310 and a penalty application module 330.

The distinguishing module 310 may partition the hidden vector space into a first hidden vector space and a second hidden vector space based on hidden vector sparsity.

For example, the first hidden vector space may be the region where hidden vector sparsity is low, that is, where hidden vectors are dense. The second hidden vector space may be the region where hidden vector sparsity is high, that is, where hidden vectors are sparse.

For example, the distinguishing module 310 may use an n-dimensional spherical surface as the boundary between the first and second hidden vector spaces. The operation of the distinguishing module 310 is described with reference to FIG. 4.
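A minimal sketch of the partition follows. This text does not give the sphere's radius, so `radius` is a hypothetical parameter. Assigning vectors that lie exactly on the surface to the first space is likewise a choice made here.

```python
import math

def euclidean_norm(h):
    return math.sqrt(sum(x * x for x in h))

def partition(hidden_vectors, radius):
    """Split hidden vectors into the first space (inside or on the sphere) and the second space (outside)."""
    first, second = [], []
    for h in hidden_vectors:
        if euclidean_norm(h) <= radius:
            first.append(h)
        else:
            second.append(h)
    return first, second

first, second = partition([[0.1, 0.2], [0.9, -0.9], [0.0, 1.0]], radius=1.0)
print(first, second)
```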

The penalty application module 330 may apply different penalties to the first and second hidden vector spaces. It may apply an extremely low penalty to the first hidden vector space. To the second hidden vector space, it may apply a penalty that increases with the Euclidean distance from the origin. For example, the module 330 may apply a penalty of zero to the first hidden vector space, and a penalty that increases linearly with the Euclidean distance from the origin to the second hidden vector space. In this case, the Euclidean distance over which the penalty is applied may be bounded by the square root of the number of dimensions.
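A per-vector sketch of this penalty follows, built on several assumptions. It uses the continuous hinge slope * max(0, d - r) on the distance d from the origin. In this form the linear growth starts from zero at the sphere of radius r, so the penalty has no jump at the boundary. The radius and slope are hypothetical parameters. Capping d at sqrt(n) is one reading of the stated bound, and it only matters for vectors outside [-1, 1]^n.

```python
import math

def sphericalizing_penalty(h, radius, slope=1.0):
    """Zero inside the sphere; grows linearly with the distance from the origin outside it."""
    n = len(h)
    distance = math.sqrt(sum(x * x for x in h))
    distance = min(distance, math.sqrt(n))  # one reading of the sqrt(n) bound
    return slope * max(0.0, distance - radius)

print(sphericalizing_penalty([0.3, 0.4], radius=1.0))  # a vector inside the sphere
print(sphericalizing_penalty([1.0, 1.0], radius=1.0))  # a corner of [-1, 1]^2
```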

In addition, the penalty application module 330 may define the penalty as a differentiable function so that the artificial neural network learning apparatus 10 can train the artificial neural network easily.
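One way to meet the differentiability requirement is to smooth the hinge max(0, d - r) with a softplus. The softplus is a swapped-in technique chosen for illustration and is not stated in the source. `beta` is a hypothetical sharpness parameter, and larger values bring the function closer to the hinge.

```python
import math


def softplus(x, beta):
    """Numerically stable softplus, log(1 + exp(beta * x)) / beta."""
    z = beta * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def smooth_sphericalizing_penalty(h, radius, beta=50.0):
    """Smooth stand-in for max(0, dist - radius); approaches it as beta grows."""
    n = len(h)
    dist = min(math.sqrt(sum(x * x for x in h)), math.sqrt(n))  # sqrt(n) cap kept
    return softplus(dist - radius, beta)
```

The `min` used for the sqrt(n) cap is still not smooth at the cap. However, hidden vectors bounded in [-1, 1]^n reach that distance only at the corners of the hypercube.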

For example, in each layer the penalty application module 330 can compute a penalty for each hidden vector, sum these penalties, and divide the sum by the length of the layer that generated the hidden vectors, thereby normalizing the per-layer penalty. The penalty application module 330 can then sum the normalized per-layer penalties over all layers and take their average to obtain the penalty. An example of the penalty applied by the penalty application module 330 can be expressed as the function in Equation 6.

Here, the symbols in Equation 6 denote, in order: the penalty function; the set of all layers constituting the artificial neural network; each layer constituting the artificial neural network; the hidden vectors generated in that layer; a constant for each layer; and an overall constant.

The overall constant and the per-layer constant can be expressed by Equation 7.

Here, k is a tunable constant. For example, k may be 0.001.
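The layer-wise normalization and averaging described for Equation 6 can be sketched as below. The following are assumptions:

- The "length" of a layer is taken to be the number of hidden vectors it produced, for example the number of time steps of an LSTM layer.
- The per-vector penalty uses an assumed hinge form, max(0, d - radius), with d capped at sqrt(n).
- The per-layer constants and the overall constant are omitted, because the exact form of Equations 6 and 7 is not reproduced here.

The function names are hypothetical.

```python
import math


def vector_penalty(h, radius):
    """Hinge-style sphericalizing penalty for one hidden vector (assumed form)."""
    dist = min(math.sqrt(sum(x * x for x in h)), math.sqrt(len(h)))
    return max(0.0, dist - radius)


def network_penalty(layers, radius):
    """Average over layers of the length-normalized per-layer penalty.

    `layers` is a list of layers; each layer is a non-empty list of the
    hidden vectors it generated (e.g. one per time step).
    """
    per_layer = []
    for hidden_vectors in layers:
        total = sum(vector_penalty(h, radius) for h in hidden_vectors)
        per_layer.append(total / len(hidden_vectors))  # normalize by layer length
    return sum(per_layer) / len(per_layer)             # average over all layers


layers = [
    [[0.0, 0.0], [1.0, 1.0]],  # layer 1: one vector inside, one outside
    [[0.5, 0.5]],              # layer 2: inside the unit sphere
]
print(network_penalty(layers, radius=1.0))  # equals (sqrt(2) - 1) / 4
```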

By applying the penalty of Equations 6 and 7, the penalty application module 330 can push hidden vectors located outside the n-dimensional sphere toward the n-dimensional spherical surface.
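The gradient of the assumed hinge term k * max(0, ||h|| - r) shows why the penalty moves outside vectors toward the sphere. The gradient is zero inside the sphere and equals k * h / ||h|| outside it, so a gradient step shrinks h radially without changing its direction. The sketch below applies plain gradient descent to this penalty term alone. In actual training the term would be combined with the task cost. The values of `k`, `lr`, and `radius` are hypothetical, and the source's own k enters through Equation 7, whose exact form is not reproduced here.

```python
import math


def penalty_gradient(h, radius, k):
    """Gradient of k * max(0, ||h|| - radius) with respect to h."""
    dist = math.sqrt(sum(x * x for x in h))
    if dist <= radius or dist >= math.sqrt(len(h)):
        return [0.0] * len(h)         # flat inside the sphere and beyond the sqrt(n) cap
    return [k * x / dist for x in h]  # radial direction, pointing away from the origin


def descend(h, radius, k, lr, steps):
    """Plain gradient descent on the penalty term alone."""
    for _ in range(steps):
        g = penalty_gradient(h, radius, k)
        h = [x - lr * gx for x, gx in zip(h, g)]
    return h


h = descend([1.2, 0.9, 0.0, 0.0], radius=1.0, k=1.0, lr=0.2, steps=10)
print(math.sqrt(sum(x * x for x in h)))  # norm shrinks 1.5 -> 1.3 -> 1.1 -> 0.9, then stops
```

With a fixed step size, the vector can land slightly inside the surface (here at norm 0.9 for radius 1.0). After that, the penalty gradient is zero.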

FIG. 4 is a schematic block diagram of the distinction module shown in FIG. 3.

Referring to FIG. 4, the distinction module 310 may generate a spherical surface centered at the origin in the hidden vector space. The distinction module 310 may set the inside and the outside of the spherical surface as different hidden vector spaces.

The distinction module 310 may include a sphericalizing module 311 and a space distribution module 313.

The sphericalizing module 311 may generate an n-dimensional spherical surface centered at the origin in the hidden vector space.

The n-dimensional hidden vector space may have the form of a hypercube. When the artificial neural network begins training, the hidden vectors may be located at the origin of the hidden vector space as a result of initialization. Thus, the distance from the origin can serve as a measure of hidden vector concentration.
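Suppose the hidden activations are bounded in [-1, 1] per coordinate. This is an assumption here, though it holds, for example, for tanh-activated LSTM hidden states. Then the farthest points of the hypercube [-1, 1]^n are its corners, at distance sqrt(n) from the origin. This is consistent with capping the penalized distance at the square root of the number of dimensions. The sketch computes that distance for the example dimensions n = 2, 32, and 400 mentioned for FIG. 5.

```python
import math


def corner_distance(n):
    """Distance from the origin to a corner (1, ..., 1) of the hypercube [-1, 1]^n."""
    return math.sqrt(sum(1.0 for _ in range(n)))


for n in (2, 32, 400):
    print(n, round(corner_distance(n), 3))
```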

Within a certain distance from the origin in the hidden vector space, the distances between hidden vectors may remain constant. Beyond that distance, the distances between hidden vectors increase sharply as the distance from the origin grows, so hidden vector concentration decreases.

Based on hidden vector sparsity, the sphericalizing module 311 can generate an n-dimensional spherical surface centered at the origin with a fixed distance as its radius.

The space distribution module 313 can divide the hidden vector space by setting the inside and the outside of the spherical surface as the first hidden vector space and the second hidden vector space. For example, the inside of the spherical surface may be set as the first hidden vector space and the outside as the second hidden vector space. The first hidden vector space may be the region near the origin where hidden vectors are dense, and the second hidden vector space may be the region outside the spherical surface where hidden vectors are rare.
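The division performed by the sphericalizing module 311 and the space distribution module 313 can be sketched as a simple partition by Euclidean norm. The function name and the `radius` argument are hypothetical. Vectors lying exactly on the surface are assigned to the first (inner) space, which is an arbitrary choice.

```python
import math


def split_hidden_space(hidden_vectors, radius):
    """Partition hidden vectors by a sphere of `radius` centered at the origin.

    Returns (first, second): vectors inside or on the sphere (dense, first
    hidden vector space) and vectors outside it (sparse, second space).
    """
    first, second = [], []
    for h in hidden_vectors:
        dist = math.sqrt(sum(x * x for x in h))
        (first if dist <= radius else second).append(h)
    return first, second


first, second = split_hidden_space([[0.1, 0.1], [0.9, 0.9], [1.0, 0.0]], radius=1.0)
print(first)   # [[0.1, 0.1], [1.0, 0.0]]
print(second)  # [[0.9, 0.9]]
```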

FIG. 5 is a graph comparing the penalty added by the artificial neural network learning apparatus of FIG. 1 with existing penalties.

Referring to FIG. 5, the black solid line may represent the sphericalizing penalty added by the artificial neural network learning apparatus 10 of FIG. 1. In the graph, sp denotes the sphericalizing penalty, L1 denotes the L1-norm penalty, and log denotes the log penalty.

The artificial neural network learning apparatus 10 may add a penalty that increases according to the dimension n of the hidden vector. For example, n may be 2, 32, or 400.

In terms of geometric sparsity, the distance from the origin to a hidden vector can be taken as its geometric sparsity. Many different vectors lie at the same Euclidean distance, so the L1 penalty and the log penalty take a range of values at a given distance. They are therefore represented by their maximum and minimum values. For a Euclidean distance d in n dimensions, the L1 penalty ranges over $d \le \|h\|_1 \le \sqrt{n}\,d$, and the log penalty likewise ranges between a minimum and a maximum value.
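The L1 range follows from the standard norm inequality ||h||_2 <= ||h||_1 <= sqrt(n) * ||h||_2. The sketch below checks both ends for n = 4 and d = 2. Concentrating the magnitude on one axis attains the minimum, and spreading it evenly attains the maximum. The log penalty is not defined in this section, so it is not sketched.

```python
import math


def l1_norm(v):
    return sum(abs(x) for x in v)


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


n, d = 4, 2.0
axis_aligned = [d] + [0.0] * (n - 1)  # all magnitude on one coordinate
spread = [d / math.sqrt(n)] * n       # magnitude spread evenly over n coordinates
print(l2_norm(axis_aligned), l1_norm(axis_aligned))  # 2.0 2.0 (L1 = d, the minimum)
print(l2_norm(spread), l1_norm(spread))              # 2.0 4.0 (L1 = sqrt(n) * d, the maximum)
```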

Whereas the L1 penalty and the log penalty impose a relatively high penalty even near the origin, the sphericalizing penalty imposes a penalty close to zero near the origin.
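The contrast near the origin can be illustrated numerically. The sketch compares the L1 penalty with an assumed hinge form of the sphericalizing penalty for vectors spread evenly over n = 32 axes. The radius of 1.0 is a hypothetical choice. The L1 penalty is positive for any nonzero vector, while the hinge stays at zero until the vector leaves the sphere.

```python
import math


def l1_penalty(h):
    return sum(abs(x) for x in h)


def sp_penalty(h, radius):
    """Assumed hinge form of the sphericalizing penalty, distance capped at sqrt(n)."""
    dist = min(math.sqrt(sum(x * x for x in h)), math.sqrt(len(h)))
    return max(0.0, dist - radius)


n, radius = 32, 1.0  # radius is a hypothetical choice
for scale in (0.01, 0.1, 0.5):
    h = [scale] * n  # a vector spread evenly over all n axes
    print(scale, round(l1_penalty(h), 3), round(sp_penalty(h, radius), 3))
```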

By adding the sphericalizing penalty, the artificial neural network learning apparatus 10 can reduce the conflict between the cost and the error during optimization. Adding the sphericalizing penalty also allows the apparatus 10 to reduce conflicts among hidden vectors.

The performance of the embodiments described above is discussed below.

FIG. 6A is a graph illustrating the training error of the artificial neural network learning apparatus 10 shown in FIG. 1.

FIG. 6B is a graph illustrating the validation error of the artificial neural network learning apparatus 10 shown in FIG. 1.

FIGS. 6A and 6B show the average error at each epoch while the artificial neural network learning apparatus 10 trains for 100 epochs with optimal hyperparameters. They show the training and validation errors obtained when the L1, log, and sp penalty functions are added. They also show the training and validation errors of an LSTM artificial neural network trained without any penalty function.

In FIGS. 6A and 6B, the training error decreases rapidly as the epochs increase and the weights are updated. Compared with the LSTM without a penalty function, the training error is lower when a penalty function is applied.

Furthermore, among the apparatuses to which a penalty function is applied, the artificial neural network learning apparatus 10 with the sphericalizing penalty has the lowest training error and validation error of all the apparatuses shown in FIGS. 6A and 6B. That is, the sphericalizing penalty improves the training performance of the LSTM-based artificial neural network learning apparatus 10.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers. Examples include a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described. However, those skilled in the art will recognize that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these. It may configure the processing device to operate as desired, or may instruct the processing device independently or collectively. The software and/or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to it. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include:

- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks; and
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include machine code such as that produced by a compiler. They also include high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to limited embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results may still be achieved if the described techniques are performed in an order different from the described method. Likewise, components of the described systems, structures, devices, circuits, and the like may be coupled or combined in a form different from the described method, or be replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

1. An artificial neural network learning method, comprising:
generating, by an artificial neural network learning apparatus, hidden vectors in layers of an artificial neural network based on input information;
generating, by the artificial neural network learning apparatus, a cost function based on the hidden vectors; and
adding, by the artificial neural network learning apparatus, a sphericalizing penalty to the cost function by applying different penalties inside and outside a spherical surface in a hidden vector space.

2. The method of claim 1,
wherein the adding of the sphericalizing penalty comprises:
distinguishing the hidden vector space into a first hidden vector space and a second hidden vector space based on hidden vector sparsity; and
applying different penalties to the first hidden vector space and the second hidden vector space.

3. The method of claim 2,
wherein the distinguishing comprises:
generating an n-dimensional spherical surface centered at the origin in the hidden vector space;
setting the inside of the spherical surface as the first hidden vector space; and
setting the outside of the spherical surface as the second hidden vector space.

4. The method of claim 2,
wherein the applying of the different penalties comprises:
applying a penalty of zero to the first hidden vector space; and
applying, in the second hidden vector space, a penalty that increases linearly in proportion to the Euclidean distance from the origin.

5. The method of claim 4,
wherein the Euclidean distance is capped at the square root of the number of dimensions.

6. The method of claim 1,
wherein the adding of the penalty comprises:
adding the penalty to at least one of the layers that generate the hidden vectors.

7. The method of claim 1,
wherein the adding of the sphericalizing penalty comprises:
adding a penalty based on the following equation:
[Mathematical Expression]
where the symbols denote, in order: the penalty function; the set of all layers constituting the artificial neural network; each layer constituting the artificial neural network; the hidden vectors generated in that layer; a constant for each layer; and an overall constant.

8. The method of claim 7,
wherein the penalty function is determined based on the following equation:
[Mathematical Expression]
where the symbols denote, in order: the overall constant; the constant of each layer, with k denoting a tuning constant; and each layer constituting the artificial neural network.

9. An artificial neural network learning apparatus, comprising:
an input unit configured to receive input information; and
a controller configured to generate hidden vectors in layers of an artificial neural network based on the input information, generate a cost function based on the hidden vectors, and add to the cost function a sphericalizing penalty that applies different penalties inside and outside a spherical surface in a hidden vector space.

10. The apparatus of claim 9,
wherein the controller comprises:
a vector generation module configured to generate the hidden vectors in the layers of the artificial neural network based on the input information;
a cost function generation module configured to generate the cost function based on the hidden vectors; and
a penalty adding module configured to add, to the cost function, a sphericalizing penalty that applies different penalties inside and outside the spherical surface in the hidden vector space.

11. The apparatus of claim 10,
wherein the penalty adding module comprises:
a distinction module configured to distinguish the hidden vector space into a first hidden vector space and a second hidden vector space based on hidden vector sparsity; and
a penalty application module configured to apply different penalties to the first hidden vector space and the second hidden vector space.

12. The apparatus of claim 11,
wherein the distinction module comprises:
a sphericalizing module configured to generate an n-dimensional spherical surface centered at the origin in the hidden vector space; and
a space distribution module configured to set the inside of the spherical surface as the first hidden vector space and the outside of the spherical surface as the second hidden vector space.

13. The apparatus of claim 11,
wherein the penalty application module applies a penalty of zero to the first hidden vector space and, in the second hidden vector space, a penalty that increases linearly in proportion to the Euclidean distance from the origin.

14. The apparatus of claim 13,
wherein the Euclidean distance is capped at the square root of the number of dimensions.

15. The apparatus of claim 10,
wherein the penalty adding module adds the penalty to at least one of the layers that generate the hidden vectors.

16. The apparatus of claim 10,
wherein the penalty adding module adds a penalty based on the following equation:
[Mathematical Expression]
where the symbols denote, in order: the penalty function; the set of all layers constituting the artificial neural network; each layer constituting the artificial neural network; the hidden vectors generated in that layer; a constant for each layer; and an overall constant.

17. The apparatus of claim 16,
wherein the penalty function is determined based on the following equation:
[Mathematical Expression]
where the symbols denote, in order: the overall constant; the constant of each layer, with k denoting a tuning constant; and each layer constituting the artificial neural network.