KR20180120061A

KR20180120061A - Artificial neural network model learning method and deep learning system

Info

Publication number: KR20180120061A
Application number: KR1020170138261A
Authority: KR
Inventors: 김정희; 조아라
Original assignee: 김정희
Priority date: 2017-04-26
Filing date: 2017-10-24
Publication date: 2018-11-05
Also published as: KR102077804B1; KR20180120057A; KR102022776B1; KR102005628B1; KR20180120056A

Abstract

According to the present invention, disclosed are an artificial neural network model learning method and a deep learning system which allow a user to efficiently learn specific data without defined maximum and minimum values. According to one aspect of the present invention, the deep learning system comprises: an acquisition module which acquires at least one learning data used for learning an artificial neural network model wherein each learning data comprises M feature values corresponding to each of first to M^th features (M is an integer of two or more); and a learning module which learns the artificial neural network model by using the at least one learning data. The artificial neural network model comprises an input layer, a modification function layer, at least one inner product layer, and an output layer. The modification function layer includes K modification function nodes (K is an integer of two or more). A k^th modification function node (k is an arbitrary natural number which satisfies 1 <= k <= K) receives an M-dimensional vector corresponding to the learning data from the input layer, applies a specific k^th modification function to each feature value included in the input M-dimensional vector, generates an M-dimensional modified value vector, and outputs the generated M-dimensional modified value vector to each node of the uppermost inner product layer.

Description

{Artificial neural network model learning method and deep learning system}

The present invention relates to an artificial neural network model learning method and a deep learning system, and more particularly, to an artificial neural network model learning method and a deep learning system capable of efficiently learning specific data for which a minimum value and a maximum value are not defined.

To aid understanding of the invention, the basic theory of machine learning, traditional machine learning algorithms, and deep learning are described below.

1. Classification of Machine Learning

1.1 Supervised Learning

Supervised learning is a type of machine learning that derives an inference function for obtaining a desired value from training data, where the training data consists of input data and the corresponding correct answers (labels). A supervised learning algorithm analyzes the training data and produces a function that can make predictions; this function is called a classifier when the output values are discrete, or a regression function when the output values are continuous.

FIG. 1 is a diagram showing an example of supervised learning (classification).

For example, in FIG. 1, when a vector x is input, predicting "hamburger" using the inference function f(x) is called supervised learning.

1.1.1 Classification

Classification means that y_n takes discrete values when an inference function f: R^D → R is derived from the training data (x_n, y_n). In general, the training data consists of input objects x_n ∈ R^D and output values y_n ∈ R. Classification problems are divided into binary classification and multi-variable classification: in binary classification the output takes one of two kinds of values, whereas in multi-variable classification the output takes one of more than two kinds of values.

FIG. 2 is a diagram showing an example of classification.

1.1.2 Regression Inference

In a regression model, y_n takes continuous values when an inference function f: R^D → R is derived from the training data (x_n, y_n). In general, the training data consists of input objects x_n ∈ R^D and output values y_n ∈ R. A loss function representing the difference between the predicted value and the actual value is required, and for general problems the following squared loss function L is mainly used.
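A minimal sketch of a squared loss, assuming the common form in which the squared difference between the actual value y_n and the prediction f(x_n) is averaged over the samples. The function name `squared_loss` and the sample values are illustrative and do not come from the source.

```python
def squared_loss(y_true, y_pred):
    """Mean of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only: actual outputs y_n and predictions f(x_n).
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.5, 2.0]
print(squared_loss(actual, predicted))
```

A perfect prediction gives a loss of zero, and larger deviations are penalized quadratically.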

Regression problems are divided into linear and nonlinear problems; a linear function is used to infer the result for a linear problem, and a nonlinear function is used for a nonlinear problem.

FIG. 3 is a diagram showing examples of linear regression and nonlinear regression.

1.2 Unsupervised Learning

Unsupervised learning is a technique in which the model learns how to classify particular input patterns by learning the statistical structure of the input patterns. Unlike supervised learning and reinforcement learning, it receives neither a reward for predicted values nor the correct answer to the problem as input. Therefore, unsupervised learning is used to find structural characteristics of the input or relationships among different inputs, rather than to find an answer for the input.

FIG. 4 is a diagram showing an example of unsupervised learning.

For example, as shown in FIG. 4, unlabeled photographs of hamburgers are provided to the model as training data. Here f(x) is a model for unsupervised learning, and the model finds commonalities and sorts the photographs into groups: hamburgers with only a bun and a patty, hamburgers with various toppings, and hamburger sets.

1.2.1 Clustering

FIG. 5 is a diagram showing an example of clustering. As shown in FIG. 5, clustering finds similarities among the input values and groups the input values on the basis of those similarities.

Like classification, clustering is used to sort input values, but it differs in that it sorts inputs into groups that were not defined in advance. Representative examples of clustering are k-means clustering and EM clustering.

2. Traditional Machine Learning Algorithms

2.1 Classification Algorithms

Representative classification algorithms include k-NN (k-nearest neighbor), neural networks, and the SVM (support vector machine) [43]. This section briefly describes how the representative classification algorithms work.

2.1.1 k-Nearest Neighbor

k-nearest neighbor (hereinafter k-NN) is one of the simple yet effective classification learning algorithms. It finds the k training data points closest to the input data in terms of Euclidean distance (variants that use other kinds of metrics also exist) and holds a vote over the labels of those k points to infer the result for the input data. An example is as follows.

FIG. 6 is a diagram showing an example of k-nearest neighbor.

Suppose the training data is divided into a group of squares and a group of triangles. k-NN computes the distance between the value to be predicted (marked with a question mark in FIG. 6) and every training data point, extracts the k closest points, and then votes using the labels of those points. In the example above, when k is 3, the predicted value is the red triangle.
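The voting procedure described above can be sketched as follows. This is a minimal illustration assuming Euclidean distance and a simple majority vote; the coordinates, labels, and the function name `knn_predict` are made up for the example and are not taken from FIG. 6.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance and keep the k closest training points.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Illustrative training data: two groups, triangles and squares.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (5.0, 5.0), (6.0, 5.5), (5.5, 6.5)]
labels = ["triangle", "triangle", "triangle", "square", "square", "square"]
print(knn_predict(points, labels, (2.0, 2.0), k=3))
```

Because every query is compared with every training point, the cost of one prediction grows linearly with the size of the training set.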

An advantage of k-NN is that, because it infers the answer by measuring distance or similarity to the training data, it is likely to perform well when a large amount of training data is available. It is also relatively stable because it has fewer parameters than other algorithms.

On the other hand, it has the disadvantage of high computational cost because the input must be compared with all of the training data, and prediction is likely to fail when local clusters form. Choosing a good value for the parameter k has a large effect on performance.

2.1.2 Neural Network

Also called an artificial neural network, this refers to a structure that forms a network out of artificial neurons modeled on the neurons of a biological neural network.

The deep learning algorithms that have recently attracted attention are ultimately a kind of neural network as well, but they have far more hidden layers than traditional neural networks.

FIG. 7 is a diagram showing an example of a neural network, and FIG. 8 is a diagram showing the weighted-sum process (perceptron).

A neural network consists of a total of three layers: an input layer that receives input, a hidden layer that performs the actual learning (also called a black box), and an output layer that returns the result of the computation. Each node in FIG. 7 represents a neuron. When data is input, the weighted sum is computed as shown in FIG. 8, and the result is passed to the next node through an activation function. The weights take on their values for each node through learning.

Each neuron receives as input the output values of the preceding neurons multiplied by their weights, adds a bias, and computes the sum. Sigmoid, tanh, ReLU, and similar functions are used as the activation function.
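As a minimal sketch of the computation performed by a single neuron, the following code forms the weighted sum of the inputs plus a bias and passes it through each of the three activation functions named above. The input, weight, and bias values and the function names are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Illustrative values: outputs of three preceding neurons, their weights, and a bias.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1
for name, fn in [("sigmoid", sigmoid), ("tanh", math.tanh), ("relu", relu)]:
    print(name, neuron_output(x, w, b, fn))
```

Here the weighted sum is negative, so ReLU returns zero while sigmoid and tanh return values inside their respective output ranges.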

FIG. 9A is a diagram of the sigmoid activation function, FIG. 9B is a diagram of the tanh activation function, and FIG. 9C is a diagram of the ReLU activation function.

Learning in a neural network composed of many neurons proceeds in two stages: forward propagation and back propagation.

FIG. 10 is a diagram of the forward propagation process of a neural network, and FIG. 11 is a diagram of the back propagation process of a neural network.

In forward propagation, the weighted sum at each node is computed and passed through the activation function, and the result is delivered to the next node. The resulting output values are then used to calculate the loss as follows.

Based on the loss obtained in this way, the gradient of the loss computed in forward propagation is calculated in order to find the optimal point of the loss function. In other words, the amount of change needed to reduce the error is computed and then reflected back in each weight. The gradient is obtained using partial derivatives and the chain rule as follows.

Using the chain rule and partial derivatives in this way, the gradients with respect to each weight at the node are obtained, and these gradients are in turn used to correct the weights.

Here α denotes the learning rate of the optimizer function. The above steps are carried out for each perceptron in the direction opposite to the forward pass, taking the partial derivative with respect to each weight and correcting the weights. This whole process is repeated until the optimal point is approximated, thereby training the neural network.
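The forward and backward passes can be sketched for a single sigmoid neuron. This is a minimal illustration under stated assumptions: the loss is taken to be half the squared error, and each weight is corrected by the common rule w ← w − α · ∂L/∂w. The function name `train_step`, the initial weights, and the training sample are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights, bias, x, target, alpha):
    """One forward and backward pass for a sigmoid neuron with loss (out - target)^2 / 2."""
    # Forward propagation: weighted sum, activation, loss.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    out = sigmoid(z)
    loss = 0.5 * (out - target) ** 2
    # Back propagation via the chain rule: dL/dw_i = dL/dout * dout/dz * dz/dw_i.
    dloss_dout = out - target
    dout_dz = out * (1.0 - out)
    delta = dloss_dout * dout_dz
    # Weight correction scaled by the learning rate alpha.
    new_weights = [w - alpha * delta * xi for w, xi in zip(weights, x)]
    new_bias = bias - alpha * delta
    return new_weights, new_bias, loss


weights, bias = [0.1, -0.2], 0.0
losses = []
for _ in range(200):
    weights, bias, loss = train_step(weights, bias, [1.0, 2.0], 1.0, alpha=0.5)
    losses.append(loss)
print(losses[0], losses[-1])
```

Repeating the step drives the loss down, which corresponds to iterating until the optimal point is approximated.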

2.1.3 Support Vector Machine

The support vector machine (SVM) is widely used as a representative model for supervised binary classification. As its name suggests, the SVM assumes that the training data lies in a vector space. That is, the vector space is the space in which the training data is placed in a rectangular coordinate system, and its dimension is determined by the features of the data.

FIG. 12 depicts two-dimensional vectors with two feature values in a two-dimensional space. The goal of the SVM is to find a straight line y = w^T x + b that separates the two groups (red and blue in the figure). Here w is the normal vector perpendicular to the line, and b is a scalar constant; the line is translated in parallel according to the value of b.

A point in the blue region satisfies w^T x + b > 0, and a point in the red region satisfies w^T x + b < 0. The difficulty is that when an input vector x lies close to the boundary, it is hard to predict its position clearly, and the SVM addresses this problem. That is, the SVM looks for a linear expression that separates the two groups such that the two groups lie as far as possible from the line it represents. Finding this linear equation is the linear SVM.

With X1 a value in the red region and X2 a value in the blue region, the distance between the two straight lines, that is, the margin, is expressed as follows.
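A minimal sketch of the decision rule and the margin, assuming the canonical scaling in which the closest points of the two groups satisfy w^T x + b = +1 and w^T x + b = -1, so that the margin equals 2 / ||w||. That scaling, the values of w and b, and the function names are assumptions made for illustration.

```python
import math


def decision_value(w, b, x):
    """Value of w^T x + b for an input vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Blue region if w^T x + b > 0, red region otherwise."""
    return "blue" if decision_value(w, b, x) > 0 else "red"


def margin(w):
    """Distance between the lines w^T x + b = +1 and w^T x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Illustrative separating line with normal vector w and offset b.
w, b = [3.0, 4.0], -2.0
print(classify(w, b, [1.0, 1.0]))
print(classify(w, b, [0.0, 0.0]))
print(margin(w))
```

Since the margin is inversely proportional to ||w||, making ||w|| smaller widens the gap between the two groups.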

To maximize the margin, ||w|| must be minimized, so the task is posed as an optimization problem in which this minimum must be found. Ultimately, maximizing the margin while satisfying the above condition leads to a Lagrangian expression, and this optimization problem must be solved. Here a_i is a Lagrange multiplier, which is always greater than or equal to 0 in an optimization problem that seeks a minimum.

To solve this Lagrangian problem, the derivatives with respect to w and b are taken first and substituted into the original formula, which yields the following expression, called the dual Lagrangian.

Converting the above expression to a problem with N training data points gives a matrix, and substituting this matrix into the above formula once more gives the matrix form of the problem.

Solving the two expressions to find the maximum margin yields the boundary equation separating the two groups.
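For reference, the steps above match the standard hard-margin SVM derivation. The following is a sketch of that textbook formulation, assuming class labels y_i ∈ {-1, +1} and using a_i for the Lagrange multipliers as in the text; it is not a reproduction of the original equations.

```latex
% Primal problem: maximize the margin 2/||w|| by minimizing ||w||^2 / 2
\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^{2}
\quad\text{subject to}\quad y_i\,(w^{T}x_i + b) \ge 1,\qquad i = 1,\dots,N

% Lagrangian with multipliers a_i >= 0
L(w, b, a) = \frac{1}{2}\lVert w\rVert^{2}
 - \sum_{i=1}^{N} a_i\,\bigl[\,y_i\,(w^{T}x_i + b) - 1\,\bigr]

% Setting the derivatives with respect to w and b to zero
w = \sum_{i=1}^{N} a_i\,y_i\,x_i,
\qquad
\sum_{i=1}^{N} a_i\,y_i = 0

% Dual Lagrangian obtained by substituting back
\max_{a}\ \sum_{i=1}^{N} a_i
 - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} a_i\,a_j\,y_i\,y_j\,x_i^{T}x_j
\quad\text{subject to}\quad a_i \ge 0,\quad \sum_{i=1}^{N} a_i\,y_i = 0
```

In matrix form the double sum is a quadratic form in a whose matrix has entries y_i y_j x_i^T x_j, which is why the problem with N training points reduces to a quadratic program.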

2.1.4 Decision Tree

The decision tree is one of the simplest and most successful non-parametric supervised learning techniques among supervised classification learning algorithms. The goal of the algorithm is to build a model that predicts the value of a target variable by learning simple decision rules inferred from examples, and decision trees can be used for both regression and classification depending on the type of the dependent variable. The internal nodes of a decision tree represent a series of branching rules on the observations, and this series of splitting rules can be visualized as an easily understood abstract structure, which makes it easy to inspect. FIG. 13 is an example in which a car purchase is expressed as a decision tree. Each node corresponds to a conditional statement in a programming language, and the tree is traversed along the branch for each condition until a node representing the final result, based on all accumulated decisions, is reached. This example represents a decision tree that produces a decision of 'BUY' or 'DON'T BUY'.

The decision tree has the following three advantages.

First, it is easy to understand and interpret, and the decision tree can be visualized.

Second, it uses a white box model: if a given situation is observable in the model, the condition can be readily explained by Boolean logic. In contrast, results from black box models (e.g., artificial neural networks and deep learning) are very difficult to interpret.

Third, the model can be validated using statistical tests, so the reliability of the model can be assessed.

Algorithms for constructing a decision tree mainly use a top-down approach, and at each step the variable value that best splits the given data set is selected. Different algorithms have their own criteria for measuring the "goodness of a split", and these criteria usually measure the homogeneity of the target variable within the subsets.

The ID3 algorithm uses a greedy search to determine the attribute used for branching, creating nodes and building the tree. Search within the tree proceeds top-down (root → leaf). The sequence of building and traversing the tree is repeated until the classification result is sufficiently satisfactory.

The metric for selecting the attribute used to branch at each node is calculated using information gain and entropy.

The following is the algorithm for constructing the tree.

Algorithm 1. ID3 tree construction algorithm
Find the branching attribute of the root node and call it A.
Create a new descendant node for each branch of A.
Sort the examples into the corresponding leaf nodes.
If the examples are perfectly classified, stop; otherwise repeat the algorithm loop on the new leaf nodes.

The entropy function E is as follows.

The function argument S denotes the set of examples, p denotes the proportion of training examples of class +, and q = 1 - p denotes the proportion of examples of class -.

Using this, the information gain can also be obtained.

Here Gain denotes the expected reduction in entropy caused by sorting on A.
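A minimal sketch of both quantities, assuming the usual ID3 definitions: E(S) = -p log2 p - q log2 q, and Gain(S, A) = E(S) minus the size-weighted entropy of the subsets obtained by sorting S on attribute A. The function names and the four-example data set are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Expected reduction in entropy from partitioning `labels` by `attribute_values`."""
    total = len(labels)
    partitions = {}
    for value, label in zip(attribute_values, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


# Illustrative examples: two of class + and two of class -.
labels = ["+", "+", "-", "-"]
print(entropy(labels))
print(information_gain(labels, ["a", "a", "b", "b"]))  # attribute separates the classes
print(information_gain(labels, ["a", "b", "a", "b"]))  # attribute carries no information
```

An attribute that separates the classes perfectly recovers the full entropy as gain, while an uninformative attribute yields a gain of zero, which is why the attribute with the highest gain is preferred for branching.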

FIG. 14 is a diagram showing an example of a classifier. In FIG. 14, of humidity and wind, humidity has the higher Gain and is therefore the better branching attribute. In this way, new nodes are added using the top-down approach until the classification result is sufficiently satisfactory.

FIG. 15 is a diagram showing an example of C4.5.

The C4.5 classification algorithm remedies several shortcomings of the ID3 algorithm. The problems of ID3 that C4.5 set out to address are as follows.

Handling continuous attributes - The ID3 algorithm only provides a way to build a tree from categorical attributes, so numerical attributes cannot be used to build the model. C4.5 proposes a method that can also use numerical attributes.

Binary splitting is performed to handle attributes with continuous values. The attribute values are first sorted, and then the Gain is calculated for every candidate split point of the attribute. Once the best split point (h) is selected, attribute A is split into A ≤ h and A > h. For example, for the glu attribute at the first node in FIG. 15, the point with the highest Gain was computed to be 123, and that point was automatically selected as the split point (h). In the same way, a split point (h) is automatically selected for each of bmi, age, dp, and ped, so that the label of the input data can finally be predicted.
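The split-point search can be sketched as follows. This is a minimal illustration that scores each candidate by information gain, as the text describes, and assumes candidates are placed at the midpoints between consecutive distinct sorted values. The glu values and labels below are invented for the example and are not the data behind FIG. 15.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split_point(values, labels):
    """Return (h, gain) maximizing information gain for the split A <= h versus A > h."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_h, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        h = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [label for value, label in pairs if value <= h]
        right = [label for value, label in pairs if value > h]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_h, best_gain = h, gain
    return best_h, best_gain


# Hypothetical attribute values and class labels.
glu = [85, 90, 110, 120, 126, 140, 155, 170]
label = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]
print(best_split_point(glu, label))
```

Sorting first means only the boundaries between neighboring values need to be examined, rather than every possible real-valued threshold.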

Tree depth problem - When a tree model is built with the ID3 algorithm, the tree can become too deep. The C4.5 algorithm limits the depth to solve this problem.

Handling missing values - This concerns data in which the value of a particular attribute has not been entered for some records; a missing value is generally marked with "?". Handling missing values includes imputation, which means that when an important feature is missing it can be estimated from the available data. Distribution-based imputation is performed by splitting an example into multiple instances, each with a different value for the missing feature. Each instance is assigned a weight corresponding to the estimated probability of that particular missing value, and the maximum value of a weight is 1.

The CART (Classification and Regression Tree) classification algorithm works in a manner similar to C4.5, with the following differences.

- The number of child nodes is limited to two

- Gini impurity is used instead of entropy to compute the information gain
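A minimal sketch of the Gini-based criterion, assuming the usual definition of Gini impurity as one minus the sum of squared class proportions, applied to a binary split as in CART. The function names and example labels are illustrative.

```python
from collections import Counter


def gini_impurity(labels):
    """One minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def gini_gain(parent, left, right):
    """Reduction in Gini impurity from a binary split of `parent` into `left` and `right`."""
    n = len(parent)
    weighted = len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Illustrative node with two examples of each class, split into two pure children.
parent = ["+", "+", "-", "-"]
print(gini_impurity(parent))
print(gini_gain(parent, ["+", "+"], ["-", "-"]))
```

Unlike entropy, this measure needs no logarithm, and the two-child restriction means every split is evaluated as a left and right pair.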

2.1.5 Random Forest

Random forest is a machine learning algorithm that trains multiple decision trees (DTs) using the bagging technique.

The bias-variance trade-off refers to the way learning errors behave like a kind of zero-sum game in machine learning algorithms: reducing bias increases variance. Bagging is used as a way to address this.

Bagging is short for bootstrap aggregation and is a method used to overcome the bias-variance trade-off. Here bias and variance are the two components of learning error. When bias is high, predictions are often inaccurate compared with the actual results; when variance is high, predictions perform well on some training examples but much worse on others, so the stability of the predictions suffers.

Bagging therefore addresses the bias-variance trade-off by randomly sampling subsets from the given training examples N times to build N prediction models, and determining the prediction by voting over the results of the individual models.

Training a random forest through bagging consists of the following three main steps.

Algorithm 2. Random forest training process through bagging
Generate N sets of training examples using the bootstrap method.
Train N DTs.
Combine the DTs into a single classifier (ensemble), predicting the result by averaging or by majority vote.
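The three steps can be sketched as follows. To keep the example short, a one-feature threshold classifier (`train_stump`, a hypothetical stand-in) takes the place of a full decision tree; the data set, labels, and all function names are illustrative assumptions.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Step 1: draw len(data) examples from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Step 2 (stand-in for a DT): threshold classifier with the fewest training errors."""
    best = None
    for x, _ in sample:
        for low, high in (("a", "b"), ("b", "a")):
            errors = sum((low if xi <= x else high) != yi for xi, yi in sample)
            if best is None or errors < best[0]:
                best = (errors, x, low, high)
    _, threshold, low, high = best
    return lambda value: low if value <= threshold else high


def bagging_fit(data, n_models, seed=0):
    """Train one model per bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, value):
    """Step 3: combine the models by majority vote."""
    votes = Counter(model(value) for model in models)
    return votes.most_common(1)[0][0]


# Illustrative one-dimensional data with two classes.
data = [(1.0, "a"), (2.0, "a"), (3.0, "a"), (7.0, "b"), (8.0, "b"), (9.0, "b")]
models = bagging_fit(data, n_models=25)
print(bagging_predict(models, 1.5), bagging_predict(models, 8.5))
```

Individual models trained on unlucky bootstrap samples may err, but the vote across many models smooths out those errors.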

Because a DT has small bias and large variance, a very deep DT suffers from overfitting. A single DT is very sensitive to noise in the training examples, but when the different DTs are not correlated with one another, averaging several DTs reduces the sensitivity to noise. However, training on the same training examples makes the DTs highly correlated, so different random training examples are used to reduce the correlation among the DTs. The random forest algorithm is as follows.

Algorithm 3. Random Forest
Precondition: A training set S := (x_1, y_1), ..., (x_n, y_n), features F, and number of trees in forest B.
function RANDOMFOREST(S, F)
    H ← ∅
    for i ∈ 1, ..., B do
        S^(i) ← A bootstrap sample from S
        h_i ← RANDOMIZEDTREELEARN(S^(i), F)
        H ← H ∪ {h_i}
    end for
    return H
end function
function RANDOMIZEDTREELEARN(S, F)
    At each node:
        f ← very small subset of F
        Split on best feature in f
    return The learned tree
end function

There are two main differences between a DT and a random forest. First, a random forest trains multiple DTs by repeatedly sampling the training examples with the bootstrap. Second, only m randomly selected variables are considered for each split.

The out-of-bag (OOB) error is used to estimate the generalization error. The OOB error is obtained by training each DT on its own training data, predicting the result of each DT using test data (that is, the examples left out of that DT's bootstrap sample), voting over those predictions to estimate the final result, and measuring the error against the actual value. OOB is important because the work of Breiman [1996b] on error estimation for bagged classifiers gave empirical evidence that the OOB estimate is as accurate as validation with a test set of the same size as the training set.

To measure the importance of the j-th feature after training, the OOB error of the random forest is computed using data that excludes the j-th feature. For the importance score of the j-th variable, computed over all trees, a feature that produces a large OOB error relative to the original data set is ranked as more important than a feature that produces a small one.

2.1.6 Gradient Boosted Tree

Gradient Boosted Tree에서 Gradient Boosted의 의미는 'optimizer function에 적합한'으로 해석 할 수 있다. 여기서 Gradient는 Gradient Descent Optimizer이며, Gradient Boosted Tree란, Gradient Descent Optimizer를 이용해 예측 값과 실제 값의 오차를 줄여 나가며 학습하는 모델이다. 먼저 Objective function 을 정의하면 다음과 같다.In Gradient Boosted Tree, the meaning of Gradient Boosted can be interpreted as 'suitable for optimizer function'. Here, Gradient is a Gradient Descent Optimizer, and Gradient Boosted Tree is a model that learns by reducing errors between predicted and actual values using Gradient Descent Optimizer. First, define the objective function as follows.

여기서

은 loss function이며,

는 regularization,

는 다음과 같다.here

Is a loss function,

Regularization,

Is as follows.

where,

path id in the structure of

's tree,

weightwhere,

path id in the structure of

's tree,

weight

여기에서 모델이 배워야 할 것을 고정시키고, 다음의 단계를 통해 한 단계당 하나의 나무만 새로 추가하는 규칙을 이용해 다음과 같은 Hypothesis function을 구할 수 있다.Here you can find the Hypothesis function using the rule that the model has to learn what you need to learn and then add only one tree per step through the following steps.

where,

and

space of all possible decision trees.where,

and

space of all possible decision trees.

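When the logistic loss is used as the loss function of the objective function, it takes the following form.

$$l(y_i, \hat{y}_i) = y_i \ln\left(1 + e^{-\hat{y}_i}\right) + (1 - y_i)\ln\left(1 + e^{\hat{y}_i}\right)$$

Because the rule described above serves as the hypothesis here, we can write $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$. Assuming that $\hat{y}^{(t-1)}$ is the already-trained model and $f_t$ is the model to be newly built, the process is as follows.

$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n}\left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + \text{const}$$

where $g_i$ and $h_i$ are the first and second derivatives of $l$ with respect to $\hat{y}_i^{(t-1)}$. For the logistic loss, $g_i = p_i - y_i$ and $h_i = p_i(1 - p_i)$ with $p_i = 1 / (1 + e^{-\hat{y}_i^{(t-1)}})$.

Substituting this result back into the objective function and dropping the constant terms gives the following.

$$\mathrm{Obj}^{(t)} = \sum_{j=1}^{T}\left[ G_j w_j + \frac{1}{2}\left(H_j + \lambda\right) w_j^2 \right] + \gamma T$$

where $G_j$ and $H_j$ are the sums of $g_i$ and $h_i$ over the examples on path $j$, $w_j$ is the weight of path $j$, $T$ is the number of paths (leaves), and $\lambda$ and $\gamma$ are the regularization coefficients of $\Omega$. Assuming the logistic loss mentioned above, the gain function is then computed as

$$\mathrm{Gain} = \frac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$$

Here, $\frac{G_L^2}{H_L + \lambda}$ is the score of the left branch, $\frac{G_R^2}{H_R + \lambda}$ is the score of the right branch, $\frac{(G_L + G_R)^2}{H_L + H_R + \lambda}$ is the score of the original branch, and $\gamma$ is the regularizer.

If the gain from the split, that is, the bracketed term, is smaller than $\gamma$, it is better not to add the new branch.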
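As a worked illustration of the split decision, the sketch below computes the first- and second-order gradient statistics of the logistic loss and the resulting gain. It assumes the standard second-order formulation of regularized boosted trees; the function names and the coefficients `lam` and `gamma` are illustrative and are not taken from the original.

```python
import math

def grad_hess_logistic(y, y_hat):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-y_hat))
    return p - y, p * (1.0 - p)

def leaf_score(G, H, lam):
    """Score of a branch holding gradient sum G and Hessian sum H."""
    return G * G / (H + lam)

def split_gain(left, right, lam=1.0, gamma=0.0):
    """Gain of splitting a node into `left` and `right` lists of (y, y_hat)."""
    def sums(rows):
        gh = [grad_hess_logistic(y, s) for y, s in rows]
        return sum(g for g, _ in gh), sum(h for _, h in gh)
    GL, HL = sums(left)
    GR, HR = sums(right)
    return 0.5 * (leaf_score(GL, HL, lam) + leaf_score(GR, HR, lam)
                  - leaf_score(GL + GR, HL + HR, lam)) - gamma

# The previous model predicts raw score 0 (p = 0.5) for every example.
left = [(1, 0.0), (1, 0.0)]    # two positives
right = [(0, 0.0), (0, 0.0)]   # two negatives
print(split_gain(left, right, lam=1.0, gamma=0.1))
```

A split that separates the classes yields a positive gain, while a split that leaves both sides equally mixed yields a gain of `-gamma`, so the branch would not be added.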

2.1.7 Ensemble Boosted Tree

An Ensemble Boosted Tree is an ensemble of the Gradient Boosted Trees described above. Because an ensemble requires low correlation among its member models, each model is given different training data. If the training data is scarce, resampling is used.

The outputs of the models obtained by training each Boosted Tree are combined by voting, as follows, to produce the final result.

$$H(x) = \underset{c}{\arg\max} \sum_{k=1}^{K} \mathbb{1}\left[h_k(x) = c\right]$$

where $c$ ranges over the class labels, $K$ is the total number of models, and $h_k$ is the $k$-th classifier.
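The voting step can be sketched as a simple majority vote. This is an illustrative sketch: the toy threshold classifiers stand in for trained boosted trees, and the tie-breaking rule is an assumption, since the text does not specify one.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties are broken in favour of the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(classifiers, x):
    """Combine K classifiers (callables mapping x to a label) by voting."""
    return majority_vote([h(x) for h in classifiers])

# Three toy threshold classifiers stand in for trained boosted trees.
classifiers = [lambda x: int(x > 0.3), lambda x: int(x > 0.5), lambda x: int(x > 0.7)]
print(ensemble_predict(classifiers, 0.6))
```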

2.2 Clustering

2.2.1 k-means Clustering

k-means clustering partitions n inputs into k clusters.

Each input is assigned to one of the $k$ clusters so that its Euclidean distance to the cluster's reference point is minimized. The reference point of cluster $S_i$ is $\mu_i$, $x$ is an input (data point), and training proceeds in the direction that minimizes the following.

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

k-means clustering is fast and algorithmically simple, but its performance degrades when a suitable value of k is not found, it is sensitive to outliers, and it does not separate clusters with different means well.
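A minimal sketch of the k-means iteration follows, assuming Lloyd's algorithm with initial centroids supplied by the caller (the text does not say how the reference points are initialized). The function name and the fixed iteration count are illustrative.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Because each centroid is a mean, a single extreme point can pull it far from the rest of its cluster, which is the outlier sensitivity noted above.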

2.2.2 Expectation-Maximization Clustering

EM (Expectation-Maximization) clustering is a method for finding the maximum likelihood or maximum a posteriori estimate of an unknown parameter $\theta$.

Here, $L(\theta; X)$ is the likelihood function.

Using the properties of the logarithm and the definition of the likelihood function, the expression can be rewritten as the right-hand side of Equation (1). (It is expanded in terms of the distribution of Z because the distribution being sought is the one used for clustering.)

Rather than increasing the likelihood function directly, one of the terms that make up the likelihood function is increased in order to find the maximum.

At an arbitrary step $t$, given the parameter estimate $\theta^{(t)}$, the expected value of the log likelihood is computed with respect to the conditional distribution of $Z$ given $X$ and $\theta^{(t)}$.

Training proceeds in the direction that maximizes this expectation. However, because the maximum likelihood estimate is overly sensitive to the observed inputs, the maximum a posteriori method is sometimes used instead.
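For reference, the two steps of the iteration are commonly written as below. This is the textbook form of EM, stated under the assumption that $X$ denotes the observed data, $Z$ the latent cluster assignments, and $\theta$ the parameters; the symbols are assumptions rather than the original notation.

```latex
\begin{aligned}
\text{E-step:}\quad & Q\left(\theta \mid \theta^{(t)}\right)
  = \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\left[\log L(\theta; X, Z)\right] \\
\text{M-step:}\quad & \theta^{(t+1)}
  = \underset{\theta}{\arg\max}\; Q\left(\theta \mid \theta^{(t)}\right)
\end{aligned}
```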

3. Deep Learning

FIG. 16 is a diagram of an example of a Convolutional Neural Network (AlexNet).

3.1 Convolution

Convolution is an operation used mainly in the field of signal processing, and is expressed as follows.

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$

In a neural network, convolution is the operation used to learn filters that extract suitable features from a given matrix or vector. It is used mainly in image and speech recognition.
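For sampled data the operation becomes a sum over shifted products. The sketch below shows a one-dimensional discrete convolution in plain Python; the function name is illustrative, and the kernel values are fixed by hand here, whereas a convolution layer would learn them.

```python
def convolve1d(signal, kernel):
    """Full discrete convolution: out[n] = sum_k signal[k] * kernel[n - k]."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

print(convolve1d([1, 2, 3], [0, 1, 0.5]))
```

Most deep learning frameworks actually compute the unflipped variant (cross-correlation) in their convolution layers; since the filter is learned, the distinction does not affect what the layer can represent.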

3.2 Local Connectivity

When dealing with high-dimensional inputs, connecting a neuron to every neuron of the previous volume is impractical in terms of memory and computation, so each neuron is connected only to a local region of the input volume. The spatial extent of this connection is a hyperparameter called the receptive field of the neuron. The extent of the connection along the depth axis is always equal to the depth of the input volume. This asymmetry in how the spatial dimensions (width and height) and the depth dimension are treated is worth emphasizing: connections are local in space (along width and height) but always span the full depth of the input volume.

3.3 Shared Weights

Example:

Suppose the input is an image of size [227x227x3] and the first convolution layer uses neurons with receptive field size F = 11, stride S = 4, and zero padding P = 0. With a convolution layer depth of K = 96 and (227 - 11) / 4 + 1 = 55, the output volume of the convolution layer has size [55x55x96]. Each of the 55 * 55 * 96 neurons in this volume is connected to a region of size [11x11x3] in the input volume. Moreover, all 96 neurons in each depth column are connected to the same [11x11x3] region of the input, but each is connected through different weights and is trained separately.

Weight sharing is used in convolution layers to control the number of parameters. In the example above, the first convolution layer has 55 * 55 * 96 = 290,400 neurons, each with 11 * 11 * 3 = 363 weights and 1 bias. Without sharing, this adds up to 290,400 * 364 = 105,705,600 parameters in the first layer of the convolutional network alone, which is very costly. The number of parameters can be reduced dramatically under one assumption: if a feature is useful to compute at some spatial position (x, y), it is also useful to compute at another position (x2, y2). In other words, given [55 * 55 * 96] neurons, the weights are shared across the [55 * 55] spatial positions and only the depth dimension is learned separately, which reduces the computational cost.
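The arithmetic of this example can be checked with a short calculation. The helper name `conv_output_size` is illustrative; the shared-weight count assumes one filter (363 weights plus 1 bias) per depth slice, which is how the sharing described above is usually realized.

```python
def conv_output_size(w, f, s, p):
    """Spatial output size for input width w, filter f, stride s, padding p."""
    assert (w - f + 2 * p) % s == 0, "filter does not tile the input evenly"
    return (w - f + 2 * p) // s + 1

W, F, S, P, K, DEPTH_IN = 227, 11, 4, 0, 96, 3
out = conv_output_size(W, F, S, P)        # spatial size of the output volume
neurons = out * out * K                   # neurons in the output volume
per_neuron = F * F * DEPTH_IN + 1         # 363 weights + 1 bias
unshared = neurons * per_neuron           # every neuron has its own parameters
shared = K * per_neuron                   # one filter per depth slice
print(out, neurons, unshared, shared)
```

Under that assumption the first layer needs 96 * 364 = 34,944 parameters instead of 105,705,600.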

3.4 Pooling Layers

The function of pooling is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network and helps prevent overfitting. A pooling layer operates independently on every depth slice of its input and reduces it spatially by taking the maximum or the mean value within a filter of the specified size. The most common form is a pooling layer with a filter of size 2x2, which downsamples every depth slice of the input by 2 along both width and height and thereby discards 75% of the activations. Besides max pooling, pooling units can also perform other functions such as mean pooling or L2-norm pooling.

FIG. 17 is a diagram of an example of max pooling.

The pooling layer of FIG. 17 spatially downsamples the volume, independently for each depth slice of the input volume.

In the example on the left of FIG. 17, an input volume of size [224x224x64] is pooled with filter size 2 into an output volume of size [112x112x64]: the width and height are halved while the depth is preserved. The right of FIG. 17 shows an example of max pooling, the most common downsampling operation.
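The 2x2 max pooling described above can be sketched for a single depth slice as follows. The stride of 2 is an assumption consistent with the halving of width and height; the function name and the sample values are illustrative.

```python
def max_pool_2x2(slice2d):
    """2x2 max pooling with stride 2 on one depth slice (list of equal rows)."""
    h, w = len(slice2d), len(slice2d[0])
    return [
        [max(slice2d[r][c], slice2d[r][c + 1],
             slice2d[r + 1][c], slice2d[r + 1][c + 1])
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]

depth_slice = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool_2x2(depth_slice))
```

Each 2x2 window keeps one of its four values, which is where the 75% reduction comes from; a full pooling layer would apply the same function to every depth slice.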

3.5 ReLU

FIG. 18 is a diagram of an example of the ReLU function.

The Rectified Linear Unit computes the function $f(x) = \max(0, x)$. In other words, the activation is simply thresholded at zero, as shown in FIG. 18. Using ReLU has several advantages and disadvantages.

The first advantage is that, according to empirical results, ReLU greatly accelerates the convergence of stochastic gradient descent compared with the sigmoid and tanh functions, which is attributed to its linear, non-saturating form. The second is that, whereas tanh and sigmoid neurons involve expensive operations, ReLU can be implemented by simply thresholding a matrix of activations at zero.

On the other hand, the disadvantage is that when a large gradient flows through a ReLU unit during training, the weights may be updated in such a way that the neuron never activates again on any data point, after which it can no longer be trained. For example, if the learning rate is set too high, as much as about 40% of the network may stop being updated; this problem can be mitigated by setting the learning rate appropriately.

3.6 Inner Product

The inner product can be defined algebraically or geometrically. The geometric definition is based on the notions of angle and distance (the magnitude of a vector). An inner product layer has the same form as a hidden layer in a neural network.

3.7 Softmax

Softmax regression (or multinomial logistic regression) is a method for solving multi-class classification. Whereas logistic regression generally assumes binary labels, $y \in \{0, 1\}$, softmax regression makes it possible to predict among $K$ classes, $y \in \{1, \dots, K\}$. The softmax function is as follows.

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K$$

In information theory, the cross-entropy between the actual data distribution $p$ and the estimated distribution $q$ is defined as follows.

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$

Thus, the softmax classifier minimizes the cross-entropy between the estimated class probabilities and the actual distribution.
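The relationship between the softmax output and the cross-entropy can be sketched as follows. The max-subtraction in `softmax` is a common numerical-stability measure added here as an assumption; it does not change the result. The score values are arbitrary.

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x); terms with p(x) = 0 contribute nothing."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

q = softmax([2.0, 1.0, 0.1])   # estimated class probabilities
p = [1.0, 0.0, 0.0]            # actual distribution: class 1 is correct
print(q, cross_entropy(p, q))
```

When the actual distribution is one-hot, the cross-entropy reduces to the negative log probability assigned to the correct class.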

3.8 Dropout

Dropout is a simple and effective regularization technique that complements other regularization methods such as L1, L2, and max-norm, and helps prevent overfitting in neural networks. During training, dropout keeps each neuron active with some probability and otherwise deactivates it, cutting its connections to other nodes, so that the information carried across all nodes is regularized. During testing, however, all neurons are kept active. FIG. 19 is a diagram of an example of dropout.
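A minimal sketch of dropout on a list of activations follows. It uses the "inverted dropout" variant, in which kept units are rescaled during training; the rescaling is an assumption, since the text only states that units are kept or dropped with some probability and that all units are active at test time.

```python
import random

def dropout(activations, keep_prob, training, rng=random):
    """Inverted dropout: keep each unit with probability keep_prob while training.

    Kept units are scaled by 1 / keep_prob so the expected activation is unchanged;
    at test time every unit stays active and the input is returned as is.
    """
    if not training:
        return list(activations)
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(dropout([1.0, 1.0, 1.0, 1.0], keep_prob=0.5, training=True, rng=rng))
print(dropout([1.0, 1.0, 1.0, 1.0], keep_prob=0.5, training=False))
```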

3.9 Finetuning

Fine-tuning is a technique for quickly adjusting the weights of some layers by reusing weights that have already been trained. When data is scarce, the training effect can be maximized by initializing the model with weights trained on another, similar data set. Fine-tuning in this way makes it possible to obtain good results on a specialized problem starting from a general one.

Unlike the data used in signal processing, specific data for which a minimum value and a maximum value are not defined (for example, financial statement data) may take values too large to be fed directly into known machine learning techniques. The loss is consequently also large, and the same problem inevitably appears in conventional deep learning.

Accordingly, the problem to be solved by the present invention is to provide an artificial neural network model capable of efficiently learning specific data for which a minimum value and a maximum value are not defined (for example, financial statement data), and a machine learning method using the model.

According to one aspect of the present invention, there is provided a deep learning system comprising: an acquisition module that acquires at least one piece of learning data used to train an artificial neural network model, wherein each piece of learning data comprises M feature values corresponding respectively to first to M-th features (M is an integer of 2 or more); and a learning module that trains the artificial neural network model using the at least one piece of learning data, wherein the artificial neural network model comprises an input layer, a modification function layer, at least one inner product layer, and an output layer, the modification function layer comprises K modification function nodes (K is a natural number of 2 or more), and a k-th modification function node (k is an arbitrary natural number satisfying 1 <= k <= K) receives from the input layer an M-dimensional vector corresponding to the learning data, applies a predetermined k-th modification function to each feature value included in the received M-dimensional vector to generate an M-dimensional modified value vector, and outputs the generated M-dimensional modified value vector to each node of the uppermost inner product layer.

In one embodiment, each of the first to K-th modification functions may be a function having an S-shaped curve, or a function whose value increases and whose derivative decreases on the interval (0, ∞).

In one embodiment, each of the first to K-th modification functions may be a sigmoid function, a hyperbolic tangent function, or a function expressed by the following [Formula].

[Formula] tf(x) = log(α_h × x + β_h) (where h is each integer satisfying 1 <= h <= H (H is an integer of 1 or more), and α_h and β_h are predefined constants)
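The behaviour of the modification function layer can be sketched as follows: each of the K nodes applies its own function element-wise to the same M-dimensional input and emits an M-dimensional vector. This is an illustrative sketch only. The constants passed to `make_log_fn`, the sample feature values, and all function names are assumptions, and the way the K output vectors are consumed by the uppermost inner product layer is not shown.

```python
import math

def sigmoid(x):
    """Logistic function, written so that large |x| does not overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def make_log_fn(alpha, beta):
    """tf(x) = log(alpha * x + beta); requires alpha * x + beta > 0."""
    return lambda x: math.log(alpha * x + beta)

def modification_layer(features, functions):
    """Apply each of the K functions element-wise to the M-dimensional input.

    Returns K vectors of length M, one per modification function node.
    """
    return [[fn(x) for x in features] for fn in functions]

functions = [sigmoid, math.tanh, make_log_fn(1.0, 1.0)]   # K = 3
features = [0.0, 3.0, 1500.0]                              # M = 3, unbounded scale
for vec in modification_layer(features, functions):
    print(vec)
```

The saturating functions compress very large inputs toward 1, while the logarithm keeps growing slowly, so together the K nodes present the same feature at several scales.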

According to another aspect of the present invention, there is provided an artificial neural network model learning method comprising: an acquisition step of acquiring at least one piece of learning data used to train an artificial neural network model, wherein each piece of learning data comprises M feature values corresponding respectively to first to M-th features (M is an integer of 2 or more); and a learning step of training the artificial neural network model using the at least one piece of learning data, wherein the artificial neural network model comprises an input layer, a modification function layer, at least one inner product layer, and an output layer, the modification function layer comprises K modification function nodes (K is a natural number of 2 or more), and a k-th modification function node (k is an arbitrary natural number satisfying 1 <= k <= K) receives from the input layer an M-dimensional vector corresponding to the learning data, applies a predetermined k-th modification function to each feature value included in the received M-dimensional vector to generate an M-dimensional modified value vector, and outputs the generated M-dimensional modified value vector to each node of the uppermost inner product layer.

According to another aspect of the present invention, there is provided a computer program installed in a data processing apparatus to perform the above-described method.

According to another aspect of the present invention, there is provided a deep learning system comprising a processor and a memory storing a computer program, wherein the computer program, when executed by the processor, causes the deep learning system to perform the above-described method.

It is possible to provide an artificial neural network model capable of efficiently learning specific data for which a minimum value and a maximum value are not defined (for example, financial statement data), and a machine learning method using the model.

A brief description of each drawing is provided so that the drawings cited in the detailed description of the present invention may be more fully understood.
FIG. 1 is a diagram of an example of supervised learning (classification).
FIG. 2 is a diagram showing an example of classification.
FIG. 3 is a diagram of examples of linear regression and nonlinear regression.
FIG. 4 is a diagram of an example of unsupervised learning.
FIG. 5 is a diagram of an example of clustering.
FIG. 6 is a diagram of an example of K-Nearest Neighbor.
FIG. 7 is a diagram of an example of a neural network, and FIG. 8 is a diagram showing the weighted sum process (perceptron).
FIG. 9A is a diagram of the sigmoid activation function, FIG. 9B is a diagram of the tanh activation function, and FIG. 9C is a diagram of the ReLU activation function.
FIG. 10 is a diagram of the forward propagation process of a neural network, and FIG. 11 is a diagram of the back propagation process of a neural network.
FIG. 12 depicts a two-dimensional vector with two feature values in two-dimensional space.
FIG. 13 is an example of a car purchase expressed as a decision tree.
FIG. 14 is a diagram of an example of a classifier.
FIG. 15 is a diagram of an example of C4.5.
FIG. 16 is a diagram of an example of a Convolutional Neural Network (AlexNet).
FIG. 17 is a diagram of an example of max pooling.
FIG. 18 is a diagram of an example of the ReLU function.
FIG. 19 is a diagram of an example of dropout.
FIG. 20 is a diagram showing an example of a Normal Ensemble Model.
FIG. 21 is a diagram showing an example of a Regression Model.
FIG. 22 is a diagram showing an example of a Function Concatenate Model.
FIG. 23 is a diagram showing an example of an LSTM Model.
FIG. 24 is a block diagram showing the schematic structure of a deep learning system according to an embodiment of the present invention.
FIG. 25 is a diagram showing an example of basic learning data.
FIG. 26 is a diagram showing an example of an artificial neural network model according to an embodiment of the present invention.
FIG. 27 is a flowchart schematically illustrating an artificial neural network model learning method according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to specific embodiments, and it should be understood to cover all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention. In describing the present invention, detailed descriptions of related known technology are omitted where they are judged likely to obscure the gist of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

The terminology used in this application is used only to describe specific embodiments and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise.

In this specification, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, in this specification, when one component 'transmits' data to another component, the component may transmit the data directly to the other component or may transmit it to the other component via at least one further component. Conversely, when one component 'directly transmits' data to another component, the data is transmitted from the component to the other component without passing through any further component.

Hereinafter, the details of the research carried out to arrive at the claimed invention, and the algorithms proposed through it, are described first.

1. Characteristics of the Data

The main goal of this research is to enable financial statement data to be applied effectively to machine learning. The training data used in this research is therefore financial data such as amounts and ratios. Unlike signal data such as images and audio, which conventional deep learning mainly handles, its values span a very wide range, so it is strongly affected by data normalization.

2. Data Preprocessing Techniques

2.1 Data Normalization Techniques

Unlike the data used in signal processing, the financial statement data mainly handled in this research has no defined minimum and maximum values. Its values are too large to be fed directly into known machine learning techniques, and the loss is correspondingly large. As a result, training either fails entirely or takes a very long time. Applying data normalization is therefore essential.

Three normalization methods widely used in statistics are described below.

(1) Feature scaling (zero to one)

Feature scaling, the most commonly used data normalization technique, maps the data to the range 0 to 1 according to its minimum and maximum values.

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

Because this method maps every value into the range 0 to 1, it has the disadvantage of causing considerable information loss.

(2) z-score normalization

In this normalization method, the value of x is normalized according to the mean and standard deviation, using the following formula.

$$x' = \frac{x - \mu}{\sigma}$$

Here, $\mu$ is the mean of $x$ and $\sigma$ is its standard deviation.

This method is useful when the minimum and maximum values of the whole data set are unknown.

(3) Median and Median Absolute Deviation

The median and the median absolute deviation (MAD) are important measures of the variability of a univariate sample. The MAD is a measure of statistical dispersion and is more robust to outliers in the data than the standard deviation. The formula is as follows.

$$x' = \frac{x - \operatorname{median}(x)}{\mathrm{MAD}}, \qquad \mathrm{MAD} = \operatorname{median}\left(\lvert x - \operatorname{median}(x) \rvert\right)$$
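The three normalization methods can be compared side by side with a short sketch. The function names are illustrative, the population standard deviation is an assumption (the text does not say which estimator is used), and the sample values are arbitrary. A constant feature would cause a division by zero in all three and would need separate handling.

```python
import statistics

def min_max(xs):
    """Feature scaling to the range 0 to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Normalize by the mean and (population) standard deviation."""
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def median_mad(xs):
    """Normalize by the median and the median absolute deviation."""
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    return [(x - med) / mad for x in xs]

xs = [1.0, 2.0, 3.0, 4.0, 100.0]   # one extreme value, as in unbounded financial data
print(min_max(xs))
print(z_score(xs))
print(median_mad(xs))
```

With a single extreme value, feature scaling squeezes the other four values into a narrow band near 0, whereas the median and MAD keep them evenly spaced, which illustrates the robustness to outliers mentioned above.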

2.2 Automatic selection of a suitable normalization technique for each feature

Choosing the most suitable normalization algorithm for each individual feature takes a great deal of time and effort, and finding the optimal one is especially hard when there are very many kinds of features, as with financial statement data. One alternative is to apply several normalization techniques at once and train on all the resulting data together, but this is limited by computation and memory and also introduces noise data.

To overcome this problem, we propose an algorithm that automatically finds the most suitable normalization method for each individual feature. The algorithm is described below.

Algorithm 4. Selecting a suitable normalization technique for each feature
1. Apply each of the following formulas to the training data.

(Feature scaling)

(z-score normalization)

(Median and Median Absolute Deviation)
2. Apply the following function to each result of step 1 (a is each instance of the results of step 1).

3. Split the results of step 2 into the bad class and the good class according to the label.

4. Build a feature histogram for each of the classes obtained in step 3, and measure the Euclidean distance d between the histograms of the two groups.

5. For each feature, adopt the step-3 result with the largest d.

The above algorithm is applied to each feature individually.
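The steps of Algorithm 4 can be sketched for a single feature as follows. This is an illustration under stated assumptions, not the publication's implementation: the step-2 function is not recoverable from the text, so a sigmoid (`squash`) is used as a stand-in; label 0 is taken to mean the good class and label 1 the bad class; histograms are relative frequencies over 10 equal bins; and all function names are hypothetical.

```python
import math
from statistics import mean, median, pstdev


def min_max(xs):
    lo, span = min(xs), max(xs) - min(xs)
    return [(x - lo) / span if span else 0.0 for x in xs]


def z_score(xs):
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma if sigma else 0.0 for x in xs]


def mad_norm(xs):
    med = median(xs)
    mad = median([abs(x - med) for x in xs])
    return [(x - med) / mad if mad else 0.0 for x in xs]


def squash(a):
    """Stand-in for the step-2 function (assumed: numerically stable sigmoid)."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def histogram(values, bins=10):
    """Relative-frequency histogram over [0, 1], the output range of squash()."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    total = len(values) or 1
    return [c / total for c in counts]


def class_distance(values, labels, bins=10):
    """Steps 3 and 4: Euclidean distance d between the two class histograms."""
    good = [v for v, y in zip(values, labels) if y == 0]
    bad = [v for v, y in zip(values, labels) if y == 1]
    return math.dist(histogram(good, bins), histogram(bad, bins))


CANDIDATES = {"feature_scaling": min_max, "z_score": z_score, "mad": mad_norm}


def select_normalization(values, labels):
    """Steps 1 to 5 for one feature: return (best technique, d per technique)."""
    scores = {
        name: class_distance([squash(a) for a in normalize(values)], labels)
        for name, normalize in CANDIDATES.items()
    }
    return max(scores, key=scores.get), scores


best, scores = select_normalization(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], [0, 0, 0, 0, 1, 1, 1, 1]
)
print(best, scores)
```

Running `select_normalization` once per feature column yields a per-feature choice of normalization technique.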

2.3 Data filtering techniques

Some features of the learning data are noise features that actually hinder solving the binary classification problem. In addition, the label assigned to an item of learning data may have been derived from data other than that item (for example, with financial statement data, the label may not have been derived from the financial statement data itself), in which case the labeling can be regarded as incorrect. Such mislabeled data confuses training.

To address these problems, several data filtering algorithms are proposed, as follows.

2.3.1 Grouping similar companies using k-means clustering

The experiments in the present invention are restricted to the manufacturing industry, but even within it, financial statements take different forms when the sub-industry or company size differs, so training becomes biased toward whichever type of company is most numerous. Data filtering is therefore performed by subdividing the companies and training on each subgroup separately.

Using the k-means clustering technique described above, the companies are divided into the specified k groups, and the final result is obtained by running the learning algorithms only on the groups that are large enough to train on.

2.3.2 Removing meaningless features using histogram distance

Not every feature of the learning data is necessarily discriminative for the binary classification problem (for example, separating the good class from the bad class), and some features may even get in the way. Such data consume a lot of memory and computation time and interfere with training, so removing them makes training more efficient.

A method for automatically removing unnecessary features is therefore proposed, as follows.

Algorithm 5. Removing meaningless features using histogram distance
For each feature, compute the histograms of the good class and the bad class under the same conditions.
Measure the Euclidean distance between the two histograms.

Keep only the features whose Euclidean distance is at or above the threshold; treat those below it as noise features and remove them.
Feed only the features judged to be meaningful to the learning algorithm.
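A minimal sketch of this filtering step is given below. It assumes relative-frequency histograms over 10 equal bins spanning the combined value range of both classes (one way to satisfy "the same conditions"), label 0 for the good class and label 1 for the bad class, and a hypothetical threshold of 0.5; none of these values comes from the source.

```python
import math


def hist_distance(good, bad, bins=10):
    """Euclidean distance between class histograms built on shared bin edges."""
    lo, hi = min(good + bad), max(good + bad)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def hist(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        total = len(values) or 1
        return [c / total for c in counts]

    return math.dist(hist(good), hist(bad))


def filter_features(columns, labels, threshold, bins=10):
    """Keep only the features whose histogram distance is >= threshold."""
    kept = {}
    for name, values in columns.items():
        good = [v for v, y in zip(values, labels) if y == 0]
        bad = [v for v, y in zip(values, labels) if y == 1]
        if hist_distance(good, bad, bins) >= threshold:
            kept[name] = values
    return kept


# Hypothetical data: one feature separates the classes, the other does not.
columns = {
    "separating": [0.1, 0.2, 0.15, 0.9, 0.8, 0.95],
    "noise": [0.1, 0.9, 0.5, 0.1, 0.9, 0.5],
}
labels = [0, 0, 0, 1, 1, 1]
print(list(filter_features(columns, labels, threshold=0.5)))
```

In this toy data the "noise" feature has identical distributions in both classes, so its distance is 0 and it is dropped.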

2.3.3 Removing noise data using k-NN

Among the available instances, some bad-class instances may have been labeled on the basis of data other than the actual learning data. Judged by the data at hand they are normal, so they seriously impede training. Correct training therefore requires using only data whose labels were derived from the actual learning data. However, the information currently available does not reveal which instances are truly normal, so an algorithm that estimates this is needed. The present invention therefore labels the learning data that satisfy a predetermined criterion as the good class and the remaining data as the bad class (the criterion must guarantee that all data it assigns to the good class are normal), measures the similarity between the data labeled good and the data labeled bad, and removes data with high similarity as noise.

Algorithm 6. Removing noise data using k-NN
Assign label 0 to the instances that satisfy the predetermined criterion and label 1 to the rest.
Compute the distance between the instances with label 0 and the instances with label 1.

In the distance formula, the terms denote, in order, an indexed instance with label 0, an instance with label 1, and an indexed feature value.
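One possible reading of Algorithm 6 is sketched below. The distance formula itself is not recoverable from the text, so Euclidean distance is assumed; the neighbor count `k`, the `threshold`, and the rule "remove a label-1 instance whose mean distance to its k nearest label-0 instances is below the threshold" are all hypothetical choices made for illustration.

```python
import math


def remove_noisy_bad_instances(good, bad, k=3, threshold=0.5):
    """Split label-1 instances into (kept, removed) by closeness to label-0 data.

    good: feature vectors with label 0 (assumed to be reliably normal)
    bad:  feature vectors with label 1
    """
    if not good:
        return list(bad), []
    kept, removed = [], []
    for b in bad:
        nearest = sorted(math.dist(b, g) for g in good)[:k]
        mean_dist = sum(nearest) / len(nearest)
        # High similarity (small distance) to the good class: treat as noise.
        (removed if mean_dist < threshold else kept).append(b)
    return kept, removed


good = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
bad = [(0.05, 0.05), (5.0, 5.0)]
kept, removed = remove_noisy_bad_instances(good, bad)
print(kept, removed)
```

Here the instance (0.05, 0.05) sits inside the good-class cluster and is removed as noise, while (5.0, 5.0) is kept as a genuine bad-class instance.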

2.3.4 Removing noise data using Random Forest

We propose extracting only the features with high feature importance, as computed by the well-known Random Forest algorithm, and using them in the experiments. Because Random Forest selects data at random internally, the experimental results have high variance; to compensate, feature importance is extracted several times from the same data, and only the features that rank in the top 10 are merged to form the final feature set.
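The merging step can be sketched as follows, assuming the per-run feature importances have already been computed by some Random Forest implementation and are supplied as dictionaries. Interpreting "merge" as the union of the per-run top-N lists is an assumption, and the function name and sample values are hypothetical.

```python
def merge_top_features(importance_runs, top_n=10):
    """Union of the top-N features (by importance) across repeated runs."""
    selected = set()
    for importances in importance_runs:
        ranked = sorted(importances, key=importances.get, reverse=True)
        selected.update(ranked[:top_n])
    return sorted(selected)


# Hypothetical importances from two runs on the same data.
runs = [
    {"sales": 0.5, "assets": 0.3, "debt": 0.2},
    {"sales": 0.4, "assets": 0.1, "debt": 0.5},
]
print(merge_top_features(runs, top_n=2))
```

Because the ranking differs between runs, the merged set can contain more than N features.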

2.3.5 Removing noise data using SVM

Of the widely used feature selection methods, the three most intuitive are forward selection, backward selection, and stepwise selection. Forward selection first builds a classifier, using a given classification method, on each individual feature and selects the feature with the highest accuracy. It then builds a classifier for each model that pairs the already-selected features with one of the remaining unselected features, and keeps growing the best-performing feature combination. Features are added incrementally until the performance of the classification model based on the resulting feature combination stops improving (in general, with n candidate features, n iterations may be performed).

Backward selection first builds a classification model on all features and evaluates its performance. It then removes the individual features one at a time, rebuilds the classification model each time, and progressively removes, one by one, the feature that hurts performance the most, producing the combined feature set. Finally, stepwise selection mixes forward and backward selection. It first builds a classifier on the first feature chosen by forward selection. It then adds a second feature, again by the forward method, and uses backward selection to decide whether to remove the first, already-selected feature. In other words, forward selection is applied to the features not yet selected, adding them to the model one at a time, while features already included are removed by the backward method. In the present invention, backward selection is applied with an SVM to select the main features, which are then fed to the learning algorithm.
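Backward selection as described above can be sketched generically as follows. The `score` callable stands in for "train a classifier on this feature subset and return its performance"; with an SVM it would typically be a validation accuracy, but here a toy scoring function is used so that the example is self-contained. The stopping rule (stop when every removal lowers the score) and all names are assumptions.

```python
def backward_select(features, score):
    """Greedy backward selection.

    features: list of feature names
    score:    callable mapping a list of feature names to a performance value
    """
    current = list(features)
    best = score(current)
    while len(current) > 1:
        # Evaluate the model with each single feature removed in turn.
        trials = [(score([f for f in current if f != drop]), drop) for drop in current]
        trial_score, drop = max(trials, key=lambda t: t[0])
        if trial_score < best:
            break  # every removal hurts performance, so stop
        current.remove(drop)
        best = trial_score
    return current, best


def toy_score(subset):
    """Hypothetical score: reward two useful features, penalize subset size."""
    useful = {"sales", "assets"}
    return len(useful & set(subset)) - 0.1 * len(subset)


selected, final_score = backward_select(
    ["sales", "assets", "noise1", "noise2"], toy_score
)
print(selected, final_score)
```

In this toy run the two noise features are removed first, and the loop stops once removing either remaining feature would lower the score.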

2.4 Semantic data processing techniques

The methods described above extract only the meaningful features from those currently available; their aim is to pick out features that are important in their original form, without considering correlation between features. A deep learning algorithm computes weighted sums inside the network, so it can capture correlation to some extent, but it cannot perform multiplication or division on its own and therefore cannot generate ratio data automatically. Until now, such data have been produced by dividing all the data by sales, total assets, and tangible assets (quantities a person would judge to be relevant) and merging the results. If an algorithm could generate these automatically as well, unnecessary features would not be produced. This study therefore proposes a method that adapts the synthetic feature extraction technique based on the Ensemble Boosted Tree (hereinafter, 'the prior study').

2.4.1 Synthetic feature extraction using EBT

The prior study proposed an algorithm that extracts synthetic features using an Ensemble Boosted Tree. Its main idea is, for each feature, to randomly select one other feature and randomly select one of the four arithmetic operators, enlarge the feature dimension with the result and retrain, and then remove the features whose feature importance is at or below a threshold; this is repeated K times to discover relationships between features.

However, given the nature of the data handled in this study, the ratios to sales, total assets, and tangible assets have to be tried on all of the data, so a variant with the random element of the prior study removed was used: synthetic features were extracted by applying all four arithmetic operations to every possible feature combination. The details are as follows.

Algorithm 7. Synthetic feature extraction algorithm
Input:
    training set
    number of synthetic features
    number of base learners
    features acceptance threshold
Output:
    set of base learners
for each base learner do
    Train the base learner using the training set;
    Remove from the training set the features whose importance is at or below the acceptance threshold;
    Estimate the feature importances from the trained model;
    for each feature do
        for each other feature do
            Select the pair of features;
            Generate new features from the pair; operation from {+, -, *, /}
            Extend the training set with the new values of the generated features
        end
    end
end
return the set of base learners
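The exhaustive generation step (every feature pair combined with every arithmetic operation) can be sketched as follows. The training and importance-based pruning steps are omitted. Skipping the mirrored ordering for the commutative operations and mapping division by zero to 0.0 are conventions assumed here, and all names and values are hypothetical.

```python
import operator
from itertools import permutations


def safe_div(x, y):
    """Division that maps x / 0 to 0.0 (an assumed convention)."""
    return x / y if y != 0 else 0.0


OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": safe_div,
}


def synthetic_features(columns):
    """Combine every ordered feature pair with every operation in OPERATIONS."""
    new_columns = {}
    for a, b in permutations(columns, 2):
        for symbol, fn in OPERATIONS.items():
            if symbol in ("+", "*") and a > b:
                continue  # a+b equals b+a and a*b equals b*a: keep one ordering
            new_columns[f"{a}{symbol}{b}"] = [
                fn(x, y) for x, y in zip(columns[a], columns[b])
            ]
    return new_columns


columns = {"sales": [10.0, 20.0], "assets": [5.0, 0.0]}
for name, values in synthetic_features(columns).items():
    print(name, values)
```

For M original features this produces M(M-1) features each for subtraction and division and M(M-1)/2 each for addition and multiplication, so the importance-based pruning in Algorithm 7 matters in practice.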

3. Deep learning models

The following deep learning models suited to financial statement data are proposed below.

3.1 Normal Ensemble Model

That using an ensemble model raises performance to some degree has already been demonstrated many times in earlier inventions. An ensemble model is therefore proposed in which several deep learning models are trained at the same time and vote on the result. The structure is shown in FIG. 20, which illustrates an example of the Normal Ensemble Model.
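The voting step can be sketched as a simple majority vote. The trained deep learning models are represented here by plain callables that return a class label; the three threshold rules are hypothetical stand-ins, and majority voting is one assumed voting scheme among several possible ones.

```python
from collections import Counter


def ensemble_vote(models, x):
    """Return the class label predicted by the most models for input x."""
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained models: 1 = bad class, 0 = good class.
models = [
    lambda x: int(x > 0.3),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.7),
]
print(ensemble_vote(models, 0.6))
print(ensemble_vote(models, 0.4))
```

Using an odd number of models avoids ties in the binary case.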

3.2 Regression Model

The Normal Ensemble Model predicts either the good class or the bad class. The Regression Model refines this by estimating how bad an instance is as a value between 0 and 1. The model is shown in FIG. 21, which illustrates the Regression Model.

3.3 Function Concatenate Model

In the earlier models, a function such as sigmoid or tanh was chosen arbitrarily and applied to the normalized data before it was fed to the deep learning model, so that training would proceed well. However, the function that maximizes the training effect differs from feature to feature. The Function Concatenate Model was therefore proposed: it applies all of the functions and lets the deep learning model choose among them by itself during training.

The Function Concatenate Model is a deep learning model that improves training by automatically correcting the outliers of the various features of financial data.

The algorithm of the model is as follows.

In the Function Concatenate Model, the network can raise the weight on a particular feature by itself, and for especially important features it can also take in all of the functions to maximize their effect. The overall model is shown in FIG. 22, which illustrates an example of the Function Concatenate Model; this model also uses an ensemble technique to improve performance.

3.4 LSTM Model

Financial statement data are time series data, with information and a label entered every year, so an LSTM (Long Short-Term Memory) model as shown in FIG. 23 is proposed. An LSTM is a network that temporarily remembers earlier results to help predict the next data.

4. Conclusion

In this study, we first ran experiments that applied various data normalization methods and traditional machine learning techniques to the data, and on the basis of the results proposed an algorithm that automatically selects the optimal normalization method for each feature of the learning data.

We also proposed an algorithm that removes features carrying unnecessary information or noise.

When an additional tax assessment was imposed because of items other than the financial statement data at hand, the corresponding data should be labeled 'normal' before training. However, it is difficult to infer, one by one and from the available data alone, which companies were labeled on the basis of other information, so we proposed a technique that automatically detects and removes companies with incorrectly defined labels using only the current data.

In addition, we proposed a synthetic data processing algorithm for training the prediction model on more diverse forms of relational information between data, rather than only proportional and inversely proportional relationships.

Finally, we proposed a deep learning network modeled so that this data preprocessing can be handled automatically within the deep learning model.

Hereinafter, the present invention is described in detail through its embodiments, with reference to the accompanying drawings. Like reference numerals in the drawings denote like elements.

A deep learning system according to an embodiment of the present invention can train an artificial neural network model using a plurality of learning data.

FIG. 24 is a block diagram showing the schematic structure of a deep learning system 100 according to an embodiment of the present invention.

Referring to FIG. 24, the deep learning system 100 may include an acquisition module 110 and a control module 120. Depending on the embodiment, some of these components may not be essential to implementing the present invention, and the deep learning system 100 may of course include more components than these. For example, the deep learning system 100 may further include a preprocessing module (not shown) that performs the learning data preprocessing described above.

The deep learning system 100 may include the hardware resources and/or software needed to implement the technical idea of the present invention, and does not necessarily mean a single physical component or a single device. That is, the deep learning system 100 may mean a logical combination of hardware and/or software provided to implement the technical idea of the present invention and, where necessary, may be implemented as a set of logical components that are installed on devices separated from one another and each perform their own function. The deep learning system 100 may also mean a set of components implemented separately for each function or role needed to implement the technical idea of the present invention. The modules making up the deep learning system 100 may be located on different physical devices or on the same physical device. Depending on the implementation, the software and/or hardware constituting each module may likewise be located on different physical devices, with the components on those devices organically combined to realize the function each module performs.

In this specification, a module may mean a functional and structural combination of hardware for carrying out the technical idea of the present invention and software for driving that hardware. For example, a module may mean a logical unit of predetermined code and the hardware resources for executing that code; a person of ordinary skill in the art will readily understand that it does not necessarily mean physically connected code or a single kind of hardware.

The control module 120 may be connected to the other components of the deep learning system 100 (for example, the acquisition module 110) and control their functions and/or resources. The control module 120 may also control the deep learning system 100 so that it performs the artificial neural network training method described above.

The acquisition module 110 may acquire at least one item of learning data used to train the artificial neural network model. Each item of learning data may include M feature values corresponding respectively to the first through M-th features (M is an integer of 2 or more).

For example, the acquisition module 110 may read a file that stores N items of basic learning data. Alternatively, it may receive the N items of basic learning data through a predetermined input device, or receive them over a network.

FIG. 25 shows an example of N items of learning data (N is an integer of 2 or more), each consisting of M feature values and a label.

As shown in FIG. 25, each item of learning data D_1 to D_N may include M feature values and a label. For example, learning data D_1 may include M feature values V_11, V_12, V_13, …, V_1M and a label L_1, where the feature values V_11, V_12, V_13, …, V_1M correspond in order to features F_1, F_2, F_3, …, F_M. Likewise, learning data D_n may include M feature values V_n1, V_n2, V_n3, …, V_nM and a label L_n, where the feature values V_n1, V_n2, V_n3, …, V_nM correspond in order to features F_1, F_2, F_3, …, F_M.

Referring again to FIG. 24, the control module 120 may train the artificial neural network model using the at least one item of learning data.

FIG. 26 illustrates the structure of an artificial neural network model that the deep learning system 100 according to an embodiment of the present invention can train.

Referring to FIG. 26, the artificial neural network model may include an input layer 10, a modification function layer 20, at least one inner product layer 30-1 to 30-L, and an output layer 30.

The modification function layer 20 may include K modification function nodes (K is a natural number of 2 or more). The example of FIG. 26 shows four nodes, but the artificial neural network model may include more or fewer modification function nodes.

Every node in the modification function layer 20 can receive each item of learning data from the input layer 10. As described above, each item of learning data may consist of M feature values.

That is, the k-th modification function node (k is any natural number satisfying 1 <= k <= K) can receive from the input layer 10 an M-dimensional vector corresponding to the learning data.

The k-th modification function node can also apply a predetermined k-th modification function to each feature value in the input M-dimensional vector to generate an M-dimensional modified value vector. For example, if the k-th modification function is f_k(), then when the vector D_n = [V_n1, V_n2, V_n3, …, V_nM] is input from the input layer 10, the k-th modification function node can generate the modified value vector [f_k(V_n1), f_k(V_n2), f_k(V_n3), …, f_k(V_nM)].

As shown in FIG. 26, the k-th modification function node can also be configured to output the generated M-dimensional modified value vector to each node of the uppermost inner product layer 30-1.

The first through K-th modification functions may all be different functions.

The first through K-th modification functions may each be either a function with an S-shaped curve or a function whose value increases and whose derivative decreases on the interval (0, ∞). That is, each of them may be a function whose rate of increase becomes smaller as its argument grows.

For example, each of the first through K-th modification functions may be any one of the sigmoid function given by [Equation 1], the hyperbolic tangent function given by [Equation 2], and at least one tf(x) given by [Equation 3].

[Equation 1] 1/(1+e^(-x))

[Equation 2] tanh(x)

[Equation 3] tf(x) = log(α_h×x+β_h) (where h is each integer satisfying 1 <= h <= H, H is an integer of 1 or more, and α_h and β_h are predefined constants)

In a particular embodiment, α_h in tf(x) of [Equation 3] may be 10, 100, 1000, 10000, and so on, and β_h may be 1. That is, in this embodiment, the plurality of modification functions may include log(10×x+1), log(100×x+1), log(1000×x+1), log(10000×x+1), and so on.
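The modification function layer with these six functions can be sketched as follows. This is an illustrative forward pass only, with no trainable weights. The natural logarithm is assumed because the text does not state a base, inputs are assumed to be non-negative so that the logarithm is defined, and flattening the K output vectors into a single K×M input for the first inner product layer is one way to realize "output to each node of the uppermost inner product layer". All names are hypothetical.

```python
import math


def sigmoid(x):
    """[Equation 1], written in a numerically stable form."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def make_log_transform(alpha, beta=1.0):
    """[Equation 3]: tf(x) = log(alpha * x + beta), natural log assumed."""
    return lambda x: math.log(alpha * x + beta)


# K = 6 modification functions: sigmoid, tanh, and four log transforms.
MODIFICATION_FUNCTIONS = [sigmoid, math.tanh] + [
    make_log_transform(alpha) for alpha in (10000.0, 1000.0, 100.0, 10.0)
]


def modification_layer(vector):
    """Apply each f_k element-wise: K modified value vectors of dimension M."""
    return [[f(v) for v in vector] for f in MODIFICATION_FUNCTIONS]


def concatenate(vectors):
    """Flatten the K vectors into one K*M-dimensional input."""
    return [v for vec in vectors for v in vec]


d_n = [0.0, 0.5, 1.0]  # hypothetical M = 3 feature values, already scaled
modified = modification_layer(d_n)
print(len(modified), len(concatenate(modified)))
```

Each log transform compresses large values more or less aggressively depending on α_h, which lets the downstream weights favor whichever compression suits a given feature.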

한편, 적어도 하나의 이너 프로덕트 레이어(inner product layer; 30-1 내지 30-L)는 통상적인 뉴럴 네트워크 모델의 히든 레이어(hidden layer)와 동일한 성질을 가질 수 있다. 상기 적어도 하나의 이너 프로덕트 레이어(inner product layer; 30-1 내지 30-L) 각각은 가중치를 가질 수 있으며, 각 노드의 가중치는 학습이 진행됨에 따라 학습 데이터에 맞게 최적화될 수 있다. 각각의 이너 프로덕트 레이어는 적어도 하나의 노드를 포함할 수 있으며 상위 레이어에 포함된 노드 중 적어도 일부는 다음 레이어에 포함된 노드 중 적어도 일부와 연결되어 있을 수 있다.On the other hand, at least one inner product layer (30-1 to 30-L) may have the same properties as the hidden layer of a typical neural network model. Each of the at least one inner product layers 30-1 to 30-L may have a weight, and the weight of each node may be optimized according to learning data as learning proceeds. Each inner product layer may include at least one node, and at least some of the nodes included in the upper layer may be connected to at least some of the nodes included in the next layer.

도 26에서는 이해의 편의를 위해 제2 이너 프로덕트 레이어(30-2) 내지 제L이너 프로덕트 레이어(30-L)에는 노드를 도시하지 아니하였으나 실제 인공 신경망 모델에서는 각각의 이너 프로덕트 레이어에 적어도 하나의 노드가 포함되어 있다.Although nodes are not shown in the second inner product layer 30-2 to the Lth inner product layer 30-L for ease of understanding in FIG. 26, in an actual neural network model, at least one inner product layer Node.

도 22는 상기 변환 함수 레이어(20)에 6개의 변환 함수 노드가 포함되어 있으며, 변환 함수 노드가 차례로 sigmoid 함수, hyperbolic tangent 함수, log(10000×x+1), log(1000×x+1), log(100×x+1), log(10×x+1)를 수행하는 인공 신경망 모델의 예를 도시한 도면이다.22 shows that the transform function layer 20 includes six transform function nodes, and the transform function nodes are in turn a sigmoid function, a hyperbolic tangent function, log (10000 x x + 1), log (1000 x x + 1) , log (100 x x + 1), and log (10 x x + 1).

FIG. 27 is a flowchart illustrating an artificial neural network model learning method according to another embodiment of the present invention.

Referring to FIG. 27, the deep learning system 100 may acquire at least one piece of learning data used for learning an artificial neural network model (S100, S110). Here, each piece of learning data may include M feature values corresponding to the first to M-th features, respectively (M is an integer of 2 or more).

The deep learning system 100 may also learn the artificial neural network model using the at least one piece of learning data (S100, S120).
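
The two steps can be sketched end to end as follows. This is a simplified illustration, not the patented procedure: the data set, the target values, the use of only two transform functions, the single linear output node, and plain stochastic gradient descent on squared error are all assumptions made for the example, and every function name is hypothetical.

```python
import math

# K = 2 fixed transform functions (a reduced, hypothetical choice).
TRANSFORMS = [math.tanh, lambda x: math.log(10 * x + 1)]


def transform(features):
    """Flatten the K transformed M-dimensional vectors into one K*M list."""
    return [f(x) for f in TRANSFORMS for x in features]


def acquire_training_data():
    """S110 (hypothetical data): M = 2 non-negative features, scalar target."""
    samples = [[0.0, 1.0], [2.0, 50.0], [10.0, 300.0], [100.0, 5.0]]
    return [(s, math.log(10 * s[0] + 1) + 0.5 * math.tanh(s[1])) for s in samples]


def predict(weights, bias, features):
    z = transform(features)
    return sum(w * v for w, v in zip(weights, z)) + bias


def mean_squared_error(weights, bias, data):
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)


def learn(data, epochs=200, lr=0.01):
    """S120: stochastic gradient descent on squared error (an assumption)."""
    weights = [0.0] * (len(TRANSFORMS) * len(data[0][0]))
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            z = transform(x)
            err = sum(w * v for w, v in zip(weights, z)) + bias - y
            weights = [w - lr * err * v for w, v in zip(weights, z)]
            bias -= lr * err
    return weights, bias


data = acquire_training_data()
initial_error = mean_squared_error([0.0] * 4, 0.0, data)
weights, bias = learn(data)
final_error = mean_squared_error(weights, bias, data)
```

Only the weights and bias of the inner product node are updated here; the transform functions stay fixed, which matches the description of them as predefined functions.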

Meanwhile, depending on the implementation, the deep learning system 100 may include a processor and a memory that stores a program executed by the processor. The processor may include a CPU, GPU, MCU, microprocessor, or the like, and may include a single-core CPU or a multi-core CPU. The memory may include volatile memory and non-volatile memory. The memory may include, for example, flash memory, ROM, RAM, EEROM, EPROM, EEPROM, a hard disk, and registers. Alternatively, the memory may include a file system, a database, or an embedded database. Access to the memory by the processor and other components may be controlled by a memory controller.

Meanwhile, the method according to an embodiment of the present invention may be implemented in the form of computer-readable program instructions and stored in a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system.

The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the software field.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The computer-readable recording medium may also be distributed over network-connected computer systems so that computer-readable code is stored and executed in a distributed manner.

Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed, using an interpreter or the like, by a device that processes information electronically, for example a computer.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The foregoing description of the present invention is illustrative, and a person of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing the technical idea or essential features of the present invention.

Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

A deep learning system, comprising:
an acquisition module for acquiring at least one piece of learning data used for learning an artificial neural network model, wherein each piece of learning data includes M feature values corresponding to the first to M-th features, respectively (M is an integer of 2 or more); and
a control module for learning the artificial neural network model using the at least one piece of learning data,
wherein the artificial neural network model comprises an input layer, a transform function layer, at least one inner product layer, and an output layer,
wherein the transform function layer comprises K transform function nodes (K is a natural number of 2 or more),
wherein the k-th transform function node (k is an arbitrary natural number satisfying 1 <= k <= K) receives an M-dimensional vector corresponding to the learning data from the input layer, generates an M-dimensional transformed value vector by applying a predetermined k-th transform function to each feature value included in the input M-dimensional vector, and outputs the generated M-dimensional transformed value vector to each node of the uppermost inner product layer, and
wherein each of the first to K-th transform functions is a function having an S-shaped curve or a function whose value increases and whose derivative decreases on the interval (0, ∞).

The deep learning system according to claim 1,
wherein each of the first to K-th transform functions is any one of a sigmoid function, a hyperbolic tangent function, and at least one tf(x) represented by the following formula:
[Formula] tf(x) = log(α_h × x + β_h) (where h is each integer satisfying 1 <= h <= H (H is an integer of 1 or more), and α_h and β_h are predefined values).

A deep learning system, comprising:
an acquisition module for acquiring at least one piece of learning data used for learning an artificial neural network model, wherein each piece of learning data includes M feature values corresponding to the first to M-th features, respectively (M is an integer of 2 or more); and
a control module for learning the artificial neural network model using the at least one piece of learning data,
wherein the artificial neural network model comprises an input layer, a transform function layer, at least one inner product layer, and an output layer,
wherein the transform function layer comprises K transform function nodes (K is a natural number of 2 or more),
wherein the k-th transform function node (k is an arbitrary natural number satisfying 1 <= k <= K) receives an M-dimensional vector corresponding to the learning data from the input layer, generates an M-dimensional transformed value vector by applying a predetermined k-th transform function to each feature value included in the input M-dimensional vector, and outputs the generated M-dimensional transformed value vector to each node of the uppermost inner product layer, and
wherein each of the first to K-th transform functions is any one of a sigmoid function, a hyperbolic tangent function, and at least one tf(x) represented by the following formula:
[Formula] tf(x) = log(α_h × x + β_h) (where h is each integer satisfying 1 <= h <= H (H is an integer of 1 or more), and α_h and β_h are predefined values).

An artificial neural network model learning method, comprising:
an acquisition step of acquiring at least one piece of learning data used for learning an artificial neural network model, wherein each piece of learning data includes M feature values corresponding to the first to M-th features, respectively (M is an integer of 2 or more); and
a learning step of learning the artificial neural network model using the at least one piece of learning data,
wherein the artificial neural network model comprises an input layer, a transform function layer, at least one inner product layer, and an output layer,
wherein the transform function layer comprises K transform function nodes (K is a natural number of 2 or more),
wherein the k-th transform function node (k is an arbitrary natural number satisfying 1 <= k <= K) receives an M-dimensional vector corresponding to the learning data from the input layer, generates an M-dimensional transformed value vector by applying a predetermined k-th transform function to each feature value included in the input M-dimensional vector, and outputs the generated M-dimensional transformed value vector to each node of the uppermost inner product layer, and
wherein each of the first to K-th transform functions is a function having an S-shaped curve or a function whose value increases and whose derivative decreases on the interval (0, ∞).

An artificial neural network model learning method, comprising:
an acquisition step of acquiring at least one piece of learning data used for learning an artificial neural network model, wherein each piece of learning data includes M feature values corresponding to the first to M-th features, respectively (M is an integer of 2 or more); and
a learning step of learning the artificial neural network model using the at least one piece of learning data,
wherein the artificial neural network model comprises an input layer, a transform function layer, at least one inner product layer, and an output layer,
wherein the transform function layer comprises K transform function nodes (K is a natural number of 2 or more),
wherein the k-th transform function node (k is an arbitrary natural number satisfying 1 <= k <= K) receives an M-dimensional vector corresponding to the learning data from the input layer, generates an M-dimensional transformed value vector by applying a predetermined k-th transform function to each feature value included in the input M-dimensional vector, and outputs the generated M-dimensional transformed value vector to each node of the uppermost inner product layer, and
wherein each of the first to K-th transform functions is any one of a sigmoid function, a hyperbolic tangent function, and at least one tf(x) represented by the following formula:
[Formula] tf(x) = log(α_h × x + β_h) (where h is each integer satisfying 1 <= h <= H (H is an integer of 1 or more), and α_h and β_h are predefined values).

A computer program for performing the method according to claim 4 or 5, installed in a data processing apparatus.

A deep learning system, comprising:
a processor; and
a memory for storing a computer program,
wherein the computer program, when executed by the processor, causes the deep learning system to perform the method according to claim 4 or 5.