KR20210039090A

KR20210039090A - Restricted Boltzmann Machine System Using Kernel Methods

Info

Publication number: KR20210039090A
Application number: KR1020190121463A
Authority: KR
Inventors: 김동국
Original assignee: 전남대학교산학협력단
Priority date: 2019-10-01
Filing date: 2019-10-01
Publication date: 2021-04-09
Also published as: KR102263375B1

Abstract

The present invention relates to a restricted Boltzmann machine system using a kernel method. The system includes: a kernel restricted Boltzmann machine (KRBM) forming unit that maps input data to a high-dimensional feature space using a nonlinear function and forms, through a kernel function in the mapped space, a KRBM composed of real-valued visible and hidden variables; a KRBM learning unit that trains the KRBM by gradient ascent, maximizing a log-likelihood function; and a feature extraction unit that extracts features from the input data through the kernel function according to the supplied weights. By mapping the input data of an RBM to a high-dimensional feature space through a nonlinear function, using real-valued Gaussian probability distributions for the visible and hidden units in that space, and training with a gradient-based CD algorithm and gradient ascent, the KRBM improves feature extraction and recognition performance compared with a conventional RBM.

Description

Restricted Boltzmann Machine System Using Kernel Methods

The present invention relates to a restricted Boltzmann machine system using a kernel method. More specifically, it relates to a technique that maps input data to a high-dimensional feature space through a nonlinear function, forms in that space a restricted Boltzmann machine (RBM) composed of real-valued visible and hidden units, and extracts features from the input data through a ReLU (Rectified Linear Unit) kernel function.

Feature learning, or representation learning, is a branch of machine learning that automatically extracts, from large amounts of data, the features or representations needed for detection or recognition.

Such techniques extract features from images, speech, and other data and supply them as input to a recognition system, with the aim of improving recognition performance over conventional features.

Conventional supervised feature learning learns features using labels associated with the input data: the error between the learning system's output and the data label is computed and fed back during training so that useful features emerge.

Commonly used supervised feature learning techniques include supervised dictionary learning and supervised neural networks based on a multilayer perceptron (MLP) or a convolutional neural network (CNN).

However, supervised feature learning is inefficient: the amount of available data is small, and because people must label the data by hand, it takes considerable time and cost.

Accordingly, the applicant proposes a kernel RBM within unsupervised feature learning, which learns features from the input data alone, without labels. In the restricted Boltzmann machine (RBM), the input data is mapped to a high-dimensional feature space by a nonlinear function, real-valued Gaussian probability distributions are used for the visible and hidden units in that space, and training uses a gradient-based contrastive divergence (CD) algorithm with gradient ascent.

Korean Patent Registration No. 10-1561651 (registered October 13, 2015)

An object of the present invention is to improve feature extraction and recognition performance over a conventional RBM by providing a kernel RBM that maps the input data of a restricted Boltzmann machine (RBM) to a high-dimensional feature space by a nonlinear function, uses real-valued Gaussian probability distributions for the visible and hidden units in that space, and is trained with a gradient-based CD algorithm and gradient ascent.

To achieve this technical object, an embodiment of the present invention is a restricted Boltzmann machine system using a kernel method, comprising: a KRBM forming unit that maps input data to a high-dimensional feature space using a nonlinear function and forms, through a kernel function in the mapped space, a KRBM (Kernel Restricted Boltzmann Machine) composed of real-valued visible and hidden variables; a KRBM learning unit that trains the KRBM by gradient ascent, maximizing a log-likelihood function; and a feature extraction unit that extracts features from the input data through the kernel function according to the supplied weights.

Preferably, the input data is mapped by the Gaussian probability distribution of the kernel function.

The nonlinear function is set as a function that maps the n-dimensional input space to an f-dimensional feature vector space, and the KRBM includes a layer of visible variables defined in the feature vector space and a layer formed by an m-dimensional hidden variable vector having real values.

According to the present invention as described above, the input data of an RBM is mapped to a high-dimensional feature space by a nonlinear function, real-valued Gaussian probability distributions are used for the visible and hidden units in that space, and a KRBM trained with a gradient-based CD algorithm and gradient ascent is provided, which improves feature extraction and recognition performance compared with a conventional RBM.

FIG. 1 is a block diagram of a restricted Boltzmann machine system using a kernel method according to an embodiment of the present invention.
FIG. 2 shows the values of the learned filters (weights), visualizing the feature representations of the trained RBMs.
FIG. 3 shows test recognition accuracy on MNIST as a function of the number of hidden units, for the conventional techniques and the technique according to an embodiment of the present invention.

Specific features and advantages of the present invention will become clearer from the following detailed description based on the accompanying drawings. Before that, the terms and words used in this specification and the claims should be interpreted with meanings and concepts consistent with the technical idea of the present invention, on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way. Detailed descriptions of known functions and configurations related to the present invention are omitted where they could unnecessarily obscure the subject matter of the invention.

First, the structure and learning algorithm of the RBM are described.

An RBM is an undirected graphical model composed of n visible units, $v = (v_1, \ldots, v_n)$, which represent the observed data, and m hidden units, $h = (h_1, \ldots, h_m)$.

Depending on the unit types, an RBM consists of visible units that take binary or real values and hidden units that take binary values. An RBM that combines binary visible units with binary hidden units is called a BBRBM (Bernoulli-Bernoulli RBM), and its energy function over all combinations of binary values is defined as in [Equation 1].

Here, $v_i$ and $h_j$ denote the binary states of the i-th visible unit and the j-th hidden unit, $w_{ij}$ is the weight between the two units, and the remaining parameters are the respective biases. For real-valued input data, the i-th visible unit is modeled by a Gaussian distribution with its own mean and variance; this model is called a GBRBM (Gaussian-Bernoulli RBM). Its energy function is given by [Equation 2].

In both models, the joint probability distribution of the two layers is defined through the energy function in the form $p(v, h) = \exp(-E(v, h)) / Z$. Here $Z$ is the partition function, obtained by summing $\exp(-E(v, h))$ over all configurations of v and h (integrating over v when the visible units are real-valued).
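As an illustration of the energy function and joint distribution described above, the following Python sketch evaluates a standard BBRBM energy and the resulting joint probability by brute-force enumeration for a very small model. The function names, the bias names `a` and `b`, the sign convention E(v, h) = -(a·v + b·h + vᵀWh), and the 2 × 2 example values are assumptions made for illustration; the sketch is not taken from the specification.

```python
import math
from itertools import product


def bbrbm_energy(v, h, W, a, b):
    """Energy of one binary configuration (v, h); W is an n x m list of lists."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + interaction)


def partition_function(W, a, b):
    """Z: sum of exp(-E) over every binary (v, h); feasible only for tiny models."""
    n, m = len(a), len(b)
    return sum(math.exp(-bbrbm_energy(v, h, W, a, b))
               for v in product((0, 1), repeat=n)
               for h in product((0, 1), repeat=m))


def joint_probability(v, h, W, a, b):
    """p(v, h) = exp(-E(v, h)) / Z."""
    return math.exp(-bbrbm_energy(v, h, W, a, b)) / partition_function(W, a, b)


# Hypothetical 2 x 2 model used only to exercise the functions.
W = [[0.5, -1.0], [2.0, 0.0]]
a = [0.1, 0.2]
b = [0.3, 0.4]
energy = bbrbm_energy([1, 1], [1, 0], W, a, b)
```

The number of terms in Z doubles with every added unit, which is why training relies on sampling rather than on exact evaluation of the partition function.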

In the RBM structure, weighted connections exist only between the visible layer and the hidden layer, and there are no connections between units within the same layer. Consequently, given the visible variables, the hidden variables are conditionally independent.

Likewise, given the hidden variables, the visible units are conditionally independent. For the BBRBM, the conditional probability of each unit therefore takes the form of [Equation 3] and [Equation 4] below.

Here, the function appearing in both equations is the sigmoid function. The conditional probability of a GBRBM visible unit has a Gaussian form, as in [Equation 5].

Here, $\mathcal{N}(\cdot \mid \mu, \sigma^2)$ denotes a Gaussian probability distribution with mean $\mu$ and variance $\sigma^2$. Because of this conditional independence, Gibbs sampling can alternate between v and h, which speeds up the learning process.
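The conditional independence described above can be sketched for the BBRBM case as follows. This is a minimal Python illustration under stated assumptions: the helper names (`hidden_probs`, `visible_probs`, `gibbs_step`) and the bias names `a` and `b` are hypothetical, and only the standard sigmoid conditionals for binary units are shown, not the Gaussian case.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j; each unit depends only on v."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, W, a):
    """P(v_i = 1 | h) for every visible unit i; each unit depends only on h."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def sample_binary(probs, rng):
    """Draw independent Bernoulli samples, one per probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def gibbs_step(v, W, a, b, rng):
    """One alternating step: sample h given v, then a new v given h."""
    h = sample_binary(hidden_probs(v, W, b), rng)
    v_new = sample_binary(visible_probs(h, W, a), rng)
    return v_new, h
```

Because every unit in a layer is sampled independently given the other layer, a whole layer can be updated in a single pass.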

Accordingly, when performing Gibbs sampling, the KRBM forming unit (102) derives the gradient of the likelihood function with respect to the weight parameters, consisting of a positive gradient and a negative gradient, through [Equation 6].

The first part, called the positive gradient, is easily obtained from the probability distribution; the second part, the negative gradient, is computed from data obtained through Gibbs sampling.
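The split into a positive and a negative part can be sketched as below for the weights of a conventional RBM. The function name and argument names are hypothetical, and the form shown (data statistics minus one-step reconstruction statistics) is the usual contrastive divergence estimate, offered here as an assumption about what [Equation 6] expresses rather than a transcription of it.

```python
def cd1_weight_gradient(v0, ph0, v1, ph1):
    """Approximate log-likelihood gradient for each weight W[i][j].

    v0:  data vector                  ph0: P(h = 1 | v0)  (positive part)
    v1:  one-step Gibbs sample of v   ph1: P(h = 1 | v1)  (negative part)
    """
    return [[v0[i] * ph0[j] - v1[i] * ph1[j] for j in range(len(ph0))]
            for i in range(len(v0))]


# Hypothetical values for a 2-visible, 2-hidden model.
grad = cd1_weight_gradient([1, 0], [0.5, 1.0], [0, 1], [0.25, 0.5])
```

The positive part uses only the data and the conditional probabilities, while the negative part needs the sampled reconstruction.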

Hereinafter, the restricted Boltzmann machine system (100) using a kernel method according to an embodiment of the present invention is described with reference to FIG. 1.

As shown in FIG. 1, the restricted Boltzmann machine system (100) using a kernel method according to an embodiment of the present invention comprises: a KRBM forming unit (102) that maps the input data (visible variables) to a high-dimensional feature space using a nonlinear function and forms, through a kernel function in the mapped space, a kernel RBM (KRBM: Kernel Restricted Boltzmann Machine) composed of real-valued visible and hidden variables; a KRBM learning unit (104) that trains the KRBM by gradient ascent, maximizing a log-likelihood function; and a feature extraction unit (106) that extracts features from the input data through the ReLU kernel function according to the supplied weights. The input data mapped by the KRBM forming unit (102) is mapped by the Gaussian probability distribution of the kernel function.

The detailed configuration of the KRBM forming unit of the restricted Boltzmann machine system using a kernel method according to an embodiment of the present invention is described below.

In an embodiment of the present invention, v is the visible variable vector defined in the original n-dimensional data space, and the nonlinear function maps the n-dimensional input space to an f-dimensional feature vector space. In some cases, f may be infinite. The image of v under the nonlinear function is the corresponding visible variable in the f-dimensional feature vector space.

The KRBM structure consists of a layer formed by the visible variable defined in the f-dimensional feature vector space and a layer formed by an m-dimensional hidden variable vector h having real values.

To set the connection weights between the visible layer and the hidden layer, m n-dimensional weight vectors are defined, and each is mapped to the feature vector space by the same nonlinear function as the visible variable. After the mapping, the j-th mapped weight vector connects the j-th hidden variable to the visible vector in the feature space, and the mapped weight vectors together form the weight matrix in the feature vector space.

Using the visible variable, hidden variable, weight vectors, and weight matrix defined above, the energy function of the KRBM structure can be defined as in [Equation 7] below.

Here, the kernel trick expresses the inner products of mapped vectors through the kernel function. The energy function is built from quadratic terms, on the assumption that the mapped visible variable and h take real values in their respective spaces, so that the joint probability distribution is Gaussian.

A joint probability distribution of the same form as in the RBM can therefore be defined. An RBM whose energy function and joint probability distribution are defined in a nonlinear feature space through such a kernel function is called a kernel RBM (KRBM). Its partition function is given by [Equation 8] below.

Given the joint probability distribution of the KRBM, the marginal probability distribution of the mapped visible variable in the feature space is given by [Equation 9] below.

Furthermore, because the energy function of [Equation 7] is quadratic in the mapped visible variable and in h, the conditional probability distributions of the KRBM are each multivariate Gaussian, as in [Equation 10] and [Equation 11] below.

Here, the two identity matrices are of dimension m and dimension f, respectively. Because the conditional probability distributions of [Equation 10] and [Equation 11] are independent across components, the Gibbs sampling step of the CD algorithm can be performed as efficiently as in a conventional RBM.

In the KRBM, the visible vector is mapped to a high-dimensional feature space through the kernel function, so the choice of kernel function may vary with the given data and the task to be performed. In an embodiment of the present invention, the ReLU function is used as the kernel function for the KRBM; for two n-dimensional vectors w and v, it is defined as in [Equation 12] below.

The detailed configuration of the KRBM learning unit of the restricted Boltzmann machine system using a kernel method according to an embodiment of the present invention is described below.

The KRBM learning unit (104) trains the KRBM by gradient ascent, maximizing the log-likelihood function.

Given a single training sample in the high-dimensional feature space, the log-likelihood function is set as in [Equation 13], using [Equation 8] and [Equation 9] above.

The gradient of the log-likelihood with respect to the KRBM parameters is then given by [Equation 14] below.

The gradient of the KRBM log-likelihood consists of the sum of two expectation terms. The first term is easy to compute, but the expectation in the second term, taken under the joint probability distribution, cannot be computed efficiently. The second term is therefore approximated, using the CD-1 algorithm previously applied to RBMs, by samples drawn from the model distribution through Gibbs sampling.

Gibbs sampling is first initialized with a data sample, which is then mapped into the feature space, and the next sample in the chain can be drawn efficiently from the preceding one. Based on this CD algorithm, the log-likelihood gradient with respect to the parameters is approximated as in [Equation 15] below.

When the ReLU kernel function is used, the gradients with respect to the weights and the biases can be derived as in [Equation 16] and [Equation 17] below.

The Gibbs sampling procedure also requires drawing a sample of v in the original input space. The sample value of v is chosen as the value that maximizes the conditional probability, as in [Equation 18] below.

Expanding [Equation 18] with the kernel trick, the sample of v is obtained as the value that minimizes the objective function of [Equation 19].

When the ReLU kernel function is used, substituting the ReLU function into this objective function, differentiating with respect to v, and setting the derivative to zero yields a fixed-point iteration from which the t-th sample value can be estimated, as in [Equation 20].

Here, one factor is the derivative of the ReLU function, the iteration starts from a set initial value, and the next-step estimate was used as the sample value.

The simulations and results for the restricted Boltzmann machine system using a kernel method according to an embodiment of the present invention are described below.

[MNIST]

First, training and recognition were performed on the MNIST (Modified National Institute of Standards and Technology) data set, which consists of 28 × 28 grayscale images of handwritten digits from 0 to 9.

The data set comprises 60,000 training samples and 10,000 test samples. For comparison with the KRBM of the present invention, a BBRBM and an NReLU (Noisy ReLU) RBM with binary visible units were also trained. The CD-1 algorithm was used for training, with a gradient-ascent learning rate of 0.001 for the KRBM and a momentum of 0.9.

The weights were initialized with random values drawn from a Gaussian with zero mean and a standard deviation of 0.01. The batch size was 100, and training ran for 1000 epochs. The RBM input is a vector of size 28 × 28 = 784, and performance was compared while varying the number of hidden units.
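The training settings stated above can be collected into a short Python sketch. The constants are taken from the text; the classical momentum form of the update and all function and variable names are assumptions, since the specification does not spell out the exact update rule.

```python
import random

LEARNING_RATE = 0.001   # stated for the KRBM
MOMENTUM = 0.9
BATCH_SIZE = 100
EPOCHS = 1000
INIT_STD = 0.01         # standard deviation of the zero-mean Gaussian initializer
N_VISIBLE = 28 * 28     # 784 inputs for MNIST
UPDATES_PER_EPOCH = 60000 // BATCH_SIZE  # 60,000 training samples


def init_weights(n_visible, n_hidden, rng):
    """Draw every weight from a zero-mean Gaussian with standard deviation INIT_STD."""
    return [[rng.gauss(0.0, INIT_STD) for _ in range(n_hidden)]
            for _ in range(n_visible)]


def momentum_ascent_step(W, velocity, grad):
    """One gradient-ascent update with classical momentum (assumed form), in place."""
    for i in range(len(W)):
        for j in range(len(W[i])):
            velocity[i][j] = MOMENTUM * velocity[i][j] + LEARNING_RATE * grad[i][j]
            W[i][j] += velocity[i][j]
    return W, velocity
```

Since the objective is a log-likelihood to be maximized, the velocity is added to the weights rather than subtracted.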

FIG. 2 shows the values of the learned filters (weights), visualizing the feature representations of the trained RBMs: (a) BBRBM, (b) NReLU RBM, and (c) the KRBM according to an embodiment of the present invention. For MNIST, each panel shows the 100 filters with the largest variance among 1024 hidden units.
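Selecting the filters with the largest variance, as done for FIG. 2, might look like the following Python sketch. It assumes the weights are stored as a list of rows with one column per hidden unit and that the variance is taken over the entries of each column; both the storage layout and the function names are hypothetical.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def top_variance_filters(W, k):
    """Indices of the k hidden units whose weight columns have the largest variance."""
    n_hidden = len(W[0])
    columns = [[row[j] for row in W] for j in range(n_hidden)]
    order = sorted(range(n_hidden), key=lambda j: variance(columns[j]), reverse=True)
    return order[:k]


# Tiny hypothetical weight matrix: 2 visible units, 3 hidden units.
selected = top_variance_filters([[0, 1, 5], [0, -1, -5]], 2)
```

For the setting in the text, `k` would be 100 and each selected column would be reshaped to 28 × 28 for display.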

RBMs are well known to learn localized filters resembling Gabor filters. As shown in FIG. 2, the filters learned by the NReLU RBM are somewhat sparser, while the KRBM produces filter images unlike those of the conventional RBMs, learning filters of various shapes in addition to sparse ones.

Given an input, the trained RBM yields one value per hidden unit, and these values were extracted as a new feature vector. Recognition experiments were then carried out for the three RBMs using a softmax classifier.
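The output stage of such a softmax classifier can be sketched as follows. Only the conversion of class scores into probabilities and a predicted class is shown; the scores are assumed to come from a linear layer applied to the extracted feature vector and trained elsewhere, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn class scores into probabilities; subtracting the maximum avoids overflow."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Index of the most probable class (softmax preserves the ordering of scores)."""
    return max(range(len(scores)), key=lambda k: scores[k])


probs = softmax([1.0, 2.0, 3.0])
```

For MNIST the score vector would have ten entries, one per digit.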

FIG. 3 shows recognition accuracy on the test data as a function of the number of hidden units. Overall, recognition accuracy improves as the number of hidden units increases. The NReLU RBM performed worse than the RBM with small numbers of hidden units, but with 2048 units the RBM performed slightly better.

The KRBM according to an embodiment of the present invention is far more accurate than the conventional BBRBM and NReLU RBM with small numbers of hidden units and performs about the same with 2048 units. The technique according to an embodiment of the present invention is therefore more effective for unsupervised feature learning than the conventional techniques.

[STL10]

STL-10 is an image recognition data set for unsupervised feature learning and deep learning. It is similar to the CIFAR-10 data set but was created with some modifications: each class has fewer labeled training samples than in CIFAR-10, and a much larger set of unlabeled sample images is provided for unsupervised learning prior to supervised learning.

The unlabeled samples are used for model training, in particular for developing unsupervised learning methods. The samples are 96 × 96 pixel color images in 10 classes (airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck).

There are 100,000 images for unsupervised learning; for supervised learning, each class has 500 training images and 800 test images. The unlabeled set contains, in addition to images of the labeled classes, various other kinds of images. To use the STL-10 data for unsupervised learning and recognition, preprocessing is performed, unlike for MNIST.

In an embodiment of the present invention, the 96 × 96 images are first resized to 32 × 32 to reduce computation and memory use. For RBM training, a total of 500,000 color patches of size 6 × 6 are extracted at random positions from the 100,000 unlabeled images. The number of RBM inputs is therefore 6 × 6 × 3 = 108.
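The patch extraction step can be sketched in Python as below. The image is assumed to be already resized to 32 × 32 and stored as nested lists indexed by row, column, and channel; this layout, the function name, and the flattening order are assumptions, and the resizing itself is not shown.

```python
import random

PATCH = 6      # patch side length in pixels
CHANNELS = 3   # color channels


def random_patch(image, rng):
    """Cut one PATCH x PATCH color patch at a random position and flatten it."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - PATCH + 1)
    left = rng.randrange(width - PATCH + 1)
    return [image[top + r][left + c][ch]
            for r in range(PATCH) for c in range(PATCH) for ch in range(CHANNELS)]


# Hypothetical all-zero 32 x 32 color image used only to check the patch length.
blank = [[[0, 0, 0] for _ in range(32)] for _ in range(32)]
patch = random_patch(blank, random.Random(0))
```

Each flattened patch has 6 × 6 × 3 = 108 entries, matching the number of RBM inputs stated above.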

All data are whitened and then normalized to zero mean for the KRBM, and to zero mean and unit variance for the other RBMs. After the RBM has been trained on the training data, patches are generated at regular intervals over each input image for the recognition experiment, and features are extracted through the RBM.
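The two normalization variants can be illustrated with the following Python sketch. It assumes the statistics are computed over the entries of a single vector, which the text does not specify, and it omits the whitening step entirely; the function name and the `unit_variance` flag are hypothetical.

```python
import math


def normalize(x, unit_variance):
    """Center a vector to zero mean; optionally also scale it to unit variance."""
    mean = sum(x) / len(x)
    centered = [value - mean for value in x]
    if not unit_variance:
        return centered  # zero mean only, as described for the KRBM
    std = math.sqrt(sum(c * c for c in centered) / len(centered))
    # Leave a constant vector centered rather than dividing by zero.
    return [c / std for c in centered] if std > 0 else centered


krbm_input = normalize([1.0, 2.0, 3.0], unit_variance=False)
other_input = normalize([1.0, 3.0], unit_variance=True)
```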

After a step that reduces the number of feature vectors, 1600 feature vectors were finally generated per input image. Recognition experiments were performed on these using a softmax classifier.

[Table 1] shows recognition accuracy on the STL-10 test data. As [Table 1] shows, without feature learning the recognition rate is very low, at 31.8%. When features are learned by an RBM and used for recognition, the recognition rate improves substantially.

On STL-10, the GBRBM achieved a higher recognition rate than the NReLU RBM, and the KRBM according to the present invention achieved a higher recognition rate than both. The KRBM according to an embodiment of the present invention is therefore effective on STL-10 data as well.

Although the present invention has been described and illustrated above with reference to a preferred embodiment that exemplifies its technical idea, the invention is not limited to the configuration and operation as illustrated and described. Those skilled in the art will understand that many changes and modifications can be made to the invention without departing from the scope of its technical idea. Accordingly, all such appropriate changes, modifications, and equivalents should be regarded as falling within the scope of the present invention.

100: restricted Boltzmann machine system using a kernel method
102: KRBM forming unit
104: KRBM learning unit
106: feature extraction unit

Claims

A restricted Boltzmann machine system using a kernel method, comprising:
a KRBM forming unit that maps input data to a high-dimensional feature space using a nonlinear function and forms, through a kernel function in the mapped space, a Kernel Restricted Boltzmann Machine (KRBM) composed of visible and hidden variables having real values;
a KRBM learning unit that trains the KRBM by gradient ascent, maximizing a log-likelihood function; and
a feature extraction unit that extracts features from the input data through the kernel function according to the supplied weights.

The system of claim 1, wherein the input data is mapped by the Gaussian probability distribution of the kernel function.

The system of claim 1, wherein the nonlinear function is set as a function that maps the n-dimensional input space to an f-dimensional feature vector space, and the KRBM comprises:
a layer of visible variables defined in the feature vector space; and
a layer formed by an m-dimensional hidden variable vector having real values.