KR101895121B1

KR101895121B1 - Machine learning method using relevance vector machine, computer program implementing the same and information processing device configured to perform the same

Info

Publication number: KR101895121B1
Application number: KR1020160116110A
Authority: KR
Inventors: 이재욱; 박새롬; 장희수; 손영두
Original assignee: 서울대학교산학협력단
Priority date: 2016-09-09
Filing date: 2016-09-09
Publication date: 2018-09-04
Also published as: KR20180028610A

Abstract

The present invention relates to a machine learning method using a relevance vector machine, a computer program implementing the same, and an information processing apparatus configured to perform the same. Specifically, the machine learning method comprises the following steps:
(1) When only some of the data points in a raw data set are labeled, a first data set of the labeled data points and a second data set of the unlabeled data points are set as initial values, and a transductive generalized relevance vector machine based on a Bayesian regression model is constructed.
(2) A third data set of relevance vectors selected from the second data set is obtained, based on a distribution obtained from the relevance vector machine.
(3) Based on the relevance vectors included in the third data set, a fourth data set is constructed for updating the first and second data sets, and the first and second data sets are updated.
(4) The transductive generalized relevance vector machine of step (1) is reconstructed with the updated first and second data sets set as initial values.
(5) Steps (2) to (4) are repeated, and relevance vectors and weights are obtained from the finally constructed relevance vector machine.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a machine learning method, a computer program implementing the same, and an information processing apparatus configured to perform the same. More particularly, it relates to a machine learning method that uses a small amount of labeled data, a computer program implementing the same, and an information processing apparatus configured to perform the same.

2. Description of the Related Art

Machine learning is a discipline that evolved from the study of pattern recognition and computational learning theory within artificial intelligence, a branch of computer science. It is the field of research concerned with algorithms that learn from given data and produce predictions. Advances in computer hardware and research into a variety of algorithms have brought machine learning into a wide range of fields. These range from applications such as spam filters and optical character recognition (OCR) to search engines, computer vision, and data mining.

The problems addressed by machine learning include regression problems and classification problems. Regression finds regularities in irregularly distributed data, and classification sorts such data into given categories.

A support vector machine (SVM) is a supervised learning framework used for classification and regression problems. Given training data in which each sample is marked as belonging to one category, the SVM learning algorithm builds a model that predicts which category a new input will be assigned to.

In such classification or regression problems, building a suitable model for prediction requires all of the training data to be labeled.

However, labeling and learning from the entire training set increases computational complexity and degrades performance.

In addition, the number of support vectors needed to construct an SVM generally grows linearly with the size of the training set. An SVM therefore uses unnecessarily many basis functions, which increases computational complexity.

Furthermore, SVM predictions are not probabilistic: an SVM outputs point estimates in regression and hard binary decisions in classification. This makes it difficult to apply to classification problems that must adapt to class prior distributions and asymmetric misclassification costs.

In addition, an SVM requires a trade-off parameter between error and margin to be estimated. The cross-validation used to estimate this parameter wastes both data and computation.

The present invention has been proposed to solve the above problems of previously proposed methods. Its object is to provide a machine learning method using a relevance vector machine that performs machine learning efficiently even when only a small fraction of the training data is labeled, together with a computer program implementing the method and an information processing apparatus configured to perform it.

To achieve the above objects, a computer program according to one aspect of the present invention, when executed by a processor of an information processing apparatus, causes the processor to perform a process comprising the following steps:
(1) In a raw data set containing N data points (N is a natural number of 2 or more), L data points are labeled (L is a natural number of at least 1 and less than N). A first data set of the labeled data points and a second data set of the unlabeled data points are set as initial values, and a transductive generalized relevance vector machine based on a Bayesian regression model is constructed.
(2) A third data set of relevance vectors selected from the second data set is obtained, based on a distribution obtained from the relevance vector machine.
(3) A fourth data set is constructed from the relevance vectors included in the third data set according to a policy for querying relevance vectors, and the first and second data sets are updated according to the fourth data set.
(4) The transductive generalized relevance vector machine of step (1) is reconstructed with the updated first and second data sets set as initial values.
(5) Steps (2) to (4) are repeated until a preset stopping condition is satisfied.
(6) When the stopping condition is satisfied, relevance vectors and weights are obtained from the finally constructed relevance vector machine.

In one embodiment, constructing the transductive generalized relevance vector machine in step (1) may include the following sub-steps:
(a) constructing a generalized Bayesian regression model that includes a matrix whose elements are kernel values between the labeled data points of the initial values and all data points;
(b) obtaining a sparse solution from the constructed generalized Bayesian regression model using an automatic relevance determination (ARD) prior probability distribution;
(c) obtaining an approximate joint distribution of the estimated outputs of the unlabeled data points of the initial values and the obtained sparse solution;
(d) obtaining an approximate marginal likelihood function for the sparse solution from the approximate joint distribution; and
(e) obtaining, based on the approximate marginal likelihood function, a posterior predictive distribution for a data point other than those included in the initial values.

In one embodiment, the policy for querying relevance vectors in step (3) may include at least one of the following: a query for unlabeled relevance vectors, a query for the most uncertain relevance vector, and a query for the farthest relevance vector.

In one embodiment, the fourth data set in step (3) is defined by the query type as follows:
- for a query for unlabeled relevance vectors, it is identical to the third data set;
- for a query for the most uncertain relevance vector, it is the set of relevance vectors in the third data set that maximize the variance;
- for a query for the farthest relevance vector, it is the set of relevance vectors in the third data set whose distance to the data points of the first data set is minimal.
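The three query policies can be sketched in Python as follows. This is a minimal sketch, not the patented implementation. The function name `build_fourth_set`, the use of integer indices for data points, the per-point `variances` input, and the batch size `k` are all hypothetical.

For the farthest policy, the text literally refers to a minimal distance. This sketch reads that as the distance from each candidate to its nearest labeled point, and keeps the candidate(s) for which this nearest-point distance is largest, in line with the policy's name. That max-min reading is an assumption.

```python
import numpy as np

def build_fourth_set(third_set, policy, variances=None, labeled_points=None,
                     points=None, k=1):
    """Hypothetical sketch of step (3): derive the fourth data set D4 from the
    relevance vectors in the third data set D3 (given as indices into `points`)."""
    candidates = list(third_set)
    if policy == "unlabeled":
        # Query every unlabeled relevance vector: D4 is identical to D3.
        return set(candidates)
    if policy == "most_uncertain":
        # Keep the k relevance vectors with the largest predictive variance.
        ranked = sorted(candidates, key=lambda i: variances[i], reverse=True)
        return set(ranked[:k])
    if policy == "farthest":
        # Assumed max-min reading: score each candidate by the distance to its
        # nearest labeled point and keep the k candidates with the largest score.
        labeled = np.asarray(labeled_points, dtype=float)

        def nearest_labeled_distance(i):
            return np.min(np.linalg.norm(labeled - points[i], axis=1))

        ranked = sorted(candidates, key=nearest_labeled_distance, reverse=True)
        return set(ranked[:k])
    raise ValueError(f"unknown policy: {policy}")
```

The variances would in practice come from the relevance vector machine's predictive distribution. Here they are passed in, which keeps the selection logic independent of any particular model.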

In one embodiment, the number of iterations in step (5) may be smaller when the query is for the most uncertain relevance vector than when it is for unlabeled relevance vectors or for the farthest relevance vector.

In one embodiment, the number of data points labeled in step (1) may be smaller when the query is for the most uncertain relevance vector than when it is for unlabeled relevance vectors or for the farthest relevance vector.

In one embodiment, in step (3), the first data set is updated to the union of the first and fourth data sets. The second data set is updated to the set difference obtained by removing the elements of the fourth data set from the second data set.
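The update in step (3) amounts to two set operations. The sketch below assumes data points are tracked by index; `update_sets` is a hypothetical name. In a real run, the queried points would also need labels attached before joining the labeled set, and that step is not modeled here.

```python
def update_sets(labeled, unlabeled, fourth):
    """Hypothetical sketch of the step (3) update on index sets:
    D_L <- D_L ∪ D4 and D_U <- D_U minus D4."""
    fourth = set(fourth)
    new_labeled = set(labeled) | fourth      # union with the queried points
    new_unlabeled = set(unlabeled) - fourth  # remove them from the unlabeled pool
    return new_labeled, new_unlabeled
```

Because the fourth data set is drawn from relevance vectors in the second data set, the union and difference keep the two sets disjoint and together covering the raw data set.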

To achieve the above objects, a machine learning method according to one aspect of the present invention is a machine learning method using a relevance vector machine, comprising the following steps:
(1) When only some of the data points in a raw data set are labeled, a first data set of the labeled data points and a second data set of the unlabeled data points are set as initial values, and a transductive generalized relevance vector machine based on a Bayesian regression model is constructed.
(2) A third data set of relevance vectors selected from the second data set is obtained, based on a distribution obtained from the relevance vector machine.
(3) Based on the relevance vectors included in the third data set, a fourth data set is constructed for updating the first and second data sets, and the first and second data sets are updated.
(4) The transductive generalized relevance vector machine of step (1) is reconstructed with the updated first and second data sets set as initial values.
(5) Steps (2) to (4) are repeated, and relevance vectors and weights are obtained from the finally constructed relevance vector machine.

In one embodiment, in step (3), the fourth data set is defined by the type of query made for the relevance vectors in the third data set:
- for a query for unlabeled relevance vectors, it is identical to the third data set;
- for a query for the most uncertain relevance vector, it is the set of relevance vectors in the third data set that maximize the variance;
- for a query for the farthest relevance vector, it is the set of relevance vectors in the third data set whose distance to the data points of the first data set is minimal.

In one embodiment, the number of iterations in step (5) may be smaller when the query for the relevance vectors in the third data set is for the most uncertain relevance vector than when it is for unlabeled relevance vectors or for the farthest relevance vector.

In one embodiment, in step (3), the first and second data sets may be updated through at least one of a union operation and a set-difference operation with the fourth data set.

In one embodiment, the transductive generalized relevance vector machine may be processed through the following sub-steps:
(a) constructing a generalized Bayesian regression model that includes a matrix whose elements are kernel values between the data points of the first data set and the data points in the union of the first and second data sets;
(b) obtaining a sparse solution from the constructed generalized Bayesian regression model using an automatic relevance determination (ARD) prior probability distribution;
(c) obtaining an approximate joint distribution of the estimated outputs of the data points of the second data set and the obtained sparse solution;
(d) obtaining an approximate marginal likelihood function for the sparse solution from the approximate joint distribution; and
(e) obtaining, based on the approximate marginal likelihood function, a posterior predictive distribution for a data point other than those included in the initial values.

To achieve the above objects, an information processing apparatus according to one aspect of the present invention includes a processor configured to perform a machine learning method using a relevance vector machine. The processor is configured to perform the following steps:
(1) When only some of the data points in a raw data set are labeled, a first data set of the labeled data points and a second data set of the unlabeled data points are set as initial values, and a transductive generalized relevance vector machine based on a Bayesian regression model is constructed.
(2) A third data set of relevance vectors selected from the second data set is obtained, based on a distribution obtained from the relevance vector machine.
(3) Based on the relevance vectors included in the third data set, a fourth data set is constructed for updating the first and second data sets, and the first and second data sets are updated.
(4) The transductive generalized relevance vector machine of step (1) is reconstructed with the updated first and second data sets set as initial values.
(5) Steps (2) to (4) are repeated, and relevance vectors and weights are obtained from the finally constructed relevance vector machine.

The present invention provides a machine learning method using a relevance vector machine, a computer program implementing the same, and an information processing apparatus configured to perform the same. In each of these, a transductive generalized relevance vector machine is constructed from labeled and unlabeled data together. The machine is then reconstructed repeatedly while the labeled and unlabeled data are updated according to a relevance vector query policy, and relevance vectors and weights are finally obtained. As a result, machine learning can be performed efficiently even when only a small fraction of the training data is labeled.

FIG. 1 is a flowchart illustrating a machine learning method using a relevance vector machine according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method of constructing a transductive generalized relevance vector machine in the machine learning method using a relevance vector machine according to an embodiment of the present invention.
FIG. 3 shows distribution plots of the predictive means obtained from a small amount of labeled data by the machine learning method using a relevance vector machine according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings, so that those of ordinary skill in the art can readily practice the invention. Detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. The same or similar reference numerals are used throughout the description and drawings for parts with similar functions and operations. In addition, throughout the specification, "including" a component means that other components may also be included, not excluded, unless specifically stated otherwise.

The following describes a machine learning method using a relevance vector machine according to embodiments of the present invention, a computer program implementing the same, and an information processing apparatus configured to perform the same.

The information processing apparatus may be any one of, or a combination of, electronic devices such as a server computer, a desktop personal computer (PC), a tablet PC, a notebook PC, a smartphone, a mobile phone, a navigation terminal, and a personal digital assistant (PDA). The apparatus may process data in various forms and may include a processor, a main memory, and an auxiliary storage device.

The processor may include a high-speed processing device such as a central processing unit (CPU) or a graphics processing unit (GPU). The main memory may include volatile memory such as random access memory (RAM). The auxiliary storage device may include a non-volatile storage medium such as a hard disk drive (HDD) or a solid state drive (SSD).

The processor of the information processing apparatus may be configured to execute a computer program recorded in, or loaded into, the main memory and/or the auxiliary storage device. The computer program may be stored in a non-transitory computer-readable medium installed inside or outside the apparatus. For example, the program may be loaded into RAM and then executed by the processor.

The computer-readable medium may be any recording medium, such as a read-only memory (ROM), a compact disc (CD), a digital versatile disc (DVD), an HDD, an SSD, a magnetic disk, a magnetic tape, or a magneto-optical disk. The computer program may include one or more subroutines, functions, modules, functional blocks, and the like that implement a series of processes.

FIG. 1 is a flowchart illustrating a machine learning method using a relevance vector machine according to an embodiment of the present invention.

Referring to FIG. 1, the machine learning method using a relevance vector machine according to this embodiment may include the following steps:
- S100: labeling some of the data points in a raw data set, setting a first data set of the labeled data points and a second data set of the unlabeled data points as initial values, and constructing a transductive generalized relevance vector machine based on a Bayesian regression model;
- S300: obtaining a data set of relevance vectors based on a distribution obtained from the relevance vector machine;
- S500: updating the first and second data sets based on the relevance vectors in that data set, according to a query policy for relevance vectors;
- S700: determining whether a stopping condition is satisfied;
- S900: obtaining relevance vectors and weights from the finally constructed relevance vector machine.

Each step is described in detail below.
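The control flow of FIG. 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions:
- `fit`, `select_rvs` and `query` are hypothetical placeholders for building the transductive GRVM, extracting its relevance vectors, and applying a query policy.
- Data points are tracked as index sets.
- The text leaves the stopping condition of S700 unspecified. Here the loop is assumed to stop when the query returns nothing new or after `max_iter` rounds.

```python
from typing import Callable, Set

def active_learning_loop(labeled: Set[int], unlabeled: Set[int],
                         fit: Callable, select_rvs: Callable, query: Callable,
                         max_iter: int = 10):
    """Hypothetical driver mirroring steps S100 to S900 of FIG. 1."""
    model = fit(labeled, unlabeled)                 # S100: build the transductive GRVM
    for _ in range(max_iter):                       # S700 (assumed): iteration cap
        third = select_rvs(model) & unlabeled       # S300: relevance vectors drawn from D_U
        fourth = query(third)                       # S500: query policy yields D4
        if not fourth:                              # S700 (assumed): nothing left to query
            break
        labeled = labeled | fourth                  # S500: D_L <- D_L ∪ D4
        unlabeled = unlabeled - fourth              #        D_U <- D_U minus D4
        model = fit(labeled, unlabeled)             # rebuild with the updated initial values
    return model                                    # S900: read RVs and weights from this model
```

Passing the model-specific parts in as callables keeps the loop independent of how the GRVM itself is fitted. This mirrors the way FIG. 2 details step S100 separately.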

In step S100, some of the data points in the raw data set are labeled. A first data set of the labeled data points and a second data set of the unlabeled data points are set as initial values, and a transductive generalized relevance vector machine based on a Bayesian regression model is constructed.

The data points may correspond to vectors in an M-dimensional space (M is a natural number of 1 or more). Labeling may include a mark for at least one data point. When step S100 starts for the first time, some of the data points may already have been labeled, either by another preset algorithm or manually by a user of the information processing apparatus.

The raw data set may include N data points x (N is a natural number of 2 or more). The raw data set D may be defined as in Formula 1.

[Formula 1]

$$D = \{x_i\}_{i=1}^{N}$$

Here, L of the data points in the raw data set may be labeled (L is a natural number of at least 1 and less than N), and the remaining N − L data points may be unlabeled. The labeled data points form the first data set, and the unlabeled data points form the second data set. The first data set D_L may be defined as in Formula 2, and the second data set D_U as in Formula 3.

[Formula 2]

$$D_L^{1} = \{(x_i, y_i)\}_{i=1}^{L_1}$$

[Formula 3]

$$D_U^{1} = \{x_i\}_{i=L_1+1}^{N}$$

In Formulas 2 and 3, the subscript L of the data set D denotes the set of labeled data points, and the subscript U denotes the set of unlabeled data points. The superscript 1 indicates the initial set of labeled or unlabeled data points. L₁ is a natural number of at least 1 and less than N that gives the number of initially labeled data points. y denotes the label corresponding to data point x.
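The initial split into the labeled set D_L^1 and the unlabeled set D_U^1 can be sketched as below. The function name `split_initial_sets` is hypothetical, and so is the encoding of unlabeled points as `None` in a length-N label list; neither comes from the source. The constraint 1 ≤ L₁ < N is taken from the text.

```python
import numpy as np

def split_initial_sets(X, y_partial):
    """Split the raw data set D (N points, M dimensions) into D_L^1 and D_U^1.

    X: (N, M) array of data points.
    y_partial: length-N sequence holding a label for labeled points and None
    for unlabeled ones (an assumed encoding)."""
    labeled = [(X[i], y) for i, y in enumerate(y_partial) if y is not None]
    unlabeled = [X[i] for i, y in enumerate(y_partial) if y is None]
    n_points, n_labeled = len(X), len(labeled)
    if not 1 <= n_labeled < n_points:
        raise ValueError("the text requires 1 <= L_1 < N")
    return labeled, unlabeled
```

Each labeled entry keeps its pair (x_i, y_i), matching the form of D_L^1. Unlabeled entries are bare points, matching D_U^1.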

The data points of the first and second data sets are used together as initial values for a transductive generalized relevance vector machine (GRVM) based on a Bayesian regression model. A relevance vector machine (RVM) makes predictions based on the function of Formula 3 in a Bayesian manner.

[Formula 3]

$$y(x) = \sum_{i=1}^{N} w_i \, K(x, x_i)$$

Here, x denotes the input vector, w_i denote the weights, K(x, x_i) denotes the kernel function, y denotes the output, and N is a natural number of 2 or more.
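Evaluating the model function y(x) = Σ w_i K(x, x_i) can be sketched as below. The text does not specify the kernel, so a Gaussian kernel exp(−‖a − b‖² / width²) is assumed here. The optional `bias` term is also an assumption and defaults to 0, so by default the function is exactly the weighted kernel sum.

```python
import numpy as np

def gaussian_kernel(a, b, width=1.0):
    # Assumed kernel choice; the source leaves K unspecified.
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-np.sum(diff ** 2) / width ** 2)

def rvm_predict(x, basis_points, weights, bias=0.0, kernel=gaussian_kernel):
    """Evaluate y(x) = sum_i w_i K(x, x_i) (+ an optional, assumed bias)."""
    return bias + sum(w * kernel(x, xi) for w, xi in zip(weights, basis_points))
```

After training, only the relevance vectors carry non-zero weights. In practice `basis_points` would then be just those vectors, which is where the RVM's savings at prediction time come from.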

In general, a prior can be placed over the model weights, governed by a set of hyperparameters whose most probable values are iteratively estimated from the data. When this is done, the posterior distributions of many of the weights become sharply peaked around zero. The training vectors associated with the remaining non-zero weights are called relevance vectors (RVs).

The main advantage of the RVM is that it can match the performance of the SVM while using far fewer kernel functions, which reduces computational complexity. In embodiments of the present invention, an RVM rather than an SVM is applied to active learning for this reason.

The process of constructing the transductive GRVM based on a Bayesian regression model is described in detail with reference to FIG. 2. In this process, the data points of the first data set and the second data set are used together as initial values.

FIG. 2 is a flowchart showing how the transductive generalized relevance vector machine is constructed in the machine learning method using a relevance vector machine according to an embodiment of the present invention.

Referring to FIG. 2, in the machine learning method using a relevance vector machine according to the present embodiment, the method of constructing the transductive generalized relevance vector machine may include the following steps:
(S210) constructing a generalized Bayesian regression model that includes a matrix whose elements are kernel values between the labeled data points and all data points;
(S230) obtaining a sparse solution from the generalized Bayesian regression model by using an automatic relevance determination (ARD) prior;
(S250) obtaining an approximated joint distribution of the estimated outputs of the unlabeled data points and the sparse solution;
(S270) obtaining an approximated marginal likelihood function for the sparse solution from the approximated joint distribution; and
(S290) obtaining, based on the approximated marginal likelihood function, a posterior predictive distribution for a data point that differs from the data points included in the initial values.

In step S210, a generalized Bayesian regression model may be constructed that includes a matrix whose elements are kernel values between the labeled data points and all data points. As shown in Formula 4, the Bayesian regression model using the RVM may be constructed from the labeled data of the labeled data set D_L together with the unlabeled data of the unlabeled data set D_U.

[Formula 4]

$$ y_L = \Phi_{L,L+U}\, w + \varepsilon_L $$

Here, Φ_{L,L+U} is an L×(L+U) matrix. Each of its elements is the kernel value k(x_i, x_j) between a labeled data point x_i and a data point x_j from the full data set, where 1 ≤ i ≤ L and 1 ≤ j ≤ L+U. L is the number of labeled data points and U is the number of unlabeled data points. w denotes the unknown weight vector. ε_L denotes an independent noise vector that is assumed to follow the normal distribution ε_L ~ N(0, σ²I_L), where I_L is the L-dimensional identity matrix and σ is the standard deviation. y_L denotes the output vector.
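The following sketch makes the shapes in Formula 4 concrete. It builds Φ_{L,L+U} with the labeled points ordered first and then draws y_L from the model. The same `kernel_matrix` helper yields Φ_{G,H} for any pair of point sets. The RBF kernel, the random inputs, the sparse weight vector and σ = 0.1 are all illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def kernel_matrix(XG, XH, gamma=1.0):
    """Phi_{G,H}: G x H matrix of RBF kernel values k(x_i, x_j) (the RBF kernel is an assumption)."""
    XG, XH = np.atleast_2d(XG), np.atleast_2d(XH)
    sq = (XG ** 2).sum(1)[:, None] + (XH ** 2).sum(1)[None, :] - 2.0 * XG @ XH.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off before exp

rng = np.random.default_rng(0)
X_L = rng.uniform(-3, 3, size=(5, 1))    # L = 5 labeled inputs
X_U = rng.uniform(-3, 3, size=(20, 1))   # U = 20 unlabeled inputs
X_all = np.vstack([X_L, X_U])            # labeled points first, then unlabeled

Phi_L_LU = kernel_matrix(X_L, X_all)     # shape (L, L+U)
w = np.zeros(X_all.shape[0])
w[[0, 7]] = [1.0, -0.5]                  # hypothetical sparse weight vector
sigma = 0.1
y_L = Phi_L_LU @ w + sigma * rng.standard_normal(X_L.shape[0])  # y_L = Phi w + eps
```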

In step S230, a sparse solution may be obtained from the generalized Bayesian regression model constructed in this way, by using an automatic relevance determination (ARD) prior. ARD effectively removes unnecessary or redundant features. It regularizes the solution space with a parameterized, data-dependent prior. The sparse solution may be obtained with the ARD prior by methods known in the art.

The distribution p(w|A) over the weights, which is the ARD prior, has zero mean and covariance matrix A⁻¹, where A is a diagonal matrix. The noise variance σ² and the matrix A, which determines this covariance, can be obtained by type-2 maximum likelihood estimation, a technique well known in the art.
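The patent leaves the type-2 maximum likelihood step to known methods. The minimal sketch below assumes Tipping's standard fixed-point (evidence) updates: α_i ← γ_i / μ_i² and σ² ← ||y − Φμ||² / (n − Σγ_i). The function name, the starting values and the pruning threshold `alpha_max` are assumptions. Weights whose precision α_i diverges are pruned, and the surviving columns play the role of relevance vectors.

```python
import numpy as np

def ard_type2_ml(Phi, y, n_iter=500, alpha_max=1e9):
    """Type-2 ML for w ~ N(0, A^{-1}), A = diag(alpha), and noise variance sigma2 (sketch)."""
    n, m = Phi.shape
    alpha = np.ones(m)                    # one ARD precision per weight
    sigma2 = 0.1 * np.var(y) + 1e-6       # heuristic starting noise variance
    keep = np.arange(m)                   # columns not yet pruned
    for _ in range(n_iter):
        P = Phi[:, keep]
        Sigma = np.linalg.inv(P.T @ P / sigma2 + np.diag(alpha[keep]))  # posterior covariance
        mu = Sigma @ P.T @ y / sigma2                                    # posterior mean
        gamma = np.clip(1.0 - alpha[keep] * np.diag(Sigma), 1e-12, 1.0)  # how well-determined each w_i is
        alpha[keep] = gamma / (mu ** 2 + 1e-12)                          # alpha_i <- gamma_i / mu_i^2
        resid = y - P @ mu
        sigma2 = max(float(resid @ resid) / max(n - gamma.sum(), 1e-6), 1e-12)  # noise re-estimate
        keep = keep[alpha[keep] < alpha_max]   # prune weights whose precision diverges
        if keep.size == 0:
            return keep, np.zeros(0), alpha, sigma2
    P = Phi[:, keep]                      # final posterior over the surviving weights
    Sigma = np.linalg.inv(P.T @ P / sigma2 + np.diag(alpha[keep]))
    mu = Sigma @ P.T @ y / sigma2
    return keep, mu, alpha, sigma2

# Illustrative data: only 3 of the 20 weights are non-zero.
rng = np.random.default_rng(1)
Phi = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[[2, 9, 15]] = [2.0, -1.5, 1.0]
y = Phi @ w_true + 0.01 * rng.standard_normal(100)
keep, mu, alpha, sigma2 = ard_type2_ml(Phi, y)
```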

In step S250, an approximated joint distribution of the estimated outputs of the unlabeled data points and the sparse solution may be obtained. The estimated output f_U of the unlabeled data may be expressed as Formula 5.

[Formula 5]

Using the Laplace approximation, which is well known in the art, the approximated joint distribution of y_L in Formula 4 and f_U in Formula 5 can be obtained as Formula 6.

[Formula 6]

Here, Φ_{G,H} is a G×H matrix. Each of its elements is the kernel value k(x_i, x_j) between the i-th data point of the set G and the j-th data point of the set H, where 1 ≤ i ≤ G and 1 ≤ j ≤ H. G and H are each one of L, U and L+U. The superscript T denotes the transpose.
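The exact form of Formula 6 is not reproduced here, and the patent's Laplace-approximated result may differ. For orientation only, the sketch below makes two assumptions: f_U = Φ_{U,L+U} w (without noise), and the Gaussian prior w ~ N(0, A⁻¹) of step S230. Under these assumptions, y_L and f_U are jointly Gaussian with zero mean and the block covariance built below. The second function reads E[f_U | y_L] off that joint distribution. Both function names are hypothetical.

```python
import numpy as np

def joint_cov(Phi_LLU, Phi_ULU, alpha, sigma2):
    """Covariance of the stacked vector [y_L; f_U] under the assumed linear-Gaussian model."""
    Ainv = np.diag(1.0 / np.asarray(alpha, dtype=float))   # prior covariance A^{-1}
    C_LL = sigma2 * np.eye(Phi_LLU.shape[0]) + Phi_LLU @ Ainv @ Phi_LLU.T
    C_LU = Phi_LLU @ Ainv @ Phi_ULU.T
    C_UU = Phi_ULU @ Ainv @ Phi_ULU.T
    return np.block([[C_LL, C_LU], [C_LU.T, C_UU]])

def conditional_mean_fU(Phi_LLU, Phi_ULU, alpha, sigma2, y_L):
    """E[f_U | y_L] = C_UL C_LL^{-1} y_L, read off the joint Gaussian."""
    C = joint_cov(Phi_LLU, Phi_ULU, alpha, sigma2)
    n_l = Phi_LLU.shape[0]
    return C[n_l:, :n_l] @ np.linalg.solve(C[:n_l, :n_l], y_L)
```

By the push-through identity, this conditional mean equals Φ_{U,L+U} applied to the posterior mean of w.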

In step S270, an approximated marginal likelihood function for the sparse solution may be obtained from the approximated joint distribution. The approximated marginal likelihood function of y_L obtained from Formula 6 is given by Formula 7 below.

[Formula 7]
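Under the same linear-Gaussian assumption, and without reproducing the patent's own Formula 7, the marginal likelihood of y_L would be N(y_L | 0, σ²I_L + Φ_{L,L+U} A⁻¹ Φ_{L,L+U}ᵀ). This is the quantity that type-2 maximum likelihood maximizes over A and σ². The sketch evaluates its logarithm through a Cholesky factorization, which is numerically safer than forming the inverse explicitly. The function name is hypothetical.

```python
import numpy as np

def log_marginal_likelihood(Phi_L, y_L, alpha, sigma2):
    """log N(y_L | 0, C) with C = sigma2*I + Phi_L diag(alpha)^{-1} Phi_L^T (assumed form)."""
    n = Phi_L.shape[0]
    C = sigma2 * np.eye(n) + (Phi_L / alpha) @ Phi_L.T   # dividing column j by alpha_j gives Phi A^{-1}
    Lc = np.linalg.cholesky(C)                           # C = Lc Lc^T, Lc lower triangular
    z = np.linalg.solve(Lc, y_L)                         # y^T C^{-1} y = z^T z
    logdet = 2.0 * np.sum(np.log(np.diag(Lc)))
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + z @ z)
```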

In step S290, based on the approximated marginal likelihood function, a posterior predictive distribution may be obtained for a data point x* that differs from the data points included in the initial values. As shown in Formula 8 below, this posterior predictive distribution can be obtained through the Sherman-Morrison-Woodbury formula, which is well known in the art.

[Formula 8]

Here, m_f(x*) is defined as in Formula 9, and Σ_{L+U} may be defined as in Formula 10.

[Formula 9]

[Formula 10]
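Formulas 8 to 10 are not reproduced here. Under the same linear-Gaussian assumption, the Sherman-Morrison-Woodbury identity lets the posterior covariance of the L+U weights be computed by inverting only an L×L matrix. This is cheap when few points are labeled. The predictive mean and variance at x* then follow from the kernel row φ(x*) = [k(x*, x_j)]. Treating `mu` and `Sigma` here as counterparts of the patent's m_f and Σ_{L+U} is an assumption, and both function names are hypothetical.

```python
import numpy as np

def posterior_woodbury(Phi_L, y_L, alpha, sigma2):
    """Posterior mean/covariance of w via Sherman-Morrison-Woodbury (inverts only an L x L matrix)."""
    Ainv = np.diag(1.0 / alpha)                                   # (L+U) x (L+U) prior covariance
    C = sigma2 * np.eye(Phi_L.shape[0]) + Phi_L @ Ainv @ Phi_L.T  # L x L
    K = Ainv @ Phi_L.T @ np.linalg.inv(C)                         # (L+U) x L gain matrix
    Sigma = Ainv - K @ Phi_L @ Ainv                               # = (Phi^T Phi / sigma2 + A)^{-1}
    mu = K @ y_L                                                  # = Sigma Phi^T y / sigma2
    return mu, Sigma

def predictive(phi_star, mu, Sigma, sigma2):
    """Mean and variance of y(x*) for the kernel row phi_star = [k(x*, x_j)]_j."""
    return float(phi_star @ mu), float(sigma2 + phi_star @ Sigma @ phi_star)
```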

Referring again to FIG. 1, in step S300, a data set RV_U of relevance vectors may be obtained based on the posterior predictive distribution obtained from the transductive GRVM. Specifically, in this step the third data set RV_U may consist of unlabeled relevance vectors.
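One plausible reading of step S300 is sketched below, under stated assumptions. The kernel columns are ordered labeled-first, as in Φ_{L,L+U}. A column counts as a relevance vector when its ARD precision α_j stays below a pruning threshold. RV_U then keeps only those surviving columns that correspond to unlabeled points. The threshold value and the function name are hypothetical.

```python
import numpy as np

def unlabeled_relevance_vectors(alpha, n_labeled, alpha_max=1e9):
    """Indices j (into the L+U kernel columns) that survive ARD pruning and lie in the unlabeled block."""
    surviving = np.flatnonzero(np.asarray(alpha) < alpha_max)  # columns with non-zero weight
    return surviving[surviving >= n_labeled]                    # columns L..L+U-1 are unlabeled points
```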

In step S500, the first data set and the second data set may be updated according to a query policy for the relevance vectors, based on the relevance vectors included in the data set RV_U. In this step, the data set RV_U obtained in step S300 is used to select a fourth data set D_q according to the query policy. D_q is then used to update the first data set D_L and the second data set D_U.

Specifically, when the query policy is a query for unlabeled relevance vectors, the data set D_q under that policy may be defined to be identical to the data set RV_U, as in Formula 11.

[Formula 11]

$$ D_q = RV_U $$

When the query policy is a query for the most uncertain relevance vector, the data set D_q under that policy may be defined as in Formula 12. It is the set of relevance vectors in RV_U that maximize the variance.

[Formula 12]

Here, x_u denotes a relevance vector that is an element of the third data set RV_U.
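The most-uncertain policy of Formula 12 is sketched below. The sketch assumes that the predictive variance of each relevance vector has already been computed, for example from the posterior predictive distribution of step S290. The names `pred_var` and `k` are hypothetical. Returning the top k elements, rather than a single maximizer, is also an assumption.

```python
def most_uncertain(rv_u, pred_var, k=1):
    """Return the k elements of RV_U with the largest predictive variance."""
    return sorted(rv_u, key=lambda x: pred_var[x], reverse=True)[:k]
```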

When the query policy is a query for the farthest relevance vector, the data set D_q under that policy may be defined as in Formula 13. It is the set of relevance vectors in RV_U whose distance from the data points of the first data set D_L is minimized.

[Formula 13]

Here, x_u denotes a relevance vector that is an element of the third data set RV_U, and x_l denotes a data point that is an element of the first data set D_L. The superscript j of D_L counts the number of updates: j = 1 denotes the initially set D_L, j = 2 denotes D_L after one update, and j = 3 denotes D_L after two updates.
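The text names this policy "farthest" but defines D_q through a minimum distance to D_L. The sketch below computes, for each relevance vector x_u, its distance to the nearest labeled point x_l, and lets the caller choose the ordering. `pick="max"` gives the common farthest-first reading, which selects the largest nearest-neighbour distance. `pick="min"` follows the literal minimum-distance wording. Euclidean distance, the function name and its parameters are all assumptions.

```python
import numpy as np

def farthest_relevance_vectors(X_rv, X_labeled, k=1, pick="max"):
    """Rank RVs (rows of X_rv) by distance to their nearest labeled point (rows of X_labeled)."""
    X_rv, X_labeled = np.atleast_2d(X_rv), np.atleast_2d(X_labeled)
    diff = X_rv[:, None, :] - X_labeled[None, :, :]      # pairwise differences, shape (R, L, d)
    d = np.sqrt((diff ** 2).sum(-1)).min(axis=1)          # nearest-labeled-point distance per RV
    order = np.argsort(-d if pick == "max" else d, kind="stable")
    return order[:k], d
```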

Next, the first data set D_L and the second data set D_U can be updated with the data set D_q, which is selected by one of Formulas 11 to 13 according to the query policy. Specifically, D_L is updated to the union of D_L and the fourth data set D_q. D_U is updated to the set difference obtained by removing the elements of D_q from D_U. Both updates are expressed in Formula 14 below.

[Formula 14]

$$ D_L^{j+1} = D_L^{j} \cup \{(x_q, y_q) : x_q \in D_q\}, \qquad D_U^{j+1} = D_U^{j} \setminus D_q $$

Here, the superscript j+1 denotes the updated data sets, and y_q denotes the label of x_q, an element of D_q.
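The update in Formula 14 moves the queried points from D_U into D_L together with their newly obtained labels. The minimal sketch below assumes that D_L is stored as a mapping from point to label and D_U as a set of hashable points. It also assumes that labels come from a hypothetical `oracle` callable standing in for the human annotator.

```python
def update_sets(D_L, D_U, D_q, oracle):
    """Return (D_L^{j+1}, D_U^{j+1}): D_L gains (x_q, y_q) for each x_q in D_q; D_U loses D_q."""
    queried = set(D_q)
    if not queried <= D_U:
        raise ValueError("D_q must be a subset of the unlabeled set D_U")
    new_L = dict(D_L)                 # copy, so the previous D_L^j is left unchanged
    for x_q in queried:
        new_L[x_q] = oracle(x_q)      # y_q: label supplied by the (hypothetical) oracle
    return new_L, D_U - queried       # set difference D_U^j minus D_q
```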

In step S700, it may be determined whether a stop condition is satisfied. For example, the stop condition may be that the variance, standard deviation or a similar statistic of the relevance vectors falls within, or outside, a preset range. If the stop condition is not satisfied, the updated first data set D_L and second data set D_U are set as initial values. The transductive generalized relevance vector machine based on the Bayesian regression model of step S100 is then reconstructed, and steps S300 and S500 are repeated. Steps S100 to S500 are thus repeated until the stop condition is satisfied.
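Steps S100 to S700 can be combined into a compact, self-contained loop, sketched below. The sketch relies on several assumptions that the patent does not fix:
- an RBF kernel;
- the plain linear-Gaussian ARD posterior in place of the Laplace-approximated transductive GRVM;
- the most-uncertain query policy, with one point queried per round;
- a hypothetical `oracle` that returns labels;
- a stop condition that ends the loop when the largest predictive variance among the unlabeled relevance vectors falls below `var_tol`, or after `max_rounds` rounds.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B (kernel choice is an assumption)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_ard(Phi, y, n_iter=300, alpha_max=1e9):
    """Compact type-2 ML ARD fit; returns surviving column indices, alpha and sigma2."""
    n, m = Phi.shape
    alpha, sigma2, keep = np.ones(m), 0.1 * np.var(y) + 1e-6, np.arange(m)
    for _ in range(n_iter):
        P = Phi[:, keep]
        Sigma = np.linalg.inv(P.T @ P / sigma2 + np.diag(alpha[keep]))
        mu = Sigma @ P.T @ y / sigma2
        g = np.clip(1.0 - alpha[keep] * np.diag(Sigma), 1e-12, 1.0)
        alpha[keep] = g / (mu ** 2 + 1e-12)
        r = y - P @ mu
        sigma2 = max(float(r @ r) / max(n - g.sum(), 1e-6), 1e-6)  # floored noise variance
        keep = keep[alpha[keep] < alpha_max]
        if keep.size == 0:
            break
    return keep, alpha, sigma2

def active_learning(X, oracle, labeled, max_rounds=10, var_tol=1e-2):
    """Hypothetical S100-S700 loop; returns the indices labeled so far, in query order."""
    labeled = list(labeled)
    for _ in range(max_rounds):
        unlabeled = [i for i in range(len(X)) if i not in labeled]
        order = labeled + unlabeled                  # labeled block first, as in Phi_{L,L+U}
        Xo = X[order]
        y_L = np.array([oracle(X[i]) for i in labeled])
        Phi_L = rbf(Xo[:len(labeled)], Xo)           # S100: build the model on D_L and D_U
        keep, alpha, sigma2 = fit_ard(Phi_L, y_L)
        rv_u = [order[j] for j in keep if j >= len(labeled)]  # S300: unlabeled relevance vectors
        if not rv_u:
            break
        P = Phi_L[:, keep]
        Sigma = np.linalg.inv(P.T @ P / sigma2 + np.diag(alpha[keep]))
        phi = rbf(X[rv_u], Xo[keep])                 # kernel rows of the RVs against kept centres
        var = sigma2 + np.einsum("ij,jk,ik->i", phi, Sigma, phi)  # predictive variances
        if var.max() < var_tol:                      # S700: assumed stop condition
            break
        labeled.append(rv_u[int(np.argmax(var))])    # S500: query the most uncertain RV, update sets
    return labeled

X = np.linspace(-3, 3, 30)[:, None]
result = active_learning(X, lambda x: float(np.sin(x[0])), [0, 29], max_rounds=5)
```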

In step S900, if the stop condition is satisfied in step S700, the relevance vectors and weights may be obtained from the finally constructed relevance vector machine. These relevance vectors and weights are learned starting from a small number of initially labeled data points. All data points therefore do not need to be labeled in advance for training, which reduces the labeling cost of machine learning.

Specifically, consider updating the first and second data sets with a data set D_q selected under the step S500 query policy for unlabeled relevance vectors or for the most uncertain relevance vector (Formula 11 or Formula 12). In this case, the standard error was found to be smaller than with a random selection algorithm. However, a relatively large amount of data had to be labeled for the output distribution to converge.

In contrast, consider updating the first and second data sets with a data set D_q selected under the step S500 query policy for the farthest relevance vector (Formula 13). In this case, the results converged faster than with the random selection algorithm, even with a small number of labeled data points.

FIG. 3 shows plots of the predictive mean obtained with a small number of labeled data points by the machine learning method using a relevance vector machine according to an embodiment of the present invention. In FIGS. 3(a) to 3(f), the numbers of labeled data points are (a) L = 3, (b) L = 7, (c) L = 9, (d) L = 13, (e) L = 17 and (f) L = 20.

In each panel of FIG. 3, the given input data points are marked with blue asterisks (*), the labeled data points with black crosses (+), and the relevance vectors with red circles (o). The solid green line is the mean predicted by the model. FIGS. 3(a) to 3(f) show that as the number of labeled points increases (from (a) to (f)), the predictive mean converges more closely to the original data.

The embodiments described above cover a machine learning method using a relevance vector machine, a computer program implementing the method, and an information processing apparatus configured to perform it. In these embodiments, a transductive generalized relevance vector machine is first constructed from labeled and unlabeled data together. The labeled and unlabeled data are then updated repeatedly according to the relevance vector query policy, and the transductive generalized relevance vector machine is reconstructed after each update. Relevance vectors and weights are obtained from the final machine. Machine learning can therefore be performed efficiently even when only a small fraction of the training data is labeled.

The components of the embodiments according to the present invention are not limited to the embodiments described above. Each component may be implemented as an independent piece of hardware. Alternatively, some or all of the components may be selectively combined and implemented as a computer program whose program modules perform some or all of the combined functions on one or more pieces of hardware. Those skilled in the art can readily derive the code and code segments of such a computer program even without specific pseudocode.

In addition, the processes described herein with reference to flowcharts need not be performed in the order shown in the flowcharts. Some process steps may be executed in parallel, and additional process steps may be applied.

Those of ordinary skill in the art may modify and apply the present invention described above in various ways. The scope of the technical idea of the present invention should be defined by the claims below.

Claims

When executed by a processor of an information processing apparatus,
(1) when L data points in a raw data set of N data points are labeled (N is a natural number of 2 or more, and L is a natural number of at least 1 and less than N), setting a first data set of the labeled data points and a second data set of the unlabeled data points as initial values to construct a transductive generalized relevance vector machine based on a Bayesian regression model;
(2) obtaining a third data set of relevance vectors selected from the second data set based on a distribution obtained from the relevance vector machine;
(3) constructing a fourth data set based on the relevance vectors included in the third data set according to a policy for querying the relevance vectors, and updating the first data set and the second data set based on the fourth data set;
(4) setting the updated first data set and second data set as initial values to reconstruct the transductive generalized relevance vector machine of step (1);
(5) repeating steps (2) to (4) until a preset stop condition is satisfied; and
(6) when the stop condition is satisfied, obtaining relevance vectors and weights from the finally constructed relevance vector machine.

The method according to claim 1,
wherein constructing the transductive generalized relevance vector machine in step (1) comprises:
(a) constructing a generalized Bayesian regression model including a matrix whose elements are kernel values between the labeled data points of the initial values and all data points;
(b) obtaining a sparse solution from the constructed generalized Bayesian regression model by using an automatic relevance determination (ARD) prior;
(c) obtaining an approximated joint distribution of the estimated outputs of the unlabeled data points of the initial values and the obtained sparse solution;
(d) obtaining an approximated marginal likelihood function for the sparse solution from the obtained approximated joint distribution; and
(e) obtaining, based on the approximated marginal likelihood function, a posterior predictive distribution for a data point different from the data points included in the initial values.

The method according to claim 1,
wherein the policy for querying the relevance vectors in step (3)
is one of a query for unlabeled relevance vectors, a query for the most uncertain relevance vector, and a query for the farthest relevance vector.

The method of claim 3,
wherein the fourth data set in step (3) is:
defined to be identical to the third data set when the query for the relevance vectors is a query for unlabeled relevance vectors;
defined as the set of relevance vectors, among those included in the third data set, that maximize the variance when the query for the relevance vectors is a query for the most uncertain relevance vector; and
defined as the set of relevance vectors, among those included in the third data set, whose distance from the data points of the first data set is minimized when the query for the relevance vectors is a query for the farthest relevance vector.

5. The method of claim 4,
wherein, when the query for the relevance vectors is a query for the most uncertain relevance vector, the number of repetitions of step (5) is smaller than when the query is a query for unlabeled relevance vectors or for the farthest relevance vector.

5. The method of claim 4,
wherein the number of data points labeled in step (1),
in the case where the query for the relevance vectors is a query for the most uncertain relevance vector, compared with the case where the query is a query for unlabeled relevance vectors or for the farthest relevance vector, in the computer program stored in a computer-readable medium.

The method according to claim 1,
wherein, in step (3),
the first data set is updated to the union of the first data set and the fourth data set, and
the second data set is updated to the set difference obtained by removing the elements of the fourth data set from the second data set.

A machine learning method using a relevance vector machine, comprising:
(1) when only some of the data points of a raw data set including data points are labeled, setting, by a processor of an information processing apparatus, a first data set of the labeled data points and a second data set of the unlabeled data points as initial values to construct a transductive generalized relevance vector machine based on a Bayesian regression model;
(2) obtaining, by the processor of the information processing apparatus, a third data set of relevance vectors selected from the second data set based on a distribution obtained from the relevance vector machine;
(3) constructing, by the processor of the information processing apparatus, a fourth data set for updating the first and second data sets based on the relevance vectors included in the third data set, and updating the first data set and the second data set;
(4) setting, by the processor of the information processing apparatus, the updated first data set and second data set as initial values to reconstruct the transductive generalized relevance vector machine of step (1); and
(5) repeating, by the processor of the information processing apparatus, steps (2) to (4) to obtain relevance vectors and weights from the finally constructed relevance vector machine.

9. The method of claim 8,
wherein, in step (3), the fourth data set is:
defined to be identical to the third data set when the query for the relevance vectors included in the third data set is a query for unlabeled relevance vectors;
defined as the set of relevance vectors, among those included in the third data set, that maximize the variance when the query for the relevance vectors included in the third data set is a query for the most uncertain relevance vector; and
defined as the set of relevance vectors, among those included in the third data set, whose distance from the data points of the first data set is minimized when the query for the relevance vectors included in the third data set is a query for the farthest relevance vector.

10. The method of claim 9,
wherein, when the query for the relevance vectors included in the third data set is a query for the most uncertain relevance vector, the number of repetitions of step (5) is smaller than when the query is a query for unlabeled relevance vectors or for the farthest relevance vector.

9. The method of claim 8,
wherein, in step (3),
the first data set and the second data set are updated through at least one of a union and a set difference with the fourth data set.

9. The method of claim 8,
wherein constructing the transductive generalized relevance vector machine in step (1) comprises:
(a) constructing, by the processor of the information processing apparatus, a generalized Bayesian regression model including a matrix whose elements are kernel values between the data points of the first data set and the data points in the union of the first and second data sets;
(b) obtaining, by the processor of the information processing apparatus, a sparse solution from the constructed generalized Bayesian regression model by using an automatic relevance determination (ARD) prior;
(c) obtaining, by the processor of the information processing apparatus, an approximated joint distribution of the estimated outputs of the data points of the second data set and the obtained sparse solution;
(d) obtaining, by the processor of the information processing apparatus, an approximated marginal likelihood function for the sparse solution from the obtained approximated joint distribution; and
(e) obtaining, by the processor of the information processing apparatus and based on the approximated marginal likelihood function, a posterior predictive distribution for a data point different from the data points included in the initial values.

An information processing apparatus comprising a processor configured to perform a machine learning method using a relevance vector machine,
wherein the processor is configured to perform:
(1) when only some data points of a raw data set including data points are labeled, setting a first data set of the labeled data points and a second data set of the unlabeled data points as initial values to construct a transductive generalized relevance vector machine based on a Bayesian regression model;
(2) obtaining a third data set of relevance vectors selected from the second data set based on a distribution obtained from the relevance vector machine;
(3) constructing a fourth data set for updating the first and second data sets based on the relevance vectors included in the third data set, and updating the first data set and the second data set;
(4) setting the updated first data set and second data set as initial values to reconstruct the transductive generalized relevance vector machine of step (1); and
(5) repeating steps (2) to (4) to obtain relevance vectors and weights from the finally constructed relevance vector machine.

14. The method of claim 13,
wherein, in step (3), the fourth data set is:
defined to be identical to the third data set when the query for the relevance vectors included in the third data set is a query for unlabeled relevance vectors;
defined as the set of relevance vectors, among those included in the third data set, that maximize the variance when the query for the relevance vectors included in the third data set is a query for the most uncertain relevance vector; and
defined as the set of relevance vectors, among those included in the third data set, whose distance from the data points of the first data set is minimized when the query for the relevance vectors included in the third data set is a query for the farthest relevance vector.