KR102203337B1

KR102203337B1 - Apparatus and method for m-estimation with trimmed l1 penalty

Info

Publication number: KR102203337B1
Application number: KR1020180162849A
Authority: KR
Inventors: Eunho Yang; Jihun Yun
Original assignee: Korea Advanced Institute of Science and Technology (KAIST)
Priority date: 2018-12-17
Filing date: 2018-12-17
Publication date: 2021-01-15
Also published as: KR20200074450A

Abstract

Provided is a maximum likelihood estimation method for machine learning that performs trimming of a loss function for a sample set to handle outlier values and heavy-tailed noise, performs ℓ₁ regularization in consideration of the penalty determined by the trimming, and estimates the maximum likelihood of a model associated with the sample set based on the result of the ℓ₁ regularization.

Description

Apparatus and method for maximum likelihood estimation with a trimmed ℓ₁ penalty {APPARATUS AND METHOD FOR M-ESTIMATION WITH TRIMMED L1 PENALTY}

The embodiments relate to an apparatus and method for maximum likelihood estimation and, more particularly, to an apparatus and method for maximum likelihood estimation (M-estimation) using a trimmed ℓ₁ penalty.

Machine learning refers to the process by which a computer learns, from input data, how to appropriately predict data it has not seen before. Problems addressed by machine learning include regression problems, which find regularities in irregularly distributed data, and classification problems, which sort data into given categories.

Maximum likelihood estimation is a technique for estimating an unknown parameter θ from samples (observations) x drawn from a probability distribution with parameter θ. The likelihood indicates how well an estimate of the parameter θ agrees with the given samples x. Machine learning can be carried out by maximizing this likelihood.

With regard to model complexity, regularization may be performed to prevent overfitting of a model and to improve generalization performance. Regularization imposes a penalty on the model weights; regularization methods include ℓ₁ regularization and ℓ₂ regularization.

Korean Patent Application Publication No. 10-2009-0009478 (published January 23, 2009) discloses a maximum likelihood detection method for a wireless communication system, including: calculating a Euclidean distance using at least one of channel matrix information, noise power information, and per-stream modulation order information; calculating a signal-pair error rate (PER) using the calculated Euclidean distance; calculating a per-stream error probability using the calculated PER; sorting and classifying the calculated per-stream error probabilities; and performing maximum likelihood detection using the results of the sorting and classification.

The information described above is provided only to aid understanding; it may include content that does not form part of the prior art and may not include what the prior art would suggest to a person skilled in the art.

One embodiment provides a maximum likelihood estimation method for machine learning that performs trimming of a loss function for a sample set to handle outlier values and heavy-tailed noise, performs ℓ₁ regularization in consideration of the penalty determined by the trimming, and estimates the maximum likelihood of a model associated with the sample set based on the result of the ℓ₁ regularization.

In one aspect, there is provided a computer-implemented maximum likelihood estimation method for machine learning, the method including: performing trimming of a loss function for a sample set to handle outlier values and heavy-tailed noise; performing ℓ₁ regularization in consideration of a penalty determined by the trimming; and estimating a maximum likelihood of a model associated with the sample set based on a result of the ℓ₁ regularization, wherein performing the trimming includes trimming the entries that incur the greatest penalty for the loss function.

The trimming of the loss function may be performed according to Equation 1,

[Equation 1]

$$\min_{\theta \in \Omega,\, w}\ \sum_{i=1}^{n} w_i\, \mathcal{L}(\theta; Z_i) \quad \text{s.t.} \quad \sum_{i=1}^{n} w_i = n - h,\ \ 0 \le w_i \le 1$$

where $Z_1^n = \{Z_1, \dots, Z_n\}$ is the sample set of n samples, $\mathcal{L}$ is the loss function, $w_i$ is a weight, and h is the trimming parameter; and trimming the entries that incur the greatest penalty may be performed according to Equation 2,

[Equation 2]

$$\min_{\theta \in \Omega,\, w \in [0,1]^p}\ \mathcal{L}(\theta; Z_1^n) + \lambda_n \sum_{i=1}^{p} w_i |\theta_i| \quad \text{s.t.} \quad \sum_{i=1}^{p} w_i = p - h$$

where Ω may denote the parameter space and $\lambda_n$ is the regularization parameter.

Performing the trimming may be carried out according to Equation 3, which is obtained from Equation 2 by defining the order $|\theta_{(1)}| \ge |\theta_{(2)}| \ge \dots \ge |\theta_{(p)}|$ and setting each $w_i$ to 0 or 1 based on the magnitude of $|\theta_{(i)}|$,

[Equation 3]

$$\min_{\theta \in \Omega}\ \mathcal{L}(\theta; Z_1^n) + \lambda_n \mathcal{R}(\theta; h), \qquad \mathcal{R}(\theta; h) = \sum_{i=h+1}^{p} |\theta_{(i)}|$$

where the regularizer $\mathcal{R}(\theta; h)$ may represent the sum of the p − h smallest absolute values of θ.

To minimize the loss subject to a sparsity penalty, $\mathcal{R}(\theta; h)$ may be set to 0.

The model may be a sparse linear model having n observation pairs of a real-valued target $y_i \in \mathbb{R}$ and a covariate $x_i \in \mathbb{R}^p$ in a linear relationship, defined as in Equation 4,

[Equation 4]

$$y = X\theta^* + \epsilon, \qquad \text{i.e.}\ \ y_i = \langle x_i, \theta^* \rangle + \epsilon_i,\ \ i = 1, \dots, n$$

where $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$ (with rows $x_i$), and $\epsilon \in \mathbb{R}^n$ is independent observation noise, and $\theta^* \in \mathbb{R}^p$ may be the k-sparse vector to be estimated.

Trimming the entries that incur the greatest penalty may be performed according to Equation 5,

[Equation 5]

$$\min_{\theta \in \Omega}\ \frac{1}{2n}\|y - X\theta\|_2^2 + \lambda_n \mathcal{R}(\theta; h)$$

where $\frac{1}{2n}\|y - X\theta\|_2^2$ may correspond to the loss function.

The model may be a sparse graphical model, and trimming the entries that incur the greatest penalty may be performed according to Equation 6,

[Equation 6]

$$\min_{\Theta \in \mathcal{S}_{++}^{p}}\ \operatorname{tr}(\widehat{\Sigma}\Theta) - \log\det\Theta + \lambda_n \mathcal{R}_{\mathrm{off}}(\Theta; h)$$

where $\mathcal{S}_{++}^{p}$ denotes the convex cone of positive definite matrices, $\widehat{\Sigma}$ is the sample covariance matrix, and $\mathcal{R}_{\mathrm{off}}(\Theta; h)$ may represent the sum of the p(p−1)−h smallest absolute values of the off-diagonal entries.

The problem of Equation 2 may be solved using a block coordinate descent algorithm.

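The block coordinate descent algorithm may be performed by: receiving $\lambda_n$, $\eta$, and $\tau$ as inputs; initializing $\theta^0$, $w^0$, and $k$ to 0; while not converged, performing

$$w^{k+1} \leftarrow \operatorname{proj}_{\mathcal{S}}\big(w^k - \tau\, r(\theta^k)\big), \quad \theta^{k+1} \leftarrow \operatorname{prox}_{\eta\lambda_n \|w^{k+1} \odot (\cdot)\|_1}\big(\theta^k - \eta \nabla \mathcal{L}(\theta^k)\big), \quad k \leftarrow k + 1;$$

and outputting $\theta^k$ and $w^k$. Here $\mathcal{S} = \{w \in [0,1]^p : \sum_{i=1}^{p} w_i = p - h\}$ is the weight constraint set of Equation 2, $r(\theta) = \lambda_n(|\theta_1|, \dots, |\theta_p|)$, ⊙ is the elementwise product, and η and τ are step sizes.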

In another aspect, there is provided a machine learning method that includes a maximum likelihood estimation method, the maximum likelihood estimation method including: performing trimming of a loss function for a sample set to handle outlier values and heavy-tailed noise; performing ℓ₁ regularization in consideration of a penalty determined by the trimming; and estimating a maximum likelihood of a model associated with the sample set based on a result of the ℓ₁ regularization, wherein performing the trimming includes trimming the entries that incur the greatest penalty for the loss function.

In yet another aspect, there is provided a maximum likelihood estimator including: a trimming unit that performs trimming of a loss function for a sample set to handle outlier values and heavy-tailed noise; a regularization unit that performs ℓ₁ regularization in consideration of a penalty determined by the trimming; and a likelihood estimation unit that estimates a maximum likelihood of a model associated with the sample set based on a result of the ℓ₁ regularization, wherein the trimming unit trims the entries that incur the greatest penalty for the loss function.

The trimming unit performs trimming of the loss function according to Equation 1,

[Equation 1]

$$\min_{\theta \in \Omega,\, w}\ \sum_{i=1}^{n} w_i\, \mathcal{L}(\theta; Z_i) \quad \text{s.t.} \quad \sum_{i=1}^{n} w_i = n - h,\ \ 0 \le w_i \le 1$$

where $Z_1^n = \{Z_1, \dots, Z_n\}$ is the sample set of n samples, $\mathcal{L}$ is the loss function, $w_i$ is a weight, and h is the trimming parameter; and the trimming unit trims the entries that incur the greatest penalty according to Equation 2,

[Equation 2]

$$\min_{\theta \in \Omega,\, w \in [0,1]^p}\ \mathcal{L}(\theta; Z_1^n) + \lambda_n \sum_{i=1}^{p} w_i |\theta_i| \quad \text{s.t.} \quad \sum_{i=1}^{p} w_i = p - h$$

where Ω may denote the parameter space and $\lambda_n$ is the regularization parameter.

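The problem of Equation 2 is solved using a block coordinate descent algorithm, which may be performed by: receiving $\lambda_n$, $\eta$, and $\tau$ as inputs; initializing $\theta^0$, $w^0$, and $k$ to 0; while not converged, performing

$$w^{k+1} \leftarrow \operatorname{proj}_{\mathcal{S}}\big(w^k - \tau\, r(\theta^k)\big), \quad \theta^{k+1} \leftarrow \operatorname{prox}_{\eta\lambda_n \|w^{k+1} \odot (\cdot)\|_1}\big(\theta^k - \eta \nabla \mathcal{L}(\theta^k)\big), \quad k \leftarrow k + 1;$$

and outputting $\theta^k$ and $w^k$. Here $\mathcal{S} = \{w \in [0,1]^p : \sum_{i=1}^{p} w_i = p - h\}$ is the weight constraint set of Equation 2, $r(\theta) = \lambda_n(|\theta_1|, \dots, |\theta_p|)$, ⊙ is the elementwise product, and η and τ are step sizes.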

In yet another aspect, there is provided a machine learning apparatus including a maximum likelihood estimator, the maximum likelihood estimator including: a trimming unit that performs trimming of a loss function for a sample set to handle outlier values and heavy-tailed noise; a regularization unit that performs ℓ₁ regularization in consideration of a penalty determined by the trimming; and a likelihood estimation unit that estimates a maximum likelihood of a model associated with the sample set based on a result of the ℓ₁ regularization, wherein the trimming unit trims the entries that incur the greatest penalty for the loss function.

The apparatus and method for maximum likelihood estimation using the trimmed ℓ₁ penalty of the embodiments can be applied in the field of machine learning. In particular, applying them to an artificial neural network makes it possible to effectively select the important features needed for learning.

In addition, trimming can make a large-capacity artificial neural network lightweight. This weight reduction makes it easy to apply machine learning algorithms in IoT environments where data arrives in real time, or on small terminals such as smartphones.

FIG. 1 shows an apparatus for performing machine learning including maximum likelihood estimation according to an embodiment.
FIG. 2 is a flowchart illustrating a method of estimating a maximum likelihood according to an embodiment.
FIG. 3 shows a block coordinate descent algorithm for solving the trimming problem according to an embodiment.
FIG. 4 shows a convergence comparison between the algorithm according to an embodiment and other algorithms.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote the same members.

In the detailed description below, ℓ₁ regularization may impose a penalty on the ℓ₁ norm of the model weights (the sum of the absolute values of the weight elements). For example, in a model that relies on sparse features, most of whose element values are 0, ℓ₁ regularization can drive the weights of unnecessary features exactly to 0, so that the model ignores those features.

ℓ₂ regularization may impose a penalty on the squared ℓ₂ norm of the model weights (the sum of the squares of the weight elements). ℓ₂ regularization can drive outlier model weights with very large or very small values close to 0, but not exactly to 0.
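The contrast between the two penalties can be made concrete through their proximal operators. The Python/NumPy sketch below is our own illustration (the patent does not use these helpers; `prox_l1` and `prox_sq_l2` are hypothetical names): soft-thresholding, the proximal step of the ℓ₁ penalty, sets small weights exactly to 0, while the proximal step of the squared ℓ₂ penalty only rescales every weight.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||x||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sq_l2(v, t):
    """Proximal operator of (t / 2) * ||x||_2^2 (uniform rescaling by 1 / (1 + t))."""
    return v / (1.0 + t)

v = np.array([3.0, 0.4, -0.2, -2.5])
print(prox_l1(v, 0.5))     # the two small entries become exactly zero
print(prox_sq_l2(v, 0.5))  # every entry shrinks by a factor 1/1.5 but stays nonzero
```

This is why ℓ₁ regularization performs feature selection while ℓ₂ regularization does not.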

FIG. 1 shows an apparatus for performing machine learning including maximum likelihood estimation according to an embodiment.

The illustrated machine learning apparatus 100 may include any computing device (computer or server) that performs machine learning. Machine learning may include, for example, deep learning using an artificial neural network, and may refer to any machine learning method based on artificial intelligence.

The machine learning apparatus 100 may be (or be part of) a terminal used by a user, such as a smartphone, a personal computer (PC), a notebook or laptop computer, a tablet, an Internet of Things (IoT) device, or a wearable computer. Alternatively, the machine learning apparatus 100 may be a server that provides services or content to a user terminal, or part of such a server. The machine learning apparatus 100 may be a device operating in an IoT environment.

The machine learning apparatus 100 may include a maximum likelihood estimator 110. The maximum likelihood estimator 110 estimates the maximum likelihood during the machine learning process and may include: a trimming unit 120 that performs trimming of the loss function for a sample set to handle outlier values and heavy-tailed noise; a regularization unit 130 that performs ℓ₁ regularization in consideration of the penalty determined by the trimming; and a likelihood estimation unit 140 that estimates the maximum likelihood of the model associated with the sample set based on the result of the ℓ₁ regularization. The trimming unit 120 may perform the trimming by trimming the entries that incur the greatest penalty for the loss function.

The maximum likelihood estimator 110 and its components may be implemented as part of a processor included in the machine learning apparatus 100, and each may be implemented as one or more software modules and/or hardware modules.

FIG. 2 is a flowchart illustrating a method of estimating a maximum likelihood according to an embodiment.

With reference to FIG. 2, the maximum likelihood estimation method performed by the maximum likelihood estimator 110 described above is explained in more detail.

In step 210, the trimming unit 120 may perform trimming of the loss function for the sample set to handle outlier values and heavy-tailed noise. The trimming unit 120 may perform the trimming by trimming the entries that incur the greatest penalty for the loss function.

In step 220, the regularization unit 130 may perform ℓ₁ regularization in consideration of the penalty determined by the trimming. The regularization unit 130 may additionally perform other regularization, including ℓ₂ regularization, with a similarly trimmed penalty.

In step 230, the likelihood estimation unit 140 may estimate the maximum likelihood of the model associated with the sample set based on the result of the ℓ₁ regularization. The model associated with the sample set may be a sparse linear model or a sparse graphical model.

Since the technical features described above with reference to FIG. 1 apply equally to FIG. 2, redundant descriptions are omitted.

FIG. 3 shows a block coordinate descent algorithm for solving the trimming problem according to an embodiment.

FIG. 3 may show the block coordinate descent algorithm for solving the problem of Equation 2 described below. In other words, the problem of Equation 2 can be solved using the block coordinate descent algorithm shown in FIG. 3.

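The block coordinate descent algorithm may be performed by: receiving $\lambda_n$, $\eta$, and $\tau$ as inputs; initializing $\theta^0$, $w^0$, and $k$ to 0; while not converged, performing

$$w^{k+1} \leftarrow \operatorname{proj}_{\mathcal{S}}\big(w^k - \tau\, r(\theta^k)\big), \quad \theta^{k+1} \leftarrow \operatorname{prox}_{\eta\lambda_n \|w^{k+1} \odot (\cdot)\|_1}\big(\theta^k - \eta \nabla \mathcal{L}(\theta^k)\big), \quad k \leftarrow k + 1;$$

and outputting $\theta^k$ and $w^k$. Here $\mathcal{S} = \{w \in [0,1]^p : \sum_{i=1}^{p} w_i = p - h\}$ is the weight constraint set of Equation 2, $r(\theta) = \lambda_n(|\theta_1|, \dots, |\theta_p|)$, ⊙ is the elementwise product, and η and τ are step sizes.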
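The block updates can be sketched in Python with NumPy for the least-squares loss of the sparse linear model. This is a minimal illustration under our own assumptions, not the patent's implementation: the projection onto the capped simplex $\mathcal{S}$ is computed by bisection, the θ-step is weighted soft-thresholding (the proximal operator of $\lambda_n \sum_i w_i|\theta_i|$), η is set to the inverse Lipschitz constant of the gradient, and τ = 1, λ = 0.5, and the stopping tolerance are illustrative choices. The helper names `project_capped_simplex` and `trimmed_lasso_bcd` are hypothetical.

```python
import numpy as np

def project_capped_simplex(v, s, iters=100):
    """Euclidean projection of v onto {w in [0,1]^p : sum(w) = s} via bisection on a shift t."""
    lo, hi = v.min() - 1.0, v.max()      # clip(v - lo) is all ones, clip(v - hi) is all zeros
    for _ in range(iters):
        t = 0.5 * (lo + hi)
        if np.clip(v - t, 0.0, 1.0).sum() > s:
            lo = t                       # sum too large: the shift must grow
        else:
            hi = t
    return np.clip(v - 0.5 * (lo + hi), 0.0, 1.0)

def trimmed_lasso_bcd(X, y, lam, h, tau=1.0, max_iter=5000, tol=1e-10):
    """Block coordinate descent sketch for Equation 2 with the least-squares loss
    (1/(2n))||y - X theta||^2 and the weighted penalty lam * sum_i w_i |theta_i|."""
    n, p = X.shape
    eta = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    theta, w = np.zeros(p), np.zeros(p)  # theta^0 = w^0 = 0
    for _ in range(max_iter):
        # w-step: projected gradient step on w -> lam * sum_i w_i |theta_i| over S
        w_new = project_capped_simplex(w - tau * lam * np.abs(theta), p - h)
        # theta-step: gradient step on the loss, then weighted soft-thresholding (prox)
        z = theta - eta * (X.T @ (X @ theta - y)) / n
        theta_new = np.sign(z) * np.maximum(np.abs(z) - eta * lam * w_new, 0.0)
        change = max(np.abs(theta_new - theta).max(), np.abs(w_new - w).max())
        theta, w = theta_new, w_new
        if change < tol:
            break
    return theta, w

# Synthetic sparse linear model: 3 relevant coefficients out of 20.
rng = np.random.default_rng(0)
n, p = 100, 20
theta_true = np.zeros(p)
theta_true[:3] = [5.0, -4.0, 3.0]
X = rng.standard_normal((n, p))
y = X @ theta_true + 0.1 * rng.standard_normal(n)
theta_hat, w_hat = trimmed_lasso_bcd(X, y, lam=0.5, h=3)
```

On this synthetic problem the weights of the three largest coefficients should go to 0 (so those coefficients are left unpenalized) and the remaining weights to 1, which thresholds the irrelevant coefficients to exactly zero. Because the problem is non-convex, other data, λ, or h may lead the sketch to a different local optimum.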

Since the technical features described above with reference to FIGS. 1 and 2 apply equally to FIG. 3, redundant descriptions are omitted.

The high-dimensional M-estimator (maximum likelihood estimator) with a trimmed ℓ₁ penalty is described in more detail below. The M-estimator described below may correspond to the maximum likelihood estimator 110. The trimming, regularization, and likelihood estimation described below may be performed by the trimming unit 120, the regularization unit 130, and the likelihood estimation unit 140, respectively.

The standard ℓ₁ penalty introduces bias (shrinkage), whereas the trimmed ℓ₁ penalty leaves the h largest entries penalty-free. This family of estimators includes the Trimmed Lasso for sparse linear regression and its counterpart for sparse graphical model estimation. The trimmed ℓ₁ penalty is non-convex but, unlike other non-convex regularizers such as SCAD and MCP, it is not amenable, so prior analyses cannot be applied.

In addition, the support recovery of the estimate is characterized as a function of the trimming parameter h. Under certain conditions, for any local optimum: (i) if h is smaller than the true support size, all zero entries of the true parameter vector are successfully estimated as zero; and (ii) if h is larger than the true support size, the irrelevant parameters of the local optimum have smaller absolute values than the relevant parameters, so the relevant parameters are not penalized.

Next, the ℓ₂ error of any local optimum is bounded. These bounds are asymptotically comparable to those for non-convex amenable penalties such as SCAD or MCP, but with better constants. The main results can be specialized to linear regression and graphical model estimation.

A fast, provably convergent optimization algorithm for the trimmed regularization problem is also described. It has the same convergence rate as difference-of-convex (DC) based approaches, but in practice it is faster and finds better objective values than recently proposed algorithms for DC optimization. This demonstrates the value of ℓ₁ trimming according to the embodiments.

Here, a family of M-estimators with trimmed ℓ₁ regularization, which leaves the h largest parameters unpenalized, is described. The family may include the Trimmed Lasso estimator and its counterpart for sparse graphical model estimation (which may be called the Graphical Trimmed Lasso).

The trimming mechanism can be applied to the separable components of the regularizer. A first statistical analysis of M-estimators with trimmed regularization is presented. These estimators are non-convex but, unlike SCAD and MCP regularizers, they are not amenable, so prior analyses cannot be applied.

The theoretical results show that if the trimming parameter h is smaller than the true support size, then for every local optimum of the resulting non-convex program all zero entries of the true parameter vector are successfully estimated as zero; and that if h is larger than the true support size, the irrelevant parameters of the local optimum have smaller absolute values than the relevant parameters, so the relevant parameters are not penalized.

Beyond these support recovery results, ℓ₂ error bounds are provided. They are asymptotically the same as those for amenable regularized problems such as SCAD or MCP, but have better constants and do not require the additional constraint $\|\theta\|_1 \le R$ (where R is the safety radius). The main results can be specialized to the special cases of linear regression and graphical model estimation.

To optimize the trimmed regularization problem, we developed and describe a specialized algorithm that outperforms recent methods based on difference-of-convex (DC) function optimization.

Experiments on simulated and real data can demonstrate the value of ℓ₁ trimming compared with SCAD, MCP, and the vanilla ℓ₁ penalty. Beyond ℓ₁ regularization, the trimming strategy can be applied seamlessly to other decomposable regularizers, including the group-sparsity-promoting ℓ₁/ℓ_q regularizer.
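The text does not spell out the trimmed group regularizer. As a hypothetical illustration only, by analogy with $\mathcal{R}(\theta; h)$, one natural reading is to sum the ℓ_q norms of all but the h groups with the largest norm. The Python/NumPy sketch below assumes non-overlapping groups given as index lists; the function name `trimmed_group_penalty` is our own.

```python
import numpy as np

def trimmed_group_penalty(theta, groups, h, q=2):
    """Hypothetical trimmed l1/l_q penalty: sum of the G - h smallest group l_q norms.

    The h groups with the largest l_q norm are left unpenalized, mirroring R(theta; h).
    """
    G = len(groups)
    if not 0 <= h <= G:
        raise ValueError("h must satisfy 0 <= h <= number of groups")
    norms = np.sort([np.linalg.norm(theta[g], ord=q) for g in groups])
    return float(norms[: G - h].sum())

theta = np.array([3.0, 4.0, 0.1, 0.0, 0.0, 0.2])
groups = [[0, 1], [2, 3], [4, 5]]          # three non-overlapping groups of two
print(trimmed_group_penalty(theta, groups, h=1))  # the group with norm 5 is not penalized
```

With h = 0 this reduces to the ordinary group ℓ₁/ℓ₂ norm.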

[Problem setting and trimmed regularizer]

Trimming can generally be applied to the loss function of an M-estimator. Given a set of n samples $Z_1^n = \{Z_1, \dots, Z_n\}$, outliers and heavy-tailed noise can be handled by trimming the observations that have large residuals with respect to the loss function $\mathcal{L}(\theta; Z_i)$, that is, by solving the following problem of Equation 1, which trims h outliers:

[Equation 1]

$$\min_{\theta \in \Omega,\, w}\ \sum_{i=1}^{n} w_i\, \mathcal{L}(\theta; Z_i) \quad \text{s.t.} \quad \sum_{i=1}^{n} w_i = n - h,\ \ 0 \le w_i \le 1$$

In other words, trimming of the loss function can be performed according to Equation 1, where $w_i$ is a weight and h is the trimming parameter.
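For a fixed θ, the inner minimization over w in Equation 1 is a linear program whose optimum is a 0/1 weight vector: keep the n − h smallest per-sample losses and drop the h largest. The Python/NumPy sketch below (the function `trimmed_loss` and the example losses are our own illustration) evaluates the trimmed loss in that way.

```python
import numpy as np

def trimmed_loss(per_sample_losses, h):
    """Evaluate min_w sum_i w_i * loss_i with w in [0, 1]^n and sum(w) = n - h (Equation 1, theta fixed).

    A linear function over this set is minimized at a 0/1 vertex, so the optimal
    weights keep the n - h smallest losses and drop the h largest ones.
    """
    losses = np.asarray(per_sample_losses, dtype=float)
    n = losses.size
    if not 0 <= h < n:
        raise ValueError("trimming parameter h must satisfy 0 <= h < n")
    w = np.zeros(n)
    w[np.argsort(losses)[: n - h]] = 1.0   # weight 1 on the n - h smallest losses
    return float(w @ losses), w

# Squared residuals of four observations; the third one is an outlier.
value, w = trimmed_loss([0.01, 0.04, 25.0, 0.09], h=1)
```

Here the outlier with loss 25.0 receives weight 0, so it no longer dominates the objective.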

Here, we consider a family of M-estimators that use trimmed regularization for general high-dimensional problems. The entries of θ that incur the largest penalty (that is, the entries generating the greatest penalty) can be trimmed by Equation 2 below:

[Equation 2]

$$\min_{\theta \in \Omega,\, w \in [0,1]^p}\ \mathcal{L}(\theta; Z_1^n) + \lambda_n \sum_{i=1}^{p} w_i |\theta_i| \quad \text{s.t.} \quad \sum_{i=1}^{p} w_i = p - h$$

where Ω denotes the parameter space (e.g., $\Omega = \mathbb{R}^p$ for linear regression) and $\lambda_n$ is the regularization parameter.

By defining the order statistics of the parameter, $|\theta_{(1)}| \ge |\theta_{(2)}| \ge \dots \ge |\theta_{(p)}|$, we can minimize over w (setting each $w_i$ to 0 or 1 based on the magnitude of $|\theta_{(i)}|$) and rewrite the problem of Equation 2 in a reduced form over θ only, as in Equation 3 below:

[Equation 3]

$$\min_{\theta \in \Omega}\ \mathcal{L}(\theta; Z_1^n) + \lambda_n \mathcal{R}(\theta; h), \qquad \mathcal{R}(\theta; h) = \sum_{i=h+1}^{p} |\theta_{(i)}|$$

In other words, trimming may be performed according to Equation 3, which is obtained from Equation 2 by defining the order of $|\theta_{(i)}|$ and setting each $w_i$ to 0 or 1 based on that magnitude.
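The reduction from Equation 2 to Equation 3 can be checked numerically. The Python/NumPy sketch below (function names are our own) computes $\mathcal{R}(\theta; h)$ as the sum of the p − h smallest magnitudes and confirms that the 0/1 weights which zero out the h largest $|\theta_i|$ attain the same value.

```python
import numpy as np

def trimmed_l1(theta, h):
    """R(theta; h): sum of the p - h smallest absolute values of theta (Equation 3)."""
    a = np.sort(np.abs(np.asarray(theta, dtype=float)))   # ascending magnitudes
    if not 0 <= h <= a.size:
        raise ValueError("trimming parameter h must satisfy 0 <= h <= p")
    return float(a[: a.size - h].sum())

def weights_for(theta, h):
    """0/1 weights of Equation 2 that attain R(theta; h): zero on the h largest |theta_i|."""
    w = np.ones(len(theta))
    if h > 0:                                  # argsort(...)[-0:] would select everything
        w[np.argsort(np.abs(theta))[-h:]] = 0.0
    return w

theta = np.array([5.0, -0.1, 0.0, 2.0, 0.3])
print(trimmed_l1(theta, 2))   # only 0.0, 0.1 and 0.3 are penalized
print(trimmed_l1(theta, 0))   # h = 0 recovers the ordinary l1 norm
```

The two largest entries (5.0 and 2.0) contribute nothing to the penalty, which is the mechanism that removes the shrinkage bias of the standard ℓ₁ penalty.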

The regularizer $\mathcal{R}(\theta; h)$ is the sum of the p − h smallest absolute values of θ. The reduced version of Equation 3 corresponds to minimizing a loss subject to a sparsity penalty, which can be expressed as Equation 4 below:

[Equation 4]

$$\min_{\theta \in \Omega}\ \mathcal{L}(\theta; Z_1^n) \quad \text{s.t.} \quad \mathcal{R}(\theta; h) = 0$$

In other words, to minimize the loss subject to the sparsity penalty, $\mathcal{R}(\theta; h)$ can be set to 0; this constraint holds exactly when θ has at most h nonzero entries.
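The feasible set of Equation 4, $\{\theta : \mathcal{R}(\theta; h) = 0\}$, is the set of vectors with at most h nonzero entries. As an illustration only (this is the projection step used by iterative hard thresholding, a different technique from the method described here), the Python/NumPy sketch below projects a vector onto that set by keeping its h largest-magnitude entries; `project_onto_sparse_set` is a hypothetical name.

```python
import numpy as np

def project_onto_sparse_set(v, h):
    """Euclidean projection of v onto {theta : R(theta; h) = 0}.

    R(theta; h) = 0 means the p - h smallest |theta_i| are all zero, i.e. theta has
    at most h nonzero entries; the projection keeps the h largest-magnitude entries.
    """
    v = np.asarray(v, dtype=float)
    out = np.zeros_like(v)
    if h > 0:                                   # argsort(...)[-0:] would select everything
        keep = np.argsort(np.abs(v))[-h:]
        out[keep] = v[keep]
    return out

v = np.array([5.0, -0.1, 0.0, 2.0, 0.3])
theta = project_onto_sparse_set(v, 2)           # keeps 5.0 and 2.0
```

After the projection, the trimmed penalty of the result is exactly zero.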

For the statistical analysis, we focus on the problem of Equation 3. For optimization, the structure of Equation 2 can be exploited by treating the weights w as auxiliary optimization variables. This yields a new, fast algorithm and an analysis technique that do not rely on the DC structure of Equation 3.

Prior work derives upper bounds on the estimation error for various sparsity-regularized estimators motivated by many practical applications.

This disclosure mainly considers two popular examples that use the trimmed ℓ₁ regularizer, but the results generalize.

예시: 희소 선형 모델. 고차원 선형 회귀 문제에서, 아래 수학식 5와 같은 선형 관계에 있는 실수 값 타겟

과 공변량

의 n 개의 관찰 쌍을 가질 수 있다. 희소 선형 모델은 아래 수학식 5와 같이 정의될 수 있다. Example: sparse linear model . In a high-dimensional linear regression problem, a real value target in a linear relationship as shown in Equation 5 below

And covariate

Can have n observation pairs. The sparse linear model can be defined as in Equation 5 below.

Here, $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, and $\varepsilon \in \mathbb{R}^n$ represents n independent observation noises. The goal is to estimate the k-sparse vector $\theta^* \in \mathbb{R}^p$. Following the framework of Equation 3, we use the least-squares loss in Equation 6 below, with the trimmed ℓ₁ regularizer in place of the standard ℓ₁ norm of the Lasso (Tibshirani).

In Equation 6, the least-squares term corresponds to the loss function described above.
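Because Equation 6 itself is not reproduced here, the following is a hedged sketch of a trimmed least-squares objective. It assumes the common scaling (1/(2n))‖y − Xθ‖² for the loss and a regularization weight `lam`. Both the scaling and the function names are assumptions, not taken from the source.

```python
import numpy as np


def trimmed_l1(theta: np.ndarray, h: int) -> float:
    """Sum of |theta_j| over the p - h smallest-magnitude entries."""
    mags = np.sort(np.abs(theta))  # ascending magnitudes
    return float(mags[: mags.size - h].sum())


def trimmed_least_squares(theta, X, y, lam, h):
    """(1/2n) * ||y - X theta||_2^2 + lam * trimmed l1 penalty (scaling assumed)."""
    n = X.shape[0]
    resid = y - X @ theta
    return float(resid @ resid / (2.0 * n) + lam * trimmed_l1(theta, h))
```

When h ≥ k, the k nonzero entries of a k-sparse θ* contribute nothing to the penalty. This is how the trimmed regularizer avoids shrinking the true signal.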

Example: sparse graphical model. A Gaussian graphical model (GGM) uses an undirected graph to encode conditional independence relations among variables, and forms a powerful family of statistical models for representing the distribution of a set of variables.

In such high-dimensional settings, a graph sparsity constraint is particularly suitable for estimating a GGM. The most widely used estimator, the graphical Lasso, minimizes the negative Gaussian log-likelihood regularized by the ℓ₁ norm of the entries (or of the off-diagonal entries) of the precision matrix. In the framework of the embodiment, the ℓ₁ norm is replaced by its trimmed version, as in Equation 7 below.

Here, $\mathcal{S}^p_{++}$ denotes the convex cone of symmetric, strictly positive definite p × p matrices, and the trimmed off-diagonal regularizer is the sum of the absolute values of the p(p − 1) − h smallest-magnitude off-diagonal entries.
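A hedged sketch of the trimmed graphical Lasso objective follows. It assumes the usual graphical Lasso form tr(Σ̂Θ) − log det Θ plus `lam` times the trimmed off-diagonal penalty, with all p(p − 1) off-diagonal entries counted individually, as in the text. The function names and the Cholesky-based positive-definiteness check are illustrative choices, not part of the source.

```python
import numpy as np


def trimmed_offdiag_l1(Theta: np.ndarray, h: int) -> float:
    """Sum of the p(p-1) - h smallest |Theta_ij| over off-diagonal entries."""
    p = Theta.shape[0]
    off = np.sort(np.abs(Theta[~np.eye(p, dtype=bool)]))  # all p(p-1) entries
    return float(off[: off.size - h].sum())


def trimmed_glasso_objective(Theta, S_hat, lam, h):
    """tr(S_hat Theta) - log det Theta + lam * trimmed off-diagonal l1.

    Theta is assumed symmetric. Returns +inf outside the positive-definite
    cone, detected by a failed Cholesky factorization.
    """
    try:
        L = np.linalg.cholesky(Theta)
    except np.linalg.LinAlgError:
        return float("inf")
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log det from Cholesky factor
    penalty = lam * trimmed_offdiag_l1(Theta, h)
    return float(np.trace(S_hat @ Theta) - logdet + penalty)
```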

[Theoretical guarantees for trimmed regularization]

We wish to estimate the true k-sparse parameter vector (or matrix) $\theta^*$, defined as a minimizer of the expected loss.

We use S to denote the support set of $\theta^*$, that is, the set of its nonzero entries (so k = |S|). Under the following standard assumptions, we derive upper bounds on estimation consistency (error bounds and support set recovery):

(C-1) The loss function is differentiable and convex.

(C-2) (Restricted strong convexity with respect to $\theta^*$.) Let

be the set of possible error vectors for the parameter $\theta^*$, where

. Then the relation in Equation 8 below holds, where

is a curvature parameter and

is a tolerance constant.

In general, a convex loss function cannot be strongly convex in high-dimensional settings (p > n). Condition (C-2) imposes strong curvature only in a limited set of directions, namely those in which the ratio

is small. Such conditions have been studied extensively and are known to hold for several popular high-dimensional problems. The convexity condition on the loss in (C-1) could be relaxed by introducing additional weak constraints. For clarity, however, this disclosure focuses on convex losses.

We begin with the ℓ∞ bound. To this end, we adopt a primal-dual witness (PDW) technique specially designed for the trimmed regularizer. Prior lines of work use the PDW technique to establish support set recovery for the ℓ₁ regularizer and for amenable non-convex regularizers. However, even though

is symmetric and concave, it is not amenable, so those results do not apply directly.

The key step of PDW is to set up a restricted program. Let T be an arbitrary subset of {1, ..., p} of size h, and let

and

. The restricted program can then be written as Equation 9 below.

For all

, this can be fixed to

. The dual variable

can further be constructed to satisfy the zero sub-gradient condition (see Equation 10 below).

(After appropriate reordering,)

and

. For clarity, we suppress the dependence on T in

and

. To derive the final statement, we establish strict dual feasibility of

, that is,

. The following theorem states the main theoretical result concerning local optima of the non-convex program (Equation 3). Under strict dual feasibility, it guarantees that the irrelevant parameters of a local optimum have smaller absolute values than the relevant ones, so the relevant parameters are not penalized (as long as h is set larger than k).

[Theorem 1] Consider the trimmed-regularizer problem (Equation 3) satisfying (C-1) and (C-2). Let

be any local minimum of Equation 3 with sample size

and

. Assume the following:

(a) For any choice of

, the dual vector

from the PDW construction in Equation 10 satisfies the strict dual feasibility condition in Equation 11 for some

.

Here, U is the union of the true support S and T.

(b) Letting

,

is lower bounded as in Equation 12 below. Here, the matrix norm $|||\cdot|||_\infty$ denotes the maximum absolute row sum of a matrix.

We then obtain the following:

(1) For all

, we have

.

(2) If

, then all

are successfully estimated as 0, and the relation in Equation 13 below holds.

(3) If

, then at least the p − h smallest (in absolute value) entries in S^c are exactly 0. In this case, the bound can instead be stated more simply (and possibly more tightly) as in Equation 14 below, where

is defined as the h largest-magnitude entries of

, which include S.

In the corollaries for concrete problems, explicit bounds can be derived for the terms involving

and

(for example,

is upper bounded by

, so that

can be chosen).

Although Equations 11 and 12 may look more stringent than the case h = 0 (the Lasso), the corollaries show that they hold uniformly over all choices, with probability asymptotically the same as in the h = 0 case.

Note also that setting h = 0 recovers the result for the standard ℓ₁ norm. Moreover, by part (1) of the theorem, if

, then

contains only relevant feature indices, so only some of the relevant features are left unpenalized. If

, then

can include all of the relevant (and irrelevant) indices. In this case, the second term in Equation 13 vanishes, but

can grow as the

term grows. Furthermore, as h approaches p, the condition

can be violated. Although the sparsity level k is not known a priori in many problems, we assume that

can be set implicitly (for example, by cross-validation).
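The effect of h described above can be illustrated on a small, hypothetical k-sparse vector with p = 20 and k = 3. The values are chosen for illustration only. The sketch evaluates the trimmed penalty of θ* for several h: h = 0 gives the ordinary ℓ₁ norm, h < k still penalizes part of the support, h ≥ k leaves the support entirely unpenalized, and h = p removes regularization altogether.

```python
import numpy as np


def trimmed_l1(theta, h):
    """Sum of the p - h smallest absolute entries of theta."""
    mags = np.sort(np.abs(theta))
    return float(mags[: mags.size - h].sum())


# Hypothetical k-sparse parameter: p = 20, k = 3.
theta_star = np.zeros(20)
theta_star[[2, 7, 11]] = [1.5, -2.0, 0.8]
k = np.count_nonzero(theta_star)

for h in (0, k - 1, k, 10, theta_star.size):
    # h = 0: ordinary l1 norm; h >= k: the true support is not penalized.
    print(f"h={h:2d}  penalty on theta*={trimmed_l1(theta_star, h):.2f}")
```

With h = k − 1, only the smallest support entry (0.8) is still penalized. From h = k onward, the penalty on θ* is 0. At h = p, nothing is regularized at all, which is why h approaching p can break the theorem's conditions.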

We now turn to the ℓ₂ estimation bound under the same conditions:

[Corollary 2] Consider the problem with the trimmed regularizer (Equation 3) under which all the conditions of Theorem 1 are satisfied. For any local minimum of Equation 3, the parameter estimation error in the ℓ₂ norm is upper bounded, for some constant C, as follows:

The ℓ₂ bound in Equation 15 follows easily. For any local optimum, Theorem 1 guarantees that the size of

is less than or equal to

, so known results for the ℓ₂ bound of the restricted program (Equation 9) for a given

can be applied. From Theorem 1 and Corollary 2, we observe that the estimation bounds are asymptotically identical to those for

-amenable regularized problems such as SCAD or MCP:

However, the constants for these regularizers can be larger, because they involve the

term (instead of

for the trimmed ℓ₁). In addition, to obtain theoretical guarantees, these non-convex regularizers require the additional constraint

in the optimization problem, and they introduce additional assumptions on

and the tuning parameter R.

We now apply the main theorem to the popular high-dimensional problems described above: sparse linear regression and sparse graphical models (the latter is described in more detail later).

Sparse least squares. Motivated by an information-theoretic argument that applies to any method, all previous analyses of sparse linear regression assume

for a sufficiently large constant c₀. We also assume, for a given

, that

.

[Corollary 3] Consider the model (Equation 5) in which

is sub-Gaussian. Suppose that the program (Equation 6) is solved with a choice of

for some constant c_l, and with h satisfying the following conditions:

(a) The sample covariance matrix

satisfies the condition in Equation 16 below for any choice of

, where

denotes the maximum singular value of a matrix.

Also assume that

is lower bounded by

for some constant c₁. Then, with probability at least 1 − c₂ exp(−c₃ log p), any local minimum

of Equation 6 satisfies the following:

(a) For all

, we have

;

(b) if

, then all

are successfully estimated as 0, and the relation in Equation 17 below holds;

(c) if

, then at least the p − h smallest entries in S^c are exactly 0, and the relation in Equation 18 below holds.

The conditions of Corollary 3 have been examined in previous studies. In particular, Equation 16 is known as the incoherence condition for the sparse least-squares estimator (Wainwright (2009b)). All of these conditions can be shown to hold with high probability through standard concentration bounds for sub-Gaussian matrices. It is important to note that, for the Lasso, estimation fails if the incoherence condition is violated (Wainwright (2009b)). Unlike the Lasso, the trimmed ℓ₁ problem (Equation 6) can succeed even when these conditions are not satisfied.
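Equation 16 is not reproduced here. The sketch below therefore computes the standard Lasso incoherence (irrepresentability) quantity from Wainwright's analysis, max over j ∉ S of ‖Σ_{jS} Σ_{SS}⁻¹‖₁, for a covariance `Sigma` and a support `S`. It illustrates the "maximum absolute row sum" form that such conditions take, but it is an assumption that Equation 16 has exactly this expression. The function name is hypothetical.

```python
import numpy as np


def lasso_incoherence(Sigma, S):
    """Max absolute row sum of Sigma[S^c, S] @ inv(Sigma[S, S]).

    The Lasso incoherence condition asks for this value to be below 1.
    Sigma is assumed symmetric positive definite.
    """
    p = Sigma.shape[0]
    S = np.asarray(S)
    Sc = np.setdiff1d(np.arange(p), S)
    # Solve Sigma_SS @ M.T = Sigma_{S,Sc} instead of forming an explicit inverse.
    M = np.linalg.solve(Sigma[np.ix_(S, S)], Sigma[np.ix_(S, Sc)]).T
    return float(np.abs(M).sum(axis=1).max())
```

For an identity covariance, the value is 0. For an equicorrelated 3 × 3 covariance with correlation ρ = 0.5 and S = {0, 1}, it equals 2ρ/(1 + ρ) = 2/3, so the condition holds in that case.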

[Optimization]

We develop and analyze a block coordinate descent algorithm for solving the problem of Equation 2. Keeping the weight w as an explicit block, rather than projecting it out, gives more freedom before w is fixed. Instead of relying on the DC formulation of Equation 3, the algorithm can be analyzed using the structure of Equation 2. This approach is detailed in Algorithm 1 shown in FIG. 3, and its convergence analysis is summarized in Theorem 4 below.
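The alternating structure described above can be sketched for the sparse least-squares case. This is not a transcription of Algorithm 1, since FIG. 3 is not reproduced here. It is a minimal block coordinate descent built on three assumptions. First, the loss is (1/(2n))‖y − Xθ‖². Second, the weight block is constrained to {w : 0 ≤ w_j ≤ 1, Σ_j w_j = p − h}. Third, the θ-block takes one proximal-gradient step (weighted soft-thresholding) with step size 1/L, where L is the Lipschitz constant of the gradient. Minimizing a linear function of w over that set puts w = 0 on the h largest |θ_j|. All names are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise prox of t * |.|; t may be a vector of thresholds."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def best_weights(theta, h):
    """Exact w-block minimizer: w = 0 on the h largest |theta_j|, 1 elsewhere."""
    w = np.ones_like(theta)
    if h > 0:
        w[np.argsort(-np.abs(theta), kind="stable")[:h]] = 0.0
    return w


def trimmed_lasso_bcd(X, y, lam, h, max_iter=2000, tol=1e-10):
    """Alternate a prox-gradient step in theta with an exact update of w.

    Assumed objective: (1/2n)||y - X theta||^2 + lam * sum_j w_j |theta_j|.
    """
    n, p = X.shape
    lipschitz = np.linalg.norm(X, 2) ** 2 / n  # largest eigenvalue of X^T X / n
    eta = 1.0 / lipschitz                       # step size 1/L
    theta = np.zeros(p)
    w = best_weights(theta, h)
    for _ in range(max_iter):
        grad = X.T @ (X @ theta - y) / n
        theta_next = soft_threshold(theta - eta * grad, eta * lam * w)
        w = best_weights(theta_next, h)  # the h largest entries go unpenalized
        step = np.linalg.norm(theta_next - theta)
        theta = theta_next
        if step <= tol:
            break
    return theta, w
```

Each block update does not increase the assumed objective: the prox-gradient step with η = 1/L decreases it for fixed w, and the w-update minimizes it exactly for fixed θ. In a noiseless, well-conditioned problem with h = k, the iterates settle on the true support.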

Consider a general objective function as in Equation 19.

Here,

is a convex indicator function, and we set

.

We further assume that the following conditions hold.

[Assumption 1] (a) f is a smooth, closed, convex function with an L-Lipschitz continuous gradient; (b) each r_i is convex; and (c) S is a closed convex set and F is bounded below.

[Theorem 4] If conditions (a) to (c) of Assumption 1 hold, the iterates generated by Algorithm 1 satisfy Equation 20 below:

Moreover, defining

and choosing the step size

, we obtain Equation 21 below.

This gives a sublinear rate of convergence with respect to the optimality condition.

The problem of Equation 2 satisfies Assumption 1, so Algorithm 1 converges at a sublinear rate as measured by

.

To demonstrate the efficiency of Algorithm 1, numerical experiments can be performed comparing it with Algorithm 2 (the algorithm of Khamaru and Wainwright, 2018). Several approaches have been proposed for DC programs; among them, the prox-type algorithm (Algorithm 2) may be particularly well suited to subset selection.

Lasso simulation data were generated with 500 variables (p = 500) and 100 samples (n = 100). The true parameter vector has 10 nonzero entries. Both Algorithm 1 and Algorithm 2 were applied with h = 25. The results are shown in FIG. 4.
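The following data-generation sketch matches the stated dimensions (n = 100, p = 500, 10 nonzero entries, h = 25). The Gaussian design, the random-sign coefficients with magnitudes in [1, 2], and the noise level `sigma` are assumptions, because the source does not specify them. The sketch also checks that with h = 25 > 10, the trimmed penalty on the true parameter is zero.

```python
import numpy as np


def make_lasso_data(n=100, p=500, k=10, sigma=0.1, seed=0):
    """Gaussian design, k-sparse truth, additive Gaussian noise (all assumed)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    theta_star = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    signs = rng.choice([-1.0, 1.0], size=k)
    theta_star[support] = signs * rng.uniform(1.0, 2.0, size=k)
    y = X @ theta_star + sigma * rng.standard_normal(n)
    return X, y, theta_star


def trimmed_l1(theta, h):
    """Sum of the p - h smallest absolute entries of theta."""
    mags = np.sort(np.abs(theta))
    return float(mags[: mags.size - h].sum())


X, y, theta_star = make_lasso_data()
h = 25
print(X.shape, y.shape, np.count_nonzero(theta_star))
print("trimmed penalty of theta* with h = 25:", trimmed_l1(theta_star, h))
```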

FIG. 4 shows a convergence comparison between the algorithm according to an embodiment (Algorithm 1, described with reference to FIG. 3) and another algorithm (Algorithm 2).

Although the per-iteration progress of the two methods is similar, Algorithm 1 continues to converge at a linear rate even at lower objective values, whereas Algorithm 2 stops progressing linearly beyond a certain local minimum.

This disclosure has described a high-dimensional M-estimator with a trimmed ℓ₁ penalty. By leaving the h largest parameters unpenalized, the estimator can mitigate the bias introduced by the vanilla ℓ₁ penalty. The theoretical results on support recovery and ℓ₂ error bounds hold for all local optima and are competitive with those of other non-convex approaches. In addition, the procedure was shown to be surprisingly robust to the choice of the trimming parameter h. These results can be verified by extensive simulation experiments.

In addition, this disclosure has described a tailored algorithm for the trimmed problem with provable convergence. The algorithm and the analysis technique are based on the structure of the problem rather than on a simple DC structure, and they can provide promising numerical results. This approach may also be usefully applied to more general regularizers.

The sparse graphical model introduced above is described in more detail below.

Sparse graphical model: Here we derive a corollary for the trimmed graphical Lasso (Equation 7). Throughout, we assume that the sample size scales with the row sparsity d of the true parameter, and we assume an inverse covariance

, which is milder than in other work (where n scales with k, the number of nonzero entries of

):

[Corollary 5] Consider the program (Equation 7) in which the x_i are drawn from a sub-Gaussian distribution and the sample size n is larger than c₀d² log p. Further assume that

and h are chosen to satisfy the following:

(a) For all choices of

, Equation 22 below is satisfied.

Also, for some constant c₁,

and then

satisfies the following:

(a) For all

, we have

;

(b) if

, then all

are successfully estimated as 0, and the relation in Equation 23 holds;

(c) if

, then at least the p − h smallest entries in S^c are exactly 0, and the relation in Equation 24 below holds.

The condition in Equation 22 is an incoherence condition. The result agrees with that for the sparse linear model above, and it is similar to the result for conventional models regularized by ℓ₁ or

(the graphical Lasso; Loh and Wainwright (2017)).

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers. Examples include a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described. However, one of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these. It may configure the processing device to operate as desired, or it may instruct the processing device independently or collectively. To be interpreted by the processing device or to provide instructions or data to it, the software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include the following: magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited number of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may still be achieved in any of the following cases: the described techniques are performed in an order different from the described method; the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method; or those components are replaced or substituted by other components or equivalents.

Claims

A maximum likelihood estimation method for machine learning performed by a computer, the method comprising:
performing trimming on a loss function for a sample set to handle outlier values and heavy-tailed noise;
performing ℓ₁ regularization in consideration of a penalty determined as the trimming is performed; and
estimating a maximum likelihood of a model associated with the sample set based on a result of the ℓ₁ regularization,
wherein the performing of the trimming trims the entries that incur the largest penalty in the loss function,
the trimming on the loss function is performed according to Equation 1,
[Equation 1]

is the sample set of n samples,

is the loss function, w_i is a weight, and h is a trimming parameter, and
the trimming of the entries incurring the largest penalty is performed according to Equation 2,
[Equation 2]

denotes a parameter space.

delete

The method of claim 1, wherein the performing of the trimming
is performed according to Equation 3, which is obtained by defining, in Equation 2 above, the order of

and setting w_i to 0 or 1 based on the magnitude of the ordered entries,
[Equation 3]

is a regularizer, and

denotes the sum of the absolute values of the p − h smallest-magnitude entries.

The method of claim 3, wherein, in order to minimize the loss dependent on the sparsity penalty,

is set to 0.

A maximum likelihood estimation method for machine learning performed by a computer, the method comprising:
performing trimming on a loss function for a sample set to handle outlier values and heavy-tailed noise;
performing ℓ₁ regularization in consideration of a penalty determined as the trimming is performed; and
estimating a maximum likelihood of a model associated with the sample set based on a result of the ℓ₁ regularization,
wherein the performing of the trimming trims the entries that incur the largest penalty in the loss function,
the model is a sparse linear model, and
the sparse linear model is defined as in Equation 4 as having n observation pairs of a real-valued target

and a covariate

in a linear relationship,
[Equation 4]

,

, and

is independent observation noise, and

is the k-sparse vector to be estimated.

The method of claim 5, wherein the trimming of the entries incurring the largest penalty is performed according to Equation 5,
[Equation 5]

corresponds to the loss function.

A maximum likelihood estimation method for machine learning performed by a computer, the method comprising:
performing trimming on a loss function for a sample set to handle outlier values and heavy-tailed noise;
performing ℓ₁ regularization in consideration of a penalty determined as the trimming is performed; and
estimating a maximum likelihood of a model associated with the sample set based on a result of the ℓ₁ regularization,
wherein the performing of the trimming trims the entries that incur the largest penalty in the loss function,
the model is a sparse graphical model, and
the trimming of the entries incurring the largest penalty is performed according to Equation 6,
[Equation 6]

denotes the convex cone of symmetric, strictly positive definite matrices, and

denotes the sum of the absolute values of the p(p − 1) − h smallest-magnitude off-diagonal entries.

The method of claim 1, wherein the problem of Equation 2 is solved using a block coordinate descent algorithm.

The method of claim 8, wherein the block coordinate descent algorithm comprises:
receiving as input

,

, and

;
initializing

,

, and

to 0;
while not converged, performing

; and
outputting

and

.

A machine learning method performed by a computer, the machine learning method including a maximum likelihood estimation method,
wherein the maximum likelihood estimation method comprises:
performing trimming on a loss function for a sample set to handle outlier values and heavy-tailed noise;
performing ℓ₁ regularization in consideration of a penalty determined as the trimming is performed; and
estimating a maximum likelihood of a model associated with the sample set based on a result of the ℓ₁ regularization,
wherein the performing of the trimming trims the entries that incur the largest penalty in the loss function,
the trimming on the loss function is performed according to Equation 1,
[Equation 1]

is the sample set of n samples, and

denotes a parameter space.

A maximum likelihood estimator comprising:
a trimming unit configured to perform trimming on a loss function for a sample set to handle outlier values and heavy-tailed noise;
a regularization unit configured to perform ℓ₁ regularization in consideration of a penalty determined as the trimming is performed; and
a likelihood estimation unit configured to estimate a maximum likelihood of a model associated with the sample set based on a result of the ℓ₁ regularization,
wherein the trimming unit trims the entries that incur the largest penalty in the loss function,
the trimming unit performs the trimming on the loss function according to Equation 1,
[Equation 1]

is the sample set of n samples,

is the loss function, w_i is a weight, and h is a trimming parameter, and
the trimming unit trims the entries incurring the largest penalty according to Equation 2,
[Equation 2]

denotes a parameter space.

delete

The maximum likelihood estimator of claim 11, wherein the problem of Equation 2 is solved using a block coordinate descent algorithm, and
the block coordinate descent algorithm comprises:
receiving as input

,

, and

;
initializing

,

, and

to 0;
while not converged, performing

; and
outputting

and

.

A machine learning apparatus comprising a maximum likelihood estimator,
wherein the maximum likelihood estimator comprises:
a trimming unit configured to perform trimming on a loss function for a sample set to handle outlier values and heavy-tailed noise;
a regularization unit configured to perform ℓ₁ regularization in consideration of a penalty determined as the trimming is performed; and
a likelihood estimation unit configured to estimate a maximum likelihood of a model associated with the sample set based on a result of the ℓ₁ regularization,
wherein the trimming unit trims the entries that incur the largest penalty in the loss function,
the trimming unit performs the trimming on the loss function according to Equation 1,
[Equation 1]

is the sample set of n samples, and

denotes a parameter space.