KR20200074450A

KR20200074450A - Apparatus and method for M-estimation with trimmed l₁ penalty

Info

Publication number: KR20200074450A
Application number: KR1020180162849A
Authority: KR
Inventors: 양은호; 윤지훈
Original assignee: 한국과학기술원
Priority date: 2018-12-17
Filing date: 2018-12-17
Publication date: 2020-06-25
Also published as: KR102203337B1

Abstract

Provided is a maximum likelihood estimation (M-estimation) method for machine learning. The method comprises the steps of: performing trimming on a loss function for a sample set in order to handle outlier values and heavy-tailed noise; performing l₁ regularization in consideration of the penalty determined by the trimming; and estimating the maximum likelihood of a model associated with the sample set based on the result of the l₁ regularization.

Description

Apparatus and method for M-estimation with trimmed l₁ penalty

The embodiments relate to an apparatus and method for maximum likelihood estimation, and more particularly to an apparatus and method for maximum likelihood estimation (M-estimation) using a trimmed l₁ penalty.

Machine learning refers to the process by which a computer learns, from input data, how to make appropriate predictions on data it has not seen before. The problems addressed by machine learning include regression problems, which find regularity in irregularly distributed data, and classification problems, which assign data to given categories.

Maximum likelihood estimation is a technique for estimating an unknown parameter θ of a probability distribution based on samples (observations) x drawn from that distribution. The likelihood indicates how consistent an estimate of the parameter θ is with the given samples x. Machine learning may be carried out by maximizing this likelihood.

With respect to model complexity, regularization may be performed to prevent overfitting of the model and to improve generalization performance. Regularization imposes a penalty on the model weights; regularization methods include l₁ regularization and l₂ regularization.

Korean Patent Publication No. 10-2009-0009478 (published January 23, 2009) discloses a maximum likelihood detection method for a wireless communication system. The method includes calculating a Euclidean distance using at least one of channel matrix information, noise power information, and per-stream modulation order information; calculating a signal-pair error rate (PER) using the calculated Euclidean distance; calculating a per-stream error probability using the calculated PER; sorting and classifying the calculated per-stream error probabilities; and performing maximum likelihood detection using the sorting and classification results.

The information described above is provided only to aid understanding. It may include content that does not form part of the prior art, and it may not include what the prior art could suggest to a person skilled in the art.

One embodiment provides a maximum likelihood estimation method for machine learning that performs trimming on a loss function for a sample set in order to handle outlier values and heavy-tailed noise, performs l₁ regularization in consideration of the penalty determined by the trimming, and estimates the maximum likelihood of a model associated with the sample set based on the result of the l₁ regularization.

In one aspect, there is provided a computer-implemented maximum likelihood estimation method for machine learning, the method comprising: performing trimming on a loss function for a sample set in order to handle outlier values and heavy-tailed noise; performing l₁ regularization in consideration of the penalty determined by the trimming; and estimating the maximum likelihood of a model associated with the sample set based on the result of the l₁ regularization, wherein performing the trimming includes trimming the entries that incur the largest penalty for the loss function.

Trimming of the loss function is performed according to Equation 1:

[Equation 1]

$$\min_{\theta,\; w \in \{0,1\}^n} \; \sum_{i=1}^{n} w_i \, \mathcal{L}(\theta; Z_i) \quad \text{subject to} \quad \sum_{i=1}^{n} w_i = n - h$$

where $\mathcal{D} = \{Z_1, \dots, Z_n\}$ is the sample set of n samples, $\mathcal{L}$ is the loss function, $w_i$ is a weight, and h is the trimming parameter. Trimming the entries that incur the largest penalty is performed according to Equation 2:

[Equation 2]

$$\min_{\theta \in \Omega,\; w \in [0,1]^p} \; \mathcal{L}(\theta; \mathcal{D}) + \lambda_n \sum_{j=1}^{p} w_j \, |\theta_j| \quad \text{subject to} \quad \mathbf{1}^\top w \ge p - h$$

where $\Omega$ may denote the parameter space and $\lambda_n$ is the regularization weight.

Performing the trimming is carried out according to Equation 3, which is obtained from Equation 2 by defining the order statistics $|\theta_{(1)}| \ge |\theta_{(2)}| \ge \dots \ge |\theta_{(p)}|$ of the parameter and setting each $w_j$ to 0 or 1 based on the magnitude of $|\theta_j|$:

[Equation 3]

$$\min_{\theta \in \Omega} \; \mathcal{L}(\theta; \mathcal{D}) + \lambda_n \, \mathcal{R}(\theta; h), \qquad \mathcal{R}(\theta; h) = \sum_{j=h+1}^{p} |\theta_{(j)}|$$

where $\mathcal{R}(\theta; h)$ is the regularizer and may denote the sum of the p − h smallest absolute values of the entries of $\theta$.

To minimize the loss subject to a sparsity penalty, $\mathcal{R}(\theta; h)$ may be set to 0.

The model is a sparse linear model. The sparse linear model has n observation pairs of a real-valued target $y_i \in \mathbb{R}$ and covariates $x_i \in \mathbb{R}^p$ in a linear relationship, and is defined as in Equation 4:

[Equation 4]

$$y = X\theta^* + \epsilon$$

where $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, and $\epsilon \in \mathbb{R}^n$ is independent observation noise, and $\theta^*$ may be the k-sparse vector to be estimated.

Trimming the entries that incur the largest penalty is performed according to Equation 5:

[Equation 5]

$$\min_{\theta \in \mathbb{R}^p} \; \frac{1}{n} \, \| X\theta - y \|_2^2 + \lambda_n \, \mathcal{R}(\theta; h)$$

where $\frac{1}{n} \| X\theta - y \|_2^2$ may correspond to the loss function.

The model is a sparse graphical model, and trimming the entries that incur the largest penalty is performed according to Equation 6:

[Equation 6]

$$\min_{\Theta \in \mathcal{S}^p_{++}} \; \operatorname{trace}(\hat{\Sigma}\,\Theta) - \log\det(\Theta) + \lambda_n \, \mathcal{R}(\Theta_{\text{off}}; h)$$

where $\mathcal{S}^p_{++}$ denotes the convex cone of positive definite matrices, and $\mathcal{R}(\Theta_{\text{off}}; h)$ may denote the sum of the p(p−1) − h smallest absolute values of the off-diagonal entries.

The problem of Equation 2 may be solved using a block coordinate descent algorithm.

The block coordinate descent algorithm may be performed by: receiving the input parameters of the algorithm; initializing the iterates to 0; repeatedly performing the update step while the iterates have not converged; and outputting the resulting estimates.

In another aspect, there is provided a machine learning method that is performed including a maximum likelihood estimation method, wherein the maximum likelihood estimation method comprises: performing trimming on a loss function for a sample set in order to handle outlier values and heavy-tailed noise; performing l₁ regularization in consideration of the penalty determined by the trimming; and estimating the maximum likelihood of a model associated with the sample set based on the result of the l₁ regularization, and wherein performing the trimming includes trimming the entries that incur the largest penalty for the loss function.

In yet another aspect, there is provided a maximum likelihood estimator comprising: a trimming unit that performs trimming on a loss function for a sample set in order to handle outlier values and heavy-tailed noise; a regularization unit that performs l₁ regularization in consideration of the penalty determined by the trimming; and a likelihood estimation unit that estimates the maximum likelihood of a model associated with the sample set based on the result of the l₁ regularization, wherein the trimming unit trims the entries that incur the largest penalty for the loss function.

The trimming unit performs trimming on the loss function according to Equation 1 and trims the entries that incur the largest penalty according to Equation 2. Equations 1 and 2 and their symbols (the sample set of n samples, the loss function, the weights w_i, the trimming parameter h, and the parameter space) are as defined for the method aspect.

The problem of Equation 2 is solved using a block coordinate descent algorithm, and the block coordinate descent algorithm may be performed by: receiving the input parameters of the algorithm; initializing the iterates to 0; repeatedly performing the update step while the iterates have not converged; and outputting the resulting estimates.

In yet another aspect, there is provided a machine learning apparatus comprising a maximum likelihood estimator, wherein the maximum likelihood estimator comprises: a trimming unit that performs trimming on a loss function for a sample set in order to handle outlier values and heavy-tailed noise; a regularization unit that performs l₁ regularization in consideration of the penalty determined by the trimming; and a likelihood estimation unit that estimates the maximum likelihood of a model associated with the sample set based on the result of the l₁ regularization, and wherein the trimming unit trims the entries that incur the largest penalty for the loss function.

The apparatus and method for maximum likelihood estimation using the trimmed l₁ penalty according to the embodiments can be applied in the field of machine learning. In particular, when applied to an artificial neural network, they make it possible to effectively select the important features needed for learning.

In addition, trimming can make a large-capacity artificial neural network lightweight. With such weight reduction, machine learning algorithms can be readily applied even to IoT environments in which data is input in real time and to small terminals such as smartphones.

FIG. 1 shows an apparatus for performing machine learning including maximum likelihood estimation according to an embodiment.
FIG. 2 is a flowchart illustrating a method of estimating a maximum likelihood according to an embodiment.
FIG. 3 shows a block coordinate descent algorithm for solving the trimming problem according to an embodiment.
FIG. 4 shows a convergence comparison between an algorithm according to an embodiment and other algorithms.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. The same reference numerals in the drawings denote the same members.

In the detailed description below, l₁ regularization may impose a penalty on the l₁ norm of the model weights (the sum of the absolute values of the weight elements). For example, in a model that relies on sparse features, in which most element values are 0, l₁ regularization may drive the weights corresponding to unnecessary features to exactly 0, so that the model ignores those features.

l₂ regularization may impose a penalty on the square of the l₂ norm of the model weights (the sum of the squares of the weight elements). l₂ regularization can drive outlier model weights, those with very large or very small values, to values close to 0 but not exactly 0.
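The contrast between the two penalties can be illustrated with a minimal NumPy sketch. The function names and the use of proximal (shrinkage) steps are assumptions made here for illustration; they are not taken from the patent.

```python
import numpy as np

def l1_penalty(w):
    """l1 norm: sum of absolute values of the weights."""
    return float(np.sum(np.abs(w)))

def l2_penalty(w):
    """Squared l2 norm: sum of squared weights."""
    return float(np.sum(np.asarray(w) ** 2))

def l1_shrink(w, t):
    """Proximal step of t * ||w||_1 (soft-thresholding).
    Entries with |w_j| <= t become exactly 0."""
    w = np.asarray(w, dtype=float)
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def l2_shrink(w, t):
    """Proximal step of (t / 2) * ||w||_2^2.
    Every entry is scaled toward 0 but nonzero entries stay nonzero."""
    return np.asarray(w, dtype=float) / (1.0 + t)
```

Applied to a weight vector with a few small entries, `l1_shrink` removes those entries entirely, while `l2_shrink` only scales all entries down.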

FIG. 1 shows an apparatus for performing machine learning including maximum likelihood estimation according to an embodiment.

The illustrated machine learning apparatus 100 may include any computing device (a computer or a server) that performs machine learning. Machine learning may include, for example, deep learning using an artificial neural network, and may refer to any machine learning method based on artificial intelligence.

The machine learning apparatus 100 may be a terminal used by a user (or a part thereof), such as a smartphone, a personal computer (PC), a laptop computer, a tablet, an Internet of Things (IoT) device, or a wearable computer. Alternatively, the machine learning apparatus 100 may be a server that provides services or content to a user terminal, or a part of such a server. The machine learning apparatus 100 may be a device operating in an IoT environment.

The machine learning apparatus 100 may include a maximum likelihood estimator 110. The maximum likelihood estimator 110 is a device that estimates the maximum likelihood during the machine learning process. It may include a trimming unit 120 that performs trimming on a loss function for a sample set in order to handle outlier values and heavy-tailed noise, a regularization unit 130 that performs l₁ regularization in consideration of the penalty determined by the trimming, and a likelihood estimation unit 140 that estimates the maximum likelihood of a model associated with the sample set based on the result of the l₁ regularization. The trimming unit 120 may perform the trimming by trimming the entries that incur the largest penalty for the loss function.

The maximum likelihood estimator 110 and its components may be implemented as part of a processor included in the machine learning apparatus 100, and each may be implemented as one or more software modules and/or hardware modules.

FIG. 2 is a flowchart illustrating a method of estimating a maximum likelihood according to an embodiment.

The maximum likelihood estimation method performed by the maximum likelihood estimator 110 described above is explained in more detail with reference to FIG. 2.

In step 210, the trimming unit 120 may perform trimming on the loss function for the sample set in order to handle outlier values and heavy-tailed noise. The trimming unit 120 may perform the trimming by trimming the entries that incur the largest penalty for the loss function.

In step 220, the regularization unit 130 may perform l₁ regularization in consideration of the penalty determined by the trimming. The regularization unit 130 may additionally perform other regularization, including l₂ regularization, in consideration of a similarly trimmed penalty.

In step 230, the likelihood estimation unit 140 may estimate the maximum likelihood of the model associated with the sample set based on the result of the l₁ regularization. The model associated with the sample set may be a sparse linear model or a sparse graphical model.

The technical features described above with reference to FIG. 1 apply equally to FIG. 2, so redundant description is omitted.

FIG. 3 shows a block coordinate descent algorithm for solving the trimming problem according to an embodiment.

FIG. 3 may represent a block coordinate descent algorithm for solving the problem of Equation 2 described below. In other words, the problem of Equation 2 may be solved using the block coordinate descent algorithm shown in FIG. 3.

The block coordinate descent algorithm may be performed by: receiving the input parameters of the algorithm; initializing the iterates to 0; repeatedly performing the update step while the iterates have not converged; and outputting the resulting estimates.
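As an illustration of the alternating structure described above, the following is a minimal sketch for the sparse linear model. It is not the algorithm of FIG. 3: it assumes a least squares loss (1/n)‖Xθ − y‖², an exact update of the weight block w (zero weight on the h largest |θ_j|), and a single proximal gradient step on θ per iteration. The function name, step size rule, iteration limit, and tolerance are hypothetical choices.

```python
import numpy as np

def soft_threshold(v, t):
    """Entrywise soft-thresholding; t may be a vector of thresholds."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def trimmed_lasso_bcd(X, y, lam, h, n_iter=500, tol=1e-8):
    """Alternate between the weight block w and the parameter block theta.

    Loss assumed here: (1/n) * ||X theta - y||_2^2.
    Returns the final theta and the weights w used in the last theta update.
    """
    n, p = X.shape
    theta = np.zeros(p)                      # initialize iterates to 0
    w = np.ones(p)
    # Step size 1/L, where L = 2 * sigma_max(X)^2 / n bounds the gradient's
    # Lipschitz constant for the assumed loss.
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        # w-block: exact minimization leaves the h largest |theta_j| unpenalized.
        w = np.ones(p)
        if h > 0:
            w[np.argsort(np.abs(theta))[p - h:]] = 0.0
        # theta-block: one proximal gradient step on the weighted l1 problem.
        grad = 2.0 * X.T @ (X @ theta - y) / n
        new_theta = soft_threshold(theta - step * grad, step * lam * w)
        converged = np.linalg.norm(new_theta - theta) <= tol
        theta = new_theta
        if converged:
            break
    return theta, w
```

With h = 0 every weight stays at 1 and the sketch reduces to a plain proximal gradient method for the Lasso; with h > 0 the h largest coefficients are updated without shrinkage.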

The technical features described above with reference to FIGS. 1 and 2 apply equally to FIG. 3, so redundant description is omitted.

A high-dimensional M-estimator (maximum likelihood estimator) with a trimmed l₁ penalty is described in more detail below. The M-estimator described below may correspond to the maximum likelihood estimator 110. The trimming, regularization, and likelihood estimation described below may be performed by the trimming unit 120, the regularization unit 130, and the likelihood estimation unit 140, respectively.

The standard l₁ penalty incurs bias (shrinkage), whereas the trimmed l₁ penalty leaves the h largest entries penalty-free. This family of estimators includes the Trimmed Lasso for sparse linear regression and its counterpart for sparse graphical model estimation. The trimmed l₁ penalty is non-convex, but unlike other non-convex regularizers such as SCAD and MCP it is not amenable, so prior analyses cannot be applied.

The support recovery of the estimator is characterized as a function of the trimming parameter h. Under certain conditions, for any local optimum: (i) if h is smaller than the true support size, all zero entries of the true parameter vector are successfully estimated as zero; and (ii) if h is larger than the true support size, the irrelevant parameters of the local optimum have smaller absolute values than the relevant parameters, so the relevant parameters are not penalized.

Next, the l₂ error of any local optimum is bounded. These bounds are asymptotically comparable to those for non-convex amenable penalties such as SCAD or MCP, but with better constants. The main results may be specialized to linear regression and graphical model estimation.

A fast, provably convergent optimization algorithm for the trimmed regularization problem is also described. This algorithm has the same rate of convergence as difference of convex (DC) based approaches, but is faster in practice and finds better objective values than recently proposed algorithms for DC optimization. This may demonstrate the value of l₁ trimming according to the embodiments.

Described here is a family of M-estimators with trimmed l₁ regularization, which leaves the h largest parameters unpenalized. This family may include the Trimmed Lasso estimator and its counterpart for sparse graphical model estimation (which may be called the Graphical Trimmed Lasso).

The trimming mechanism can be applied to the separable components of the regularizer. A first statistical analysis of M-estimators with trimmed regularization is presented. These estimators are non-convex, but unlike the SCAD and MCP regularizers they are not amenable, so prior analyses cannot be applied.

The theoretical results show that if the trimming parameter h is smaller than the true support size, all zero entries of the true parameter vector are successfully estimated as zero at every local optimum of the resulting non-convex program; and that if h is larger than the true support size, the irrelevant parameters of a local optimum have smaller absolute values than the relevant parameters, so the relevant parameters are not penalized.

In addition, l₂ error bounds are provided. They are asymptotically the same as those for amenable regularized problems such as SCAD or MCP, but with better constants, and they do not require the additional constraint ‖θ‖₁ ≤ R (where R is a safety radius). The main results may be specialized to the particular cases of linear regression and graphical model estimation.

To optimize the trimmed regularization problem, a specialized algorithm was developed that outperforms recent methods based on difference of convex (DC) function optimization; it is described below.

Experiments on simulated and real data can demonstrate the value of l₁ trimming compared with the SCAD, MCP, and vanilla l₁ penalties. Beyond l₁ regularization, the trimming strategy can be applied seamlessly to other decomposable regularizers, including group-sparsity-promoting l₁/l_q regularization.

[Problem setting and trimmed regularizer]

Trimming is generally applied to the loss function of an M-estimator. For a loss function $\mathcal{L}$ on a given set of n samples $\mathcal{D} = \{Z_1, \dots, Z_n\}$, outliers and heavy-tailed noise can be handled by trimming the observations with large residuals. The problem of Equation 1 below, which trims h outliers, can be solved:

$$\min_{\theta,\; w \in \{0,1\}^n} \; \sum_{i=1}^{n} w_i \, \mathcal{L}(\theta; Z_i) \quad \text{subject to} \quad \sum_{i=1}^{n} w_i = n - h$$

In other words, trimming of the loss function may be performed according to Equation 1, where $w_i$ is a weight and h is the trimming parameter.
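For a fixed parameter value, the optimal weights in this kind of loss trimming simply drop the h samples with the largest losses. A minimal sketch, assuming the per-sample losses have already been evaluated (the function name is hypothetical):

```python
import numpy as np

def trimmed_loss(per_sample_losses, h):
    """Sum of per-sample losses after discarding the h largest ones.

    Returns the trimmed total and the 0/1 weight vector w, where
    w_i = 0 marks a trimmed (suspected outlier) sample.
    """
    losses = np.asarray(per_sample_losses, dtype=float)
    n = losses.size
    if not 0 <= h < n:
        raise ValueError("h must satisfy 0 <= h < n")
    w = np.ones(n)
    if h > 0:
        # Indices of the h largest losses receive weight 0.
        w[np.argsort(losses)[n - h:]] = 0.0
    return float(np.sum(w * losses)), w
```

The weights sum to n − h, matching the role of the trimming parameter h described above.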

Here, a family of M-estimators with trimmed regularization is considered for general high-dimensional problems. The entries of $\theta$ that incur the largest penalty may be trimmed according to Equation 2 below:

$$\min_{\theta \in \Omega,\; w \in [0,1]^p} \; \mathcal{L}(\theta; \mathcal{D}) + \lambda_n \sum_{j=1}^{p} w_j \, |\theta_j| \quad \text{subject to} \quad \mathbf{1}^\top w \ge p - h$$

where $\Omega$ denotes the parameter space (for example, $\mathbb{R}^p$ for linear regression) and $\lambda_n$ is the regularization weight.

Defining the order statistics $|\theta_{(1)}| \ge |\theta_{(2)}| \ge \dots \ge |\theta_{(p)}|$ of the parameter $\theta$, the problem can be minimized over w (each $w_j$ is set to 0 or 1 based on the magnitude of $|\theta_j|$), and the reduced version of the problem of Equation 2 can be rewritten in $\theta$ alone as Equation 3 below:

$$\min_{\theta \in \Omega} \; \mathcal{L}(\theta; \mathcal{D}) + \lambda_n \, \mathcal{R}(\theta; h)$$

In other words, the trimming may be performed according to Equation 3, which is obtained from Equation 2 by defining the order of the $|\theta_j|$ and setting each $w_j$ to 0 or 1 based on the magnitude of $|\theta_j|$.

The regularizer $\mathcal{R}(\theta; h) = \sum_{j=h+1}^{p} |\theta_{(j)}|$ is the sum of the p − h smallest absolute values of the entries of $\theta$. The constrained version of Equation 3 is equivalent to minimizing the loss subject to a sparsity penalty, and may be expressed as Equation 4 below:

$$\min_{\theta \in \Omega} \; \mathcal{L}(\theta; \mathcal{D}) \quad \text{subject to} \quad \|\theta\|_0 \le h$$

In other words, to minimize the loss subject to a sparsity penalty, $\mathcal{R}(\theta; h)$ may be set to 0; $\mathcal{R}(\theta; h) = 0$ holds exactly when $\theta$ has at most h nonzero entries.
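The regularizer described here, the sum of the p − h smallest absolute values, can be computed directly by sorting. A minimal sketch (the function name is an assumption for illustration):

```python
import numpy as np

def trimmed_l1(theta, h):
    """Sum of the p - h smallest absolute values of theta.

    The h largest-magnitude entries are left unpenalized; h = 0 gives
    the ordinary l1 norm.
    """
    a = np.sort(np.abs(np.asarray(theta, dtype=float)).ravel())  # ascending
    p = a.size
    if not 0 <= h <= p:
        raise ValueError("h must satisfy 0 <= h <= p")
    return float(np.sum(a[:p - h]))
```

The value is zero exactly when the vector has at most h nonzero entries, which is why setting the regularizer to 0 acts as a sparsity constraint.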

For the statistical analysis, the focus is on the problem of Equation 3. For optimization, the structure of Equation 2 is exploited by treating the weights w as auxiliary optimization variables. This yields a new fast algorithm and analysis technique that do not rely on the DC structure of Equation 3.

The prior art derives upper bounds on estimation error for various sparse regularized estimators motivated by many practical applications.

This disclosure mainly considers two common examples that use the trimmed l₁ regularizer, but the results generalize.

Example: sparse linear model. In a high-dimensional linear regression problem, there are n observation pairs of a real-valued target $y_i \in \mathbb{R}$ and covariates $x_i \in \mathbb{R}^p$ in a linear relationship. The sparse linear model may be defined as in Equation 5 below:

$$y = X\theta^* + \epsilon$$

Here, $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, and $\epsilon \in \mathbb{R}^n$ denotes n independent observation noises. The goal is to estimate the k-sparse vector $\theta^*$. Following the framework of Equation 3, the least squares loss function is used with the trimmed l₁ regularizer (instead of the standard l₁ norm of the Lasso of Tibshirani), as in Equation 6 below:

$$\min_{\theta \in \mathbb{R}^p} \; \frac{1}{n} \, \| X\theta - y \|_2^2 + \lambda_n \, \mathcal{R}(\theta; h)$$

where $\frac{1}{n} \| X\theta - y \|_2^2$ may correspond to the loss function described above.

예시: 희소 그래픽 모델. GGM(Gaussian Graphical Model)은 변수들 사이에서 조건부 독립성 조건을 부호화하기 위해 무방향성 그래프를 사용하고, 변수 집합에 대한 분포를 나타내기 위헤 강력한 통계적 모델군을 형성할 수 있다. Example: sparse graphics model . The GGM (Gaussian Graphical Model) uses a non-directional graph to encode conditional independence conditions among variables, and can form a powerful statistical model group to represent the distribution of a set of variables.

In high-dimensional settings, a graph sparsity constraint is particularly well suited to estimating a GGM. The most widely used estimator, the graphical lasso, minimizes the negative Gaussian log-likelihood regularized by the l₁ norm of the entries (or the off-diagonal entries) of the precision matrix. In the framework of the embodiments, the l₁ norm can be replaced by its trimmed version, as in Equation 7 below.

여기서,

는 대칭이며 엄밀하게 정치 행렬(positive definite matrices)의 볼록한 콘(convex cone)을 나타내고,

는 비대각의 최소 p(p-1)-h 절대합을 나타낼 수 있다. here,

Is symmetric and strictly represents the convex cone of the positive definite matrices,

Can represent the minimum p(p-1)-h absolute sum of the diagonals.
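A trimmed graphical lasso objective of this kind can be sketched as below. This is an illustrative reading only: the form trace(SΘ) − log det(Θ) for the negative Gaussian log-likelihood is the standard one, while the function name, the argument `sample_cov`, and the convention that h counts individual off-diagonal entries out of p(p−1) are assumptions.

```python
import numpy as np

def trimmed_glasso_objective(theta, sample_cov, lam, h):
    """Negative Gaussian log-likelihood plus a trimmed l1 penalty.

    The penalty sums the p(p-1) - h smallest absolute off-diagonal
    entries of the precision matrix theta (assumed convention).
    """
    theta = np.asarray(theta, dtype=float)
    eigvals = np.linalg.eigvalsh(theta)  # theta is assumed symmetric
    if eigvals.min() <= 0:
        raise ValueError("theta must be positive definite")
    logdet = float(np.log(eigvals).sum())
    p = theta.shape[0]
    off_diag = np.sort(np.abs(theta[~np.eye(p, dtype=bool)]))
    penalty = float(off_diag[: off_diag.size - h].sum())
    return float(np.trace(sample_cov @ theta)) - logdet + lam * penalty
```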

[Theoretical guarantees for trimmed regularization]

예상된 손실의 최소화기인 트루(true) k-희소 파라미터 벡터(또는 행렬)

을 추정하고자 한다:

True k-sparse parameter vector (or matrix), which is the minimizer of the expected loss

I want to estimate:

S는

의 서포트 세트를 지칭하기 위해 사용한다. 즉, 0이 아닌 엔트리 세트(즉, k = |S|)를 나타낸다. 여기에서는, 다음과 같은 표준 가정하에 추정 일관성의 상한(

및 서포트 세트 리커버리)을 유도한다: S is

Used to refer to the support set of. That is, it represents a non-zero set of entries (i.e., k = |S|). Here, the upper limit of the estimated consistency under the following standard assumptions:

And support set recovery):

(C-1) The loss function is differentiable and convex.

(C-2) (

에 대한 제한된 강한 복잡도)

를 파라미터

에 대한 에러 벡터의 가능한 세트로 둔다.

이다. 아래 수학식 8과 같은 관계 식이 얻어질 수 있다. (C-2) (

Limited strong complexity for)

Parameters

Let as a possible set of error vectors for.

to be. A relational expression such as Equation 8 below can be obtained.

는 곡률(curvature) 파라미터이고,

는 공차(tolerance) 상수이다.

Is the curvature parameter,

Is the tolerance constant.

볼록 손실 함수

은 일반적으로 고차원 설정 (p> n) 하에서 강하게 볼록하게 될 수 없다. (C-2)는 비율

이 작은 일부 제한된 방향에서만 강한 곡률을 부과할 수 있다. 이러한 조건은 광범위하게 연구되어 몇몇 대중적인 고차원 문제를 보유하고 있음이 알려져있다. (C-1)에서의

의 복잡도 조건은 추가적인 약한 구속 조건을 도입함으로써 완화될 수 있다. 그러나 본 개시에서는 명확성을 위해 볼록한 손실에 초점을 맞춘다.Convex loss function

Cannot generally be strongly convex under high dimensional settings (p>n). (C-2) is the ratio

Only a few limited directions can impose a strong curvature. It is known that these conditions have been studied extensively and have some popular high-level problems. (C-1)

The complexity condition of can be mitigated by introducing additional weak constraints. However, this disclosure focuses on convex losses for clarity.

로 바운드(bound)로 시작한다. 이를 위해 트리밍된 정규화기

에 대해 특별히 고안된 PDW(primal-dual witness) 기법을 채택할 수 있다. 작업 라인은 PDW 기법을 사용하고, l₁ 정규화기와 조정 가능한 비볼록 정규화기에 대한 서포트 세트 리커버리를 나타낼 수 있다는 점에 유의한다. 그러나,

가 대칭이고 오목한 경우에도 이는 amenable하지 않다.

Start with a low bound. Trimmed normalizer for this

It is possible to adopt a primary-dual witness (PDW) technique specifically designed for. Note that the working line uses the PDW technique and can represent support set recovery for the l ₁ normalizer and the adjustable non-convex normalizer. But,

Even if is symmetrical and concave, it is not amenable.

PDW의 핵심 단계는 제한된 프로그램(restricted program)을 작성하는 것일 수 있다. T를 {1, ... , p}의 임의의 부분 집합이라하고, 크기를 h라고 하고,

이고,

라고 하면, 아래 수학식 9와 같이 제한된 프로그램을 나타낼 수 있다. A key step in PDW may be to write a restricted program. Let T be an arbitrary subset of {1, ..., p}, and say size h,

ego,

Speaking of, it can represent a limited program as in Equation 9 below.

모든

에 대해

으로 수정할 수 있다. 이중 변수

를 제로 서브-그라디언트 조건(zero sub-gradient condition)을 만족하기 위해 더 구축해 둘 수 있다(아래 수학식 10 참조).all

About

Can be modified with Double variable

Can be further constructed to satisfy the zero sub-gradient condition (see Equation 10 below).

(적절한 리오더링 후)

이고,

일 수 있다. 명확성을 위해서

와

에서 T에 대한 의존성을 억제할 수 있음에 유의한다. 최종적인 표현을 유도하기 위해

의 엄격한 이중 실행 가능성

, 즉,

을 설정할 수 있다. 다음 정리(theorem)는 비볼록 프로그램(수학식 3)의 로컬 옵티멈과 관련하는 주요한 이론적인 결과를 설명한다. 이 정리는 엄격한 이중 실행 가능성 하에서 로컬 옵티멈의 비관련 파라미터가 관련 파라미터보다 작은 절대 값을 가지므로 관련 파라미터가 페널티화되지 않는다는 것을 보증한다(h가 k보다 크게 설정되는 한).(After proper reordering)

ego,

Can be For clarity

Wow

Note that the dependence on T can be suppressed. To drive the final expression

Strict Double Workability

, In other words,

You can set The following theorem explains the main theoretical results related to the local optimizer of the non-convex program (Equation 3). This theorem ensures that under the strict double practicability, the unrelated parameter of the local optimizer has an absolute value less than the related parameter, so that the related parameter is not penalized (as long as h is set greater than k).

[정리 1] (C-1)과 (C-2)를 만족하는 트리밍된 정규화기(수학식 3)의 문제를 고려한다.

는 샘플 크기

이고,

인 여하한 수학식 3의 로컬 최소값일 수 있다. 다음을 가정한다: [Theorem 1] Consider the problem of the trimmed normalizer (Equation 3) satisfying (C-1) and (C-2).

Sample size

ego,

Can be the local minimum of Equation 3. Assume the following:

(a)

의 여하한 선택에 대해, 수학식 10의 PDW 구조로부터의 이중 벡터

는 어떤(some)

에 대해 수학식 11과 같은 엄격한 이중 실행 가능성을 만족할 수 있다. (a)

For any choice of, a dual vector from the PDW structure in Equation 10

Is some

Can satisfy the strict double feasibility of Equation (11).

U denotes the union of the true support S and T.

(b)

로 두면,

는 아래 수학식 12에 의해 하한(loer bounded)이 정해질 수 있다. (b)

If left as

The lower bound can be determined by Equation 12 below.

는 행렬의 최대 절대 로우(row) 합을 나타낼 수 있다.

Can represent the maximum absolute row sum of the matrix.

Then the following hold:

(1) 모든

에 대해

일 수 있다.(1) All

About

Can be

(2)

이면, 모든

가 0으로서 성공적으로 추정될 수 있고, 아래 수학식 13과 같은 관계를 가질 수 있다. (2)

Back side, all

Can be successfully estimated as 0, and may have a relationship as in Equation 13 below.

(3)

이면, S^C의 적어도 최소(절대의) p-h 엔트리는 정확하게 0을 가지지만, 대신에, 보다 간단하게(어쩌면 더 타이트하게) 아래의 수학식 14와 같이 정해질 수 있다. (3)

In this case, at least the minimum (absolute) ph entry of S ^C has exactly 0, but instead, more simply (maybe more tightly), it can be determined as in Equation 14 below.

는 S를 포함하는

의 h 최대 절대 엔트리로서 정의될 수 있다.

Containing S

H can be defined as the maximum absolute entry.

실제 문제에 대한 결과 정리(corollaries)에 있어서

및

와 관련하는 항들에 대한 실제 경계(bound)가 유도될 수 있다(예컨대,

가

에 의해 상한이 정해질 것이고, 따라서,

을 선택할 수 있음).When it comes to corollaries on real problems

And

The actual bounds for terms relating to can be derived (eg,

end

The upper limit will be determined by

You can choose).

Although Equations 11 and 12 appear more stringent than in the case h = 0 (the Lasso), the corollaries show that they are uniformly upper bounded over all selections, with asymptotically the same probability as for h = 0.

또한, h가 0으로 설정되면, 결과는 일반적인 l₁ 놈의 결과를 회복하게 될 것임에 유의한다. 게다가, 정리의 (1)에 의해,

이면,

는 단지 관련 특징 지수만을 포함하고 몇몇 관련 특징은 페널티화되지 않을 수 있다.

인 경우,

는 모든 관련 지표 (및 비관련 지표)를 포함할 수 있다. 이 경우, 수학식 13의 두 번째 항은 사라지지만,

항이 커짐에 따라

는 증가할 수 있다. 또한, h가 p에 접근함에 따라

인 조건이 위반될 수 있다. 많은 문제들에서 선험적으로 희소성 k를 알지 못하는 반면, 암시적으로

를 설정할 수 있는 것으로 가정한다(즉, 교차 검증).Also, note that if h is set to 0, the result will recover the result of the normal l ₁ norm. Furthermore, by (1) of theorem,

If it is,

Contains only relevant feature indices and some related features may not be penalized.

If it is,

Can include all relevant indicators (and non-related indicators). In this case, the second term in Equation 13 disappears,

As the term grows

Can increase. Also, as h approaches p

Phosphorus conditions may be violated. In many problems a priori is not aware of scarcity k, while implicitly

It is assumed that can be set (ie, cross-validation).

We now turn to the l₂ estimation bound under the same conditions:

[Corollary 2] Consider the problem with the trimmed regularizer (Equation 3) under all the conditions of Theorem 1. Then, for any local minimum of Equation 3, the parameter estimation error in the l₂ norm is upper bounded as follows, for some constant C:

수학식 15에서의 l₂바운드는 임의의 로컬 옴티멈에 대해, 지역 최적 조건에 대해

의 크기가 정리 1에 의해

보다 작거나 같음이 보증되고, 따라서 주어진

에 대한 제한된 프로그램(수학식 9)의 l₂바운드를 획득하기 위해 알려진 결과에 대해 적용할 수 있으므로 쉽게 유도될 수 있다. 정리 1과 결과 정리 2로부터, 추정 바운드(bound)가 SCAD 나 MCP와 같은

-조정 가능한 정규화된 문제에 대한 것들과 점근적으로 동일하다는 것을 관찰 할 수 있다:

The l ₂ bound in Equation 15 is for any local omtim, and for local optimal conditions.

The size of the theorem by 1

Less than or equal to is guaranteed, so given

It can be easily derived because it can be applied to known results in order to obtain the l ₂ bound of the limited program for Equation (9). From theorem 1 and result theorem 2, the estimated bound is equal to SCAD or MCP

It can be observed that it is asymptotically identical to the ones for the adjustable normalized problem:

그러나, 이러한 정규화기에 대한 상수는 (트리밍된 l_1-에 대한

대신에)

항이 관여하므로 더 클 수 있다. 또한, 이러한 비볼록형 정규화기는 이론적인 보장에 대한 최적화 문제에서 추가 제약 조건

을 요구하며

와 튜닝 파라미터변수 R에 대한 추가 가정을 도입한다.However, the constant for this normalizer (for the trimmed l _1-

Instead of)

Since the term is involved, it can be larger. In addition, these non-convex normalizers have additional constraints in the optimization problem for theoretical guarantees.

Asking

And assumptions for the tuning parameter variable R are introduced.

Here the main theorem is applied to the popular high-dimensional problems introduced above: sparse linear regression and sparse graphical models (the latter is described in more detail later).

희소 최소 자승법. 임의의 방법에 대한 정보 이론적인 접근에 의해 동기 부여되며, 희소 선형 회귀의 이전의 모든 분석은 충분히 큰 상수 c₀에 대해

로 가정할 수 있다. 또한, 정해진

에 대해,

라고 가정할 수 있다. Sparse least squares method . Motivated by an information-theoretic approach to any method, and all previous analyzes of sparse linear regression for sufficiently large constants c ₀

Can be assumed as Also,

About,

Can be assumed.

[결과 정리 3]

는 서브-가우시안인 모델(수학식 5)을 고려한다. 어떤 상수 c_l과 아래의 조건을 만족하는 h에 대한

의 선택을 통해 프로그램(수학식 6)을 푸는 것을 가정한다: [Result 3]

Considers the sub-Gaussian model (Equation 5). For a constant c _l and h satisfying the following conditions

Assume that the program (Equation 6) is solved by selecting:

(a) 샘플 공분산 행렬

은 아래 수학식 16의 조건을 만족한다:(a) Sample covariance matrix

Satisfies the condition of Equation 16 below:

의 임의의 선택에 대해,

For any choice of,

는 행렬의 최대 특이값(singular value)일 수 있다.

Can be the maximum singular value of the matrix.

또한,

는 어떤 상수 c₁에 대해

에 의해 하한이 정해지는 것으로 가정할 수 있다. 그리고 적어도 1-c₂ exp(-c₃ log p)의 높은 확률로 수학식 6의 로컬 최소값

는 다음을 만족한다:Also,

Is for some constant c ₁

It can be assumed that the lower limit is determined by. And a local minimum of Equation 6 with a high probability of at least 1-c ₂ exp(-c ₃ log p)

Satisfies:

(a) 모든

에 대해,

를 가지고,(a) all

About,

Have,

(b)

이면, 모든

는 0으로 성공적으로 추정되고 아래 수학식 17의 관계를 가지고,(b)

Back side, all

Is successfully estimated as 0 and has the relationship of Equation 17 below,

(c)

이면, 적어도 S^c의 최소 p-h 엔트리에서 정확하게 0을 가지고 아래 수학식 18의 관계를 가질 수 있다.(c)

In this case, it is possible to have a relationship of Equation 18 below with exactly 0 in at least the minimum ph entry of S ^c .

The conditions of Corollary 3 have been studied in previous work. In particular, Equation 16 is known as the incoherence condition for sparse least squares estimators (Wainwright (2009b)). All the conditions can be shown to hold with high probability via standard concentration bounds for sub-Gaussian matrices. It is important to note that, for the Lasso, estimation fails if the incoherence condition is violated (Wainwright (2009b)). In contrast to the Lasso, the trimmed l₁ problem (Equation 6) can be confirmed to succeed even when this condition is not met.

[Optimization]

A block coordinate descent algorithm is developed and analyzed for solving the problem of Equation 2. Keeping the weights w as an explicit block, rather than projecting them out, retains more freedom before the weights w are set. The algorithm is analyzed using the structure of Equation 2 instead of relying on the DC formulation of Equation 3. This approach is detailed in Algorithm 1, shown in FIG. 3, and the convergence analysis is summarized in Theorem 4 below.

Consider a general objective function of the form of Equation 19.

는 볼록성 지시 함수(convex indicator function)일 수 있다.

으로 둘 수 있다.

May be a convex indicator function.

Can be placed as

The following assumptions are made.

[Assumption 1] (a) f is a smooth, closed, convex function with an L-Lipschitz continuous gradient; (b) each r_i is convex; and (c) S is a closed convex set and F is bounded below.

[Theorem 4] If Assumption 1 (a)-(c) holds, the iterates generated by Algorithm 1 satisfy Equation 20 below:

또한,

로 정의하고, 스텝 크기

를 선택하면, 아래 수학식 21을 얻을 수 있다.Also,

Define as and step size

If is selected, Equation 21 below can be obtained.

This gives a sublinear rate of convergence with respect to the optimality condition.

수학식 2의 문제는 가정 1을 만족하고, 따라서 알고리즘 1은 알고리즘 1은

를 사용하여 측정된 것처럼 서브리니어 레이트로 수렴한다.The problem in equation (2) satisfies assumption 1, so algorithm 1 is algorithm 1

Converges to a sublinear rate as measured using.
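The idea of alternating between the parameter block and the weight block can be sketched for the trimmed least squares problem as follows. This is a simplified illustration and not the Algorithm 1 of FIG. 3: it assumes a proximal gradient step on the parameters with a weighted soft threshold, followed by an exact binary update of the weights, and the function names, the 1/n loss scaling, and the step size choice are all assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft thresholding with per-coordinate thresholds t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def trimmed_lasso_alternating(X, y, lam, h, n_iter=500):
    """Alternate a proximal gradient step on theta with a weight update.

    Loss: (1/n) * ||X theta - y||^2 (assumed scaling).
    Weights: w_i = 0 for the h largest |theta_i|, w_i = 1 otherwise.
    """
    n, p = X.shape
    theta = np.zeros(p)
    w = np.ones(p)
    # Step size 1/L, where L is the Lipschitz constant of the loss gradient.
    lipschitz = 2.0 * np.linalg.norm(X, 2) ** 2 / n
    eta = 1.0 / lipschitz
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ theta - y) / n
        # Block 1: weighted soft threshold; coordinates with w_i = 0 are not shrunk.
        theta = soft_threshold(theta - eta * grad, eta * lam * w)
        # Block 2: free the h largest entries from the penalty.
        w = np.ones(p)
        if h > 0:
            w[np.argsort(np.abs(theta))[-h:]] = 0.0
    return theta, w
```

Keeping the weights as an explicit variable means each parameter update is an ordinary weighted l₁ proximal step, which is the structural point made in the text; the hard weight update used here is only one possible choice for the second block.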

To demonstrate the efficiency of Algorithm 1, numerical experiments can be performed comparing it with Algorithm 2 (the algorithm of Khamaru and Wainwright, 2018). Several approaches have been proposed for DC programs; the prox-type algorithm (Algorithm 2) may be particularly suited to subset selection.

Lasso simulation data were generated with 500 variables (dimension p = 500) and 100 samples (n = 100). The true generating parameter has 10 nonzero elements. Taking h = 25, both Algorithm 1 and Algorithm 2 were applied. The results are shown in FIG. 4.
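A data set with the stated dimensions can be generated along the following lines. Only the sizes (500 variables, 100 samples, 10 nonzero elements) come from the text; the Gaussian design, the ±1 coefficient values, the noise level, and the function name `make_sparse_regression` are assumptions made for illustration.

```python
import numpy as np

def make_sparse_regression(n=100, p=500, k=10, noise_std=0.1, seed=0):
    """Generate y = X theta_true + noise with a k-sparse theta_true.

    The standard normal design, +/-1 coefficients, and noise_std
    are illustrative assumptions, not values from the disclosure.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    theta_true = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    theta_true[support] = rng.choice([-1.0, 1.0], size=k)
    y = X @ theta_true + noise_std * rng.standard_normal(n)
    return X, y, theta_true

X, y, theta_true = make_sparse_regression()
```

With h = 25 and k = 10, the trimming parameter exceeds the true sparsity, which matches the regime discussed in the theory above where h is set larger than k.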

FIG. 4 shows a convergence comparison between the algorithm according to an embodiment (Algorithm 1, described with reference to FIG. 3) and another algorithm (Algorithm 2).

The per-iteration progress of the two methods was similar, but Algorithm 1 continued to progress at a linear rate down to lower values of the objective, whereas Algorithm 2 did not progress linearly beyond a certain local minimum.

This disclosure has described high-dimensional M-estimators with a trimmed l₁ penalty. By leaving the h largest parameter entries unpenalized, these estimators can alleviate the bias incurred by the vanilla l₁ penalty. Theoretical results on support recovery and l₂ error bounds hold for every local optimum and are competitive with those of other non-convex approaches. The procedure also shows surprising robustness with respect to the trimming parameter h. These results can be corroborated by extensive simulation experiments.

This disclosure has also described a provably convergent algorithm customized for the trimmed problem. The algorithm and its analysis are based on the problem structure rather than on the simple DC structure, and can yield promising numerical results. The approach may be usefully applied to more general regularizers.

The sparse graphical model introduced above is described in more detail below.

희소 그래픽 모델: 여기에서는 트리밍된 그래픽 라소(수학식 7)에 대한 결과 정리를 도출한다. 전반적으로 샘플 크기는 트루 파라미터의 로우 희소성 d로 스케일링하는 것을 가정하고, 다른 작업보다 더 마일드한 역 공분산은

로 가정한다(n 은 k로 스케일링하고,

의 0이 아닌 엔트리의 수): Sparse graphic model : Here we draw the result theorem for the trimmed graphic laso (Equation 7). Overall, the sample size is assumed to scale to the low sparse d of the true parameter, and the inverse covariance, which is milder than the others,

Suppose (n is scaled to k,

Number of non-zero entries):

[결과 정리 5] x_i가 서브-가우시안으로부터 얻어지고, 샘플 크기 n이 c₀d²log p보다 큰 프로그램(수학식 7)을 고려한다. 또한, 다음을 만족하는

과 h를 선택하는 것을 더 가정한다: [Results Theorem 5] Consider a program (Equation 7) where x _i is obtained from a sub-Gaussian, and the sample size n is greater than c ₀ d ² log p. Also, to satisfy the following

Suppose you choose more and h:

(a)

에서의 모든 선택에 대해, 아래 수학식 22를 만족한다.(a)

For all of the selections in, Equation 22 below is satisfied.

또한

는 어떤 상수 c₁에 대해

는 다음을 만족할 수 있다:Also

Is for some constant c ₁

Can satisfy:

(a) 모든

에 대해,

를 가지고,(a) all

About,

Have,

(b)

이면, 모든

Back side, all

Is successfully estimated as 0 and has the relationship of Equation 17 below,

(c)

이면, 적어도 S^c의 최소 p-h 엔트리에서 정확하게 0을 가지고 아래 수학식 24의 관계를 가질 수 있다. (c)

If it is, at least the minimum ph entry of S ^c has exactly 0 and may have the relationship of Equation 24 below.

수학식 22의 조건은 비일관성 조건일 수 있고, 결과는 상기의 희소 선형 모델의 경우와 일치하며 l₁ 또는

정규화 된 종래의 모델(Glasso Loh 및 Wainwright (2017))의 경우와 유사할 수 있다.The condition of Equation 22 may be an inconsistency condition, and the result is consistent with the case of the sparse linear model above and l ₁ or

This may be similar to the case of normalized conventional models (Glasso Loh and Wainwright (2017)).

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Claims

A maximum likelihood estimation method for machine learning, performed by a computer, the method comprising:
performing trimming on a loss function for a sample set in order to handle outliers and heavy-tailed noise;
performing l₁ regularization in consideration of a penalty determined as the trimming is performed; and
estimating a maximum likelihood of a model associated with the sample set based on a result of the l₁ regularization,
wherein the performing of the trimming includes trimming the entry that incurs the largest penalty on the loss function.

According to claim 1,
Trimming for the loss function is performed according to Equation 1,
[Equation 1]

Is the sample set for n samples, where

Is the loss function, w _i is the weight, h is the trimming parameter,
Trimming the entry that generates the largest penalty is performed according to Equation 2,
[Equation 2]

Is a parameter space, the maximum likelihood estimation method.

According to claim 2,
The step of performing the trimming,
In Equation 2 above

Define the order of

It is performed according to the obtained equation (3) by setting w _i to 0 or 1 based on the size of
[Equation 3]

Is a regularizer,

The maximum likelihood estimation method, which represents the absolute sum of the minimum ph of.

According to claim 3,
To minimize losses subject to sparse penalty,

The maximum likelihood estimation method, which sets the value to 0.

According to claim 1,
The model is a sparse linear model,
The sparse linear model is a real-valued target in a linear relationship

And covariates

Having n observation pairs of is defined as Equation 4,
[Equation 4]

,

, And

Is the independent observation noise,

Is a k-sparse vector to be estimated, the maximum likelihood estimation method.

The method of claim 5,
Trimming the entry that generates the largest penalty is performed according to Equation (5),
[Equation 5]

Is a maximum likelihood estimation method corresponding to the loss function.

According to claim 1,
The model is a sparse graphical model,
Trimming the entry that generates the largest penalty is performed according to Equation (6),
[Equation 6]

Denotes the convex cone of the positive definite matrices,

Is the maximum likelihood estimation method, which represents the absolute sum of the minimum p(p-1)-h of the diagonal.

The method of claim 2,
wherein the problem of Equation 2 is solved using a block coordinate descent algorithm.

The method of claim 8,
The block coordinate descent algorithm,

,

And

Receiving input;

,

And

Initializing to 0;
While not converging,

Performing; And

And

Steps to output
The maximum likelihood estimation method performed, including.

A machine learning method performed using a maximum likelihood estimation method,
wherein the maximum likelihood estimation method comprises:
performing trimming on a loss function for a sample set in order to handle outliers and heavy-tailed noise;
performing l₁ regularization in consideration of a penalty determined as the trimming is performed; and
estimating a maximum likelihood of a model associated with the sample set based on a result of the l₁ regularization,
wherein the performing of the trimming includes trimming the entry that incurs the largest penalty on the loss function.

A maximum likelihood estimation apparatus comprising:
a trimming unit configured to perform trimming on a loss function for a sample set in order to handle outliers and heavy-tailed noise;
a regularization unit configured to perform l₁ regularization in consideration of a penalty determined as the trimming is performed; and
a likelihood estimation unit configured to estimate a maximum likelihood of a model associated with the sample set based on a result of the l₁ regularization,
wherein the trimming unit trims the entry that incurs the largest penalty on the loss function.

The method of claim 11,
The trimming unit performs trimming on the loss function according to Equation 1,
[Equation 1]

Is the sample set for n samples, where

Is the loss function, w _i is the weight, h is the trimming parameter,
Trim the entry that generates the largest penalty according to Equation 2,
[Equation 2]

Is a parameter space, the maximum likelihood estimator.

The method of claim 12,
The problem of Equation 2 is solved using a block coordinate descent algorithm,
The block coordinate descent algorithm,

,

And

Receiving input;

,

And

Initializing to 0;
While not converging,

Performing; And

And

Steps to output
The maximum likelihood estimator is performed, including.

A machine learning apparatus comprising a maximum likelihood estimation apparatus,
wherein the maximum likelihood estimation apparatus comprises:
a trimming unit configured to perform trimming on a loss function for a sample set in order to handle outliers and heavy-tailed noise;
a regularization unit configured to perform l₁ regularization in consideration of a penalty determined as the trimming is performed; and
a likelihood estimation unit configured to estimate a maximum likelihood of a model associated with the sample set based on a result of the l₁ regularization,
wherein the trimming unit trims the entry that incurs the largest penalty on the loss function.