KR20220087044A - Method and system for processing missing data in artificial neural network through correction of sparsity bias of zero imputation

Info

Publication number: KR20220087044A
Application number: KR1020200177368A
Authority: KR
Inventors: 양은호; 이준영; 이주혁; 황성주
Original assignee: 한국과학기술원
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2022-06-24

Abstract

A method and system for processing missing data are provided that handle missing data simply and effectively by solving the variable sparsity problem (VSP) through sparsity normalization (SN).

Description

Method and system for processing missing data in artificial neural network through sparsity bias correction of zero imputation

The following description relates to a method and system for processing missing data in an artificial neural network through sparsity bias correction of zero imputation.

Handling missing data is one of the most fundamental problems in machine learning. Among the many approaches, the simplest and most intuitive is zero imputation, which simply treats the value of a missing entry as zero. Many studies have confirmed experimentally that zero imputation leads to suboptimal performance when training neural networks. However, no existing work has explained what causes this performance degradation.

[Prior art literature]

Korean Patent Publication No. 10-2020-0115001

Provided is a method for processing missing data in a computer device including at least one processor, the method comprising: receiving, by the at least one processor, a fixed constant and a first dataset including a plurality of input vectors and a plurality of binary masks; applying, by the at least one processor, sparsity normalization using the binary mask and the fixed constant to each of the plurality of input vectors; and providing, by the at least one processor, a second dataset including the plurality of input vectors to which the sparsity normalization has been applied.

According to one aspect, applying the sparsity normalization may include multiplying an arbitrary input vector among the plurality of input vectors by the fixed constant and dividing the result by the magnitude of the vector corresponding to the binary mask of that input vector, thereby applying sparsity normalization to that input vector.

According to another aspect, each coordinate of the plurality of input vectors may be generated by the element-wise product of a feature value of the input vector and the corresponding binary mask.

According to another aspect, each of the plurality of binary masks may be MCAR (Missing Completely At Random) and independent of the other binary masks and of the feature values of the corresponding input vector.

According to another aspect, each of the plurality of binary masks may follow the same distribution.

According to another aspect, the fixed constant may be calculated as the average of the magnitudes of the vectors corresponding to the plurality of binary masks.

Provided is a computer program stored in a computer-readable recording medium that, in combination with a computer device, causes the computer device to execute the method.

Provided is a computer-readable recording medium on which a program for causing a computer device to execute the method is recorded.

Provided is a computer device including at least one processor implemented to execute computer-readable instructions, wherein the at least one processor receives a fixed constant and a first dataset including a plurality of input vectors and a plurality of binary masks, applies sparsity normalization using the binary mask and the fixed constant to each of the plurality of input vectors, and provides a second dataset including the plurality of input vectors to which the sparsity normalization has been applied.

By solving the variable sparsity problem through sparsity normalization, missing data can be handled simply and effectively.

FIGS. 1 to 3 are graphs related to a collaborative filtering dataset according to an embodiment of the present invention.
FIGS. 4 to 6 are graphs related to a Long Short-Term Memory (LSTM) dataset for electronic medical records according to an embodiment of the present invention.
FIGS. 7 to 9 are graphs related to a single-cell ribonucleic acid (RNA) sequence dataset according to an embodiment of the present invention.
FIGS. 10 and 11 are diagrams illustrating the change in output value according to the sparsity level of the input according to an embodiment of the present invention.
FIG. 12 is a block diagram illustrating an example of a computer device according to an embodiment of the present invention.
FIG. 13 is a flowchart illustrating an example of a missing data processing method according to an embodiment of the present invention.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

1. Introduction

Many real-world datasets contain data instances in which a subset of the input features is missing. Various imputation techniques can be applied, each with its own strengths and weaknesses, ranging from substitution with global statistics such as the mean to instance-wise imputation by learning an auxiliary model such as a Generative Adversarial Network (GAN). The simplest and most natural approach, however, is zero imputation, which simply treats missing features as zeros. In neural networks, zero imputation may at first glance appear to be a reasonable solution, because it effectively drops the missing input nodes by preventing the weights associated with them from being updated. Surprisingly, however, many previous studies have reported that this intuitive approach degrades model performance, and none of them investigated the reason for the degradation.

FIGS. 1 to 3 are graphs related to a collaborative filtering dataset according to an embodiment of the present invention. The graph of FIG. 1 shows the mean of the target values as a function of the number of known entries in the training set of the collaborative filtering dataset. The graph of FIG. 2 shows the predictions of a model with zero imputation as a function of the number of known entries, for randomly selected test points from the collaborative filtering dataset. The input masks may be sampled at random to control the sparsity level artificially. In the graph of FIG. 2, 50 samples are drawn for each target sparsity level on the x-axis; the predicted values are shown as a scatter plot and their mean as a solid line. FIG. 3 shows how sparsity normalization corrects the plot of FIG. 2. Sparsity normalization is described in more detail below.

FIGS. 4 to 6 are graphs related to a Long Short-Term Memory (LSTM) dataset for electronic medical records according to an embodiment of the present invention. The graph of FIG. 4 shows the mean of the target values as a function of the number of known entries in the training set of the electronic medical record dataset. The graph of FIG. 5 shows the predictions of a model with zero imputation as a function of the number of known entries, for randomly selected test points from the electronic medical record dataset. The input masks may be sampled at random to control the sparsity level artificially. In the graph of FIG. 5, 50 samples are drawn for each target sparsity level on the x-axis; the predicted values are shown as a scatter plot and their mean as a solid line. FIG. 6 shows how sparsity normalization corrects the plot of FIG. 5.

FIGS. 7 to 9 are graphs related to a single-cell ribonucleic acid (RNA) sequence dataset according to an embodiment of the present invention. The graph of FIG. 7 shows the mean of the target values as a function of the number of known entries in the training set of the single-cell RNA sequence dataset. The graph of FIG. 8 shows the predictions of a model with zero imputation as a function of the number of known entries, for randomly selected test points from the single-cell RNA sequence dataset. The input masks may be sampled at random to control the sparsity level artificially. In the graph of FIG. 8, 50 samples are drawn for each target sparsity level on the x-axis; the predicted values are shown as a scatter plot and their mean as a solid line. FIG. 9 shows how sparsity normalization corrects the plot of FIG. 8.

The following explains that zero imputation causes the output of a neural network to vary greatly with the number of entries missing from the input. Embodiments of the present invention name this phenomenon the Variable Sparsity Problem (VSP), which should be avoided in many practical tasks. Consider, for example, a movie recommender system. It is undesirable for users to receive different average predicted ratings merely because they have rated different numbers of movies (regardless of the actual rating values). One might argue that users with fewer ratings generally do not like movies, so it is natural to give higher predicted values to users with more ratings. This may be partly true for users at some sparsity levels, but it is not a general rule that holds uniformly across a wider range of sparsity levels. This can be confirmed in a real collaborative filtering dataset (see the graph of FIG. 1), in which users have similar average ratings on the test data regardless of the number of known ratings. In a standard neural network with zero imputation, however, the model's inference is correlated with the number of known entries of the data instance, as shown in the graphs of FIGS. 2, 5 and 8. This can be critical in safety-critical applications such as the medical domain. For example, a patient's probability of developing a disease should not be evaluated differently depending on the number of medical tests received (a model that predicts a high probability of death merely because a patient has undergone many examinations is not wanted).

In addition, embodiments of the present invention theoretically analyze the existence of the VSP under several circumstances and propose a simple yet effective means of suppressing the VSP while retaining the intuitive advantage of zero imputation: normalizing by the number of non-zero entries of each data instance. This normalization is called "Sparsity Normalization (SN)", and it is shown to handle the VSP effectively, significantly improving both the performance and the stability of neural network training.

The following identifies the cause of the adverse effect of zero imputation, referred to as the variable sparsity problem (VSP), explains how this problem actually affects the training and inference of neural networks, and describes a new perspective, based on the VSP, for understanding phenomena that were previously not clearly explained or were misunderstood.

The following also presents sparsity normalization (SN) and explains that, in theory, SN can solve the VSP under certain conditions. It further explains that simply applying SN can effectively mitigate or solve the VSP.

2. Variable Sparsity Problem

The variable sparsity problem can be defined as follows.

VSP: a phenomenon in which the expected value of the output layer of a neural network (taken over the weight and input distributions) depends on the sparsity (the number of zero values) of the input data.

FIGS. 10 and 11 are diagrams illustrating the change in output value according to the sparsity level of the input according to an embodiment of the present invention. FIG. 10 shows a neural network fed inputs with varying sparsity levels, and FIG. 11 shows a neural network using sparsity normalization. In FIGS. 10 and 11, darker colors indicate larger absolute values, and dotted circles indicate missing nodes filled by zero imputation. Sparsity normalization can make the range of possible outputs of the network more stable with respect to the sparsity level.

Under the VSP, the activation values of a neural network can differ greatly for exactly the same input instance, depending on the number of zero entries. This makes training more difficult and can mislead the model into making incorrect predictions.
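The effect can be reproduced in a few lines. The following is a minimal sketch that is not taken from the source: it assumes a single linear layer with hypothetical constant positive weights (0.05) and features drawn with mean 1, and compares the average zero-imputed output when 10, 50 or 90 of 100 features are observed. The function names and all numeric settings are illustrative assumptions.

```python
import numpy as np


def zero_imputed_output(x_full, mask, weights):
    """Output of one linear unit when missing entries are replaced by zero."""
    return float(weights @ (x_full * mask))


def mean_output_by_observed_count(n_features=100, counts=(10, 50, 90),
                                  trials=2000, seed=0):
    """Average output for each number of observed entries (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    weights = np.full(n_features, 0.05)  # assumed positive-mean weights
    result = {}
    for k in counts:
        outputs = []
        for _ in range(trials):
            x_full = rng.normal(loc=1.0, scale=0.5, size=n_features)
            mask = np.zeros(n_features)
            # Observe exactly k randomly chosen features; the rest are missing.
            mask[rng.choice(n_features, size=k, replace=False)] = 1.0
            outputs.append(zero_imputed_output(x_full, mask, weights))
        result[k] = float(np.mean(outputs))
    return result


if __name__ == "__main__":
    for k, value in mean_output_by_observed_count().items():
        print(f"observed={k:3d}  mean output={value:.3f}")
```

In this setup the features follow the same distribution at every sparsity level, yet the average output grows roughly in proportion to the number of observed entries, which is the behaviour the VSP definition describes.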

Zero imputation is intuitive in that it drops the missing input features, but it is shown below to cause the VSP in Cases 1 to 3. Specifically, the VSP is shown under assumptions of increasing generality.

(Case 1) The activation function is the identity mapping and there is no bias.

(Case 2) The activation function is an affine function.

(Case 3) The activation function is a non-decreasing convex function, such as ReLU (Rectified Linear Unit), leaky ReLU, ELU (Exponential Linear Unit) or Softplus.

The notation is summarized here for clarity. For an $L$-layer deep network with nonlinearity $\sigma$, $W^i \in \mathbb{R}^{n_i \times n_{i-1}}$ denotes the weight matrix of the $i$-th layer, $b^i \in \mathbb{R}^{n_i}$ denotes the bias, and $h^i \in \mathbb{R}^{n_i}$ denotes the activation vector. For simplicity, $h^0 \in \mathbb{R}^{n_0}$ and $h^L \in \mathbb{R}^{n_L}$ denote the input and output layers, respectively, which gives Equation 1:

[Equation 1]

$$h^i = \sigma\left(W^i h^{i-1} + b^i\right), \quad i = 1, \dots, L$$
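The layer-by-layer recursion described above, in which each layer applies the activation to an affine map of the previous layer's activations, can be sketched as follows. This is an illustrative assumption rather than code from the source: the function name `forward`, the default ReLU activation, and the example weights are all hypothetical.

```python
import numpy as np


def forward(h0, weights, biases, sigma=lambda z: np.maximum(z, 0.0)):
    """Apply h_i = sigma(W_i @ h_{i-1} + b_i) for each layer in order.

    weights and biases are lists with one entry per layer; sigma defaults
    to ReLU here, which is only one possible choice of nonlinearity.
    """
    h = np.asarray(h0, dtype=float)
    for W, b in zip(weights, biases):
        h = sigma(W @ h + b)
    return h


if __name__ == "__main__":
    weights = [np.array([[1.0, 2.0], [0.0, -1.0]]), np.array([[1.0, 1.0]])]
    biases = [np.zeros(2), np.array([0.5])]
    print(forward([1.0, 1.0], weights, biases))
```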

To observe how the output layer $h^L$ changes as the sparsity of the input layer $h^0$ (the input x) changes, consider the following assumptions.

Assumption 1. (i) Each coordinate $h^0_l$ of the input vector is generated by the element-wise product of two random variables $\tilde{h}^0_l$ and $m_l$, where $m_l$ is a binary mask indicating a missing value and $\tilde{h}^0_l$ is the (unobserved) feature value of the input vector. The missing mask $m_l$ is MCAR (missing completely at random) and does not depend on the other mask variables or on the values $\tilde{h}^0$. All $m_l$ follow the same distribution, with expected value $\mu_m$. (ii) The elements of the matrix $W^i$ are mutually independent and identically distributed, each with expected value $\mu_w^i$. Likewise, $b^i$ and $\tilde{h}^0$ consist of i.i.d. coordinates with means $\mu_b^i$ and $\mu_x$, respectively. (iii) $\mu_w^i$ is not uniformly zero over all $i$.

(i) assumes the simplest missingness mechanism. (ii) is defined similarly to studies of well-known weight initialization techniques. (iii) may not hold under some initialization strategies, but it is very likely to hold as training progresses.

(Case 1) For simplicity, first consider a network with no nonlinearity and no bias. Theorem 1 shows that the mean value of the output layer $h^L$ is directly proportional to the expected value $\mu_m$ of the mask vector.

Theorem 1. Suppose that the activation $\sigma$ is the identity function and that $b^i$ is uniformly fixed to zero under Assumption 1. Then

$$\mathbb{E}\left[h^L_l\right] = \left(\prod_{i=1}^{L} n_{i-1}\,\mu_w^i\right)\mu_x\,\mu_m .$$
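One way to see why the output mean scales with the mask mean is to take the expectation of a single pre-activation in the first layer. The derivation below is a sketch under Assumption 1; the symbol names ($n_0$ for the input width, and $\mu_w^1$, $\mu_x$, $\mu_m$ for the means of the first-layer weights, the feature values and the mask) are notation assumed here for illustration.

```latex
\begin{aligned}
\mathbb{E}\big[(W^1 h^0)_l\big]
  &= \sum_{k=1}^{n_0} \mathbb{E}\big[W^1_{lk}\,\tilde{h}^0_k\,m_k\big] \\
  &= \sum_{k=1}^{n_0} \mathbb{E}\big[W^1_{lk}\big]\,
     \mathbb{E}\big[\tilde{h}^0_k\big]\,\mathbb{E}\big[m_k\big]
     && \text{(mutual independence)} \\
  &= n_0\,\mu_w^1\,\mu_x\,\mu_m .
\end{aligned}
```

With identity activations and zero bias, repeating the same step for each later layer contributes one further factor $n_{i-1}\mu_w^i$, so the dependence on $\mu_m$ remains linear through the whole network.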

(Case 2) When the activation function is affine and the bias may be non-zero, $\mathbb{E}[h^L_l]$ is affected by $\mu_m$ in the following way.

Theorem 2. Suppose that the activation $\sigma$ is an affine function under Assumption 1. Suppose further that $f_i(x)$ is defined as $\sigma(n_{i-1}\,\mu_w^i\,x + \mu_b^i)$. Then

$$\mathbb{E}\left[h^L_l\right] = f_L \circ \cdots \circ f_1(\mu_x\,\mu_m) .$$
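A small worked example may make the affine case concrete. All numbers below are hypothetical and chosen only for illustration: a single layer ($L = 1$) with identity activation, input width $n_0 = 50$, weight mean $\mu_w^1 = 0.1$, bias mean $\mu_b^1 = 0.2$ and feature mean $\mu_x = 1$.

```latex
\begin{aligned}
\mathbb{E}\big[h^1_l\big]
  &= n_0\,\mu_w^1\,\mu_x\,\mu_m + \mu_b^1
   = 50 \cdot 0.1 \cdot 1 \cdot \mu_m + 0.2
   = 5\,\mu_m + 0.2, \\
\mu_m = 0.2 &\;\Rightarrow\; \mathbb{E}\big[h^1_l\big] = 1.2,
\qquad
\mu_m = 0.8 \;\Rightarrow\; \mathbb{E}\big[h^1_l\big] = 4.2 .
\end{aligned}
```

Under these assumed values the same underlying feature distribution produces an expected output that is 3.5 times larger when 80% of the entries are observed than when 20% are observed.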

(Case 3) Finally, when the activation function is nonlinear but non-decreasing and convex, it can be shown that $\mathbb{E}[h^L_l]$ is lower-bounded by a quantity involving $\mu_m$.

Theorem 3. Suppose that $\sigma$ is a non-decreasing convex function under Assumption 1. Suppose further that $f_i(x)$ is defined as $\sigma(n_{i-1}\,\mu_w^i\,x + \mu_b^i)$ and that $\mu_w^i > 0$. Then

$$\mathbb{E}\left[h^L_l\right] \ge f_L \circ \cdots \circ f_1(\mu_x\,\mu_m) .$$
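The lower bound for a convex activation follows the pattern of Jensen's inequality. The sketch below shows the first two layers only; the symbols ($n_i$, $\mu_w^i$, $\mu_b^i$, $\mu_x$, $\mu_m$, $f_i$) are notation assumed here for illustration, and the second step additionally assumes that the weights of a layer are independent of the activations feeding it.

```latex
\begin{aligned}
\mathbb{E}\big[h^1_l\big]
  &= \mathbb{E}\Big[\sigma\Big(\textstyle\sum_{k} W^1_{lk} h^0_k + b^1_l\Big)\Big]
   \;\ge\; \sigma\Big(\mathbb{E}\Big[\textstyle\sum_{k} W^1_{lk} h^0_k + b^1_l\Big]\Big)
     && \text{(Jensen, $\sigma$ convex)} \\
  &= \sigma\big(n_0\,\mu_w^1\,\mu_x\,\mu_m + \mu_b^1\big) = f_1(\mu_x\,\mu_m), \\
\mathbb{E}\big[h^2_l\big]
  &\ge \sigma\big(n_1\,\mu_w^2\,\mathbb{E}[h^1_k] + \mu_b^2\big)
   \;\ge\; \sigma\big(n_1\,\mu_w^2\,f_1(\mu_x\,\mu_m) + \mu_b^2\big)
   = f_2 \circ f_1(\mu_x\,\mu_m) .
\end{aligned}
```

The last inequality is where the two side conditions are used: replacing $\mathbb{E}[h^1_k]$ by its lower bound preserves the inequality only if $\mu_w^2 > 0$ and $\sigma$ is non-decreasing.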

If the expected value of the output layer (or its lower bound) depends on the level of sparsity/missingness, as in Theorems 1 to 3, even similar data instances may produce different outputs depending on their sparsity levels, which can hinder fair and accurate inference by the model. As the graphs of FIGS. 2, 5 and 8 show, the VSP can easily occur even in practical neural network training settings in which the above conditions do not hold.

3. Sparsity Normalization

A simple but surprisingly effective method for solving the VSP is proposed below. Because the linearity of the activation simplifies the correction, (Case 2) is revisited first to find a way to make the expected output independent of the input sparsity level. Recalling the notation $h^0 = \tilde{h}^0 \odot m$ (where $\odot$ denotes the element-wise product), a simple normalization via $h^0_{SN} = (\tilde{h}^0 \odot m) \cdot K_1 / \mu_m$ for a fixed constant $K_1$ removes the dependence on input sparsity. As already noted, this simple normalization technique is named Sparsity Normalization (SN) and is described in Algorithm 1 of Table 1 below.
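The normalization step can be sketched in a few lines of NumPy. This is a hedged illustration rather than the source's Algorithm 1, which is not reproduced in the text: it assumes the per-instance form described in the summary (multiply each input vector by the fixed constant and divide by the magnitude of its mask, taken here as the number of observed entries). The function name `sparsity_normalize` and the handling of rows with no observed entry are assumptions.

```python
import numpy as np


def sparsity_normalize(inputs, masks, k):
    """Scale each zero-imputed row by k / (number of observed entries).

    inputs: array of shape (N, d) holding the feature values
    masks:  array of shape (N, d), 1 where observed and 0 where missing
    k:      fixed constant shared by all instances
    """
    inputs = np.asarray(inputs, dtype=float)
    masks = np.asarray(masks, dtype=float)
    observed = masks.sum(axis=1, keepdims=True)  # mask magnitude per instance
    # Assumption: a row with no observed entry is left as all zeros.
    scale = np.divide(k, observed, out=np.zeros_like(observed),
                      where=observed > 0)
    # Multiplying by the mask enforces zero imputation on missing entries.
    return inputs * masks * scale


if __name__ == "__main__":
    x = [[2.0, 0.0, 4.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
    m = [[1, 0, 1, 0], [1, 1, 1, 1]]
    print(sparsity_normalize(x, m, k=3.0))
```

In the example, the first row has two observed entries and is scaled by 3/2, while the fully observed second row is scaled by 3/4, so sparser inputs are scaled up relative to denser ones.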

Conceptually, this method scales each input according to its sparsity, so that the magnitude of the output becomes less sensitive to sparsity, as shown in FIG. 11. The formal statement of how SN corrects the sparsity bias in this particular case is as follows.

Theorem 4. (With sparsity normalization) Suppose that the activation $\sigma$ is an affine function under Assumption 1. Suppose further that $f_i(x)$ is defined as $\sigma(n_{i-1}\,\mu_w^i\,x + \mu_b^i)$ and that the input layer is replaced using SN, that is, $h^0_{SN} = (\tilde{h}^0 \odot m) \cdot K_1 / \mu_m$ for a fixed constant $K_1$. Then

$$\mathbb{E}\left[h^L_l\right] = f_L \circ \cdots \circ f_1(\mu_x \cdot K_1) .$$
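The claim that SN removes the dependence on the mask mean in the affine case can be checked numerically. The following Monte Carlo sketch is an illustration under assumed settings, not an experiment from the source: it uses one affine layer with identity activation, hypothetical means (weights 0.1, bias 0.2, features 1.0), a known mask mean `mu_m`, and the rescaling `h0 * k1 / mu_m`.

```python
import numpy as np


def mean_output(mu_m, normalize, k1=1.0, n0=50, n1=20, trials=4000, seed=0):
    """Average output of one random affine layer under Bernoulli(mu_m) masks."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        W = rng.normal(0.1, 0.05, size=(n1, n0))   # i.i.d. weights, mean 0.1
        b = rng.normal(0.2, 0.05, size=n1)         # i.i.d. biases, mean 0.2
        x = rng.normal(1.0, 0.3, size=n0)          # feature values, mean 1.0
        m = (rng.random(n0) < mu_m).astype(float)  # MCAR mask with mean mu_m
        h0 = x * m                                 # zero imputation
        if normalize:
            h0 = h0 * k1 / mu_m                    # SN with known mask mean
        total += float((W @ h0 + b).mean())
    return total / trials


if __name__ == "__main__":
    for normalize in (False, True):
        low = mean_output(0.2, normalize)
        high = mean_output(0.8, normalize)
        print(f"normalize={normalize}: mu_m=0.2 -> {low:.2f}, mu_m=0.8 -> {high:.2f}")
```

Under these assumptions the unnormalized averages should sit near 1.2 and 4.2 for mask means 0.2 and 0.8, while the normalized averages should both sit near 5.2, in line with the affine-case formulas.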

Unlike in Theorem 2, SN in Theorem 4 makes the mean activation independent of $\mu_m$, which determines the sparsity of the input. Showing the counterpart of (Case 3) with SN is not trivial, because the required relation does not hold in general. However, extensive experiments have shown that SN is effective in practice even in more general cases.

Theorem 4 assumes that $\mu_m$ is known and fixed across all data instances, but in practice this assumption can be relaxed so that $\mu_m$ varies between data instances. In particular, by the maximum likelihood principle, $\mu_m$ can be estimated for each instance as $\|m\|_1 / n_0$. This yields $h^0_{SN} = K \cdot h^0 / \|m\|_1$ with $K = n_0 \cdot K_1$ (see Algorithm 1). In practice, it is recommended to set $K$ to the average of $\|m\|_1$ over all examples in the training set. If $K$ is too small (e.g., $K = 1$), the dying ReLU phenomenon may occur. Because the hyperparameter $K$ can introduce a regularization effect by scaling the gradient magnitude, $K$ is defined as $K = \mathbb{E}\left[\|m\|_1\right]$, taken over the training set, so that the average scale remains constant before and after normalization and the side effects caused by SN are minimized.
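The two estimates described above can be sketched as follows. This is a minimal illustration with hypothetical function names (`estimate_mask_mean`, `choose_k`); it assumes that the magnitude of a binary mask is its number of observed entries.

```python
import numpy as np


def estimate_mask_mean(mask):
    """Per-instance estimate of the mask mean: the fraction of observed entries."""
    mask = np.asarray(mask, dtype=float)
    return float(mask.sum() / mask.size)


def choose_k(train_masks):
    """Fixed constant K: average number of observed entries over the training set."""
    train_masks = np.asarray(train_masks, dtype=float)
    return float(train_masks.sum(axis=1).mean())


if __name__ == "__main__":
    masks = [[1, 0, 1, 0], [1, 1, 1, 1], [1, 1, 1, 0]]
    print(estimate_mask_mean(masks[0]))  # fraction observed in the first instance
    print(choose_k(masks))               # average observed count across instances
```

Choosing K this way means an instance with an average number of observed entries is left at its original scale, while sparser instances are scaled up and denser ones scaled down.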

FIG. 12 is a block diagram illustrating an example of a computer device according to an embodiment of the present invention. The missing data processing system according to embodiments of the present invention may be implemented by at least one computer device 1200 or by a plurality of computer devices (each corresponding to the computer device 1200). For example, a computer program according to an embodiment may be installed and run on the computer device 1200, and the computer device 1200 may perform the missing data processing method according to embodiments of the present invention under the control of the running computer program.

As shown in FIG. 12, the computer device 1200 may include a memory 1210, a processor 1220, a communication interface 1230, and an input/output interface 1240. The memory 1210 is a computer-readable recording medium and may include random access memory (RAM), read only memory (ROM), and a permanent mass storage device such as a disk drive. A permanent mass storage device such as a ROM or a disk drive may also be included in the computer device 1200 as a separate permanent storage device distinct from the memory 1210. An operating system and at least one program code may be stored in the memory 1210. These software components may be loaded into the memory 1210 from a computer-readable recording medium separate from the memory 1210. Such a separate computer-readable recording medium may include a floppy drive, a disk, a tape, a DVD/CD-ROM drive, a memory card, and the like. In another embodiment, the software components may be loaded into the memory 1210 through the communication interface 1230 rather than from a computer-readable recording medium. For example, the software components may be loaded into the memory 1210 of the computer device 1200 based on a computer program installed from files received through the network 1260.

The processor 1220 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processor 1220 by the memory 1210 or the communication interface 1230. For example, the processor 1220 may be configured to execute received instructions according to program code stored in a recording device such as the memory 1210.

The communication interface 1230 may provide a function for the computer device 1200 to communicate with other devices (for example, the storage devices described above) through the network 1260. For example, requests, commands, data, files, and the like generated by the processor 1220 of the computer device 1200 according to program code stored in a recording device such as the memory 1210 may be transmitted to other devices through the network 1260 under the control of the communication interface 1230. Conversely, signals, commands, data, files, and the like from other devices may be received by the computer device 1200 through the communication interface 1230 via the network 1260. Signals, commands, data, and the like received through the communication interface 1230 may be passed to the processor 1220 or the memory 1210, and files and the like may be stored in a storage medium (the permanent storage device described above) that the computer device 1200 may further include.

The input/output interface 1240 may be a means for interfacing with the input/output device 1250. For example, an input device may include a microphone, a keyboard, or a mouse, and an output device may include a display or a speaker. As another example, the input/output interface 1240 may be a means for interfacing with a device in which input and output functions are integrated, such as a touch screen. The input/output device 1250 may also be configured as a single device together with the computer device 1200.

In other embodiments, the computer device 1200 may include fewer or more components than those shown in FIG. 12. However, most conventional components need not be illustrated explicitly. For example, the computer device 1200 may be implemented to include at least some of the input/output devices 1250 described above, or may further include other components such as a transceiver or a database.

FIG. 13 is a flowchart illustrating an example of a missing data processing method according to an embodiment of the present invention. The missing data processing method according to this embodiment may be performed by the computer device 1200 implementing the missing data processing system. In this case, the processor 1220 of the computer device 1200 may be implemented to execute control instructions according to the code of an operating system or the code of at least one computer program included in the memory 1210. Here, the processor 1220 may control the computer device 1200 so that the computer device 1200 performs steps 1310 to 1330 of the method of FIG. 13 according to the control instructions provided by the code stored in the computer device 1200.

In step 1310, the computer device 1200 may receive a fixed constant and a first dataset including a plurality of input vectors and a plurality of binary masks. The first dataset may correspond to the input dataset of Algorithm 1 shown in Table 1 above, and each input vector, each binary mask, and the fixed constant may correspond to the respective input vector, binary mask, and constant of that algorithm. Here, the coordinates of each of the plurality of input vectors may be generated by the element-wise product of the feature values of the input vector, as described above, and the corresponding binary mask. Each of the plurality of binary masks may be MCAR (Missing Completely At Random) and may be independent of the other binary masks and of the feature values of the corresponding input vector. In addition, each of the plurality of binary masks may be obtained so as to follow the same distribution as the average of the plurality of binary masks. The fixed constant may be calculated as the average of the magnitudes of the vectors corresponding to the respective binary masks, in accordance with the definition of the fixed constant given earlier.
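The inputs of step 1310 can be illustrated with a minimal sketch in plain Python. The function names `zero_impute` and `fixed_constant` and the sample values are hypothetical and do not appear in the specification. The sketch also assumes that the "magnitude" of a binary mask is its L1 norm, that is, the number of observed entries; the text here does not state which norm is intended.

```python
from typing import List, Sequence


def zero_impute(features: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Element-wise product of feature values and a binary mask.

    A mask entry of 1 marks an observed feature and 0 marks a missing one,
    so missing features become 0 in the resulting input vector.
    """
    if len(features) != len(mask):
        raise ValueError("features and mask must have the same length")
    return [x * m for x, m in zip(features, mask)]


def fixed_constant(masks: Sequence[Sequence[int]]) -> float:
    """Average mask magnitude over the dataset.

    Assumption of this sketch: the magnitude is the L1 norm of the mask,
    i.e. the count of observed entries.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    return sum(sum(m) for m in masks) / len(masks)


# Hypothetical toy data: two samples with four features each.
features = [[0.5, 2.0, 1.5, 3.0], [1.0, 4.0, 2.5, 0.5]]
masks = [[1, 0, 1, 1], [0, 1, 0, 0]]

inputs = [zero_impute(x, m) for x, m in zip(features, masks)]
K = fixed_constant(masks)

print(inputs)  # [[0.5, 0.0, 1.5, 3.0], [0.0, 4.0, 0.0, 0.0]]
print(K)       # 2.0
```

Here the first sample has three observed features and the second has one, so the average mask magnitude is 2.0.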

In step 1320, the computer device 1200 may apply sparsity normalization using the binary mask and the fixed constant to each of the plurality of input vectors. For example, the computer device 1200 may apply sparsity normalization to an arbitrary input vector among the plurality of input vectors by multiplying that input vector by the fixed constant and dividing the result by the magnitude of the vector corresponding to the binary mask of that input vector. This process may correspond to the normalization operation described in the algorithm of Table 1.
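The normalization of step 1320 might be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the specification: the names `sparsity_normalize` and `normalize_dataset` are hypothetical, the mask magnitude is again taken to be the L1 norm (the number of observed entries), and the handling of a mask with no observed entries is a choice made only for this sketch.

```python
from typing import List, Sequence


def sparsity_normalize(x: Sequence[float], mask: Sequence[int], k: float) -> List[float]:
    """Multiply an input vector by the fixed constant k and divide by the mask magnitude."""
    observed = sum(mask)  # assumed L1 norm of the binary mask
    if observed == 0:
        # The source does not say how to treat a fully missing sample.
        # Returning an all-zero vector is an assumption of this sketch.
        return [0.0] * len(x)
    scale = k / observed
    return [scale * v for v in x]


def normalize_dataset(
    inputs: Sequence[Sequence[float]],
    masks: Sequence[Sequence[int]],
    k: float,
) -> List[List[float]]:
    """Apply sparsity normalization to every (input vector, mask) pair."""
    return [sparsity_normalize(x, m, k) for x, m in zip(inputs, masks)]


# Hypothetical toy data: zero-imputed inputs, their masks, and a fixed constant.
inputs = [[0.5, 0.0, 1.5, 3.0], [0.0, 4.0, 0.0, 0.0]]
masks = [[1, 0, 1, 1], [0, 1, 0, 0]]
K = 2.0

normalized = normalize_dataset(inputs, masks, K)
print(normalized[1])  # [0.0, 8.0, 0.0, 0.0]
```

The second sample has a single observed entry, so it is scaled up by K / 1 = 2, while the first sample, with three observed entries, is scaled by K / 3.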

In step 1330, the computer device 1200 may provide a second dataset including the plurality of input vectors to which sparsity normalization has been applied. The second dataset may correspond to the sparsity-normalized dataset shown in Algorithm 1 of Table 1. By scaling each input according to its sparsity in this way, a sparsity-normalized dataset can be obtained in which the output magnitude is less sensitive to the sparsity of the input.
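The reduced sensitivity to sparsity can be seen in a small numerical sketch. All values below are hypothetical. The sum of the coordinates of an input vector stands in for the output of a single unit whose weights are all one, which is a simplification chosen for illustration. The mask magnitude is assumed to be the number of observed entries.

```python
# Three samples with identical feature values (all ones) but different sparsity.
d = 10
observed_counts = [2, 5, 9]
masks = [[1] * c + [0] * (d - c) for c in observed_counts]
inputs = [[1.0 * m for m in mask] for mask in masks]  # zero-imputed inputs

# Fixed constant: average number of observed entries over the dataset.
k = sum(sum(m) for m in masks) / len(masks)

# Sparsity normalization: scale each input by k / (number of observed entries).
normalized = [[k / sum(m) * v for v in x] for x, m in zip(inputs, masks)]

# Stand-in for a unit with all-one weights: the sum of the coordinates.
raw_sums = [sum(x) for x in inputs]
sn_sums = [sum(x) for x in normalized]

print(raw_sums)  # [2.0, 5.0, 9.0]
print(sn_sums)   # each value is approximately k (about 5.33)
```

Without normalization the stand-in output tracks the number of observed entries, even though the underlying feature values are the same for all three samples. After normalization the three outputs coincide up to floating-point rounding.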

As described above, according to embodiments of the present invention, missing data can be handled simply and effectively by solving the variable sparsity problem through sparsity normalization.

The system or apparatus described above may be implemented as hardware components or as a combination of hardware components and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored in one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The medium may store a computer-executable program persistently or may store it temporarily for execution or download. In addition, the medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of the system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A method for processing missing data in a computer device comprising at least one processor, the method comprising:
receiving, by the at least one processor, a fixed constant and a first dataset including a plurality of input vectors and a plurality of binary masks;
applying, by the at least one processor, sparsity normalization using the binary mask and the fixed constant to each of the plurality of input vectors; and
providing, by the at least one processor, a second dataset including the plurality of input vectors to which the sparsity normalization is applied.

The method of claim 1,
wherein applying the sparsity normalization comprises
applying the sparsity normalization to an arbitrary input vector among the plurality of input vectors by multiplying the arbitrary input vector by the fixed constant and dividing the result by the magnitude of a vector corresponding to the binary mask that corresponds to the arbitrary input vector.

The method of claim 1,
wherein the coordinates of each of the plurality of input vectors are generated by an element-wise product of the feature values of the input vector and the corresponding binary mask.

The method of claim 1,
wherein each of the plurality of binary masks is MCAR (Missing Completely At Random) and is independent of the other binary masks and of the feature values of the corresponding input vector.

The method of claim 1,
wherein each of the plurality of binary masks follows the same distribution.

The method of claim 1,
wherein the fixed constant is calculated as an average of the magnitudes of the vectors corresponding to the respective binary masks.

A computer program stored in a computer-readable recording medium that, in combination with a computer device, causes the computer device to execute the method of any one of claims 1 to 6.

A computer-readable recording medium in which a computer program for executing the method of any one of claims 1 to 6 in a computer device is recorded.

A computer device comprising:
at least one processor implemented to execute computer-readable instructions,
wherein the at least one processor:
receives a fixed constant and a first dataset including a plurality of input vectors and a plurality of binary masks;
applies sparsity normalization using the binary mask and the fixed constant to each of the plurality of input vectors; and
provides a second dataset including the plurality of input vectors to which the sparsity normalization is applied.

The computer device of claim 9,
wherein the at least one processor
applies the sparsity normalization to an arbitrary input vector among the plurality of input vectors by multiplying the arbitrary input vector by the fixed constant and dividing the result by the magnitude of a vector corresponding to the binary mask that corresponds to the arbitrary input vector.

The computer device of claim 9,
wherein the coordinates of each of the plurality of input vectors are generated by an element-wise product of the feature values of the input vector and the corresponding binary mask.

The computer device of claim 9,
wherein each of the plurality of binary masks is MCAR (Missing Completely At Random) and is independent of the other binary masks and of the feature values of the corresponding input vector.

The computer device of claim 9,
wherein each of the plurality of binary masks follows the same distribution.

The computer device of claim 9,
wherein the fixed constant is calculated as an average of the magnitudes of the vectors corresponding to the respective binary masks.