KR102093090B1

KR102093090B1 - System and method for classifying base on generative adversarial network using labeled data

Info

Publication number: KR102093090B1
Application number: KR1020200023895A
Authority: KR
Inventors: 투옌; 노철균; 민예린
Original assignee: 주식회사 애자일소다
Priority date: 2020-02-26
Filing date: 2020-02-26
Publication date: 2020-03-25

Abstract

Disclosed are a generative adversarial network based classification system using labeled data and a method thereof. By using missing imputation values generated by a generative adversarial network (GAN), the system can be trained even on a labeled data set that contains missing data and on an imbalanced data set. The system comprises a generator (100), a discriminator (200), an actor (400), and a weight function unit (500).

Description

System and method for generative adversarial network based classification using labeled data {SYSTEM AND METHOD FOR CLASSIFYING BASE ON GENERATIVE ADVERSARIAL NETWORK USING LABELED DATA}

The present invention relates to a generative adversarial network based classification system and method using labeled data, and more specifically to such a system and method that can learn even from a labeled data set containing missing data and from an imbalanced data set, by using missing imputation values generated by a generative adversarial network (GAN).

Machine learning is an application of artificial intelligence that allows complex systems to learn and improve automatically from experience without being explicitly programmed.

The accuracy and effectiveness of machine learning models can depend in part on the data used to train them.

For example, machine learning classifiers can be trained using a labeled data set, in which samples of the data that the classifier is to learn to recognize are provided to the classifier together with one or more labels identifying the classification of each sample.

Here, labeled data means data for which the answer is given (or which has been evaluated).

However, decision-making systems often suffer from the following problems.

One is the handling of poor-quality data, such as data containing missing values. Missing data lowers the overall quality of the data set and distorts the results predicted by the decision-making system.

The other is imbalance in the data set. The imbalance can be very severe: minority classes occupy only a very small portion of the data, and as a result samples of those classes are rarely queried while the decision-making system is being updated.

Korean Patent Application Publication No. 10-2019-0117969 (Title of invention: Semi-supervised reinforcement learning method using both labeled and unlabeled data, and apparatus using the same)

To solve these problems, an object of the present invention is to provide a generative adversarial network based classification system and method using labeled data that can learn even from a labeled data set containing missing data and from an imbalanced data set, by using missing imputation values generated by a generative adversarial network (GAN).

To achieve the above object, one embodiment of the present invention is a generative adversarial network based classification system using labeled data, comprising: a generator that generates missing imputation values for the missing parts of states from a labeled data set; a discriminator that distinguishes the missing imputation values generated by the generator from the original data; an actor that predicts an action through a policy using the missing imputation values generated by the generator; and a weight function unit that generates a reward weight based on the state filled with the missing imputation values, the predicted action, and the label of the labeled data set, wherein the weight function unit operates to balance the labels by increasing the reward weight for labels with a relatively low frequency and decreasing the reward weight for labels with a relatively high frequency, and the actor learns the policy so that a policy loss function is optimized, reflecting the predicted action and the reward weight generated by the weight function unit,

wherein the learning of the policy uses the following equation

[Equation image not reproduced in this text.]

(where y is the label of the state, a is the action predicted by the policy π for the given state, and $W(\hat{s}, a, y)$ is the reward weight for the state, action, and label).

In addition, a generative adversarial network based classification method using labeled data according to an embodiment of the present invention uses a generative adversarial network (GAN) composed of a generator, a discriminator, an actor, and a weight function unit, and comprises: a) generating, by the generator, missing imputation values for the missing parts of states from a labeled data set; b) predicting, by the actor, an action through a policy using the missing imputation values generated by the generator; c) generating, by the weight function unit, a reward weight value based on the state filled with the missing imputation values, the predicted action, and the label of the labeled data set; and d) learning, by the actor, the policy so that a policy loss function is optimized, reflecting the predicted action and the reward weight generated by the weight function unit,

wherein the learning of the policy uses the following equation

[Equation image not reproduced in this text.]

(where y is the label of the state, a is the action predicted by the policy π for the given state, and $W(\hat{s}, a, y)$ is the reward weight for the state, action, and label),

and wherein, in step c), the weight function unit operates to balance the labels by increasing the reward weight for labels with a relatively low frequency and decreasing the reward weight for labels with a relatively high frequency.

The present invention has the advantage that, by using missing imputation values generated by a generative adversarial network (GAN), it can learn even from a labeled data set containing missing data and from an imbalanced data set.

FIG. 1 is a block diagram showing the configuration of a generative adversarial network based classification system using labeled data according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a generative adversarial network based classification method using labeled data according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the missing data learning process of the classification method according to FIG. 2.
FIG. 4 is a flowchart showing the weighted classification learning process of the classification method according to FIG. 2.
FIG. 5 is a flowchart showing the weight estimation process of the classification method according to FIG. 2.

Hereinafter, a preferred embodiment of the generative adversarial network based classification system and method using labeled data according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

In this specification, the statement that a part "includes" a component means that it may further include other components, not that it excludes them.

In addition, terms such as "... unit", "... device", and "... module" mean a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of the two.

In addition, throughout the detailed description and claims of the present invention, 'learning' refers to performing machine learning through procedural computing in a computer system and is not intended to refer to mental activity such as human education; 'training' is used in its generally accepted machine learning sense.

In addition, a computing device includes a communication device and a processor, and can communicate directly or indirectly with external computing devices through the communication device.

Specifically, the computing device may achieve the desired system performance using a combination of typical computer hardware (for example, devices that may include a computer processor, memory, storage, input and output devices, and other components of existing computing devices; electronic communication devices such as routers and switches; and electronic information storage systems such as network-attached storage (NAS) and storage area networks (SAN)) and computer software (that is, instructions that cause the computing device to function in a particular way).

The communication device of such a computing device can exchange requests and responses with other linked computing devices. As one example, such requests and responses may be carried over the same TCP (transmission control protocol) session, but this is not limiting; for example, they may also be sent and received as UDP (user datagram protocol) datagrams.

In a broad sense, the communication device may also include a keyboard, a mouse, and other external input devices for receiving commands or instructions, as well as a printer, a display, and other external output devices.

In addition, the processor of the computing device may include hardware components such as an MPU (micro processing unit), CPU (central processing unit), GPU (graphics processing unit), NPU (neural processing unit) or TPU (tensor processing unit), cache memory, and a data bus.

FIG. 1 is a block diagram showing the configuration of a generative adversarial network based classification system using labeled data according to an embodiment of the present invention; FIG. 2 is a flowchart showing a generative adversarial network based classification method using labeled data according to an embodiment of the present invention; FIG. 3 is a flowchart showing the missing data learning process of the classification method according to FIG. 2; FIG. 4 is a flowchart showing the weighted classification learning process of the classification method according to FIG. 2; and FIG. 5 is a flowchart showing the weight estimation process of the classification method according to FIG. 2.

Referring to FIGS. 1 to 5, the generative adversarial network based classification system using labeled data comprises a generator (100), a discriminator (200), an actor (400), and a weight function unit (500).

The generator (100) and the discriminator (200) use a generative adversarial network (GAN), a network with a competitive structure: the generator (100) looks at the distribution of the original data and learns to generate missing imputation values that deceive the discriminator (200), while the discriminator (200) learns to discern which data were generated by the generator (100).

In addition, the generator (100) learns to generate missing imputation values that deceive the discriminator (200) by referring to the distribution of the original data.

In addition, within the generative adversarial network based classification system using labeled data ($S_L$), the generator (100) generates missing imputation values from the labeled data set (10).

In addition, as a pre-processing step, the generator (100) may also perform a learning process for generating missing imputation values from an unlabeled data set rather than a labeled data set.

In addition, as input for generating the missing imputation values, the generator (100) selects from the data set (10) n states and n missing indicators (20, $m_L$) indicating whether the elements of the corresponding states are missing.

Here, $S_L$ denotes that each state belongs to the labeled data set, and $m_L$ denotes the missing indicators of the labeled data.

In addition, the labeled data set (10) includes n states $S_1, S_2, S_3, \dots, S_n \in \mathbb{R}^d$, where d is the number of state features.

In addition, the j-th element of state i is written $s_i^j$, where j is an index running up to d, and $s_i^j$ holds either a scalar or a missing value.

In addition, the data set (10) may consist of at least one of labeled data and unlabeled data.

In addition, the missing indicators (20) indicate whether the elements of a state are missing and are written $m_1, m_2, m_3, \dots, m_n \in \mathbb{R}^d$. If $s_i^j$ is missing, $m_i^j$ takes the missing indicator value (22) '0'; otherwise it takes the missing indicator value (21) '1'.
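As an illustration of the missing indicator, the following minimal Python sketch builds m for each state. It assumes, purely for illustration, that a missing element is stored as `None`; the function name `missing_indicator` and the sample values are hypothetical and do not come from the specification.

```python
from typing import List, Optional

# Assumption of this sketch: a missing element is stored as None.
State = List[Optional[float]]


def missing_indicator(state: State) -> List[int]:
    """Return m for one state: 0 where the element is missing, 1 otherwise."""
    return [0 if value is None else 1 for value in state]


# Two hypothetical states with d = 3 features each.
states: List[State] = [[0.2, None, 0.7], [None, 0.5, 0.1]]
masks = [missing_indicator(s) for s in states]
```

Each entry of `masks` has the same length d as its state, so it can be paired element by element with the state in the later steps.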

In addition, the generator (100) receives and processes the missing imputation value $\tilde{s}$, in which missing elements (12), selected at random among arbitrary elements (11) of the n states, are replaced with a preset value, for example random noise 'Z' drawn from a uniform distribution between '0' and '1'.

The missing imputation value $\tilde{s}$ is received as input according to the following equation.

$$\tilde{s} = m \odot s + (1 - m) \odot z$$

Here, m is the vector of missing indicators corresponding to state s, z is a vector of noise randomly drawn from a uniform distribution between '0' and '1', and $\odot$ denotes the element-wise product.
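The noise replacement described above can be sketched as follows. Because a missing element has no numeric value, the sketch selects per element instead of multiplying, which gives the same result as an element-wise combination of s and z under the indicator m (0 for missing, 1 for observed). The helper name `fill_with_noise`, the fixed seed, and the sample state are assumptions made only for this example.

```python
import random
from typing import List, Optional


def fill_with_noise(state: List[Optional[float]], mask: List[int],
                    rng: random.Random) -> List[float]:
    """Keep observed elements (m = 1); put uniform noise where m = 0."""
    filled = []
    for value, m in zip(state, mask):
        if m == 1:
            filled.append(float(value))
        else:
            filled.append(rng.random())  # z drawn uniformly from [0, 1)
    return filled


rng = random.Random(0)  # fixed seed so the sketch is repeatable
s = [0.2, None, 0.7]    # hypothetical state; None marks a missing element
m = [1, 0, 1]           # its missing indicator
s_tilde = fill_with_noise(s, m, rng)
```

Observed elements pass through unchanged, so only the positions with m = 0 carry noise into the generator.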

In addition, the generator (100) outputs a state $\bar{s}$, a vector of elements generated using the missing imputation value $\tilde{s}$.

In addition, the generator (100) generates and outputs the missing imputation value $\hat{s}$, in which the missing elements are replaced with those of the state $\bar{s}$.

Through the output of the generator (100), the data corresponding to the missing imputation value $\hat{s}$, which follows the equation below, can be used for training the discriminator (200).

$$\hat{s} = m \odot s + (1 - m) \odot \bar{s}$$

Here, m is the vector of missing indicators corresponding to state s.
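The combination of observed and generated elements can be sketched as below. This is a minimal illustration under the same assumptions as before (missing elements stored as `None`, indicator 0 for missing and 1 for observed); the function name `impute` and the generator output values are hypothetical.

```python
from typing import List, Optional


def impute(state: List[Optional[float]], mask: List[int],
           generated: List[float]) -> List[float]:
    """Keep observed elements (m = 1); take the generator's value where m = 0."""
    return [g if m == 0 else float(s)
            for s, m, g in zip(state, mask, generated)]


s = [0.2, None, 0.7]        # hypothetical state with one missing element
m = [1, 0, 1]               # its missing indicator
s_bar = [0.25, 0.41, 0.66]  # hypothetical generator output for every element
s_hat = impute(s, m, s_bar)
```

Only the generator's value at the missing position is used; its outputs at observed positions are discarded in the imputed state.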

The discriminator (200) distinguishes the missing imputation value $\hat{s}$ generated by the generator (100) from the original data. It determines for each element whether that element is missing (fake) or not (real), and consequently m can be used as the label for $\hat{s}$.

In addition, through a function $D: S \to [0, 1]^d$, the discriminator (200) may express its i-th output as the probability that the i-th element of state $\hat{s}$ is not missing data.

In addition, the discriminator (200) may express its outputs as $D_1, D_2, D_3, \dots, D_d$ through the discriminator output indicator (30).
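One possible way to use m as the label for the discriminator outputs is a per-element binary cross-entropy, sketched below. This is an assumption for illustration, not a formula taken from the specification: it follows the statement above that each output is the probability that the element is not missing, so the target is 1 for observed elements and 0 for missing ones. The function name `discriminator_bce` and the sample outputs are hypothetical.

```python
import math
from typing import List


def discriminator_bce(d_out: List[float], mask: List[int],
                      eps: float = 1e-8) -> float:
    """Mean binary cross-entropy of the outputs D_i against the indicator m_i."""
    total = 0.0
    for d, m in zip(d_out, mask):
        # m = 1 rewards a high D_i; m = 0 rewards a low D_i.
        total += m * math.log(d + eps) + (1 - m) * math.log(1.0 - d + eps)
    return -total / len(mask)


m = [1, 0, 1]
loss_good = discriminator_bce([0.9, 0.1, 0.9], m)  # outputs agree with m
loss_bad = discriminator_bce([0.1, 0.9, 0.1], m)   # outputs contradict m
```

Under this reading, outputs that agree with the indicator give a smaller loss than outputs that contradict it.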

Meanwhile, the generator (100) and the discriminator (200) may be trained through loss functions. The generator loss function for training the generator (100) may be as follows.

[Equation image not reproduced in this text.]

Here, the generator loss function may consist of two terms. The first term maximizes the probability $D^i$ for the missing data.

The second term is a reconstruction loss (40) that uses the original data distribution to bring the missing data generated by the generator (100) close to the original data, where λ is a scale factor.

In addition, the discriminator loss function for training the discriminator (200) may be as follows.

[Equation image not reproduced in this text.]

The discriminator loss function may be configured so that, if the i-th element is missing data, training proceeds in the direction of maximizing the probability $D^i$, and otherwise in the direction of minimizing the probability $D^i$.

The actor (400) uses a policy to predict, from the vector of missing imputation values generated by the generator (100) for the labeled data set, the probability of performing an action.

In addition, the actor (400) may be a component of the 'Actor-critic' architecture, a well-known decision-making framework in reinforcement learning.

In addition, the actor (400) receives a state as input and outputs the probability of taking a given action. To learn the policy π using 'Actor-critic', the policy loss function (41) may be defined as follows.

[Equation image not reproduced in this text.]

Here, the critic term of this loss is a function, evaluated by the critic, that determines whether the action predicted in the given state is good or bad.

This function may also take a form such as 'total discounted reward', 'action-value function', or 'TD-error'.

The policy loss function (41) described above is a general form in which the action is not determined, so the actor (400) must learn from both correct and incorrect actions.

However, if the estimate of the critic function is poor, the policy loss function is optimized in the wrong direction and, as a result, may converge slowly or diverge.

Therefore, the actor (400) according to an embodiment of the present invention may define the policy loss function (41) as follows, omitting the case of learning from incorrect actions and using only the given correct labels.

[Equation image not reproduced in this text.]

Here, y is the label of the state, a is the action predicted by the policy π for the given state, and $W(\hat{s}, a, y)$ is the reward weight for the state, action, and label.

That is, the predicted action is replaced with the correct label, and the critic function is replaced with the weight function W.

Therefore, the supervised policy loss $L_L$ is the classification loss weighted by the weight function $W(\hat{s}, a, y)$.

In addition, when the weight function is '1' for all states, actions, and labels, $L_L$ becomes exactly equal to the unweighted classification loss.

In addition, the actor (400) may learn the supervised policy with the policy loss function (41) for supervised classification, using the reward weight values generated by the weight function unit (500).
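A weighted classification loss of the kind described above can be sketched as a cross-entropy in which each sample's term is multiplied by its reward weight. The exact form of the supervised policy loss is not shown in this text, so the function below is only an assumed illustration; the name `weighted_classification_loss`, the probabilities, and the weights are hypothetical.

```python
import math
from typing import List


def weighted_classification_loss(probs: List[List[float]], labels: List[int],
                                 weights: List[float],
                                 eps: float = 1e-12) -> float:
    """Mean cross-entropy where sample i contributes -w_i * log(pi(y_i))."""
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        # p[y] is the policy's probability for the correct label y.
        total += -w * math.log(p[y] + eps)
    return total / len(labels)


probs = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical policy outputs per sample
labels = [0, 1]                   # correct labels
plain = weighted_classification_loss(probs, labels, [1.0, 1.0])
weighted = weighted_classification_loss(probs, labels, [1.0, 2.0])
```

With all weights equal to 1 the function reduces to the ordinary cross-entropy, which matches the remark above about the weight function being '1'; a larger weight on a sample increases that sample's share of the loss.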

The weight function unit (500) generates, as the weight of the reward obtainable from state $\hat{s}$, a reward weight for the state, action, and label based on the frequency of the labels in the labeled data set.

Here, assuming that the weight function unit (500) has a labeled data set $S_L$ with K labels (k = 0, 1, …, K-1), the frequency of the k-th label can be approximated by the following equation.

$$\varphi_k = \frac{n_k}{\sum_{j=0}^{K-1} n_j}$$

Here, $n_k$ is the number of samples with the k-th label, and $\varphi_k$ lies in the range (0, 1).

In addition, the weighting coefficient $\omega_k$ can be estimated for each label by the following equation.

[Equation image not reproduced in this text.]

Here, b is the base of the logarithm (b = e, 10, …).

Therefore, by giving a higher reward weight to minority labels, whose frequency is relatively low, and a lower reward weight to majority labels, whose frequency is relatively high, the weights can be generated so that the labels are balanced.
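The weighting formula itself is not shown in this text. The sketch below therefore assumes one form that is consistent with the description (a logarithm with base b, and a higher weight for rarer labels): the weight of label k is taken as 1 minus the base-b logarithm of its relative frequency $n_k / n$. Both this form and the name `label_weights` are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List


def label_weights(labels: List[int], base: float = math.e) -> Dict[int, float]:
    """Assumed form: weight_k = 1 - log_base(n_k / n)."""
    counts = Counter(labels)
    n = len(labels)
    weights = {}
    for k, n_k in counts.items():
        freq = n_k / n                           # relative frequency of label k
        weights[k] = 1.0 - math.log(freq, base)  # rarer label -> larger weight
    return weights


# Hypothetical imbalanced data set: 90 samples of label 0, 10 of label 1.
labels = [0] * 90 + [1] * 10
w = label_weights(labels, base=10)
```

In this example the minority label 1 receives a larger weight than the majority label 0, which is the balancing behaviour the paragraph above describes.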

In addition, the weight function unit (500) may define the weight function, that is, the reward weight for the state, action, and label, by the following equation.

[Equation image not reproduced in this text.]

Here, the reward term is the reward obtainable from state $\hat{s}$, a is the action predicted by the policy π for the given state, y is the label of the state, and $\omega_y$ and $\omega_a$ are weighting coefficients obtained from the weighting-coefficient equation for $\omega_k$ (with logarithm base b = e, 10, …).

Next, a generative adversarial network based classification and learning method using labeled data according to an embodiment of the present invention is described.

The learning procedure can be divided into two steps: generating missing imputation values (S100) and generating a learning policy (S200).

Each of steps S100 and S200 can be updated by iterating over multiple epochs of the labeled data set, where one pass through the data set is called one epoch.

In addition, a generative adversarial network (GAN) composed of the generator (100), the discriminator (200), the actor (400), and the weight function unit (500) may be used.

First, step S100 of generating missing imputation values trains the generator (100) and the discriminator (200). In each iteration, it performs a step (S110) of randomly selecting n states from the data set to be input to the generator (100), and a step (S120) of selecting n missing indicators (m) indicating whether the elements of the corresponding states are missing.

Here, the data for steps S110 and S120 may be provided from an external terminal or from a preset data set.

In addition, in steps S110 and S120, the data set may consist of at least one of labeled data and unlabeled data.

When a vector in which the missing elements of the n states are replaced with a preset value, for example random noise 'Z' (where Z ∈ [0, 1]) drawn from a uniform distribution between '0' and '1', is selected (S130) and input to the generator (100), the generator (100) computes (S140) the missing imputation value $\tilde{s}$, the state $\bar{s}$, and the missing imputation value $\hat{s}$.

Here, $\tilde{s}$ is the missing imputation value in which the missing elements are replaced with noise 'Z', $\bar{s}$ is the state generated by the generator (100), and $\hat{s}$ is the missing imputation value in which the missing elements are replaced with values generated by the generator.

In step S140, the generator (100) receives and processes the vector $\tilde{s}$ of missing imputation values replaced with random noise 'Z', which is received as input according to the following equation.

$$\tilde{s} = m \odot s + (1 - m) \odot z$$

In addition, the generator (100) computes $\bar{s} \in \mathbb{R}^d$ through $\bar{s} = G(\tilde{s})$ to generate the state $\bar{s}$.

In addition, the generator (100) computes the missing imputation value $\hat{s}$, the vector in which the missing elements are replaced with those of the generated state $\bar{s}$, which can be calculated by the following equation.

$$\hat{s} = m \odot s + (1 - m) \odot \bar{s}$$

In addition, the missing imputation value $\hat{s}$ generated by the generator (100) is provided to the discriminator (200), and the discriminator (200) is trained (S150) using the discriminator loss function.

In addition, the missing imputation value $\hat{s}$ generated by the generator (100) is used to train (S160) the generator (100) using the generator loss function.

Meanwhile, the 'Adam optimizer', which adaptively adjusts the update rate for each parameter, may be used to train all components.
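For reference, a single Adam update for one scalar parameter can be sketched as below, following the commonly published form of the algorithm with its usual default hyperparameters. The quadratic objective, the learning rate of 0.05, and the name `adam_step` are hypothetical choices for this example and are not taken from the specification.

```python
import math
from typing import Tuple


def adam_step(theta: float, grad: float, m: float, v: float, t: int,
              lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
```

Because the step is divided by the running root-mean-square of the gradient, each parameter effectively gets its own step size, which is the per-parameter adjustment the paragraph above refers to.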

In the step of generating the learning policy (S200), at each iteration, n states and n missing indicators ($m_L$), which indicate whether each element of the corresponding state is missing, are selected at random from the labeled data set ($S_L$) (S210).
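The selection in step S210 might look like the following sketch. It assumes that a missing element is stored as `None` and that the indicator is 1 for an observed element and 0 for a missing one; the function name `sample_labeled_batch` and the toy data are hypothetical.

```python
import random

def sample_labeled_batch(states, labels, n, rng):
    """Draw n labeled states at random and derive their missing indicators."""
    indices = rng.sample(range(len(states)), n)
    batch_states = [states[i] for i in indices]
    batch_labels = [labels[i] for i in indices]
    # Assumed convention: 1 = element observed, 0 = element missing (None).
    batch_masks = [[0 if x is None else 1 for x in s] for s in batch_states]
    return batch_states, batch_masks, batch_labels

rng = random.Random(42)
states = [[0.1, None, 0.7], [None, 0.4, 0.2], [0.9, 0.8, None], [0.3, 0.5, 0.6]]
labels = [0, 1, 0, 1]
batch_states, batch_masks, batch_labels = sample_labeled_batch(states, labels, 2, rng)
```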

Subsequently, a vector in which the missing elements of the n states are replaced with a preset value, for example random noise Z (where Z ∈ [0, 1]) drawn from a uniform distribution between 0 and 1, is selected (S220) and input to the generator 100, whereupon the generator 100 calculates the missing replacement value $\tilde{S}_L$, the state $\bar{S}_L$, and the missing replacement value $\hat{S}_L$ (S230).

Here, $\tilde{S}_L$ is the missing replacement value in which the missing elements are replaced by the noise Z, $\bar{S}_L$ is the state generated by the generator 100, and $\hat{S}_L$ is the missing replacement value in which the missing elements are replaced by the values generated by the generator.

In step S230, the generator 100 receives as input, and performs its calculation on, the vector of missing replacement values $\tilde{S}_L$ replaced by the random noise Z, which is supplied according to the following equation.

The generator 100 also generates the state $\bar{S}_L$ by calculating $\bar{S}_L \in \mathbb{R}^d$ through $\bar{S}_L = G(\tilde{S}_L)$.

The generator 100 further calculates the missing replacement value $\hat{S}_L$, a vector of missing replacement values in which the missing elements are replaced by the generated state $\bar{S}_L$; it can be calculated by the following equation.
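Collecting the definitions for the labeled case, the three vectors might be written as follows. This is a reconstruction under assumptions rather than the specification's own equations: $\odot$ denotes an element-wise product, $s_L$ is a state sampled from the labeled data set, and $m_L$ is assumed to be 1 for an observed element and 0 for a missing one.

```latex
\begin{align}
\tilde{S}_L &= m_L \odot s_L + (1 - m_L) \odot Z, \qquad Z \in [0, 1] \\
\bar{S}_L   &= G(\tilde{S}_L), \qquad \bar{S}_L \in \mathbb{R}^d \\
\hat{S}_L   &= m_L \odot s_L + (1 - m_L) \odot \bar{S}_L
\end{align}
```

Under this convention the observed elements of the state are never overwritten; noise and generated values appear only at the missing positions.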

Subsequently, the actor 400 predicts, through the policy π, the probability of performing an action given the generated missing replacement value $\hat{S}_L$ (S240).
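The prediction in step S240 can be pictured as a policy that maps an imputed state to a probability for each action. The sketch below assumes a single linear layer followed by a softmax; the specification does not state the actor's architecture, and the parameter values are arbitrary illustrative numbers.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_probabilities(s_hat, weights, biases):
    """Hypothetical linear policy: one score per action, then softmax."""
    logits = [sum(w * x for w, x in zip(row, s_hat)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

s_hat = [0.2, 0.6, 0.9]                          # imputed state with no missing elements
weights = [[1.0, -1.0, 0.5], [-0.5, 2.0, 1.0]]   # one row per action (illustrative)
biases = [0.0, 0.1]

probs = policy_probabilities(s_hat, weights, biases)
action = max(range(len(probs)), key=probs.__getitem__)  # most probable action
```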

At this time, the weight function unit 500 uses a weight function to generate the weight of the reward for the state, action, and label according to the following equation (S250).

In step S250, the weight function unit 500 may reflect, as the weight of the reward obtainable from a state, a weight of the reward for the state, action, and label that is based on the frequency of each label in the labeled data set.

Here, the label frequency can be approximated by the following equation.
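One way to picture frequency-based reward weights is sketched below. The label frequency is approximated as the share of samples carrying each label, and the weight is taken as a normalized inverse frequency. Both choices are assumptions made for illustration: the specification's own equations are not shown here, and the only property relied on is that rarer labels receive larger weights.

```python
from collections import Counter

def label_frequencies(labels):
    """Approximate each label's frequency as its share of the labeled samples."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def reward_weights(freqs):
    """Hypothetical weighting: inverse frequency, scaled so the weights average to 1."""
    inverse = {label: 1.0 / f for label, f in freqs.items()}
    mean_inverse = sum(inverse.values()) / len(inverse)
    return {label: value / mean_inverse for label, value in inverse.items()}

labels = [0] * 90 + [1] * 10          # imbalanced toy data set
freqs = label_frequencies(labels)     # label 0 is frequent, label 1 is rare
weights = reward_weights(freqs)       # the rare label gets the larger weight
```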

Subsequently, training is performed (S260) with the weight generated in step S250, through the supervised policy loss function 41 using the following equation.

Here, the weight term in the equation is the weight of the reward for the state, action, and label.
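A weighted policy loss of this kind can be sketched as a cross-entropy in which each sample's term is scaled by its reward weight. The exact form of loss function 41 is not reproduced here, so the averaging, the per-sample weight list, and the name `weighted_policy_loss` are assumptions.

```python
import math

def weighted_policy_loss(probs, labels, sample_weights):
    """Mean of -w_i * log pi(y_i | s_i) over the batch (assumed form)."""
    total = 0.0
    for p, y, w in zip(probs, labels, sample_weights):
        total += -w * math.log(p[y])
    return total / len(labels)

probs = [[0.5, 0.5], [0.25, 0.75]]   # policy output per sample
labels = [0, 1]                      # true labels
sample_weights = [1.0, 2.0]          # e.g. the second sample has a rarer label

loss = weighted_policy_loss(probs, labels, sample_weights)
plain = weighted_policy_loss(probs, labels, [1.0, 1.0])  # unweighted cross-entropy
```

Raising the weight of a sample increases its share of the loss, which is how rare labels can be given more influence during training.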

Therefore, by using the missing replacement values generated by the generative adversarial network (GAN), training is possible even on a labeled data set that contains missing data and on an imbalanced data set.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that it can be variously modified and changed without departing from the spirit and scope of the invention as set forth in the claims below.

In addition, the reference numerals recited in the claims are provided only for clarity and convenience of description and are not limiting. In describing the embodiments, the thickness of lines and the size of components shown in the drawings may be exaggerated for clarity and convenience.

The terms used above are defined in consideration of their functions in the present invention and may vary according to the intention or custom of a user or operator; they should therefore be interpreted on the basis of the content of this specification as a whole.

Furthermore, even where not explicitly shown or described, it is evident that a person of ordinary skill in the art to which the invention pertains can, from the description herein, make modifications in various forms that embody the technical idea of the invention, and such modifications still fall within the scope of the invention.

The embodiments described above with reference to the accompanying drawings are provided for the purpose of explaining the invention, and the scope of the invention is not limited to these embodiments.

10: data set
11: element
12: missing element
20: missing indicator
21, 22: missing indicator values
30: discriminator output indicator
40: loss function
41: policy loss function
100: generator
200: discriminator
400: actor
500: weight function unit

Claims

A generative adversarial network based classification system using labeled data, comprising:
a generator 100 for generating a missing replacement value for a missing portion of a state from a labeled data set 10;
a discriminator 200 for distinguishing the missing replacement value generated by the generator 100 from the original data;
an actor 400 for predicting an action through a policy using the missing replacement value generated by the generator 100; and
a weight function unit 500 for generating a weight of a reward based on the state replaced with the missing replacement value, the predicted action, and the label of the labeled data set,
wherein the weight function unit 500 operates so that the weight of the reward increases for a label with a relatively low frequency and decreases for a label with a relatively high frequency, thereby balancing the labels,
wherein the actor 400 learns the policy so as to optimize the policy loss function 41 by reflecting the predicted action and the weight of the reward generated by the weight function unit 500, and
wherein the policy is learned using the following equation,

where y is the label of the state, a is the action predicted by the policy π for a given state, and the weight term is the weight of the reward for the state, action, and label.

A classification method using labeled data with a generative adversarial network (GAN) composed of a generator 100, a discriminator 200, an actor 400, and a weight function unit 500, the method comprising:
a) generating, by the generator 100, a missing replacement value for a missing portion of a state from a labeled data set 10;
b) predicting, by the actor 400, an action through a policy using the missing replacement value generated by the generator 100;
c) generating, by the weight function unit 500, a weight of a reward based on the state replaced with the missing replacement value, the predicted action, and the label of the labeled data set; and
d) learning, by the actor 400, the policy so that the policy loss function 41 is optimized, by reflecting the predicted action and the weight of the reward generated by the weight function unit 500,
wherein the policy is learned using the following equation,

in which the weight term is the weight of the reward for the state, action, and label, and
wherein, in step c), the weight function unit 500 operates so that the weight of the reward increases for a label with a relatively low frequency and decreases for a label with a relatively high frequency, thereby balancing the labels.