KR20190101043A

KR20190101043A - A joint learning framework for active feature acquisition and classification

Info

Publication number: KR20190101043A
Application number: KR1020180020952A
Authority: KR
Inventors: 양은호; 심하진
Original assignee: 한국과학기술원
Priority date: 2018-02-22
Filing date: 2018-02-22
Publication date: 2019-08-30
Also published as: KR102106684B1

Abstract

According to one embodiment of the present invention, a method for providing a joint learning framework, performed by a computer-implemented framework system, may comprise the steps of: formulating a framework relating classification loss and acquisition cost for acquired features as an RL agent acquires at least one feature for each data point; jointly training the RL agent and a classifier through the framework; and inferring feature acquisition for a new data point once the RL agent and the classifier have been jointly trained.

Description

Joint Learning Framework for Active Feature Acquisition and Classification {A JOINT LEARNING FRAMEWORK FOR ACTIVE FEATURE ACQUISITION AND CLASSIFICATION}

The following description relates to a framework for joint learning.

Deep learning has grown rapidly in recent years, largely because vast amounts of data are easily accessible over the Internet. It has substantially advanced the standard algorithms for a variety of tasks, such as visual recognition and machine translation, to name a few. A basic assumption behind training accurate deep networks is that data is readily available at very low or no cost, so that the model can make a prediction after observing all available features. However, information acquisition is often not only affected by the model (and vice versa) but also incurs cost. Consider, for example, the task of diagnosing a patient with a disease. A doctor begins the diagnosis with only the few symptoms the patient initially reports. Until the doctor is sufficiently confident in the final diagnosis, the doctor asks about other symptoms or performs medical examinations to narrow down the set of diseases the patient may have. Acquiring all features through such medical tests places a financial burden on patients and, more seriously, can increase the risk of not receiving the appropriate treatment at the appropriate time. Moreover, collecting irrelevant features only adds noise and makes the prediction unstable.

Accordingly, there is a need for a joint learning framework that actively acquires features and performs classification.

A joint learning framework for active feature acquisition and classification can be provided. For dynamic feature acquisition in classification algorithms, a framework can be provided that jointly trains an RL agent and a classifier simultaneously, in an end-to-end manner.

A method for providing a joint learning framework, performed by a computer-implemented framework system, may include: formulating a framework relating classification loss and acquisition cost for acquired features as an RL agent acquires at least one feature for each data point; jointly training the RL agent and a classifier through the framework; and inferring feature acquisition for a new data point once the RL agent and the classifier have been jointly trained.

Formulating the framework relating classification loss and acquisition cost for the acquired features may include actively acquiring features for the data point in a predetermined order.

Formulating the framework relating classification loss and acquisition cost for the acquired features may include exploring the features acquired up to a given time according to the subset of features that the RL agent selects for each data point, and acquiring a feature by performing an action that selects some of the explored features.

Formulating the framework relating classification loss and acquisition cost for the acquired features may include formulating the framework so as to train a model that simultaneously minimizes the classification loss and the acquisition cost for the acquired features.

Formulating the framework relating classification loss and acquisition cost for the acquired features may include configuring a framework that measures whether information above a predetermined criterion has been provided in order to predict the reward given by the environment for the acquired features, and that provides the RL agent with a reward according to whether classification is performed at each data point.

Formulating the framework relating classification loss and acquisition cost for the acquired features may include searching for a policy by approximating a state-action value function through deep learning applied to the RL agent.

Jointly training the RL agent and the classifier through the framework may include sharing features or hidden features as the RL agent and the classifier are jointly trained, and encoding the subset of features or hidden features shared by the RL agent and the classifier.

Jointly training the RL agent and the classifier through the framework may include generating an episode for each data point according to a policy determined by the RL agent and the state-action value function, selecting the subset of features used for classification in the episode, and, as actions are taken, returning to the agent the feature acquisition cost and the acquired feature value for each acquired feature until the agent selects the stop action.

Jointly training the RL agent and the classifier through the framework may include evaluating, at the classifier, the quality of at least one feature selected from the acquired features, and giving a reward to the RL agent based on the evaluated quality.

A computer program stored on a computer-readable recording medium, for executing a method for providing a joint learning framework performed by a computer-implemented framework system, may include: formulating a framework relating classification loss and acquisition cost for acquired features as an RL agent acquires at least one feature for each data point; jointly training the RL agent and a classifier through the framework; and inferring feature acquisition for a new data point once the RL agent and the classifier have been jointly trained.

Formulating the framework relating classification loss and acquisition cost for the acquired features may include actively acquiring features for the data point in a predetermined order, exploring the features acquired up to a given time according to the subset of features that the RL agent selects for each data point, and acquiring a feature by performing an action that selects some of the explored features.

Formulating the framework relating classification loss and acquisition cost for the acquired features may include configuring the framework so as to train a model that simultaneously minimizes the classification loss and the acquisition cost for the acquired features, measuring whether information above a predetermined criterion has been provided in order to predict the reward given by the environment for the acquired features, providing the RL agent with a reward according to whether classification is performed at each data point, and searching for a policy by approximating a state-action value function through deep learning applied to the RL agent.

Jointly training the RL agent and the classifier through the framework may include generating an episode for each data point according to a policy determined by the RL agent and the state-action value function, selecting the subset of features used for classification in the episode, and, as actions are taken, returning to the agent the feature acquisition cost and the acquired feature value for each acquired feature until the agent selects the stop action.

Through dynamic feature acquisition, a framework system according to an embodiment can determine what information is needed next based on the information it currently holds, and can request and acquire only as much information as needed in a cost-effective manner.

A framework system according to an embodiment can minimize classification loss and acquisition cost through the joint learning framework.

FIG. 1 is a diagram illustrating the framework proposed by a framework system according to an embodiment.
FIG. 2 is an example illustrating the effect of sharing in the framework proposed by a framework system according to an embodiment.
FIG. 3 is an example showing the CUBE data set in a framework system according to an embodiment.
FIG. 4 is a flowchart illustrating a method of providing a joint learning framework in a framework system according to an embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

In the embodiments below, an optimization problem can be formulated with a regularizer based on the feature acquisition cost. The process of deciding which unknown features should be discovered can be repeated sequentially until sufficient but non-redundant features have been collected. For each feature acquisition, a predefined feature acquisition cost is paid, and a reward or penalty is received according to the final prediction. To solve the proposed optimization problem systematically, a sequential feature acquisition framework can be provided that has a classifier for prediction and an RL agent for feature acquisition. In the framework, the classifier can be understood as an estimated environment for the RL agent, which is intuitive in that the reward for the RL agent should be based on how confident the classifier is in its final decision. Given a new data point with missing entries, the RL agent selects features sequentially based on the history. When the RL agent decides to stop acquiring features, the classifier makes a prediction based on the features the RL agent has acquired so far. At the same time, the final reward is set from the classifier's prediction, signaling to the RL agent whether the current feature subset is adequate for prediction.

FIG. 1 is a diagram illustrating the framework proposed by a framework system according to an embodiment. The framework system may construct a reinforcement learning framework for sequential feature acquisition.

Consider a standard K-class classification problem in which the framework system learns a function $f_\theta$ that maps a data point $x \in \mathbb{R}^p$ with p features to a label $y \in \{1, \dots, K\}$. The basic assumption here is that the feature vector has a fixed dimension and is given in full, although missing entries may exist.

The framework system may actively acquire features for each data point $x^{(i)}$ in a predetermined order (for example, sequentially). In particular, it starts from an empty acquired set $O_0 = \emptyset$ at t = 0. At every time step t, it selects a subset $S_t^{(i)} \subseteq \{1, \dots, p\} \setminus O_{t-1}^{(i)}$ of the features not yet selected and examines the values of the missing entries $x_{S_t}^{(i)}$ at cost $c_{S_t}^{(i)}$. After the examination at time t, the values in $O_t^{(i)} = S_t^{(i)} \cup O_{t-1}^{(i)}$ become accessible. Features are acquired repeatedly up to time $T^{(i)}$ (which is not necessarily the same for all data points i = 1, ..., n), and $x^{(i)}$ is classified given only the observed subset of features $O_{T^{(i)}}^{(i)}$. The order of feature acquisition and the corresponding cost vary from sample to sample, but the sample index i is omitted when it is clear from context.
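The bookkeeping in this sequential acquisition process can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `acquire`, the fixed acquisition `order`, and the example values are all assumptions introduced here.

```python
def acquire(x, costs, order):
    """Reveal features of x one at a time and track the observed set O_t.

    x      : full feature vector (hidden from the agent until acquired)
    costs  : per-feature acquisition cost
    order  : hypothetical sequence of feature indices chosen by some policy
    """
    observed = {}        # O_t as a mapping: feature index -> revealed value
    total_cost = 0.0
    for j in order:
        if j in observed:        # a feature is paid for at most once
            continue
        observed[j] = x[j]       # examine the missing entry
        total_cost += costs[j]   # pay its acquisition cost
    return observed, total_cost

obs, cost = acquire(x=[0.2, 0.0, 1.5, 0.7],
                    costs=[1.0, 2.0, 0.5, 1.0],
                    order=[2, 0, 2])
```

Here only features 2 and 0 are revealed, so classification would have to rely on those two values alone, and the episode has paid the sum of their two costs.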

To train a model that simultaneously minimizes classification loss and acquisition cost, the framework system may formulate the framework as the following optimization problem.

Equation 1: (the equation image is not reproduced in this text)

Here, $\mathcal{L}$ is a predefined loss function, and $z^{(i)} \in \{0, 1\}^p$ indicates whether each feature has been acquired by the end (that is, at time $T^{(i)}$) when sequential selection is performed by the policy $\pi$. The classifier $f_\theta$ can access only the available features, namely those with $z_j = 1$. In the framework of Equation 1, the optimization parameters of the classifier 110 ($\theta$) and the selection policy ($\pi$) can be obtained by different methods. Solving Equation 1 with respect to $\pi$ can be achieved, as shown in FIG. 1, through a deliberate reinforcement learning design in which the reward for the RL (reinforcement learning) agent 120 is based on a particular $\theta$.
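The body of Equation 1 does not appear in this text. One form consistent with the surrounding definitions is sketched below as an assumption, not as the original equation: the trade-off weight $\lambda$ and the cost vector $c$ are notation introduced here for illustration.

```latex
\min_{\theta,\,\pi} \; \frac{1}{n} \sum_{i=1}^{n}
  \Big[ \mathcal{L}\big(f_\theta(x^{(i)}, z^{(i)}),\, y^{(i)}\big)
        + \lambda\, c^{\top} z^{(i)} \Big]
```

Under this reading, the first term is the classification loss on the acquired features, and the second term is the total acquisition cost of the features marked in $z^{(i)}$, which plays the role of the cost-based regularizer.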

Referring to FIG. 1, a reinforcement learning framework for sequential feature acquisition is illustrated. Each episode corresponds to selecting a subset of the features used for classification. The RL agent takes an action to choose which information (or feature) to obtain, and until the RL agent selects the stop action, the environment returns to the RL agent the value of the acquired feature together with the feature acquisition cost. The environment then evaluates the quality of the selected features based on the classifier $f_\theta$ and gives the RL agent a reward.

Specifically, the structure of the RL agent for solving Equation 1 with respect to the policy $\pi$ is described below in terms of state, action, reward and environment, and policy.

State: Because the informative features differ from class to class, the subset of features the RL agent should select differs from data point to data point. Without any prior information about the true class, the importance of the missing features has to be estimated from the currently available features $x_{O_t}$. To this end, the state $s_t$ is designed as the concatenation of $z_t$ and $x_t$, where the j-th entry of $x_t$, denoted $x_{t,j}$, is set to 0 if $z_{t,j} = 0$ and otherwise to the value of the j-th feature. Here, $z_t$ is defined as described above and indicates which features have been acquired up to time t: $z_{t,j} = 1$ means that the j-th feature has already been examined (that is, $j \in O_t$), and $z_{t,j} = 0$ means that the j-th feature has not yet been discovered.
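The state construction described above can be sketched as follows. The function name `make_state` and the ordering of the two halves (masked values first, indicator second) are illustrative assumptions; the text only says the state is a concatenation of the two.

```python
def make_state(x, observed):
    """Build the state as [masked feature values] + [acquisition indicator z].

    x        : full feature vector
    observed : set of feature indices acquired so far
    """
    p = len(x)
    z = [1 if j in observed else 0 for j in range(p)]
    masked = [x[j] if z[j] == 1 else 0.0 for j in range(p)]  # unknown entries become 0
    return masked + z

state = make_state([0.2, 0.0, 1.5], observed={1, 2})
```

In this example feature 0 is missing and feature 1 is observed with a true value of 0. Their masked values are both 0, so only the indicator half of the state tells the two cases apart.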

Action: The RL agent chooses which features to examine. The set of all possible actions is simply defined as the power set of $\{1, \dots, p\}$ (this includes the empty set $\emptyset$, which means that no further features are acquired). For simplicity, the embodiment assumes that the RL agent acquires one feature at a time; under this assumption, the size of the action space is p + 1. In addition, at time t some actions are invalid because the corresponding features have already been selected. $\emptyset$ is a special action that is valid at any time. If the RL agent selects $\emptyset$, it stops searching for unknown features, and a prediction is made based on the current state $s_t$.
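Under the one-feature-at-a-time assumption, the valid actions at a given time can be listed as in the sketch below. Encoding the stop action as index p is a convention chosen here for illustration, not something the text specifies.

```python
def valid_actions(z):
    """Return the actions available given the acquisition indicator z.

    Actions 0..p-1 acquire the corresponding feature; action p is the stop
    action (the empty set), which is valid at every time step.
    """
    p = len(z)
    stop = p
    return [j for j in range(p) if z[j] == 0] + [stop]

actions_now = valid_actions([1, 0, 0])   # feature 0 already acquired
actions_end = valid_actions([1, 1, 1])   # nothing left to acquire
```

With p features there are at most p + 1 actions, and once every feature has been acquired only the stop action remains.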

Reward and environment: The reward for an acquisition action can be defined as the negative acquisition cost. In particular, within an episode, the reward $r_t$ is set to $-c_{a_t}$ for every action $a_t$ except the stop action $\emptyset$. Here, the reward is predefined and known to the RL agent. The state transition from $s_t$ to $s_{t+1}$ is deterministic, but it is still not trivial, because the value $x_{a_t}$ is in fact unknown to the RL agent until the acquisition is observed at time t.

In contrast to the 'feature acquisition' actions, the state transition caused by the stop action $\emptyset$ is trivial, because no further feature values are revealed. Defining its reward, on the other hand, is much harder. The reward given by the 'environment' for $\emptyset$ should measure how sufficient the information provided so far is for prediction. Suppose the environment contained a hypothetical classifier that predicts perfectly whenever the provided features are sufficient. If that classifier fails to classify some data point $x^{(i)}$ correctly, a negative reward should be given; otherwise, the RL agent receives the reward for a correct classification. However, because no such perfect classifier is actually available, this notion of 'sufficiency' is entirely unknown to the RL agent.

Instead, the classifier $f_\theta$ is used as a surrogate for the oracle, and the degree of sufficiency is estimated from the predictions of $f_\theta$. If the final reward $r_T$ is set to the negative classification loss of $f_\theta$ on the acquired features, then finding the best policy $\pi^*$ amounts to solving Equation 1 with respect to $\pi$ while the classifier $f_\theta$ is held fixed. In that problem, the loss is evaluated at the final state reached under the best policy $\pi^*$, and $z$ is the acquisition indicator corresponding to $\pi^*$.
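The link between the reward design and the objective can be made concrete with a small sketch. It assumes, as one reading of the passage above, that each acquisition earns the negative of its cost and that the stop action earns the negative classification loss; `episode_return` and the numbers are hypothetical.

```python
def episode_return(acquired, costs, final_loss):
    """Undiscounted return of one episode under the assumed reward design.

    acquired   : feature indices bought during the episode, in order
    costs      : per-feature acquisition cost
    final_loss : classification loss of the classifier at the stop action
    """
    acquisition_rewards = [-costs[j] for j in acquired]  # r_t = -c_{a_t}
    final_reward = -final_loss                           # reward for the stop action
    return sum(acquisition_rewards) + final_reward

ret = episode_return(acquired=[2, 0], costs=[1.0, 2.0, 0.5], final_loss=0.3)
```

Maximizing this return is the same as minimizing the sum of the classification loss and the total acquisition cost for that sample, which is why a return-maximizing policy addresses the policy part of the joint objective.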

Policy: To find the optimal policy, Q-learning (for example, the Q-learning proposed in Watkins and Dayan 1992) can be used for the RL agent. Specifically, deep Q-learning (for example, the deep Q-learning proposed in Mnih et al. 2013) can be adopted to approximate the state-action value function over the continuous state space. Using delayed updates of a target network together with a replay memory makes deep Q-learning more stable. The sequential feature acquisition framework proposed in the embodiment is not limited to Q-learning; any other standard policy learning method, such as policy gradient methods, A3C, or TRPO, is also a viable option.

The framework system may jointly train the RL agent and the classifier within the framework. In other words, the framework system learns the state-action value function Q with parameters $\theta_Q$ and the classifier C with parameters $\theta_C$ (note that $f_\theta$ is now called C to match Q).

Because the two components share the input s, they can be trained simultaneously through multi-task learning. Intuitively, Q and C should share a certain amount of information, since both aim to optimize the single joint learning objective in Equation 1. However, sharing beyond that range can reduce the flexibility of each model. An appropriate level of information sharing can therefore be searched for.

Referring to FIG. 2, an example illustrating the effect of sharing in the framework proposed by the framework system is shown. FIG. 2(a) is a graph of the effect of sharing. To examine the effect of sharing between Q and C, the model (framework) proposed in the embodiment was applied, under various sharing schemes, to the CUBE data set with 100 features (10 informative features and 90 dummy features), and the average number of collected features and the classification accuracy were measured. Both Q and C are multi-layer perceptrons (MLPs) with three hidden layers of sizes 50-30-50. The number of shared layers is varied from 0 (fully separate) to 3 (fully shared). The points are the mean accuracy over 100 runs, and the error bars indicate the first and third quartiles. FIG. 2(b) shows the joint learning framework for the RL agent Q and the environment C.

A model that shares information partially outperforms the two extremes (no sharing and full sharing) in terms of both accuracy and the number of observed features, and it produces less error. Accordingly, a framework is proposed that jointly trains the RL agent Q and the classifier C while they share hidden features. For example, when Q and C are multi-layer perceptrons, they share only the first few layers. These shared layers can be regarded as a shared encoder Enc whose output is fed to both Q and C. The framework that jointly trains Q and C can be formulated as follows. At every time step t, the state $s_t$ is input to the shared encoder Enc. The encoded representation $\mathrm{Enc}(s_t)$ is then given to Q and C.
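The shared-encoder arrangement can be sketched in plain Python as one common layer feeding two separate heads. Everything below (layer sizes, random initialization, the absence of biases, the helper names) is an illustrative assumption rather than the architecture used in the embodiment.

```python
import random

def init_layer(n_in, n_out, rng):
    """Random weight matrix with n_out rows and n_in columns (no bias, for brevity)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def apply_layer(weights, v, relu):
    out = [sum(w * a for w, a in zip(row, v)) for row in weights]
    return [max(0.0, o) for o in out] if relu else out

rng = random.Random(0)
p, K, hidden = 4, 3, 8                     # illustrative sizes
enc = init_layer(2 * p, hidden, rng)       # shared encoder layer
q_head = init_layer(hidden, p + 1, rng)    # one Q value per action (p features + stop)
c_head = init_layer(hidden, K, rng)        # one logit per class

def q_and_c(state):
    h = apply_layer(enc, state, relu=True)  # shared representation of the state
    return apply_layer(q_head, h, relu=False), apply_layer(c_head, h, relu=False)

q_values, logits = q_and_c([0.5, 0.0, 0.0, 0.0, 1, 0, 0, 0])
```

Both heads read the same encoded vector `h`, so gradients from either loss could update `enc`; moving more layers into the shared part corresponds to a higher degree of sharing.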

The overall structure of the framework is shown in FIG. 2(b). While Q and C each have their own loss function, the shared function Enc can be trained through both Q and C or through only one of them, depending on the application (for example, Enc can be treated as a constant during Q learning and trained only through C learning, or vice versa).

Learning and inference can be performed in the joint learning framework proposed for the framework system. The following describes how Q and C are jointly trained in an end-to-end manner. Training follows the basic DQN procedure and adopts its two key mechanisms: for stability, a replay memory and delayed updates of a target Q-network are used. In particular, in the training phase, the RL agent generates an episode for each data point according to the policy determined by the current Q values. In each state, the Q values of invalid actions are set to $-\infty$. Every experience in the history is stored in the replay memory so that past experiences can be revisited during training. When the capacity of the memory is exceeded, only the most recent experiences are kept. This makes deep Q-learning more stable by reducing the dependence between samples.
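Two of the mechanisms in this paragraph, masking invalid actions and a bounded replay memory, can be sketched with the standard library. The capacity of 3, the placeholder transitions, and the function name `greedy_action` are assumptions made for the example.

```python
from collections import deque

def greedy_action(q_values, z):
    """Pick the arg-max action after masking already-acquired features.

    q_values has p + 1 entries; index p is the stop action, always valid.
    """
    masked = [float("-inf") if j < len(z) and z[j] == 1 else q
              for j, q in enumerate(q_values)]
    return max(range(len(masked)), key=masked.__getitem__)

# Replay memory with a hypothetical capacity of 3 transitions:
# once full, appending a new transition discards the oldest one.
memory = deque(maxlen=3)
for t in range(5):
    memory.append((f"s{t}", f"a{t}"))   # placeholder transitions

kept = list(memory)
best = greedy_action([5.0, 1.0, 2.0, 0.5], z=[1, 0, 0])
```

Feature 0 has the largest raw Q value but is already acquired, so its masked value is negative infinity and the agent picks feature 2 instead. During training, mini-batches would be drawn at random from `memory` (for example with `random.sample`) rather than replayed in order.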

The intermediate reward $r_t$ comes from the environment. For a 'feature acquisition' action, the RL agent receives the predefined feature acquisition cost as a negative reward. When the action is the stop action $\emptyset$, on the other hand, the reward is computed from the prediction of C. However, predictions from a C that is not yet fully trained are noisy, so the reward computation is deferred until the history is sampled from the replay memory. Therefore, when an episode is generated, the experience tuple is stored in a form that allows the reward to be computed later. The reward for such an experience can then be computed with a better-trained C at the time it is sampled into a mini-batch and actually used for training.

After episodes have been generated for each sample, a mini-batch is drawn from the replay memory. For the experience tuples whose action is the stop action $\emptyset$, the reward is estimated using the current C at that point.

Given a mini-batch for training, all parameters of Q are trained by gradient descent to minimize the squared error between the Q value of the taken action and its target. The target is computed with the delayed-update target Q-network, which is used for stability. It is worth noting that the discount factor is 1, because the quantity of interest is the total, undiscounted cost of feature acquisition.
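The target and squared error can be sketched as below. The exact target expression is not reproduced in this text, so the code assumes the standard DQN target (reward plus the best target-network value in the next state) with the discount factor of 1 stated above; the function names are hypothetical.

```python
def td_target(reward, next_q_target, next_valid, terminal):
    """Assumed DQN-style target with discount factor 1.

    next_q_target : Q values of the next state from the delayed target network
    next_valid    : indices of the actions that are valid in the next state
    terminal      : True when the transition was produced by the stop action
    """
    if terminal:
        return reward                       # no future value after stopping
    return reward + max(next_q_target[a] for a in next_valid)

def squared_error(q_sa, target):
    return (q_sa - target) ** 2

target = td_target(reward=-1.0, next_q_target=[0.5, 2.0, -0.2],
                   next_valid=[0, 2], terminal=False)
loss = squared_error(q_sa=0.5, target=target)
```

Action 1 is excluded from the maximum because it is not valid in the next state, which mirrors the masking of invalid actions during episode generation.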

While Q is trained, C is trained jointly. Because C has to perform the classification task with missing values, it is trained on incomplete data, which can be simulated from mini-batches of the replay memory. With a mini-batch, C is trained by gradient descent to minimize the cross-entropy loss $-\log C_y(s)$, where $C_y(s)$ is the output corresponding to the true label y (that is, the probability after the softmax layer). Q and C are updated alternately until a stopping criterion is satisfied.
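The cross-entropy loss for a single sample can be computed as in this short sketch, which takes the classifier's raw outputs (logits) and the true label. The helper names are illustrative, and the max-subtraction is a common numerical-stability step rather than something stated in the text.

```python
import math

def softmax(logits):
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log of the softmax probability assigned to the true label."""
    return -math.log(softmax(logits)[label])

uniform_loss = cross_entropy([0.0, 0.0], label=0)      # equals log(2)
confident_loss = cross_entropy([10.0, 0.0], label=0)   # close to 0
```

A classifier that has seen too few features tends to produce flat probabilities and therefore a large loss, which is the signal that lets it act as the estimated environment for the stop action.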

Once Q and C have been trained, active feature acquisition can be performed on a new data point. The starting state may be a partially known set of features or a completely empty set. The RL agent determines which features to acquire by repeatedly selecting the action with the maximum Q-value until the stop action is selected. When the stop action is selected, C performs inference, making a prediction based on the features acquired so far.
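
The inference procedure can be sketched as a greedy loop. The callables `query_feature`, `q_values`, and `classify`, as well as the `STOP` constant, are hypothetical names introduced for the example; `q_values` is assumed to always include an entry for `STOP`, which guarantees termination.

```python
STOP = -1  # hypothetical encoding of the stop action


def acquire_and_classify(query_feature, q_values, classify, initial=None):
    """Greedily acquire features until the stop action has the largest Q-value.

    query_feature(j)   -> value of feature j (stands in for a test or a question)
    q_values(observed) -> dict mapping each action (feature index or STOP) to its Q-value
    classify(observed) -> predicted label from the features acquired so far
    """
    observed = dict(initial or {})  # may start partially known or empty
    while True:
        q = q_values(observed)
        # Features that are already known cannot be acquired again.
        candidates = [a for a in q if a == STOP or a not in observed]
        best = max(candidates, key=lambda a: q[a])
        if best == STOP:
            return classify(observed), observed
        observed[best] = query_feature(best)
```

Passing a non-empty `initial` dictionary corresponds to starting from a partially known feature set.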

FIG. 3 illustrates an example of the CUBE data set, which consists of p-dimensional real-valued vectors in eight classes.

The framework system can encode feature sets that contain missing features. As an example of feature encoding in the joint learning framework, the set encoding method proposed in the reference "Vinyals, O.; Bengio, S.; and Kudlur, M. 2016. Order matters: sequence to sequence for sets. In International Conference on Learning Representations." can be applied. Set encoding is well suited to this feature encoding because it naturally distinguishes two otherwise ambiguous cases: 1) the j-th entry is missing, and 2) the j-th entry is observed but its value is 0.

In the reference, set encoding consists of two components: a neural network called the reading block, which maps each element of the input to a real-valued vector, and an LSTM called the process block, which processes these vectors and repeatedly applies an attention mechanism to produce the final set embedding.
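
To illustrate why repeated attention over the reading-block outputs yields an order-independent summary, the following simplified sketch pools a set of vectors with attention. It is not the method of the reference: the LSTM of the process block is replaced by a plain `tanh` state update, and the function name and the `steps` parameter are assumptions made for the example.

```python
import numpy as np


def attention_pool(memories, steps=3):
    """Permutation-invariant summary of the rows of `memories` (shape n x d, n >= 1)."""
    memories = np.asarray(memories, dtype=float)
    query = np.zeros(memories.shape[1])
    for _ in range(steps):
        scores = memories @ query                # one score per set element
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                 # softmax over the set
        readout = weights @ memories             # attention-weighted sum
        query = np.tanh(query + readout)         # stand-in for the LSTM state update
    return query
```

Because the elements enter only through a softmax-weighted sum, reordering the rows of `memories` leaves the result unchanged, which is the property that makes the encoding suitable for sets of acquired features.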

The framework system can adopt the set encoding method to represent each state. Each pair consisting of a feature index and the value observed at that index can be treated as an element of the set. Because the raw value of a feature index carries no information by itself, each observed feature is represented by its value together with a one-hot vector that has 1 in the j-th coordinate and 0 elsewhere, which incorporates the coordinate information. The set encoding mechanism introduced above (the reading block followed by the process block) then produces a set embedding of the observed features.
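
A minimal sketch of this element representation is given below. The function name and the dictionary input format are assumptions made for the example; each output row is the observed value followed by the one-hot vector of its index, ready to be passed to a reading block.

```python
import numpy as np


def encode_observed(observed, num_features):
    """Return one row per observed feature: [value, one-hot(index)].

    `observed` maps feature index j to its observed value. Missing
    features simply contribute no row.
    """
    rows = []
    for j in sorted(observed):
        one_hot = np.zeros(num_features)
        one_hot[j] = 1.0
        rows.append(np.concatenate([[observed[j]], one_hot]))
    return np.array(rows).reshape(len(rows), num_features + 1)
```

For example, with three features, `encode_observed({2: 0.0}, 3)` produces a single row for feature 2 with value 0, whereas `encode_observed({}, 3)` produces no rows, so an observed zero and a missing entry are encoded differently.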

FIG. 4 is a flowchart illustrating a method by which the framework system provides the joint learning framework, according to an embodiment.

In step 410, the framework system may acquire at least one feature for each data point by means of the RL agent. The framework system can actively acquire features for a data point in a predetermined order, for example a forward or reverse order. The framework system may acquire features by examining the features acquired up to a given time, according to the subset of features selected by the RL agent for each data point, and by performing an action that selects some of the examined features.

In step 420, the framework system may formulate a framework relating the classification loss and the acquisition cost for the acquired features. The framework system may construct a framework for training a model that simultaneously minimizes the classification loss and the acquisition cost for the acquired features. The framework may be constructed so that it measures whether information above a predetermined criterion has been provided in order to predict the reward given by the environment for the acquired features, and so that it provides a reward according to whether the RL agent performs classification at each data point. Here, the framework system may search for a policy by approximating the state-action value function through deep learning on the RL agent.
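
One way to write the joint objective of step 420 is sketched below. The notation is an assumption introduced for illustration and is not taken from the embodiment: $S$ is the subset of features the agent acquires for a data point $(x, y)$, $c_j$ is the cost of acquiring feature $j$, $\mathcal{L}$ is the classification loss, $\theta_Q$ and $\theta_C$ are the parameters of the agent and the classifier, and $\lambda$ is a hypothetical trade-off weight between the two terms.

```latex
\min_{\theta_Q,\,\theta_C}\;
\mathbb{E}_{(x,\,y)}\!\left[
  \mathcal{L}\bigl(C(x_{S};\theta_C),\,y\bigr)
  \;+\; \lambda \sum_{j \in S} c_j
\right]
```

Under this reading, acquiring an additional feature is worthwhile only when the expected reduction in classification loss outweighs its weighted cost.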

In step 430, the framework system may jointly train the RL agent and the classifier through the framework. Through joint training, the RL agent and the classifier share features or hidden features, and the framework system may encode the subset of features or hidden features shared between them. The subset of hidden features is encoded in order to handle the missing entries that arise during feature acquisition. The framework system generates an episode for each data point according to the policy determined by the RL agent and the state-action value function, and selects the subset of features used to classify the episode. Each time the agent takes an action, the feature acquisition cost and the acquired feature value are returned to the agent, and this continues until the agent selects the stop action.
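
Episode generation in step 430 can be sketched as follows. This is a minimal sketch under stated assumptions: the `policy` callable, the `STOP` constant, the transition tuple layout, and the use of the negative acquisition cost as the per-step reward are hypothetical choices for the example.

```python
STOP = -1  # hypothetical encoding of the stop action


def generate_episode(x, costs, policy):
    """Roll out one episode for data point `x`.

    policy(observed) -> feature index to acquire, or STOP.
    Returns a list of (state, action, reward, next_state) tuples. The
    stop-action tuple carries no reward yet; it is assumed to be scored
    later by the classifier.
    """
    observed = {}
    transitions = []
    while True:
        action = policy(observed)
        state = dict(observed)
        if action == STOP:
            transitions.append((state, STOP, None, None))
            return transitions
        if action in observed:
            raise ValueError("feature %d was already acquired" % action)
        # The environment returns the feature value and charges its cost.
        observed[action] = x[action]
        transitions.append((state, action, -costs[action], dict(observed)))
```

The resulting transitions are what would be appended to the replay memory before minibatches are drawn for training.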

In step 440, once the RL agent and the classifier have been jointly trained, the framework system may infer feature acquisition for a new data point.

For example, cost-aware sequential feature selection can be used in situations where features are not fully provided and collecting each feature incurs a variable cost, as with medical data. The problem is formulated as an optimization problem that simultaneously minimizes the prediction loss and the feature acquisition cost, and a joint learning framework for the classifier and the RL agent is derived from it. The framework system can collect features sequentially, taking into account their predictive usefulness and collection cost, and can make predictions using only part of the feature set. In particular, the model can be designed as a jointly trained multi-task network in which the RL agent, which learns the optimal policy, and the classifier share the lower network layers that encode the collected feature set. The model can be validated on classification tasks with both synthetic and real medical data against relevant baselines, where it can outperform the baselines and even a model with access to the features.

The framework system may be used in chatbots for automating consultations, such as medical chatbots. The framework system can automate a consultation by automating the process of dynamically asking the user questions, obtaining the answers, and reaching a decision. For example, using data accumulated at various sites that wish to automate consultations (for example, after-sales service centers, the medical field, and other places that perform classification (diagnosis) tasks), the framework system can automate consultations in the form of a chatbot or website Q&A. Through a consultation chatbot that provides medical advice, elderly people who have difficulty visiting a hospital and people who are reluctant to visit one can easily access a consultation and learn what care they need, and hospitals can improve accessibility by automating simple medical interviews.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may also be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described with reference to limited embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from that described, and/or if components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from that described, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A method for providing a joint learning framework, performed by a computer-implemented framework system, the method comprising:
formulating a framework relating classification loss and acquisition cost for acquired features, as an RL agent acquires at least one feature for each data point;
jointly training the RL agent and a classifier through the framework; and
inferring feature acquisition for a new data point once the RL agent and the classifier have been jointly trained.

The method of claim 1,
wherein formulating the framework relating classification loss and acquisition cost for the acquired features comprises
actively acquiring features for the data point in a predetermined order.

The method of claim 2,
wherein formulating the framework relating classification loss and acquisition cost for the acquired features comprises
acquiring a feature by examining the features acquired up to a given time, according to the subset of features selected by the RL agent for each data point, and performing an action that selects some of the examined features.

The method of claim 1,
wherein formulating the framework relating classification loss and acquisition cost for the acquired features comprises
constructing a framework for training a model that simultaneously minimizes the classification loss and the acquisition cost for the acquired features.

The method of claim 4,
wherein formulating the framework relating classification loss and acquisition cost for the acquired features comprises
constructing a framework that measures whether information above a predetermined criterion has been provided in order to predict the reward given by an environment for the acquired features, and that provides a reward according to whether the RL agent performs classification at each data point.

The method of claim 5,
wherein formulating the framework relating classification loss and acquisition cost for the acquired features comprises
searching for a policy by approximating a state-action value function through deep learning on the RL agent.

The method of claim 1,
wherein jointly training the RL agent and the classifier through the framework comprises
sharing features or hidden features as the RL agent and the classifier are jointly trained, and encoding a subset of the features or hidden features shared by the RL agent and the classifier.

The method of claim 1,
wherein jointly training the RL agent and the classifier through the framework comprises
generating an episode for each data point according to a policy determined by the RL agent and a state-action value function, selecting a subset of features used to classify the episode, and, as the agent takes actions, returning to the agent the feature acquisition cost and the acquired feature value for each acquired feature until the agent selects a stop action.

The method of claim 8,
wherein jointly training the RL agent and the classifier through the framework comprises
evaluating, at the classifier, the quality of at least one feature selected from the acquired features, and giving a reward to the RL agent based on the evaluated quality.

A computer program stored in a computer-readable recording medium for executing a method for providing a joint learning framework, performed by a computer-implemented framework system, the method comprising:
formulating a framework relating classification loss and acquisition cost for acquired features, as an RL agent acquires at least one feature for each data point;
jointly training the RL agent and a classifier through the framework; and
inferring feature acquisition for a new data point once the RL agent and the classifier have been jointly trained.

The computer program of claim 10,
wherein formulating the framework relating classification loss and acquisition cost for the acquired features comprises
actively acquiring features for the data point in a predetermined order, examining the features acquired up to a given time according to the subset of features selected by the RL agent for each data point, and acquiring a feature by performing an action that selects some of the examined features.

The computer program of claim 10,
wherein formulating the framework relating classification loss and acquisition cost for the acquired features comprises
constructing a framework for training a model that simultaneously minimizes the classification loss and the acquisition cost for the acquired features; measuring whether information above a predetermined criterion has been provided in order to predict the reward given by an environment for the acquired features; providing a reward according to whether the RL agent performs classification at each data point; and searching for a policy by approximating a state-action value function through deep learning on the RL agent.

The computer program of claim 10,
wherein jointly training the RL agent and the classifier through the framework comprises
sharing features or hidden features as the RL agent and the classifier are jointly trained, and encoding a subset of the features or hidden features shared by the RL agent and the classifier.

The computer program of claim 10,
wherein jointly training the RL agent and the classifier through the framework comprises
generating an episode for each data point according to a policy determined by the RL agent and a state-action value function, selecting a subset of features used to classify the episode, and, as the agent takes actions, returning to the agent the feature acquisition cost and the acquired feature value for each acquired feature until the agent selects a stop action.