KR102102772B1

KR102102772B1 - Electronic apparatus and method for generating trained model thereof

Info

Publication number: KR102102772B1
Application number: KR1020180010925A
Authority: KR
Inventors: 황성주; 이해범; 나동현; 양은호
Original assignee: 한국과학기술원
Priority date: 2017-06-09
Filing date: 2018-01-29
Publication date: 2020-05-29
Also published as: KR102139740B1; KR20180134738A; KR102139729B1; KR20180134740A; KR20180134739A

Abstract

Disclosed is a method for generating a learning model. The method includes: receiving training data; generating an asymmetric multi-task feature network that has a parameter matrix of a learning model permitting asymmetric knowledge transfer between tasks and a feedback matrix for feedback connections from tasks to features; computing the parameter matrix of the asymmetric multi-task feature network from the received training data so that a predetermined objective function is minimized; and generating an asymmetric multi-task feature learning model by using the computed parameter matrix as the parameters of the generated asymmetric multi-task feature network.

Description

ELECTRONIC APPARATUS AND METHOD FOR GENERATING TRAINED MODEL THEREOF

The present disclosure relates to an electronic apparatus and a method for generating a learning model, and more particularly to an electronic apparatus and a method for generating a learning model that can learn tasks while reducing negative transfer by reinforcing asymmetric knowledge transfer.

Multi-task learning is a branch of machine learning that exploits the relatedness of different tasks to transfer knowledge between them.

A representative form of multi-task learning is the GO-MTL (learning task grouping and overlap in multi-task learning) model, which learns common features that different tasks share and select from. The GO-MTL model can be viewed as a neural network with a single hidden layer to which a nonlinear transformation has been added.

However, when tasks differ greatly in difficulty and incorrectly learned parameter information for one task is transferred to another, the GO-MTL model performs worse than models trained independently. This is called negative transfer, and it is an important problem to solve in multi-task learning.

Recently, the AMTL (Asymmetric Multi-task Learning) model has been used to address this negative transfer.

Unlike GO-MTL, the AMTL model has each task draw on direct knowledge transfer from the parameters of other tasks rather than going through common features. For example, the parameters of a task that predicts zebras (ordinary horse + stripes) are encouraged to lie close to a linear combination of the parameters of a task that predicts ordinary horses and the parameters of a task that predicts tigers (stripes).
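The zebra example can be made concrete with a toy calculation. The sketch below is illustrative only: the three-dimensional parameter vectors, the combination weights, and the helper names `combine` and `squared_distance` are assumptions, not values or functions from the disclosure. It measures how far the zebra task's parameters lie from a weighted sum of the horse and tiger parameters, which is the kind of quantity an AMTL-style regularizer would keep small.

```python
def combine(weights, vectors):
    """Return the weighted sum of equally sized parameter vectors."""
    size = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(size)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical task parameters over three made-up attributes:
# [horse-like body, stripes, predator-like build]
w_horse = [1.0, 0.0, 0.0]
w_tiger = [0.0, 1.0, 1.0]
w_zebra = [1.0, 1.0, 0.0]

# Hypothetical combination weights: zebra ~ 1.0 * horse + 0.5 * tiger
approx = combine([1.0, 0.5], [w_horse, w_tiger])
penalty = squared_distance(w_zebra, approx)
print(approx, penalty)
```

A smaller `penalty` means the zebra parameters are better explained by the other two tasks; in a real model the combination weights would be learned rather than fixed by hand.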

Unlike common feature learning, this method permits asymmetric knowledge transfer between tasks. Even when tasks are related, knowledge is steered to flow only from easy tasks toward difficult ones and is blocked from flowing out of difficult tasks to others, which resolves the negative transfer problem of existing feature-sharing models.

Although the AMTL model is useful for resolving negative transfer, it has the following problems.

First, AMTL does not assume explicit features, so it is not suited to direct application to a deep neural network.

Second, as the number of tasks grows, the inter-task transfer matrix that AMTL must maintain grows quadratically, so training becomes inefficient in both memory and time.

Third, a model based on common features reflects real cases better than direct knowledge transfer between tasks, and is therefore more consistent with human intuition.

A new model is therefore needed that resolves negative transfer and can also be applied to deep neural networks.

Accordingly, an object of the present disclosure is to provide an electronic apparatus and a method for generating a learning model that can learn tasks while reducing negative transfer by reinforcing asymmetric knowledge transfer.

To achieve the above object, a method for generating a learning model according to the present disclosure includes: receiving training data; generating an asymmetric multi-task feature network that has a parameter matrix of a learning model permitting asymmetric knowledge transfer between tasks and a feedback matrix for feedback connections from tasks to features; computing the parameter matrix of the asymmetric multi-task feature network from the received training data so that a predetermined objective function is minimized; and generating an asymmetric multi-task feature learning model by using the computed parameter matrix as the parameters of the generated asymmetric multi-task feature network.

In this case, the predetermined objective function may include a loss function for the learning model that permits asymmetric knowledge transfer between tasks, an autoencoder term that uses the feedback matrix and induces a nonlinear combination of task parameters, and a parameter decay (weight decay) regularization term.

Meanwhile, the predetermined objective function may be given by the following equation.

[Equation image not preserved in this text]

Here, the symbols of the equation, whose glyphs were also not preserved, denote in order: the matrices of the asymmetric multi-task feature network composed of multiple layers; the weight matrix of the last layer of the asymmetric multi-task feature network; the feedback matrix; the t-th row of a vector; the average validation loss of single-task learning for task t; a nonlinear function; and α, one further coefficient, and λ, each of which is a model parameter that adjusts the weight of its term.

Meanwhile, the asymmetric multi-task feature network may have a plurality of hidden layers.

Meanwhile, an electronic apparatus according to an embodiment of the present disclosure includes a memory in which training data is stored, and a processor that generates an asymmetric multi-task feature network having a parameter matrix of a learning model permitting asymmetric knowledge transfer between tasks and a feedback matrix for feedback connections from tasks to features, computes the parameter matrix of the asymmetric multi-task feature network from the stored training data so that a predetermined objective function is minimized, and generates an asymmetric multi-task feature learning model by using the computed parameter matrix as the parameters of the generated asymmetric multi-task feature network.

Here, the symbols of the objective function, whose glyphs were not preserved in this text, denote the feedback matrix, the t-th row of a vector, the average validation loss of single-task learning for task t, a nonlinear function, and the coefficient α.

Meanwhile, the asymmetric multi-task feature network may have a plurality of hidden layers.

Meanwhile, in a computer-readable recording medium containing a program for executing a method for generating a learning model in an electronic apparatus of the present disclosure, the method includes: receiving training data; generating an asymmetric multi-task feature network that has a parameter matrix of a learning model permitting asymmetric knowledge transfer between tasks and a feedback matrix for feedback connections from tasks to features; computing the parameter matrix of the asymmetric multi-task feature network from the received training data so that a predetermined objective function is minimized; and generating an asymmetric multi-task feature learning model by using the computed parameter matrix as the parameters of the generated asymmetric multi-task feature network.

As described above, according to various embodiments of the present disclosure, negative transfer caused by the symmetric influence of each task during feature learning can be resolved, and the approach can be applied to deep neural networks.

FIG. 1 is a block diagram showing a simple configuration of an electronic apparatus according to an embodiment of the present disclosure;
FIG. 2 is a block diagram showing a detailed configuration of an electronic apparatus according to an embodiment of the present disclosure;
FIG. 3 is a diagram for explaining the asymmetric multi-task feature learning method according to the present disclosure;
FIG. 4 is a diagram showing feedback reconstruction according to an embodiment of the present disclosure;
FIG. 5 is a diagram showing Deep-AMTFL according to an embodiment of the present disclosure;
FIG. 6 is a diagram showing the layers of the AMTFL model as seen from an arbitrary base network;
FIG. 7 is a diagram showing features and parameters learned from a synthetic dataset;
FIG. 8 is a diagram showing quantitative evaluation results for various learning models;
FIG. 9 is a diagram showing per-task error reduction relative to STL;
FIG. 10 is a diagram showing error reduction and training time as the number of tasks increases;
FIG. 11 is a diagram showing experimental results of two models applied to real datasets;
FIG. 12 is a diagram showing quantitative evaluation of a plurality of deep models;
FIG. 13 is a diagram showing per-task results for the AWA dataset; and
FIG. 14 is a diagram showing a method for generating a learning model according to an embodiment of the present disclosure.

The terms used in this specification are briefly explained first, and the present disclosure is then described in detail.

The terms used in the embodiments of the present disclosure were chosen, as far as possible, from general terms in wide current use, taking into account their function in the present disclosure; however, they may vary with the intent of those skilled in the art, legal precedent, or the emergence of new technology. In certain cases the applicant has chosen a term arbitrarily, in which case its meaning is set out in detail in the relevant part of the description. The terms used in the present disclosure should therefore be defined by their meaning and by the overall content of the disclosure, not simply by their names.

The embodiments of the present disclosure may be modified in various ways and may take various forms; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the scope to particular embodiments, and the disclosure should be understood to cover all modifications, equivalents, and substitutes falling within the disclosed spirit and technical scope. In describing the embodiments, detailed descriptions of related known art are omitted where they would obscure the subject matter.

Terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "consist of" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In the embodiments of the present disclosure, a "module" or "unit" performs at least one function or operation and may be implemented in hardware, in software, or in a combination of the two. A plurality of modules or units may be integrated into at least one module and implemented by at least one processor, except for a module or unit that must be implemented in specific hardware.

Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings so that a person of ordinary skill in the art to which the disclosure pertains can readily practice them. The present disclosure may, however, be embodied in many different forms and is not limited to the embodiments described here. Parts unrelated to the description are omitted from the drawings for clarity, and similar parts are given similar reference numerals throughout the specification.

The present disclosure is described in more detail below with reference to the drawings.

FIG. 1 is a block diagram showing a simple configuration of an electronic apparatus according to an embodiment of the present disclosure.

Referring to FIG. 1, the electronic apparatus 100 may include a memory 110 and a processor 120. The electronic apparatus 100 may be a PC, a notebook PC, a server, or another device capable of data computation.

The memory 110 may store training data for training the learning model, and may also store data to be classified or recognized using that learning model.

The memory 110 may store a learning model that permits asymmetric knowledge transfer between tasks. This learning model may be AMTL (Asymmetric Multi-task Learning). A learning model may also be referred to as a network.

The memory 110 may also store the asymmetric multi-task feature learning model, that is, the AMTFL (Asymmetric Multi-Task Feature Learning) model, generated by the processor 120. The generated AMTFL model may have a single hidden layer or a plurality of hidden layers.

In addition, the memory 110 may store the programs needed to optimize the learning model.

The memory 110 may be implemented as a storage medium inside the electronic apparatus 100 or as an external storage medium, for example a removable disk including a USB memory, a storage medium connected to a host, or a web server accessed over a network.

The processor 120 controls each component of the electronic apparatus 100. Specifically, when a boot command is received from the user, the processor 120 may boot the apparatus using the operating system stored in the memory 110.

The processor 120 may receive, through the operation input unit 150 described later, the various parameters needed to generate the asymmetric multi-task feature network. These parameters may be hyperparameters and the like.

On receiving this information, the processor 120 may generate an asymmetric multi-task feature network that has a parameter matrix of a learning model permitting asymmetric knowledge transfer between tasks and a feedback matrix for feedback connections from tasks to features. Specifically, it may generate a neural network with feedback connections, as shown in FIG. 4.

Here, the learning model that permits asymmetric knowledge transfer between tasks is a learning model based on the AMTL model.

The feedback matrix provides the feedback connections from tasks to features, and may be a feedback connection matrix into the feature space (z).
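As a rough illustration of a feedback connection into the feature space, the sketch below maps a feature vector to task outputs and then maps those outputs back to a reconstructed feature vector. Everything specific in it is an assumption made for illustration: the names `S` and `A`, the sizes (two features, three tasks), the ReLU nonlinearity, and the numeric values are not taken from the disclosure.

```python
def vec_mat(v, m):
    """Multiply row vector v (length n) by matrix m (n rows, k columns)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


# Hypothetical sizes: 2 features, 3 tasks.
z = [1.0, 0.5]                      # feature vector in feature space z
S = [[1.0, 0.0, 0.5],               # features -> task outputs (2 x 3)
     [0.0, 1.0, 0.5]]
A = [[1.0, 0.0],                    # task outputs -> features (3 x 2)
     [0.0, 1.0],
     [0.0, 0.0]]                    # zero row: the third task sends nothing back

task_outputs = vec_mat(z, S)
z_reconstructed = relu(vec_mat(task_outputs, A))
print(task_outputs, z_reconstructed)
```

The all-zero third row of `A` shows how a single task can be excluded from influencing the features while still reading from them, which is one way to picture asymmetric transfer.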

The processor 120 then computes the parameter matrix of the asymmetric multi-task feature network from the stored training data so that the predetermined objective function is minimized. The processor 120 may use stochastic gradient descent to minimize the objective function.
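For readers unfamiliar with stochastic gradient descent, the following minimal sketch shows the update rule on a toy one-parameter least-squares problem. It is a generic illustration, not the disclosure's training procedure: the function name `sgd_fit_slope`, the learning rate, the epoch count, and the data are all assumptions.

```python
import random


def sgd_fit_slope(xs, ys, lr=0.01, epochs=200, seed=0):
    """Fit y ~ w * x by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)                   # visit samples in random order
        for i in order:
            error = w * xs[i] - ys[i]
            w -= lr * 2.0 * error * xs[i]    # gradient of error**2 w.r.t. w
    return w


w = sgd_fit_slope([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(w)
```

The parameter is updated after every single sample rather than after a full pass over the data, which is what distinguishes the stochastic variant from batch gradient descent.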

Here, the objective function may include a loss function for the learning model that permits asymmetric knowledge transfer between tasks, an autoencoder term that uses the feedback matrix and induces a nonlinear combination of task parameters, and a parameter decay (weight decay) regularization term, and may be expressed as in Equation 7. The details of the objective function are described later with reference to FIGS. 3 and 4.
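The three-part structure described above can be sketched as a function that adds a weighted loss term, a feature reconstruction term, and a decay term. This is one possible reading under stated assumptions, not Equation 7 itself, which is not reproduced in this text: the scaling of each task loss by the L1 norm of that task's feedback row, the coefficient name `gamma`, and the function name `three_term_objective` are all hypothetical.

```python
def l1_norm(row):
    """Sum of absolute values of a vector."""
    return sum(abs(v) for v in row)


def squared_frobenius(matrix):
    """Sum of squared entries of a matrix given as a list of rows."""
    return sum(v * v for row in matrix for v in row)


def three_term_objective(task_losses, feedback, features, reconstructed,
                         layer_weights, alpha, gamma, lam):
    # Term 1 (assumed form): each task loss is scaled up by the L1 norm of
    # that task's row of the feedback matrix.
    loss_term = sum((1.0 + alpha * l1_norm(feedback[t])) * task_losses[t]
                    for t in range(len(task_losses)))
    # Term 2: autoencoder-style reconstruction error of the features.
    diff = [[z - r for z, r in zip(z_row, r_row)]
            for z_row, r_row in zip(features, reconstructed)]
    recon_term = gamma * squared_frobenius(diff)
    # Term 3: parameter decay over every layer's weight matrix.
    decay_term = lam * sum(squared_frobenius(w) for w in layer_weights)
    return loss_term + recon_term + decay_term


value = three_term_objective(
    task_losses=[0.5, 2.0],
    feedback=[[1.0, 0.0], [0.0, 0.0]],
    features=[[1.0, 2.0]],
    reconstructed=[[1.0, 1.0]],
    layer_weights=[[[1.0, 1.0]]],
    alpha=0.1, gamma=0.5, lam=0.01,
)
print(value)
```

Under this assumed form, feeding knowledge back from a task with a large loss is costly, so an optimizer would tend to shrink the feedback rows of unreliable tasks.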

The processor 120 then generates the asymmetric multi-task feature learning model by using the computed parameter matrix as the parameters of the generated asymmetric multi-task feature network.

The processor 120 may use the generated learning model to perform various kinds of processing such as vision recognition, speech recognition, and natural language processing. For example, if the learning model was trained for image classification, the processor 120 may classify an input image using the generated learning model.

As described above, the electronic apparatus 100 according to the present disclosure can resolve negative transfer caused by the symmetric influence of each task during feature learning. Because it generates a learning model in which negative transfer has been resolved, the electronic apparatus 100 can generate a learning model that is close to the real model.

Only the simple configuration of the electronic apparatus has been illustrated and described so far, but various further components may be provided in an implementation. These are described below with reference to FIG. 2.

FIG. 2 is a block diagram showing a detailed configuration of an electronic apparatus according to an embodiment of the present disclosure.

Referring to FIG. 2, the electronic apparatus 100 may include a memory 110, a processor 120, a communication unit 130, a display 140, and an operation input unit 150.

The operations of the memory 110 and the processor 120 were described with reference to FIG. 1, so the description is not repeated.

The communication unit 130 is connected to another electronic device and may receive training data from it.

The communication unit 130 may also receive information to be processed with the learning model and may transmit the processing result to another electronic device. For example, if the learning model classifies images, the communication unit 130 may receive an image to be classified from another electronic device and transmit information about the classification result to that device.

The communication unit 130 is provided to connect the electronic apparatus 100 to an external device. It may connect to a terminal device through a local area network (LAN) or the Internet, or through a USB (Universal Serial Bus) port or a wireless communication port (for example, WiFi 802.11a/b/g/n, NFC, or Bluetooth).

The display 140 displays various information provided by the electronic apparatus 100. Specifically, the display 140 may display a user interface window for selecting the functions that the electronic apparatus 100 provides. This user interface window may include fields for entering the parameters needed to generate the learning model.

The display 140 may be a monitor such as an LCD, CRT, or OLED, and may also be implemented as a touch screen that additionally performs the function of the operation input unit 150 described later.

The display 140 may also display information about test results obtained with the learning model. For example, if the learning model classifies images, the display 140 may display the classification result for an input image.

The operation input unit 150 may receive from the user the various parameters needed to generate the learning model.

The operation input unit 150 may be implemented with a plurality of buttons, a keyboard, a mouse, or the like, and may also be implemented as a touch screen that additionally performs the function of the display 140 described above.

Although FIGS. 1 and 2 have been illustrated and described with the electronic apparatus 100 including a single processor, the apparatus may include a plurality of processors, and a GPU may be used as well as a general-purpose CPU.

A learning method that can resolve the negative transfer problem is described in detail below.

Multi-task learning trains several task predictors jointly to improve their generalization performance, and in doing so permits several kinds of knowledge transfer. One of the key challenges in multi-task learning is the negative transfer problem, which describes the situation in which an accurate predictor for an easy task is harmed by an inaccurate predictor for a more difficult task.

AMTL (Asymmetric Multi-task Learning), a recently introduced method, addresses this negative transfer problem by permitting asymmetric knowledge transfer between tasks through inter-task parameter regularization. In particular, AMTL requires the parameters of each task to be representable as a sparse combination of the parameters of other tasks, and thereby learns a directed graph that determines the amount of knowledge transferred between tasks.
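To show how a sparse combination matrix can be read as a directed graph, the sketch below lists one edge per non-negligible off-diagonal entry. The matrix values, the convention that `B[s][t]` is the amount task `s` contributes to task `t`, the task names, and the function name `transfer_edges` are assumptions for illustration and do not come from the disclosure.

```python
def transfer_edges(B, names, threshold=1e-8):
    """List directed edges (source, target, weight) for non-negligible entries."""
    edges = []
    for s, row in enumerate(B):
        for t, weight in enumerate(row):
            if s != t and abs(weight) > threshold:
                edges.append((names[s], names[t], weight))
    return edges


names = ["horse", "tiger", "zebra"]
# Hypothetical sparse transfer matrix: B[s][t] is the amount task s
# contributes to task t. The matrix is deliberately not symmetric.
B = [[0.0, 0.0, 1.0],
     [0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0]]

edges = transfer_edges(B, names)
print(edges)
```

Because the zebra row is all zeros, the graph has edges into the zebra task but none out of it, which is the asymmetry the paragraph describes.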

However, the inter-task transfer model is limited in several respects. First, in most cases tasks are related to some degree but have no strict causal relationship. It is therefore more natural to assume that tasks are generated from a common set of latent bases than to assume, as the inter-task transfer model does, that each task is generated from a set of related tasks.

For example, suppose one task predicts whether a person has fatty liver and another predicts whether the person has diabetes, given the patient's health records. The two tasks are correlated, but there is no clear cause-and-effect relationship between them. Rather, both may be caused by a common factor such as obesity.

In addition, AMTL does not scale well as the number of tasks increases, because the inter-task knowledge transfer graph grows quadratically; with many tasks, as in large-scale classification, it tends to be inefficient and prone to overfitting. Sparsity helps reduce the number of parameters, but it does not reduce the inherent complexity of the problem.
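The quadratic growth is easy to see by counting entries. The short sketch below compares a dense task-to-task transfer matrix with a task-to-feature matrix of fixed width; the comparison, the width of 50 features, and the function names are assumptions chosen only to illustrate the scaling argument.

```python
def inter_task_entries(num_tasks):
    """Entries in a dense task-to-task transfer matrix (T x T)."""
    return num_tasks * num_tasks


def task_feature_entries(num_tasks, num_features):
    """Entries in a task-to-feature matrix (T x k), for comparison."""
    return num_tasks * num_features


for T in (10, 100, 1000):
    print(T, inter_task_entries(T), task_feature_entries(T, num_features=50))
```

With the feature count held fixed, the task-to-feature matrix grows linearly in the number of tasks, while the task-to-task matrix grows with its square.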

Finally, the inter-task transfer model stores knowledge in the learned model parameters and their relationship graph. It can sometimes be useful, however, to store what has been learned in an explicit representation that can be reused for other tasks, as in transfer learning.

Accordingly, the present disclosure builds on multi-task feature learning, which aims to learn latent features and is one of the most common ways of sharing knowledge between tasks in a multi-task learning framework, and seeks to prevent negative transfer by enforcing asymmetric knowledge transfer. In particular, task predictors with high reliability are made to have more influence on learning the shared features, while the influence of task predictors with low reliability is reduced so that they contribute little or nothing to feature learning.

To this end, the present disclosure proposes an efficient model that is based on common features and can also resolve the negative transfer problem. Because the model retains the advantages of AMTL while also learning features, it is referred to as asymmetric multi-task feature learning (AMTFL).

The AMTFL model is described below with reference to FIG. 3.

FIG. 3 is a diagram for explaining an asymmetric multi-task feature learning method according to the present disclosure.

Referring to FIG. 3, assume that there are tasks for recognizing a tiger, a zebra, and a hyena, respectively, and that the tasks share the common feature 'stripes'.

In this case, the existing common-feature-based multi-task learning model (a) treats tasks of different difficulty equally, so the difficult hyena task can easily pass incorrect information to the stripes feature. This is because some hyenas have no stripes, and even when stripes are present they are often faint.

The conventional AMTL model, by contrast, does not assume explicit features, so asymmetric knowledge transfer occurs from easy tasks to difficult tasks, as shown in FIG. 3(b). Because the hyena task only receives knowledge, the negative transfer problem is handled effectively. However, no common feature (stripes) is constructed, so the learning cannot be considered efficient.

To address this, the AMTFL of the present disclosure lets all tasks share the stripes feature, as shown in FIG. 3(c), and adds a feedback connection that infers the stripes feature back from the results learned on the easy tasks, tiger and zebra.

This additional feedback connection forms a strong cyclic dependency between the easier tasks and the stripes feature, which relatively weakens the influence of difficult tasks such as hyena on the stripes feature.

From a human point of view, this can be explained as follows.

A person first learns the concept of stripes from a variety of visual information and later infers it back from easy concepts such as tigers and zebras. This backward inference further reinforces the previously learned notion of stripes.

Therefore, even though a difficult task such as hyena refers to the notion of stripes, it exerts little influence on the feature during learning. In terms of the direction of asymmetric knowledge transfer, knowledge flows as [(tiger, zebra) -> stripes -> hyena], which can be viewed as inter-task knowledge transfer similar to that of the existing AMTL model. The model of the present disclosure differs, however, in that it is based on common features, as in GO-MTL.

Because the top layer of the network contains an additional weight matrix for the feedback connection alongside the original feed-forward connection, the AMTFL model according to the present disclosure extends naturally to feature learning in deep neural networks. It allows asymmetric transfer from each task predictor to the bottom layer. As a result, the AMTFL model can use state-of-the-art deep neural network models and benefit from recent advances in deep learning.

The asymmetric multi-task feature learning of the present disclosure is described below.

The asymmetric multi-task feature model builds on the basic idea of the AMTL model. As described above, the AMTL model is characterized in that each task parameter is encouraged to be close to a linear combination of the other task parameters.

First, the given data are denoted $\mathcal{D} = \{(X_t, y_t)\}_{t=1}^{T}$, with $X_t \in \mathbb{R}^{N_t \times d}$ and $y_t \in \mathbb{R}^{N_t \times 1}$. Here, $T$ is the total number of tasks, $N_t$ is the number of data instances of task $t$, and $d$ is the feature dimension. $X_t$ denotes the set of data instances and $y_t$ denotes the corresponding labels.

The goal of multi-task learning is to train the learning model jointly on all $T$ tasks through a generic learning objective, as in Equation 1 below.

[Equation 1]

$$\min_{W} \sum_{t=1}^{T} \mathcal{L}(w_t; X_t, y_t) + \Omega(W)$$

Here, $\mathcal{L}$ is the cross-entropy loss on the training data (the loss function), $w_t$ is the model parameter for task $t$, and $W$ is the column-wise concatenated matrix defined as $W = [w_1\ w_2\ \cdots\ w_T]$. The penalty $\Omega$ is a regularization term that enforces a particular prior assumption on the model parameters $W$.

One common assumption is that a common set of latent bases exists across tasks. Under this assumption, the matrix $W$ can be decomposed as $W = LS$.

Here, $L \in \mathbb{R}^{d \times k}$ is the matrix of $k$ latent bases, and $S \in \mathbb{R}^{k \times T}$ is the coefficient matrix for linearly combining the bases. With a regularization term $\Omega(L, S)$, multi-task learning can then be expressed as Equation 2 below.

[Equation 2]

$$\min_{L,S} \sum_{t=1}^{T} \mathcal{L}(L s_t; X_t, y_t) + \Omega(L, S)$$

Here, $s_t$ is the $t$-th column vector of $S$ and represents the basis weights for task $t$, that is, $w_t = L s_t$. From another point of view, $L$ can be regarded as a feature transformation: the parameter $s_t$ of each task $t$ is learned on the transformed features $x_i L$ of each instance $i$.
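The equivalence between the two views above (task parameters $w_t = L s_t$ versus per-task weights $s_t$ applied to transformed features) can be checked numerically. The following is a minimal Python sketch; the matrix sizes and values are hypothetical and chosen only for illustration, and the helper `matmul` is not part of the disclosure.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical sizes: d = 3 input features, k = 2 latent bases, T = 2 tasks.
L = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]           # d x k latent basis matrix
S = [[2.0, 0.0],
     [0.0, 3.0]]           # k x T coefficients; column t is s_t

W = matmul(L, S)           # d x T task parameters; column t is w_t = L s_t

x = [[1.0, 2.0, 3.0]]      # one instance as a 1 x d row vector
z = matmul(x, L)           # transformed features x L, shape 1 x k

direct = matmul(x, W)          # scores computed from the raw input
via_features = matmul(z, S)    # same scores computed from the features
```

Both `direct` and `via_features` hold the same per-task scores, which is why $L$ can be read either as part of the task parameters or as a shared feature transformation.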

As a special case of Equation 2, the Go-MTL model, for example, applies element-wise $\ell_2$ regularization to $L$ and encourages each $s_t$ to be sparse, as in Equation 3 below.

[Equation 3]

$$\min_{L,S} \sum_{t=1}^{T} \left\{ \mathcal{L}(L s_t; X_t, y_t) + \mu \|s_t\|_1 \right\} + \lambda \|L\|_F^2$$

As another example, $L$ may be constrained to be orthogonal and $S$ may be regularized with the $(2,1)$-norm.

Alternatively, task relatedness can be exploited without assuming that a common set of latent bases exists. AMTL is an instance of multi-task learning based on the assumption that each task parameter $w_t$ is reconstructed as a sparse combination of the other task parameters $\{w_s\}_{s \neq t}$.

In other words, $w_t \approx \sum_{s \neq t} B_{st} w_s$, where the weight $B_{st}$ of $w_s$ in reconstructing $w_t$ can be interpreted as the amount of knowledge transferred from task $s$ to task $t$. Because there is no symmetry constraint on the matrix $B$, the direction of asymmetric knowledge transfer, from more reliable tasks to less reliable tasks, can be learned. With this in mind, AMTL solves the multi-task learning problem through Equation 4 below.

[Equation 4]

$$\min_{W,B} \sum_{t=1}^{T} \left(1 + \mu \|b_t^o\|_1\right) \mathcal{L}(w_t; X_t, y_t) + \lambda \|W - WB\|_F^2$$

$\mu$ and $\lambda$ are model parameters that control the weight of each term. $W$ is the concatenation of the $w_t$, $B$ is the inter-task transfer matrix, and the vector $b_t^o$ is the $t$-th row of $B$.

Here, the diagonal elements of $B$ are all $0$. That is, each element of $b_t^o$ indicates how much knowledge task $t$ transfers to each of the other tasks, excluding $t$ itself.

In Equation 4 described above, the role of the term $\lambda \|W - WB\|_F^2$ is to encourage each task parameter to be close to a linear combination of the other task parameters.
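To make the roles of the two terms concrete, the sketch below evaluates an objective of the form $\sum_t (1 + \mu\|b_t^o\|_1)\,\mathcal{L}_t + \lambda\|W - WB\|_F^2$ in plain Python. This form is an assumption reconstructed from the surrounding description; the function name `amtl_objective` and all numeric values are hypothetical.

```python
def amtl_objective(task_losses, W, B, mu, lam):
    """Evaluate a loss-weighted sparsity term plus a reconstruction term.

    task_losses[t] is the training loss of task t.
    W is d x T (column t is w_t); B is T x T with a zero diagonal,
    and row t of B is the outgoing transfer b_t^o of task t.
    """
    T = len(task_losses)
    d = len(W)
    # Each task loss is scaled by (1 + mu * ||b_t^o||_1).
    weighted = sum((1.0 + mu * sum(abs(v) for v in B[t])) * task_losses[t]
                   for t in range(T))
    # Squared Frobenius norm of W - W B.
    recon = 0.0
    for i in range(d):
        for t in range(T):
            wb = sum(W[i][s] * B[s][t] for s in range(T))
            recon += (W[i][t] - wb) ** 2
    return weighted + lam * recon


# Two tasks with identical parameters; task 0 has a low loss, task 1 a high loss.
W = [[1.0, 1.0],
     [2.0, 2.0]]
losses = [0.1, 1.0]
easy_to_hard = [[0.0, 1.0], [0.0, 0.0]]   # task 0 transfers to task 1
hard_to_easy = [[0.0, 0.0], [1.0, 0.0]]   # task 1 transfers to task 0

cost_easy_to_hard = amtl_objective(losses, W, easy_to_hard, mu=0.5, lam=0.1)
cost_hard_to_easy = amtl_objective(losses, W, hard_to_easy, mu=0.5, lam=0.1)
```

In this toy setting the reconstruction term is the same in both directions, so the difference comes only from the loss-weighted sparsity: outgoing transfer from the low-loss task is cheaper than outgoing transfer from the high-loss task.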

However, Equation 3 has the drawback that, because all tasks contribute equally to constructing the latent bases, severe negative transfer can occur from noisy, difficult tasks to clean, easy tasks.

Equation 4, for its part, does not scale with the number of tasks and does not learn explicit features.

Accordingly, an effective method that overcomes the limitations of the two transfer approaches described above and applies asymmetric knowledge transfer to deep neural networks while preventing negative transfer is described below with reference to FIG. 4.

FIG. 4 is a diagram illustrating feedback reconstruction according to an embodiment of the present disclosure.

Referring to FIG. 4, L denotes the learned basis matrix and LS denotes the learned model parameters.

First, the latent features are reconstructed using an autoencoder framework. Specifically, new features $z_i = \sigma(x_i L)$ are obtained using the same $L$ as the feature transformation, where $\sigma$ is a nonlinear transformation (a rectified linear unit (ReLU) may be used in an implementation).

Then $s_t$ is learned on the transformed features to obtain the model output. The model output is further required to reconstruct the latent features by introducing a transformation $A$ back into the feature space, that is, $z_i \approx f(z_i S) A$. Here, $f$ in the last layer is the nonlinearity for the actual prediction (for example, a softmax or sigmoid function for classification, or a hyperbolic tangent function for regression). Using this reconstruction term, the multi-task feature learning formulation can be expressed as Equation 5.

[Equation 5]

$$\min_{L,S,A} \sum_{t=1}^{T} \left\{ \mathcal{L}(L, s_t; X_t, y_t) + \mu \|s_t\|_1 \right\} + \gamma \|Z - f(ZS)A\|_F^2 + \lambda \|L\|_F^2$$

Unlike AMTL, AMTFL assumes explicit features, which are expressed as $Z = \sigma(XL)$. Here, $L$ is the latent basis matrix that is multiplied by each $x_i$ to form the features, and $\sigma$ denotes an arbitrary nonlinear function. When the number of elements of $z_i$ is $k$, $A \in \mathbb{R}^{T \times k}$ is the matrix for the feedback connection from tasks to features and is the most important variable in the AMTFL model. Each row vector $a_t^o$ of $A$ plays a role similar to that of $b_t^o$ in AMTL and indicates how much knowledge task $t$ transfers to the feature elements. $f$ denotes the function used for the actual prediction at the very end of the network (for example, a sigmoid or softmax function). The coefficients that multiply the individual terms are model parameters that control the weight of each term.
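The feedback path described above can be sketched for a single instance as follows. The sketch assumes ReLU for $\sigma$, a sigmoid for $f$, and a reconstruction of the form $z_i \approx f(z_i S)A$; the function names and the toy matrices are hypothetical and serve only as an illustration.

```python
import math


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def sigmoid(v):
    """Element-wise logistic sigmoid."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def vecmat(v, m):
    """Multiply a row vector by a matrix stored as a list of rows."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]


def feedback_residual(x, L, S, A):
    """Return latent features, task outputs, and the squared reconstruction error."""
    z = relu(vecmat(x, L))         # latent features, length k
    y = sigmoid(vecmat(z, S))      # task outputs, length T
    z_hat = vecmat(y, A)           # features rebuilt through feedback matrix A (T x k)
    residual = sum((a - b) ** 2 for a, b in zip(z, z_hat))
    return z, y, residual


# Toy example with d = 2, k = 2, T = 1.
x = [1.0, -1.0]
L = [[1.0, 0.0], [0.0, 1.0]]
S = [[0.0], [0.0]]
A = [[2.0, 0.0]]
z, y, residual = feedback_residual(x, L, S, A)
```

In training, this residual would be one term of the objective, so that $A$ is learned jointly with $L$ and $S$ rather than fixed as it is here.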

To make the learning transfer from tasks to features asymmetric, a constraint weighted by task difficulty is imposed on $A$. As a result, easy and reliable task predictors contribute more to the latent features. The asymmetric multi-task feature learning model that takes this into account can be expressed as Equation 6 below.

[Equation 6]

$$\min_{L,S,A} \sum_{t=1}^{T} \left\{ \left(1 + \alpha \|a_t^o\|_1\right) \mathcal{L}(L, s_t; X_t, y_t) + \mu \|s_t\|_1 \right\} + \gamma \|Z - f(ZS)A\|_F^2 + \lambda \|L\|_F^2$$

Rewriting the above expression for multiple layers gives Equation 7.

Here, $Z$ is the matrix of deep features learned at layer $L-1$, $W^{(L)}$ is the weight matrix of the last layer, and $A$ is the matrix of feedback connections to the feature space $z$. In addition, $a_t^o$ is the $t$-th row of $A$.

The effective sparsity imposed on $a_t^o$ scales with the loss of task $t$. The loss used here is the average validation loss of single-task learning for task $t$, written in shorthand form to simplify notation.

Note that the training loss alone may be insufficient to capture the true risk because of overfitting. If a task has low risk, it receives a lower sparsity penalty and contributes more to constructing the latent bases.
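A small numeric sketch may help show why loss-weighted sparsity favors reliable tasks. It assumes that each task loss is scaled by a factor $(1 + \alpha\|a_t^o\|_1)$, where $a_t^o$ is the $t$-th row of the feedback matrix $A$; this form, the function name, and the numbers are assumptions made for illustration.

```python
def weighted_task_loss(task_losses, A, alpha):
    """Sum over tasks of (1 + alpha * ||a_t^o||_1) * loss_t.

    A is T x k; row t holds the feedback weights from task t to the features.
    """
    return sum((1.0 + alpha * sum(abs(v) for v in row)) * loss
               for row, loss in zip(A, task_losses))


losses = [0.1, 2.0]                     # task 0 is reliable, task 1 is not
feedback_from_easy = [[1.0, 1.0], [0.0, 0.0]]
feedback_from_hard = [[0.0, 0.0], [1.0, 1.0]]

cost_easy = weighted_task_loss(losses, feedback_from_easy, alpha=1.0)
cost_hard = weighted_task_loss(losses, feedback_from_hard, alpha=1.0)
```

The same amount of feedback costs far more when it originates from the high-loss task, so minimizing the objective pushes the rows of $A$ that belong to unreliable tasks toward zero.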

The loss function $\mathcal{L}$ in the preceding equations can be any generic loss function. For example, the squared loss can be used for regression tasks and the logistic loss for classification tasks, where $\sigma$ in the logistic loss is the sigmoid function.
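For reference, the two losses named above can be written in their common textbook forms. The scaling (a factor of 0.5 and no averaging over instances) and the label convention (labels in {0, 1}) are assumptions and are not taken from the disclosure.

```python
import math


def squared_loss(w, X, y):
    """0.5 * sum_i (y_i - x_i . w)^2, a usual choice for regression."""
    return 0.5 * sum((yi - sum(a * b for a, b in zip(xi, w))) ** 2
                     for xi, yi in zip(X, y))


def logistic_loss(w, X, y):
    """Negative log-likelihood of labels in {0, 1} under a sigmoid model."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(xi, w))))
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


reg = squared_loss([1.0], [[1.0], [2.0]], [1.0, 3.0])
clf = logistic_loss([0.0], [[1.0]], [1])
```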

FIG. 5 is a diagram illustrating Deep-AMTFL according to an embodiment of the present disclosure.

Referring to FIG. 5, the gray scale of the last layer represents the amount of true risk measured, the upward arrows represent feedback connections, and the downward arrows represent the feed-forward step.

FIG. 6 is a diagram illustrating the layers of the AMTFL model on an arbitrary base network.

Referring to FIG. 6, the solid lines and dash-dotted lines represent feed-forward connections, and the dotted lines and elements represent feedback connections and knowledge transfer. The thickness of each line represents the degree of reliability and the amount of knowledge propagated to the features through the feedback connection.

The effects of the deep asymmetric multi-task feature learning method according to the present disclosure are described below with reference to FIGS. 7 to 11.

Specifically, experimental results on a synthetic data set and on real data sets are described.

First, the learning models used in the experiment on the synthetic data set are described.

The first is STL, a linear single-task learning model. A ridge regression model is used for regression, a logistic regression model for classification, and a softmax regression model for multi-class classification.

The second is GO-MTL, a feature-sharing MTL model in which different task predictors share a common set of latent bases.

The third is AMTL, the asymmetric multi-task learning model described above, which performs inter-task knowledge transfer through parameter-based regularization.

The fourth is NN, a simple feed-forward neural network with a single hidden layer.

The fifth is multi-task NN, which is similar to NN but scales each task's loss by dividing it by Nt. In this model, l1 regularization is applied at the last fully connected layer.

The sixth is AMTFL, the asymmetric multi-task feature learning according to the present disclosure.

FIG. 7 is a diagram illustrating features and parameters learned on the synthetic data set.

To obtain the images of FIG. 7, six 30-dimensional ground-truth bases were first generated.

Next, parameters for 12 tasks were generated from the bases with added Gaussian noise, and the noise level was varied to form two groups, clean and noisy. Clean tasks were given a lower Gaussian noise level and noisy tasks a higher one. Each noisy task parameter was generated from a subset of the bases, and each clean task parameter was formed by choosing two out of four bases and combining them linearly. Consequently, some bases were shared by the clean and noisy tasks, while the remaining bases were used exclusively by one group.

Four random train/validation/test splits were then generated for each group: {50/50/100} for the clean tasks and {25/25/100} for the noisy tasks.
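One possible way to generate task parameters of this kind is sketched below. Only the counts (six 30-dimensional bases, 12 tasks, two groups) come from the description; the noise levels, the basis pools assigned to each group, the rule of combining two bases per task, and the seed are hypothetical choices made for illustration.

```python
import random


def make_synthetic_tasks(seed=0, dim=30, n_bases=6, n_tasks=12,
                         sigma_clean=1.0, sigma_noisy=2.0):
    """Generate ground-truth bases and noisy task parameters.

    The sigma values and the basis pools are hypothetical; the pools below
    assume n_bases >= 6.
    """
    rng = random.Random(seed)
    bases = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(n_bases)]
    tasks = []
    for t in range(n_tasks):
        clean = t < n_tasks // 2
        # Hypothetical pools: the two groups overlap on bases 2 and 3.
        pool = [0, 1, 2, 3] if clean else [2, 3, 4, 5]
        picked = sorted(rng.sample(pool, 2))
        sigma = sigma_clean if clean else sigma_noisy
        # Task parameter = sum of the picked bases plus Gaussian noise.
        w = [sum(bases[b][i] for b in picked) + rng.gauss(0.0, sigma)
             for i in range(dim)]
        tasks.append({"clean": clean, "bases": picked, "w": w})
    return bases, tasks


bases, tasks = make_synthetic_tasks()
```

Fixing the seed makes the generated bases and task parameters reproducible, which is convenient when several models are compared on the same splits.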

To ease comparison with AMTFL, all baseline models were implemented as neural networks, and regularization was applied to all models for better reconstruction of L. Nonlinearity was removed from the hidden layer of AMTFL, and all hyperparameters were selected on a separate validation set. For AMTL, the non-negativity constraint was removed in view of the characteristics of the data set.

Referring to FIG. 7, the first image (710) shows the ground-truth basis matrix, and the second image (720) shows the ground-truth model parameters.

The third image (730) shows the basis matrix generated using the present disclosure, and the fourth image (740) shows the model parameters generated using the present disclosure.

The fifth image (750) shows the basis matrix generated using Go-MTL.

The sixth image (760) shows the model parameters generated using Go-MTL.

The seventh image (770) shows the model parameters learned by AMTL, and the eighth image (780) shows the task transfer matrix.

Comparing the third image (730) with the fifth image (750) of FIG. 7 shows that the basis matrix (730) learned according to the present disclosure is closer to the ground-truth basis matrix (710) than the basis matrix (750) learned using Go-MTL.

In addition, comparing the fourth image (740), the sixth image (760), and the seventh image (770) shows that the model parameters (740) learned according to the present disclosure are closer to the ground-truth model parameters (720) and contain less noise than the model parameters (760) learned using GO-MTL and the model parameters (770) learned by AMTL. This is because knowledge transfer from tasks to features contributes meaningfully to learning.

Referring to the eighth image (780), no knowledge is transferred from the difficult tasks (tasks 7 to 12) to the easy tasks (tasks 1 to 6).

FIG. 8 is a diagram showing the results of a quantitative evaluation of the various learning models.

Referring to FIG. 8, the AMTFL according to the present disclosure outperforms the conventional methods. Specifically, AMTFL has a relatively low error on both clean and noisy tasks, whereas GO-MTL has a higher error than STL on the noisy tasks. This is attributed to negative transfer from the noisy tasks.

AMTFL also outperforms AMTL. Even though the data of every task are generated from the same set of bases, AMTL has difficulty finding meaningful links between tasks in this particular synthetic data set.

FIG. 9 is a diagram showing the per-task error reduction relative to STL.

Referring to FIG. 9, AMTFL effectively prevents negative transfer and achieves a larger improvement than AMTL, whereas GO-MTL suffers from negative transfer.

FIG. 10 is a diagram showing error reduction and training time as the number of tasks increases.

Referring to FIG. 10, the AMTFL according to the present disclosure is superior to AMTL in terms of both error reduction and training time. The advantage becomes more pronounced as the number of tasks increases.

The real data sets used in the experiments are described below.

1) AWA-Attr: a classification data set of 30,475 images in which the task is to predict 85 binary attributes for each image, each image depicting a single animal.

2) MNIST: a standard classification data set of 28x28 images of the digits 0 to 9, with 60,000 images for training and 10,000 images for testing.

3) School: a regression data set for predicting the exam scores of 15,362 students from 139 schools. Predicting the exam scores of each school is treated as one task.

4) ImageNet-Room: a subset of the ImageNet data set in which 14,140 images are classified into 20 indoor scene classes.

FIG. 11 is a diagram showing experimental results of the two models applied to the real data sets.

Referring to FIG. 11, the AMTFL according to the present disclosure outperforms the existing methods on most of the data sets.

Four deep models and the corresponding experimental results are described below.

1) DNN: a generic deep neural network with a softmax loss and no feedback connection.

2) Multi-task DNN: a DNN in which the softmax loss for each task is divided by Nt (the number of positive instances for each class).

3) Deep-AMTL: the same as Multi-task DNN, except that the asymmetric multi-task learning objective replaces the original softmax loss.

4) Deep-AMTFL: the deep asymmetric multi-task feature learning model of the present disclosure.

CIFAR-100, AWA, and ImageNet-Small were used as data sets for the deep model experiments.

The Caffe framework was used to implement the existing models and the deep model of the present disclosure. DNN and Multi-task DNN were trained from scratch, and the remaining models were derived from the trained Multi-task DNN for faster training.

For the task-confidence term of Deep-AMTFL, the training loss was used as a proxy for the true risk of each task, because using the validation loss is impractical in that it requires training the model twice. Similarly, although the associated hyperparameter should in principle be found through cross-validation, a fixed value was used instead, on the basis of an inequality that holds when the relevant quantity is sufficiently large. In addition, the remaining parameters in Equation 7 were set to reasonable values rather than found through an extensive hyperparameter search.

FIG. 12 is a diagram showing a quantitative evaluation of the deep models.

Referring to FIG. 12, the Deep-AMTFL of the present disclosure outperforms the other models, including Deep-AMTL, which demonstrates the effectiveness of asymmetric multi-task feature learning in a deep learning framework.

FIG. 13 shows the per-task results on the AWA data set.

Referring to FIG. 13, AMTFL outperforms the existing CNN on 41 of the 50 classes, and performance does not degrade except on a few classes.

These results indicate that the performance improvement comes mostly from the suppression of negative transfer.

FIG. 14 is a diagram illustrating a method for generating a learning model according to an embodiment of the present disclosure.

First, training data are received (S1410). Specifically, the various data sets required for learning may be received; in an implementation, the training data may also have been input and stored in advance.

Next, an asymmetric multi-task feature network is generated that has a parameter matrix of a learning model allowing asymmetric knowledge transfer between tasks and a feedback matrix for the feedback connection from tasks to features (S1420). Specifically, a neural network with a feedback connection, as shown in FIG. 4, may be generated.

Then, the parameter matrix of the asymmetric multi-task feature network is calculated using the received training data so that a predetermined objective function is minimized (S1430). Specifically, the parameter matrix of the network may be calculated by minimizing an objective function such as Equation 7 using stochastic gradient descent.
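Stochastic gradient descent itself can be illustrated on a much simpler objective. The sketch below minimizes a per-sample squared error for a linear model; it stands in for, and does not reproduce, the objective of Equation 7. The function name, learning rate, epoch count, and data are hypothetical.

```python
import random


def sgd_least_squares(X, y, lr=0.1, epochs=300, seed=0):
    """Fit w by per-sample SGD on the loss 0.5 * (x . w - y)^2."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)                 # visit samples in random order
        for i in order:
            err = sum(a * b for a, b in zip(X[i], w)) - y[i]
            # The gradient of the per-sample loss with respect to w is err * x_i.
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
    return w


# Noise-free toy data generated by the weights (2, -1).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, -1.0, 1.0]
w = sgd_least_squares(X, y)
```

For the full model, the same update pattern would be applied to every weight matrix and to the feedback matrix, with gradients obtained by backpropagation through both the feed-forward and the feedback paths.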

Finally, an asymmetric multi-task feature learning model is generated by using the calculated parameter matrix as the parameters of the generated asymmetric multi-task feature network (S1440).

Accordingly, the method for generating a learning model according to the present embodiment can resolve the negative transfer caused by the symmetric influence of each task during feature learning. Because the method generates a learning model in which negative transfer has been resolved, it can generate a learning model that is close to the true model. The method of FIG. 14 may be executed on an electronic device having the configuration of FIG. 1 or FIG. 2, and may also be executed on electronic devices having other configurations.

In addition, the learning model generation method described above may be implemented as a program including an executable algorithm that can be run on a computer, and the program may be stored and provided in a non-transitory computer readable medium.

A non-transitory readable medium is not a medium that stores data for a short time, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a device. Specifically, programs for performing the various methods described above may be stored and provided in a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB, memory card, or ROM.

Although preferred embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described. Various modifications can be made by a person having ordinary skill in the art to which the present disclosure belongs without departing from the gist of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical idea or perspective of the present disclosure.

100: electronic device 110: memory
120: processor 130: communication interface
140: display 150: operation input unit

Claims

A method for generating a learning model in an electronic device, the method comprising:
receiving, by the electronic device, learning data;
generating, by the electronic device, an asymmetric multi-task feature network having a parameter matrix of a learning model that allows asymmetric knowledge transfer between tasks and a feedback matrix for feedback connection from task to feature;
calculating, by the electronic device, the parameter matrix of the asymmetric multi-task feature network using the input learning data so that a predetermined objective function is minimized; and
generating, by the electronic device, an asymmetric multi-task feature learning model using the calculated parameter matrix as a parameter of the generated asymmetric multi-task feature network.

The method of claim 1,
wherein the predetermined objective function comprises a loss function for the learning model allowing asymmetric knowledge transfer between tasks, an autoencoder term that uses the feedback matrix and induces a nonlinear combination of task parameters, and a parameter decay regularization term.

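The method of claim 1,
wherein the predetermined objective function is given by the following formula:

[Formula not reproduced in this text]

where [symbol missing] is the feedback matrix, [symbol missing] is the t-th row vector of [symbol missing], [symbol missing] is the average effectiveness loss of single-task learning for task t, [symbol missing] is a nonlinear function, and α, [remaining definitions missing].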

The method of claim 1,
wherein the asymmetric multi-task feature network has a plurality of hidden layers.

An electronic device comprising:
a memory in which learning data is stored; and
a processor configured to generate an asymmetric multi-task feature network having a parameter matrix of a learning model that allows asymmetric knowledge transfer between tasks and a feedback matrix for feedback connection from task to feature, to calculate the parameter matrix of the asymmetric multi-task feature network using the stored learning data so that a predetermined objective function is minimized, and to generate an asymmetric multi-task feature learning model using the calculated parameter matrix as a parameter of the generated asymmetric multi-task feature network.

The electronic device of claim 5,
wherein the predetermined objective function comprises a loss function for the learning model allowing asymmetric knowledge transfer between tasks, an autoencoder term that uses the feedback matrix and induces a nonlinear combination of task parameters, and a parameter decay regularization term.

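The electronic device of claim 5,
wherein the predetermined objective function is given by the following formula:

[Formula not reproduced in this text]

where [symbol missing] is the feedback matrix, [symbol missing] is the t-th row vector of [symbol missing], [symbol missing] is the average effectiveness loss of single-task learning for task t, [symbol missing] is a nonlinear function, and α, [remaining definitions missing].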

The electronic device of claim 5,
wherein the asymmetric multi-task feature network has a plurality of hidden layers.

A computer readable recording medium comprising a program for executing a method for generating a learning model in an electronic device,
wherein the learning model generation method comprises:
receiving learning data;
generating an asymmetric multi-task feature network having a parameter matrix of a learning model that allows asymmetric knowledge transfer between tasks and a feedback matrix for feedback connection from task to feature;
calculating the parameter matrix of the asymmetric multi-task feature network using the input learning data so that a predetermined objective function is minimized; and
generating an asymmetric multi-task feature learning model using the calculated parameter matrix as a parameter of the generated asymmetric multi-task feature network.