KR102139729B1

KR102139729B1 - Electronic apparatus and method for re-learning of trained model thereof

Info

Publication number: KR102139729B1
Application number: KR1020180010936A
Authority: KR
Inventors: 황성주; 윤재홍; 이정태; 양은호
Original assignee: 한국과학기술원
Priority date: 2017-06-09
Filing date: 2018-01-29
Publication date: 2020-07-31
Also published as: KR20180134738A; KR102102772B1; KR102139740B1; KR20180134740A; KR20180134739A

Abstract

A method for re-learning a learning model is disclosed. The re-learning method includes: receiving a learning model composed of a plurality of neurons and a data set including a new task; identifying, among the plurality of neurons, neurons related to the new task and selectively re-learning parameters related to the new task for the identified neurons; and, if the learning model on which the selective re-learning has been performed has a predetermined loss value, reconstructing the input learning model by dynamically expanding the size of the learning model on which the selective re-learning has been performed.

Description

ELECTRONIC APPARATUS AND METHOD FOR RE-LEARNING OF TRAINED MODEL THEREOF

The present disclosure relates to an electronic device and a method for re-learning a learning model, and more particularly, to an electronic device and a learning model generation method capable of performing partial learning only for the task corresponding to a new concept when the new concept is learned.

Lifelong learning is a field of continual learning and real-time transfer learning. Its ideal goal is that, when a new concept is learned, performance on previously learned concepts improves, while the previously acquired knowledge also helps in learning the new concept.

In conventional incremental learning over a time series, learning a new concept often causes previously learned concepts to be forgotten, which degrades overall performance. This problem is called semantic drift.

As a conventional solution to this problem, there are methods that learn a new concept by expanding the network while preserving the concepts already learned. In these methods, however, the network is expanded by a fixed size, so the computational cost rises sharply or the method cannot adapt actively to the state of the network model.

Accordingly, an object of the present disclosure is to provide an electronic device and a learning model generation method capable of performing partial learning only for the task corresponding to a new concept when the new concept is learned.

To achieve the above object, a method for re-learning a learning model according to the present disclosure includes: receiving a learning model composed of a plurality of neurons and a data set including a new task; identifying, among the plurality of neurons, neurons related to the new task and selectively re-learning parameters related to the new task for the identified neurons; and, if the learning model on which the selective re-learning has been performed has a predetermined loss value, reconstructing the input learning model by dynamically expanding the size of the learning model on which the selective re-learning has been performed.

In this case, the selective re-learning step may calculate a new parameter matrix that minimizes an objective function having a loss function for the input learning model and a regularization term for sparsity, and may identify the neurons related to the new task using the calculated new parameter matrix.

Meanwhile, the selective re-learning step may calculate a new parameter matrix, using the data set, for the network parameters composed only of the identified neurons, and may perform the selective re-learning by applying the calculated new parameter matrix to the identified neurons of the learning model.

Meanwhile, in the step of reconstructing the input learning model, if the learning model on which the selective re-learning has been performed has a predetermined loss value, a fixed number of neurons may be added to each layer of that learning model, and the input learning model may be reconstructed by removing unnecessary neurons from among the added neurons using group sparsity.

In this case, the step of reconstructing the input learning model may identify the unnecessary neurons among the added neurons using an objective function having a loss function for the input learning model, a regularization term for sparsity, and a group regularization term for group sparsity.

Meanwhile, in the step of reconstructing the input learning model, if the change in an identified neuron has a predetermined value, the input learning model may be expanded by duplicating the identified neuron, and the input learning model may be reconstructed such that the identified neuron retains its previous value.
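The flow summarized above can be outlined in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `relearn`, the injected callables (`identify`, `retrain`, `loss_fn`, `expand`) and the comparison `loss > loss_threshold` are hypothetical. The disclosure itself only states that expansion occurs when the re-learned model has a predetermined loss value.

```python
from typing import Any, Callable


def relearn(model: Any, dataset: Any,
            identify: Callable, retrain: Callable,
            loss_fn: Callable, expand: Callable,
            loss_threshold: float) -> Any:
    """Hypothetical outline: identify -> selectively re-learn -> expand if needed."""
    selected = identify(model, dataset)        # neurons related to the new task
    model = retrain(model, dataset, selected)  # update only those neurons
    # Assumption: "predetermined loss value" is read as "loss above a threshold".
    if loss_fn(model, dataset) > loss_threshold:
        model = expand(model, dataset)         # dynamically grow the network
    return model


# Toy stand-ins, used only to make the outline executable.
toy_model = {"neurons": 4, "loss": 0.9}
toy_steps = dict(
    identify=lambda m, d: [0, 2],
    retrain=lambda m, d, s: {**m, "loss": 0.6},
    loss_fn=lambda m, d: m["loss"],
    expand=lambda m, d: {**m, "neurons": m["neurons"] + 2, "loss": 0.1},
)
result = relearn(toy_model, None, loss_threshold=0.5, **toy_steps)
```

Passing the individual steps as callables keeps the outline independent of any particular network representation; a real implementation would operate on the actual weight matrices.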

Meanwhile, an electronic device according to an embodiment of the present disclosure includes: a memory storing a learning model composed of a plurality of neurons and a data set including a new task; and a processor that identifies, among the plurality of neurons, neurons related to the new task, selectively re-learns parameters related to the new task for the identified neurons, and, if the learning model on which the selective re-learning has been performed has a predetermined loss value, reconstructs the input learning model by dynamically expanding the size of the learning model on which the selective re-learning has been performed.

In this case, the processor may calculate a new parameter matrix that minimizes an objective function having a loss function for the input learning model and a regularization term for sparsity, and may identify the neurons related to the new task using the calculated new parameter matrix.

Meanwhile, the processor may calculate a new parameter matrix, using the data set, for the network parameters composed only of the identified neurons, and may perform the selective re-learning by applying the calculated new parameter matrix to the identified neurons of the learning model.

Meanwhile, if the learning model on which the selective re-learning has been performed has a predetermined loss value, the processor may add a fixed number of neurons to each layer of that learning model, and may reconstruct the input learning model by removing unnecessary neurons from among the added neurons using group sparsity.

In this case, the processor may identify the unnecessary neurons among the added neurons using an objective function having a loss function for the input learning model, a regularization term for sparsity, and a group regularization term for group sparsity.

Meanwhile, in a computer-readable recording medium including a program for executing a method for re-learning a learning model in an electronic device of the present disclosure, the method includes: receiving a learning model composed of a plurality of neurons and a data set including a new task; identifying, among the plurality of neurons, neurons related to the new task and selectively re-learning parameters related to the new task for the identified neurons; and, if the learning model on which the selective re-learning has been performed has a predetermined loss value, reconstructing the input learning model by dynamically expanding the size of the learning model on which the selective re-learning has been performed.

As described above, according to various embodiments of the present disclosure, when additional learning is performed on the basis of a previously pre-trained network model, the whole network is not trained again. Instead, the network is duplicated and split in units of the relevant task and only the additional part is trained, which saves training time and computation.

FIG. 1 is a block diagram showing a simple configuration of an electronic device according to an embodiment of the present disclosure;
FIG. 2 is a block diagram showing a detailed configuration of an electronic device according to an embodiment of the present disclosure;
FIG. 3 is a diagram for explaining a dynamic expansion method;
FIG. 4 is a diagram for explaining incremental learning in a dynamically expandable network;
FIG. 5 is a diagram showing an incremental learning algorithm in a dynamically expandable network;
FIG. 6 is a diagram showing an algorithm for selective re-learning;
FIG. 7 is a diagram showing an algorithm for the dynamic expansion method;
FIG. 8 is a diagram showing an algorithm for network split and duplication;
FIG. 9 is a diagram showing the average per-task performance for each learning model and each data set;
FIG. 10 is a diagram showing accuracy with respect to network size for each learning model and each data set;
FIG. 11 is a diagram showing the effect of selective retraining;
FIG. 12 is a diagram showing a semantic drift experiment on the MNIST-Variation data set;
FIG. 13 is a flowchart for explaining a learning model generation method according to an embodiment of the present disclosure;
FIG. 14 is a flowchart for explaining incremental learning of a dynamically expandable network according to an embodiment of the present disclosure; and
FIG. 15 is a flowchart for explaining the selective re-learning step of FIG. 14.

Terms used in this specification are briefly described first, and the present disclosure is then described in detail.

The terms used in the embodiments of the present disclosure are, as far as possible, general terms that are currently in wide use, selected in consideration of their functions in the present disclosure. They may nevertheless vary depending on the intention of those skilled in the art, legal precedent, the emergence of new technologies, and the like. In certain cases, terms have been arbitrarily chosen by the applicant, and in such cases their meaning is described in detail in the corresponding part of the description. Therefore, the terms used in the present disclosure should be defined on the basis of their meaning and the overall content of the present disclosure, not merely on their names.

Embodiments of the present disclosure may be modified in various ways and may take various forms; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the scope to the specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes falling within the disclosed spirit and technical scope. In describing the embodiments, detailed descriptions of related known technology are omitted where they might obscure the subject matter.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "include" or "consist of" are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In the embodiments of the present disclosure, a "module" or "unit" performs at least one function or operation, and may be implemented in hardware, in software, or in a combination of hardware and software. In addition, a plurality of "modules" or "units" may be integrated into at least one module and implemented with at least one processor, except for a "module" or "unit" that needs to be implemented with specific hardware.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can readily practice them. The present disclosure may, however, be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Hereinafter, the present disclosure is described in more detail with reference to the drawings.

FIG. 1 is a block diagram showing a simple configuration of an electronic device according to an embodiment of the present disclosure.

Referring to FIG. 1, the electronic device 100 may include a memory 110 and a processor 120. The electronic device 100 may be a PC, a notebook PC, a server, or the like capable of data computation.

The memory 110 stores a learning model composed of a plurality of neurons. Here, the learning model is a model trained using an artificial intelligence algorithm. The artificial intelligence algorithm may be a deep neural network (DNN), a deep convolutional neural network, a residual network, or the like. Such a learning model may be composed of a plurality of layers, that is, hierarchically.

The memory 110 may store a training data set for re-learning the learning model, and may also store data to be classified or recognized using the learning model.

In addition, the memory 110 may store a program required for re-learning the learning model, or may store the learning model re-learned by that program.

The memory 110 may be implemented as a storage medium inside the electronic device 100 or as an external storage medium, for example a removable disk including a USB memory, a storage medium connected to a host, or a web server accessed over a network.

The processor 120 controls each component in the electronic device 100. Specifically, when a boot command is input by the user, the processor 120 may perform booting using the operating system stored in the memory 110.

The processor 120 may receive, through the manipulation input unit 150 described later, a selection of the learning model to be re-learned, and may receive through the manipulation input unit 150 various parameters for re-learning the selected learning model. The parameters received here may be hyperparameters or the like.

Upon receiving this information, the processor 120 identifies, among the plurality of neurons, the neurons related to the new task. Specifically, the processor 120 may calculate a new parameter matrix that minimizes an objective function having a loss function for the input learning model and a regularization term for sparsity, and may identify the neurons related to the new task using the calculated new parameter matrix. Here, the objective function is a function having a loss function and a regularization term for sparsity, and may be expressed as in Equation 2. The details of the objective function are described later with reference to FIG. 3.
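As a rough illustration of how a sparsity regularizer can single out the relevant neurons, the following Python sketch fits the output weights of a new task with an L1 penalty and treats the neurons with non-zero weights as those related to the task. It is a simplified stand-in, not Equation 2 itself: the squared loss, the single linear output layer, the ISTA solver, and all names (`fit_sparse_output_weights`, `identify_neurons`, `mu`) are assumptions made for this example.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def fit_sparse_output_weights(H, y, mu=0.1, steps=500):
    """Minimise (1/2n)||H w - y||^2 + mu * ||w||_1 with ISTA.

    H: (n, d) activations of the d neurons feeding the new task's output.
    y: (n,) targets of the new task.
    """
    n, d = H.shape
    w = np.zeros(d)
    # Step size 1/L, where L is the Lipschitz constant of the smooth part.
    lr = n / np.linalg.norm(H, 2) ** 2
    for _ in range(steps):
        grad = H.T @ (H @ w - y) / n
        w = soft_threshold(w - lr * grad, lr * mu)
    return w


def identify_neurons(w, tol=1e-8):
    """Indices of neurons whose weight to the new task is non-zero."""
    return np.flatnonzero(np.abs(w) > tol)


rng = np.random.default_rng(0)
H = rng.normal(size=(200, 8))
y = 2.0 * H[:, 1] - 3.0 * H[:, 5]   # toy task that depends on neurons 1 and 5 only
w_new = fit_sparse_output_weights(H, y)
selected = identify_neurons(w_new)
```

In a multi-layer network the same idea could be continued downward, selecting in each lower layer the neurons connected to an already selected neuron through a non-zero weight; that traversal is omitted here.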

The processor 120 then selectively re-learns the parameters related to the new task for the identified neurons. Specifically, the processor 120 may calculate a new parameter matrix, using the data set, for the network parameters composed only of the identified neurons so as to minimize an objective function such as Equation 3 described later, and may perform the selective re-learning by applying the calculated new parameter matrix to the identified neurons of the learning model.
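A minimal Python sketch of this selective update is shown below. It assumes a single linear layer whose column j holds the incoming weights of neuron j, a squared loss, and plain gradient descent; these choices, and the names `selective_retrain` and `selected`, are illustrative assumptions and do not reproduce Equation 3.

```python
import numpy as np


def selective_retrain(W, X, Y, selected, lr=0.1, steps=500):
    """Gradient descent on (1/2n)||X W - Y||_F^2, updating only selected neurons.

    W: (d_in, d_out) weights; column j holds the incoming weights of neuron j.
    selected: indices of the neurons identified as related to the new task.
    """
    W = W.copy()
    n = X.shape[0]
    for _ in range(steps):
        grad = X.T @ (X @ W - Y) / n
        # Only the identified neurons are updated; all other columns stay frozen.
        W[:, selected] -= lr * grad[:, selected]
    return W


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
W_true = rng.normal(size=(4, 3))
Y = X @ W_true
W0 = np.zeros((4, 3))
W1 = selective_retrain(W0, X, Y, selected=[0, 2])
```

Because the frozen columns never receive an update, the cost of each step scales with the size of the selected subnetwork rather than with the whole layer, which is the efficiency argument made in the text.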

The processor 120 may then reconstruct the selectively re-learned learning model. Specifically, if the learning model on which the selective re-learning has been performed has a predetermined loss value, the processor 120 may reconstruct the input learning model by dynamically expanding the size of the learning model on which the selective re-learning has been performed.

More specifically, if the learning model on which the selective re-learning has been performed has a predetermined loss value, the processor 120 may add a fixed number of neurons to each layer of that learning model, and may reconstruct the input learning model by removing unnecessary neurons from among the added neurons using group sparsity. In doing so, the processor 120 may identify the unnecessary neurons among the added neurons using an objective function such as Equation 4, which has a loss function for the input learning model, a regularization term for sparsity, and a group regularization term for group sparsity.
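The add-then-prune idea can be sketched in Python as follows. The sketch treats the incoming weights of each added neuron as one group and applies the proximal operator of the group (L2) norm, which sets whole groups to zero so that the corresponding neurons can be dropped. It is an illustration under assumptions, not Equation 4: the helper names (`expand_layer`, `group_shrink`, `prune_dead_neurons`), the one-column-per-neuron layout, and the stand-in values used in place of actual training are all hypothetical.

```python
import numpy as np


def expand_layer(W, k, rng, scale=0.01):
    """Append k candidate neurons (columns) with small random incoming weights."""
    extra = rng.normal(scale=scale, size=(W.shape[0], k))
    return np.hstack([W, extra])


def group_shrink(W, t):
    """Proximal operator of t * sum_j ||W[:, j]||_2 (one group per neuron)."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    factor = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * factor


def prune_dead_neurons(W, n_old):
    """Keep every old neuron; keep an added neuron only if its group is non-zero."""
    keep_new = np.linalg.norm(W[:, n_old:], axis=0) > 0
    keep = np.concatenate([np.ones(n_old, dtype=bool), keep_new])
    return W[:, keep]


rng = np.random.default_rng(0)
W = np.array([[1.0, 0.0], [0.0, 1.0]])      # layer with 2 existing neurons
n_old = W.shape[1]
W = expand_layer(W, k=3, rng=rng)           # add a fixed number k of candidates
# Stand-in for the values the candidates would take after training on the new task.
W[:, n_old:] = np.array([[3.0, 0.1, 0.0],
                         [4.0, 0.2, 0.3]])
W[:, n_old:] = group_shrink(W[:, n_old:], t=0.5)
W = prune_dead_neurons(W, n_old)            # candidates shrunk to zero are removed
```

In an actual training loop the shrinkage step would typically be interleaved with gradient steps on the loss, so that only candidates that help the new task survive.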

Alternatively, the processor 120 may compute Equation 5, described later, to calculate the change in each identified neuron. If the change in an identified neuron has a predetermined value, the processor 120 may expand the input learning model by duplicating the identified neuron, and may reconstruct the input learning model such that the identified neuron retains its previous value.
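The duplication step can be illustrated with the following Python sketch. It assumes that the change of a neuron is measured as the L2 distance between its incoming weights before and after re-learning and that duplication is triggered when this distance exceeds a threshold `sigma`; both are assumptions made for the example (Equation 5 is not reproduced here), as is the function name `split_drifted_neurons`.

```python
import numpy as np


def split_drifted_neurons(W_old, W_new, sigma):
    """Duplicate neurons whose weights moved by more than sigma.

    W_old, W_new: (d_in, d_out) weights before and after re-learning,
    one column per neuron. Returns the expanded matrix and the drifted indices.
    """
    drift = np.linalg.norm(W_new - W_old, axis=0)   # change per neuron
    drifted = np.flatnonzero(drift > sigma)
    W = W_new.copy()
    W[:, drifted] = W_old[:, drifted]               # original neuron keeps its old value
    copies = W_new[:, drifted]                      # duplicate carries the re-learned value
    return np.hstack([W, copies]), drifted


W_old = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
W_new = np.array([[1.0, 0.1, 2.0],
                  [0.0, 1.0, 2.0]])
W_split, drifted = split_drifted_neurons(W_old, W_new, sigma=1.0)
```

Restoring the old value in the original neuron preserves what earlier tasks relied on, while the appended copy serves the new task.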

The processor 120 may perform various kinds of processing, such as vision recognition, speech recognition, and natural language processing, using the re-learned learning model. Specifically, if the learning model relates to image classification, the processor 120 may classify an input image using the re-learned learning model and the input image.

As described above, when extending to a new task, that is, when performing re-learning, the electronic device 100 according to the present embodiment re-learns only the neurons that are meaningful to that task rather than all of the weights, so that efficient re-learning is possible. In addition, because only the meaningful neurons are re-learned, semantic drift can be prevented.

Although only a simple configuration of the electronic device has been illustrated and described above, various additional components may be provided in an actual implementation. This is described below with reference to FIG. 2.

FIG. 2 is a block diagram showing a detailed configuration of an electronic device according to an embodiment of the present disclosure.

Referring to FIG. 2, the electronic device 100 may include a memory 110, a processor 120, a communication unit 130, a display 140, and a manipulation input unit 150.

The operations of the memory 110 and the processor 120 have been described with reference to FIG. 1, so redundant description is omitted.

The communication unit 130 is connected to other electronic devices and may receive a learning model and/or training data from them. The communication unit 130 may also receive information that requires classification or evaluation, and may provide the classification and evaluation results to other electronic devices.

The communication unit 130 is formed to connect the electronic device 100 to an external device. It may be connected to a terminal device through a local area network (LAN) or the Internet, and may also be connected through a USB (Universal Serial Bus) port or a wireless communication port (for example, WiFi 802.11a/b/g/n, NFC, or Bluetooth).

The display 140 displays various information provided by the electronic device 100. Specifically, the display 140 may display a user interface window for selecting the various functions provided by the electronic device 100. The user interface window may include items for selecting the learning model to be re-learned or for entering the parameters to be used in the re-learning process.

The display 140 may be a monitor such as an LCD, a CRT, or an OLED, and may also be implemented as a touch screen that simultaneously performs the function of the manipulation input unit 150 described later.

In addition, the display 140 may display information about the results of a test performed using the learning model. For example, if the learning model is a model that classifies images, the display 140 may display the classification result for an input image.

The manipulation input unit 150 may receive, from the user, the training data on which re-learning is to be performed and the various parameters to be used in the re-learning process.

The manipulation input unit 150 may be implemented with a plurality of buttons, a keyboard, a mouse, or the like, and may also be implemented as a touch screen that simultaneously performs the function of the display 140 described above.

Although FIGS. 1 and 2 have been illustrated and described with the electronic device 100 including only one processor, the electronic device may include a plurality of processors, and a GPU may be used as well as a general-purpose CPU. Specifically, the optimization operation described above may be performed using a plurality of GPUs.

Hereinafter, the method of changing the learning model is described in detail.

Lifelong learning is a field of transfer learning. Its main purpose is to use the knowledge from earlier tasks to improve performance, or to obtain faster convergence and training speed in the models for later tasks.

There are various approaches to this problem. In the present disclosure, however, lifelong learning is considered in a deep learning setting in order to exploit the capability of deep neural networks. Fortunately, in deep learning, storing or transferring knowledge can be done in a straightforward manner through the learned network weights. The learned weights can serve as the knowledge about existing tasks, and a new task can leverage this knowledge simply by sharing those weights.

Lifelong learning can therefore be regarded as a special case of online learning or incremental learning in deep neural networks.

점진적 학습을 수행하는 다양한 방법이 있는데, 가장 단순한 방법은 새로운 학습 데이터로 네트워크를 계속 학습함으로써 네트워크를 새로운 태스크로 점진적으로 미세 조정하는 것이다. There are various ways to do incremental learning, the simplest is to gradually refine the network to new tasks by continuing to train the network with new learning data.

그러나 네트워크를 단순하게 재학습하면 새로운 태스크와 이전 태스크 모두에서 성능이 저하될 수 있다. 만약, 새로운 태스크가 이전의 태스크와 크게 다른 경우, 예를 들어, 이전 태스크는 동물의 이미지를 분류하는 것인데 새로운 태스크는 자동차의 이미지를 분리하는 것이라면, 이전 태스크의 특징 학습은 새로운 태스크 학습에 유용하지 않다. 동시에, 새로운 태스크에 대한 재학습은 원래 태스크에 벗어나서 더는 최적의 태스크가 아니게되어, 기존 태스크에 부정적인 영향을 미치게 된다. However, simply retraining the network can degrade performance in both new and old tasks. If the new task is significantly different from the previous task, for example, if the previous task is to classify the animal image and the new task is to separate the car image, learning the features of the previous task is not useful for learning the new task. not. At the same time, re-learning a new task deviates from the original task and is no longer an optimal task, negatively affecting the existing task.

For example, a feature that describes the stripe pattern of a zebra may change substantially in meaning once it is retrained for a later classification task that classifies striped T-shirts or fences.

We therefore considered how to ensure that sharing knowledge through the network benefits all tasks in online/incremental learning of deep neural networks. Recent work proposes regularization that prevents large changes in the parameters. Such an approach, however, must find a good solution for the new task while also keeping the parameters of the old tasks from changing, and these two requirements compete.

Accordingly, in this disclosure the network is trained at each task t so that it exploits the previously trained network and changes only the relevant part of it, while being allowed to expand in size when necessary. In this way, each task t may use a subnetwork different from those of the previous tasks while still sharing a considerable part of that subnetwork with related tasks.

Meanwhile, the following points must be considered in order to perform incremental deep learning with selective parameter sharing and dynamic layer expansion.

1) Achieving scalability and efficiency in training: as the network grows, later tasks establish connections to a much larger network, so the training cost per task also grows. A method is therefore needed to keep the computational overhead of retraining low.

2) Deciding when to expand the network and how many neurons to add: if the existing network explains the new task sufficiently, it does not need to grow. Conversely, if the task is very different from the existing ones, many neurons may have to be added. The model therefore needs to add only the necessary number of neurons dynamically.

3) Preventing semantic drift, or catastrophic forgetting, in which the network moves away from its initial configuration and performance on earlier examples/tasks degrades. Because the network is partially retrained to fit later tasks, and because newly added neurons are connected to earlier subnetworks and can therefore affect earlier tasks negatively, a mechanism is needed to prevent potential semantic drift.

To address these points, this disclosure proposes a novel deep network model together with an efficient and effective incremental learning algorithm. The approach is referred to as Dynamically Expandable Networks (DEN).

In a lifelong learning scenario, DEN efficiently learns to predict a new task by making maximal use of the network learned on all previous tasks, while dynamically increasing the network capacity by adding or splitting neurons when necessary. The algorithm is applicable to general networks, including convolutional networks.

Hereinafter, conventional lifelong learning methods and the learning method according to the present embodiment are described with reference to FIG. 3.

FIG. 3 is a diagram for explaining various re-learning models.

FIG. 3A shows a retraining model such as Elastic Weight Consolidation. This model retrains the entire network learned on the previous tasks while applying regularization to prevent a large deviation from the original model. Retrained units and weights are drawn with dotted lines, and fixed units and weights with solid lines.

FIG. 3B shows a non-retraining model such as Progressive Network. This model expands the network for a new task t while keeping the network weights for the existing tasks unchanged.

FIG. 3C shows the learning model according to the present disclosure. This model selectively retrains the existing network and expands its size only when necessary, and thus dynamically determines the optimal capacity during training.

Hereinafter, incremental learning of a dynamically expandable network according to the present disclosure is described.

Consider the problem of incrementally training a deep neural network in a lifelong learning scenario, where an unknown number of tasks with unknown distributions of training data arrive at the model sequentially. Specifically, the goal is to learn a model for a sequence of T tasks, t = 1, ..., t, ..., T, where T may be unbounded and the task at time t comes with training data $\mathcal{D}_t = \{x_i, y_i\}_{i=1}^{N_t}$.

Although the method generalizes to any kind of task, for simplicity a binary classification task is considered, that is, $y \in \{0, 1\}$ for an input feature $x \in \mathbb{R}^d$. The main challenge in the lifelong learning setting is that the training data sets of the previous tasks, up to t-1, are not available at the current time t. Only the model parameters for the previous tasks are accessible.

At time t, the lifelong learning agent aims to learn the model parameters $W^t$ by solving the following problem:

$$\underset{W^t}{\text{minimize}}\ \ \mathcal{L}(W^t; W^{t-1}, \mathcal{D}_t) + \lambda\,\Omega(W^t), \quad t = 1, \ldots$$

Here $\mathcal{L}$ is the task-specific loss function, $W^t$ is the parameter for task t, and $\Omega(W^t)$ is a regularizer (for example, the element-wise $\ell_2$ norm) that constrains the model $W^t$ appropriately. L is the number of layers (l = 1, 2, ..., L), $\mathcal{D}$ is the data, and $W_l^t$ is the weight tensor at layer l for task t.

To meet these challenges of lifelong learning, the network makes maximal use of the knowledge obtained from the previous tasks and is allowed to expand its size dynamically when the accumulated knowledge alone cannot sufficiently explain the new task. In other words, the optimal solution is sought while the network size is adjusted dynamically. When additional training is performed on top of a previously trained network model, the whole network is not retrained; instead, the part of the network that corresponds to the task is duplicated or split and only that additional part is trained, which saves training time and computation.

Hereinafter, the incremental learning process is described with reference to FIGS. 4 and 5.

FIG. 4 is a diagram for explaining incremental learning in a dynamically expandable network, and FIG. 5 is a diagram showing the incremental learning algorithm of the dynamically expandable network.

Referring to FIGS. 4 and 5, selective retraining is performed first. Specifically, the neurons relevant to the new task are identified, and the network parameters associated with that task are selectively retrained. The detailed operation of selective retraining is described later with reference to FIG. 6.

Next, the network is expanded dynamically. Specifically, if selective retraining fails to bring the loss below a set threshold, the network capacity is expanded in a top-down manner, while all unnecessary neurons are removed using group sparsity regularization. The dynamic expansion operation is described later with reference to FIG. 7.

Finally, the network is split or duplicated. Specifically, DEN computes the drift $\rho_i^t$ of each unit to identify the units that have moved too far from their original values during training, and duplicates them. The detailed operation of network split and duplication is described later with reference to FIG. 8.
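The order of the three phases can be summarized in a few lines of control flow. This is only a sketch under stated assumptions: the function name `den_phases`, its arguments, and the phase labels are hypothetical and chosen for illustration, and the actual training in each phase is omitted.

```python
def den_phases(loss_after_selective, loss_threshold, drifts, drift_threshold):
    """Return which DEN phases would run for one task, and which units would be split.

    loss_after_selective: loss reached by selective retraining (assumed given).
    drifts: per-unit drift values, assumed to be computed elsewhere.
    """
    phases = ["selective_retraining"]  # always performed first
    # Expand only when selective retraining did not reach the desired loss.
    if loss_after_selective > loss_threshold:
        phases.append("dynamic_expansion")
    # Split/duplicate only the units whose drift exceeds the threshold.
    to_split = [i for i, rho in enumerate(drifts) if rho > drift_threshold]
    if to_split:
        phases.append("split_and_duplicate")
    return phases, to_split
```

For instance, a task whose loss stays above the threshold and whose second unit drifts strongly would go through all three phases, whereas a closely related task would stop after selective retraining.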

Hereinafter, the selective retraining operation is described with reference to FIG. 6.

FIG. 6 is a diagram showing the algorithm for selective retraining.

The simplest way to train a model on a sequence of tasks is to retrain the entire model every time a new task arrives. Such retraining, however, is very costly for a deep neural network. It is therefore preferable to retrain the model selectively, retraining only the weights affected by the new task. To this end, in this disclosure the network is trained with $\ell_1$ regularization to promote sparsity in the weights, so that each neuron is connected to only a few neurons in the layer below:

$$\underset{W^{t}}{\text{minimize}}\ \ \mathcal{L}(W^{t}; \mathcal{D}_t) + \mu \sum_{l=1}^{L} \lVert W_l^{t} \rVert_1$$

Here $1 \le l \le L$ denotes the l-th layer of the network, $W_l^t$ is the network parameter at layer l, and $\mu$ is the regularization parameter of the element-wise $\ell_1$ norm for sparsity of $W$, which determines the strength of the regularizer when it is added to the loss. For convolutional layers, the (2,1)-norm is applied on the filters so that only filters of the previous layer are selected.

Because $\ell_1$ regularization yields sparse connectivity among neurons, the computational overhead can be reduced greatly if attention is restricted to the subnetwork connected to the new task. To this end, when a new task t arrives at the model, a sparse linear model is first fitted to predict task t from the topmost hidden units of the neural network, by solving Equation 2:

$$\underset{W_{L,t}^{t}}{\text{minimize}}\ \ \mathcal{L}(W_{L,t}^{t}; W_{1:L-1}^{t-1}, \mathcal{D}_t) + \mu \lVert W_{L,t}^{t} \rVert_1$$

Here $W_{1:L-1}^{t-1}$ denotes the set of all parameters other than $W_{L,t}^{t}$. That is, solving this optimization yields the connections between the output unit $o_t$ and the hidden units at layer L-1, with all other parameters up to layer L-1 fixed to $W^{t-1}$. Once the sparse connections at this layer are built, the units and weights in the network that are affected by the training can be identified. In particular, a breadth-first search on the network, starting from the selected nodes, identifies all units (and input features) that have a path to $o_t$. Then only the weights $W_S^t$ of the selected subnetwork S are trained, by solving Equation 3:

$$\underset{W_{S}^{t}}{\text{minimize}}\ \ \mathcal{L}(W_{S}^{t}; W_{S^c}^{t-1}, \mathcal{D}_t) + \mu \lVert W_{S}^{t} \rVert_2$$

The element-wise $\ell_2$ regularizer is used here because the sparse connections have already been established. Since neurons that are not selected are unaffected by the retraining, this selective retraining lowers the computational overhead and helps to avoid negative transfer.

Referring to FIG. 6, l and S are initialized first, and $W_{L,t}^{t}$ is obtained using Equation 2.

Then, for $W_{L,t}^{t}$, the weights for task t at layer L, neuron i is added to S if the weight between neuron i and the output unit $o_t$ is not zero.

In addition, neuron i is added to S if there exists some neuron $j \in S$ such that $W_{l,ij}^{t-1} \neq 0$.

Finally, $W_S^t$ is obtained using Equation 3.
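The selection steps of FIG. 6 can be sketched as a layer-by-layer breadth-first traversal that starts at the output unit and follows nonzero weights downward. This is a minimal sketch, not the disclosed implementation: the function name `select_subnetwork` and the nested-list weight layout are assumptions made for illustration, and the retraining of the selected weights (Equation 3) is left out.

```python
def select_subnetwork(weights, output_unit):
    """Find the units that have a nonzero-weight path to `output_unit`.

    weights[l][i][j] is the weight from unit i of layer l to unit j of
    layer l + 1 (layer 0 is the input layer). This layout is an assumption.
    Returns a list of sets; selected[l] holds the selected units of layer l.
    """
    n_layers = len(weights)
    selected = [set() for _ in range(n_layers + 1)]
    selected[n_layers].add(output_unit)
    # Walk from the top layer down; a unit is selected when it feeds
    # at least one already-selected unit through a nonzero weight.
    for l in range(n_layers - 1, -1, -1):
        for i, row in enumerate(weights[l]):
            if any(row[j] != 0 for j in selected[l + 1]):
                selected[l].add(i)
    return selected
```

With sparse weights, the returned sets are typically much smaller than the full layers, which is where the savings of selective retraining come from.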

FIG. 7 is a diagram showing the algorithm for the dynamic expansion method.

If the new task is highly relevant to the previous tasks, or if the partial knowledge obtained from each of them is sufficient to explain the new task, selective retraining alone is enough to perform the new task.

However, if the learned features are not sufficient to represent the new task, additional neurons must be introduced into the network to account for the features that the new task requires. Adding a fixed number of units without regard to task difficulty, or requiring repeated forward passes, is undesirable in terms of both performance and network size.

Therefore, in this disclosure group sparsity regularization is used to decide dynamically how many neurons to add at each layer. Assuming that the l-th layer of the network is expanded by a fixed number k of units, the two parameter matrices are expanded as $W_l^t = [W_l^{t-1}; W_l^{\mathcal{N}}]$ and $W_{l-1}^t = [W_{l-1}^{t-1}; W_{l-1}^{\mathcal{N}}]$, where $W^{\mathcal{N}}$ denotes the weights of the added neurons. Depending on how closely the new task is related to the previous tasks, it is not always desirable to add all k units, so group sparsity regularization is applied to the added parameters as in Equation 4:

$$\underset{W_{l}^{\mathcal{N}}}{\text{minimize}}\ \ \mathcal{L}(W_{l}^{\mathcal{N}}; W_{l}^{t-1}, \mathcal{D}_t) + \mu \lVert W_{l}^{\mathcal{N}} \rVert_1 + \gamma \sum_{g} \lVert W_{l,g}^{\mathcal{N}} \rVert_2$$

Here $g \in \mathcal{G}$ is a group defined as the incoming weights of each neuron. For convolutional layers, each group is defined as the activation map of a convolutional filter. Group sparsity regularization of this kind has been used to find the right number of neurons for an entire network; in this disclosure it is applied only to a part of the network. FIG. 7 shows this algorithm.

Referring to FIG. 7, if the loss $\mathcal{L}$ obtained by selective retraining is greater than a preset loss value $\tau$, the network is expanded by a fixed number of units, and the unnecessary neurons among the added ones are removed through group sparsity. This operation can be performed repeatedly.
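The pruning half of this step can be illustrated as follows. The sketch assumes that training with the group sparsity regularizer has already driven the incoming weights of useless added units to (near) zero, and it only shows how such units would then be dropped. The function name `prune_added_units`, the tolerance `tol`, and the list-based layout are hypothetical.

```python
import math

def prune_added_units(incoming, n_old, tol=1e-8):
    """Return the indices of units to keep after group-sparse training.

    incoming[j] is the list of incoming weights of unit j (one group per unit).
    Units with index < n_old existed before expansion and are always kept;
    an added unit is kept only if the l2 norm of its group exceeds `tol`.
    """
    kept = []
    for j, group in enumerate(incoming):
        norm = math.sqrt(sum(w * w for w in group))
        if j < n_old or norm > tol:
            kept.append(j)
    return kept
```

Only the newly added units are candidates for removal here, which mirrors the statement that the regularizer is applied to the added parameters rather than to the whole network.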

FIG. 8 is a diagram showing the algorithm for network split and duplication.

A key challenge in lifelong learning is semantic drift, or catastrophic forgetting, in which the initial tasks are forgotten because of tasks learned later, and performance degrades as a result. A simple way to prevent semantic drift is to use $\ell_2$ regularization so that the parameters do not deviate too much from their original values:

$$\underset{W^{t}}{\text{minimize}}\ \ \mathcal{L}(W^{t}; \mathcal{D}_t) + \lambda \lVert W^{t} - W^{t-1} \rVert_2^2$$

Here t is the current task, $W^{t-1}$ is the weight tensor of the network trained for the tasks $\{1, \ldots, t-1\}$, and $\lambda$ is the regularization parameter.

This $\ell_2$ regularization forces the solution $W^t$ to stay close to $W^{t-1}$, to a degree given by $\lambda$. If $\lambda$ is small, the network learns to reflect the new task more while forgetting the old tasks; if $\lambda$ is large, $W^t$ tries to preserve as much of the knowledge learned on the previous tasks as possible.
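The effect of the regularization parameter can be seen in a one-dimensional toy problem. Assuming, purely for illustration, a quadratic new-task loss $(w - a)^2$ and the penalty $\lambda (w - w_{prev})^2$, setting the derivative to zero gives the closed-form minimizer $w = (a + \lambda w_{prev}) / (1 + \lambda)$. The function name and the scalar setting below are assumptions, not part of the disclosure.

```python
def l2_anchored_optimum(new_task_optimum, previous_weight, lam):
    """Minimizer of (w - a)^2 + lam * (w - w_prev)^2 for scalars.

    a = new_task_optimum is the weight the new task alone would prefer;
    w_prev = previous_weight is the weight learned on earlier tasks.
    """
    return (new_task_optimum + lam * previous_weight) / (1.0 + lam)
```

With `lam = 0` the result equals the new-task optimum (the old task is ignored), and as `lam` grows the result approaches the previous weight, matching the trade-off described above.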

Instead of plain $\ell_2$ regularization, each element can also be weighted by its Fisher information. Nevertheless, if the number of tasks is very large, or if the later tasks are semantically different from the previous ones, it becomes difficult to find a solution that is good for both the previous and the new tasks.

In such a case, it is better to split the feature so that each copy is optimized for one of the two different tasks. After Equation 5 is solved, the amount of semantic drift $\rho_i^t$ of each hidden unit i is measured as the $\ell_2$ distance between its incoming weights at t-1 and at t.

If $\rho_i^t > \sigma$, the meaning of the feature is considered to have changed significantly during training, and the neuron is split into two copies. This can be done in parallel for all hidden neurons. After the neurons are duplicated, the weights can be trained again using Equation 5. Because the first round of training already provides a reasonable parameter initialization, this second round converges quickly. FIG. 8 shows this algorithm.

Referring to FIG. 8, after the preceding dynamic network expansion, if the change of an updated neuron exceeds the threshold, the corresponding neurons (B) are duplicated to expand the network, and the updated neurons (B) are restored to their previous state (A).
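The split and duplication step can be sketched as below. The sketch measures the drift of each hidden unit as the $\ell_2$ distance between its incoming weights before and after training, restores strongly drifted units to their previous weights, and appends their updated weights as new units. The function name `split_drifted_units` and the list-based weight layout are assumptions made for illustration; the follow-up retraining is omitted.

```python
import math

def split_drifted_units(w_prev, w_new, sigma):
    """Split units whose drift exceeds `sigma`.

    w_prev[i] / w_new[i]: incoming weights of hidden unit i before / after
    training on the current task. Returns (restored, duplicates), where
    restored[i] is the weight vector kept at position i and duplicates
    holds the updated weights of the split units, to be added as new units.
    """
    restored = [list(w) for w in w_new]
    duplicates = []
    for i, (old, new) in enumerate(zip(w_prev, w_new)):
        rho = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, old)))
        if rho > sigma:
            restored[i] = list(old)      # unit returns to its previous state (A)
            duplicates.append(list(new)) # updated copy (B) becomes a new unit
    return restored, duplicates
```

Each unit is examined independently, so the loop could be run in parallel over all hidden units, as the description notes.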

Meanwhile, in both the network expansion and the network split procedures, each newly added unit j is timestamped by setting $z_j = t$, which stores the training stage t at which the unit was added to the network. This effectively prevents semantic drift caused by the introduction of new concepts.

Specifically, at inference time each task uses only the parameters that were introduced up to its own stage t, so that an old task cannot use new hidden units added later in the training process. This is more flexible than fixing the weights learned up to each training stage, because an early task can still benefit from what is learned in later tasks through units that were trained further but not split.
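Timestamped inference amounts to masking units by their timestamp. A minimal sketch, assuming the timestamps are kept in a plain list indexed by unit (the function name `usable_units` is hypothetical):

```python
def usable_units(timestamps, task):
    """Indices of the units that task `task` may use at inference time.

    timestamps[j] is the training stage at which unit j was added;
    a task only uses units introduced up to its own stage.
    """
    return [j for j, z in enumerate(timestamps) if z <= task]
```

For example, with timestamps `[1, 1, 2, 3]`, task 2 would use units 0, 1 and 2 and ignore the unit added at stage 3.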

Hereinafter, the effects of the selective re-learning method according to the present disclosure are described with reference to FIGS. 9 to 12.

The baselines compared in the experiments, and their settings, are described first.

1) DNN-STL (Single-Task Learning): a base deep neural network trained separately for each task.

2) DNN-MTL (Multi-Task Learning): a base deep neural network trained for all tasks at once.

3) DNN-L2: a base deep neural network in which, at each task t, $W^t$ is initialized to $W^{t-1}$ and training continues with SGD under $\ell_2$ regularization between $W^t$ and $W^{t-1}$.

4) DNN-EWC: a deep network trained with elastic weight consolidation for regularization.

5) DNN-Progressive: a progressive network in which the network weights for each task remain fixed for tasks that arrive later.

6) DEN: the re-learning method of the present disclosure.

Basic network settings.

1) Feedforward network: a two-layer network with 312 and 128 neurons, respectively, and ReLU activations was used. The regularization parameter $\lambda$ was set to 0.1. The sparsity parameter $\mu$ of Equation 2 was set to [value missing in the source text]. k = 10 was used as the number of units added for each task, the group sparsity regularization parameter $\gamma$ of Equation 4 was set to 0.01, and the $\ell_2$ distance threshold $\sigma$ for network split and duplication was set to [value missing in the source text].

2) Convolutional network: LeNet with two convolutional layers and three fully connected layers was used. Here $\lambda = 0.01$ was used for the $\ell_2$ regularization. The sparsity parameter $\mu$, the group sparsity parameter $\gamma$, and the split/duplication thresholds $\sigma$ for the convolutional and fully connected layers were also set, but their values are missing in the source text.

All models and algorithms were implemented using the TensorFlow library. The data sets used are described below.

1) MNIST-Variation. This data set consists of 62000 images of handwritten digits from 0 to 9.

2) CIFAR-10. This data set consists of 60000 images of generic objects, including vehicles and animals. Each class has 6000 images of size 32x32, of which 5000 were used for training and the rest for testing.

3) AWA. This data set consists of 30475 images of 50 animals.

Hereinafter, the quantitative evaluation using the models and data sets described above is presented. Specifically, the models were validated for both prediction accuracy and efficiency.

FIG. 9 is a diagram showing the average per-task performance of each model on each data set. Specifically, FIG. 9A shows the performance of each model over the number of tasks on the MNIST-Variation data set, FIG. 9B on the CIFAR-10 data set, and FIG. 9C on the AWA data set.

Referring to FIG. 9, CNN-MTL and DNN-STL show the best performance on CIFAR-10 and AWA, respectively. This is expected, since they are either trained to perform all tasks at once or trained to be optimal for each task.

The other models, in contrast, are trained online, which can cause semantic drift. When the number of tasks is small, MTL works best because of the knowledge sharing through multi-task learning; when the number of tasks is large, STL works better because it has a larger learning capacity than MTL.

DEN performs almost on par with these models, and even outperforms them on data sets such as MNIST-Variation.

The retraining models combined with regularization, such as L2 and EWC, perform relatively poorly on all data sets.

The progressive network works better than these two, but its performance is considerably worse on AWA. This may be because the large number of tasks makes it difficult to find an appropriate network size. If the network is too small, it lacks the learning capacity to represent new tasks; if it is too large, it tends to overfit.

As FIG. 9 shows, although DEN according to the present disclosure is a lifelong learning method, when it learns new concepts it achieves performance close to, or better than, that of independent learning (STL) or simultaneous learning (MTL).

FIG. 10 is a diagram showing the performance difference according to network size. Specifically, FIG. 10A shows the network size of each model on the MNIST-Variation data set, FIG. 10B on the CIFAR-10 data set, and FIG. 10C on the AWA data set.

Referring to FIG. 10, DEN according to the present disclosure obtains much better performance than the progressive network with far fewer parameters, or significantly better performance with a similar number of parameters. DEN also reaches the same level of performance as STL using only 18.0%, 19.1% and 11.9% of its size on MNIST-Variation, CIFAR-10 and AWA, respectively.

This shows the main advantage of DEN, namely that it finds the optimal size dynamically: it learns a very compact model on MNIST-Variation while learning a substantially larger network on AWA. Fine-tuning DEN on all tasks (DEN-Finetune) yields the best-performing model on all data sets. DEN is therefore useful not only for lifelong learning but also for estimating the network size when all tasks are available from the beginning.

In addition, retraining with the network size found by DEN gives the highest performance on all data sets. This confirms that the learning model according to the present disclosure can be used not only in lifelong learning scenarios but also to estimate the size of the network when a plurality of tasks is learned at once from the start.

FIG. 11 is a diagram showing the effect of selective retraining. Specifically, FIG. 11A shows the processing time by size, FIG. 11B shows the AUC, and FIG. 11C shows the expansion performance.

We examined how efficient and effective selective retraining is by measuring the training speed and accuracy for each task of the MNIST-Variation data set. For this purpose, a model without network expansion, referred to as DNN-Selective, was compared with DNN-L2.

Referring to FIGS. 11A and 11B, DNN-Selective shows only a slight loss of accuracy compared with retraining the full network while achieving a 6.51-fold speed-up, which confirms that selective retraining is far more efficient than retraining the entire network.

Referring to FIG. 11C, DNN-L1 also outperforms the full network trained without sparse regularization, because a sparse network eliminates a large part of the computation in the forward and backward passes. It is, however, not as efficient as DNN-Selective.

Hereinafter, the effect of network expansion is examined with reference to FIG. 12.

FIG. 12 is a diagram showing the semantic drift experiment on the MNIST-Variation data set. Specifically, FIG. 12A shows the training time for each task measured in CPU time, FIG. 12B shows the average per-task performance of the compared models, and FIG. 12C shows the expansion performance.

To examine the effectiveness of network expansion, a variant of the model that performs selective retraining and layer expansion but no network split is compared with the other models.

DNN-Dynamic obtains a significantly better average AUC than all other models, including DNN-Constant, while increasing the size of the network far less than DNN-Constant does. This may be because a smaller number of parameters is not only beneficial for training efficiency but also helps prevent the model from overfitting.

FIG. 12 shows how the performance of the model on tasks t = 1, t = 4, and t = 7 changes at each training stage t.

It can be seen that DNN-L2 prevents semantic drift of the model learned in the early stages, but its performance on later tasks (t = 4, 7) gradually deteriorates.

The same phenomenon appears in DNN-EWC. This result is to be expected, because each task in the MNIST-Variation data set may require features that differ significantly from those of the previous tasks. DNN-Progressive shows no semantic drift on previous tasks, which is also expected because it does not retrain the parameters of previous tasks.

Meanwhile, timestamping is more effective than DNN-Progressive on later tasks, with a slight performance degradation over time. Finally, the full DEN model with timestamping shows no sign of performance degradation at any learning stage, while significantly outperforming DNN-Progressive.

These results indicate that DEN is very effective at preventing semantic drift.

Looking at the per-task performance trend as learning proceeds in a lifelong learning scenario, the existing technique EWC tends to lose performance through semantic drift as the number of learned objects grows. The existing progressive technique fixes the previously learned network and expands the network linearly each time a new object is learned, so performance is maintained, but it does not solve the problem that computation time increases exponentially as the network expands. The present technique maintains consistent performance through a series of processes such as selective expansion, splitting, and re-learning of the network, which enables efficient network expansion and thereby minimizes the increase in computation time.

FIG. 13 is a flowchart illustrating a method of re-learning a learning model according to an embodiment of the present disclosure.

A learning model composed of a plurality of neurons and a data set including a new task are received (S1310).

Among the plurality of neurons, the neurons associated with the new task are identified, and the parameters associated with the new task are selectively re-learned for the identified neurons (S1320). Specifically, a new parameter matrix is calculated so as to minimize an objective function that has a loss function for the input learning model and a regularization term for sparsity, and the neurons associated with the new task are identified using the calculated new parameter matrix. Then, a new parameter matrix is calculated, using the data set, for the network parameters consisting only of the identified neurons, and selective re-learning may be performed by reflecting the calculated new parameter matrix in the identified neurons of the learning model.
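To illustrate why a sparsity regularization term makes the relevant neurons identifiable, the following is a minimal sketch, not the patented procedure. It assumes the sparse solution is obtained with a proximal (soft-thresholding) step, which the source does not specify; `soft_threshold`, `nonzero_units`, and `lam` are hypothetical names introduced here.

```python
import math
from typing import List


def soft_threshold(weights: List[float], lam: float) -> List[float]:
    """Proximal step for an L1 penalty: shrink each weight toward zero by lam.

    Weights whose magnitude is at most lam become exactly zero, which is
    what lets a later step treat "nonzero" as "relevant to the new task".
    (Assumed optimizer detail; the source only states the objective.)
    """
    return [math.copysign(max(abs(w) - lam, 0.0), w) for w in weights]


def nonzero_units(weights: List[float]) -> List[int]:
    """Indices of units that keep a nonzero connection after shrinkage."""
    return [i for i, w in enumerate(weights) if w != 0.0]


# Hypothetical weights from four hidden units to the new task's output.
shrunk = soft_threshold([0.05, -0.5, 0.3, -0.02], lam=0.1)
relevant = nonzero_units(shrunk)
```

In this toy case only the second and third units survive the shrinkage, so only they would be passed on to the selective re-learning step.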

Then, if the learning model on which the selective re-learning has been performed has a predetermined loss value, the input learning model is reconstructed by dynamically expanding the size of that learning model (S1330). Specifically, if the selectively re-learned model has the predetermined loss value, a fixed number of neurons is added to each layer of the model, and unnecessary neurons among the added neurons are removed using group sparsity, thereby reconstructing the input learning model.

Alternatively, if the change in an identified neuron has a predetermined value, the input learning model may be reconstructed by duplicating the identified neuron to expand the input learning model and restoring the identified neuron to its existing value.

Therefore, when a task is added, that is, when re-learning is performed, the method of re-learning a learning model according to the present embodiment re-learns only the neurons that are meaningful to that task rather than all of the weights, so efficient re-learning is possible. In addition, because only the meaningful neurons are re-learned, semantic drift can be prevented. The re-learning method shown in FIG. 13 may be executed on an electronic device having the configuration of FIG. 1 or FIG. 2, and may also be executed on an electronic device having another configuration.

In addition, the method of re-learning a learning model described above may be implemented as a program including an algorithm executable on a computer, and the program may be stored and provided on a non-transitory computer readable medium.

A non-transitory readable medium is a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only briefly, such as a register, cache, or memory. Specifically, the programs for performing the various methods described above may be stored and provided on a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB, memory card, or ROM.

FIG. 14 is a flowchart illustrating incremental learning of a dynamically expandable network according to an embodiment of the present disclosure.

Referring to FIG. 14, t is first initialized to 1.

Then, if t is less than T (S1410-Y) and t equals 1 (S1420-Y), W^1 is learned using Equation 1 (S1425).

When t is not 1, selective re-learning is performed (S1430). Specifically, the neurons associated with the new task are identified, and the network parameters related to that task are selectively re-learned. The detailed operation of the selective re-learning is described later with reference to FIG. 15.

Then, it is checked whether the loss of the selectively re-learned model has the predetermined loss value (S1435).

If the loss of the learning model has the predetermined loss value (S1435-Y), that is, if selective retraining fails to bring the loss below the set threshold, the network size is expanded in a top-down manner while all unnecessary neurons are removed using group sparsity regularization (S1440). The dynamic expansion operation is described later with reference to FIG. 16.

Then, if the change in an identified neuron has a predetermined value, the network is split or duplicated (S1445). Specifically, DEN calculates the drift of each unit, and identifies and duplicates the units that have deviated too far from their original values during training. The detailed operation of splitting and duplicating the network is described later with reference to FIG. 17.

Thereafter, the value of t is incremented (S1450), and the steps described above are repeated until t reaches T.
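The control flow of FIG. 14 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the four callables, `loss_threshold`, and the returned trace are hypothetical, the loop is assumed to include the final task T, and the drift check of S1445 is assumed to happen inside `split`.

```python
from typing import Callable, List


def incremental_learning(
    num_tasks: int,
    train_first: Callable[[int], None],
    selective_retrain: Callable[[int], float],
    expand: Callable[[int], None],
    split: Callable[[int], None],
    loss_threshold: float,
) -> List[str]:
    """Walk tasks t = 1..num_tasks and return a trace of the steps taken."""
    trace: List[str] = []
    t = 1
    while t <= num_tasks:                    # S1410 (assumed inclusive of T)
        if t == 1:                           # S1420
            train_first(t)                   # S1425: learn W^1
            trace.append("t=1:train")
        else:
            loss = selective_retrain(t)      # S1430
            trace.append(f"t={t}:retrain")
            if loss > loss_threshold:        # S1435: loss still too high
                expand(t)                    # S1440: dynamic expansion
                trace.append(f"t={t}:expand")
            split(t)                         # S1445: split/duplicate if drifted
            trace.append(f"t={t}:split")
        t += 1                               # S1450
    return trace


# Hypothetical per-task losses returned by selective re-learning.
losses = {2: 0.1, 3: 0.9}
trace = incremental_learning(
    3, lambda t: None, lambda t: losses[t], lambda t: None, lambda t: None, 0.5
)
```

With these toy losses, only the third task triggers expansion, because its loss after selective re-learning stays above the threshold.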

Therefore, when a task is added, that is, when re-learning is performed, the incremental learning according to the present embodiment re-learns only the neurons that are meaningful to that task rather than all of the weights, so efficient re-learning is possible. In addition, because only the meaningful neurons are re-learned, semantic drift can be prevented. The incremental learning shown in FIG. 14 may be executed on an electronic device having the configuration of FIG. 1 or FIG. 2, and may also be executed on an electronic device having another configuration.

In addition, the incremental learning described above may be implemented as a program including an algorithm executable on a computer, and the program may be stored and provided on a non-transitory computer readable medium.

FIG. 15 is a flowchart illustrating the selective re-learning step of FIG. 14.

Referring to FIG. 15, l and S are initialized, and the weight matrix of Equation 2 described above is calculated (S1510).

Then, if the weight between neuron i and O_t in the layer-L weights for task t is not 0 (S1520), neuron i is added to S (the subnetwork) (S1530).

Thereafter, while the layer index is greater than 0 (S1540), if a neuron in S that satisfies the condition of step S1550 exists (S1550), neuron i is added to S (S1560).

Thereafter, the process moves to the upper layer (S1570), and the operation of searching for neurons that satisfy condition 1 and condition 2 described above is performed repeatedly.

This process is performed layer by layer. When the neurons requiring re-learning have been identified for all layers (S1550-N), the re-learned parameters of the identified subnetwork S are obtained using Equation 3.
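The layer-by-layer identification of FIG. 15 can be sketched as a walk that starts at the new task's output unit and follows nonzero weights. This is a minimal illustration under assumptions: the list-of-lists weight layout, the traversal from the output toward the input, and the names `identify_subnetwork`, `hidden_weights`, and `top_weights` are hypothetical and not taken from the source.

```python
from typing import List, Set

Matrix = List[List[float]]


def identify_subnetwork(hidden_weights: List[Matrix],
                        top_weights: List[float]) -> List[Set[int]]:
    """Return, per hidden layer, the units connected to the new task's output.

    hidden_weights[l][i][j] is the weight from unit i of hidden layer l to
    unit j of hidden layer l + 1; top_weights[i] connects unit i of the last
    hidden layer to the output unit of the new task (assumed layout).
    """
    num_layers = len(hidden_weights) + 1
    selected: List[Set[int]] = [set() for _ in range(num_layers)]
    # Condition 1: a nonzero weight to the task output (S1520, S1530).
    selected[-1] = {i for i, w in enumerate(top_weights) if w != 0.0}
    # Condition 2: a nonzero weight to an already selected unit (S1550, S1560).
    for l in range(num_layers - 2, -1, -1):
        for i, row in enumerate(hidden_weights[l]):
            if any(row[j] != 0.0 for j in selected[l + 1]):
                selected[l].add(i)
    return selected


# Toy network: 3 units in the first hidden layer, 2 in the second.
hidden = [[[0.5, 0.0], [0.0, 0.3], [0.0, 0.0]]]
top = [0.0, 1.2]
subnetwork = identify_subnetwork(hidden, top)
```

Only the units collected in `subnetwork` would then be retrained, which is what keeps the cost below that of retraining every weight.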

Therefore, when a task is added, that is, when re-learning is performed, the selective re-learning according to the present embodiment re-learns only the neurons that are meaningful to that task rather than all of the weights, so efficient re-learning is possible. In addition, because only the meaningful neurons are re-learned, semantic drift can be prevented. The selective re-learning shown in FIG. 15 may be executed on an electronic device having the configuration of FIG. 1 or FIG. 2, and may also be executed on an electronic device having another configuration.

In addition, the selective re-learning described above may be implemented as a program including an algorithm executable on a computer, and the program may be stored and provided on a non-transitory computer readable medium.

Hereinafter, the dynamic network expansion step of FIG. 14 is described.

First, the loss value achieved by the preceding selective re-learning is calculated.

If the calculated loss value is greater than the reference value, the network is expanded by a fixed size. Specifically, a fixed number of units h^N may be added to every layer.

Thereafter, Equation 4 is calculated and, while the current layer is not 0, the operation of removing unnecessary neurons from among the added neurons is performed repeatedly.

Then, the process moves up one layer and repeats the calculation described above, thereby optimizing the set of neurons.
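The removal of unnecessary added neurons can be sketched as follows for a single layer. This is a minimal illustration under assumptions: it presumes that training with the group sparsity term has already driven the incoming weight vector of each useless new unit to (near) zero, and the names `prune_added_units`, `incoming`, `num_added`, and `tol` are hypothetical.

```python
import math
from typing import List


def prune_added_units(incoming: List[List[float]],
                      num_added: int,
                      tol: float = 1e-8) -> List[int]:
    """Return the indices of the units to keep in one layer after expansion.

    incoming[j] is the incoming weight vector (one group) of unit j, and the
    last num_added entries are the newly added units (assumed layout).
    """
    num_old = len(incoming) - num_added
    keep = list(range(num_old))          # existing units are never removed here
    for j in range(num_old, len(incoming)):
        group_norm = math.sqrt(sum(w * w for w in incoming[j]))
        if group_norm > tol:             # the group survived the sparsity penalty
            keep.append(j)
    return keep


# Two existing units followed by two added units; the first added unit is all zeros.
layer = [[0.2, -0.1], [0.0, 0.4], [0.0, 0.0], [0.3, 0.0]]
kept = prune_added_units(layer, num_added=2)
```

Applying the same check to every expanded layer leaves each layer with only as many new units as the new task actually uses.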

Therefore, the dynamic network expansion according to the present embodiment can expand the network dynamically when necessary, so that a new task is reflected more accurately. In addition, rather than simply adding neurons to all layers in a batch, it also deletes unnecessary neurons, so an optimized network expansion is possible. The dynamic network expansion may be executed on an electronic device having the configuration of FIG. 1 or FIG. 2, and may also be executed on an electronic device having another configuration.

In addition, the dynamic network expansion described above may be implemented as a program including an algorithm executable on a computer, and the program may be stored and provided on a non-transitory computer readable medium.

Hereinafter, the network split and duplication step of FIG. 14 is described.

After the preceding dynamic network expansion operation, W^t_{L,K} is calculated.

Then, the amount of change (semantic drift) is calculated for every hidden neuron, and it is determined whether the calculated change of each neuron exceeds the reference value.

If the calculated change of a neuron exceeds the reference value, the corresponding neurons (B) are duplicated to expand the network, and the updated neurons (B) are restored to their values from the previous stage (A).

This check is performed repeatedly for all hidden neurons.
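The split and duplication step can be sketched as follows. This is a minimal illustration under assumptions: the drift of a unit is taken to be the Euclidean distance between its weight vectors before and after training on the new task (the source's drift formula is not reproduced in this text), and `split_drifted_units`, `old`, `new`, and `threshold` are hypothetical names.

```python
import math
from typing import List, Tuple

Vector = List[float]


def split_drifted_units(old: List[Vector],
                        new: List[Vector],
                        threshold: float) -> Tuple[List[Vector], List[int]]:
    """Duplicate the units whose weights drifted by more than threshold.

    Returns the rebuilt list of weight vectors (original units first, copies
    appended at the end) and the indices of the units that were duplicated.
    """
    rebuilt: List[Vector] = []
    copies: List[Vector] = []
    duplicated: List[int] = []
    for i, (w_old, w_new) in enumerate(zip(old, new)):
        drift = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_new, w_old)))
        if drift > threshold:
            rebuilt.append(list(w_old))  # original unit returns to stage (A)
            copies.append(list(w_new))   # the copy carries the updated value (B)
            duplicated.append(i)
        else:
            rebuilt.append(list(w_new))  # small change: keep the updated value
    return rebuilt + copies, duplicated


before = [[1.0, 0.0], [0.0, 1.0]]
after = [[1.0, 0.1], [3.0, 5.0]]
weights, dup = split_drifted_units(before, after, threshold=1.0)
```

Here the second unit drifts by 5.0, so it is restored to its earlier weights and a copy holding the new weights is appended, while the first unit simply keeps its small update.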

Therefore, the network split and duplication method according to the present embodiment splits and duplicates the network when the change in a neuron is greater than the reference value, so semantic drift can be prevented. The network split and duplication method may be executed on an electronic device having the configuration of FIG. 1 or FIG. 2, and may also be executed on an electronic device having another configuration.

In addition, the network split and duplication method described above may be implemented as a program including an algorithm executable on a computer, and the program may be stored and provided on a non-transitory computer readable medium.

Although preferred embodiments of the present disclosure have been shown and described above, the present disclosure is not limited to the specific embodiments described. Various modifications may be made by those of ordinary skill in the art to which the disclosure pertains without departing from the gist of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present disclosure.

100: electronic device 110: memory
120: processor 130: communication interface
140: display 150: operation input unit

Claims

A method of re-learning a learning model in an electronic device, the method comprising:
receiving, by the electronic device, a learning model composed of a plurality of neurons and a data set including a new task;
identifying, by the electronic device, a neuron associated with the new task among the plurality of neurons, and selectively re-learning parameters related to the new task for the identified neuron; and
reconstructing, by the electronic device, the input learning model by dynamically expanding the size of the learning model on which the selective re-learning has been performed, if the learning model on which the selective re-learning has been performed has a predetermined loss value.

The method of claim 1,
wherein the selectively re-learning comprises
calculating a new parameter matrix such that an objective function having a loss function for the input learning model and a regularization term for sparsity is minimized, and identifying the neuron associated with the new task using the calculated new parameter matrix.

The method of claim 1,
wherein the selectively re-learning comprises
calculating a new parameter matrix, using the data set, for network parameters consisting only of the identified neuron, and performing the selective re-learning by reflecting the calculated new parameter matrix in the identified neuron of the learning model.

The method of claim 1,
wherein the reconstructing of the input learning model comprises,
if the learning model on which the selective re-learning has been performed has the predetermined loss value, adding a fixed number of neurons to each layer of that learning model, and removing unnecessary neurons among the added neurons using group sparsity, thereby reconstructing the input learning model.

The method of claim 4,
wherein the reconstructing of the input learning model comprises
identifying the unnecessary neurons among the added neurons using an objective function having a loss function for the input learning model, a regularization term for sparsity, and a group regularization term for group sparsity.

The method of claim 1,
wherein the reconstructing of the input learning model comprises,
if the change in the identified neuron has a predetermined value, duplicating the identified neuron to expand the input learning model, and causing the identified neuron to retain its existing value, thereby reconstructing the input learning model.

An electronic device comprising:
a memory storing a learning model composed of a plurality of neurons and a data set including a new task; and
a processor configured to identify a neuron associated with the new task among the plurality of neurons, selectively re-learn parameters related to the new task for the identified neuron, and, if the learning model on which the selective re-learning has been performed has a predetermined loss value, reconstruct the learning model stored in the memory by dynamically expanding the size of the learning model on which the selective re-learning has been performed.

The electronic device of claim 7,
wherein the processor
calculates a new parameter matrix such that an objective function having a loss function for the learning model stored in the memory and a regularization term for sparsity is minimized, and identifies the neuron associated with the new task using the calculated new parameter matrix.

The electronic device of claim 7,
wherein the processor
calculates a new parameter matrix, using the data set, for network parameters consisting only of the identified neuron, and performs the selective re-learning by reflecting the calculated new parameter matrix in the identified neuron of the learning model.

The electronic device of claim 7,
wherein the processor,
if the learning model on which the selective re-learning has been performed has the predetermined loss value, adds a fixed number of neurons to each layer of that learning model, and removes unnecessary neurons among the added neurons using group sparsity, thereby reconstructing the input learning model.

The electronic device of claim 10,
wherein the processor
identifies the unnecessary neurons among the added neurons using an objective function having a loss function for the learning model stored in the memory, a regularization term for sparsity, and a group regularization term for group sparsity.

The electronic device of claim 7,
wherein the processor,
if the change in the identified neuron has a predetermined value, duplicates the identified neuron to expand the learning model stored in the memory, and causes the identified neuron to retain its existing value, thereby reconstructing the input learning model.

A computer-readable recording medium comprising a program for executing a method of re-learning a learning model in an electronic device,
wherein the method of re-learning the learning model comprises:
receiving a learning model composed of a plurality of neurons and a data set including a new task;
identifying a neuron associated with the new task among the plurality of neurons, and selectively re-learning parameters associated with the new task for the identified neuron; and
reconstructing the input learning model by dynamically expanding the size of the learning model on which the selective re-learning has been performed, if the learning model on which the selective re-learning has been performed has a predetermined loss value.