KR20210057996A

KR20210057996A - Multi-task learning classifier learning apparatus and the method thereof

Info

Publication number: KR20210057996A
Application number: KR1020190144866A
Authority: KR
Inventors: 차정원; 성수진
Original assignee: 창원대학교 산학협력단
Priority date: 2019-11-13
Filing date: 2019-11-13
Publication date: 2021-05-24

Abstract

The present invention relates to an apparatus and method for training a multi-task learning classifier in which network parameters are updated appropriately. The apparatus includes: a preprocessing unit that preprocesses an input corpus; a task-specific feature extraction unit that extracts features for each of one or more tasks from the preprocessed input corpus; a task-shared feature extraction unit that extracts features common to all tasks; a feature synthesis unit that combines the features extracted by the task feature extraction units; a task classification unit that predicts label probabilities for a task from the combined features and calculates a loss value; an auxiliary classification unit that, each time a layer block of the task-specific feature extraction unit finishes, takes only the features extracted by that layer block as input, predicts label probabilities for the task, and calculates a loss value; and a loss value synthesis unit that combines the loss values of the task classification unit and the auxiliary classification unit and then performs backpropagation.

Description

Multi-task learning classifier learning apparatus and method thereof

The present invention relates to an apparatus for training a multi-task learning classifier, and more particularly to an apparatus and method for training a multi-task learning classifier that allow network parameters to be updated appropriately, so that training can proceed while minimizing the vanishing gradient problem.

Many machine learning algorithms used for natural language processing take a set of features as input, and their performance improves when they find the feature weights that give the best results. However, designing new features for each natural language processing module and finding the combination of features that gives optimal performance takes considerable time and effort.

To address this problem, natural language processing research based on artificial intelligence and deep learning is being carried out. To improve the training of such models, it is advantageous to extract and use as many features as possible from the training data. Feature extraction is generally known to be learned through the many layers of a deep network.

However, the more layers a network has (that is, the deeper it is), the smaller the loss signal becomes as it propagates backward, until it no longer fully reaches the early layers.

As a result, there is a risk of the vanishing gradient problem, in which parameter updates are not performed properly. That is, in deep learning a neural network can learn a variety of nonlinear relationships by including multiple hidden layers, but the large number of layers increases the amount of computation required for training, and the error on real data increases.
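The shrinking of the backward signal with depth can be illustrated with a small numeric sketch. It assumes sigmoid activations, unit weights, and every pre-activation at 0 (where the sigmoid derivative peaks at 0.25); none of these choices comes from the patent, and `gradient_scale` is a hypothetical helper name.

```python
import math


def sigmoid_derivative(x: float) -> float:
    """Derivative of the logistic sigmoid at x."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def gradient_scale(depth: int, pre_activation: float = 0.0) -> float:
    """Factor applied to a gradient after passing back through `depth`
    sigmoid layers (assumption: unit weights, same pre-activation everywhere)."""
    return sigmoid_derivative(pre_activation) ** depth


# The factor shrinks geometrically as layers are added.
for depth in (1, 4, 8, 16):
    print(depth, gradient_scale(depth))
```

Even in this best case for the sigmoid, each extra layer multiplies the backward signal by at most 0.25, which is why the earliest layers of a deep stack may receive almost no update.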

Accordingly, an object of the present invention is to solve the above problem by providing an apparatus and method for training a multi-task learning classifier that analyze an input corpus, calculate label probabilities for multiple tasks, and repeat training by backpropagation, so that network parameters are updated appropriately and the vanishing gradient problem is minimized.

To achieve this object, the present invention provides a multi-task learning classifier learning apparatus comprising: a preprocessing unit that preprocesses an input corpus; a task feature extraction unit consisting of a task-specific feature extraction unit that extracts features for each of one or more tasks from the preprocessed input corpus and a task-shared feature extraction unit that extracts features common to all tasks; a feature synthesis unit that combines the features extracted by the task feature extraction unit; a task classification unit that predicts label probabilities for a task from the features combined by the feature synthesis unit and calculates a loss value; an auxiliary classification unit that, each time a layer block of the task-specific feature extraction unit finishes, takes only the features extracted by that layer block as input, predicts label probabilities for the task, and calculates a loss value; and a loss value synthesis unit that combines the loss values of the task classification unit and the auxiliary classification unit and then performs backpropagation.

The task-specific feature extraction unit is composed of multiple layer blocks.

One auxiliary classification unit is provided for each layer block of the task-specific feature extraction unit.

According to another aspect of the present invention, there is provided a multi-task learning classifier learning method comprising: preprocessing an input corpus; extracting specific features and shared features from the preprocessed corpus; combining the extracted specific features and shared features; calculating a probability value and a loss value of a label for a task using the combined features; calculating an auxiliary probability value and an auxiliary loss value of a label for a task using only the specific features; and updating network parameters by combining the calculated loss values and performing backpropagation based on the combined loss value.

The preprocessing step includes removing special characters from the input corpus, morphological analysis, and syllable separation.

A predetermined weight is assigned to each of the specific features.

The auxiliary probability value is the probability value of the labels for a specific feature, calculated from the weights assigned to the specific features.

The loss value is calculated by comparing the probability values of the labels for the task with the preset probability value of the correct-answer label.

The loss value for the combined features is used to output a prediction result after training of the network is complete.

The auxiliary loss value operates only during network training.

According to the multi-task learning classifier learning apparatus and method of the present invention described above, the network is trained on the input corpus and outputs probability values for the labels of multiple tasks, so that network parameters are updated appropriately.

Accordingly, the invention has the effect of preventing the vanishing gradient problem, which can arise when the parameters of some network layers are not updated.

FIG. 1 is a block diagram showing a multi-task learning classifier learning apparatus according to a preferred embodiment of the present invention.
FIG. 2 is a flowchart of a multi-task learning classifier learning method according to a preferred embodiment of the present invention.

The objects and effects of the present invention, and the technical configurations for achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. In describing the present invention, detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the invention.

The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or practice of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

The present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art; the invention is defined only by the scope of the claims.

The present invention is described in more detail below with reference to the embodiment shown in the drawings.

As shown in FIG. 1, the learning apparatus (100) includes a preprocessing unit (110) that removes special characters from the input corpus and performs preprocessing such as morphological analysis and syllable separation. The purpose of the preprocessing unit (110) is to make it easy to extract various features from the input corpus. In the embodiment, the unit of the corpus may vary, for example a sentence, a paragraph, or a document.

According to an embodiment of the present invention, there may be two or more tasks depending on the user's settings, and each task is independent.

A task feature extraction unit (120) is provided to extract features for these tasks. The task feature extraction unit (120) is composed of multiple layer blocks and may include a task-specific feature extraction unit (122) and a task-shared feature extraction unit (124).

The task-specific feature extraction unit (122) is a block that extracts features for each task from the input corpus. That is, from the input sentence it extracts only the features for one particular task among the target output tasks. For example, if the input corpus contains features relating to movie genre or rating, a feature extraction unit may be provided for each of genre and rating.

The features extracted by the task-specific feature extraction unit (122) are meaningful for classifying the corresponding task but may act as noise for other tasks.

The task-shared feature extraction unit (124) is a block that extracts, from the input corpus, features that can be used in common by all tasks. The features extracted by the task-shared feature extraction unit (124) are used for the classification of every task and are not tied to any particular task.

A feature synthesis unit (130) is provided to combine the output of the task-specific feature extraction unit (122) with the output of the task-shared feature extraction unit (124).

A task classification unit (140) is provided to classify the target label of each task from the output of the feature synthesis unit (130). The task classification unit (140) obtains probability values for the labels of the task and compares them with the correct-answer label to output a loss value. It performs this operation on the results of the task feature extraction unit (120) as combined by the feature synthesis unit (130).

According to the present invention, an auxiliary classification unit (150) is also provided. Unlike the task classification unit (140), it takes only the result of the task-specific feature extraction unit (122) as input and calculates an auxiliary probability value for the labels of the corresponding task. The auxiliary classification unit (150) compares the auxiliary probability value with the probability value of the correct-answer label and outputs an auxiliary loss value. There are as many auxiliary classification units as there are extraction layer blocks in the task-specific feature extraction unit (122).

A loss value synthesis unit (160) is provided to combine the loss values of the task classification unit (140) and the auxiliary classification unit (150). The loss value synthesis unit (160) runs independently for each task. The final loss value combined by the loss value synthesis unit (160) is passed back through the network layers in sequence by backpropagation, and each layer block inside the task-specific feature extraction unit (122) also receives, through the auxiliary classification unit (150) connected to the loss value synthesis unit (160), a loss value that has not passed through the upper layer blocks.

In this embodiment, the loss value is the difference between the predicted label probabilities calculated by the task classification unit (140) or the auxiliary classification unit (150) and the probability of the correct-answer label.

The network training process using the multi-task learning classifier learning apparatus configured as above is described next. FIG. 2 is a flowchart of the network training process. As shown, the input corpus is read (s100), and the preprocessing unit (110) then performs preprocessing (s102).

In this embodiment, the movie review sentence "통쾌한 액션씬이 빛남!! 음향쪽이 아쉽긴 하지만 ‥ 전반적으로 만족스러움" ("The exhilarating action scenes shine!! The sound is a little disappointing, but ‥ overall satisfying") is used as an example of the input corpus, and the network is to classify the rating "4 points" and the movie genre "Action/Adventure".

So that as many features as possible can be extracted from the input corpus, the preprocessing unit (110) removes special characters such as "!!" and "‥", performs morphological analysis, syllable separation, and the like, and then converts the result into vectors.
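The preprocessing step can be sketched as follows. This is a minimal illustration, not the patent's implementation: whitespace splitting stands in for a real morphological analyzer, the integer index mapping stands in for the unspecified vectorization, and the function name `preprocess` and the character classes kept by the regular expression are assumptions.

```python
import re
from typing import Dict, List


def preprocess(text: str) -> Dict[str, object]:
    """Remove special characters, tokenize, split into syllables, index tokens."""
    # Keep Hangul syllables, Latin letters, digits and whitespace; drop the rest.
    cleaned = re.sub(r"[^\uAC00-\uD7A3A-Za-z0-9\s]", " ", text)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()

    # Assumption: whitespace tokens stand in for morphological analysis.
    tokens: List[str] = cleaned.split()

    # Each Hangul character is one syllable, so syllable separation is per character.
    syllables: List[str] = [ch for ch in cleaned if not ch.isspace()]

    # Assumption: a simple vocabulary index stands in for the vector conversion.
    vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
    token_ids = [vocab[tok] for tok in tokens]

    return {"cleaned": cleaned, "tokens": tokens,
            "syllables": syllables, "token_ids": token_ids}


review = "통쾌한 액션씬이 빛남!! 음향쪽이 아쉽긴 하지만 ‥ 전반적으로 만족스러움"
result = preprocess(review)
print(result["tokens"])
```

In practice the tokenizer would be replaced by a morphological analyzer and the indices by learned embeddings; the sketch only shows the order of the operations named in the text.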

The preprocessed input corpus is passed to the task feature extraction unit (120), which performs feature extraction (s104).

Feature extraction handles specific features and shared features separately. The task-specific feature extraction unit (122) extracts features for the rating and for the movie genre from the preprocessed input corpus, extracting features separately for each task. For example, when extracting rating features, it extracts words such as "아쉽긴" ("disappointing") and "만족스러움" ("satisfying") and assigns them higher weights than other words; when extracting genre features, it extracts "액션씬" ("action scene") and assigns it a higher weight.

As the other extraction unit, the task-shared feature extraction unit (124) extracts from the preprocessed input corpus all shared features related to both movie genre and rating, such as "통쾌" ("exhilarating") and "빛남" ("shines").

Once the task-specific feature extraction unit (122) and the task-shared feature extraction unit (124) have extracted all features, the feature synthesis unit (130) combines their results (s106). The outputs of the extraction units are matrices, so the results can be combined with matrix operations such as 'Concatenation' and 'Multiply'.
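The two combination operators named above can be sketched in plain Python. The text does not say whether 'Multiply' is an element-wise or a matrix product; the sketch assumes element-wise, and the function names and the toy matrices are illustrative only.

```python
from typing import List

Matrix = List[List[float]]


def concatenate(specific: Matrix, shared: Matrix) -> Matrix:
    """Join the two feature matrices along the feature (column) axis."""
    if len(specific) != len(shared):
        raise ValueError("both matrices need the same number of rows")
    return [s_row + h_row for s_row, h_row in zip(specific, shared)]


def multiply(specific: Matrix, shared: Matrix) -> Matrix:
    """Element-wise product (assumption: 'Multiply' is not a matrix product)."""
    if len(specific) != len(shared) or any(
        len(s_row) != len(h_row) for s_row, h_row in zip(specific, shared)
    ):
        raise ValueError("both matrices need identical shapes")
    return [
        [a * b for a, b in zip(s_row, h_row)]
        for s_row, h_row in zip(specific, shared)
    ]


specific_features = [[1.0, 2.0], [3.0, 4.0]]  # toy output of unit 122
shared_features = [[5.0, 6.0], [7.0, 8.0]]    # toy output of unit 124

print(concatenate(specific_features, shared_features))
print(multiply(specific_features, shared_features))
```

Concatenation keeps both feature sets intact and doubles the width, whereas the element-wise product keeps the width and lets the shared features gate the specific ones; which is preferable depends on the classifier that follows.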

The output of the feature synthesis unit (130) is passed to the task classification unit (140). The task classification unit (140) calculates the probability values for the labels of each task and compares them with the probability value of the correct-answer label to obtain a loss value (s108). The classification result of the task classification unit (140) is also used to output predictions after network training is complete.

Meanwhile, the auxiliary classification unit (150) takes only the result of the task-specific feature extraction unit (122) as input, calculates an auxiliary probability value for the labels of the corresponding task, and compares it with the probability value of the correct-answer label to calculate an auxiliary loss value (s110). Specifically, it calculates the label probabilities for movie genre and rating from the weights assigned by the task-specific feature extraction unit (122), compares them with the probability value of the correct-answer label, and calculates the auxiliary loss value using the cross-entropy method.
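The probability and loss calculation described above can be sketched with a softmax followed by cross-entropy against a one-hot correct-answer label. The softmax is an assumption (the text names only cross-entropy), and the logits and the five-class rating setup are made-up illustration values.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw classifier scores into label probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], target_index: int) -> float:
    """Cross-entropy against a one-hot label: -log p(correct label)."""
    return -math.log(probs[target_index])


# Hypothetical scores for ratings 1..5; the correct label is "4 points" (index 3).
rating_logits = [0.1, 0.3, 1.2, 2.0, 0.4]
rating_probs = softmax(rating_logits)
rating_loss = cross_entropy(rating_probs, target_index=3)
print(rating_probs, rating_loss)
```

The same two functions would serve both the task classification unit and each auxiliary classification unit; only the input features differ.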

Each time a layer block finishes, the auxiliary classification unit (150) takes the features extracted by that layer block as input, predicts the label probabilities for the task, and calculates a loss value. The auxiliary classification units operate only during network training and do not affect the prediction process.
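The control flow of one auxiliary head per layer block, active only in training, can be sketched as below. The blocks and heads are toy stand-ins (a halving function and a sum of absolute values) rather than real neural layers or classifiers, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]   # one layer block of the extractor
AuxHead = Callable[[Vector], float]  # auxiliary classifier + loss, reduced to a number


def run_specific_extractor(
    features: Vector,
    blocks: Sequence[Block],
    aux_heads: Sequence[AuxHead],
    training: bool,
) -> Tuple[Vector, List[float]]:
    """Run the layer blocks in order; after each block, collect an auxiliary
    loss from that block's output, but only while training."""
    if len(blocks) != len(aux_heads):
        raise ValueError("one auxiliary head is needed per layer block")
    aux_losses: List[float] = []
    for block, head in zip(blocks, aux_heads):
        features = block(features)
        if training:
            aux_losses.append(head(features))
    return features, aux_losses


# Toy stand-ins (assumptions): a block halves every value, a head sums magnitudes.
halve: Block = lambda v: [0.5 * a for a in v]
toy_head: AuxHead = lambda v: sum(abs(a) for a in v)

blocks = [halve] * 8
heads = [toy_head] * 8
train_out, train_losses = run_specific_extractor([1.0, 2.0], blocks, heads, training=True)
infer_out, infer_losses = run_specific_extractor([1.0, 2.0], blocks, heads, training=False)
print(len(train_losses), infer_losses)
```

Because the heads only read each block's output, the extracted features are identical in both modes, which matches the statement that the auxiliary classifiers do not affect prediction.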

Because one auxiliary classification unit (150) is provided for each layer block of the task-specific feature extraction unit (122), as described above, if eight layer blocks are configured as in [Table 1] below, there are eight auxiliary classification units for each task. [Table 1] shows an example of the loss values output when computation proceeds according to the training process of the present invention.

[Table 1] Example loss values per classifier
                        Genre     Rating
Internal classifier 1   3.1904    2.1296
Internal classifier 2   4.1829    2.0505
Internal classifier 3   4.1980    1.7682
Internal classifier 4   3.3373    1.7528
Internal classifier 5   3.4535    1.5317
Internal classifier 6   3.1795    1.5296
Internal classifier 7   3.2557    1.7395
Internal classifier 8   3.1693    1.6951
Final classifier        3.2746    1.7307

The loss value used to update the network parameters according to the present invention is calculated by the loss value synthesis unit (160). The loss value synthesis unit (160) receives and combines the loss values calculated by the task classification unit (140) and the auxiliary classification unit (150) (s112). Because the loss value synthesis unit (160) runs independently for each task, when eight layer blocks are configured as in the embodiment, it combines nine loss values per task: eight from the auxiliary classification units and one from the task classification unit.

The loss value synthesis unit (160) may combine the loss values by, for example, averaging them. The weight of the auxiliary classification unit (150) losses may be set to 0.3 and the weight of the task classification unit (140) loss to 0.7, although these weights can be adjusted freely.
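One possible reading of this combination rule is sketched below: the eight auxiliary losses are averaged, then mixed with the task classifier's loss at 0.3 and 0.7. The text does not fix the exact formula, so this particular weighting is an assumption, and `combine_losses` is a hypothetical name. The numbers are the example loss values from [Table 1].

```python
from typing import Sequence


def combine_losses(
    task_loss: float,
    aux_losses: Sequence[float],
    task_weight: float = 0.7,
    aux_weight: float = 0.3,
) -> float:
    """Weighted mix of the final classifier loss and the mean auxiliary loss
    (assumption: auxiliary losses are averaged before weighting)."""
    if not aux_losses:
        raise ValueError("at least one auxiliary loss is required")
    aux_mean = sum(aux_losses) / len(aux_losses)
    return task_weight * task_loss + aux_weight * aux_mean


# Example loss values taken from [Table 1] (internal classifiers 1 to 8).
genre_aux = [3.1904, 4.1829, 4.1980, 3.3373, 3.4535, 3.1795, 3.2557, 3.1693]
rating_aux = [2.1296, 2.0505, 1.7682, 1.7528, 1.5317, 1.5296, 1.7395, 1.6951]

# The synthesis runs once per task, each time over 8 + 1 = 9 loss values.
genre_total = combine_losses(3.2746, genre_aux)
rating_total = combine_losses(1.7307, rating_aux)
print(genre_total, rating_total)
```

Keeping the larger weight on the final classifier reflects that only its output is used for prediction, while the auxiliary term mainly serves to deliver a training signal directly to the earlier layer blocks.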

Backpropagation is then performed based on the final loss value combined by the loss value synthesis unit (160). This process of obtaining the final loss value and performing backpropagation is repeated a preset number of times (s114), and the network parameters are then updated (s116). The network parameters can therefore be updated appropriately.

The present invention has been described with reference to the illustrated embodiment, but the embodiment is only an example. Those of ordinary skill in the art to which the invention pertains will readily see that various modifications, changes, and equivalent embodiments are possible without departing from the gist and scope of the invention. The true technical scope of protection of the invention should therefore be determined by the technical spirit of the appended claims.

100: learning apparatus
110: preprocessing unit
120: task feature extraction unit
122: task-specific feature extraction unit
124: task-shared feature extraction unit
130: feature synthesis unit
140: task classification unit
150: auxiliary classification unit
160: loss value synthesis unit

Claims

A multi-task learning classifier learning apparatus comprising:
a preprocessing unit that preprocesses an input corpus;
a task-specific feature extraction unit that extracts features for each of one or more tasks from the preprocessed input corpus;
a task-shared feature extraction unit that extracts features common to all tasks;
a feature synthesis unit that combines the features extracted by the task feature extraction units;
a task classification unit that predicts a probability value of a label for a task from the features combined by the feature synthesis unit and calculates a loss value;
an auxiliary classification unit that, each time a layer block of the task-specific feature extraction unit finishes, takes only the features extracted by that layer block as input, predicts a probability value of a label for the task, and calculates a loss value; and
a loss value synthesis unit that combines the loss values of the task classification unit and the auxiliary classification unit and then performs backpropagation.

The apparatus of claim 1,
wherein the task-specific feature extraction unit is composed of multiple layer blocks.

The apparatus of claim 1,
wherein one auxiliary classification unit is provided for each layer block of the task-specific feature extraction unit.

A multi-task learning classifier learning method comprising:
preprocessing an input corpus;
extracting specific features and shared features from the preprocessed corpus;
combining the extracted specific features and shared features;
calculating a probability value and a loss value of a label for a task using the combined features;
calculating an auxiliary probability value and an auxiliary loss value of a label for a task using only the specific features; and
updating network parameters by combining the calculated loss values and performing backpropagation based on the combined loss value.

The method of claim 4,
wherein the preprocessing step includes removing special characters from the input corpus, morphological analysis, and syllable separation.

The method of claim 4,
wherein a predetermined weight is assigned to each of the specific features.

The method of claim 4,
wherein the auxiliary probability value is the probability value of the labels for a specific feature, calculated from the weights assigned to the specific features.

The method of claim 4,
wherein the loss value is calculated by comparing the probability values of the labels for the task with a preset probability value of the correct-answer label.

The method of claim 4,
wherein the loss value for the combined features is used to output a prediction result after training of the network is complete.

The method of claim 4,
wherein the auxiliary loss value operates only during network training.