KR20230058951A

KR20230058951A - Apparatus and method for classifying text

Info

Publication number: KR20230058951A
Application number: KR1020210142833A
Authority: KR
Inventors: 박동우; 임소연
Original assignee: 삼성에스디에스 주식회사
Priority date: 2021-10-25
Filing date: 2021-10-25
Publication date: 2023-05-03

Abstract

A text classification apparatus and method are provided. A text classification apparatus according to an embodiment includes: a first classifier that determines whether an input sentence to be classified is a positive sentence or a negative sentence, based on a language model trained on training data including a plurality of sentences labeled as positive or negative; and a second classifier that reclassifies at least some of the target sentences classified as positive or negative by the first classifier as neutral sentences, based on a trust score (trustscore) of the classification result of the first classifier.

Description

Apparatus and method for text classification {APPARATUS AND METHOD FOR CLASSIFYING TEXT}

The disclosed embodiments relate to techniques for classifying the polarity of text.

Sentiment analysis is the task of classifying the polarity of a given sentence as positive or negative. Because most sentiment analysis datasets are labeled only as positive or negative, models trained on them generally also classify inputs as either positive or negative. However, a model trained this way has difficulty judging the polarity of neutral sentences that are neither positive nor negative. A problem of this kind, in which classes not used during training must be classified at inference time, is called open set recognition.

To solve open set recognition, it is important to represent well the probability distribution of the classes to be classified. In conventional techniques, however, the accuracy of a model rises as the model grows larger, but the confidence of the model in its predicted classes often decreases instead, which limits the direct application of such techniques to open set recognition.

In addition, a neural network-based classifier generally classifies by treating the softmax of the model's output values as a probability distribution. This is not suitable for the open set recognition problem, which requires predictions for new classes that were not seen during training.

Korean Registered Patent Publication No. 10-2216768 (2021.02.09)

The disclosed embodiments are intended to provide technical means for classifying positive, negative, and neutral sentences using only a dataset consisting of positive and negative sentences.

According to an exemplary embodiment, there is provided a text classification apparatus including: a first classifier that determines whether an input sentence to be classified is a positive sentence or a negative sentence, based on a language model trained on training data including a plurality of sentences labeled as positive or negative; and a second classifier that reclassifies at least some of the target sentences classified as positive or negative by the first classifier as neutral sentences, based on a trust score (trustscore) of the classification result of the first classifier.

The first classifier may be trained on the training data using an objective function set so that normalized noise is included in the training data.

The objective function may include a preset noise function, and the noise function may be set so that its value is maximized when the distance between the noise-added training data and the original training data is within a preset range.

The first classifier may determine whether the sentence to be classified is a positive sentence or a negative sentence using a hidden state vector generated by inputting the [CLS] token included in the target sentence into a pre-trained language model.

The second classifier may calculate the trust score based on the distance, on a manifold in which the training data is embedded, from the target sentence to the class predicted by the first classifier, and the distance from the target sentence to the nearest class other than the predicted class.

The second classifier may generate a plurality of hidden state vectors by inputting the [CLS] token included in each item of the training data into the language model, and may generate a set for each class by embedding on the manifold the hidden state vectors that remain after excluding a certain proportion with low distribution density.

The second classifier may reduce the dimensionality of the generated hidden state vectors before embedding them on the manifold.

The second classifier may reclassify the target sentence as a neutral sentence when the trust score is less than or equal to a preset first threshold.

The second classifier may reclassify the target sentence as a neutral sentence when the distances between the target sentence and each class, on the manifold in which the training data is embedded, are all greater than or equal to a preset second threshold.
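The two reclassification conditions above can be sketched as a small decision function. This is a minimal illustration, not the claimed implementation: the function name `reclassify`, its parameter names, and the string labels are all hypothetical, and the trust score and per-class distances are assumed to have been computed elsewhere.

```python
from typing import Dict


def reclassify(predicted: str,
               trust_score: float,
               class_distances: Dict[str, float],
               score_threshold: float,
               distance_threshold: float) -> str:
    """Return 'neutral' or keep the first classifier's label (hypothetical helper)."""
    # Condition 1: the trust score is at or below the first threshold.
    if trust_score <= score_threshold:
        return "neutral"
    # Condition 2: the sentence is at least the second threshold away from every class.
    if all(d >= distance_threshold for d in class_distances.values()):
        return "neutral"
    # Otherwise the positive/negative label from the first classifier stands.
    return predicted
```

Both thresholds are described only as preset values, so any concrete numbers would have to be chosen empirically.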

According to another exemplary embodiment, there is provided a text classification method performed on a computer, the method including: a first classification step of determining whether an input sentence to be classified is a positive sentence or a negative sentence, based on a language model trained on training data including a plurality of sentences labeled as positive or negative; and a second classification step of reclassifying at least some of the target sentences classified as positive or negative by the first classifier as neutral sentences, based on a trust score (trustscore) of the classification result of the first classifier.

In the first classification step, training on the training data may be performed using an objective function set so that normalized noise is included in the training data.

In the first classification step, whether the sentence to be classified is a positive sentence or a negative sentence may be determined using a hidden state vector generated by inputting the [CLS] token included in the target sentence into a pre-trained language model.

In the second classification step, the trust score may be calculated based on the distance, on a manifold in which the training data is embedded, from the target sentence to the class predicted by the first classifier, and the distance from the target sentence to the nearest class other than the predicted class.

In the second classification step, a plurality of hidden state vectors may be generated by inputting the [CLS] token included in each item of the training data into the language model, and a set for each class may be generated by embedding on the manifold the hidden state vectors that remain after excluding a certain proportion with low distribution density.

The second classification step may further include reducing the dimensionality of the generated hidden state vectors before embedding them on the manifold.

In the second classification step, the target sentence may be reclassified as a neutral sentence when the trust score is less than or equal to a preset first threshold.

In the second classification step, the target sentence may be reclassified as a neutral sentence when the distances between the target sentence and each class, on the manifold in which the training data is embedded, are all greater than or equal to a preset second threshold.

According to the disclosed embodiments, a sentiment classification model that can also classify neutral sentences can be built using only a dataset consisting of positive and negative sentences. This reduces the time and cost required to construct training data.

In addition, according to the disclosed embodiments, the reliability of a PLM-based classification model can be increased by applying adversarial training to a pre-trained language model (PLM).

FIG. 1 is a block diagram illustrating a text classification apparatus according to an embodiment.
FIG. 2 is a block diagram illustrating the detailed configuration of a first classifier according to an embodiment.
FIG. 3 is an exemplary diagram illustrating a process in which a second classifier according to an embodiment reclassifies at least some target sentences as neutral sentences using a trust score.
FIG. 4(a) is an exemplary diagram of a low-dimensional manifold when the classification problem is defined as closed-set classification.
FIG. 4(b) is an exemplary diagram of a low-dimensional manifold when the classification problem is defined as an open-set recognition problem.
FIG. 5 is a flowchart illustrating the reclassification process of a second classifier according to an embodiment.
FIG. 6 is a flowchart illustrating a text classification method according to an embodiment.
FIG. 7 is a block diagram illustrating a computing environment including a computing device suitable for use in exemplary embodiments.

Hereinafter, specific embodiments of the present invention will be described with reference to the drawings. The following detailed description is provided to aid a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, this is merely an example, and the present invention is not limited thereto.

In describing the embodiments of the present invention, a detailed description of known technology related to the present invention is omitted where it is judged that it could unnecessarily obscure the gist of the present invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of a user or operator. Their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description is only for describing embodiments of the present invention and should in no way be limiting. Unless clearly used otherwise, a singular expression includes the plural meaning. In this description, expressions such as "include" or "have" are intended to indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof, and should not be construed to exclude the existence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof beyond those described.

FIG. 1 is a block diagram illustrating a text classification apparatus 100 according to an embodiment. The text classification apparatus 100 according to an embodiment is trained on training data labeled as positive or negative and uses the trained model to classify an input sentence into one of three classes (positive, negative, or neutral); that is, it is an apparatus for solving a so-called open set recognition problem. As shown, the text classification apparatus 100 according to an embodiment includes a first classifier 102 and a second classifier 104.

The first classifier 102 determines whether the input sentence to be classified is a positive sentence or a negative sentence. The first classifier 102 uses a model trained on training data including a plurality of sentences labeled as positive or negative to determine whether the target sentence belongs to the positive class or the negative class.

In an embodiment, the first classifier 102 may be a language model-based classifier. For example, the first classifier 102 may use a pre-trained language model (PLM) based on the Transformer architecture. A PLM is a model that has learned the overall structure of a target language and contextual text representations from a large-scale unlabeled text corpus. A PLM can be fine-tuned to perform a specific task, and because it has already acquired linguistic knowledge, it has the advantage of achieving high performance on that task with only a small amount of training data.

FIG. 2 is a block diagram illustrating the detailed configuration of the first classifier 102 according to an embodiment. As shown, the first classifier 102 according to an embodiment includes a language model 202 and a classifier 204.

The language model 202 is a pre-trained language model (PLM) and may be a model trained on training data including a plurality of sentences labeled as positive or negative. The language model 202 receives a plurality of input tokens 206 generated from the sentence to be classified. The input tokens 206 may include a [CLS] token indicating the start of a sentence and a [SEP] token for separating one sentence from another. Inside the language model 202, a hidden state vector 210 corresponding to each token embedding 208 is generated. The classifier 204 converts the hidden state vector 212 of the [CLS] token generated by the language model 202 into a two-dimensional probability vector 214 (logit) and inputs it to an argmax function to classify the sentence as a positive or negative sentence.
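The last step of this pipeline, mapping the [CLS] hidden state to a two-element vector and taking the argmax, can be sketched as follows. This is a minimal sketch under stated assumptions: the classifier 204 is modeled as a single linear layer, the hidden state is passed in as a plain list of floats rather than produced by a real language model, and the names `classify_cls`, `LABELS`, `weights`, and `bias` are hypothetical.

```python
from typing import Sequence

# Assumed label order for the two output positions (not specified in the text).
LABELS = ("negative", "positive")


def classify_cls(cls_hidden: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> str:
    """Map a [CLS] hidden state to a label with a linear head and argmax."""
    # One logit per class: dot product of the class weight row with the hidden state, plus bias.
    logits = [sum(w * h for w, h in zip(row, cls_hidden)) + b
              for row, b in zip(weights, bias)]
    # argmax over the two logits selects the predicted class.
    best = max(range(len(logits)), key=logits.__getitem__)
    return LABELS[best]
```

Because softmax is monotonic, taking the argmax of the raw logits gives the same label as taking it over softmax probabilities.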

In an embodiment, the first classifier 102 may be trained on the training data using an objective function (loss function) set so that normalized noise is included in the training data.

As described above, a PLM-based classifier classifies sentences as positive or negative after the pre-trained model is fine-tuned for the classification task. However, the amount of training data used for fine-tuning is far smaller than the amount of data used for pre-training the language model. This is because pre-trained language models are very large and complex. As a result, the language model 202 included in the first classifier 102 is prone to overfitting the training data used in the fine-tuning step. An overfitted model is weak at inference on data not included in the training data, and because its generalization ability is poor, it can react sensitively to even very small differences from the training data and produce classification errors.

To address this, the disclosed embodiments train on the training data using adversarial training. Adversarial training is a training method that generates so-called adversarial data and trains the model on it together with the existing data, so that the model is attacked and learns to defend itself. The resulting model is more robust than the original model, in particular against adversarial attacks (methods that defeat a deep learning model by introducing minute differences in the data). In the disclosed embodiments, training the first classifier 102 through adversarial training not only increases the accuracy of positive/negative classification but can also improve neutral sentence detection based on that model.

In an embodiment, the first classifier 102 is trained by adding normalized noise to the embeddings of the input tokens during the training (fine-tuning) step that uses the positive/negative training data. To this end, the first classifier 102 uses an objective function (loss function) that includes a preset noise function. Specifically, when the model is denoted by [expression not preserved] and the n input data are denoted by [expression not preserved] (where x_i is the i-th embedding of the input data and y_i is the label of that input data), the objective function is given by Equation 1 below.

[Equation 1]

[equation not preserved]

In ordinary fine-tuning, an objective function that compares the model's output with the ground-truth label is used. The disclosed embodiments additionally include a second term, the noise function ([expression not preserved]). The noise function may be set so that its value is maximized when the distance between the noise-added training data and the original training data is within a preset range (ε). Using such an objective function has the same effect as adding normalized noise to the training data. The training data produced this way differs from the original data, but the output of the trained model does not change significantly. Adding normalized noise to the training data in this way prevents overfitting, so that a robust model with better generalization performance can be obtained even when fine-tuning with a small amount of data. In addition, a robust model has higher inference performance on sentences not included in the training data than a non-robust model, which also brings a large improvement in the task of identifying neutral data not included in the training data.
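For orientation only, one common form of an adversarially regularized fine-tuning objective that matches the verbal description (a supervised term plus a noise term maximized over perturbations within ε) is written out below. It is not a reproduction of Equation 1, which is not preserved in this text. Every symbol except x_i, y_i, n, and ε is an assumption: g(·; θ) for the model, ℓ for the supervised loss, ℓ_s for a measure of how far the model output moves under perturbation, λ for a weighting coefficient, and the choice of norm.

```latex
\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n}
\Big[\,
  \ell\big(g(x_i;\theta),\, y_i\big)
  \;+\;
  \lambda \max_{\lVert \tilde{x}_i - x_i \rVert \le \varepsilon}
  \ell_s\big(g(\tilde{x}_i;\theta),\, g(x_i;\theta)\big)
\Big]
```

Under this reading, the inner maximization searches for the noisy embedding within distance ε of x_i that changes the model output the most, and the outer minimization trains the model so that even this worst-case perturbation changes the output only slightly.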

Next, the second classifier 104 reclassifies at least some of the target sentences classified as positive or negative by the first classifier 102 as neutral sentences.

In sentiment analysis, there are cases in which a sentence cannot be classified as simply positive or negative. For example, a sentence may contain no evaluation of any object, or conflicting evaluations may coexist within one sentence. The second classifier 104 defines sentences that cannot be classified as positive or negative as neutral sentences, and is configured to detect neutral sentences, a class not included in the training data, using the first classifier 102 trained only on data labeled as positive or negative.

The second classifier 104 is configured to reclassify at least some of the target sentences classified as positive or negative by the first classifier 102 as neutral sentences, based on a trust score (trustscore) of the classification result of the first classifier 102.

FIG. 3 is an exemplary diagram illustrating a process in which the second classifier 104 according to an embodiment reclassifies at least some target sentences as neutral sentences using the trust score. In the disclosed embodiments, the trust score is a value for determining, based on the distribution of the training data, whether the positive or negative classification result of the first classifier 102 can be trusted. To calculate the trust score, a low-dimensional manifold 302 in which the training data used to train the first classifier 102 is embedded is assumed. The second classifier 104 generates a plurality of hidden state vectors by inputting the [CLS] token included in each item of the training data into the language model 202, and generates a set for each class by embedding on the manifold the hidden state vectors that remain after excluding a certain proportion with low distribution density.

In an embodiment, the second classifier 104 may reduce the dimensionality of the generated hidden state vectors before embedding them on the manifold 302. In general, a PLM-based language model outputs hidden state vectors of very high dimensionality, such as 256 or 768. Using them as they are could require an excessive amount of computation, so the second classifier 104 may reduce the dimensionality of the vectors using a method such as principal component analysis (PCA).
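The dimensionality reduction step can be illustrated with a small, dependency-free PCA. This is a sketch under assumptions, not the described implementation: it finds principal components by power iteration with deflation, a technique chosen here only to keep the example self-contained, whereas in practice a library PCA routine would normally be used. The function name `pca_reduce` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def pca_reduce(vectors: Sequence[Sequence[float]], k: int, iters: int = 200) -> List[List[float]]:
    """Project vectors onto their top-k principal components (illustrative only)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # d x d covariance matrix of the centered data.
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)] for a in range(d)]

    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    components: List[List[float]] = []
    for _ in range(k):
        vec = [rng.random() for _ in range(d)]
        for _ in range(iters):
            nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # no variance left in any direction
                break
            vec = [x / norm for x in nxt]
        length = math.sqrt(sum(x * x for x in vec))
        vec = [x / length for x in vec]
        # Variance captured along the direction that was found.
        eig = sum(vec[a] * cov[a][b] * vec[b] for a in range(d) for b in range(d))
        components.append(vec)
        # Deflation: remove the found direction before searching for the next one.
        cov = [[cov[a][b] - eig * vec[a] * vec[b] for b in range(d)] for a in range(d)]
    # Coordinates of each centered vector along the k components.
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered]
```

The same projection (mean and components) fitted on the training vectors would also have to be applied to each sentence at inference time so that both live in the same reduced space; the sketch above fits and projects in one call for brevity.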

The second classifier 104 forms an α-high-density-set from the dimension-reduced hidden state vectors that remain after excluding a certain proportion with low distribution density. The α-high-density-set is the set of training data excluding the α% of the data with the lowest distribution density, and may be defined as follows.

Let [expression not preserved], and let f be a continuous probability density on [expression not preserved].

The α-high-density-set of f, [expression not preserved], is defined as [expression not preserved],

where [expression not preserved].
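The formulas of this definition are not preserved in the text. As a point of reference, a standard level-set formulation of an α-high-density-set from the trust score literature, whose sentence structure matches the passage above, is given below. All symbols other than f and α (the domain X, its dimension D, the set H_α(f), and the level λ_α) are assumptions, and α is treated as a fraction rather than a percentage.

```latex
\mathcal{X} \subseteq \mathbb{R}^{D}, \qquad
H_{\alpha}(f) := \{\, x \in \mathcal{X} : f(x) \ge \lambda_{\alpha} \,\}, \qquad
\lambda_{\alpha} := \inf\Big\{ \lambda \ge 0 :
  \int_{\mathcal{X}} \mathbf{1}\big[f(x) \le \lambda\big]\, f(x)\, dx \;\ge\; \alpha \Big\}
```

In words, λ_α is the smallest density level below which a fraction α of the probability mass lies, and H_α(f) keeps only the points whose density is at least that level.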

FIG. 4(a) is an exemplary diagram of a low-dimensional manifold when the classification problem is defined as closed-set classification. When the problem is defined this way, all data is classified only into the positive and negative classes included in the training data, so out-of-domain data (neutral sentences) is misclassified as if it were in-domain data.

FIG. 4(b) shows an example of the manifold when the classification problem is defined as an open-set recognition problem. In this case, in order to classify data whose label is not included in the training data as out-of-domain data, a set is formed for each class on the manifold, which leaves a region that belongs to no class. To classify out-of-domain data, it must be determined whether the data falls within one of the class sets of the training data on the manifold of (b). In the disclosed embodiments, this may be determined using the trust score and the distance to each class. To represent the per-class sets on the manifold more robustly and thereby improve out-of-domain classification performance, the disclosed embodiments embed in the manifold only the α-high-density-set, which excludes training data that may be regarded as outliers.
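The outlier filtering described here can be sketched for one class as follows. The text does not say how distribution density is estimated, so this sketch assumes a k-nearest-neighbor proxy: a point whose k-th nearest neighbor is far away is treated as lying in a low-density region. The function name `high_density_subset`, the parameter `k`, and the use of Euclidean distance are all assumptions, and `alpha` is given as a fraction rather than a percentage.

```python
import math
from typing import List, Sequence


def high_density_subset(points: List[Sequence[float]], alpha: float, k: int = 2) -> List[Sequence[float]]:
    """Drop the alpha fraction of points with the lowest estimated density (needs >= 2 points)."""

    def knn_radius(i: int) -> float:
        # Distance from point i to its k-th nearest neighbor; larger means lower density.
        dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
        return dists[min(k, len(dists)) - 1]

    radii = [knn_radius(i) for i in range(len(points))]
    n_drop = int(alpha * len(points))
    # Sort indices from densest to sparsest and keep all but the last n_drop.
    order = sorted(range(len(points)), key=lambda i: radii[i])
    keep = sorted(order[:len(points) - n_drop])
    return [points[i] for i in keep]
```

The function would be applied separately to the positive and the negative training vectors, so that each class set loses its own outliers.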

The second classifier 104 may calculate the trust score using the α-high-density-set of the training data calculated as described above. The data remaining after removal of the training data that may be regarded as outliers (the α-high-density-set) is embedded in the manifold 302. The trust score is calculated based on the distance, on the manifold 302 in which the training data is embedded, from the sentence to be classified to the class classified (predicted) by the first classifier 102, and the distance from the target sentence to the nearest class other than the predicted class. Specifically, the second classifier 104 may define the trust score as the value obtained by dividing the distance from the sentence to be classified to the class classified (predicted) by the first classifier 102 by the distance from the target sentence to the nearest class other than the predicted class. The larger this value, the more the classification result of the first classifier 102 can be considered trustworthy.
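A trust score of this kind can be sketched as a ratio of two distances. One caution about orientation: the paragraph above states the ratio as the distance to the predicted class divided by the distance to the nearest other class, while also stating that larger values are more trustworthy and that low scores are sent to neutral. The sketch below uses the reciprocal, the distance to the nearest other class divided by the distance to the predicted class, because that is the orientation under which a larger value means a more trustworthy prediction. This orientation, the use of the nearest member as the distance to a class, the `eps` guard, and the names `distance_to_class` and `trust_score` are all assumptions.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def distance_to_class(x: Point, members: List[Point]) -> float:
    # Assumed notion of distance to a class: distance to its nearest embedded member.
    return min(math.dist(x, m) for m in members)


def trust_score(x: Point, predicted: str, class_sets: Dict[str, List[Point]], eps: float = 1e-12) -> float:
    """Ratio of the nearest other-class distance to the predicted-class distance."""
    d_pred = distance_to_class(x, class_sets[predicted])
    d_other = min(distance_to_class(x, members)
                  for label, members in class_sets.items() if label != predicted)
    # eps avoids division by zero when x coincides with a training point.
    return d_other / (d_pred + eps)
```

With this orientation, a sentence embedded close to the predicted class and far from the other class gets a score well above 1, and a sentence closer to the other class gets a score below 1.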

FIG. 5 is a flowchart 500 illustrating the reclassification process of the second classifier 104 according to an embodiment. Although the illustrated flowchart describes the method or process as a plurality of steps, at least some of the steps may be performed in a different order, combined and performed together with other steps, omitted, divided into sub-steps, or performed with one or more additional steps that are not shown.

In step 502, the second classifier 104 reduces the dimensionality of the hidden state vector that is generated when the [CLS] token included in the sentence to be classified is input to the language model 202 of the first classifier 102. As described above, the second classifier 104 may reduce the dimensionality of this vector using a method such as principal component analysis (PCA).
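As an illustration of this reduction, the sketch below fits PCA on a set of vectors and projects a new vector with the same components. It is a dependency-free approximation that uses power iteration with deflation; in practice a numerical library would normally be used. The function names `pca_fit_transform` and `pca_transform` and the tiny three-dimensional vectors standing in for [CLS] hidden states are hypothetical.

```python
import math
import random


def pca_fit_transform(vectors, n_components, n_iter=200, seed=0):
    """Return (projected, mean, components) for the given row vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    # Covariance matrix of the centered data (dim x dim).
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(dim)]
           for a in range(dim)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _ in range(n_iter):  # power iteration towards the top eigenvector
            w = [sum(cov[a][b] * w[b] for b in range(dim)) for a in range(dim)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            w = [x / norm for x in w]
        cw = [sum(cov[a][b] * w[b] for b in range(dim)) for a in range(dim)]
        eigval = sum(w[a] * cw[a] for a in range(dim))
        components.append(w)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[a][b] - eigval * w[a] * w[b] for b in range(dim)]
               for a in range(dim)]
    projected = [pca_transform(v, mean, components) for v in vectors]
    return projected, mean, components


def pca_transform(vector, mean, components):
    """Project one vector using a previously fitted mean and components."""
    centered = [x - m for x, m in zip(vector, mean)]
    return [sum(c[j] * centered[j] for j in range(len(centered)))
            for c in components]


train_vectors = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0]]
projected, mean, components = pca_fit_transform(train_vectors, n_components=1)
target_low_dim = pca_transform([1.5, 3.0, 0.0], mean, components)
```

Fitting on the training vectors and reusing `mean` and `components` for the target sentence keeps both in the same low-dimensional space, which is what the later distance calculation relies on.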

In step 504, the second classifier 104 calculates, on the low-dimensional manifold 302, the distance between the sentence to be classified and each of the classes (positive and negative). Methods for calculating a per-class distance on a manifold are well known in the art, so a detailed description is omitted here. For example, the second classifier 104 may take the maximum distance between the sentence to be classified and the data belonging to a class as the distance to that class, or it may take the distance to the center point of the class as the distance to that class.
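The two example definitions of a per-class distance mentioned above can be sketched as follows. Euclidean distance, the function name `class_distances`, and the `mode` parameter are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def class_distances(target, class_sets, mode="centroid"):
    """Distance from the target vector to every class.

    mode="centroid": distance to the center point of the class.
    mode="max":      maximum distance to any member of the class.
    """
    distances = {}
    for label, points in class_sets.items():
        if mode == "centroid":
            distances[label] = euclidean(target, centroid(points))
        elif mode == "max":
            distances[label] = max(euclidean(target, p) for p in points)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return distances


class_sets = {
    "positive": [(0.0, 0.0), (2.0, 0.0)],
    "negative": [(10.0, 0.0)],
}
by_centroid = class_distances((3.0, 0.0), class_sets, mode="centroid")
by_max = class_distances((3.0, 0.0), class_sets, mode="max")
```

The centroid variant needs only one stored vector per class at inference time, whereas the maximum-distance variant has to visit every embedded training vector.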

In step 506, the second classifier 104 calculates the trust score. As described above, the second classifier 104 may calculate the trust score based on the distance, on the manifold in which the training data is embedded, from the target sentence to the class predicted by the first classifier 102, and the distance from the target sentence to the nearest class other than the predicted class.

In step 508, the second classifier 104 determines whether the trust score is greater than a preset first threshold. If the determination in step 508 is that the trust score is equal to or less than the first threshold, the second classifier 104 reclassifies the target sentence as a neutral sentence in step 510.

If the determination in step 508 is that the trust score is greater than the first threshold, then in step 512 the second classifier 104 determines whether the distances calculated in step 504, that is, the distances on the manifold in which the training data is embedded between the target sentence and each class, are all equal to or greater than a preset second threshold.

If the determination in step 512 is that the distances to all classes are equal to or greater than the second threshold, the second classifier 104 reclassifies the target sentence as a neutral sentence. Otherwise, if at least one of the distances is less than the second threshold, the second classifier 104 keeps the classification result of the first classifier 102 in step 514.
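The decision flow of FIG. 5 described above reduces to two threshold checks. The sketch below is only an illustration: the function name `reclassify`, the label string `"neutral"`, and the threshold values used in the example are hypothetical, and the trust score is assumed to be oriented so that larger means more reliable.

```python
def reclassify(predicted, score, distances, first_threshold, second_threshold):
    """Return the final label for one target sentence.

    predicted: label from the first classifier ("positive" or "negative").
    score:     trust score of that prediction.
    distances: mapping from class label to distance on the manifold.
    """
    # First check: a low trust score means the prediction is not trusted.
    if score <= first_threshold:
        return "neutral"
    # Second check: the sentence is far away from every known class.
    if all(d >= second_threshold for d in distances.values()):
        return "neutral"
    # Otherwise the result of the first classifier is kept.
    return predicted


label = reclassify(
    predicted="positive",
    score=3.0,
    distances={"positive": 1.0, "negative": 3.0},
    first_threshold=1.2,
    second_threshold=5.0,
)
```

The second check covers a case that the ratio alone cannot detect: a sentence far from both classes may still have a large ratio, yet it lies outside every class set.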

FIG. 6 is a flowchart illustrating a text classification method 600 according to an embodiment. The illustrated method may be performed by a computing device that has one or more processors and a memory storing one or more programs executed by the one or more processors, for example the text classification apparatus 100. Although the illustrated flowchart describes the method or process as a plurality of steps, at least some of the steps may be performed in a different order, combined and performed together with other steps, omitted, divided into sub-steps, or performed with one or more additional steps that are not shown.

In step 602, the first classifier 102 of the text classification apparatus 100 determines whether the input sentence to be classified is a positive sentence or a negative sentence. In an embodiment, the first classifier 102 is trained on training data including a plurality of sentences labeled as positive or negative, and determines on that basis whether the target sentence is positive or negative.

In step 604, the second classifier 104 of the text classification apparatus 100 reclassifies at least some of the target sentences classified as positive or negative by the first classifier 102 as neutral sentences, based on the trust score (trustscore) for the classification result of the first classifier 102.

Table 1 below compares the performance of four methods that classify neutral sentences using only a positive/negative classification model. The training dataset used to train the model consists of product reviews labeled as positive or negative, and the test data used to evaluate the model consists of customer-center utterances, Internet comments, and similar data that were collected, cleaned, and verified by hand. The test data contains neutral, positive, and negative sentences in a 2:1:1 ratio. For each criterion, the threshold for neutral classification was set to the Q1 value (first quartile) of that criterion over the entire training data. Experiments (A) through (H) all used the same language model, with a structure in which the output of the language model is pooled and then passed to a single-layer fully connected neural network that predicts the label distribution.

Table 1
Method     (A)     (B)     (C)     (D)
Accuracy   0.668   0.687   0.716   0.820

(A) and (B) classify neutral data using only the positive/negative classifier: (A) classifies a sentence as neutral when the softmax value of the probability vector (logits) output by the classifier is smaller than the threshold, and (B) does so when the sigmoid value of the output (logit) for each of the positive and negative labels is smaller than the threshold. (C) and (D) obtain the α-high-density-set of the training data, compute the distribution of that data to obtain the trust score (trustscore), and classify neutral sentences on that basis. The accuracy of (C) and (D), which use the trust score, is higher than that of (A) and (B), which indicates that the output of a language-model-based classifier does not reflect how reliable the result actually is. In (D), noise was added to the training data by the method of Equation 1 described above so that the classifier learns the distribution of the training data more robustly, and its accuracy rises markedly compared with (C) (0.820 versus 0.716). In other words, adding appropriate noise to the training data prevented overfitting of the model and improved performance.
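For comparison, baselines (A) and (B) and the Q1-based threshold can be sketched as below. This is an illustrative reading rather than the experimental code: it assumes that the "softmax value" is the largest softmax probability, that the sigmoid criterion fires when every per-label sigmoid value is below the threshold, and that Q1 is taken over the confidence values computed on the training data. All function names are hypothetical.

```python
import math
import statistics


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax_confidence(logits):
    """Baseline (A): largest softmax probability."""
    return max(softmax(logits))


def sigmoid_confidence(logits):
    """Baseline (B): largest per-label sigmoid value."""
    return max(sigmoid(z) for z in logits)


def q1_threshold(train_confidences):
    """First quartile of the confidence values seen on the training data."""
    return statistics.quantiles(train_confidences, n=4)[0]


def classify_with_neutral(logits, labels, confidence_fn, threshold):
    """Return "neutral" when the confidence is below the threshold."""
    if confidence_fn(logits) < threshold:
        return "neutral"
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]


labels = ["positive", "negative"]
uncertain = classify_with_neutral([0.1, 0.0], labels, softmax_confidence, 0.6)
confident = classify_with_neutral([3.0, -3.0], labels, softmax_confidence, 0.6)
```

Both baselines look only at the classifier output, so they inherit any overconfidence of the classifier; the trust score instead looks at where the sentence lies relative to the training data.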

The text classification scheme according to the disclosed embodiments can be applied to various classification tasks other than sentiment classification to build a classifier capable of detecting out-of-domain data that is not covered by the training data. That is, the disclosed embodiments can address not only sentiment classification but also various kinds of open set recognition problems. As mentioned above, previous studies detected out-of-domain data using the softmax value output by the model, or added out-of-domain data and, because such data is scarce, generated similar data with a separate model so that training could use more data. In contrast, as Table 1 and Table 2 below show, the scheme proposed in the disclosed embodiments is more accurate than the softmax-based approach and is simpler to train because it requires no out-of-domain data.

Table 2 shows the result of applying the text classification scheme according to the disclosed embodiments to a question type classification model. The training dataset used to train the model consists only of questions, covering six question types: {who, when, what, where, why, how}. The test set was constructed from question data labeled in the same way as the training data and from non-question sentences collected from various sources such as news articles and messengers, with each label included in a similar proportion.

Table 2
Experiment   OOD classification criterion   With perturbation   Without perturbation
(E)          X (none)                       0.8179              0.8115
(F)          Softmax                        0.8603              0.8435
(G)          Sigmoid                        0.8961              0.8615
(H)          Trustscore                     0.9218              0.8744

In this task, if a non-question sentence is classified under any of the labels the model has learned (that is, if a non-question sentence is judged to be a question), this can be regarded as an incorrect inference. The accuracy of (E) is the result of unconditionally counting the model's inference on non-question data as wrong. (F) and (G) treat non-question data as out-of-domain (OOD) data and detect it based on the softmax value and the sigmoid value, respectively. (H) uses the trust score (trustscore) to keep the model from assigning a label to non-question data, and counts out-of-domain data left unclassified in this way as correct. As with Table 1, in this task too, using the trust score as the out-of-domain criterion gave better performance than using the softmax or sigmoid value.
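The scoring rule described above can be made concrete with a small sketch. The label string `"ood"`, the function name `accuracy`, and the four-sentence example are hypothetical; the point is only that a closed-set model such as (E) can never be right on a non-question sentence, while an OOD-aware model is credited when it declines to assign a question type.

```python
def accuracy(predictions, gold):
    """Fraction of sentences whose predicted label equals the gold label."""
    correct = sum(1 for p, g in zip(predictions, gold) if p == g)
    return correct / len(gold)


# Two questions and two non-question sentences (gold label "ood").
gold = ["who", "ood", "why", "ood"]

# Closed-set model: always outputs one of the six question types.
closed_set_predictions = ["who", "what", "why", "when"]

# OOD-aware model: may output "ood" instead of a question type.
ood_aware_predictions = ["who", "ood", "why", "when"]

closed_set_accuracy = accuracy(closed_set_predictions, gold)
ood_aware_accuracy = accuracy(ood_aware_predictions, gold)
```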

In addition, the performance difference due to the presence or absence of perturbation is smaller in (E), (F), and (G) than in (H). This can be explained as follows: when the adversarial training scheme and the trust score are used together, as in the disclosed embodiments, the model learns a better-generalized distribution of the training data, so the trust score, which relies on that distribution, improves substantially.
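The comparison above can be checked directly from the Table 2 values. The short sketch below only subtracts the two accuracy columns; the variable names are hypothetical.

```python
# (with perturbation, without perturbation) accuracies taken from Table 2.
table2 = {
    "(E)": (0.8179, 0.8115),
    "(F)": (0.8603, 0.8435),
    "(G)": (0.8961, 0.8615),
    "(H)": (0.9218, 0.8744),
}

# Gain from perturbation for each experiment, rounded to four decimals.
gains = {name: round(with_p - without_p, 4)
         for name, (with_p, without_p) in table2.items()}

largest = max(gains, key=gains.get)
```

The gains come out as 0.0064 for (E), 0.0168 for (F), 0.0346 for (G), and 0.0474 for (H), so the trust score criterion benefits most from perturbation.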

FIG. 7 is a block diagram illustrating a computing environment including a computing device suitable for use in exemplary embodiments. In the illustrated embodiment, each component may have functions and capabilities different from those described below, and additional components beyond those described below may be included.

The illustrated computing environment 10 includes a computing device 12. In an embodiment, the computing device 12 may be the text classification apparatus 100 described above.

The computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 may cause the computing device 12 to operate according to the exemplary embodiments described above. For example, the processor 14 may execute one or more programs stored in the computer-readable storage medium 16. The one or more programs may include one or more computer-executable instructions, which, when executed by the processor 14, may be configured to cause the computing device 12 to perform operations according to the exemplary embodiments.

The computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. A program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In an embodiment, the computer-readable storage medium 16 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the computing device 12 and can store the desired information, or a suitable combination thereof.

The communication bus 18 interconnects the various components of the computing device 12, including the processor 14 and the computer-readable storage medium 16.

The computing device 12 may also include one or more input/output interfaces 22 that provide an interface for one or more input/output devices 24, and one or more network communication interfaces 26. The input/output interface 22 and the network communication interface 26 are connected to the communication bus 18. The input/output device 24 may be connected to other components of the computing device 12 through the input/output interface 22. Exemplary input/output devices 24 may include input devices such as a pointing device (a mouse, a trackpad, or the like), a keyboard, a touch input device (a touchpad, a touchscreen, or the like), a voice or sound input device, various types of sensor devices, and/or an imaging device, and/or output devices such as a display device, a printer, a speaker, and/or a network card. An exemplary input/output device 24 may be included inside the computing device 12 as one of its components, or may be connected to the computing device 12 as a separate device distinct from it.

Although the present invention has been described in detail above through representative embodiments, those of ordinary skill in the art to which the present invention pertains will understand that various modifications can be made to the embodiments described above without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims set forth below and by their equivalents.

100: text classification apparatus
102: first classifier
104: second classifier
202: language model
204: classifier
206: input token
208: token embedding
210: hidden state vector
212: [CLS] token's hidden state vector
214: probability vector
302: low-dimensional manifold

Claims

A text classification apparatus comprising:
a first classifier that determines whether an input sentence to be classified is a positive sentence or a negative sentence, based on a language model trained on training data including a plurality of sentences labeled as positive or negative; and
a second classifier that reclassifies at least some of the target sentences classified as positive or negative by the first classifier as neutral sentences, based on a trust score (trustscore) for the classification result of the first classifier.

The text classification apparatus of claim 1,
wherein the first classifier is trained on the training data using an objective function set so that normalized noise is included in the training data.

The text classification apparatus of claim 2,
wherein the objective function includes a preset noise function, and
the noise function is set to take its maximum value when the distance between the noise-containing training data and the original training data is within a preset range.

The text classification apparatus of claim 1,
wherein the first classifier determines whether the sentence to be classified is a positive sentence or a negative sentence using a hidden state vector generated by inputting the [CLS] token included in the target sentence into a pre-trained language model.

The text classification apparatus of claim 4,
wherein the second classifier calculates the trust score based on the distance, on a manifold in which the training data is embedded, from the target sentence to the class predicted by the first classifier, and the distance from the target sentence to the nearest class other than the predicted class.

The text classification apparatus of claim 5,
wherein the second classifier generates a plurality of hidden state vectors by inputting the [CLS] token included in each item of the training data into the language model, and creates a set for each class by embedding on the manifold the hidden state vectors that remain after excluding a certain proportion having low distribution density.

The text classification apparatus of claim 6,
wherein the second classifier reduces the dimensionality of the generated plurality of hidden state vectors before embedding them on the manifold.

The text classification apparatus of claim 5,
wherein the second classifier reclassifies the target sentence as a neutral sentence when the trust score is equal to or less than a preset first threshold.

The text classification apparatus of claim 8,
wherein the second classifier reclassifies the target sentence as a neutral sentence when the distances, on the manifold in which the training data is embedded, between the target sentence and each class are all equal to or greater than a preset second threshold.

A text classification method performed on a computer, the method comprising:
a first classification step of determining whether an input sentence to be classified is a positive sentence or a negative sentence, based on a language model trained on training data including a plurality of sentences labeled as positive or negative; and
a second classification step of reclassifying at least some of the target sentences classified as positive or negative by the first classifier as neutral sentences, based on a trust score (trustscore) for the classification result of the first classifier.

The text classification method of claim 10,
wherein, in the first classification step, training is performed on the training data using an objective function set so that normalized noise is included in the training data.

The text classification method of claim 11,
wherein the objective function includes a preset noise function, and
the noise function is set to take its maximum value when the distance between the noise-containing training data and the original training data is within a preset range.

The text classification method of claim 10,
wherein, in the first classification step, whether the sentence to be classified is a positive sentence or a negative sentence is determined using a hidden state vector generated by inputting the [CLS] token included in the target sentence into a pre-trained language model.

The text classification method of claim 13,
wherein, in the second classification step, the trust score is calculated based on the distance, on a manifold in which the training data is embedded, from the target sentence to the class predicted by the first classifier, and the distance from the target sentence to the nearest class other than the predicted class.

The text classification method of claim 14,
wherein, in the second classification step, a plurality of hidden state vectors are generated by inputting the [CLS] token included in each item of the training data into the language model, and a set for each class is created by embedding on the manifold the hidden state vectors that remain after excluding a certain proportion having low distribution density.

The text classification method of claim 15,
wherein the second classification step further comprises reducing the dimensionality of the generated plurality of hidden state vectors before embedding them on the manifold.

The text classification method of claim 14,
wherein, in the second classification step, the target sentence is reclassified as a neutral sentence when the trust score is equal to or less than a preset first threshold.

The text classification method of claim 17,
wherein, in the second classification step, the target sentence is reclassified as a neutral sentence when the distances, on the manifold in which the training data is embedded, between the target sentence and each class are all equal to or greater than a preset second threshold.