KR20210085278A - Apparatus and method for learning imbalanced data

Info

Publication number: KR20210085278A
Application number: KR1020190178159A
Authority: KR
Inventors: 한병옥; 김호원; 유장희
Original assignee: 한국전자통신연구원
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2021-07-08
Also published as: KR102577714B1

Abstract

Disclosed are an apparatus and a method for learning imbalanced data. According to an embodiment of the present invention, an apparatus for learning imbalanced data includes one or more processors and an execution memory that stores at least one program to be executed by the one or more processors, wherein the at least one program receives imbalanced data, determines a weight based on the imbalanced data, and learns the imbalanced data using a predefined loss function whose loss value changes based on the weight. Learning the imbalanced data in this way addresses the degradation in recognition rate that occurs when performing multi-class classification.

Description

Apparatus and method for learning imbalanced data

The present invention relates to machine learning technology, and more particularly to technology for learning imbalanced data.

Deep neural networks based on supervised learning have recently shown strong performance owing to their high representation power. This is because, by learning from a large amount of data, they can generalize through their network parameters. A sufficient amount of data is therefore the foundation of learning, and correct label information for that data is essential to accurate network performance.

However, depending on the characteristics of the data domain, it is not easy to secure a sufficient amount of data for every class. For example, in a facial expression recognition database collected from the web for classifying six expressions, plenty of data can be found for the Happiness and Sadness classes, whereas facial images for the Fear, Disgust, Surprise, and Anger classes are relatively hard to collect because the intrinsic nature of those classes yields few samples. Data can be collected in a balanced way by having subjects pose expressions artificially in a laboratory environment, but such data is known to generalize poorly and therefore to perform poorly in real-world scenarios. A simple remedy is to balance the data set by reducing the amount of data in the other classes to match the class with the least data. However, this reduces the overall amount of data and can degrade the classification performance of the deep neural network, so the problem is not a simple one.

Meanwhile, U.S. Patent No. 9,224,104, "Generating data from imbalanced training data sets," discloses a method of generating balanced data from training data whose label distribution is imbalanced.

An object of the present invention is to solve, by learning imbalanced data, the degradation in recognition rate that occurs when performing multi-class classification.

To achieve the above object, an apparatus for learning imbalanced data according to an embodiment of the present invention includes one or more processors and an execution memory that stores at least one program executed by the one or more processors, wherein the at least one program may receive imbalanced data, determine a weight based on the imbalanced data, and learn the imbalanced data using a predefined loss function whose loss value changes based on the weight.

In this case, the loss function may be used as an objective function for a back-propagation algorithm.

In this case, the at least one program may determine the weight for each batch of the imbalanced data, based on at least one of the per-class data distribution and the per-class recognition accuracy of the imbalanced data.

In this case, to determine the weight based on the per-class data distribution, the at least one program may determine the weight such that the loss value changes according to the amount of data that each class accounts for.

In this case, the at least one program may determine the weight so as to increase the loss value when the amount of data that a class accounts for is smaller than a preset value, and so as to decrease the loss value when that amount is larger than the preset value.

In this case, to determine the weight based on the per-class recognition accuracy, the at least one program may calculate a per-class hit rate.

In this case, the at least one program may increase the weight above a preset weight when the hit rate is lower than a preset value, and decrease the weight below the preset weight when the hit rate is higher than the preset value.

In this case, the at least one program may re-determine the weight for each batch based on the changed loss value and the weight, and may learn the imbalanced data by calculating the loss function for each batch using the re-determined weight.

In addition, to achieve the above object, a method for learning imbalanced data according to an embodiment of the present invention, performed by an apparatus for learning imbalanced data, includes receiving imbalanced data, and determining a weight based on the imbalanced data and learning the imbalanced data using a predefined loss function whose loss value changes based on the weight.

In this case, the learning of the imbalanced data may determine the weight for each batch of the imbalanced data, based on at least one of the per-class data distribution and the per-class recognition accuracy of the imbalanced data.

In this case, to determine the weight based on the per-class data distribution, the learning of the imbalanced data may determine the weight such that the loss value changes according to the amount of data that each class accounts for.

In this case, the learning of the imbalanced data may determine the weight so as to increase the loss value when the amount of data that a class accounts for is smaller than a preset value, and so as to decrease the loss value when that amount is larger than the preset value.

In this case, to determine the weight based on the per-class recognition accuracy, the learning of the imbalanced data may calculate a per-class hit rate.

In this case, the learning of the imbalanced data may increase the weight above a preset weight when the hit rate is lower than a preset value, and decrease the weight below the preset weight when the hit rate is higher than the preset value.

In this case, the learning of the imbalanced data may re-determine the weight for each batch based on the changed loss value and the weight, and may learn the imbalanced data by calculating the loss function for each batch using the re-determined weight.

By learning imbalanced data, the present invention can solve the degradation in recognition rate that occurs when performing multi-class classification.

FIG. 1 is a block diagram illustrating an apparatus for learning imbalanced data according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method for learning imbalanced data according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a computer system according to an embodiment of the present invention.

The present invention is described in detail below with reference to the accompanying drawings. Repeated descriptions, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention, are omitted. The embodiments of the present invention are provided to explain the present invention more completely to those of ordinary skill in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

Throughout the specification, when a part is said to "include" an element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating an apparatus for learning imbalanced data according to an embodiment of the present invention.

Referring to FIG. 1, the apparatus for learning imbalanced data according to an embodiment of the present invention may include a data input unit 110, a data learning unit 120, a data output unit 130, and a database unit 140.

The data input unit 110 may receive imbalanced data for machine learning.

The data learning unit 120 may receive the imbalanced data, determine a weight based on the imbalanced data, and learn the imbalanced data using a predefined loss function whose loss value changes based on the weight.

The data learning unit 120 may learn the data using a loss function used in a deep neural network (DNN) structure.

The loss function may be used as the objective function for the back-propagation algorithm when the DNN learns the data.

For example, the loss function is applicable to common DNN structures such as convolutional neural networks and ResNet.

The cross-entropy loss according to an embodiment of the present invention may be defined as in Equation 1.

In Equation 1, when a total of n samples belonging to c classes are learned, x_i is a training sample, y_i is its label, and θ denotes the parameters of the deep neural network. f denotes the DNN, whose last layer is a softmax function. Here, the indicator function 1(·) may be defined as in Equation 2.
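For reference, a conventional form of the cross-entropy loss and its indicator function that is consistent with the definitions above can be written as follows. The symbol names (x_i, y_i, θ, f, and the indicator 1{·}) and the 1/n averaging factor follow common convention and are assumptions; this is a sketch of the standard formulation, not a verbatim copy of Equations 1 and 2.

```latex
L(\theta) = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{c}
  \mathbb{1}\{y_i = j\}\,\log f_j(x_i;\theta)

\mathbb{1}\{y_i = j\} =
  \begin{cases}
    1 & \text{if } y_i = j \\
    0 & \text{otherwise}
  \end{cases}
```

Here f_j(x_i; θ) is the softmax output for class j, so only the term of the true class of each sample contributes to the sum.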

The data learning unit 120 may use the cross-entropy function to obtain the loss value by summing the values obtained by applying the -log operation to each class prediction probability.

In this case, whether the prediction probability of each class is activated may be determined by the indicator function.

In this case, while learning is performed, the indicator function is activated in proportion to the label frequency of the training data.
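The dependence on label frequency can be illustrated with a minimal NumPy sketch of the standard cross entropy. The function names and the toy batch are hypothetical; with uniform predictions, each class contributes to the total loss in proportion to how many samples carry its label.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean over samples of -log(probability assigned to the true class).
    n = labels.shape[0]
    return -np.log(probs[np.arange(n), labels]).mean()

def per_class_loss_share(probs, labels, num_classes):
    # Sum of the -log terms contributed by each class, divided by the batch size.
    n = labels.shape[0]
    losses = -np.log(probs[np.arange(n), labels])
    return np.bincount(labels, weights=losses, minlength=num_classes) / n

# Hypothetical batch: 8 samples of class 0, 2 of class 1, uniform predictions.
labels = np.array([0] * 8 + [1] * 2)
probs = softmax(np.zeros((10, 2)))
shares = per_class_loss_share(probs, labels, 2)
print(shares)  # class 0 contributes four times as much loss as class 1
```

Because the majority class dominates the loss, gradient updates favor it, which is the drawback the adaptive loss function is meant to address.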

In this case, to overcome this drawback of cross entropy, the data learning unit 120 may learn the imbalanced data using an adaptive cross-entropy loss function, which is the loss function according to an embodiment of the present invention.

The loss function according to an embodiment of the present invention may be expressed as in Equation 3.

In Equation 3, n_j denotes the number of samples of the j-th class, and h_j denotes the number of hits for the j-th class.
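As a hedged illustration only, a weighted cross entropy of the following general shape is consistent with the quantities named here: the per-class sample count n_j, the per-class hit count h_j, the batch size n, and the constant c/(c-1) mentioned in the description. The specific way the factors are combined into the weight w_j is an assumption made for illustration and should not be read as the exact Equation 3.

```latex
L_{\mathrm{adaptive}}(\theta) = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{c}
  w_j\,\mathbb{1}\{y_i = j\}\,\log f_j(x_i;\theta),
\qquad
w_j = \frac{c}{c-1}\left(1 - \frac{n_j}{n}\right)\left(1 - \frac{h_j}{n_j}\right)
```

Under this assumed form, w_j grows when class j holds a small share of the batch (small n_j/n) and when its hit rate h_j/n_j is low, matching the qualitative behavior the description attributes to the weight.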

In this case, the data learning unit 120 may determine the weight for each batch of the imbalanced data, based on at least one of the per-class data distribution and the per-class recognition accuracy of the imbalanced data.

In this case, to determine the weight based on the per-class data distribution, the data learning unit 120 may determine the weight such that the loss value changes according to the amount of data that each class accounts for.

In this case, the data learning unit 120 may determine the weight so as to increase the loss value when the amount of data that a class accounts for is smaller than a preset value, and so as to decrease the loss value when that amount is larger than the preset value.

In this case, in the data learning unit 120, the term of Equation 3 that reflects the share of data in the j-th class operates to increase the loss value when that share is small.

Conversely, when the share of data is large, the data learning unit 120 operates to lower the loss value, thereby mitigating the per-class asymmetry in loss values caused by data imbalance.

Here, the c/(c-1) term may correspond to a constant term for normalizing the weight to a probability value.
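The distribution-based weighting can be sketched as follows. The formula (c/(c-1))·(1 - n_j/n) is an assumed form, chosen because it uses the c/(c-1) constant from the description and evaluates to 1 for every class when the batch is perfectly balanced; the function name is hypothetical.

```python
import numpy as np

def distribution_weights(labels, num_classes):
    # n_j: number of samples of each class in the batch; n: batch size.
    n = labels.shape[0]
    counts = np.bincount(labels, minlength=num_classes)
    c = num_classes
    # Assumed form: a small class share gives a weight above 1,
    # a large class share gives a weight below 1.
    return (c / (c - 1)) * (1.0 - counts / n)

# Hypothetical imbalanced batch: 8 samples of class 0, 2 of class 1.
imbalanced = distribution_weights(np.array([0] * 8 + [1] * 2), 2)
# Hypothetical balanced batch: 5 samples of each class.
balanced = distribution_weights(np.array([0] * 5 + [1] * 5), 2)
print(imbalanced, balanced)
```

With this choice the weights always sum to c, so the overall scale of the loss stays comparable to the unweighted cross entropy while the per-class emphasis shifts toward minority classes.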

In addition, to improve per-class accuracy under data imbalance, the data learning unit 120 may calculate a per-class hit rate in order to determine the weight based on the per-class recognition accuracy.

In this case, the data learning unit 120 may increase the weight above a preset weight when the hit rate is lower than a preset value, and decrease the weight below the preset weight when the hit rate is higher than the preset value.

In this case, the data learning unit 120 calculates the per-class hit rate during learning, raises the weight when the hit rate of a class is low, and lowers it when the hit rate is high. This minimizes the prediction error while balancing accuracy across the data distribution, thereby improving the overall recognition rate.
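A minimal sketch of the hit-rate rule described above follows. The hit rate is taken as the number of correctly classified samples of a class divided by the number of samples of that class; the threshold, base weight, and step size are hypothetical values, since the description only says they are preset.

```python
import numpy as np

def class_hit_rates(pred, labels, num_classes):
    # counts: samples per class; hits: correctly predicted samples per class.
    counts = np.bincount(labels, minlength=num_classes)
    hits = np.bincount(labels[pred == labels], minlength=num_classes)
    # Classes absent from the batch get a hit rate of 0 (an arbitrary choice).
    return np.divide(hits, counts, out=np.zeros(num_classes), where=counts > 0)

def hit_rate_weights(hit_rates, base_weight=1.0, threshold=0.5, step=0.5):
    # Raise the weight above base_weight for classes below the threshold,
    # lower it for classes above the threshold, and keep it otherwise.
    return np.where(hit_rates < threshold, base_weight + step,
                    np.where(hit_rates > threshold, base_weight - step, base_weight))

# Hypothetical batch: class 0 is mostly recognized, class 1 is never recognized.
labels = np.array([0, 0, 0, 0, 1, 1])
pred = np.array([0, 0, 0, 1, 0, 0])
rates = class_hit_rates(pred, labels, 2)
weights = hit_rate_weights(rates)
print(rates, weights)
```

A smooth alternative, such as a weight proportional to one minus the hit rate, would follow the same direction of adjustment without a hard threshold.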

When training the DNN, the data learning unit 120 may calculate the loss function for each batch.

In Equation 3, n may be defined as the batch size.

When the loss function is defined for each batch, the data learning unit 120 can reflect the statistics of each batch (per-class accuracy and per-class data distribution) in the loss function.

In this case, the data learning unit 120 may use the loss function defined for each batch to run the back-propagation algorithm in a direction that mitigates the imbalance in the data.

As a result, in the data learning unit 120, the weight of the cross-entropy loss function is determined for each batch based on the label statistics of that batch (per-class accuracy and per-class data distribution), so that the loss function is calculated with a weight that changes from batch to batch.

In this case, the data learning unit 120 may re-determine the weight for each batch based on the changed loss value and the weight, and may learn the imbalanced data by calculating the loss function for each batch using the re-determined weight.
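The per-batch computation can be sketched as below. The weight combines a distribution factor and a hit-rate factor as a product, which is an assumption made for illustration; all function names are hypothetical. The example shows that the same predictions yield different weights when the label statistics of the batch differ.

```python
import numpy as np

def batch_weights(pred, labels, num_classes):
    # Statistics are taken from the current batch only.
    n = labels.shape[0]
    counts = np.bincount(labels, minlength=num_classes)                 # n_j
    hits = np.bincount(labels[pred == labels], minlength=num_classes)  # h_j
    hit_rate = np.divide(hits, counts, out=np.zeros(num_classes), where=counts > 0)
    c = num_classes
    # Assumed combination: distribution factor times (1 - hit rate).
    return (c / (c - 1)) * (1.0 - counts / n) * (1.0 - hit_rate)

def adaptive_cross_entropy(probs, labels):
    # Weighted mean of -log(probability of the true class), with the
    # weights recomputed from this batch's statistics.
    n, c = probs.shape
    w = batch_weights(probs.argmax(axis=1), labels, c)
    losses = -np.log(probs[np.arange(n), labels])
    return (w[labels] * losses).mean(), w

# Hypothetical predictions for a batch of four samples and two classes.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.7, 0.3]])
loss_a, w_a = adaptive_cross_entropy(probs, np.array([0, 0, 0, 1]))
loss_b, w_b = adaptive_cross_entropy(probs, np.array([1, 1, 1, 0]))
print(w_a, w_b)
```

Note that under this assumed product form a class that is classified perfectly within a batch receives zero weight for that batch, so a practical variant might add a small floor to the weight.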

The data output unit 130 may output the result of learning the imbalanced data.

The database unit 140 may store the result of learning the imbalanced data.

FIG. 2 is a flowchart illustrating a method for learning imbalanced data according to an embodiment of the present invention.

Referring to FIG. 2, the method for learning imbalanced data according to an embodiment of the present invention may first receive data (S210).

That is, in step S210, imbalanced data for machine learning may be received.

In addition, the method for learning imbalanced data according to an embodiment of the present invention may learn the data (S220).

That is, in step S220, the imbalanced data may be received, a weight may be determined based on the imbalanced data, and the imbalanced data may be learned using a predefined loss function whose loss value changes based on the weight.

In step S220, the data may be learned using a loss function used in a deep neural network (DNN) structure.

In Equation 1, x_i is a training sample, y_i is its label, and θ denotes the parameters of the deep neural network. f denotes the DNN, whose last layer is a softmax function. Here, the indicator function 1(·) may be defined as in Equation 2.

In step S220, the cross-entropy function may be used to obtain the loss value by summing the values obtained by applying the -log operation to each class prediction probability.

In this case, whether the prediction probability of each class is activated may be determined by the indicator function.

In this case, in step S220, while learning is performed, the indicator function is activated in proportion to the label frequency of the training data.

In this case, in step S220, to overcome this drawback of cross entropy, the imbalanced data may be learned using an adaptive cross-entropy loss function, which is the loss function according to an embodiment of the present invention.

In Equation 3, n_j denotes the number of samples of the j-th class, and h_j denotes the number of hits for the j-th class.

In this case, in step S220, the weight may be determined for each batch of the imbalanced data, based on at least one of the per-class data distribution and the per-class recognition accuracy of the imbalanced data.

In this case, in step S220, to determine the weight based on the per-class data distribution, the weight may be determined such that the loss value changes according to the amount of data that each class accounts for.

In this case, in step S220, the weight may be determined so as to increase the loss value when the amount of data that a class accounts for is smaller than a preset value, and so as to decrease the loss value when that amount is larger than the preset value.

In this case, in step S220, the term of Equation 3 that reflects the share of data in the j-th class operates to increase the loss value when that share is small.

Conversely, when the share of data is large, step S220 operates to lower the loss value, thereby mitigating the per-class asymmetry in loss values caused by data imbalance.

In addition, in step S220, a per-class hit rate may be calculated in order to determine the weight based on the per-class recognition accuracy.

In this case, in step S220, the weight may be increased above a preset weight when the hit rate is lower than a preset value, and decreased below the preset weight when the hit rate is higher than the preset value.

In step S220, the per-class hit rate is calculated during learning; the weight is raised when the hit rate of a class is low and lowered when it is high. This minimizes the prediction error while balancing accuracy across the data distribution, thereby improving the overall recognition rate.

In this case, in step S220, the loss function may be calculated for each batch when training the DNN.

In this case, in step S220, the loss function defined for each batch may be used to run the back-propagation algorithm in a direction that mitigates the imbalance in the data.

As a result, in step S220, the weight of the cross-entropy loss function is determined for each batch based on the label statistics of that batch (per-class accuracy and per-class data distribution), so that the loss function is calculated with a weight that changes from batch to batch.

In this case, in step S220, the weight may be reset for each batch based on the statistics, and the imbalanced data may be learned by calculating the loss function for each batch using the reset weight.
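The flow of step S220 can be sketched as a mini-batch training loop in which the weights are reset from each batch's statistics before the gradient step. A linear softmax classifier stands in for the DNN, the weight formula is the same assumed product of a distribution factor and a hit-rate factor, and all names and hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def batch_weights(pred, labels, c):
    # Assumed weight: (c/(c-1)) * (1 - n_j/n) * (1 - h_j/n_j), per batch.
    n = labels.shape[0]
    counts = np.bincount(labels, minlength=c)
    hits = np.bincount(labels[pred == labels], minlength=c)
    hit_rate = np.divide(hits, counts, out=np.zeros(c), where=counts > 0)
    return (c / (c - 1)) * (1.0 - counts / n) * (1.0 - hit_rate)

def train(x, y, num_classes, epochs=5, batch_size=32, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W = np.zeros((x.shape[1], num_classes))  # linear stand-in for the DNN
    history = []
    for _ in range(epochs):
        order = rng.permutation(x.shape[0])
        for start in range(0, x.shape[0], batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            probs = softmax(xb @ W)
            # Reset the weights from this batch's statistics.
            w = batch_weights(probs.argmax(axis=1), yb, num_classes)
            onehot = np.eye(num_classes)[yb]
            # Gradient of the weighted cross entropy with respect to the
            # logits, treating the weights as constants.
            grad_logits = w[yb][:, None] * (probs - onehot) / len(idx)
            W -= lr * (xb.T @ grad_logits)
            history.append(w)
    return W, history

# Hypothetical imbalanced data: 180 samples of class 0, 20 of class 1.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-1.0, 1.0, (180, 2)), rng.normal(1.0, 1.0, (20, 2))])
y = np.array([0] * 180 + [1] * 20)
W, history = train(x, y, num_classes=2)
```

Treating the weights as constants during back-propagation keeps the gradient identical in form to that of the ordinary cross entropy, scaled per sample by the weight of its class.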

In addition, the method for learning imbalanced data according to an embodiment of the present invention may output and store data (S230).

That is, in step S230, the result of learning the imbalanced data may be output and stored.

FIG. 3 is a diagram illustrating a computer system according to an embodiment of the present invention.

Referring to FIG. 3, the apparatus for learning imbalanced data according to an embodiment of the present invention may be implemented in a computer system 1100 such as a computer-readable recording medium. As shown in FIG. 3, the computer system 1100 may include one or more processors 1110, a memory 1130, a user interface input device 1140, a user interface output device 1150, and storage 1160, which communicate with one another via a bus 1120. The computer system 1100 may further include a network interface 1170 connected to a network 1180. The processor 1110 may be a central processing unit or a semiconductor device that executes processing instructions stored in the memory 1130 or the storage 1160. The memory 1130 and the storage 1160 may be various types of volatile or non-volatile storage media. For example, the memory may include a ROM 1131 or a RAM 1132.

An apparatus for learning imbalanced data according to an embodiment of the present invention includes one or more processors 1110 and an execution memory 1130 that stores at least one program executed by the one or more processors 1110, wherein the at least one program may receive imbalanced data, determine a weight based on the imbalanced data, and learn the imbalanced data using a predefined loss function whose loss value changes based on the weight.

In this case, the weight of the loss function may be set based on the hit rate of each class.

In this case, the at least one program may determine the weight for each batch of the imbalanced data, based on at least one of the per-class data distribution and the per-class recognition accuracy of the imbalanced data.

In this case, to determine the weight based on the per-class data distribution, the at least one program may determine the weight such that the loss value changes according to the amount of data that each class accounts for.

The present invention proposes an adaptive loss function for deep neural networks to improve classification performance on data that exhibits imbalance. The adaptive loss function according to an embodiment of the present invention reflects the data imbalance in the weight of the loss function so that classes with little data receive a higher weight, and with this loss function accurate classification results can be expected even on data whose classes are imbalanced.

As described above, the apparatus and method for learning imbalanced data according to an embodiment of the present invention are not limited to the configurations and methods of the embodiments described above; all or part of the embodiments may be selectively combined so that various modifications can be made.

110: data input unit 120: data learning unit
130: data output unit 140: database unit
1100: computer system 1110: processor
1120: bus 1130: memory
1131: ROM 1132: RAM
1140: user interface input device
1150: user interface output device
1160: storage 1170: network interface
1180: network

Claims

1. An apparatus for learning imbalanced data, comprising:
one or more processors; and
an execution memory storing at least one program executed by the one or more processors,
wherein the at least one program
receives imbalanced data,
determines a weight based on the imbalanced data, and learns the imbalanced data using a predefined loss function in which a loss value changes based on the weight.

2. The apparatus of claim 1,
wherein the loss function
is used as an objective function for a back-propagation algorithm.

3. The apparatus of claim 2,
wherein the at least one program
determines, for each batch of the imbalanced data, the weight based on at least one of a per-class data distribution of the imbalanced data and a per-class recognition accuracy.

4. The apparatus of claim 3,
wherein the at least one program,
in order to determine the weight based on the per-class data distribution, determines the weight so that the loss value changes based on the amount of data belonging to each class.

5. The apparatus of claim 4,
wherein the at least one program
determines the weight in a direction in which the loss value increases when the amount of data of a class is smaller than a preset value, and
determines the weight in a direction in which the loss value decreases when the amount of data of the class is greater than the preset value.

6. The apparatus of claim 3,
wherein the at least one program
calculates a hit rate for each class in order to determine the weight based on the per-class recognition accuracy.

7. The apparatus of claim 6,
wherein the at least one program
increases the weight above a preset weight when the hit rate is lower than a preset value, and decreases the weight below the preset weight when the hit rate is higher than the preset value.

8. The apparatus of claim 3,
wherein the at least one program
re-determines the weight for each batch based on the changed loss value and the weight, and learns the imbalanced data by calculating the loss function for each batch using the re-determined weight.

9. A method for learning imbalanced data performed by an apparatus for learning imbalanced data, the method comprising:
receiving imbalanced data as input; and
determining a weight based on the imbalanced data, and learning the imbalanced data using a predefined loss function in which a loss value changes based on the weight.

10. The method of claim 9,
wherein the loss function
is used as an objective function for a back-propagation algorithm.

11. The method of claim 10,
wherein the learning of the imbalanced data comprises
determining, for each batch of the imbalanced data, the weight based on at least one of a per-class data distribution of the imbalanced data and a per-class recognition accuracy.

12. The method of claim 11,
wherein the learning of the imbalanced data comprises,
in order to determine the weight based on the per-class data distribution, determining the weight so that the loss value changes based on the amount of data belonging to each class.

13. The method of claim 12,
wherein the learning of the imbalanced data comprises
determining the weight in a direction in which the loss value increases when the amount of data of a class is smaller than a preset value, and
determining the weight in a direction in which the loss value decreases when the amount of data of the class is greater than the preset value.

14. The method of claim 11,
wherein the learning of the imbalanced data comprises
calculating a hit rate for each class in order to determine the weight based on the per-class recognition accuracy.

15. The method of claim 14,
wherein the learning of the imbalanced data comprises
increasing the weight above a preset weight when the hit rate is lower than a preset value, and decreasing the weight below the preset weight when the hit rate is higher than the preset value.

16. The method of claim 11,
wherein the learning of the imbalanced data comprises
re-determining the weight for each batch based on the changed loss value and the weight, and learning the imbalanced data by calculating the loss function for each batch using the re-determined weight.