KR102611121B1 - Method and apparatus for generating image classification model

Info

Publication number: KR102611121B1
Application number: KR1020210144176A
Authority: KR
Inventors: 박상현; 강명균
Original assignee: 재단법인대구경북과학기술원
Priority date: 2021-03-22
Filing date: 2021-10-27
Publication date: 2023-12-07
Also published as: KR20220131808A

Abstract

A method and apparatus for generating an image classification model are disclosed. The method according to this embodiment may include: selecting, from a data set collected for generating the image classification model, an image pair comprising a target image and a texture image used to change the texture of the target image; extracting, through an encoder layer, structure information and texture information for each of the target image and the texture image; performing, in a normalization layer, normalization on the structure information of the target image based on the structure information of the target image and the structure information of the texture image; generating, in an image generation layer, an image based on the texture information of the texture image and the normalized structure information; and optimizing the image classification model based on the generated image, the target image, and the texture image.

Description

Method and apparatus for generating image classification model {METHOD AND APPARATUS FOR GENERATING IMAGE CLASSIFICATION MODEL}

The present disclosure relates to a method and apparatus for generating an image classification model that mitigates bias in the training data set so that labels are not predicted on the basis of that bias.

Recent advances in deep learning have greatly improved medical image classification performance. However, when the training data set contains bias in addition to the labels, the bias cannot be removed effectively, which causes a serious overfitting problem.

Predicting labels from information that is not suitable for actual prediction is called data risk. In medical imaging in particular, the scanner used to acquire the data, the scanner's acquisition settings, and demographic scores can introduce larger variations than those caused by the disease itself, so techniques for mitigating data risk are needed when training deep learning models.

In other words, the recent success in vision, speech, and natural language processing stems from deep learning research and advances in GPU-based parallel processing. These techniques are mainly data-driven, depend heavily on the training data set, and require millions of samples for a model to cope with real-world inputs.

However, even with millions of training samples, if spurious artifacts are distributed differently for each label, a model may infer labels from those spurious factors rather than from meaningful ones. This means that a model trained on a biased training data set can incur data risk, showing severely degraded performance on real inputs.

Furthermore, in the medical image processing and analysis domain, data are often collected from multiple centers, or acquired with different devices and at different times for each disease, so biased data sets are frequently constructed. This has increased the need for research on bias mitigation.

The background art described above is technical information that the inventors possessed in order to derive the present invention or acquired in the course of deriving it, and it is not necessarily known art disclosed to the general public before the filing of the present application.

Prior Art 1: Kang, M., Hong, K.S., Chikontwe, P., Luna, M., Jang, J.G., Park, J., Shin, K.C., Park, S.H., Ahn, J.H.: Quantitative assessment of chest CT patterns in COVID-19 and bacterial pneumonia patients: a deep learning perspective. Journal of Korean Medical Science 36(5) (2021)
Prior Art 2: Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In: International Conference on Learning Representations (2019)

One object of the embodiments of the present disclosure is to provide a method and apparatus for generating and training an image classification model that uses an adaptive structural instance normalization (AdaSIN) algorithm and a style mixing algorithm to mitigate bias in the training data set, so that labels are not predicted on the basis of bias.

The objects of the embodiments of the present disclosure are not limited to the one mentioned above. Other objects and advantages of the present disclosure that are not mentioned can be understood from the following description and will be understood more clearly from the embodiments. It will also be appreciated that the objects and advantages of the present disclosure can be realized by the means recited in the claims and combinations thereof.

A method for generating an image classification model according to an embodiment of the present disclosure may include: selecting, from a data set collected for generating the image classification model, an image pair comprising a target image and a texture image used to change the texture of the target image; extracting, through an encoder layer, structure information and texture information for each of the target image and the texture image; performing, in a normalization layer, normalization on the structure information of the target image based on the structure information of the target image and the structure information of the texture image; generating, in an image generation layer, an image based on the texture information of the texture image and the normalized structure information; and optimizing the image classification model based on the generated image, the target image, and the texture image.

In addition, other methods and other systems for implementing the present disclosure, and a computer-readable recording medium storing a computer program for executing the method, may be further provided.

Other aspects, features, and advantages in addition to those described above will become apparent from the following drawings, claims, and detailed description of the invention.

According to embodiments of the present disclosure, a bias-mitigated data set is generated using the adaptive structural instance normalization (AdaSIN) algorithm and the style mixing algorithm. This can improve the performance of the classification model and prevent the data risk in which a classification model trained on a biased training data set shows severely degraded performance on real inputs.

In addition, texture information, which is likely to contain bias, is transferred to other labels, so that at least the texture-related bias problem can be resolved even when the information directly related to the bias is unknown.

In addition, an embodiment of the present disclosure is well suited to tasks in which the bias is difficult to define clearly, such as the manufacturer, device, and acquisition settings of images from the same domain, and in which the bias is mapped over the entire image and is therefore difficult to mask. It can also be applied preemptively to data sets in which data risk is likely to occur, to secure the real-world generalization performance of the classification model.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a diagram schematically illustrating an image classification model generation system according to an embodiment of the present disclosure.
FIG. 2 is a block diagram schematically illustrating an image classification model generating apparatus according to an embodiment of the present disclosure.
FIG. 3 is a diagram illustrating a process of generating an image classification model according to an embodiment of the present disclosure.
FIG. 4 is an exemplary diagram showing the results of an image classification model performance experiment according to an embodiment of the present disclosure.
FIG. 5 is a flowchart illustrating a method for generating an image classification model according to an embodiment of the present disclosure.

The advantages and features of the present invention, and the methods for achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings.

However, the present disclosure is not limited to the embodiments presented below. It may be implemented in various different forms and should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. The embodiments presented below are provided so that the present disclosure is complete and so that the scope of the invention is fully conveyed to those of ordinary skill in the art to which the present disclosure pertains. In describing the present disclosure, a detailed description of related known technology is omitted where it is judged that it would obscure the gist of the present disclosure.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present disclosure. Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present disclosure, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood as not precluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. These terms are used only to distinguish one component from another.

Hereinafter, embodiments according to the present disclosure are described in detail with reference to the accompanying drawings. In the description, identical or corresponding components are assigned the same reference numerals, and redundant descriptions thereof are omitted.

FIG. 1 is a diagram schematically illustrating an image classification model generation system according to an embodiment of the present disclosure.

Referring to FIG. 1, the image classification model generation system 1 may include an image classification model generating apparatus 100, a user terminal 200, a server 300, and a network 400.

According to an embodiment, the image classification model generating apparatus 100 generates a data set that mitigates the bias of the training data set, and generates and trains an image classification model that does not predict labels on the basis of bias at inference time.

According to an embodiment, the image classification model generating apparatus 100 may generate a bias-mitigated data set using the adaptive structural instance normalization (AdaSIN) algorithm and the style mixing algorithm.

In an embodiment, a classification model trained on the bias-mitigated data set can show high generalization performance even when its performance is validated on an external data set, and in particular can show bias mitigation performance in medical image classification such as lung CT.

That is, the image classification model generating apparatus 100 implements a technique that mitigates data risk by constructing a bias-mitigated data set, and can show better bias mitigation performance than various image augmentation techniques and than comparison techniques that address the bias problem.

In particular, the image classification model generating apparatus 100 can show very high classification performance in, for example, using chest X-ray or lung CT to distinguish patients with pneumonia caused by COVID-19 from patients with pneumonia caused by other factors (bacterial infection, influenza, etc.).

In a data set collected for such diagnosis, the CT manufacturer, the CT scanner, and the CT acquisition setting (kilovoltage peak) may be substantially biased, and bias may exist even though the images were collected at a single center. A biased data set can thus always be constructed unintentionally, and the accuracy of a model trained on it can drop sharply when its performance is evaluated on real data.

Accordingly, to mitigate bias, the image classification model generating apparatus 100 according to an embodiment may propose a scheme that generates images such that the bias is not useful for inferring labels when training the classification model. For example, to prevent the model from inferring on the basis of bias information in the task of distinguishing COVID-19 from bacterial pneumonia, images carrying the bias information of the other label are generated, that is, COVID-19 images with the bacterial pneumonia acquisition settings and bacterial pneumonia images with the COVID-19 acquisition settings, and these are added to the training data set to train the classification model.
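The augmentation scheme described above can be sketched in a few lines. This is only an illustrative outline, not the patented implementation: the `(image, label)` tuple layout and the `pick_texture` and `generate` callables are hypothetical placeholders for the pair selection and texture-transfer generator, and the assumption that a generated image keeps the label of its target image follows from the COVID-19 example in the text.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (image identifier, label); hypothetical layout


def augment_with_swapped_textures(
    dataset: List[Sample],
    pick_texture: Callable[[Sample, List[Sample]], Sample],
    generate: Callable[[str, str], str],
) -> List[Sample]:
    """Add, for every target image, one generated image that keeps the
    target's label but carries the texture of an image from another label."""
    augmented = list(dataset)
    for target in dataset:
        # Texture candidates come from samples whose label differs from the target's.
        candidates = [s for s in dataset if s[1] != target[1]]
        if not candidates:
            continue
        texture = pick_texture(target, candidates)
        new_image = generate(target[0], texture[0])
        # Assumption: the label follows the structure (target) image.
        augmented.append((new_image, target[1]))
    return augmented


# Toy stand-ins for the pair selector and the generator.
dataset = [("covid_a", "covid"), ("bact_a", "bacterial")]
augmented = augment_with_swapped_textures(
    dataset,
    pick_texture=lambda target, candidates: candidates[0],
    generate=lambda target_img, texture_img: f"{target_img}+texture({texture_img})",
)
print(augmented)
```

Because every label then appears with the textures of the other label, texture alone no longer predicts the label in the augmented set.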

When texture information, which is likely to contain bias, is transferred to other labels in this way, at least the texture-related bias problem can be resolved even when the information directly related to the bias is unknown.

In other words, the image classification model generating apparatus 100 according to an embodiment may use style mixing and adaptive structural instance normalization to generate images suited to the goal of "make bias useless."

Meanwhile, in an embodiment, image generation may be performed using, for example, StyleGAN2, which has shown successful image generation performance. In this case, the image classification model generating apparatus 100 may pass texture information to the image generation layer using a convolution weight modulation layer and a demodulation layer.
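For readers unfamiliar with weight modulation and demodulation, the following NumPy sketch shows the operation as it is commonly formulated for StyleGAN2: the kernel is scaled per input channel by a style vector, and each output filter is then rescaled to unit L2 norm. How the patent derives the style vector from the texture information is not stated in this passage, so the `style` argument here is a hypothetical stand-in for that texture-derived vector.

```python
import numpy as np


def modulate_demodulate(weight, style, eps=1e-8):
    """weight: (out_ch, in_ch, kh, kw) convolution kernel.
    style:  (in_ch,) per-input-channel scale (assumed to come from the texture code).
    """
    # Modulation: scale each input channel of the kernel by its style value.
    w = weight * style[None, :, None, None]
    # Demodulation: rescale every output filter to (approximately) unit L2 norm.
    norm = np.sqrt((w ** 2).sum(axis=(1, 2, 3), keepdims=True) + eps)
    return w / norm


rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 3, 3, 3))
style = rng.uniform(0.5, 1.5, size=3)
w = modulate_demodulate(weight, style)
print(np.sqrt((w ** 2).sum(axis=(1, 2, 3))))  # per-filter norms, all close to 1
```

The demodulation step keeps the output activations at a stable scale regardless of the style values, which is why the texture information can be injected through the weights rather than through the feature maps.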

In addition, to prevent texture bias information from remaining in the generated image, the image classification model generating apparatus 100 may normalize the structure feature map that carries the structure information, thereby generating images best suited to bias mitigation.
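One plausible reading of this structure feature map normalization is an AdaIN-style operation: the target image's structure feature map is standardized per channel and then rescaled with the per-channel statistics of the texture image's structure feature map. The exact AdaSIN formula is not given in this passage, so the sketch below is an assumption made for illustration, and the function name and `(channels, height, width)` layout are hypothetical.

```python
import numpy as np


def adaptive_structural_instance_norm(target_structure, texture_structure, eps=1e-5):
    """AdaIN-style sketch (assumed form, not taken from the patent text).
    Both inputs have shape (channels, height, width)."""
    # Per-channel mean and standard deviation over the spatial dimensions.
    t_mean = target_structure.mean(axis=(1, 2), keepdims=True)
    t_std = target_structure.std(axis=(1, 2), keepdims=True)
    x_mean = texture_structure.mean(axis=(1, 2), keepdims=True)
    x_std = texture_structure.std(axis=(1, 2), keepdims=True)

    # Standardize the target structure, then rescale with the texture image's statistics.
    normalized = (target_structure - t_mean) / (t_std + eps)
    return normalized * x_std + x_mean


rng = np.random.default_rng(0)
target = rng.normal(loc=2.0, scale=3.0, size=(4, 8, 8))
texture = rng.normal(loc=-1.0, scale=0.5, size=(4, 8, 8))
out = adaptive_structural_instance_norm(target, texture)
print(out.mean(axis=(1, 2)), out.std(axis=(1, 2)))
```

Under this reading, the spatial layout of the target is preserved while its channel-wise intensity statistics, which may carry scanner-related bias, are replaced by those of the texture image.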

In addition, when generating texture-converted images, the image classification model generating apparatus 100 may use features from a model trained with a contrastive learning algorithm and set similar images as conversion pairs, thereby preventing incorrect image generation.
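A minimal sketch of such pair selection is shown below, assuming that each image already has a feature vector from a contrastively trained encoder and that similarity is measured by cosine similarity. Restricting candidates to images with a different label is also an assumption, made to match the cross-label texture transfer described earlier; the function name is hypothetical.

```python
import numpy as np


def select_texture_pairs(features, labels):
    """For each image, return the index of the most similar image
    (cosine similarity of its features) that has a different label.
    Assumes non-zero feature vectors and at least two distinct labels."""
    feats = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # L2-normalize so that the dot product equals cosine similarity.
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = unit @ unit.T
    # Exclude same-label candidates (this also excludes the image itself).
    same_label = labels[:, None] == labels[None, :]
    sim = np.where(same_label, -np.inf, sim)
    return sim.argmax(axis=1)


features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 1, 0, 1]
pairs = select_texture_pairs(features, labels)
print(pairs)  # [1 0 3 2]
```

Pairing each target with a structurally similar image from the other label reduces the chance that the generator has to reconcile very different anatomy, which is one way to read the stated goal of preventing incorrect image generation.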

The image classification model generating apparatus 100 is described in detail below.

Meanwhile, in an embodiment, users may access an application or website implemented on the user terminal 200 and carry out the image classification model generation process. The user terminal 200 may be, but is not limited to, a desktop computer, smartphone, notebook computer, tablet PC, smart TV, mobile phone, personal digital assistant (PDA), laptop, media player, micro server, global positioning system (GPS) device, e-book reader, digital broadcasting terminal, navigation device, kiosk, MP3 player, digital camera, home appliance, or other mobile or non-mobile computing device operated by the user. The user terminal 200 may also be a wearable terminal with communication and data processing functions, such as a watch, glasses, a hair band, or a ring. The user terminal 200 is not limited to the above, and any terminal capable of web browsing may be used without limitation.

In an embodiment, the image classification model generation system 1 may be implemented by the image classification model generating apparatus 100 and/or the server 300, in which case the server 300 may be a server for operating the image classification model generating apparatus 100. That is, the server 300 may be a server that performs the process for generating an image classification model.

The server 300 may also be a database server that provides the data for operating the image classification model generating apparatus 100. In addition, the server 300 may include a web server or an application server. The server 300 may further include a big data server and an AI server needed to apply various artificial intelligence algorithms, and a computation server that performs the computations of those algorithms, and it may include the servers described above or be networked with them.

The network 400 may connect the image classification model generating apparatus 100, the server 300, and the user terminal 200 in the image classification model generation system 1. The network 400 may encompass, for example, wired networks such as local area networks (LANs), wide area networks (WANs), metropolitan area networks (MANs), and integrated service digital networks (ISDNs), and wireless networks such as wireless LANs, CDMA, Bluetooth, and satellite communication, but the scope of the present disclosure is not limited thereto. The network 400 may also transmit and receive information using short-range and/or long-range communication. Here, short-range communication may include Bluetooth, radio frequency identification (RFID), infrared data association (IrDA), ultra-wideband (UWB), ZigBee, and wireless fidelity (Wi-Fi) technologies, and long-range communication may include code division multiple access (CDMA), frequency division multiple access (FDMA), time division multiple access (TDMA), orthogonal frequency division multiple access (OFDMA), and single carrier frequency division multiple access (SC-FDMA) technologies.

The network 400 may include connections of network elements such as hubs, bridges, routers, switches, and gateways. The network 400 may include one or more connected networks, for example a multi-network environment, including a public network such as the Internet and a private network such as a secure corporate private network. Access to the network 400 may be provided through one or more wired or wireless access networks. Furthermore, the network 400 may support an Internet of Things (IoT) network, which exchanges and processes information between distributed components such as things, and/or 5G communication.

FIG. 2 is a block diagram schematically illustrating an image classification model generating apparatus according to an embodiment of the present disclosure.

Referring to FIG. 2, the image classification model generating apparatus 100 may include a communication unit 110, a user interface 120, a memory 130, and a processor 140.

The communication unit 110 may operate in conjunction with the network 400 to provide the communication interface needed to deliver signals transmitted to and received from external devices in the form of packet data. The communication unit 110 may also be a device that includes the hardware and software needed to transmit and receive signals, such as control signals or data signals, over wired or wireless connections with other network devices. The communication unit 110 may support various kinds of intelligent communication between things, such as Internet of Things (IoT), Internet of Everything (IoE), and Internet of Small Things (IoST), and may support machine-to-machine (M2M) communication, vehicle-to-everything (V2X) communication, device-to-device (D2D) communication, and the like.

That is, the processor 140 may receive various data or information from an external device connected through the communication unit 110, and may also transmit various data or information to the external device. The communication unit 110 may include at least one of a Wi-Fi module, a Bluetooth module, a wireless communication module, and an NFC module.

The user interface 120 may include an input interface through which user requests and commands are entered, for example for acquiring images used to generate the image classification model and for setting training parameters.

The user interface 120 may also include an output interface through which image generation results, image classification results, and the like, based on the image classification model generated by the image classification model generating apparatus 100, are output. That is, the user interface 120 may output results according to user requests and commands. The input interface and the output interface of the user interface 120 may be implemented in the same interface.

The memory 130 stores various information needed for the operation of the image classification model generating apparatus 100 and/or the control (computation) of the server 300, may store control software, and may include a volatile or non-volatile recording medium.

메모리(130)는 하나 이상의 프로세서(140)와 연결되는 것으로, 프로세서(140)에 의해 실행될 때, 프로세서(140)로 하여금 이미지 분류 모델 생성 장치(100) 및/또는 서버(300)를 제어하도록 야기하는(cause) 코드들을 저장할 수 있다.The memory 130 is connected to one or more processors 140 and, when executed by the processor 140, causes the processor 140 to control the image classification model generating device 100 and/or the server 300. You can save the code that causes it.

여기서, 메모리(130)는 자기 저장 매체(magnetic storage media) 또는 플래시 저장 매체(flash storage media)를 포함할 수 있으나, 일 실시 예에서의 범위가 이에 한정되는 것은 아니다. 이러한 메모리(130)는 내장 메모리 및/또는 외장 메모리를 포함할 수 있으며, DRAM, SRAM, 또는 SDRAM 등과 같은 휘발성 메모리, OTPROM(one time programmable ROM), PROM, EPROM, EEPROM, mask ROM, flash ROM, NAND 플래시 메모리, 또는 NOR 플래시 메모리 등과 같은 비휘발성 메모리, SSD. CF(compact flash) 카드, SD 카드, Micro-SD 카드, Mini-SD 카드, Xd 카드, 또는 메모리 스틱(memory stick) 등과 같은 플래시 드라이브, 또는 HDD와 같은 저장 장치를 포함할 수 있다. 그리고, 메모리(130)에는 일 실시 예에 따른 학습을 수행하기 위한 알고리즘에 관련된 정보가 저장될 수 있다. 그 밖에도 일 실시 예의 목적을 달성하기 위한 범위 내에서 필요한 다양한 정보가 메모리(130)에 저장될 수 있으며, 메모리(130)에 저장된 정보는 서버 또는 외부 장치로부터 수신되거나 사용자에 의해 입력됨에 따라 갱신될 수도 있다.Here, the memory 130 may include magnetic storage media or flash storage media, but its scope in one embodiment is not limited thereto. This memory 130 may include internal memory and/or external memory, volatile memory such as DRAM, SRAM, or SDRAM, one time programmable ROM (OTPROM), PROM, EPROM, EEPROM, mask ROM, flash ROM, Non-volatile memory, such as NAND flash memory, or NOR flash memory, SSD. It may include a flash drive such as a compact flash (CF) card, SD card, Micro-SD card, Mini-SD card, Xd card, or memory stick, or a storage device such as an HDD. Additionally, information related to an algorithm for performing learning according to an embodiment may be stored in the memory 130. In addition, various necessary information may be stored in the memory 130 within the scope of achieving the purpose of an embodiment, and the information stored in the memory 130 may be updated as it is received from a server or external device or input by the user. It may be possible.

The processor 140 may control the overall operation of the image classification model generating device 100. Specifically, the processor 140 is connected to the components of the image classification model generating device 100, including the memory 130, and may control the overall operation of the image classification model generating device 100 by executing at least one instruction stored in the memory 130.

The processor 140 may be implemented in various ways. For example, the processor 140 may be implemented as at least one of an application specific integrated circuit (ASIC), an embedded processor, a microprocessor, hardware control logic, a hardware finite state machine (FSM), and a digital signal processor (DSP).

The processor 140 is a kind of central processing unit and may control the operation of the entire image classification model generating device 100 by running control software loaded in the memory 130. The processor 140 may include any kind of device capable of processing data. Here, a 'processor' may refer to a data processing device embedded in hardware that has physically structured circuitry for performing functions expressed as code or instructions contained in a program.

In one embodiment, the processor 140 implements a useless-bias generation technique for bias mitigation. Using an adaptive structural instance normalization algorithm and a style mixing algorithm, it may train the image generation layer best suited to bias mitigation, and by selecting the texture image to be transferred from a different label based on a contrastive learning model, it may prevent incorrect image alterations in advance.

FIG. 3 is a diagram illustrating a process of generating an image classification model according to an embodiment of the present disclosure.

Referring to FIG. 3, in one embodiment, to generate images that serve the purpose of "useless bias generation" for bias mitigation, the image classification model may include an encoder (E) layer that extracts structure information and texture information, an image generation layer (G) that generates an image from the structure information and the texture information, and classification layers (D_L, D₂) for quality improvement and training.

In one embodiment, to generate a texture-transferred image, the structure information extracted from the target image and the texture information extracted from another image may be fed to the image generation layer (G) to generate an image. The structure information may also be input to the image generation layer (G) in such a way that the extracted structure information carries no information about the existing bias.

Texture mixing

In one embodiment, the term structure information may refer to a feature map of the encoder (E) in which the input image has been pooled relatively little, whereas texture information may refer to a feature map pooled down to 1×1. It cannot be guaranteed that each feature map contains purely structure information or purely texture information. However, because the spatial information of the input pixels is progressively lost through pooling, it can be assumed that the lightly pooled map retains more structure-related information and the fully pooled map retains more texture-related information.

In one embodiment, a convolution weight modulation layer and a convolution weight demodulation layer may be used to transfer the texture information to the image generation layer (G), that is, to transfer the texture information to the generated image.

According to one embodiment, the image generation layer (G) may transfer texture information through modulation and demodulation in every layer except the layer that outputs the final pixel values, and the latent variable z may be set based on the structure information.
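The modulation and demodulation steps can be illustrated with a minimal NumPy sketch. It assumes the StyleGAN2-style formulation, in which a per-input-channel scale vector rescales the convolution weights and each output filter is then renormalized to unit L2 norm. The function name `modulate_demodulate`, the `style` vector (assumed here to be derived from the texture information), and the tensor shapes are hypothetical and are not specified in the description.

```python
import numpy as np

def modulate_demodulate(weight, style, eps=1e-8):
    """Scale conv weights by a style vector, then renormalize each output filter.

    weight: (C_out, C_in, k, k) convolution kernel.
    style:  (C_in,) per-input-channel scales (assumed to come from the texture code).
    """
    # Modulation: scale every input channel of every filter by its style value.
    w = weight * style[None, :, None, None]
    # Demodulation: divide each output filter by its L2 norm.
    norm = np.sqrt((w ** 2).sum(axis=(1, 2, 3), keepdims=True) + eps)
    return w / norm

rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 4, 3, 3))
style = rng.uniform(0.5, 2.0, size=4)
w_out = modulate_demodulate(weight, style)
```

After demodulation every output filter has approximately unit norm, so the style changes the relative contribution of the input channels without changing the overall activation scale.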

Adaptive structural instance normalization

In one embodiment, when the texture is to be changed while the structure of the image is preserved, the image generation layer (G) has no choice but to take the structure information extracted by the encoder (E) as input. The point that requires care is that, along with the structure information, the texture information of the target image, that is, the bias, is likely to be passed to the image generation layer (G), so that an image that retains the bias is generated. One way to prevent this is to greatly reduce the number of channels of the structure information before feeding it to the image generation layer (G). However, channel reduction alone may still leave bias information that is reflected in the generated image.

Accordingly, in one embodiment, the number of channels of the structure information is reduced and, at the same time, adaptive structural normalization matches the mean and standard deviation of the structure information to those of the features extracted from the texture image, thereby reducing the bias. This can be expressed as Equation 1 below.
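A form consistent with this description can be written with assumed notation: $s^{1}$ and $s^{2}$ stand for the structure feature maps of the target image and of the texture image, and $\mu(\cdot)$ and $\sigma(\cdot)$ for the mean and standard deviation. These symbols are placeholders chosen for illustration, and the published form of Equation 1 may differ.

```latex
\mathrm{AdaSIN}(s^{1}, s^{2}) = \sigma(s^{2}) \left( \frac{s^{1} - \mu(s^{1})}{\sigma(s^{1})} \right) + \mu(s^{2})
```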

Here, the two arguments of Equation 1 denote the structure information extracted by the encoder (E) layer from the image to be transformed and from the texture image to be applied, respectively. In one embodiment, for image generation, the structure information may be converted to the target feature map distribution by the normalization layer and then input to the image generation layer (G). This prevents the extracted raw feature map from being used directly for image generation and leaving bias in the generated image.

In one embodiment, the normalization layer may receive the input to be normalized together with either the mean and standard deviation to normalize with, or an input from which that mean and standard deviation are extracted. That is, the normalization layer may receive the feature map of the input image (the input to be normalized) from a particular preceding layer, and may receive the mean and standard deviation used to normalize that feature map from another preceding layer.

Here, the two statistics functions in Equation 1 return the mean and the standard deviation of their input, respectively. In this embodiment, the normalization layer may take two feature maps of the same shape as input. One of them is the feature map to be normalized, and the normalized feature map may then be scaled and shifted (biased) using statistics obtained from the other feature map. In other words, as the large parentheses in Equation 1 show, the structure feature map of the target image is first normalized, then scaled by the standard deviation obtained from the structure feature map of the texture image, and then shifted by its mean.
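The normalize, scale, and shift sequence can be sketched in NumPy as follows. The function name `adasin`, the (C, H, W) layout, the per-channel statistics, and the small `eps` stabilizer are assumptions made for illustration and are not taken from the description.

```python
import numpy as np

def adasin(s_target, s_texture, eps=1e-5):
    """Normalize s_target per channel, then scale/shift with s_texture statistics.

    Both inputs are feature maps of identical shape (C, H, W).
    """
    if s_target.shape != s_texture.shape:
        raise ValueError("feature maps must share the same shape")
    # Channel-wise statistics of the map being normalized.
    mu_t = s_target.mean(axis=(1, 2), keepdims=True)
    sd_t = s_target.std(axis=(1, 2), keepdims=True)
    # Channel-wise statistics of the map that supplies scale and shift.
    mu_x = s_texture.mean(axis=(1, 2), keepdims=True)
    sd_x = s_texture.std(axis=(1, 2), keepdims=True)
    return sd_x * (s_target - mu_t) / (sd_t + eps) + mu_x

rng = np.random.default_rng(0)
s1 = rng.normal(3.0, 2.0, size=(4, 8, 8))   # stand-in for target structure features
s2 = rng.normal(-1.0, 0.5, size=(4, 8, 8))  # stand-in for texture-image structure features
out = adasin(s1, s2)
```

The output keeps the spatial pattern of `s1` while its per-channel mean and standard deviation follow those of `s2`, which is the property the normalization layer relies on to suppress the statistics that carry the bias.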

Realistic image generation with texture transfer

In one embodiment, a realistic image is generated through the encoder (E) layer that extracts structure information and texture information, the image generation layer (G) that generates an image from the structure information and the texture information, and the normalization layer used for texture mixing and normalization, and optimization is performed so that no bias remains in the generated image.

To this end, in one embodiment, a VGG19 model pre-trained on ImageNet may be used, following the same procedure by which conventional style transfer models are trained, although the embodiment is not limited thereto. That is, D_L shown in FIG. 3 may be implemented by the VGG19 model and may be written as VGG19 in the drawings and equations of an embodiment.

In one embodiment, loss functions that minimize differences in mean and standard deviation may be used for optimization, as in Equations 2 and 3 below.

Here, the subscript of VGG19 denotes the layer from which the feature map is extracted after an image is input, and may refer to a ReLU activation layer that detects the corresponding characteristic in the filtered image. The texture terms denote the texture information extracted by the encoder layer (E) from the target image and the texture image, and the structure terms denote the structure information defined earlier.

In addition, in one embodiment, an adversarial loss may also be used in training to generate realistic images, and the adversarial loss can be expressed as Equation 4 below.

In one embodiment, the loss function finally used can be expressed as Equation 5 below.

Here, the weight in Equation 5 may be the regularization weight parameter of the texture-based loss function and may be set to 10, for example.

In other words, in one embodiment, the processor 140 may compute the structure-based loss function, as in Equation 2, so that the differences in mean and standard deviation between the generated image and the target image are minimized.

The processor 140 may compute the texture-based loss function, as in Equation 3, so that the differences in mean and standard deviation between the generated image and the texture image are minimized.

The processor 140 may also compute the adversarial loss function, as in Equation 4, so that the error between the target image and the generated image is minimized.

Finally, as in Equation 5, the processor 140 may compute the final loss function based on the structure-based loss function, the texture-based loss function, and the adversarial loss function.
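The statistic-matching losses and their combination can be sketched as follows. The feature maps are assumed to be precomputed activations (for example from VGG19 ReLU layers) passed in as NumPy arrays. The helper names `stat_loss` and `total_loss`, the use of an unsquared L2 distance, and the additive combination with the weight applied only to the texture term are all assumptions, since Equations 2 to 5 are not reproduced in the text.

```python
import numpy as np

def stat_loss(feats_a, feats_b):
    """Sum over layers of the L2 distance between channel-wise means and stds.

    feats_a, feats_b: lists of (C, H, W) feature maps, one entry per layer.
    """
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        mu_a, mu_b = fa.mean(axis=(1, 2)), fb.mean(axis=(1, 2))
        sd_a, sd_b = fa.std(axis=(1, 2)), fb.std(axis=(1, 2))
        total += np.linalg.norm(mu_a - mu_b) + np.linalg.norm(sd_a - sd_b)
    return float(total)

def total_loss(loss_structure, loss_texture, loss_adv, lam=10.0):
    """Assumed combination: structure + lam * texture + adversarial."""
    return loss_structure + lam * loss_texture + loss_adv

rng = np.random.default_rng(0)
feats_gen = [rng.normal(size=(8, 16, 16)), rng.normal(size=(16, 8, 8))]
feats_tex = [f * 2.0 + 1.0 for f in feats_gen]   # different statistics
loss_s = stat_loss(feats_gen, feats_gen)          # identical features
loss_t = stat_loss(feats_gen, feats_tex)
loss = total_loss(loss_s, loss_t, loss_adv=0.3)   # 0.3 is a placeholder value
```

Identical feature lists give a zero loss, and any shift or rescaling of the features is penalized, which is the behaviour the mean and standard deviation matching terms are meant to have.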

In one embodiment, after training according to the process described above, at image generation time the texture and structure feature maps are each extracted by the encoder layer (E) and then input to the image generation layer (G), which makes it possible to generate texture-transferred images for building a bias-mitigated data set.

Image searching and contrastive learning

In one embodiment, when applying the adaptive structural instance normalization algorithm and the style mixing algorithm, the texture may be transferred from the most similar image selected among the images that belong to a different label.

This is to prevent, in advance, images from being generated incorrectly because of missing texture information when the structure information differs greatly. In one embodiment, similar images may be selected through a contrastive learning algorithm.

For example, the Momentum Contrast training scheme may be used as the contrastive learning algorithm, and the most similar image may be selected by measuring the distance between the features extracted by the trained model.
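The selection step can be sketched as a nearest-neighbour search restricted to images with a different label. The features are assumed to be precomputed embedding vectors from the contrastively trained model. The function name `select_texture_pairs` and the choice of Euclidean distance are assumptions, since the description only states that a feature distance is measured.

```python
import numpy as np

def select_texture_pairs(features, labels):
    """For each image, return the index of the closest image with a different label."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all feature vectors.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    # Exclude candidates that share the query's label (including the query itself).
    dists[labels[:, None] == labels[None, :]] = np.inf
    return dists.argmin(axis=1)

features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.2]]
labels = [0, 0, 1, 1]
pairs = select_texture_pairs(features, labels)
```

In this toy example the two label-0 images are both paired with image 2, and the two label-1 images are both paired with image 1. The sketch assumes at least two distinct labels are present.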

FIG. 4 is an exemplary diagram showing the results of an image classification model performance experiment according to an embodiment of the present disclosure. The performance verification results of the image classification model according to an embodiment are described with reference to FIG. 4.

Experiments and Results

(Data set)

Most image classification benchmarks are built so that the train, validation, and test splits follow the same distribution. Such a data set is not suitable for evaluating how well the image classification model generating device 100 according to an embodiment mitigates the bias problem, so a data set in which the train/validation splits and the test split have different distributions needs to be constructed.

Accordingly, in one embodiment, a data set whose train/validation splits are biased was constructed using the CT manufacturer, the model name, and the CT acquisition setting (kilovoltage peak). The statistics of the constructed data set are shown in Table 1.

(Lung and lesion segmentation)

In one embodiment, lung segmentation and lesion segmentation were performed on the collected lung CT scans using a state-of-the-art instance segmentation model, Mask-cascade-RCNN-ResNeSt-200 with a deformable convolution neural network (DCN). Lung and lesion segmentation performance on the validation data set was 97.18% and 78.06%, respectively, and the image classification model generating device 100 applied the model to all collected lung CT scans to extract lung and lesion segmentations.

Lung segmentation is used to black out regions outside the lungs, which carry no meaning for disease diagnosis, or to crop the image to a square, and lesion segmentation is used to remove CT slices that contain no lesion.

(Classification model settings)

The classification model used in the experiments of one embodiment was ResNet18, in both an ImageNet-pretrained version and a randomly initialized version, trained with Stochastic Gradient Descent at a learning rate of 0.001. The batch size was set to 64, and random crop and horizontal flip were used as augmentations unrelated to color. For validation and testing, majority voting was used to derive one prediction per patient, which was compared with the ground-truth label to compute the final accuracy. The classification model was trained for a total of 100 epochs, the model with the best performance on the validation set was selected, and its final classification performance was evaluated on the test set. The evaluation metric was the macro-averaged f1-score, chosen to minimize the effect of class imbalance in the training and validation data sets. Training, validation, and testing were repeated three times, and the final accuracy was obtained by averaging the three results.
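The per-patient evaluation can be sketched with the standard library alone. Slice-level predictions are grouped by patient, reduced to one prediction by majority vote, and scored with a macro-averaged F1. The function names, the toy patient identifiers, and the tie-breaking rule (the label that appears first among tied labels wins) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def majority_vote(patient_ids, slice_preds):
    """Reduce slice-level predictions to one predicted label per patient."""
    votes = defaultdict(list)
    for pid, pred in zip(patient_ids, slice_preds):
        votes[pid].append(pred)
    return {pid: Counter(p).most_common(1)[0][0] for pid, p in votes.items()}

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

patient_ids = ["a", "a", "a", "b", "b", "c"]
slice_preds = [1, 1, 0, 0, 0, 1]
truth = {"a": 1, "b": 0, "c": 0}

patient_pred = majority_vote(patient_ids, slice_preds)
order = sorted(truth)
score = macro_f1([truth[p] for p in order], [patient_pred[p] for p in order])
```

Because the macro average weights every class equally, a class with few patients contributes as much to the score as a class with many, which is why it is less sensitive to class imbalance than plain accuracy.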

(Contrastive learning settings)

To select texture images, ResNet50 was trained for 200 epochs with the Momentum Contrast method. The trained ResNet50 was then given each slice to extract its features, and for each slice the image with the most similar features among the images belonging to a different label was chosen to form a texture-transfer image pair.

For a fair performance comparison, the image-generation-based comparison methods and the method of the image classification model generating device 100 all used the same texture image pairs.

(Comparison methods)

To compare against methods based on texture bias removal, edge-based and wavelet-based preprocessing were applied. The Canny algorithm was used for edges and the Daubechies 2 filter for wavelets. For augmentation-based methods, color augmentation and style augmentation were used, and color augmentation altered brightness, contrast, saturation, and hue.

For learning-based models, performance was compared using the Learning not to learn and blindeye methods. The previously proposed augmentation-based model is designed to process only real-world images and cannot be applied directly to medical image analysis, so no experiment was run with it. The unimodal bias mitigation model addresses multi-input, multi-model scenarios, which makes a direct performance comparison with the image classification model generating device 100 difficult, so no experiment was run with it either.

AdaIN and swapping-autoencoder were used as the image generation models for comparison, and for each method results were obtained both with and without a model pre-trained on ImageNet data.

(Results)

Referring to Table 2, models that did not use color-related augmentation showed an f1-score below 35%, regardless of whether a pre-trained model was used. This suggests that, on the training and validation data sets, the model learned to predict labels from bias information and therefore failed to perform well on the actual data set.

Edge and wavelet preprocessing, which are among the simplest ways to remove texture-related bias, reached an f1-score of at most 60%, so they can hardly be said to have mitigated the bias.

Preprocessing with edges and wavelets did not help much because important information needed for prediction was lost during preprocessing. Based on this result, in one embodiment, the "useless bias generation" technique is judged to be a far more suitable solution than approaches that remove bias, and the image classification model may be generated based on this technique.

With color augmentation the bias appears to be mitigated, with scores of up to 66%, but further mitigation is needed. Style augmentation based on style transfer showed remarkable bias mitigation, raising accuracy to 70%. However, because it transforms images into an artistic form rather than removing only the texture bias, it should be used with caution.

Techniques proposed for mitigating known bias either use a confusion loss or use a gradient reversal layer and minimize mutual information, but all of these models failed to mitigate the bias problem in lung CT images. This is thought to be because it is difficult for loss-function-based debiasing techniques to recognize and mitigate differences between CT images that even humans cannot clearly explain. In addition, unlike bias located in the background or in a local region, recognizing a bias that is spread faintly across the whole image and then making the feature extraction model mitigate it or become confused about it is a considerably hard problem.

In one embodiment, texture-transferred images may be used for "useless bias generation". As comparison methods, AdaIN, the most widely used arbitrary style transfer technique, and swapping-autoencoder, which has shown good texture transfer performance, were each used to build a debiased data set, and classification performance was compared.

In the experiments, performance was highest for the image classification model generating device 100, followed by swapping-autoencoder and then AdaIN, and the image classification model generating device 100 mitigated the bias most successfully. AdaIN transfers texture well but produces lower-quality images, so it showed the lowest performance. Swapping-autoencoder produced high-quality images with well-changed textures, but it sometimes changed (or removed) lesions to match the texture source image or left bias information in the generated image, which limited its bias mitigation performance. In particular, the co-occurrence component of swapping-autoencoder showed a strong tendency, like Cycle-GAN, to generate images that take on the form of images from the other domain, and these characteristics lowered the bias mitigation performance of swapping-autoencoder.

In contrast, mixing-AdaSIN in the image classification model generating device 100 not only enables high-quality image generation through the style mixing technique but also transfers texture realistically, and because AdaSIN minimizes the bias in the encoded structure information, it can generate images suited to bias mitigation. That is, the image classification model generating device 100 of one embodiment does show higher bias mitigation performance than other texture transfer techniques. The corresponding generated images are shown in FIG. 4.

(External validation)

A classification model trained on the bias-mitigated data set should generalize better than a model trained on the biased data set, which means it should also perform better when disease classification (for example, COVID-19 classification) is validated on an external data set.

In the experiment according to one embodiment, the external performance of the model was verified on MosMed, a public Russian COVID-19 data set. Because MosMed contains many patients with no lesion present in the CT images, classification performance was tested on severe patients. The results are shown in Table 3. As in the experiments above, performance was highest for the image classification model generating device 100, followed by swapping-autoencoder and then AdaIN.

The model that used a pre-trained model together with the image classification model generating device 100 showed 77.3% accuracy, which is not a large gap even when compared with the 80.97% f1-score obtained earlier. Without a pre-trained model the performance drop is larger, which suggests that a pre-trained model is needed to obtain high generalization performance.

The bias-mitigated data set showed high bias mitigation performance with the baseline technique of the image classification model generating apparatus 100 (for example, a prediction method using a ResNet18 model and majority voting), and it also shows high mitigation performance with the most recent CT-based COVID-19 diagnosis techniques.
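The baseline is described only as a ResNet18 model combined with majority voting. As a minimal sketch of the voting step, assuming the votes are per-slice class predictions for a single CT scan (the function name and the labels are hypothetical):

```python
from collections import Counter


def majority_vote(slice_predictions):
    """Return the label predicted for the largest number of slices in one scan."""
    if not slice_predictions:
        raise ValueError("at least one prediction is required")
    # most_common(1) yields a one-element list holding the (label, count) pair
    # with the highest count.
    return Counter(slice_predictions).most_common(1)[0][0]


# Hypothetical per-slice labels produced by a slice-level classifier.
scan_label = majority_vote(["covid", "normal", "covid", "covid"])
```

How ties between labels should be resolved is not stated in the source and is left to the caller here.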

To confirm that the image classification model generating apparatus 100 is also effective with other models, bias mitigation performance was verified using COVNet and Contrastive-COVIDNet, both of which have publicly released code; the results are shown in Table 4.

COVNet reached a 74.22% f1-score, showing that bias was successfully mitigated. This is lower than the 80.97% obtained previously, but a small difference in performance is to be expected because the augmentations differ between techniques and the training settings differ (for example, batch size 1). Contrastive-COVIDNet also showed bias mitigation, but its performance was low, with a 61.74% f1-score.

The experimental results show that, although performance differs somewhat between techniques, both models improved when trained on the bias-mitigated data set.

In summary, the image classification model generating apparatus 100 according to an embodiment can keep an inference model from falling into data risk even when the model is trained end-to-end on a biased lung CT training data set. To mitigate the bias problem, it uses an "unnecessary bias generation" strategy together with the mixing-AdaSIN algorithm, which generates realistic images and transfers texture information well while minimizing the transfer of bias information carried in the structure.

In an embodiment, a bias-mitigated data set can be constructed with the mixing-AdaSIN algorithm, and a classification model trained on it can show higher bias mitigation performance than various image augmentation, bias mitigation, and texture conversion techniques. The model trained on the bias-mitigated data set also showed high generalization performance on an external data set.

The image classification model generating apparatus 100 according to an embodiment is well suited to tasks in which the bias is hard to define explicitly, such as the manufacturer's device and the acquisition settings for images of the same domain, and in which the bias is mapped across the entire image and is therefore hard to mask. It can also be applied preemptively to data sets that are likely to give rise to data risk, to secure the real-world generalization performance of the classification model.

FIG. 5 is a flowchart illustrating a method for generating an image classification model according to an embodiment of the present disclosure.

Referring to FIG. 5, in step S100, the processor 140 selects, from the data set collected for generating the image classification model, an image pair including a target image and a texture image used to change the texture of the target image.

At this time, the processor 140 may select pairs of similar images from the data set based on a pre-trained contrastive learning layer. That is, for images with different labels in the data set, the processor 140 measures the distance between features extracted by the pre-trained contrastive learning layer and, based on that distance, selects as the texture image the image that is closest to the target image among the images whose label differs from that of the target image.
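The pair selection in step S100 can be sketched as a nearest-neighbor search restricted to images with a different label. The sketch below assumes Euclidean distance (the source says only "distance") and uses short toy vectors in place of features from the contrastive learning layer; all names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_texture_image(target_idx, features, labels):
    """Return the index of the differently labeled image closest to the target."""
    candidates = [i for i, lab in enumerate(labels) if lab != labels[target_idx]]
    if not candidates:
        raise ValueError("no image with a different label is available")
    return min(candidates, key=lambda i: euclidean(features[target_idx], features[i]))


# Toy features standing in for the output of the contrastive learning layer.
features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.2, 0.1]]
labels = [0, 0, 1, 1]
pair = (0, select_texture_image(0, features, labels))  # (target index, texture index)
```

Image 1 is closer to image 0 than image 3 is, but it shares the target's label and is therefore never a candidate.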

In step S200, the processor 140 extracts structure information and texture information for each of the target image and the texture image through the encoder layer.

At this time, the processor 140 may input each of the target image and the texture image to the encoder layer to extract a structure feature map and a texture feature map for each of them.

The processor 140 may then extract the structure information of the target image and the structure information of the texture image using statistical information of the respective structure feature maps, and extract the texture information of the target image and the texture information of the texture image using statistical information of the respective texture feature maps.

In step S300, the processor 140 performs, in the normalization layer, normalization on the structure information of the target image based on the structure information of the target image and the structure information of the texture image.

Here, the structure information of the texture image may include the mean and standard deviation of its structure feature map.

The processor 140 may perform the normalization by normalizing the structure feature map of the target image, multiplying the normalized structure feature map by the standard deviation of the structure feature map of the texture image, and adding the mean of the structure feature map of the texture image.
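The normalization in step S300 can be sketched as follows. The sketch assumes that the statistics are computed per channel over spatial positions, as in instance normalization, and adds a small `eps` for numerical stability; neither detail is stated in the source. Plain Python lists stand in for feature maps, with each channel flattened to a list of values.

```python
import math


def mean_std(channel, eps=1e-5):
    """Mean and standard deviation of one flattened feature-map channel."""
    n = len(channel)
    mu = sum(channel) / n
    var = sum((v - mu) ** 2 for v in channel) / n
    return mu, math.sqrt(var + eps)


def adasin(target_structure, texture_structure, eps=1e-5):
    """Re-normalize each target channel with the texture image's structure statistics."""
    out = []
    for t_ch, x_ch in zip(target_structure, texture_structure):
        t_mu, t_sigma = mean_std(t_ch, eps)
        x_mu, x_sigma = mean_std(x_ch, eps)
        # Normalize the target channel, then scale by the texture image's standard
        # deviation and shift by the texture image's mean.
        out.append([x_sigma * (v - t_mu) / t_sigma + x_mu for v in t_ch])
    return out


# One-channel toy example.
normalized = adasin([[1.0, 2.0, 3.0, 4.0]], [[10.0, 20.0, 30.0, 40.0]])
```

After the operation, each output channel keeps the spatial pattern of the target image while carrying the mean and standard deviation of the texture image's structure feature map.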

Meanwhile, before generating the image, the processor 140 may transfer the texture information of the texture image to the image generation layer by applying modulation and demodulation to every layer of the image generation layer except the layer that outputs the final pixel values.
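The source does not define the modulation and demodulation operations. A common formulation, used in StyleGAN2, scales the convolution weights of each input channel by a style value and then rescales each output filter to unit norm. The sketch below assumes that formulation, reduces the convolution to 1x1 weights for brevity, and treats `style` as a hypothetical vector derived from the texture information.

```python
import math


def modulate_demodulate(weights, style, eps=1e-8):
    """Apply style modulation and demodulation to 1x1 convolution weights.

    weights[o][i] is the weight from input channel i to output channel o;
    style[i] is the scale applied to input channel i.
    """
    out = []
    for row in weights:
        # Modulation: scale each input channel's weight by its style value.
        modulated = [w * s for w, s in zip(row, style)]
        # Demodulation: rescale the output filter to (approximately) unit L2 norm.
        norm = math.sqrt(sum(m * m for m in modulated) + eps)
        out.append([m / norm for m in modulated])
    return out


styled = modulate_demodulate([[1.0, 2.0], [3.0, 4.0]], [2.0, 0.5])
```

In this formulation the style changes the relative contribution of the input channels, and the demodulation keeps the output magnitude roughly constant regardless of the style values.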

In step S400, the processor 140 may generate an image in the image generation layer based on the texture information of the texture image and the normalized structure information, and in step S500 it may optimize the image classification model based on the generated image, the target image, and the texture image.

At this time, the processor 140 may quantify, as a latent variable, the features of the image that are based on the structure information from the texture image.

The processor 140 may calculate a structure-based loss function so that the difference in mean and standard deviation between the generated image and the target image is minimized, and may calculate a texture-based loss function so that the difference in mean and standard deviation between the generated image and the texture image is minimized.
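One way to read the two statistic-based losses is as a penalty on the gap between per-channel means and standard deviations. The sketch below assumes a squared-error penalty and per-channel statistics on flattened feature maps, neither of which is specified in the source; the function names are hypothetical.

```python
import math


def channel_stats(channel):
    """Mean and (population) standard deviation of one flattened channel."""
    n = len(channel)
    mu = sum(channel) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in channel) / n)
    return mu, sigma


def stat_matching_loss(generated, reference):
    """Sum over channels of squared differences in mean and standard deviation."""
    loss = 0.0
    for g_ch, r_ch in zip(generated, reference):
        g_mu, g_sigma = channel_stats(g_ch)
        r_mu, r_sigma = channel_stats(r_ch)
        loss += (g_mu - r_mu) ** 2 + (g_sigma - r_sigma) ** 2
    return loss


# Same spread, means one unit apart.
example_loss = stat_matching_loss([[0.0, 2.0]], [[1.0, 3.0]])
```

Under this reading, the structure-based loss would compare the structure features of the generated image with those of the target image, and the texture-based loss would compare the texture features of the generated image with those of the texture image.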

In addition, the processor 140 may calculate an adversarial loss function so that the error between the target image and the generated image is minimized.

Finally, the processor 140 may calculate a final loss function based on the structure-based loss function, the texture-based loss function, and the adversarial loss function, in which the texture-based loss function is scaled by a regularization weight parameter.
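As a sketch of how the three terms might be combined, assuming a simple weighted sum in which only the texture-based term carries a weight (the symbol names and the unit weights on the other two terms are assumptions, not taken from the source):

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{adv}} + \mathcal{L}_{\text{structure}} + \lambda \, \mathcal{L}_{\text{texture}}
```

Here $\lambda$ plays the role of the regularization weight parameter of the texture-based loss function.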

The embodiments of the present invention described above may be implemented in the form of a computer program that can be executed on a computer through various components, and such a computer program may be recorded on a computer-readable medium. The medium may include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Meanwhile, the computer program may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software. Examples of computer programs include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

In the specification of the present invention (particularly in the claims), the term "the" and similar referential terms may refer to both the singular and the plural. In addition, where a range is described in the present invention, the description includes the invention to which each individual value within the range is applied (unless stated otherwise), and is equivalent to setting out each individual value of the range in the detailed description.

Unless the order of the steps constituting the method according to the present invention is explicitly stated or there is a statement to the contrary, the steps may be performed in any suitable order. The present invention is not necessarily limited to the order in which the steps are described. All examples and exemplary terms (for example, "etc.") are used in the present invention simply to describe it in detail, and the scope of the present invention is not limited by those examples or exemplary terms unless limited by the claims. Those skilled in the art will also appreciate that various modifications, combinations, and changes may be made according to design conditions and factors within the scope of the appended claims or their equivalents.

Therefore, the spirit of the present invention should not be confined to the embodiments described above, and not only the claims set out below but also all scopes equivalent to, or equivalently modified from, those claims fall within the spirit of the present invention.

1: Image classification model generating system
100: Image classification model generating apparatus
110: Communication unit
120: User interface
130: Memory
140: Processor
200: User terminal
300: Server
400: Network

Claims

1. A method for generating an image classification model, performed by at least one processor, the method comprising:
selecting, from a data set collected to generate the image classification model, an image pair including a target image and a texture image for changing the texture of the target image;
extracting structure information and texture information for each of the target image and the texture image through an encoder layer;
performing, in a normalization layer, normalization on the structure information of the target image based on the structure information of the target image and the structure information of the texture image;
generating, in an image generation layer, a composite image based on the texture information of the texture image and the normalized structure information; and
performing optimization of the image classification model based on the composite image, the target image, and the texture image,
wherein the method further comprises, before generating the composite image, transferring the texture information of the texture image to the image generation layer by applying modulation and demodulation to all layers of the image generation layer except the layer that outputs the final pixel values.

2. The method of claim 1,
wherein selecting the image pair comprises selecting pairs of images that are similar to each other from the data set based on a pre-trained contrastive learning layer.

3. The method of claim 2,
wherein selecting the image pair comprises:
measuring, for images with different labels in the data set, distances between features extracted by the pre-trained contrastive learning layer; and
selecting, as the texture image, based on the distances between the features, the image closest to the target image among the images whose label differs from that of the target image.

4. The method of claim 1,
wherein extracting the structure information and the texture information comprises:
inputting each of the target image and the texture image into the encoder layer to extract a structure feature map and a texture feature map for each of the target image and the texture image;
extracting the structure information of the target image and the structure information of the texture image using statistical information of the structure feature map for each of the target image and the texture image; and
extracting the texture information of the target image and the texture information of the texture image using statistical information of the texture feature map for each of the target image and the texture image.

5. The method of claim 4,
wherein the structure information of the texture image includes the mean and standard deviation of the structure feature map, and
wherein performing the normalization comprises:
normalizing the structure feature map of the target image; and
multiplying the normalized structure feature map of the target image by the standard deviation of the structure feature map of the texture image and adding the mean of the structure feature map of the texture image.

6. (Deleted)

7. The method of claim 1,
wherein generating the composite image comprises quantifying, as a latent variable, features of an image that are based on the structure information from the texture image.

8. The method of claim 1,
wherein performing the optimization comprises:
calculating a structure-based loss function so that the mean and standard deviation of the composite image and the target image are minimized; and
calculating a texture-based loss function so that the mean and standard deviation of the composite image and the texture image are minimized.

9. The method of claim 8,
wherein performing the optimization further comprises calculating an adversarial loss function so that the error between the target image and the composite image is minimized.

10. The method of claim 9,
wherein performing the optimization comprises calculating a final loss function according to Equation 1, based on the structure-based loss function, the texture-based loss function, and the adversarial loss function,
wherein the weight applied to the texture-based loss function in Equation 1 is a regularization weight parameter of the texture-based loss function.
Equation 1

11. A computer-readable recording medium on which is recorded a program that, when executed by at least one processor, causes the at least one processor to perform the method of any one of claims 1 to 5 and 7 to 10.

12. An apparatus for generating an image classification model, comprising:
a memory; and
at least one processor coupled to the memory and configured to execute computer-readable instructions contained in the memory,
wherein the at least one processor is configured to perform:
an operation of selecting, from a data set collected to generate the image classification model, an image pair including a target image and a texture image for changing the texture of the target image;
an operation of extracting structure information and texture information for each of the target image and the texture image through an encoder layer;
an operation of performing, in a normalization layer, normalization on the structure information of the target image based on the structure information of the target image and the structure information of the texture image;
an operation of generating, in an image generation layer, a composite image based on the texture information of the texture image and the normalized structure information; and
an operation of optimizing the image classification model based on the composite image, the target image, and the texture image,
wherein the at least one processor is further configured to perform, before the operation of generating the composite image, an operation of transferring the texture information of the texture image to the image generation layer by applying modulation and demodulation to all layers of the image generation layer except the layer that outputs the final pixel values.

13. The apparatus of claim 12,
wherein the operation of selecting the image pair comprises an operation of selecting pairs of images that are similar to each other from the data set based on a pre-trained contrastive learning layer.

14. The apparatus of claim 13,
wherein the operation of selecting the image pair comprises:
an operation of measuring, for images with different labels in the data set, distances between features extracted by the pre-trained contrastive learning layer; and
an operation of selecting, as the texture image, based on the distances between the features, the image closest to the target image among the images whose label differs from that of the target image.

15. The apparatus of claim 12,
wherein the operation of extracting the structure information and the texture information comprises:
an operation of inputting each of the target image and the texture image into the encoder layer to extract a structure feature map and a texture feature map for each of the target image and the texture image;
an operation of extracting the structure information of the target image and the structure information of the texture image using statistical information of the structure feature map for each of the target image and the texture image; and
an operation of extracting the texture information of the target image and the texture information of the texture image using statistical information of the texture feature map for each of the target image and the texture image.

16. The apparatus of claim 15,
wherein the structure information of the texture image includes the mean and standard deviation of the structure feature map, and
wherein the operation of performing the normalization comprises:
an operation of normalizing the structure feature map of the target image; and
an operation of multiplying the normalized structure feature map of the target image by the standard deviation of the structure feature map of the texture image and adding the mean of the structure feature map of the texture image.

17. (Deleted)

18. The apparatus of claim 12,
wherein the operation of generating the composite image comprises an operation of quantifying, as a latent variable, features of an image that are based on the structure information from the texture image.

19. The apparatus of claim 12,
wherein the operation of performing the optimization comprises:
an operation of calculating a structure-based loss function so that the mean and standard deviation of the composite image and the target image are minimized; and
an operation of calculating a texture-based loss function so that the mean and standard deviation of the composite image and the texture image are minimized.

20. The apparatus of claim 19,
wherein the operation of performing the optimization further comprises an operation of calculating an adversarial loss function so that the error between the target image and the composite image is minimized.