KR102591395B1

KR102591395B1 - System and method for supporting diagnosis of velo cardio facial syndrome(vcfs)

Info

Publication number: KR102591395B1
Application number: KR1020210094253A
Authority: KR
Inventors: 조안나; 백롱민; 명유진; 김헌민; 정윤기; 황희; 차지은; 강인철; 전용훈
Original assignee: 서울대학교병원; 이지케어텍(주)
Priority date: 2021-07-19
Filing date: 2021-07-19
Publication date: 2023-10-20
Also published as: KR20230013468A

Abstract

A deep learning-based velocardiofacial syndrome (VCFS) diagnosis support system and method are provided. The system includes an input unit that receives a facial image of a subject, and a pre-trained, deep learning-based prediction model that predicts the likelihood of VCFS in the subject from the input facial image. The prediction model additionally provides a visualization of the information associated with the subject's likelihood of VCFS.

Description

Deep learning-based VCFS diagnosis support system and method {SYSTEM AND METHOD FOR SUPPORTING DIAGNOSIS OF VELO CARDIO FACIAL SYNDROME(VCFS)}

The present invention relates to a system and method for supporting the diagnosis of velocardiofacial syndrome (VCFS). More specifically, it relates to a system and method that can support a clinician's diagnosis of VCFS by using a deep learning-based prediction model to predict, from a patient's facial image, whether the face exhibits the VCFS facial phenotype, and by providing a class activation map of the associated facial regions.

Velocardiofacial syndrome (VCFS) is a complex disorder caused by a microdeletion in region 11.2 of the long arm of chromosome 22 (22q11.2). VCFS gives rise to complications such as cardiac malformations, hypocalcemia, thyroid hormone abnormalities, and immunodeficiency, so accurate early diagnosis is required to treat and prevent these complications.

However, VCFS is very rare. A definitive diagnosis is made by genetic testing, and only after a specialist has recognized the various characteristic features expressed in the patient's face. Clinicians who do not routinely treat VCFS therefore have difficulty suspecting and diagnosing it early.

Machine learning-based artificial intelligence has advanced rapidly in recent years, making it possible to use patient data to predict or manage the onset of specific diseases. In particular, machine learning models can incorporate many more features than conventional regression models and can therefore use nonlinear algorithms. As a result, many studies that adopt machine learning methods such as neural network models have been reported across disease prediction fields.

Accordingly, the inventors of the present application propose a deep learning-based technique that can distinguish the facial features of VCFS, in order to aid the early diagnosis of VCFS.

KR 10-2241483

An object of the present invention is therefore to provide a system and method that can support a clinician's diagnosis of VCFS by using deep learning to predict, from a patient's facial image, whether the face exhibits the VCFS facial phenotype, and by providing a class activation map of the associated facial regions.

A deep learning-based velocardiofacial syndrome (VCFS) diagnosis support system according to an embodiment includes: an input unit that receives a facial image of a subject; and a pre-trained, deep learning-based prediction model that predicts the likelihood of VCFS in the subject from the input facial image, wherein the prediction model further provides a visualization of the information associated with the subject's likelihood of VCFS.

In an embodiment, the prediction model may include: a preprocessing unit that detects and aligns the face region in the input facial image to generate a standardized facial image; a convolutional layer that extracts a feature map from the standardized facial image; a fully connected layer configured to output, based on the extracted feature map, a probability value that the face shows the VCFS facial phenotype; and a data visualization unit that generates the visualization.

In an embodiment, the visualization is a class activation map in which a heatmap is displayed together with the input facial image, the heatmap showing which features, at which positions of a particular layer, the fully connected layer used to obtain its result. The class activation map may be configured to mark, on the subject's facial image, the facial regions that contributed more than other facial regions to the determination of the likelihood of VCFS.

In an embodiment, the heatmap may be generated by gradient-weighted class activation mapping (Grad-CAM).

In an embodiment, the convolutional layer may be implemented as a ResNet (residual network), and the preprocessing unit may be implemented as an MTCNN (multi-task cascaded convolutional network).

In an embodiment, the facial image may be a frontal facial image.

A deep learning-based velocardiofacial syndrome (VCFS) diagnosis support method according to another embodiment includes: receiving a facial image of a subject; predicting the likelihood of VCFS in the subject from the input facial image by means of a pre-trained, deep learning-based prediction model; and providing, through the prediction model, a visualization of the information associated with the likelihood of VCFS.

In an embodiment, the step of predicting the likelihood of VCFS in the subject from the input facial image by means of the pre-trained, deep learning-based prediction model may include: detecting and aligning the face region in the input facial image to generate a standardized facial image; extracting a feature map from the standardized facial image; and outputting, based on the extracted feature map, a probability value that the face shows the VCFS facial phenotype.

In an embodiment, the visualization is a class activation map in which a heatmap is displayed together with the input facial image, the heatmap showing which features, at which positions of a particular layer, the fully connected layer used to obtain its result. The class activation map may be configured to mark, on the subject's facial image, the facial regions that contributed more than other facial regions to the determination of the likelihood of VCFS.

In an embodiment, the heatmap may be generated by gradient-weighted class activation mapping (Grad-CAM).

In an embodiment, the step of detecting and aligning the face region in the input facial image to generate a standardized facial image may be performed by a preprocessing unit implemented as an MTCNN (multi-task cascaded convolutional network), and the step of extracting a feature map from the standardized facial image may be performed by a convolutional layer implemented as a ResNet (residual network).

A computer program according to yet another embodiment is stored on a computer-readable recording medium and implements the deep learning-based velocardiofacial syndrome (VCFS) diagnosis support method.

The deep learning-based velocardiofacial syndrome (VCFS) diagnosis support system and method according to the embodiments can use a deep learning-based prediction model to predict, from a patient's facial image, whether the face exhibits the VCFS facial phenotype. They also provide a class activation map of facial regions that visually represents the relationship between the prediction result and the features that influence it, giving clinicians insight into how those features affect the prediction.

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings are intended only to illustrate the embodiments of this specification, not to limit them. For clarity of explanation, some elements may be shown in the drawings with modifications such as exaggeration or omission.
Figure 1 is a block diagram showing the configuration of a deep learning-based VCFS diagnosis support system according to an embodiment.
Figure 2 shows the network structure of the prediction model.
Figure 3 exemplarily shows the structure of the fully connected layer.
Figure 4 exemplarily shows prediction results according to a deep learning-based VCFS diagnosis support system according to an embodiment.
Figure 5 is a flowchart illustrating each step of a deep learning-based method to support diagnosis of VCFS according to an embodiment.
Figure 6 is a flow chart showing the detailed steps of predicting the possibility of developing VCFS of a subject according to the input face image through a prediction model.
Figure 7 exemplarily shows a calculation formula for calculating classification performance.
Figure 8 exemplarily shows the face angles (front, upward front, side 1, side 2) of the test image.

Embodiments are described in detail below with reference to the accompanying drawings and their contents, but the claimed scope is not restricted or limited by the embodiments.

The terms used in this specification were chosen, with their function in mind, from general terms that are currently in wide use, but they may vary with the intention or practice of those skilled in the art or with the emergence of new technology. In certain cases a term has been chosen arbitrarily by the applicant, and its meaning is then given in the corresponding part of the description. The terms used in this specification should therefore be interpreted on the basis of their actual meaning and the overall content of this specification, not merely their names.

The embodiments described in this specification may be implemented entirely in hardware, partly in hardware and partly in software, or entirely in software. In this specification, terms such as "unit", "module", "device", and "system" refer to a computer-related entity such as hardware, a combination of hardware and software, or software. For example, a unit, module, device, or system may refer to hardware that constitutes part or all of a platform and/or to software, such as an application, for running that hardware.

Preferred embodiments of the present invention are described in detail below with reference to the drawings.

Figure 1 is a block diagram showing the configuration of a deep learning-based VCFS diagnosis support system according to an embodiment. Figure 2 shows the network structure of the prediction model. Figure 3 exemplarily shows the structure of the fully connected layer. Figure 4 exemplarily shows prediction results produced by the deep learning-based VCFS diagnosis support system according to an embodiment.

Velocardiofacial syndrome (VCFS) can present with a wide range of physical, metabolic, endocrine, and behavioral features. Its main characteristics include heart disease, a characteristic facial appearance, immunodeficiency, and cognitive impairment. Characteristic facial phenotypes of VCFS include cleft palate, an elongated pear-shaped nose, small ears, and narrow eyes. The VCFS diagnosis support system 10 according to the embodiment receives a facial image of a subject and can predict the likelihood of VCFS in the subject from that image.

Referring to Figures 1 to 4, the VCFS diagnosis support system 10 according to the embodiment includes a data input unit 100 and a prediction model 110.

The data input unit 100 receives a facial image of the subject, that is, an image that contains the subject's face. The facial image may be a frontal facial image captured by a camera facing the subject. A frontal facial image is the one in which the subject's facial features appear most clearly, and it may therefore be the most suitable input for the VCFS diagnosis support system of this embodiment.

The prediction model 110 is a pre-trained, deep learning-based prediction model that predicts the likelihood of VCFS in the subject from the input facial image. The prediction model 110 may calculate a probability value that the face region recognized in the input facial image shows the VCFS facial phenotype. The prediction model 110 may further provide a visualization of the information associated with the subject's likelihood of VCFS.

The prediction model 110 is a machine learning model with a deep neural network structure and may have the structure shown in Figure 2. The prediction model 110 is configured to preprocess the face region of the input facial image into an input that the deep learning model can process. It consists of a face recognition model that was trained on a large public face dataset and then fine-tuned into a VCFS-specific face recognition model, so that it can infer a probability value that the input face region shows the VCFS facial phenotype.

Specifically, the prediction model 110 includes a preprocessing unit 111, a convolutional layer 112, a fully connected layer 113, and a data visualization unit 114.

The preprocessing unit 111 may detect the face region in the input facial image. The preprocessing unit 111 may be implemented using MTCNN (multi-task cascaded convolutional networks). The preprocessing unit 111 may generate a standardized facial image either by aligning the face region so that the detected face lies in a fixed area of the image, or by extracting from the facial image a face region scaled to a uniform size.
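The standardization step described above can be illustrated with a minimal sketch. It assumes the face bounding box has already been produced by a detector such as MTCNN and only shows the cropping and resizing to a uniform size. The function names, the `margin` parameter, and the nearest-neighbor resampling are illustrative assumptions, not details taken from the patent, and landmark-based rotation alignment is omitted.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def square_crop_box(box: Tuple[int, int, int, int], img_w: int, img_h: int,
                    margin: float = 0.2) -> Tuple[int, int, int]:
    """Expand a detected face box (x, y, w, h) to a square kept inside the image."""
    x, y, w, h = box
    side = int(round(max(w, h) * (1.0 + margin)))
    side = min(side, img_w, img_h)
    cx, cy = x + w / 2.0, y + h / 2.0
    left = int(round(cx - side / 2.0))
    top = int(round(cy - side / 2.0))
    # Shift the square back inside the image if it sticks out.
    left = max(0, min(left, img_w - side))
    top = max(0, min(top, img_h - side))
    return left, top, side


def standardize_face(image: Image, box: Tuple[int, int, int, int],
                     out_size: int, margin: float = 0.2) -> Image:
    """Crop the face region and resample it to out_size x out_size (nearest neighbor)."""
    img_h, img_w = len(image), len(image[0])
    left, top, side = square_crop_box(box, img_w, img_h, margin)
    out = []
    for r in range(out_size):
        src_r = top + min(side - 1, (r * side) // out_size)
        row = []
        for c in range(out_size):
            src_c = left + min(side - 1, (c * side) // out_size)
            row.append(image[src_r][src_c])
        out.append(row)
    return out
```

Every face ends up centered in an image of the same size, which is what allows the downstream network to receive inputs of a fixed shape.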

The convolutional layer 112 may recognize the face in the standardized facial image and extract a feature map of the recognized face. The convolutional layer 112 slides a convolutional filter over the standardized facial image and extracts the features of the feature map as the inner product of the filter with the image. In one embodiment, the convolutional layer 112 may be built from ResNet (residual networks), which are known to be efficient for learning image data, and whose skip connections, which connect the input directly to the output, allow deeper networks to be trained.
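As a minimal sketch of the two ideas in the preceding paragraph, the code below slides a filter over an image, taking an inner product at each position, and then adds the input back through a skip connection. It is a deliberately simplified single-convolution block written with plain Python lists. A real ResNet block stacks several convolutions with normalization, and the function names here are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (zero padding) and take inner products.

    As in most deep learning frameworks, the kernel is not flipped.
    The output has the same size as the input so it can be added to it.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def residual_block(x: Matrix, kernel: Matrix) -> Matrix:
    """Simplified residual block: y = ReLU(conv(x) + x)."""
    fx = conv2d_same(x, kernel)
    return [[max(0.0, fx[r][c] + x[r][c]) for c in range(len(x[0]))]
            for r in range(len(x))]
```

With an all-zero kernel the block reduces to `ReLU(x)`, which illustrates why skip connections make deep stacks easier to train: each block only has to learn a correction to the identity mapping.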

The fully connected layer 113 may be configured to output a probability value that the feature map extracted by the convolutional layer 112 corresponds to the VCFS facial phenotype. The fully connected layer 113 may be configured as a binary classifier that separates the VCFS facial phenotype from normal. The fully connected layer 113 includes at least one fully connected layer (FC), which reduces the dimensionality of the feature map extracted by the convolutional layer 112, and a classifier, which classifies the reduced feature map. As shown in Figure 3, the fully connected layer 113 may be configured to include a first fully connected layer (FC1), a second fully connected layer (FC2), and a classifier. In the first and second fully connected layers (FC1, FC2), the feature maps pass through an activation function and are reduced in size by a pooling step.

The classifier's softmax function may be applied to the reduced feature map to output result values (class scores) between 0 and 1 that sum to 1. The fully connected layer 113 may calculate both the probability of the VCFS facial phenotype and the probability of the normal phenotype, and output the larger of the two as the result. In another embodiment, the classifier's sigmoid function may be applied to the reduced feature map to output a single result value (class score) between 0 and 1. In that case, the fully connected layer 113 classifies the input as the VCFS facial phenotype if the result value is greater than or equal to a predetermined threshold, and as normal if it is below the threshold.

The parameters of the fully connected layer 113 may have been optimized by back-propagation so that it estimates the class of the image to be predicted. For example, a cross-entropy loss function and the Adam optimizer may be used to train the model.
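The two classifier variants and the training loss mentioned above can be sketched as follows. This is a minimal illustration using the standard definitions of softmax, sigmoid, and cross-entropy. The label order, the default threshold of 0.5, and the function names are assumptions made for the example, since the source only says the threshold is predetermined.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: outputs lie in (0, 1) and sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_softmax(logits: Sequence[float],
                     labels: Tuple[str, str] = ("normal", "VCFS")) -> Tuple[str, float]:
    """Two-output variant: report the class with the larger probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return labels[idx], probs[idx]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def classify_sigmoid(logit: float, threshold: float = 0.5) -> Tuple[str, float]:
    """Single-output variant: VCFS if the score reaches the threshold, else normal."""
    p = sigmoid(logit)
    return ("VCFS" if p >= threshold else "normal"), p


def cross_entropy(probs: Sequence[float], target_index: int) -> float:
    """Cross-entropy loss for one sample given predicted class probabilities."""
    return -math.log(probs[target_index])
```

In the single-output variant, raising `threshold` trades sensitivity for specificity, which is a design choice that would normally be tuned on validation data.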

Advances in artificial intelligence have improved the performance of disease prediction systems, but the black-box nature of neural network models limits how directly they can help clinicians make decisions in practice. Because a machine learning model processes a large amount of input data at once through many hidden layers and weight parameters, it is difficult for the user to interpret by what criteria and in what way a result was derived. Users therefore cannot fully trust the model's output, and a bare list of numbers does not make it clear what action the medical staff should take.

To apply prediction results in actual clinical practice, clinicians therefore need material, and in particular visual material, showing how the prediction model arrives at its risk assessment. The deep learning-based velocardiofacial syndrome (VCFS) diagnosis support system 10 according to an embodiment of the present invention can accordingly provide a visualization that makes the prediction model's results interpretable and offers an intuitive analysis in support of clinical decision-making.

The data visualization unit 114 may provide a visualization of the information associated with the subject's likelihood of VCFS. It may visualize which features, at which positions of a particular layer, the convolutional layer 112 and the fully connected layer 113 used to obtain the result. The data visualization unit 114 may trace the change of the result value (class score) with respect to the final feature map of the convolutional layer 112 to generate a heatmap of the same size as the input facial image, and may provide, as the visualization, a class activation map (CAM) in which the heatmap is displayed together with the original facial image.

In one embodiment, the data visualization unit 114 may generate the heatmap by Grad-CAM (gradient-weighted class activation mapping). Grad-CAM is a visualization method for checking which features, at which positions of a particular layer, a neural network used to obtain its result. It uses gradients to indicate the spatial regions that are activated in the convolutional layers. In general, the convolutional layer 112 retains spatial information that is lost in the fully connected layer 113, and the later stages of the convolutional layer 112 hold increasingly abstract information. Grad-CAM builds the heatmap from the information in the final layer of the convolutional layers 112. Specifically, Grad-CAM may be implemented as in Equation 1 below.

[Equation 1]

$$\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k}, \qquad L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left( \sum_{k} \alpha_k^c A^k \right)$$

($y^c$: the score for class $c$, computed from the output of the last stage of the convolutional layer structure, which is the input to the fully connected layer; $A^k$: the $k$-th feature map of the convolutional layer; $Z$: the size of the feature map; ReLU: an activation function that sets negative values to 0)

$\alpha_k^c$ is the neuron importance weight. It is obtained by summing the gradients $\partial y^c / \partial A_{ij}^k$ of $y^c$ with respect to $A^k$ over all positions and taking the average. The final Grad-CAM value $L_{\text{Grad-CAM}}^c$ is calculated by multiplying each $A^k$ by $\alpha_k^c$, summing over $k$, and passing the result through the activation function (ReLU), which sets negative values to 0.
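The Grad-CAM computation described above can be sketched in a few lines. The example assumes the gradients of the class score with respect to each feature map are already available (in practice they come from back-propagation in a deep learning framework) and only shows the averaging, weighting, summation, and ReLU steps. The function name and the nested-list data layout are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def grad_cam(feature_maps: List[Matrix],
             gradients: List[Matrix]) -> Tuple[List[float], Matrix]:
    """Compute neuron importance weights and the Grad-CAM map.

    feature_maps[k][i][j] holds the k-th feature map at position (i, j).
    gradients[k][i][j] holds the gradient of the class score with respect
    to that same feature-map entry.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    z = height * width

    # Importance weight per feature map: spatial average of its gradients.
    alphas = [sum(sum(row) for row in grad) / z for grad in gradients]

    # Weighted sum of the feature maps.
    cam = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, feature_maps):
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]

    # ReLU keeps only the positions that support the class.
    cam = [[max(0.0, v) for v in row] for row in cam]
    return alphas, cam
```

The resulting map has the spatial size of the last convolutional feature maps, so it would still need to be upsampled to the size of the input facial image before being displayed.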

The regions highlighted in the heatmap generated by the above process correspond to the spatial regions activated in the convolutional layers. For example, the heatmap may show activated regions in red and non-activated regions in blue, with intermediate regions shown as a color gradient between red and blue. The class activation map (CAM), which displays the heatmap together with the original facial image, makes it possible to see which regions of the original image the neural network treats as important. That is, when the prediction model 110 determines that the input facial image shows the VCFS facial phenotype, the class activation map shows which facial regions contributed more than others to that determination.
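The color coding described above can be sketched as follows. The example assumes a simple linear blend from blue to red and a fixed overlay opacity. The exact colormap and blending used in the patent are not specified, so `blue_red`, `overlay`, and the `alpha` parameter are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]
RGB = Tuple[int, int, int]


def normalize(cam: Matrix) -> Matrix:
    """Rescale heatmap values linearly to the range [0, 1]."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


def blue_red(value: float) -> RGB:
    """Map 0 to blue, 1 to red, and values in between to a linear blend."""
    v = min(1.0, max(0.0, value))
    return int(round(255 * v)), 0, int(round(255 * (1.0 - v)))


def overlay(pixel: RGB, heat: RGB, alpha: float = 0.4) -> RGB:
    """Blend a heatmap color over an image pixel with opacity alpha."""
    return tuple(int(round((1.0 - alpha) * p + alpha * h))
                 for p, h in zip(pixel, heat))
```

Applying `overlay(pixel, blue_red(v))` at every position of the upsampled, normalized heatmap yields an image in which strongly contributing facial regions appear red on top of the original face.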

Figure 4 shows the input facial images of subjects, the predicted probability of the VCFS facial phenotype for each image, and the class activation maps generated as visualizations. In a class activation map, the regions shown in red are those that contributed more than other regions to the calculated probability of the VCFS facial phenotype. The facial regions shown in red may be the areas where the characteristic facial features of VCFS appear. The class activation maps generated by the deep learning-based velocardiofacial syndrome (VCFS) diagnosis support system of this embodiment can serve as auxiliary material that supports the medical staff's diagnosis of VCFS, and can aid early diagnosis so that treatment is given at an appropriate time.

Figure 5 is a flowchart showing the steps of a deep learning-based velocardiofacial syndrome (VCFS) diagnosis support method according to an embodiment. Figure 6 is a flowchart showing the detailed sub-steps of the step of predicting, through the prediction model, the likelihood of VCFS in the subject from the input facial image. Some or all of the steps may be implemented as a computer program and executed by a computer processor. The method may be performed by the VCFS diagnosis support system 10, and Figures 1 to 4 and their descriptions may be referred to for the description of this embodiment.

Referring to FIG. 5, the deep learning-based VCFS diagnosis support method according to an embodiment includes: receiving a face image of a subject (S100); predicting the subject's likelihood of VCFS from the input face image through a pre-trained deep learning-based prediction model (S110); and providing, through the prediction model, visualized information associated with the likelihood of VCFS (S120).

First, a face image of the subject is received (S100).

The subject's face image is an image that contains the subject's face. It may be a frontal face image captured by a camera facing the subject. A frontal face image best reveals the subject's facial features and may therefore be better suited to the VCFS diagnosis support system according to this embodiment.

Next, the subject's likelihood of VCFS is predicted from the input face image through the pre-trained deep learning-based prediction model (S110).

In this step (S110), the prediction model 110, a pre-trained deep learning-based model, predicts the subject's likelihood of VCFS from the input face image. The prediction model 110 is configured to preprocess the face region of the input face image into an input that the deep learning model can process. It is obtained by fine-tuning a face recognition model, trained on a large-scale public face data set, into a VCFS-specific face recognition model, and it infers the probability that the input face region exhibits a VCFS facial phenotype.

Specifically, referring to FIG. 6, this step (S110) includes: detecting and aligning the face region in the input face image to generate a standardized face image (S112); extracting a feature map from the standardized face image (S114); and outputting, based on the extracted feature map, the probability of a VCFS facial phenotype (S116).

Here, the step of generating the standardized face image (S112) is performed by a preprocessing unit composed of MTCNN (multi-task cascaded convolutional networks), and the step of extracting the feature map (S114) may be performed by convolutional layers composed of ResNet (residual networks).
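The flow of steps S112 to S116 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the global average pooling between the feature maps and the fully connected layer, and the two-class softmax are assumptions, and the MTCNN and ResNet stages are passed in as hypothetical callables rather than implemented.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]  # one H x W channel


def global_average_pool(feature_maps: Sequence[Matrix]) -> List[float]:
    """Reduce each H x W feature map to its mean (an assumed pooling choice)."""
    pooled = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fully_connected(x: Sequence[float],
                    weights: Sequence[Sequence[float]],
                    biases: Sequence[float]) -> List[float]:
    """One dense layer producing one logit per class (0 = normal, 1 = VCFS)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_vcfs_probability(image: Matrix,
                             detect_and_align: Callable[[Matrix], Matrix],
                             extract_feature_maps: Callable[[Matrix], Sequence[Matrix]],
                             weights: Sequence[Sequence[float]],
                             biases: Sequence[float]) -> float:
    aligned = detect_and_align(image)              # S112 (MTCNN in the text)
    feature_maps = extract_feature_maps(aligned)   # S114 (ResNet in the text)
    logits = fully_connected(global_average_pool(feature_maps), weights, biases)
    return softmax(logits)[1]                      # S116: probability of class 1
```

Passing the two stages as callables keeps the sketch independent of any particular MTCNN or ResNet library.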

Next, visualized information associated with the likelihood of VCFS is provided through the prediction model (S120).

The visualized information is a class activation map in which a heatmap, visualizing which features at which positions of a specific layer the fully connected layer used to obtain its result, is displayed together with the input face image. The class activation map may be configured to mark, on the subject's face image, the facial regions that contributed more than other facial regions to the determination of the likelihood of VCFS.

The heatmap is generated through Grad-CAM (gradient-weighted class activation mapping), which is implemented as shown in Equation 1 below.

[Equation 1]

$$\alpha_k^c = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^c}{\partial A_{ij}^k}, \qquad L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left(\sum_{k}\alpha_k^c A^k\right)$$

($y^c$: the output of the last stage of the convolutional layer structure and the input value of the fully connected layer; $A^k$: the k-th feature map of the convolutional layer; $Z$: the size of the feature map; ReLU: an activation function that sets negative values to 0)
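As a minimal sketch of the standard Grad-CAM computation that Equation 1 appears to follow: each feature map is weighted by the mean of the class-score gradient over its Z positions, the weighted maps are summed, and ReLU is applied. The function name `grad_cam` is hypothetical, and the gradients are assumed to be supplied by an automatic-differentiation framework, which is not shown here.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one H x W channel


def grad_cam(feature_maps: Sequence[Matrix], gradients: Sequence[Matrix]) -> Matrix:
    """Combine K feature maps into one heatmap.

    feature_maps[k][i][j]: activation of the k-th feature map at (i, j).
    gradients[k][i][j]:    d(class score) / d(feature_maps[k][i][j]).
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    z = height * width  # Z: number of positions in one feature map

    # Channel weight: spatial mean of the gradient for that channel.
    alphas = [sum(sum(row) for row in grad) / z for grad in gradients]

    # Weighted sum of the feature maps.
    heatmap = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(alphas, feature_maps):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * fmap[i][j]

    # ReLU keeps only positions that push the class score up.
    return [[max(0.0, v) for v in row] for row in heatmap]
```

The ReLU step is what restricts the heatmap to regions that support the predicted class rather than regions that argue against it.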

The class activation map (CAM), in which the generated heatmap is displayed together with the original face image, makes it possible to identify the regions of the original face image that the neural network treats as important. That is, when the prediction model 110 determines that the input face image exhibits a VCFS facial phenotype, the class activation map shows which facial regions contributed more than others to that determination.
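One simple way to display a heatmap together with the original image is to normalize the heatmap to [0, 1] and blend it into the red channel of a grayscale image, so that high-contribution regions appear red. The sketch below is illustrative only: the blending rule, the `alpha` parameter, and the function names are assumptions, since the text does not specify how the overlay is rendered.

```python
from typing import List, Tuple

Matrix = List[List[float]]
RGB = Tuple[float, float, float]


def normalize(heatmap: Matrix) -> Matrix:
    """Rescale heatmap values linearly to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in heatmap for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in heatmap]
    return [[(v - lo) / (hi - lo) for v in row] for row in heatmap]


def overlay_red(image: Matrix, heatmap: Matrix, alpha: float = 0.4) -> List[List[RGB]]:
    """Blend a heatmap into the red channel of a grayscale image in [0, 1].

    image and heatmap must have the same height and width.
    """
    norm = normalize(heatmap)
    blended = []
    for img_row, heat_row in zip(image, norm):
        out_row = []
        for pixel, heat in zip(img_row, heat_row):
            base = (1.0 - alpha) * pixel
            out_row.append((base + alpha * heat, base, base))
        blended.append(out_row)
    return blended
```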

The deep learning-based VCFS diagnosis support method according to these embodiments may be implemented as an application, or in the form of program instructions executable through various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Experimental Example

The prediction model of the deep learning-based VCFS diagnosis support system and method according to the above embodiment was built using several training data sets, and experiments were performed to validate the resulting prediction models.

The prediction model is obtained by fine-tuning the fully connected layer of a face recognition model, trained on a large-scale public face data set, to distinguish VCFS facial phenotypes from normal faces. The prediction model is built in two training stages: training the face recognition model, and then fine-tuning it.
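Fine-tuning only the fully connected layer can be illustrated by training a small classification head on fixed embeddings from a frozen backbone. The sketch below is a toy under stated assumptions: a single-output logistic head, binary cross-entropy loss, plain batch gradient descent, and the learning rate and epoch count are all illustrative choices, and the embeddings in any usage are made-up numbers rather than outputs of a real face recognition model.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Probability of the positive (VCFS) class for one embedding."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fine_tune_head(embeddings: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   lr: float = 0.5,
                   epochs: int = 200) -> Tuple[List[float], float]:
    """Train only the head; the backbone that produced the embeddings stays frozen."""
    dim = len(embeddings[0])
    n = len(embeddings)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(embeddings, labels):
            err = predict(x, weights, bias) - y  # gradient of BCE w.r.t. the logit
            for d in range(dim):
                grad_w[d] += err * x[d]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

Because the backbone is frozen, the embeddings can be computed once and reused across epochs, which suits small clinical data sets.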

The face recognition model was built using public face image data sets and established algorithms. Residual networks (ResNet), known to be effective for learning from image data, were used, specifically ResNet100 or ResNet101, and the input data was preprocessed with MTCNN. The training and validation data sets were prepared as shown in Table 1 below.

[Table 1]

| Category | Data set name |
|---|---|
| Training data set | CASIA |
| Training data set | MS-Celeb-1M |
| Training data set | K-Face |
| Training data set | Asian-Celeb, CASIA(Asian), K-Face |
| Validation data set | LFW (Labelled Faces in the Wild) |
| Validation data set | CFP-FP |
| Validation data set | AgeDB-30 |

Face recognition models were built using various public data sets, such as Microsoft Celeb (MS-Celeb-1M) and the Institute of Automation, Chinese Academy of Sciences (CASIA) data set. Specifically, face recognition model 1 was built with ResNet100 on the CASIA data set, model 2 with ResNet100 on the MS-Celeb-1M data set, model 3 with ResNet100 on the K-Face data set, and model 4 with ResNet101 on the Asian-Celeb, CASIA(Asian), and K-Face data sets. The performance of each model was verified on the validation data sets LFW (Labelled Faces in the Wild), CFP-FP, and AgeDB-30, and the results are shown in Table 2 below.

[Table 2] Face verification performance (%) on the validation data

| Model | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Structure | ResNet100 | ResNet100 | ResNet100 | ResNet101 |
| Training data | CASIA | MS-Celeb-1M | K-Face | Asian-Celeb, CASIA(Asian), K-Face |
| LFW | 99.52 | 99.78 | 70.78 | 98.23 |
| AgeDB-30 | 94.57 | 97.93 | 51.10 | 88.22 |
| CFP-FP | 95.89 | 94.03 | 64.20 | 89.13 |

Referring to Table 2, CASIA contains about 0.5 million images, MS-Celeb-1M 3.9 million, K-Face 1.1 million, and Asian-Celeb + CASIA(Asian) + K-Face 2.8 million. Given these sizes, ResNet100, the improved version, achieves better identification performance with less training data than ResNet101, the initial version of ResNet. Model 2, which has the most training data, shows the best face identification performance overall, with the highest scores on LFW and AgeDB-30.

The fully connected layer was fine-tuned to distinguish VCFS facial phenotypes from normal faces. The Train and Test data shown in Table 3 were used to fine-tune and test the fully connected layer.

[Table 3]

| Category | Total IDs (persons) | Total images | Train IDs (persons) | Train images | Test IDs (persons) | Test images | MTCNN detection failures (images) | MTCNN detection failures (%) |
|---|---|---|---|---|---|---|---|---|
| VCFS | 98 | 1105 | 88 | 868 | 10 | 52 | 185 | 16.74 |
| Normal | 91 | 521 | 81 | 442 | 10 | 21 | 58 | 11.13 |

In Table 3, "VCFS" refers to VCFS patients in whom the 22q11.2 deletion was confirmed by genetic testing, and "Normal" refers to patients with conditions that involve no facial phenotype abnormality, for example patients with benign skin tumors.
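The image counts in Table 3 appear to be internally consistent: for each class, the Train, Test, and MTCNN-failure image counts add up to the total, and the failure percentage is the failure count divided by the total. The short check below reproduces that arithmetic; the dictionary layout and the function name are illustrative, and the reading of the columns is an interpretation of the table.

```python
# Image counts per class, read from Table 3.
TABLE_3 = {
    "VCFS": {"total": 1105, "train": 868, "test": 52, "undetected": 185},
    "Normal": {"total": 521, "train": 442, "test": 21, "undetected": 58},
}


def detection_failure_rate(row: dict) -> float:
    """Percentage of images in which MTCNN did not detect a face."""
    return round(100.0 * row["undetected"] / row["total"], 2)


def counts_are_consistent(row: dict) -> bool:
    """True when train + test + undetected images equal the total."""
    return row["train"] + row["test"] + row["undetected"] == row["total"]


for name, row in TABLE_3.items():
    print(name, counts_are_consistent(row), detection_failure_rate(row))
```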

The fully connected layers of face recognition models 1 to 4 were fine-tuned on the Train data of Table 3 to build the final prediction models, and the classification performance of each was evaluated on the Test data of Table 3. Classification performance is judged by whether the predicted class matches the actual class. FIG. 7 shows example formulas for calculating classification performance. As shown in FIG. 7, the true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP) can be determined for each model, and the accuracy can be calculated from them. The results are shown in Table 4 below.

[Table 4] Classification performance (%) on the Test data (all angles)

| Prediction model | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Structure | ResNet100 | ResNet100 | ResNet100 | ResNet101 |
| Training data | CASIA | MS-Celeb-1M | K-Face | Asian-Celeb, CASIA(Asian), K-Face |
| F1-score | 88.02 | 81.43 | 67.50 | 90.34 |
| Accuracy | 87.67 | 80.82 | 71.23 | 90.41 |

Referring to Table 4, prediction models 1, 2, and 4 all show high F1-scores and accuracy, confirming that they provide adequate classification capability.
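Accuracy and F1-score are conventionally derived from the TP, TN, FP, and FN counts. The sketch below assumes the standard definitions, since the exact formulas of FIG. 7 are not given in the text; the function name is hypothetical, and any counts passed to it in an example are illustrative rather than the experiment's confusion matrix.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, hypothetical counts of `tp=8, tn=9, fp=1, fn=2` give an accuracy of 17/20 and an F1-score of 16/19.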

Here, the Train data, or the training data used to build the face recognition model, may include images taken at various face angles. Because classification accuracy may vary with face angle, additional verification was performed.

FIG. 8 shows examples of the face angles of the Test images: front, upward front, side 1 (45 degrees), and side 2 (90 degrees).

The classification accuracy for Test images at the different face angles shown in FIG. 8 was evaluated, and the results are shown in Table 5 below.

[Table 5] Classification performance (%) on the Test data by face angle: F1-score (VCFS/normal)

| Prediction model | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Structure | ResNet100 | ResNet100 | ResNet100 | ResNet101 |
| Training data | CASIA | MS-Celeb-1M | K-Face | Asian-Celeb, CASIA(Asian), K-Face |
| Front | 94.99 | 94.99 | 60.11 | 84.65 |
| Upward front | 89.57 | 100.00 | 68.06 | 100.00 |
| Side 1 | 96.01 | 88.70 | 84.19 | 100.00 |
| Side 2 | 71.98 | 52.38 | 56.27 | 80.00 |

It can be seen that classification accuracy is higher for frontal images, in which facial features are clearly visible, than for side images.

As described above, for each of the constructed prediction models 1 to 4, the change (gradient) of the output value (class score) with respect to the final feature map of the convolutional layers is tracked to generate a heatmap of the same size as the input face image. A class activation map (CAM), in which the generated heatmap is displayed together with the original face image, can then be generated as the visualized information.
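Because the final feature map of a convolutional network is usually much smaller than the input image, producing a heatmap of the same size as the input requires resizing. The text does not say which interpolation is used; the sketch below assumes nearest-neighbor upsampling for simplicity (bilinear interpolation is a common alternative), and the function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def upsample_nearest(heatmap: Matrix, out_height: int, out_width: int) -> Matrix:
    """Resize a heatmap to (out_height, out_width) by nearest-neighbor lookup."""
    in_height, in_width = len(heatmap), len(heatmap[0])
    return [
        [
            # Map each output pixel back to the source cell that covers it.
            heatmap[i * in_height // out_height][j * in_width // out_width]
            for j in range(out_width)
        ]
        for i in range(out_height)
    ]
```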

According to the embodiments of the VCFS diagnosis support system and method described above, a deep learning-based prediction model can predict from a patient's face image whether the face exhibits a VCFS facial phenotype. In addition, a class activation map of the facial regions, which visually represents the relationship between the features that influence the prediction and the result, gives clinicians insight into how those features affect the prediction.

Although the present invention has been described with reference to embodiments, those skilled in the art will understand that it may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

10: VCFS diagnosis support system
100: Data input unit
110: Prediction model

Claims

A deep learning-based velocardiofacial syndrome (VCFS) diagnosis support system comprising:
an input unit that receives a face image of a subject; and
a prediction model, which is a pre-trained deep learning-based prediction model that predicts the subject's likelihood of VCFS based on the input face image,
wherein the prediction model further provides visualized information associated with the subject's likelihood of VCFS,
wherein the prediction model includes:
a preprocessing unit that detects and aligns the face region in the input face image to generate a standardized face image;
convolutional layers that extract a feature map from the standardized face image;
a fully connected layer configured to output, based on the extracted feature map, a probability of a VCFS facial phenotype; and
a data visualization unit that generates the visualized information,
wherein the visualized information is a class activation map in which a heatmap, visualizing which features at which positions of a specific layer the fully connected layer used to obtain its result, is displayed together with the input face image,
wherein the class activation map is configured to mark, on the subject's face image, a facial region that contributed more than other facial regions to the determination of the likelihood of VCFS,
wherein the heatmap is generated through Grad-CAM (gradient-weighted class activation mapping), and
wherein the Grad-CAM is implemented as shown in Equation 1 below.
[Equation 1]
$$\alpha_k^c = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^c}{\partial A_{ij}^k}, \qquad L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left(\sum_{k}\alpha_k^c A^k\right)$$
($y^c$: the output of the last stage of the convolutional layer structure and the input value of the fully connected layer; $A^k$: the k-th feature map of the convolutional layer; $Z$: the size of the feature map; ReLU: an activation function that sets negative values to 0)

(Deleted)

The deep learning-based VCFS diagnosis support system according to claim 1,
wherein the convolutional layers are composed of ResNet (residual networks) and the preprocessing unit is composed of MTCNN (multi-task cascaded convolutional networks).

The deep learning-based VCFS diagnosis support system according to claim 1,
wherein the face image is a frontal face image.

A deep learning-based velocardiofacial syndrome (VCFS) diagnosis support method performed by a device having one or more processors and a memory storing one or more programs executed by the processors, the method comprising:
receiving a face image of a subject;
predicting the subject's likelihood of VCFS from the input face image through a pre-trained deep learning-based prediction model; and
providing, through the prediction model, visualized information associated with the likelihood of VCFS,
wherein the predicting step includes:
detecting and aligning, by a preprocessing unit of the prediction model, the face region in the input face image to generate a standardized face image;
extracting, by convolutional layers of the prediction model, a feature map from the standardized face image; and
outputting, by a fully connected layer of the prediction model, a probability of a VCFS facial phenotype based on the extracted feature map,
wherein the visualized information is a class activation map in which a heatmap, visualizing which features at which positions of a specific layer the fully connected layer used to obtain its result, is displayed together with the input face image,
wherein the class activation map is configured to mark, on the subject's face image, a facial region that contributed more than other facial regions to the determination of the likelihood of VCFS,
wherein the heatmap is generated through Grad-CAM (gradient-weighted class activation mapping), and
wherein the Grad-CAM is implemented as shown in Equation 1 below.
[Equation 1]
$$\alpha_k^c = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^c}{\partial A_{ij}^k}, \qquad L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left(\sum_{k}\alpha_k^c A^k\right)$$
($y^c$: the output of the last stage of the convolutional layer structure and the input value of the fully connected layer; $A^k$: the k-th feature map of the convolutional layer; $Z$: the size of the feature map; ReLU: an activation function that sets negative values to 0)

(Deleted)

The deep learning-based VCFS diagnosis support method according to claim 7,
wherein the step of detecting and aligning the face region in the input face image to generate the standardized face image is performed by a preprocessing unit composed of MTCNN (multi-task cascaded convolutional networks), and
the step of extracting the feature map from the standardized face image is performed by convolutional layers composed of ResNet (residual networks).

The deep learning-based VCFS diagnosis support method according to claim 7,
wherein the face image is a frontal face image.

A computer program stored in a computer-readable recording medium for implementing the deep learning-based VCFS diagnosis support method according to any one of claims 7, 11, and 12.