KR20240108291A

KR20240108291A - Learning method and device for alzheimer prediction model based on domain adaptation

Info

Publication number: KR20240108291A
Application number: KR1020230196220A
Authority: KR
Inventors: 신현정; 박성홍; 손상준; 홍창형; 노현웅
Original assignee: 아주대학교산학협력단
Priority date: 2022-12-30
Filing date: 2023-12-29
Publication date: 2024-07-09

Abstract

According to an embodiment of the present disclosure, a domain adaptation-based Alzheimer's prediction model learning method may include: acquiring a target dataset of a target domain associated with a first brain; acquiring a source dataset of a source domain associated with a second brain; obtaining a transformed target dataset by performing domain adaptation on the target dataset based on the source dataset; and training a machine learning model using the transformed target dataset and the source dataset as training data.

Description

Domain adaptation-based Alzheimer's prediction model learning method and device {LEARNING METHOD AND DEVICE FOR ALZHEIMER PREDICTION MODEL BASED ON DOMAIN ADAPTATION}

The present disclosure relates to a method and device for learning a domain adaptation-based Alzheimer's prediction model.

Alzheimer's disease (AD) is the most common form of dementia affecting older people. As life expectancy increases, the number of AD patients continues to grow, and the global AD population is expected to roughly triple from approximately 50 million in 2015 to 131.5 million in 2050. AD is thus emerging as a serious problem, yet its cause remains unclear and there is no cure. It is therefore important to detect AD early, predict the potential risk of disease progression, and find appropriate prevention strategies. Based on clinical symptoms involving mild memory loss or other cognitive loss, mild cognitive impairment (MCI) is considered a prodromal stage of AD, which is accompanied by symptoms such as long-term memory loss, language impairment, disorientation, and personality changes. Previous studies have shown that approximately 12% of subjects with MCI progress to AD within four years of the first symptoms.

The present invention is a technology developed through the Innovative Chronic Cerebrovascular Disease Biobank support project (detailed project number: KBN4-B02-2023-01). The National Institute of Health of the Korea Disease Control and Prevention Agency has no property interest in any aspect of the present invention.

To solve the above-described problems, the present disclosure provides a learning method for a domain adaptation-based Alzheimer's prediction model configured to predict conversion from mild cognitive impairment (MCI) to Alzheimer's disease (AD).

According to an embodiment of the present disclosure, a domain adaptation-based Alzheimer's prediction model learning method includes: acquiring a target dataset of a target domain associated with a first brain; acquiring a source dataset of a source domain associated with a second brain, wherein the source dataset includes a label indicating diagnostic information about the second brain; obtaining a transformed target dataset by performing domain adaptation on the target dataset based on the source dataset; and training a machine learning model using the transformed target dataset and the source dataset as training data. The machine learning model is trained so that, when brain-related data is input, it outputs data containing information related to the possibility of conversion from mild cognitive impairment to Alzheimer's disease. The objective function of the machine learning model may be constructed based on the difference between the transformed target dataset and the source dataset.

According to one embodiment, the objective function is based on an adaptation matrix, and the adaptation matrix is configured to reduce the difference between a first mean vector obtained from the target dataset and a second mean vector obtained from the source dataset. The first mean vector may correspond to the center of the data distribution according to the label of the target dataset, and the second mean vector may correspond to the center of the data distribution according to the label of the source dataset.

According to one embodiment, the first mean vector may be obtained from a target data matrix corresponding to the target dataset, and the second mean vector may be obtained from a source data matrix corresponding to the source dataset.

According to one embodiment, the objective function is constructed based on [Equation 1] below.

[Equation 1]

The terms of Equation 1 correspond to the adaptation matrix, the first mean vector, and the second mean vector.

According to one embodiment, the objective function is based on an adaptation matrix, and the adaptation matrix may be further configured to reduce the difference between a first correlation matrix obtained from the target dataset and a second correlation matrix obtained from the source dataset.

According to one embodiment, the first correlation matrix includes information associated with correlations between regions of interest (ROIs) associated with the first brain, and the second correlation matrix may include information associated with correlations between ROIs associated with the second brain.
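The ROI correlation matrices described above can be illustrated with a minimal sketch. It assumes that each dataset is a matrix with one row per subject and one column per ROI feature, and that "correlation" means Pearson correlation; the disclosure does not fix either choice. The function name `roi_correlation_matrix` and the synthetic data are hypothetical.

```python
import numpy as np


def roi_correlation_matrix(features: np.ndarray) -> np.ndarray:
    """Pearson correlation between ROI feature columns.

    features: (n_subjects, n_rois) array, one row per brain.
    Assumption: Pearson correlation; the source does not specify the measure.
    """
    return np.corrcoef(features, rowvar=False)


rng = np.random.default_rng(0)
target_features = rng.normal(size=(40, 5))   # hypothetical target-domain data
source_features = rng.normal(size=(200, 5))  # hypothetical source-domain data

first_corr = roi_correlation_matrix(target_features)   # first correlation matrix
second_corr = roi_correlation_matrix(source_features)  # second correlation matrix

# Frobenius norm of the gap that domain adaptation would try to reduce.
gap = np.linalg.norm(first_corr - second_corr, ord="fro")
```

Each matrix is symmetric with a unit diagonal, and entry (i, j) measures how strongly ROI i and ROI j vary together across subjects in that domain.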

According to one embodiment, the objective function is constructed based on [Equation 2] below.

[Equation 2]

The terms of Equation 2 correspond to the adaptation matrix, the first correlation matrix, and the second correlation matrix.

According to another embodiment of the present disclosure, a computer program recorded on a computer-readable recording medium may be provided to execute the domain adaptation-based Alzheimer's prediction model learning method.

According to an embodiment of the present disclosure, datasets collected from different institutions can be integrated through domain adaptation without any loss to the source dataset.

Figure 1 is a schematic diagram showing a domain adaptation-based Alzheimer's prediction model learning method according to an embodiment of the present disclosure.
Figure 2 is a block diagram showing the internal configuration of an information processing system that performs a domain adaptation-based Alzheimer's prediction model learning method according to an embodiment of the present disclosure.
Figure 3 is a flowchart of a domain adaptation-based Alzheimer's prediction model learning method according to an embodiment of the present disclosure.

Hereinafter, specific details for implementing the present disclosure are described with reference to the attached drawings. In the following description, detailed descriptions of well-known functions or configurations are omitted where they could unnecessarily obscure the gist of the present disclosure.

In the accompanying drawings, identical or corresponding components are given the same reference numerals. In the description of the following embodiments, redundant descriptions of identical or corresponding components may be omitted. However, omitting the description of a component does not mean that the component is excluded from any embodiment.

Advantages and features of the disclosed embodiments, and methods of achieving them, will become clear from the embodiments described below in conjunction with the accompanying drawings. However, the present disclosure is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only to make the present disclosure complete and to fully convey the scope of the invention to those skilled in the art.

The terms used in this specification are briefly described, and the disclosed embodiments are then described in detail. The terms used in this specification were selected, as far as possible, from general terms currently in wide use while considering their functions in the present disclosure, but they may vary depending on the intention of those working in the related field, precedent, the emergence of new technology, and the like. In certain cases, terms were arbitrarily selected by the applicant, and their meaning is described in detail in the relevant part of the description. Therefore, the terms used in this disclosure should be defined based on their meaning and the overall content of this disclosure, rather than simply on their names.

In this specification, singular expressions include plural expressions unless the context clearly specifies the singular. Likewise, plural expressions include singular expressions unless the context clearly specifies the plural. Throughout the specification, when a part is said to 'include' a certain component, this means that it may further include other components rather than excluding them, unless specifically stated to the contrary.

According to one embodiment of the present disclosure, 'memory' should be interpreted broadly to include any electronic component capable of storing electronic information. 'Memory' may refer to various types of processor-readable media, such as random access memory (RAM), read-only memory (ROM), non-volatile random access memory (NVRAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable PROM (EEPROM), flash memory, magnetic or optical data storage, and registers. A memory is said to be in electronic communication with a processor if the processor can read information from and/or write information to the memory.

Figure 1 is a schematic diagram showing a domain adaptation-based Alzheimer's prediction model learning method (hereinafter, the 'method') according to an embodiment of the present disclosure. Specifically, (a) is a schematic diagram showing the entire process of the method, and (b) and (c) are schematic diagrams each showing part of the method. In the present disclosure, the method may be performed by at least one processor of an information processing system (described later with reference to Figure 2).

Referring to (a), the processor may perform the method by performing domain adaptation between any one target domain 112 and the source domain 114. Furthermore, the processor may perform the method by performing domain adaptation with each of a plurality of target domains 112 (here, Targets A to D). Specifically, the processor may perform domain adaptation by transforming the target dataset obtained from the target domain 112 to resemble the source dataset obtained from the source domain 114. That is, the processor may perform domain adaptation by transforming only the target dataset while leaving the source dataset unchanged. This configuration prevents the data loss that would occur if the source dataset, which is relatively superior in data quality and quantity, were transformed. For example, the source dataset may be a dataset obtained from the database of a large hospital, and the target dataset may be a dataset obtained from the database of a small hospital.

Referring to (b), the processor may perform domain adaptation on the target dataset by transforming the data distribution 122 of the target dataset to resemble the data distribution 126 of the source dataset. Here, 'data distribution' may mean the distribution of data in the feature space of each domain. Specifically, the processor may obtain a target data matrix representing the data distribution 122 of the target dataset and obtain a first mean vector from the target data matrix. Similarly, the processor may obtain a source data matrix representing the data distribution 126 of the source dataset and obtain a second mean vector from the source data matrix. Here, the first mean vector may correspond to the center of the data distribution 122 of the target dataset. Likewise, the second mean vector may correspond to the center of the data distribution 126 of the source dataset.

The processor may use an adaptation matrix to reduce the difference between the center of the data distribution 122 of the target dataset and the center of the data distribution 126 of the source dataset. Specifically, based on the adaptation matrix, the processor may narrow (or minimize) the distance between the distribution center of the target dataset and the distribution center of the source dataset according to Equation 1. Accordingly, the processor may obtain a transformed target dataset having a data distribution 124 whose center is similar to that of the source dataset.
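The mean-alignment step can be sketched as follows. Because the body of Equation 1 is not reproduced in this text, the objective used here is an assumption: the squared distance between the mapped target mean and the source mean, plus a small penalty (hypothetical parameter `reg`) that keeps the adaptation matrix close to the identity so that the problem has a unique solution. For simplicity the sketch uses the overall mean of each dataset rather than label-wise means. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def fit_mean_adaptation(target: np.ndarray, source: np.ndarray, reg: float = 1e-2) -> np.ndarray:
    """Return A minimising ||A m_t - m_s||^2 + reg * ||A - I||_F^2.

    Assumed objective (not taken from the source). Setting the gradient to zero
    gives A (m_t m_t^T + reg I) = m_s m_t^T + reg I.
    """
    m_t = target.mean(axis=0)  # first mean vector (target centre)
    m_s = source.mean(axis=0)  # second mean vector (source centre)
    eye = np.eye(m_t.shape[0])
    right = np.outer(m_s, m_t) + reg * eye
    left = np.outer(m_t, m_t) + reg * eye  # symmetric, positive definite
    # Solve A @ left = right for A via the transposed system.
    return np.linalg.solve(left, right.T).T


rng = np.random.default_rng(0)
source = rng.normal(loc=0.0, size=(200, 4))  # hypothetical source features
target = rng.normal(loc=1.5, size=(50, 4))   # hypothetical shifted target features

A = fit_mean_adaptation(target, source)
adapted = target @ A.T  # only the target is transformed; the source is untouched

before = np.linalg.norm(target.mean(axis=0) - source.mean(axis=0))
after = np.linalg.norm(adapted.mean(axis=0) - source.mean(axis=0))
```

Under this assumed objective the remaining gap shrinks by the factor reg / (reg + ||m_t||^2), so a smaller `reg` moves the target center closer to the source center at the cost of a larger change to the target data.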

Additionally or alternatively, referring to (c), the processor may perform domain adaptation on the target dataset by transforming the inter-data correlation 132 of the target dataset to resemble the inter-data correlation 136 of the source dataset. Here, 'inter-data correlation' may mean the positional relationship in the brain between one piece of data and another in the feature space of each domain. More specifically, when a first region of interest and a second region of interest extracted from one brain image are located adjacent to (or in contact with) each other in that image, the first data corresponding to the first region of interest and the second data corresponding to the second region of interest may exhibit a relationship in the feature space. Specifically, the processor may obtain a first correlation matrix from the target dataset. Similarly, the processor may obtain a second correlation matrix from the source dataset.

The processor may reduce the difference between the correlation of the target dataset and the correlation of the source dataset based on the adaptation matrix, the first correlation matrix, and the second correlation matrix. That is, according to Equation 2, the processor may map the correlation of the source dataset onto the correlation of the target dataset based on the adaptation matrix, the first correlation matrix, and the second correlation matrix. Accordingly, the processor may obtain a transformed target dataset having a transformed inter-data correlation 134.
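One way to realize this correlation alignment is sketched below. Because the body of Equation 2 is not reproduced in this text, the sketch assumes an objective of the form ||A C_t A^T - C_s||_F^2, where C_t and C_s are the first and second correlation matrices, and solves it with a whitening and recoloring construction in the style of correlation alignment (CORAL). This is a swapped-in standard technique, not necessarily the disclosed one. All names and data are hypothetical.

```python
import numpy as np


def sym_power(mat: np.ndarray, power: float, eps: float = 1e-10) -> np.ndarray:
    """Power of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, eps, None)  # guard against tiny negative eigenvalues
    return (vecs * vals**power) @ vecs.T


rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.8, 0.0, 0.0],
                   [0.0, 1.0, 0.5, 0.0],
                   [0.0, 0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.0, 1.0]])
source = rng.normal(size=(300, 4)) @ mixing  # hypothetical correlated source ROIs
target = rng.normal(size=(60, 4))            # hypothetical weakly correlated target ROIs

c_t = np.corrcoef(target, rowvar=False)  # first correlation matrix
c_s = np.corrcoef(source, rowvar=False)  # second correlation matrix

# Assumed solution: A = C_s^(1/2) C_t^(-1/2), so that A C_t A^T = C_s.
A = sym_power(c_s, 0.5) @ sym_power(c_t, -0.5)

# Standardise the target so its covariance equals C_t, then apply A.
z_t = (target - target.mean(axis=0)) / target.std(axis=0, ddof=1)
adapted = z_t @ A.T

adapted_corr = np.corrcoef(adapted, rowvar=False)
```

After the transformation the ROI correlations of the adapted target data match those of the source, while the source data itself is never modified.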

Figure 2 is a block diagram showing the internal configuration of an information processing system 200 that performs a domain adaptation-based Alzheimer's prediction model learning method according to an embodiment of the present disclosure. The information processing system 200 may include a memory 210, a processor 220, a communication module 230, and an input/output interface 240. As shown in Figure 2, the information processing system 200 may be configured to communicate information and/or data over a network using the communication module 230.

The memory 210 may include any non-transitory computer-readable recording medium. According to one embodiment, the memory 210 may include random access memory (RAM), read-only memory (ROM), and a permanent mass storage device such as a disk drive, solid state drive (SSD), or flash memory. As another example, a permanent mass storage device such as ROM, an SSD, flash memory, or a disk drive may be included in the information processing system 200 as a separate persistent storage device distinct from the memory. In addition, the memory 210 may store an operating system and at least one program code (for example, code installed and run in the information processing system 200 to control the information processing system 200).

These software components may be loaded from a computer-readable recording medium separate from the memory 210. Such a separate computer-readable recording medium may include a recording medium directly connectable to the information processing system 200, for example a floppy drive, disk, tape, DVD/CD-ROM drive, or memory card. As another example, software components may be loaded into the memory 210 through the communication module 230 rather than from a computer-readable recording medium. For example, at least one program may be loaded into the memory 210 based on a computer program installed from files provided over a network by developers or by a file distribution system that distributes application installation files.

The processor 220 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the processor 220 by the memory 210 or the communication module 230. For example, the processor 220 may be configured to execute received instructions according to program code stored in a recording device such as the memory 210.

The communication module 230 may provide a configuration or function for the information processing system 200 to communicate with another system (for example, a separate cloud system) over a network. For example, a control signal or command provided under the control of the processor 220 of the information processing system 200 may be delivered to another system through the communication module 230 and the network.

The input/output interface 240 may be a means for interfacing with an input or output device (not shown) that is connected to, or may be included in, the information processing system 200. In Figure 2, the input/output interface 240 is shown as an element separate from the processor 220, but the configuration is not limited thereto, and the input/output interface 240 may be included in the processor 220.

The information processing system 200 may include more components than those shown in Figure 2. However, most conventional components need not be explicitly illustrated.

According to one embodiment, the processor 220 may be configured to run a program for training and inference of a neural network. In this case, code associated with the program may be loaded into the memory 210. While the program is running, the processor 220 may receive information and/or data provided by an input/output device (not shown) through the input/output interface 240, or receive information and/or data from another system through the communication module 230, and may process the received information and/or data and store it in the memory 210. Such information and/or data may also be provided to other systems through the communication module 230.

The processor 220 may be configured to manage, process, and/or store information and/or data received from a plurality of other systems. According to one embodiment, the processor 220 may store, process, and transmit images received from other systems, features extracted from those images, and the like. Additionally or alternatively, the processor 220 may be configured to store and/or update, from a separate cloud system, database, or the like connected to the network, a program for executing an algorithm used for training and inference of an artificial neural network.

Figure 3 is a flowchart of a domain adaptation-based Alzheimer's prediction model learning method 300 according to an embodiment of the present disclosure. The method 300 may be performed by at least one processor (e.g., processor 220) of an information processing system.

As shown, the method 300 may begin with the processor acquiring a target dataset of a target domain associated with a first brain (S310). The processor may then acquire a source dataset of a source domain associated with a second brain (S320). In this case, the source dataset may include a label indicating diagnostic information about the second brain. Although Figure 3 shows the step of acquiring the target dataset (S310) being performed before the step of acquiring the source dataset (S320), the order is not limited thereto. For example, the two steps (S310, S320) may be performed in reverse order or in parallel (i.e., simultaneously).

The processor may obtain a transformed target dataset by performing domain adaptation on the target dataset based on the source dataset (S330). The processor may then train a machine learning model using the transformed target dataset and the source dataset as training data (S340). In this case, the machine learning model may be trained so that, when brain-related data is input, it outputs data containing information related to the possibility of conversion from mild cognitive impairment to Alzheimer's disease. Additionally or alternatively, the objective function of the machine learning model may be constructed based on the difference between the transformed target dataset and the source dataset.
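The flow of steps S310 to S340 can be sketched end to end. Every concrete choice here is an assumption made for illustration: the data are synthetic, both datasets are assumed to carry binary conversion labels, the domain adaptation step is reduced to a simple mean shift standing in for the adaptation-matrix approach, and the machine learning model is a logistic regression fitted by gradient descent. The disclosure does not name a specific model, and all function names are hypothetical.

```python
import numpy as np


def acquire(rng, n, shift):
    """S310/S320 stand-in: synthetic ROI features and MCI-to-AD conversion labels."""
    labels = rng.integers(0, 2, size=n)
    offset = (2 * labels - 1)[:, None] * 1.0  # class means at -1 and +1
    features = rng.normal(size=(n, 3)) + offset + shift
    return features, labels


def adapt_target(target_x, source_x):
    """S330 stand-in: shift target features so their mean matches the source mean."""
    return target_x - target_x.mean(axis=0) + source_x.mean(axis=0)


def train_logistic(x, y, lr=0.1, epochs=500):
    """S340 stand-in: logistic regression fitted by batch gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict_proba(x, w, b):
    """Estimated probability of conversion from MCI to AD."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


rng = np.random.default_rng(0)
target_x, target_y = acquire(rng, 60, shift=5.0)   # small site, shifted domain
source_x, source_y = acquire(rng, 300, shift=0.0)  # large site, left unchanged

adapted_x = adapt_target(target_x, source_x)
train_x = np.vstack([source_x, adapted_x])
train_y = np.concatenate([source_y, target_y])

w, b = train_logistic(train_x, train_y)
conversion_prob = predict_proba(train_x, w, b)
accuracy = ((conversion_prob > 0.5) == train_y).mean()
```

Only the target features pass through `adapt_target`; the source features enter training as acquired, which mirrors the stated goal of avoiding any loss to the source dataset.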

The preceding description of the present disclosure is provided to enable those skilled in the art to make or use the present disclosure. Various modifications of the present disclosure will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to various variations without departing from the spirit or scope of the present disclosure. Accordingly, the present disclosure is not intended to be limited to the examples described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Although example implementations may refer to utilizing aspects of the presently disclosed subject matter in the context of one or more standalone computer systems, the subject matter is not so limited and may instead be implemented in connection with any computing environment, such as a network or distributed computing environment. Furthermore, aspects of the presently disclosed subject matter may be implemented in or across a plurality of processing chips or devices, and storage may similarly be effected across a plurality of devices. Such devices may include PCs, network servers, and handheld devices.

Although the present disclosure has been described herein in relation to some embodiments, it should be understood that various modifications and changes can be made without departing from the scope of the present disclosure as understood by those of ordinary skill in the art to which the present invention pertains. Such modifications and changes should also be considered to fall within the scope of the claims appended hereto.

Claims

A domain adaptation-based Alzheimer's prediction model learning method performed by at least one processor, the method comprising:
obtaining a target dataset of a target domain associated with a first brain;
obtaining a source dataset of a source domain associated with a second brain, the source dataset including a label representing diagnostic information for the second brain;
obtaining a transformed target dataset by performing domain adaptation on the target dataset based on the source dataset; and
training a machine learning model using the transformed target dataset and the source dataset as training data,
wherein the machine learning model is trained to output, when brain-related data is input, data containing information related to the possibility of conversion from mild cognitive impairment to Alzheimer's disease, and
wherein the objective function of the machine learning model is configured based on the difference between the transformed target dataset and the source dataset.

The method according to claim 1,
wherein the objective function is based on an adaptation matrix,
the adaptation matrix is configured to reduce the difference between a first average vector obtained from the target dataset and a second average vector obtained from the source dataset,
the first average vector corresponds to the center of the data distribution according to the label of the target dataset, and
the second average vector corresponds to the center of the data distribution according to the label of the source dataset.

The method according to claim 2,
wherein the first average vector is obtained from a target data matrix corresponding to the target dataset, and
the second average vector is obtained from a source data matrix corresponding to the source dataset.

The method according to claim 2,
wherein the objective function is constructed based on the following [Equation 1],
[Equation 1]

wherein, in Equation 1, the terms correspond, in order, to the adaptation matrix, the first average vector, and the second average vector.

The method according to claim 1,
wherein the objective function is based on an adaptation matrix, and
the adaptation matrix reduces the difference between a first correlation matrix obtained from the target dataset and a second correlation matrix obtained from the source dataset.

The method according to claim 5,
wherein the first correlation matrix includes information related to correlations between regions of interest (ROIs) associated with the first brain, and
the second correlation matrix includes information related to correlations between regions of interest associated with the second brain.

The method according to claim 5,
wherein the objective function is constructed based on the following [Equation 2],
[Equation 2]

wherein, in Equation 2, the terms correspond, in order, to the adaptation matrix, the first correlation matrix, and the second correlation matrix.

A computer program recorded on a computer-readable recording medium to execute the domain adaptation-based Alzheimer's prediction model learning method according to claim 1.