KR102661357B1

KR102661357B1 - Method and apparatus for analyzing medical data

Info

Publication number: KR102661357B1
Application number: KR1020240001170A
Authority: KR
Inventors: 이찬중; 장은찬; 우현기
Original assignee: 주식회사 에비드넷
Priority date: 2024-01-03
Filing date: 2024-01-03
Publication date: 2024-04-26

Abstract

According to an embodiment of the present disclosure, a method for analyzing medical data of a medical institution is provided. The method includes: obtaining, from a first medical institution and a second medical institution, a first medical dataset of the first medical institution and a second medical dataset of the second medical institution; generating an integrated (total) synthetic dataset from the first medical dataset and the second medical dataset; and obtaining, based on a target medical dataset of a target medical institution, an analysis result for the target medical dataset from the integrated synthetic dataset. The first medical dataset, the second medical dataset, and the integrated synthetic dataset may include a preset independent variable and a preset dependent variable.

Description

Method and apparatus for analyzing medical data

The present disclosure relates to a method and apparatus for analyzing medical data, and more specifically to a method and apparatus for analyzing a medical dataset of a target medical institution from a synthetic dataset.

Under the Personal Information Protection Act, a patient's medical data constitutes personal information and sensitive information of the data subject, and using it carries a risk of personal information leakage or personal identification. As a result, medical data is difficult to access from outside the hospital, and strict standards govern its use and application. Recently, synthetic medical data generated from medical data has increasingly been applied to research in the medical field. With synthetic medical data, research can proceed without directly storing or exporting the medical data, so research results can be obtained without the access restrictions caused by security issues such as leakage of personal and sensitive information. In addition, synthetic data generation has recently been proposed as a way to address data entry errors and data scarcity, which can also substantially reduce the time and cost of medical data analysis.

Republic of Korea Registered Patent No. 10-2403461 (2022.05.25)

The present disclosure has been devised in response to the background art described above, and its objective is to analyze medical data. For example, an objective of the present disclosure is to obtain, based on a target medical dataset of a target medical institution, an analysis result for the target medical dataset from an integrated synthetic dataset.

Meanwhile, the technical problems to be solved by the present disclosure are not limited to those mentioned above, and may include various technical problems within a scope apparent to those skilled in the art from the description below.

The present disclosure has been devised in response to the background art described above, and aims to provide a method and apparatus for analyzing medical data of a medical institution.

According to one aspect of the present disclosure, a method of analyzing medical data of a medical institution, performed by a computing device, may be provided. The method includes: obtaining, from a first medical institution and a second medical institution, a first medical dataset of the first medical institution and a second medical dataset of the second medical institution; generating an integrated (total) synthetic dataset from the first medical dataset and the second medical dataset; and obtaining, based on a target medical dataset of a target medical institution, an analysis result for the target medical dataset from the integrated synthetic dataset. The first medical dataset, the second medical dataset, and the integrated synthetic dataset may include a preset independent variable and a preset dependent variable.

In one embodiment, generating the integrated synthetic dataset may include: obtaining clinical prior information including information about a correlation between the preset independent variable and the preset dependent variable; and generating the integrated synthetic dataset from the clinical prior information, the first medical dataset, and the second medical dataset.

In one embodiment, generating the integrated synthetic dataset may include: generating, from the first medical dataset and the second medical dataset, a first synthetic dataset corresponding to the first medical dataset and a second synthetic dataset corresponding to the second medical dataset; and generating the integrated synthetic dataset by integrating the first synthetic dataset and the second synthetic dataset.
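The two-step flow described above (one synthetic dataset per institution, then integration) might be sketched as follows. This is only an illustrative sketch under assumptions: the generator `gaussian_synthetic`, which samples from a multivariate normal fitted to each institution's data, is a hypothetical stand-in for whatever generator an embodiment uses, and the toy arrays and row counts are invented for the example.

```python
import numpy as np

def gaussian_synthetic(real_rows, n_rows, seed):
    """Hypothetical stand-in generator: sample rows from a multivariate
    normal fitted to the mean and covariance of the real rows."""
    real_rows = np.asarray(real_rows, dtype=float)
    rng = np.random.default_rng(seed)
    mean = real_rows.mean(axis=0)
    cov = np.cov(real_rows, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_rows)

# Toy data (assumed); columns are (independent variable, dependent variable).
first_medical = np.array([[60.0, 120.0], [65.0, 135.0], [70.0, 128.0], [72.0, 150.0]])
second_medical = np.array([[50.0, 110.0], [55.0, 118.0], [58.0, 131.0], [63.0, 125.0]])

# Step 1: one synthetic dataset per institution.
first_synthetic = gaussian_synthetic(first_medical, n_rows=100, seed=1)
second_synthetic = gaussian_synthetic(second_medical, n_rows=80, seed=2)

# Step 2: integrate the per-institution synthetic datasets into one dataset.
integrated_synthetic = np.vstack([first_synthetic, second_synthetic])
```

Because both synthetic datasets share the same preset columns, integration reduces here to row-wise concatenation; an embodiment could integrate them differently.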

In one embodiment, generating the first synthetic dataset and the second synthetic dataset may include generating the first synthetic dataset and the second synthetic dataset from the first medical dataset and the second medical dataset using a pre-trained GAN (Generative Adversarial Networks) model.

In one embodiment, generating the first synthetic dataset and the second synthetic dataset may include generating the first synthetic dataset and the second synthetic dataset from the first medical dataset and the second medical dataset using a SMOTE (Synthetic Minority Oversampling Technique) algorithm.
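As a rough illustration of SMOTE-style generation, the sketch below creates synthetic rows by interpolating between a sampled record and one of its nearest neighbours, which is the core step of SMOTE. The function name `smote_like_samples`, the parameter `k`, and the toy data are assumptions made for this example and do not come from the disclosure. Standard SMOTE oversamples only the minority class, whereas this sketch interpolates over whatever rows are passed in.

```python
import numpy as np

def smote_like_samples(X, n_new, k=3, seed=0):
    """Generate n_new synthetic rows by interpolating between a sampled
    row and one of its k nearest neighbours (the core step of SMOTE)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all rows.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    k = min(k, len(X) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, X.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X))
        nb = neighbours[base, rng.integers(k)]
        gap = rng.random()  # position on the segment between base and nb
        synthetic[i] = X[base] + gap * (X[nb] - X[base])
    return synthetic

# Toy first medical dataset (assumed values).
first_medical = np.array([[60.0, 120.0], [65.0, 130.0], [70.0, 140.0], [72.0, 150.0]])
first_synthetic = smote_like_samples(first_medical, n_new=10)
```

Every synthetic row lies on a segment between two real rows, so it stays inside the range of the observed data.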

In one embodiment, generating the first synthetic dataset and the second synthetic dataset may include: obtaining clinical prior information including information about a correlation between the preset independent variable and the preset dependent variable; and generating the first synthetic dataset and the second synthetic dataset by reflecting the first medical dataset and the second medical dataset in the clinical prior information, using a Bayesian algorithm based on Bayesian inference.
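One way to picture reflecting observed data in clinical prior information is a conjugate Bayesian update. The sketch below assumes, purely for illustration, that the prior information is a normal prior on the slope relating one independent variable to the dependent variable, and that the noise variance is known. The names `posterior_slope` and `synthesize`, the model, and the numbers are all assumptions, not the algorithm of the disclosure.

```python
import numpy as np

def posterior_slope(x, y, prior_mean, prior_var, noise_var):
    """Normal-normal conjugate update for the slope in y = slope * x + noise."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_prec = 1.0 / prior_var
    post_prec = prior_prec + np.sum(x * x) / noise_var
    post_mean = (prior_prec * prior_mean + np.sum(x * y) / noise_var) / post_prec
    return post_mean, 1.0 / post_prec

def synthesize(x, post_mean, post_var, noise_var, seed=0):
    """Draw one slope from the posterior and generate synthetic outcomes."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    slope = rng.normal(post_mean, np.sqrt(post_var))
    return slope * x + rng.normal(0.0, np.sqrt(noise_var), size=len(x))

# Assumed prior (slope near 1.0) updated with toy data whose slope is 2.0.
x_obs = [1.0, 2.0, 3.0]
y_obs = [2.0, 4.0, 6.0]
post_mean, post_var = posterior_slope(x_obs, y_obs, prior_mean=1.0, prior_var=1.0, noise_var=1.0)
synthetic_y = synthesize(x_obs, post_mean, post_var, noise_var=1.0)
```

The posterior mean falls between the prior mean and the slope suggested by the data, and it moves toward the data as more records are observed.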

In one embodiment, the method may further include determining, from the first medical dataset and the second medical dataset, a first independent variable that is correlated with the dependent variable and is not included in the preset independent variable, and the analysis result may include information about a correlation between the first independent variable and the preset dependent variable.

In one embodiment, the integrated synthetic dataset may include a first weight corresponding to the preset independent variable, and obtaining the analysis result for the target medical dataset may include: obtaining, from the target medical dataset, a second weight corresponding to the preset independent variable; obtaining, based on the first weight and the second weight, a third weight corresponding to the preset independent variable; and obtaining the analysis result from the integrated synthetic dataset based on the preset independent variable, the preset dependent variable, and the third weight.

In one embodiment, obtaining the third weight corresponding to the preset independent variable may include: determining a weighting degree for each of the first weight and the second weight based on the number of first medical data in the first medical dataset, the number of second medical data in the second medical dataset, and the number of target medical data in the target medical dataset; and obtaining the third weight based on the weighting degree of each of the first weight and the second weight.
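The passage above does not fix a formula for the weighting degrees. As one plausible reading, the sketch below sets each degree in proportion to the number of records behind the corresponding weight and takes the third weight as the resulting weighted average. The function name `third_weight` and the proportional rule are assumptions made for illustration only.

```python
def third_weight(w_first, w_second, n_first, n_second, n_target):
    """Blend the first weight (from the integrated synthetic dataset) and the
    second weight (from the target dataset) in proportion to record counts.
    The proportional rule is an assumption, not taken from the disclosure."""
    total = n_first + n_second + n_target
    if total <= 0:
        raise ValueError("record counts must sum to a positive number")
    degree_first = (n_first + n_second) / total   # data behind the first weight
    degree_second = n_target / total              # data behind the second weight
    return degree_first * w_first + degree_second * w_second

# Example: 900 source records versus 100 target records.
w3 = third_weight(w_first=0.8, w_second=0.2, n_first=600, n_second=300, n_target=100)
```

Under this rule a small target dataset shifts the third weight only slightly away from the first weight, and a large one pulls it toward the target institution's own estimate.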

In one embodiment, determining the weighting degree of each of the first weight and the second weight may include: obtaining clinical prior information including clinical statistical data in which clinical values are recorded; and determining the weighting degree of each of the first weight and the second weight based on the number of clinical medical data in the clinical statistical data, the number of first medical data in the first medical dataset, the number of second medical data in the second medical dataset, and the number of target medical data in the target medical dataset.

In one embodiment, obtaining the analysis result for the target medical dataset may include: generating, from the integrated synthetic dataset, a target synthetic dataset corresponding to the target medical dataset based on the preset independent variable, the preset dependent variable, and the third weight; and obtaining the analysis result from the target synthetic dataset.

In one embodiment, obtaining the analysis result for the target medical dataset may include: determining, from the first medical dataset and the second medical dataset, a first independent variable that is correlated with the dependent variable and is not included in the preset independent variable; generating, from the integrated synthetic dataset, a target synthetic dataset corresponding to the target medical dataset based on the preset independent variable, the first independent variable, the preset dependent variable, and the third weight; and obtaining the analysis result from the target synthetic dataset.

In one embodiment, the analysis result for the target medical dataset may include information about the preset independent variable, the preset dependent variable, the third weight, and the correlation between the preset independent variable and the preset dependent variable.

In one embodiment, the first medical dataset and the second medical dataset may be medical datasets converted from a first raw medical dataset and a second raw medical dataset into the data structure of a Common Data Model (CDM), where the first medical dataset corresponds to the first raw medical dataset and the second medical dataset corresponds to the second raw medical dataset.
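To illustrate why conversion to a common data model helps, the sketch below maps raw records from two institutions, each with its own field names, onto one shared schema. All field names, the mapping tables, and the function `to_common_model` are hypothetical. A real CDM defines far richer tables and vocabularies than this flat example.

```python
# Hypothetical per-institution mappings: raw field name -> common field name.
FIRST_MAPPING = {"pt_age": "age", "sbp": "systolic_bp", "dm": "has_diabetes"}
SECOND_MAPPING = {"AgeYears": "age", "SystolicBP": "systolic_bp", "Diabetes": "has_diabetes"}

def to_common_model(raw_rows, mapping):
    """Rename each raw record's fields to the shared schema, dropping
    any raw fields that the mapping does not cover."""
    return [{common: row[raw] for raw, common in mapping.items()} for row in raw_rows]

# Toy raw records (assumed); the extra "ward" field is not part of the schema.
first_raw = [{"pt_age": 61, "sbp": 132, "dm": 1, "ward": "A"}]
second_raw = [{"AgeYears": 48, "SystolicBP": 118, "Diabetes": 0}]

first_medical = to_common_model(first_raw, FIRST_MAPPING)
second_medical = to_common_model(second_raw, SECOND_MAPPING)
```

Once both datasets share the same columns, the same preset independent and dependent variables can be read from either one.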

In one embodiment, obtaining the analysis result for the target medical dataset may include: generating a first integrated synthetic dataset from the first medical dataset and the second medical dataset; generating a second integrated synthetic dataset from the first medical dataset and the second medical dataset; and obtaining, based on the target medical dataset, the analysis result for the target medical dataset from the first integrated synthetic dataset and the second integrated synthetic dataset. The first integrated synthetic dataset may include a 1-1 weight corresponding to the preset independent variable, the second integrated synthetic dataset may include a 1-2 weight corresponding to the preset independent variable, and the 1-1 weight and the 1-2 weight may differ from each other.

In one embodiment, the preset dependent variables included in the first integrated synthetic dataset and the second integrated synthetic dataset, respectively, may have probability distributions based on mutually different means and standard deviations.

In one embodiment, obtaining the analysis result for the target medical dataset may include: obtaining a first analysis result for the target medical dataset from the first integrated synthetic dataset; obtaining a second analysis result for the target medical dataset from the second integrated synthetic dataset; and obtaining the analysis result for the target medical dataset based on the first analysis result and the second analysis result.

In one embodiment, the method may further include determining, based on at least one of a preset effect size, a preset statistical power, and a preset significance level, the required number of data in a medical dataset for generating the integrated synthetic dataset, and each of the first medical dataset and the second medical dataset may include at least the required number of medical data.
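A required record count can be derived from effect size, power, and significance level in several ways. The sketch below uses the textbook normal-approximation formula for a two-sided comparison of two group means, n = 2((z_{1-α/2} + z_{power}) / d)^2 per group, as one assumed choice. The disclosure does not state which formula an embodiment uses, and the function name and the example record counts are hypothetical.

```python
import math
from statistics import NormalDist

def required_n_per_group(effect_size, power=0.8, alpha=0.05):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means with standardized effect size d."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)

required = required_n_per_group(effect_size=0.5)

# Check that each source dataset holds at least the required number of records
# (the counts 600 and 300 are assumed example values).
datasets_ok = all(n >= required for n in (600, 300))
```

Smaller effect sizes or higher power raise the required count, so datasets that pass for a medium effect may fail for a small one.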

According to one aspect of the present disclosure, a computing device for analyzing medical data of a medical institution may be provided. The computing device includes at least one processor and a memory, and the at least one processor: obtains, from a first medical institution and a second medical institution, a first medical dataset of the first medical institution and a second medical dataset of the second medical institution; generates an integrated (total) synthetic dataset from the first medical dataset and the second medical dataset; and obtains, based on a target medical dataset of a target medical institution, an analysis result for the target medical dataset from the integrated synthetic dataset. The first medical dataset, the second medical dataset, and the integrated synthetic dataset may include a preset independent variable and a preset dependent variable.

According to one aspect of the present disclosure, a computer program stored in a computer-readable storage medium may be provided. When executed by one or more processors, the computer program causes the one or more processors to perform operations for analyzing medical data of a medical institution, the operations including: obtaining, from a first medical institution and a second medical institution, a first medical dataset of the first medical institution and a second medical dataset of the second medical institution; generating an integrated (total) synthetic dataset from the first medical dataset and the second medical dataset; and obtaining, based on a target medical dataset of a target medical institution, an analysis result for the target medical dataset from the integrated synthetic dataset. The first medical dataset, the second medical dataset, and the integrated synthetic dataset may include a preset independent variable and a preset dependent variable.

According to some embodiments of the present disclosure, an analysis result for a target medical dataset may be obtained from an integrated synthetic dataset, based on the target medical dataset of a target medical institution.

The effects obtainable from the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present disclosure pertains.

Various aspects are now described with reference to the drawings, in which like reference numerals are used to refer to like elements throughout. In the following embodiments, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of one or more aspects. It will be apparent, however, that such aspect(s) may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form to facilitate describing one or more aspects.
FIG. 1 is a block diagram of a computing device that obtains an analysis result for a target medical dataset according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram showing a network function according to an embodiment of the present disclosure.
FIG. 3 is a flowchart showing a method by which a computing device obtains an analysis result for a target medical dataset according to an embodiment of the present disclosure.
FIG. 4 is a schematic diagram showing a process by which a computing device obtains an analysis result for a target medical dataset according to an embodiment of the present disclosure.
FIG. 5 is a schematic diagram showing a process of obtaining a weight of a target medical institution according to an embodiment of the present disclosure.
FIG. 6 is a brief, general schematic diagram of an exemplary computing environment in which embodiments of the present disclosure may be implemented.

Various embodiments are now described with reference to the drawings. In this specification, various descriptions are presented to provide an understanding of the present disclosure. It is apparent, however, that these embodiments may be practiced without these specific descriptions.

As used herein, the terms "component," "module," "system," and the like refer to a computer-related entity, hardware, firmware, software, a combination of software and hardware, or software in execution. For example, a component may be, but is not limited to, a procedure running on a processor, a processor, an object, a thread of execution, a program, and/or a computer. For example, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a processor and/or a thread of execution. A component may be localized on one computer, or may be distributed between two or more computers. In addition, these components may execute from various computer-readable media having various data structures stored thereon. Components may communicate via local and/or remote processes, for example in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system or a distributed system, and/or data transmitted by way of the signal to other systems across a network such as the Internet).

In addition, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or." That is, unless otherwise specified or clear from context, "X uses A or B" is intended to mean any of the natural inclusive permutations. That is, "X uses A or B" applies if X uses A, if X uses B, or if X uses both A and B. Furthermore, the terms "or" and "and/or" as used herein should be understood to refer to and include all possible combinations of one or more of the listed related items.

In addition, the terms "comprise" and/or "comprising" should be understood to mean that the corresponding features and/or elements are present. However, the terms "comprise" and/or "comprising" should be understood as not excluding the presence or addition of one or more other features, elements, and/or groups thereof. Furthermore, unless otherwise specified or clear from context to be directed to a singular form, the singular in this specification and the claims should generally be construed to mean "one or more."

The term "at least one of A or B" should be interpreted to mean "the case including only A," "the case including only B," and "the case in which A and B are combined."

Those skilled in the art should further appreciate that the various illustrative logical blocks, configurations, modules, circuits, means, logic, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, configurations, means, logic, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or as software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in various ways for each particular application. However, such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The description of the presented embodiments is provided to enable any person of ordinary skill in the art of the present disclosure to make or use the present invention. Various modifications to these embodiments will be apparent to those of ordinary skill in the art of the present disclosure. The general principles defined herein may be applied to other embodiments without departing from the scope of the present disclosure. Thus, the present invention is not limited to the embodiments presented herein. The present invention should be interpreted in the broadest scope consistent with the principles and novel features presented herein.

In the present disclosure, terms expressed with an ordinal N, such as first, second, or third, are used to distinguish at least one entity from another. For example, entities expressed as first and second may be the same as or different from each other.

FIG. 1 is a diagram illustrating a computing device that obtains an analysis result for a target medical dataset according to an embodiment of the present disclosure.

The configuration of the computing device 100 shown in FIG. 1 is only a simplified example. In one embodiment of the present disclosure, the computing device 100 may include other components for implementing the computing environment of the computing device 100, and only some of the disclosed components may constitute the computing device 100.

The computing device 100 according to some embodiments of the present disclosure may be a device for performing analysis on a target medical dataset. For example, the computing device 100 may be a device that generates an integrated (total) synthetic dataset and performs analysis on a target medical dataset from the integrated synthetic dataset. The computing device 100 may include any type of server or user terminal.

The computing device 100 may include a processor 110, a memory 130, and a network unit 150.

The processor 110 may consist of one or more cores, and may include a processor for performing operations related to data processing, such as a central processing unit (CPU), a general purpose graphics processing unit (GPGPU), or a tensor processing unit (TPU) of the computing device 100.

According to an embodiment of the present disclosure, the processor 110 may perform operations for training a neural network. For example, the processor 110 may perform computations for training a neural network in deep learning (DL), such as processing input data for training, extracting features from the input data, calculating errors, and updating the weights of the neural network using backpropagation. At least one of the CPU, GPGPU, and TPU of the processor 110 may process the training of a network function. For example, the CPU and the GPGPU may together process training of a network function and data classification using the network function. In addition, in one embodiment of the present disclosure, the processors of a plurality of computing devices may be used together to process training of a network function and data classification using the network function.
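The training computations listed above (forward pass, error calculation, gradient-based weight update) can be illustrated with a single linear neuron, which is the one-layer special case of backpropagation. This is a minimal NumPy sketch with assumed toy data, an assumed learning rate, and a hypothetical function name, and it is not the network function of the disclosure.

```python
import numpy as np

def train_linear_neuron(x, y, lr=0.1, epochs=200):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    losses = []
    for _ in range(epochs):
        pred = w * x + b                   # forward pass
        err = pred - y                     # error calculation
        losses.append(float(np.mean(err ** 2)))
        grad_w = 2.0 * np.mean(err * x)    # gradient of the loss w.r.t. w
        grad_b = 2.0 * np.mean(err)        # gradient of the loss w.r.t. b
        w -= lr * grad_w                   # weight update
        b -= lr * grad_b
    return w, b, losses

# Toy data generated from y = 2x + 1 (assumed).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
w, b, losses = train_linear_neuron(x, y)
```

In a multi-layer network the same update is applied to every layer, with the gradients propagated backward through the layers by the chain rule.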

The processor 110 may typically control the overall operation of the computing device 100. The processor 110 may provide or process appropriate information or functions for the user by processing signals, data, and information input or output through the components of the computing device 100, or by running an application program stored in the memory 130.

In an embodiment of the present disclosure, the memory 130 may store any type of information generated or determined by the processor 110 and/or any type of information received by the network unit 150. In an embodiment, the memory 130 may store a database. The database may be a collection of data stored in a form that the computing device 100 can process. For example, the memory 130 may include a medical data database, in which a plurality of medical data items may be stored.

In an embodiment of the present disclosure, the memory 130 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (e.g., SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and/or an optical disk. The computing device 100 may also operate in connection with web storage that performs the storage function of the memory 130 on the Internet. The above description of the memory is merely an example, and the present disclosure is not limited thereto. The memory 130 may be operated by the processor 110.

The network unit 150 according to an embodiment of the present disclosure may use any wired or wireless communication network capable of transmitting and receiving data and signals of any form, and any such network may be included among the networks contemplated by the present disclosure. The techniques described herein may be used not only in the networks mentioned above but also in other networks.

FIG. 2 is a diagram illustrating a network function according to an embodiment of the present disclosure.

Throughout this specification, the terms artificial intelligence-based model (e.g., a summary information generation model, an information classification model, etc.), computational model, neural network, and network function may be used with the same meaning (that is, interchangeably).

A neural network may generally consist of a set of interconnected computational units that may be referred to as nodes. These nodes may also be referred to as neurons. A neural network includes at least one node. The nodes (or neurons) constituting a neural network may be interconnected by one or more links.

Within a neural network, nodes connected through a link may form a relative relationship of input node and output node. The concepts of input node and output node are relative: any node that is an output node with respect to one node may be an input node with respect to another node, and vice versa. As described above, the input-node-to-output-node relationship may be defined around a link. One input node may be connected to one or more output nodes through links, and vice versa.

In the relationship between an input node and an output node connected through one link, the value of the output node's data may be determined based on the data input to the input node. Here, the link interconnecting the input node and the output node may have a weight. The weight may be variable and may be changed by a user or an algorithm so that the neural network performs a desired function. For example, when one or more input nodes are each connected to one output node by a respective link, the output node may determine its value based on the values input to the input nodes connected to it and the weights set on the links corresponding to the respective input nodes.
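As a minimal sketch of the relationship described above, the following Python snippet computes an output node value as a weighted sum of the connected input node values. The weighted-sum form is an assumption for illustration (the text only states that the output is determined "based on" the input values and link weights), and the function name `output_node_value` is hypothetical. Bias terms and activation functions are omitted.

```python
def output_node_value(inputs, weights):
    """Combine input-node values using the weight on each connecting link.

    Assumption: a plain weighted sum, with no bias and no activation function.
    """
    if len(inputs) != len(weights):
        raise ValueError("each input node needs exactly one link weight")
    return sum(x * w for x, w in zip(inputs, weights))


# Three input nodes feeding one output node through three weighted links.
value = output_node_value([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
print(value)
```

Changing any single weight changes the output value, which is why two networks with the same topology but different weights behave differently.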

As described above, in a neural network one or more nodes are interconnected through one or more links to form input node and output node relationships within the network. The characteristics of the neural network may be determined by the number of nodes and links, the connections among the nodes and links, and the weight value assigned to each link. For example, if two neural networks have the same number of nodes and links but different link weight values, the two networks may be regarded as different from each other.

A neural network may consist of a set of one or more nodes. A subset of the nodes constituting the neural network may form a layer. Some of the nodes constituting the neural network may form one layer based on their distances from the initial input node. For example, the set of nodes at distance n from the initial input node may form layer n. The distance from the initial input node may be defined as the minimum number of links that must be traversed to reach a given node from the initial input node. However, this definition of a layer is arbitrary and given for explanation only, and the order of a layer within a neural network may be defined differently. For example, the layer of a node may instead be defined by its distance from the final output node.
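The distance-based layer definition above can be sketched with a breadth-first search, which yields the minimum number of links from the initial input nodes to every reachable node. The graph representation (a dictionary from a node to the nodes it feeds) and the node names are assumptions chosen for illustration.

```python
from collections import deque


def layer_indices(links, input_nodes):
    """Return {node: layer}, where layer is the minimum link count from any input node."""
    distance = {node: 0 for node in input_nodes}
    queue = deque(input_nodes)
    while queue:
        node = queue.popleft()
        for successor in links.get(node, []):
            if successor not in distance:  # first visit in BFS is the shortest path
                distance[successor] = distance[node] + 1
                queue.append(successor)
    return distance


# Hypothetical network: two input nodes, two hidden nodes, one output node.
links = {"x1": ["h1", "h2"], "x2": ["h1", "h2"], "h1": ["y"], "h2": ["y"]}
layers = layer_indices(links, ["x1", "x2"])
print(layers)
```

Running the same search on the reversed graph, starting from the final output nodes, would give the alternative layer definition mentioned in the text.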

In an embodiment of the present disclosure, a set of neurons or nodes may be referred to as a layer.

An initial input node may refer to one or more nodes in the neural network to which data is input directly, without passing through a link from any other node. Equivalently, in terms of link-based relationships between nodes, it may refer to nodes that have no other input nodes connected to them by a link. Similarly, a final output node may refer to one or more nodes in the neural network that have no output node in their relationships with other nodes. A hidden node may refer to any node of the neural network that is neither an initial input node nor a final output node.

A neural network according to an embodiment of the present disclosure may have the same number of nodes in the input layer as in the output layer, with the number of nodes decreasing and then increasing again as the network progresses from the input layer through the hidden layers. A neural network according to another embodiment may have fewer nodes in the input layer than in the output layer, with the number of nodes decreasing as the network progresses from the input layer to the hidden layers. A neural network according to yet another embodiment may have more nodes in the input layer than in the output layer, with the number of nodes increasing as the network progresses from the input layer to the hidden layers. A neural network according to still another embodiment may be a combination of the neural networks described above.

A deep neural network (DNN) may refer to a neural network that includes a plurality of hidden layers in addition to an input layer and an output layer. A deep neural network can be used to identify latent structures in data. Deep neural networks may include a convolutional neural network (CNN), a recurrent neural network (RNN), an autoencoder, a generative adversarial network (GAN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a Q network, a U network, a Siamese network, and the like. The above description of deep neural networks is merely an example, and the present disclosure is not limited thereto.

By way of example, the artificial intelligence-based model of the present disclosure may include a transformer, a generative pre-trained transformer (GPT), BERT (Bidirectional Encoder Representations from Transformers), and the like.

The artificial intelligence-based model of the present disclosure may be represented by a network structure of any of the forms described above, including an input layer, a hidden layer, and an output layer.

A neural network usable in the artificial intelligence-based model of the present disclosure may be trained by at least one of supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, federated learning for distributed deep learning, and incremental learning. Training a neural network may be a process of applying to the neural network the knowledge it needs to perform a specific operation.

A neural network may be trained so as to minimize the error of its output. Training a neural network is a process of repeatedly inputting training data into the network, calculating the error between the network's output for the training data and the target, and backpropagating that error from the output layer toward the input layer so as to reduce it, thereby updating the weight of each node of the network. Supervised learning uses training data in which each item is labeled with the correct answer (i.e., labeled training data), whereas in unsupervised learning the training data may not be labeled with the correct answer. For example, in supervised learning for data classification, the training data may be data in which each item is labeled with a category. The labeled training data is input to the neural network, and the error may be calculated by comparing the network's output (category) with the label of the training data.

As another example, in unsupervised learning for data classification, the error may be calculated by comparing the training data, which is the input, with the output of the neural network. The calculated error is backpropagated through the neural network in the reverse direction (i.e., from the output layer toward the input layer), and the connection weights of the nodes in each layer may be updated accordingly. The amount by which each connection weight changes may be determined by the learning rate. The neural network's computation on the input data together with the backpropagation of the error may constitute one learning cycle (epoch). The learning rate may be applied differently depending on the number of learning cycles completed. For example, a high learning rate may be used early in training so that the network quickly reaches a certain level of performance, improving efficiency, and a low learning rate may be used late in training to improve accuracy.
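The idea of a learning rate that starts high and is lowered in later learning cycles can be sketched as follows. This is an illustrative toy, not the training procedure of the disclosure: it minimizes a one-parameter squared error by gradient descent, and the step-decay schedule and all constants (`initial`, `decay`, `step`) are assumptions.

```python
def learning_rate(epoch, initial=0.1, decay=0.5, step=10):
    """Step decay: halve the learning rate every `step` epochs (assumed schedule)."""
    return initial * (decay ** (epoch // step))


def train(weight=0.0, target=3.0, epochs=30):
    """Minimize (weight - target)^2 with gradient descent and a decaying learning rate."""
    for epoch in range(epochs):
        gradient = 2.0 * (weight - target)          # derivative of the squared error
        weight -= learning_rate(epoch) * gradient   # update size is set by the learning rate
    return weight


final_weight = train()
print(final_weight)
```

Early epochs take large steps toward the target, while later epochs take smaller steps that refine the estimate.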

In neural network training, the training data is generally a subset of the real data (i.e., the data to be processed with the trained neural network), so there may be learning cycles in which the error on the training data decreases while the error on the real data increases. Overfitting is this phenomenon, in which the network learns the training data excessively and its error on real data increases. For example, a neural network that learned what a cat is from yellow cats only and then fails to recognize a cat of another color exhibits a form of overfitting. Overfitting can increase the error of a machine learning algorithm. Various optimization methods may be used to prevent it, such as increasing the amount of training data, regularization, dropout (deactivating some of the network's nodes during training), and the use of batch normalization layers.
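Of the techniques listed above, dropout is easy to illustrate. The sketch below deactivates each node output with probability `p` during training. The rescaling of surviving values by `1 / (1 - p)` (often called inverted dropout) is a common convention and an assumption here, since the text only states that some nodes are deactivated.

```python
import random


def dropout(values, p, rng):
    """Zero each value with probability p; scale survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden_outputs = [0.2, 1.5, -0.7, 0.9]
print(dropout(hidden_outputs, p=0.5, rng=rng))
```

At inference time dropout is normally switched off, which corresponds to calling the function with `p=0.0`.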

In an embodiment, the computing device 100 may use a pre-trained generative adversarial network (GAN) model. The pre-trained GAN model may be an artificial intelligence model that includes a generator neural network and a discriminator neural network, pre-trained such that the generator generates synthetic data from real data and the discriminator distinguishes real data from synthetic data. Through this training process, the generator may be pre-trained so that it becomes difficult for the discriminator to distinguish the generated synthetic data from the real data. Using the pre-trained GAN model, the computing device 100 may generate, from a real medical dataset, a synthetic dataset having the probability distribution and characteristics of the real medical dataset.

The following describes a method, according to an embodiment of the present disclosure, by which the computing device 100 generates an integrated synthetic dataset from the medical datasets of a plurality of medical institutions and obtains an analysis result for a target medical dataset of a target medical institution from the integrated synthetic dataset.

FIG. 3 is a flowchart illustrating a method by which a computing device obtains an analysis result for a target medical dataset according to an embodiment of the present disclosure.

In step S301, the computing device 100 may obtain a first medical dataset of a first medical institution and a second medical dataset of a second medical institution from the first and second medical institutions, respectively. The computing device 100 may obtain a medical dataset from each of a plurality of medical institutions. For convenience, the following description uses the example of obtaining medical datasets from a first medical institution and a second medical institution. However, the disclosure is not limited thereto: medical datasets may be obtained from each of a plurality of medical institutions, and the integrated synthetic dataset may be generated from the plurality of medical datasets.

In an embodiment, a medical dataset may refer to a collection of medical data. Medical data may include information related to the medical care of a specific person. For example, medical data may include variables related to at least one disease. Medical data may include a specific person's physical information, health information, and treatment information, such as CT image data, MRI examination results, electrocardiogram measurements, and electronic medical record (EMR) data. Health information may include information about an individual's diseases, such as the presence or absence of a disease, the name of the disease, and its staging. Treatment information may include information from an individual's medical records, such as whether medication was prescribed, the name of the prescribed medication, the duration and dosage of the medication, whether surgery was performed, and the name and details of the surgery.

In an embodiment, medical data may include variables for a specific person's physical information, health information, treatment information, and the like. Medical data may also include disease-related variables such as lifestyle habits (drinking, smoking, etc.), family history, age, sex, cholesterol level, and gene-related variables. The computing device 100 may set a research subject and purpose. The computing device 100 may also set independent variables and dependent variables from among the variables included in the medical data according to the research subject and purpose. In an embodiment, an independent variable may refer to a variable that acts as a cause, and a dependent variable may refer to an outcome variable whose value is determined by the independent variable. The independent variables and the dependent variables may each include at least one variable. The first medical dataset of the first medical institution and the second medical dataset of the second medical institution may include the preset independent variables and the preset dependent variables.

For example, if the purpose of the study is to determine whether pioglitazone causes side effects in patients with non-alcoholic fatty liver disease, the computing device 100 may set the research subject to patients with non-alcoholic fatty liver disease who have taken pioglitazone and, in order to generate the synthetic dataset, obtain from the first and second medical institutions medical datasets containing the medical data of such patients. In this case, the computing device 100 may set as independent variables the presence or absence of non-alcoholic fatty liver disease, its stage, its duration, whether pioglitazone was taken, the pioglitazone dose, and the period over which pioglitazone was taken. The computing device 100 may set as dependent variables the variables related to side effects such as weight gain and edema (e.g., body weight, presence or absence of edema).

In an embodiment, the first medical dataset and the second medical dataset may be medical datasets converted from a first raw medical dataset and a second raw medical dataset, respectively, into the data structure of a Common Data Model (CDM). That is, the first medical dataset corresponds to the first raw medical dataset, and the second medical dataset corresponds to the second raw medical dataset. In an embodiment, a raw medical dataset may be an unstructured medical dataset whose data structure differs from one medical institution to another. In an embodiment, a common data model may refer to a data model that imposes the same data structure and standards on the medical datasets, with differing data structures, held by each of a plurality of medical institutions. Examples of common data models include OMOP-CDM, Sentinel-CDM, and PCORnet CDM. The variables in the raw medical data may be converted or classified into standardized data formats, labels, classes, and the like, yielding medical data with the data structure of the common data model. This allows the medical datasets of the plurality of medical institutions to be processed together in a uniform manner.
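The conversion of site-specific raw records into one shared structure can be sketched as a field-mapping step. Everything below is hypothetical: the raw field names, the common field names, and the code table are invented for illustration and are not the actual tables or columns of OMOP-CDM, Sentinel-CDM, or PCORnet CDM.

```python
# Hypothetical per-institution mappings from raw field names to common field names.
SITE_A_MAPPING = {"pt_sex": "gender", "pt_age": "age", "wt_kg": "weight_kg"}
SITE_B_MAPPING = {"sex": "gender", "age_years": "age", "weight": "weight_kg"}

# Hypothetical standardized labels for one coded variable.
GENDER_CODES = {"M": "male", "F": "female", "male": "male", "female": "female"}


def to_common_model(raw_record, mapping):
    """Rename mapped fields, standardize coded values, and drop unmapped fields."""
    converted = {}
    for raw_key, value in raw_record.items():
        if raw_key not in mapping:
            continue  # field has no counterpart in the common structure
        key = mapping[raw_key]
        if key == "gender":
            value = GENDER_CODES.get(str(value), "unknown")
        converted[key] = value
    return converted


record_a = to_common_model({"pt_sex": "M", "pt_age": 54, "wt_kg": 81.5, "memo": "n/a"}, SITE_A_MAPPING)
record_b = to_common_model({"sex": "female", "age_years": 61, "weight": 67.0}, SITE_B_MAPPING)
print(record_a, record_b)
```

After conversion, records from both institutions share the same keys and value labels, so later steps can treat them uniformly.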

In an embodiment, the computing device 100 may convert the first raw medical dataset and the second raw medical dataset into the first medical dataset and the second medical dataset having the data structure of the common data model. In another embodiment, the computing device 100 may obtain, from the first and second medical institutions, a first medical dataset and a second medical dataset that already have the data structure of the common data model.

In an embodiment, the computing device 100 may determine the required number of data items in a medical dataset used to generate the integrated synthetic dataset, based on at least one of a preset effect size, a preset statistical power, and a preset significance level. The first medical dataset and the second medical dataset may each contain at least the required number of medical data items. For example, the required number of data items may refer to the sample size needed to obtain meaningful results about the population. In an embodiment, the effect size may be the difference between the mean values of the first and second medical datasets under study, divided by the standard deviation of the first medical dataset or of the second medical dataset. In an embodiment, the statistical power may be a value that expresses, as a probability, how likely the analysis is to reach a statistically significant conclusion. In an embodiment, the significance level is 1 minus the confidence level; for example, when the confidence level is 95%, the significance level is 1 - 0.95 = 0.05. The computing device 100 may set the effect size, statistical power, and significance level, and determine the required number of data items for each of the first and second medical datasets based on the values set.
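The text does not state how the required number of data items is computed from the three quantities. As one possible sketch, the snippet below uses the standard normal-approximation formula for a two-sided comparison of two group means, n = 2((z_{1-α/2} + z_{power}) / d)^2 per group, where d is the standardized effect size. The choice of this formula and the function name `required_sample_size` are assumptions, not the method of the disclosure.

```python
import math
from statistics import NormalDist


def required_sample_size(effect_size, power, significance_level):
    """Per-group sample size under the normal approximation (two-sided test).

    Assumption: two groups of equal size compared on their means.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - significance_level / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)  # round up so the target power is met


# Medium effect size, 80% power, 5% significance level.
print(required_sample_size(effect_size=0.5, power=0.8, significance_level=0.05))
```

Smaller effect sizes, higher power, or stricter significance levels all increase the required number of data items. An exact calculation based on the t distribution would give a slightly larger value.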

In an embodiment, if the medical dataset of one of the plurality of medical institutions contains fewer than the required number of medical data items that include the preset independent variables and preset dependent variables for the research subject and purpose, the computing device 100 may exclude that institution's medical dataset from the generation of the integrated synthetic dataset.
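The exclusion rule above can be sketched as a simple count per institution. The record layout (a list of dictionaries per institution), the variable names, and the function name `eligible_institutions` are assumptions for illustration.

```python
def eligible_institutions(datasets, required_vars, required_count):
    """Keep institutions holding at least `required_count` records with all required variables."""
    eligible = {}
    for institution, records in datasets.items():
        usable = sum(
            1 for record in records
            if all(record.get(var) is not None for var in required_vars)
        )
        if usable >= required_count:
            eligible[institution] = records
    return eligible


# Hypothetical datasets: institution B has only one usable record.
datasets = {
    "A": [{"dose_mg": 15, "weight_kg": 70.0}, {"dose_mg": 30, "weight_kg": 82.0}],
    "B": [{"dose_mg": 15, "weight_kg": None}, {"dose_mg": 30, "weight_kg": 91.0}],
}
print(list(eligible_institutions(datasets, ["dose_mg", "weight_kg"], required_count=2)))
```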

In an embodiment, the computing device 100 may delete from the first medical dataset any first-1 medical data, that is, first medical data lacking at least one of the preset dependent variables and the preset independent variables. Likewise, the computing device 100 may delete from the second medical dataset any second-1 medical data, that is, second medical data lacking at least one of the preset dependent variables and the preset independent variables.

In an embodiment, the computing device 100 may delete from the first medical dataset any first-2 medical data, that is, first medical data containing a value that deviates from a clinical value in the clinical prior information by more than a preset threshold. Likewise, the computing device 100 may delete from the second medical dataset any second-2 medical data, that is, second medical data containing a value that deviates from a clinical value in the clinical prior information by more than a preset threshold. For example, medical data containing a value beyond the preset threshold may be medical data containing a data entry error.
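The two cleaning steps described above (removing records that lack a required variable, then removing records whose values lie too far from a clinical reference value) can be sketched together. The record layout, the variable names, and the `(reference, threshold)` representation of the clinical prior information are assumptions made for this example.

```python
def clean_dataset(records, required_vars, clinical_reference):
    """Drop records missing a required variable or deviating beyond a threshold.

    clinical_reference maps a variable name to (reference_value, threshold);
    a record is dropped when abs(value - reference_value) > threshold.
    """
    cleaned = []
    for record in records:
        if any(record.get(var) is None for var in required_vars):
            continue  # missing an independent or dependent variable
        deviates = any(
            abs(record[var] - reference) > threshold
            for var, (reference, threshold) in clinical_reference.items()
            if record.get(var) is not None
        )
        if deviates:
            continue  # likely a data entry error
        cleaned.append(record)
    return cleaned


# Hypothetical records: the second lacks a dose, the third has an implausible weight.
records = [
    {"dose_mg": 15, "weight_kg": 82.0},
    {"dose_mg": None, "weight_kg": 75.0},
    {"dose_mg": 30, "weight_kg": 820.0},
]
cleaned = clean_dataset(records, ["dose_mg", "weight_kg"], {"weight_kg": (70.0, 130.0)})
print(cleaned)
```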

In step S302, the computing device 100 may generate an integrated synthetic dataset from the first medical dataset and the second medical dataset. The integrated synthetic dataset may include the preset independent variables and the preset dependent variables. In one embodiment, the integrated synthetic dataset may be a synthetic dataset generated to follow the probability distribution of the preset dependent variable as a function of the preset independent variable, as calculated from the first medical dataset and the second medical dataset. For example, when the preset dependent variable includes a plurality of dependent variables, the integrated synthetic dataset may be generated to follow the probability distribution of each of the plurality of dependent variables as a function of the preset independent variable. In one embodiment, the computing device 100 may transmit the integrated synthetic dataset to an external device or an external server. This achieves the technical effect that the research institution or researcher who requested the analysis can be provided with a synthetic dataset based on actual medical datasets.

In one embodiment, the computing device 100 may obtain clinical prior information that includes information about the correlation between the preset independent variable and the preset dependent variable. In one embodiment, the clinical prior information may include prior clinical studies, clinical statistics, and the like for the preset research subject and purpose. For example, the clinical prior information may include information on the study period, disease (or drug) code information for the target disease under study, information on the prevalence and incidence of the target disease by year and by medical institution, and statistical data on the preset independent variable and the preset dependent variable (for example, statistics, statistical methods, statistical data, statistical ranges, and statistical units). In one embodiment, the clinical prior information may include clinical statistics in which clinical values are recorded. For example, the clinical statistics may include information about the number of clinical medical data records.

In one embodiment, the information about the correlation between the preset independent variable and the preset dependent variable may be information about how the value of the preset dependent variable changes with the preset independent variable. For example, the preset independent variable may include a plurality of independent variables, and a corresponding weight may be assigned to each of them. The weight corresponding to each of the plurality of independent variables may reflect the correlation between the preset independent variable and the preset dependent variable. For example, when the preset independent variable and the preset dependent variable are linearly proportional, the weight corresponding to each of the plurality of independent variables may be a proportionality constant. Based on the weights for the preset independent variables, the probability distribution of the preset dependent variable as a function of the preset independent variable may be obtained.

In one embodiment, the computing device 100 may generate the integrated synthetic dataset from the obtained clinical prior information, the first medical dataset, and the second medical dataset. For example, the computing device 100 may generate an integrated synthetic dataset that follows both the probability distribution of the preset dependent variable as a function of the preset independent variable in the clinical statistics included in the clinical prior information, and the corresponding probability distribution in the first medical dataset and the second medical dataset. This achieves the technical effect that clinical prior information can be used to generate a synthetic dataset that follows a probability distribution based on clinical values. It also achieves the technical effect that, rather than relying on clinical values alone, medical data from actual medical institutions can be used to generate a synthetic dataset that follows a probability distribution based on real data.

In one embodiment, when the preset independent variable and the preset dependent variable are both numeric variables, the computing device 100 may calculate, from the first medical dataset and the second medical dataset, the probability distribution of the preset dependent variable's values for each numerical interval of the preset independent variable. The computing device 100 may also calculate this per-interval probability distribution from the clinical prior information together with the first medical dataset and the second medical dataset. The computing device 100 may generate an integrated synthetic dataset that follows the calculated probability distribution of the preset dependent variable's values for each numerical interval of the preset independent variable. In this case, the integrated synthetic dataset may have a weight corresponding to each numerical interval of the preset independent variable, determined by the probability distribution of the preset dependent variable's values. For example, the weight may correspond to the mean value of the preset dependent variable within the given numerical interval of the preset independent variable.

For example, when the research purpose is to study the correlation between obesity and diabetes, the independent variable may be set to the BMI (Body Mass Index) value and the dependent variable to the blood glucose level. The computing device 100 may calculate the probability distribution of blood glucose levels for each BMI interval from the first medical dataset and the second medical dataset, in which each medical data record includes a BMI value and a blood glucose level. The computing device 100 may generate an integrated synthetic dataset having the calculated probability distribution.
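The BMI and blood glucose example above can be sketched in code. This is only an illustration under stated assumptions: the BMI interval edges, the choice of a normal distribution within each interval, and the function names are not taken from the disclosure, which does not fix a particular distribution family or sampling method.

```python
import random
import statistics

# Assumed BMI interval edges; the disclosure does not specify the intervals.
BMI_EDGES = [0.0, 18.5, 25.0, 30.0, float("inf")]


def bin_index(value, edges=BMI_EDGES):
    """Return the index of the interval [edges[i], edges[i+1]) containing value."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    raise ValueError(f"value {value!r} is outside all intervals")


def fit_bins(records):
    """records: iterable of (bmi, glucose) pairs pooled across institutions.

    Returns {interval index: (count, mean glucose, population std dev)}.
    The per-interval mean plays the role of the per-interval "weight".
    """
    groups = {}
    for bmi, glucose in records:
        groups.setdefault(bin_index(bmi), []).append(glucose)
    return {
        b: (len(v), statistics.fmean(v), statistics.pstdev(v))
        for b, v in groups.items()
    }


def sample_synthetic(stats, n, rng):
    """Draw n synthetic (interval index, glucose) pairs.

    Intervals are chosen in proportion to their observed counts; glucose is
    drawn from a normal distribution per interval (an assumption of this sketch).
    """
    bins = sorted(stats)
    counts = [stats[b][0] for b in bins]
    synthetic = []
    for _ in range(n):
        b = rng.choices(bins, weights=counts)[0]
        _, mean, sd = stats[b]
        synthetic.append((b, rng.gauss(mean, sd)))
    return synthetic
```

Passing a seeded `random.Random` instance as `rng` keeps a run reproducible, while a different seed yields a different synthetic dataset drawn from the same fitted distribution.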

In one embodiment, when the preset independent variable is a numeric variable and the preset dependent variable is a logical variable that takes the value 1 (True) or 0 (False) depending on, for example, whether a disease (or side effect) occurs, the computing device 100 may calculate, from the first medical dataset and the second medical dataset, a probability distribution for the probability of occurrence of the preset dependent variable as a function of the value of the preset independent variable. The computing device 100 may also calculate this probability distribution from the clinical prior information together with the first medical dataset and the second medical dataset. For example, a preset dependent variable of 1 (True) may mean that the disease (or side effect) corresponding to the preset dependent variable has occurred, and a value of 0 (False) may mean that it has not occurred. The computing device 100 may generate an integrated synthetic dataset that follows the calculated probability distribution for the probability of occurrence of the preset dependent variable as a function of the value of the preset independent variable. In this case, the integrated synthetic dataset may have a weight corresponding to the preset independent variable, determined by the probability distribution for the probability of occurrence of the preset dependent variable. For example, the weight may correspond to the probability of occurrence of the preset dependent variable for a given value of the preset independent variable.

For example, when the research purpose is to study the correlation between obesity and diabetes, the independent variable may be set to the BMI (Body Mass Index) value and the dependent variable to the presence or absence of diabetes. The computing device 100 may calculate a probability distribution for the probability of developing diabetes as a function of BMI from the first medical dataset and the second medical dataset, in which each medical data record includes a BMI value and the presence or absence of diabetes. The computing device 100 may generate an integrated synthetic dataset having the calculated probability distribution.

In one embodiment, when the preset independent variable is a logical variable that takes the value 1 (True) or 0 (False) depending on, for example, whether a disease has occurred or whether a drug has been prescribed, and the preset dependent variable is a numeric variable, the computing device 100 may calculate, from the first medical dataset and the second medical dataset, the probability distribution of the preset dependent variable's values for the preset independent variable having the value 1 (True). The computing device 100 may also calculate this probability distribution from the clinical prior information together with the first medical dataset and the second medical dataset. The computing device 100 may generate an integrated synthetic dataset that follows the calculated probability distribution of the preset dependent variable's values as a function of the preset independent variable. In this case, the integrated synthetic dataset may have a weight corresponding to the preset independent variable, determined by the probability distribution of the preset dependent variable's values. For example, the weight may correspond to the mean value of the preset dependent variable for the given preset independent variable.

For example, when studying the side effects of pioglitazone in patients with non-alcoholic fatty liver disease, the independent variables may be set to the presence or absence of non-alcoholic fatty liver disease and whether pioglitazone was taken, and the dependent variable to the amount of weight gain. The computing device 100 may calculate the probability distribution of weight gain associated with pioglitazone use in patients with non-alcoholic fatty liver disease from the first medical dataset and the second medical dataset, which include medical data on such patients who took pioglitazone. The computing device 100 may generate an integrated synthetic dataset having the calculated probability distribution.

In one embodiment, when the preset independent variable and the preset dependent variable are both logical variables, the computing device 100 may calculate, from the first medical dataset and the second medical dataset, the probability of occurrence of the preset dependent variable as a function of the preset independent variable. The computing device 100 may also calculate this probability of occurrence from the clinical prior information together with the first medical dataset and the second medical dataset. The computing device 100 may generate an integrated synthetic dataset having the calculated probability of occurrence of the preset dependent variable as a function of the preset independent variable. In this case, the integrated synthetic dataset may have a weight corresponding to the preset independent variable, determined by the probability of occurrence of the preset dependent variable. For example, the weight may correspond to the probability of occurrence of the preset dependent variable for the given preset independent variable.

For example, when the research purpose is to study the correlation between obesity and diabetes, the independent variable may be set to whether the patient is obese and the dependent variable to the presence or absence of diabetes. The computing device 100 may calculate a probability distribution for the probability of developing diabetes as a function of obesity status from the first medical dataset and the second medical dataset, in which each medical data record includes obesity status and the presence or absence of diabetes. The computing device 100 may generate an integrated synthetic dataset having the calculated probability distribution.
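For the case in which both variables are logical, the probability of occurrence reduces to a conditional relative frequency. The sketch below illustrates this with the obesity and diabetes example; the tuple layout `(obese, diabetic)` and the function names are assumptions made for illustration, and sampling each synthetic outcome as an independent Bernoulli draw is one simple choice rather than a method stated in the disclosure.

```python
import random


def occurrence_probability(records):
    """records: iterable of (obese, diabetic) boolean pairs.

    Returns {obese: estimated P(diabetic | obese)} as a relative frequency.
    """
    counts = {}
    for x, y in records:
        total, events = counts.get(x, (0, 0))
        counts[x] = (total + 1, events + int(y))
    return {x: events / total for x, (total, events) in counts.items()}


def sample_synthetic(prob, x_values, rng):
    """For each given independent-variable value, draw a synthetic outcome.

    Each outcome is True with the estimated occurrence probability for that
    value (independent Bernoulli draws, an assumption of this sketch).
    """
    return [(x, rng.random() < prob[x]) for x in x_values]
```

With three obese records of which two are diabetic, the estimated weight for the obese group is 2/3, and synthetic obese patients are then marked diabetic with that probability.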

In one embodiment, each of the preset independent variable and the preset dependent variable may include at least one of a numeric variable and a logical variable. For example, the computing device 100 may calculate the weight corresponding to each of a plurality of independent variables according to the methods described above for calculating the probability distribution (or probability of occurrence) of the preset dependent variable as a function of the preset independent variable. In one embodiment, the mean value of each of a plurality of dependent variables may be calculated as a weighted average obtained by multiplying each of the plurality of independent variables by its corresponding weight. For example, where W1, …, Wn are the weights corresponding to the respective independent variables and X1, …, Xn are the plurality of independent variables, the mean value of each dependent variable may be obtained as (W1*X1 + W2*X2 + … + Wn*Xn)/(W1 + W2 + … + Wn).
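As a worked instance of the weighted-average formula above, the short sketch below computes (W1*X1 + … + Wn*Xn)/(W1 + … + Wn). The function name and the sample numbers are illustrative only.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Example: X = (1, 2, 3), W = (1, 1, 2) gives (1 + 2 + 6) / 4 = 2.25.
example = weighted_mean([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
```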

In one embodiment, the computing device 100 may generate, from the first medical dataset and the second medical dataset, a first synthetic dataset corresponding to the first medical dataset and a second synthetic dataset corresponding to the second medical dataset. The way in which the computing device 100 generates a synthetic dataset for each of a plurality of medical institutions, and then generates the integrated synthetic dataset from those per-institution synthetic datasets, is described in detail below with reference to FIG. 4.

In step S303, the computing device 100 may obtain, based on the target medical dataset of a target medical institution, an analysis result for the target medical dataset from the integrated synthetic dataset. In one embodiment, the target medical institution may be the medical institution for which an analysis result is to be obtained. For example, the computing device 100 may obtain information about the target medical institution from a research institution or researcher seeking an analysis result for that institution. The computing device 100 may obtain the information about the target medical institution from an external device or an external server. In one embodiment, the target medical institution may be one of the first medical institution and the second medical institution. In another embodiment, the target medical institution may be a medical institution other than the first medical institution and the second medical institution. The analysis result for the target medical dataset may include information about the correlation between the preset independent variable and the preset dependent variable in the target medical dataset. In one embodiment, the computing device 100 may transmit the analysis result for the target medical dataset to an external device or an external server. This achieves the technical effect that the research institution or researcher who requested the analysis can be provided with an analysis result in which the target medical dataset is reflected in the integrated synthetic dataset.

For example, the computing device 100 may obtain the analysis result for the target medical dataset by reflecting the probability distribution of the preset dependent variable in the target medical dataset in the probability distribution of the preset dependent variable in the integrated synthetic dataset. For example, the computing device 100 may obtain the analysis result for the target medical dataset by reflecting a second weight, which corresponds to the preset independent variable in the target medical dataset, in a first weight, which corresponds to the preset independent variable in the integrated synthetic dataset generated from the clinical prior information, the first medical dataset, and the second medical dataset. In this way, an analysis result can be obtained that reflects the probability distribution of the target medical dataset without being over-fitted to the target medical dataset.

In one embodiment, the computing device 100 may determine, from the first medical dataset and the second medical dataset, a first independent variable that is correlated with the preset dependent variable and is not included in the preset independent variables. The computing device 100 may calculate a relational expression for each of the plurality of variables included in the first medical dataset and the second medical dataset, and may determine a variable that is correlated with the preset dependent variable to be the first independent variable. In this case, the analysis result for the target medical dataset may include information about the correlation between the first independent variable and the preset dependent variable. This achieves the technical effect of identifying, among the variables correlated with the preset dependent variable, independent variables that the research institution or researcher had not considered.

In one embodiment, the computing device 100 may obtain, from the clinical prior information, the integrated synthetic dataset, and the target medical dataset, the weights to be reflected in the analysis result for the target medical dataset. The method for obtaining these weights is described in detail below with reference to FIG. 5.

In one embodiment, the computing device 100 may generate a plurality of integrated synthetic datasets from the first medical dataset and the second medical dataset. For convenience of explanation, the following describes, as an example, how the computing device 100 generates a first integrated synthetic dataset and a second integrated synthetic dataset from the first medical dataset and the second medical dataset.

In one embodiment, the computing device 100 may generate a first integrated synthetic dataset from the first medical dataset and the second medical dataset. The computing device 100 may also generate a second integrated synthetic dataset from the first medical dataset and the second medical dataset. Based on the target medical dataset, the computing device 100 may obtain an analysis result for the target medical dataset from the first integrated synthetic dataset and the second integrated synthetic dataset. The first integrated synthetic dataset may include a 1-1 weight corresponding to the preset independent variable, and the second integrated synthetic dataset may include a 1-2 weight corresponding to the preset independent variable. The 1-1 weight and the 1-2 weight may differ from each other. For example, when an integrated synthetic dataset is generated from the first medical dataset and the second medical dataset, the values of the preset independent variable and the preset dependent variable in each generated synthetic data record may differ from one another within the set probability distribution. Accordingly, each time an integrated synthetic dataset is generated from the first medical dataset and the second medical dataset, the values of the synthetic data it contains and its first weight may differ.

In one embodiment, the preset dependent variables included in the first integrated synthetic dataset and the second integrated synthetic dataset may have probability distributions based on different means and standard deviations. For example, when the 1-1 weight and the 1-2 weight corresponding to the first integrated synthetic dataset and the second integrated synthetic dataset, respectively, differ from each other, the probability distributions of the two integrated synthetic datasets may also differ.

In one embodiment, the computing device 100 may obtain a first analysis result for the target medical dataset from the first integrated synthetic dataset. The computing device 100 may also obtain a second analysis result for the target medical dataset from the second integrated synthetic dataset. The first analysis result and the second analysis result may have different third weights and probability distributions for the preset dependent variable. That is, the first analysis result and the second analysis result for the target medical dataset may differ from each other. The computing device 100 may obtain the analysis result for the target medical dataset based on the first analysis result and the second analysis result. For example, the analysis result for the target medical dataset may be an analysis result that includes both the first analysis result and the second analysis result. In one embodiment, the computing device 100 may transmit the analysis result for the target medical dataset, including the first analysis result and the second analysis result, to an external device or an external server. This achieves the technical effect that, by generating a plurality of synthetic datasets from the first medical dataset and the second medical dataset and performing multiple simulations on the target medical dataset based on those synthetic datasets, a highly reliable analysis result can be provided to the research institution or researcher who requested the analysis.
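The repeated-generation idea above can be illustrated with a small simulation. In this sketch each "integrated synthetic dataset" is simply a list of dependent-variable values drawn from a normal distribution, and each "weight" is the sample mean of one synthetic dataset; both simplifications, along with the function names and the use of seeds to distinguish runs, are assumptions made for illustration.

```python
import random
import statistics


def generate_synthetic(mean, sd, n, rng):
    """Draw one synthetic dataset of n dependent-variable values."""
    return [rng.gauss(mean, sd) for _ in range(n)]


def run_simulations(mean, sd, n, seeds):
    """Generate one synthetic dataset per seed and return each dataset's mean.

    The fitted distribution (mean, sd) is the same for every run, but the
    sampled values differ by seed, so the per-run estimates differ slightly.
    """
    results = []
    for seed in seeds:
        synthetic = generate_synthetic(mean, sd, n, random.Random(seed))
        results.append(statistics.fmean(synthetic))
    return results
```

Reporting all per-run estimates, or their spread, gives the requester a sense of how much the result depends on the particular synthetic sample.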

FIG. 4 is a schematic diagram illustrating a process by which a computing device obtains an analysis result for a target medical dataset according to an embodiment of the present disclosure.

Referring to FIG. 4, the computing device 100 according to an embodiment may obtain a first medical dataset 401a, a second medical dataset 401b, and a third medical dataset 401c from a first medical institution, a second medical institution, and a third medical institution, respectively. The computing device 100 may generate a first synthetic dataset 402a from the first medical dataset 401a, a second synthetic dataset 402b from the second medical dataset 401b, and a third synthetic dataset 402c from the third medical dataset 401c. The computing device 100 may generate an integrated synthetic dataset 403 by integrating the first synthetic dataset 402a, the second synthetic dataset 402b, and the third synthetic dataset 402c.

In one embodiment, the computing device 100 may obtain an analysis result 405 for the target medical dataset 404 from the integrated synthetic dataset 403, based on the target medical dataset 404 of the target medical institution. For example, the computing device 100 may obtain the analysis result 405 for the target medical dataset 404 by reflecting the probability distribution of the preset dependent variable in the target medical dataset 404 in the probability distribution of the preset dependent variable in the integrated synthetic dataset 403. Hereinafter, for convenience of explanation, the method by which the computing device 100 generates the integrated synthetic dataset 403 is described using the first medical dataset 401a and the second medical dataset 401b as an example. However, the disclosure is not limited thereto, and the computing device 100 may generate the integrated synthetic dataset 403 from the first medical dataset 401a, the second medical dataset 401b, and the third medical dataset 401c in the same manner.

In one embodiment, the computing device 100 may generate, from the first medical dataset 401a and the second medical dataset 401b, a first synthetic dataset 402a corresponding to the first medical dataset 401a and a second synthetic dataset 402b corresponding to the second medical dataset 401b. The computing device 100 may generate the integrated synthetic dataset 403 by integrating the first synthetic dataset 402a and the second synthetic dataset 402b.

In one embodiment, the computing device 100 may use a pre-trained GAN model to generate the first synthetic dataset 402a and the second synthetic dataset 402b from the first medical dataset 401a and the second medical dataset 401b. For example, the computing device 100 may use the pre-trained GAN model to generate a first synthetic dataset 402a that has the probability distribution of the preset dependent variable in the first medical dataset 401a. Likewise, the computing device 100 may use the pre-trained GAN model to generate a second synthetic dataset 402b that has the probability distribution of the preset dependent variable in the second medical dataset 401b.

In one embodiment, the computing device 100 may generate a clinical medical dataset that follows the probability distribution of the preset dependent variable in the clinical prior information. The computing device 100 may use the pre-trained GAN model to generate the first synthetic dataset 402a from the first medical dataset 401a and the clinical medical dataset. The computing device 100 may determine the number of clinical medical datasets to generate based on a preset first reflection ratio between the clinical prior information and the first medical dataset 401a. For example, the number of clinical medical datasets to be generated may be determined based on the number of first medical datasets 401a and the preset first reflection ratio. The computing device 100 may generate the second synthetic dataset 402b and the third synthetic dataset 402c from the second medical dataset 401b and the third medical dataset 401c in the same manner, using the pre-trained GAN model.

In one embodiment, the computing device 100 may use the SMOTE (Synthetic Minority Oversampling Technique) algorithm to generate the first synthetic dataset 402a and the second synthetic dataset 402b from the first medical dataset 401a and the second medical dataset 401b. In one embodiment, the SMOTE algorithm may refer to an algorithm that, for data sampled from a dataset, uses the K-NN (K-Nearest Neighbor) algorithm to find the K nearest neighbors belonging to the same class, and then generates, as synthetic data, data corresponding to an arbitrary point on the line segment connecting the sampled data to each of the K nearest neighbors.

For example, in a plane or space whose axes are a plurality of independent variables, the computing device 100 may select an arbitrary point on the line segment connecting first medical data included in the first medical dataset 401a to each of its nearest-neighbor medical data. The computing device 100 may generate first synthetic data whose values for each of the plurality of independent variables correspond to the selected point. In this case, the dependent variable of the first synthetic data may have the same class as the dependent-variable class of the first medical data and the nearest-neighbor medical data. For example, the dependent-variable class may be a class indicating whether a disease (or side effect) occurs. The computing device 100 may generate the first synthetic dataset 402a from a plurality of first synthetic data. The computing device 100 may generate the second synthetic dataset 402b and the third synthetic dataset 402c from the second medical dataset 401b and the third medical dataset 401c in the same manner, using the SMOTE algorithm.
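The interpolation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the disclosure: the function name `smote_sample`, the choice of Euclidean distance, the parameter defaults, and the example records (age and systolic blood pressure, both made up) are all assumptions.

```python
import math
import random

def smote_sample(records, labels, k=3, n_new=5, seed=0):
    """Generate up to n_new synthetic (features, label) pairs by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(records))
        base, cls = records[i], labels[i]
        # Candidate neighbours: other records with the same dependent-variable class.
        same = [r for j, r in enumerate(records) if j != i and labels[j] == cls]
        if not same:
            continue  # no same-class neighbour to interpolate with
        # K nearest neighbours by Euclidean distance (an assumed metric).
        neighbours = sorted(same, key=lambda r: math.dist(base, r))[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # position on the segment between base and neighbour
        point = tuple(b + t * (n - b) for b, n in zip(base, nb))
        synthetic.append((point, cls))  # synthetic record keeps the shared class
    return synthetic

# Hypothetical records: (age, systolic blood pressure); label 1 = disease present.
records = [(50, 120), (55, 130), (60, 125), (30, 110), (35, 115)]
labels = [1, 1, 1, 0, 0]
synthetic = smote_sample(records, labels, k=2, n_new=5)
```

Because each synthetic point lies on a segment between two real records of the same class, its independent-variable values stay within the range spanned by that class.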

In one embodiment, the computing device 100 may use a Bayesian algorithm based on Bayesian inference to generate the first synthetic dataset 402a and the second synthetic dataset 402b by reflecting the first medical dataset 401a and the second medical dataset 401b in the clinical prior information. For example, the computing device 100 may use the Bayesian algorithm to generate the first synthetic dataset 402a by reflecting the first medical dataset 401a in the clinical prior information. Likewise, the computing device 100 may use the Bayesian algorithm to generate the second synthetic dataset 402b by reflecting the second medical dataset 401b in the clinical prior information. In one embodiment, the Bayesian algorithm may refer to an algorithm that infers a posterior probability distribution by applying medical data to a prior probability distribution.

For example, the computing device 100 may obtain, based on the clinical prior information, a prior probability distribution of the preset dependent variable according to the preset independent variable. For the first medical data included in the first medical dataset 401a, the computing device 100 may set the probability distribution to a beta distribution, a gamma distribution, a normal distribution, or the like, obtain the likelihood, and apply it to the prior probability distribution. The computing device 100 may apply the first medical dataset 401a to the prior probability distribution to obtain a posterior probability distribution of the preset dependent variable according to the preset independent variable. The computing device 100 may generate the first synthetic dataset 402a based on the mean and standard deviation of the obtained posterior probability distribution. The computing device 100 may generate a first synthetic dataset 402a that follows a probability distribution whose mean and standard deviation are identical or similar to those of the obtained posterior probability distribution. The computing device 100 may generate the second synthetic dataset 402b and the third synthetic dataset 402c from the second medical dataset 401b and the third medical dataset 401c in the same manner, using the Bayesian algorithm.
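As a rough illustration of the prior-to-posterior step, the sketch below uses a normal prior on the mean with a known observation standard deviation (the normal-normal conjugate model). That model choice is an assumption made here for simplicity; the text also allows beta or gamma distributions and does not fix the update rule. The function names and all numeric values are hypothetical.

```python
import math
import random

def normal_posterior(prior_mean, prior_sd, data, obs_sd):
    """Posterior mean and SD of a normal mean, given a normal prior and known obs_sd."""
    prior_prec = 1.0 / prior_sd ** 2          # precision = 1 / variance
    data_prec = len(data) / obs_sd ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_mean * prior_prec + sum(data) / obs_sd ** 2) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

def draw_synthetic(mean, sd, n, seed=0):
    """Draw n synthetic values from a normal distribution with the given mean and SD."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]

# Hypothetical prior from clinical prior information, and hypothetical observed values.
post_mean, post_sd = normal_posterior(prior_mean=130.0, prior_sd=10.0,
                                      data=[120.0, 124.0, 126.0, 130.0], obs_sd=10.0)
synthetic_values = draw_synthetic(post_mean, post_sd, n=100)
```

With these made-up numbers the posterior mean falls between the prior mean (130) and the sample mean (125), and the posterior standard deviation is smaller than the prior's, which is the behavior the paragraph above relies on.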

For example, when the research objective is to study the blood-pressure-lowering effect of furosemide in patients with hypertension, the computing device 100 may set the independent variables to the presence or absence of hypertension, whether furosemide is taken, the duration of use, the dosage, and so on, and may set the dependent variable to systolic blood pressure (SBP). The computing device 100 may obtain a prior probability distribution of systolic blood pressure as a function of furosemide use, based on prior clinical studies or clinical statistics included in the clinical prior information. The computing device 100 may apply the first medical dataset and the second medical dataset to the prior probability distribution to obtain a posterior probability distribution of systolic blood pressure as a function of furosemide use. The computing device 100 may generate a first synthetic dataset 402a that follows a probability distribution based on the mean and standard deviation of the obtained posterior probability distribution.

FIG. 5 is a schematic diagram showing a process of obtaining the weight for a target medical institution according to an embodiment of the present disclosure.

In one embodiment, the computing device 100 may obtain, based on clinical prior information 501, an integrated synthetic dataset 502, and a target medical dataset 503, a third weight 504 that corresponds to the preset independent variable and is to be applied to the analysis result for the target medical dataset 503. In one embodiment, when the clinical prior information 501 has already been reflected in generating the integrated synthetic dataset 502, the computing device 100 may obtain the third weight 504 based on the integrated synthetic dataset 502 and the target medical dataset 503 of the target medical institution.

In one embodiment, the integrated synthetic dataset 502 may include a first weight corresponding to the preset independent variable. The computing device 100 may obtain a second weight corresponding to the preset independent variable from the target medical dataset 503. The computing device 100 may obtain the third weight 504 corresponding to the preset independent variable based on the first weight and the second weight. The computing device 100 may obtain the analysis result for the target medical dataset 503 from the integrated synthetic dataset 502, based on the preset independent variable, the preset dependent variable, and the third weight 504.

In one embodiment, the computing device 100 may obtain the third weight 504 based on, for example, the statistics, range, and units of the preset independent variable (and the preset dependent variable) in the clinical prior information 501; the number of data items and the standard deviation of the probability distribution of the target medical dataset 503; the values of the preset independent variable and the preset dependent variable in the target medical dataset 503; and the deviation from, and likelihood ratio relative to, the clinical prior information 501. For example, the computing device 100 may set the third weight 504 larger as the likelihood ratio of the target medical dataset 503 relative to the clinical prior information 501 increases, as the number of data items increases, and as the variance decreases. In other words, the clinical prior information may act as a global parameter for obtaining the third weight 504, and the target medical dataset may act as a local parameter for obtaining the third weight 504.

In one embodiment, the computing device 100 may determine the weighting degree of each of the first weight and the second weight based on the number of first medical data in the first medical dataset, the number of second medical data in the second medical dataset, and the number of target medical data in the target medical dataset 503. For example, the weighting degree of each of the first weight and the second weight may mean the ratio at which that weight is reflected in the third weight 504. Based on the number of synthetic data in the integrated synthetic dataset 502 and the number of target medical data in the target medical dataset 503, the computing device 100 may determine the weighting degrees so that the probability distribution of the dataset with more data is reflected more strongly in the third weight 504. In this case, the number of synthetic data in the integrated synthetic dataset 502 may be equal to the sum of the number of first medical data and the number of second medical data. The computing device 100 may obtain the third weight 504 based on the weighting degree of each of the first weight and the second weight.
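One simple way to realize "the dataset with more data is reflected more strongly" is a count-proportional linear blend of the two weights. The text does not state a formula, so the blend below, the function name `blend_weights`, and the example numbers are all assumptions made for illustration only.

```python
def blend_weights(w1, w2, n_first, n_second, n_target):
    """Blend the first and second weights in proportion to dataset sizes (assumed rule)."""
    # Per the text, the synthetic-data count may equal the sum of the source counts.
    n_synthetic = n_first + n_second
    total = n_synthetic + n_target
    share_synthetic = n_synthetic / total   # weighting degree of the first weight
    share_target = n_target / total         # weighting degree of the second weight
    w3 = share_synthetic * w1 + share_target * w2
    return w3, share_synthetic, share_target

# Hypothetical values: 300 + 500 source records versus 200 target records.
w3, share_synthetic, share_target = blend_weights(
    w1=0.8, w2=0.4, n_first=300, n_second=500, n_target=200)
```

Here the integrated synthetic dataset holds four times as many records as the target dataset, so the blended third weight sits much closer to the first weight than to the second.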

In one embodiment, the computing device 100 may obtain clinical prior information 501 that includes a clinical medical dataset in which clinical values are recorded. The computing device 100 may determine the weighting degree of each of the first weight and the second weight based on the number of clinical medical data in the clinical medical dataset, the number of first medical data in the first medical dataset, the number of second medical data in the second medical dataset, and the number of target medical data in the target medical dataset. The computing device 100 may compare the number of synthetic data in the integrated synthetic dataset 502 with the number of target medical data in the target medical dataset 503, and determine the weighting degrees so that the probability distribution of the dataset with more data is reflected more strongly. In this case, the number of synthetic data in the integrated synthetic dataset 502 may be equal to the sum of the number of clinical medical data, the number of first medical data, and the number of second medical data. The computing device 100 may obtain the third weight 504 based on the weighting degree of each of the first weight and the second weight.

In one embodiment, the computing device 100 may generate, from the integrated synthetic dataset 502, a target synthetic dataset corresponding to the target medical dataset 503, based on the preset independent variable, the preset dependent variable, and the third weight 504. The probability distribution of the preset dependent variable in the target synthetic dataset may be determined based on the third weight 504. The computing device 100 may obtain the analysis result for the target medical dataset 503 from the target synthetic dataset.

In one embodiment, the computing device 100 may determine, from the first medical dataset and the second medical dataset, a first independent variable that is correlated with the dependent variable and is not included among the preset independent variables. The computing device 100 may generate, from the integrated synthetic dataset 502, a target synthetic dataset corresponding to the target medical dataset 503, based on the preset independent variable, the first independent variable, the preset dependent variable, and the third weight 504. The computing device 100 may obtain the analysis result for the target medical dataset 503 from the target synthetic dataset. This achieves the technical effect of generating a synthetic dataset that includes, among the variables correlated with the preset dependent variable, independent variables that the research institution or researcher did not consider.

In one embodiment, the analysis result for the target medical dataset 503 may include the preset independent variable, the preset dependent variable, the third weight 504, and information about the correlation between the preset independent variable and the preset dependent variable. For example, the correlation between the independent variable and the dependent variable included in the analysis result may be obtained based on the third weight 504.

According to an embodiment of the present disclosure, a computer-readable medium storing a data structure is disclosed. A data structure may refer to the organization, management, and storage of data that enables efficient access to and modification of the data. A data structure may refer to an organization of data for solving a specific problem (for example, searching, storing, or modifying data in the shortest time). A data structure may also be defined as a physical or logical relationship between data elements, designed to support a specific data processing function. The logical relationship between data elements may include connections between user-defined data elements. The physical relationship between data elements may include the actual relationship between data elements physically stored in a computer-readable storage medium (for example, a persistent storage device). Specifically, a data structure may include a set of data, the relationships between the data, and the functions or instructions that can be applied to the data. With an effectively designed data structure, the computing device 100 can perform operations while using a minimum of its resources. Specifically, the computing device 100 can improve the efficiency of computation, reading, insertion, deletion, comparison, exchange, and search through an effectively designed data structure.

Data structures may be divided into linear and non-linear data structures according to their form. A linear data structure may be a structure in which only one piece of data follows each piece of data. Linear data structures may include a list, a stack, a queue, and a deque. A list may refer to a series of data with an internal order. A list may include a linked list. A linked list may be a data structure in which each piece of data holds a pointer and the data are connected in a single line. In a linked list, a pointer may contain connection information to the next or previous data. Depending on its form, a linked list may be a singly linked list, a doubly linked list, or a circular linked list. A stack may be a sequential data structure with restricted access to data. A stack may be a linear data structure in which data can be processed (for example, inserted or deleted) at only one end. A stack may be a data structure in which data stored later comes out earlier (LIFO, Last In First Out). A queue is a sequential data structure with restricted access to data in which, unlike a stack, data stored later comes out later (FIFO, First In First Out). A deque may be a data structure in which data can be processed at both ends.
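The access rules of the stack, queue, and deque described above can be illustrated with Python's built-in `list` and the standard library's `collections.deque`. The variable names and the sample items are illustrative only.

```python
from collections import deque

# Stack (LIFO): insert and remove at the same end.
stack = []
for item in ("a", "b", "c"):
    stack.append(item)
stack_order = [stack.pop() for _ in range(3)]       # last in, first out

# Queue (FIFO): insert at the back, remove from the front.
queue = deque()
for item in ("a", "b", "c"):
    queue.append(item)
queue_order = [queue.popleft() for _ in range(3)]   # first in, first out

# Deque: insert and remove at both ends.
dq = deque(["b"])
dq.appendleft("a")   # insert at the front
dq.append("c")       # insert at the back
front, back = dq.popleft(), dq.pop()
```

The same three items come back in reverse order from the stack and in insertion order from the queue, while the deque gives up its front and back items independently.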

A non-linear data structure may be a structure in which a plurality of pieces of data follow one piece of data. Non-linear data structures may include a graph data structure. A graph data structure may be defined by vertices and edges, where an edge may include a line connecting two different vertices. A graph data structure may include a tree data structure. A tree data structure may be a data structure in which exactly one path connects any two different vertices among the vertices included in the tree. In other words, it may be a graph data structure that does not form a loop.
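The tree property stated above (one path between any two vertices, no loop) can be checked for an undirected graph with a short routine. The sketch below relies on the standard fact that a connected graph with n vertices and n - 1 edges is a tree; the function name `is_tree` and the vertex numbering from 0 are assumptions.

```python
def is_tree(num_vertices, edges):
    """Return True if the undirected graph is connected and has no loop."""
    # A tree on n vertices has exactly n - 1 edges; the empty graph is rejected here.
    if num_vertices == 0 or len(edges) != num_vertices - 1:
        return False
    adjacency = {v: [] for v in range(num_vertices)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # Depth-first traversal from vertex 0 to test connectivity.
    seen = {0}
    frontier = [0]
    while frontier:
        node = frontier.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen) == num_vertices

star = is_tree(4, [(0, 1), (1, 2), (1, 3)])               # connected, no loop
loop_plus_isolated = is_tree(4, [(0, 1), (1, 2), (2, 0)])  # loop, vertex 3 unreachable
triangle = is_tree(3, [(0, 1), (1, 2), (2, 0)])           # loop
```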

Throughout this specification, the terms computational model, neural net, network function, and neural network may be used interchangeably. Hereinafter, the term neural network is used. A data structure may include a neural network, and a data structure including a neural network may be stored in a computer-readable medium. A data structure including a neural network may also include data preprocessed for processing by the neural network, data input to the neural network, weights of the neural network, hyperparameters of the neural network, data obtained from the neural network, an activation function associated with each node or layer of the neural network, a loss function for training the neural network, and the like. A data structure including a neural network may include any of the components listed above; that is, it may be configured to include all of these items or any combination of them. In addition to the above, a data structure including a neural network may include any other information that determines the characteristics of the neural network. Furthermore, the data structure may include any type of data used or generated in the computation of the neural network, and is not limited to the foregoing. The computer-readable medium may include a computer-readable recording medium and/or a computer-readable transmission medium. A neural network may generally consist of a set of interconnected computational units, which may be referred to as nodes. These nodes may also be referred to as neurons. A neural network includes at least one node.

A data structure may include data input to the neural network. A data structure including data input to the neural network may be stored in a computer-readable medium. Data input to the neural network may include training data input during neural network training and/or input data fed to a neural network whose training is complete. Data input to the neural network may include data that has undergone preprocessing and/or data that is subject to preprocessing. Preprocessing may include a data processing step for feeding data into the neural network. Accordingly, the data structure may include the data subject to preprocessing and the data produced by preprocessing. The data structure described above is merely an example, and the present disclosure is not limited thereto.

A data structure may include the weights of the neural network. (In this specification, the terms weight and parameter may be used interchangeably.) A data structure including the weights of the neural network may be stored in a computer-readable medium. A neural network may include a plurality of weights. The weights may be variable and may be changed by a user or an algorithm so that the neural network performs a desired function. For example, when one or more input nodes are each connected to one output node by a link, the output node may determine its output value based on the values input to the connected input nodes and the weights set on the links corresponding to those input nodes. The data structure described above is merely an example, and the present disclosure is not limited thereto.
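The relationship between input values, link weights, and the output node's value can be made concrete with a weighted sum. This is a minimal sketch: the function name `output_value`, the optional bias term, and the omission of an activation function are assumptions, and the numbers are illustrative.

```python
def output_value(inputs, weights, bias=0.0):
    """Weighted sum of input-node values using the weight set on each link."""
    if len(inputs) != len(weights):
        raise ValueError("each input node needs exactly one link weight")
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Three input nodes connected to one output node (hypothetical values).
y = output_value(inputs=[1.0, 2.0, 3.0], weights=[0.5, -1.0, 0.25])
```

Changing any one link weight changes `y`, which is why adjusting the weights lets the network perform a desired function.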

By way of example and not limitation, the weights may include weights that are varied during the neural network training process and/or weights for which neural network training has been completed. Weights varied during training may include the weights at the start of a training cycle and/or the weights as they change during the training cycle. Weights for which training has been completed may include the weights at the completion of a training cycle. Accordingly, a data structure including the weights of the neural network may include a data structure containing the weights varied during training and/or the weights for which training has been completed. The above-described weights, and/or any combination of them, are therefore deemed to be included in the data structure including the weights of the neural network. The above-described data structure is only an example, and the present disclosure is not limited thereto.

The data structure including the weights of the neural network may be stored in a computer-readable storage medium (e.g., memory, hard disk) after undergoing serialization. Serialization may be the process of converting a data structure into a form that can be stored on the same or a different computing device 100 and later reconstructed for use. The computing device 100 may serialize the data structure in order to transmit and receive data over a network. The serialized data structure including the weights of the neural network may be reconstructed on the same computing device 100 or on another computing device 100 through deserialization. The data structure including the weights of the neural network is not limited to serialization. Furthermore, the data structure including the weights of the neural network may include a data structure for increasing computational efficiency while minimizing the use of the resources of the computing device 100 (e.g., among non-linear data structures, a B-Tree, Trie, m-way search tree, AVL tree, or Red-Black Tree). The foregoing is merely an example, and the present disclosure is not limited thereto.
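The serialization and deserialization round trip described above can be sketched as follows. This is only an illustration: the choice of JSON as the serialized form, the layer names, and the weight values are assumptions, and the disclosure does not name a particular format.

```python
import json

# Hypothetical weights of a small network, keyed by layer name.
weights = {
    "layer1": [[0.1, -0.2], [0.4, 0.3]],
    "layer2": [[0.7], [-0.5]],
}

# Serialization: convert the data structure into bytes that can be
# written to a storage medium or sent over a network.
serialized = json.dumps(weights).encode("utf-8")

# Deserialization: reconstruct the data structure on the same or
# another computing device.
restored = json.loads(serialized.decode("utf-8"))
```

JSON is used here because it is text-based and safe to load from untrusted sources; a binary format could be substituted where size or speed matters.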

The data structure may include hyper-parameters of the neural network. A data structure including the hyper-parameters of the neural network may be stored in a computer-readable medium. A hyper-parameter may be a variable that is varied by the user. Hyper-parameters may include, for example, the learning rate, the cost function, the number of training cycle iterations, the weight initialization (e.g., setting the range of values from which weights are initialized), and the number of hidden units (e.g., the number of hidden layers and the number of nodes in each hidden layer). The above-described data structure is only an example, and the present disclosure is not limited thereto.
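One way to hold the hyper-parameters listed above in a single data structure is sketched below. The class name `HyperParameters`, the field names, and every default value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class HyperParameters:
    """Hypothetical container for user-set hyper-parameters."""
    learning_rate: float = 0.01
    cost_function: str = "mse"
    training_cycles: int = 100                # number of training cycle iterations
    weight_init_range: tuple = (-0.1, 0.1)    # range used for weight initialization
    hidden_layers: int = 2                    # number of hidden layers
    nodes_per_hidden_layer: int = 16          # nodes in each hidden layer


# The user overrides only the values they want to change.
hp = HyperParameters(learning_rate=0.001)

# A plain dictionary form is convenient for storing the structure.
config = asdict(hp)
```

Making the dataclass frozen keeps a given configuration immutable once training starts; a new instance is created for each variation the user wants to try.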

FIG. 6 is a brief, general schematic diagram of an exemplary computing environment in which embodiments of the present disclosure may be implemented.

Although the present disclosure has been described above as being generally implementable by the computing device 100, those skilled in the art will appreciate that the present disclosure may be implemented in combination with computer-executable instructions and/or other program modules that can run on one or more computers, and/or as a combination of hardware and software.

Generally, program modules include routines, programs, components, data structures, and the like that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the methods of the present disclosure may be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputers, and mainframe computers, as well as personal computers, handheld computing devices, microprocessor-based or programmable consumer electronics, and the like, each of which may operate in connection with one or more associated devices.

The described embodiments of the present disclosure may also be practiced in distributed computing environments in which certain tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

A computer typically includes a variety of computer-readable media. Any medium accessible by a computer can be a computer-readable medium, and such computer-readable media include volatile and non-volatile media, transitory and non-transitory media, and removable and non-removable media. By way of example and not limitation, computer-readable media may include computer-readable storage media and computer-readable transmission media. Computer-readable storage media include volatile and non-volatile media, transitory and non-transitory media, and removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital video disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be accessed by a computer and used to store the desired information.

Computer-readable transmission media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. The term modulated data signal means a signal that has one or more of its characteristics set or changed so as to encode information in the signal. By way of example and not limitation, computer-readable transmission media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above are also included within the scope of computer-readable transmission media.

An exemplary environment 1100 implementing various aspects of the present disclosure is shown, including a computer 1102. The computer 1102 includes a processing unit 1104, a system memory 1106, and a system bus 1108. The system bus 1108 couples system components, including but not limited to the system memory 1106, to the processing unit 1104. The processing unit 1104 may be any of a variety of commercially available processors. Dual-processor and other multiprocessor architectures may also be used as the processing unit 1104.

The system bus 1108 may be any of several types of bus structure that may further interconnect to a memory bus, a peripheral bus, and a local bus using any of a variety of commercially available bus architectures. The system memory 1106 includes read-only memory (ROM) 1110 and random access memory (RAM) 1112. A basic input/output system (BIOS) is stored in the non-volatile memory 1110, such as ROM, EPROM, or EEPROM; the BIOS contains the basic routines that help transfer information between components within the computer 1102, such as during startup. The RAM 1112 may also include high-speed RAM, such as static RAM, for caching data.

The computer 1102 also includes an internal hard disk drive (HDD) 1114 (e.g., EIDE, SATA), which may also be configured for external use in a suitable chassis (not shown); a magnetic floppy disk drive (FDD) 1116 (e.g., for reading from or writing to a removable diskette 1118); and an optical disk drive 1120 (e.g., for reading a CD-ROM disk 1122, or for reading from or writing to other high-capacity optical media such as a DVD). The hard disk drive 1114, magnetic disk drive 1116, and optical disk drive 1120 may be connected to the system bus 1108 by a hard disk drive interface 1124, a magnetic disk drive interface 1126, and an optical drive interface 1128, respectively. The interface 1124 for external drive implementations includes at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.

These drives and their associated computer-readable media provide non-volatile storage of data, data structures, computer-executable instructions, and so on. For the computer 1102, the drives and media accommodate the storage of any data in a suitable digital format. Although the description of computer-readable media above refers to an HDD, a removable magnetic disk, and removable optical media such as a CD or DVD, those skilled in the art will appreciate that other types of computer-readable media, such as zip drives, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the exemplary operating environment, and that any such media may contain computer-executable instructions for performing the methods of the present disclosure.

A number of program modules may be stored in the drives and RAM 1112, including an operating system 1130, one or more application programs 1132, other program modules 1134, and program data 1136. All or portions of the operating system, applications, modules, and/or data may also be cached in the RAM 1112. It will be appreciated that the present disclosure may be implemented with various commercially available operating systems or combinations of operating systems.

A user may enter commands and information into the computer 1102 through one or more wired/wireless input devices, for example, a keyboard 1138 and a pointing device such as a mouse 1140. Other input devices (not shown) may include a microphone, an IR remote control, a joystick, a game pad, a stylus pen, a touch screen, and the like. These and other input devices are often connected to the processing unit 1104 through an input device interface 1142 that is coupled to the system bus 1108, but may be connected by other interfaces, such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, an IR interface, and the like.

A monitor 1144 or other type of display device is also connected to the system bus 1108 via an interface, such as a video adapter 1146. In addition to the monitor 1144, a computer typically includes other peripheral output devices (not shown), such as speakers and printers.

The computer 1102 may operate in a networked environment using logical connections, via wired and/or wireless communications, to one or more remote computers, such as remote computer(s) 1148. The remote computer(s) 1148 may be a workstation, a computing device computer, a router, a personal computer, a portable computer, a microprocessor-based entertainment appliance, a peer device, or another common network node, and typically include many or all of the components described for the computer 1102, although, for brevity, only a memory storage device 1150 is shown. The logical connections depicted include wired/wireless connectivity to a local area network (LAN) 1152 and/or a larger network, for example, a wide area network (WAN) 1154. Such LAN and WAN networking environments are commonplace in offices and companies and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a worldwide computer network, for example, the Internet.

When used in a LAN networking environment, the computer 1102 is connected to the local network 1152 through a wired and/or wireless communication network interface or adapter 1156. The adapter 1156 may facilitate wired or wireless communication to the LAN 1152, which may also include a wireless access point installed on it for communicating with the wireless adapter 1156. When used in a WAN networking environment, the computer 1102 may include a modem 1158, may be connected to a communications computing device on the WAN 1154, or may have other means for establishing communications over the WAN 1154, such as by way of the Internet. The modem 1158, which may be internal or external and a wired or wireless device, is connected to the system bus 1108 via the serial port interface 1142. In a networked environment, program modules described for the computer 1102, or portions thereof, may be stored in the remote memory/storage device 1150. It will be appreciated that the network connections shown are exemplary and that other means of establishing a communications link between the computers may be used.

The computer 1102 is operable to communicate with any wireless device or entity operatively disposed in wireless communication, for example, a printer, a scanner, a desktop and/or portable computer, a portable data assistant (PDA), a communications satellite, any piece of equipment or location associated with a wirelessly detectable tag, and a telephone. This includes at least Wi-Fi and Bluetooth wireless technologies. Thus, the communication may have a predefined structure, as in a conventional network, or may simply be an ad hoc communication between at least two devices.

Wi-Fi (Wireless Fidelity) allows connection to the Internet and the like without wires. Wi-Fi is a wireless technology, similar to that used in a cell phone, that enables such devices, for example, computers, to send and receive data indoors and outdoors, anywhere within the range of a base station. Wi-Fi networks use radio technologies called IEEE 802.11 (a, b, g, etc.) to provide secure, reliable, high-speed wireless connectivity. Wi-Fi can be used to connect computers to each other, to the Internet, and to wired networks (which use IEEE 802.3 or Ethernet). Wi-Fi networks may operate in the unlicensed 2.4 GHz and 5 GHz radio bands at, for example, an 11 Mbps (802.11b) or 54 Mbps (802.11a) data rate, or in products that include both bands (dual band).

Those skilled in the art of the present disclosure will understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, the data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced in the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

Those skilled in the art of the present disclosure will understand that the various illustrative logical blocks, modules, processors, means, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, as various forms of program or design code (referred to herein, for convenience, as software), or as a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Those skilled in the art of the present disclosure may implement the described functionality in various ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present disclosure.

The various embodiments presented herein may be implemented as a method, an apparatus, or an article of manufacture using standard programming and/or engineering techniques. The term article of manufacture includes a computer program, carrier, or media accessible from any computer-readable storage device. For example, computer-readable storage media include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips), optical disks (e.g., CDs, DVDs), smart cards, and flash memory devices (e.g., EEPROM, cards, sticks, key drives). In addition, the various storage media presented herein include one or more devices and/or other machine-readable media for storing information.

It is to be understood that the specific order or hierarchy of steps in the processes presented is an example of illustrative approaches. Based on design priorities, it is to be understood that the specific order or hierarchy of steps in the processes may be rearranged within the scope of the present disclosure. The appended method claims present elements of the various steps in a sample order, but are not meant to be limited to the specific order or hierarchy presented.

The description of the presented embodiments is provided to enable any person skilled in the art of the present disclosure to make or use the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art of the present disclosure, and the general principles defined herein may be applied to other embodiments without departing from the scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments presented herein, but is to be accorded the widest scope consistent with the principles and novel features presented herein.

Claims

A method of analyzing medical data of a medical institution, performed by a computing device, the method comprising:
Obtaining, from a first medical institution and a second medical institution, a first medical dataset of the first medical institution and a second medical dataset of the second medical institution;
generating an integrated (total) synthetic dataset from the first medical dataset and the second medical dataset; and
Obtaining, based on a target medical dataset of a target medical institution, an analysis result for the target medical dataset from the integrated synthetic dataset;
Includes,
The first medical data set, the second medical data set, and the integrated synthetic data set include preset independent variables and preset dependent variables,
The integrated synthetic data set includes a first weight corresponding to the preset independent variable, and
The steps for obtaining analysis results for the target medical dataset are:
Obtaining a second weight corresponding to the preset independent variable from the target medical data set;
Obtaining a third weight corresponding to the preset independent variable based on the first weight and the second weight; and
Obtaining the analysis result from the integrated synthetic data set based on the preset independent variable, the preset dependent variable, and the third weight;
Including,
method.

According to claim 1,
The step of generating the integrated synthetic data set is,
Obtaining clinical prior information including information about the correlation between the preset independent variable and the preset dependent variable; and
generating the integrated synthetic data set from the clinical prior information, the first medical data set, and the second medical data set;
Including,
method.

According to claim 1,
The step of generating the integrated synthetic data set is,
Generating a first synthetic dataset corresponding to the first medical dataset and a second synthetic dataset corresponding to the second medical dataset from the first medical dataset and the second medical dataset; and
Integrating the first synthetic data set and the second synthetic data set to generate the integrated synthetic data set;
Including,
method.

According to claim 3,
The step of generating the first synthetic dataset and the second synthetic dataset is,
Generating the first synthetic dataset and the second synthetic dataset from the first medical dataset and the second medical dataset using a pre-trained GAN (Generative Adversarial Networks) model;
Including,
method.

According to claim 3,
The step of generating the first synthetic dataset and the second synthetic dataset is,
Generating the first synthetic dataset and the second synthetic dataset from the first medical dataset and the second medical dataset using a SMOTE (Synthetic Minority Oversampling Technique) algorithm;
Including,
method.
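For readers unfamiliar with the technique named in the preceding claim, SMOTE creates each synthetic sample by interpolating between a minority-class sample and one of its nearest minority-class neighbours. The sketch below shows that core step only, under stated assumptions: the function name `smote`, its parameters, and the sample values are hypothetical, and the claim does not specify how the algorithm is applied to the medical datasets.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (illustrative)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class records with two numeric variables each.
minority_samples = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (3.0, 3.0)]
new_samples = smote(minority_samples, n_new=5)
```

Because every synthetic point lies on a segment between two real minority samples, it stays inside the range spanned by the original data.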

According to claim 3,
The step of generating the first synthetic dataset and the second synthetic dataset is,
Obtaining clinical prior information including information about the correlation between the preset independent variable and the preset dependent variable; and
Generating the first synthetic data set and the second synthetic data set by applying the first medical data set and the second medical data set to the clinical prior information, using a Bayesian algorithm based on Bayesian inference;
Including,
method.
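The idea of updating clinical prior information with observed data can be illustrated with a conjugate Beta-Binomial model. This is a sketch under strong assumptions, not the claimed algorithm: it assumes a single binary dependent variable, encodes the clinical prior as hypothetical Beta parameters, and uses invented outcome lists for the two institutions.

```python
import random


def posterior_beta(prior_a, prior_b, outcomes):
    """Update a Beta(prior_a, prior_b) prior with observed 0/1 outcomes."""
    positives = sum(outcomes)
    return prior_a + positives, prior_b + len(outcomes) - positives


def sample_synthetic(a, b, n, seed=0):
    """Draw n synthetic 0/1 outcomes from the posterior predictive."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        p = rng.betavariate(a, b)       # plausible event rate
        samples.append(1 if rng.random() < p else 0)
    return samples


# Hypothetical clinical prior: roughly a 20% event rate.
PRIOR_A, PRIOR_B = 2.0, 8.0

# Invented outcome records for the first and second institutions.
first_outcomes = [1, 0, 0, 1, 0]
second_outcomes = [0, 0, 1, 0]

a1, b1 = posterior_beta(PRIOR_A, PRIOR_B, first_outcomes)
a2, b2 = posterior_beta(PRIOR_A, PRIOR_B, second_outcomes)

first_synthetic = sample_synthetic(a1, b1, n=10, seed=1)
second_synthetic = sample_synthetic(a2, b2, n=10, seed=2)
```

With few observations the synthetic outcomes stay close to the clinical prior; as an institution contributes more data, its own event rate dominates.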

According to claim 1,
Determining, from the first medical data set and the second medical data set, a first independent variable that is correlated with the preset dependent variable and is not included in the preset independent variable;
Further including, and
The analysis result includes information about the correlation between the first independent variable and the preset dependent variable,
method.

delete

According to claim 1,
The step of obtaining a third weight corresponding to the preset independent variable is:
Determining a weighting degree of each of the first weight and the second weight, based on the number of first medical data in the first medical data set, the number of second medical data in the second medical data set, and the number of target medical data in the target medical data set; and
Obtaining the third weight based on the weighting degree of each of the first weight and the second weight;
Including,
method.
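One plausible reading of the preceding claim is that the first weight (from the integrated synthetic data set) and the second weight (from the target medical data set) are blended in proportion to how many records support each. The sketch below implements that reading; the proportional formula, the function name `third_weight`, and all numbers are assumptions, since the claim states only that the weighting degrees depend on the three record counts.

```python
def third_weight(w_first, w_second, n_first, n_second, n_target):
    """Blend two weights by record counts (assumed proportional rule)."""
    n_pool = n_first + n_second        # records behind the first weight
    total = n_pool + n_target
    if total == 0:
        raise ValueError("at least one record is required")
    degree_first = n_pool / total      # weighting degree of the first weight
    degree_second = n_target / total   # weighting degree of the second weight
    return degree_first * w_first + degree_second * w_second


# Hypothetical weights and record counts.
w3 = third_weight(w_first=0.8, w_second=0.2,
                  n_first=600, n_second=300, n_target=100)
```

Under this rule a small target institution inherits most of its weight from the pooled data, while a large one relies mainly on its own estimate.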

According to claim 9,
The step of determining the weighting degree of each of the first weight and the second weight is:
Obtaining clinical prior information including clinical statistical data in which clinical values are recorded; and
Based on the number of clinical medical data in the clinical statistical data, the number of first medical data in the first medical data set, the number of second medical data in the second medical data set, and the number of target medical data in the target medical data set, determining a weighting degree of each of the first weight and the second weight;
Including,
method.

According to claim 1,
The steps for obtaining analysis results for the target medical dataset are:
Generating a target synthetic dataset corresponding to the target medical dataset from the integrated synthetic dataset based on the preset independent variable, the preset dependent variable, and the third weight; and
Obtaining the analysis result from the target synthetic dataset;
Including,
method.

According to claim 1,
The steps for obtaining analysis results for the target medical dataset are:
determining, from the first medical data set and the second medical data set, a first independent variable that is correlated with the dependent variable and is not included in the preset independent variable;
Generating a target synthetic dataset corresponding to the target medical dataset from the integrated synthetic dataset based on the preset independent variable, the first independent variable, the preset dependent variable, and the third weight; and
Obtaining the analysis result from the target synthetic dataset;
Including,
method.
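
By way of a non-limiting illustration, determining a first independent variable that is correlated with the dependent variable but is not among the preset independent variables can be sketched as a correlation screen. The use of Pearson correlation, the `threshold` value, and the column names in the example are assumptions; the claim only requires that the variable be correlated with the dependent variable.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either side is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def select_additional_variables(columns, dependent, preset, threshold=0.5):
    """Return non-preset columns whose |correlation| with the dependent
    variable reaches the (assumed) threshold."""
    y = columns[dependent]
    selected = []
    for name, values in columns.items():
        if name == dependent or name in preset:
            continue
        if abs(pearson(values, y)) >= threshold:
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Hypothetical pooled columns from the first and second medical datasets.
    pooled = {
        "age": [30, 40, 50, 60, 70],
        "bmi": [22, 25, 24, 28, 30],
        "shoe_size": [270, 250, 265, 255, 260],
        "sbp": [110, 120, 125, 135, 145],
    }
    print(select_additional_variables(pooled, dependent="sbp", preset={"age"}))
```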

According to claim 1,
The analysis result for the target medical data set includes:
Information about the preset independent variable, the preset dependent variable, the third weight, and the correlation between the preset independent variable and the preset dependent variable,
method.

According to claim 1,
The first medical data set and the second medical data set are medical data sets converted from a first raw medical data set and a second raw medical data set, respectively, into the data structure of the Common Data Model (CDM), wherein the first medical dataset corresponds to the first raw medical dataset and the second medical dataset corresponds to the second raw medical dataset,
method.
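
By way of a non-limiting illustration, converting a raw medical record into a CDM data structure can be sketched as a field mapping. The claim names only the Common Data Model; this sketch assumes an OMOP-style person record, and the raw field names (`patient_no`, `sex`, `birth_date`) are hypothetical.

```python
# OMOP standard gender concepts: 8507 = male, 8532 = female.
# Using OMOP at all is an assumption; the claim only says "CDM".
GENDER_CONCEPT_ID = {"M": 8507, "F": 8532}


def to_cdm_person(raw):
    """Map one hypothetical raw hospital record to an OMOP-style person row."""
    return {
        "person_id": int(raw["patient_no"]),
        # 0 is used when no matching concept exists.
        "gender_concept_id": GENDER_CONCEPT_ID.get(raw["sex"], 0),
        "year_of_birth": int(raw["birth_date"][:4]),
    }


if __name__ == "__main__":
    raw_record = {"patient_no": "17", "sex": "F", "birth_date": "1980-05-02"}
    print(to_cdm_person(raw_record))
```

Because every institution's raw dataset is mapped to the same structure, the converted datasets expose the preset independent and dependent variables under identical names.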

In a method of analyzing medical data from a medical institution performed by a computing device,
Obtaining, from a first medical institution and a second medical institution, a first medical dataset of the first medical institution and a second medical dataset of the second medical institution;
Generating an integrated (total) synthetic dataset from the first medical dataset and the second medical dataset; and
Obtaining, based on a target medical dataset of a target medical institution, an analysis result for the target medical dataset from the integrated synthetic dataset;
Includes,
The first medical data set, the second medical data set, and the integrated synthetic data set include preset independent variables and preset dependent variables,
The step of obtaining an analysis result for the target medical dataset includes:
generating a first integrated synthetic dataset from the first medical dataset and the second medical dataset;
generating a second integrated synthetic dataset from the first medical dataset and the second medical dataset; and
Based on the target medical dataset, obtaining an analysis result for the target medical dataset from the first integrated synthetic dataset and the second integrated synthetic dataset;
Including, and
The first integrated synthetic data set includes a 1-1 weight corresponding to the preset independent variable,
The second integrated synthetic data set includes a 1-2 weight corresponding to the preset independent variable,
The 1-1 weight and the 1-2 weight are different from each other,
method.

According to claim 15,
The preset dependent variables included in each of the first integrated synthetic data set and the second integrated synthetic data set have probability distributions based on different means and standard deviations,
method.

According to claim 15,
The step of obtaining an analysis result for the target medical dataset includes:
Obtaining a first analysis result for the target medical dataset from the first integrated synthetic dataset;
Obtaining a second analysis result for the target medical dataset from the second integrated synthetic dataset; and
Obtaining an analysis result for the target medical dataset based on the first analysis result and the second analysis result;
Including,
method.
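
By way of a non-limiting illustration, obtaining one analysis result from a first and a second analysis result can be sketched with inverse-variance pooling. The claim does not state how the two results are combined; this sketch assumes each result is an effect estimate with a standard error, and the function name `pool_results` is hypothetical.

```python
import math


def pool_results(estimates, std_errors):
    """Inverse-variance weighted average of per-dataset estimates.

    Returns (pooled_estimate, pooled_standard_error). The pooling rule is an
    assumption, not the method recited in the claims.
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


if __name__ == "__main__":
    # First and second analysis results from the two integrated synthetic datasets.
    estimate, se = pool_results([0.5, 0.7], [0.1, 0.1])
    print(round(estimate, 3), round(se, 4))
```

A more precise result (smaller standard error) contributes more to the pooled value; with equal standard errors the rule reduces to a simple average.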

According to claim 1,
Determining, based on at least one of a preset effect size, a preset power, and a preset significance level, the required number of data of a medical dataset for generating the integrated synthetic dataset;
Further including, and
Each of the first medical data set and the second medical data set includes more medical data than the required number of data,
method.
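
By way of a non-limiting illustration, the required number of data can be computed from an effect size, a power, and a significance level with the standard normal-approximation formula for comparing two group means, n = 2((z_{1-α/2} + z_{1-β}) / d)^2 per group. Applying this particular two-group formula is an assumption; the claim only names the three inputs. The function name `required_records_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def required_records_per_group(effect_size, power=0.8, alpha=0.05):
    """Per-group sample size for a two-sided, two-sample comparison of means
    (normal approximation), given a standardized effect size (Cohen's d)."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < power < 1 and 0 < alpha < 1):
        raise ValueError("power and alpha must lie strictly between 0 and 1")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    # Medium effect (d = 0.5), 80% power, 5% significance level.
    print(required_records_per_group(0.5))  # -> 63
```

A dataset would then be admitted to synthetic-data generation only if its record count exceeds the returned value.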

A computing device for analyzing medical data from a medical institution,
at least one processor; and
Memory;
Includes,
The at least one processor is configured to:
Obtain, from a first medical institution and a second medical institution, a first medical data set of the first medical institution and a second medical data set of the second medical institution,
Generate an integrated (total) synthetic dataset from the first medical dataset and the second medical dataset, and
Obtain, based on a target medical dataset of a target medical institution, an analysis result for the target medical dataset from the integrated synthetic dataset,
The first medical data set, the second medical data set, and the integrated synthetic data set include preset independent variables and preset dependent variables,
The integrated synthetic data set includes a first weight corresponding to the preset independent variable, and
Obtaining the analysis result for the target medical dataset includes:
Obtaining a second weight corresponding to the preset independent variable from the target medical data set,
Obtaining a third weight corresponding to the preset independent variable based on the first weight and the second weight, and
Obtaining the analysis result from the integrated synthetic data set based on the preset independent variable, the preset dependent variable, and the third weight,
Computing device.
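
By way of a non-limiting illustration, the sequence of first, second, and third weights performed by the processor can be sketched end to end. This sketch assumes a single independent variable, takes each weight to be a least-squares slope, and blends the two with a fixed hypothetical `blend` factor; none of these choices is recited in the claims.

```python
def ols_slope(rows):
    """Least-squares slope of the dependent on the independent variable."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    if sxx == 0:
        raise ValueError("independent variable has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    return sxy / sxx


def analyze(integrated_rows, target_rows, blend=0.5):
    """Derive the three weights for one preset independent variable.

    blend is the assumed share given to the target-side (second) weight.
    """
    first = ols_slope(integrated_rows)   # from the integrated synthetic dataset
    second = ols_slope(target_rows)      # from the target medical dataset
    third = (1 - blend) * first + blend * second
    return {"first_weight": first, "second_weight": second, "third_weight": third}


if __name__ == "__main__":
    integrated = [(0, 0), (1, 2), (2, 4)]   # slope 2
    target = [(0, 1), (1, 5), (2, 9)]       # slope 4
    print(analyze(integrated, target))
```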

A computer program stored in a computer-readable storage medium, wherein the computer program, when executed by one or more processors, causes the one or more processors to perform operations for analyzing medical data of a medical institution, the operations comprising:
An operation of obtaining, from a first medical institution and a second medical institution, a first medical data set of the first medical institution and a second medical data set of the second medical institution;
An operation of generating an integrated (total) synthetic data set from the first medical data set and the second medical data set; and
An operation of obtaining, based on a target medical dataset of a target medical institution, an analysis result for the target medical dataset from the integrated synthetic dataset;
Includes,
The first medical data set, the second medical data set, and the integrated synthetic data set include preset independent variables and preset dependent variables,
The integrated synthetic data set includes a first weight corresponding to the preset independent variable, and
The operation of obtaining an analysis result for the target medical dataset includes:
Obtaining a second weight corresponding to the preset independent variable from the target medical data set;
Obtaining a third weight corresponding to the preset independent variable based on the first weight and the second weight; and
Obtaining the analysis result from the integrated synthetic data set based on the preset independent variable, the preset dependent variable, and the third weight;
Including,
A computer program stored on a computer-readable storage medium.