KR20220111212A

KR20220111212A - Virtual patient information generating system and method using machine learning

Info

Publication number: KR20220111212A
Application number: KR1020220013903A
Authority: KR
Inventors: 김수연
Original assignee: 주식회사 코스모스메딕
Priority date: 2021-02-01
Filing date: 2022-02-03
Publication date: 2022-08-09
Also published as: KR102503609B1

Abstract

The present invention relates to a system and a method for generating virtual patient information by using machine learning. In accordance with the present invention, the system for generating virtual patient information by using machine learning includes: an information collection unit collecting publicly available disease-related information or biometric information of a patient; an information generation unit forming a group corresponding to each disease by using a machine learning AI model based on the collected information, and generating information (data) of patients belonging to each formed group; a virtual patient group generation unit generating new virtual patient groups suffering from complex diseases by using the machine learning AI model based on the patient information (data) of each group; a virtual patient information generation unit generating information of virtual patients belonging to each virtual patient group by using the machine learning AI model with respect to each generated virtual patient group; and a control unit checking the status of and controlling the operation of the information collection unit, the information generation unit, the virtual patient group generation unit, and the virtual patient information generation unit, and transmitting control commands for the machine learning AI model to form each group corresponding to each disease, generate the new virtual patient groups, and generate the virtual patient information. The present invention can therefore assist in the treatment of general patients as well as emergency patients at medical sites.

Description

Virtual patient information generating system and method using machine learning

The present invention relates to a system and method for generating virtual patient information using machine learning, and more particularly to a system and method that use a machine learning AI (Artificial Intelligence) model to combine a patient's biometric information with medical images and thereby generate and provide information on virtual patients at a level comparable to that of real patients.

Attempts to generate patient information began in the field of medical imaging and continue today. Patient care, however, is not carried out with image data (information) alone; the patient's biometric information (data) is clearly needed as well. The generation of such biometric information (data) has not yet produced satisfactory results.

With the machine learning and deep learning techniques published to date, biometric information and medical images can be combined to create virtual patient data. Because these techniques are advancing rapidly, the quality of virtual patient data can be further improved and refined. The fields to which such virtual patient data can be applied are broad.

In the smart healthcare market, a core of the fourth industrial revolution, patient information is a highly sensitive matter. Although Korea's Data 3 Act has come into force, using patients' information without their consent is plainly illegal. However, with many companies around the world accelerating research and development in medical artificial intelligence, such regulation will end up hindering the development of medical AI in the Republic of Korea and will ultimately weaken its competitiveness.

Medical papers typically describe the characteristics of the patient groups studied and sometimes highlight the indicators that differ significantly between the two groups being compared. Relatedly, even when the same drug is given to patients with the same disease, its effect can vary with each patient's characteristics. For example, an anti-inflammatory drug may be prescribed for simple generalized myalgia, yet a patient who is allergic to it may develop hives after taking it and, in severe cases, may die.

Understanding a patient's characteristics is thus very important, but sufficient information on emergency patients or patients with rare diseases is hard to obtain. In particular, given the nature of emergency rooms and emergency situations, applying unverified new drugs or procedures is ethically problematic, and this is a serious obstacle to medical progress.

Therefore, if sufficient 'virtual patient' information can be generated and studied, and in particular if a dataset usable for developing artificial intelligence programs can be built, it will be of great help to emergency patients.

Meanwhile, Korean Laid-Open Patent Publication No. 10-2021-0051141 (Patent Document 1) discloses a "method, apparatus and computer program for providing augmented reality-based medical information of a patient". The method of providing augmented reality-based medical information disclosed therein comprises: acquiring 3D modeling data that includes information on the patient's skin, generated from a medical image of the patient; sampling information on at least part of a reference skin from the 3D modeling data based on at least one of the patient's condition, the patient's real-time posture, the affected area, and the type of procedure; acquiring a real-time spatial image that includes information on the space in which the patient is located; registering the 3D modeling data with the real-time spatial image based on the information on the reference skin; generating, based on the information on the patient's skin, an augmented reality-based medical image in which medical information is output on the patient's body in the real-time spatial image; and outputting the augmented reality-based medical image.

Patent Document 1 uses 3D modeling data containing information on the patient's skin, generated from the patient's medical image, and can thereby improve the accuracy of coordinate system registration through markerless registration. However, it is based on actual patients (the patient's condition, real-time posture, affected area, type of procedure, and so on), and its ultimate aim is to improve the match between the information on the patient's skin and the procedure being performed. It therefore cannot supply the varied patient information needed, at the right time and place, for the wide range of emergency patients.

Korean Laid-Open Patent Publication No. 10-2021-0051141 (2021.05.10.)

The present invention was devised in view of the above. Its object is to provide a system and method for generating virtual patient information using machine learning that use a machine learning AI (Artificial Intelligence) model to combine a patient's biometric information with medical images and the like, generate and provide virtual patient information at a level comparable to that of real patients, and thereby greatly assist in the examination and treatment of general patients as well as emergency patients.

To achieve the above object, a virtual patient information generation system using machine learning according to the present invention is characterized by comprising:

an information collection unit that collects publicly available and usable disease-related information or patient biometric information;

an information generation unit that, based on the information collected by the information collection unit, uses a machine learning AI (Artificial Intelligence) model to form a group corresponding to each disease and generates information (data) of the patients belonging to each formed group;

a virtual patient group generation unit that, based on the patient information (data) of each group generated by the information generation unit, uses the machine learning AI model to generate new virtual patient groups suffering from complex diseases;

a virtual patient information generation unit that, for each virtual patient group generated by the virtual patient group generation unit, uses the machine learning AI model to generate information of the virtual patients belonging to that group; and

a control unit that checks the status of and controls the operation of the information collection unit, the information generation unit, the virtual patient group generation unit, and the virtual patient information generation unit, and that issues control commands for the machine learning AI model to form the group corresponding to each disease, generate the new virtual patient groups, and generate the virtual patient information.

Here, the system may preferably further comprise a data verification/correction unit that verifies the reliability of the virtual patient information (data) generated by the virtual patient information generation unit using statistical programs including SPSS and R Studio, and corrects it where necessary.

Further, the publicly available and usable disease-related information or patient biometric information collected by the information collection unit may include at least one of: information or data described in medical books or medical papers; patient biometric data that is public and legally usable; and patient medical image data that is public and legally usable.

Further, in generating the information (data) of the patients belonging to each group, the information generation unit may assume that the patient population of each group follows a standard normal distribution, determine the mean, and then generate information on the 90% of normally distributed patients that remain after excluding the upper 5% and the lower 5%.

Further, in forming the group corresponding to each disease from the collected information using the machine learning AI model, the information generation unit may check whether the collected items of information are correlated with one another, analyze the correlation of the correlated items and place them in the same group, and randomly mix the uncorrelated items among themselves to generate a virtual patient group for the disease concerned.

Further, in generating the patient information (data), the information generation unit may generate patient information consisting of letters and numbers as standard normal distribution data using machine learning tools including the statistical program R Studio.

Further, in generating the new virtual patient groups suffering from complex diseases from the patient information (data) of each group using the machine learning AI model, the virtual patient group generation unit may combine the patient information (data) for each disease using machine learning techniques including the ensemble technique to generate the new virtual patient groups.

Further, to achieve the above object, a method for generating virtual patient information using machine learning according to the present invention comprises:

a) collecting, by an information collection unit, publicly available and usable disease-related information or patient biometric information;

b) forming, by an information generation unit, a group corresponding to each disease from the collected information using a machine learning AI model, and generating information (data) of the patients belonging to each formed group;

c) generating, by a virtual patient group generation unit, new virtual patient groups suffering from complex diseases using the machine learning AI model, based on the patient information (data) of each group generated in step b); and

d) generating, by a virtual patient information generation unit, information of the virtual patients belonging to each generated virtual patient group using the machine learning AI model.

Here, the method may preferably further comprise, after step d), a step in which a data verification/correction unit verifies the reliability of the data using statistical programs including SPSS and R Studio, and corrects the data where necessary.

Further, in step a), the publicly available and usable disease-related information or patient biometric information may include at least one of: information or data described in medical books or medical papers; patient biometric data that is public and legally usable; and patient medical image data that is public and legally usable.

Further, in generating the information (data) of the patients belonging to each group in step b), the patient population of each group may be assumed to follow a standard normal distribution, the mean may be determined, and information may then be generated on the 90% of normally distributed patients that remain after excluding the upper 5% and the lower 5%.

Further, in forming the group corresponding to each disease from the collected information using the machine learning AI model in step b), whether the collected items of information are correlated with one another may be checked, the correlation of the correlated items may be analyzed and those items placed in the same group, and the uncorrelated items may be randomly mixed among themselves to generate a virtual patient group for the disease concerned.

Further, in generating the patient information (data) in step b), patient information consisting of letters and numbers may be generated as standard normal distribution data using machine learning tools including the statistical program R Studio.

Further, in generating the new virtual patient groups suffering from complex diseases from the patient information (data) of each group using the machine learning AI model in step c), the patient information (data) for each disease may be combined using machine learning techniques including the ensemble technique to generate the new virtual patient groups.

According to the present invention, a machine learning AI model is used to combine a patient's biometric information with medical images and the like and to generate and provide virtual patient information at a level comparable to that of real patients, which has the advantage of greatly assisting the examination and treatment of general patients as well as emergency patients at medical sites.

FIG. 1 is a diagram schematically showing the configuration of a system for generating virtual patient information using machine learning according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the execution process of a method for generating virtual patient information using machine learning according to an embodiment of the present invention.
FIG. 3 is a diagram schematically illustrating the concept of re-running clustering on the basis of the labels when all data have labels, and removing outliers.
FIGS. 4A and 4B are diagrams showing an evaluation tool, introduced in the present invention, with which doctors evaluate the data of the generated virtual patients.

The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. Rather, on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "… unit", "… device", "module", and "apparatus" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram schematically showing the configuration of a system for generating virtual patient information using machine learning according to an embodiment of the present invention.

Referring to FIG. 1, the virtual patient information generation system 100 using machine learning according to the present invention comprises an information collection unit 110, an information generation unit 120, a virtual patient group generation unit 130, a virtual patient information generation unit 140, and a control unit 150.

The information collection unit 110 collects publicly available and usable disease-related information or patient biometric information. This information may include at least one of: information or data described in medical books or medical papers; patient biometric data that is public and legally usable; and patient medical image data that is public and legally usable (for example, X-ray, CT, MRI, or ultrasound). As an example of the collected information, consider patients diagnosed with Type II diabetes (type 2 diabetes):

- The mean body mass index (BMI) of patients diagnosed with Type II diabetes (type 2 diabetes) is 30, and their BMI ranges from 25 to 40.

- The abdominal obesity rate of patients diagnosed with Type II diabetes (type 2 diabetes) is 60%.

- Among adults diagnosed with Type II diabetes (type 2 diabetes), the male-to-female ratio is about 6:4.

- The mean age at first diagnosis of patients diagnosed with Type II diabetes (type 2 diabetes) is 45 for men and 50 for women.

The information generation unit 120 uses the machine learning AI (Artificial Intelligence) model 150m to form a group corresponding to each disease from the information collected by the information collection unit 110, and generates information (data) of the patients belonging to each formed group. In generating this information, the information generation unit 120 may assume that the patient population of each group follows a standard normal distribution, determine the mean, and then generate information on the 90% of normally distributed patients that remain after excluding the upper 5% and the lower 5%. Removing the upper and lower 5% does not mean that this information is removed from every dataset, because the information held by patients in the upper and lower 5% can sometimes be more meaningful than the information held by the 90% of normally distributed patients. Accordingly, the present invention thoroughly analyzes the base dataset that underlies virtual patient information generation, classifies minority samples by their location into noise samples, unstable samples, borderline samples, and samples that follow normality, and oversamples according to the characteristics of each sample type. Here, oversampling means resolving data imbalance by increasing the number of data points in the under-represented class, because a model is likely not to train properly when the data distribution is imbalanced.
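The central-90% rule described above can be sketched as follows. This is a minimal illustration only, not the patented implementation: the function name `sample_central_90`, the standard deviation of 3.0, and the use of rejection sampling are assumptions made for the example. The mean of 30 is the BMI figure quoted earlier. The sketch draws from a normal distribution with a chosen mean and standard deviation and discards values below the 5th percentile or above the 95th percentile.

```python
import random
from statistics import NormalDist


def sample_central_90(mean, sd, n, seed=0):
    """Draw n values from N(mean, sd), keeping only the central 90%.

    Values below the 5th percentile or above the 95th percentile are
    rejected and redrawn (simple rejection sampling).
    """
    dist = NormalDist(mean, sd)
    lower_bound = dist.inv_cdf(0.05)  # 5th percentile
    upper_bound = dist.inv_cdf(0.95)  # 95th percentile
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        x = rng.gauss(mean, sd)
        if lower_bound <= x <= upper_bound:
            values.append(x)
    return values, (lower_bound, upper_bound)


# BMI mean of 30 comes from the example above; sd=3.0 is an assumed value.
bmi_values, (lower_bound, upper_bound) = sample_central_90(30.0, 3.0, 1000)
```

Rejection sampling keeps the shape of the distribution inside the retained interval; clamping out-of-range values to the bounds would instead pile samples up at the edges.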

Further, in forming the group corresponding to each disease from the collected information using the machine learning AI model 150m, the information generation unit 120 may check whether the collected items of information are correlated with one another, analyze the correlation of the correlated items and place them in the same group, and randomly mix the uncorrelated items among themselves to generate a virtual patient group for the disease concerned. The amount of virtual patient information generated may vary with the quantity and quality of the information that the information collection unit 110 can obtain. To illustrate: a group of 1,000 patients receiving care or treatment at the hospital affiliated with ○○ University consists of 620 men and 380 women, with a mean age of 46 for men and 52 for women. The abdominal obesity rate was 50% or higher for both men and women, and the body mass index was measured at 28 to 38, with a mean of 32.
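One possible reading of the correlation step is sketched below. The helper names, the Pearson coefficient as the correlation measure, the 0.5 threshold, and the toy values are all assumptions for illustration; the specification does not state how correlation is measured. Columns that correlate with at least one other column are kept row-aligned, while each uncorrelated column is shuffled independently.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def split_by_correlation(columns, threshold=0.5):
    """Return (correlated, uncorrelated) column names.

    A column counts as correlated if |r| >= threshold with any other column.
    """
    names = list(columns)
    correlated = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                correlated.update((a, b))
    uncorrelated = [n for n in names if n not in correlated]
    return sorted(correlated), uncorrelated


def shuffle_uncorrelated(columns, uncorrelated, seed=0):
    """Copy the table, shuffling each uncorrelated column on its own."""
    rng = random.Random(seed)
    out = {name: list(vals) for name, vals in columns.items()}
    for name in uncorrelated:
        rng.shuffle(out[name])
    return out


# Toy data (hypothetical values): BMI and waist move together, age does not.
table = {
    "bmi": [26, 28, 30, 32, 34, 36],
    "waist_cm": [85, 90, 94, 99, 104, 110],
    "age": [50, 44, 58, 47, 55, 49],
}
correlated, uncorrelated = split_by_correlation(table)
mixed = shuffle_uncorrelated(table, uncorrelated)
```

Keeping correlated columns aligned preserves their joint structure, and shuffling the rest produces new row combinations without changing any column's marginal distribution.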

Further, in generating the patient information (data), the information generation unit 120 may generate patient information consisting of letters and numbers as standard normal distribution data using machine learning tools including the statistical program R Studio.
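As a rough sketch of how tabular patient records might be produced from published summary statistics, the example below uses the Type II diabetes figures quoted earlier (6:4 male-to-female ratio, mean age at first diagnosis of 45 for men and 50 for women, BMI mean 30 within 25 to 40, 60% abdominal obesity). The specification names R Studio; Python is used here only for illustration. The function name, the field names, and both standard deviations are hypothetical.

```python
import random


def generate_t2d_patients(n, seed=0):
    """Generate n illustrative virtual patient records for Type II diabetes."""
    rng = random.Random(seed)
    patients = []
    for i in range(n):
        sex = "M" if rng.random() < 0.6 else "F"  # 6:4 male-to-female ratio
        mean_age = 45 if sex == "M" else 50       # mean age at first diagnosis
        # BMI: mean 30, assumed sd 3.0, clamped to the stated 25-40 range.
        bmi = min(40.0, max(25.0, rng.gauss(30.0, 3.0)))
        patients.append({
            "id": i,
            "sex": sex,
            "age_at_diagnosis": round(rng.gauss(mean_age, 8.0)),  # assumed sd
            "bmi": round(bmi, 1),
            "abdominal_obesity": rng.random() < 0.6,  # 60% rate
        })
    return patients


patients = generate_t2d_patients(2000)
male_fraction = sum(p["sex"] == "M" for p in patients) / len(patients)
```

Each field is drawn independently here, so the sketch ignores any correlation between, for example, BMI and abdominal obesity; a fuller version would sample correlated fields jointly.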

The virtual patient group generation unit 130 uses the machine learning AI model 150m to generate new virtual patient groups suffering from complex diseases, based on the patient information (data) of each group generated by the information generation unit 120. In doing so, the virtual patient group generation unit 130 may combine the patient information (data) for each disease using machine learning techniques including the ensemble technique to generate the new virtual patient groups. The ensemble technique predicts a result by combining several models, and rests on the assumption that combining several lower-accuracy models can perform better than using a single high-accuracy model.
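The assumption behind ensembles can be made concrete with a small calculation. The sketch below computes the accuracy of a majority vote over n classifiers, assuming each is correct with the same probability p and that their errors are independent. Both assumptions are idealizations introduced for this example, and the function name is hypothetical. It illustrates the general ensemble idea only, not how the specification combines patient data.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent classifiers is correct.

    Each classifier is correct with probability p; n should be odd so that
    ties cannot occur.
    """
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))


single = majority_vote_accuracy(0.6, 1)    # one model on its own
three = majority_vote_accuracy(0.6, 3)     # round(three, 3) == 0.648
many = majority_vote_accuracy(0.6, 21)     # larger ensemble, higher still
```

The gain depends on p being above 0.5 and on the errors being independent; strongly correlated models gain far less from voting.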

The virtual patient information generation unit 140 uses the machine learning AI model 150m to generate, for each virtual patient group generated by the virtual patient group generation unit 130, information of the virtual patients belonging to that group. Various statistical techniques may be used to verify the generated virtual patient information and confirm its reliability. To elaborate on the virtual patient information generation unit 140 and the virtual patient group generation unit 130: given, for example, a hypertension patient group, a diabetes patient group, and a hyperlipidemia patient group, these can be combined to generate a variety of patient groups and their information. That is, it is possible to generate a patient group (and its information) suffering from all three of hypertension, diabetes, and hyperlipidemia; patient groups suffering from "hypertension + diabetes", "hypertension + hyperlipidemia", or "diabetes + hyperlipidemia"; and patient groups suffering from only one of hypertension, diabetes, and hyperlipidemia.
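The enumeration of combined patient groups in the example above can be sketched with the standard library. The function name `comorbidity_groups` is hypothetical. For three base diseases the sketch yields one triple, three pairs, and three single-disease groups, seven groups in total.

```python
from itertools import combinations


def comorbidity_groups(diseases):
    """List every non-empty combination of diseases, largest first."""
    return [combo
            for size in range(len(diseases), 0, -1)
            for combo in combinations(diseases, size)]


groups = comorbidity_groups(["hypertension", "diabetes", "hyperlipidemia"])
for group in groups:
    print(" + ".join(group))
```

In general, n base diseases give 2^n - 1 non-empty combinations, so the number of virtual patient groups grows quickly as diseases are added.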

The control unit 150 checks the status of and controls the operation of the information collection unit 110, the information generation unit 120, the virtual patient group generation unit 130, and the virtual patient information generation unit 140, and transmits control commands for the machine learning AI model 150m to form the group corresponding to each disease, generate new virtual patient groups, and generate virtual patient information. The control unit 150 may be composed of a microprocessor, a microcontroller, or the like.

The virtual patient information generation system 100 using machine learning according to the present invention, configured as above, may preferably further include a data verification/correction unit 160 that verifies the reliability of the virtual patient information (data) generated by the virtual patient information generation unit 140 through statistical programs including SPSS and R Studio, and corrects it if necessary.

In addition, the virtual patient information generation system 100 using machine learning according to the present invention, configured as above, may also be implemented as a single computer system (for example, a desktop computer).

Next, a method for generating virtual patient information based on the virtual patient information generation system using machine learning according to the present invention, configured as above, will be described.

FIG. 2 is a flowchart illustrating the execution process of a method for generating virtual patient information using machine learning according to an embodiment of the present invention.

Referring to FIG. 2, the method for generating virtual patient information using machine learning according to the present invention is based on the virtual patient information generation system 100 using machine learning, which includes the information collection unit 110, the information generation unit 120, the virtual patient group generation unit 130, the virtual patient information generation unit 140, and the control unit 150 described above. First, the information collection unit 110 collects publicly disclosed and usable disease-related information or patient biometric information (step S201). Here, the publicly disclosed and usable disease-related information or patient biometric information may include at least one of: information or data described in medical books or medical papers; patient biometric data that are public and legally usable; and patient medical image data that are public and legally usable (for example, X-ray, CT, MRI, and ultrasound).

Once information has been collected by the information collection unit 110, the information generation unit 120 uses the machine learning AI model 150m, based on the collected information, to form a group corresponding to each disease and generates information (data) of the patients belonging to each formed group (step S202). In generating the information (data) of the patients belonging to each group, the patient population of each group may be assumed to follow a standard normal distribution and, after the mean is determined, information may be generated for the 90% of normally distributed patients that remain after excluding the upper 5% and the lower 5%. To add to the explanation of removing the upper and lower 5%: this does not mean that the upper and lower 5% are removed from every dataset. This is because the information held by patients in the upper or lower 5% can sometimes be more meaningful than the information held by the 90% of normally distributed patients. Therefore, the present invention introduces a method that thoroughly analyzes the base dataset underlying the generation of virtual patient information, classifies minority samples according to their location into noise samples, unstable samples, boundary samples, and samples that follow normality, and oversamples them according to the characteristics of each sample type. Here, oversampling refers to resolving data imbalance by increasing the number of data points in a low-ratio class, because a model is unlikely to be trained properly when the data have an imbalanced distribution.
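The two ideas in this paragraph, generating values from the central 90% of a normal distribution and oversampling a low-ratio class, can be sketched as below. This is a minimal illustration under stated assumptions: the mean and standard deviation are hypothetical inputs, the function names are not from the specification, and plain random duplication stands in for the sample-type-specific oversampling (noise, unstable, boundary, normal) that the description mentions but does not detail.

```python
import random
from statistics import NormalDist


def sample_central_90(mean, stdev, n, seed=0):
    """Draw n values from N(mean, stdev) restricted to the 5th-95th percentile range."""
    dist = NormalDist(mean, stdev)
    low, high = dist.inv_cdf(0.05), dist.inv_cdf(0.95)
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        x = rng.gauss(mean, stdev)
        if low <= x <= high:  # rejection sampling: discard both 5% tails
            values.append(x)
    return values


def random_oversample(minority, target_size, seed=0):
    """Duplicate randomly chosen minority records until target_size is reached."""
    rng = random.Random(seed)
    shortfall = max(0, target_size - len(minority))
    return list(minority) + [rng.choice(minority) for _ in range(shortfall)]


# Hypothetical example: systolic blood pressure for a hypertension group.
systolic = sample_central_90(mean=150.0, stdev=12.0, n=1000)
balanced = random_oversample(["rare_case_a", "rare_case_b"], target_size=6)
```

Whether the tails should be trimmed at all is a per-dataset decision, as the paragraph above notes; the trimming function would simply be skipped for datasets where the extreme 5% carry meaningful information.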

In addition, when the information generation unit 120 forms the group corresponding to each disease using the machine learning AI model 150m based on the collected information, it may check whether the collected items of information are correlated with one another. Correlated items are grouped together after their correlation is analyzed, and uncorrelated items are randomly mixed among themselves, to generate the virtual patient group for the disease in question.
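One possible reading of this correlation check is sketched below. The assumptions are mine, not the specification's: correlation is measured with the Pearson coefficient, a threshold of 0.7 decides what counts as correlated, and "randomly mixing" uncorrelated items is interpreted as shuffling each such variable independently across records. All names and data values are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def split_by_correlation(columns, threshold=0.7):
    """Return (correlated, uncorrelated) variable names for a dict of columns."""
    correlated = set()
    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            correlated.update((a, b))
    uncorrelated = [name for name in columns if name not in correlated]
    return sorted(correlated), uncorrelated


def shuffle_uncorrelated(columns, uncorrelated, seed=0):
    """Shuffle each uncorrelated column independently; keep correlated columns aligned."""
    rng = random.Random(seed)
    mixed = {name: list(values) for name, values in columns.items()}
    for name in uncorrelated:
        rng.shuffle(mixed[name])
    return mixed


# Hypothetical data: five records, three variables.
columns = {
    "bmi": [22, 25, 28, 31, 34],
    "glucose": [90, 101, 118, 130, 149],
    "noise_var": [265, 240, 270, 245, 260],
}
correlated, uncorrelated = split_by_correlation(columns)
mixed = shuffle_uncorrelated(columns, uncorrelated)
```

Keeping correlated columns aligned preserves the relationship between, for example, body mass index and glucose in each synthetic record, while shuffling the remaining columns adds variety without breaking any relationship the data support.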

Once the information (data) of the patients belonging to each group has been generated by the information generation unit 120 as described above, the virtual patient group generation unit 130 uses the machine learning AI model 150m, based on the generated patient information (data) of each group, to generate new virtual patient groups each suffering from complex diseases (step S203). In doing so, a new virtual patient group may be generated by combining the patient information (data) for each disease using machine learning techniques, including an ensemble technique.

Thereafter, the virtual patient information generation unit 140 uses the machine learning AI model 150m to generate, for each generated virtual patient group, information on the virtual patients belonging to that group (step S204).

Here, the method for generating virtual patient information using machine learning according to the present invention may preferably further include, after step S204, a step in which the data verification/correction unit 160 verifies the reliability of the data through statistical programs including SPSS and R Studio and corrects the data if necessary.

The generation of virtual patient groups by the virtual patient group generation unit 130 and the generation of virtual patient information by the virtual patient information generation unit 140 are explained a little further here.

In generating a virtual patient group, the first matter to be judged is whether the actual data on hand relating to diseases or patient biometric information reflect the characteristics of the population well. If the data on hand are regarded as a sample that reflects the population well, then generating new data (information) from them and using it together with the existing data can solve the problem of insufficient data to some extent, and can also help research on generating virtual patient groups and virtual patient information.

Therefore, rather than generating data from nothing, new data should be generated based on existing data and used in cases where data variables overlap. For example, in the case of diabetes mentioned above, the actual data of diabetic patients include variables such as body mass index, abdominal obesity rate, and blood sugar. Even in data collected not for a diabetes task but to investigate obesity or heart disease, if diabetic patients are present, they can be extracted and the overlapping variables combined to generate data.

In addition, similar problems have recently been widely investigated in the field of artificial intelligence.

Supervised learning refers to training in which the labels for the available data are already determined. In other words, training proceeds with the answer known for what each data point represents.

Unsupervised learning, unlike the supervised learning above, uses data to which no labels have been assigned. In such cases, the present invention analyzes how the unlabeled data are structured on the basis of existing data. The method of the present invention and medical data can also be applied to such unsupervised learning methods.

These steps can be summarized in order as follows.

1) Collect as much data on diabetic patients as can be collected.

2) Identify which variables are present in the collected data and how strongly each is correlated with diabetes.

3) Rank the variables in order of their correlation with diabetes, highest first, then look for other open data that contain them.

4) If the variables considered important for diabetic patients are "age, sex, body mass index, abdominal obesity rate, blood sugar, and blood pressure", find and collect all open data that contain these variables, even if the data were not collected for the purpose of determining diabetes.

5) Cluster the existing diabetes data into n clusters. If the goal is to decide whether or not a patient has diabetes, use two clusters.

6) Merge the newly collected data with the existing diabetes data on the overlapping variables.

7) Using clustering methods, divide the data, including the newly added data, into n clusters.

8) Check which cluster each newly added record falls into and label it accordingly.

9) Once all data have labels, perform clustering again as shown in FIG. 3 and remove outliers.
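Steps 6) to 8) above can be sketched as follows. The specification does not name a clustering algorithm, so plain k-means is assumed here; the record fields, values, and function names are hypothetical. Seeding the centroids with the first k points and skipping feature scaling are simplifications, and the outlier removal of step 9) is not shown.

```python
from math import dist


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


# Existing diabetes data and newly collected open data with partly different variables.
existing = [
    {"bmi": 22, "glucose": 90, "hba1c": 5.2},
    {"bmi": 31, "glucose": 150, "hba1c": 7.9},
    {"bmi": 23, "glucose": 95, "hba1c": 5.4},
    {"bmi": 30, "glucose": 145, "hba1c": 7.5},
    {"bmi": 21, "glucose": 88, "hba1c": 5.1},
    {"bmi": 32, "glucose": 160, "hba1c": 8.3},
]
new = [
    {"bmi": 24, "glucose": 98, "ldl": 110},
    {"bmi": 29, "glucose": 140, "ldl": 165},
]

# Step 6: merge on the overlapping variables only.
shared = sorted(set(existing[0]) & set(new[0]))
merged = [tuple(record[v] for v in shared) for record in existing + new]

# Step 7: cluster existing and new records together (n = 2 here).
centroids = kmeans(merged, k=2)

# Step 8: label each newly added record with the cluster it falls into.
new_labels = [nearest(p, centroids) for p in merged[len(existing):]]
```

In practice the variables would likely be standardized before clustering, since glucose and body mass index are on different scales and the larger-valued variable otherwise dominates the distance.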

Meanwhile, the statement above that the method "may further include, after step S204, a step in which the data verification/correction unit 160 verifies the reliability of the data through statistical programs including SPSS and R Studio and corrects the data if necessary" is explained a little further here.

FIGS. 4A and 4B are diagrams illustrating an evaluation tool, introduced in the present invention, with which doctors evaluate the data of the generated virtual patients.

Referring to FIG. 4A, this is an evaluation tool with which doctors (at least three) evaluate the data of virtual patients generated by the virtual patient information generation system and method using machine learning according to the present invention. First, as in (a), basic symptoms (for example, level of consciousness, blood pressure, pulse, respiratory status, body temperature, and pain intensity) are entered for an arbitrary virtual patient ('Hong Gil-dong' in this example). Then, as in (b) and (c), the applicable symptoms or items are selected for various body systems (for example, the respiratory, circulatory, musculoskeletal, and urinary systems), specific sites (for example, the eyes, nose, and ears), or specific matters (for example, mental health and substance misuse).

Next, as in (d) of FIG. 4B, all applicable symptoms are selected and, according to the selection made in (d), answers to the resulting questions are selected as in (e). Finally, as in (f), the medical interview results for the virtual patient are output. The evaluation tool described above is a single software program (a kind of application) and may be provided on the personal terminal (an ordinary PC, smartphone, or the like) of medical staff who use the present invention.

For the data acquired with the evaluation tool described above, at least three doctors directly review whether each record is suitable as data, and patient data that a majority of them judge unsuitable are extracted and removed. This series of steps is repeated multiple times until the researcher (or medical staff) has secured the desired amount of data. By combining the existing data with the generated virtual patient information (data) in this way, the result can be used as source data for virtual patient information using machine learning in the present invention. In addition, the virtual patient information generated in this way can be usefully applied to the care and treatment of general patients and, in particular, emergency patients.
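The majority-review rule described above can be sketched as a simple filter. The record identifiers and the function name are hypothetical, and treating "majority" as strictly more than half of the reviewers (so that a tie keeps the record) is an assumption not spelled out in the description.

```python
def filter_by_majority(votes, min_reviewers=3):
    """Keep a record unless more than half of its reviewers judged it unsuitable.

    votes maps a record id to a list of booleans, one per doctor (True = suitable).
    """
    kept = []
    for record_id, ballots in votes.items():
        if len(ballots) < min_reviewers:
            raise ValueError(f"{record_id}: needs at least {min_reviewers} reviewers")
        unsuitable = ballots.count(False)
        if unsuitable * 2 <= len(ballots):  # no strict majority against the record
            kept.append(record_id)
    return kept


# Hypothetical review results from three doctors per virtual patient.
votes = {
    "vp-001": [True, True, False],
    "vp-002": [False, False, True],
    "vp-003": [True, True, True],
}
kept = filter_by_majority(votes)
```

Repeating generation and review until enough records survive, as the paragraph describes, would wrap this filter in a loop that stops once `len(kept)` reaches the amount of data the researcher wants.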

As described above, the system and method for generating virtual patient information using machine learning according to the present invention use a machine learning AI model to combine patient biometric information, medical images, and the like to generate and provide virtual patient information at a level similar to that of real patients, which has the advantage of being of great help in the care and treatment of general patients as well as emergency patients in the medical field.

Although the present invention has been described in detail above through preferred embodiments, the present invention is not limited thereto, and it will be apparent to those of ordinary skill in the art that various changes and applications can be made without departing from the technical spirit of the present invention. Accordingly, the true scope of protection of the present invention should be construed according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

100: Virtual patient information generation system using machine learning (present invention)
110: Information collection unit 120: Information generation unit
130: Virtual patient group generation unit 140: Virtual patient information generation unit
150: Control unit 150m: Machine learning AI model
160: Data verification/correction unit

Claims

1. A virtual patient information generation system using machine learning, comprising:
an information collection unit that collects publicly disclosed and usable disease-related information or patient biometric information;
an information generation unit that, based on the information collected by the information collection unit, uses a machine learning AI (Artificial Intelligence) model to form a group corresponding to each disease and generates information (data) of patients belonging to each formed group;
a virtual patient group generation unit that, based on the patient information (data) of each group generated by the information generation unit, uses the machine learning AI model to generate new virtual patient groups each suffering from complex diseases;
a virtual patient information generation unit that, for each virtual patient group generated by the virtual patient group generation unit, uses the machine learning AI model to generate information on virtual patients belonging to that virtual patient group; and
a control unit that checks the status of and controls the operation of the information collection unit, the information generation unit, the virtual patient group generation unit, and the virtual patient information generation unit, and transmits control commands for the machine learning AI model to form the group corresponding to each disease, generate new virtual patient groups, and generate virtual patient information.

2. The system of claim 1,
further comprising a data verification/correction unit that verifies the reliability of the virtual patient information (data) generated by the virtual patient information generation unit through statistical programs including SPSS and R Studio, and corrects it if necessary.

3. The system of claim 1,
wherein the publicly disclosed and usable disease-related information or patient biometric information collected by the information collection unit includes at least one of: information or data described in medical books or medical papers; patient biometric data that are public and legally usable; and patient medical image data that are public and legally usable.

4. The system of claim 1,
wherein, when the information generation unit generates the information (data) of patients belonging to each group, the patient population belonging to each group is assumed to follow a standard normal distribution and, after the mean is determined, information is generated for the 90% of normally distributed patients that remain after excluding the upper 5% and the lower 5%.

5. The system of claim 1,
wherein, when the information generation unit forms the group corresponding to each disease using the machine learning AI model based on the collected information, it checks whether the collected items of information are correlated with one another, groups correlated items together after analyzing their correlation, and randomly mixes uncorrelated items among themselves, to generate the virtual patient group for the disease in question.

6. The system of claim 1,
wherein, when the information generation unit generates the patient information (data), standard normal distribution data are generated for patient information composed of letters and numbers using a machine learning tool including the statistical program R Studio.

7. The system of claim 1,
wherein, when the virtual patient group generation unit generates the new virtual patient groups each suffering from complex diseases using the machine learning AI model based on the patient information (data) of each group, it generates a new virtual patient group by combining the patient information (data) for each disease using machine learning techniques including an ensemble technique.

8. A method of generating virtual patient information using machine learning, comprising:
a) collecting, by an information collection unit, publicly disclosed and usable disease-related information or patient biometric information;
b) forming, by an information generation unit, a group corresponding to each disease using a machine learning AI model based on the collected information, and generating information (data) of patients belonging to each formed group;
c) generating, by a virtual patient group generation unit, new virtual patient groups each suffering from complex diseases using the machine learning AI model based on the patient information (data) of each group generated in step b); and
d) generating, by a virtual patient information generation unit, information on virtual patients belonging to each virtual patient group using the machine learning AI model for each generated virtual patient group.

9. The method of claim 8,
further comprising, after step d), a step in which a data verification/correction unit verifies the reliability of the data through statistical programs including SPSS and R Studio and corrects the data if necessary.

10. The method of claim 8,
wherein, in step a), the publicly disclosed and usable disease-related information or patient biometric information includes at least one of: information or data described in medical books or medical papers; patient biometric data that are public and legally usable; and patient medical image data that are public and legally usable.

11. The method of claim 8,
wherein, in generating the information (data) of patients belonging to each group in step b), the patient population belonging to each group is assumed to follow a standard normal distribution and, after the mean is determined, information is generated for the 90% of normally distributed patients that remain after excluding the upper 5% and the lower 5%.

12. The method of claim 8,
wherein, in forming the group corresponding to each disease using the machine learning AI model based on the collected information in step b), it is checked whether the collected items of information are correlated with one another, correlated items are grouped together after their correlation is analyzed, and uncorrelated items are randomly mixed among themselves, to generate the virtual patient group for the disease in question.

13. The method of claim 8,
wherein, in generating the patient information (data) in step b), standard normal distribution data are generated for patient information composed of letters and numbers using a machine learning tool including the statistical program R Studio.

14. The method of claim 8,
wherein, in generating the new virtual patient groups each suffering from complex diseases using the machine learning AI model based on the patient information (data) of each group in step c), a new virtual patient group is generated by combining the patient information (data) for each disease using machine learning techniques including an ensemble technique.