KR20220064243A

KR20220064243A - Artificial intelligence-based abnormal reaction monitoring method and system therefor

Info

Publication number: KR20220064243A
Application number: KR1020200150536A
Authority: KR
Inventors: 윤덕용; 장종환; 김유정; 박남기
Original assignee: 아주대학교산학협력단
Priority date: 2020-11-11
Filing date: 2020-11-11
Publication date: 2022-05-18
Also published as: KR102559641B1

Abstract

The present specification relates to an artificial intelligence-based method for monitoring abnormal reactions and an apparatus therefor. More specifically, the disclosed method includes: extracting, from a database storing medical data related to patients' medical records, input data for training a machine learning model based on the medical data; inputting the extracted input data into the machine learning model and training the model so that a feature importance is assigned to each of the elements included in the input data, where each feature importance is calculated based on the degree of correlation, as determined through training of the machine learning model, between a specific element and the abnormal reaction; and determining, based on the feature importance, at least one suspect element related to the occurrence of the abnormal reaction from among all the elements. As a result, the number of substances that must be tested as possible causes of the reaction is reduced.

Description

Artificial intelligence-based abnormal reaction monitoring method and system therefor

The present invention relates to an artificial intelligence-based abnormal reaction monitoring method and a system therefor, and more particularly to a method and system for monitoring adverse reactions based on feature importance obtained while training a machine learning model.

As interest in adverse drug reaction monitoring has grown and its importance has been emphasized in recent years, the Food and Drug Administration, medical institutions where drugs are prescribed and prepared, and pharmaceutical companies that produce drugs have been working to monitor adverse drug reactions proactively and to respond to them.

Active surveillance of adverse reactions currently relies mainly on statistical modeling. According to leading academic journals, statistical modeling is suited to drawing inferences about a population, whereas machine learning is suited to finding generalizable patterns. More specifically, a data mining technique called the tree-based scan statistic has been used alongside statistical modeling for active surveillance of adverse reactions. Tree-scan-based analysis has limited analytical power: it can be applied only to analysis targets that have a hierarchical structure, and it uses only the presence or absence of a diagnosis or prescription, so quantitative information such as the number of diagnoses or prescriptions is not reflected. It is further limited in that interactions between drugs in different layers of the hierarchy are not considered.

Besides tree-scan-based analysis, another machine learning-based approach to adverse reaction research is analysis based on natural language processing, which can identify adverse reactions indirectly through literature review. However, adverse reactions inferred this way are limited in that natural language processing is itself inexact and in that the inferred adverse reactions are not based on direct study of prescription and diagnosis data. In addition, existing studies based on statistical modeling find the optimal solution in closed form over terabyte-scale data. The computing power and amount of computation required to do so are so large that the approach is difficult to apply to big data.

An object of the present specification is to provide an artificial intelligence-based abnormal reaction monitoring method and a system therefor.

Another object of the present specification is to provide a method, and a system therefor, for detecting a suspect element that causes an adverse reaction by using the feature importance of a machine learning model.

The technical problems to be solved by the present specification are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention belongs.

The present specification provides an artificial intelligence-based abnormal reaction monitoring method and a system therefor.

More specifically, the method of the present specification includes: extracting, from a database storing medical data related to patients' medical records, input data for training a machine learning model based on the medical data; inputting the extracted input data into the machine learning model and training the machine learning model so that a feature importance is assigned to each of the elements included in the input data, where each feature importance is calculated based on the degree of correlation, as determined through training of the machine learning model, between a specific element and the adverse reaction; and determining, based on the feature importance, at least one suspect element related to the occurrence of the adverse reaction from among all the elements.

In addition, extracting the input data may further include extracting, for each patient, adverse reaction occurrence time data related to the time at which the adverse reaction occurred in that patient.

In addition, extracting the input data may further include extracting, based on the adverse reaction occurrence time data extracted for each patient, event interval data related to a specific patient's medical records during a specific period before the time at which the adverse reaction occurred in that patient.

In addition, extracting the input data may further include extracting control interval data related to the specific patient's medical records during the specific period before an arbitrary time other than the time at which the adverse reaction occurred in that patient.

In addition, the medical records included in each of the event interval data and the control interval data for the patients may include at least one of a record related to the diagnosis of a specific symptom, a record related to a drug prescription, or a record related to a vaccination.

In addition, a time interval of a specific length may be inserted between the time interval from which the event interval data is extracted and the time interval from which the control interval data is extracted.

In addition, because of the inserted time interval, the time interval from which the event interval data is extracted and the time interval from which the control interval data is extracted may be separated in time without overlapping.

In addition, the machine learning model may be trained to distinguish the event interval data from the control interval data by assigning the feature importance to each of the elements.

In addition, the method may further include determining whether the adverse reaction occurs by inputting specific input data, which is either event interval data or control interval data, into the machine learning model whose training has been completed.

In addition, a higher feature importance may be assigned to an element determined to have a relatively high degree of correlation.

In addition, the at least one suspect element may consist of those elements, among all the elements, whose feature importance is greater than a specific threshold.

In addition, the specific threshold may be set to twice the mean of the feature importances assigned to all the elements.
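The threshold rule described above (keep the elements whose feature importance exceeds twice the mean importance) can be illustrated with a minimal Python sketch. The function name select_suspect_elements, the element names, and the importance values are hypothetical and chosen only for illustration; they do not come from the specification.

```python
from statistics import mean


def select_suspect_elements(importance, factor=2.0):
    """Return the elements whose importance exceeds factor * mean importance.

    importance: dict mapping element name -> feature importance (assumed non-negative).
    factor: multiplier on the mean; 2.0 mirrors the "twice the mean" rule.
    """
    threshold = factor * mean(importance.values())
    suspects = [name for name, score in importance.items() if score > threshold]
    # Sort so that the most important suspect comes first.
    return sorted(suspects, key=lambda name: -importance[name])


# Hypothetical importances, for illustration only.
importance = {
    "drug_A": 0.50,
    "drug_B": 0.05,
    "vaccine_C": 0.30,
    "diagnosis_D": 0.05,
    "drug_E": 0.10,
}
suspects = select_suspect_elements(importance)
print(suspects)
```

In this toy input the mean importance is 0.2, so the threshold is 0.4 and only drug_A is retained. Because the threshold is relative to the mean, it adapts to the number of elements without a hand-tuned absolute cutoff.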

In addition, the adverse reaction monitoring system provided in the present specification includes: a database storing medical data related to patients' medical records; a machine learning model; and a control unit. The control unit extracts, from the database, input data for training the machine learning model based on the medical data; inputs the extracted input data into the machine learning model and trains the machine learning model so that a feature importance is assigned to each of the elements included in the input data, where each feature importance is calculated based on the degree of correlation, as determined through training of the machine learning model, between a specific element and the adverse reaction; and determines, based on the feature importance, at least one suspect element related to the occurrence of the adverse reaction from among all the elements.

In addition, to extract the input data, the control unit may extract, for each patient, adverse reaction occurrence time data related to the time at which the adverse reaction occurred in that patient.

In addition, to extract the input data, the control unit may extract, based on the adverse reaction occurrence time data extracted for each patient, event interval data related to a specific patient's medical records during a specific period before the time at which the adverse reaction occurred in that patient.

In addition, to extract the input data, the control unit may extract control interval data related to the specific patient's medical records during the specific period before an arbitrary time other than the time at which the adverse reaction occurred in that patient.

In addition, the medical records included in each of the event interval data and the control interval data for the patients may include at least one of a record related to the diagnosis of a specific symptom, a record related to a drug prescription, or a record related to a vaccination.

The present specification has the effect of enabling adverse reactions to be monitored based on artificial intelligence.

In addition, the present specification has the effect of enabling a suspect element that causes an adverse reaction to be detected by using the feature importance of a machine learning model.

In addition, by detecting the suspect elements that cause an adverse reaction, the present specification has the effect of reducing the number of substances that must be tested as possible causes of the adverse reaction.

The effects obtainable from the present specification are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention belongs.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are included as part of the detailed description to aid understanding of the present invention, provide embodiments of the present invention and, together with the detailed description, explain its technical features.
FIG. 1 is a block diagram of an artificial intelligence-based abnormal reaction monitoring system according to an embodiment of the present invention.
FIG. 2 is a flowchart of an artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of extracting input data for training a machine learning model in the artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example intended to aid understanding of the training process of the machine learning model in the artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating an example intended to aid understanding of the training process of the machine learning model in the artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating an example of the process of training the machine learning model in the artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the result of monitoring suspect drugs that cause an adverse reaction using the artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating an example of operations implemented in the control unit to perform the artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention.

The accompanying drawings, which are included as part of the detailed description to aid understanding of the present invention, provide embodiments of the present invention and, together with the detailed description, explain its technical features. Throughout the specification, the same reference numerals denote, in principle, the same elements. Where a detailed description of a known function or configuration related to the present invention would unnecessarily obscure the gist of the invention, that description is omitted. It should also be noted that the accompanying drawings are provided only to make the idea of the present invention easy to understand, and the idea of the present invention should not be construed as being limited by them.

Hereinafter, the method and apparatus related to the present invention are described in more detail with reference to the drawings. General terms used in the present invention should be interpreted as defined in dictionaries or according to context, and should not be interpreted in an excessively narrow sense. As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise. In the present application, terms such as "consists of" or "includes" should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may be absent, and additional components or steps may be included. The suffixes "unit", "module", and "part" used for components in the following description are assigned or used interchangeably only for ease of drafting and do not by themselves have distinct meanings or roles. Terms such as "first" and "second" serve to distinguish one component from another, and the scope of rights should not be limited by these terms.

Hereinafter, embodiments according to the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of an artificial intelligence-based abnormal reaction monitoring system according to an embodiment of the present invention, and FIG. 2 is a flowchart of an artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, in an embodiment of the present invention the control unit 130 may extract, from the database 110, data for training a machine learning model (S210). More specifically, the control unit 130 may first extract from the database 110 the data from which the training data for the machine learning model 120 will be derived, and then extract the training data for the machine learning model based on that first-extracted data. The process in which the control unit 130 extracts the training data from the first-extracted data may be referred to as data preprocessing. Here, the database 110 may store medical data related to patients' medical records.

Thereafter, the control unit 130 may train the machine learning model 120 using the training data extracted in step S210 (S220). More specifically, the extracted training data may take the form of specific input values (X) paired with the output values (Y) for those input values. Training the machine learning model 120 can be understood as inferring the relationship between an arbitrary input value (X) and the output value (Y) for that input value. That is, the trained machine learning model 120 has learned the relationship between input values and output values, so when an arbitrary input value (X) is fed into the trained model, the model can output an output value (Y) based on the inferred relationship. In the present invention, the input value (X) may be a suspect drug, a vaccination, and the like, and the output value (Y) may be whether an adverse reaction occurred. Hereinafter, references in this specification to training the machine learning model 120 are to be understood in this sense.

Next, the control unit 130 may determine the suspect element that caused the adverse reaction based on the result of training the machine learning model 120 (S230). More specifically, determining the suspect element based on the training result means using the relationship itself between the input value (X) and the output value (Y) that the machine learning model 120 inferred during training. In other words, to determine the suspect element that caused the adverse reaction, the present invention does not rely on the ability of the machine learning model 120 to produce, for a specific input value (X), the corresponding output value (Y) from the inferred relationship. Instead, it uses the inferred relationship between input values and output values directly.
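The idea of reading the learned relationship itself, rather than the model's predictions, can be illustrated with a small Python sketch. This passage does not name a model type or an importance measure, so the sketch makes two loudly labeled assumptions: a plain logistic regression trained by gradient descent stands in for the machine learning model 120, and the normalized absolute weight of each input stands in for its feature importance. The data, feature names, and the function train_logistic are hypothetical.

```python
import math


def train_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit a logistic regression by batch gradient descent (stand-in model, an assumption)."""
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of Y = 1
            err = p - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical toy data: each row counts [drug_A, drug_B] records in one interval.
feature_names = ["drug_A", "drug_B"]
rows = [
    [2, 1], [1, 0], [3, 1], [1, 1],          # event intervals (Y = 1)
    [0, 1], [0, 0], [0, 1], [0, 1],          # control intervals (Y = 0)
    [0, 0], [0, 1], [0, 0], [0, 1],
]
labels = [1] * 4 + [0] * 8

weights, _ = train_logistic(rows, labels)

# Assumed importance measure: absolute weight, normalized to sum to 1.
total = sum(abs(w) for w in weights)
importance = {name: abs(w) / total for name, w in zip(feature_names, weights)}
print(importance)
```

No prediction is ever requested from the trained model here; only its parameters are inspected. In this toy data drug_A appears only in event intervals, so it receives the larger share of importance. Absolute weights depend on feature scale, so a real system would likely use a model-specific importance measure instead.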

Each of steps S210 to S230 described above is examined in more detail below.

Method of extracting input data for training the machine learning model

FIG. 3 is a diagram illustrating an example of extracting input data for training a machine learning model in the artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention.

With reference to FIG. 3, the following describes in detail how the control unit 130 extracts input data for machine learning, based on the medical data, from the database 110 that stores medical data related to patients' medical records.

Referring to FIG. 3, the control unit 130 may first extract medical data related to patients' medical records from the database 110. The medical data may be extracted separately for each patient (301 to 303). The extraction may be carried out between the control unit 130 and the database 110 through their respective communication units.

Referring again to FIG. 3, to extract the input data for machine learning, the control unit 130 may analyze the extracted medical data and extract, for each patient, data related to the time at which an adverse reaction occurred in that patient (310 to 330). This data may be referred to as adverse reaction occurrence time data, and may of course be expressed in other ways that are interpreted identically or similarly. In FIG. 3, adverse reactions occurred in patient A (310) and patient C (330), but not in patient B (320).

Referring again to FIG. 3, to extract the input data for machine learning, the control unit 130 may extract, based on the adverse reaction occurrence time data extracted for each patient, data related to a specific patient's medical records during a specific period before the time at which the adverse reaction occurred in that patient (312, 331, and 333). This data may be referred to as event interval data, and may of course be expressed in other ways that are interpreted identically or similarly. The specific period may be set so that it includes the time at which the adverse reaction occurred. The length of the specific period from which the event interval data is extracted may be set flexibly according to the characteristics of the target adverse reaction. For example, if the target adverse reaction occurs shortly after the suspect element associated with it (that is, a drug prescription, a vaccination, and the like), the specific period may be set to be short. Conversely, if the target adverse reaction occurs long after the suspect element associated with it, the specific period may be set to be long.
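The event interval extraction described above can be sketched in Python as follows. The record layout (a list of (date, code) tuples), the function name event_interval_records, and the sample values are assumptions made for illustration; the specification does not prescribe a data format. The window here includes the event date itself, matching the statement that the specific period may include the time at which the adverse reaction occurred.

```python
from datetime import date, timedelta


def event_interval_records(records, event_date, window_days):
    """Return the records that fall in the window_days ending on event_date (inclusive).

    records: list of (record_date, code) tuples for one patient (hypothetical layout).
    window_days: length of the look-back period; shorter for fast-onset reactions,
                 longer for delayed ones.
    """
    start = event_date - timedelta(days=window_days)
    return [(d, code) for d, code in records if start < d <= event_date]


# Hypothetical patient history, for illustration only.
records = [
    (date(2020, 1, 1), "drug_A"),
    (date(2020, 1, 20), "drug_B"),
    (date(2020, 2, 1), "vaccine_C"),
]
window = event_interval_records(records, event_date=date(2020, 2, 1), window_days=14)
codes = [code for _, code in window]
print(codes)
```

With a 14-day window ending on 2020-02-01, the drug_A record from 2020-01-01 falls outside the interval and only drug_B and vaccine_C are kept.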

다시 도 3을 참조하면, 제어부(130)는, 상기 머신러닝 학습을 위한 입력 데이터를 추출하기 위해, 이상 반응이 특정 환자에게 발생한 시점 외의 임의의 시점 이전 상기 특정 기간 동안의 상기 특정 환자의 의료 기록과 관련된 데이터를 추출할 수 있다(311, 313, 321, 322, 323 및 332). 상기 이상 반응이 특정 환자에게 발생한 시점 외의 임의의 시점 이전 상기 특정 기간 동안의 상기 특정 환자의 의료 기록과 관련된 데이터는 제어(control) 구간 데이터로 호칭될 수 있으며, 이와 동일 유사하게 해석되는 범위에서 다양하게 표현될 수 있음은 물론이다.Referring back to FIG. 3 , in order to extract the input data for machine learning learning, the controller 130 , the medical record of the specific patient for the specific period before any point other than the time when the abnormal reaction occurred in the specific patient. Data related to can be extracted (311, 313, 321, 322, 323 and 332). Data related to the medical record of the specific patient for the specific period before any time other than the time when the adverse event occurred in the specific patient may be referred to as control interval data, and may vary within the same similarly interpreted range. It is, of course, possible to express

Each extracted item of event interval data and control interval data may include information such as the patients' diagnoses, drug prescriptions, and vaccinations. The diagnoses, drug prescriptions, vaccinations, and the like may each comprise information on a plurality of different types of diagnosis, drug prescription, or vaccination. In addition, each extracted item of event interval data and control interval data may further include patient information such as sex and age. In the present specification, however, among the information included in the extracted event interval data and control interval data, the information related to drug prescriptions and vaccinations is preferably used for monitoring suspicious elements that cause abnormal reactions.

In addition, the total number of extracted event interval data items and the total number of extracted control interval data items may be kept at a fixed ratio. For example, the ratio of the total number of event interval data items to the total number of control interval data items may be 1:2.

Referring again to FIG. 3, to extract the input data for training the machine learning model, the control unit 130 may insert a time interval of a specific length between the time interval from which the event interval data is extracted and the time interval from which the control interval data is extracted. The inserted interval may be referred to as a washout interval, and may of course be expressed in other ways within a range interpreted identically or similarly. Because of the inserted interval, the time interval from which the event interval data is extracted and the time interval from which the control interval data is extracted are separated in time without overlapping. If the event interval data and the control interval data were not separated and instead overlapped, the training of the machine learning model 120 based on the input data could be performed inaccurately. Configuring the extraction so that the two time intervals do not overlap therefore removes the influence that the event interval data and the control interval data would otherwise have on each other, so that the machine learning model 120 can be trained accurately.
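The interval layout described above can be sketched as follows. This is only an illustration under stated assumptions: day-granularity dates, hypothetical helper names (`event_window`, `control_window`, `overlaps`), and hypothetical values for the interval length and washout length, none of which are fixed by the specification. For simplicity the sketch places the control interval directly before the washout interval, whereas the description allows any time point other than the onset of the abnormal reaction.

```python
from datetime import date, timedelta

def event_window(onset, window_days):
    """Interval of `window_days` days that ends on, and includes, the onset date."""
    return onset - timedelta(days=window_days - 1), onset

def control_window(event_start, window_days, washout_days):
    """Interval of the same length, ending `washout_days` full days before the event interval starts."""
    end = event_start - timedelta(days=washout_days + 1)
    return end - timedelta(days=window_days - 1), end

def overlaps(a, b):
    """True if two closed (start, end) intervals share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical values chosen only for illustration.
onset = date(2020, 11, 11)
ev = event_window(onset, window_days=30)
ct = control_window(ev[0], window_days=30, washout_days=14)
```

With these assumed values the two intervals are separated by 14 full days, so `overlaps(ev, ct)` is false.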

An advantage of the present method is that the input data is not built solely from whether a drug was prescribed or a vaccine was administered; quantitative information, such as the number of drug prescriptions or vaccinations, can also be taken into account. In addition, because quantitative information can be considered, a training result of the machine learning model 120 that reflects interactions between different suspicious elements can be obtained.
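The difference between presence-only input and quantitative input can be illustrated with a small sketch. The function names, the element labels, and the representation of an interval as a flat list of recorded elements are assumptions made for this example only; the specification does not prescribe a data layout.

```python
from collections import Counter

def presence_features(records, elements):
    """1 if the element occurs at least once in the interval, else 0."""
    present = set(records)
    return [1 if e in present else 0 for e in elements]

def count_features(records, elements):
    """Number of times each element occurs in the interval (quantitative information)."""
    counts = Counter(records)
    return [counts[e] for e in elements]

# Hypothetical interval: drug A prescribed twice, vaccine 1 administered once.
elements = ["drug_A", "drug_B", "vaccine_1"]
records = ["drug_A", "drug_A", "vaccine_1"]
binary_vector = presence_features(records, elements)
count_vector = count_features(records, elements)
```

The count vector keeps the fact that drug A was prescribed twice, which the presence-only vector discards.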

Machine learning model training method

Hereinafter, the method by which the control unit 130 inputs the extracted input data to the machine learning model 120 and trains the machine learning model is described in detail. The input data may also be referred to as training data or the like, and may of course be expressed in other ways within a range that can be interpreted identically or similarly.

The input data may consist of sample data obtained by randomly sampling a specific number of data items, a specific number of times, from all the event interval data and control interval data that the control unit 130 has extracted from the database for each patient. For example, suppose the control unit 130 has extracted event interval data and control interval data for each of 10 different patients (patient 1 to patient 10), and each patient has one item of event interval data and two items of control interval data. The input data may then consist of a plurality of samples obtained by randomly sampling a specific number of data items multiple times from all the event interval data and control interval data (10 event interval data items and 20 control interval data items). That is, if 5 data items are randomly sampled 10 times from these 30 items, the input data consists of 10 samples, each made up of 5 items of event interval data and/or control interval data. This way of constructing the input data is merely an example, and the present invention is not limited thereto.
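The sampling example above (5 items drawn 10 times from 10 event interval items and 20 control interval items) can be sketched as follows. The function name `draw_samples`, the tuple representation of an interval, and the fixed seed are assumptions for illustration. The sketch draws without replacement inside each sample, which is also an assumption, since the description does not say whether repeats within a sample are allowed.

```python
import random

def draw_samples(intervals, sample_size, n_draws, seed=0):
    """Draw `n_draws` random samples of `sample_size` intervals each."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [rng.sample(intervals, sample_size) for _ in range(n_draws)]

# One event interval and two control intervals for each of patients 1 to 10.
intervals = [(patient, "event") for patient in range(1, 11)]
intervals += [(patient, "control") for patient in range(1, 11) for _ in range(2)]

samples = draw_samples(intervals, sample_size=5, n_draws=10)
```

Each element of `samples` is one sample of 5 intervals, and a sample may contain event intervals, control intervals, or both.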

As described above, each item of event interval data and control interval data that the control unit 130 extracts from the database for each patient may include information such as diagnoses of specific symptoms, drug prescriptions, and vaccinations. In addition, each extracted item of event interval data and control interval data may further include patient information such as sex and age. Here, the information on diagnoses of specific symptoms, drug prescriptions, vaccinations, and the like included in the input data may be collectively referred to as "all elements". For example, if analysis by the control unit 130 of all the event interval data and control interval data extracted from the database for each patient shows that drugs A, B, and C and vaccines 1, 2, and 3 appear in that data, then drugs A, B, and C and vaccines 1, 2, and 3 constitute "all elements".

In the present method, the control unit 130 may train the machine learning model 120 so that a feature importance is assigned to each of all elements included in the input data. In the example above, where drugs A, B, and C and vaccines 1, 2, and 3 appear in the event interval data and control interval data extracted from the database for each patient, a feature importance may be assigned to each of drugs A, B, and C and vaccines 1, 2, and 3 once training of the machine learning model 120 is complete. Each feature importance may be calculated based on the degree of correlation, determined through the training of the machine learning model, between the abnormal reaction and the specific element. That is, a higher feature importance may be assigned (or allocated) to an element that is determined to have a relatively high degree of correlation among all elements included in the input data. More specifically, when the training of the machine learning model 120 indicates that a specific element among all elements has a strong causal relationship with the occurrence of the abnormal reaction, a relatively high feature importance value may be assigned to that element. Conversely, when the training indicates that a specific element has a weak causal relationship with the occurrence of the abnormal reaction, a relatively low feature importance value may be assigned to that element. A feature importance may be assigned to each of all elements. The feature importance may also be referred to as characteristic importance or the like, and may of course be expressed in other ways within a range that can be interpreted identically or similarly.

Hereinafter, with reference to FIGS. 4 and 5, the process by which the feature importance is calculated and assigned to each of all elements included in the input data is described in more detail.

FIG. 4 is a diagram illustrating an example to aid understanding of the training process of the machine learning model in the artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention.

More specifically, FIG. 4 illustrates an example for explaining a problem with abnormal reaction monitoring based on the tree scan method. Abnormal reaction monitoring based on the tree scan method may be performed on analysis target data having a hierarchical structure. Here, the analysis target data may include a plurality of sample data items. Referring to FIG. 4, the analysis target data (or original data) forms a hierarchy: 410 (layer 1); 421 and 422 (layer 2); and 431, 432, 433, and 434 (layer 3). More specifically, layer 2 is formed by branching the analysis target data of layer 1, and layer 3 is formed by branching each data set of layer 2.

Referring to layer 1 of FIG. 4, the original data 410 consists of 4 sample data items in which the abnormal reaction occurred and 8 sample data items in which it did not, so the incidence rate in the sample population of the original data 410 is 33% (4/12 × 100).

Next, referring to layer 2 of FIG. 4, the original data of layer 1 may be divided into data 421 for cases in which A was prescribed and data 422 for cases in which B was prescribed. The data 421 consists of 3 sample data items in which the abnormal reaction occurred and 6 in which it did not, so the incidence rate in the sample population of the data 421 is 33% (3/9 × 100). The data 422 consists of 2 sample data items in which the abnormal reaction occurred and 4 in which it did not, so the incidence rate in the sample population of the data 422 is 33% (2/6 × 100). Because the original data 410 may include sample data items for cases in which both A and B were prescribed, the sum of the number of sample data items in the layer 2 data for A and the number in the layer 2 data for B may be greater than the number of sample data items in the original data. Here, A and B may each be a drug group. As a result, the incidence rates in the data 421 and the data 422 are the same, so in such a case the data obtained at layer 2 cannot be regarded as a meaningful result for determining a suspicious element related to the occurrence of the abnormal reaction.

Next, referring to layer 3 of FIG. 4, the layer 2 data 421 for cases in which A was prescribed may be divided into data 431 for cases in which A01 was prescribed and data 432 for cases in which A02 was prescribed. Likewise, the data 422 for cases in which B was prescribed may be divided into data 433 for cases in which B01 was prescribed and data 434 for cases in which B02 was prescribed. The data 431 consists of 2 sample data items in which the abnormal reaction occurred and 4 in which it did not, so the incidence rate in the sample population of the data 431 is 33% (2/6 × 100). The data 432 consists of 1 sample data item in which the abnormal reaction occurred and 2 in which it did not, so the incidence rate in the sample population of the data 432 is 33% (1/3 × 100).

The data 433 for cases in which B01 was prescribed consists of 1 sample data item in which the abnormal reaction occurred and 2 in which it did not, so the incidence rate in the sample population of the data 433 is 33% (1/3 × 100). The data 434 for cases in which B02 was prescribed also consists of 1 sample data item in which the abnormal reaction occurred and 2 in which it did not, so the incidence rate in the sample population of the data 434 is 33% (1/3 × 100).

Here, A01 and A02 may each be an individual drug belonging to drug group A, and B01 and B02 may each be an individual drug belonging to drug group B. As a result, the incidence rates in the data 431 (A01 prescribed), the data 432 (A02 prescribed), the data 433 (B01 prescribed), and the data 434 (B02 prescribed) are all the same, so in such a case the data obtained at layer 3 cannot be regarded as a meaningful result for determining a suspicious element related to the occurrence of the abnormal reaction.
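The per-node incidence rates of the FIG. 4 example can be checked with a short sketch. The counts below are the ones stated in the text for each node; the dictionary layout, the node labels, and the function name `incidence` are assumptions for illustration.

```python
from fractions import Fraction

# (items with the abnormal reaction, items without it) for each node of FIG. 4
nodes = {
    "410 (all)": (4, 8),
    "421 (A)": (3, 6),
    "422 (B)": (2, 4),
    "431 (A01)": (1 * 2, 4),
    "432 (A02)": (1, 2),
    "433 (B01)": (1, 2),
    "434 (B02)": (1, 2),
}

def incidence(with_reaction, without_reaction):
    """Share of sample data items in which the abnormal reaction occurred."""
    return Fraction(with_reaction, with_reaction + without_reaction)

rates = {name: incidence(*counts) for name, counts in nodes.items()}
distinct_rates = set(rates.values())
```

Every node has the same rate of 1/3, which is why comparing nodes within a layer does not single out any drug group or drug in this example.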

Thus, in abnormal reaction monitoring based on the tree scan method, the analysis is performed only within each layer, so depending on the composition of the sample data included in the analysis target data, there are cases in which a suspicious element related to the occurrence of the abnormal reaction cannot be detected. That is, the tree scan method has the limitation that it cannot take into account interactions between drugs in different layers (for example, an analysis of whether A was prescribed given that B was prescribed).

FIG. 5 is a diagram illustrating an example to aid understanding of the training process of the machine learning model in the artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention. More specifically, FIG. 5 is a diagram to aid understanding of the feature importance calculation based on the random forest method.

The input data for training the machine learning model based on the random forest method may consist of sample data obtained by randomly sampling a specific number of data items from all the event interval data and control interval data extracted for each patient.

In the random forest method, the concept of impurity is used to calculate the feature importance of each of all elements included in the input data for training the machine learning model. An importance gain (IG) is calculated for each element based on the impurity, and the feature importance is then calculated based on the importance gain of each element. This process may be performed for each of the sample data included in the input data.

With reference to FIG. 5, the process by which the feature importance is calculated from the impurity and the importance gain for a single sample is described in more detail.

FIG. 5 assumes that, as a result of the control unit 130 analyzing all the event interval data and control interval data extracted from the database 110 for each patient, all elements are determined to be drugs A and B. The sampled data used in FIG. 5 consists of 4 event interval data items (abnormal reaction occurred) and 8 control interval data items (abnormal reaction did not occur), for a total of 12 items, consistent with the calculations that follow. The sampled data may have been sampled from all the event interval data and control interval data that the control unit 130 extracted from the database 110 for each patient.

First, the process of calculating the importance gain of drug B is described. To calculate the importance gain of drug B, the impurity of the original data 510 sampled from the full data is calculated first. The impurity may be calculated as 1 - (number of sample data items in which the abnormal reaction occurred / total number of sample data items)^2 - (number of sample data items in which the abnormal reaction did not occur / total number of sample data items)^2. Accordingly, the impurity of the original data 510 is 1 - (4/12)^2 - (8/12)^2 = 4/9.

Next, the impurity of B is calculated. The original data 510 is divided into data 521 for cases in which drug B was not prescribed and data 522 for cases in which drug B was prescribed, and the impurity of each of the data 521 and the data 522 is calculated first. Using the impurity formula above, the impurity of the data 521 (drug B not prescribed) is 1 - (2/6)^2 - (4/6)^2 = 4/9, and the impurity of the data 522 (drug B prescribed) is likewise 1 - (2/6)^2 - (4/6)^2 = 4/9. The total impurity of drug B is then calculated by weighting the data 521 and the data 522. The weight of each divided data set may be determined from its share of the sample data items in the original data. In FIG. 5, the original data 510 contains 12 sample data items, and the data 521 and the data 522 each contain 6 sample data items, so the weights of the data 521 and the data 522 are 6/12 and 6/12, respectively. With these weights, the total impurity of drug B is 1/2 × 4/9 + 1/2 × 4/9 = 4/9.

Finally, the importance gain of drug B is calculated from the total impurity of drug B. The importance gain may be calculated as importance gain = (impurity of the original data) - (total impurity of the divided data). Accordingly, the (total) importance gain of drug B = (4/9) - (4/9) = 0. This result may mean that drug B has no correlation with the occurrence of the abnormal reaction.
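The drug B calculation above can be reproduced with exact fractions. The sketch uses the counts that appear in the calculation (12 items in the original data 510, of which 4 had the abnormal reaction, split into two groups of 6 with 2 reactions each); the function names `impurity` and `importance_gain` are assumptions for illustration.

```python
from fractions import Fraction

def impurity(with_reaction, without_reaction):
    """1 - p^2 - q^2, where p and q are the shares of items with and without the reaction."""
    total = with_reaction + without_reaction
    return 1 - Fraction(with_reaction, total) ** 2 - Fraction(without_reaction, total) ** 2

def importance_gain(parent, children):
    """Impurity of the parent minus the size-weighted impurity of its child groups."""
    parent_size = sum(parent)
    weighted = sum(Fraction(sum(child), parent_size) * impurity(*child) for child in children)
    return impurity(*parent) - weighted

original_510 = (4, 8)                  # (with reaction, without reaction)
split_on_b = [(2, 4), (2, 4)]          # data 521 (B not prescribed), data 522 (B prescribed)

impurity_510 = impurity(*original_510)
gain_b = importance_gain(original_510, split_on_b)
```

Here `impurity_510` evaluates to 4/9 and `gain_b` to 0, matching the values derived in the text.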

Next, the importance gain of drug A may be calculated by the same process. FIG. 5 shows only the data 522 (B prescribed) being divided into data 531 for cases in which A was prescribed and data 532 for cases in which A was not prescribed, but the data 521 (B not prescribed) may also be divided into data for cases in which A was prescribed and data for cases in which A was not prescribed. In this case, the data 522 and the data 521 each serve as the original data for the data sets obtained by dividing them according to whether A was prescribed. The importance gain of drug A is then calculated as follows: an importance gain is calculated for each of (i) the data sets, not shown in FIG. 5, obtained by dividing the data 521 (B not prescribed) according to whether drug A was prescribed, and (ii) the data sets 531 and 532 obtained by dividing the data 522 (B prescribed) according to whether drug A was prescribed, and the average of the calculated importance gains is taken as the total importance gain of drug A.

Finally, the feature importances of drug A and drug B are calculated for the sample data of FIG. 5. The feature importance of drug A may be calculated as (total importance gain of drug A) / (total importance gain of drug A + total importance gain of drug B). Likewise, the feature importance of drug B may be calculated as (total importance gain of drug B) / (total importance gain of drug A + total importance gain of drug B).
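The normalization step can be sketched as below. The gain of drug B (0) comes from the calculation in the text, but the text gives no numeric gain for drug A, so the value used here is purely hypothetical. Returning zeros when every gain is zero is also an assumption of this sketch, since the description does not cover that case.

```python
def feature_importance(total_gains):
    """Divide each element's total importance gain by the sum of all gains."""
    gain_sum = sum(total_gains.values())
    if gain_sum == 0:
        # Assumption: with no gain anywhere, no element is informative.
        return {element: 0.0 for element in total_gains}
    return {element: gain / gain_sum for element, gain in total_gains.items()}

# Drug B's gain is 0 as derived in the text; drug A's gain is a hypothetical value.
gains = {"A": 0.2, "B": 0.0}
importances = feature_importance(gains)
```

Because drug B contributes no gain, the whole importance is attributed to drug A for this sample under the assumed value.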

Through the process described above, the feature importances for one specific sample drawn from the full data can be calculated.

A specific sample that has been divided by a specific division (or branching) method, with the division completed, may be understood as one data tree. That is, even a single sample can give rise to a plurality of data trees depending on the division method. In conclusion, the final feature importance of each element may be obtained by calculating the feature importance of every element in each of the plurality of data trees, which are constructed from the data samples and division methods, and then taking the average of the feature importances calculated across those data trees. The number of times division occurs in one data tree, that is, the number of layers, may be set appropriately according to the total number of elements. The number of data items included in each sample and the number of sampling rounds may be set appropriately by the user of the machine learning model 120.
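Averaging over data trees can be sketched as follows. The per-tree importance values are hypothetical and serve only to show the arithmetic; the function name `average_importance` is likewise an assumption.

```python
def average_importance(per_tree):
    """Average each element's feature importance over all data trees."""
    elements = per_tree[0].keys()
    return {e: sum(tree[e] for tree in per_tree) / len(per_tree) for e in elements}

# Hypothetical feature importances from two data trees.
per_tree = [
    {"A": 1.0, "B": 0.0},
    {"A": 0.5, "B": 0.5},
]
final_importance = average_importance(per_tree)
```

In practice a library random forest would perform the sampling, splitting, and averaging internally; the sketch only mirrors the averaging step named in the description.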

The feature importance calculation based on the random forest method described above is only an example to aid understanding of how feature importance is calculated, and the feature importance calculation of the present invention is not limited thereto. Besides random forests, an SVM model or the like may be used to calculate feature importance. In the case of an SVM model in particular, a reference line for determining the feature importance of each element is set on a specific plane, the position of each element on that plane is set by a specific method, and the feature importance of each element may be determined by how wide or narrow the margin is between the reference line and the position set for that element. More specifically, the wider the margin, the higher the feature importance may be.

FIG. 6 is a diagram illustrating an example of the process of training the machine learning model in the artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention.

In this method, the statement that the control unit 130 trains the machine learning model 120 so that a feature importance is assigned to each of all elements included in the input data may have the same or a similar meaning as the statement that the machine learning model 120 is trained to distinguish event section data from control section data. Therefore, apart from the operation, described below, in which the control unit 130 determines suspect elements related to the induction of an abnormal reaction based on the feature importance assigned to each of the elements, the trained machine learning model 120 can distinguish event section data from control section data based on the training result. That is, the trained machine learning model 120 may extract the feature importance of each of the elements (drug prescription, vaccination, etc.) included in specific data provided as input data, and may determine, based on the extracted feature importances, whether the input data is event section data or control section data. Additionally, the machine learning model 120 may determine whether an abnormal reaction occurs according to the result of determining whether specific input data is event section data or control section data. For example, when the specific input data is determined to be event section data, the pattern of elements reflected in that data may be determined to be capable of inducing an abnormal reaction. Conversely, when the specific input data is determined to be control section data, the pattern of elements reflected in that data may be determined not to induce an abnormal reaction.
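As a minimal sketch of the idea above, the following Python code trains a small logistic regression to separate event section data from control section data and then reads a per-element importance from the learned weights. This passage does not fix the model type, so the choice of logistic regression, the use of normalized absolute weights as the feature importance, the element names, and the toy rows are all assumptions made for illustration only.

```python
import math

# Hypothetical elements; each row marks which elements appear in one section.
ELEMENTS = ["drug_A", "drug_B", "drug_C"]
ROWS = [
    [1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1],  # event section data
    [0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0],  # control section data
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = event section, 0 = control section


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs=500, lr=0.5):
    """Fit logistic regression weights by batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def feature_importance(weights):
    """Assumed proxy: absolute weight, normalized so the values sum to 1."""
    total = sum(abs(w) for w in weights)
    return [abs(w) / total for w in weights]


def is_event_section(x, weights, bias):
    """Classify one row as event section data (True) or control section data (False)."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x))) > 0.5


weights, bias = train(ROWS, LABELS)
importance = dict(zip(ELEMENTS, feature_importance(weights)))
```

In this toy data, drug_A appears only in event sections, so it receives the largest importance, while the trained model can still be used separately to classify a new row as event or control section data.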

Method for Determining Suspect Elements

Hereinafter, a method by which the control unit 130 determines, based on the feature importance, the suspect elements related to the induction of an abnormal reaction from among all elements included in the input data is described in detail. In this method, there may be one or more suspect elements related to the induction of an abnormal reaction. In addition, each of the at least one suspect element may consist of a specific drug group.

In this method, the control unit 130 determines, based on the feature importance, the suspect elements related to the induction of an abnormal reaction from among all the elements, where the feature importance of each element determined to be a suspect element may be greater than a preset threshold value. As an example, the threshold value may be set to twice the overall average of the feature importances assigned to all the elements. This is merely an example to aid the description, and the present invention is not limited thereto. As another example, the control unit 130 may determine a specific number of elements having the highest feature importance values among all the elements as the suspect elements.
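The two selection rules described above can be sketched in a few lines of Python. The function names, the element names, and the importance values below are hypothetical and serve only to illustrate the threshold rule (twice the mean importance) and the top-k rule.

```python
from statistics import mean


def select_by_threshold(importance, factor=2.0):
    """Return elements whose importance exceeds factor * mean importance."""
    threshold = factor * mean(importance.values())
    return {name for name, value in importance.items() if value > threshold}


def select_top_k(importance, k):
    """Return the k elements with the largest importance, highest first."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]


# Hypothetical importance values, for illustration only.
importance = {
    "drug_A": 0.50,
    "drug_B": 0.05,
    "drug_C": 0.30,
    "drug_D": 0.05,
    "drug_E": 0.10,
}

suspects_by_threshold = select_by_threshold(importance)
suspects_top_2 = select_top_k(importance, 2)
```

Here the mean importance is 0.2, so the threshold is 0.4 and only drug_A exceeds it, whereas the top-2 rule selects drug_A and drug_C. The two rules can therefore return different suspect sets for the same importance values.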

As described above with reference to FIGS. 1 and 2, determining suspect elements based on the result of training the machine learning model 120 can be understood as using the relationship itself between the input value (X) and the output value (Y) that the machine learning model 120 inferred through training. That is, to determine the suspect elements that induced an abnormal reaction, the present invention does not use the ability of the machine learning model 120 to take a specific input value (specific input data) and obtain the corresponding output value (whether an abnormal reaction occurs) by applying the inferred relationship between input and output values (the feature importance assigned to each of all elements). Instead, it uses the inferred relationship between input and output values (the feature importance) itself, and determines elements whose feature importance is equal to or greater than a specific threshold value to be the suspect elements that induced the abnormal reaction.

FIG. 7 is a diagram showing the result of monitoring suspect drugs that induce an abnormal reaction based on the artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention. The abnormal reaction in FIG. 7 may be anaphylaxis. More specifically, FIG. 7 corresponds to a case in which the top 20 drug groups are determined as the suspect elements for the occurrence of the abnormal reaction. The analysis confirmed that the remaining 95% of the drug groups, excluding the M09 drug group, are registered in a drug side effect database. In addition, an investigation of the M09 group, which is not registered in the drug side effect database, identified literature indicating that the M09 drug group may induce an abnormal reaction. The drug side effect database may be SIDER (Protect collaborative and the Side Effect Resource).
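The comparison against a side effect database described above amounts to a set membership check. The following Python sketch shows one way to compute the registered fraction. Only the label "M09" comes from the text; the other group labels are placeholders and are not the actual drug groups of FIG. 7, and the registered set is a stand-in rather than real SIDER content.

```python
def registration_summary(suspect_groups, registered_groups):
    """Return (fraction of suspects that are registered, unregistered suspects)."""
    unregistered = [g for g in suspect_groups if g not in registered_groups]
    fraction = (len(suspect_groups) - len(unregistered)) / len(suspect_groups)
    return fraction, unregistered


# Placeholder labels: 19 hypothetical groups plus "M09" from the text.
suspects = [f"group_{i:02d}" for i in range(1, 20)] + ["M09"]
# Stand-in for the database: every suspect except "M09" is treated as registered.
registered = set(suspects) - {"M09"}

fraction, unregistered = registration_summary(suspects, registered)
```

With 20 suspect groups and one unregistered group, the registered fraction is 19/20, which matches the 95% stated above, and the unregistered list contains only M09.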

FIG. 8 is a flowchart illustrating an example of operations implemented by the control unit 130 to perform the artificial intelligence-based abnormal reaction monitoring method according to an embodiment of the present invention.

More specifically, the control unit 130 extracts, based on the medical data, input data for training a machine learning model from a database storing medical data related to the medical records of patients (S810).
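According to the claims, the extraction in step S810 covers event section data for a specific period before the abnormal reaction, control section data for a period of the same length before another time point, and a gap inserted between the two sections so that they do not overlap. The Python sketch below illustrates one possible layout. It assumes that the control section lies before the event section, that records are (timestamp, element) pairs, and that the period and gap are 14 and 7 days. None of these specifics are given in the source.

```python
from datetime import datetime, timedelta


def section_windows(event_time, period, gap):
    """Return (event_window, control_window) as half-open (start, end) pairs."""
    event_window = (event_time - period, event_time)
    control_end = event_window[0] - gap  # the gap separates the two sections
    control_window = (control_end - period, control_end)
    return event_window, control_window


def records_in(records, window):
    """Keep the (timestamp, element) records whose timestamp is in the window."""
    start, end = window
    return [r for r in records if start <= r[0] < end]


# Hypothetical patient records and parameters, for illustration only.
records = [
    (datetime(2020, 5, 28), "drug_B"),
    (datetime(2020, 6, 12), "drug_C"),  # falls inside the gap, used by neither
    (datetime(2020, 6, 20), "drug_A"),
]
event_window, control_window = section_windows(
    datetime(2020, 6, 30), period=timedelta(days=14), gap=timedelta(days=7)
)
event_section = records_in(records, event_window)
control_section = records_in(records, control_window)
```

Because the control window ends where the gap begins, a record in the gap belongs to neither section, which keeps the two sections temporally separated.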

Next, the control unit 130 inputs the extracted input data into the machine learning model and trains the machine learning model so that a feature importance is assigned to each of all elements included in the input data (S820). Here, each feature importance is calculated based on the degree of correlation between the abnormal reaction and a specific element, as determined through the training of the machine learning model.

Finally, the control unit 130 determines, based on the feature importance, at least one suspect element related to the induction of an abnormal reaction from among all the elements (S830).

It is apparent to those skilled in the art that the present invention may be embodied in other specific forms without departing from its essential characteristics. Accordingly, the above detailed description should not be construed as restrictive in any respect but should be considered exemplary. The scope of the present invention should be determined by a reasonable interpretation of the appended claims, and all modifications within the equivalent scope of the present invention are included in the scope of the present invention.

The embodiments described above are combinations of the components and features of the present invention in predetermined forms. Each component or feature should be considered optional unless explicitly stated otherwise. Each component or feature may be implemented without being combined with other components or features. It is also possible to configure an embodiment of the present invention by combining some of the components and/or features. The order of the operations described in the embodiments of the present invention may be changed. Some configurations or features of one embodiment may be included in another embodiment, or may be replaced with corresponding configurations or features of another embodiment. It is apparent that claims that do not have an explicit citation relationship in the claims may be combined to form an embodiment, or may be included as new claims by amendment after filing.

Embodiments according to the present invention may be implemented by various means, for example, hardware, firmware, software, or a combination thereof. In the case of implementation by hardware, an embodiment of the present invention may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.

In the case of implementation by firmware or software, an embodiment of the present invention may be implemented in the form of modules, procedures, functions, and the like that perform the functions or operations described above. The software code may be stored in a memory and executed by a processor. The memory may be located inside or outside the processor and may exchange data with the processor by various well-known means.

100: abnormal reaction monitoring system
110: database
120: machine learning model
130: control unit

Claims

1. An abnormal reaction monitoring method comprising:
extracting, based on medical data, input data for training a machine learning model from a database storing the medical data related to medical records of patients;
inputting the extracted input data into the machine learning model, and training the machine learning model so that a feature importance is assigned to each of all elements included in the input data,
wherein the feature importance is calculated based on a degree of correlation between the abnormal reaction and a specific element determined through the training of the machine learning model; and
determining, based on the feature importance, at least one suspect element related to induction of the abnormal reaction from among all the elements.

2. The method of claim 1,
wherein extracting the input data further comprises extracting, for each patient, abnormal reaction occurrence time data related to the time point at which the abnormal reaction occurred in each of the patients.

3. The method of claim 2,
wherein extracting the input data further comprises extracting, based on the abnormal reaction occurrence time data extracted for each patient, event section data related to the medical record of a specific patient for a specific period before the time point at which the abnormal reaction occurred in the specific patient.

4. The method of claim 3,
wherein extracting the input data further comprises extracting control section data related to the medical record of the specific patient for the specific period before an arbitrary time point other than the time point at which the abnormal reaction occurred in the specific patient.

5. The method of claim 4,
wherein the medical records included in each of the event section data and the control section data for the patients include at least one of a record related to diagnosis of a specific symptom, a record related to drug prescription, or a record related to vaccination.

6. The method of claim 4,
wherein a time section of a specific interval is inserted between the time section from which the event section data is extracted and the time section from which the control section data is extracted.

7. The method of claim 6,
wherein, based on the inserted time section of the specific interval, the time section from which the event section data is extracted and the time section from which the control section data is extracted are temporally spaced apart without overlapping.

8. The method of claim 4,
wherein the machine learning model is trained to distinguish the event section data from the control section data by assigning the feature importance to each of all the elements.

9. The method of claim 8,
further comprising determining whether the abnormal reaction occurs by inputting specific input data, which is one of the event section data or the control section data, into the machine learning model for which the training has been completed.

10. The method of claim 1,
wherein a higher feature importance value is assigned to an element determined to have a relatively higher degree of correlation.

11. The method of claim 10,
wherein the at least one suspect element includes elements, among all the elements, whose feature importance is greater than a specific threshold value.

12. The method of claim 11,
wherein the specific threshold value is set to twice the overall average of the feature importances assigned to all the elements.

13. An abnormal reaction monitoring system comprising:
a database for storing medical data related to medical records of patients;
a machine learning model; and
a control unit,
wherein the control unit is configured to:
extract, based on the medical data, input data for training the machine learning model from the database;
input the extracted input data into the machine learning model and train the machine learning model so that a feature importance is assigned to each of all elements included in the input data,
wherein the feature importance is calculated based on a degree of correlation between the abnormal reaction and a specific element determined through the training of the machine learning model; and
determine, based on the feature importance, at least one suspect element related to induction of the abnormal reaction from among all the elements.

14. The system of claim 13,
wherein, to extract the input data, the control unit extracts, for each patient, abnormal reaction occurrence time data related to the time point at which the abnormal reaction occurred in each of the patients.

15. The system of claim 14,
wherein, to extract the input data, the control unit extracts, based on the abnormal reaction occurrence time data extracted for each patient, event section data related to the medical record of a specific patient for a specific period before the time point at which the abnormal reaction occurred in the specific patient.

16. The system of claim 15,
wherein, to extract the input data, the control unit extracts control section data related to the medical record of the specific patient for the specific period before an arbitrary time point other than the time point at which the abnormal reaction occurred in the specific patient.

17. The system of claim 16,
wherein the medical records included in each of the event section data and the control section data for the patients include at least one of a record related to diagnosis of a specific symptom, a record related to drug prescription, or a record related to vaccination.

18. The system of claim 16,
wherein a time section of a specific interval is inserted between the time section from which the event section data is extracted and the time section from which the control section data is extracted.

19. The system of claim 18,
wherein, based on the inserted time section of the specific interval, the time section from which the event section data is extracted and the time section from which the control section data is extracted are temporally spaced apart from each other without overlapping.

20. The system of claim 16,
wherein the machine learning model is trained to distinguish the event section data from the control section data by assigning the feature importance to each of all the elements.