KR102577105B1

KR102577105B1 - Apparatus and method for building a pipelines to explore adverse drug reaction

Info

Publication number: KR102577105B1
Application number: KR1020210101922A
Authority: KR
Inventors: 이수현; 김종엽; 이승희; 이충천; 우혜경
Original assignee: 건양대학교산학협력단; 공주대학교 산학협력단
Priority date: 2021-08-03
Filing date: 2021-08-03
Publication date: 2023-09-12
Also published as: KR20230020608A

Abstract

The present invention relates to a method and device for building a pipeline for detecting drug side effects. A method of building a pipeline for detecting drug side effects according to an embodiment of the present invention includes: accessing a social channel based on social network services (SNS) and collecting at least one item of social data related to a target drug based on a first term set; preprocessing the collected social data; extracting side-effect-related data from the preprocessed social data based on a second term set and performing exploratory data analysis; analyzing drug side effect patterns for the target drug according to the analysis results and classifying them according to preset categories; and building or training a drug side effect detection prediction model using the classification results, wherein the first term set includes data sets each composed of at least one term denoting a respective one of at least one drug, and the second term set may include a data set composed of terms denoting drug side effects.

Description

Pipeline building method and device for detecting drug side effects {APPARATUS AND METHOD FOR BUILDING A PIPELINES TO EXPLORE ADVERSE DRUG REACTION}

The present invention relates to a method and device for building a pipeline for detecting drug side effects, and more specifically to a method and device for building such a pipeline that allow side effects of a drug to be explored and updated through social data analysis.

In recent years, the frequency and quantity of drug use have been rising sharply worldwide as populations age. Drug side effects are rising rapidly as a result and threaten patient safety. Analyzing marketed drugs for new or serious side effects and detecting them early has therefore become an important issue.

Beyond the publicly known side effects of a target drug, new side effects sometimes occur, and further side effects sometimes arise from long-term use.

However, receiving drug and drug-use review data from people who take the target drug in daily life, and collecting or monitoring one by one the symptoms that taking the drug produced, is cumbersome and inconvenient, so there is a limit to how much practical information about the drug can be obtained.

Meanwhile, as the use of social networking in the medical industry has grown rapidly in recent years, many people share their feelings and experiences on social network services (SNS). As one way of sharing valuable information, they post reviews of specific drugs they have taken or are taking, or respond by leaving comments.

Because information posted or registered through social network services in this way amounts to first-hand reviews by individual users, it is likely to be meaningful: it contains data on new side effects or indications in addition to the side effects already known for the specific drug.

Therefore, there is a need to develop a technique that collects social data related to a target drug from social network services and analyzes the patterns of drug side effects, so that drug side effects caused by the target drug can be predicted more accurately.

Korean Patent Laid-Open Publication No. 10-2015-0049937 (publication date: May 8, 2015)

The present invention has been proposed to solve the problems described above, and its object is to provide a method and device for building a pipeline for detecting drug side effects that collect social data related to a target drug from social network services and analyze the patterns of drug side effects, so that drug side effects caused by the target drug can be predicted more accurately.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

To solve the above problems, a method of building a pipeline for detecting drug side effects according to an embodiment of the present invention includes: accessing a social channel based on social network services (SNS) and collecting at least one item of social data related to a target drug based on a first term set; preprocessing the collected social data; extracting side-effect-related data from the preprocessed social data based on a second term set and performing exploratory data analysis; analyzing drug side effect patterns for the target drug according to the analysis results and classifying them according to preset categories; and building or training a drug side effect detection prediction model using the classification results, wherein the first term set includes data sets each composed of at least one term denoting a respective one of at least one drug, and the second term set may include a data set composed of terms denoting drug side effects.

Meanwhile, a pipeline construction device for detecting drug side effects according to an embodiment of the present invention includes: a communication module; a storage module that stores at least one piece of information or data for building the pipeline for detecting drug side effects; an analysis module that accesses a social channel based on social network services (SNS), collects and preprocesses at least one item of social data related to a target drug based on a first term set, extracts side-effect-related data from the preprocessed social data based on a second term set and performs exploratory data analysis, and then analyzes drug side effect patterns for the target drug according to the analysis results and classifies them according to preset categories; a learning module that builds or trains a drug side effect detection prediction model using the classification results; and a control module that exercises control so that at least one item of social data related to the target drug is collected and preprocessed based on the first term set, side-effect-related data is extracted from the preprocessed social data based on the second term set and exploratory data analysis is performed, drug side effect patterns for the target drug are analyzed according to the analysis results and classified according to preset categories, and the drug side effect detection prediction model is then built or trained using the classification results, wherein the first term set includes data sets each composed of at least one term denoting a respective one of at least one drug, and the second term set may include a data set composed of terms denoting drug side effects.

Other specific details of the present invention are included in the detailed description and drawings.

According to the present invention, social data related to a target drug is collected from social network services and the patterns of drug side effects are analyzed, so that drug side effects caused by the target drug can be predicted more accurately.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

Figure 1 is a block diagram showing the configuration of a pipeline construction device for detecting drug side effects according to an embodiment of the present invention.
Figure 2 is a flowchart showing a pipeline construction method for detecting drug side effects according to an embodiment of the present invention.
Figure 3 is a diagram schematically showing a pipeline for detecting drug side effects according to an embodiment of the present invention.
Figure 4 is a diagram showing the procedure for generating the second term set used to build a pipeline for detecting drug side effects according to an embodiment of the present invention.
Figure 5a is a diagram showing a first example of analyzing drug side effect patterns when building a pipeline for detecting drug side effects according to an embodiment of the present invention.
Figure 5b is a diagram showing a second example of analyzing drug side effect patterns when building a pipeline for detecting drug side effects according to an embodiment of the present invention.
Figure 5c is a diagram showing a third example of analyzing drug side effect patterns when building a pipeline for detecting drug side effects according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those skilled in the art to which the present invention pertains are fully informed of the scope of the invention; the present invention is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, singular forms also include plural forms unless the context specifically states otherwise. As used in the specification, "comprises" and/or "comprising" does not exclude the presence or addition of one or more components other than those mentioned. Like reference numerals refer to like components throughout the specification, and "and/or" includes each of the mentioned components and every combination of one or more of them. Although "first", "second", and the like are used to describe various components, these components are of course not limited by these terms. These terms are used only to distinguish one component from another. Therefore, a first component mentioned below may of course be a second component within the technical spirit of the present invention.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) are used with the meanings commonly understood by those skilled in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are explicitly and specifically defined.

Spatially relative terms such as "below", "beneath", "lower", "above", and "upper" may be used to easily describe the relationship between one component and other components as shown in the drawings. Spatially relative terms should be understood to encompass different orientations of the components in use or operation in addition to the orientation shown in the drawings. For example, if a component shown in a drawing is turned over, a component described as "below" or "beneath" another component may be placed "above" that other component. Accordingly, the exemplary term "below" may encompass both the below and above orientations. A component may also be oriented in other directions, and spatially relative terms may be interpreted according to that orientation.

As used in the specification, the term "unit" or "module" means software or a hardware component such as an FPGA or ASIC, and a "unit" or "module" performs certain roles. However, "unit" or "module" is not limited to software or hardware. A "unit" or "module" may be configured to reside on an addressable storage medium and may be configured to run on one or more processors. Thus, as an example, a "unit" or "module" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided within the components and "units" or "modules" may be combined into a smaller number of components and "units" or "modules", or further separated into additional components and "units" or "modules".

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Figure 1 is a block diagram showing the configuration of a pipeline construction device for detecting drug side effects according to an embodiment of the present invention.

Referring to Figure 1, the pipeline construction device for predicting drug side effects (hereinafter, the 'construction device') 100 includes a communication module 110, a storage module 130, an analysis module 150, a learning module 170, and a control module 190.

The communication module 110 exchanges with external devices the various information or data needed to build the pipeline for detecting drug side effects. Specifically, social data can be collected by accessing a social network service (portal site, messenger service, etc.) through the communication module 110. That is, the communication module 110 communicates with other terminals, servers, devices, and the like, and transmits and receives wireless signals over a communication network in accordance with wireless Internet technologies.

Wireless Internet technologies include, for example, WLAN (Wireless LAN), Wi-Fi (Wireless-Fidelity), Wi-Fi (Wireless Fidelity) Direct, DLNA (Digital Living Network Alliance), WiBro (Wireless Broadband), WiMAX (Worldwide Interoperability for Microwave Access), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), LTE (Long Term Evolution), and LTE-A (Long Term Evolution-Advanced). The construction device 100 transmits and receives data according to at least one wireless Internet technology, including Internet technologies not listed above.

Short-range communication may be supported using at least one of Bluetooth™, RFID (Radio Frequency Identification), Infrared Data Association (IrDA), UWB (Ultra Wideband), ZigBee, NFC (Near Field Communication), Wi-Fi (Wireless-Fidelity), Wi-Fi Direct, and Wireless USB (Wireless Universal Serial Bus) technologies. Such short-range wireless networks (Wireless Area Networks) can support wireless communication between the construction device 100 and an external device. Here, the short-range wireless network may be a wireless personal area network (Wireless Personal Area Network).

The storage module 130 stores at least one term set as drug-related information and stores at least one process needed to build the pipeline for detecting drug side effects.

The analysis module 150 collects social data on a target drug through social network services, builds at least one term set, and on that basis analyzes and classifies drug side effect patterns for the target drug. Here, the social data may originate from a portal site, messenger service, or the like that is based on a social network service.

To this end, the analysis module 150 may include a collection unit 151, a preprocessing unit 153, a term generation unit 155, a pattern analysis unit 157, and a classification unit 159.

Specifically, the collection unit 151 accesses a social network and collects at least one item of social data related to the target drug, the preprocessing unit 153 preprocesses (refines) the collected social data, and the term generation unit 155 then builds at least one term set for the target drug. Thereafter, the pattern analysis unit 157 extracts side-effect-related data from the collected social data based on the built term set(s) and analyzes the side effect patterns, and the classification unit 159 classifies the side effect information according to preset categories based on the analysis results. The preset categories may be known side effects, unknown side effects, and indications, but are not limited to these; they may be changed, or other categories may be added.

Here, the at least one term set may include a first term set and a second term set. The first term set is a data set of terms that denote (refer to) the target drug, for example its ingredient, product name, brand, nickname, or abbreviation. The second term set is a data set of terms that denote drug side effects related to the target drug and is built from standardized, publicly available databases of drug side effects. The first and second term sets may each hold separate data sets for each of a plurality of drugs. The term sets are entered by an administrator or operator when the pipeline is first built, and the first and second term sets may each be updated as the pipeline runs.
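The per-drug layout of the two term sets can be sketched as a small container. This is only an illustration under stated assumptions: the class name `TermSets`, its field and method names, and the placeholder drug `drug_x` with its terms are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TermSets:
    """Per-drug term sets; all names here are illustrative assumptions."""
    # First term set: drug -> terms that refer to it (ingredient, brand, nickname, ...).
    drug_terms: Dict[str, Set[str]] = field(default_factory=dict)
    # Second term set: drug -> terms that denote side effects of that drug.
    side_effect_terms: Dict[str, Set[str]] = field(default_factory=dict)

    def add_drug_term(self, drug: str, term: str) -> None:
        self.drug_terms.setdefault(drug, set()).add(term.lower())

    def add_side_effect_term(self, drug: str, term: str) -> None:
        self.side_effect_terms.setdefault(drug, set()).add(term.lower())


# Initial entries typed in by an operator (placeholder values, not real data).
terms = TermSets()
terms.add_drug_term("drug_x", "Drug X")
terms.add_drug_term("drug_x", "DX")
terms.add_side_effect_term("drug_x", "nausea")
# A later update made while the pipeline is running.
terms.add_side_effect_term("drug_x", "dizziness")
```

Keeping one entry per drug mirrors the statement that each term set holds separate data sets for each of a plurality of drugs, and the `add_*` methods stand in for both the initial manual entry and later updates.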

Meanwhile, when accessing a social network and collecting at least one item of social data related to the target drug, the analysis module 150 may use the previously built first term set. That is, the analysis module 150 collects social data using the keywords contained in the target drug's data set within the first term set.
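Keyword-driven collection might look like the following minimal sketch. It filters an in-memory list of posts rather than calling any real SNS interface, since the specification names none; the function name `collect_posts` and the sample posts are assumptions made for illustration.

```python
from typing import Iterable, List


def collect_posts(posts: Iterable[str], drug_keywords: Iterable[str]) -> List[str]:
    """Keep posts that mention any keyword from the target drug's first term set."""
    keywords = [k.lower() for k in drug_keywords]
    # Plain substring matching; short abbreviations can over-match inside other
    # words, so a real collector would likely need stricter matching.
    return [p for p in posts if any(k in p.lower() for k in keywords)]


# Placeholder posts and keywords (hypothetical).
sample_posts = [
    "Started Drug X last week and feel dizzy",
    "Nice weather today",
    "my dx dose was doubled",
]
matched = collect_posts(sample_posts, {"drug x", "dx"})
```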

The learning module 170 labels the data classified by the analysis module 150 for drug side effects, and then uses the labeled data as training data to build and/or train the drug side effect detection prediction model.

Here, the learning module 170 may build the drug side effect detection prediction model based on recurrent neural network (RNN) learning, and a long short-term memory (LSTM) network may be used as the structure constituting the RNN.
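For reference, the commonly used formulation of one LSTM step is given below. The specification does not state which LSTM variant or output layer is used, so the symbols ($x_t$ for the embedded token at position $t$, $h_t$ for the hidden state, $c_t$ for the cell state, $W$, $U$, $b$ for learned parameters) and the final sigmoid output $\hat{y}$ over the last hidden state $h_T$ are assumptions.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\
\hat{y} &= \sigma(w^{\top} h_T + b)
\end{aligned}
```

Under this reading, $\hat{y}$ would be the predicted probability that a post reports a side effect.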

Meanwhile, the learning module 170 trains the drug side effect detection prediction model on the data classified as known side effects. When data classified as an unknown side effect has accumulated at least a preset number of times, it may be reclassified as a known side effect and used to train the model.
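The reclassification rule can be sketched with a counter per unknown term. The class name `SideEffectRegistry`, the string labels, and the threshold value of 3 are hypothetical; the specification only says that the count is preset.

```python
from collections import Counter
from typing import Iterable


class SideEffectRegistry:
    """Tracks known side effects and promotes unknown ones after enough reports."""

    def __init__(self, known: Iterable[str], threshold: int) -> None:
        self.known = set(known)
        self.unknown_counts: Counter = Counter()
        self.threshold = threshold  # the preset count (value is an assumption)

    def observe(self, term: str) -> str:
        """Record one report of `term` and return its category after the update."""
        if term in self.known:
            return "known"
        self.unknown_counts[term] += 1
        if self.unknown_counts[term] >= self.threshold:
            # Accumulated at least the preset number of times: reclassify.
            self.known.add(term)
            del self.unknown_counts[term]
            return "known"
        return "unknown"


registry = SideEffectRegistry(known={"nausea"}, threshold=3)
history = [registry.observe("tinnitus") for _ in range(3)]
```

After the third report the placeholder term is treated as known, so later posts mentioning it would feed the training data.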

The control module 190 exercises control so that at least one item of social data related to the target drug is collected by accessing a social network and, once the collected social data has been preprocessed (refined), the term generation unit 155 builds at least one term set for the target drug. The control module 190 then exercises control so that side-effect-related data is extracted from the collected social data based on the built term set(s), the side effect patterns are analyzed, and the side effect information is classified according to preset categories based on the analysis results.

Figure 2 is a flowchart showing a pipeline construction method for detecting drug side effects according to an embodiment of the present invention.

Referring to Figure 2, to build the pipeline for detecting drug side effects, the analysis module 150 accesses a social network, collects at least one item of social data related to the target drug using the data set for that drug in the first term set, and preprocesses it (S201). It then extracts side-effect-related data from the preprocessed social data based on the second term set and analyzes the side effect patterns (S203).

Thereafter, the data extracted according to the analysis results is labeled for drug side effects (S205). The labeling may be performed automatically by the analysis module 150 or manually by an administrator or operator.

Thereafter, the learning module 170 trains the drug side effect prediction model using the labeled data as training data (S207).
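The labeling step (S205) that produces the training data for S207 could be sketched as follows. The rule used here (label 1 when any token matches the second term set, with optional manual overrides keyed by post index) is an assumption; the specification says only that labeling may be automatic or manual.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def label_posts(
    token_lists: Iterable[List[str]],
    side_effect_terms: Iterable[str],
    manual_labels: Optional[Dict[int, int]] = None,
) -> List[Tuple[List[str], int]]:
    """Attach a 0/1 side effect label to each tokenized post."""
    terms = {t.lower() for t in side_effect_terms}
    manual = manual_labels or {}
    labeled = []
    for idx, tokens in enumerate(token_lists):
        # Automatic label: 1 if any token is a known side effect term.
        auto = int(any(tok.lower() in terms for tok in tokens))
        # A manual label from an operator, when present, takes precedence.
        labeled.append((tokens, manual.get(idx, auto)))
    return labeled


training_data = label_posts(
    [["felt", "nausea"], ["no", "issues"], ["dizzy", "spell"]],
    side_effect_terms={"nausea"},
    manual_labels={2: 1},  # operator marks the third post by hand
)
```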

Figure 3 is a diagram schematically showing a pipeline for detecting drug side effects according to an embodiment of the present invention. This pipeline makes it possible to predict (detect) side effects of a target drug continuously or periodically and to update the results. The description here assumes that the first and second term sets have been built (generated) in advance.

Referring to Figure 3, the construction device 100 accesses a social channel based on a social network service (S301) and collects at least one item of social data on the target drug based on the first term set for that drug (S303).

The social data collected in this way is unstructured and unpredictable in form, so it must be preprocessed (refined) before it can be used. Such unstructured data can be refined through natural language processing or text mining (S305).

Here, text mining is a mining process for unstructured data that extracts statistically meaningful concepts or features from the data and derives high-quality information, such as patterns or trends among them.
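A minimal sketch of the refinement step (S305) is shown below, assuming that refinement means stripping URLs and user mentions, lowercasing, and splitting into word tokens. The specification does not list the concrete cleaning rules, so these choices and the function name `preprocess` are illustrative only.

```python
import re
from typing import List

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Runs of letters in any script; digits, underscores, and punctuation are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def preprocess(text: str) -> List[str]:
    """Turn one raw post into a list of lowercase word tokens."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return [token.lower() for token in TOKEN_RE.findall(text)]


tokens = preprocess("Took DrugX today!! https://t.co/abc @friend #nausea")
```

Korean posts would additionally need morphological analysis to split words into stems, which this sketch does not attempt.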

Meanwhile, the preprocessed social data is analyzed through exploratory data analysis based on the second term set (S307), and drug side effect patterns are analyzed using the results (S309). Here, exploratory data analysis may consist of analyzing valid data within big data according to user settings and then condensing and visualizing the analyzed valid data according to preset options.

Here, exploratory data analysis may be based on association analysis or a word embedding (word2vec) model. Specifically, association analysis finds interrelationships or dependencies among the items in the data, and its results are simple yet unambiguous to interpret. It generates easy-to-understand rules about the data and is therefore useful for uncovering unexpected knowledge. Word embedding is an inference-based technique that turns the meaning of words into vectors so that similarity between words (keywords) can be taken into account; by quantifying text, it allows knowledge to be discovered from a new angle. To this end, vectors that capture as much of each word's meaning as possible are generated, and the similarity or relatedness of word pairs is examined and interpreted.
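The two analyses can be illustrated with standard measures: support, confidence, and lift for an association rule "drug term → symptom term", and cosine similarity between two word vectors. The specification does not say which measures are used, so this is a sketch under that assumption; the sample posts and vectors are placeholders, and a real embedding would come from a trained word2vec model.

```python
import math
from typing import List, Sequence, Set, Tuple


def association(transactions: List[Set[str]], a: str, b: str) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule a -> b over token sets."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if a in t)
    count_b = sum(1 for t in transactions if b in t)
    count_ab = sum(1 for t in transactions if a in t and b in t)
    if n == 0 or count_a == 0 or count_b == 0:
        return 0.0, 0.0, 0.0
    support = count_ab / n
    confidence = count_ab / count_a
    lift = confidence / (count_b / n)
    return support, confidence, lift


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two word vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Each post is reduced to its set of tokens (placeholder data).
posts = [
    {"drugx", "nausea"},
    {"drugx", "nausea"},
    {"drugx", "headache"},
    {"other", "nausea"},
]
support, confidence, lift = association(posts, "drugx", "nausea")
```

A lift above 1 would suggest that the symptom co-occurs with the drug more often than chance; in this placeholder data the lift is below 1.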

According to the analysis results of step S309, each drug side effect pattern is classified into one category: for example, a side effect already known for the target drug, an unknown side effect, or an indication.
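The three categories can be illustrated with a minimal sketch. It assumes that reference sets of known side effects and indications for the target drug are available as plain sets of terms; the function name, the category labels, and the lookup-based rule are hypothetical and are not taken from the patent.

```python
def classify_term(term, known_side_effects, indications):
    """Assign an extracted term to one of the three preset categories.

    known_side_effects, indications: sets of terms for the target drug
    (hypothetical reference data supplied by the caller).
    """
    if term in indications:
        return "indication"
    if term in known_side_effects:
        return "known_side_effect"
    # Anything not covered by the reference data is treated as unknown.
    return "unknown_side_effect"
```

For example, with `known_side_effects={"nausea"}` and `indications={"hypertension"}`, the term `"hair loss"` would fall into the unknown category.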

Thereafter, labeling is performed on the data classified as known side effects, and the labeled data is used as training data to train the drug side effect detection prediction model.
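The patent does not name a model type. Purely as an illustration of training on labeled data, the sketch below uses a small multinomial naive Bayes classifier with add-one smoothing; the choice of classifier, the labels `"adr"` and `"no_adr"`, and the function names are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_posts):
    """labeled_posts: list of (tokens, label) pairs (hypothetical training data)."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_posts:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-probability for the given tokens."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            # Add-one smoothing so unseen tokens do not zero out the score.
            score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In this sketch, posts that mention a known side effect of the target drug would be labeled `"adr"` and the rest `"no_adr"` before being passed to `train_naive_bayes`.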

The drug side effect detection prediction model trained in this way is then used to detect side effects in subsequently collected social data. Its performance improves each time social data is collected and analyzed, enabling more accurate prediction and detection.
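The claims additionally state that data classified as an unknown side effect is reclassified as a known side effect once it has accumulated more than a preset number of times, and is then used as training data. A minimal sketch of that bookkeeping follows; the class name `SideEffectRegistry`, its method names, and the strict `>` comparison are assumptions made for this example.

```python
from collections import Counter


class SideEffectRegistry:
    """Tracks unknown side effect terms and promotes them after repeated sightings."""

    def __init__(self, known, threshold):
        self.known = set(known)        # terms already treated as known side effects
        self.threshold = threshold     # preset number of times (assumed strict '>')
        self.unknown_counts = Counter()

    def observe(self, term):
        """Record one occurrence of a term and return its current status."""
        if term in self.known:
            return "known"
        self.unknown_counts[term] += 1
        if self.unknown_counts[term] > self.threshold:
            # Promote the term so later data mentioning it can be labeled for training.
            self.known.add(term)
            del self.unknown_counts[term]
            return "reclassified"
        return "unknown"
```

With `threshold=2`, a term is reported as `"unknown"` on its first two sightings, `"reclassified"` on the third, and `"known"` afterwards.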

FIG. 4 is a diagram showing the procedure for generating the second term set used to build a pipeline for detecting drug side effects according to an embodiment of the present invention.

Referring to FIG. 4, the second term set may be generated using a list of drug side effects generated from WHO-ART and SIDER, which are standardized, publicly available databases, together with a pre-generated consumer term dictionary. Here, WHO-ART is an international classification system for adverse drug reaction terminology and is already the most widely used system for ADR (Adverse Drug Reaction) reporting in Korea. SIDER is a drug side effect database that provides drug-ADR relationships and contains marketed drugs and side effect information for those drugs. The consumer term dictionary may be defined by an administrator or operator. That is, the second term set is generated using standardized sets such as WHO-ART and SIDER.

Specifically, a list of drug side effects is generated based on WHO-ART (S401), the list is supplemented based on SIDER (S403), a list of drug side effects based on consumer terms is generated (S405), and these lists are then mapped to one another (S407), which yields the second term set (S409).

However, the order in which steps S401 to S405 are performed is not fixed. The steps may be performed independently (individually), and each list may be updated individually, automatically or manually, as needed.

Moreover, WHO-ART and SIDER are merely examples of standardized, publicly available databases; the invention is not limited to them, and other drug classification systems, drug data, and the like may also be used.
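The procedure of FIG. 4 (S401 to S409) can be sketched as follows. The function name, the dictionary-of-sets layout, and the toy inputs are assumptions made for this example; the terms shown are placeholders and are not actual WHO-ART or SIDER entries.

```python
def build_second_term_set(who_art_terms, sider_terms, consumer_dictionary):
    """Map consumer expressions onto standard side effect terms.

    who_art_terms, sider_terms: iterables of standard terms (placeholder data).
    consumer_dictionary: dict of consumer expression -> standard term.
    Returns a dict of standard term -> set of expressions that denote it.
    """
    # S401: start from the WHO-ART based list.
    term_set = {term: {term} for term in who_art_terms}
    # S403: supplement with terms that only appear in the SIDER based list.
    for term in sider_terms:
        term_set.setdefault(term, {term})
    # S405/S407: attach consumer expressions to the standard term they map to.
    for expression, standard in consumer_dictionary.items():
        if standard in term_set:
            term_set[standard].add(expression)
    # S409: the merged mapping serves as the second term set.
    return term_set


second_term_set = build_second_term_set(
    ["headache", "nausea"],
    ["nausea", "dizziness"],
    {"head is pounding": "headache", "room is spinning": "dizziness"},
)
```

Because the three inputs are independent arguments, each source list can be rebuilt or updated separately before the mapping step is rerun, which matches the remark that steps S401 to S405 need not follow a fixed order.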

Meanwhile, various methods may be applied to the exploratory data analysis by which the construction device 100 analyzes drug side effect patterns from the at least one piece of social data collected through a social network service. Examples of each method are described with reference to FIGS. 5A to 5C.

FIG. 5A is a diagram showing a first embodiment of analyzing drug side effect patterns when building a pipeline for detecting drug side effects according to an embodiment of the present invention.

According to FIG. 5A, the analysis module 150 of the construction device 100 extracts the words (keywords) that appear in the collected social data and analyzes the frequency of each extracted word to generate a frequency table, as shown in (a).

In addition, a word cloud is generated from those terms, as shown in (b), so that it is possible to see at a glance which words were used and how frequently.
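The frequency table of FIG. 5A (a) can be sketched with the Python standard library alone. The function name and the input format (one keyword list per post) are assumptions made for this example; rendering the word cloud itself would require a separate plotting library and is not shown.

```python
from collections import Counter


def frequency_table(posts, top_n=10):
    """Return the top_n (word, count) pairs across all posts.

    posts: list of keyword lists, one per preprocessed post (hypothetical input).
    """
    counts = Counter(word for post in posts for word in post)
    return counts.most_common(top_n)
```

The resulting word-to-count pairs are the kind of input a word cloud renderer typically takes to size each word.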

FIG. 5B is a diagram showing a second embodiment of analyzing drug side effect patterns when building a pipeline for detecting drug side effects according to an embodiment of the present invention.

According to FIG. 5B, the analysis module 150 of the construction device 100 extracts the words (keywords) that appear in the collected social data and generates a table of the top n associated words among the extracted words. It also analyzes the associations among those words and generates a visualization graph so that they can be checked visually.
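One simple way to obtain a top-n associated word table is to count co-occurrences with a target keyword, as in the sketch below. Using raw co-occurrence counts as the association measure, and the function names `top_associated_words` and `to_edges`, are assumptions made for this example.

```python
from collections import Counter


def top_associated_words(posts, target, n=5):
    """Return the n words that most often appear in the same post as target."""
    co_counts = Counter()
    for keywords in posts:
        unique = set(keywords)
        if target in unique:
            co_counts.update(unique - {target})
    return co_counts.most_common(n)


def to_edges(target, associated):
    """Convert the table into weighted edges that a graph library could draw."""
    return [(target, word, weight) for word, weight in associated]
```

The weighted edge list is a neutral intermediate form; any graph drawing tool could be used to produce the visualization graph.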

FIG. 5C is a diagram showing a third embodiment of analyzing drug side effect patterns when building a pipeline for detecting drug side effects according to an embodiment of the present invention.

According to FIG. 5C, the analysis module 150 of the construction device 100 extracts the words (keywords) that appear in the collected social data, embeds the extracted words, and matches them against the side effect dictionary to identify side effect patterns for each System Organ Class (SOC) of the human body. It can then confirm drug side effect patterns by identifying the top-ranked words in order of cosine distance from a specific side effect.

Here, word2vec may be used as one of the word embedding methods.
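The ranking by cosine distance can be illustrated without any embedding library. In the sketch below, the vectors are made-up two-dimensional placeholders standing in for vectors that a word2vec model would produce, and the function names are assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for zero vectors; treated as no similarity here
    return dot / (norm_u * norm_v)


def nearest_words(embeddings, query, top_n=3):
    """Rank all other words by cosine similarity to the query word."""
    q = embeddings[query]
    scored = [
        (word, cosine_similarity(q, vec))
        for word, vec in embeddings.items()
        if word != query
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Placeholder vectors, not output of a real model.
toy_embeddings = {
    "headache": [1.0, 0.0],
    "migraine": [0.9, 0.1],
    "rash": [0.0, 1.0],
}
```

Ranking by highest cosine similarity is equivalent to ranking by smallest cosine distance (1 minus the similarity).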

The construction device 100 according to an embodiment of the present invention may be a server that builds a pipeline for detecting drug side effects. The construction device 100 may also be any device on which an administrator or operator can install and run the desired application programs (applications), such as a computer, UMPC (Ultra Mobile PC), workstation, netbook, PDA (Personal Digital Assistant), portable computer, web tablet, wireless phone, mobile phone, smartphone, pad, smart watch, wearable terminal, e-book reader, PMP (portable multimedia player), portable game console, navigation device, black box, digital camera, or other mobile communication terminal. Accordingly, a separate program or application may need to be installed on the construction device 100 in order to build the pipeline. However, this is merely one embodiment, and the pipeline may instead be built by accessing a web page.

The steps of the method or algorithm described in connection with the embodiments of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. The software module may reside in RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the technical field to which the present invention pertains.

Embodiments of the present invention have been described above with reference to the accompanying drawings, but a person of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

100: Pipeline construction device 110: Communication module
130: Storage module 150: Analysis module
170: Learning module 190: Control module
151: Collection unit 153: Preprocessing unit
155: Term generation unit 157: Pattern analysis unit
159: Classification unit

Claims

A method of building a pipeline for detecting drug side effects, performed by a device, the method comprising:
accessing, by an analysis module, a social channel based on a social network service (SNS) to collect at least one piece of social data related to a target drug based on a first term set;
preprocessing, by the analysis module, the collected social data;
performing, by the analysis module, exploratory data analysis by extracting side-effect-related data from the preprocessed social data based on a second term set;
analyzing, by the analysis module, drug side effect patterns for the target drug according to the analysis results and classifying them according to preset categories; and
building or training, by a learning module, a drug side effect detection prediction model using the classification results,
wherein the first term set includes data sets composed of at least one term representing each of at least one drug, and the second term set includes a data set composed of terms representing drug side effects,
wherein the preset categories are set by the analysis module as known side effects, unknown side effects, and indications for the target drug, and
wherein the drug side effect detection prediction model is trained by the learning module using the data classified as known side effects as training data, and, among the data classified as unknown side effects, data accumulated more than a preset number of times is reclassified as known side effects and subsequently used as training data to train the drug side effect detection prediction model.

(Deleted)

The method of claim 1,
wherein the exploratory data analysis is performed by the analysis module extracting keywords that appear in the collected at least one piece of social data and generating a visualized graph using the extracted keywords, and
wherein the visualized graph is generated using any one of a frequency analysis technique, an association analysis technique, and a word embedding technique.

The method of claim 1,
wherein the second term set is generated by the analysis module mapping, to WHO-ART, a list of drug side effects for the target drug obtained based on SIDER and a pre-generated consumer term dictionary.

A pipeline construction device for detecting drug side effects, comprising:
a communication module;
a storage module that stores at least one piece of information or data for building a pipeline for detecting the drug side effects;
an analysis module that accesses a social channel based on a social network service (SNS) to collect and preprocess at least one piece of social data related to a target drug based on a first term set, extracts side-effect-related data from the preprocessed social data based on a second term set to perform exploratory data analysis, analyzes drug side effect patterns for the target drug according to the analysis results, and classifies them according to preset categories;
a learning module that builds or trains a drug side effect detection prediction model using the classification results; and
a control module that controls the device to collect and preprocess the at least one piece of social data related to the target drug based on the first term set, extract side-effect-related data from the preprocessed social data based on the second term set to perform exploratory data analysis, analyze drug side effect patterns for the target drug according to the analysis results, classify them according to the preset categories, and then build or train the drug side effect detection prediction model using the classification results,
wherein the first term set includes data sets composed of at least one term representing each of at least one drug, and the second term set includes a data set composed of terms representing drug side effects,
wherein the preset categories are set by the analysis module as known side effects, unknown side effects, and indications for the target drug, and
wherein the drug side effect detection prediction model is trained by the learning module using the data classified as known side effects as training data, and, among the data classified as unknown side effects, data accumulated more than a preset number of times is reclassified as known side effects and subsequently used as training data to train the drug side effect detection prediction model.