KR101609816B1

KR101609816B1 - Data fusion apparatus based on a healthcare data integration model and method therefor

Info

Publication number: KR101609816B1
Application number: KR1020140120470A
Authority: KR
Inventors: 이승룡; 알리 라흐만
Original assignee: 경희대학교 산학협력단
Priority date: 2014-09-11
Filing date: 2014-09-11
Publication date: 2016-04-06
Also published as: KR20160030807A

Abstract

The present invention relates to a data fusion apparatus, and a method therefor, based on a data integration model for fusing and editing heterogeneous data in the biomedical domain. The data fusion apparatus based on the integrated model may include a heterogeneous data acquisition unit, a data processing unit, and a heterogeneous data fusion unit. Its object is to automatically fuse heterogeneous healthcare data of various formats using the integrated model and to extract data sets in the format the user wants, for use in disease analysis and prognosis.

Description

[0001] The present invention relates to a data fusion apparatus based on a healthcare data integration model, and a method therefor.

The present invention relates to an apparatus and method for fusing and editing heterogeneous data in the biomedical domain, and more particularly to a data fusion apparatus and method based on a data integration model.

Conventionally, researchers and practitioners in the biomedical field have used biomedical software tools to study various aspects of disease through vast amounts of biomedical data.

However, biomedical data consists of data in many different formats, so there have been limits to processing such variously formatted data and generating meaningful information within a single system or a single software tool.

An object of the present invention is to automatically fuse heterogeneous healthcare data of various formats in the biomedical field using an integrated model, thereby generating a data set with an integrated format based on priorities calculated from feedback information; to extract data sets in the format the user wants; and to use the extracted information for disease analysis and prognosis.

According to an embodiment of the present invention, a data fusion apparatus based on a healthcare data integration model may include: a heterogeneous data acquisition unit that acquires heterogeneous healthcare data that includes feedback data and has at least one format; a heterogeneous data processing unit that preprocesses the acquired heterogeneous data to generate data sets and assigns a priority to each generated data set according to the feedback count or reliability; and a heterogeneous data fusion unit that generates an integrated model, based on the assigned priorities, such that the data sets do not overlap, and fuses the data sets according to the generated integrated model.

According to an embodiment of the present invention, the heterogeneous data processing unit may further include: a data preprocessing unit that removes imbalanced or noisy data present in the acquired heterogeneous data and generates, through preprocessing, a data set in a preset format; and a priority assignment unit that assigns a high priority to a data set that includes data with a high feedback count and high reliability among the data included in the generated data set.

According to one embodiment of the present invention, the heterogeneous data fusion unit may further include: an integrated model generation unit that generates an integrated model for fusing the data sets based on the priority assigned to each data set; a fused data acquisition unit that acquires the fused data set generated by fusing the data sets according to the integrated model; and a data set extraction unit that extracts, from the generated fused data set, a data set in the format requested by the user.

According to yet another embodiment of the present invention, the heterogeneous data fusion unit may further include a user request information acquisition unit that acquires information on the format the user needs for extraction from the generated fused data set.

According to an embodiment of the present invention, a data fusion method based on a healthcare data integration model may include: acquiring heterogeneous healthcare data that includes feedback data and has at least one format; preprocessing the acquired heterogeneous data to generate data sets and assigning a priority to each generated data set according to the feedback count or reliability; and generating an integrated model, based on the assigned priorities, such that the data sets do not overlap, and fusing the data sets according to the generated integrated model.

According to an embodiment of the present invention, the step of assigning the priority may further include: removing imbalanced or noisy data present in the acquired heterogeneous data and generating, through preprocessing, a data set in a preset format; and assigning a high priority to a data set that includes data with a high feedback count and high reliability among the data included in the generated data set.

According to one embodiment of the present invention, the step of fusing the data sets may further include: generating an integrated model for fusing the data sets based on the priority assigned to each data set; acquiring the fused data set generated by fusing the data sets according to the integrated model; and extracting, from the generated fused data set, a data set in the format requested by the user.

According to yet another embodiment of the present invention, the step of fusing the data sets may further include acquiring information on the format the user needs for extraction from the generated fused data set.

FIG. 1 is a block diagram of a data fusion apparatus based on an integrated model according to an embodiment of the present invention.
FIG. 2 is a detailed block diagram of the data processing unit shown in FIG. 1 according to an embodiment of the present invention.
FIG. 3 is a detailed block diagram of the heterogeneous data fusion unit shown in FIG. 1 according to one embodiment of the present invention.
FIG. 4 is a detailed block diagram of the heterogeneous data fusion unit shown in FIG. 1 according to another embodiment of the present invention.
FIG. 5 is a diagram showing an implementation of a data fusion apparatus based on an integrated model according to one embodiment of the present invention.
FIG. 6 is a flowchart showing a data fusion method based on an integrated model according to one embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them.

Hereinafter, a data fusion apparatus based on a healthcare data integration model according to an embodiment of the present invention, and a method therefor, are described with reference to the drawings.

FIG. 1 is a block diagram of a data fusion apparatus based on an integrated model according to an embodiment of the present invention.

Referring to FIG. 1, the data fusion apparatus based on the integrated model may include a heterogeneous data acquisition unit (100), a data processing unit (200), and a heterogeneous data fusion unit (300).

The heterogeneous data acquisition unit (100) may acquire heterogeneous healthcare data that includes feedback data and has at least one format.

According to one embodiment of the present invention, the feedback data may be data that quantifies the number of times a group of experts in a particular field has given feedback on the data in question over a network, the reliability of the experts who gave the feedback, and the reliability of the feedback itself.

According to an embodiment of the present invention, the heterogeneous data may be data acquired from patient information systems, such as electronic medical records, electronic health records, sensor information, and social media information, but is not limited thereto.

The heterogeneous data may be acquired in any digital file format that can be computed on and loaded, without limitation, and formats that a database can access, such as TSV and CSV, may also be used.
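As an illustration only, and not as part of the disclosed embodiment, acquisition of TSV and CSV inputs could be sketched as follows. The function name `load_records`, the `DELIMITERS` table, and the sample fields `patient_id` and `heart_rate` are hypothetical; the specification does not define them.

```python
import csv
import io

# Hypothetical mapping from a format name to its field delimiter.
DELIMITERS = {"csv": ",", "tsv": "\t"}


def load_records(text, fmt):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=DELIMITERS[fmt])
    return [dict(row) for row in reader]


# Two sources describing the same patient in different file formats.
csv_text = "patient_id,heart_rate\np1,72\n"
tsv_text = "patient_id\theart_rate\np1\t75\n"

csv_records = load_records(csv_text, "csv")
tsv_records = load_records(tsv_text, "tsv")
```

Both inputs end up as the same in-memory record shape, which is what lets later stages treat them uniformly regardless of the source format.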

The data processing unit (200) may preprocess the acquired heterogeneous data to generate data sets and may assign a priority to each generated data set according to the feedback count or reliability.

The data processing unit (200) is described in more detail with reference to FIG. 2.

The heterogeneous data fusion unit (300) may generate an integrated model, based on the assigned priorities, such that the data sets do not overlap, and may fuse the data sets according to the generated integrated model.

The heterogeneous data fusion unit (300) is described in more detail with reference to FIGS. 3 and 4.

FIG. 2 is a detailed block diagram of the data processing unit (200) shown in FIG. 1 according to an embodiment of the present invention.

The data processing unit (200) may include a data preprocessing unit (210) and a priority assignment unit (220).

The data preprocessing unit (210) may remove imbalanced or noisy data present in the acquired heterogeneous data and may generate, through preprocessing, a data set in a preset format.

According to one embodiment of the present invention, the data preprocessing unit (210) may store the data acquired by the heterogeneous data acquisition unit (100) in a storage memory and may load that data from the storage memory whenever data fusion is requested.

According to an embodiment of the present invention, the data preprocessing unit (210) may perform the processing required before the data sets are fused.

According to an embodiment of the present invention, preprocessing may mean first removing the imbalanced data and noisy data contained in the acquired heterogeneous data, and then using the remaining data to construct data sets according to a format preset by the user.

Here, a data set may mean a bundle of data grouped together on the basis of a given classifier.

The classifier may follow criteria preset by the user, and according to one embodiment of the present invention various criteria may be used, such as the content, subject, category, or format of the data.
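The preprocessing and grouping described above can be sketched as follows. This is a minimal sketch under stated assumptions: "noise" is approximated here as records with missing required fields, which the specification does not prescribe, and the names `preprocess`, `build_data_sets`, and the sample fields are hypothetical.

```python
from collections import defaultdict


def preprocess(records, required_fields):
    """Drop records missing any required field (a stand-in for noise removal)."""
    return [
        record
        for record in records
        if all(record.get(field) not in (None, "") for field in required_fields)
    ]


def build_data_sets(records, classifier):
    """Group records into data sets keyed by the classifier's return value."""
    data_sets = defaultdict(list)
    for record in records:
        data_sets[classifier(record)].append(record)
    return dict(data_sets)


records = [
    {"patient_id": "p1", "category": "vitals", "value": "72"},
    {"patient_id": "p2", "category": "vitals", "value": ""},  # incomplete, removed
    {"patient_id": "p1", "category": "labs", "value": "5.4"},
]

clean = preprocess(records, ["patient_id", "category", "value"])
# The classifier here is the record's category; content, subject, or format
# could be used instead by passing a different function.
data_sets = build_data_sets(clean, classifier=lambda record: record["category"])
```

Passing the classifier as a function keeps the grouping criterion user-configurable, in line with the statement that the classifier follows criteria preset by the user.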

The priority assignment unit (220) may assign a high priority to a data set that includes data with a high feedback count and high reliability among the data included in the generated data set.

Here, priority means the ranking that determines, when multiple data items exist for the same subject, which item's content should be used preferentially when the data is integrated.

Priority is needed because heterogeneous data may include data with different content about the same subject, as well as data in different formats about the same subject. In such cases, assigning priorities to particular content and particular formats and fusing the data around the high-priority data yields more reliable and more meaningful information than overlapping the data at random during fusion.

According to one embodiment of the present invention, the priority may be calculated for each data set from the feedback data included in the heterogeneous data.

Here, the priority may be calculated as follows. According to the content of the feedback data, user-preset weights are applied to data with a relatively high feedback count, data whose feedback came from highly reliable experts, and data whose feedback is itself highly reliable, to compute a numerical score for each data item. The scores of the data items in each data set are then summed, and the higher the sum, the higher the priority assigned.

The priority calculated in this way may be assigned to each data set.
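One possible reading of this calculation is a weighted sum per data item followed by a sum per data set. The sketch below assumes that reading; the weight values, the field names `feedback_count`, `expert_reliability`, and `feedback_reliability`, and the function names are hypothetical and do not come from the specification.

```python
# Hypothetical user-preset weights for the three feedback quantities.
WEIGHTS = {
    "feedback_count": 1.0,
    "expert_reliability": 2.0,
    "feedback_reliability": 2.0,
}


def item_score(feedback, weights=WEIGHTS):
    """Weighted sum of the feedback quantities for one data item."""
    return sum(weights[key] * feedback[key] for key in weights)


def rank_data_sets(data_sets, weights=WEIGHTS):
    """Return (ranks, totals); rank 1 goes to the data set with the highest total."""
    totals = {
        name: sum(item_score(item, weights) for item in items)
        for name, items in data_sets.items()
    }
    ordered = sorted(totals, key=totals.get, reverse=True)
    ranks = {name: rank for rank, name in enumerate(ordered, start=1)}
    return ranks, totals


data_sets = {
    "A": [
        {"feedback_count": 3, "expert_reliability": 0.5, "feedback_reliability": 0.5},
    ],
    "B": [
        {"feedback_count": 1, "expert_reliability": 1.0, "feedback_reliability": 0.5},
        {"feedback_count": 2, "expert_reliability": 0.5, "feedback_reliability": 0.5},
    ],
}

ranks, totals = rank_data_sets(data_sets)
# totals: A = 5.0, B = 8.0, so B receives rank 1 and A receives rank 2.
```

Because the total is a plain sum, a data set with more items tends to score higher; whether that is intended is not stated in the specification, and an average could be substituted if it is not.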

FIG. 3 is a detailed block diagram of the heterogeneous data fusion unit (300) shown in FIG. 1 according to one embodiment of the present invention.

Referring to FIG. 3, the heterogeneous data fusion unit (300) may include an integrated model generation unit (310), a fused data acquisition unit (320), and a data set extraction unit (330).

The integrated model generation unit (310) may generate an integrated model for fusing the data sets based on the priority assigned to each data set.

According to an embodiment of the present invention, the integrated model may mean a structural model that serves as the basis for fusing the prioritized data sets, and the unique and shared functions of the fused data may be defined accordingly.

The integrated model may be generated so that, during fusion, the content of a data set with priority overlaps, that is, overwrites, the content of data sets about the same subject.
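The overwrite behavior described above can be sketched as a priority-ordered merge. This is an assumption-laden sketch: it overwrites at the level of individual fields, whereas the specification only says that higher-priority content overwrites lower-priority content for the same subject. The function name `fuse`, the data set names, and the sample fields are hypothetical.

```python
def fuse(data_sets, priorities):
    """Merge data sets per subject so that higher-priority content wins.

    data_sets:  name -> {subject: {field: value}}
    priorities: name -> rank, where 1 is the highest priority
    """
    fused = {}
    # Apply the lowest-priority data set first so that each higher-priority
    # data set overwrites any conflicting fields written before it.
    for name in sorted(data_sets, key=lambda n: priorities[n], reverse=True):
        for subject, fields in data_sets[name].items():
            fused.setdefault(subject, {}).update(fields)
    return fused


data_sets = {
    "ehr": {"p1": {"heart_rate": 72, "weight_kg": 80}},
    "wearable": {"p1": {"heart_rate": 75}, "p2": {"heart_rate": 64}},
}
priorities = {"ehr": 1, "wearable": 2}

fused = fuse(data_sets, priorities)
# For p1 the conflicting heart_rate comes from "ehr" (rank 1); p2 exists only
# in "wearable", so its content is carried over unchanged.
```

Non-conflicting content from lower-priority data sets survives the merge, so the fused result contains each subject once, without duplicated entries.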

The fused data acquisition unit (320) may acquire the fused data set generated by fusing the data sets according to the generated integrated model.

The data set extraction unit (330) may extract, from the generated fused data set, a data set in the format requested by the user.

Here, the format requested by the user may be a format preset by the user.
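If "format" is read as an output file format, extraction could be sketched as serializing the fused data set on request. That reading is an assumption, as are the function name `extract`, the supported format names, and the `subject` column; the specification does not define any of them.

```python
import csv
import io
import json


def extract(fused, fmt):
    """Serialize a fused data set ({subject: {field: value}}) in the requested format."""
    rows = [{"subject": subject, **fields} for subject, fields in sorted(fused.items())]
    if fmt == "json":
        return json.dumps(rows, sort_keys=True)
    if fmt in ("csv", "tsv"):
        # Use the union of all field names so that no record loses a column;
        # fields missing from a record are written as empty strings.
        columns = sorted({key for row in rows for key in row})
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer,
            fieldnames=columns,
            delimiter="," if fmt == "csv" else "\t",
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


fused = {"p1": {"heart_rate": 72}, "p2": {"heart_rate": 64}}
as_csv = extract(fused, "csv")
as_json = extract(fused, "json")
```

The format argument could come either from a preset value or from separately acquired user request information, without changing the extraction step itself.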

FIG. 4 is a detailed block diagram of the heterogeneous data fusion unit (300) shown in FIG. 1 according to another embodiment of the present invention.

Referring to FIG. 4, the heterogeneous data fusion unit (300) may include an integrated model generation unit (310), a fused data acquisition unit (320), a data set extraction unit (330), and a user request information acquisition unit (340).

According to one embodiment of the present invention, the integrated model generation unit (310), the fused data acquisition unit (320), and the data set extraction unit (330) are the same as described above with reference to FIG. 3, and the user request information acquisition unit (340) may additionally acquire information on the user-requested format that the data set extraction unit (330) needs in order to extract the data set.

FIG. 5 is a diagram showing an implementation of a data fusion apparatus based on an integrated model according to one embodiment of the present invention.

According to one embodiment of the present invention, the heterogeneous data acquisition unit (100) acquires a plurality of heterogeneous data; the data preprocessing unit (210) removes imbalanced data and noisy data from the acquired heterogeneous data and then generates data sets; and the priority assignment unit (220) may assign priorities to the generated data sets.

The integrated model generation unit (310) generates an integrated model capable of fusing the data according to the priorities, and the fused data acquisition unit (320) may fuse the data according to the integrated model to generate a fused data set. The data set extraction unit (330) may then extract a data set in the format requested by the user from the generated fused data set, according to the user request information acquired by the user request information acquisition unit (340).

FIG. 6 is a flowchart showing a data fusion method based on an integrated model according to one embodiment of the present invention.

Heterogeneous data is acquired (610).

According to one embodiment of the present invention, the feedback data may be data that quantifies the number of times a group of experts in a particular field has given feedback on the data in question over a network, the reliability of the experts who gave the feedback, and the reliability of the feedback itself.

The acquired data is preprocessed and data sets are generated (620).

Priorities are assigned to the generated data sets (630).

Here, the priority may be calculated as follows. According to the content of the feedback data, user-preset weights are applied to data with a relatively high feedback count, data whose feedback came from highly reliable experts, and data whose feedback is itself highly reliable, to compute a numerical score for each data item. The scores of the data items in each data set are then summed, and the higher the sum, the higher the priority assigned.

An integrated model is generated based on the priorities (640).

According to an embodiment of the present invention, the integrated model may be generated so that, during fusion, the content of a data set with priority overlaps, that is, overwrites, the content of data sets about the same subject.

The data is fused according to the integrated model to generate a fused data set (650).

A data set is extracted according to the format requested by the user (660).

The format requested by the user may be a format preset by the user, or may be a format included in the acquired user request information.

The embodiments of the present invention are not implemented only through the apparatus and/or method described above. Although the embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto, and various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

100: heterogeneous data acquisition unit 200: data processing unit
210: data preprocessing unit 220: priority assignment unit
300: heterogeneous data fusion unit 310: integrated model generation unit
320: fused data acquisition unit 330: data set extraction unit
340: user request information acquisition unit

Claims

1. A data fusion apparatus based on a healthcare data integration model, comprising:
A heterogeneous data acquisition unit for acquiring heterogeneous healthcare data that includes feedback data and has at least one format;
A heterogeneous data processing unit for preprocessing the acquired heterogeneous data to generate data sets, and assigning a priority to each generated data set according to the feedback count or reliability; and
A heterogeneous data fusion unit for generating an integrated model, based on the assigned priorities, such that the data sets do not overlap, and fusing the data sets according to the generated integrated model,
Wherein the heterogeneous data processing unit further comprises:
A data preprocessing unit for removing imbalanced or noisy data present in the acquired heterogeneous data and generating, through preprocessing, a data set in a preset format; and
A priority assignment unit for assigning a high priority to a data set that includes data with a high feedback count and high reliability among the data included in the generated data set.

2. (Deleted)

3. The apparatus of claim 1, wherein the heterogeneous data fusion unit further comprises:
An integrated model generation unit for generating an integrated model that fuses the data sets based on the priority assigned to each data set;
A fused data acquisition unit for acquiring a fused data set generated by fusing the data sets according to the integrated model; and
A data set extraction unit for extracting, from the generated fused data set, a data set in a format requested by a user.

4. The apparatus of claim 3, wherein the heterogeneous data fusion unit further comprises:
A user request information acquisition unit for acquiring information on the format the user needs for extraction from the generated fused data set.

5. A data fusion method based on a healthcare data integration model, comprising:
Acquiring heterogeneous healthcare data that includes feedback data and has at least one format;
Preprocessing the acquired heterogeneous data to generate data sets, and assigning a priority to each generated data set according to the feedback count or reliability; and
Generating an integrated model, based on the assigned priorities, such that the data sets do not overlap, and fusing the data sets according to the generated integrated model,
Wherein the step of assigning the priority further comprises:
Removing imbalanced or noisy data present in the acquired heterogeneous data and generating, through preprocessing, a data set in a preset format; and
Assigning a high priority to a data set that includes data with a high feedback count and high reliability among the data included in the generated data set.

6. (Deleted)

7. The method of claim 5, wherein the step of fusing the data sets further comprises:
Generating an integrated model that fuses the data sets based on the priority assigned to each data set;
Acquiring a fused data set generated by fusing the data sets according to the integrated model; and
Extracting, from the generated fused data set, a data set in a format requested by a user.

8. The method of claim 7, wherein the step of fusing the data sets further comprises:
Acquiring information on the format the user needs for extraction from the generated fused data set.