KR101799823B1 - Method of Normalization for Combination of Clinical Data from Different Electronic Healthcare Databases and System thereof - Google Patents


Info

Publication number
KR101799823B1
Authority
KR
South Korea
Prior art keywords
data set
normalization
data
normalization target
database
Application number
KR1020150113953A
Other languages
Korean (ko)
Other versions
KR20170019739A (en)
Inventor
박래웅
윤덕용
Original Assignee
아주대학교산학협력단
Application filed by 아주대학교산학협력단
Priority to KR1020150113953A
Publication of KR20170019739A
Application granted
Publication of KR101799823B1

Classifications

    • G06F19/3443
    • G06F19/32
    • G06F19/36

Abstract

The present invention relates to a method and apparatus for normalizing and analyzing medical data collected from a plurality of medical institutions.
A method for normalizing multicenter test data according to the present invention includes: a subgroup division step of dividing a normalization target data set, stored in a normalization target database to be normalized, into at least two subgroups based on a predetermined characteristic index; a correction statistical index calculation step of calculating a correction statistical index for normalizing the normalization target data set according to a reference data set, using statistical information of the normalization target data set and of the reference data set stored in a reference database that serves as the reference for normalization; and a normalization step of normalizing the test data values of the data samples included in the normalization target data set based on the reference data set, using the correction statistical index.
According to the method and apparatus of the present invention, the numerical heterogeneity of clinical test data between databases, which arises because the data are measured at different medical institutions, can be eliminated, so that the data can be used in a single integrated analysis.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method and system for normalizing multicenter test data, that is, medical data collected from a plurality of medical institutions, so that the data can be analyzed together.

Distributed research networks, which allow mutual access to the test data that each institution acquires and manages, are designed to normalize the test data acquired by many partner institutions and to share the analysis results, and they have the advantage that more test data can be obtained. For example, medical data that several different hospitals obtain from their patients can be shared among partner institutions through a distributed research network. However, test or experimental results obtained at different institutions, for example at different medical institutions, are measured with different equipment on different patients or groups of subjects. They are therefore clinically or demographically heterogeneous and difficult to analyze as, or integrate into, a single data set.

However, existing normalization methods do not give satisfactory results when test data acquired from multiple centers are analyzed as a single data set. For example, when such multicenter test data are analyzed, the existing rank-based method (Prior Art Document 001) …; the conventional Z-score conversion method (Prior Art Document 002) merely matches the mean and variance between different data groups and therefore cannot correct the heterogeneity between the data.
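To make the first prior-art approach concrete, a rank-based inverse normal transformation replaces each value with the standard-normal quantile of its rank. The sketch below is a minimal illustration under assumptions. It is not code from the patent or from Prior Art Document 001. The function name `rank_inverse_normal` is hypothetical, and Blom's offset c = 3/8 with average ranks for ties is an assumed, commonly used choice. The Z-score conversion (Prior Art Document 002) is a plain linear rescaling to a common mean and variance and is not shown separately.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform (hypothetical helper).

    Each value is replaced by the standard-normal quantile of
    (rank - c) / (n - 2c + 1); c = 3/8 is Blom's offset (assumed choice).
    Tied values receive their average rank.
    """
    n = len(values)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda k: values[k])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, SD 1
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Only ranks survive: two sites whose values differ by a factor of 100
# receive identical scores.
site_a = rank_inverse_normal([3.1, 4.2, 5.0])
site_b = rank_inverse_normal([310.0, 420.0, 500.0])
```

Both calls return the same three scores, roughly -0.87, 0 and 0.87. The transform sees only the order of the values within each data set, so it cannot take the subgroup (for example, age or sex) composition of that data set into account.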

In other words, existing methods for normalizing or jointly analyzing test data are limited in their ability to correctly integrate and analyze test record data, such as clinical test data and medical record data, acquired and managed by various institutions.

(Prior Art Document 001) Beasley TM, Erickson S, Allison DB. Rank-based inverse normal transformations are increasingly used, but are they merited? Behav Genet 2009; 39: 580-595. DOI: 10.1007/s10519-009-9281-0

(Prior Art Document 002) Cheadle C, Vawter MP, Freed WJ, et al. Analysis of microarray data using Z score transformation. J Mol Diagn 2003; 5: 73-81. DOI: 10.1016/S1525-1578(10)60455-2

The present invention provides a method, and a system therefor, for normalizing and analyzing the test data contained in databases that store the medical data sets which a plurality of medical institutions acquire from their patients and then store and manage.

The problem to be solved by the present invention is to provide a method and system that overcome the clinicopathological or demographic structural heterogeneity in test data obtained from different patients or subjects at different medical institutions, so that the test data sets can be normalized and integrated.

It is another object of the present invention to provide a method and system for analyzing a plurality of sets of medical record data while maintaining the confidentiality of patient information included in the medical record.

To achieve the above objects, a method for normalizing multicenter test data according to one aspect of the present invention comprises: a subgroup division step of dividing a normalization target data set, stored in a normalization target database to be normalized, into at least two subgroups based on a predetermined characteristic index; a correction statistical index calculation step of calculating a correction statistical index for normalizing the normalization target data set according to a reference data set, using statistical information of the normalization target data set and of the reference data set stored in a reference database that serves as the reference for normalization; and a normalization step of normalizing the test data values of the data samples included in the normalization target data set based on the reference data set, using the correction statistical index.

Here, the reference database and the normalization target database may be connected to each other through a network.

The characteristic index may include a demographic index or a clinical index.

The reference database and the normalization target database may be medical record databases that store medical records, and the test data values of the data samples included in the reference data set and the normalization target data set may be medical data values measured for a patient or subject.

The reference database and the normalization target database may be databases in a distributed research network that connects a plurality of electronic medical record systems through a network, each database operating in conjunction with an electronic medical record system.

The method may further include a database selection step of selecting the reference database or the normalization target database from among at least two databases connected through a network.

The subgroup division step may include a step in which a second processing unit, linked to the normalization target database, divides the normalization target data set into a plurality of subgroups based on the characteristic index.

The subgroup division step may further include a step in which a first processing unit, linked to the reference database, divides the reference data set into a plurality of subgroups based on the characteristic index.

The correction statistical index calculation step may calculate the correction statistical index using information on the distribution of sample counts across the subgroups of the normalization target data set.

The correction statistical index may include a corrected mean and a corrected standard deviation of the reference data set, calculated by correcting the reference data set using the information on the distribution of sample counts across the subgroups of the normalization target data set.

The correction statistical index calculation step may include calculating statistical information of the normalization target data set, including information on the distribution of sample counts across its subgroups; calculating statistical information of the reference data set, including information on the distribution of sample counts across its subgroups; and calculating the correction statistical index using both sets of statistical information.

The correction statistical index calculation step may include: a step in which the second processing unit, linked to the normalization target database, calculates the statistical information of the normalization target data set; a step in which the first processing unit, linked to the reference database, calculates the statistical information of the reference data set; and a step in which the first processing unit calculates the correction statistical index using the statistical information of the normalization target data set and the statistical information of the reference data set.

The statistical information of the reference data set may include the mean of the test data values of the data samples of the reference data set, the mean of the test data values of the data samples in each subgroup of the reference data set, and the number of data samples in each subgroup of the reference data set. The statistical information of the normalization target data set may include the number of data samples of the normalization target data set and the number of data samples in each subgroup of the normalization target data set.

In the normalization step, the test data value of each data sample in the normalization target data set may be normalized using the corrected mean and corrected standard deviation of the reference data set together with the mean and standard deviation of the test data values of the data samples in the normalization target data set.

A computer program according to another aspect of the present invention may be stored in a medium so as to execute, in combination with a database, the method of normalizing multicenter test data.

A multicenter test data normalization system according to another aspect of the present invention includes: a first processing unit linked to a reference database that stores a reference data set serving as the reference for normalization; and a second processing unit linked to a normalization target database that stores a normalization target data set to be normalized. The first processing unit calculates a correction statistical index for normalizing the normalization target data set according to the reference data set. The second processing unit divides the normalization target data set into at least two subgroups based on a predetermined characteristic index and normalizes the test data values of the data samples included in the normalization target data set based on the reference data set, using the correction statistical index.

The second processing unit may include: a second subgroup division unit that divides the normalization target data set into the subgroups based on the characteristic index; a second statistical information calculation unit that calculates statistical information of the normalization target data set, including information on the distribution of sample counts across its subgroups; and a normalization unit that normalizes the test data values of the data samples included in the normalization target data set based on the reference data set, using the correction statistical index.

The first processing unit may include: a first subgroup division unit that divides the reference data set into the subgroups based on the characteristic index; a first statistical information calculation unit that calculates statistical information of the reference data set, including information on the distribution of sample counts across its subgroups; and a correction statistical index calculation unit that calculates the correction statistical index using the statistical information of the normalization target data set, the statistical information of the reference data set, and the test data values of the data samples of the reference data set.

According to the method and apparatus of the present invention, the numerical heterogeneity of clinical test data between databases, which arises because the data are measured at different medical institutions, can be eliminated, so that the data can be used in a single integrated analysis.

In particular, the method of normalizing multicenter test data according to the present invention makes it possible to normalize, integrate and analyze medical data acquired from different data sources in a distributed research network environment.

FIG. 1 is a reference view showing a network system in which a method for normalizing multicenter test data according to an embodiment of the present invention operates.
FIG. 2 is a block diagram of a system in which a method for normalizing multicenter test data according to an embodiment of the present invention operates.
FIG. 3 is a flowchart of a method for normalizing multicenter test data according to an embodiment of the present invention.
FIG. 4 is a flowchart of a method for normalizing multicenter test data according to another embodiment of the present invention.
FIG. 5 is a detailed flowchart of the method for normalizing multicenter test data according to the present invention.
FIG. 6 is a detailed block diagram of the first processing unit, which operates in conjunction with the reference database, in the multicenter test data normalization system according to the present invention.
FIG. 7 is a detailed block diagram of the second processing unit, which operates in conjunction with the normalization target database, in the multicenter test data normalization system according to the present invention.
FIG. 8 is a reference diagram showing an example of the operation of the correction statistical index calculation step according to the present invention.
FIG. 9 is a reference diagram showing the heterogeneity of data between data sets obtained from different medical institutions.
FIG. 10 is a reference diagram comparing the results of the subgroup-adjusted normalization method according to the present invention with those of a conventional method.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals are used to designate the same or similar components throughout the drawings. In the following description of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present invention rather unclear. In addition, the preferred embodiments of the present invention will be described below, but it is needless to say that the technical idea of the present invention is not limited thereto and can be variously modified by those skilled in the art.

First, the background against which the method and system for normalizing multicenter test data according to the present invention were devised will be described in more detail.

Electronic health record (EHR) systems and electronic medical record (EMR) systems, which acquire and manage medical records electronically, are widely used, and research is under way on networking electronic medical records among partner institutions so that the data can be integrated and analyzed. Distributed research networks, which allow mutual access to the medical data that each institution acquires and manages, have the advantage that larger test data sets become available when the test data acquired by many partner institutions are analyzed collectively. It is preferable that many partner institutions be connected to each other in the network. For example, medical data that several different hospitals obtain from their patients can be shared among partner institutions through a distributed research network. In addition, data integrated through a distributed research network can be used effectively in retrospective studies.

However, there are the following problems or limitations in integrating and analyzing electronic medical record systems of different medical institutions.

First, test or experimental results obtained at different medical institutions are measured with different equipment on different patients or groups of subjects. The test data obtained at each of these centers are therefore clinically or demographically heterogeneous and difficult to integrate into one data set.

However, existing normalization methods do not give satisfactory results when test data acquired from multiple centers are analyzed as a single data set. For example, when such multicenter test data are analyzed, the existing rank-based method (Prior Art Document 001) …; the conventional Z-score conversion method (Prior Art Document 002) simply matches the mean and variance between different data groups and is therefore limited in that it cannot correct the heterogeneity between them.

In other words, existing methods for normalizing or jointly analyzing test data cannot correctly integrate and analyze test record data, such as clinical test data or medical record data acquired and managed by various institutions, while preserving the clinical meaning of the data.

Second, because of the nature of medical records, the confidentiality of patient information is a concern. Medical records that contain patients' personal information therefore cannot simply be exchanged and analyzed among medical institutions.

The present invention provides a method, and a system therefor, for normalizing the medical data acquired at various medical institutions in a distributed research network so as to resolve the heterogeneity between them while ensuring the confidentiality of the medical records, as described above. The method proposed by the present invention is called subgroup-adjusted normalization (SAN).

Hereinafter, the method of normalizing multicenter test data according to the present invention and the operation of the system will be described in detail.

FIG. 1 is a reference view showing a network system in which a method for normalizing multicenter test data according to an embodiment of the present invention operates.

Many institutions may each use a database to store and manage the test data they acquire. In the method for normalizing multicenter test data according to the present invention, a reference database 10, which stores a reference data set serving as the reference for normalization, may be selected from among these databases. Preferably, the database with the largest population is selected as the reference database.

Next, the method for normalizing multicenter test data according to the present invention can normalize the values of the data samples in the data sets stored in the remaining databases on the basis of the selected reference database 10. To this end, one of the remaining databases may be selected as the normalization target database 20, and the values of the data samples in the selected normalization target database 20 may be normalized based on the reference database 10. The same normalization may be performed for the remaining databases 30 and 40.
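The selection rule and the iteration over the remaining databases can be sketched as follows. This is an illustration under assumptions: `plan_normalization` is a hypothetical helper, "population" is taken to mean the number of data samples in each database, and ties are broken by dictionary order. The database names and counts in the usage line are made up.

```python
def plan_normalization(population_by_db):
    """Choose the reference database and the list of normalization targets.

    population_by_db: mapping of database name -> number of data samples
    (hypothetical input). The largest database becomes the reference; every
    other database is normalized against it in turn. Ties go to the first
    database encountered, because max() returns the first maximal item.
    """
    if len(population_by_db) < 2:
        raise ValueError("at least two databases are required")
    reference = max(population_by_db, key=population_by_db.get)
    targets = [name for name in population_by_db if name != reference]
    return reference, targets


# Made-up sample counts for the four databases of FIG. 1.
reference, targets = plan_normalization(
    {"db20": 45_000, "db10": 120_000, "db30": 30_000, "db40": 8_000}
)
```

With these invented counts, db10 becomes the reference, and db20, db30 and db40 are normalized against it in that order.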

FIG. 2 is a block diagram of a system in which a method for normalizing multicenter test data according to an embodiment of the present invention operates.

As shown in FIG. 1, to normalize the normalization target database 20 on the basis of the selected reference database 10, the system for normalizing multicenter test data according to the present invention includes a first processing unit 100 linked to the reference database 10 and a second processing unit 200 linked to the normalization target database 20. The first processing unit 100 and the second processing unit 200 may be combined into a single processing unit if necessary. However, to maintain the confidentiality of the test data stored in each database, it is preferable that different databases be linked to different processing units. When the processing units are separated in this way, the local data stored in each database do not need to be transmitted between the processing units. Only the specific information generated for normalization, such as the statistical information of a data set, is transmitted, so the security of the local data can be maintained.
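The confidentiality argument becomes concrete when you look at what each processing unit actually transmits. The minimal sketch below assumes JSON messages and string subgroup labels; both are illustrative choices, and the function names are hypothetical. In it, the target site sends only its total and per-subgroup sample counts, and the reference site replies with only the corrected mean and corrected standard deviation.

```python
import json
from collections import Counter


def target_site_payload(records):
    """Runs at the normalization-target site (second processing unit).

    records: iterable of (subgroup_label, test_value) pairs held locally;
    labels are assumed to be strings so they can be JSON object keys.
    Only the total sample count and the per-subgroup counts are serialized;
    the test values themselves never leave the site.
    """
    counts = Counter(label for label, _value in records)
    return json.dumps(
        {"total": sum(counts.values()), "subgroup_counts": dict(counts)},
        sort_keys=True,
    )


def reference_site_reply(corrected_mean, corrected_sd):
    """Runs at the reference site (first processing unit): two numbers go back."""
    return json.dumps({"corrected_mean": corrected_mean, "corrected_sd": corrected_sd})


local_records = [("M/41-50", 5.1), ("F/41-50", 4.2), ("M/41-50", 6.3)]
outgoing = target_site_payload(local_records)
```

Under this design, the only patient-level information that leaves either site is an aggregate count or a summary statistic. Whether small subgroup counts need further protection is outside what the text describes.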

FIG. 3 is a flowchart of a method for normalizing multicenter test data according to an embodiment of the present invention.

The method for normalizing multicenter test data according to the present invention may include a subgroup division step (S100), a correction statistical index calculation step (S200) and a normalization step (S300). If necessary, the method may further include a database selection step (S50). FIG. 4 is a flowchart of the method for normalizing multicenter test data according to the embodiment that further includes the database selection step (S50).

In the subgroup division step (S100), the normalization target data set stored in the normalization target database 20 is divided into at least two subgroups based on a predetermined characteristic index.

In the correction statistical index calculation step (S200), a correction statistical index for normalizing the normalization target data set according to the reference data set is calculated using the statistical information of the normalization target data set and of the reference data set stored in the reference database 10, which serves as the reference for normalization.

In the normalization step S300, the inspection data values of the data samples included in the normalization target data set are normalized based on the reference data set using the correction statistical index.
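The summary of the invention states that, in this step, each target value is normalized using the corrected mean and corrected standard deviation of the reference data set together with the mean and standard deviation of the target data set. One natural reading is a linear Z-score mapping onto the corrected reference scale, x' = (x - mean_T) / sd_T * corrected_sd + corrected_mean. That reading is assumed here, not quoted from the patent, and `normalize_to_reference` is a hypothetical name.

```python
import statistics


def normalize_to_reference(target_values, corrected_mean, corrected_sd):
    """Map target test values onto the reference scale (hypothetical helper).

    Each value is standardized with the target's own mean and SD, then
    rescaled with the reference set's corrected mean and corrected SD:
        x' = (x - mean_T) / sd_T * corrected_sd + corrected_mean
    The sample SD (n - 1 denominator) is an assumption; the text does not say.
    """
    mean_t = statistics.fmean(target_values)
    sd_t = statistics.stdev(target_values)  # needs at least two values
    if sd_t == 0:
        raise ValueError("target values have zero spread")
    return [(x - mean_t) / sd_t * corrected_sd + corrected_mean for x in target_values]


normalized = normalize_to_reference([4.0, 5.0, 6.0], corrected_mean=100.0, corrected_sd=10.0)
```

The mapping itself has the same form as a conventional Z-score conversion. Under this reading, what distinguishes the method is that the reference mean and standard deviation are first adjusted to the target's subgroup composition before they are plugged in.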

The database selection step (S50), which may additionally be included as described above, selects the reference database 10 or the normalization target database 20 from among at least two databases connected to the network.

Here, the reference database 10 and the normalization target database 20 may be connected to each other through a network. They may also be medical record databases that store medical records, in which case the test data values of the data samples included in the reference data set and the normalization target data set are medical data values measured for a patient or subject.

The data samples contained in a data set have predetermined test data values, which may be any of various types of data values obtained by performing a test on a particular object. For example, a test data value may be a medical data value measured for a patient or subject. Such medical data values may include various types of data that a hospital, medical institution or laboratory obtains by examining a patient or subject, either for medical purposes or for experimental purposes related to medical practice. The medical data may also include any clinical test data. For example, the medical data may be body weight, height, the size of a particular body part, data on a particular component present in a fluid or tissue (including blood) obtained from the patient or subject, or any other test data obtainable from a patient or subject.

The reference database 10 and the normalization target database 20 may be databases in a distributed research network that connects a plurality of electronic medical record systems, each operating in conjunction with an electronic medical record system. For example, each medical institution may have a database that operates in conjunction with its electronic medical record system, and the databases held by the medical institutions may be interconnected to form part of a distributed research network. In the method of normalizing multicenter test data according to the present invention, the reference database 10 and the normalization target database 20 are designated as described above from among the databases in the distributed research network, and the normalization target data set stored in the normalization target database 20 can be normalized according to the reference data set stored in the reference database 10.

Hereinafter, the operation of the subgroup division step (S100) will be described first.

In the subgroup division step (S100), the normalization target data set stored in the normalization target database 20 is divided into at least two subgroups based on a predetermined characteristic index.

The characteristic index may be a demographic variable or a clinical variable. For example, it may be a demographic indicator such as sex, race, place of residence or place of birth, or a clinical indicator such as an index of the severity of a particular disease. Hereinafter, for convenience of explanation, age and sex are used as examples of the characteristic index. However, the characteristic index is not limited to these examples and may include any demographic or clinical index.

For example, in the subgroup division step (S100), the normalization target data set can be divided into a plurality of subgroups based on age and sex. Age may be divided into subgroups such as 0 to 10, 11 to 20, 21 to 30, 31 to 40, 41 to 50, 51 to 60, 61 to 70, 71 to 80 years and so on, and at the same time sex may be divided into male and female subgroups.
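The age-and-sex subgrouping can be expressed as a small labelling function. This sketch rests on several assumptions. Ages are whole years, and labels are strings of the hypothetical form "sex/band". The open-ended top band "81+" is also assumed, because the list of age bands in the text breaks off after 71 to 80. With nine age bands and two sexes, the scheme yields 18 subgroups.

```python
def subgroup_label(age, sex):
    """Return a subgroup label combining a 10-year age band and sex.

    age: completed years (int). Bands follow the text: 0-10, 11-20, ..., 71-80;
    the open-ended top band "81+" is an assumption, since the source list is cut off.
    sex: e.g. "M" or "F".
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 10:
        band = "0-10"
    elif age > 80:
        band = "81+"
    else:
        low = (age - 1) // 10 * 10 + 1  # 11, 21, ..., 71
        band = f"{low}-{low + 9}"
    return f"{sex}/{band}"


labels = [subgroup_label(a, s) for a, s in [(0, "F"), (11, "M"), (20, "M"), (80, "F"), (85, "M")]]
```

The same function must be applied to both the reference and the target data sets so that their subgroups line up.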

In the subgroup division step (S100), the reference data set may also be divided into subgroups in the same manner, based on the same characteristic index.

FIG. 5 is a detailed flowchart of the method for normalizing multicenter test data according to the present invention.

Here, the subgroup division step (S100) may include a step (S120) in which the second processing unit 200, linked to the normalization target database 20, divides the normalization target data set into a plurality of subgroups based on the characteristic index.

The subgroup division step (S100) may further include a step (S110) in which the first processing unit 100, linked to the reference database 10, divides the reference data set into a plurality of subgroups based on the characteristic index. Unless the subgroup division changes, step S110 needs to be performed only once, when the reference data set is first set up; thereafter, the reference data set already divided into subgroups can be reused. Step S110 may therefore be omitted if the reference data set has already been divided into subgroups.

Here, the normalization target data set and the reference data set are both divided into the same number of subgroups based on the same characteristic index. In the correction statistical index calculation step (S200) described below, the resulting subgroups provide the information on the distribution of sample counts per subgroup that is used to calculate the correction statistical index.
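Because both data sets must share one subgrouping, a practical pre-check is whether every non-empty target subgroup also has reference samples. If it does not, the reference subgroup mean needed for the correction is undefined. The helper below is a hypothetical sketch and is not a step of the described method. It assumes both sides report their counts under identical string labels.

```python
def subgroups_without_reference(reference_counts, target_counts):
    """List non-empty target subgroups that have no reference samples.

    Both arguments map a subgroup label to a sample count and are assumed to
    come from the same subgrouping (same characteristic indexes and cut points).
    A non-empty result means a reference subgroup mean would be undefined.
    """
    return sorted(
        label
        for label, n in target_counts.items()
        if n > 0 and reference_counts.get(label, 0) == 0
    )


gaps = subgroups_without_reference(
    {"M/71-80": 120, "F/71-80": 95},
    {"M/71-80": 14, "F/71-80": 9, "F/81+": 3},
)
```

Here `gaps` is `["F/81+"]`: the target has three samples in a subgroup that the reference set does not cover, so that subgroup would need to be merged with a neighbour or handled some other way before the correction can be computed.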

Next, the operation of the correction statistical index calculation step S200 will be described.

The correction statistical index calculation step S200 calculates a correction statistical index for normalizing the normalization target data set according to the reference data set. Here, the correction statistical index may be calculated using the statistical information of the normalization target data set and the reference data set stored in the reference database 10 serving as a reference for performing the normalization.

To this end, the correction statistical index calculation step (S200) may calculate statistical information of the normalization target data set that includes information on the distribution of sample counts across its subgroups, and may use this information to calculate the correction statistical index.

Here, the statistical information of the normalization target data set may include information on the distribution of sample counts across its subgroups. More specifically, it may include the number of data samples of the normalization target data set ($N_T$) and the number of data samples in each subgroup of the normalization target data set ($n_{T,i}$, the number of data samples in the ith subgroup). That is, the distribution of sample counts across the subgroups of the normalization target data set may be expressed by the total number of data samples and the number of data samples per subgroup.

Likewise, the correction statistical index calculation step (S200) may calculate statistical information of the reference data set, including information on the distribution of sample counts across its subgroups. Step S200 may then calculate the correction statistical index using the statistical information of the normalization target data set and of the reference data set calculated in this way.

Here, the statistical information of the reference data set calculated in the correction statistical index calculation step (S200) may include the mean of the test data values of the data samples of the reference data set ($\bar{x}_R$), the mean of the test data values of the data samples in each subgroup of the reference data set ($\bar{x}_{R,i}$, the mean for the ith subgroup), and the number of data samples in each subgroup of the reference data set ($n_{R,i}$, the number of data samples in the ith subgroup).
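Once the reference data have been divided into subgroups, the three reference statistics listed above are straightforward to compute. The helper below is a hypothetical sketch, not code from the patent. It takes a mapping from subgroup label to the list of reference test values.

```python
import statistics


def reference_statistics(reference_groups):
    """Summarize the reference data set (hypothetical helper).

    reference_groups: subgroup label -> list of reference test values.
    Returns (overall mean, mean per subgroup, count per subgroup).
    Empty subgroups get a count of 0 and no mean.
    """
    all_values = [x for values in reference_groups.values() for x in values]
    if not all_values:
        raise ValueError("reference data set is empty")
    subgroup_means = {g: statistics.fmean(v) for g, v in reference_groups.items() if v}
    subgroup_counts = {g: len(v) for g, v in reference_groups.items()}
    return statistics.fmean(all_values), subgroup_means, subgroup_counts


overall_mean, subgroup_means, subgroup_counts = reference_statistics(
    {"M/41-50": [1.0, 3.0], "F/41-50": [5.0]}
)
```

Note that the overall mean (3.0 here) is the mean of all samples, not the mean of the subgroup means (3.5), because the subgroups differ in size.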

The correction statistical index may include a corrected mean and a corrected standard deviation of the reference data set, that is, the mean and standard deviation of the reference data set after correction using the statistical information of the normalization target data set and of the reference data set. To this end, the correction statistical index calculation step (S200) may calculate the corrected mean and corrected standard deviation of the reference data set from these two sets of statistical information.

Here, the corrected mean and the corrected standard deviation can be calculated by the following Equations (1) and (2).

$$\mu'_a = \sum_{i=1}^{S} \frac{n_{b,i}}{N_b}\,\mu_{a,i} \qquad (1)$$

$$\sigma'_a = \sqrt{\sum_{i=1}^{S} \frac{n_{b,i}}{N_b} \cdot \frac{1}{n_{a,i}} \sum_{j=1}^{n_{a,i}} \left( x_{a,ij} - \mu_a \right)^2} \qquad (2)$$

Here, $\mu'_a$ and $\sigma'_a$ are the corrected mean and the corrected standard deviation of the reference data set, respectively; $x_{a,ij}$ is the inspection data value of the jth data sample in the ith subgroup of the reference data set; $\mu_a$ is the mean of the inspection data values of the data samples of the reference data set; $\mu_{a,i}$ is the mean of the inspection data values of the data samples in the ith subgroup of the reference data set; $S$ is the number of subgroups; $N_b$ is the number of data samples of the normalization target data set; $n_{a,i}$ is the number of data samples in the ith subgroup of the reference data set; and $n_{b,i}$ is the number of data samples in the ith subgroup of the normalization target data set.
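A minimal Python sketch of the two corrected statistics follows. It assumes Equation (1) is the average of the reference subgroup means weighted by the target's subgroup proportions. It also assumes Equation (2) is the square root of the target-weighted, per-subgroup mean squared deviation of the reference values from the overall reference mean. This reading is suggested by the variables defined above and by the description of FIG. 8, but it is not confirmed by the source. `corrected_reference_stats` and its dictionary inputs are hypothetical.

```python
import math
from statistics import fmean


def corrected_reference_stats(ref_groups, target_counts):
    """Corrected mean and SD of the reference set, reweighted to the target's subgroup mix.

    ref_groups:    {subgroup: [reference values in that subgroup]}
    target_counts: {subgroup: number of target samples in that subgroup}
    Only subgroup counts are needed from the target side.
    """
    n_target = sum(target_counts.values())                    # total target size
    overall_mean = fmean(x for xs in ref_groups.values() for x in xs)
    corrected_mean = 0.0
    weighted_msd = 0.0
    for key, n_target_i in target_counts.items():
        xs = ref_groups.get(key)
        if not xs:
            raise ValueError(f"reference data set has no samples in subgroup {key!r}")
        weight = n_target_i / n_target                         # target subgroup share
        corrected_mean += weight * fmean(xs)                   # Equation (1) term
        # Equation (2) term: this subgroup's mean squared deviation from the overall mean
        weighted_msd += weight * fmean((x - overall_mean) ** 2 for x in xs)
    return corrected_mean, math.sqrt(weighted_msd)


# Toy values: target is weighted 3:1 toward subgroup "A".
mean_c, sd_c = corrected_reference_stats({"A": [1.0, 3.0], "B": [5.0, 7.0]}, {"A": 3, "B": 1})
# mean_c == 3.0, sd_c == sqrt(5) ≈ 2.236
```

Here the corrected mean (3.0) sits below the plain reference mean (4.0) because the target is dominated by subgroup A. If Equation (2) is instead meant to measure spread about the corrected mean rather than the overall reference mean, subtract the squared difference between the two means inside the square root. That would give 2.0 for these values.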

The operation of the correction statistical index calculation step (S200) is described below with reference to FIG. 5, which is a detailed flowchart of the method for normalizing multi-institution inspection data according to the present invention.

The correction statistical index calculation step (S200) includes three steps:
- a step (S210) in which the first processing unit 100, interoperating with the reference database 10, calculates the statistical information of the reference data set;
- a step (S220) in which the second processing unit 200, interoperating with the normalization target database 20, calculates the statistical information of the normalization target data set; and
- a step (S230) in which the first processing unit 100 calculates the correction statistical index using the statistical information of the normalization target data set and that of the reference data set.

Either step S210 or step S220 may be performed first. Once step S210, in which the first processing unit 100 calculates the statistical information of the reference data set, has been performed, it need not be repeated unless the subgroup division changes. The previously calculated statistical information of the reference data set can be reused instead. Similarly, if the statistical information of the normalization target data set has already been calculated, step S110 may be omitted.

The correction statistical index calculation step (S200) may further include a step (S225) in which the second processing unit 200 transmits the statistical information of the normalization target data set, calculated in step S220, to the first processing unit 100. The first processing unit 100 may then calculate the correction statistical index using the statistical information of the normalization target data set and that of the reference data set (S230).

The correction statistical index calculation step (S200) may further include a step (S235) in which the first processing unit 100 transmits the calculated correction statistical index to the second processing unit 200. The second processing unit 200 may then perform normalization using the received correction statistical index (S300).

When processing is divided between the first processing unit 100 and the second processing unit 200 in this way, only the specific information generated for normalization is transmitted between them. This information is the statistical information of the normalization target data set or the correction statistical index. As a result, the local data stored in each database can be kept secure.

Next, the operation of the normalization step (S300) will be described.

In the normalization step S300, the inspection data values of the data samples included in the normalization target data set are normalized based on the reference data set using the correction statistical index. Here, the normalization operation as described above can be performed by the second processing unit 200.

More specifically, the normalization step (S300) normalizes the inspection data values of the data samples included in the normalization target data set using two pairs of statistics: the corrected mean and corrected standard deviation of the reference data set, and the mean and standard deviation of the inspection data values of the data samples of the normalization target data set.

Here, the normalization step (S300) may normalize the inspection data values of the data samples included in the normalization target data set according to the following Equation (3).

$$\hat{x}_b = \frac{x_b - \mu_b}{\sigma_b}\,\sigma'_a + \mu'_a \qquad (3)$$

Here, $\hat{x}_b$ is the normalized result; $x_b$ is an inspection data value of a data sample included in the normalization target data set; $\mu_b$ and $\sigma_b$ are the mean and standard deviation of the inspection data values of the data samples of the normalization target data set; and $\mu'_a$ and $\sigma'_a$ are the corrected mean and the corrected standard deviation of the reference data set, respectively.
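Equation (3) standardizes each target value within the target set and then rescales it onto the corrected reference scale. The following is a minimal sketch. It assumes the population standard deviation is used for the target set, since the text does not say whether the population or sample standard deviation is meant. `normalize_target` is a hypothetical name.

```python
from statistics import fmean, pstdev


def normalize_target(values, corrected_mean, corrected_sd):
    """Equation (3): standardize each target value, then rescale to the corrected reference."""
    mu_b = fmean(values)
    sigma_b = pstdev(values, mu_b)     # population SD of the target set (assumption)
    if sigma_b == 0:
        raise ValueError("target values have zero standard deviation")
    return [(x - mu_b) / sigma_b * corrected_sd + corrected_mean for x in values]


normalized = normalize_target([2.0, 4.0, 6.0], corrected_mean=10.0, corrected_sd=2.0)
```

By construction, the output has a mean equal to the corrected mean and a standard deviation equal to the corrected standard deviation. The target values are therefore expressed on the reference data set's scale, adjusted for the target's subgroup composition.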

FIG. 8 is a reference diagram showing an example of the operation of the correction statistical index calculation step (S200) according to the present invention. As shown in FIG. 8, to normalize the normalization target data set (Dataset Db), the correction statistical index calculation step (S200) can calculate the corrected mean and corrected standard deviation of the reference data set (Dataset Da). FIG. 8 shows the following quantities for each subgroup (Subgroup (i)):
- "mean": the mean of the test data values of the data samples belonging to the subgroup;
- "N": the number of data samples belonging to the subgroup; and
- the mean of the squared differences from the overall mean.

Each variable has the same meaning as in Equations (1) to (3) above. The corrected mean and corrected standard deviation of the reference data set are then calculated according to Equations (1) and (2).

[Calculation for the FIG. 8 example: image not reproduced]

FIG. 2 is a block diagram illustrating a multi-institution inspection data normalization system according to an embodiment of the present invention.

As shown in FIG. 2, the multi-institution inspection data normalization system according to the present invention normalizes the normalization target database 20 on the basis of the reference database 10. To do so, it includes a first processing unit 100 interoperating with the reference database 10 and a second processing unit 200 interoperating with the normalization target database 20. The system can operate in the same manner as the multi-institution inspection data normalization method according to the present invention described in detail with reference to FIG. 1 to FIG., so overlapping descriptions are omitted or only briefly repeated.

The first processing unit 100 interoperates with the reference database 10, which stores the reference data set that serves as the reference for normalization. The second processing unit 200 interoperates with the normalization target database 20, which stores the normalization target data set to be normalized.

Here, the first processing unit 100 calculates a correction statistical index for normalizing the normalization target data set according to the reference data set.

The second processing unit 200 divides the normalization target data set into at least two subgroups based on a predetermined characteristic index. It then normalizes the inspection data values of the data samples included in the normalization target data set on the basis of the reference data set, using the correction statistical index.

FIG. 6 is a detailed block diagram of the first processing unit 100 in the multi-institution inspection data normalization system according to the present invention, which operates in conjunction with the reference database 10. FIG. 7 is a detailed block diagram of the second processing unit 200, which operates in conjunction with the normalization target database 20.

As shown in FIG. 6, the first processor 100 may include a first subgroup dividing unit 110, a first statistical information calculating unit 120, and a correction statistical index calculating unit 130.

The first subgroup dividing unit 110 divides the reference data set into the subgroups based on the characteristic index.

The first statistical information calculating unit 120 calculates the statistical information of the reference data set, including information on the distribution of subgroup sizes of the reference data set.

The correction statistical index calculator 130 calculates the correction statistical index using the statistical information of the normalization target data set, the statistical information of the reference data set, and the inspection data values of the data samples of the reference data set.

As shown in FIG. 7, the second processing unit 200 may include a second subgroup dividing unit 210, a second statistical information calculating unit 220, and a normalization unit 230.

Here, the second subgroup dividing unit 210 divides the normalization target data set into the subgroups based on the characteristic index.

The second statistical information calculating unit 220 calculates the statistical information of the normalization target data set, including information on the distribution of subgroup sizes of the normalization target data set.

The normalization unit 230 normalizes the inspection data values of the data samples included in the normalization target data set on the basis of the reference data set using the correction statistical index.
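The division of labor between units 110 to 130 and 210 to 230 can be sketched as two classes. The same sketch covers the exchange in steps S220, S225, S230, S235 and S300. This is an illustrative sketch, not the patented implementation. The class and method names, the `(age, sex, value)` record layout, and the decade-plus-sex characteristic index are all assumptions. The correction also assumes a particular reading of Equations (1) and (2): a target-subgroup-weighted mean, and a weighted root mean squared deviation from the overall reference mean. Only subgroup counts and the two correction numbers cross between the units.

```python
import math
from collections import Counter, defaultdict
from statistics import fmean, pstdev


def subgroup_key(age, sex):
    """Hypothetical characteristic index: decade of age plus sex."""
    return (age // 10, sex)


class FirstProcessingUnit:
    """Sketch of unit 100, running beside the reference database 10."""

    def __init__(self, reference_records):
        # Unit 110: divide the reference data set into subgroups.
        self._groups = defaultdict(list)
        for age, sex, value in reference_records:
            self._groups[subgroup_key(age, sex)].append(value)
        # Unit 120: overall reference mean (subgroup means and sizes come from _groups).
        self._mean = fmean(v for vs in self._groups.values() for v in vs)

    def correction_index(self, target_counts):
        """Unit 130: turn the target's subgroup counts into (corrected mean, corrected SD)."""
        missing = set(target_counts) - set(self._groups)
        if missing:
            raise ValueError(f"no reference samples for subgroups {sorted(missing)}")
        n_b = sum(target_counts.values())
        mean_c = 0.0
        second = 0.0
        for key, n_bi in target_counts.items():
            xs = self._groups[key]
            w = n_bi / n_b                                      # target subgroup share
            mean_c += w * fmean(xs)
            second += w * fmean((x - self._mean) ** 2 for x in xs)
        return mean_c, math.sqrt(second)


class SecondProcessingUnit:
    """Sketch of unit 200, running beside the normalization target database 20."""

    def __init__(self, target_records):
        self._records = list(target_records)

    def summary(self):
        # Units 210 + 220: only subgroup sizes are shared, never raw values.
        return dict(Counter(subgroup_key(age, sex) for age, sex, _ in self._records))

    def normalize(self, index):
        # Unit 230: Equation (3), using the population SD of the target values (assumption).
        mean_c, sd_c = index
        values = [v for _, _, v in self._records]
        mu_b, sd_b = fmean(values), pstdev(values)
        return [(v - mu_b) / sd_b * sd_c + mean_c for v in values]


reference = FirstProcessingUnit([(30, "F", 1.0), (35, "F", 3.0), (60, "M", 5.0), (65, "M", 7.0)])
target = SecondProcessingUnit([(31, "F", 10.0), (32, "F", 12.0), (33, "F", 14.0), (61, "M", 20.0)])
index = reference.correction_index(target.summary())   # S220, S225, S230
normalized = target.normalize(index)                    # S235, S300
```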

A computer program for normalizing multi-institution inspection data according to another embodiment of the present invention may be stored in a medium and execute, in combination with a database, the method for normalizing multi-institution inspection data described above.

The performance and effects of the method and system for normalizing multi-institution inspection data according to the present invention are described below on the basis of actual experimental results.

The subgroup-adjusted normalization method according to the present invention was applied to blood test data obtained at two hospitals (A and B). The data covered blood urea nitrogen (BUN), serum creatinine, hematocrit, hemoglobin, serum potassium, and total bilirubin. As described below, the method was confirmed to eliminate the heterogeneity between the test data more effectively than existing methods. Performance was evaluated with the standardized difference in means (SDM) and the Kolmogorov-Smirnov value (KS).

FIG. 9 is a reference diagram showing the heterogeneity of data between data sets obtained from different medical institutions. As FIGS. 9(a) to 9(d) show, the hemoglobin test data obtained at the two hospitals A and B have different distribution characteristics. The difference appears both by age (FIGS. 9(a) and 9(c)) and by sex (FIGS. 9(b) and 9(d)).

FIGS. 10(a) and 10(b) compare two sets of results on the blood test data from the two hospitals. The first set comes from the subgroup-adjusted normalization method according to the present invention. The second comes from the conventional Z-score transformation and rank-based inverse normal transformation (INT) methods. The comparison uses the standardized difference in means (SDM) and the Kolmogorov-Smirnov value (KS).

The standardization (Z-score transformation) method is described in Cheadle C, Vawter MP, Freed WJ, et al. "Analysis of microarray data using Z score transformation." J Mol Diagn 2003;5:73-81, DOI: 10.1016/S1525-1578(10)60455-2. The rank-based inverse normal transformation (INT) method was used as described in Bolstad BM, Irizarry RA, Astrand M, et al. "A comparison of normalization methods for high density oligonucleotide array data based on variance and bias." Bioinformatics 2003;19:185-193.
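For comparison, the two baseline methods can be sketched in their common textbook forms. The text does not give the exact variants used in the experiment. Blom's offset c = 3/8 for the inverse normal transformation and average ranks for ties are assumptions. Both functions transform each data set on its own, and neither uses any information about the subgroup composition of the other data set.

```python
from statistics import NormalDist, fmean, pstdev


def z_score(values):
    """Z-score transformation within one data set: (x - mean) / SD."""
    mu, sd = fmean(values), pstdev(values)
    return [(x - mu) / sd for x in values]


def rank_inverse_normal(values, c=3 / 8):
    """Rank-based inverse normal transformation: Phi^-1((rank - c) / (n - 2c + 1)).

    Uses Blom's offset c = 3/8 by default; tied values share their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the block of values tied with values[order[i]].
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1   # 1-based ranks i+1 .. j+1, averaged
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    std_normal = NormalDist()            # mean 0, SD 1
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```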

Here, the standardized difference in means (SDM) is calculated as in Equation (5), and the Kolmogorov-Smirnov value (KS) as in Equation (6).

$$\mathrm{SDM} = \frac{\mu_1 - \mu_2}{\sqrt{\left(s_1^2 + s_2^2\right)/2}} \qquad (5)$$

Here, $\mu_m$ is the mean of the mth data set and $s_m^2$ is the variance of the mth data set.
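A minimal sketch of the SDM in the standard pooled form, where the difference in means is divided by the square root of the average of the two variances. It assumes sample variances (n - 1 denominator). The sign is kept, so take `abs()` if only the magnitude matters. The function name is hypothetical.

```python
import math
from statistics import fmean, variance


def standardized_difference_in_means(first, second):
    """SDM: difference in means divided by the pooled SD sqrt((s1^2 + s2^2) / 2)."""
    pooled_sd = math.sqrt((variance(first) + variance(second)) / 2)  # sample variances
    return (fmean(first) - fmean(second)) / pooled_sd
```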

$$\mathrm{KS} = \sup_x \left| F_1(x) - F_2(x) \right| \qquad (6)$$

Here, sup denotes the supremum of the set of distances, and $F_1$ and $F_2$ are the empirical distribution functions of the first data set and the second data set, respectively.
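Both empirical distribution functions are step functions that change only at observed values. The supremum can therefore be found by evaluating the two functions at every observed value. The sketch below does exactly that. The function name is hypothetical. For a cross-check, SciPy's `scipy.stats.ks_2samp` reports the same two-sample statistic.

```python
from bisect import bisect_right


def ks_statistic(first, second):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F1(x) - F2(x)|."""
    a, b = sorted(first), sorted(second)
    n, m = len(a), len(b)
    d = 0.0
    # Both ECDFs are constant between observed values, so checking each observation suffices.
    for x in a + b:
        f1 = bisect_right(a, x) / n   # fraction of the first data set <= x
        f2 = bisect_right(b, x) / m   # fraction of the second data set <= x
        d = max(d, abs(f1 - f2))
    return d
```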

As FIG. 10 shows, the subgroup-adjusted normalization (SAN) method according to the present invention normalized the data better than the conventional methods, namely the Z-score transformation and the rank-based inverse normal transformation (INT). In FIG. 10, RAW denotes the raw data.

As described above, clinical test values measured at different medical institutions are heterogeneous across the resulting databases. The method and apparatus for normalizing multi-institution inspection data according to the present invention can eliminate this heterogeneity, so that the databases can be efficiently integrated and analyzed as one.

In particular, the method for normalizing multi-institution inspection data according to the present invention makes it possible to normalize, integrate, and analyze medical data acquired from different data sources in a distributed research network environment.

All elements of the embodiments described above are described as being combined into one or operating in combination. Even so, the present invention is not necessarily limited to these embodiments. That is, within the scope of the present invention, one or more of the components may be selectively combined and operated.

In addition, each component may be implemented as independent hardware. Alternatively, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. Such a computer program may be stored in a computer-readable medium such as a USB memory, a CD, or a flash memory, and read and executed by a computer to implement an embodiment of the present invention. Recording media for the computer program may include magnetic recording media, optical recording media, and the like.

Furthermore, unless otherwise defined in the detailed description, all terms, including technical and scientific terms, have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs. Commonly used terms, such as those defined in dictionaries, should be interpreted consistently with their contextual meanings in the related art. They are not to be construed in an idealized or overly formal sense unless expressly so defined.

It will be apparent to those skilled in the art that various modifications, changes, and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims. Therefore, the embodiments disclosed in the present invention and the accompanying drawings are intended to illustrate, not to limit, the technical spirit of the present invention. The scope of the technical spirit of the present invention is not limited by these embodiments and drawings. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within the scope of equivalents should be construed as falling within the scope of the present invention.

10: Reference database
20: Normalization target database
100: first processing section
200: second processing section
S50: Database selection step
S100: Subgrouping step
S200: Correction statistical index calculation step
S300: Normalization step

Claims (20)

1. A method for normalizing multi-institution inspection data, the method comprising:
a subgroup dividing step of dividing a normalization target data set, stored in a normalization target database to be normalized, into at least two subgroups based on a predetermined characteristic index;
a correction statistical index calculation step of calculating a correction statistical index for normalizing the normalization target data set according to a reference data set, using statistical information of the normalization target data set and of the reference data set, the reference data set being stored in a reference database that serves as the reference for normalization; and
a normalization step of normalizing the inspection data values of the data samples included in the normalization target data set on the basis of the reference data set, using the correction statistical index.
2. The method according to claim 1,
wherein the reference database and the normalization target database are connected to each other through a network.
3. The method according to claim 1,
wherein the characteristic index includes a demographic index or a clinical index.
4. The method according to claim 1,
wherein the reference database and the normalization target database are medical record databases storing medical records, and
wherein the inspection data values of the data samples included in the reference data set and the normalization target data set are medical data values measured for patients or subjects.
5. The method of claim 2,
wherein the reference database and the normalization target database are databases that operate in conjunction with a plurality of electronic medical record systems interconnected via the network.
6. The method of claim 2,
further comprising a database selection step of selecting the reference database or the normalization target database from among at least two databases connected through the network.
7. The method according to claim 1,
wherein the subgroup dividing step comprises a step in which a second processing unit interoperating with the normalization target database divides the normalization target data set into a plurality of subgroups based on the characteristic index.
8. The method of claim 7,
wherein the subgroup dividing step further comprises a step in which a first processing unit interoperating with the reference database divides the reference data set into a plurality of the subgroups based on the characteristic index.
9. The method according to claim 1,
wherein the correction statistical index calculation step calculates the correction statistical index using information on the distribution of subgroup sizes of the normalization target data set.
10. The method of claim 9,
wherein the correction statistical index includes a corrected mean and a corrected standard deviation of the reference data set, calculated by correcting the reference data set using information on the distribution of subgroup sizes of the normalization target data set.
11. The method of claim 10,
wherein the correction statistical index calculation step comprises:
calculating statistical information of the normalization target data set, including information on the distribution of subgroup sizes of the normalization target data set; and
calculating the correction statistical index using the statistical information of the normalization target data set calculated as described above and statistical information of the reference data set, including information on the distribution of subgroup sizes of the reference data set.
12. The method of claim 11,
wherein the correction statistical index calculation step comprises:
a step in which a second processing unit interoperating with the normalization target database calculates the statistical information of the normalization target data set;
a step in which a first processing unit interoperating with the reference database calculates the statistical information of the reference data set; and
a step in which the first processing unit calculates the correction statistical index using the statistical information of the normalization target data set and the statistical information of the reference data set.
13. The method of claim 11,
wherein the statistical information of the reference data set includes the mean of the inspection data values of the data samples of the reference data set, the mean of the inspection data values of the data samples of each subgroup of the reference data set, and the number of data samples per subgroup of the reference data set, and
wherein the statistical information of the normalization target data set includes the number of data samples of the normalization target data set and the number of data samples per subgroup of the normalization target data set.
14. The method of claim 13,
wherein the corrected mean and the corrected standard deviation are calculated by the following Formulas 1 and 2:
[Formula 1]
$$\mu'_a = \sum_{i=1}^{S} \frac{n_{b,i}}{N_b}\,\mu_{a,i}$$

[Formula 2]
$$\sigma'_a = \sqrt{\sum_{i=1}^{S} \frac{n_{b,i}}{N_b} \cdot \frac{1}{n_{a,i}} \sum_{j=1}^{n_{a,i}} \left( x_{a,ij} - \mu_a \right)^2}$$

where $\mu'_a$ and $\sigma'_a$ are the corrected mean and the corrected standard deviation of the reference data set, respectively; $x_{a,ij}$ is the inspection data value of the jth data sample in the ith subgroup of the reference data set; $\mu_a$ is the mean of the inspection data values of the data samples of the reference data set; $\mu_{a,i}$ is the mean of the inspection data values of the data samples in the ith subgroup of the reference data set; $S$ is the number of subgroups; $N_b$ is the number of data samples of the normalization target data set; $n_{a,i}$ is the number of data samples in the ith subgroup of the reference data set; and $n_{b,i}$ is the number of data samples in the ith subgroup of the normalization target data set.
15. The method of claim 10, wherein the normalization step normalizes the inspection data values of the data samples included in the normalization target data set using the corrected mean and corrected standard deviation of the reference data set and the mean and standard deviation of the inspection data values of the data samples of the normalization target data set.
16. The method of claim 15,
wherein the normalization step normalizes the inspection data values of the data samples included in the normalization target data set according to Formula 3:
[Formula 3]
$$\hat{x}_b = \frac{x_b - \mu_b}{\sigma_b}\,\sigma'_a + \mu'_a$$

where $\hat{x}_b$ is the normalized result; $x_b$ is an inspection data value of a data sample included in the normalization target data set; $\mu_b$ and $\sigma_b$ are the mean and standard deviation of the inspection data values of the data samples of the normalization target data set; and $\mu'_a$ and $\sigma'_a$ are the corrected mean and the corrected standard deviation of the reference data set, respectively.
17. A computer program stored in a medium for executing, in combination with a database, the method for normalizing multi-institution inspection data according to any one of claims 1 to 16.
18. A multi-institution inspection data normalization system, comprising:
a first processing unit interoperating with a reference database storing a reference data set that serves as the reference for normalization; and
a second processing unit interoperating with a normalization target database storing a normalization target data set to be normalized,
wherein the first processing unit calculates a correction statistical index for normalizing the normalization target data set according to the reference data set, and
wherein the second processing unit divides the normalization target data set into at least two subgroups based on a predetermined characteristic index and normalizes the inspection data values of the data samples included in the normalization target data set on the basis of the reference data set using the correction statistical index.
19. The system according to claim 18, wherein the second processing unit comprises:
a second subgroup dividing unit for dividing the normalization target data set into the subgroups based on the characteristic index;
a second statistical information calculating unit for calculating statistical information of the normalization target data set, including information on the distribution of subgroup sizes of the normalization target data set; and
a normalization unit for normalizing the inspection data values of the data samples included in the normalization target data set on the basis of the reference data set using the correction statistical index.
20. The system according to claim 19, wherein the first processing unit comprises:
a first subgroup dividing unit for dividing the reference data set into the subgroups based on the characteristic index;
a first statistical information calculating unit for calculating statistical information of the reference data set, including information on the distribution of subgroup sizes of the reference data set; and
a correction statistical index calculating unit for calculating the correction statistical index using the statistical information of the normalization target data set, the statistical information of the reference data set, and the inspection data values of the data samples of the reference data set.
KR1020150113953A 2015-08-12 2015-08-12 Method of Normalization for Combination of Clinical Data from Different Electronic Healthcare Databases and System thereof KR101799823B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150113953A KR101799823B1 (en) 2015-08-12 2015-08-12 Method of Normalization for Combination of Clinical Data from Different Electronic Healthcare Databases and System thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150113953A KR101799823B1 (en) 2015-08-12 2015-08-12 Method of Normalization for Combination of Clinical Data from Different Electronic Healthcare Databases and System thereof

Publications (2)

Publication Number Publication Date
KR20170019739A KR20170019739A (en) 2017-02-22
KR101799823B1 true KR101799823B1 (en) 2017-11-21

Family

ID=58315160

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150113953A KR101799823B1 (en) 2015-08-12 2015-08-12 Method of Normalization for Combination of Clinical Data from Different Electronic Healthcare Databases and System thereof

Country Status (1)

Country Link
KR (1) KR101799823B1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200036296A (en) 2018-09-28 2020-04-07 주식회사 어큐진 Common data convert system for genome information
KR20200036298A (en) 2018-09-28 2020-04-07 주식회사 어큐진 Common data convert method of genome information
KR20220102166A (en) 2021-01-11 2022-07-20 연세대학교 산학협력단 A method for estimating a centralized model based on horizontal division without physical data sharing based on weighted integration

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102226899B1 (en) * 2018-11-16 2021-03-11 주식회사 딥바이오 Method and system for consensus diagnosis system based on supervised learning
KR102571593B1 (en) * 2021-04-07 2023-08-28 주식회사 에비드넷 A method of constructing an interest pattern candidate database using medical data between medical institutions, and its devicee

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004518187A (en) 2000-10-11 2004-06-17 ヘルストリオ,インコーポレイテッド Health management data communication system

Also Published As

Publication number Publication date
KR20170019739A (en) 2017-02-22

Similar Documents

Publication Publication Date Title
KR101799823B1 (en) Method of Normalization for Combination of Clinical Data from Different Electronic Healthcare Databases and System thereof
US11037070B2 (en) Diagnostic test planning using machine learning techniques
JP5785184B2 (en) Diagnostic techniques for continuous storage and integrated analysis of both medical and non-image medical data
EP2959414B1 (en) Methods for indirect determination of reference intervals
EP1399868A2 (en) Information processing method for disease stratification and assessment of disease progressing
Reynolds et al. Association of time-varying blood pressure with chronic kidney disease progression in children
CN101061483A (en) In-situ data collection architecture for computer-aided diagnosis
CN111048210A (en) Method and device for evaluating disease risk based on fundus image
CN114023441A (en) Severe AKI early risk assessment model and device based on interpretable machine learning model and development method thereof
Jiang et al. Longitudinal analysis of change in mammographic density in each breast and its association with breast cancer risk
Haakma et al. Belief elicitation to populate health economic models of medical diagnostic devices in development
Man et al. Improving non-invasive hemoglobin measurement accuracy using nonparametric models
JP5799377B2 (en) Abnormal frequency estimation device, abnormal frequency estimation program, abnormal frequency estimation method, and abnormal frequency estimation system
CN112967803A (en) Early mortality prediction method and system for emergency patients based on integrated model
CN115691735B (en) Multi-mode data management method and system based on slow-resistance pulmonary specialty data
JP7124265B2 (en) Biomarker detection method, disease determination method, biomarker detection device, and biomarker detection program
JP5802614B2 (en) CLINICAL INFORMATION DISPLAY DEVICE, CLINICAL INFORMATION DISPLAY DEVICE OPERATION METHOD, AND CLINICAL INFORMATION DISPLAY PROGRAM
EP4099335A1 (en) System and method for estimation of delivery date of pregnant subject using microbiome data
Eadie et al. Recommendations for research design and reporting in computer-assisted diagnosis to facilitate meta-analysis
Bahar et al. Model Structure of Fetal Health Status Prediction
Kalra et al. Online variational learning of finite inverted Beta‐Liouville mixture model for biomedical analysis
US20040030672A1 (en) Dynamic health metric reporting method and system
Kim Development and Validation of Traumatic Brain Injury Outcome Prognosis Model and Identification of Novel Quantitative Data-Driven Endotypes
CN110603592B (en) Biomarker detection method, disease judgment method, biomarker detection device, and biomarker detection program
JP2014002498A (en) Clinical information display device, operating method for the same, and clinical information display program

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal