KR101799823B1

KR101799823B1 - Method of Normalization for Combination of Clinical Data from Different Electronic Healthcare Databases and System thereof

Info

Publication number: KR101799823B1
Application number: KR1020150113953A
Authority: KR
Inventors: 박래웅 (Rae Woong Park); 윤덕용 (Dukyong Yoon)
Original assignee: 아주대학교산학협력단 (Ajou University Industry-Academic Cooperation Foundation)
Priority date: 2015-08-12
Filing date: 2015-08-12
Publication date: 2017-11-21
Also published as: KR20170019739A

Abstract

The present invention relates to a method and apparatus for normalizing and analyzing medical data collected from a plurality of medical institutions.
A method for normalizing multicenter test data according to the present invention includes: a subgroup dividing step of dividing a normalization target data set, stored in a normalization target database, into at least two subgroups based on a predetermined characteristic index; a correction statistical index calculating step of calculating a correction statistical index for normalizing the normalization target data set according to a reference data set, using statistical information of the normalization target data set and of the reference data set stored in a reference database that serves as the reference for normalization; and a normalizing step of normalizing the test data values of the data samples included in the normalization target data set, based on the reference data set, using the correction statistical index.
According to the method and apparatus of the present invention, the heterogeneity in clinical test values that exists between the databases of different medical institutions, arising because the values are measured at different institutions, can be removed, so that the data can be combined and analyzed as a single integrated data set.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method and system for normalizing and analyzing medical data collected from a plurality of medical institutions.

A distributed research network, which gives partner institutions mutual access to the test data that each institution acquires and manages, has the advantage that a larger body of test data can be obtained by normalizing the test data acquired by multiple partner institutions and sharing the analysis results. For example, medical data that several different hospitals obtain from their own patients can be shared among partner institutions through a distributed research network. However, examination or experimental results obtained at different institutions, for example at different medical institutions, are measured with different equipment on different groups of patients or subjects. They are therefore clinically or demographically heterogeneous, which makes it difficult to analyze them as a single integrated data set.

However, existing normalization methods do not give satisfactory results when test data acquired from multiple centers are analyzed as a single data set. For example, when such multicenter test data are analyzed, the existing rank-based method (Prior Art Document 001) … The conventional Z-score transformation method (Prior Art Document 002) merely matches the mean and variance of the different data groups and therefore cannot correct the heterogeneity between the data.

In other words, existing methods for normalizing or jointly analyzing test data are limited in their ability to correctly integrate and analyze records such as clinical test data and medical record data that are acquired and managed by various institutions.

Prior Art Document 001: Beasley TM, Erickson S, Allison DB. Rank-based inverse normal transformations are increasingly used, but are they merited? Behav Genet 2009;39:580-595. DOI: 10.1007/s10519-009-9281-0

Prior Art Document 002: Cheadle C, Vawter MP, Freed WJ, et al. Analysis of microarray data using Z score transformation. J Mol Diagn 2003;5:73-81. DOI: 10.1016/S1525-1578(10)60455-2

The present invention provides a method, and a system therefor, for normalizing and analyzing the test data contained in databases that store medical data sets acquired from patients and stored and managed by a plurality of medical institutions.

An object of the present invention is to provide a method and system that overcome the clinicopathological or demographic structural heterogeneity in test data obtained from different patients or subjects at different medical institutions, so that the test data sets can be normalized and integrated.

It is another object of the present invention to provide a method and system for analyzing a plurality of sets of medical record data while maintaining the confidentiality of patient information included in the medical record.

According to one aspect of the present invention, there is provided a method for normalizing multicenter test data, the method comprising: a subgroup dividing step of dividing a normalization target data set, which is stored in a normalization target database and is to be normalized, into at least two subgroups based on a predetermined characteristic index; a correction statistical index calculating step of calculating a correction statistical index for normalizing the normalization target data set according to a reference data set, using statistical information of the normalization target data set and of the reference data set stored in a reference database that serves as the reference for normalization; and a normalizing step of normalizing the test data values of the data samples included in the normalization target data set, based on the reference data set, using the correction statistical index.

Here, the reference database and the normalization target database may be connected to each other through a network.

Here, the characteristic index may include a demographic index or a clinical index.

The reference database and the normalization target database may be medical record databases that store medical records, and the test data values of the data samples included in the reference data set and the normalization target data set may be medical data values measured for patients or subjects.

Here, the reference database and the normalization target database may be databases that operate in conjunction with electronic medical record systems in a distributed research network in which a plurality of electronic medical record systems are connected to each other through a network.

Here, the method may further include a database selection step of selecting the reference database or the normalization target database from among at least two databases connected through the network.

Here, the subgroup dividing step may include a step in which a second processing unit, linked with the normalization target database, divides the normalization target data set into a plurality of subgroups based on the characteristic index.

The subgroup dividing step may further include a step in which a first processing unit, linked with the reference database, divides the reference data set into a plurality of subgroups based on the characteristic index.

Here, the correction statistical index calculating step may calculate the correction statistical index using information on the distribution of the number of data samples across the subgroups of the normalization target data set.

The correction statistical index may include a corrected mean and a corrected standard deviation of the reference data set, calculated by correcting the reference data set using the information on the distribution of the number of data samples across the subgroups of the normalization target data set.

The correction statistical index calculating step may include calculating the statistical information of the normalization target data set, including information on the distribution of the number of data samples across its subgroups, and calculating the correction statistical index using that information together with the statistical information of the reference data set, which includes information on the distribution of the number of data samples across the subgroups of the reference data set.

Here, the correction statistical index calculating step may include: a step in which the second processing unit, linked with the normalization target database, calculates the statistical information of the normalization target data set; a step in which the first processing unit, linked with the reference database, calculates the statistical information of the reference data set; and a step in which the first processing unit calculates the correction statistical index using the statistical information of the normalization target data set and the statistical information of the reference data set.

The statistical information of the reference data set may include the mean of the test data values of the data samples of the reference data set, the mean of the test data values of the data samples of each subgroup of the reference data set, and the number of data samples in each subgroup of the reference data set. The statistical information of the normalization target data set may include the number of data samples of the normalization target data set and the number of data samples in each subgroup of the normalization target data set.

The normalizing step may normalize the test data value of each data sample included in the normalization target data set using the corrected mean and the corrected standard deviation of the reference data set together with the mean and standard deviation of the test data values of the data samples of the normalization target data set.
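The combination of target mean/standard deviation and corrected reference mean/standard deviation suggests a Z-score mapping: standardize each target value against its own data set, then rescale it to the corrected reference statistics. A minimal sketch under that assumption (the exact expression is not given in this section, and the function and argument names are hypothetical):

```python
from statistics import mean, pstdev


def normalize_to_reference(values, corrected_mean, corrected_sd):
    """Map target-site test values onto the reference scale.

    Assumed Z-score form: standardize with the target data set's own mean
    and (population) standard deviation, then rescale with the corrected
    mean and corrected standard deviation received from the reference side.
    """
    mu_t = mean(values)
    sd_t = pstdev(values)  # population SD; the source does not say which SD is used
    if sd_t == 0:
        raise ValueError("target values have zero spread; cannot standardize")
    return [(x - mu_t) / sd_t * corrected_sd + corrected_mean for x in values]
```

After this mapping the normalized values have exactly the corrected mean and corrected standard deviation, while the rank order of the target values is preserved.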

According to another aspect of the present invention, a computer program may be stored in a medium and, in combination with a database, execute the method of normalizing multicenter test data.

According to another aspect of the present invention, there is provided a multicenter test data normalization system comprising: a first processing unit linked with a reference database that stores a reference data set serving as the reference for normalization; and a second processing unit linked with a normalization target database that stores a normalization target data set to be normalized. The first processing unit calculates a correction statistical index for normalizing the normalization target data set according to the reference data set. The second processing unit divides the normalization target data set into at least two subgroups based on a predetermined characteristic index and normalizes the test data values of the data samples included in the normalization target data set, based on the reference data set, using the correction statistical index.

Here, the second processing unit may comprise: a second subgroup dividing unit that divides the normalization target data set into the subgroups based on the characteristic index; a second statistical information calculating unit that calculates statistical information of the normalization target data set, including information on the distribution of the number of data samples across its subgroups; and a normalization unit that normalizes the test data values of the data samples included in the normalization target data set, based on the reference data set, using the correction statistical index.

The first processing unit may comprise: a first subgroup dividing unit that divides the reference data set into the subgroups based on the characteristic index; a first statistical information calculating unit that calculates statistical information of the reference data set, including information on the distribution of the number of data samples across its subgroups; and a correction statistical index calculation unit that calculates the correction statistical index using the statistical information of the normalization target data set, the statistical information of the reference data set, and the test data values of the data samples of the reference data set.

According to the method and apparatus of the present invention, the heterogeneity in clinical test values that exists between the databases of different medical institutions, arising because the values are measured at different institutions, can be removed, so that the data can be combined and analyzed as a single integrated data set.

In particular, the method of normalizing multicenter test data according to the present invention makes it possible to normalize, integrate, and analyze medical data acquired from different data sources in a distributed research network environment.

FIG. 1 is a reference view showing a network system in which a method of normalizing multicenter test data according to an embodiment of the present invention operates.
FIG. 2 is a block diagram illustrating a system in which a method for normalizing multicenter test data according to an embodiment of the present invention operates.
FIG. 3 is a flowchart of a method for normalizing multicenter test data according to an embodiment of the present invention.
FIG. 4 is a flowchart of a method for normalizing multicenter test data according to another embodiment of the present invention.
FIG. 5 is a detailed flowchart of the method for normalizing multicenter test data according to the present invention.
FIG. 6 is a detailed block diagram of the first processing unit, which operates in conjunction with the reference database, in the multicenter test data normalization system according to the present invention.
FIG. 7 is a detailed block diagram of the second processing unit, which operates in conjunction with the normalization target database, in the multicenter test data normalization system according to the present invention.
FIG. 8 is a reference diagram showing an example of the operation of the correction statistical index calculation step according to the present invention.
FIG. 9 is a reference diagram showing the heterogeneity between data sets obtained from different medical institutions.
FIG. 10 is a reference diagram comparing the results of the subgroup-adjusted normalization method according to the present invention with the results of conventional methods.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals are used to designate the same or similar components throughout the drawings. In the following description of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present invention rather unclear. In addition, the preferred embodiments of the present invention will be described below, but it is needless to say that the technical idea of the present invention is not limited thereto and can be variously modified by those skilled in the art.

First, the background from which the method and system for normalizing multicenter test data according to the present invention were devised will be described in more detail.

Electronic health record (EHR) and electronic medical record (EMR) systems, which acquire and manage medical records electronically, are widely used, and research is under way on networking electronic medical records among partner institutions so that the data can be integrated and analyzed. A distributed research network, which allows mutual access to the medical data that each institution acquires and manages, has the advantage that larger test data sets can be collected by jointly analyzing the test data acquired by a plurality of partner institutions. The partner institutions are preferably connected to each other through the network. For example, medical data that several different hospitals obtain from their own patients can be shared among partner institutions through a distributed research network. In addition, the analysis data integrated through a distributed research network can be used effectively for retrospective studies.

However, there are the following problems or limitations in integrating and analyzing electronic medical record systems of different medical institutions.

First, the test or experimental results obtained at different medical institutions are measured with different equipment on different groups of patients or subjects. The test data obtained at each of these institutions are therefore clinically or demographically heterogeneous and difficult to integrate into one data set.

However, existing normalization methods do not give satisfactory results when test data acquired from multiple centers are analyzed as a single data set. For example, when such multicenter test data are analyzed, the existing rank-based method (Prior Art Document 001) … The conventional Z-score transformation method (Prior Art Document 002) merely matches the mean and variance of the different data groups, and therefore cannot correct the heterogeneity between them.

In other words, existing methods for normalizing or jointly analyzing test data cannot correctly integrate and analyze record data such as clinical test data or medical record data, acquired and managed by various institutions, while preserving their clinical meaning.

Second, because of the nature of medical records, patient information must be kept confidential. Medical records containing patients' personal information therefore cannot simply be exchanged and analyzed among medical institutions.

The present invention provides a method, and a system therefor, for normalizing medical data acquired at various medical institutions in a distributed research network so as to resolve the heterogeneity between them while ensuring the confidentiality of the medical records as described above. The method proposed by the present invention is called subgroup-adjusted normalization (SAN).

Hereinafter, the method of normalizing the multicenter inspection data according to the present invention and the operation of the system will be described in detail.

FIG. 1 is a reference view showing a network system in which a method of normalizing multicenter test data according to an embodiment of the present invention operates.

A plurality of institutions may each use their own database to store and manage the test data they acquire. In the method for normalizing multicenter test data according to the present invention, a reference database 10, which stores a reference data set serving as the reference for normalization, may be selected from among these databases. Preferably, the database with the largest population is selected as the reference database.

Next, in the method for normalizing multicenter test data according to the present invention, the values of the data samples in the data sets stored in the remaining databases can be normalized on the basis of the selected reference database 10. To this end, one of the remaining databases may be selected as the normalization target database 20, and the values of the data samples in the selected normalization target database 20 may be normalized based on the reference database 10. The same normalization may then be performed for the remaining databases 30 and 40.
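Selecting the reference and then visiting each remaining database is simple bookkeeping. A sketch, assuming each database reports only its record count; the database names and the alphabetical tie-break are hypothetical choices made for illustration:

```python
def split_reference_and_targets(db_sizes):
    """Pick the largest database as the reference; the rest become targets.

    db_sizes: mapping of database name -> number of records (hypothetical input).
    Ties are broken alphabetically so the choice is deterministic.
    """
    if len(db_sizes) < 2:
        raise ValueError("need at least two databases")
    names = sorted(db_sizes)
    # max() returns the first maximal element, i.e. the alphabetically first on ties
    reference = max(names, key=lambda name: db_sizes[name])
    targets = [name for name in names if name != reference]
    return reference, targets
```

Each name in `targets` would then be normalized against `reference` in turn, as with databases 20, 30, and 40 above.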

FIG. 2 is a block diagram illustrating a system in which a method for normalizing multicenter test data according to an embodiment of the present invention operates.

As shown in FIG. 1, to normalize the normalization target database 20 on the basis of the selected reference database 10, the system for normalizing multicenter test data according to the present invention includes a first processing unit 100 linked with the reference database 10 and a second processing unit 200 linked with the normalization target database 20. The first processing unit 100 and the second processing unit 200 may be implemented as a single processing unit if needed. However, to maintain the confidentiality of the test data stored in each database, it is preferable that different databases be linked with different processing units. When the processing units are separated in this way, the local data stored in each database need not be transmitted between the processing units; only the specific information generated for normalization, such as the statistical information of the data set, is exchanged, so the security of the local data can be maintained.

FIG. 3 is a flowchart of a method for normalizing multicenter test data according to an embodiment of the present invention.

The method for normalizing multicenter test data according to the present invention may include a partial grouping step S100, a correction statistical index calculating step S200, and a normalizing step S300. If necessary, the method may further include a database selection step S50. FIG. 4 is a flowchart of the method for normalizing multicenter test data in an embodiment that further includes the database selection step S50.

In the partial grouping step S100, the normalization target data set stored in the normalization target database 20 is divided into at least two subgroups based on a predetermined characteristic index.

In the correction statistical index calculation step S200, a correction statistical index for normalizing the normalization target data set according to the reference data set is calculated using the statistical information of the normalization target data set and of the reference data set stored in the reference database 10, which serves as the reference for normalization.

In the normalization step S300, the inspection data values of the data samples included in the normalization target data set are normalized based on the reference data set using the correction statistical index.

The database selection step S50, which may be further included as described above, selects the reference database 10 or the normalization target database 20 from among at least two databases connected to the network.

Here, the reference database 10 and the normalization target database 20 may be connected to each other through a network. The reference database 10 and the normalization target database 20 may also be medical record databases that store medical records, in which case the test data values of the data samples included in the reference data set and the normalization target data set are medical data values measured for patients or subjects.

The data samples contained in a data set have test data values, which can be any of various types of data values obtained by examining a particular object. For example, a test data value may be a medical data value measured for a patient or a subject. Such medical data values may include various types of data that a hospital, medical institution, or laboratory obtains by examining a patient or subject for medical purposes or for experiments related to medical practice. The medical data may also include any clinical test data. For example, the medical data may be body weight, height, the size of a particular body part, or data on a particular component present in body fluids or tissue, including blood, obtained from the patient or subject, and may include any other test data obtainable from patients or subjects.

The reference database 10 and the normalization target database 20 may be databases that operate in conjunction with electronic medical record systems in a distributed research network to which a plurality of electronic medical record systems are connected. For example, each medical institution may have a database operating in conjunction with its electronic medical record system, and the databases held by the medical institutions may be interconnected to form part of a distributed research network. In the method of normalizing multicenter test data according to the present invention, the reference database 10 and the normalization target database 20 are selected from among the databases in the distributed research network as described above, and the normalization target data set stored in the normalization target database 20 can be normalized according to the reference data set stored in the reference database 10.

Hereinafter, the operation of the partial grouping step S100 will be described first.

The characteristic index may be a demographic variable or a clinical variable. For example, it may be a demographic index such as sex, race, place of residence, or place of birth, or a clinical index such as one indicating the severity of a specific disease. For convenience of explanation, the characteristic indexes are described below using age and sex as examples; however, the characteristic index is not limited to these examples and may include any demographic or clinical index.

For example, in the partial grouping step S100, the normalization target data set can be divided into a plurality of subgroups based on age and sex. Age may be divided into bands such as 0 to 10, 11 to 20, 21 to 30, 31 to 40, 41 to 50, 51 to 60, 61 to 70, and 71 to 80, while the data are simultaneously divided by sex (male and female).
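A minimal sketch of subgroup assignment by ten-year age band and sex. It assumes integer ages, sex coded as "M"/"F", and a single open-ended band for ages above 80 (the list of bands above stops at 71 to 80); all names are hypothetical:

```python
from collections import defaultdict


def subgroup_key(age, sex):
    """Return an (age band, sex) key: 0-10, 11-20, ..., 71-80, then 81+ (assumed)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if sex not in ("M", "F"):
        raise ValueError("sex must be 'M' or 'F'")
    if age > 80:
        band = "81+"
    elif age <= 10:
        band = "0-10"
    else:
        upper = ((age - 1) // 10 + 1) * 10  # 11..20 -> 20, 21..30 -> 30, ...
        band = f"{upper - 9}-{upper}"
    return band, sex


def split_into_subgroups(records):
    """Group test values by subgroup; records are (age, sex, value) tuples."""
    groups = defaultdict(list)
    for age, sex, value in records:
        groups[subgroup_key(age, sex)].append(value)
    return dict(groups)
```

Running the same function at both sites keeps the normalization target data set and the reference data set divided into identical subgroups, which the correction step relies on.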

In the partial grouping step S100, the reference data set may also be divided into subgroups in the same manner, based on the same characteristic index.

FIG. 5 is a detailed flowchart of the method for normalizing multicenter test data according to the present invention.

Here, the partial grouping step S100 may include a step S120 in which the second processing unit 200, linked with the normalization target database 20, divides the normalization target data set into a plurality of subgroups based on the characteristic index.

The partial grouping step S100 may further include a step S110 in which the first processing unit 100, linked with the reference database 10, divides the reference data set into a plurality of subgroups based on the characteristic index. Unless the subgroup division changes, step S110 needs to be performed only once, when the reference data set is initially set up; thereafter, the reference data set already divided into subgroups can be reused. Therefore, if the reference data set has already been divided into subgroups, step S110 may be omitted.

Here, the normalization target data set and the reference data set are both divided into the same subgroups based on the same characteristic index. In the correction statistical index calculation step S200 described below, the correction statistical index is calculated using information on the distribution of the number of data samples in each of these subgroups.

Next, the operation of the correction statistical index calculation step S200 will be described.

The correction statistical index calculation step S200 calculates a correction statistical index for normalizing the normalization target data set according to the reference data set. Here, the correction statistical index may be calculated using the statistical information of the normalization target data set and the reference data set stored in the reference database 10 serving as a reference for performing the normalization.

To this end, the correction statistical index calculation step S200 may calculate the statistical information of the normalization target data set, including information on the distribution of the number of data samples across its subgroups, and may calculate the correction statistical index using that information.

Here, the statistical information of the normalization target data set may include information on the distribution of the number of data samples across its subgroups. More specifically, it includes the number of data samples of the normalization target data set ($N_T$) and the number of data samples in each subgroup of the normalization target data set ($n_{T,i}$, the number of data samples in the i-th subgroup). That is, the distribution of data samples across the subgroups of the normalization target data set can be expressed by the total number of data samples and the number of data samples per subgroup.
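The target-side statistics described here are only counts: $N_T$ and $n_{T,i}$. A minimal sketch of what the target side might compute and send, assuming each sample has already been tagged with a subgroup key (the key format and the dictionary layout are hypothetical):

```python
from collections import Counter


def target_summary(subgroup_keys):
    """Statistics the normalization target site shares (hypothetical format).

    subgroup_keys: one subgroup key per data sample of the target data set.
    Returns N_T (total samples) and n_{T,i} (samples per subgroup); no test
    values or other patient-level fields are included.
    """
    counts = Counter(subgroup_keys)
    return {"N": sum(counts.values()), "n": dict(counts)}
```

Because the summary contains only aggregate counts, it is the kind of information that can leave the target site without exposing individual records.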

The correction statistical index calculation step S200 may also calculate the statistical information of the reference data set, including information on the distribution of the number of data samples across its subgroups, and may then calculate the correction statistical index using the statistical information of the normalization target data set and of the reference data set calculated in this way.

Here, the statistical information of the reference data set calculated in the correction statistical index calculating step S200 may include the mean of the test data values of all data samples of the reference data set ($\mu_R$), the mean of the test data values of the data samples in each subgroup of the reference data set ($\mu_{R,i}$, the mean for the i-th subgroup), and the number of data samples in each subgroup of the reference data set ($n_{R,i}$, the number of data samples in the i-th subgroup).
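The reference-side quantities (overall mean $\mu_R$, subgroup means $\mu_{R,i}$, and subgroup sizes $n_{R,i}$) can all be computed locally at the reference site. A minimal sketch, assuming the reference data set is already split into non-empty subgroups (the input layout and dictionary keys are hypothetical):

```python
from statistics import fmean


def reference_summary(groups):
    """Statistics of the reference data set (hypothetical format).

    groups: subgroup key -> list of reference test values (every list non-empty).
    Returns mu_R (overall mean), mu_{R,i} (per-subgroup means), n_{R,i} (sizes).
    """
    all_values = [v for values in groups.values() for v in values]
    return {
        "mean": fmean(all_values),
        "subgroup_mean": {k: fmean(v) for k, v in groups.items()},
        "n": {k: len(v) for k, v in groups.items()},
    }
```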

The correction statistical index may include a corrected mean and a corrected standard deviation of the reference data set, which are the mean and standard deviation of the reference data set corrected using the statistical information of the normalization target data set and the statistical information of the reference data set. To this end, the correction statistical index calculation step S200 may calculate the corrected mean and the corrected standard deviation of the reference data set from these two sets of statistical information.

Here, the corrected mean and the corrected standard deviation can be calculated by Equations (1) and (2), where $\tilde{\mu}_R$ and $\tilde{\sigma}_R$ are the corrected mean and the corrected standard deviation of the reference data set, respectively; $x_{ij}$ is the test data value of the j-th data sample in the i-th subgroup of the reference data set; $\mu_R$ is the mean of the test data values of the data samples of the reference data set; $\mu_{R,i}$ is the mean of the test data values of the data samples in the i-th subgroup of the reference data set; $S$ is the number of subgroups; $N_T$ is the number of data samples of the normalization target data set; $n_{R,i}$ is the number of data samples in the i-th subgroup of the reference data set; and $n_{T,i}$ is the number of data samples in the i-th subgroup of the normalization target data set.
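Equations (1) and (2) themselves do not appear in this text, only their symbol definitions. One form consistent with those definitions, offered here as an assumption rather than the patent's exact expressions, reweights each reference subgroup by its share of the target data set. Writing $n_{T,i}$ and $N_T$ for the target subgroup and total sample counts, $\mu_{R,i}$ and $n_{R,i}$ for the reference subgroup mean and size, $x_{ij}$ for the j-th reference value in subgroup i, and $S$ for the number of subgroups:

$$\tilde{\mu}_R = \sum_{i=1}^{S} \frac{n_{T,i}}{N_T}\,\mu_{R,i}, \qquad \tilde{\sigma}_R = \sqrt{\sum_{i=1}^{S} \frac{n_{T,i}}{N_T}\cdot\frac{1}{n_{R,i}}\sum_{j=1}^{n_{R,i}}\left(x_{ij}-\tilde{\mu}_R\right)^2}$$

A Python sketch of that assumed form (function and argument names are hypothetical):

```python
from math import sqrt


def corrected_reference_stats(ref_groups, target_counts):
    """Subgroup-adjusted mean and SD of the reference data set (assumed form).

    ref_groups:    subgroup key -> list of reference test values x_ij
    target_counts: subgroup key -> n_{T,i}, the only information needed
                   from the target site
    """
    n_target = sum(target_counts.values())  # N_T
    if n_target == 0:
        raise ValueError("target site reported no samples")
    for key, count in target_counts.items():
        if count > 0 and not ref_groups.get(key):
            raise ValueError(f"reference has no samples in subgroup {key!r}")

    # Weight of each subgroup = its share of the target data set, n_{T,i} / N_T.
    weights = {k: c / n_target for k, c in target_counts.items() if c > 0}

    # Assumed Equation (1): weighted average of the reference subgroup means.
    mu = sum(w * sum(ref_groups[k]) / len(ref_groups[k]) for k, w in weights.items())

    # Assumed Equation (2): weighted average of each subgroup's mean squared
    # deviation from the corrected mean (population-style 1/n_{R,i}).
    var = sum(
        w * sum((x - mu) ** 2 for x in ref_groups[k]) / len(ref_groups[k])
        for k, w in weights.items()
    )
    return mu, sqrt(var)
```

With reference values [1, 3] in a younger subgroup and [9, 11] in an older one, a target whose samples are three-quarters younger gets a corrected mean of 4.0 and a corrected standard deviation of √13 ≈ 3.61, whereas the unweighted reference mean is 6.0: the reference is re-expressed as if it had the target's subgroup mix.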

Hereinafter, the operation of the correction statistical index calculation step (S200) will be described with reference to FIG. 5, which shows a detailed flowchart of the multicenter test data normalization method according to the present invention.

The correction statistical index calculation step (S200) includes a step (S210) in which the first processing unit 100, linked with the reference database 10, calculates the statistical information of the reference data set; a step (S220) in which the second processing unit 200, linked with the normalization target database 20, calculates the statistical information of the normalization target data set; and a step (S230) in which the first processing unit 100 calculates the correction statistical index using the statistical information of the normalization target data set and the statistical information of the reference data set.

Here, steps S210 and S220 may be performed in either order. In addition, once step S210, in which the first processing unit 100 calculates the statistical information of the reference data set, has been performed, it need not be repeated unless the division into subgroups changes; the previously calculated statistical information of the reference data set can be reused. Therefore, if the statistical information of the reference data set has already been calculated, step S210 may be omitted.

The correction statistical index calculation step (S200) may further include a step (S225) in which the second processing unit 200 transmits the statistical information of the normalization target data set calculated in step S220 to the first processing unit 100. The first processing unit 100 may then calculate the correction statistical index using the statistical information of the normalization target data set and the statistical information of the reference data set (S230).

The correction statistical index calculation step (S200) may further include a step (S235) in which the correction statistical index calculated by the first processing unit 100 is transmitted to the second processing unit 200. The second processing unit 200 may then perform normalization using the received correction statistical index (S300).

When processing is divided between the first processing unit 100 and the second processing unit 200 as described above, only specific information generated for normalization, such as the statistical information of the normalization target data set or the correction statistical index, is transmitted between them. Normalization can therefore be performed while the security of the data stored in each database is maintained.
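The division of work in steps S210 to S235 can be pictured as two objects that each keep their own data private and exchange only aggregates. This is a hypothetical sketch: the class names, the dictionary message format, the use of the population standard deviation, and the forms assumed for Equations (1) to (3) are illustrative choices, not details taken from the patent.

```python
import math
import statistics
from collections import Counter, defaultdict


class FirstProcessingUnit:
    """Unit 100 (sketch): keeps the reference data set private."""

    def __init__(self, values, groups):
        self._by_group = defaultdict(list)
        for x, g in zip(values, groups):
            self._by_group[g].append(x)
        self._mean = statistics.fmean(values)  # overall reference mean (S210)

    def correction_index(self, target_counts):
        """S230: corrected mean/SD from the S225 message (assumed Eqs. (1), (2))."""
        n_b = sum(target_counts.values())
        mean_c = var_c = 0.0
        for g, n in target_counts.items():
            xs = self._by_group.get(g)
            if not xs:
                raise ValueError(f"no reference samples in subgroup {g!r}")
            w = n / n_b  # share of target samples in subgroup g
            mean_c += w * statistics.fmean(xs)
            var_c += w * statistics.fmean([(x - self._mean) ** 2 for x in xs])
        # S235 payload: two numbers, no reference values
        return {"mean": mean_c, "sd": math.sqrt(var_c)}


class SecondProcessingUnit:
    """Unit 200 (sketch): keeps the normalization target data set private."""

    def __init__(self, values, groups):
        self._values = list(values)
        self._groups = list(groups)

    def summary(self):
        """S220/S225 payload: subgroup counts only (their sum is N_b)."""
        return dict(Counter(self._groups))

    def normalize(self, index):
        """S300: standardize with the target's own mean/SD, rescale to the index."""
        mu_b = statistics.fmean(self._values)
        sd_b = statistics.pstdev(self._values)
        return [(x - mu_b) / sd_b * index["sd"] + index["mean"] for x in self._values]
```

In use, the target side calls `summary()`, sends the resulting counts to the reference side, receives the result of `correction_index(...)`, and applies it locally with `normalize(...)`. Raw test values never leave either object.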

Next, the operation of the normalization step (S300) will be described.

In the normalization step (S300), the test data values of the data samples included in the normalization target data set are normalized on the basis of the reference data set using the correction statistical index. This normalization may be performed by the second processing unit 200.

More specifically, the normalization step (S300) normalizes the test data values of the data samples included in the normalization target data set using the corrected mean and corrected standard deviation of the reference data set and the mean and standard deviation of the test data values of the data samples of the normalization target data set.

Here, the normalization step (S300) may normalize the test data values of the data samples included in the normalization target data set according to the following Equation (3).

[Equation 3]
$$\hat{x}_b = \frac{x_b - \mu_b}{\sigma_b}\,\sigma'_a + \mu'_a$$

Here, $\hat{x}_b$ is the normalized result; $x_b$ is the test data value of a data sample included in the normalization target data set; $\mu_b$ and $\sigma_b$ are the mean and standard deviation of the test data values of the data samples of the normalization target data set; and $\mu'_a$ and $\sigma'_a$ are the corrected mean and the corrected standard deviation of the reference data set, respectively.
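A minimal sketch of the normalization in step S300, assuming Equation (3) has the usual standardize-then-rescale form: each target value is converted to a z-score using the target set's own mean and standard deviation, then mapped onto the corrected reference mean and standard deviation. The function name and the choice of the population standard deviation are assumptions.

```python
import statistics


def normalize_to_reference(target_values, corrected_mean, corrected_sd):
    """Assumed form of Equation (3): z-score with the target set's own
    mean and SD, then rescale to the corrected reference mean and SD."""
    mu_b = statistics.fmean(target_values)
    sd_b = statistics.pstdev(target_values)  # population SD (assumption)
    if sd_b == 0:
        raise ValueError("target values have zero standard deviation")
    return [(x - mu_b) / sd_b * corrected_sd + corrected_mean for x in target_values]
```

For example, target values 1.0 and 3.0 (mean 2, SD 1) mapped onto a corrected mean of 13.5 and corrected SD of 2.0 become 11.5 and 15.5. Because the map is linear, the rank order of the target values is preserved.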

FIG. 8 is a reference diagram showing an example of the operation of the correction statistical index calculation step (S200) according to the present invention. As shown in FIG. 8, to normalize the normalization target data set (Dataset Db), the correction statistical index calculation step (S200) calculates the corrected mean ($\mu'_a$) and corrected standard deviation ($\sigma'_a$) of the reference data set (Dataset Da). In FIG. 8, "mean" is the mean of the test data values of the data samples belonging to each subgroup (Subgroup (i)), "N" is the number of data samples belonging to each subgroup, and the mean of the squared differences from the total mean is, for the reference data set, $\frac{1}{n_{a,i}}\sum_{j=1}^{n_{a,i}}\left(x_{a,i,j}-\mu_a\right)^2$. Each variable has the same meaning as in Equations (1) to (3) above. The corrected mean ($\mu'_a$) and corrected standard deviation ($\sigma'_a$) of the reference data set are then calculated according to Equations (1) and (2).
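FIG. 8 suggests that the correction can be computed from a per-subgroup summary table (mean, N, and mean squared difference from the total mean) without revisiting individual samples. The hypothetical sketch below does this under the same assumed forms of Equations (1) and (2). It also illustrates a property of those assumed forms: when the target's subgroup proportions equal the reference's, the corrected mean and standard deviation reduce to the ordinary reference mean and population standard deviation.

```python
import math
import statistics
from collections import defaultdict


def summary_table(values, groups):
    """Per-subgroup (mean, mean squared difference from the overall mean)
    of the reference set, mirroring the FIG. 8 columns (sketch)."""
    mu = statistics.fmean(values)
    by_group = defaultdict(list)
    for x, g in zip(values, groups):
        by_group[g].append(x)
    return {
        g: (statistics.fmean(xs), statistics.fmean([(x - mu) ** 2 for x in xs]))
        for g, xs in by_group.items()
    }


def correct_from_table(table, target_counts):
    """Corrected mean and SD computed from the subgroup table alone."""
    n_b = sum(target_counts.values())
    mean_c = sum(n / n_b * table[g][0] for g, n in target_counts.items())
    var_c = sum(n / n_b * table[g][1] for g, n in target_counts.items())
    return mean_c, math.sqrt(var_c)
```

For example, with reference values 10, 12 and 14 in subgroup "M" and 20 in subgroup "F", a target mix of 3:1 reproduces the ordinary mean of 14 and population standard deviation of √14, while a 1:1 target mix shifts the corrected mean to 16.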

FIG. 2 is a block diagram illustrating a multicenter test data normalization system according to an embodiment of the present invention.

Referring to FIG. 2, to normalize the normalization target database 20 on the basis of the reference database 10, the multicenter test data normalization system according to the present invention includes a first processing unit 100 linked with the reference database 10 and a second processing unit 200 linked with the normalization target database 20. The system operates in the same manner as the multicenter test data normalization method according to the present invention described in detail with reference to FIGS. 1 to …; overlapping descriptions are therefore omitted or given only briefly.

The first processing unit 100 operates in conjunction with the reference database 10, which stores the reference data set serving as the reference for normalization, and the second processing unit 200 operates in conjunction with the normalization target database 20, which stores the normalization target data set to be normalized.

Here, the first processing unit 100 calculates a correction statistical index for normalizing the normalization target data set according to the reference data set.

The second processing unit 200 divides the normalization target data set into two or more subgroups based on a predetermined characteristic index and normalizes the test data values of the data samples included in the normalization target data set on the basis of the reference data set using the correction statistical index.

FIG. 6 is a detailed block diagram of the first processing unit 100, which operates in conjunction with the reference database 10, in the multicenter test data normalization system according to the present invention. FIG. 7 is a detailed block diagram of the second processing unit 200, which operates in conjunction with the normalization target database 20.

As shown in FIG. 6, the first processing unit 100 may include a first subgroup dividing unit 110, a first statistical information calculating unit 120, and a correction statistical index calculating unit 130.

The first subgroup dividing unit 110 divides the reference data set into the subgroups based on the characteristic index.

The first statistical information calculating unit 120 calculates the statistical information of the reference data set, including information on the distribution of the number of data samples across the subgroups of the reference data set.

The correction statistical index calculating unit 130 calculates the correction statistical index using the statistical information of the normalization target data set, the statistical information of the reference data set, and the test data values of the data samples of the reference data set.

As shown in FIG. 7, the second processing unit 200 may include a second subgroup dividing unit 210, a second statistical information calculating unit 220, and a normalizing unit 230.

The second subgroup dividing unit 210 divides the normalization target data set into the subgroups based on the characteristic index.

The second statistical information calculating unit 220 calculates the statistical information of the normalization target data set, including information on the distribution of the number of data samples across the subgroups of the normalization target data set.

The normalizing unit 230 normalizes the test data values of the data samples included in the normalization target data set on the basis of the reference data set using the correction statistical index.
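The characteristic index used for subgrouping is left open here. Claim 3 mentions demographic or clinical indices, and FIG. 9 shows age and sex. As a hypothetical example, the sketch below keys each sample by a ten-year age band and sex. The band width, key format, and function names are assumptions.

```python
def subgroup_key(age, sex, band_width=10):
    """Hypothetical characteristic index: (age band, sex)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = int(age // band_width) * band_width
    return (f"{lower}-{lower + band_width - 1}", sex)


def divide_into_subgroups(samples, band_width=10):
    """samples: iterable of (age, sex, test value) -> {key: [test values]}."""
    groups = {}
    for age, sex, value in samples:
        groups.setdefault(subgroup_key(age, sex, band_width), []).append(value)
    return groups
```

Both subgroup dividing units (110 and 210) would have to apply the same key function, so that "subgroup i" refers to the same population stratum in the reference and target data sets.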

A multicenter test data normalization computer program according to another embodiment of the present invention may be a computer program stored in a medium for executing, in combination with a database, the multicenter test data normalization method.

Hereinafter, the performance and effects of the multicenter test data normalization method and system according to the present invention will be described based on actual experimental results.

The subpopulation correction normalization method according to the present invention was applied to laboratory test data, including blood urea nitrogen (BUN), serum creatinine, hematocrit, hemoglobin, serum potassium, and total bilirubin, obtained from two hospitals (A and B). As described below, it was confirmed to eliminate the heterogeneity between the test data more effectively than existing methods. The standardized difference in means (SDM) and the Kolmogorov-Smirnov value (KS) were used to evaluate performance.

FIG. 9 is a reference diagram showing the heterogeneity of data between data sets obtained from different medical institutions. As can be seen from FIGS. 9(a) to 9(d), the hemoglobin test data obtained at the two hospitals A and B show different distribution characteristics by age (FIGS. 9(a) and 9(c)) and by sex (FIGS. 9(b) and 9(d)).

The blood test data obtained from the two hospitals were normalized with the subpopulation correction normalization method according to the present invention and, for comparison, with the conventional Z-score transformation and rank-based inverse normal transformation (INT) methods. The results of each method are compared in terms of the standardized difference in means (SDM) and the Kolmogorov-Smirnov value (KS) in FIGS. 10(a) and 10(b).

The Z-score transformation method is described in Cheadle C, Vawter MP, Freed WJ, et al.: Analysis of microarray data using Z score transformation. J Mol Diagn 2003; 5: 73-81. DOI: 10.1016/S1525-1578(10)60455-2. The rank-based inverse normal transformation (INT) method followed Bolstad BM, Irizarry RA, Astrand M, et al.: A comparison of normalization methods for high density oligonucleotide [...]. Bioinformatics 2003; 19: 185-193.
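For orientation, the two comparison methods can be sketched as follows. The Z-score transformation standardizes each value with the data set's own mean and standard deviation. For the rank-based INT, the sketch uses one common formulation, Φ⁻¹((r − c)/(n − 2c + 1)) with Blom's offset c = 3/8 and average ranks for ties; this variant is an assumption, since the exact one used in the experiment is not stated.

```python
import statistics
from statistics import NormalDist


def z_score(values):
    """Z-score transformation: standardize with the set's own mean and SD."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (assumption)
    return [(x - mu) / sd for x in values]


def rank_inverse_normal(values, c=3 / 8):
    """Rank-based INT: Phi^-1((r - c) / (n - 2c + 1)), Blom offset c = 3/8."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, SD 1
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

Neither baseline uses subgroup information: each data set is mapped onto a standard scale on its own, so differences in age or sex composition between institutions remain embedded in the transformed values.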

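Here, the standardized difference in means (SDM) can be calculated as in Equation (5), and the Kolmogorov-Smirnov value (KS) as in Equation (6).

[Equation 5]
$$\mathrm{SDM} = \frac{\mu_1 - \mu_2}{\sqrt{\left(\sigma_1^2 + \sigma_2^2\right)/2}}$$

Here, $\mu_m$ is the mean of the m-th data set and $\sigma_m^2$ is the variance of the m-th data set.

[Equation 6]
$$\mathrm{KS} = \sup_x \left| F_1(x) - F_2(x) \right|$$

Here, sup is the supremum of the set of distances, and $F_1(x)$ and $F_2(x)$ are the empirical distribution functions of the first data set and the second data set, respectively.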
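Both evaluation metrics are short to compute. The sketch assumes Equation (5) is the usual standardized difference (difference of means divided by the square root of the average of the two variances, here sample variances) and computes Equation (6) directly from the two empirical distribution functions. Function names are illustrative.

```python
import math
import statistics
from bisect import bisect_right


def sdm(a, b):
    """Standardized difference in means (assumed form of Equation (5))."""
    pooled = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled


def ks_statistic(a, b):
    """Two-sample KS value: sup over x of |F1(x) - F2(x)| (Equation (6))."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at sample points,
    # so the supremum is reached at one of those points.
    for x in sa + sb:
        f1 = bisect_right(sa, x) / len(sa)  # share of a that is <= x
        f2 = bisect_right(sb, x) / len(sb)
        d = max(d, abs(f1 - f2))
    return d
```

Values of both metrics near 0 indicate well-matched distributions. SDM only compares location relative to spread, while KS is sensitive to any difference in distributional shape, which is why the two are reported together.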

As can be seen in FIG. 10, the subpopulation correction normalization method according to the present invention normalizes the data more effectively than conventional methods such as the Z-score transformation or the rank-based inverse normal transformation (INT). Here, RAW denotes the raw data.

As described above, the method and apparatus for normalizing multicenter test data according to the present invention can eliminate the heterogeneity in the values of clinical test data that exists between databases because the data are measured at different medical institutions, so that the data can be efficiently combined and used in a single integrated analysis.

Although all the components constituting the embodiments of the present invention described above have been described as being combined into one or operating in combination, the present invention is not necessarily limited to these embodiments. That is, within the scope of the objects of the present invention, one or more of the components may be selectively combined and operated.

In addition, although each of the components may be implemented as independent hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions in one or more pieces of hardware. Such a computer program may be stored in a computer-readable medium such as a USB memory, a CD, or a flash memory, and read and executed by a computer to implement an embodiment of the present invention. Recording media for the computer program may include magnetic recording media, optical recording media, and the like.

Furthermore, unless otherwise defined in the Detailed Description, all terms, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Commonly used terms, such as those defined in dictionaries, should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be construed in an idealized or overly formal sense unless expressly so defined.

It will be apparent to those skilled in the art that various modifications, changes, and substitutions are possible without departing from the spirit and scope of the invention as disclosed in the accompanying claims. Therefore, the embodiments disclosed in the present invention and the accompanying drawings are intended to illustrate, not to limit, the technical spirit of the present invention, and the scope of the technical spirit of the present invention is not limited by these embodiments and the accompanying drawings. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within the scope of their equivalents should be construed as falling within the scope of the present invention.

10: Reference database
20: Normalization target database
100: First processing unit
200: Second processing unit
S50: Database selection step
S100: Subgrouping step
S200: Correction statistical index calculation step
S300: Normalization step

Claims

A method for normalizing multicenter test data, comprising:
a subgroup dividing step of dividing a normalization target data set, stored in a normalization target database to be normalized, into two or more subgroups based on a predetermined characteristic index;
a correction statistical index calculating step of calculating a correction statistical index for normalizing the normalization target data set according to a reference data set, using statistical information of the normalization target data set and of the reference data set stored in a reference database serving as the reference for normalization; and
a normalizing step of normalizing the test data values of the data samples included in the normalization target data set on the basis of the reference data set using the correction statistical index.

The method according to claim 1,
wherein the reference database and the normalization target database are connected to each other through a network.

The method according to claim 1,
Wherein the characteristic index includes a demographic index or a clinical index.

The method according to claim 1,
wherein the reference database and the normalization target database are medical record databases storing medical records, and
the test data values of the data samples included in the reference data set and the normalization target data set are medical data values measured for a patient or subject.

The method according to claim 2,
wherein the reference database and the normalization target database are databases that operate in conjunction with a plurality of electronic medical record systems interconnected via the network.

The method according to claim 2,
further comprising a database selection step of selecting the reference database or the normalization target database from among two or more databases connected through the network.

The method according to claim 1,
wherein the subgroup dividing step comprises:
a step in which a second processing unit operating in conjunction with the normalization target database divides the normalization target data set into a plurality of subgroups based on the characteristic index.

The method according to claim 7,
wherein the subgroup dividing step further comprises:
a step in which a first processing unit operating in conjunction with the reference database divides the reference data set into a plurality of the subgroups based on the characteristic index.

The method according to claim 1,
wherein the correction statistical index calculating step
calculates the correction statistical index using information on the distribution of population counts of the subgroups of the normalization target data set.

The method according to claim 9,
wherein the correction statistical index includes a corrected mean and a corrected standard deviation of the reference data set, calculated by correcting the reference data set using the information on the distribution of population counts of the subgroups of the normalization target data set.

The method according to claim 10,
wherein the correction statistical index calculating step comprises:
calculating statistical information of the normalization target data set, including information on the distribution of population counts of the subgroups of the normalization target data set; and
calculating the correction statistical index using the calculated statistical information of the normalization target data set and statistical information of the reference data set, including information on the distribution of population counts of the subgroups of the reference data set.

The method according to claim 11,
wherein the correction statistical index calculating step comprises:
a step in which a second processing unit operating in conjunction with the normalization target database calculates the statistical information of the normalization target data set;
a step in which a first processing unit operating in conjunction with the reference database calculates the statistical information of the reference data set; and
a step in which the first processing unit calculates the correction statistical index using the statistical information of the normalization target data set and the statistical information of the reference data set.

The method according to claim 11,
wherein the statistical information of the reference data set includes the mean of the test data values of the data samples of the reference data set, the mean of the test data values of the data samples of each subgroup of the reference data set, and the number of data samples in each subgroup of the reference data set, and
the statistical information of the normalization target data set includes the number of data samples of the normalization target data set and the number of data samples in each subgroup of the normalization target data set.

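The method according to claim 13,
wherein the corrected mean and the corrected standard deviation are calculated by the following Formulas 1 and 2:
[Formula 1]
$$\mu'_a = \sum_{i=1}^{S} \frac{n_{b,i}}{N_b}\,\mu_{a,i}$$
[Formula 2]
$$\sigma'_a = \sqrt{\sum_{i=1}^{S} \frac{n_{b,i}}{N_b}\cdot\frac{1}{n_{a,i}}\sum_{j=1}^{n_{a,i}}\left(x_{a,i,j}-\mu_a\right)^2}$$
where $\mu'_a$ and $\sigma'_a$ are the corrected mean and the corrected standard deviation of the reference data set, respectively, $x_{a,i,j}$ is the test data value of the j-th data sample of the i-th subgroup of the reference data set, $\mu_a$ is the mean of the test data values of the data samples of the reference data set, $\mu_{a,i}$ is the mean of the test data values of the data samples of the i-th subgroup of the reference data set, $S$ is the number of subgroups, $N_b$ is the number of data samples of the normalization target data set, $n_{a,i}$ is the number of data samples of the i-th subgroup of the reference data set, and $n_{b,i}$ is the number of data samples of the i-th subgroup of the normalization target data set.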

The method according to claim 10, wherein the normalizing step
normalizes the test data values of the data samples included in the normalization target data set using the corrected mean and the corrected standard deviation of the reference data set and the mean and standard deviation of the test data values of the data samples of the normalization target data set.

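The method according to claim 15,
wherein the normalizing step normalizes the test data values of the data samples included in the normalization target data set according to the following Formula 3:
[Formula 3]
$$\hat{x}_b = \frac{x_b - \mu_b}{\sigma_b}\,\sigma'_a + \mu'_a$$
where $\hat{x}_b$ is the normalized result, $x_b$ is the test data value of a data sample included in the normalization target data set, $\mu_b$ and $\sigma_b$ are the mean and standard deviation of the test data values of the data samples of the normalization target data set, and $\mu'_a$ and $\sigma'_a$ are the corrected mean and the corrected standard deviation of the reference data set, respectively.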

A computer program stored in a medium for executing, in combination with a database, the method for normalizing multicenter test data according to any one of claims 1 to 16.

A multicenter test data normalization system, comprising:
a first processing unit operating in conjunction with a reference database storing a reference data set serving as the reference for normalization; and
a second processing unit operating in conjunction with a normalization target database storing a normalization target data set to be normalized,
wherein the first processing unit calculates a correction statistical index for normalizing the normalization target data set according to the reference data set, and
the second processing unit divides the normalization target data set into two or more subgroups based on a predetermined characteristic index and normalizes the test data values of the data samples included in the normalization target data set on the basis of the reference data set using the correction statistical index.

The system according to claim 18, wherein the second processing unit comprises:
a second subgroup dividing unit for dividing the normalization target data set into the subgroups based on the characteristic index;
a second statistical information calculating unit for calculating statistical information of the normalization target data set, including information on the distribution of population counts of the subgroups of the normalization target data set; and
a normalizing unit for normalizing the test data values of the data samples included in the normalization target data set on the basis of the reference data set using the correction statistical index.

The system according to claim 19, wherein the first processing unit comprises:
a first subgroup dividing unit for dividing the reference data set into the subgroups based on the characteristic index;
a first statistical information calculating unit for calculating statistical information of the reference data set, including information on the distribution of population counts of the subgroups of the reference data set; and
a correction statistical index calculating unit for calculating the correction statistical index using the statistical information of the normalization target data set, the statistical information of the reference data set, and the test data values of the data samples of the reference data set.