KR20180062917A - Apparatus and method for diagnosing cardiovascular disorders using genome information and health medical examination data - Google Patents

Publication number
KR20180062917A
Authority
KR
South Korea
Prior art keywords
data
learning
cardiovascular disease
gene
snp
Application number
KR1020170012278A
Other languages
Korean (ko)
Inventor
김대희
김민호
김영원
이동훈
임명은
정호열
최재훈
한영웅
Original Assignee
한국전자통신연구원
Application filed by 한국전자통신연구원
Priority to US15/808,476 (US20180150608A1)
Publication of KR20180062917A

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioethics (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to an apparatus and a method for diagnosing cardiovascular diseases using genome information and health examination data. Using periodically measured personal health examination data, genome data, and genes associated with cardiovascular diseases, the apparatus and method predict the incidence of cardiovascular diseases and enable their early diagnosis.

Description

APPARATUS AND METHOD FOR DIAGNOSING CARDIOVASCULAR DISORDERS USING GENOME INFORMATION AND HEALTH MEDICAL EXAMINATION DATA

The present invention relates to an apparatus and a method for diagnosing cardiovascular disease using genome information and health examination data. More particularly, it relates to an apparatus and a method that diagnose a user's cardiovascular disease quickly and accurately by using the user's periodically measured personal health examination data together with the user's genome information.

As living standards rise with industrial and economic development, modern society is gradually becoming an aging society, and the prevalence of and mortality from cardiovascular diseases have been steadily increasing due to changes in lifestyle and poor eating habits.

In general, cardiovascular disease occurs in the heart or major arteries, such as coronary artery disease. Once it develops, it has a very high mortality rate, leading to premature death and costly treatment.

The causes of cardiovascular diseases include a combination of lifestyle factors, such as obesity, smoking, lack of exercise, and stress, and the effects of genes associated with these diseases.

However, if cardiovascular disease is detected early, its progression can be prevented through appropriate management, and the risk of death from the disease can be kept low throughout the lifetime. Therefore, early and reliable diagnosis of cardiovascular disease is recognized as a very important social issue.

Conventionally, cardiovascular disease has commonly been diagnosed by echocardiography or electrocardiography (ECG).

However, diagnosis of cardiovascular disease through echocardiography or ECG is costly and time-consuming, which limits early diagnosis, early treatment, and management of cardiovascular disease.

To solve these problems, methods of diagnosing cardiovascular diseases using only the user's personal health examination data are being developed. Such a method reflects only simple physical data related to lifestyle, acquired from the health examination data, to express the probability that cardiovascular disease will occur within 10 years, and provides that probability to the user.

However, this method presents the probability of cardiovascular disease based only on the user's physical information and excludes the influence of genes associated with cardiovascular diseases, so its accuracy and reliability are significantly reduced.

Therefore, the present invention aims to provide an apparatus and a method that can diagnose a user's cardiovascular disease early and accurately by using polymorphism information obtained from the target genes of the cardiovascular disease of interest, together with the user's personal health examination data and genome information.

Next, a brief description will be given of the prior arts that exist in the technical field of the present invention, and technical matters which the present invention intends to differentiate from the prior arts will be described.

First, Korean Patent No. 0876764 (Dec. 23, 2008) relates to a cardiovascular disease diagnosis system and its diagnostic service method. It analyzes medical data related to the user's actual cardiogram and electrocardiogram to generate analysis result values, generates a virtual cardiogram and electrocardiogram by performing a virtual cardiac simulation using argument values for cardiac simulation, and compares the virtual cardiogram and electrocardiogram with the actual ones to perform a final diagnosis of cardiovascular-related diseases.

In this prior art, however, the user must first have a cardiogram and an electrocardiogram (ECG) taken at an actual medical institution in order to receive a diagnosis of a cardiovascular-related disease through the online service.

The present invention, on the other hand, extracts SNP (single nucleotide polymorphism) feature data from cardiovascular disease-related gene data and the user's genome data to reduce the amount of data, applies a machine learning technique to extract the features of the user's personal health examination data, and combines the two, so that cardiovascular diseases can be diagnosed quickly and accurately using the health examination data of each period.

Korean Patent Laid-Open Publication No. 2009-0077535 (July 15, 2009) relates to a device for diagnosing cardiovascular disease. It extracts information on a plurality of proteins from a patient's blood sample using a microarray and diagnoses cardiovascular disease by comparing that information with patterns generated from past cardiovascular disease data of patients.

This prior art is partly similar to the present invention in that it diagnoses cardiovascular disease, although it does so using protein information. The present invention, in contrast, collects genes associated with cardiovascular disease and uses gene data and personal health examination data to enable early diagnosis of a user's cardiovascular disease. The prior art neither describes nor suggests these technical features of the present invention.

Accordingly, the present invention has been made in view of the above problems. An object of the present invention is to provide an apparatus and a method that extract SNP feature data (single nucleotide polymorphism information) from gene data and use it together with health examination data so that a cardiovascular disease can be diagnosed accurately and reliably.

A further object is to provide an apparatus and a method that apply machine learning to the SNP feature data and the personal health examination data to extract their features and reduce the number of features, so that cardiovascular disease can be diagnosed quickly.

The apparatus for diagnosing a cardiovascular disease using genome information and health examination data according to an embodiment of the present invention includes a gene data learning unit that learns using a plurality of gene data, a health examination data learning unit that learns using a plurality of health examination data, and an integrated learning unit that integrates and learns the learning results of the gene data and the health examination data to generate a prediction model.

The integrated learning unit and the health examination data learning unit learn recursively, reflecting the learning result of a given learning stage back to the previous learning stage to improve learning performance.

The gene data learning unit extracts SNP feature data from the plurality of gene data and learns the extracted SNP feature data.

The health examination data learning unit converts the plurality of health examination data into two-dimensional binary images so that the feature values of the health examination data take values of 0 and 1, and learns the converted health examination data.

The cardiovascular disease diagnosing apparatus further includes an SNP extracting unit that collects gene data for each cardiovascular disease and extracts SNP position information for each collected gene data, and the SNP feature data is extracted with reference to the extracted SNP position information.

The cardiovascular disease diagnosing apparatus further includes a user interface unit that receives query data including the user's personal health examination data and genome data. The apparatus converts the user's personal health examination data into a two-dimensional binary image and extracts SNP feature data from the user's genome data by referring to the stored SNP position information.

The cardiovascular disease diagnosing apparatus further includes a cardiovascular disease predicting unit that inputs the user's personal health examination data converted into the two-dimensional binary image and the extracted SNP feature data into the generated prediction model, and outputs the prediction result for each cardiovascular disease.

In addition, the method for diagnosing cardiovascular diseases according to an embodiment of the present invention includes a gene data learning step of learning using a plurality of gene data, a health examination data learning step of learning using a plurality of health examination data, and an integrated learning step of integrating and learning the learning results of the gene data and the health examination data to generate a prediction model.

The integrated learning step and the health examination data learning step learn recursively, reflecting the learning result of a given learning stage back to the previous learning stage to improve learning performance.

The gene data learning step includes extracting SNP feature data from the plurality of gene data and learning the extracted SNP feature data.

The health examination data learning step includes converting the plurality of health examination data into two-dimensional binary images so that the feature values of the health examination data take values of 0 and 1, and learning the converted health examination data.

The method for diagnosing cardiovascular disease further includes an SNP extracting step of collecting gene data for each cardiovascular disease and extracting SNP position information for each collected gene data, and the SNP feature data is extracted with reference to the extracted SNP position information.

The method further includes a user query data input step of receiving query data including the user's personal health examination data and genome data. The method converts the user's personal health examination data into a two-dimensional binary image and extracts SNP feature data from the user's genome data by referring to the stored SNP position information.

The method further includes a cardiovascular disease prediction step of inputting the user's personal health examination data converted into the two-dimensional binary image and the extracted SNP feature data into the generated prediction model and outputting the diagnosis result for each cardiovascular disease.

According to the apparatus and method of the present invention, SNP position information is extracted from gene data on cardiovascular disease, SNP feature data is extracted from the user's genome data using that position information, and the user's cardiovascular disease is diagnosed using the extracted SNP feature data together with the user's personal health examination data.

FIG. 1 is a conceptual diagram schematically illustrating an apparatus and method for diagnosing cardiovascular disease using genome information and health examination data according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a method of converting a user's personal health examination data into an image according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a method of searching for the protein produced by each gene using gene data of cardiovascular disease according to an embodiment of the present invention.
FIG. 4A is a diagram showing a result of searching the UCSC Known Gene database using protein ID information according to an embodiment of the present invention.
FIG. 4B is a diagram illustrating the schema of the UCSC Known Gene database according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a method of obtaining SNP position information for cardiovascular disease gene data and extracting SNP feature data from gene data for learning according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a learning process according to an embodiment of the present invention.
FIG. 7 is a block diagram illustrating the configuration of a cardiovascular disease diagnosis apparatus according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating a procedure for labeling and storing SNP position information for each cardiovascular disease gene data according to an embodiment of the present invention.
FIG. 9 is a flowchart illustrating a procedure for diagnosing a user's cardiovascular disease based on query data input by the user according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Like reference symbols in the drawings denote like elements.

FIG. 1 is a conceptual diagram schematically illustrating an apparatus and method for diagnosing cardiovascular disease using genome information and health examination data according to an embodiment of the present invention.

As shown in FIG. 1, the cardiovascular disease diagnosis apparatus 100 periodically collects gene data and health examination data of people who have or have had cardiovascular disease.

In addition, the collected gene data and health checkup data are learning data for generating a predictive model for predicting cardiovascular disease of a specific user.

The gene data and the health examination data may be provided by a hospital or a government agency, and may be collected by direct access to a database provided in the hospital or a government agency, or collected by request.

In addition, the cardiovascular disease diagnosis apparatus 100 learns the collected gene data and health examination data to generate a cardiovascular disease prediction model, and predicts the cardiovascular disease of a specific user based on that user's genome data and personal health examination data so that diagnosis can be performed early.

In addition, the cardiovascular disease diagnosis apparatus 100 converts the collected health examination data into two-dimensional binary monochrome images, extracts SNP feature data from the collected gene data, and learns the converted health examination data and the SNP feature data.

A method for converting the health examination data into a two-dimensional binary monochrome image will be described in detail with reference to FIG. 2.

To extract SNP feature data from the gene data collected for learning, SNP position information for the cardiovascular disease genes is required; this information is generated from the gene data of each cardiovascular disease.

Therefore, the cardiovascular disease diagnosis apparatus 100 first constructs the cardiovascular disease gene list database 200 in order to generate SNP position information for the genes closely related to each cardiovascular disease.

The cardiovascular disease gene list database 200 is connected to the document database 300 and collects gene data of cardiovascular disease according to a preset period through a document search.

The document database 300 includes a genetic association database (GAD), a literature-derived human gene-disease network (LHGDN), BeFree data (BFD), or a combination thereof.

The document database 300 is a database 300 storing a list of genes for various diseases including a list of cardiovascular disease-related genes.

The cardiovascular disease diagnosis apparatus 100 periodically connects to the document database 300 to collect and store gene data closely related to specific cardiovascular diseases such as hypertension, arteriosclerosis, myocardial infarction, and angina pectoris.

The cardiovascular disease diagnosis apparatus 100 also accesses the UniProt database 400, the UCSC Known Gene database 500, and the NCBI dbSNP database 600 to obtain and store SNP position information for the gene data related to cardiovascular disease. The stored SNP position information serves as reference data for extracting SNP feature data from the gene data used for learning.

The SNP position information is obtained and stored for the following reason. Human genome data (DNA) consists of about 3 billion bases. Most of these bases are identical across people, but about 1 in 1,000 differs between individuals; such a difference is called a single nucleotide polymorphism (SNP).

Diagnosing cardiovascular disease from the entire human genome is therefore impractical, because the amount of data makes the computational and time complexity prohibitively high. Instead, the cardiovascular disease diagnosis apparatus 100 uses only the gene data related to cardiovascular diseases and extracts SNP position information from that gene data to diagnose cardiovascular diseases. In general, one gene contains about 23,000 bases, of which about 23 are SNPs.

When query data including the user's personal health examination data and genome data is input, the personal health examination data consists of a plurality of features (e.g., blood glucose, blood pressure, family history, and cholesterol). For rapid diagnosis, the cardiovascular disease diagnosis apparatus 100 converts the personal health examination data into a binary image and applies a machine learning technique to the converted image to extract its features, reducing the total number of features needed for diagnosis.

In addition, the cardiovascular disease diagnosis apparatus 100 extracts SNP feature data from the user's genome data using the stored SNP position information for the cardiovascular disease gene data.

In addition, the cardiovascular disease diagnosis apparatus 100 derives the cardiovascular disease prediction result for the user by inputting the personal health examination data with the reduced number of features and the extracted SNP feature data into the generated cardiovascular disease prediction model.

The prediction result is calculated as a probability value for each cardiovascular disease (i.e., it has a value from 0 to 1).

In addition, the cardiovascular disease diagnosis apparatus 100 may be installed in a hospital that provides cardiovascular disease-related care services, or may be implemented as a cloud server or a platform server on the Internet, so that a user can access the cardiovascular disease diagnosis apparatus 100 via a wired/wireless communication network and receive a cardiovascular disease diagnosis service. In that case, the user inputs his or her personal health examination data and genome data, which are needed to provide the service, to the cardiovascular disease diagnosis apparatus 100.

FIG. 2 is a diagram illustrating a method of converting a user's personal health examination data into an image according to an embodiment of the present invention.

As shown in FIG. 2, the user's personal health examination data is an example of health examination data obtained in a general health examination, and includes features (variable names), reference criteria for each feature, and feature values for each year. Features that cannot be expressed numerically, such as smoking and drinking, can also be added.

In addition, the cardiovascular disease diagnosis apparatus 100 converts the user's personal health examination data into a two-dimensional image.

The horizontal axis of the two-dimensional image is defined as a plurality of features shown in the personal health examination data, and the vertical axis is defined as data for each year.

If the value of a feature in a given year falls within the reference range (i.e., the normal range), the corresponding pixel is set to 0; if the value is outside the reference range (i.e., the abnormal range), it is set to 1.

As shown in FIG. 2, if the personal health examination data was measured for 19 features from 2002 to 2013, it covers 12 years and can be converted into an image 19 wide and 12 high. That is, the personal health examination data of each user is converted into a 19 × 12 two-dimensional binary monochrome image.
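
The conversion described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `to_binary_image`, the three example features, and their reference ranges are hypothetical values chosen only to show the thresholding, and they are not clinical criteria.

```python
import numpy as np

def to_binary_image(values, normal_ranges):
    """Convert yearly examination values into a binary image.

    values: array-like of shape (years, features), one row per year.
    normal_ranges: list of (low, high) reference ranges, one per feature.
    Returns a uint8 array with 0 where a value lies inside its reference
    range and 1 where it lies outside.
    """
    values = np.asarray(values, dtype=float)
    low = np.array([r[0] for r in normal_ranges], dtype=float)
    high = np.array([r[1] for r in normal_ranges], dtype=float)
    outside = (values < low) | (values > high)
    return outside.astype(np.uint8)

# Hypothetical example: 3 features measured over 2 years.
# Columns: fasting glucose, systolic blood pressure, total cholesterol.
ranges = [(70, 100), (90, 120), (0, 200)]   # illustrative ranges only
records = [[95, 118, 180],                  # year 1: all within range
           [110, 125, 190]]                 # year 2: first two out of range
image = to_binary_image(records, ranges)
```

With 19 features and 12 years of data, the same function would return an array of shape (12, 19), with one row per year and one column per feature.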

Then, the cardiovascular disease diagnosis apparatus 100 reduces the number of features of the personal health examination data by applying the convolution and pooling techniques of a CNN (convolutional neural network) to the image. The personal health examination data of a plurality of patients and the patients' genome information can then be learned without using every feature of the health examination data, so that cardiovascular diseases can be diagnosed promptly.
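
To illustrate how convolution and pooling shrink the feature count, the sketch below applies one convolution, a ReLU, and one max-pooling step to a 12 × 19 binary image. The 3 × 3 kernel size, the 2 × 2 pooling window, and the random kernel and image are assumptions made for the example; the patent does not specify them.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation,
    which is what CNN layers conventionally compute)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fill
    a complete window are dropped."""
    h2, w2 = x.shape[0] // size, x.shape[1] // size
    x = x[:h2 * size, :w2 * size]
    return x.reshape(h2, size, w2, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.integers(0, 2, size=(12, 19)).astype(float)  # stand-in binary image
kernel = rng.normal(size=(3, 3))                          # untrained, illustrative

conv = conv2d_valid(image, kernel)       # shape (10, 17)
features = max_pool(np.maximum(conv, 0)) # ReLU, then pooling: shape (5, 8)
```

Under these assumptions the 228 input pixels are reduced to 40 feature values; a trained network would learn the kernel weights rather than draw them at random.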

FIG. 3 is a diagram for explaining a method for searching for a protein produced by each gene using gene data of cardiovascular disease according to an embodiment of the present invention.

As shown in FIG. 3, the protein ID information about the gene data can be extracted by connecting to the UniProt database 400 to search for a protein produced by the specific gene for the cardiovascular disease.

For example, when the "MTHFR" gene, which is closely related to hypertension among cardiovascular diseases, is searched, protein ID information can be extracted as shown in FIG. 3 (red dotted line). For Homo sapiens, the protein ID P42898 is retrieved.

In addition, the cardiovascular disease diagnosis apparatus 100 stores the protein ID information for the searched "MTHFR" gene in the database 200.

That is, for each gene closely related to a cardiovascular disease (for example, the hypertension-related gene "MTHFR" or the atherosclerosis-related gene "CD137"), the cardiovascular disease diagnosis apparatus 100 searches for the protein produced by the gene and stores the protein ID information in the database 200 for each cardiovascular disease.

Hereinafter, the process of extracting the SNP position information of the gene data based on the protein found using the gene data for cardiovascular disease will be described with reference to FIGS. 4 and 5.

FIG. 4A is a diagram showing a result of searching the UCSC Known Gene database using protein ID information according to an embodiment of the present invention.

FIG. 4B is a diagram illustrating the schema of the UCSC Known Gene database according to an embodiment of the present invention.

As shown in FIG. 4A, when the UCSC Known Gene database 500 is searched using P42898, the protein ID retrieved for the "MTHFR" gene as described with reference to FIG. 3, the database 500 provides information on the gene in the form of a file that includes the chromosome, the gene start and end positions, and the start and end positions of the exons.

As shown in FIG. 4B, the UCSC Known Gene database 500 provides a schema for the corresponding gene. Interpreting the search result according to this schema shows that the gene occupies the region from 11845786 to 11856547 on chr1 and has 8 exons. The first exon is located between 11845786 and 11850955, the second between 11851263 and 11851363, and so on.

FIG. 5 is a view for explaining a method for extracting SNP feature data from gene data for learning by obtaining SNP location information of gene data of cardiovascular disease subject to an embodiment of the present invention.

As shown in FIG. 5, a gene is composed of exons and introns. Because it is the exons that participate directly in protein production, the cardiovascular disease diagnosis apparatus 100 uses the information shown in FIG. 4 to select the SNPs located in the exons.
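
The selection of SNPs that fall inside exons can be sketched as an interval lookup. The two exon intervals below are the ones quoted for the gene on chr1 in the description of FIG. 4B; the candidate SNP positions and the function name `snps_in_exons` are hypothetical, and the intervals are treated as closed and non-overlapping for simplicity.

```python
from bisect import bisect_right

def snps_in_exons(snp_positions, exons):
    """Return the SNP positions that lie inside any exon.

    exons: list of (start, end) intervals on one chromosome, assumed
    non-overlapping and treated here as closed intervals.
    """
    exons = sorted(exons)
    starts = [start for start, _ in exons]
    selected = []
    for pos in snp_positions:
        # Index of the last exon whose start is <= pos, if any.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and exons[i][0] <= pos <= exons[i][1]:
            selected.append(pos)
    return selected

exons = [(11845786, 11850955), (11851263, 11851363)]        # first two exons
candidate_snps = [11845800, 11851000, 11851300, 11856000]   # hypothetical
selected = snps_in_exons(candidate_snps, exons)
```

Here the second candidate falls in the intron between the two exons and the last one lies beyond the second exon, so only the first and third positions are kept.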

In addition, the cardiovascular disease diagnosis apparatus 100 searches the NCBI dbSNP database 600 for each cardiovascular disease-related gene to locate the SNPs of the gene and obtain its SNP position information.

The located SNP positions are labeled and stored in the database 200. As shown in FIG. 5, the cardiovascular disease diagnosis apparatus 100 labels each SNP position in a form such as (<chr1, 1250>, 1) and stores the labeled result as the reference information (SNP position information) for extracting SNP feature data from the gene data for learning.

When gene data to be used for learning the prediction model is input, the cardiovascular disease diagnosis apparatus 100 generates the final learning data by referring to these labels. For example, it looks up position 1250 of chr1 and sets the value to 0 if the base at that position is the same as in the human reference genome GRCh38, and to 1 otherwise. It then moves to the next labeled position and repeats the comparison against the input data, producing the SNP feature data that serves as the final learning data.

Finally, the SNP feature data extracted from a patient's genome information and used for learning has a form such as (1,0,0,1), (0,0,0,0), or (1,1,1,0).
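
The comparison against the reference genome at each labeled position can be sketched as below. Only the label (<chr1, 1250>, 1) comes from the text; the other three positions, all bases, and the dictionary-based data layout are hypothetical and chosen so that the example reproduces the (1,0,0,1) form. The sketch also assumes every labeled position is present in the sample.

```python
def snp_features(labels, sample, reference):
    """Build a binary SNP feature vector.

    labels: list of ((chromosome, position), index) pairs, index starting at 1.
    sample, reference: dicts mapping (chromosome, position) to a base.
    A feature is 0 if the sample base equals the reference base, else 1.
    """
    features = [0] * len(labels)
    for locus, index in labels:
        features[index - 1] = 0 if sample[locus] == reference[locus] else 1
    return tuple(features)

# Hypothetical labels, reference bases, and one patient's bases.
labels = [(("chr1", 1250), 1), (("chr1", 1300), 2),
          (("chr1", 1420), 3), (("chr1", 1511), 4)]
reference = {("chr1", 1250): "A", ("chr1", 1300): "G",
             ("chr1", 1420): "T", ("chr1", 1511): "C"}
patient = {("chr1", 1250): "G", ("chr1", 1300): "G",
           ("chr1", 1420): "T", ("chr1", 1511): "T"}

vector = snp_features(labels, patient, reference)
```

A sample identical to the reference at every labeled position yields an all-zero vector, which corresponds to the (0,0,0,0) form mentioned above.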

FIG. 6 is a diagram illustrating a learning process according to an embodiment of the present invention.

As shown in FIG. 6, the health examination data used for learning takes the form of two-dimensional binary monochrome images, and the SNP feature data consists of 0s and 1s.

The number of features of the two-dimensional binary monochrome images is reduced using a CNN, a machine learning technique (①), and feature data is generated from the SNP feature data using a restricted Boltzmann machine (RBM) (②).
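
As a rough illustration of how an RBM maps binary SNP features to a smaller feature vector, the sketch below computes the hidden-unit activation probabilities sigmoid(vW + c) for given visible vectors. The layer sizes (8 visible, 3 hidden units) and the random, untrained weights are assumptions; training the RBM (for example by contrastive divergence) is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_hidden_features(v, W, c):
    """Probability that each hidden unit is on, given visible vectors v.

    v: array of shape (batch, n_visible) with 0/1 entries.
    W: weight matrix of shape (n_visible, n_hidden).
    c: hidden bias vector of shape (n_hidden,).
    """
    return sigmoid(v @ W + c)

rng = np.random.default_rng(0)
n_visible, n_hidden = 8, 3                                # illustrative sizes
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))     # untrained weights
c = np.zeros(n_hidden)

snp_batch = np.array([[1, 0, 0, 1, 0, 1, 1, 0],
                      [0, 0, 0, 0, 0, 0, 0, 0]], dtype=float)
hidden = rbm_hidden_features(snp_batch, W, c)             # shape (2, 3)
```

Each 8-element SNP vector is mapped to 3 values between 0 and 1, which is the sense in which the RBM reduces the number of features before they are passed on.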

Next, the cardiovascular disease diagnosis apparatus 100 inputs the feature data generated through processes ① and ② into an FCN (fully connected layer) and outputs a prediction result that combines the health examination data and the gene data. The result is calculated and output as a probability value for each cardiovascular disease using the softmax function.
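
The combination step can be sketched as concatenating the two feature sets, applying one fully connected layer, and normalizing with softmax. The feature shapes, the random untrained weights, and the choice of four output classes (the diseases named earlier in the description) are assumptions for illustration only.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict(cnn_features, rbm_features, W, b):
    """Concatenate both feature sets and apply one fully connected layer."""
    x = np.concatenate([cnn_features.ravel(), rbm_features.ravel()])
    return softmax(x @ W + b)

rng = np.random.default_rng(0)
cnn_features = rng.random((5, 8))   # stand-in for pooled image features
rbm_features = rng.random(3)        # stand-in for RBM hidden features
diseases = ["hypertension", "arteriosclerosis",
            "myocardial infarction", "angina pectoris"]

n_inputs = cnn_features.size + rbm_features.size          # 43 inputs
W = rng.normal(scale=0.1, size=(n_inputs, len(diseases))) # untrained weights
b = np.zeros(len(diseases))

probs = predict(cnn_features, rbm_features, W, b)
```

Each entry of `probs` lies between 0 and 1 and the entries sum to 1, which matches the description of the output as a probability value per disease.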

In addition, the CNN reduces the number of features of the personal health examination data through convolution, ReLU, and pooling. The reduced data is combined with the data obtained by reducing the number of features of the SNP feature data through the RBM, and the combined data is input to the integrated learning unit 163, which performs integrated learning through the FCN.
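The feature reduction performed by convolution, ReLU, and pooling can be illustrated with a small pure-Python sketch. The 5x5 binary image and the 2x2 kernel are hypothetical; in a real CNN the kernel values are learned and many kernels and layers are stacked.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding) 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(m: Matrix) -> Matrix:
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in m]


def max_pool(m: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows or columns that do not fill a
    complete window are dropped."""
    return [
        [
            max(m[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(m[0]) - size + 1, size)
        ]
        for r in range(0, len(m) - size + 1, size)
    ]


# Hypothetical 5x5 binary monochrome image (25 input features).
image = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1],
]
kernel = [[1, -1], [-1, 1]]  # hypothetical filter, not a learned one

conv = conv2d_valid(image, kernel)   # 4x4 feature map
pooled = max_pool(relu(conv))        # 2x2 feature map (4 features)
print(pooled)
```

Here 25 input pixels are reduced to 4 features, which is the kind of dimension reduction the text attributes to the CNN branch.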

The weight values are calculated in the order of (1), (2), (3), (4), (5), and (6). On the other hand, the feature extraction part of the RBM is calculated in advance, regardless of this order.

Also, since the patients of the personal health examination data used for learning have already been diagnosed, it is already known what type of cardiovascular disease each patient was diagnosed with. The weight values between the respective nodes therefore need to be updated so that the learning result yields an accurate diagnosis.

The update is performed using a backpropagation method, which corrects the errors in the order of <1>, <2>, <3>, <4>, <5>, and <6> so that the cardiovascular disease of the patients is predicted accurately.

The backpropagation method is a typical error correction method in machine learning. Given an input value and a target value, the neural network is trained by adjusting the weights between the nodes so that the output value approaches the target value.

To adjust the error, the input is propagated forward from the input node to the output node, the error is detected, and the weights between the nodes are adjusted while the error is propagated back from the output node to the input node.

That is, the cardiovascular disease diagnosis apparatus 100 cyclically learns the health examination data and the gene data, and reflects the learning result of a specific learning stage to the previous learning stage, thereby improving the learning performance and generating an accurate and reliable prediction model.

Thereafter, when the difference between the output value and the target value converges within the designated range, the process of correcting the error through the backpropagation method is terminated and a final cardiovascular disease prediction model is generated.
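The loop of forward propagation, error detection, backward weight adjustment, and termination on convergence can be sketched with a deliberately tiny network: one input, one hidden node, and one output node. The initial weights, learning rate, tolerance, and squared-error loss are illustrative assumptions and are not taken from the embodiment.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(x: float, target: float, lr: float = 0.5, tol: float = 0.01, max_iter: int = 100000):
    """Train an input -> hidden -> output chain by backpropagation.

    Stops as soon as |output - target| < tol, mirroring the convergence
    criterion in the text. All hyperparameters are illustrative assumptions.
    """
    w1, w2 = 0.5, -0.5  # hypothetical initial weights
    for step in range(1, max_iter + 1):
        # Forward pass: input node -> hidden node -> output node.
        h = sigmoid(w1 * x)
        y = sigmoid(w2 * h)
        error = y - target
        if abs(error) < tol:
            return w1, w2, y, step
        # Backward pass: the error flows from the output back toward the input.
        delta_out = error * y * (1.0 - y)               # gradient at the output node
        delta_hidden = delta_out * w2 * h * (1.0 - h)   # chain rule through w2
        w2 -= lr * delta_out * h
        w1 -= lr * delta_hidden * x
    raise RuntimeError("did not converge within max_iter")


w1, w2, y, steps = train(x=1.0, target=0.9)
print(f"output={y:.3f} after {steps} steps")
```

The squared-error loss is used here only to keep the derivative short; a classifier with a softmax output would more commonly pair it with a cross-entropy loss.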

The results of the cardiovascular disease prediction model are expressed as a value between 0 and 1 for each cardiovascular disease. The closer the value is to 1, the more likely the corresponding cardiovascular disease is.

That is, as shown in FIG. 6, if the output result is 0.9 for hypertension, 0.99 for arteriosclerosis, and 0.1 for normal, it is predicted that two types of cardiovascular disease, hypertension and arteriosclerosis, may occur, so that they can be diagnosed early. In addition, a cardiovascular disease can be predicted to occur when its value is equal to or greater than a predetermined value (for example, 0.5), and the prediction result can be provided to the user.
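The thresholding rule in this example is simple enough to express directly. The sketch below uses the three output values from the FIG. 6 example; the function name `predict_diseases` and the choice to exclude the "normal" class from the returned list are assumptions made for illustration.

```python
from typing import Dict, List


def predict_diseases(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the disease labels whose score is at or above the threshold.

    Assumption: the 'normal' class is skipped because it is not a disease.
    """
    return [name for name, s in scores.items() if name != "normal" and s >= threshold]


# Output values from the FIG. 6 example in the text.
output = {"hypertension": 0.9, "arteriosclerosis": 0.99, "normal": 0.1}
predicted = predict_diseases(output)
print(predicted)  # ['hypertension', 'arteriosclerosis']
```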

In addition, when the cardiovascular disease diagnosis apparatus 100 receives query data from a specific user, it predicts the probability that the user will develop a cardiovascular disease by using the generated cardiovascular disease prediction model.

The query data includes personal health examination data of the user and the user's genome data.

Also, the cardiovascular disease diagnosis apparatus 100 converts the personal health examination data of the user into a two-dimensional binary image, and extracts SNP feature data from the user's genome data by referring to the SNP position information that was labeled and stored for the gene data of each cardiovascular disease. The apparatus then inputs the image-converted personal health examination data of the user and the SNP feature data extracted from the user's genome data into the cardiovascular disease prediction model, and provides the user with the cardiovascular disease prediction result.

FIG. 7 is a block diagram illustrating a configuration of a cardiovascular disease diagnosis apparatus according to an embodiment of the present invention.

As shown in FIG. 7, the cardiovascular disease diagnosis apparatus 100 includes a user interface unit 110 for receiving query data from a user, a learning data collection unit 120 for periodically collecting a plurality of health examination data to be learned for generating a cardiovascular disease prediction model and gene data corresponding to the health examination data, a cardiovascular disease gene data collection unit 130 for collecting cardiovascular disease gene data, a health examination data imaging unit 140 for converting the health examination data into images, an SNP extracting unit 150 for extracting SNP position information from the collected cardiovascular disease gene data, a learning unit 160 for generating a cardiovascular disease prediction model, a cardiovascular disease diagnosis unit 170 for inputting the user's query data into the generated cardiovascular disease prediction model and outputting and providing a prediction result of cardiovascular disease to the user, and a control unit 180.

In order to generate a cardiovascular disease prediction model, the cardiovascular disease diagnosis apparatus 100 periodically collects, through the learning data collection unit 120, health examination data and gene data of persons who currently have or previously had a cardiovascular disease, and collects cardiovascular disease gene data through the cardiovascular disease gene data collection unit 130.

In addition, the health examination data and gene data used in the learning can be collected from large domestic and overseas hospitals, governmental agencies (for example, the Hankyung Complex, NHK Industrial Complex), or individuals, and the collected health examination data and gene data are data from which personal information (e.g., resident registration number) has been deleted.

In addition, the health examination data imaging unit 140 converts the periodically collected health examination data into two-dimensional monochrome images by converting the numerical values of the characteristics of the health examination data to values of 0 and 1.

In addition, the cardiovascular disease gene data collection unit 130 accesses the literature database 300 to collect gene data on cardiovascular diseases.

Also, the SNP extracting unit 150 extracts the position information of the SNPs for each gene from the collected gene data, and generates and stores reference data for extracting the SNP feature data from the learning gene data.

Also, the SNP extracting unit 150 extracts SNP feature data for the SNP position information from the learning gene data using the generated reference data.

Meanwhile, the image conversion and the SNP feature data extraction have been described with reference to FIG. 2 to FIG. 5, and a detailed description thereof will be omitted.

The learning unit 160 includes a gene data learning unit 161 that learns the periodically collected gene data for learning, a health examination data learning unit 162 that learns the health examination data for learning, and an integrated learning unit 163 that generates a cardiovascular disease prediction model by integrating the results learned through the gene data learning unit 161 and the health examination data learning unit 162.

Also, the input of the health examination data learning unit 162 is the health examination data for learning converted into two-dimensional binary images, and the dimension of the health examination data is reduced by extracting features from the input health examination data using the CNN technique.

The input of the gene data learning unit 161 is the SNP feature data, and the dimension of the SNP feature data is reduced by extracting features from the input SNP feature data using the RBM technique.
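Dimension reduction of binary SNP feature data with a Restricted Boltzmann Machine can be sketched as computing the hidden-unit activation probabilities, p(h_j = 1 | v) = sigmoid(b_j + Σ_i W_ji v_i), for a given visible vector. The weights and biases below are hypothetical stand-ins for parameters that a real RBM would learn beforehand (for example, by contrastive divergence); training itself is not shown.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def rbm_hidden_probs(
    visible: Sequence[int],
    weights: List[List[float]],
    hidden_bias: List[float],
) -> List[float]:
    """Hidden activation probabilities of a binary RBM for one visible vector.

    weights[j][i] connects visible unit i to hidden unit j.
    """
    return [
        sigmoid(b + sum(w * v for w, v in zip(row, visible)))
        for row, b in zip(weights, hidden_bias)
    ]


# Hypothetical, pre-trained parameters: 4 visible units -> 2 hidden units.
W = [
    [2.0, -1.0, 0.5, 2.0],
    [-1.5, 1.0, 1.0, -1.5],
]
hidden_bias = [-1.0, 0.5]

snp_vector = (1, 0, 0, 1)  # SNP feature data in the form used in the text
hidden = rbm_hidden_probs(snp_vector, W, hidden_bias)
print([round(p, 3) for p in hidden])
```

The four-bit SNP vector is mapped to two hidden activations, which serve as the reduced feature representation passed on to the integrated learning stage.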

In addition, the integrated learning unit 163 integrally learns the dimensionally reduced health examination data and SNP feature data, and finally generates a cardiovascular disease prediction model.

In addition, the learning unit 160 can eliminate errors in the learning steps by using the backpropagation method in order to improve the accuracy of the cardiovascular disease prediction model; a detailed description thereof is omitted since it has already been given above.

After generating the cardiovascular disease prediction model, when query data is input from a user, the cardiovascular disease diagnosis apparatus 100 outputs a cardiovascular disease prediction result through the cardiovascular disease prediction model and provides it to the user.

In addition, the user interface unit 110 provides a user interface for accessing the cardiovascular disease diagnosis apparatus 100 to receive a cardiovascular disease diagnosis service, and receives user query data through the user interface.

The query data of the user includes personal health examination data of the user and the user's genome data. The health examination data imaging unit 140 converts the input health examination data of the user into a two-dimensional binary monochrome image and provides it to the cardiovascular disease diagnosis unit 170.

The SNP extracting unit 150 extracts SNP feature data from the input user's genome data and provides the extracted SNP feature data to the cardiovascular disease diagnosis unit 170.

On the other hand, the user cannot know which cardiovascular disease he or she may be suffering from. Therefore, SNP feature data for each cardiovascular disease is extracted from the user's genome data using the stored SNP position information of the gene data for each cardiovascular disease.

Also, the SNP extracting unit 150 compares the data at the positions given by the SNP position information in the user's genome data with the human reference genome data, sets the SNP feature data to 0 if they are the same and to 1 if they are different, and provides the result to the cardiovascular disease diagnosis unit 170.

In addition, the cardiovascular disease diagnosis unit 170 inputs the user's personal health examination data converted into image form and the SNP feature data extracted from the user's genome data into the cardiovascular disease prediction model, outputs the prediction result of the user's cardiovascular disease, and provides it to the user.

In addition, the control unit 180 controls the learning using the gene data and the health examination data, and controls the overall operation of the cardiovascular disease diagnosis apparatus 100, including the flow of data between the constituent parts of the cardiovascular disease diagnosis apparatus 100.

FIG. 8 is a flowchart illustrating a procedure for labeling and storing SNP location information for each cardiovascular disease gene data according to an embodiment of the present invention.

As shown in FIG. 8, in the procedure of labeling and storing SNP position information for each cardiovascular disease gene data, at least one target cardiovascular disease gene is first determined by searching the literature database 300 (S110).

Next, a protein generated by the determined target cardiovascular disease gene is searched (S120).

The search is performed by inputting the gene into the UniProt database 400 and extracting ID information about the protein generated by the gene.

Next, the cardiovascular disease diagnosis apparatus 100 acquires positional information on the SNP of the target cardiovascular disease gene using the ID information of the searched protein (S130).

The location information for the SNP is obtained from the UCSC Known Gene database 500.

Next, the cardiovascular disease diagnosis apparatus 100 compares the acquired SNP location information of each gene with the dbSNP information for each gene searched from the NCBI dbSNP database 600 (S140).

If the dbSNP information is included in the location information for each gene (S150), the SNP location information for each gene is stored in the database 200 (S160).

That is, the cardiovascular disease diagnosis apparatus 100 compares the position information for the corresponding gene acquired from the UCSC Known Gene database 500 with the dbSNP information of the corresponding gene stored in the NCBI dbSNP database 600, and extracts only the SNP position information whose position matches the dbSNP information.

SNP location information for each gene, which is labeled and stored in the database 200, is reference data for generating SNP feature data by extracting SNP location information from gene data used for learning.
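One way to read steps S140 to S160 is as a filter that keeps, for each target gene, only the dbSNP positions that fall inside the gene's positional range. The sketch below follows that reading, which is an interpretation rather than something the text states explicitly. The gene names, coordinates, and in-memory data structures are hypothetical; no real UCSC or NCBI query is performed.

```python
from typing import Dict, List, Set, Tuple

Position = Tuple[str, int]            # (chromosome, coordinate)
Region = Tuple[str, int, int]         # (chromosome, start, end), both ends inclusive


def label_snp_positions(
    gene_regions: Dict[str, Region],
    dbsnp_positions: Set[Position],
) -> Dict[str, List[Position]]:
    """For each gene, keep only the known SNP positions inside the gene's region."""
    labeled: Dict[str, List[Position]] = {}
    for gene, (chrom, start, end) in gene_regions.items():
        labeled[gene] = sorted(
            (c, p) for (c, p) in dbsnp_positions if c == chrom and start <= p <= end
        )
    return labeled


# Hypothetical gene regions (standing in for positional information per gene).
gene_regions = {"GENE_A": ("chr1", 1000, 2000), "GENE_B": ("chr2", 50, 100)}
# Hypothetical known SNP positions (standing in for dbSNP records).
dbsnp = {("chr1", 1250), ("chr1", 2500), ("chr2", 77), ("chr2", 90), ("chr3", 10)}

reference_data = label_snp_positions(gene_regions, dbsnp)
print(reference_data)
```

The resulting per-gene position lists play the role of the reference data stored in the database 200 and consulted when SNP feature data is generated.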

FIG. 9 is a flowchart illustrating a procedure for diagnosing cardiovascular disease for a user based on query data input from a user according to an embodiment of the present invention.

As shown in FIG. 9, when query data including personal health examination data and genome data is received from a user, the cardiovascular disease diagnosis apparatus 100 converts the input personal health examination data of the user into a two-dimensional monochrome image (S210).

The horizontal axis and the vertical axis of the monochrome image represent time and the numerical value of the characteristic, respectively, and the numerical values of the characteristic are converted to have values of 0 and 1.
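A possible rasterization consistent with this description is sketched below: each column is a point in time, each row is a quantized value level, and a pixel is 1 where the measurement falls. The quantization scheme, the value range, the image height, and the sample readings are all assumptions for illustration; the actual conversion is the one described with reference to FIG. 2 to FIG. 4.

```python
from typing import List


def series_to_binary_image(values: List[float], lo: float, hi: float, height: int) -> List[List[int]]:
    """Rasterize one measurement series into a height x len(values) binary image.

    Column t holds a single 1 in the row matching the quantized value at time t.
    Row 0 is the top of the image (largest values). Values outside [lo, hi]
    are clipped to the nearest bound.
    """
    image = [[0] * len(values) for _ in range(height)]
    for t, v in enumerate(values):
        clipped = min(max(v, lo), hi)
        level = int((clipped - lo) / (hi - lo) * (height - 1) + 0.5)  # 0 .. height-1
        image[height - 1 - level][t] = 1
    return image


# Hypothetical systolic blood pressure readings from four successive checkups.
readings = [105, 125, 140, 158]
image = series_to_binary_image(readings, lo=100, hi=160, height=4)
for row in image:
    print(row)
```

With these rising readings the 1-pixels trace a diagonal from the bottom-left to the top-right of the image, which is the kind of trend a convolutional filter can pick up.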

Next, the cardiovascular disease diagnosis apparatus 100 extracts SNP feature data from the user's genome data (S220).

The SNP feature data is extracted by referring to the SNP position information of the genes for each cardiovascular disease and comparing the data at each such position in the user's genome data with the data at the corresponding position in the human reference genome data.

Next, the cardiovascular disease diagnosis apparatus 100 inputs the imaged personal health examination data and SNP characteristic data to the cardiovascular disease prediction model, and outputs the result to the user (S230).

The result is provided as a probability value for each cardiovascular disease. When a probability value is equal to or greater than a predetermined value, the apparatus diagnoses that the corresponding cardiovascular disease may occur and provides the diagnosis to the user.

Meanwhile, the cardiovascular disease prediction model reduces the number of features by applying the CNN technique to the input personal health examination data and the RBM technique to the SNP feature data, so that cardiovascular disease can be diagnosed quickly.

As described above, the apparatus and method for diagnosing cardiovascular diseases using genome information and health examination data differ from conventional techniques that diagnose cardiovascular disease using only health examination data: they diagnose cardiovascular diseases by using both genome information and health examination data on cardiovascular diseases, thereby providing a more accurate and reliable diagnosis result.

Further, by using only the minimum information (SNP feature data) of the genome information, reducing the number of features of the health examination data, and learning the SNP feature data and the feature-reduced health examination data to generate a cardiovascular disease prediction model, an accurate diagnosis result can be provided to the user.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments; on the contrary, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the present invention.

100: cardiovascular disease diagnosis apparatus 110: user interface unit
120: learning data collection unit 130: cardiovascular disease gene data collection unit
140: health examination data imaging unit 150: SNP extracting unit
160: learning unit 161: gene data learning unit
162: health examination data learning unit 163: integrated learning unit
170: cardiovascular disease diagnosis unit 180: control unit
200: database

Claims (14)

1. An apparatus for diagnosing cardiovascular disease, comprising:
a gene data learning unit that learns using a plurality of gene data;
a health examination data learning unit that learns using a plurality of health examination data; and
an integrated learning unit that integrally learns the learning results of the gene data and the health examination data and generates a prediction model.
2. The apparatus according to claim 1,
wherein the integrated learning unit and the health examination data learning unit
improve learning performance by learning cyclically and reflecting the learning result of a specific learning stage to a previous learning stage.
3. The apparatus according to claim 1,
wherein the gene data learning unit
extracts SNP feature data from the plurality of gene data and learns the extracted SNP feature data.
4. The apparatus according to claim 1,
wherein the health examination data learning unit
converts the plurality of health examination data into two-dimensional binary images so that the characteristic numerical values of the plurality of health examination data have values of 0 and 1, and learns the plurality of health examination data converted into the two-dimensional binary images.
5. The apparatus according to claim 3,
further comprising
an SNP extracting unit that collects gene data for each cardiovascular disease and extracts SNP position information for each of the collected gene data,
wherein the SNP feature data is generated by referring to the extracted SNP position information.
6. The apparatus according to claim 5,
further comprising
a user interface unit that receives query data including a user's personal health data and gene data,
wherein the apparatus converts the input personal health data of the user into a two-dimensional binary image and extracts SNP feature data from the genome data of the user by referring to the stored SNP position information.
7. The apparatus according to claim 6,
further comprising
a cardiovascular disease predicting unit that inputs the user's personal health data converted into the two-dimensional binary image and the extracted SNP feature data into the generated prediction model and outputs a diagnosis result for each cardiovascular disease.
8. A method for diagnosing cardiovascular disease, comprising:
a gene data learning step of learning using a plurality of gene data;
a health examination data learning step of learning using a plurality of health examination data; and
an integrated learning step of integrally learning the learning results of the gene data and the health examination data and generating a prediction model.
9. The method of claim 8,
wherein the integrated learning step and the health examination data learning step
improve learning performance by learning cyclically and reflecting the learning result of a specific learning stage to a previous learning stage.
10. The method of claim 8,
wherein the gene data learning step comprises
extracting SNP feature data from the plurality of gene data and learning the extracted SNP feature data.
11. The method of claim 8,
wherein the health examination data learning step comprises
converting the plurality of health examination data into two-dimensional binary images so that the characteristic numerical values of the plurality of health examination data have values of 0 and 1, and learning the plurality of health examination data converted into the two-dimensional binary images.
12. The method of claim 10,
further comprising
an SNP extracting step of collecting gene data for each cardiovascular disease and extracting SNP position information for each of the collected gene data,
wherein the SNP feature data is generated by referring to the extracted SNP position information.
13. The method of claim 12,
further comprising
a user query data input step of receiving query data including a user's personal health data and gene data,
wherein the input personal health data of the user is converted into a two-dimensional binary image, and SNP feature data is extracted from the gene data of the user with reference to the stored SNP position information.
14. The method of claim 13,
further comprising
inputting the user's personal health data converted into the two-dimensional binary image and the extracted SNP feature data into the generated prediction model, and outputting a diagnosis result for each cardiovascular disease.
KR1020170012278A 2016-11-30 2017-01-25 Apparatus and method for diagnosing cardiovascular disorders using genome information and health medical examination data KR20180062917A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/808,476 US20180150608A1 (en) 2016-11-30 2017-11-09 Device and method for diagnosing cardiovascular disease using genome information and health medical checkup data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020160161029 2016-11-30
KR20160161029 2016-11-30

Publications (1)

Publication Number Publication Date
KR20180062917A true KR20180062917A (en) 2018-06-11

Family

ID=62601187

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020170012278A KR20180062917A (en) 2016-11-30 2017-01-25 Apparatus and method for diagnosing cardiovascular disorders using genome information and health medical examination data

Country Status (1)

Country Link
KR (1) KR20180062917A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109545277A (en) * 2018-11-21 2019-03-29 广州市康健基因科技有限公司 A kind of methods of marking and system of scd gene mutation point
KR20200018341A (en) * 2018-08-09 2020-02-19 순천향대학교 산학협력단 Apparatus and method for facial reproduction using genetic information
KR102102848B1 (en) * 2019-06-12 2020-04-22 주식회사 프로카젠 Prostate cancer risk score calculator, and method of the above calculator
KR20220086968A (en) * 2020-12-17 2022-06-24 다윈그룹(주) Artificial Intelligence-based Predictive Medical Information Service System and Its Method
WO2024063455A1 (en) * 2022-09-23 2024-03-28 재단법인 아산사회복지재단 Method and device for predicting initial appropriate dose of anticoagulant using machine learning technology
