CN109599157A - A precision intelligent diagnosis and treatment big data system - Google Patents
- Publication number
- CN109599157A (application number CN201811444715.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- patient
- drug response
- marker
- module
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/10—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Medicinal Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Chemical & Material Sciences (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
Abstract
The present invention relates to a precision intelligent diagnosis and treatment big data system comprising: a centralized data management module, which centrally manages the clinical electronic health record data and omics data of multiple medical institutions; a data preprocessing module, which preprocesses the centrally managed data and builds a relationship dependency network based on biological features; a marker extraction module, which extracts patients' characteristic genes from the preprocessed data to obtain a marker set; a subtype classification module, which classifies patients into subtypes and determines the group to which each patient belongs; and a drug response prediction module, which builds a drug response prediction model and uses it to predict patients' responses to different drugs. Compared with the prior art, the invention achieves effective management of medical data and drug response prediction, making diagnosis and treatment intelligent.
Description
Technical field
The present invention relates to the field of big data technology, and in particular to a precision intelligent diagnosis and treatment big data system.
Background art
In 2015, China had 4.292 million new cancer cases and 2.814 million cancer deaths, accounting for 22% and 27% of the world totals, respectively, and causing a huge social burden and economic loss. Lung cancer and breast cancer are the cancer types with the largest numbers of male and female patients in China, respectively. Because cancer and similar diseases are heterogeneous and variable, cancer drugs are effective in only about 25% of cases, and individualized precision medicine has become the only way to make further progress against cancer.
"Precision medicine" refers to a customized medical model that tailors measurement and treatment to each patient based on relevant information such as the individual's genome, combined with the proteome, the metabolome and the environment, so as to maximize therapeutic effect and minimize side effects. Advances in modern genomics give the pharmaceutical industry the latest genetic and molecular foundations of disease pathology and provide technical support for developing highly effective drugs and for personalized medicine. In oncology in particular, unlike the conventional approach of classifying patients and formulating treatment plans from histological examination of the tumor, new molecular detection methods precisely measure an individual's genes, proteins, signal transduction and cancer-cell mutations, so that the course of the disease can be determined better and the most effective treatment recommended. In the long run, personalized precision medicine can predict the risk of latent disease through more accurate diagnosis, provide more effective and better-targeted treatment, prevent certain diseases and reduce treatment costs.
Comprehensive large-scale population genomics research, accurate and timely molecular marker detection, individualized precise diagnosis that combines complex clinical features with multi-omics features, and targeted drug development aimed at specific molecular pathological mechanisms are the key links of precision medicine, while bioinformatics and big data technology form the backbone of the whole precision medicine system. The complexity of disease treatment at the molecular-biological scale far exceeds what traditional medical statistics and clinical pathway guidelines can characterize, and to some extent even challenges a diagnostic mode that relies mainly on physicians' experience. From the discovery and optimization of molecular markers, to the construction of disease diagnosis and drug-effect prediction models, to the selection of targeted therapeutic drugs and the development of new drug targets, biological-intelligence assisted diagnosis and treatment technologies built on omics big data and knowledge engineering are important tools for achieving precision medicine. In the special program on major research and development for precision medicine recently issued by the Ministry of Science and Technology, "construction of precision medicine big data application technology and a sharing platform" is listed as one of eight major tasks, showing that the industry now agrees on the importance of a powerful biological big data and bioinformatics support platform for precision medicine.
Building a precision medicine data support platform faces three major challenges: how to overcome the high heterogeneity and dispersion of medical data so that clinical data can be shared and fused effectively across multiple medical institutions; how to screen effective markers and model features from the massive number of human genome features and the relatively limited number of patient samples, so as to classify patients precisely and match and assess treatment plans at the molecular-biology level; and how to overcome the computational complexity of massive high-dimensional features, fully mine and establish disease-drug-genome association rules, and effectively predict the effect of treatment medication.
Subtype classification of complex diseases such as cancer is a core task of precision medicine. Traditional subtype classification is based mainly on histological specificity and has significant clinical limitations; in particular, stratified treatment of end-stage patients often performs poorly. With the spread of high-throughput experiments, scientists have begun to re-classify cancers on the basis of the genome, transcriptome and epigenome. Large-scale genome projects such as TCGA have collected the molecular and genetic characteristics of more than ten thousand tumor samples across many cancer types, signaling that fine-grained classification of cancer patients is entering an era of major change. Because cancer tissue is a heterogeneous, dynamic system that mutates continually, existing research has shown that molecular and genetic subtyping cannot be limited to static classification of a small number of samples; accurate diagnostic results require dynamic analysis of large numbers of patient samples. Novel big-data bioinformatics software packages are therefore needed to address the following challenges: integrating and sharing clinical data; representing such heterogeneous relationships in feature and data spaces with clear biological and clinical meaning; screening these high-dimensional feature spaces to measure the strength and understand the attributes of these relationships; classifying cancer subtypes; assessing drug effects; and developing personalized treatment prediction models that use the acquired knowledge to serve individualized treatment.
Assessing and predicting medication effect for individual patients is another key challenge of precision medicine. Although the development of targeted drugs has greatly improved the individual specificity of medication, the separation of drug development from diagnosis and treatment under the existing medical system keeps the clinical population involved in drug development very limited, so a drug's effect in large populations after launch often differs considerably from its effect in the trial stage. The necessary path to precision medicine is to derive, from associating large amounts of clinical medication data with genetic data, the latent molecular mechanisms linking genotypic and phenotypic features to individual differences in drug response and to cancer prognosis, and then to build prediction models that optimize clinical diagnosis and treatment according to each patient's characteristics. Exponentially growing biomedical big data provides a wealth of detail on differences in cancer patients' drug sensitivity, and extracting this information makes it easy to analyze drug efficacy from multiple angles. Details on medication suitability and patterns of clinical effect provide very valuable information for standardizing clinical use in hospitals and for product updates and iteration by pharmaceutical manufacturers.
Clinical electronic health records (EHRs) document in detail the clinical symptoms and medical test results throughout the course of a patient's disease, and they are an important link for associating genomic data with clinical diagnosis and treatment and for obtaining precise diagnosis and treatment data. However, the EHRs of existing medical systems commonly suffer from scattered records, inconsistent formats and difficulty of sharing, which makes the "information island" problem severe. In addition, information mining is generally shallow, so much useful information in EHR data cannot be fully extracted and is wasted. Finally, the information systems of most hospitals focus mainly on medical business management and give insufficient support to research use; in particular, clinical medical databases rarely provide comprehensive search functions and rarely use medicine's own terminology to extract information in structured form. These problems limit the use of clinical medical databases in clinical decision support systems and clinical trial systems.
Summary of the invention
It is an object of the present invention to overcome the above-mentioned drawbacks of the prior art and to provide a precision intelligent diagnosis and treatment big data system.
The purpose of the present invention can be achieved through the following technical solutions:
A precision intelligent diagnosis and treatment big data system, the system comprising:
Centralized data management module: centrally manages the clinical electronic health record data and omics data of multiple medical institutions;
Data preprocessing module: preprocesses the centrally managed data and builds a relationship dependency network based on biological features;
Marker extraction module: extracts patients' characteristic genes from the preprocessed data to obtain a marker set;
Subtype classification module: performs subtype classification of patients and determines the group to which each patient belongs;
Drug response prediction module: builds a drug response prediction model and uses it to predict each patient's response to different drugs.
Based on the i2b2 and SCILHS/SHRINE data sharing systems, the centralized data management module performs dynamic extraction, dynamic fusion and dynamic dataset generation on the clinical EHR data and omics data of multiple medical institutions, thereby completing centralized data management.
The relationship dependency network based on biological features is a three-dimensional heterogeneous graph over patients, cell lines and drugs.
The marker set includes molecular, cellular, intracellular, clinical and demographic features and events.
The subtype classification module performs subtype classification with the H-cube algorithm.
The H-cube algorithm performs subtype classification as follows:
(1) compute the patient's marker G-Score values and generate a universal marker set, where the G-Score indicates the degree of enrichment of a marker in a gene set;
(2) perform hashing based on the marker G-Score values and the generated universal marker set;
(3) build a Hasse diagram from the hashing result;
(4) perform biclustering by searching the Hasse diagram with fuzzy matching, completing patient subtype classification.
The drug response prediction model is based on a patient-cell line-drug response three-dimensional cluster graph.
This drug response prediction model performs the following prediction process:
(1) analyze drug responses with a signature-guided non-negative matrix factorization algorithm, identifying cell lines and drugs according to their responses to different drugs;
(2) map each patient to a suitable cell line based on cancer metastasis survival time;
(3) find and select each patient's signatures with exhaustive-search support vector machines and determine the patient's drug response.
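Step (1) relies on non-negative matrix factorization (NMF). The patent does not specify its signature-guided variant, so the following is only a minimal sketch of plain NMF using the standard Lee-Seung multiplicative updates on a hypothetical non-negative cell line × drug response matrix. The signature guidance is omitted, and assigning each cell line and drug to the factor with its largest loading is an assumption about how the cell line/drug modules might be read off.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) as W (n x rank) @ H (rank x m) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        # H <- H * (W^T V) / (W^T W H), element-wise
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        ht = transpose(h)
        # W <- W * (V H^T) / (W H H^T), element-wise
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


def frob_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def modules(w, h):
    """Assign each cell line (row of W) and drug (column of H) to its largest-loading factor."""
    cell_mod = [max(range(len(row)), key=row.__getitem__) for row in w]
    drug_mod = [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]
    return cell_mod, drug_mod


# Hypothetical cell line x drug response matrix (rows: cell lines, columns: drugs).
v = [[2.0, 2.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 3.0],
     [0.0, 0.0, 1.0, 1.0]]
w, h = nmf(v, rank=2)
print(modules(w, h))
```

With `rank` set to the expected number of modules, cell lines and drugs that share a factor form a candidate module; a production version would use a vectorized library and add the signature constraints the patent mentions.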
Compared with the prior art, the present invention has the following advantages:
(1) The system enables medical data sharing under the SHRINE framework and achieves big data management;
(2) Through knowledge discovery of disease-feature markers from clinical and omics data and knowledge representation of big data, and driven by analysis of large amounts of online data, the system discovers disease-marker relationship network knowledge in a high-dimensional, heterogeneous biomedical data environment, laying a foundation for precise assisted diagnosis and treatment of disease;
(3) The invention obtains accurate disease subtype taxonomies through supervised and unsupervised deep learning and, taking into account heterogeneous cancer cells, dynamic variation, and multi-gene and drug interactions, builds a patient-cell line-drug three-way knowledge structure that enables accurate prediction of drug effects for patients.
Brief description of the drawings
Fig. 1 is a structural block diagram of the precision intelligent diagnosis and treatment big data system of the present invention.
In the figure, 1 is the centralized data management module, 2 is the data preprocessing module, 3 is the marker extraction module, 4 is the subtype classification module, and 5 is the drug response prediction module.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawing and a specific embodiment.
Embodiment
As shown in Fig. 1, a precision intelligent diagnosis and treatment big data system comprises:
Centralized data management module 1: centrally manages the clinical electronic health record data and omics data of multiple medical institutions;
Data preprocessing module 2: preprocesses the centrally managed data and builds a relationship dependency network based on biological features;
Marker extraction module 3: extracts patients' characteristic genes from the preprocessed data to obtain a marker set;
Subtype classification module 4: performs subtype classification of patients and determines the group to which each patient belongs;
Drug response prediction module 5: builds a drug response prediction model and uses it to predict each patient's response to different drugs.
I. Centralized data management module 1
Based on the i2b2 and SCILHS/SHRINE data sharing systems, centralized data management module 1 performs dynamic extraction, dynamic fusion and dynamic dataset generation on the clinical EHR data and omics data of multiple medical institutions, thereby completing centralized data management.
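The patent names i2b2 and SCILHS/SHRINE but gives no interface details, so the sketch below does not use either system's API. It only illustrates, with hypothetical site names and field mappings, what fusing records from several institutions into one shared schema and then generating a study dataset on demand could look like.

```python
# Hypothetical mapping from each site's local field names to a shared schema.
SITE_SCHEMAS = {
    "site_a": {"pt_id": "patient_id", "dx": "diagnosis", "age_yrs": "age"},
    "site_b": {"PatientID": "patient_id", "Diagnosis": "diagnosis", "Age": "age"},
}


def harmonize(site, record):
    """Rename one site's record into the shared schema and tag its origin."""
    mapping = SITE_SCHEMAS[site]
    out = {common: record[local] for local, common in mapping.items() if local in record}
    # Prefix the ID with the site so patients from different institutions never collide.
    out["patient_id"] = f"{site}:{out['patient_id']}"
    out["source_site"] = site
    return out


def fuse(site_records):
    """Merge the records of all sites into one list in the shared schema."""
    fused = []
    for site, records in site_records.items():
        fused.extend(harmonize(site, r) for r in records)
    return fused


def build_dataset(fused, predicate):
    """Generate a study dataset on demand by filtering the fused records."""
    return [r for r in fused if predicate(r)]


sites = {
    "site_a": [{"pt_id": "1", "dx": "breast cancer", "age_yrs": 52}],
    "site_b": [{"PatientID": "1", "Diagnosis": "lung cancer", "Age": 61}],
}
cohort = build_dataset(fuse(sites), lambda r: r["diagnosis"] == "breast cancer")
print([r["patient_id"] for r in cohort])  # ['site_a:1']
```

Keeping the mapping as data rather than code means a new institution can be added by registering its field names, which fits the "dynamic" extraction and fusion the module describes.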
This embodiment involves the TCGA database, the Wake Forest University (WFU) clinical breast cancer dataset and the MDACC (MD Anderson Cancer Center) dataset, and fuses the data from these databases.
TCGA database: the TCGA project is a major U.S. national project whose goal is to characterize clinical tumor samples comprehensively using population information, clinical records and the latest biotechnologies. This embodiment focuses on all published high-level breast cancer, lung cancer and oral cancer datasets (data that has been averaged, segmented, annotated and described, or cross-correlated against the raw data), including somatic mutation, DNA methylation, gene copy number variation, DNA-Seq, mRNA-Seq, miRNA-Seq, mRNA microarray, demographic, clinical diagnosis, treatment and follow-up records. The multi-source heterogeneous biological data of individual patients will be used for drug repositioning, personalized medicine optimization and drug discovery.
Wake Forest University (WFU) clinical breast cancer dataset: this clinical dataset covers 1954 breast cancer patients belonging to more than 15 cohorts, with 10 years of care history. It specifically includes: 1) gene expression profiles measured on the Affymetrix U133 GeneChip microarray platform; 2) clinical diagnosis records, including receptor status, lymph node status, tumor size and histological grade; 3) treatment records, including treatment type (surgery, adjuvant hormone therapy, adjuvant chemotherapy); 4) prognosis records, including the times and events of distant metastasis-free survival (DMFS); 5) demographic records (mainly patient age). Based on this dataset, a prototype for disease-feature and marker extraction has been developed and the personalized medicine method further improved; the dataset will also serve as a control group for the TCGA dataset.
Public breast cancer brain metastasis datasets: the Salhia 2014 dataset contains mRNA microarray (GEO:GSE5260), methylation (Figshare:862978) and copy number change (Figshare:855629) data for 35 breast cancer brain metastasis cases. The Silva 2010 dataset contains mRNA microarray data (GEO:GSE14690), somatic mutations, and the clinicopathological features of 39 primary breast cancers and their matched brain metastases. The Duchnowska 2015 HER2+ dataset contains 89 brain metastasis tumors and 70 controls. In total, 153 breast cancer brain metastasis cases are included.
II. Data preprocessing module 2
The relationship dependency network based on biological features built by this module is a three-dimensional heterogeneous graph over patients, cell lines and drugs.
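One way to hold such a three-dimensional heterogeneous graph in memory is a typed adjacency map. The sketch below is illustrative only: node names and weights are hypothetical, and it assumes that edges link entities of different types (for example patient-cell line similarity or cell line-drug response), which the patent does not state explicitly.

```python
from collections import defaultdict


class HeteroGraph:
    """Weighted undirected graph with typed nodes: patient, cell_line or drug."""

    NODE_TYPES = {"patient", "cell_line", "drug"}

    def __init__(self):
        self.node_type = {}
        self.adj = defaultdict(dict)  # node -> {neighbor: edge weight}

    def add_node(self, name, ntype):
        if ntype not in self.NODE_TYPES:
            raise ValueError(f"unknown node type: {ntype}")
        self.node_type[name] = ntype

    def add_edge(self, u, v, weight=1.0):
        # Assumption of this sketch: only cross-type relations are stored.
        if self.node_type[u] == self.node_type[v]:
            raise ValueError("edges must connect different node types")
        self.adj[u][v] = weight
        self.adj[v][u] = weight

    def neighbors(self, node, ntype=None):
        """Neighbors of a node, optionally restricted to one node type."""
        return {n: w for n, w in self.adj[node].items()
                if ntype is None or self.node_type[n] == ntype}


g = HeteroGraph()
g.add_node("P1", "patient")
g.add_node("CL1", "cell_line")
g.add_node("DrugA", "drug")
g.add_edge("P1", "CL1", 0.8)      # hypothetical patient-cell line similarity
g.add_edge("CL1", "DrugA", 0.6)   # hypothetical cell line-drug response
print(g.neighbors("CL1", "drug"))  # {'DrugA': 0.6}
```

Filtering neighbors by type makes typical queries, such as "which drugs does this cell line respond to", a single lookup.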
Large clinical datasets show a typical locally dense, globally sparse data pattern and cover different data types. To illustrate this property, this embodiment selects a small feature set (about 200 features) for a preliminary feasibility study, which corresponds to 44 features. Of these features, 8 are molecular markers and the rest are demographic or diagnostic features (see the color-bar labels on the right). These features belong to four data types: numeric, ordinal, nominal and binary. The dataset therefore illustrates the heterogeneity of the features well.
Feature-relationship pairing based on bootstrapping and joint learning: this mechanism addresses associations between variables of different types (numeric, binary, ordinal and nominal) and the correlation problems caused by differing sample sizes. The degree of association between two features is measured together with a data rate, defined as the proportion of all records that have values in both features; the sample size available for computing a correlation therefore varies markedly across feature combinations. This embodiment uses methods suited to each specific data type to describe the overall association strength between different concepts, applying five different correlation measures to 10 groups of mixed data, each of which contains all 4 data types.
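A minimal sketch of the data rate and of a bootstrap estimate of association for one pair of numeric features follows. The patent does not name its five correlation measures; Pearson correlation is used here only as an example for the numeric-numeric case, and other type combinations (ordinal, nominal, binary) would need their own measures. Missing values are assumed to be encoded as `None`.

```python
import random
import statistics


def data_rate(x, y):
    """Proportion of records in which both features are observed (None means missing)."""
    both = sum(1 for a, b in zip(x, y) if a is not None and b is not None)
    return both / len(x)


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either side has zero variance."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def bootstrap_association(x, y, n_boot=200, seed=0):
    """Mean Pearson r over bootstrap resamples of the co-observed records, plus the data rate."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    rate = data_rate(x, y)
    if len(pairs) < 2:
        return 0.0, rate
    rng = random.Random(seed)
    rs = []
    for _ in range(n_boot):
        # Resample the co-observed pairs with replacement.
        sample = [rng.choice(pairs) for _ in pairs]
        a, b = zip(*sample)
        rs.append(pearson(a, b))
    return statistics.fmean(rs), rate


x = [1.0, 2.0, 3.0, 4.0, None, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, None]
r, rate = bootstrap_association(x, y)
print(round(rate, 3))  # 0.667
```

Reporting the data rate next to the association makes it visible when a strong correlation rests on only a small fraction of the records, which is the sample-size problem described above.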
III. Marker extraction module 3
Simply put, a feature or marker (signature) is a characteristic gene, or a characteristic gene set within a signaling pathway. The present invention proposes building a "signatome", a set of features or markers that reflects current understanding of biological systems. The signatome consists of a representative feature knowledge base covering molecular, cellular, intracellular, clinical and demographic features and events. It therefore provides a unified "knowledge space" that serves as a new measurement criterion for describing the up-to-date knowledge in data samples systematically and quantitatively. The features used by the signatome come from three databases: the MSigDB molecular signature collection, and the DrugSig and pLINDAW signature databases. The signatome is highly extensible: up-to-date knowledge can be integrated into it continually. These features represent current understanding of biomedical systems at the molecular level. Their gene sets will be used for gene set enrichment analysis (GSEA) of omics data to determine the importance of the corresponding features or markers in clinical samples.
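GSEA scores a gene set by walking down a ranked gene list and tracking a running sum that rises at genes in the set and falls at genes outside it. The sketch below computes that weighted running-sum enrichment score for one gene set. Treating it as a stand-in for the G-Score used later by H-cube is an assumption, since the patent does not give the G-Score formula; permutation-based significance testing is also omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score in the style of GSEA.

    ranked_genes: genes sorted by decreasing score.
    scores: the matching ranking metric (e.g. correlation with a phenotype).
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n, n_hit = len(ranked_genes), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if norm == 0:
        raise ValueError("genes in the set all have a zero score")
    miss_step = 1.0 / (n - n_hit)
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        # Step up by the gene's weight if it is in the set, otherwise step down uniformly.
        running += abs(s) ** p / norm if h else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


genes = ["A", "B", "C", "D"]
scores = [4.0, 3.0, 2.0, 1.0]
print(round(enrichment_score(genes, scores, {"A", "B"}), 3))  # 1.0
print(round(enrichment_score(genes, scores, {"C", "D"}), 3))  # -1.0
```

A set concentrated at the top of the ranking gets a positive score and one concentrated at the bottom a negative score, which is the sense in which a marker is "enriched" in the data.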
MSigDB features are a large collection of biological signature gene sets and marker sets that the Broad Institute of MIT and Harvard maintains for clinical and research use, known as the Molecular Signatures Database (MSigDB 3.0). The database contains 10295 signatures in total, which have been used in earlier studies in the form of gene sets. They include: 1) gene sets based on chromosomal position; 2) curated gene sets, for example from chemical and genetic perturbations and canonical pathways, with the BioCarta, KEGG and REACTOME gene sets classified as the main gene sets; 3) target genes of microRNAs and transcription factors; 4) computationally derived gene sets, including cancer gene neighborhoods and cancer modules; 5) GO gene sets related to biological process, cellular component and molecular function; 6) oncogenic signatures generated from NCBI GEO microarray data; 7) immunologic signatures produced by the Human Immunology Project Consortium (HIPC).
This embodiment has preliminarily built the signature databases DrugSig and pLINDAW. They contain breast cancer drug signatures; potential drug targets; signatures of various chemicals computed from the NIH LINCS project; breast cancer metabolic signatures; molecular signatures in common use for breast cancer, such as PAM50, Oncotype DX™ (a 21-gene signature), the 70-gene signature and the Rotterdam Signature (a 76-gene signature); and other signatures validated in the literature. TCGA methylation signatures describe how DNA methylation regulates gene function, and copy number variation signatures and mutation signatures are usually very important hallmarks of cancer. The marker set includes molecular, cellular, intracellular, clinical and demographic features and events.
IV. Subtype classification module 4
Patient subtype classification aims to divide patients into groups and then provide each group with personalized, patient-oriented medical services. Traditional clustering algorithms usually determine a few subtypes from a small number of patient features and typically use minimal overlap between subtypes as the objective function. Such "coarse" subtypes lack the characteristics needed to distinguish important differences between patients. Rapid research progress and the clinical practice of personalized medicine require finer and richer subtypes to optimize disease treatment and patient monitoring. To meet this clinical demand, this embodiment uses the H-cube algorithm for subtype classification; it can systematically identify, at a fine scale and with many candidate markers of different natures, the similar features shared by patient subgroups.
The H-cube algorithm needs to address the following: (1) discovering "patient-marker" patterns: because some markers are carried only by certain patients rather than by all of them, a "marker-patient" biclustering method must be developed; (2) providing multiple lines of evidence for a mechanism by exploring the huge feature space: the latent pathogenesis of a subtype may relate to multiple lines of genotypic and phenotypic evidence from different aspects, such as DNA abnormalities, epigenetic modifications, associated gene expression patterns, signaling pathway activity, receptor status, diagnostic features, lifestyle characteristics and treatment response; (3) characterizing in both directions the main complex features and the similarity of clinical subtypes that overlap between patients: common subtypes share some important features, and the same patient may be associated with multiple subtypes; (4) matching different clinical evidence: different subtypes may support different clinical uses, such as diagnosis, risk assessment, drug selection, treatment and response prediction. Translating newly discovered subtypes into useful knowledge for clinical use is crucial, because only then can it be determined whether these subtypes are relevant to clinical application and which knowledge suits which discovered subtype.
The H-cube algorithm comprises three steps: first, computing G-Scores (which measure the enrichment of a marker in a gene set), generating the universal marker signatome (where the signatome is the collection of different markers), and projecting the raw data into the knowledge space of the signatome; second, identifying important patient subtypes by patient subspace clustering over the universal markers; and third, analyzing the similarity between these subtypes.
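Projecting raw data into the signatome's knowledge space turns a gene × patient matrix into a signature × patient matrix. As a simple hypothetical illustration, and not the patent's G-Score, the sketch below scores each signature for each patient as the mean z-score of its member genes, skipping genes absent from the data. Gene and signature names are made up.

```python
import statistics


def zscore_genes(expr):
    """expr: gene -> expression values across patients; returns per-gene z-scores."""
    out = {}
    for gene, vals in expr.items():
        mu, sd = statistics.fmean(vals), statistics.pstdev(vals)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in vals]
    return out


def project_to_signatome(expr, signatome):
    """Return signature -> per-patient scores (mean z-score of member genes present in expr)."""
    z = zscore_genes(expr)
    n_patients = len(next(iter(expr.values())))
    projected = {}
    for name, genes in signatome.items():
        present = [g for g in genes if g in z]
        if not present:
            continue  # signature has no measured genes in this dataset
        projected[name] = [statistics.fmean([z[g][i] for g in present])
                           for i in range(n_patients)]
    return projected


expr = {"G1": [1.0, 2.0, 3.0], "G2": [3.0, 2.0, 1.0], "G3": [0.0, 0.0, 0.0]}
signatome = {"S_up": ["G1"], "S_mixed": ["G1", "G2"], "S_missing": ["GX"]}
print(sorted(project_to_signatome(expr, signatome)))  # ['S_mixed', 'S_up']
```

Because every signature is scored on the same per-patient scale, later steps can compare patients in signature space instead of raw gene space.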
Accordingly, the H-cube algorithm performs subtype classification as follows:
(1) compute the patient's marker G-Score values and generate a universal marker set, where the G-Score indicates the degree of enrichment of a marker in a gene set;
(2) perform hashing based on the marker G-Score values and the generated universal marker set;
(3) build a Hasse diagram from the hashing result;
(4) perform biclustering by searching the Hasse diagram with fuzzy matching, completing patient subtype classification.
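The four H-cube steps can be illustrated with a small sketch. It is not the patent's implementation and rests on several assumptions: a marker counts as enriched for a patient when its G-Score exceeds a threshold; "hashing" means bucketing patients by identical marker sets; the Hasse diagram is built over the observed marker sets ordered by set inclusion; and "fuzzy matching" means a patient joins a bicluster when they carry at least a given fraction of its markers. Marker names and scores are hypothetical.

```python
def binarize(g_scores, threshold):
    """patient -> frozenset of markers whose G-Score exceeds the threshold."""
    return {p: frozenset(m for m, s in row.items() if s > threshold)
            for p, row in g_scores.items()}


def hash_patients(marker_sets):
    """Hashing step: patients with identical marker sets share one bucket."""
    buckets = {}
    for patient, markers in marker_sets.items():
        buckets.setdefault(markers, []).append(patient)
    return buckets


def hasse_edges(sets):
    """Cover relations of set inclusion: (A, B) with A a proper subset of B and no set in between."""
    sets = list(sets)
    return [(a, b) for a in sets for b in sets
            if a < b and not any(a < c < b for c in sets)]


def biclusters(marker_sets, min_markers=2, min_coverage=1.0):
    """For each observed marker set S, collect the patients carrying at least
    min_coverage of S's markers (1.0 = exact superset; lower values = fuzzy match)."""
    result = {}
    for s in set(marker_sets.values()):
        if len(s) < min_markers:
            continue
        result[s] = sorted(p for p, m in marker_sets.items()
                           if len(s & m) / len(s) >= min_coverage)
    return result


g_scores = {  # hypothetical patient x marker G-Scores
    "P1": {"M1": 0.9, "M2": 0.8, "M3": 0.1},
    "P2": {"M1": 0.7, "M2": 0.6, "M3": 0.9},
    "P3": {"M1": 0.2, "M2": 0.9, "M3": 0.95},
}
ms = binarize(g_scores, threshold=0.5)
print(len(hasse_edges(hash_patients(ms))))  # 2
```

Each bicluster pairs a marker set with the patients who share it, and a patient may belong to several biclusters, matching the overlapping subtypes described above.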
V. Drug response prediction module 5
The drug response prediction model built by this module is based on a patient-cell line-drug response three-dimensional cluster graph, as follows:
This embodiment adopts the following measures: (1) developing a three-way heterogeneous graph model with feature selection for individualized drug response prediction; (2) validating the model's power to predict drug response and its underlying mechanisms with GEO data and clinical breast cancer biopsy samples. The success of this BDS4PM system will provide a knowledge environment for cancer drug response, transform the current paradigm of biomedical research and clinical practice, and promote the translation of biomedical big data into individualized treatment. Biologically, signatures can comprehensively describe the mechanisms linking phenotype to responses to different drugs. Because these signatures and drug responses correlate highly between a patient and a certain class of cell lines, that class of cell lines can represent the patient. Technically, with the arrival of the big-data era and the continual accumulation of patient and cell line data together with their feature signatures, there is now enough data to find the association of drug responses between patients and cell lines and to explain the mechanisms involved. By analyzing the similarity between breast cancer cell lines and patients, this embodiment has successfully confirmed this basic principle.
The present invention then develops a new three-step prediction model: 1) a signature-guided non-negative matrix factorization algorithm identifies two-way modules of cell lines and drugs according to their responses to different drugs; 2) based on cancer metastasis survival time, each patient is mapped to the most suitable cell line module; 3) within each module, exhaustive-search support vector machines find and select the respective signatures. The invention proposes random walks on the heterogeneous graph, an extension of the biclustering and feature selection ideas of earlier work: by using a parallel multi-deme genetic algorithm (PMDGA) in the feature space and random walks on the heterogeneous graph in the data-entity space, patient-cell line-drug response tri-clusters are found and personalized treatment models are built. The proposed method aims to maximize the sum of the normalized purities of the identified tri-clusters and proceeds by alternating updates: feature selection with PMDGA; updating the three-dimensional heterogeneous graph based on the selected features; random walks on the heterogeneous graph toward tri-clusters; and finally evaluating the quality of the tri-clusters to adjust the feature selection scheme. In summary, the drug response prediction model based on the patient-cell line-drug response three-dimensional cluster graph performs the following prediction process:
(1) Perform drug response analysis with a signature-guided non-negative matrix factorization algorithm to identify cell lines and drugs according to different drug responses;
(2) Map each patient to a suitable cell line based on cancer metastasis survival time;
(3) Use an exhaustive-search support vector machine to find and select each patient's signatures and determine the patient's drug response.
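Steps (1) and (3) of this process can be sketched in plain Python under stated assumptions:

- Ordinary (unguided) Lee-Seung NMF is substituted for the signature-guided variant.
- Each cell line and drug is assigned to its dominant factor to form bi-modules.
- Every feature subset is scored by the leave-one-out accuracy of a Pegasos-trained linear SVM, as a stand-in for the exhaustive-search SVM.

Step (2) is omitted because the source does not state how survival time drives the patient-to-module mapping. Function names and toy data are hypothetical.

```python
import random
from itertools import combinations

# Sketch of steps (1) and (3) only. Plain (unguided) NMF and a Pegasos-style linear
# SVM are swapped-in stand-ins; signature guidance and the survival-based patient
# mapping of step (2) are not reproduced. All names are hypothetical.


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Lee-Seung multiplicative-update NMF (Frobenius loss): V ~= W H, all entries >= 0."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]

    def product():
        return [[sum(W[i][r] * H[r][j] for r in range(k)) for j in range(m)] for i in range(n)]

    for _ in range(iters):
        WH = product()
        # H <- H * (W^T V) / (W^T W H)
        H = [[H[r][j] * sum(W[i][r] * V[i][j] for i in range(n))
              / (sum(W[i][r] * WH[i][j] for i in range(n)) + eps)
              for j in range(m)] for r in range(k)]
        WH = product()
        # W <- W * (V H^T) / (W H H^T)
        W = [[W[i][r] * sum(V[i][j] * H[r][j] for j in range(m))
              / (sum(WH[i][j] * H[r][j] for j in range(m)) + eps)
              for r in range(k)] for i in range(n)]
    return W, H


def bimodules(V, k, **kw):
    """Step (1): assign each cell line (row) and drug (column) to its dominant NMF factor."""
    W, H = nmf(V, k, **kw)
    cells = [max(range(k), key=lambda r: W[i][r]) for i in range(len(V))]
    drugs = [max(range(k), key=lambda r: H[r][j]) for j in range(len(V[0]))]
    return cells, drugs


def train_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos subgradient descent for a linear soft-margin SVM; labels in {-1, +1}.
    A constant 1.0 feature is appended so the bias is learned (and regularized) as a weight."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def loo_accuracy(X, y):
    """Leave-one-out accuracy of the linear SVM on feature matrix X."""
    hits = 0
    for i in range(len(X)):
        w = train_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        score = sum(wi * xi for wi, xi in zip(w, list(X[i]) + [1.0]))
        hits += (score > 0) == (y[i] > 0)
    return hits / len(X)


def exhaustive_svm_select(X, y, max_size=2):
    """Step (3): score every feature subset up to max_size; ties keep the earlier, smaller subset."""
    best, best_acc = (), -1.0
    for size in range(1, max_size + 1):
        for subset in combinations(range(len(X[0])), size):
            acc = loo_accuracy([[row[f] for f in subset] for row in X], y)
            if acc > best_acc:
                best, best_acc = subset, acc
    return best, best_acc


# Made-up toy data: rows are cell lines, columns are drugs (non-negative response scores).
V = [[5, 5, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
cells, drugs = bimodules(V, k=2)
print(cells, drugs)
# Made-up signatures for one module: feature 0 separates the labels, 1 and 2 are noise.
X = [[3.0, 0.5, 1.0], [2.0, -0.5, -1.0], [2.5, 0.0, 0.5],
     [-3.0, 0.5, 1.0], [-2.0, -0.5, -1.0], [-2.5, 0.0, 0.5]]
y = [1, 1, 1, -1, -1, -1]
print(exhaustive_svm_select(X, y))
```

On the toy matrix, the NMF factors should group the first two cell lines and drugs apart from the last two, and the search should keep only the informative feature. Exhaustive search grows combinatorially with the number of candidate features, so `max_size` has to stay small.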
Claims (8)
1. An accurate intelligent diagnosis and treatment big data system, characterized in that the system comprises:
Centralized data management module (1): centrally manages clinical electronic health record data and omics data from multiple medical institutions;
Data preprocessing module (2): preprocesses the centrally managed data and establishes a relationship dependency network based on biomedical features;
Marker extraction module (3): extracts patient feature genes from the preprocessed data to obtain a marker set;
Subtyping module (4): performs subtyping on patients to determine the group to which each patient belongs;
Drug response prediction module (5): establishes a drug response prediction model and uses it to predict the patient's responses to different drugs.
2. The accurate intelligent diagnosis and treatment big data system according to claim 1, characterized in that the centralized data management module (1) uses the i2b2 and SCILHS/SHRINE data sharing systems to dynamically extract, dynamically fuse and dynamically generate datasets from the clinical electronic health record data and omics data of multiple medical institutions, thereby completing centralized data management.
3. The accurate intelligent diagnosis and treatment big data system according to claim 1, characterized in that the relationship dependency network based on biomedical features is a three-dimensional heterogeneous graph of patients, cell lines and drugs.
4. The accurate intelligent diagnosis and treatment big data system according to claim 1, characterized in that the marker set includes molecular, cellular, intracellular, clinical and demographic features and events.
5. The accurate intelligent diagnosis and treatment big data system according to claim 1, characterized in that the subtyping module (4) performs subtyping using the H-cube algorithm.
6. The accurate intelligent diagnosis and treatment big data system according to claim 5, characterized in that the H-cube algorithm performs subtyping as follows:
(1) calculating the marker G-Score values corresponding to the patients and generating a universal marker set, where a G-Score value indicates the degree to which a marker is enriched in a gene set;
(2) performing hash mapping based on the marker G-Score values and the generated universal marker set;
(3) constructing a Hasse tree graph from the hash mapping result;
(4) performing biclustering based on Hasse tree graph search and fuzzy matching to complete patient subtyping.
7. The accurate intelligent diagnosis and treatment big data system according to claim 1, characterized in that the drug response prediction model is based on patient-cell line-drug response three-dimensional clustering.
8. The accurate intelligent diagnosis and treatment big data system according to claim 7, characterized in that the drug response prediction model based on patient-cell line-drug response three-dimensional clustering comprises the following prediction process:
(1) performing drug response analysis with a signature-guided non-negative matrix factorization algorithm to identify cell lines and drugs according to different drug responses;
(2) mapping each patient to a suitable cell line based on cancer metastasis survival time;
(3) finding and selecting each patient's signatures with an exhaustive-search support vector machine to determine the patient's drug response.
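The random walk on the patient-cell line-drug heterogeneous graph can be illustrated with a minimal sketch. This walk is described for drug response prediction module 5, and the graph itself is recited in claim 3. The sketch makes the following assumptions:

- It uses a random walk with restart (RWR), a standard walk swapped in because the source does not give the transition rule.
- It keeps the most visited node of each type as one tri-cluster.
- It does not reproduce PMDGA feature selection or the normalized-purity objective.

Edge weights, node names and function names are hypothetical.

```python
from collections import defaultdict

# Illustrative only: random walk with restart (RWR) is a swapped-in, standard walk.
# The patent does not specify the transition rule, and PMDGA feature selection and
# the normalized-purity objective are not reproduced here.


def build_graph(edge_list):
    """Undirected weighted heterogeneous graph; nodes are (type, name) tuples.
    Edge weights are assumed to be positive so every node has non-zero total weight."""
    g = defaultdict(dict)
    for a, b, w in edge_list:
        g[a][b] = g[a].get(b, 0.0) + w
        g[b][a] = g[b].get(a, 0.0) + w
    return dict(g)


def random_walk_with_restart(graph, seed, restart=0.3, iters=100):
    """Power iteration for the RWR visiting probabilities started at `seed`."""
    p = {n: 1.0 if n == seed else 0.0 for n in graph}
    for _ in range(iters):
        nxt = {n: 0.0 for n in graph}
        for node, mass in p.items():
            total = sum(graph[node].values())
            # Move to a neighbor with probability proportional to edge weight.
            for nb, w in graph[node].items():
                nxt[nb] += (1 - restart) * mass * w / total
        nxt[seed] += restart  # jump back to the seed with probability `restart`
        p = nxt
    return p


def tri_cluster(graph, seed, top=1, **kw):
    """Keep the `top` most visited nodes of each type as one patient-cell line-drug cluster."""
    p = random_walk_with_restart(graph, seed, **kw)
    cluster = {}
    for kind in ("patient", "cell_line", "drug"):
        ranked = sorted((n for n in p if n[0] == kind), key=lambda n: -p[n])
        cluster[kind] = [name for _, name in ranked[:top]]
    return cluster


# Hypothetical toy weights (e.g. similarity or sensitivity scores), not real data.
graph = build_graph([
    (("patient", "P1"), ("cell_line", "C1"), 0.9),
    (("patient", "P1"), ("cell_line", "C2"), 0.1),
    (("patient", "P2"), ("cell_line", "C2"), 0.9),
    (("cell_line", "C1"), ("drug", "D1"), 0.8),
    (("cell_line", "C2"), ("drug", "D2"), 0.8),
])
print(tri_cluster(graph, ("patient", "P1")))
# → {'patient': ['P1'], 'cell_line': ['C1'], 'drug': ['D1']}
```

The restart term keeps the walk anchored near the seed patient, so the resulting cluster reflects that patient's neighborhood rather than the global structure of the graph.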
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811444715.0A CN109599157B (en) | 2018-11-29 | 2018-11-29 | Accurate intelligent diagnosis and treatment big data system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109599157A (en) | 2019-04-09 |
CN109599157B (en) | 2020-10-02 |
Family ID: 65959164
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811444715.0A Active CN109599157B (en) | 2018-11-29 | 2018-11-29 | Accurate intelligent diagnosis and treatment big data system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109599157B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106529177A (en) * | 2016-11-12 | 2017-03-22 | 杭州电子科技大学 | Patient portrait drawing method and device based on medical big data |
CN107103207A (en) * | 2017-04-05 | 2017-08-29 | 浙江大学 | Based on the multigroup accurate medical knowledge search system and implementation method for learning variation features of case |
CN107609326A (en) * | 2017-07-26 | 2018-01-19 | 同济大学 | Drug sensitivity prediction method in the accurate medical treatment of cancer |
Non-Patent Citations (4)
Title |
---|
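FRÖHLICH, H., BALLING, R., BEERENWINKEL, N. ET AL.: "From hype to reality: data science enabling personalized medicine", BMC MEDICINE * |
OW, GHIM SIONG; TANG, ZHIQUN; KUZNETSOV, VLADIMIR A.: "Big data and computational biology strategy for personalized prognosis", ONCOTARGET * |
XIANG, JUN; LIU, MENG (向俊, 刘朦): "Prospects of precision medicine based on big data analysis" (基于大数据分析法的精准医疗前景), CHINA MEDICAL DEVICES (中国医疗设备) * |
HANG, BO; SHU, YONGQIAN; LIU, PING ET AL. (杭渤, 束永前, 刘平 等): "Precision medicine for tumors: concepts, technologies and prospects" (肿瘤的精准医疗:概念、技术和展望), SCIENCE & TECHNOLOGY REVIEW (科技导报) * |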
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111966813A (en) * | 2019-05-20 | 2020-11-20 | 阿里巴巴集团控股有限公司 | Information mining method and device and information recommendation method and device |
CN110782954A (en) * | 2019-10-31 | 2020-02-11 | 哈尔滨工业大学 | Weight modular mapping method for predicting drug response in cancer cell strain |
CN110782954B (en) * | 2019-10-31 | 2021-05-04 | 哈尔滨工业大学 | Weight modular mapping method for predicting drug response in cancer cell strain |
CN113284611A (en) * | 2021-05-17 | 2021-08-20 | 西安交通大学 | System, device and storage medium for diagnosing and prognosing cancer based on individual pathway activity |
CN113284611B (en) * | 2021-05-17 | 2023-06-06 | 西安交通大学 | Cancer diagnosis and prognosis prediction system, apparatus and storage medium based on individual pathway activity |
WO2023141706A1 (en) * | 2022-01-25 | 2023-08-03 | Duke University | Systems and devices for coupling metabolomics data with digital monitors for precision health |
CN114255886A (en) * | 2022-02-28 | 2022-03-29 | 浙江大学 | Multi-group similarity guide-based drug sensitivity prediction method and device |
CN114255886B (en) * | 2022-02-28 | 2022-06-14 | 浙江大学 | Multi-group similarity guide-based drug sensitivity prediction method and device |
CN115938590A (en) * | 2023-02-09 | 2023-04-07 | 四川大学华西医院 | Construction method and prediction system of colorectal cancer postoperative LARS prediction model |
CN115938590B (en) * | 2023-02-09 | 2023-05-02 | 四川大学华西医院 | Construction method and prediction system of colorectal cancer postoperative LARS prediction model |
Also Published As
Publication number | Publication date |
---|---|
CN109599157B (en) | 2020-10-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Goecks et al. | How machine learning will transform biomedicine | |
Shehab et al. | Machine learning in medical applications: A review of state-of-the-art methods | |
CN109599157A (en) | A kind of accurate intelligent diagnosis and treatment big data system | |
Androulakis et al. | Analysis of time-series gene expression data: methods, challenges, and opportunities | |
CN100504385C (en) | Methods for analyzing tissue specimen by biological map | |
Jerby-Arnon et al. | DIALOGUE maps multicellular programs in tissue from single-cell or spatial transcriptomics data | |
CN108198621A (en) | A kind of database data synthesis dicision of diagnosis and treatment method based on neural network | |
JP2003021630A (en) | Method of providing clinical diagnosing service | |
CN106971071A (en) | A kind of Clinical Decision Support Systems and method | |
CN107092770A (en) | medical analysis system | |
CN106650256A (en) | Precise medical platform for molecular diagnosis and treatment | |
WO2012104764A2 (en) | Method for estimation of information flow in biological networks | |
Hajirasouliha et al. | Precision medicine and artificial intelligence: overview and relevance to reproductive medicine | |
Abdelazim et al. | A survey on classification analysis for cancer genomics: Limitations and novel opportunity in the era of cancer classification and Target Therapies | |
CN108335756A (en) | The synthesis dicision of diagnosis and treatment method in nasopharyngeal carcinoma database and based on the data library | |
CN108206056A (en) | A kind of nasopharyngeal carcinoma artificial intelligence assisting in diagnosis and treatment decision terminal | |
Sealfon et al. | Machine learning methods to model multicellular complexity and tissue specificity | |
Diaz-Flores et al. | Evolution of artificial intelligence-powered technologies in biomedical research and healthcare | |
Gifari et al. | Artificial intelligence toward personalized medicine | |
Sethi et al. | Long Short-Term Memory-Deep Belief Network based Gene Expression Data Analysis for Prostate Cancer Detection and Classification | |
CN108320797A (en) | A kind of nasopharyngeal carcinoma database and based on the data the synthesis dicision of diagnosis and treatment method in library | |
CN117457065A (en) | Method and system for identifying phenotype-associated cell types based on single-cell multi-set chemical data | |
Zhang et al. | Landscape of big medical data: a pragmatic survey on prioritized tasks | |
Siddiqui et al. | Artificial intelligence in precision medicine | |
Srivastava et al. | Computational intelligence-based gene expression analysis in colorectal cancer: a review |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |