CN107103207A - Precision medicine knowledge search system based on multi-omics variation features of cases, and implementation method

Precision medicine knowledge search system based on multi-omics variation features of cases, and implementation method

Info

Publication number
CN107103207A
Authority
CN
China
Prior art keywords
case
multi-omics
model
variation
knowledge base
Legal status
Granted
Application number
CN201710218630.XA
Other languages
Chinese (zh)
Other versions
CN107103207B (en)
Inventor
陈新
张嘉宁
王纬韬
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority to CN201710218630.XA
Publication of CN107103207A
Application granted
Publication of CN107103207B
Legal status: Active


Classifications

    • G06F19/325
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00 ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a precision medicine knowledge search system based on the multi-omics variation features of cases, and a method for implementing it. The system includes a precision medicine knowledge base, an optimizable matching algorithm, and an evaluation algorithm for the matching algorithm. The method comprises the following steps: building a precision medicine knowledge base of association models that link multi-omics variation to intervention response; extracting the multi-omics variation features of a new case; establishing a matching algorithm between the new case and the models; generating an analysis report from the case matching system; and updating the knowledge base data and self-evolving the matching algorithm. The invention has a wide search scope and can retrieve association models obtained at different observation levels. The system incorporates known associations between omics variation and intervention response: by defining a generalized framework for the class of multi-omics variation-intervention response association models, it integrates intervention-response and omics-variation information from different levels and sources into a single knowledge base.

Description

Precision medicine knowledge search system based on multi-omics variation features of cases, and implementation method
Technical field
The invention belongs to the field of medicine and healthcare and relates to a method for implementing a precision medicine knowledge search system, specifically a method for implementing a precision medicine knowledge search system based on the multi-omics variation features of cases.
Background art
Precision medicine relies on biomarkers to classify disease risk, prognosis and treatment response. The rapid development of omics technologies has greatly increased the number of molecular-level biomarkers, providing a more comprehensive and detailed basis for diagnosing disease, determining disease stage, and evaluating the safety and efficacy of new treatments in target populations.
Currently, association information of the form "molecular-level marker or pathological omics variation feature to intervention response (including drug response)" is obtained mainly from a few channels: companion diagnostics, high-throughput drug screening experiments at the cell-line level, and precision medicine trials. Association information from companion diagnostics is obtained by observing large sample populations, i.e. at the population level, and is directly and easily available. By contrast, association information from cell-line drug screening experiments and precision medicine trials requires processing of the raw data: the association between molecular-level variation and intervention response can only be established after multi-omics variation features have been extracted from the omics data. Association information of different sources and types is therefore mixed together and hard to separate, which makes it harder for many clinicians to interpret the physiological significance of omics variation features and to exploit their clinical value.
In addition, the integration and clinical translation of omics data must take data stability into account. Factors such as the experimental platform (e.g. different laboratories or institutions), the observation level (e.g. cell-line level, tissue level, individual level), the observation modality (e.g. transcriptome, proteome or genome), and the observation method (e.g. single-nucleotide polymorphism arrays, next-generation sequencing) can all make the observed behavior of the same biomarker unstable. How to integrate this association information as fully as possible, so that it can be put to the greatest use, therefore remains an urgent open problem.
Summary of the invention
The purpose of the invention is to use the observable multi-omics variation features of an individual to rapidly search a knowledge base for the multi-omics variation-intervention response association models that match a new case, and to present the intervention strategies of all matched models, together with records of whether they produced a successful response, to the user in a readable and integrated form. The invention is realized by the following technical solution:
The invention discloses a precision medicine knowledge search system based on the multi-omics variation features of cases. The system includes:
A precision medicine knowledge base, which collects multi-omics variation-intervention response association models and integrates "omics variation feature to intervention response" information from different levels;
An optimizable matching algorithm, which determines whether a case matches a model in the knowledge base and to what degree;
An evaluation algorithm for the matching algorithm: the matching algorithm is used to cluster the models in the knowledge base, and the clustering result is compared with the classification of the models by their intervention-response labels; this comparison assesses the quality of the matching algorithm so that it can be continually optimized;
A report generated directly by the search system, containing the case's omics analysis data and the system's search results, which gives doctors a reference for the physiological significance of the omics data and assists in drafting treatment plans.
As a further improvement, the different levels of the invention include the population level, the individual level, the tissue level and the cell-line level.
The invention also discloses a method for implementing the precision medicine knowledge search system based on the multi-omics variation features of cases, comprising the following steps:
1) building a precision medicine knowledge base of multi-omics variation-intervention response association models;
2) when a new case arrives, extracting its multi-omics variation features;
3) establishing a matching algorithm between the new case and the models (known associations between multi-omics variation features and intervention response);
4) generating an analysis report from the case matching system;
5) updating the knowledge base data and self-evolving the matching algorithm.
As a further improvement, in step 1) the multi-omics variation information includes single-base mutations (SNPs and small base insertions/deletions) in transcriptionally active genomic regions, chromosomal variations (e.g. gene fusions), and reference gene expression levels used to judge whether a gene is abnormally expressed.
As a further improvement, in step 1) a multi-omics variation-intervention response association model is one of the following: a "companion diagnostic association model" consisting of companion diagnostic drug-response annotations and multi-omics variation features; a "cell line association model" containing drug-response information and multi-omics variation features from drug screening experiments; a "case association model" containing clinically observed intervention-response results and multi-omics variation features; or an "individualized disease model association model" containing drug screening results and multi-omics variation features. The individualized disease models include, but are not limited to, patient-derived xenograft (PDX) mice and patient-derived organoid (PDO) models.
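The four model classes above can share one record layout in the knowledge base. The Python sketch below is only an illustration under assumed names: `ModelType`, `AssociationModel` and its fields are hypothetical and not defined in the patent, and the two records are made-up examples rather than entries from the patent's knowledge base.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ModelType(Enum):
    """The four association-model classes listed in the patent."""
    COMPANION_DIAGNOSTIC = "companion diagnostic"
    CELL_LINE = "cell line"
    CASE = "case"
    INDIVIDUALIZED_DISEASE_MODEL = "individualized disease model"  # e.g. PDX mouse, PDO organoid


@dataclass
class AssociationModel:
    """One multi-omics variation-intervention response association model (hypothetical schema)."""
    model_id: str
    model_type: ModelType
    snvs: set[str] = field(default_factory=set)            # single-base mutations
    fusions: set[str] = field(default_factory=set)         # chromosomal variations such as gene fusions
    abnormal_genes: set[str] = field(default_factory=set)  # abnormally expressed genes
    intervention: str = ""                                  # drug or other intervention
    responded: Optional[bool] = None                        # None when the response is not recorded


# Illustrative records only; they are not taken from the patent's knowledge base.
knowledge_base = [
    AssociationModel("cdx-001", ModelType.COMPANION_DIAGNOSTIC,
                     snvs={"BRAF:V600E"}, intervention="vemurafenib", responded=True),
    AssociationModel("cl-042", ModelType.CELL_LINE,
                     fusions={"EML4-ALK"}, intervention="crizotinib", responded=True),
]
```

Keeping the three variation types in separate fields lets a matcher score each type with its own method.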
As a further improvement, in step 2) the multi-omics variation features include single-base mutations in transcriptionally active genomic regions, chromosomal structural variations, and abnormal gene expression information.
As a further improvement, in step 2) a standardized omics data analysis pipeline is established to extract the multi-omics variations, with quality control and quality assurance applied across the whole process, from sample collection, sequencing and data analysis through to matching against the knowledge base.
As a further improvement, in step 3) the search system provides an initial matching algorithm and an evaluation method for matching algorithms. The evaluation method compares how well different matching algorithms cluster the association models in the knowledge base, in order to assess whether a new algorithm outperforms the existing one and to decide whether to upgrade the algorithm.
As a further improvement, in step 4) the report has two parts. Part one presents statistics on the multi-omics variation features related to the pathophysiology, describing the omics variation of the diseased tissue in terms of single-base mutations, chromosomal variations and differentially expressed genes. Part two, produced after the knowledge base search is complete, lists the models in the system sorted from most to least similar to the case, together with each model's matching evidence and medication information.
As a further improvement, in step 5), after the omics features of a case have been extracted in step 2), the therapeutic effect of the case's medication is tracked and the case data are added to the precision medicine knowledge base as a case-class model, expanding the coverage of the knowledge base and improving its matching precision. When no matching association model is found in the knowledge base, the patient is treated directly on the basis of the doctor's experience; at the same time, an individualized disease model may be established from the case, and the case's treatment outcome and the drug-testing results of the individualized disease model are tracked to build the corresponding "case association model" or "individualized disease model association model", which is added to the precision medicine knowledge base.
The advantages of the invention are:
1) The search scope of the invention is wide, and association models obtained at different observation levels can be retrieved. The system incorporates known associations between omics variation and intervention response; by defining a generalized framework for the class of multi-omics variation-intervention response association models, it integrates intervention-response and omics-variation information from different levels and sources into a single knowledge base.
2) The invention offers rich matching features and matching strategies. On the one hand, multi-omics variation features from several aspects (single-base variation, chromosomal variation, differentially expressed genes) are matched jointly, which makes the matching results more reliable and reduces the noise that arises when a single variation type is associated with a physiological phenotype. On the other hand, the invention provides a specific, optimizable matching strategy for each scale of intervention-response model in the knowledge base, so that the association models supply multi-angle evidence for the relationship between a case and its intervention response.
3) The invention is capable of self-evolution, in two respects. First, the number of models in the precision medicine knowledge base keeps growing as the search system runs. When a new case enters, the system records its multi-omics variation features and combines them with the intervention-response results of the case's subsequent treatment plan, or with the drug-testing results of an individualized disease model derived from the case, to generate an association model for the case, which is added to the precision medicine knowledge base. Second, the system's matching algorithm can be continually optimized. The invention establishes an evaluation method for matching algorithms: whenever the matching algorithm is updated, the new algorithm is used to re-cluster the models in the knowledge base, and the result is compared with the classification based on intervention-response labels; the system is updated only if the new algorithm outperforms the existing one.
4) The invention bridges the gap between the extraction of omics variation information and clinical medication guidance, helping clinical staff systematically interpret the physiological significance of omics variation and exploit its clinical value.
Brief description of the drawings
Fig. 1 is a schematic diagram of the implementation flow of the technical solution of the invention.
Embodiment
The invention establishes a precision medicine knowledge search system based on a collaborative matching method for the multi-omics variation of individual cases. The system has three components. First, it contains a precision medicine knowledge base. The knowledge base collects multi-omics variation-intervention response association models and integrates "omics variation feature to intervention response" information at different levels (population, individual, tissue, cell line, etc.). Individual cases entering the system can be added as new models to expand the knowledge base. Second, it contains an optimizable matching algorithm. The initial matching algorithm provided by the system does not yet fully exploit the advantages of multi-omics variation, but the invention provides an evaluation method for matching algorithms: the matching algorithm is used to cluster the models in the knowledge base, and the result is compared with the classification of the models by their intervention-response labels, which assesses the quality of the matching algorithm so that it can be continually optimized. Third, the search system directly generates a readable report containing the case's omics analysis data and the system's search results, giving doctors a reference for the physiological significance of the omics data and assisting in drafting treatment plans.
The basic scheme of the invention is as follows:
1) Build a precision medicine knowledge base of multi-omics variation-intervention response association models. The multi-omics variation information covers three aspects: single-base mutations (SNPs and small base insertions/deletions), chromosomal variations (e.g. gene fusions), and reference gene expression levels used to judge whether a gene is abnormally expressed. A multi-omics variation-intervention response association model can be a "companion diagnostic association model" consisting of companion diagnostic drug-response annotations and multi-omics variation features; a "cell line association model" containing drug-response information and multi-omics variation features from drug screening experiments; a "case association model" containing clinically observed intervention-response results and multi-omics variation features; or an "individualized disease association model" containing drug screening results and multi-omics variation features (including but not limited to PDX mouse and PDO organoid models).
2) When a new case arrives, extract its multi-omics variation features (including but not limited to single-base mutations, chromosomal structural variations and gene expression profiles). A standardized omics data analysis pipeline is established to extract the multi-omics variations, with quality control and quality assurance applied across the whole process from sample collection, sequencing and data analysis through to matching against the knowledge base.
3) Establish a matching algorithm between the new case and the association models. The search system provides an initial matching algorithm and an evaluation method for matching algorithms; the evaluation method compares how well different matching algorithms cluster the association models in the knowledge base to assess whether a new algorithm outperforms the existing one, and decides whether to upgrade the algorithm.
4) Generate a personalized report for the case. The report has two parts: part one presents statistics on the multi-omics variation features related to the pathophysiology, describing the omics variation of the diseased tissue in terms of single-base mutations, chromosomal variations and differentially expressed genes; part two, produced after the knowledge base search is complete, lists the models in the system sorted from most to least similar to the case, together with each model's matching evidence and medication information.
5) If the case does not match any existing model, medication is given directly on the basis of the doctor's experience; at the same time, an individualized disease treatment model based on the case may be developed for drug screening, and, according to the feedback results, a "case association model" and an "individualized disease association model" are built for the case and added to the knowledge base.
Fig. 1 is a schematic diagram of the implementation flow of the technical solution of the invention. The specific implementation steps are as follows:
1) Build the precision medicine knowledge base of multi-omics variation-intervention response association models. Intervention-response models are established at different scales (including but not limited to the population, individual, tissue and cell-line levels), and multi-omics variation features together with the corresponding interventions and intervention-response information are collected from several angles, including but not limited to "population omics variation features to intervention response", "individual case omics variation features to intervention response", "individualized disease model (e.g. PDX mouse and PDO model) omics variation features to intervention response", and "cell line omics variation features to intervention response". The data in the knowledge base are obtained by web crawling, downloading from public databases, and importing local data (cases and individualized disease models). Core keywords and data are extracted from the acquired data with techniques such as word segmentation, semantic analysis and regular-expression matching, and are then format-converted; the raw information is mapped onto a standardized information interface that is meaningful as a reference for designing clinical interventions, and after manual correction the data are added to the database. Data of the same class of association model share a unified storage format in the database.
2) Build the pipeline that extracts a case's multi-omics variation features. A bioinformatics analysis pipeline based on next-generation sequencing is built to extract, from the omics data, the single-base mutations, genome structural mutations and genes abnormally expressed at the transcriptional level that are closely related to physiological change. These serve as the case's multi-omics variation features and are matched against the models in the multi-omics variation feature database. The case's data analysis pipeline applies strict quality control: when a normal control sample is available, the normal sample and known disease omics variation information are used to screen the case's omics variations, increasing the reliability of the association between the case's multi-omics variation features and the physiological phenotype.
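A minimal sketch of this screening rule, assuming variants are represented as plain strings and that variants also found in the normal control are background rather than disease-related. The function name `screen_case_variants` and the split into "known" and "novel" lists are illustrative choices, not the procedure specified by the patent.

```python
def screen_case_variants(case_variants, normal_variants=(), known_disease_variants=frozenset()):
    """Split a case's variants into known-disease and novel candidates after removing normal background.

    Variants are plain strings such as "TP53:R175H" in this sketch (hypothetical format).
    """
    normal = set(normal_variants)
    # Variants also seen in the matched normal control are treated as background and dropped.
    candidates = [v for v in case_variants if v not in normal]
    # Variants already reported in a known disease-variation resource are the most reliable.
    known = [v for v in candidates if v in known_disease_variants]
    novel = [v for v in candidates if v not in known_disease_variants]
    return known, novel
```

Keeping novel variants rather than discarding them leaves the decision about their weight to the downstream matcher.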
3) Implement the case-model collaborative multi-omics variation matching algorithm. The precision medicine knowledge base incorporates association models from multiple data sources, with variation feature information from multiple omics angles. Once a case's multi-omics variation features have been extracted and the case enters the matching system, the case must be matched against the models according to the model types in the knowledge base. When matching against a specific association model, different methods are used for the different kinds of omics variation features to score the match between the variation features extracted from the case and those of the model. Finally, the scores for the different variation features are combined by a formula into a total matching score for the case against the drug-response model, and the total score determines whether the case matches the model.
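The patent does not publish the per-feature scoring methods or the combining formula. As a hedged sketch, the code below scores every variation type with the same Jaccard set overlap and combines the scores with a weighted sum; the weights, the 0.3 threshold and all function names are assumptions, and a real system would likely use a different scorer per feature type, as the text describes.

```python
def jaccard(a: set, b: set) -> float:
    """Set overlap in [0, 1]; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical weights per variation type; the patent does not publish its formula.
WEIGHTS = {"snvs": 0.5, "fusions": 0.3, "abnormal_genes": 0.2}


def match_score(case: dict, model: dict, weights: dict = WEIGHTS) -> float:
    """Total score = weighted sum of per-feature-type similarities between case and model."""
    return sum(w * jaccard(set(case.get(k, ())), set(model.get(k, ())))
               for k, w in weights.items())


def is_match(case: dict, model: dict, threshold: float = 0.3) -> bool:
    """Decide from the total score whether the case matches the model (threshold is illustrative)."""
    return match_score(case, model) >= threshold


case = {"snvs": {"BRAF:V600E"}, "fusions": set(), "abnormal_genes": {"MYC", "EGFR"}}
model = {"snvs": {"BRAF:V600E"}, "abnormal_genes": {"MYC"}}
print(round(match_score(case, model), 2), is_match(case, model))  # 0.6 True
```

Here the shared mutation contributes 0.5 and the half-overlapping expression profile 0.1; swapping `jaccard` for a type-specific scorer only changes the body of `match_score`.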
4) Generate the analysis report of the case matching system. The report has two layers. The first layer is the omics information report for the individual case, including but not limited to raw sequencing quality information, a description of the data analysis pipeline, and statistics on the multi-omics variation features. The second layer is the matching result between the case and the models in the precision medicine knowledge base: according to the search results, the models in the system are sorted from most to least similar to the case, and each model's intervention strategy, response results, matching evidence and related information are displayed. The second layer provides readable "individual case omics variation features to model omics variation features to intervention response" information, giving the case's potential intervention-response information to help doctors interpret the physiological significance of the omics variation features and exploit the clinical value of the omics data.
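The second report layer amounts to ranking models by similarity and attaching the shared features as evidence. This is a minimal sketch: `shared_feature_count` is a placeholder similarity (any score, such as a weighted per-feature score, could be passed as `score_fn`), and the dictionary layout of the models is hypothetical.

```python
def shared_feature_count(case: dict, model_features: dict) -> int:
    """Placeholder similarity: number of variation features the case shares with the model."""
    return sum(len(set(case.get(k, ())) & set(v)) for k, v in model_features.items())


def rank_matches(case: dict, models: list, score_fn=shared_feature_count, top_n: int = 10) -> list:
    """Second report layer: models sorted by similarity to the case, best first, with evidence."""
    rows = []
    for m in models:
        shared = {k: sorted(set(case.get(k, ())) & set(v)) for k, v in m["features"].items()}
        rows.append({
            "model_id": m["id"],
            "score": score_fn(case, m["features"]),
            "intervention": m["intervention"],
            "response": m["response"],
            "evidence": {k: v for k, v in shared.items() if v},  # only feature types that overlap
        })
    rows.sort(key=lambda r: r["score"], reverse=True)  # highest similarity first (stable for ties)
    return rows[:top_n]
```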
5) Update the search system. System updates comprise two parts: updating the knowledge base data and self-evolution of the matching algorithm.
First, updating the knowledge base. When a case matches a model in the knowledge base, the therapeutic effect of the case's medication is tracked, and the case data are added to the precision medicine knowledge base as a case-class model, expanding the coverage of the knowledge base and improving its matching precision. When no matching association model is found in the knowledge base, the patient is treated directly on the basis of the doctor's experience; at the same time, an individualized disease model (e.g. a PDX mouse or PDO organoid model) may be established from the case, and the case's intervention-response results and the drug-testing results of the individualized disease model are tracked to build the corresponding case association model or individualized disease association model, which is added to the precision medicine knowledge base.
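A sketch of this knowledge-base update, assuming each association model is stored as a plain dictionary. The function name, the field names and the simplification that a derived PDX or PDO model carries the same variation features as the case are all assumptions made for illustration.

```python
def update_knowledge_base(kb, case_id, features, intervention, responded, disease_model_result=None):
    """Append association models built from a followed-up case (hypothetical record layout)."""
    # The tracked treatment outcome of every followed-up case becomes a case association model.
    kb.append({"id": f"case-{case_id}", "type": "case", "features": features,
               "intervention": intervention, "responded": responded})
    # If an individualized disease model (e.g. PDX or PDO) was derived and drug-tested, add it as well.
    # Simplification: the disease model is assumed to carry the same variation features as the case.
    if disease_model_result is not None:
        kb.append({"id": f"idm-{case_id}", "type": "individualized disease model",
                   "features": features, "intervention": disease_model_result["drug"],
                   "responded": disease_model_result["responded"]})
    return kb
```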
Second, self-evolution of the matching algorithm. The system establishes an evaluation method that compares a new and an old matching algorithm in order to optimize the system's matching algorithm. When the system is put into operation, it first provides an initial matching algorithm that is yet to be optimized. As new cases are added, the number of models in the precision medicine knowledge base keeps growing, providing material for optimizing the matching algorithm. Based on the classification of the models in the knowledge base by their response to interventions, the invention randomly selects M association models and scores every pair of selected models with both the new and the old matching algorithm, producing two similarity score matrices over these models. Clustering each matrix yields the model classification produced by the new and by the old matching algorithm; each classification is compared with the one derived from actual drug-response information to judge whether the new algorithm performs better and can replace the system's current algorithm.
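The evaluation procedure above can be sketched as follows. The patent does not name a clustering method or a comparison metric, so this sketch assumes average-linkage agglomerative clustering (with as many clusters as there are drug-response classes) and the Rand index as the agreement measure; the adjusted Rand index (for example `sklearn.metrics.adjusted_rand_score`) would be a common alternative. All function names are hypothetical.

```python
import random
from itertools import combinations


def similarity_matrix(models, score_fn):
    """Pairwise similarity of the sampled models under one matching algorithm."""
    return [[score_fn(a, b) for b in models] for a in models]


def average_linkage(sim, k):
    """Agglomerative clustering on a similarity matrix, merging until k clusters remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg > best[0]:
                best = (avg, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # a < b, so index a is unaffected by the pop
        clusters[a].extend(merged)
    labels = [0] * len(sim)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree (same cluster vs. different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs)
    return agree / len(pairs)


def new_algorithm_is_better(models, response_labels, old_score, new_score, m, seed=0):
    """Sample M models, cluster them with each algorithm and compare with the drug-response classes."""
    idx = random.Random(seed).sample(range(len(models)), m)
    sample = [models[i] for i in idx]
    truth = [response_labels[i] for i in idx]
    k = len(set(truth))  # as many clusters as there are response classes
    ri_old = rand_index(average_linkage(similarity_matrix(sample, old_score), k), truth)
    ri_new = rand_index(average_linkage(similarity_matrix(sample, new_score), k), truth)
    return ri_new > ri_old
```

Requiring a strict improvement means the system keeps its current algorithm on ties, which avoids churn when the two algorithms cluster the sample equally well.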
The technical solution of the invention is further described below by way of a specific embodiment:
Embodiment 1:One cases of cancer Rapid matching system based on case transcript profile variation features
This embodiment consists of five main steps:
1) structure of how accurate medical knowledge base:Knowledge base is using correlation model as storage object, from U.S.'s food and medicine prison Adjoint diagnostic medicine list, precision cancer biomedical information, the Sang Ge of My Cancer Genome offers of office (FDA) approval are provided Three data sources of GDSC databases of research institute collect multigroup variation features associated by medicine response information.With diagnosis medicine Thing and My Cancer Genome provide group variation features-medicine response information of population level, and GDSC databases are provided Cell line level specific group of variation features-medicine response information.The data of different-format, pass through international standard data The naming method that storehouse is provided is managed collectively.In this example, the single base mutation of separate sources is mapped to COSMIC numbers According to correspondence title in storehouse, standard output is used as using the name in the database.Similarly, gene name is with NCBI entrez ID As standard, disease name is used as standard using OMIM ID.
2) Extracting the case's multi-omics variation features. A bioinformatics analysis pipeline for transcriptome sequencing (RNA-Seq) data is built to extract, from the transcriptome data, the single-base mutations, chromosomal structural mutations and genes abnormally expressed at the transcriptional level that are closely related to physiological change. These serve as the case's multi-omics variation features and are matched against the models in the multi-omics variation feature database.
In this example, the variation extraction pipeline consists of the following parts: RNA-Seq data preprocessing; single-base mutation detection (SNPs and small insertions/deletions); chromosomal structural variation detection (gene fusions); gene expression quantification and detection of abnormally expressed genes; and visualization of the results.
First, RNA-Seq data preprocessing:
Quality-control tools are used to check the quality of the raw data; adapter-trimming software is then used to remove adapter sequences from the reads and to trim low-quality bases from both ends. The cleaned reads are used for the subsequent sequence alignment, in which this example uses fast short-read alignment software to align the reads against the human genome as the reference genome.
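The read-cleaning step can be illustrated with a toy trimmer. This sketch assumes Phred+33 FASTQ qualities, an exact-match adapter search and a hypothetical quality cutoff of 20; the adapter string in the example is illustrative. Dedicated adapter-trimming tools tolerate mismatches and use more careful trimming rules, so this only shows what the step does, not how the software used in the embodiment does it.

```python
def phred(qual: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in qual]


def adapter_start(seq: str, adapter: str, min_overlap: int = 3) -> int:
    """Return the index where adapter sequence begins in the read (len(seq) if none).

    Looks for a full exact adapter first, then for a partial adapter prefix
    hanging off the 3' end of the read. Mismatches are not tolerated here.
    """
    idx = seq.find(adapter)
    if idx != -1:
        return idx
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def clean_read(seq: str, qual: str, adapter: str, min_q: int = 20):
    """Remove adapter sequence, then trim bases below min_q from both ends."""
    cut = adapter_start(seq, adapter)
    seq, qual = seq[:cut], qual[:cut]
    scores = phred(qual)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


# 10 genomic bases + a 5-base partial adapter; the first base has quality 2 ('#').
print(clean_read("ACGTACGTACAGATC", "#" + "I" * 14, "AGATCGGAAG"))
# ('CGTACGTAC', 'IIIIIIIII')
```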
Second, detection of single-base mutations in the case:
In this step, this example follows the GATK best-practices workflow for variant calling on RNA-seq data (http://gatkforums.broadinstitute.org/gatk/discussion/3892/the-gatk-best-practices-for-variant-calling-on-rnaseq-in-full-detail). First, duplicate reads are removed from the alignment file produced in the preprocessing step; the reads are then split and trimmed, i.e. split into exon segments at splice junctions, and base quality score recalibration is performed. SNPs and small insertions/deletions are then called. Finally, using human genome variation database resources, variant annotation software is used to annotate and filter the detected single-base variants.
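The order of GATK operations above can be expressed as a small command builder. This is a sketch under assumptions: it uses GATK4 tool names and arguments (the linked article describes the earlier GATK3 workflow), the file names are hypothetical, and MarkDuplicates marks duplicates rather than deleting them, which GATK callers then ignore by default. Annotation and filtering against variation databases are left out because the text does not name the software used.

```python
import shlex


def gatk_rnaseq_commands(ref: str, bam: str, known_sites: str, prefix: str) -> list:
    """Build (but do not run) GATK4 command lines for RNA-seq short-variant calling.

    Order: duplicate marking, splitting reads at splice junctions (N in the
    CIGAR string), base quality score recalibration (two tools), then
    variant calling with HaplotypeCaller.
    """
    dedup = f"{prefix}.dedup.bam"
    split = f"{prefix}.split.bam"
    table = f"{prefix}.recal.table"
    recal = f"{prefix}.recal.bam"
    vcf = f"{prefix}.vcf.gz"
    return [
        ["gatk", "MarkDuplicates", "-I", bam, "-O", dedup,
         "-M", f"{prefix}.dup_metrics.txt"],
        ["gatk", "SplitNCigarReads", "-R", ref, "-I", dedup, "-O", split],
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", split,
         "--known-sites", known_sites, "-O", table],
        ["gatk", "ApplyBQSR", "-R", ref, "-I", split,
         "--bqsr-recal-file", table, "-O", recal],
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", recal, "-O", vcf,
         "--dont-use-soft-clipped-bases", "-stand-call-conf", "20"],
    ]


for cmd in gatk_rnaseq_commands("ref.fa", "case.aligned.bam", "known_snps.vcf.gz", "case"):
    # Pass cmd to subprocess.run(cmd, check=True) to actually execute it.
    print(shlex.join(cmd))
```

Returning argument lists instead of shell strings keeps file names with spaces safe when the commands are later handed to `subprocess.run`.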
Third, detection of chromosomal variation in the case:
The structural variants detectable from transcriptome sequencing data are mainly gene fusions. Here, gene-fusion detection software is applied to the alignment results of the preprocessing step to detect the gene-fusion events visible in the transcriptome.
Fourth, detection of gene expression levels:
This step also uses the alignment file from the preprocessing step as the input to transcript-assembly software, for transcript assembly and expression quantification. In this embodiment we consider only the situation in which no adjacent non-tumor tissue is provided for the case and the public cancer transcriptome databases also contain no adjacent non-tumor tissue.
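The text does not say which expression unit is used. Assuming TPM (transcripts per million), a common normalization that makes expression comparable between a case and a model, the calculation from read counts and transcript lengths is sketched below; the transcript names, counts and lengths are toy values.

```python
def tpm(counts: dict, lengths_bp: dict) -> dict:
    """Transcripts per million from raw read counts and transcript lengths (bp).

    Step 1: divide each count by the transcript length in kilobases.
    Step 2: rescale so that these per-kilobase rates sum to 1e6.
    """
    rate = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}
    total = sum(rate.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: r / total * 1e6 for t, r in rate.items()}


print(tpm({"tx1": 100, "tx2": 100}, {"tx1": 1000, "tx2": 2000}))
```

Here tx1 receives twice the TPM of tx2, because it has the same read count over half the length.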
Fifth, visualization of the case's omics results:
The overall multi-omics variation features of an individual case are displayed as a circular plot made up of four rings. From the inside out, the innermost ring shows the positions of gene-fusion events, the next ring shows the positions of single-base mutation events, the third shows gene expression across the whole transcriptome, and the outermost ring shows annotated chromosome position information.
All statistical charts produced during the analysis, such as scatter plots, histograms and pie charts, are output using the statistical software R.
3) Implementation of the case–model collaborative multi-omics variation matching algorithm: The multi-omics variation feature database integrates the variation-feature information of correlation models from multiple data sources and multiple omics perspectives. After a case's multi-omics variation features have been extracted and the case enters the case-matching system, a case–model matching algorithm must be provided according to the type of each model in the database.
In this example, the knowledge base provides three classes of models: 1. companion-diagnostic correlation models; 2. cell-line correlation models; 3. case correlation models.
The intervention result provided by a population-level correlation model typically describes the effect of one or a few specific omics variation features on drug response in a large population sample. The strategy this example uses for such models is therefore: during comparison, if the case has the same omics variation features as a population model, the case is reported as matching that population model; otherwise the match fails.
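The population-level rule can be sketched as a subset test. One interpretation is assumed here: a case matches a population model when it carries every variation feature the model is defined on. The `PopulationModel` class, the feature strings, the drug names and the response values are hypothetical toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopulationModel:
    """Hypothetical population-level model: a feature set linked to a drug response."""
    name: str
    features: frozenset  # standardized variation features defining the model
    drug: str
    responds: bool


def match_population_models(case_features, models):
    """Return the models whose defining features all occur in the case.

    A model with an empty feature set never matches.
    """
    return [m for m in models if m.features and m.features <= case_features]


models = [
    PopulationModel("M1", frozenset({"BRAF:V600E"}), "drug_A", True),
    PopulationModel("M2", frozenset({"EGFR:L858R", "EGFR:T790M"}), "drug_B", False),
]
case = {"BRAF:V600E", "EGFR:L858R"}
print([m.name for m in match_population_models(case, models)])  # ['M1']
```

M2 fails because the case lacks one of its two defining features; a looser rule (any shared feature) would be a one-line change to `&` plus a non-empty check.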
Cell-line-level and individual-level correlation models both provide complete information on single-base mutations, chromosomal structural mutations and gene expression profiles. This example therefore measures the similarity between a case and a model with a similarity-scoring method that combines these three kinds of information. Matching a case against cell-line-level models and against individual-level models differs only in the threshold parameter that finally decides whether the match succeeds. The scoring method is implemented as follows:
First, for single-base mutations: This example uses the DANN method to measure the functional importance of the single-base mutations in the case and in the model. For each gene, the DANN scores of the sites carrying significant functional single-base mutations are summed separately for the case and for the model, which measures the physiological impact of the functional single-base mutations on that gene. The similarity of this gene's functional mutations between the case and the model is obtained by $V_1 = 1 - |C_{snv} - M_{snv}| / \max\{C_{snv}, M_{snv}\}$, where $C_{snv}$ is the functional-mutation impact value of the gene in the case and $M_{snv}$ is the corresponding value in the model. This score serves as index $V_1$ of the functional similarity of the gene between the case and the model.
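The per-gene DANN aggregation and the $V_1$ formula can be sketched as follows. The `(gene, dann_score)` input format, the gene names and scores in the example, and the significance cutoff `min_dann` are hypothetical, since the text does not state how "significant" functional mutations are selected. Defining the similarity as 1.0 when both sums are zero is also an assumption, made to avoid division by zero.

```python
from collections import defaultdict


def gene_burden(variants, min_dann=0.9):
    """Sum DANN scores per gene over variants treated as functionally significant.

    variants: iterable of (gene, dann_score) pairs; DANN scores lie in [0, 1].
    min_dann: hypothetical significance cutoff (not specified in the text).
    """
    burden = defaultdict(float)
    for gene, score in variants:
        if score >= min_dann:
            burden[gene] += score
    return dict(burden)


def relative_similarity(c, m):
    """1 - |c - m| / max(c, m); returns 1.0 when both values are 0 (assumption)."""
    hi = max(c, m)
    if hi == 0:
        return 1.0
    return 1.0 - abs(c - m) / hi


def v1_scores(case_variants, model_variants, genes, min_dann=0.9):
    """V1 for each gene, comparing the case's and the model's mutation burden."""
    c = gene_burden(case_variants, min_dann)
    m = gene_burden(model_variants, min_dann)
    return {g: relative_similarity(c.get(g, 0.0), m.get(g, 0.0)) for g in genes}


case = [("TP53", 0.99), ("TP53", 0.95), ("KRAS", 0.50)]  # toy values
model = [("TP53", 0.97)]
print(v1_scores(case, model, ["TP53", "KRAS"]))
```

In the example TP53 scores about 0.5 (the case's burden is twice the model's), and KRAS scores 1.0 because neither side has a significant mutation. The same `relative_similarity` helper fits $V_3$ when given normalized expression values.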
Second, for chromosomal structural variation: There is currently no method that directly measures how strongly a gene fusion affects physiology. Because structural variants usually have a severe effect on a gene's physiological function, this example uses a custom binary index $V_2 \in \{0, 1\}$ to measure the similarity between the case and the model with respect to gene-fusion events: if a given gene is fused in both the case and the model, or unfused in both, $V_2 = 1$; otherwise $V_2 = 0$.
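$V_2$ reduces to an agreement test on fusion status. The sketch assumes fusion calls are reported as `GENE1--GENE2` strings; both this format and the helper names are illustrative assumptions, not the output specification of any particular fusion caller.

```python
def fused_genes(fusion_calls):
    """Collect partner genes from fusion calls written as 'GENE1--GENE2' (assumed format)."""
    genes = set()
    for call in fusion_calls:
        left, right = call.split("--")
        genes.update((left, right))
    return genes


def v2_score(gene, case_fused, model_fused):
    """1 if the gene is fused in both or in neither of case and model, else 0."""
    return int((gene in case_fused) == (gene in model_fused))


case = fused_genes(["BCR--ABL1"])
model = fused_genes(["BCR--ABL1", "EML4--ALK"])
print({g: v2_score(g, case, model) for g in ["BCR", "ALK", "TP53"]})
# {'BCR': 1, 'ALK': 0, 'TP53': 1}
```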
Third, for abnormally expressed genes: This example defines an index $V_3$ to measure the similarity of a gene's expression level between the case and the model, $V_3 = 1 - |C_{exp} - M_{exp}| / \max\{C_{exp}, M_{exp}\}$, where $C_{exp}$ and $M_{exp}$ are the expression levels of the gene in the case and in the model, respectively, after the expression profiles have been normalized.
In this example, abnormal gene expression reflects variation at the transcriptional level, while single-base mutations and chromosomal structural variants reflect variation at the genomic level, so the indices must be integrated in a way that combines both levels. The similarity score between the case and the model for a given gene is defined as $V = \min\{V_3 V_1, V_3 V_2\}$, where $V_1$, $V_2$ and $V_3$ are the three similarity indices defined above. For a given gene, if the similarity score exceeds 0.5, the gene is considered to behave consistently in the case and in the model. If more than half of the genes in the case behave consistently with their behavior in the model, the case and the model are considered to match; otherwise the match fails.
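Combining the three indices and applying the two thresholds from the text gives the sketch below. The input layout (a mapping from gene to its $(V_1, V_2, V_3)$ triple) and the function names are assumptions; the thresholds are parameters because the text states that the final decision threshold differs between cell-line-level and individual-level models.

```python
def gene_similarity(v1, v2, v3):
    """V = min(V3 * V1, V3 * V2): expression similarity scales both genomic indices."""
    return min(v3 * v1, v3 * v2)


def case_matches_model(per_gene, gene_threshold=0.5, min_fraction=0.5):
    """Decide a case-model match from per-gene (V1, V2, V3) triples.

    A gene is 'consistent' when its V exceeds gene_threshold; the case matches
    when the consistent genes are strictly more than min_fraction of all genes.
    """
    if not per_gene:
        return False
    consistent = sum(
        1 for v1, v2, v3 in per_gene.values()
        if gene_similarity(v1, v2, v3) > gene_threshold
    )
    return consistent > min_fraction * len(per_gene)


scores = {"A": (0.9, 1, 0.8), "B": (0.9, 0, 0.9), "C": (1.0, 1, 0.6)}  # toy values
print(case_matches_model(scores))  # True
```

Gene B scores 0 because $V_2 = 0$ forces the minimum to zero; genes A (0.72) and C (0.6) pass, and two of three is more than half, so the case matches.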
4) Generating an analysis report from the case's matching results:
The analysis report has two main parts: the individual case information and the knowledge-base search results.
In this example, the individual case information includes:
1. Basic information on the sequenced sample (sample name, submission time, sequencing time, sequencer model, sample label, and data-saturation assessment parameters);
2. An overall omics-data overview figure and transcriptome sequencing statistics (number of raw reads, number of reads after cleaning, number of reads aligned to the reference genome, and number of uniquely aligned reads);
3. A histogram of the expression distribution of the detected expressed genes, and charts of the differentially expressed genes;
4. Counts of single-base variants and structural variants across the genome, with an explanation of the variant file formats;
5. File locations of the raw-data QC report, the gene and transcript expression files, the differentially expressed gene file, the single-base variant file, and the gene-fusion file.
The knowledge-base search results include:
1. Basic information on each matched model (model type, source of the raw data, model name, disease name, etc.);
2. The evidence supporting the case–model match (type of the indices matched between model and case, index name, index values, etc.);
3. Clinical reference information for each matched model (drug name, and whether the model responds to the drug).
5) self-evolution of search system:
First, the renewal of accurate medical knowledge base:The case analyzed entering knowledge base is tracked, and treatment is abided by according to case Therapeutic effect and long-term final result, set up case group variation features-intervention response correlation model, add knowledge base.To first entrance Knowledge base does not search the case of Matching Model, it is considered to set up individuation disease model (PDX mouse models or PDO organoids Model), the reaction according to external individuation disease model to different pharmaceutical, set up individuation disease model group variation features- Medicine response correlation model, adds knowledge base.
Second, self-evolution of the matching algorithm: When the number of models of a given class in the knowledge base reaches a certain value, M models of that class can be randomly selected and grouped according to their drug response, and used to evaluate the matching algorithm for that model class. When a new matching algorithm between cases and that model class is implemented, the evaluation results of the new and old matching algorithms can be compared. If the new algorithm agrees more closely with the drug-response grouping, it performs better in real-world use and the matching algorithm is updated; otherwise the original algorithm performs better and the update is abandoned.
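The text does not specify how agreement with the drug-response grouping is measured. The sketch below assumes the Rand index, i.e. the fraction of model pairs on which the algorithm's grouping and the drug-response labels agree; the function names, the fixed seed and the toy labels are hypothetical.

```python
import random
from itertools import combinations


def sample_models(model_ids, m, seed=0):
    """Randomly pick m models of one class for evaluation (seeded for reproducibility)."""
    return random.Random(seed).sample(list(model_ids), m)


def rand_index(clusters, response_labels):
    """Fraction of model pairs on which the two groupings agree.

    A pair agrees when both groupings put the two models together,
    or both put them apart.
    """
    pairs = list(combinations(range(len(clusters)), 2))
    if not pairs:
        raise ValueError("need at least two models")
    agree = sum(
        (clusters[i] == clusters[j]) == (response_labels[i] == response_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def keep_or_update(old_clusters, new_clusters, response_labels):
    """Adopt the new matching algorithm only if it agrees better with drug response."""
    old = rand_index(old_clusters, response_labels)
    new = rand_index(new_clusters, response_labels)
    return "update" if new > old else "keep"


response = ["R", "R", "N", "N"]  # toy drug-response labels for four sampled models
print(keep_or_update([0, 1, 0, 1], [0, 0, 1, 1], response))  # update
```

Ties keep the old algorithm, matching the text's rule that an update happens only when the new method is more consistent.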
The above are only specific embodiments of the invention. The invention is clearly not limited to these embodiments, and many variations are possible. All variations that a person of ordinary skill in the art can derive or conceive directly from the disclosure of the invention shall be deemed within the protection scope of the invention.

Claims (9)

1. A precision medicine knowledge search system based on the multi-omics variation features of a case, characterized in that the system comprises:
a precision medicine knowledge base, for collecting multi-omics variation–intervention response correlation models, so as to collect and integrate "omics variation feature–intervention response" information at different levels;
an optimizable matching algorithm, for judging whether a case matches a model in the knowledge base and to what degree;
an evaluation algorithm for the matching algorithm, which assesses the quality of the matching algorithm by comparing the clustering of the knowledge-base models produced by the matching algorithm with the grouping obtained by labeling the models according to their intervention response, so that the algorithm can be continuously optimized;
a report, generated directly by the search system, containing the case's omics analysis data and the system's search results, which gives doctors the physiological significance of the omics data as a reference to assist in drafting treatment plans.
2. The precision medicine knowledge search system based on the multi-omics variation features of a case according to claim 1, characterized in that the different levels include the population level, the individual level, the tissue level and the cell-line level.
3. An implementation method of the precision medicine knowledge search system based on the multi-omics variation features of a case according to claim 1 or 2, characterized in that the implementation method comprises the following steps:
1) building a precision medicine knowledge base of multi-omics variation–intervention response correlation models;
2) when a new case arrives, extracting the multi-omics variation features of the new case;
3) establishing a matching algorithm between the new case and the models (known associations between multi-omics variation features and intervention responses);
4) generating the analysis report of the case-matching system;
5) updating the knowledge-base data and self-evolving the matching algorithm.
4. The implementation method of the precision medicine knowledge search system based on the multi-omics variation features of a case according to claim 3, characterized in that, in step 1), the multi-omics variation information includes single-base mutations in transcriptionally active genomic regions, chromosomal variations, and baseline gene expression levels used to judge whether a gene is abnormally expressed.
5. The implementation method of the precision medicine knowledge search system based on the multi-omics variation features of a case according to claim 3, characterized in that, in step 1), a multi-omics variation–intervention response correlation model is a "companion-diagnostic correlation model" carrying companion-diagnostic drug-response annotations and multi-omics variation features; or a "cell-line correlation model" containing drug-response information from drug-screening experiments and multi-omics variation features; or a "case correlation model" containing clinically observed intervention-response results and multi-omics variation features; or a "personalized disease model correlation model" containing drug-screening results and multi-omics variation features.
6. The implementation method of the precision medicine knowledge search system based on the multi-omics variation features of a case according to claim 4 or 5, characterized in that, in step 2), a standardized omics data analysis pipeline is established to extract the multi-omics variations, with quality control and quality assurance applied throughout the whole process, from sample collection, sequencing and data analysis to knowledge-base matching.
7. The implementation method of the precision medicine knowledge search system based on the multi-omics variation features of a case according to claim 6, characterized in that, in step 3), the search system provides an initial matching algorithm and an evaluation method for matching algorithms; according to how well different matching algorithms cluster the correlation models in the knowledge base, the evaluation method assesses whether the existing algorithm outperforms a new algorithm and decides whether to upgrade and optimize the algorithm.
8. The implementation method of the precision medicine knowledge search system based on the multi-omics variation features of a case according to claim 4, 5 or 7, characterized in that, in step 4), the report is divided into two parts: the first part presents statistical information on the multi-omics variation features related to the pathophysiology, giving the omics variation information of the diseased tissue in terms of single-base mutations, chromosomal variations, differentially expressed genes and the like; the second part, after the knowledge-base search is completed, presents the model-matching evidence and medication information, sorted from high to low by each model's similarity to the case.
9. The implementation method of the precision medicine knowledge search system based on the multi-omics variation features of a case according to claim 3, characterized in that, in step 5), when the case matches a model in the knowledge base, the therapeutic effect of the case's medication is tracked, and the multi-omics variation feature data of the actual case together with the medication effect are added to the precision medicine knowledge base as a "case correlation model", expanding the coverage of the knowledge base and improving its matching precision; when no matching correlation model is found in the knowledge base, the case is treated according to the doctor's experience, while a personalized disease model may be developed from the case; the case's intervention-response results and the drug-screening results of the personalized disease model are tracked, and the corresponding "case correlation model" or "personalized disease model correlation model" is built and added to the precision medicine knowledge base.
CN201710218630.XA 2017-04-05 2017-04-05 Accurate medical knowledge search system based on case multigroup variation characteristics and implementation method Active CN107103207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710218630.XA CN107103207B (en) 2017-04-05 2017-04-05 Accurate medical knowledge search system based on case multigroup variation characteristics and implementation method

Publications (2)

Publication Number Publication Date
CN107103207A true CN107103207A (en) 2017-08-29
CN107103207B CN107103207B (en) 2020-07-03

Family

ID=59675265

Country Status (1)

Country Link
CN (1) CN107103207B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020064792A1 (en) * 1997-11-13 2002-05-30 Lincoln Stephen E. Database for storage and analysis of full-length sequences
CN1547721A (en) * 2001-08-28 2004-11-17 System, method, and apparatus for storing, retrieving, and integrating clinical, diagnostic, genomic, and therapeutic data
US20110256545A1 (en) * 2010-04-14 2011-10-20 Nancy Lan Guo mRNA expression-based prognostic gene signature for non-small cell lung cancer
CN102637245A (en) * 2001-05-25 2012-08-15 株式会社日立制作所 Information processing system using nucleotide sequence-related information
CN103955608A (en) * 2014-04-24 2014-07-30 上海星华生物医药科技有限公司 Intelligent medical information remote processing system and processing method
CN104067278A (en) * 2011-11-18 2014-09-24 加利福尼亚大学董事会 Bambam: parallel comparative analysis of high-throughput sequencing data
CN105229649A (en) * 2013-03-15 2016-01-06 百世嘉(上海)医疗技术有限公司 System and method for analysis and reporting of disease-associated human genome variation
CN105229651A (en) * 2013-05-23 2016-01-06 皇家飞利浦有限公司 Fast and secure retrieval of DNA sequences
CN105701342A (en) * 2016-01-12 2016-06-22 西北工业大学 Agent-based construction method and device of intuitionistic fuzzy theory medical diagnosis model
CN105760705A (en) * 2016-05-20 2016-07-13 陕西科技大学 Medical diagnosis system based on big data
CN106202936A (en) * 2016-07-13 2016-12-07 为朔医学数据科技(北京)有限公司 Disease risk prediction method and system
CN106227992A (en) * 2016-07-13 2016-12-14 为朔医学数据科技(北京)有限公司 Treatment plan recommendation method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
S. Yilmaz et al.: "Gene–disease relationship discovery based on model-driven data integration and database view definition", Bioinformatics *
Wang Peng et al.: "Exploration and practice of the clinical application of medical big data", China Digital Medicine (中国数字医学) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108320797A (en) * 2018-01-18 2018-07-24 中山大学 Nasopharyngeal carcinoma database and comprehensive diagnosis and treatment decision method based on the database
CN108335748A (en) * 2018-01-18 2018-07-27 中山大学 Server cluster for artificial-intelligence-assisted diagnosis and treatment decisions for nasopharyngeal carcinoma
CN108509771A (en) * 2018-03-27 2018-09-07 华南理工大学 Method for discovering multi-omics data correlations based on sparse matching
CN109599157A (en) * 2018-11-29 2019-04-09 同济大学 Precision intelligent diagnosis and treatment big-data system
CN110656172A (en) * 2019-01-14 2020-01-07 南方医科大学珠江医院 Molecular marker and kit for predicting sensitivity of small cell lung cancer to EP chemotherapy scheme
CN110379460A (en) * 2019-06-14 2019-10-25 西安电子科技大学 Cancer typing information processing method based on multi-omics data
CN110379460B (en) * 2019-06-14 2023-06-20 西安电子科技大学 Cancer typing information processing method based on multi-omics data
CN110660055A (en) * 2019-09-25 2020-01-07 北京青燕祥云科技有限公司 Disease data prediction method and device, readable storage medium and electronic equipment
CN110660055B (en) * 2019-09-25 2022-11-29 北京青燕祥云科技有限公司 Disease data prediction method and device, readable storage medium and electronic equipment
CN112053783A (en) * 2020-08-27 2020-12-08 北京颢云信息科技股份有限公司 Intelligent disease prediction modeling method based on multi-omics data
CN112070731A (en) * 2020-08-27 2020-12-11 佛山读图科技有限公司 Method for guiding registration of human body model atlas and case CT image by artificial intelligence
CN112070731B (en) * 2020-08-27 2021-05-11 佛山读图科技有限公司 Method for guiding registration of human body model atlas and case CT image by artificial intelligence
CN117457068A (en) * 2023-06-30 2024-01-26 上海睿璟生物科技有限公司 Multi-omics-based functional biomarker screening method, system, terminal and medium
CN117457068B (en) * 2023-06-30 2024-05-24 上海睿璟生物科技有限公司 Multi-omics-based functional biomarker screening method, system, terminal and medium

Also Published As

Publication number Publication date
CN107103207B (en) 2020-07-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant