CN108629148A - Genome analysis method and device for ocular physiological information based on phenotypic analysis - Google Patents

Publication number
CN108629148A
Authority
CN
China
Prior art keywords
information
variant sites
database
value
ocular physiology
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710153482.8A
Other languages
Chinese (zh)
Inventor
蓝章彰
杨传春
许详阳
陈川
张文勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Letu Biotechnology Co., Ltd.
Original Assignee
Shenzhen Paradise Precision Medical Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Paradise Precision Medical Technology Co Ltd
Priority to CN201710153482.8A
Publication of CN108629148A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Bioethics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Genetics & Genomics (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a genome analysis method and device for ocular physiological information based on phenotypic analysis. The method comprises: obtaining given phenotype information and annotated genome information related to ocular physiological information, and obtaining data from a gene frequency database, a mutation prediction database, a human genetic variation database, and a human phenotype database; calculating a variant site score; calculating a correlation coefficient for the ocular physiological information related to the given phenotype information; and calculating a weighted value of the variant site score and the correlation coefficient, and obtaining the most relevant ocular physiological information according to the weighted value. The method can automatically screen out candidate genes and variant sites consistent with a given phenotype and, according to the relevance score, obtain the ocular physiological information most likely to be related to the given phenotype.

Description

Genome analysis method and device for ocular physiological information based on phenotypic analysis
Technical field
The present invention relates to the technical field of genome analysis, and in particular to a genome analysis method and device for ocular physiological information based on phenotypic analysis.
Background
The state of ocular physiological information has a major influence on visual function. Such information may be pathological, for example the various hereditary eye diseases, or it may be normal, healthy physiological information, for example normal eye function with no input or output impairment. Researchers have previously studied ocular physiological information (including disease) extensively using methods such as linkage analysis and association analysis, and have achieved certain results. With the development of second-generation sequencing, DNA sequencing efficiency has improved markedly and cost has fallen significantly; exon capture, combined with second-generation sequencing, has become an efficient and accurate means of DNA sequencing and gene screening.
Target region capture, in which the exons of genes related to ocular physiological information are sequenced using an exon capture chip and the results are then used for research and clinical diagnosis, is currently still a new technique. Its basic principle is to capture target sequences on the genome with a set of oligonucleotide probes, amplify the captured sequences by PCR with universal primers, and then sequence the amplification products at high throughput to identify the base sequences in the DNA sample. The resulting sequence information is analyzed with bioinformatics methods to find variation in the target sequences, including single nucleotide variants and insertions/deletions.
Summary of the invention
The present invention provides a method and device for bioinformatic analysis of the genome for ocular physiological information, which can automatically screen out candidate genes and variant sites consistent with a given phenotype and, according to a relevance score, obtain the ocular physiological information most likely to be related to the given phenotype.
According to a first aspect of the invention, there is provided a genome analysis method for ocular physiological information based on phenotypic analysis, comprising:
obtaining given phenotype information and annotated genome information related to the ocular physiological information, the annotated genome information including variant site information, and obtaining data from a gene frequency database, a mutation prediction database, a human genetic variation database, and a human phenotype database;
calculating a variant site score according to the variant site information and the data in the gene frequency database, the mutation prediction database, and the human genetic variation database;
calculating, according to the given phenotype information and the data in the human phenotype database, a correlation coefficient for the ocular physiological information related to the given phenotype information;
calculating a weighted value of the variant site score and the correlation coefficient, and obtaining the most relevant ocular physiological information according to the weighted value.
Further, the variant site score is a weighted value of a frequency score, a function score, and a harmfulness score; preferably, the variant site score is the average of the frequency score, the function score, and the harmfulness score.
Further, the frequency score is calculated by the following formula:
[formula image not reproduced]
where f denotes the frequency value of the variant site in the gene frequency database;
Preferably, the gene frequency database comprises four databases, dbSNP, ESP6500, 1000G, and ExAC, and the frequency score is the maximum of the values calculated with the formula for these four databases.
Further, the function score is determined as follows:
If the variant site is a missense mutation (Missense), the function score is the maximum of the values given for the variant site by the several mutation prediction databases; preferably, the mutation prediction databases include the SIFT, Poly-Phen, and GERP prediction databases; if no mutation prediction database has a score for the variant site, the default value 0.6 is used;
If the variant site is a frameshift mutation (Frameshift), the value is 0.95;
If the variant site is a nonsense mutation (Nonsense), the value is 0.95;
If the variant site is a splice site (Splice-site), the value is 0.90;
If the variant site is a non-frameshift mutation (Non-frameshift), the value is 0.85;
If the variant site is a stop-codon loss (Stop-loss), the value is 0.70;
If the variant site is a synonymous mutation (Synonymous), the value is 0.10.
Further, the harmfulness score is determined according to the definitions of the human genetic variation database (HGVD) as follows:
If the variant site is severely pathogenic (Pathogenic), the value is 1;
If the variant site is possibly harmful (likely deleterious), the value is 0.75;
If the variant site is possibly pathogenic (likely pathogenic), the value is 0.75;
If the variant site is possibly benign (likely non-pathogenic/likely benign), the value is 0.25;
If the variant site is benign (Benign), the value is 0;
If the variant site is of any other type, the value is 0.5.
Further, the correlation coefficient is determined as follows:
The IC (information content) value of every phenotype in the human phenotype database (HPO) is calculated according to the following formula:
IC = -log10(x), where x is the number of ocular physiological information items associated with the phenotype divided by the total number of ocular physiological information items;
The IC value of the existing phenotype closest to the given phenotype is obtained and used as the correlation coefficient between the given phenotype and its associated ocular physiological information.
Further, the weighted value of the variant site score and the correlation coefficient is the sum of the variant site score and the correlation coefficient, and/or their average; the most relevant ocular physiological information is the one, among multiple ocular physiological information items, with the highest weighted value for the given phenotype.
According to a second aspect of the invention, there is provided a genome analysis device for ocular physiological information based on phenotypic analysis, comprising:
an input information acquisition device, for obtaining given phenotype information and annotated genome information related to the ocular physiological information, the annotated genome information including variant site information, and for obtaining data from a gene frequency database, a mutation prediction database, a human genetic variation database, and a human phenotype database;
a variant site score calculation device, for calculating a variant site score according to the variant site information and the data in the gene frequency database, the mutation prediction database, and the human genetic variation database;
a correlation coefficient calculation device, for calculating, according to the given phenotype information and the data in the human phenotype database, a correlation coefficient for the ocular physiological information related to the given phenotype information;
an ocular physiological information determination device, for calculating a weighted value of the variant site score and the correlation coefficient, and obtaining the most relevant ocular physiological information according to the weighted value.
According to a third aspect of the invention, there is provided a genome analysis device for ocular physiological information based on phenotypic analysis, comprising:
a memory,
one or more processors, and
one or more programs, the one or more programs being stored in the memory and configured to be executed by the one or more processors, the programs including instructions for implementing the method of the first aspect.
According to a fourth aspect of the invention, there is provided a computer-readable storage medium containing a program that can be executed by a processor to implement the method of the first aspect.
The genome analysis method of the invention detects genes related to ocular physiological information, combines annotated genome information and database data by phenotype, and quickly and effectively determines the ocular physiological information most relevant to a given phenotype, giving the most accurate judgment possible at the current level of the databases.
Brief description of the drawings
Fig. 1 is a flowchart of the genome analysis method for ocular physiological information based on phenotypic analysis according to an embodiment of the invention;
Fig. 2 is a structural block diagram of the genome analysis device for ocular physiological information based on phenotypic analysis according to an embodiment of the invention.
Detailed description
The invention is described in further detail below through specific embodiments with reference to the drawings.
The method of the embodiment uses high-throughput sequencing data together with bioinformatic analysis to examine genes related to ocular physiological information. By combining phenotype with sequencing information in a genome analysis, it can quickly and effectively determine the ocular physiological information most relevant to a given phenotype and detect mutation sites.
High-throughput sequencing takes place upstream of the method of the embodiment. The main strategy is as follows: genomic DNA from the subject's blood, saliva, or other tissue is used as the test material; the DNA is first fragmented and a library is prepared; the DNA of the target gene coding regions (including genes related to ocular physiological information) and the adjacent splice regions is then captured and enriched with a chip; finally, sequencing is performed on a high-throughput sequencing platform for mutation detection.
Upstream of the method of the embodiment, an exon capture chip or a target region capture chip is used. It must include the genes related to ocular physiological information, with coverage of 95% or more and 30x valid data of 85% or more. In addition, because some genes contain highly repetitive low-complexity regions or have pseudogenes, the detection cannot completely cover all of their exon regions, but overall coverage still reaches 95% or more.
The method of the embodiment is suitable for detecting point mutations and deletion or insertion mutations (small mutations) of up to 20 bp. It is not suitable for detecting special mutation types such as large-fragment gene copy number variation, dynamic mutation, and complex recombination, nor for detecting genome structural variation (such as large-fragment deletion, duplication, and inversion rearrangement), large-fragment heterozygous insertion mutations (such as Alu-mediated insertions), or mutations located in gene regulatory regions and deep intronic regions.
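The detection scope stated above can be expressed as a simple filter over variant records. The following is a minimal sketch, assuming each variant is given as VCF-style reference and alternate allele strings; the function name `in_detection_scope`, the use of allele length difference as the indel size, and the rejection of symbolic alleles are all assumptions made for illustration and do not come from the original text.

```python
def in_detection_scope(ref: str, alt: str, max_indel_bp: int = 20) -> bool:
    """Return True for point mutations and small insertions/deletions.

    Assumption: indel size is approximated by the difference in allele
    lengths, and the 20 bp limit from the text is treated as inclusive.
    """
    # Symbolic or breakend alleles (e.g. "<DEL>", "G]17:198982]") describe
    # structural variants, which the text says are out of scope.
    if not ref or not alt or alt.startswith("<") or "[" in alt or "]" in alt:
        return False
    return abs(len(ref) - len(alt)) <= max_indel_bp


# Usage: a single-base substitution and a 20 bp insertion pass the filter,
# while a 21 bp insertion and a symbolic deletion do not.
examples = [("A", "G"), ("A", "A" + "T" * 20), ("A", "A" + "T" * 21), ("A", "<DEL>")]
results = [in_detection_scope(ref, alt) for ref, alt in examples]
```

A filter of this kind would only screen records by size; it does not recognize dynamic mutations or regulatory-region variants, which would need annotation-based checks.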
The raw high-throughput sequencing data from the sequencer are processed by the following workflow: the raw data are run through the GATK workflow "Best Practices for Germline SNP & Indel Discovery" to obtain a variant file in VCF format; the annotation software ANNOVAR is then used to annotate and tabulate the variant detection results. The annotation includes HGVS nomenclature and functional information (missense mutation, nonsense mutation, frameshift mutation, splice site, non-frameshift mutation, stop-codon loss, synonymous mutation, and so on). This is the annotated genome information, which serves as the input data of the method of the embodiment.
As shown in Fig. 1, the method of the embodiment includes the following steps:
Step 110: Obtain given phenotype information and annotated genome information related to ocular physiological information, the annotated genome information including variant site information, and obtain data from a gene frequency database, a mutation prediction database, a human genetic variation database, and a human phenotype database.
Given phenotype information refers to an ocular phenotype related to ocular physiological information, such as the shade of eye color or pupil size. Ocular physiological information refers to the physiological state of the eye. It may be pathological information, for example the various hereditary eye diseases, or it may be normal, healthy physiological information, for example normal eye function with no input or output impairment. In particular, where it refers to normal, healthy physiological information, the method of the invention includes non-diagnostic uses.
A gene frequency database contains specific alleles and their frequencies. Many such databases exist in the prior art, for example dbSNP, ESP6500, 1000G, and ExAC. The method of the embodiment may use one such database or several. In one specific embodiment of the invention, the four databases dbSNP, ESP6500, 1000G, and ExAC are used.
Specifically, dbSNP (https://www.ncbi.nlm.nih.gov/snp/) is the single nucleotide polymorphism database (Short Genetic Variations database). It records single nucleotide polymorphism information, such as the substitution, deletion, or insertion of individual bases, collected and curated by NCBI. The recorded information includes the flanking sequence context (DNA or cDNA), alleles, method, population, sample size, population-specific allele frequency, population-specific genotype frequency, population-specific heterozygosity estimates, individual genotypes, and validation information. The method of the embodiment mainly uses its population-specific allele frequencies as a reference for how common a detected single nucleotide polymorphism is. The specific populations are, for example, Asian, European, or East Asian populations.
ESP6500 (https://esp.gs.washington.edu/drupal/) is the NHLBI Grand Opportunity Exome Sequencing Project (ESP) database, whose data come from the NHLBI GO exome sequencing project. The method of the embodiment mainly uses its population-specific allele frequencies.
1000G (http://www.internationalgenome.org/about/) is the 1000 Genomes database, whose data come from the international 1000 Genomes Project. The method of the embodiment mainly uses its Asian allele frequencies.
ExAC (http://exac.broadinstitute.org/) is the Exome Aggregation Consortium database, which aims to collect and harmonize exome sequencing data from various large-scale sequencing projects and to provide summary data to the wider scientific community. The method of the embodiment mainly uses its allele frequencies for single nucleotide polymorphisms.
A mutation prediction database predicts the effect on the protein of a single nucleotide change. In the method of the embodiment it is used to assess the harmfulness of a detected single nucleotide polymorphism. Many such databases exist in the prior art, for example the SIFT, Poly-Phen, and GERP prediction databases. The method of the embodiment may use one such database or several. In one specific embodiment of the invention, the three databases SIFT, Poly-Phen, and GERP are used.
Specifically, the SIFT database (http://sift.jcvi.org/) refers to Sorting Intolerant From Tolerant; the Poly-Phen database (http://genetics.bwh.harvard.edu/pph2/) refers to Polymorphism Phenotyping v2; the GERP database (https://omictools.com/genomic-Evolutionary-rate-profiling-tool) refers to Genomic Evolutionary Rate Profiling.
Human genetic variation database (HGVD, http://www.hgvd.genome.med.kyoto-u.ac.jp/): this database currently contains genetic variation determined by exome sequencing of 1,208 individuals and genotype data for common variants obtained from a cohort of 3,248 individuals. The method of the embodiment mainly uses its way of classifying mutations.
Human phenotype database (HPO, http://human-phenotype-ontology.github.io/): this refers to the Human Phenotype Ontology, which provides a standard vocabulary for describing phenotypic abnormalities in humans. The method of the embodiment mines the semantic relatedness between phenotypic abnormalities and the genes involved (such as pathogenic and non-pathogenic genes), in order to acquire, store, and exchange the related data.
Step 120: Calculate the variant site score according to the variant site information and the data in the gene frequency database, the mutation prediction database, and the human genetic variation database.
The variant site score quantifies the degree of variation at a variant site and can be defined through different quantitative indicators. In one specific embodiment of the invention, the variant site score comprises a frequency score, a function score, and a harmfulness score, and the final variant site score is a weighted value of the three. In a more specific embodiment, the variant site score is the average of the frequency score, the function score, and the harmfulness score. In other embodiments the variant site score may consist of only one or two of the three, depending on the types and number of databases available. In the spirit of the invention, the more types and the greater the number of databases available, the more fully the frequency score, function score, and harmfulness score can be captured, and the more accurate the scoring and prediction results.
Specifically: (1) A single nucleotide polymorphism (SNP) is a DNA sequence polymorphism caused by variation of a single nucleotide at the genomic level. It is the most common type of heritable human variation and accounts for 90% or more of all known polymorphisms. SNPs are widespread in the human genome, occurring on average once every 500 to 1000 base pairs, and their total number is estimated at up to 3,000,000. The frequency score mainly reflects the frequency of a SNP in the major databases. Put simply, a high-frequency SNP is rarely associated with abnormal physiological information (such as disease); the mutations that cause abnormal physiological information are usually low-frequency ones. The frequency score therefore gives a simple assessment of whether a SNP can cause abnormal physiological information. (2) Each SNP has a particular functional effect (missense mutation, nonsense mutation, frameshift mutation, splice site, non-frameshift mutation, stop-codon loss, synonymous mutation, and so on). These functional changes in a pathway can lead to some dysfunction, and different function scores represent these different effects. (3) The HGVD database classifies each SNP. Its overall classification has 7 classes (severely pathogenic, generally pathogenic, possibly harmful, possibly pathogenic, possibly benign, fully benign, and other types), and the strength of the evidence that a variant causes abnormal physiological information differs by class. The invention uses different harmfulness scores to define the weight of each class.
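The combination of the three component scores described above can be sketched as a weighted mean. This is a minimal illustration, assuming all three scores lie on a comparable scale; the function name `variant_site_score` and the `weights` parameter are hypothetical. With equal weights it reduces to the plain average named in the more specific embodiment.

```python
from typing import Sequence


def variant_site_score(
    frequency: float,
    function: float,
    harmfulness: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted mean of the frequency, function, and harmfulness scores.

    Equal weights (the default) give the simple average of the three.
    """
    if len(weights) != 3:
        raise ValueError("exactly three weights are required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    components = (frequency, function, harmfulness)
    return sum(w * c for w, c in zip(weights, components)) / total_weight


# Usage: equal weights, then a weighting that ignores the frequency score
# (corresponding to an embodiment that uses only two of the three scores).
average = variant_site_score(0.9, 0.6, 0.75)
two_component = variant_site_score(0.9, 0.6, 0.75, weights=(0.0, 1.0, 1.0))
```

Setting a weight to zero is one way to model the embodiments in which only one or two of the component scores are available.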
The frequency score, function score, and harmfulness score are assigned as follows in one specific embodiment of the invention:
The frequency score is calculated by the following formula:
[formula image not reproduced]
where f denotes the frequency value of the variant site in the gene frequency database.
The gene frequency database may be one or more of the four databases dbSNP, ESP6500, 1000G, and ExAC. For scoring accuracy, the frequency score is calculated using all four databases: for each gene frequency database, the corresponding frequency score s(f) is calculated from the frequency value f of the variant site in that database, and the maximum of these scores is taken.
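The "score each database, then take the maximum" rule can be sketched as follows. Because the formula for s(f) is not available in this text, the sketch takes it as a caller-supplied function `s`; the function name `frequency_score`, the dictionary layout, and the decision to skip databases with no record are assumptions for illustration only.

```python
from typing import Callable, Mapping, Optional


def frequency_score(
    freqs: Mapping[str, Optional[float]],
    s: Callable[[float], float],
) -> Optional[float]:
    """Apply s(f) to the frequency from each database and keep the maximum.

    `freqs` maps a database name to the variant's frequency there, or to
    None when the database has no record (assumption: such entries are
    skipped). Returns None when no database has a record.
    """
    scores = [s(f) for f in freqs.values() if f is not None]
    return max(scores) if scores else None


# Usage. The lambda below is a PLACEHOLDER so the example runs; it is not
# the patent's formula, which is not reproduced in this text.
placeholder_s = lambda f: 1.0 - f
example = frequency_score(
    {"dbSNP": 0.01, "ESP6500": None, "1000G": 0.2, "ExAC": 0.05},
    placeholder_s,
)
```

Passing the scoring function in as a parameter keeps the maximum-over-databases logic separate from whatever formula is actually used.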
The function score is determined as follows:
If the variant site is a missense mutation (Missense), the function score is the maximum of the values given for the variant site by the several mutation prediction databases. For example, when the three databases SIFT, Poly-Phen, and GERP are used and the variant site is a missense mutation, the maximum of the function scores of the variant site in the three databases is taken. If no mutation prediction database has a score for the given variant site, the default value 0.6 is used.
The assignment rules in the other cases are: if the variant site is a frameshift mutation (Frameshift), the value is 0.95; if it is a nonsense mutation (Nonsense), the value is 0.95; if it is a splice site (Splice-site), the value is 0.90; if it is a non-frameshift mutation (Non-frameshift), the value is 0.85; if it is a stop-codon loss (Stop-loss), the value is 0.70; if it is a synonymous mutation (Synonymous), the value is 0.10.
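The function score rules above amount to a lookup table with a special case for missense mutations. The following is a minimal sketch; the names `FIXED_FUNCTION_SCORES`, `MISSENSE_DEFAULT`, and `function_score` are hypothetical, and the sketch assumes the predictor values have already been brought onto a common scale where a larger value means a more damaging prediction, which the text does not spell out.

```python
from typing import Iterable, Optional

# Fixed values taken from the text for non-missense variant types.
FIXED_FUNCTION_SCORES = {
    "frameshift": 0.95,
    "nonsense": 0.95,
    "splice-site": 0.90,
    "non-frameshift": 0.85,
    "stop-loss": 0.70,
    "synonymous": 0.10,
}
MISSENSE_DEFAULT = 0.6  # used when no predictor has a score for the site


def function_score(
    variant_type: str,
    predictor_scores: Iterable[Optional[float]] = (),
) -> float:
    """Return the function score for one variant site.

    `predictor_scores` holds the values from the mutation prediction
    databases (None where a database has no entry); it is only consulted
    for missense mutations.
    """
    key = variant_type.strip().lower()
    if key == "missense":
        available = [v for v in predictor_scores if v is not None]
        return max(available) if available else MISSENSE_DEFAULT
    if key in FIXED_FUNCTION_SCORES:
        return FIXED_FUNCTION_SCORES[key]
    raise ValueError(f"unknown variant type: {variant_type!r}")


# Usage: a missense site scored by two of three predictors, one with no
# predictor entries, and a splice-site variant.
scored = function_score("missense", [0.2, None, 0.8])
unscored = function_score("Missense")
splice = function_score("Splice-site")
```

The text does not say how variant types outside this list should be scored, so the sketch raises an error rather than guessing a value.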
The harmfulness score is determined according to the definitions of the human genetic variation database (HGVD) as follows: if the variant site is severely pathogenic (Pathogenic), the value is 1; if it is possibly harmful (likely deleterious), the value is 0.75; if it is possibly pathogenic (likely pathogenic), the value is 0.75; if it is possibly benign (likely non-pathogenic/likely benign), the value is 0.25; if it is benign (Benign), the value is 0; if it is of any other type, the value is 0.5.
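The harmfulness rules can likewise be written as a table lookup with 0.5 as the fallback for other types. This is a minimal sketch: the names `HARMFULNESS_SCORES` and `harmfulness_score` are hypothetical, and treating a missing label as "other" is an assumption.

```python
from typing import Optional

# Values taken from the text, keyed by the English classification labels.
HARMFULNESS_SCORES = {
    "pathogenic": 1.0,
    "likely deleterious": 0.75,
    "likely pathogenic": 0.75,
    "likely non-pathogenic": 0.25,
    "likely benign": 0.25,
    "benign": 0.0,
}
OTHER_TYPE_SCORE = 0.5


def harmfulness_score(label: Optional[str]) -> float:
    """Map a classification label to its harmfulness score.

    Any label not in the table, and a missing label (assumption), is
    treated as "other types" and scored 0.5.
    """
    if label is None:
        return OTHER_TYPE_SCORE
    return HARMFULNESS_SCORES.get(label.strip().lower(), OTHER_TYPE_SCORE)


# Usage
labels = ["Pathogenic", "likely benign", "uncertain", None]
scores = [harmfulness_score(label) for label in labels]
```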
Step 130: Calculate, according to the given phenotype information and the data in the human phenotype database, the correlation coefficient for the ocular physiological information related to the given phenotype information.
The calculation comprises two steps, presetting scores for the HPO database and assigning a score to the given phenotype entry, as follows:
(a) Standardize the phenotype information: using the data in the human phenotype database (HPO), calculate the IC (information content) value of every phenotype in the HPO database according to the formula IC = -log10(x), where x is the number of ocular physiological information items associated with the phenotype divided by the total number of ocular physiological information items. Note that the phenotypes for which IC values can be calculated are all phenotypes already present in the HPO database. The number of ocular physiological information items associated with a phenotype may refer, for example, to the number of hereditary eye diseases associated with the phenotype, or to the number of normal or healthy physiological information items associated with it. The total number of ocular physiological information items may refer, for example, to the total number of hereditary eye diseases, or to the total number of normal or healthy physiological information items of the eye.
(b) When calculating the correlation coefficient between a given phenotype P and ocular physiological information D (for example a disease D or healthy physiological information D), the phenotype P' of D that is closest to the given phenotype P is obtained according to the phenotype hierarchy tree and the disease-phenotype database. The IC value of phenotype P' (the result calculated in step (a)) is then taken as the correlation coefficient (S') between ocular physiological information D and the given phenotype P.
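Steps (a) and (b) can be sketched as an IC table followed by a lookup. This is a minimal illustration, assuming the per-phenotype association counts have already been extracted from the phenotype database and that the closest phenotype P' has been identified separately; the function names, the phenotype labels, and the choice to skip phenotypes with a zero count (for which the logarithm is undefined) are all assumptions.

```python
import math
from typing import Dict, Mapping


def information_content(
    associated_counts: Mapping[str, int],
    total_items: int,
) -> Dict[str, float]:
    """Step (a): IC = -log10(x), with x = associated count / total count."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    ic_values = {}
    for phenotype, count in associated_counts.items():
        if count <= 0:
            continue  # IC is undefined for x = 0 (assumption: skip the entry)
        ic_values[phenotype] = -math.log10(count / total_items)
    return ic_values


def correlation_coefficient(closest_phenotype: str, ic_values: Mapping[str, float]) -> float:
    """Step (b): the IC value of the closest existing phenotype P' is S'."""
    return ic_values[closest_phenotype]


# Usage with made-up counts: 1000 ocular physiological information items
# in total, of which 10 are associated with "phenotype_a" and 100 with
# "phenotype_b". The rarer phenotype gets the larger IC (about 2 versus 1).
ic_table = information_content({"phenotype_a": 10, "phenotype_b": 100, "phenotype_c": 0}, 1000)
s_prime = correlation_coefficient("phenotype_a", ic_table)
```

A phenotype associated with few items is more specific and so carries more information, which is why the correlation coefficient grows as x shrinks.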
Step 140: Calculate the weighted value of the variant site score and the correlation coefficient, and obtain the most relevant ocular physiological information according to the weighted value.
One gene (or variant site) may correspond to several diseases or to normal physiological information. This step combines the variant site score and the correlation coefficient calculated in the preceding steps, takes their weights into account, and scores each gene (or variant site) against each of its corresponding kinds of ocular physiological information.
The specific weighting method and weight proportions can be determined according to the reliability of the variant site score and of the correlation coefficient: the greater the reliability, the higher the weight that can be assigned; conversely, the lower the reliability, the smaller the weight that should be assigned. Reliability may depend on the number and variety of databases used in the preceding steps; in general, the more databases and database types are used, the higher the reliability is likely to be.
In one embodiment of the invention, the weighted value in this step is the sum of the variant site score and the correlation coefficient, or the average of the two; that is, the variant site score and the correlation coefficient take the same weight. In other embodiments, the variant site score and the correlation coefficient take different weights.
Among the several kinds of ocular physiological information related to the given phenotype, the one with the highest calculated weighted value is taken as the ocular physiological information most relevant to the given phenotype.
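Step 140 can be sketched as follows. This is only an illustration under stated assumptions: the function names, the candidate labels and the numeric inputs are hypothetical, and the default weights of 0.5 each correspond to the equal-weight (average) embodiment described above. The text does not say whether the correlation coefficient is rescaled before weighting, so the sketch uses it as given.

```python
import math


def weighted_value(site_score, correlation, w_site=0.5, w_corr=0.5):
    """Weighted combination of the variant site score and the correlation
    coefficient; equal weights of 0.5 give the plain average."""
    return w_site * site_score + w_corr * correlation


def most_relevant(candidates, w_site=0.5, w_corr=0.5):
    """Pick the candidate with the highest weighted value.

    `candidates` maps a label for a kind of ocular physiological information
    to a (variant site score, correlation coefficient) pair.
    """
    if not candidates:
        raise ValueError("no candidate ocular physiological information given")
    scored = {
        label: weighted_value(site, corr, w_site, w_corr)
        for label, (site, corr) in candidates.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical inputs: two candidates for the same variant site.
candidates = {"info_A": (0.9, 0.8), "info_B": (0.9, 0.3)}
best_label, best_value = most_relevant(candidates)
print(best_label, best_value)
```

Passing unequal weights, for example `most_relevant(candidates, w_site=0.3, w_corr=0.7)`, corresponds to the embodiments in which the two quantities are weighted differently according to their reliability.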
As shown in Fig. 2, an embodiment of the present invention also provides a genome analysis device for ocular physiological information based on phenotypic analysis, comprising:
An input information acquisition device 210, for obtaining given phenotypic information and annotated genomic information related to ocular physiological information, the annotated genomic information including variant site information, and for obtaining data from a gene frequency database, a mutation prediction database, a human genetic variation database and a human phenotype database;
A variant site score calculation device 220, for calculating a variant site score according to the variant site information and the data in the gene frequency database, the mutation prediction database and the human genetic variation database;
A correlation coefficient calculation device 230, for calculating, according to the given phenotypic information and the data in the human phenotype database, the correlation coefficient of the ocular physiological information related to the given phenotypic information;
An ocular physiological information determination device 240, for calculating the weighted value of the variant site score and the correlation coefficient, and for obtaining the most relevant ocular physiological information according to the weighted value.
An embodiment of the present invention also provides a genome analysis device for ocular physiological information based on phenotypic analysis, comprising:
A memory,
One or more processors, and
One or more programs, the one or more programs being stored in the memory and configured to be executed by the one or more processors, the programs including instructions for implementing the method of the embodiments of the present invention.
Those skilled in the art will understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium may include read-only memory, random access memory, a magnetic disk, an optical disc and the like. Accordingly, an embodiment of the present invention also provides a computer-readable storage medium including a program that can be executed by a processor to implement the method of the embodiments of the present invention.
The technical solution and technical effects of the present invention are described in detail below by way of an example. It should be understood that the example is illustrative only and should not be construed as limiting the scope of protection of the present invention.
In this example, one suspected patient with retinitis pigmentosa was recruited. The patient's phenotypic information is as follows:
Male, 14 years old. Vision began to decline gradually five years ago, accompanied by night blindness (nyctalopia). Fundus examination revealed retinal abnormalities, and the optic disc had a waxy yellow appearance.
The phenotype terms entered were: retinal degeneration; bony spicule; night blindness.
The annotation results of the sequencing data are shown in Table 1 below (for reasons of space, only some of the sites are listed):
Table 1
The frequency score, function score, harmfulness score, gene-site score ((frequency score + function score + harmfulness score) / 3), phenotype correlation coefficient and final score ((gene-site score + phenotype correlation coefficient) / 2), calculated according to the method of the present invention, are shown in Table 2 below:
Table 2
Result: retinitis pigmentosa caused by NM_001142800.1(EYS):c.9405T>A (p.Tyr3135Ter). Verification by conventional methods was consistent with literature reports.
The above is a further detailed description of the present invention in connection with specific embodiments, and the specific implementation of the present invention should not be regarded as limited to these descriptions. Those of ordinary skill in the art to which the present invention belongs may make a number of simple deductions or substitutions without departing from the concept of the present invention, and all of these shall be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. A genome analysis method for ocular physiological information based on phenotypic analysis, characterized by comprising:
Obtaining given phenotypic information and annotated genomic information related to the ocular physiological information, the annotated genomic information including variant site information, and obtaining data from a gene frequency database, a mutation prediction database, a human genetic variation database and a human phenotype database;
Calculating a variant site score according to the variant site information and the data in the gene frequency database, the mutation prediction database and the human genetic variation database;
Calculating, according to the given phenotypic information and the data in the human phenotype database, the correlation coefficient of the ocular physiological information related to the given phenotypic information;
Calculating the weighted value of the variant site score and the correlation coefficient, and obtaining the most relevant ocular physiological information according to the weighted value.
2. The genome analysis method for ocular physiological information based on phenotypic analysis according to claim 1, characterized in that the variant site score is a weighted value of a frequency score, a function score and a harmfulness score; preferably, the variant site score is the average of the frequency score, the function score and the harmfulness score.
3. The genome analysis method for ocular physiological information based on phenotypic analysis according to claim 2, characterized in that the calculation formula of the frequency score is:
S(f) = max(0, 1 - 0.13533 × e^(100f)), where f denotes the frequency of the variant site in the gene frequency database;
Preferably, the gene frequency database comprises four databases, dbSNP, ESP6500, 1000G and ExAC, and the frequency score is the maximum of the values calculated from the formula for the four databases.
4. The genome analysis method for ocular physiological information based on phenotypic analysis according to claim 2, characterized in that the function score is determined as follows:
If the variant site is a missense mutation, the function score takes the maximum value for the variant site across the several mutation prediction databases; preferably, the mutation prediction databases include the SIFT, PolyPhen and GERP prediction databases; if the mutation prediction databases contain no score for the variant site, the default value 0.6 is taken;
If the variant site is a frameshift mutation, the value is 0.95;
If the variant site is a nonsense mutation, the value is 0.95;
If the variant site is a splice site, the value is 0.90;
If the variant site is a non-frameshift mutation, the value is 0.85;
If the variant site is a stop codon loss, the value is 0.70;
If the variant site is a synonymous mutation, the value is 0.10.
5. The genome analysis method for ocular physiological information based on phenotypic analysis according to claim 2, characterized in that the harmfulness score is determined by the following judgments according to the definitions of the human genetic variation database (HGVD):
If the variant site is severely pathogenic, the value is 1;
If the variant site is possibly harmful, the value is 0.75;
If the variant site is possibly pathogenic, the value is 0.75;
If the variant site is possibly benign, the value is 0.25;
If the variant site is benign, the value is 0;
If the variant site is of any other type, the value is 0.5.
6. The genome analysis method for ocular physiological information based on phenotypic analysis according to claim 1, characterized in that the correlation coefficient is determined as follows:
The IC values of all phenotypes in the human phenotype database (HPO) are calculated according to the following formula:
IC = -log10(x), where x is the number of ocular physiological information items associated with the phenotype divided by the total number of ocular physiological information items;
The IC value of the existing phenotype closest to the given phenotype is obtained and used as the correlation coefficient between the given phenotype and its associated ocular physiological information.
7. The genome analysis method for ocular physiological information based on phenotypic analysis according to claim 1, characterized in that the weighted value of the variant site score and the correlation coefficient is the sum of the variant site score and the correlation coefficient, or the average of the two; the most relevant ocular physiological information is, among several kinds of ocular physiological information, the one with the highest weighted value with respect to the given phenotype.
8. A genome analysis device for ocular physiological information based on phenotypic analysis, characterized by comprising:
An input information acquisition device, for obtaining given phenotypic information and annotated genomic information related to the ocular physiological information, the annotated genomic information including variant site information, and for obtaining data from a gene frequency database, a mutation prediction database, a human genetic variation database and a human phenotype database;
A variant site score calculation device, for calculating a variant site score according to the variant site information and the data in the gene frequency database, the mutation prediction database and the human genetic variation database;
A correlation coefficient calculation device, for calculating, according to the given phenotypic information and the data in the human phenotype database, the correlation coefficient of the ocular physiological information related to the given phenotypic information;
An ocular physiological information determination device, for calculating the weighted value of the variant site score and the correlation coefficient, and for obtaining the most relevant ocular physiological information according to the weighted value.
9. A genome analysis device for ocular physiological information based on phenotypic analysis, characterized in that the device comprises:
A memory,
One or more processors, and
One or more programs, the one or more programs being stored in the memory and configured to be executed by the one or more processors, the programs including instructions for implementing the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized by including a program that can be executed by a processor to implement the method according to any one of claims 1 to 7.
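The scoring rules recited in claims 2 to 5 can be illustrated with a minimal Python sketch. It is a sketch under stated assumptions, not the claimed implementation: all function names and category keys are hypothetical labels for the categories listed in the claims, predictor scores for missense mutations are assumed to be already scaled so that a larger value means a more damaging prediction, and at least one frequency value is assumed to be available (the claims do not say how a variant absent from every frequency database is handled).

```python
import math

# Function scores for non-missense mutation types (claim 4).
FUNCTION_SCORES = {
    "frameshift": 0.95,
    "nonsense": 0.95,
    "splice_site": 0.90,
    "non_frameshift": 0.85,
    "stop_loss": 0.70,
    "synonymous": 0.10,
}
MISSENSE_DEFAULT = 0.6  # used when no predictor score exists for the site

# Harmfulness scores by database classification (claim 5).
HARM_SCORES = {
    "severely_pathogenic": 1.0,
    "possibly_harmful": 0.75,
    "possibly_pathogenic": 0.75,
    "possibly_benign": 0.25,
    "benign": 0.0,
}
HARM_OTHER = 0.5  # any other classification


def frequency_score(frequencies):
    """Apply S(f) = max(0, 1 - 0.13533 * e^(100 f)) to the frequency from
    each database and keep the largest result (claim 3)."""
    return max(
        max(0.0, 1.0 - 0.13533 * math.exp(100.0 * f)) for f in frequencies
    )


def function_score(mutation_type, predictor_scores=()):
    """Missense: maximum predictor score, or the default if there is none.
    Other types: fixed value (raises KeyError for an unlisted type)."""
    if mutation_type == "missense":
        return max(predictor_scores) if predictor_scores else MISSENSE_DEFAULT
    return FUNCTION_SCORES[mutation_type]


def harmfulness_score(classification):
    """Fixed value per classification, 0.5 for anything not listed."""
    return HARM_SCORES.get(classification, HARM_OTHER)


def variant_site_score(frequencies, mutation_type, classification,
                       predictor_scores=()):
    """Average of the three component scores (preferred form of claim 2)."""
    parts = (
        frequency_score(frequencies),
        function_score(mutation_type, predictor_scores),
        harmfulness_score(classification),
    )
    return sum(parts) / len(parts)


# Hypothetical input: a nonsense variant absent from all four databases.
score = variant_site_score([0.0, 0.0, 0.0, 0.0], "nonsense",
                           "severely_pathogenic")
print(score)
```

With this formula the frequency score is about 0.865 at a frequency of 0 and falls to 0 once the frequency reaches roughly 2%, because 0.13533 is approximately e^(-2).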
CN201710153482.8A 2017-03-15 2017-03-15 The genome analytical method and device of ocular physiology information based on phenotypic analysis Pending CN108629148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710153482.8A CN108629148A (en) 2017-03-15 2017-03-15 The genome analytical method and device of ocular physiology information based on phenotypic analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710153482.8A CN108629148A (en) 2017-03-15 2017-03-15 The genome analytical method and device of ocular physiology information based on phenotypic analysis

Publications (1)

Publication Number Publication Date
CN108629148A true CN108629148A (en) 2018-10-09

Family

ID=63687467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710153482.8A Pending CN108629148A (en) 2017-03-15 2017-03-15 The genome analytical method and device of ocular physiology information based on phenotypic analysis

Country Status (1)

Country Link
CN (1) CN108629148A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8796182B2 (en) * 2009-07-10 2014-08-05 Decode Genetics Ehf. Genetic markers associated with risk of diabetes mellitus
CN106156538A (en) * 2016-06-29 2016-11-23 天津诺禾医学检验所有限公司 The annotation method of a kind of full-length genome variation data and annotation system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZEMOJTEL等: ""Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome"", 《SCIENCE TRANSLATIONAL MEDICINE》 *
王刚: ""基于疾病表型的基因语义相似性分析与应用"", 《中国硕士学位论文全文数据库》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109390038A (en) * 2018-12-25 2019-02-26 人和未来生物科技(长沙)有限公司 The pathogenic detection method of the mutation that group's frequency is combined with mutation forecasting and system
CN109390038B (en) * 2018-12-25 2020-01-14 人和未来生物科技(长沙)有限公司 Method and system for detecting pathogenicity of mutation by combining population frequency with mutation prediction
CN111883210A (en) * 2020-06-08 2020-11-03 国家卫生健康委科学技术研究所 Single-gene disease name recommendation method and system based on clinical features and sequence variation
CN117877578A (en) * 2024-01-16 2024-04-12 广东劢智医疗科技有限公司 Gene variation scoring and sorting method for genetic variation analysis

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20190213

Address after: 518000 Guangdong Province Dapeng New District Kwai Chung Street Life Science Industrial Park A11 Building 201

Applicant after: Shenzhen Letu Biotechnology Co., Ltd.

Address before: 518000 Shenzhen Nanshan District, Shenzhen City, Guangdong Province, Shahexi Road, Shenzhen Bay Science and Technology Eco-Park, 2 C Blocks, 9 Floors

Applicant before: Shenzhen Paradise Precision Medical Technology Co., Ltd.

SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181009