CN109994154A - A screening device for candidate disease-causing genes of single-gene recessive genetic disorders - Google Patents

Publication number
CN109994154A
Authority
CN
China
Prior art keywords
variation
gene
module
candidate
disease
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811633860.3A
Other languages
Chinese (zh)
Other versions
CN109994154B (en
Inventor
周扬
刘涛
蒋德志
玄兆伶
李大为
梁峻彬
陈重建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ANNOROAD GENETIC TECHNOLOGY (BEIJING) Co Ltd
Original Assignee
ANNOROAD GENETIC TECHNOLOGY (BEIJING) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ANNOROAD GENETIC TECHNOLOGY (BEIJING) Co Ltd
Publication of CN109994154A
Application granted
Publication of CN109994154B
Active
Anticipated expiration

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a screening device for candidate disease-causing genes of single-gene recessive genetic disorders. The device includes: a family information analysis module, a variation information acquisition module, a variation information comparison module, an inheritance pattern determination module, a candidate site annotation module, and a result screening module.

Description

A screening device for candidate disease-causing genes of single-gene recessive genetic disorders
Technical field
The present invention relates to the field of genetic testing, and more particularly to a device and method for screening candidate disease-causing genes of single-gene recessive genetic disorders.
Background art
In the human genome, a disease caused by a pair of variant alleles is called a monogenic disorder. More than 6,600 such diseases are known, and the number grows by 10 to 50 each year. Monogenic disorders pose a considerable threat to human health; typical examples include red-green color blindness, hemophilia, and albinism. Human monogenic disorders fall into two main classes: those that follow Mendel's laws of inheritance and those that do not. The former include autosomal, X-linked, and Y-linked inheritance; the latter include imprinted-gene diseases and mitochondrial diseases. The mutation underlying a monogenic disorder can arise in germ cells, in the fertilized egg, or in the early embryo, and the altered genetic information, once expressed, can produce a disease with a particular abnormal trait.
An autosomal dominant disorder is one in which a mutation in just one of the two alleles located on an autosome is enough to cause disease. More than 1,700 such diseases are known, for example familial multiple polyposis, polydactyly, and syndactyly. The pedigree features are: 1. inheritance is independent of sex, and males and females are equally likely to be affected; 2. one of the patient's parents is usually affected; 3. if both parents are unaffected, their children are generally unaffected; 4. patients are usually heterozygous, and if a patient marries an unaffected person, each child has a 50% probability of being affected; 5. the disease is commonly inherited through consecutive generations. Sometimes, owing to internal or external environmental influences, a heterozygous individual carries the pathogenic gene without expressing it, which is called incomplete penetrance. The penetrance of autosomal dominant disorders is 60% to 90%.
The disease-causing gene of an autosomal recessive disorder is a recessive gene on an autosome, and disease occurs only when the recessive gene is homozygous. Most patients are the offspring of two carriers, so these disorders are more common among the children of consanguineous marriages. About 1,200 such diseases have been identified, such as congenital deaf-mutism, albinism, and phenylketonuria. A heterozygous carrier of a recessive disease-causing gene does not express the corresponding trait but can pass the gene on to offspring. Each child of two carriers has a 1/4 probability of being affected, and sons and daughters are equally likely to be affected. Take glycogen storage disease type I as an example: because the patient lacks glucose-6-phosphatase, glycogen breakdown is blocked at glucose 6-phosphate and glycogen is deposited; only individuals with genotype gg develop the disease. When both spouses are carriers (genotype Gg), each child has a 1/4 probability of being affected, and the expected ratio of unaffected to affected offspring is 3:1. Many inherited metabolic disorders are autosomal recessive. According to the concept of "one gene, one enzyme" or "one cistron, one polypeptide", the abnormal enzymes or proteins in these inherited metabolic disorders stem from abnormalities in their respective coding genes. The pedigree features of autosomal recessive disorders are: 1. males and females are equally likely to be affected, and onset is independent of sex; 2. when both parents are unaffected carriers, each child has a 25% probability of being affected; 3. inheritance through consecutive generations is not seen, but the disease may skip one or more generations; 4. with consanguineous marriage, the prevalence of recessive disorders among the children rises sharply, because blood relatives are more likely to carry the same variant allele. For example, the incidence of phenylketonuria is 1:14,500 under random mating in the population but 1:1,700 for cousin marriages. The incidence of generalized albinism is 1:40,000 in the population but 1:3,600 for cousin marriages.
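The 1/4 probability and the 3:1 ratio stated above for a carrier couple (Gg × Gg) can be checked with a small Punnett-square calculation. The following is only an illustrative sketch; the function name `offspring_genotypes` is hypothetical and is not part of the described device.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict:
    """Return {genotype: probability} for a one-locus cross (Punnett square).

    Each parent is written as two allele letters, e.g. "Gg".
    """
    counts = Counter(
        # Sort the two alleles so that "gG" and "Gg" count as one genotype.
        "".join(sorted(a + b))
        for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Both parents are carriers (Gg); only gg offspring are affected.
probs = offspring_genotypes("Gg", "Gg")
affected = probs.get("gg", Fraction(0))
unaffected = 1 - affected
```

Here `affected` evaluates to 1/4 and `unaffected` to 3/4, which corresponds to the 3:1 ratio of unaffected to affected offspring.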
At present, the analysis of candidate genes for single-gene recessive genetic disorders comprises the following steps: 1. determine candidate genes according to the trait; 2. obtain the sequence information of the candidate genes; 3. analyze the variation information of the candidate genes; 4. determine whether each candidate gene is the disease-causing gene of the single-gene recessive genetic disorder.
Summary of the invention
At present, whether a disease controlled by a single gene follows a recessive inheritance pattern, that is, whether the disease-causing gene is recessive, is generally judged from Mendel's laws of inheritance and the disease status within a family. This approach is inefficient and labor-intensive, is generally suitable for judging only a few genes, and cannot be operated at high throughput.
The main object of the present invention is to detect, across the whole genome, genes that conform to a recessive inheritance pattern, so that disease-causing genes can be screened efficiently and at high throughput. To this end, the present invention provides an analysis device and method that use next-generation (second-generation) sequencing to locate candidate genes of single-gene recessive genetic disorders.
The present invention relates to a screening device for candidate disease-causing genes of single-gene recessive genetic disorders, comprising the following:
1. A screening device for candidate disease-causing genes of single-gene recessive genetic disorders, wherein the device comprises:
A family information collection module: for collecting the disease status of each member of the family to be analyzed;
A variation information acquisition module: for obtaining the exonic variant site information of each member of the family to be analyzed;
A variation information comparison module: for comparing the variant site information of each member of the family obtained by the variation information acquisition module, taking the union, and determining the genotype of each member of the family;
An inheritance pattern determination module: for determining whether the inheritance pattern of each variant site conforms to the Mendelian recessive inheritance pattern;
A candidate site annotation module: for using databases to determine the gene in which each variant site conforming to the Mendelian recessive inheritance pattern is located, its specific position within that gene, and the effect of the variation on the encoded protein;
A result screening module: for further screening the variant sites found to conform to the Mendelian recessive inheritance pattern; the variant sites remaining after this screening serve as the final candidate variant sites, and the genes containing these sites serve as the target candidate genes.
2. The device according to item 1, wherein the family information analysis module further draws a family pedigree chart according to the kinship and disease status of each member of the family.
3. The device according to item 1, wherein the variation information acquisition module obtains the variant site information of each member of the family to be analyzed by whole-genome resequencing or whole-exome resequencing.
4. The device according to item 1, wherein the variant site information in the variation information comparison module includes: the chromosome on which the variation is located, the start position of the variation, the end position of the variation, the base type at the site in the reference genome, and the base type after variation.
5. The device according to item 1, wherein the databases used in the candidate site annotation module to determine the gene in which each variant site conforming to the Mendelian recessive inheritance pattern is located are the NCBI database and the UCSC database.
6. The device according to item 1, wherein the candidate site annotation module further performs:
Disease-related annotation using the COSMIC, OMIM and HGMD databases;
Pathogenicity prediction annotation using the SIFT, POLYPHEN, CADD and FATHMM-MKL databases;
Conservation prediction using the phyloP20way_mammalian, phastConsElements46way and gerp++elem databases.
7. The device according to item 1, wherein the result screening module performs the following steps:
Step A: retain the variations in exonic regions and splice site regions;
Step B: filter against the 1000 Genomes Project database, deleting variant sites whose population frequency is greater than 0.01 (sites of inter-individual diversity) and retaining variant sites whose frequency in the 1000 Genomes Project database is lower than 0.01, to obtain rare variations that may be pathogenic;
Step C: delete synonymous variations to obtain the variations that affect gene expression.
8. The device according to item 1, wherein the device further comprises:
A target candidate gene function annotation module: for annotating the function of the target candidate genes using databases;
A function enrichment analysis module: for performing function enrichment analysis on the candidate genes using the Gene Ontology database, to analyze whether the candidate genes have similar functions and whether they are enriched in functional pathways related to the disease; and
A pathway enrichment analysis module: for performing pathway enrichment analysis on the candidate genes using the Kyoto Encyclopedia of Genes and Genomes database, to judge whether the candidate genes are enriched in metabolic pathways related to the disease.
9. The device according to item 8, wherein the device further comprises, after the pathway enrichment analysis module:
A protein interaction analysis module: for building a visual molecular interaction network with the Cytoscape software to analyze the interactions between the proteins encoded by the candidate genes.
Effects of the invention
According to the present invention, data generated directly by next-generation sequencing, the mainstream approach in the research community, can be used to locate candidate sites and candidate genes associated with recessive genetic disorders across the whole genome.
The present invention takes all variations across the whole genome into account, avoiding the bias and omissions caused by human factors and individual subjectivity. The number of variant sites analyzed in a single run can reach 10^6 (or even more), and operation is simple: the analyst only needs to fill in a few necessary parameters to complete a one-click analysis, so high throughput is achieved conveniently and quickly.
The present invention searches the major databases related to human disease, so genes can be screened at multiple levels, and several databases can be used at the same time to corroborate one another.
The present invention also performs enrichment analysis using multiple databases. In addition to analyzing the basic function of the candidate genes, it considers their roles and enrichment in different pathways, providing strong evidence for the final determination of candidate genes.
Specific embodiments
Definitions:
Family: refers to the number of members of a given family, their kinship, and the distribution of the relevant hereditary trait or genetic disease within the family. Herein, the members include the father, the mother, and the offspring born to the two, the number of offspring being one, two, or more;
Synonymous mutation: because of the degeneracy of the genetic code, after a sequence change the codon at the position of a given amino acid is still translated into the same amino acid;
Splice site region: the sequence at the junction of an intron and an exon in the RNA precursor that the spliceosome can recognize, together with the region of the junction site;
Family pedigree chart: different symbols are used for different situations. Males are generally denoted by □ and females by ○; a □ and a ○ joined by a horizontal line, called the marriage line, denote a married couple; a vertical line drawn downward from near the midpoint of the marriage line connects at its lower end to the children's symbols; when there are two or more children they are arranged from left to right in birth order; ■ and ● denote affected individuals;
Mendelian recessive inheritance pattern: the disease-causing gene is recessive, only homozygotes show symptoms, and the inheritance pattern of the disease follows Mendel's laws of inheritance.
The present invention relates to a screening device for candidate disease-causing genes of single-gene recessive genetic disorders. The device requires the disease information of a nuclear family and comprises the following modules:
A family information collection module: for collecting the disease status of each member of the family to be analyzed;
A variation information acquisition module: for obtaining the exonic variant site information of each member of the family to be analyzed;
A variation information comparison module: for comparing the variant site information of each member of the family obtained by the variation information acquisition module, taking the union, and determining the genotype of each member of the family;
An inheritance pattern determination module: for determining whether the inheritance pattern of each variant site conforms to the Mendelian recessive inheritance pattern;
A candidate site annotation module: for using databases to determine the gene in which each variant site conforming to the Mendelian recessive inheritance pattern is located, its specific position within that gene, and the effect of the variation on the encoded protein;
A result screening module: for further screening the variant sites found to conform to the Mendelian recessive inheritance pattern; the variant sites remaining after this screening serve as the final candidate variant sites, and the genes containing these sites serve as the target candidate genes.
The family collected in the family information collection module generally includes the father, the mother, and the offspring born to the two, and is not limited to a single offspring. Herein, the description uses a typical family of two parents and one child as the example. After the kinship between the child and the parents has been confirmed, the disease status of each member of the family is collected, and the accuracy of this information is verified. Preferably, a family pedigree chart is drawn according to the kinship and disease status.
Variation information acquisition module: obtains the single-nucleotide variations (SNPs) and small insertion/deletion variations (InDels) of each sample across the whole genome (or exome). It is recommended to obtain the variations of all samples in the family to be analyzed by human whole-genome resequencing or whole-exome resequencing.
The routine analysis strategy for next-generation sequencing data is: sequencing data alignment, variant detection, ANNOVAR annotation, and other subsequent analyses. To interface seamlessly with this workflow, the present invention starts from the variant detection results of each individual and detects the variant sites that conform to the recessive inheritance pattern, so as to mine candidate genes related to recessive genetic disorders.
Variation information comparison module: integrates the variation results of the individuals in the family by taking the union of the family members' variants. Each variant site occupies one row, whose first columns give the chromosome on which the variation is located, the start position of the variation, the end position of the variation, the base type at the site in the reference genome, and the base type after variation; this information is used to locate the variation and for the subsequent ANNOVAR annotation of the variant sites. Next, the genotype of each site in each individual of the family is determined: a homozygous site identical to the reference genome is denoted 0/0, a homozygous variant site is denoted 1/1, and a heterozygous site is denoted 0/1. The present invention completes this step with a purpose-built program.
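The integration step described above can be sketched as follows. This is a minimal illustration rather than the program of the invention: the function name `merge_family_variants`, the input layout, and the second example site are hypothetical, and a site absent from a member's calls is assumed to be 0/0, which in practice also requires that the site was adequately covered in that member.

```python
from typing import Dict, List, Tuple

# Key of a variant site: (chromosome, start, end, reference base, variant base).
VariantKey = Tuple[str, int, int, str, str]


def merge_family_variants(
    calls: Dict[str, Dict[VariantKey, str]]
) -> List[Tuple[VariantKey, Dict[str, str]]]:
    """Take the union of all members' variant sites and fill in genotypes.

    `calls` maps a member name to {variant key: genotype}, where the genotype
    is "0/1" or "1/1". Missing sites are recorded as "0/0" (an assumption).
    """
    all_sites = set()
    for member_calls in calls.values():
        all_sites.update(member_calls)

    table = []
    for site in sorted(all_sites):
        genotypes = {
            member: member_calls.get(site, "0/0")
            for member, member_calls in calls.items()
        }
        table.append((site, genotypes))
    return table


# Illustrative input: one site shared by all members, one seen only in the mother.
site_a = ("chr1", 120611960, 120611960, "C", "T")
site_b = ("chr2", 5000, 5000, "G", "A")  # hypothetical site
calls = {
    "father": {site_a: "0/1"},
    "mother": {site_a: "0/1", site_b: "0/1"},
    "child": {site_a: "1/1"},
}
table = merge_family_variants(calls)
```

Each row of `table` then carries the locating columns followed by one genotype per family member, matching the one-site-per-row layout described above.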
Inheritance pattern determination module: according to Mendel's laws of inheritance, a recessive homozygous variant site typically appears as follows: both parents are heterozygous at the site, the affected individual is homozygous for the variant, and the affected genotype differs from that of an unaffected sibling (who may be heterozygous, or homozygous for a genotype different from the affected one). Based on these principles, the module judges whether the inheritance pattern of each variant site conforms to the Mendelian recessive inheritance pattern. The variations detected include both SNPs and InDels.
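The rule above can be expressed as a simple per-site check. The sketch below assumes the 0/0, 0/1, 1/1 genotype coding used by the comparison module; the function name `is_recessive_site` and its parameters are hypothetical.

```python
from typing import Dict, Iterable


def is_recessive_site(
    genotypes: Dict[str, str],
    father: str,
    mother: str,
    affected: Iterable[str],
    unaffected_siblings: Iterable[str] = (),
) -> bool:
    """Return True if one site fits the Mendelian recessive pattern.

    Both parents must be heterozygous (0/1), every affected child must be
    homozygous for the variant (1/1), and no unaffected sibling may share
    the affected genotype (so a sibling is 0/1 or 0/0).
    """
    if genotypes[father] != "0/1" or genotypes[mother] != "0/1":
        return False
    if any(genotypes[member] != "1/1" for member in affected):
        return False
    return all(genotypes[sib] != "1/1" for sib in unaffected_siblings)


# Parents are carriers, the affected child is homozygous for the variant.
trio = {"father": "0/1", "mother": "0/1", "child": "1/1"}
fits = is_recessive_site(trio, "father", "mother", affected=["child"])
```

Applying this check to every row of the merged genotype table yields the set of sites that conform to the recessive pattern.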
Candidate site annotation module: to ensure comprehensive annotation, annotation is performed against both the NCBI and UCSC databases to determine the gene in which each variant site is located, its specific position within the gene, and the effect of the variation on protein coding. In addition, multiple databases are used for other aspects of annotation: disease-related databases such as COSMIC, OMIM and HGMD; pathogenicity prediction databases such as SIFT, POLYPHEN, CADD and FATHMM-MKL; and conservation prediction databases such as phyloP20way_mammalian, phastConsElements46way and gerp++elem.
Result screening module: further screens the recessive-inheritance variant sites obtained above to obtain rare variations. The specific screening method is as follows:
1) retain the variations in exonic regions (exonic) or splice site regions (splicing, 2 bp upstream of the splice site);
2) filter against the 1000 Genomes Project database: delete variant sites whose population frequency is greater than 0.01 (sites of inter-individual diversity) and retain variant sites whose frequency in the 1000 Genomes database is lower than 0.01, to obtain the rare variations (rare) that may actually be pathogenic;
3) delete synonymous variations (variations that do not change the encoded amino acid) to obtain the variations that affect the gene expression product (the retained variations include frameshift and non-frameshift InDels). The variant sites remaining after screening serve as the final candidate variant sites, and the genes containing these sites serve as the target candidate genes.
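The three screening steps above can be sketched as a single filter over annotated variants. The record layout (`AnnotatedVariant`), the field names, and the example values are assumptions made for illustration; they do not reproduce the ANNOVAR output format. A frequency of exactly 0.01 is not addressed in the text and is treated here as filtered out, and a variant absent from the population database is treated as rare.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnnotatedVariant:
    gene: str
    region: str                 # e.g. "exonic", "splicing", "intronic", "intergenic"
    effect: str                 # e.g. "synonymous", "nonsynonymous", "frameshift"
    pop_freq: Optional[float]   # population allele frequency; None if not in the database


def screen_candidates(
    variants: List[AnnotatedVariant], max_freq: float = 0.01
) -> List[AnnotatedVariant]:
    """Apply steps 1) to 3): region filter, rarity filter, synonymous filter."""
    kept = []
    for v in variants:
        if v.region not in ("exonic", "splicing"):
            continue  # step 1: keep only exonic and splice-site variants
        if v.pop_freq is not None and v.pop_freq >= max_freq:
            continue  # step 2: drop variants that are common in the population
        if v.effect == "synonymous":
            continue  # step 3: drop variants that leave the amino acid unchanged
        kept.append(v)
    return kept


# Hypothetical annotated variants for illustration only.
variants = [
    AnnotatedVariant("GENE_A", "exonic", "nonsynonymous", 0.001),
    AnnotatedVariant("GENE_B", "intronic", "", None),
    AnnotatedVariant("GENE_C", "exonic", "nonsynonymous", 0.20),
    AnnotatedVariant("GENE_D", "exonic", "synonymous", None),
    AnnotatedVariant("GENE_E", "splicing", "", None),
]
candidates = screen_candidates(variants)
candidate_genes = sorted({v.gene for v in candidates})
```

In this example only the GENE_A and GENE_E variants survive, so those two genes would be reported as target candidate genes.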
Herein, the screening device further comprises:
A target candidate gene function annotation module: annotates the function of the target candidate genes using databases.
A function enrichment analysis module: performs function enrichment analysis on the candidate genes using the Gene Ontology (GO) database, to analyze whether the candidate genes have similar functions and whether they are enriched in functional pathways related to the disease.
A pathway enrichment analysis module: performs pathway enrichment analysis on the candidate genes using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, to see whether the candidate genes are enriched in metabolic pathways related to the disease.
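The text does not state which statistical test the enrichment modules use. A common choice for GO term or KEGG pathway enrichment is the one-sided hypergeometric test, and the sketch below assumes that choice purely for illustration; the function name `hypergeom_enrichment_p` and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(
    total_genes: int, term_genes: int, candidate_genes: int, overlap: int
) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    total_genes:     number of annotated background genes
    term_genes:      background genes annotated to the term or pathway
    candidate_genes: number of candidate genes tested
    overlap:         candidate genes annotated to the term or pathway
    """
    upper = min(term_genes, candidate_genes)
    denominator = comb(total_genes, candidate_genes)
    numerator = sum(
        comb(term_genes, k) * comb(total_genes - term_genes, candidate_genes - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / denominator


# Toy example: 10 background genes, 3 in the pathway, 2 candidates, both in the pathway.
p_value = hypergeom_enrichment_p(total_genes=10, term_genes=3, candidate_genes=2, overlap=2)
```

A small p-value suggests that the candidate genes fall in the term or pathway more often than expected by chance; when many terms are tested, a multiple-testing correction would normally be applied as well.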
Further, the device also comprises:
A protein interaction analysis module: uses the open-source software Cytoscape to build a visual molecular interaction network and analyze the interactions between the proteins encoded by the candidate genes.
Example
For the family information collection module described above: sequencing technology has developed rapidly, and the current mainstream technology, next-generation sequencing, is fast, accurate, and very high in throughput. Parsing genetic information across the whole genome has therefore become quick and inexpensive, and next-generation sequencing has become a routine tool in many fields of research. The present invention is likewise based on next-generation sequencing. The details of the family to be analyzed are determined first. The family here generally includes the father, the mother, and the offspring born to the two; two or more offspring are of course also acceptable. For convenience, the description below uses a typical family of two parents and one child as the example. The information to be confirmed is as follows:
First, the genetic relationship between the child and the parents must be determined. The following situations are not suitable for analysis by the method of the invention: 1) for an adopted child, a family consisting of the child and the adoptive parents cannot be analyzed; 2) for a three-member nuclear family, analysis is not possible if the genotype of one parent cannot be obtained or that parent's disease status cannot be determined (for example, when the parent died early and the disease has a late onset); 3) when genetic information and disease status are available for only one parent and multiple offspring, analysis is possible but the false-positive rate of the result is relatively high. Second, the disease status of each member of the family must be made clear, that is, phenotyping must be carried out. For diseases with clinical manifestations, the phenotype can be judged directly; for diseases without clinical manifestations, the phenotype must be determined by physicochemical measurement of the level of the relevant substance in the body. Third, the invention locates candidate genes only for recessive genetic disorders, so conformity with the recessive inheritance pattern is required, and it is first necessary to judge whether the inheritance pattern of the disease is recessive. By Mendel's laws of inheritance, in a recessive genetic disorder both parents are phenotypically normal, while both affected and unaffected individuals can occur among the offspring. If phenotyping covers only one parent-child generation, both parents must be phenotypically normal and one of the children affected; if the phenotypes of multiple generations have been identified, the disease status across generations can be used to judge, by Mendel's laws, whether the pattern is recessive. In this example, the family has normal parents and a child with a microtia phenotype.
In the variation information acquisition module, the genome-wide variations of the eligible family members are obtained; the variations here include single-nucleotide variations (SNPs) and small insertion/deletion variations (InDels). They are usually obtained as follows. First, DNA is extracted from the blood of each family member (other tissue samples may also be used); different extraction methods can be used depending on the DNA sample. Then, the DNA is randomly fragmented into short fragments with an automated focused-ultrasound sample processor. Suitable adapters are ligated to the randomly fragmented DNA, the adapter-ligated fragments are enriched by PCR, and the enriched fragments are purified. In the present invention, the prepared library is sequenced by high-throughput sequencing; the platform may be, for example, the HiSeq 2000, HiSeq 2500, MiSeq, NextSeq 500, or HiSeq X Ten from Illumina (exome sequencing additionally requires exon capture; there are no special requirements on sequencing type or insert size). Although the sequencing modes of different platforms differ slightly, all ultimately yield sequencing data from the DNA of the sample under test. The sequencing result consists of raw reads, which comprise low-quality sequences, adapter-contaminated sequences, sequences whose N-base ratio is greater than 5%, and clean reads. The higher the proportion of clean reads, the better the data quality. To guarantee the quality of subsequent analysis, the raw reads are filtered to obtain high-quality clean reads before further analysis. The data processing steps are as follows:
(1) remove adapter-contaminated reads (reads with more than 5 bp of adapter contamination; for paired-end sequencing, if one end is contaminated by adapter, the reads at both ends are removed);
(2) remove low-quality reads (reads in which bases with quality value Q ≤ 19 account for 50% or more of all bases; for paired-end sequencing, if one end is a low-quality read, the reads at both ends are removed);
(3) remove reads whose N-base ratio is greater than 5% (for paired-end sequencing, if the N-base ratio of one end is greater than 5%, the reads at both ends are removed).
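The three filtering rules above can be sketched as follows. This is a simplified illustration, not the filtering program actually used: adapter detection itself is outside the sketch, so the number of adapter-derived bases is passed in as a precomputed value, and the names `keep_read` and `keep_pair` are hypothetical.

```python
from typing import List, Tuple

# One read: (base sequence, per-base quality values, number of adapter-derived bases).
Read = Tuple[str, List[int], int]


def keep_read(seq: str, quals: List[int], adapter_bases: int) -> bool:
    """Return True if a single read passes rules (1) to (3)."""
    if adapter_bases > 5:
        return False  # rule (1): more than 5 bp of adapter contamination
    low_quality = sum(1 for q in quals if q <= 19)
    if 2 * low_quality >= len(quals):
        return False  # rule (2): bases with Q <= 19 make up 50% or more
    if 20 * seq.upper().count("N") > len(seq):
        return False  # rule (3): N-base ratio greater than 5%
    return True


def keep_pair(read1: Read, read2: Read) -> bool:
    """For paired-end data, both ends are removed if either end fails."""
    return keep_read(*read1) and keep_read(*read2)


good = ("ACGT" * 5, [30] * 20, 0)
too_many_n = ("NN" + "ACGT" * 4 + "AC", [30] * 20, 0)  # 2 of 20 bases are N (10%)
pair_ok = keep_pair(good, good)
pair_dropped = keep_pair(good, too_many_n)
```

Integer comparisons are used for the 50% and 5% thresholds to avoid floating-point rounding at the boundary.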
After the reads described above, which are prone to cause analysis errors, have been filtered out, the UCSC hg19 reference genome is used as the reference sequence, and alignment software (BWA is recommended) aligns the filtered clean reads to the reference genome, producing an alignment result file in BAM format (http://samtools.github.io/hts-specs/SAMv1.pdf). Samtools (Li et al., 2009) is used to sort the BAM file, retaining only uniquely aligned reads; Picard (http://broadinstitute.github.io/picard/) then marks the duplicate reads in the alignment result, and GATK (McKenna, A. et al., 2010) performs local realignment of the sequences around InDels to reduce false-positive SNP calls, yielding a high-accuracy BAM file for variant detection. For exome sequencing analysis, the quality of the sample, the library construction, and the sequencing must be assessed through metrics such as target-region capture rate, alignment rate, and coverage: the Uniq Rate directly reflects the proportion of reads that align uniquely to the reference genome, and the Duplication Rate reflects the quality of library construction. On the basis of the alignment to the reference genome sequence, the variant analysis software GATK extracts all potential polymorphic SNP and InDel sites across the whole genome, which are then filtered further on factors such as quality value, depth, and repetitiveness, finally yielding a high-confidence SNP and InDel variation data set.
In the variation information comparison module: the variant sites obtained in the variation information acquisition module are the results of per-individual detection rather than population-level detection, so the comparison module integrates the variations of each sample in the family. Because GATK, when detecting the variant sites of a single individual, outputs only the mutated sites (non-gVCF), and each individual has its own unique variant sites, the variant sites of the whole family still need to be integrated and the final genotypes determined. First, the variant sites of each sample in the family are merged by taking the union over the following fields: chromosome number, start position, end position, reference genome base type, and mutated base type, giving a total variant site set M1. Finally, the genotype of every individual in the family at each variant site in M1 is compiled, where a homozygous site identical to the reference genome is denoted 0/0, a homozygous mutation site is denoted 1/1, and a heterozygous site is denoted 0/1. To execute this quickly and efficiently, the present invention completes this step with a purpose-built program.
In the inheritance pattern determination module, the inheritance pattern is judged. According to Mendel's laws of inheritance, a recessive homozygous mutation site typically appears as follows: both parents are heterozygous at the site, the affected individual is homozygous for the mutation, and the affected genotype differs from that of an unaffected sibling (who may be heterozygous, or homozygous for a genotype different from the affected mutated genotype). Based on these principles, each variant site is judged for conformity with the recessive inheritance pattern. For human whole-genome resequencing data, the total number of SNP and InDel variations is usually 10^5 to 10^6. Even for whole-exome data, the two kinds of variations detected number on the order of 10^4 to 10^5, so it is impossible to judge the inheritance pattern of each site by eye. The present invention has developed a corresponding program to complete this step.
In the candidate site annotation module: the program analysis in the inheritance pattern determination module yields all variant sites that conform to the recessive inheritance pattern. These sites are spread across different positions in the genome: 1. they may lie within genes or in intergenic regions; 2. they may lie in exons, in intronic regions, or at splice sites; 3. they may cause synonymous mutations or nonsynonymous mutations. To determine the position of these variations within genes and their effect on the encoded protein, ANNOVAR annotation is performed on the variations obtained in step S4. To ensure comprehensive annotation, both the NCBI and UCSC databases are used to determine the gene in which each variant site is located and its specific position within the gene. For example, variant site 1 is located on chromosome 1; the start position of the variation is 120611960 and the end position is 120611960; the base type in the reference genome is C and the base type after mutation is T; the variation is located in the NOTCH2 gene, in an exon; the variation is a nonsynonymous mutation that changes the 21st amino acid, in the first exon of the NOTCH2 gene, from A to T. In addition, multiple databases are used for other aspects of annotation: disease-related databases such as COSMIC, OMIM and HGMD; pathogenicity prediction databases such as SIFT, POLYPHEN, CADD and FATHMM-MKL; and conservation prediction databases such as phyloP20way_mammalian, phastConsElements46way and gerp++elem.
In the result screening module: mutation is the fundamental source of heritable variation, and in both eukaryotes and prokaryotes, whatever the type of mutation, it is random, occurs at low frequency and is reversible. Therefore not every mutation causes human disease. The recessive inheritance mutation sites obtained by the hereditary pattern determination module are therefore screened further to obtain rare mutations. The specific screening method is as follows:
1) Variants located in exonic regions and at splice sites may change protein function, and most diseases are caused by changes in protein function, whereas variants located in intergenic and intronic regions mostly do not affect function. To narrow down the range of functional sites, the result of S4 is therefore screened further, retaining variants in exonic regions (exonic) or splice site regions (splicing, 2 bp upstream of the splice site);
2) Because the 1000 Genomes database contains data from normal individuals, variant sites that occur at high frequency in that database are not pathogenic sites of a monogenic disease. The following filter is therefore applied: variant sites in the 1000 Genomes database with a population frequency greater than 0.01 are filtered out, removing sites of inter-individual diversity, so that the rare mutations (rare) that may truly be pathogenic are obtained; mutation sites with a frequency below 0.01 in the 1000Genome database are retained;
3) Synonymous variants (mutations that do not change the encoded amino acid) do not affect the protein sequence, so synonymous variants are removed to obtain the variants that affect the gene expression product (the retained mutations include frameshift and non-frameshift mutations among InDels).
The variant sites remaining after screening are taken as the final candidate mutation sites, and the genes containing these sites are taken as the target candidate genes.
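The three screening steps above can be sketched, purely as an illustration, as a filter over annotated variant records. The dictionary keys (`gene`, `region`, `freq_1000g`, `effect`) are hypothetical field names; treating a missing population frequency as rare, and discarding a frequency of exactly 0.01, are assumptions, since the text does not specify either case.

```python
from typing import Dict, List, Optional

KEEP_REGIONS = {"exonic", "splicing"}  # step 1: exonic or splice site regions
MAX_POP_FREQ = 0.01                    # step 2: 1000 Genomes frequency threshold


def passes_screen(variant: Dict) -> bool:
    """Apply the three screening steps to one annotated variant record."""
    # Step 1: keep only exonic and splice site variants.
    if variant["region"] not in KEEP_REGIONS:
        return False
    # Step 2: drop variants that are common in the population.
    # Assumption: a variant absent from the database (None) counts as rare.
    freq: Optional[float] = variant.get("freq_1000g")
    if freq is not None and freq >= MAX_POP_FREQ:
        return False
    # Step 3: drop synonymous variants; frameshift and non-frameshift InDels pass.
    if variant.get("effect") == "synonymous":
        return False
    return True


def candidate_genes(variants: List[Dict]) -> List[str]:
    """Genes containing at least one variant that survives all three steps."""
    return sorted({v["gene"] for v in variants if passes_screen(v)})
```

Applying the steps as successive early returns keeps each criterion separate, so a threshold or region set can be changed without touching the other steps.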
The results show that the sample carries the following variants: a 1717 (A/C) variant in the SIX2 gene on chromosome 2; a 109268573 (T/C) variant in FNDC7 on chromosome 1; a 195452951 (C/G) variant in MUC20 on chromosome 3; a 140229086 (C/G) variant and a 141336264 (G/T) variant in the genes PCDHA9 and PCDH12, respectively, on chromosome 5; a 21207000 (C/G) variant in IFNA10 on chromosome 7; a 125830970 (A/T) variant in the CDON gene on chromosome 11; a 79058445 (G/A) variant in ADAMTS7 on chromosome 15; and variants in KIZ and BPIFA2 on chromosome 20, among others, giving variants in 40 genes in all. These gene variants are of great significance for research on microtia.
The above description shows and describes preferred embodiments of the present invention. As stated above, it should be understood that the invention is not limited to the forms disclosed herein, which should not be regarded as excluding other embodiments; the invention can be used in various other combinations, modifications and environments, and can be modified within the scope of the inventive concept described herein through the above teachings or the skill or knowledge of the related art. Changes and modifications made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the scope of protection of the appended claims.

Claims (9)

1. A screening device for candidate pathogenic genes of a monogenic recessive genetic disease, characterized in that the device comprises:
Family information collection module: for collecting the disease status of each member of the family to be analyzed;
Variation data obtaining module: for obtaining the exon variant site information of each member of the family to be analyzed;
Variation information comparison module: for comparing the variant site information of each member of the family to be analyzed obtained by the variation data obtaining module, taking the union, and determining the genotype of each member of the family;
Hereditary pattern determination module: for determining whether the inheritance pattern of each variant site fits the Mendelian recessive inheritance pattern;
Candidate locus annotations module: for using databases to determine the gene in which each variant site fitting the Mendelian recessive inheritance pattern is located, its specific position on the gene, and the effect of the variant on the encoded protein;
Result screening module: for further screening the variant sites obtained by screening that fit the Mendelian recessive inheritance pattern, the variant sites remaining after this screening being taken as the final candidate variant sites, and the genes containing these sites being taken as the target candidate genes.
2. The device according to claim 1, characterized in that the family information analysis module further draws a family pedigree according to the kinship and disease status of each member of the family.
3. The device according to claim 1, characterized in that the variation data obtaining module obtains the variant site information of each member of the family to be analyzed by whole-genome resequencing or whole-exome resequencing.
4. The device according to claim 1, characterized in that the variant site information in the variation information comparison module includes: the chromosome on which the variant is located, the start site of the variant, the end site of the variant, the genotype of the variant on the reference genome, and the base type after variation.
5. The device according to claim 1, characterized in that the databases used in the candidate locus annotations module to determine the gene in which each variant site fitting the Mendelian recessive inheritance pattern is located are the NCBI database and the UCSC database.
6. The device according to claim 1, characterized in that the candidate locus annotations module further performs:
Disease-related annotation using the COSMIC, OMIM and HGMD databases;
Pathogenicity prediction annotation using the SIFT, POLYPHEN, CADD and FATHMM-MKL databases;
Conservation prediction using the phyloP20way_mammalian, phastConsElements46way and gerp++elem databases.
7. The device according to claim 1, characterized in that the result screening module performs the following steps:
Step A: retaining variants in exonic regions and splice site regions;
Step B: filtering out variant sites in the 1000 Genomes database with a population frequency greater than 0.01, removing sites of inter-individual diversity, to obtain rare variants that may be pathogenic, and retaining variant sites with a frequency below 0.01 in the 1000 Genomes database;
Step C: removing synonymous variants to obtain variants that affect gene expression.
8. The device according to claim 1, characterized in that the device further comprises:
Target candidate gene function annotation module: for annotating the function of the target candidate genes using databases;
Function enrichment analysis module: for performing function enrichment analysis on the candidate genes using the Gene Ontology database, to analyze whether these candidate genes have similar functions and whether they are enriched in functional pathways related to the disease; and
Pathway enrichment analysis module: for performing pathway enrichment analysis on the candidate genes using the Kyoto Encyclopedia of Genes and Genomes database, to judge whether these candidate genes are enriched in metabolic pathways related to the disease.
9. The device according to claim 8, characterized in that it further comprises, after the pathway enrichment analysis module:
Protein interaction analysis module: for building a visualized molecular interaction network with the Cytoscape software to analyze the interactions between the proteins encoded by the candidate genes.
CN201811633860.3A 2017-12-30 2018-12-29 Screening device for candidate pathogenic genes of monogenic recessive genetic disease Active CN109994154B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711488527 2017-12-30
CN2017114885273 2017-12-30

Publications (2)

Publication Number Publication Date
CN109994154A true CN109994154A (en) 2019-07-09
CN109994154B CN109994154B (en) 2021-07-13

Family

ID=67129860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811633860.3A Active CN109994154B (en) 2017-12-30 2018-12-29 Screening device for candidate pathogenic genes of monogenic recessive genetic disease

Country Status (1)

Country Link
CN (1) CN109994154B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005017143A1 (en) * 2003-08-13 2005-02-24 Garvan Institute Of Medical Research Diagnosis and treatment of neurodegenerative disorders, involving the microtubule associated protein tau (mapt) gene
CN102206701A (en) * 2010-09-19 2011-10-05 深圳华大基因科技有限公司 Identification method for genetic disease-related gene
CN103290137A (en) * 2013-06-26 2013-09-11 北京迈基诺基因科技有限责任公司 Screening method of tumor susceptibility gene
US20130317006A1 (en) * 2012-05-24 2013-11-28 Amy Yasko Use of polymorphisms for identifying individuals at risk of developing autism
CN103667438A (en) * 2013-01-07 2014-03-26 赵晨 Method for screening HRDs disease-causing mutation and gene chip hybridization probe designing method involved in same
CN104789572A (en) * 2015-05-06 2015-07-22 南京医科大学 GPRASP2 mutant type gene as well as identification method and detection kit thereof
CN105779463A (en) * 2014-12-25 2016-07-20 深圳华大基因研究院 VPS13B gene mutant and application thereof
CN105838720A (en) * 2015-01-14 2016-08-10 中国人民解放军总医院 PTPRQ gene mutant and application thereof
US20160326586A1 (en) * 2013-10-18 2016-11-10 The Hospital For Sick Children Method of determining disease causality of genome mutations
CN106156538A (en) * 2016-06-29 2016-11-23 天津诺禾医学检验所有限公司 The annotation method of a kind of full-length genome variation data and annotation system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YOSHITSUGU AOKI et al.: "Challenges for antisense oligonucleotide-based therapeutics, in particular for exon 51-skipping in Duchenne muscular dystrophy", 《2011 FOURTH INTERNATIONAL CONFERENCE ON MODELING, SIMULATION AND APPLIED OPTIMIZATION》 *
乔瑞敏: "以家猪为模型解析先天性外耳发育畸形的遗传机制", 《中国博士学位论文全文数据库 医药卫生科技辑》 *
左宇志: "Notch信号通路与先天性脊柱侧凸的遗传学病因研究", 《中国博士学位论文全文数据库 医药卫生科技辑》 *
黎籽秀等: "高通量测序数据分析和临床诊断流程的解读", 《中国循证儿科杂志》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110544537A (en) * 2019-07-29 2019-12-06 北京荣之联科技股份有限公司 Generation method of single-gene genetic disease gene analysis report and electronic equipment thereof
CN110648722A (en) * 2019-09-19 2020-01-03 北京市儿科研究所 Device for evaluating neonatal genetic disease risk
CN110648722B (en) * 2019-09-19 2022-05-31 首都医科大学附属北京儿童医院 Device for evaluating neonatal genetic disease risk
CN110993031A (en) * 2019-11-07 2020-04-10 广州医科大学附属第三医院(广州重症孕产妇救治中心、广州柔济医院) Analysis method, analysis device, apparatus and storage medium for autism candidate gene
CN111139291A (en) * 2020-01-14 2020-05-12 首都医科大学附属北京安贞医院 High-throughput sequencing analysis method for monogenic hereditary diseases
CN112908412A (en) * 2021-02-10 2021-06-04 北京贝瑞和康生物技术有限公司 Methods, devices and media for compounding the applicability of heterozygous variant pathogenic evidence
CN113436681A (en) * 2021-07-05 2021-09-24 温州谱希医学检验实验室有限公司 Method and related equipment for performing associated statistical test on low-frequency variation and target disease
CN113436681B (en) * 2021-07-05 2022-02-25 温州谱希医学检验实验室有限公司 Method and related equipment for performing associated statistical test on low-frequency variation and target disease
CN113611361A (en) * 2021-08-10 2021-11-05 飞科易特(广州)基因科技有限公司 Matching method of single-gene autosomal recessive genetic disease for marriage and love matching
CN113611361B (en) * 2021-08-10 2023-08-08 飞科易特(广州)基因科技有限公司 Matching method for single-gene autosomal recessive genetic disease for wedding love matching
CN115798579A (en) * 2023-02-08 2023-03-14 智泽童康(广州)生物科技有限公司 Evidence judgment method, system, device and medium for genetic variation

Also Published As

Publication number Publication date
CN109994154B (en) 2021-07-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant