CN109994154A - Screening device for candidate pathogenic genes of single-gene recessive genetic diseases - Google Patents
- Publication number
- CN109994154A (application CN201811633860.3A)
- Authority
- CN
- China
- Prior art keywords
- variation
- gene
- module
- candidate
- disease
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates to a screening device for candidate pathogenic genes of single-gene recessive genetic diseases. The device comprises a family information analysis module, a variation information acquisition module, a variation information comparison module, an inheritance pattern determination module, a candidate site annotation module, and a result screening module.
Description
Technical field
The present invention relates to the field of genetic testing, and in particular to a screening device and method for candidate pathogenic genes of single-gene recessive genetic diseases.
Background technique
In the human genome, a disease caused by variation in a single pair of alleles is called a monogenic disorder. More than 6,600 such diseases are known, and the number grows by 10 to 50 each year. Monogenic disorders pose a considerable threat to human health; typical examples include red-green color blindness, hemophilia, and albinism. Human monogenic disorders fall into two major classes: those that follow Mendel's laws of inheritance and those that do not. The former comprise autosomal, X-linked, and Y-linked inheritance; the latter comprise imprinted-gene diseases and mitochondrial diseases. The causative mutation can arise in germ cells, in the fertilized egg, or in the early embryo, and the altered genetic information, once expressed, produces a disease with a characteristic abnormal trait.
An autosomal dominant disorder is one in which a variation in just one of the two alleles at an autosomal locus is enough to cause disease. More than 1,700 such diseases are known, for example familial multiple polyposis, polydactyly, and syndactyly. Their pedigree characteristics are: 1. inheritance is independent of sex, and males and females are equally likely to be affected; 2. one parent of a patient is usually also a patient; 3. if both parents are unaffected, their children are generally unaffected; 4. patients are usually heterozygous, and if a patient marries an unaffected person, each child has a 50% probability of being affected; 5. the disease commonly appears in successive generations. Sometimes, owing to internal or external environmental influences, a heterozygous individual carries the pathogenic gene without expressing it; this is incomplete penetrance. The penetrance of autosomal dominant disorders is 60% to 90%.
In autosomal recessive disorders the pathogenic gene is a recessive gene on an autosome, and disease appears only when the recessive gene is homozygous. Most patients are the offspring of two carriers, so these disorders are more common among the children of consanguineous marriages. About 1,200 such diseases have been identified, for example congenital deaf-mutism, albinism, and phenylketonuria. A heterozygous carrier of a recessive pathogenic gene does not express the corresponding trait but can pass the gene on to offspring. Each child of two carriers has a 1/4 probability of being affected, and sons and daughters are equally likely to be affected. In glycogen storage disease type I, for example, patients lack glucose-6-phosphatase, so glycogen cannot be fully broken down (glucose 6-phosphate cannot be converted to glucose) and accumulates; only individuals of genotype gg are affected. When both spouses are carriers (genotype Gg), each child has a 1/4 probability of being affected, and the expected ratio of unaffected to affected offspring is 3:1. Many inherited metabolic disorders are autosomal recessive. According to the concept of "one gene, one enzyme" or "one cistron, one polypeptide", the abnormal enzyme or protein in these inherited metabolic disorders results from an abnormality in the corresponding coding gene. The pedigree characteristics of autosomal recessive disorders are: 1. males and females are equally likely to be affected, and onset is independent of sex; 2. when both parents are unaffected carriers, each child has a 25% probability of being affected; 3. the disease is not seen in successive generations, but it can reappear after skipping one or several generations; 4. consanguineous marriage greatly increases the prevalence of recessive disorders among the children, because blood relatives are more likely to carry the same variant allele. For example, the incidence of phenylketonuria is 1:14,500 under random mating in the population but 1:1,700 among the children of cousin marriages. The incidence of generalized albinism is 1:40,000 in the population but 1:3,600 among the children of cousin marriages.
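The 1/4 risk and the 3:1 ratio quoted above for two carriers follow from enumerating the four equally likely allele combinations. The sketch below is only an illustration of that arithmetic, using the G/g notation from the text; the function name `offspring_genotypes` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict:
    """Return the probability of each offspring genotype for two parental
    genotypes written as two-letter strings such as "Gg"."""
    # Each parent passes on one of its two alleles with equal probability,
    # so every (allele from parent 1, allele from parent 2) pair is equally likely.
    combos = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(combos.values())
    return {genotype: Fraction(n, total) for genotype, n in combos.items()}


probs = offspring_genotypes("Gg", "Gg")
affected = probs.get("gg", Fraction(0))  # only gg individuals are affected
unaffected = 1 - affected
print(probs)                  # genotype -> probability
print(unaffected / affected)  # ratio of unaffected to affected offspring
```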
At present, the analysis of candidate genes for single-gene recessive genetic diseases involves the following steps: 1. determine candidate genes from the trait; 2. obtain the sequence information of the candidate genes; 3. analyze the variation information of the candidate genes; 4. determine whether a candidate gene is the pathogenic gene of a single-gene recessive genetic disease.
Summary of the invention
At present, whether a disease controlled by a single gene follows a recessive inheritance pattern, that is, whether the pathogenic gene is recessive, is generally judged from Mendel's laws of inheritance and the disease status of family members. This approach is inefficient and labor-intensive, is generally suitable only for judging a small number of genes, and cannot be run at high throughput.
The main object of the present invention is to detect, across the whole genome, genes that fit a recessive inheritance pattern, so that pathogenic genes can be screened efficiently and at high throughput. To this end, the invention provides an analysis device and method that locate candidate genes for single-gene recessive genetic diseases in combination with second-generation sequencing.
The present invention relates to a screening device for candidate pathogenic genes of single-gene recessive genetic diseases, as follows:
1. A screening device for candidate pathogenic genes of single-gene recessive genetic diseases, wherein the device comprises:
Family information collection module: for collecting the disease status of each member of the family to be analyzed;
Variation information acquisition module: for obtaining the exon variant site information of each member of the family to be analyzed;
Variation information comparison module: for comparing the variant site information of the family members obtained by the variation information acquisition module, taking the union, and determining the genotype of each member of the family;
Inheritance pattern determination module: for determining whether the inheritance pattern of a variant site fits the Mendelian recessive inheritance pattern;
Candidate site annotation module: for using databases to determine the gene in which each variant site that fits the Mendelian recessive inheritance pattern is located, its specific position in that gene, and the effect of the variation on the encoded protein;
Result screening module: for further screening the variant sites that fit the Mendelian recessive inheritance pattern; the variant sites remaining after this screening serve as the final candidate variant sites, and the genes containing them serve as the target candidate genes.
2. The device according to item 1, wherein the family information analysis module further draws a family pedigree chart from the kinship and disease status of each family member.
3. The device according to item 1, wherein the variation information acquisition module obtains the variant site information of each member of the family to be analyzed by whole-genome resequencing or whole-exome resequencing.
4. The device according to item 1, wherein the variant site information in the variation information comparison module includes: the chromosome on which the variation lies, the start site of the variation, the end site of the variation, the base of the site in the reference genome, and the base type after variation.
5. The device according to item 1, wherein the databases used in the candidate site annotation module to determine the gene containing a variant site that fits the Mendelian recessive inheritance pattern are the NCBI database and the UCSC database.
6. The device according to item 1, wherein the candidate site annotation module further performs:
disease-related annotation using the COSMIC, OMIM, and HGMD databases;
pathogenicity prediction annotation using the SIFT, POLYPHEN, CADD, and FATHMM-MKL databases;
conservation prediction using the phyloP20way_mammalian, phastConsElements46way, and gerp++elem databases.
7. The device according to item 1, wherein the result screening module performs the following steps:
Step A: retain variations in exonic regions and splice-site regions;
Step B: filter against the 1000 Genomes Project database, deleting variant sites whose population frequency is greater than 0.01 (polymorphic sites among individuals), to obtain rare variations that may be pathogenic; that is, retain variant sites whose frequency in the 1000 Genomes database is lower than 0.01;
Step C: delete synonymous variations to obtain variations that affect gene expression.
8. The device according to item 1, wherein the device further comprises:
Target candidate gene function annotation module: for annotating the function of the target candidate genes using databases;
Function enrichment analysis module: for performing function enrichment analysis on the candidate genes using the Gene Ontology database, to analyze whether the candidate genes have similar functions and whether they are enriched in functional pathways related to the disease; and
Pathway enrichment analysis module: for performing pathway enrichment analysis on the candidate genes using the Kyoto Encyclopedia of Genes and Genomes database, to judge whether the candidate genes are enriched in metabolic pathways related to the disease.
9. The device according to item 8, further comprising, after the pathway enrichment analysis module:
Protein interaction analysis module: for building a visual molecular interaction network with Cytoscape software to analyze the interactions between the proteins encoded by the candidate genes.
Effects of the invention
According to the present invention, data generated directly by second-generation sequencing, the mainstream technology in the research community, can be used to locate candidate sites and candidate genes associated with recessive genetic diseases across the whole genome.
The invention considers all variations across the whole genome, avoiding the bias and omissions caused by human factors and individual subjectivity. A single analysis can handle as many as 10^6 variant sites (or even more), and operation is simple: the analyst only needs to fill in a few required parameters to run a one-click analysis, achieving high throughput while remaining convenient and fast.
The invention searches the major databases related to human disease, so genes can be screened at several levels, and several databases can be used together to corroborate one another.
The invention also performs enrichment analysis with several databases. Besides analyzing the basic function of the candidate genes, it considers their roles and enrichment in different pathways, providing strong evidence for the final determination of candidate genes.
Specific embodiment
Definitions:
Family: the number of members of a given family, their kinship, and the distribution of the relevant genetic trait or genetic disease within that family. Herein the members are a father, a mother, and the offspring born to them, the number of offspring being one, or two or more;
Synonymous mutation: because the genetic code is degenerate, after a sequence change the position of the original amino acid is still translated into the same amino acid;
Splice-site region: the region of the RNA precursor recognizable by the spliceosome, comprising the sequence at the intron-exon boundary and the junction site;
Family pedigree chart: different symbols are used for different situations. Generally a male is represented by □ and a female by ○; a □ and a ○ joined by a horizontal line, called the marriage line, represent a married couple; a vertical line drawn downward from near the midpoint of the marriage line connects at its lower end to the symbols for the children; two or more children are arranged from left to right in birth order; ■ and ● indicate affected individuals;
Mendelian recessive inheritance pattern: the pathogenic gene is recessive, only homozygotes show symptoms, and the inheritance pattern of the disease follows Mendel's laws of inheritance.
The present invention relates to a screening device for candidate pathogenic genes of single-gene recessive genetic diseases. The device must be provided with the disease information of a nuclear family. It comprises the following modules:
Family information collection module: for collecting the disease status of each member of the family to be analyzed;
Variation information acquisition module: for obtaining the exon variant site information of each member of the family to be analyzed;
Variation information comparison module: for comparing the variant site information of the family members obtained by the variation information acquisition module, taking the union, and determining the genotype of each member of the family;
Inheritance pattern determination module: for determining whether the inheritance pattern of a variant site fits the Mendelian recessive inheritance pattern;
Candidate site annotation module: for using databases to determine the gene in which each variant site that fits the Mendelian recessive inheritance pattern is located, its specific position in that gene, and the effect of the variation on the encoded protein;
Result screening module: for further screening the variant sites that fit the Mendelian recessive inheritance pattern; the variant sites remaining after this screening serve as the final candidate variant sites, and the genes containing them serve as the target candidate genes.
The family collected in the family information collection module generally includes a father, a mother, and the offspring born to them, and is not limited to one child. Herein a typical family of two parents and one child is used for description. After the kinship between the child and the parents has been confirmed, the disease status of each family member is collected and checked for accuracy. Preferably, a family pedigree chart is drawn from the kinship and disease status.
Variation information acquisition module: obtains the whole-genome (or exome) single nucleotide variations (SNPs) and small insertion/deletion variations (InDels) of each sample. Human whole-genome resequencing or whole-exome resequencing is recommended for obtaining the variations of all samples in the family to be analyzed.
The conventional analysis strategy for second-generation sequencing data is: sequencing data alignment, variant detection, ANNOVAR annotation, then other downstream analyses. To connect seamlessly with this workflow, the invention starts from the per-individual variant detection results and detects the variant sites that fit a recessive inheritance pattern, so as to mine candidate genes associated with recessive genetic diseases.
Variation information comparison module: integrates the variant calls of the individuals in the family by taking the union of the family members' variations. Each variant site occupies one row. The first few columns give the chromosome on which the variation lies, the start site of the variation, the end site of the variation, the base of the site in the reference genome, and the base after variation; this information locates the variation and is also used for the subsequent ANNOVAR annotation of the variant sites. Next, the genotype of each site in each individual of the family is determined: a site homozygous for the reference allele is written 0/0, a site homozygous for the variant allele is written 1/1, and a heterozygous site is written 0/1. The invention performs this step with a purpose-built program.
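The 0/0, 0/1, 1/1 coding described above can be illustrated with a short sketch. The patent does not disclose its program, so the function `encode_genotype` and the row layout below are assumptions made purely for illustration.

```python
def encode_genotype(allele_a: str, allele_b: str, ref: str, alt: str) -> str:
    """Encode the two alleles observed at one site as 0/0, 0/1 or 1/1,
    where 0 is the reference base and 1 is the variant base."""
    codes = []
    for allele in (allele_a, allele_b):
        if allele == ref:
            codes.append(0)
        elif allele == alt:
            codes.append(1)
        else:
            raise ValueError(f"allele {allele!r} is neither ref nor alt")
    codes.sort()  # write heterozygotes consistently as 0/1
    return f"{codes[0]}/{codes[1]}"


# One row per variant site: locating columns first, then one genotype per member.
# The member names and alleles here are made-up example values.
row = {
    "chrom": "1", "start": 120611960, "end": 120611960, "ref": "C", "alt": "T",
    "father": encode_genotype("C", "T", "C", "T"),
    "mother": encode_genotype("T", "C", "C", "T"),
    "child": encode_genotype("T", "T", "C", "T"),
}
print(row)
```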
Inheritance pattern determination module: according to Mendel's laws of inheritance, a recessive homozygous variant site typically shows the following pattern: both parents are heterozygous at the site, the affected individual is homozygous for the variant, and the genotype differs from that of an unaffected sibling (the unaffected sibling may be heterozygous, or homozygous for a genotype different from the affected individual's variant genotype). Whether the inheritance pattern of each variant site fits the Mendelian recessive inheritance pattern is judged on this principle. The variations detected are of two kinds, SNPs and InDels.
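The rule stated above can be written as a small predicate over the 0/0, 0/1, 1/1 genotype codes. This is a minimal sketch of that rule only, not the patent's program; the function name and argument layout are hypothetical.

```python
from typing import Iterable


def fits_recessive_pattern(
    father: str,
    mother: str,
    affected: Iterable[str],
    unaffected_siblings: Iterable[str] = (),
) -> bool:
    """Check one variant site against the recessive pattern in the text:
    both parents heterozygous (0/1), every affected child homozygous for
    the variant (1/1), every unaffected sibling 0/1 or 0/0."""
    if father != "0/1" or mother != "0/1":
        return False
    if any(genotype != "1/1" for genotype in affected):
        return False
    return all(genotype in ("0/0", "0/1") for genotype in unaffected_siblings)


# Example genotypes for one site in a family with one affected child
# and one unaffected sibling (made-up values).
print(fits_recessive_pattern("0/1", "0/1", ["1/1"], ["0/1"]))
```

Applying the predicate to every row of the merged family table leaves only the sites that go forward to annotation.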
Candidate site annotation module: to make the annotation comprehensive, both the NCBI and UCSC databases are used to determine the gene in which each variant site lies, its specific position in the gene, and the effect of the variation on the encoded protein. In addition, several databases are used for other kinds of annotation: disease-related databases such as COSMIC, OMIM, and HGMD; pathogenicity prediction databases such as SIFT, POLYPHEN, CADD, and FATHMM-MKL; and conservation prediction databases such as phyloP20way_mammalian, phastConsElements46way, and gerp++elem.
Result screening module: further screens the recessive-inheritance variant sites obtained so far to find rare variations. The specific screening method is as follows:
1) retain variations in exonic regions (exonic) or splice-site regions (splicing; 2 bp upstream of the splice site);
2) filter against the 1000 Genomes Project database, deleting variant sites whose population frequency is greater than 0.01 (polymorphic sites among individuals), to obtain rare variations (rare) that may truly be pathogenic; that is, retain variant sites whose frequency in the 1000 Genomes database is lower than 0.01;
3) delete synonymous variations (variations that do not change the encoded amino acid) to obtain variations that affect the gene expression product (the retained variations include frameshift and non-frameshift variations in InDels).
The variant sites remaining after screening serve as the final candidate variant sites, and the genes containing these sites serve as the target candidate genes.
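The three screening steps above amount to a sequence of filters over annotated variant records. The sketch below assumes each record is a dictionary with hypothetical keys `region`, `freq_1000g`, and `effect`, with values modelled on ANNOVAR-style labels; treating a variant that is absent from the frequency database as rare is also an assumption.

```python
KEEP_REGIONS = {"exonic", "splicing"}


def screen_variants(variants, max_freq=0.01):
    """Apply steps 1) to 3) from the text to a list of annotated variants."""
    kept = []
    for v in variants:
        # 1) keep only exonic or splice-site variations
        if v["region"] not in KEEP_REGIONS:
            continue
        # 2) keep only variations with population frequency below max_freq;
        #    a missing frequency (None) is assumed to mean "not observed"
        freq = v.get("freq_1000g")
        if freq is not None and freq >= max_freq:
            continue
        # 3) drop synonymous variations
        if v.get("effect") == "synonymous SNV":
            continue
        kept.append(v)
    return kept


# Made-up records for illustration only.
example = [
    {"id": "a", "region": "exonic", "freq_1000g": 0.001, "effect": "nonsynonymous SNV"},
    {"id": "b", "region": "intronic", "freq_1000g": 0.001, "effect": None},
    {"id": "c", "region": "exonic", "freq_1000g": 0.2, "effect": "nonsynonymous SNV"},
    {"id": "d", "region": "exonic", "freq_1000g": None, "effect": "synonymous SNV"},
    {"id": "e", "region": "splicing", "freq_1000g": None, "effect": None},
]
candidates = screen_variants(example)
print([v["id"] for v in candidates])
```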
Herein, the screening device further includes:
Target candidate gene function annotation module: annotates the function of the target candidate genes using databases.
Function enrichment analysis module: performs function enrichment analysis on the candidate genes using the Gene Ontology (GO) database, to analyze whether the candidate genes have similar functions and whether they are enriched in functional pathways related to the disease.
Pathway enrichment analysis module: performs pathway enrichment analysis on the candidate genes using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, to see whether the candidate genes are enriched in metabolic pathways related to the disease.
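The text does not say which statistical test the enrichment modules use. A common choice for GO and KEGG enrichment is a one-sided hypergeometric test, sketched below purely as an assumption; the function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int, drawn: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes among `drawn`
    candidate genes, when `annotated` of the `population` background genes
    carry the GO term or belong to the pathway."""
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, drawn)


# Made-up example: 20000 background genes, 100 in the pathway,
# 50 candidate genes of which 5 fall in the pathway.
p_value = hypergeom_enrichment_p(20000, 100, 50, 5)
print(p_value)
```

A small p-value suggests that the candidate genes fall in the term or pathway more often than expected by chance; in practice the p-values would also need correction for the number of terms tested.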
The device further includes:
Protein interaction analysis module: uses the open-source software Cytoscape to build a visual molecular interaction network and analyze the interactions between the proteins encoded by the candidate genes.
Embodiment
This embodiment first concerns the family information collection module described above. With the rapid development of sequencing technology, the current mainstream technology, second-generation sequencing, is fast, accurate, and very high in throughput, so parsing genetic information across the whole genome has become quick and cheap, and second-generation sequencing has become a routine tool in many fields of research. The present invention is likewise based on second-generation sequencing. The details of the family to be analyzed are determined first. The family generally includes a father, a mother, and the offspring born to them; two or more offspring are of course also acceptable. For convenience, this specification uses a typical family of two parents and one child in the description that follows. The information to be confirmed is as follows:
First, the genetic relationship between the child and the parents must be established. The following situations are not suitable for analysis with the method of the invention: 1) for an adopted child, the family formed by the child and the adoptive parents cannot be analyzed; 2) for a three-person nuclear family, analysis is not possible if the genotype of one parent cannot be obtained or that parent's disease status cannot be determined (for example, the parent died early and the disease has a late onset); 3) a family with genetic information and disease status for only one parent and several offspring can be analyzed, but the false-positive rate of the results will be relatively high.
Second, the disease status of each family member must be established, that is, the phenotype must be identified. For a disease with clinical manifestations the phenotype can be judged directly; for a disease without clinical manifestations the phenotype has to be determined by physical and chemical tests of the level of a substance in the body.
Third, the invention locates candidate genes only for recessive genetic diseases, so fitting a recessive inheritance pattern is a prerequisite, and it is first necessary to judge whether the inheritance pattern of the disease is recessive. According to Mendel's laws of inheritance, a recessive disorder typically presents with phenotypically normal parents and both affected and unaffected individuals among the offspring. If the phenotype has been identified for only one generation of offspring, both parents must be phenotypically normal and one of the offspring must be affected; if phenotypes have been identified for several generations, the disease status across generations can be judged against Mendel's laws to decide whether the pattern is recessive. In this embodiment, the example is a family in which the parents are normal and the child has a microtia phenotype.
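The eligibility conditions above can be summarized as a simple check on a nuclear family. This sketch is stricter than the text in one respect: it requires both parents, whereas the text allows a single-parent family with a higher false-positive rate. The `Member` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    role: str                 # "father", "mother" or "child"
    affected: bool            # result of phenotype identification
    biological: bool = True   # False for an adoptive relationship


def family_is_analyzable(members: List[Member]) -> bool:
    """Return True if the family fits the recessive set-up in the text:
    biological relationships, both parents present and phenotypically
    normal, and at least one affected child."""
    parents = [m for m in members if m.role in ("father", "mother")]
    children = [m for m in members if m.role == "child"]
    if not all(m.biological for m in members):
        return False
    if sorted(m.role for m in parents) != ["father", "mother"]:
        return False
    if any(p.affected for p in parents):
        return False
    return any(c.affected for c in children)


trio = [Member("father", False), Member("mother", False), Member("child", True)]
print(family_is_analyzable(trio))
```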
In the variation information acquisition module, the variations covering the whole genome of the eligible family members are obtained; they include single nucleotide variations (SNPs) and small insertion/deletion variations (InDels). They are usually obtained as follows. First, DNA is extracted from the blood of each family member (other tissue samples are also acceptable); different extraction methods can be used depending on the condition of the DNA sample. Next, the DNA is randomly fragmented into short DNA fragments using an automated focused-acoustic sample processor. Suitable adapters are ligated to the randomly fragmented DNA, the adapter-ligated fragments are enriched by PCR, and the enriched fragments are purified. In the present invention the prepared library is sequenced by high-throughput sequencing; the platform may be, for example, the Illumina HiSeq 2000, HiSeq 2500, MiSeq, NextSeq 500, or X Ten (exon capture is required for exome sequencing; there are no special requirements on sequencing type or insert size). Although the sequencing modes of different platforms differ slightly, sequencing data are ultimately obtained from the DNA of the sample to be tested. The sequencing output consists of raw reads, which include low-quality reads, adapter-contaminated reads, reads in which more than 5% of the bases are N, and clean reads. The higher the proportion of clean reads, the better the data quality. To safeguard the quality of downstream analysis, the raw reads are filtered to obtain high-quality clean reads before further analysis. The data processing steps are as follows:
(1) remove adapter-contaminated reads (reads with more than 5 bp of adapter contamination; for paired-end sequencing, if one end is contaminated by adapter, the reads at both ends are removed);
(2) remove low-quality reads (reads in which bases with quality value Q ≤ 19 account for 50% or more of all bases; for paired-end sequencing, if one end is a low-quality read, the reads at both ends are removed);
(3) remove reads in which more than 5% of the bases are N (for paired-end sequencing, if more than 5% of the bases at one end are N, the reads at both ends are removed).
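The three filtering rules above can be sketched as a per-read check plus a pair-level rule. The thresholds come from the text; the function names are hypothetical, and the number of adapter-contaminated bases is assumed to have been counted by an upstream adapter-detection step that is not shown.

```python
def is_clean(seq, quals, adapter_bases=0,
             max_adapter=5, low_q=19, low_frac=0.5, max_n_frac=0.05):
    """Return True if a single read passes rules (1) to (3)."""
    # (1) more than 5 bp of adapter contamination
    if adapter_bases > max_adapter:
        return False
    # (2) bases with Q <= 19 make up 50% or more of the read
    low_quality = sum(1 for q in quals if q <= low_q)
    if low_quality >= low_frac * len(quals):
        return False
    # (3) more than 5% of the bases are N
    if seq.upper().count("N") > max_n_frac * len(seq):
        return False
    return True


def keep_pair(read1, read2):
    """Paired-end rule: if either end fails, both ends are removed.
    Each read is a (seq, quals, adapter_bases) tuple."""
    return is_clean(*read1) and is_clean(*read2)


# Made-up 100 bp reads for illustration.
good = ("ACGT" * 25, [30] * 100, 0)
low_quality = ("ACGT" * 25, [10] * 50 + [30] * 50, 0)
print(keep_pair(good, good), keep_pair(good, low_quality))
```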
After the reads that readily cause analysis errors have been filtered out, the UCSC hg19 reference genome is used as the reference sequence and the filtered clean reads are aligned to it with alignment software (BWA is recommended), giving alignment results in BAM format (http://samtools.github.io/hts-specs/SAMv1.pdf). The BAM file is sorted with Samtools (Li et al., 2009), only uniquely aligned reads are kept, duplicate reads in the alignment are marked with Picard (http://broadinstitute.github.io/picard/), and the sequences around InDels are locally realigned with GATK (McKenna, A. et al., 2010) to reduce false-positive SNP calls, yielding a high-accuracy BAM file for variant detection. For exome sequencing analysis, the quality of the sample, library construction, and sequencing must be assessed from metrics such as target-region capture rate, mapping rate, and coverage; the Uniq Rate directly reflects the proportion of reads uniquely aligned to the reference genome, and the Duplication Rate reflects the quality of library construction. On the basis of the alignment to the reference genome, the variant analysis software GATK extracts all potential polymorphic SNP and InDel sites across the whole genome, and further filtering on factors such as quality value, depth, and repetitiveness gives the final high-confidence SNP and InDel variant data set.
In the variation information comparison module, the variant sites obtained by the variation information acquisition module are the results of per-individual detection rather than population-level detection, so this module integrates the variations of each sample in the family. Because GATK, when detecting the variant sites of a single individual, outputs only the mutated sites (non-gVCF), and each individual has its own unique variant sites, the variant sites of the whole family must be merged and the final genotypes determined. First, the variant sites of each sample in the family are merged on chromosome number, start position, end position, reference-genome base, and mutated base, and their union is taken to give a total variant site set M1. Finally, the genotype of every individual in the family at each variant site in M1 is compiled, where a site homozygous for the reference allele is written 0/0, a site homozygous for the mutation is written 1/1, and a heterozygous site is written 0/1. For speed and efficiency, the invention performs this step with a purpose-built program.
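The merge into the total set M1 can be sketched as a set union over site keys followed by a lookup per family member. The function name is hypothetical, and filling a site that is absent from an individual's call set with 0/0 is an assumption that follows from the non-gVCF output described above; it ignores the possibility that the site was simply not covered in that individual.

```python
def merge_family_calls(calls):
    """Build the family genotype table.

    `calls` maps each member to a dict keyed by the site tuple
    (chrom, start, end, ref, alt) with the genotype code as value.
    Returns {site: {member: genotype}} over the union of all sites (M1).
    """
    m1 = sorted(set().union(*(member_calls.keys() for member_calls in calls.values())))
    return {
        site: {member: calls[member].get(site, "0/0") for member in calls}
        for site in m1
    }


# Made-up per-individual calls for illustration.
site_a = ("1", 120611960, 120611960, "C", "T")
site_b = ("2", 5000, 5000, "G", "A")
family_calls = {
    "father": {site_a: "0/1"},
    "mother": {site_a: "0/1", site_b: "0/1"},
    "child": {site_a: "1/1"},
}
table = merge_family_calls(family_calls)
for site, genotypes in table.items():
    print(site, genotypes)
```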
In the inheritance pattern determination module, the inheritance pattern is judged. According to Mendel's laws of inheritance, a recessive homozygous mutation site typically shows the following pattern: both parents are heterozygous at the site, the affected individual is homozygous for the mutation, and the genotype differs from that of an unaffected sibling (the unaffected sibling may be heterozygous, or homozygous for a genotype different from the affected individual's mutated genotype). Whether the inheritance pattern of each variant site fits the recessive inheritance pattern is judged on this principle. For human whole-genome resequencing data, the total number of SNP and InDel variations is usually 10^5 to 10^6. Even for whole-exome data, the two kinds of variation detected number 10^4 to 10^5, so it is impossible to judge the inheritance pattern site by site by eye. The invention has developed a corresponding program to perform this step.
In the candidate locus annotations module, the analysis by the program in the hereditary pattern determination module yields all variant sites that fit the recessive inheritance pattern. These sites are spread across different locations in the genome: 1) they may lie within a gene or in an intergenic region; 2) they may lie in an exon, in an intronic region, or at a splice site; 3) they may cause a synonymous mutation or a nonsynonymous mutation. To determine where these variants are located on a gene and how they affect the encoded protein, the variants obtained in step S4 are annotated with ANNOVAR. To make the annotation comprehensive, the NCBI and UCSC databases are used together to determine the gene in which each variant site lies and its specific position within that gene. For example, variant site 1 is located on chromosome 1, with start position 120611960 and end position 120611960; the base in the reference genome is C and the base after mutation is T. The site lies in an exon of the NOTCH2 gene and is a nonsynonymous mutation: it changes amino acid 21, encoded in the first exon of NOTCH2, from A to T. In addition, multiple databases are used for other kinds of annotation, for example the disease-related databases COSMIC, OMIM and HGMD, among others; the pathogenicity prediction databases SIFT, POLYPHEN, CADD and FATHMM-MKL, among others; and the conservation prediction databases phyloP20way_mammalian, phastConsElements46way and gerp++elem, among others.
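To make the shape of an annotated site concrete, the NOTCH2 example above can be held in a small record. This is an illustrative sketch only: the class name, the field names and the summary format are assumptions and do not reproduce the output format of ANNOVAR or of any database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedVariant:
    """One annotated variant site (hypothetical field names)."""
    chrom: str      # chromosome on which the variant lies
    start: int      # start position of the variant
    end: int        # end position of the variant
    ref: str        # base in the reference genome
    alt: str        # base after mutation
    gene: str       # gene in which the site lies
    region: str     # e.g. "exonic", "splicing", "intronic", "intergenic"
    effect: str     # e.g. "nonsynonymous", "synonymous"
    exon: int       # exon number within the gene
    aa_change: str  # amino acid change, e.g. "A21T"

    def summary(self) -> str:
        """Return a one-line description of the site."""
        return (f"chr{self.chrom}:{self.start} {self.ref}>{self.alt} "
                f"{self.gene} exon{self.exon} {self.effect} {self.aa_change}")


# Variant site 1 from the text.
site1 = AnnotatedVariant("1", 120611960, 120611960, "C", "T",
                         "NOTCH2", "exonic", "nonsynonymous", 1, "A21T")
print(site1.summary())
```

Disease, pathogenicity and conservation annotations could be attached to the same record as further optional fields.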
For the result screening module: mutation is the basic source of heritable variation, and mutations in both eukaryotes and prokaryotes, of whatever type, are random, occur at low frequency and are reversible. Therefore, not every mutation causes human disease. The recessive-inheritance variant sites obtained from the hereditary pattern determination module are screened further to obtain rare mutations. The specific screening method is as follows:
1) Variants located in exonic regions and at splice sites may change protein function, and most diseases are caused by changes in protein function, whereas variants located in intergenic and intronic regions mostly do not affect function. To further narrow the range of functional sites, the result of S4 is screened further, retaining variants in exonic regions (exonic) or splice-site regions (splicing, 2 bp upstream of the splice site);
2) Because the 1000 Genomes database contains data from normal individuals, variant sites that occur at high frequency in this database will not be pathogenic sites of a monogenic disease. The following filter is therefore applied: variant sites whose population frequency in the 1000 Genomes database is greater than 0.01 are filtered out, which removes sites of inter-individual diversity and leaves the rare mutations (rare) that may truly be pathogenic; that is, mutation sites with a frequency lower than 0.01 in the 1000Genome database are retained;
3) Synonymous variants (mutations that do not change the encoded amino acid) do not affect the protein sequence, so synonymous variants are removed to obtain variants that affect the gene expression product (the retained mutations include frameshift and non-frameshift InDel mutations).
The variant sites remaining after screening are taken as the final candidate mutation sites, and the genes containing these sites are taken as the target candidate genes.
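The three screening steps and the final collection of candidate genes can be sketched as a simple filter. This is a hedged illustration, not the program of the invention: the record type, the field names and the gene names are hypothetical, a variant absent from the frequency database is assumed to be rare, and a frequency of exactly 0.01 is assumed to be filtered out.

```python
from typing import Iterable, List, NamedTuple, Optional


class Variant(NamedTuple):
    """Minimal annotated variant (hypothetical field names)."""
    gene: str
    region: str                  # "exonic", "splicing", "intronic", ...
    effect: str                  # "synonymous", "nonsynonymous", "frameshift", ...
    freq_1000g: Optional[float]  # population frequency; None if not in the database


MAX_FREQ = 0.01  # frequency threshold stated in step 2


def passes_filters(v: Variant) -> bool:
    """Apply the three screening steps to one variant."""
    if v.region not in {"exonic", "splicing"}:
        return False  # step 1: keep exonic and splice-site variants only
    if v.freq_1000g is not None and v.freq_1000g >= MAX_FREQ:
        return False  # step 2: drop variants that are common in the population
    if v.effect == "synonymous":
        return False  # step 3: drop synonymous variants
    return True


def candidate_genes(variants: Iterable[Variant]) -> List[str]:
    """Return the sorted names of genes containing a surviving variant."""
    return sorted({v.gene for v in variants if passes_filters(v)})


# Made-up input: only GENE_A and GENE_D survive all three steps.
variants = [
    Variant("GENE_A", "exonic", "nonsynonymous", 0.002),
    Variant("GENE_B", "intronic", "nonsynonymous", 0.001),
    Variant("GENE_C", "exonic", "nonsynonymous", 0.2),
    Variant("GENE_D", "exonic", "frameshift", None),
    Variant("GENE_E", "exonic", "synonymous", 0.0),
]
print(candidate_genes(variants))
```

Because the three conditions are independent, they can be applied in any order and give the same result.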
The results show that the sample carries a 1717 (A/C) variant in the SIX2 gene on chromosome 2; a 109268573 (T/C) variant in FNDC7 on chromosome 1; a 195452951 (C/G) variant in MUC20 on chromosome 3; a 140229086 (C/G) variant and a 141336264 (G/T) variant in the genes PCDHA9 and PCDH12 on chromosome 5; a 21207000 (C/G) variant in IFNA10 on chromosome 7; a 125830970 (A/T) variant in the CDON gene on chromosome 11; a 79058445 (G/A) variant in ADAMTS7 on chromosome 15; and variants in KIZ and BPIFA2 on chromosome 20, among others, for a total of 40 genes with variants. These gene variants are of great significance for research on microtia.
The above description shows and describes preferred embodiments of the present invention. As stated above, it should be understood that the invention is not limited to the forms disclosed herein, which should not be regarded as excluding other embodiments; the invention can be used in various other combinations, modifications and environments, and can be altered within the scope of the inventive concept described herein through the above teachings or through the skill or knowledge of the related art. Changes and modifications made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the scope of protection of the appended claims.
Claims (9)
1. A screening device for candidate pathogenic genes of a single-gene recessive genetic disorder, characterized in that the device comprises:
a family information collection module, for collecting the disease status of each member of the family to be analyzed;
a variation data obtaining module, for obtaining the exon variant site information of each member of the family to be analyzed;
a variation information comparison module, for comparing the variant site information of each member of the family to be analyzed obtained by the variation data obtaining module, taking the union, and judging the genotype of each member of the family;
a hereditary pattern determination module, for determining whether the hereditary pattern of a variant site fits the Mendelian recessive inheritance pattern;
a candidate locus annotations module, for using databases to determine the gene in which each variant site fitting the Mendelian recessive inheritance pattern is located, its specific position in that gene, and the effect of the variant on the encoded protein; and
a result screening module, for further screening the variant sites fitting the Mendelian recessive inheritance pattern obtained by screening, the variant sites remaining after this screening being taken as the final candidate variant sites and the genes containing these sites as the target candidate genes.
2. The device according to claim 1, characterized in that the family information analysis module further draws a family pedigree according to the kinship and disease status of each member of the family.
3. The device according to claim 1, characterized in that the variation data obtaining module obtains the variant site information of each member of the family to be analyzed by whole-genome resequencing or whole-exome resequencing.
4. The device according to claim 1, characterized in that the variant site information in the variation information comparison module includes: the chromosome on which the variant is located, the start position of the variant, the end position of the variant, the genotype of the variant on the reference genome, and the base type after variation.
5. The device according to claim 1, characterized in that the databases used in the candidate locus annotations module to determine the gene in which a variant site fitting the Mendelian recessive inheritance pattern is located are the NCBI database and the UCSC database.
6. The device according to claim 1, characterized in that the candidate locus annotations module further comprises:
performing disease-related annotation using the COSMIC, OMIM and HGMD databases;
performing pathogenicity prediction annotation using the SIFT, POLYPHEN, CADD and FATHMM-MKL databases; and
performing conservation prediction using the phyloP20way_mammalian, phastConsElements46way and gerp++elem databases.
7. The device according to claim 1, characterized in that the result screening module performs the following steps:
Step A: retaining variants in exonic regions and splice-site regions;
Step B: filtering out variant sites whose population frequency in the 1000 Genomes database is greater than 0.01, thereby deleting sites of inter-individual diversity and obtaining the rare variants that may be pathogenic, the variant sites whose frequency in the 1000 Genomes database is lower than 0.01 being retained;
Step C: deleting synonymous variants to obtain variants that affect gene expression.
8. The device according to claim 1, characterized in that the device further comprises:
a target candidate gene function annotation module, for annotating the functions of the target candidate genes using databases;
a function enrichment analysis module, for performing function enrichment analysis on the candidate genes using the Gene Ontology database, so as to analyze whether the functions of these candidate genes are similar and whether they are enriched in functional pathways related to the disease; and
a pathway enrichment analysis module, for performing pathway enrichment analysis on the candidate genes using the Kyoto Encyclopedia of Genes and Genomes database, so as to judge whether these candidate genes are enriched in metabolic pathways related to the disease.
9. The device according to claim 8, characterized in that the device further comprises, after the pathway enrichment analysis module:
a protein interaction analysis module, which uses Cytoscape software to build a visualized molecular interaction network and analyzes the interactions between the proteins encoded by the candidate genes.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711488527 | 2017-12-30 | ||
CN2017114885273 | 2017-12-30 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109994154A true CN109994154A (en) | 2019-07-09 |
CN109994154B CN109994154B (en) | 2021-07-13 |
Family
ID=67129860
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811633860.3A Active CN109994154B (en) | 2017-12-30 | 2018-12-29 | Screening device for candidate pathogenic genes of monogenic recessive genetic disease |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109994154B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN109994154B (en) | 2021-07-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |