CN109033751A - Function prediction method for single-nucleotide genomic variants in non-coding regions - Google Patents

Function prediction method for single-nucleotide genomic variants in non-coding regions

Info

Publication number
CN109033751A
Authority
CN
China
Legal status
Granted
Application number
CN201810804405.9A
Other languages
Chinese (zh)
Other versions
CN109033751B (en)
Inventor
刘宏德
孙啸
罗坤
马伟恒
Current Assignee
Southeast University
Original Assignee
Southeast University
Application filed by Southeast University
Priority to CN201810804405.9A
Publication of CN109033751A
Application granted
Publication of CN109033751B
Status: Active


Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a function prediction method for single-nucleotide genomic variants in non-coding regions, comprising the following steps: 1) identifying open chromatin regions; 2) identifying transcription factor binding sites; 3) assessing the effect of single-nucleotide variants: based on the position-specific frequency matrix of each transcription factor, the influence of single-nucleotide variants located in transcription factor binding site regions on transcription factor binding is calculated, and the variants that significantly change transcription factor binding ability are identified; the effect of these variants is then further assessed by examining the biological pathways of the transcription factors' target genes. Using open chromatin information together with gene expression information, the method identifies multiple transcription factors and their binding sites in a single pass and provides functional annotation of genomic variants in non-coding regions.

Description

Function prediction method for single-nucleotide genomic variants in non-coding regions
Technical field
The invention belongs to the field of gene technology and relates specifically to a function prediction method for single-nucleotide genomic variants in non-coding regions. It comprises a method for identifying transcription factors (TFs) and their binding sites in eukaryotic genomes from high-throughput sequencing data of open chromatin regions, and a method for assessing the influence of single-nucleotide variants based on the DNA-binding motifs of transcription factors.
Background art
All biological functions and characteristics of a cell are associated with the transcriptional regulation of genes. Transcriptional regulation is cell-type specific and closely tied to differentiation and carcinogenesis, which makes it key to understanding how cells work and to solving the problem of cancer. The first task in analyzing transcriptional regulation is to identify the binding sites (TFBSs) of the various transcription factors (TFs) on the genomic DNA of a cell, that is, to determine which transcription factor binds at which genomic position and regulates the transcription of which gene. Currently, genome-wide high-throughput measurement of transcription factors and their binding sites is mainly achieved by chromatin immunoprecipitation sequencing (ChIP-Seq). ChIP-Seq uses an antibody against a transcription factor to capture, on chromatin, the genomic DNA bound by that factor; this DNA is then isolated and purified, the base sequences of the DNA fragments (reads) are determined by next-generation sequencing (NGS), and finally the reads are aligned back to the genome to locate the fragments and thus determine the binding sites of the transcription factor. The drawback of this method is that one experiment measures the binding of only one transcription factor, which is costly and time-consuming.
Transcriptional regulation is tightly coupled with processes such as nucleosome positioning and chromatin opening. Eukaryotic DNA exists in the form of chromatin, whose basic structural unit is the nucleosome. In general, a necessary precondition for a transcription factor to bind the DNA of its binding site is that the nucleosomes around the binding site are rearranged to form an open chromatin region in which the DNA double helix is exposed. Open chromatin regions in the genome are therefore the regions where transcription factors may bind. High-throughput methods for detecting open chromatin regions include DNase-Seq, ATAC-Seq and FAIRE-Seq. In addition, transcription factor binding is DNA sequence-specific: the DNA bound by a transcription factor follows a specific pattern of composition and order (a motif). Open chromatin information and the sequence features (motifs) of transcription factor binding sites can therefore be combined to identify, in one pass, multiple transcription factors (potentially all transcription factors with known motifs) and their binding sites; this is an important part of the invention.
Although the non-coding regions of the genome do not directly encode proteins, they are important for regulating the transcription of coding sequences and contain numerous elements that control gene expression, such as enhancers and transcription factor binding sites, or the regions these elements bind.
As research into the association between variation and disease has deepened, growing evidence shows that variants in functional elements of non-coding regions are closely linked to genetic diseases. In particular, most of the risk single-nucleotide variants/polymorphisms (SNVs/SNPs) for various diseases uncovered by genome-wide association studies (GWAS) lie in non-coding regions of the genome. If such a non-coding variant changes the affinity with which a transcription factor binds DNA at a binding site, it can in turn change the transcription level of the downstream target gene and alter the phenotype of the cell. Assessing how SNVs located in transcription factor binding sites affect transcription factor binding is therefore essential for analyzing cell differentiation and carcinogenesis and for annotating the function of individual genomes. For example, the oncogene c-MYC has a regulatory region (8q24) 335 kb away from it; the G allele of SNP rs6983267 in this region binds the transcription factor TCF4 with strong affinity, leading to uncontrolled high expression of c-MYC and a cancer phenotype. A model is therefore needed that assesses the effect of non-coding variants and predicts their effect on cell phenotype.
Unfortunately, no method currently exists that systematically identifies the binding sites of multiple transcription factors in one pass and assesses how nucleotide variants located in these binding site regions affect transcription factor binding and downstream gene transcription.
Summary of the invention
Objective of the invention: the invention uses high-throughput sequencing data of open chromatin regions to establish a method for identifying transcription factors and their binding sites, and establishes a method and model for assessing the effect on transcriptional regulation of non-coding nucleotide variants located in transcription factor binding site regions. The object of the invention is to provide a function prediction method for single-nucleotide genomic variants in non-coding regions.
Technical solution: to solve the above technical problem, the invention adopts the following technical solution: a function prediction method for single-nucleotide genomic variants in non-coding regions, comprising the following steps:
1) identifying open chromatin regions;
2) identifying transcription factor binding sites: the binding sites of transcription factors in open chromatin regions are scanned using the position-specific frequency matrix (PSSM) of each transcription factor; the transcription factors are determined based on the expression levels of the genes encoding them; and genes within 8,000 bases downstream of a transcription factor binding site are taken as target genes of that transcription factor;
3) assessing the effect of single-nucleotide variants: based on the position-specific frequency matrix of each transcription factor, the influence of single-nucleotide variants located in transcription factor binding site regions on transcription factor binding is calculated, and the variants that significantly change transcription factor binding ability are identified; the effect of these variants is further assessed by examining the biological pathways of the transcription factors' target genes.
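Step 2) scans open chromatin for motif matches with a PSSM. The Python sketch below illustrates one common way to do such a scan (a log-odds score against a uniform background, checked on both strands); the toy three-position matrix, the 0.25 background, the pseudocount and the score threshold are all illustrative assumptions, and Embodiment 2 actually performs this step with the Homer tool.

```python
import math

BASES = "ACGT"  # column order assumed for every matrix row
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds(window, pfm, background=0.25, pseudo=1e-3):
    """Score one window: sum of log2(p / background) over motif positions."""
    total = 0.0
    for pos, base in enumerate(window):
        p = pfm[pos][BASES.index(base)]
        total += math.log2((p + pseudo) / background)
    return total


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq, pfm, threshold):
    """Return (start, strand, score) for every window scoring >= threshold.

    Assumes an uppercase A/C/G/T sequence (no N bases).
    """
    width = len(pfm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        for strand, w in (("+", window), ("-", reverse_complement(window))):
            score = log_odds(w, pfm)
            if score >= threshold:
                hits.append((start, strand, round(score, 3)))
    return hits


# Hypothetical 3-position matrix with consensus "TGA"
# (one row per motif position; columns A, C, G, T).
toy_pfm = [
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.7, 0.1],
    [0.7, 0.1, 0.1, 0.1],
]

print(scan("CCTGACC", toy_pfm, threshold=4.0))
```

Only the forward-strand "TGA" window at offset 2 clears the threshold here; in practice the threshold would be calibrated per matrix (for example against a p-value), which this sketch does not attempt.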
In step 1), open chromatin regions are identified as follows: the open chromatin sequencing data undergo read quality control, read alignment (mapping) to the reference genome, and identification of read-enriched regions.
Reads with a quality value Q ≥ 30, i.e. a sequencing error rate ≤ 0.001, are used for alignment.
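The Q ≥ 30 threshold and the 0.001 error rate are two views of the same Phred scale, Q = -10·log10(error). A minimal Python sketch, assuming Phred+33-encoded FASTQ quality strings and a per-read mean-quality rule (the text does not say whether the cut-off is applied per base or per read):

```python
def phred_to_error(q):
    """A Phred quality Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in qual_string]


def passes_quality(qual_string, min_q=30):
    """Keep a read only if its mean base quality reaches min_q (assumed rule)."""
    quals = decode_phred33(qual_string)
    return sum(quals) / len(quals) >= min_q


print(phred_to_error(30))           # error probability at Q30
print(passes_quality("IIIIIIII"))   # 'I' encodes Q40
print(passes_quality("########"))   # '#' encodes Q2
```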
The read-enriched regions are modeled with a Poisson distribution, from which an enrichment significance value P is computed, where k is the read count at a genomic locus and λ is the genome-wide mean read count. The threshold for the enrichment significance value P is $10^{-5}$.
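The formula for the enrichment significance did not survive on this page. A common choice for this kind of Poisson peak test, assumed here rather than taken from the patent, is the upper-tail probability P = P(X ≥ k) for X ~ Poisson(λ), compared with the $10^{-5}$ threshold. A minimal Python sketch:

```python
import math


def poisson_upper_tail(k, lam, tol=1e-15):
    """P(X >= k) for X ~ Poisson(lam), lam > 0, summed directly from k upward.

    Summing the tail (instead of 1 - CDF) keeps precision for tiny p-values.
    """
    if k <= 0:
        return 1.0
    # log P(X = k) = -lam + k*ln(lam) - ln(k!)
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i          # P(X = i) from P(X = i - 1)
        if term <= tol * total:  # remaining terms are negligible
            break
    return total


def is_enriched(k, lam, threshold=1e-5):
    """Call a locus enriched when its tail probability is at or below threshold."""
    return poisson_upper_tail(k, lam) <= threshold


# Hypothetical locus counts against a genome-wide mean of 2 reads.
print(is_enriched(10, 2.0), is_enriched(12, 2.0))  # → False True
```

Production peak callers such as MACS2 additionally use a local λ estimated from surrounding windows; the sketch uses a single genome-wide λ, as described in the text.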
The open chromatin sequencing data in step 1) are one or more of: DNase I hypersensitive site sequencing data (DNase-Seq), formaldehyde-assisted isolation of regulatory elements sequencing data (FAIRE-Seq), and assay for transposase-accessible chromatin sequencing data (ATAC-Seq).
The gene expression data in step 2) are microarray data or RNA sequencing data (RNA-Seq).
When the expression level of a gene satisfies FPKM ≥ 8, the transcription factor it encodes is considered highly abundant in the cell.
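The FPKM ≥ 8 rule can be sketched as follows, assuming the usual FPKM definition (fragments mapped to a gene, scaled by gene length in kilobases and by total mapped fragments in millions). The gene names and counts are hypothetical; in Embodiment 2 the FPKM values come from Cufflinks.

```python
def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


def abundant_tfs(counts, lengths, total_fragments, cutoff=8.0):
    """Return the TF genes whose FPKM reaches the cutoff, sorted by name."""
    return sorted(
        gene for gene, n in counts.items()
        if fpkm(n, lengths[gene], total_fragments) >= cutoff
    )


# Hypothetical TF genes: 400 and 100 fragments on 2 kb transcripts, 20 M fragments total.
counts = {"TF_A": 400, "TF_B": 100}
lengths = {"TF_A": 2000, "TF_B": 2000}
print(abundant_tfs(counts, lengths, total_fragments=20_000_000))  # → ['TF_A']
```

TF_A reaches FPKM 10 and passes, while TF_B (FPKM 2.5) is treated as too scarce to act on its binding sites.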
The effect of a single-nucleotide variant in step 3) is assessed with Formula 1:
$$F = -\log_{10}\frac{P(i,j)}{P(i,k)} = \log_{10}\frac{P(i,k)}{P(i,j)} \qquad \text{(Formula 1)}$$
where P(i, j) and P(i, k) are the values of base j and base k at position i of the position-specific frequency matrix, and j and k are each one of adenine, guanine, cytosine and thymine.
Here base k (genotype k) is the mutant base or the base with low population frequency, and base j (genotype j) is the wild-type base or the base with high population frequency. A positive F indicates that the mutant or low-frequency genotype increases transcription factor affinity; a negative F indicates that the wild-type or high-frequency genotype increases transcription factor affinity. The larger a positive F, the higher the affinity of the mutant or low-frequency genotype for the transcription factor; conversely, the smaller (more negative) F, the higher the affinity of the wild-type or high-frequency genotype. F = 0 indicates that the genotype has no influence on transcription factor binding.
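Formula 1 and its sign convention translate directly into code. In this Python sketch the matrix row is the position-9 column of the TCF4 matrix quoted in Embodiment 1 (G = 0.48, C = 0.20, T = 0.07), with A = 0.25 filled in as an assumption so that the row sums to 1; the optional pseudocount is also an assumption, useful when a matrix contains zeros.

```python
import math

COLUMN = {"A": 0, "C": 1, "G": 2, "T": 3}


def f_score(pfm_row, major_base, minor_base, pseudo=0.0):
    """Formula 1: F = log10(P(i,k) / P(i,j)).

    major_base is base j (wild type / high frequency),
    minor_base is base k (mutant / low frequency).
    """
    p_j = pfm_row[COLUMN[major_base]] + pseudo
    p_k = pfm_row[COLUMN[minor_base]] + pseudo
    return math.log10(p_k / p_j)


def interpret(f):
    if f > 0:
        return "mutant/low-frequency allele increases TF affinity"
    if f < 0:
        return "wild-type/high-frequency allele increases TF affinity"
    return "no effect on TF binding"


row9 = [0.25, 0.20, 0.48, 0.07]  # A, C, G, T (the A value is assumed)
f = f_score(row9, major_base="T", minor_base="G")
print(round(f, 2), interpret(f))  # → 0.84 mutant/low-frequency allele increases TF affinity
```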
Beneficial effects: compared with the prior art, the invention has the following advantages:
1. The invention can predict multiple transcription factors in the genome of a target sample simultaneously, whereas ChIP-Seq requires the same sample to be sequenced repeatedly with a different antibody for each transcription factor.
2. Because the invention is based on open chromatin sequencing data, it places lower demands on sequencing depth and addresses multiple transcription factors at once, so its time and economic costs are lower.
3. The invention establishes a model for assessing how single-base genomic variants in transcription factor binding site regions affect transcription factor binding, providing a method for annotating non-coding variants. The soundness and accuracy of the model are demonstrated in Embodiment 1.
4. Using only open chromatin sequencing and messenger RNA (mRNA) sequencing, the invention achieves transcription factor identification, transcription factor binding site identification, identification of the target genes regulated by transcription factors, and annotation of variants in transcription factor binding sites.
Brief description of the drawings
Fig. 1 is a flow chart of the function prediction method for single-nucleotide genomic variants in non-coding regions according to the invention;
Fig. 2 illustrates the calculation process of Embodiment 1 of the invention;
Fig. 3 shows the binding of transcription factors at DNA sites in the open chromatin regions conserved across eight cell types.
Detailed description of the embodiments
The invention is further described below through specific embodiments. It should be noted that a person of ordinary skill in the art can make several variations and modifications without departing from the principle of the invention, and these should also be regarded as falling within the protection scope of the invention.
Embodiment 1: accuracy test of the prediction method of the invention: calculating the influence of a single-nucleotide polymorphism within the DNA of a transcription factor binding site on transcription factor binding
The proto-oncogene c-MYC has a regulatory region (8q24) 335 kb away from it, which contains the SNP rs6983267 (genome assembly GRCh37.p13). In human genome data for 1008 Asian samples (Phase3_V1-EAS), the frequency of guanine (G) at this position on the forward strand is G = 0.388 and that of cytosine (C) is C = 0.612. In the European population (1000 Genomes, 1006 samples), the frequencies are G = 0.499 and T = 0.501. The DNA sequence containing this SNP ("ATGAAAGGC") is a binding site of the transcription factor TCF4, whose regulated target gene is c-MYC.
In the European population, at the polymorphic site rs6983267, genotype G has a slightly lower frequency (0.499) and genotype T a slightly higher frequency (0.501). This site is a binding site of the transcription factor TCF4, whose regulated target gene is c-MYC. What influence, then, do genotypes G and T have on the affinity of TCF4? Following the method established by the invention for assessing the effect on transcriptional regulation of non-coding nucleotide variants in transcription factor binding site regions, F is calculated from the position-specific frequency matrix (PSSM) of TCF4 and Formula 1 defined by the invention. The polymorphic site corresponds to position 9 of the PSSM, so P(9, G) = 0.48, P(9, T) = 0.07, and F = -log10(0.07/0.48) ≈ 0.84. F is positive, indicating that genotype G enhances transcription factor affinity. Expressed as a fold change, 0.48/0.07 ≈ 6.86: genotype G has about 6.86 times the affinity for TCF4 of genotype T. Enhanced transcription factor affinity corresponds to increased expression of the target gene c-MYC. That is, individuals with genotype G at this locus express c-MYC more highly than individuals with genotype T, which increases the risk of a cancer phenotype.
Evidence supporting this conclusion has been found in the published literature: the G allele of rs6983267 is a risk variant for colon, breast and prostate cancer. In the colon cancer cell lines HCT116 and DLD, the G allele increases the affinity of TCF4 by about 26% and 51%, respectively; in the DLD cell line, the G allele causes a twofold increase in c-MYC expression (Molecular and Cellular Biology, 2010, 30(6): 1411-1420).
Similarly, the disease risk at this site can be calculated for the Asian population. About 61.2% of the Asian population has genotype C at rs6983267; relative to the lowest-risk genotype T, F = -log10(P(9, T)/P(9, C)) = -log10(0.07/0.20) ≈ 0.46 > 0, so genotype C also carries disease risk, although less than genotype G.
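The arithmetic of both worked examples can be checked with a few lines of Python. The three probabilities are the position-9 values quoted above for the TCF4 matrix; note that F and the fold change are linked by fold = 10^F.

```python
import math

# Position-9 values of the TCF4 matrix quoted in Embodiment 1.
P9 = {"G": 0.48, "C": 0.20, "T": 0.07}


def f_value(reference, other):
    """F = -log10(P(9, reference) / P(9, other)), as in the examples above."""
    return -math.log10(P9[reference] / P9[other])


eur_f = f_value("T", "G")      # European population: G versus T
eur_fold = P9["G"] / P9["T"]   # affinity ratio of G over T
eas_f = f_value("T", "C")      # Asian population: C versus lowest-risk T

print(round(eur_f, 2), round(eur_fold, 2), round(eas_f, 2))  # → 0.84 6.86 0.46
```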
Therefore, the function prediction method of the invention can accurately assess the influence of variants in transcription factor binding site regions on transcription factor binding and on the regulation of target gene transcription.
Embodiment 2: identifying the transcription factors shared by 8 cell lines and predicting the function of genomic variants in transcription factor binding sites
1. Data sources
High-throughput sequencing data of open chromatin regions (DNase-Seq) for the 8 cell lines GM12878, IMR90, MCF-7, K562, BJ, H7, HepG2 and M059J were obtained from the National Center for Biotechnology Information (GEO accession GSE32970) (Nature 2012 Sep 6; 489(7414): 75-82). The cell types of the 8 cell lines are listed in Table 1.
Table 1
The sources of the expression data (high-throughput RNA sequencing, RNA-Seq) for the eight cell lines are listed in Table 1. Variant data were obtained from the UCSC Genome Browser (University of California, Santa Cruz), using its Tables function to retrieve the simple variant data of the human genome hg19 (dbSNP150). Single-base SNPs were selected by class (class label "single"); the main fields are chromosome (chrom), position (chromStart), SNP identifier (name), strand (strand) and allele information (observed).
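The "single"-class filter can be sketched as below. The sketch assumes the Table Browser export was restricted to the five fields named above plus the class column, tab-separated and in that order; the real snp150 table has more columns, and the rows and rs identifiers here are made up. Alleles reported on the minus strand are complemented so that all alleles refer to the hg19 forward strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_single_snps(lines):
    """Keep class == 'single' rows and return forward-strand allele lists."""
    snps = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        chrom, start, name, strand, observed, cls = line.rstrip("\n").split("\t")
        if cls != "single":
            continue
        alleles = observed.split("/")
        if strand == "-":
            alleles = [a.translate(COMPLEMENT) for a in alleles]
        snps.append({"chrom": chrom, "pos": int(start), "name": name,
                     "alleles": alleles})
    return snps


rows = [
    "#chrom\tchromStart\tname\tstrand\tobserved\tclass",
    "chr1\t100\trsTEST1\t+\tA/G\tsingle",
    "chr1\t200\trsTEST2\t-\tC/T\tsingle",
    "chr2\t300\trsTEST3\t+\t-/AT\tinsertion",
]
for snp in parse_single_snps(rows):
    print(snp["name"], snp["pos"], snp["alleles"])
```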
2. Data processing
The data were analyzed according to the flow chart shown in Fig. 1.
1) Identification of open chromatin regions ("peak" identification)
Analysis of open chromatin sequencing data (DNase-Seq): sequencing reads were aligned to the human reference genome (version hg19) with BWA (Bioinformatics, 2010, 25: 1754-60), retaining only reads with fewer than 4 alignment positions on the reference genome. Read-enriched regions (peaks) were identified with MACS2 (P ≤ $10^{-5}$) (Genome Biol, 2008, 9(9): R137).
2) Identification of transcription factors and their binding sites
Transcription factor binding sites in the read-enriched regions (peaks) were identified with the Homer tool (Molecular Cell, 2010, 38(4): 576-589), with P ≤ 0.01. In this example, the open chromatin regions conserved across all 8 cell types were selected for analysis. Target genes downstream of the read-enriched regions were annotated with the PAVIS tool (Bioinformatics, 2013, 29(23): 3097-3099) within a distance of 8,000 base pairs. At this point, the following have been identified for each cell type: the transcription factor binding sites in open chromatin regions, the transcription factors that may bind them (a binding site does not guarantee that the transcription factor is actually bound), and the genes downstream of these binding sites (regulated target genes).
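Embodiment 2 assigns target genes with PAVIS. The "within 8,000 base pairs downstream" rule can be sketched as below, under the assumption that "downstream" is measured from the binding site to the gene's transcription start site (TSS) along the gene's own strand; the gene table is hypothetical.

```python
WINDOW_BP = 8000


def target_genes(site_chrom, site_pos, genes, window=WINDOW_BP):
    """Genes whose TSS lies 0..window bp downstream of the site on the gene's strand."""
    hits = []
    for name, chrom, tss, strand in genes:
        if chrom != site_chrom:
            continue
        # On the minus strand, "downstream" means decreasing coordinates.
        offset = tss - site_pos if strand == "+" else site_pos - tss
        if 0 <= offset <= window:
            hits.append(name)
    return hits


# Hypothetical genes: (name, chromosome, TSS coordinate, strand)
genes = [
    ("GENE_A", "chr8", 105_000, "+"),  # 5 kb downstream: kept
    ("GENE_B", "chr8", 99_000, "-"),   # 1 kb downstream on the minus strand: kept
    ("GENE_C", "chr8", 120_000, "+"),  # 20 kb away: too far
    ("GENE_D", "chr8", 95_000, "+"),   # upstream of the site: ignored
]
print(target_genes("chr8", 100_000, genes))  # → ['GENE_A', 'GENE_B']
```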
The numbers of open chromatin regions conserved across the 8 cell types are listed in Table 2.
Table 2: information on the open chromatin regions ("peaks") identified in the eight cell lines
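The text does not spell out how a region is judged conserved across the 8 cell types. One simple reading, assumed here, keeps a peak from the first cell type when it overlaps at least one peak in every other cell type; intervals are half-open (chrom, start, end), and the peaks below are hypothetical.

```python
def overlaps(a, b):
    """Half-open intervals (chrom, start, end) overlap on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def conserved_peaks(peaks_by_cell):
    """Peaks of the first cell type that overlap a peak in every other cell type."""
    cells = list(peaks_by_cell)
    reference, others = peaks_by_cell[cells[0]], cells[1:]
    return [
        peak for peak in reference
        if all(any(overlaps(peak, q) for q in peaks_by_cell[c]) for c in others)
    ]


peaks = {
    "cell_1": [("chr1", 100, 200), ("chr1", 500, 600)],
    "cell_2": [("chr1", 150, 250), ("chr1", 700, 800)],
    "cell_3": [("chr1", 180, 300), ("chr1", 550, 650)],
}
print(conserved_peaks(peaks))  # → [('chr1', 100, 200)]
```

This pairwise check is quadratic in the number of peaks; for genome-wide peak sets a sorted sweep or a dedicated interval tool would be the practical choice.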
The expression data (RNA-Seq) were aligned to the reference genome hg19 with TopHat (Genome Biology, 2013, 14(4): R36), and the expression of each gene was then calculated with Cufflinks as FPKM (fragments per kilobase of transcript per million mapped fragments). The expression level (FPKM) of each gene encoding a transcription factor was calculated; if FPKM ≥ 8, the transcription factor is considered highly abundant in the cell and able to bind its binding sites and regulate the transcription of downstream target genes. The identified binding of transcription factors in the open chromatin regions conserved across the 8 cell types is shown in Fig. 3. The underlying logic is that open chromatin is a precondition for a transcription factor to bind a DNA site, and that the DNA fragments bound by transcription factors are sequence-specific. On these two grounds, the binding sites of transcription factors in open chromatin regions were identified in the 8 cell types, which also indicates which transcription factors may be present. Then, based on the expression of the genes encoding the transcription factors in each cell type, the transcription factors that play a dominant role in each cell were finally determined.
Fig. 3 shows, for this example, the transcription factors with significance P ≤ $10^{-8}$ and FPKM ≥ 8 among those bound in the open chromatin regions conserved across the eight cell types. Columns represent cell types and rows represent transcription factors. Circle size indicates the significance (P) of transcription factor binding at DNA sites in open chromatin regions: the larger the circle, the more significant the binding. The fill color depth of each circle represents the expression (FPKM) of the gene encoding the transcription factor; the specific data are given in Table 4 (Table 4-1: expression data of the genes encoding the transcription factors; Table 4-2: significance values of transcription factor enrichment in open chromatin regions).
Table 4-1: expression data (FPKM) of the genes encoding the transcription factors
Table 4-2: significance values (-log10(P-value)) of transcription factor enrichment in open chromatin regions
The transcription factors and target genes identified in the four cancer cell lines are listed in Tables 5 to 8.
Table 5: the most strongly affected genes and related information in the K562 cell line
Table 6: the most strongly affected genes and related information in the HepG2 cell line
Table 7: the most strongly affected genes and related information in the M059J cell line
Table 8: the most strongly affected genes and related information in the MCF-7 cell line
This example identified transcription factors significantly present in all 8 cell lines, i.e. some general transcription factors, such as CTCF, BORIS and Sp1. The gene encoding CTCF is frequently expressed in normal somatic cells, whereas BORIS (Brother of the Regulator of Imprinted Sites), a paralog of CTCF, shows the opposite pattern (PLoS Genet, 2008, 4(8): e1000169). Expression of the BORIS gene is thought to be related to cellular carcinogenesis: the literature reports that it is frequently expressed in most tumor cells but rarely in normal cells (Proc Natl Acad Sci USA, 2002, 99(10): 6806-11; Eur J Cancer, 2012, 48(6): 929-35). Recent studies suggest that the BORIS gene is highly expressed in rectal cancer and inhibits apoptosis (Eur J Cancer, 2012, 48(6): 929-35).
3) Assessing the effect of single-nucleotide variants
The effect of single-nucleotide variants is assessed based on the position-specific frequency matrix (PSSM) of each transcription factor, using Formula 1:
$$F = -\log_{10}\frac{P(i,j)}{P(i,k)} = \log_{10}\frac{P(i,k)}{P(i,j)} \qquad \text{(Formula 1)}$$
where P(i, j) and P(i, k) are the values of base j and base k at position i of the position-specific frequency matrix, and j and k are each one of adenine, guanine, cytosine and thymine. The variant data are taken from the dbSNP150 (hg19) data described above.
This example identified single-nucleotide polymorphism (SNP) variants in transcription factor binding site regions. The numbers of SNPs present in the binding site regions of transcription factors bound in the open chromatin regions conserved across the eight cell types are listed in Table 9.
Table 9: numbers of single-nucleotide variants (SNPs) in transcription factor binding site (TFBS) regions in the eight cell lines
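Counting the SNPs that fall inside binding site regions is an interval-membership query. A Python sketch using per-chromosome sorted positions and binary search, assuming 0-based half-open site coordinates (matching UCSC's 0-based chromStart); the sites and SNP names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def snps_in_sites(sites, snps):
    """Map each (chrom, start, end) site to the SNP names with start <= pos < end."""
    by_chrom = defaultdict(list)
    for chrom, pos, name in snps:
        by_chrom[chrom].append((pos, name))
    for positions in by_chrom.values():
        positions.sort()

    result = {}
    for chrom, start, end in sites:
        positions = by_chrom.get(chrom, [])
        i = bisect_left(positions, (start, ""))  # first SNP at or after start
        names = []
        while i < len(positions) and positions[i][0] < end:
            names.append(positions[i][1])
            i += 1
        result[(chrom, start, end)] = names
    return result


sites = [("chr1", 100, 109), ("chr2", 50, 60)]
snps = [("chr1", 104, "rsX1"), ("chr1", 109, "rsX2"),
        ("chr1", 100, "rsX3"), ("chr2", 10, "rsX4")]
hits = snps_in_sites(sites, snps)
print(hits)
print(sum(len(v) for v in hits.values()))  # → 2
```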
In the 4 cancer cell lines, the transcription factors, target genes and risk genomic variants were identified, together with the degree to which these variants influence transcription factor binding; the degree of influence is expressed as the sum of the F values of all variants (SNPs). F is calculated with Formula 1:
$$F = -\log_{10}\frac{P(i,j)}{P(i,k)} = \log_{10}\frac{P(i,k)}{P(i,j)} \qquad \text{(Formula 1)}$$
where P(i, j) and P(i, k) are the values of base j and base k at position i of the position-specific frequency matrix, and j and k are each one of adenine, guanine, cytosine and thymine.
Here base k (genotype k) is the mutant base or the base with low population frequency, and base j (genotype j) is the wild-type base or the base with high population frequency. The more positive (larger) F, the higher the affinity of the mutant or low-frequency genotype for the transcription factor; conversely, the more negative (smaller) F, the higher the affinity of the wild-type or high-frequency genotype. A positive F indicates that the mutant or low-frequency genotype increases transcription factor affinity; a negative F indicates that the wild-type or high-frequency genotype increases transcription factor affinity; F = 0 indicates that the genotype has no influence on transcription factor binding.
A gene may have several SNPs in its transcription factor binding sites. To assess the effect of these SNPs on the transcriptional regulation of the gene, the total regulatory effect is expressed as the sum of the F values of all these SNPs (ΣF).
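Aggregating per-SNP F values into a per-gene ΣF and ranking genes by |ΣF| can be sketched as below. The records are hypothetical (gene, SNP, F) triples, and ranking by absolute value is an assumed way to surface the most strongly affected genes.

```python
from collections import defaultdict


def sum_f_by_gene(records):
    """records: iterable of (gene, snp_id, F). Returns {gene: sum of F}."""
    totals = defaultdict(float)
    for gene, _snp_id, f in records:
        totals[gene] += f
    return dict(totals)


def rank_by_effect(totals):
    """Sort genes by |sum of F|, largest combined effect first."""
    return sorted(totals.items(), key=lambda item: abs(item[1]), reverse=True)


records = [
    ("GENE_A", "rsY1", -6.0),
    ("GENE_A", "rsY2", -4.5),
    ("GENE_B", "rsY3", 2.0),
    ("GENE_B", "rsY4", 1.5),
]
print(rank_by_effect(sum_f_by_gene(records)))  # → [('GENE_A', -10.5), ('GENE_B', 3.5)]
```

The sign of ΣF keeps the direction of the combined effect (negative: minor alleles weaken binding overall), while the ranking uses only its magnitude.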
Several specific cases were selected from the results of this embodiment to illustrate their soundness and biological significance. It was found that the expression of genes such as DAD1, SIRPA and BAX differs between individuals because of differences in their genomes. Specifically, the regulatory regions (transcription factor binding sites) of these genes carry different genotypes (SNPs) in different populations, which changes the binding strength of transcription factors and hence the expression of these genes, and these changes are associated with a disease (cancer) phenotype.
In K562 (leukemia cell line) cells (Table 5), gene DAD1 has ΣF = -15.4; F is negative with a large absolute value, indicating that the low-frequency alleles (minor allele frequency, MAF) at polymorphic sites rs2301200, rs227870 and rs5742730 reduce the affinity of the transcription factors NFY, Klf9, Max and NRF for the regulatory sites of DAD1, lowering DAD1 expression. DAD1 is a regulated gene (target gene) of NFY, Klf9, Max and NRF. The product of the DAD1 gene is an enzyme that inhibits apoptosis, and inactivation of DAD1 can cause apoptosis (Genomics, 1995, 26(2): 433-5). DAD1 also interacts with MCL1, which likewise inhibits apoptosis (J Biochem, 2000, 128(3): 399-405).
In HepG2 (liver cancer cell line) (Table 6), gene SIRPA has ΣF = -13.5; F is negative with a large absolute value, indicating that the low-frequency alleles (MAF) at polymorphic sites rs55698111 and rs67558779 reduce the affinity of the transcription factors NFY and Sox2 for the regulatory sites of SIRPA, lowering SIRPA expression. SIRPA is a target gene of NFY and Sox2; its product is a signal-regulatory family protein and an inhibitory receptor. It interacts with the CD47 protein, an interaction that protects cells from being engulfed by macrophages, and antibodies can play a role in inhibiting cancer cell growth and metastasis (JCI Insight, 2017, 2(1): e89140).
In the M059J cell line (Table 7), gene PTGR1 has ΣF = 14; F is positive with a large absolute value, indicating that the low-frequency alleles (MAF) at polymorphic sites rs10980954, rs3031178, rs71501685 and rs200997621 increase the affinity of the transcription factors Mef2b and OCT4 for the regulatory sites of PTGR1, raising PTGR1 expression.
In MCF-7 (breast cancer cell line) (Table 8), gene BAX has ΣF = 10; F is positive with a large absolute value, indicating that the low-frequency alleles (MAF) at polymorphic sites rs115440855 and rs138364829 increase the affinity of the transcription factor c-Myc for the regulatory sites of BAX, raising BAX expression. BAX is a member of the Bcl-2 gene family regulated by c-Myc, and its product is closely associated with apoptosis. In the normal state, BAX protein resides in the cytosol; once an apoptotic signal is generated, BAX undergoes a conformational change and becomes associated with organelle membranes, especially the mitochondrial membrane (EMBO J, 1998, 17(14): 3878-85). Importantly, these strongly affected genes are closely related functionally to cell death, and the SNP-induced changes in transcription factor binding shift gene expression in the direction of avoiding apoptosis.
Similarly, the biological significance of other results can be interpreted according to the magnitude of F, the transcription factor, the target gene and the variant.
The above are only preferred embodiments of the invention and do not limit it. A person skilled in the art can make other variations or changes in various forms on the basis of the above description. It is neither necessary nor possible to list all embodiments here. Obvious variations or changes derived from this solution still fall within the protection scope of the invention.

Claims (9)

1. A function prediction method for single-nucleotide genomic variants in non-coding regions, characterized by comprising the following steps:
1) identifying open chromatin regions;
2) identifying transcription factor binding sites: scanning the binding sites of transcription factors in open chromatin regions using the position-specific frequency matrix of each transcription factor; determining the transcription factors based on the expression levels of the genes encoding them; and taking genes within 8,000 bases downstream of a transcription factor binding site as target genes of that transcription factor;
3) assessing the effect of single-nucleotide variants: based on the position-specific frequency matrix of each transcription factor, calculating the influence of single-nucleotide variants located in transcription factor binding site regions on transcription factor binding, and identifying the variants that significantly change transcription factor binding ability; and further assessing the effect of these variants by examining the biological pathways of the transcription factors' target genes.
2. The method for predicting the function of single-nucleotide genome variations in non-coding regions according to claim 1, characterized in that the chromatin open region identification of step 1) comprises: performing read quality control, read alignment, and identification of read-enriched regions on the chromatin open region sequencing data, thereby identifying the chromatin open regions.
3. The method for predicting the function of single-nucleotide genome variations in non-coding regions according to claim 2, characterized in that reads with a quality value Q ≥ 30, i.e. a sequencing error rate ≤ 0.001, are used for read alignment.
4. The method for predicting the function of single-nucleotide genome variations in non-coding regions according to claim 2, characterized in that the read-enriched regions are modeled with a Poisson distribution, and the enrichment significance value is given by the formula:
where k is the read count at a genomic locus, λ is the mean read count across the genome, and the threshold for the enrichment significance value P is 10⁻⁵.
5. The method for predicting the function of single-nucleotide genome variations in non-coding regions according to claim 1, characterized in that the chromatin open region sequencing data in step 1) are one or more of: DNase I hypersensitive site sequencing (DNase-seq) data, formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-seq) data, and assay for transposase-accessible chromatin sequencing (ATAC-seq) data.
6. The method for predicting the function of single-nucleotide genome variations in non-coding regions according to claim 1, characterized in that the gene expression level data in step 2) are microarray data or RNA sequencing data.
7. The method for predicting the function of single-nucleotide genome variations in non-coding regions according to claim 1, characterized in that when the expression level of the gene satisfies FPKM ≥ 8, the transcription factor is considered to be highly abundant in the cell.
8. The method for predicting the function of single-nucleotide genome variations in non-coding regions according to claim 1, characterized in that the effect of single nucleotide variations in step 3) is assessed by the following formula 1:
where P(i, j) and P(i, k) are, respectively, the values for base j and base k at position i of the position-specific frequency matrix, and j and k are each one of adenine, guanine, cytosine, and thymine.
9. The method for predicting the function of single-nucleotide genome variations in non-coding regions according to claim 5, characterized in that a positive F indicates that the mutant or low-frequency genotype increases transcription factor affinity, and a negative F indicates that the wild-type or high-frequency genotype increases transcription factor affinity; the larger a positive F, the higher the affinity of the mutant or low-frequency genotype for the transcription factor, and conversely, the smaller (more negative) a negative F, the higher the affinity of the wild-type or high-frequency genotype for the transcription factor; an F of 0 indicates that the genotype has no effect on transcription factor binding.
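The target-gene rule in step 2) of claim 1 (genes within 8,000 bases downstream of a binding site) can be illustrated with a short Python sketch. It assumes 0-based, end-exclusive coordinates and reads "downstream" relative to each gene's own strand, so the binding site must lie 5' of the transcription start site (TSS) in the direction of transcription; neither convention is stated in the claim. `Gene`, `Site`, and `target_genes` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

WINDOW_BP = 8_000  # claim 1: genes within 8,000 bases downstream of a binding site


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int     # 0-based position of the transcription start site
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Site:
    tf: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


def target_genes(site: Site, genes: list[Gene], window: int = WINDOW_BP) -> list[str]:
    """Names of genes whose TSS lies 0..window bp downstream of `site`.

    Assumption: "downstream" is measured along each gene's own strand.
    """
    hits = []
    for gene in genes:
        if gene.chrom != site.chrom:
            continue
        if gene.strand == "+":
            distance = gene.tss - site.end          # bases between site end and TSS
        else:
            distance = (site.start - 1) - gene.tss  # mirror image on the minus strand
        if 0 <= distance <= window:
            hits.append(gene.name)
    return hits


# Usage with made-up coordinates.
genes = [
    Gene("GENE_A", "chr1", 10_000, "+"),  # 4,990 bp downstream: kept
    Gene("GENE_B", "chr1", 30_000, "+"),  # beyond the window: dropped
    Gene("GENE_C", "chr1", 4_000, "-"),   # 999 bp downstream on the minus strand: kept
    Gene("GENE_D", "chr2", 6_000, "+"),   # other chromosome: dropped
]
site = Site("TF_X", "chr1", 5_000, 5_010)
print(target_genes(site, genes))  # ['GENE_A', 'GENE_C']
```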
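Claim 3 rests on the Phred relation Q = -10·log10(p), so Q = 30 corresponds to an error probability of 10⁻³. The Python sketch below assumes Sanger / Illumina 1.8+ FASTQ encoding (ASCII offset 33) and, because the claim does not say how a per-read quality value is defined, hypothetically uses the mean base quality of the read. `phred_to_error`, `decode_qualities`, and `passes_quality_filter` are illustrative names.

```python
from __future__ import annotations

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ FASTQ quality encoding
MIN_Q = 30         # claim 3: Q >= 30, i.e. error rate <= 0.001


def phred_to_error(q: float) -> float:
    """Error probability for Phred quality q: p = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def decode_qualities(qual: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Turn a FASTQ quality string into per-base Phred scores."""
    return [ord(ch) - offset for ch in qual]


def passes_quality_filter(qual: str, min_q: int = MIN_Q) -> bool:
    """Hypothetical per-read rule: keep the read if its mean base quality >= min_q."""
    scores = decode_qualities(qual)
    if not scores:
        return False
    return sum(scores) / len(scores) >= min_q


print(phred_to_error(30))             # error rate at Q30
print(passes_quality_filter("IIII"))  # 'I' encodes Q40 -> True
print(passes_quality_filter("5555"))  # '5' encodes Q20 -> False
```

A stricter variant would require every base, rather than the mean, to reach Q30; the claim leaves that choice open.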
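Claim 4 names a Poisson model with read count k, genome-wide mean λ, and threshold 10⁻⁵, but its formula does not survive in this text. A common choice for read-enrichment significance, assumed here rather than taken from the patent, is the upper-tail probability P(X ≥ k) for X ~ Poisson(λ). The Python sketch below evaluates that tail in log space to avoid underflow and applies the threshold; the function names are hypothetical.

```python
import math

P_THRESHOLD = 1e-5  # claim 4: threshold for the enrichment significance value P


def poisson_log_pmf(i: int, lam: float) -> float:
    """log P(X = i) for X ~ Poisson(lam), lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam). Assumed form, not the patent's own formula."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0  # Poisson(0) is always 0, so X >= k (k >= 1) cannot occur
    if k <= lam:
        # The tail is large here, so 1 - P(X <= k - 1) loses no useful precision.
        cdf = sum(math.exp(poisson_log_pmf(i, lam)) for i in range(k))
        return max(0.0, 1.0 - cdf)
    # k > lam: terms shrink monotonically, so sum the tail directly from k upward.
    term = math.exp(poisson_log_pmf(k, lam))
    if term == 0.0:
        return 0.0  # underflow: far below any practical threshold
    total, i = 0.0, k
    while term >= total * 1e-17:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return min(total, 1.0)


def is_enriched(k: int, lam: float, threshold: float = P_THRESHOLD) -> bool:
    """Call a locus enriched when its Poisson tail probability is below threshold."""
    return poisson_upper_tail(k, lam) < threshold


print(poisson_upper_tail(6, 5.0))  # tail probability for a modest count
print(is_enriched(30, 5.0))        # True
```

Summing the tail directly when k > λ avoids the cancellation that 1 - CDF suffers once P becomes very small.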
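Claim 7 uses FPKM (fragments per kilobase of transcript per million mapped fragments), computed as fragments × 10⁹ / (gene length in bp × total mapped fragments). A minimal Python sketch of the FPKM ≥ 8 cut-off follows; `fpkm`, `abundant_tfs`, and the transcription factor names in the usage are hypothetical, and the counts are made up.

```python
from __future__ import annotations

MIN_FPKM = 8.0  # claim 7: FPKM >= 8 counts as high abundance


def fpkm(fragments: int, gene_length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped)


def abundant_tfs(tf_fpkm: dict[str, float], min_fpkm: float = MIN_FPKM) -> set[str]:
    """Transcription factors whose encoding gene reaches the FPKM cut-off."""
    return {tf for tf, value in tf_fpkm.items() if value >= min_fpkm}


# Made-up counts: 400 and 100 fragments on a 2 kb gene, 20 million mapped fragments.
expression = {
    "TF_A": fpkm(400, 2_000, 20_000_000),  # 10.0
    "TF_B": fpkm(100, 2_000, 20_000_000),  # 2.5
    "TF_C": 8.0,                           # exactly at the cut-off
}
print(sorted(abundant_tfs(expression)))  # ['TF_A', 'TF_C']
```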
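Formula 1 of claim 8 does not survive in this text either. As a stand-in only, the Python sketch below scores a single nucleotide variation at motif position i as a log2 ratio of frequency-matrix values with a small pseudocount, F = log2((P(i, k) + c) / (P(i, j) + c)), taking j as the wild-type / high-frequency base and k as the mutant / low-frequency base so that the sign follows claim 9. The log-ratio form, the pseudocount c = 0.5, and the name `snv_effect` are assumptions.

```python
from __future__ import annotations

import math

BASES = ("A", "G", "C", "T")  # claim 8: j and k are each one of A, G, C, T


def snv_effect(pfm: list[dict[str, float]], i: int, ref: str, alt: str,
               pseudocount: float = 0.5) -> float:
    """Hypothetical stand-in for formula 1.

    F = log2((P(i, alt) + c) / (P(i, ref) + c)), with j = ref (wild-type / major
    base) and k = alt (mutant / minor base). F > 0 means the alt base fits the
    matrix better, matching the sign convention of claim 9.
    """
    if ref not in BASES or alt not in BASES:
        raise ValueError("ref and alt must each be one of A, G, C, T")
    counts = pfm[i]  # base counts at motif position i
    return math.log2((counts[alt] + pseudocount) / (counts[ref] + pseudocount))


# Toy two-position frequency matrix with made-up counts.
pfm = [
    {"A": 7.5, "G": 1.5, "C": 0.5, "T": 0.5},
    {"A": 2.5, "G": 2.5, "C": 2.5, "T": 2.5},
]
print(snv_effect(pfm, 0, ref="C", alt="A"))  # 3.0: the alt base fits the motif better
print(snv_effect(pfm, 1, ref="G", alt="T"))  # 0.0: the position is uninformative
```

The pseudocount keeps F finite when a base never occurs at a position in the matrix.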

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810804405.9A CN109033751B (en) 2018-07-20 2018-07-20 Function prediction method for non-coding region mononucleotide genome variation

Publications (2)

Publication Number Publication Date
CN109033751A 2018-12-18
CN109033751B 2021-07-27

Family

ID=64644759

Country Status (1)

Country Link
CN (1) CN109033751B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104450884A (en) * 2014-10-28 2015-03-25 首都医科大学附属北京安贞医院 Hypertension susceptibility relevant gene variation locus and detecting method thereof
US20170114413A1 (en) * 2015-10-27 2017-04-27 The Broad Institute Inc. Compositions and methods for targeting cancer-specific sequence variations
CN108220394A (en) * 2018-01-05 2018-06-29 清华大学 Identification method, system and its application of gene regulation sex chromatin interaction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
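Liu Hongde et al.: "Nucleosome positioning patterns and their relationship with the distribution of DNA methylation sites", Chinese Journal of Biochemistry and Molecular Biology *
Nie Yumin et al.: "Non-coding sequences regulating eukaryotic gene expression", Acta Biophysica Sinica *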

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109979531A (en) * 2019-03-29 2019-07-05 北京市商汤科技开发有限公司 A kind of genetic mutation recognition methods, device and storage medium
CN109979531B (en) * 2019-03-29 2021-08-31 北京市商汤科技开发有限公司 Gene variation identification method, device and storage medium
CN110544509A (en) * 2019-08-20 2019-12-06 广州基迪奥生物科技有限公司 single-cell ATAC-seq data analysis method
CN110544509B (en) * 2019-08-20 2021-06-11 广州基迪奥生物科技有限公司 Single-cell ATAC-seq data analysis method
CN111863127A (en) * 2020-07-17 2020-10-30 北京林业大学 Method for constructing genetic control network of plant transcription factor to target gene
CN111863127B (en) * 2020-07-17 2023-06-16 北京林业大学 Method for constructing genetic regulation network of plant transcription factor to target gene
CN114427116A (en) * 2021-12-29 2022-05-03 北京林业大学 Method for predicting downstream target gene regulated by plant growth and development transcription factor at whole genome level
CN114427116B (en) * 2021-12-29 2023-08-15 北京林业大学 Method for predicting downstream target gene regulated by plant growth transcription factor on whole genome level

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant