WO2019134586A1 - Method and system for identifying gene-regulatory chromatin interaction and application thereof - Google Patents

Info

Publication number
WO2019134586A1
WO2019134586A1 PCT/CN2018/124761 CN2018124761W WO2019134586A1 WO 2019134586 A1 WO2019134586 A1 WO 2019134586A1 CN 2018124761 W CN2018124761 W CN 2018124761W WO 2019134586 A1 WO2019134586 A1 WO 2019134586A1
Authority
WO
WIPO (PCT)
Prior art keywords
gene
chromatin
difference
interaction
sample
Prior art date
Application number
PCT/CN2018/124761
Other languages
French (fr)
Chinese (zh)
Inventor
陈阳
李炎剑
贺毅
张奇伟
Original Assignee
清华大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学
Publication of WO2019134586A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809: Methods for determination or identification of nucleic acids involving differential detection

Definitions

  • The present invention relates to a method for identifying gene-regulatory chromatin interactions, and more particularly to a method for identifying chromatin interactions, and the corresponding effector genes, that are capable of affecting the state transition of a sample.
  • Chromatin conformation plays a key role in the regulation of gene expression. Studies have found that interphase chromosomes occupy specific territories, and that gene transcription is closely related to the position of genes relative to the nuclear lamina and the chromosome territories. Recent studies using high-throughput chromosome conformation capture (Hi-C) have revealed that genomes are organized into Topologically Associating Domains (TADs) of hundreds of kilobases to one megabase, and that a chromatin region within a TAD is more likely to interact with other regions within the same TAD than with regions outside it. Between different cell types, most TAD positions remain unchanged and show evolutionary conservation.
  • A TAD is not only a structural unit, but also a functional unit of transcriptional regulation.
  • Long-range chromatin interactions within TADs, mediated by specific or non-coding RNAs, link long-range regulatory regions, such as enhancers, with gene promoters, making long-range regulation of gene expression possible.
  • After long-term research, the inventors have obtained a method for correlating chromatin interactions, the expression level and/or recognition sites of specific genes, and gene regulation inside chromatin, thereby completing the present invention.
  • The invention relates to a method for identifying a sample state transition effector gene whose expression is affected by a change in chromatin interaction during a state transition of a sample, comprising the steps of:
  • The sample is a cell.
  • The gene identifiable behavioral difference comprises a difference in gene expression level and/or a difference in the distribution of binding motifs in the genomic sequence of the gene regulatory region; preferably, the difference in gene expression level is a difference in mRNA expression level or a difference in protein expression level.
  • The chromatin interaction difference present in the transcriptional regulatory region of the gene in step (1) is obtained by the following steps:
  • Step (c): integrating the information obtained in step (a) and step (b) to obtain the chromatin interactions between promoters and enhancers in an activated state, i.e., the chromatin interactions present in the transcriptional regulatory region;
  • The gene identifiable behavioral difference in step (1) is obtained by the following steps:
  • The transcriptional expression analysis is performed by RNA sequencing, i.e., the RNA-seq method.
  • The antibodies are antibodies that bind H3K4me3 and H3K27ac; the signal peaks formed by binding to H3K4me3 and H3K27ac represent promoter and enhancer sites in an activated state, respectively;
  • The chromatin conformation capture technology is preferably a high-throughput chromatin conformation capture technique, such as the Hi-C, in situ Hi-C, BL-Hi-C or ChIA-PET method, used to obtain information on genome-wide chromatin interactions; alternatively, information on local chromatin interactions is obtained using the 4C or 5C method;
  • Dividing the reference genomic sequence into regions of a certain size, preferably between 1 and 40 kb, for example 1 kb, 5 kb, 10 kb, 15 kb, 20 kb, 25 kb, 30 kb, 35 kb or 40 kb; based on the active promoter and enhancer information obtained in step iv), the regions containing active promoter and enhancer sites are obtained respectively; preferably, the region containing the promoter is named the gene region, and the region containing the enhancer sequence is named the regulatory region;
  • A chromatin interaction frequency signal occurring between the gene region and the regulatory region is identified, thereby obtaining the chromatin interactions associated with gene regulation; the gene-regulation-related chromatin interactions of the samples in the first state and the second state are compared, and those with statistically significant differences are identified as differential gene-regulatory interactions, including gene-regulatory chromatin interactions that are enhanced and/or attenuated in the sample in the second state relative to the sample in the first state.
  • Step (2) specifically includes the following steps:
  • The method further includes a step of functionally studying the screened effector genes to determine their function.
  • The method further comprises the step of identifying gene-regulatory chromatin interactions: a chromatin interaction that is capable of affecting the expression of the effector gene identified in step (2) is identified as a gene-regulatory chromatin interaction.
  • The sample state transition is achieved by chemical agent induction, natural differentiation, and/or physical stimulation.
  • In a second aspect, the invention relates to a method of identifying a chromatin interaction capable of modulating a state transition of a sample, comprising the steps of any of the embodiments of the first aspect.
  • In a third aspect, the invention relates to a method of identifying a regulatory factor involved in a chromatin interaction involved in a state transition of a sample, comprising the steps of any of the embodiments of the first aspect.
  • In a fourth aspect, the invention relates to a method of identifying a substance capable of modulating chromatin interaction, comprising: identifying an effector gene or a gene-regulatory chromatin interaction using any of the embodiments of the first aspect, and then contacting the substance to be tested with the sample to analyze changes in the effector gene or the gene-regulatory chromatin interaction.
  • The invention relates to an identification system for a sample state transition effector gene whose expression is affected by changes in chromatin interactions during a state transition of a sample, comprising the following modules:
  • The system is capable of obtaining gene identifiable behavioral differences associated with differences in chromatin interactions, thereby obtaining sample state transition effector genes that are affected by chromatin interactions.
  • The system further comprises a gene-regulatory chromatin interaction recognition module to identify gene-regulatory chromatin interactions that are capable of affecting expression of the effector gene.
  • The gene identifiable behavioral difference analysis module is capable of analyzing a difference in gene expression levels and/or a difference in the distribution of transcription factor binding motifs in the genomic sequence of the transcriptional regulatory region of the gene.
  • The analysis module for chromatin interaction differences of the transcriptional regulatory regions is capable of performing the following analysis:
  • Step (c): integrating the information obtained in step (a) and step (b) to obtain the chromatin interactions between promoters and enhancers in an activated state, i.e., the chromatin interactions present in the transcriptional regulatory region;
  • The effector gene identification module is capable of performing the following analysis:
  • The present invention relates to a detection kit comprising the reagents used in the methods of the first to fourth aspects of the invention.
  • The method of the present invention establishes an analysis and identification method linking chromatin conformation and transcriptional regulation by integrating a plurality of omics assays and their results.
  • The method can be applied to the analysis of multiple biological processes, such as cell differentiation, ontogeny, cell variability and disease treatment, so that chromatin interactions and regulatory factors having an important influence on these processes can be identified at the chromatin conformation level.
  • Figure 1 shows the overall flow of one embodiment of the present invention.
  • Figure 2 shows the change in mRNA expression of HL-60 cells induced by all-trans retinoic acid (ATRA) compared with the control group in one embodiment of the present invention. Figure 2A shows the change in gene expression after retinoic acid induction; Figure 2B shows the enrichment of differentially expressed genes in different GO categories.
  • Figure 3 shows the change in the frequency of chromatin interactions within TADs after induction with ATRA (Fig. 3A) and the relationship between frequency changes and differentially expressed genes (Fig. 3B).
  • Figure 4 shows the relationship between changes in the frequency of chromatin interactions in TADs and changes in H3K4me3 and H3K27ac modifications.
  • Figure 5 shows the chromatin interaction differences associated with gene expression regulation obtained in one embodiment of the present invention. Figure 5A shows the changes in the H3K4me3 signal and the H3K27ac signal in the ATRA-treated and control groups; Figure 5B shows the distance from the transcription start site of the peaks specific to the ATRA-treated group, the peaks specific to the control group, and the shared peaks.
  • Figure 5C shows the search for chromatin interactions related to gene regulation.
  • Figure 5D shows a relative comparison of the H3K27ac signals in the gene region, as well as of the H3K4me3 signals in the gene region, for the Gain and Loss groups.
  • Figure 5E shows the differentially expressed genes in the Gain and Loss groups.
  • Figure 6 shows a comparison of the chromatin open regions determined by ATAC-seq after induction by ATRA in one embodiment of the invention.
  • Figures 6A and 6B show the specific peaks of each group and the distribution of peaks in the genome.
  • Figures 6C and 6D show, for the regulatory region and the gene region respectively, the distribution of ATRA-treated-group-specific, control-group-specific and shared open-chromatin signals for the interactions of the Gain group and the Loss group. The Gain group is more enriched in ATRA-treated-group-specific signal, while the Loss group is more enriched in control-specific signal, and the enrichment is stronger in the regulatory region than in the gene region.
  • Figure 6E shows transcription factor binding motifs enriched in chromatin open regions in the control and ATRA induction groups.
  • Figure 7 compares the distribution of transcription factor (TF) binding motifs in the open chromatin regions of the Gain group (Figure 7A) and the Loss group (Figure 7B); the GATA binding motif shows a specific difference between the two groups.
  • Figure 8A shows a transcription factor-target gene regulatory network analysis, which indicates that GATA2 is located at the core hub of the network.
  • Figure 8B shows the interaction relationships between GATA2 and other transcription factors enriched in ATAC-seq during ATRA-induced HL-60 cell differentiation.
  • Figure 9A and Figure 9B show the interaction between the GATA2 gene region and regulatory region and the change in GATA2 gene expression after ATRA induction, respectively;
  • Figure 9C and Figure 9D show the interaction between the ZBTB16 gene region and regulatory region and the change in ZBTB16 gene expression after ATRA induction, respectively.
  • Figure 10A shows the results of 4C chromatin conformation capture upstream of the GATA2 gene.
  • Figure 10B shows the in situ FISH results verifying the chromatin loop structure.
  • Figure 10C shows the Pearson correlation coefficients for the red and green fluorescence distributions in the FISH experiment.
  • Fig. 11A shows the results of 4C chromatin conformation capture of the ZBTB16 gene region.
  • Fig. 11B and Fig. 11C show models of GATA2 and ZBTB16 in induced differentiation, respectively, which explain the relationship between chromatin structure, transcription factor binding and gene expression.
  • The term "sample" refers to any subject that can be analyzed, as long as the subject of the analysis contains chromatin and the expression products of genes (e.g., mRNA and/or protein). The sample may be a eukaryotic cell, such as an animal cell, plant cell or fungal cell, and may sometimes also include a cell lysate.
  • The term "state transition" refers to a change in the nature or morphology of the same sample caused by induction with a particular additive or by an internal natural process, for example, differentiation of cells induced by chemical agents or physical stimulation, or differentiation of cells in natural physiological processes, such as the natural differentiation of cells triggered by external hormones or other signaling molecules, or by genes or proteins inside the cell. In one embodiment, the "at least two samples in different states" of the present invention are formed by state transitions.
  • The "sample in the first state" and the "sample in the second state" refer to the samples in the two different states obtained through the state transition process.
  • The "sample in the first state" is the sample before the state transition.
  • The "sample in the second state" is the sample after the state transition.
  • The term "effector gene" refers to a gene involved in the process of sample state transition.
  • The effector gene may be the cause of the state transition of the sample, that is, a gene that can initiate the state transition process.
  • For example, the gene may respond directly to the added inducer and thereby trigger the differentiation of the cell; the gene can also be an intermediate link in the state transition process, or simply a result of the state transition.
  • A "gene" herein may refer to a gene, and may also refer to an expression product of a gene, such as an mRNA transcript or a protein.
  • The effector gene may be a transcription factor.
  • The term "transcriptional regulatory region" refers to a region within the genomic DNA located upstream or downstream of a gene, for example within 10 kb-1 Mb, 50 kb-500 kb or 100 kb-200 kb, that includes sites such as promoters and enhancers where trans-acting factors (e.g., transcription factors) bind.
  • The analysis of the transcriptional regulatory region plays an important role in selecting a target effector gene from a large number of candidate genes: for example, determining whether an interacting promoter and enhancer are present in the transcriptional regulatory region of the gene, whether the transcriptional regulatory region is an open sequence, and which binding motifs are present; and whether the interaction between the enhancer and the promoter, and the distribution of binding motifs, have changed between samples in different states. Combined with other information (such as the gene expression level), this analysis can effectively yield effector genes.
  • The term "gene-regulation-related chromatin interaction", also known as "chromatin interaction present in the transcriptional regulatory region", refers to a chromatin interaction occurring between sequences within different regulatory elements of the transcriptional regulatory region of a gene, such as promoters and enhancers.
  • The reference genomic sequence is divided into regions whose size can be adjusted according to the depth of the chromatin conformation analysis, such as the sequencing depth, preferably in the range of 1 kb to 40 kb, for example 1 kb.
  • The region is 40 kb in size.
  • The regions containing the active promoter and enhancer sites are respectively obtained by alignment; preferably, the region containing the promoter is named the gene region and the region containing the enhancer sequence is named the regulatory region. Subsequently, combined with the chromatin interaction signal in step v), the chromatin interaction frequency signal between the gene region and the regulatory region is analyzed (i.e., the number of chromatin interactions between specific regions, which can be expressed as the number of reads in the Hi-C data whose two ends fall within the respective regions). When there is a contact signal of identifiable intensity between the gene region and the regulatory region, a gene-regulation-related chromatin interaction is considered to exist.
  • The term "gene-regulatory chromatin interaction difference" is sometimes referred to herein as "difference in chromatin interactions of transcriptional regulatory regions" or "difference in chromatin interactions present in the transcriptional regulatory region"; these terms have the same meaning.
  • The difference is obtained by comparing the gene-regulation-related chromatin interactions of the samples in the first state and the second state; interactions with a significant difference are identified as differential gene-regulatory interactions, wherein "significant" preferably means statistically significant, for example p < 0.05 or p < 0.01 when a hypothesis test is employed.
  • The term "gene-regulatory chromatin interaction" refers to those gene-regulation-related chromatin interactions that differ significantly between the sample in the first state and the sample in the second state, i.e., a subset of the gene-regulation-related chromatin interactions; in the identified gene-regulatory chromatin interactions, the association between chromatin interaction and gene regulation is more certain.
  • Gene-regulatory chromatin interactions are further divided into two types: gene-regulatory interactions that are enhanced in the sample in the second state relative to the sample in the first state (classified as the Gain group in the embodiments of the present invention) and/or gene-regulatory interactions that are attenuated (classified as the Loss group in the embodiments of the present invention).
  • The term "gene identifiable behavioral difference" refers to a difference, relating to the nature, state, etc. of a gene, that can be qualitatively or quantitatively observed between samples in different states. Here the "gene" is not a specific subset of genes pre-selected by human intervention; rather, the whole set of genes with identifiable behavioral differences is observed in a quantitative or qualitative analysis. For the sake of clarity, such a "gene" is sometimes called an alternative gene or a candidate gene, but it should be noted that the expressions "alternative gene" and "candidate gene" are not used herein to indicate that these genes belong to a pre-selected, artificially chosen range.
  • The term "binding motif" refers to an element which is present on genomic DNA and which can be bound by a trans-acting factor such as a transcription factor to regulate a target gene, for example, to regulate expression of an effector gene in the present invention.
  • The term "difference in the distribution of binding motifs" refers to the difference in the number, position or presence of all or some binding motifs between samples in different states, or the difference in the number, position or presence of the binding motifs located in a region of interest or of a particular binding motif.
  • The term "chromatin interaction" refers to long-range interactions between different chromatin sites, which form the higher-order conformation of chromatin to maintain chromatin structure or to promote gene expression.
  • The term "chromatin interaction frequency", also referred to herein as "Hi-C interaction frequency" or "Hi-C contact frequency", refers to the signal of interaction between different regions found when performing chromatin conformation analysis; it is expressed as the number of reads in the Hi-C data whose two ends fall within the respective regions.
  • The term "chromatin open region sequence" refers to a DNA sequence that is exposed in chromatin, for example because it is not bound by nucleosomes, and can be bound by a trans-acting factor such as a transcription factor.
  • ChIP-seq refers to a technique that combines chromatin immunoprecipitation (ChIP) with high-throughput sequencing to efficiently detect DNA segments that interact with histones, transcription factors, etc., across the genome.
  • The principle is as follows: first, DNA fragments bound by the target protein are specifically enriched by chromatin immunoprecipitation (ChIP), purified, and used for library construction; the enriched DNA fragments are then subjected to high-throughput sequencing. The millions of sequence reads obtained are accurately mapped to the genome, yielding genome-wide information on the DNA segments that interact with histones, transcription factors, and the like.
  • The term "chromatin conformation capture technology" refers to all techniques that establish three-dimensional chromatin structure information from the relationships between different spatial positions of chromatin, including the common 3C (Chromosome Conformation Capture) technology as well as chromatin conformation capture technologies based on high-throughput sequencing.
  • The term "high-throughput chromatin conformation capture technology" refers to methods that combine high-throughput sequencing with bioinformatics analysis to efficiently analyze the spatial positions of chromatin DNA genome-wide and obtain high-resolution three-dimensional chromatin structure and chromatin interaction information. They include at least Hi-C, in situ Hi-C (an improved technique based on Hi-C), and BL-Hi-C, obtained by further introducing a bridge linker into the in situ Hi-C method.
  • The ChIA-PET method is also a high-throughput chromatin conformation capture technique.
  • ATAC-seq refers to a technique for studying chromatin accessibility in molecular biology. It consists of two parts, the ATAC experiment and high-throughput sequencing.
  • The key part of the ATAC-seq experiment is the action of the Tn5 transposase on the genomic DNA of the sample. The transposase preferentially inserts into genomic regions that are generally free of nucleosomes, i.e., exposed DNA segments. Therefore, the enrichment of sequences from certain loci in the genome indicates that the region is free of nucleosomes and is in a loosely exposed state that nuclear machinery such as DNA-binding proteins can enter, providing information on the transcriptionally active state of the chromatin segment.
  • ATAC-seq employs a mutated hyperactive transposase that allows efficient cleavage of exposed DNA and simultaneous ligation of specific adapter sequences.
  • The adapter-ligated DNA fragments are then isolated and amplified by PCR for high-throughput sequencing.
  • HL-60 cells were purchased from the National Experimental Cell Resource Sharing Platform (Beijing Union Medical College, China). The cells were maintained in RPMI-1640 medium (Gibco, USA) supplemented with 10% fetal bovine serum (FBS, Gibco, USA), 50 units/mL penicillin and streptomycin (Gibco, USA) and non-essential amino acids (Gibco, USA).
  • For granulocyte differentiation, HL-60 cells at 2 × 10^5 /ml were treated with 1 μM ATRA (1 mM stock solution in ethanol, Sigma, USA) for 4 days (referred to as the ATRA group); cells treated with an equal amount of ethanol were called the control group. The medium was changed on the second day while adding ATRA/ethanol.
  • For RNA-seq analysis, the adapter sequence was first removed; Bowtie was then used to align the data to the reference genome hg19, and sequencing reads from ribosomal RNA were filtered out. The remaining reads were then aligned to the transcriptome and quantified using RSEM v1.2.7.
  • The annotation file was downloaded for the human genome build hg19 from the genome browser of the University of California, Santa Cruz (UCSC).
  • Differential gene expression was calculated from the mean of gene-wise dispersion estimates using the DESeq2 software package, version 1.4.5. Genes with a significant difference in expression were determined based on an adjusted p-value threshold of 0.01 and a log2 fold change greater than 0.9.
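  • The thresholding step above can be illustrated with a minimal Python sketch. It is not the DESeq2 workflow itself: the `GeneResult` record and `select_degs` function are hypothetical names, and the sketch assumes that a gene passes when its adjusted p-value is at most 0.01 and the absolute value of its log2 fold change exceeds 0.9 (the text does not state whether the comparison is inclusive or two-sided).

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """One row of a differential-expression result table (hypothetical layout)."""
    gene: str
    log2_fold_change: float
    padj: float


def select_degs(results, padj_cutoff=0.01, log2fc_cutoff=0.9):
    """Return the names of genes that pass both thresholds.

    Assumption: padj <= cutoff and |log2 fold change| > cutoff.
    """
    return [
        r.gene
        for r in results
        if r.padj <= padj_cutoff and abs(r.log2_fold_change) > log2fc_cutoff
    ]
```

  • Taking the absolute value keeps both up-regulated and down-regulated genes; drop `abs` if only one direction is of interest.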
  • Gene Ontology analysis was performed with DAVID. RNA-seq plots were drawn using the ggplot2 software package in R version 3.3.1.
  • DEGs: differentially expressed genes.
  • Example 3: The level of chromatin interaction in TADs is closely related to gene expression level and to gene promoter and enhancer activity.
  • The cells were digested with proteinase K (Ambion) overnight, and DNA was extracted and purified using phenol-chloroform (Solarbio) combined with ethanol precipitation. The DNA was then fragmented using an S220 focused ultrasonicator (Covaris), and the biotin-labeled DNA fragments were bound by streptavidin-coated Dynabeads M280 (Thermo Fisher). Illumina sequencing libraries were prepared on the magnetic beads and amplified by PCR. After purification with AMPure XP beads (Beckman, Germany), sequencing was performed using an Illumina HiSeq 2500 sequencer.
  • The bridge linker sequence (CGCGATATCTTATCTGACT or GTCAGATAAGATATCGCGT) in each read is removed; if a complete linker divides the read into two segments, the 5' segment is retained.
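  • As a rough illustration of the linker-removal rule, the following Python sketch cuts a read at the first complete linker and keeps the 5' segment. The function name `trim_linker` is hypothetical, only exact full-length matches are handled, and partial linkers at read ends or sequencing errors are ignored; the actual pipeline may treat these cases differently.

```python
# The two bridge-linker orientations given in the text.
LINKERS = ("CGCGATATCTTATCTGACT", "GTCAGATAAGATATCGCGT")


def trim_linker(read, linkers=LINKERS):
    """Return the 5' segment preceding the first complete linker.

    Reads without a complete linker are returned unchanged.
    Assumption: exact string matching only (no mismatches, no partial linkers).
    """
    positions = [read.find(linker) for linker in linkers]
    positions = [p for p in positions if p != -1]
    if not positions:
        return read
    return read[: min(positions)]
```

  • For example, a read consisting of `ACGT`, a full linker, and `TTTT` is reduced to `ACGT`.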
  • The reads processed as described above are mapped to the human genome build hg19, and duplicate fragments are removed.
  • Hi-C data correction: the iterative correction method (ICE) was used to correct systematic bias, after which an interaction matrix at 40 kb resolution was generated.
  • TAD interaction fold change: to calculate the fold change in Hi-C counts inside and outside TADs upon ATRA induction, the fold change between replicates was first calculated. For each TAD, the between-replicate fold changes of the control and ATRA group cells were combined to generate a background distribution (internal and external fold changes were calculated separately). The fold change between ATRA-treated cells and control cells was then placed in the background distribution to obtain a p-value based on its position in that distribution. A TAD with a p-value < 0.05 in both replicates was defined as a significantly altered TAD.
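  • One way to read "a p-value based on its position in the background distribution" is as an empirical p-value. The Python sketch below makes that assumption explicit: the function name `empirical_p_value`, the use of log2 fold changes, the two-sided comparison and the +1 pseudocount are all illustrative choices, not details taken from the text.

```python
def empirical_p_value(observed_log2fc, background_log2fc):
    """Two-sided empirical p-value of an observed log2 fold change.

    background_log2fc: log2 fold changes between replicates (the background).
    Assumption: the p-value is the fraction of background values at least
    as extreme as the observation, with a +1 pseudocount so it is never 0.
    """
    n_extreme = sum(
        1 for b in background_log2fc if abs(b) >= abs(observed_log2fc)
    )
    return (n_extreme + 1) / (len(background_log2fc) + 1)
```

  • Under the rule described above, a TAD would be called significantly altered only if this p-value is below 0.05 in both replicates.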
  • The ATRA-treated HL-60 cells and control cells were cross-linked with 1% formaldehyde. The cell membrane was then lysed using lysis buffer (50 mM HEPES-KOH, 150 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate and 1% SDS). Chromatin was resuspended in FA lysis buffer (50 mM HEPES-KOH, 150 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate and 0.1% SDS) and fragmented with a sonication processor (Cole-Parmer, United States).
  • Immunoprecipitation was performed overnight using Dynabeads (Thermo Fisher) pre-incubated with H3K4me3 and H3K27ac antibodies (Abcam, England). After washing and purifying the DNA, library construction was performed using the TruePrep DNA Library Preparation Kit (Vazyme, China) according to the manual provided by the manufacturer. The library was sequenced using an Illumina HiSeq 2500 sequencer.
  • The linker sequence was first removed.
  • The sequencing reads were then aligned to the human genome build hg19 using Bowtie.
  • Histone modification peaks were called using MACS2 with the parameters '-g hs --nomodel --broad'.
  • The peaks present in both replicates were considered confidence peaks.
  • The confidence peaks obtained from the control and ATRA-treated cells were compared to distinguish ATRA-treated-group-specific peaks, control-group-specific peaks, and overlapping peaks. Peak comparisons were made using the intersectBed tool in bedtools.
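  • The peak comparison can be pictured with a small pure-Python sketch that stands in for the bedtools step; it is not bedtools itself. Peaks are assumed to be `(chrom, start, end)` tuples with BED-style half-open coordinates, any overlap of at least one base counts, and the names `overlaps` and `classify_peaks` are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_peaks(atra_peaks, control_peaks):
    """Split confidence peaks into ATRA-specific, control-specific and shared.

    Shared peaks are reported using the ATRA-group coordinates.
    This is a simple O(n*m) scan, fine for illustration only.
    """
    atra_specific = [
        p for p in atra_peaks if not any(overlaps(p, q) for q in control_peaks)
    ]
    control_specific = [
        q for q in control_peaks if not any(overlaps(q, p) for p in atra_peaks)
    ]
    shared = [p for p in atra_peaks if any(overlaps(p, q) for q in control_peaks)]
    return atra_specific, control_specific, shared
```

  • The same function could be applied first to the two replicates of one condition, keeping only the shared peaks as confidence peaks.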
  • The results (Fig. 3A, 3B) showed that, after ATRA induction, TADs with significant changes in internal interactions were more likely to contain differentially expressed genes than TADs whose internal interactions did not change significantly.
  • TADs with increased internal chromatin interaction were enriched in up-regulated differentially expressed genes, while TADs with reduced internal chromatin interaction were enriched in down-regulated differentially expressed genes (Fig. 3B).
  • The above results show that there is a positive correlation between the change in gene expression in a TAD and the change in the frequency of its internal chromatin interactions.
  • The level of chromatin interaction in a TAD is very closely related to gene regulation, and can be further used to find target functional genes.
  • Example 4: "Differential gene-regulatory chromatin interactions" can effectively indicate differences in gene expression
  • On the basis of Example 3, in order to better quantify the level of chromatin interaction inside TADs and apply it to the identification of genes, the ChIP-seq data from Example 3 were first used: 12295 H3K4me3 peaks and 12493 H3K27ac peaks were identified in the ATRA group cells, and 14263 H3K4me3 peaks and 22149 H3K27ac peaks were identified in the control cells (Fig. 5A).
  • The H3K4me3 peak represents the active promoter.
  • The H3K27ac peak represents the active transcriptional region and enhancer.
  • Chromatin interactions can bring distal regulatory elements, promoters and transcription start sites together for transcription initiation. To further determine how changes in chromatin interaction affect gene expression, and which genes' expression is affected, the following steps were carried out:
  • The whole genome was first divided into 40 kb bins. If a bin contains the promoter of an expressed gene (i.e., an H3K4me3 peak), the bin is labeled as a "gene region"; if a bin contains an H3K27ac peak distal to the promoter (Fig. 5C), the bin is labeled as a "regulatory region".
  • The Hi-C interaction (Hi-C reads) between a gene region and a regulatory region is used to represent the intensity of the chromatin interaction associated with gene regulation.
  • The Hi-C read count of [i, j] represents a chromatin interaction associated with gene regulation.
  • The chromatin interactions associated with gene regulation within the same TAD serve as input for the differential interaction analysis.
  • The differential interaction analysis is based on the MA plot method and a random sampling model, namely:
  • The Hi-C experiment is regarded as a sampling of the chromatin interactions across many cells, so the number of reads for the interaction between two regions in the Hi-C data follows a binomial distribution.
  • M = (log2 C1 - log2 C2)/2
  • A = (log2 C1 + log2 C2)/2
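The M and A values and the binomial sampling idea can be made concrete with a short sketch. It follows the two formulas exactly as written in this text (including the division of M by 2, which a conventional MA plot omits). The pseudocount, the function names and the use of an exact upper-tail probability are assumptions for illustration; this excerpt does not state how the patent combines the MA values with the sampling model.

```python
import math


def ma_values(c1: float, c2: float, pseudocount: float = 1.0):
    """Return (M, A) for two read counts, using the formulas given in the text.

    The pseudocount is an assumption added here so that zero counts do not
    produce log2(0); the source does not say how zeros are handled.
    """
    log_c1 = math.log2(c1 + pseudocount)
    log_c2 = math.log2(c2 + pseudocount)
    m = (log_c1 - log_c2) / 2
    a = (log_c1 + log_c2) / 2
    return m, a


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Toy counts: with the pseudocount, log2(7 + 1) = 3 and log2(1 + 1) = 1.
m, a = ma_values(7, 1)
tail = binomial_upper_tail(2, 2, 0.5)
```

Under these assumptions, a gene-region/regulatory-region pair with a large |M| at a given A, or a small tail probability, would be a candidate differential interaction.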
  • Example 5 There is a correlation between the difference in gene regulatory chromatin interactions and the accessibility of chromatin
  • The control sample and the ATRA-treated sample were separately incubated on ice for 10 minutes in lysis buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2 and NP-40) to prepare nuclei. Immediately after cell lysis, the nuclei were centrifuged and the supernatant was removed.
  • The nuclei were then incubated with Tn5 transposome and tagmentation buffer (Vazyme, China) for 30 minutes at 37 °C. After tagmentation, stop buffer was added directly to the reaction system to end the reaction.
  • A 12-cycle PCR was then performed to amplify the library. After the PCR reaction, the library was purified using 1.2× AMPure beads (Beckman, Germany) and the resulting library was sequenced using an Illumina HiSeq 2500 sequencer.
  • The adapter sequences were removed and the reads were aligned to the human genome (hg19).
  • ATAC-seq peaks were called using MACS2 with default parameters, followed by peak comparison.
  • To identify sequence motifs enriched in ATAC-seq peaks, findMotifsGenome.pl in the HOMER program was used.
  • annotatePeaks.pl was used to identify specific peaks that contain certain motifs.
  • The GREAT analysis of the ATAC-seq peaks was performed as described in the literature (McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010 May;28(5):495–501).
  • The regulatory regions showed a stronger enrichment tendency than the gene regions, indicating that changes in TF binding in open chromatin regions, particularly in distal regulatory regions, regulate the formation of chromatin interactions.
  • Example 6 Identification of key gene transcription associated with chromatin structural changes by integrating multi-omics data in an ATRA induction model
  • GATA motif sequences (such as GATA1 and GATA2) are significantly enriched only in the regulatory regions of the Loss group, suggesting that GATA transcription factors have a unique role in chromatin interactions. Taking the RNA-seq data into account as well, the expression level of GATA2 mRNA is significantly down-regulated after differentiation (approximately 0.06-fold), so the loss of GATA2 binding may contribute to the ATRA induction process. Thus, by integrating the above multi-omics data, GATA2 was successfully identified as a candidate gene.
  • Example 8 ATRA induction reduces chromatin interactions between the GATA2 promoter and regulatory regions
  • The cells were cross-linked with 1% formaldehyde, and the nuclei were isolated by resuspending the cells in lysis buffer (500 μl of 10 mM Tris-HCl pH 8.0, 10 mM NaCl, 0.2% Igepal CA-630, and 50 μl protease inhibitor). The nuclei were then washed with 1× NEBuffer 2 (NEB, UK) and treated with 0.3% SDS at 65 °C. This was followed by digestion with HindIII at 37 °C overnight and then proximity ligation at 4 °C for 4 hours.
  • DNA was purified by phenol-chloroform (Solarbio) extraction combined with ethanol precipitation. The second digestion step was carried out overnight using DpnII. Then, we performed a second ligation and ethanol precipitation to extract the DNA. DNA was finally purified using the QIAquick PCR Purification Kit (QIAGEN, Germany) according to the manufacturer's protocol. After the PCR reaction, we purified the 4C library using AMPure beads (Beckman, Germany) and sequenced the library using an Illumina HiSeq 2500 sequencer.
  • The adaptor sequence was first removed using the cutadapt software in SAMtools. Bowtie was used to align the reads to the human genome (hg19). Then, after RPM normalization, the data were processed using the r3Cseq package in R 3.3.1.
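RPM (reads per million) normalization, mentioned above, rescales raw read counts so that libraries of different sequencing depth can be compared. The sketch below is a generic illustration and not the r3Cseq implementation: the function name, the dictionary layout and the choice of the summed counts as the library size are all assumptions.

```python
from typing import Dict


def rpm_normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts to reads per million of the library total.

    Here the library total is taken as the sum of the supplied counts,
    which is an assumption; a pipeline may instead use all mapped reads.
    """
    total = sum(counts.values())
    if total == 0:
        return {region: 0.0 for region in counts}
    return {region: count * 1_000_000 / total for region, count in counts.items()}


# Toy counts for two hypothetical 4C windows.
normalized = rpm_normalize({"window_a": 1, "window_b": 3})
```

A fold change between conditions can then be taken as the ratio of the normalized values for the same window.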
  • The Gata2 promoter has strong chromatin interactions with three upstream enhancers indicated by H3K27ac peaks (i.e., chr3: 128240590-128254410, 128262419-128292429 and 128309790-128334446).
  • The position of enhancer E3 (approximately 80 kb upstream of the GATA2 promoter) is very close to a known enhancer, confirming the reliability of the 4C data.
  • The intensity of interaction between the Gata2 promoter and the upstream regions decreased to varying degrees (0.54-, 0.46- and 0.4-fold for E1, E2 and E3, respectively), consistent with a decrease in H3K27ac (Fig. 10A).
  • Another important gene, which encodes the zinc finger protein ZBTB16 (also known as PLZF), is also involved in differential gene-regulatory chromatin interactions. After ATRA induction, both the expression level of this gene and its chromatin interactions were significantly reduced (Fig. 9C).
  • A previous study (Tang Z, Luo OJ, Li X, Zheng M, Zhu JJ, Szalaj P, et al. CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription. Cell. 2015 Dec 17;163(7):1611–1627) identified three upstream and two downstream CTCF-bound anchors near the Zbtb16 gene locus in K562 cells using ChIA-PET.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a method for identifying gene-regulatory chromatin interaction by using various genome analysis technologies, for example RNA-seq, ChIP-seq, and Hi-C, to characterize and analyze a target property of a sample undergoing state transition (such as cell differentiation) in multiple aspects. In particular, the present invention relates to a method for identifying chromatin interaction and a related gene associated with sample state transition during the transition.

Description

Method and system for identifying gene-regulatory chromatin interactions, and applications thereof
Cross-reference
This application claims priority to Chinese Patent Application No. 201810011140.7, entitled "Method and system for identifying gene-regulatory chromatin interactions, and applications thereof", filed with the Chinese Patent Office on January 5, 2018, the contents of which are incorporated herein by reference in their entirety.
Technical field
The present invention relates to a method for identifying gene-regulatory chromatin interactions, and in particular to a method for identifying, during a sample state transition, the chromatin interactions capable of affecting that transition and the corresponding effector genes.
Background
Chromatin conformation plays a key role in the regulation of gene expression. Studies have found that interphase chromosomes occupy specific territories, and that gene transcription is closely related to the position of a gene relative to the nuclear lamina and to chromosome territories. Recent studies using high-throughput chromosome conformation capture (Hi-C) have revealed that the genome is organized into topologically associating domains (TADs) of several hundred kilobases to one megabase, and that chromatin regions within a TAD are more likely to interact with other regions in the same TAD than with regions outside it. Moreover, most TAD positions remain unchanged between cell types and are evolutionarily conserved.
Genes in the same TAD show coordinated changes upon hormone stimulation or during differentiation, indicating that a TAD is not only a structural unit but also a functional unit of transcriptional regulation. In addition, within TADs, long-range chromatin interactions mediated by specific proteins or non-coding RNAs connect distant regulatory regions, such as enhancers, with gene promoters, making long-range regulation of gene expression possible.
For example, cell differentiation is often accompanied by differences in the expression of key genes and by large changes in the three-dimensional structure or conformation of chromatin. However, there is currently no effective means of determining how changes in chromatin structure during this process relate to behaviors such as the expression of key genes, or how that relationship affects state changes such as cell differentiation. There is therefore an urgent need in the art for a new method that can effectively analyze and identify, during a state transition, the chromatin interactions that have a regulatory function, or the key genes or regulatory factors that are affected or regulated by chromatin interactions and play an important role in the state transition.
Summary of the invention
After long-term research, the inventors obtained a method that relates three aspects to one another: chromatin interactions, the expression levels and/or recognition sites of specific genes, and gene regulation within chromatin, thereby completing the present invention.
In a first aspect, the invention relates to a method for identifying an effector gene of a sample state transition, the expression of the effector gene being affected by changes in chromatin interaction during the sample state transition, the method comprising the following steps:
(1) comparing samples in a first state and a second state to obtain at least the following difference information: differences in identifiable gene behavior, and differences in the chromatin interactions present in the transcriptional regulatory regions of genes; and
(2) correlating the difference information obtained in step (1) to obtain the differences in identifiable gene behavior that are related to the chromatin interaction differences in transcriptional regulatory regions during the state transition, thereby identifying the effector gene.
In one embodiment, the sample is a cell.
In another embodiment, the differences in identifiable gene behavior include differences in gene expression levels and/or differences in the distribution of binding motifs in the genomic sequences of gene regulatory regions; preferably, the differences in gene expression levels are differences in mRNA expression or in protein expression.
In another embodiment, the differences in the chromatin interactions present in the transcriptional regulatory regions of genes in step (1) are obtained by the following steps:
(a) identifying the sites of active promoters and/or enhancers in the sample genome;
(b) identifying the regions where all chromatin interactions occur;
(c) integrating the information obtained in steps (a) and (b) to obtain the chromatin interactions between active promoters and enhancers, i.e., the chromatin interactions present in transcriptional regulatory regions; and
(d) comparing the chromatin interactions present in transcriptional regulatory regions between different samples to obtain the differences in the chromatin interactions present in the transcriptional regulatory regions of genes.
In another embodiment, the differences in identifiable gene behavior in step (1) are obtained by the following steps:
i) obtaining samples in the first state and in the second state;
ii) taking a portion of the samples in the first state and the second state, performing transcriptional expression analysis on each, and comparing the differences in mRNA expression between the samples; preferably, the transcriptional expression analysis uses RNA sequencing, i.e., RNA-seq.
In another embodiment, the method further comprises:
iii) taking a portion of the samples in the first state and the second state and analyzing the sequences of the open chromatin regions in each, preferably by ATAC-seq, and analyzing the transcription factor binding motifs distributed in the open chromatin region sequences, preferably further comparing the differences in binding motif distribution between the samples in the first state and the second state.
In another embodiment, the differences in the chromatin interactions present in the transcriptional regulatory regions of genes in step (1) are analyzed by the following steps:
iv) taking a portion of the samples in the first state and the second state and identifying the active promoters and/or enhancers in each, preferably by ChIP-seq; the antibodies used in the ChIP-seq are preferably antibodies that bind H3K4me3 and H3K27ac, which form signal peaks upon binding H3K4me3 and H3K27ac respectively, representing active promoter sites and active enhancer sites respectively;
v) taking another portion of the samples in the first state and the second state and obtaining genome-wide chromatin interaction information using a chromatin conformation capture technique, preferably a high-throughput one such as Hi-C, in situ Hi-C, BL-Hi-C or ChIA-PET; or obtaining local chromatin interaction information using 4C or 5C;
vi) dividing the reference genome sequence into regions of a given size, preferably between 1 and 40 kb, for example 1 kb, 5 kb, 10 kb, 15 kb, 20 kb, 25 kb, 30 kb, 35 kb or 40 kb; based on the active promoter and enhancer information obtained in step iv), obtaining by alignment the regions that contain active promoter sites and the regions that contain active enhancer sites, preferably naming the regions containing promoters gene regions and the regions containing enhancer sequences regulatory regions;
then, using the chromatin interaction frequency signals obtained in step v), identifying the chromatin interaction frequency signals that occur between gene regions and regulatory regions, thereby obtaining the chromatin interactions associated with gene regulation; then comparing the gene-regulation-associated chromatin interactions between the samples in the first state and the second state, those with statistically significant differences being identified as differential gene-regulatory interactions, including gene-regulatory chromatin interactions that are enhanced and/or weakened in the sample in the second state relative to the sample in the first state.
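Steps iv) to vi) can be sketched in a few lines: assign peaks to fixed-size bins, label the bins, and keep only the contacts that link a gene region to a regulatory region. This is a minimal illustration under stated assumptions, not the patented implementation. The function names, the data layout and the toy coordinates are hypothetical, every bin with an H3K27ac peak is treated as regulatory (no "distal" filter), and no TAD restriction is applied.

```python
from typing import Dict, List, Set, Tuple

BIN_SIZE = 40_000  # 40 kb, the upper end of the preferred range

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open
Bin = Tuple[str, int]        # (chromosome, bin index)


def bins_for_peaks(peaks: List[Peak], bin_size: int = BIN_SIZE) -> Set[Bin]:
    """Return every bin that is touched by at least one peak."""
    hit: Set[Bin] = set()
    for chrom, start, end in peaks:
        for idx in range(start // bin_size, (end - 1) // bin_size + 1):
            hit.add((chrom, idx))
    return hit


def regulatory_interactions(
    contacts: Dict[Tuple[Bin, Bin], int],
    gene_bins: Set[Bin],
    regulatory_bins: Set[Bin],
) -> Dict[Tuple[Bin, Bin], int]:
    """Keep contacts linking a gene region to a regulatory region.

    Keys of the result are ordered (gene bin, regulatory bin); counts for
    the two orientations of the same pair are summed.
    """
    kept: Dict[Tuple[Bin, Bin], int] = {}
    for (bin_i, bin_j), count in contacts.items():
        if bin_i in gene_bins and bin_j in regulatory_bins:
            key = (bin_i, bin_j)
        elif bin_j in gene_bins and bin_i in regulatory_bins:
            key = (bin_j, bin_i)
        else:
            continue
        kept[key] = kept.get(key, 0) + count
    return kept


# Toy data, invented for illustration only.
gene_bins = bins_for_peaks([("chr1", 10_000, 11_000)])        # H3K4me3 peak
regulatory_bins = bins_for_peaks([("chr1", 85_000, 90_000)])  # H3K27ac peak
contacts = {
    (("chr1", 0), ("chr1", 2)): 12,
    (("chr1", 2), ("chr1", 0)): 3,
    (("chr1", 5), ("chr1", 6)): 50,
}
kept = regulatory_interactions(contacts, gene_bins, regulatory_bins)
```

Running the same filter on the contact maps of both states gives, for each gene-region/regulatory-region pair, the two counts that a differential test would then compare.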
In another embodiment, step (2) specifically comprises the following steps:
a) combining the differences in gene expression levels with the differential gene-regulatory chromatin interactions, and selecting genes that, across samples in different states, lie within enhanced or weakened gene-regulatory chromatin interactions and whose expression levels also change significantly; or
b) combining the differences in the distribution of transcription factor binding motifs in genomic transcriptional regulatory regions with the differential gene-regulatory chromatin interaction information, and selecting genes that, across samples in different states, lie within enhanced or weakened gene-regulatory chromatin interactions and whose transcription factor binding motif distribution also changes significantly; or
c) combining the differences in gene expression levels and the differences in transcription factor binding motif distribution with the differential gene-regulatory chromatin interaction information, and selecting genes that, across samples in different states, lie within enhanced or weakened gene-regulatory chromatin interactions, show a significant change in binding motif distribution in genomic transcriptional regulatory regions, and also show a significant change in expression level.
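Options a), b) and c) amount to intersecting sets of genes, which the following sketch makes explicit. It assumes that each upstream analysis has already been reduced to a set of gene names; the function name, parameter names and gene labels are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Optional, Set


def select_effector_genes(
    interaction_genes: Iterable[str],
    expression_changed: Optional[Iterable[str]] = None,
    motif_changed: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Intersect genes in differential interactions with the supplied evidence.

    Passing only expression_changed corresponds to option a), only
    motif_changed to option b), and both to option c).
    """
    if expression_changed is None and motif_changed is None:
        raise ValueError("supply expression_changed, motif_changed, or both")
    selected = set(interaction_genes)
    if expression_changed is not None:
        selected &= set(expression_changed)
    if motif_changed is not None:
        selected &= set(motif_changed)
    return selected


# Hypothetical gene labels, invented for illustration only.
in_interactions = {"GENE_A", "GENE_B", "GENE_C"}
expr = {"GENE_A", "GENE_B", "GENE_D"}
motif = {"GENE_A", "GENE_C"}

option_a = select_effector_genes(in_interactions, expression_changed=expr)
option_b = select_effector_genes(in_interactions, motif_changed=motif)
option_c = select_effector_genes(in_interactions, expr, motif)
```

Option c) is the strictest filter, since a gene must be supported by all three lines of evidence to be kept.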
In another embodiment, the method further comprises a step of functionally studying the selected effector genes to determine their function.
In another embodiment, the method further comprises a step of identifying gene-regulatory chromatin interactions, i.e., taking the chromatin interactions capable of affecting the expression of the effector genes identified in step (2) as gene-regulatory chromatin interactions.
In another embodiment, the sample state transition is achieved by chemical reagent induction, natural differentiation and/or physical stimulation.
In a second aspect, the invention relates to a method for identifying chromatin interactions capable of regulating a sample state transition, comprising the steps described in any embodiment of the first aspect.
In a third aspect, the invention relates to a method for identifying regulatory factors of the chromatin interactions involved in a sample state transition, comprising the steps described in any embodiment of the first aspect.
In a fourth aspect, the invention relates to a method for identifying a substance capable of regulating chromatin interactions, comprising: identifying effector genes of chromatin interactions, or gene-regulatory chromatin interactions, using any embodiment of the first aspect; then contacting the substance to be tested with the sample and analyzing the changes in the effector genes or gene-regulatory interactions.
In a fifth aspect, the invention relates to a system for identifying an effector gene of a sample state transition, the expression of the effector gene being affected by changes in chromatin interaction during the sample state transition, the system comprising the following modules:
(1) a module for analyzing differences in identifiable gene behavior;
(2) a module for analyzing differences in chromatin interactions in transcriptional regulatory regions; and
(3) an effector gene identification module;
the system being able to obtain the differences in identifiable gene behavior associated with chromatin interaction differences, and thereby the sample state transition effector genes affected by chromatin interactions.
Preferably, the system further comprises a gene-regulatory chromatin interaction identification module, for identifying the gene-regulatory chromatin interactions capable of affecting the expression of the effector gene.
In one embodiment, the module for analyzing differences in identifiable gene behavior can analyze differences in gene expression levels and/or differences in the distribution of transcription factor binding motifs in the genomic sequences of the transcriptional regulatory regions of genes.
In another embodiment, the module for analyzing differences in chromatin interactions in transcriptional regulatory regions can perform the following analysis:
(a) identifying the sites of active promoters and/or enhancers in the sample genome;
(b) identifying the regions where all chromatin interactions occur;
(c) integrating the information obtained in steps (a) and (b) to obtain the chromatin interactions between active promoters and enhancers, i.e., the chromatin interactions present in transcriptional regulatory regions; and
(d) comparing the chromatin interactions present in transcriptional regulatory regions between different samples to obtain the differences in the chromatin interactions present in transcriptional regulatory regions.
In yet another embodiment, the effector gene identification module can perform the following analysis:
a) combining the differences in gene expression levels with the differential gene-regulatory chromatin interactions, and selecting genes that, across samples in different states, lie within enhanced or weakened gene-regulatory chromatin interactions and whose expression levels also change significantly; or
b) combining the differences in the distribution of transcription factor binding motifs in genomic transcriptional regulatory regions with the differential gene-regulatory chromatin interaction information, and selecting genes that, across samples in different states, lie within enhanced or weakened gene-regulatory chromatin interactions and whose transcription factor binding motif distribution also changes significantly; or
c) combining the differences in gene expression levels and the differences in transcription factor binding motif distribution with the differential gene-regulatory chromatin interaction information, and selecting genes that, across samples in different states, lie within enhanced or weakened gene-regulatory chromatin interactions, show a significant change in binding motif distribution in genomic transcriptional regulatory regions, and also show a significant change in expression level.
In a sixth aspect, the invention relates to a detection kit comprising the reagents used in the methods of the first to fourth aspects of the invention.
By integrating multiple omics experimental methods and their results, the present invention establishes a method for analyzing and identifying the relationship between chromatin conformation and transcriptional regulation. The method can be applied to the analysis of many biological processes, such as cell differentiation, individual development, cell variation and disease treatment, so that chromatin interactions and regulatory factors with an important influence on these processes can be identified at the level of chromatin conformation.
Brief description of the drawings
Figure 1 shows the overall workflow of one embodiment of the present invention.
Figure 2 shows, for one example of the present invention, the changes in mRNA expression in HL-60 cells induced with all-trans retinoic acid (ATRA) compared with the control group. Figure 2A shows the changes in gene expression after retinoic acid induction; Figure 2B shows the enrichment of differentially expressed genes in different GO categories.
Figure 3 shows the changes in chromatin interaction frequency within TADs after ATRA induction (Fig. 3A) and the relationship between those frequency changes and differentially expressed genes (Fig. 3B).
Figure 4 shows the relationship between changes in intra-TAD chromatin interaction frequency and changes in H3K4me3 and H3K27ac modifications.
Figure 5 shows the chromatin interaction differences associated with gene expression regulation obtained in one example of the present invention. Figure 5A shows the changes in the H3K4me3 and H3K27ac signals in the ATRA-treated and control groups; Figure 5B shows the distances from the transcription start site of the ATRA-specific, control-specific and shared peaks; Figure 5C shows how chromatin interactions associated with gene regulation are identified; Figure 5D shows a relative comparison, between the Gain and Loss groups, of the H3K27ac signal and of the H3K4me3 signal within gene regions; Figure 5E shows the differentially expressed genes in the Gain and Loss groups.
Figure 6 compares the open chromatin regions determined by ATAC-seq after ATRA induction in one example of the invention. Figures 6A and 6B show the specific peaks of each group and the genomic distribution of the peaks. Figures 6C and 6D show, for regulatory regions and gene regions respectively, the distribution of ATRA-specific, control-specific and shared open chromatin signals within the Gain and Loss interactions: the Gain group is more enriched in ATRA-specific signals and the Loss group in control-specific signals, and regulatory regions are more enriched in specific signals than gene regions. Figure 6E shows the transcription factor binding motifs enriched in open chromatin regions in the control and ATRA-induced groups.
Figure 7 compares the distribution of transcription factors (TFs) with binding motifs in the open chromatin regions of the Gain group (Fig. 7A) and the Loss group (Fig. 7B); the GATA binding motifs show a specific difference between the two groups.
Figure 8A shows a transcription factor-target gene regulatory network analysis, which indicates that GATA2 is located at the core hub of the network; Figure 8B shows the interactions between GATA2 and other transcription factors enriched in ATAC-seq peaks during ATRA-induced differentiation of HL-60 cells.
Figures 9A and 9B show, respectively, the change in interaction between the GATA2 gene region and regulatory regions and the change in GATA2 gene expression after ATRA induction; Figures 9C and 9D show, respectively, the change in interaction between the ZBTB16 gene region and regulatory regions and the change in ZBTB16 gene expression after ATRA induction.
Figure 10A shows the results of the 4C chromatin conformation capture experiment upstream of the GATA2 gene; Figure 10B shows the in situ FISH results verifying the chromatin loop structure; Figure 10C shows the Pearson correlation coefficients of the red and green fluorescence distributions in the FISH experiment.
Figure 11A shows the results of the 4C chromatin conformation capture experiment in the ZBTB16 gene region; Figures 11B and 11C show models of Gata2 and Zbtb16, respectively, in induced differentiation, which explain the relationship among chromatin structure, transcription factor binding and gene expression.
Detailed Description of Embodiments
Terms used in this application have the same meaning as in the prior art. For clarity, the specific meanings of certain terms in this application are given below. Where a definition herein conflicts with the conventional meaning of a term, the definition herein prevails.
Definitions
The term "sample", also referred to as "specimen", refers to any object that can be analyzed, provided that it contains chromatin and gene expression products (e.g., mRNA and/or protein). A sample may be a eukaryotic cell, such as an animal cell, a plant cell, or a fungal cell, and may in some cases also include a cell lysate.
The term "state transition" refers to a change in the properties or morphology of a given sample caused by a specific external induction or by an internal natural process. Examples include differentiation induced by a chemical agent, physical stimulation, or differentiation of cells during natural physiological processes, such as natural cell differentiation triggered in response to external hormones or other signaling molecules, or by the action of genes or proteins within the cell. In one embodiment, the "at least two samples in different states" of the present invention are produced by a state transition.
"Sample in a first state" and "sample in a second state" refer to samples in two different states obtained through a state transition process. In some embodiments, the "sample in the first state" is the sample before the state transition, and the "sample in the second state" is the sample after the state transition.
The term "effector gene" refers to a gene that participates in the state transition of a sample. An effector gene may be the cause of the state transition, that is, a gene capable of initiating the transition; for example, in a model of induced cell differentiation, it may be a gene that responds directly to the external inducer and thereby triggers differentiation. An effector gene may also be an intermediate link in the state transition process, or merely a result of the state transition. Note that "gene" herein may refer to a gene or to an expression product of a gene, such as an mRNA transcript or a protein. In one embodiment, the effector gene may be a transcription factor.
The term "transcriptional regulatory region", or simply "regulatory region", refers to a region of genomic DNA located within a certain range upstream or downstream of a gene, for example within 10 kb-1 Mb, 50 kb-500 kb, or 100 kb-200 kb, that contains elements such as promoters and enhancers, which carry binding sites for trans-acting factors (e.g., transcription factors).
In one embodiment of the present invention, analysis of the transcriptional regulatory region plays an important role in selecting target effector genes from a large number of candidate genes. Such analysis includes, for example, determining whether an interacting promoter and enhancer are present in the transcriptional regulatory region of a gene, whether the transcriptional regulatory region is an open chromatin sequence, and which transcription factor binding motifs are present; and determining whether, among samples in different states, the interaction between the enhancer and the promoter changes and whether the distribution of binding motifs also changes. Combining these results with other information (such as gene expression levels) allows effector genes to be identified effectively.
The term "gene-regulation-associated chromatin interaction", also called "chromatin interaction present in a transcriptional regulatory region", refers to a chromatin interaction that occurs between different regulatory elements within the transcriptional regulatory region of a gene, for example between promoter and enhancer sequences.
In one embodiment of the present invention, the reference genome sequence is divided into regions of a fixed size. The region size can be adjusted according to the data depth of the chromatin conformation analysis, for example the sequencing depth, and is preferably in the range of 1 kb-40 kb, for example 1 kb, 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 11 kb, 12 kb, 13 kb, 14 kb, 15 kb, 16 kb, 17 kb, 18 kb, 19 kb, 20 kb, 21 kb, 22 kb, 23 kb, 24 kb, 25 kb, 26 kb, 27 kb, 28 kb, 29 kb, 30 kb, 31 kb, 32 kb, 33 kb, 34 kb, 35 kb, 36 kb, 37 kb, 38 kb, 39 kb, or 40 kb. In one specific embodiment, the region size is 40 kb.
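The fixed-size division described above can be sketched as follows. This is a minimal illustration only, assuming 0-based coordinates and half-open bins; the function names `bin_index` and `bin_interval` are hypothetical and are not taken from the source.

```python
def bin_index(position, bin_size=40_000):
    """Return the 0-based index of the fixed-size bin that contains a 0-based position."""
    if position < 0:
        raise ValueError("position must be non-negative")
    return position // bin_size


def bin_interval(index, bin_size=40_000):
    """Return the half-open interval [start, end) covered by a bin."""
    start = index * bin_size
    return start, start + bin_size
```

With the 40 kb default, positions 0 to 39,999 fall in bin 0 and position 40,000 opens bin 1. A smaller `bin_size` would only be meaningful when the sequencing depth supports it, as the paragraph above notes.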
Next, based on the information on active promoters and enhancers obtained in the aforementioned step iv), the regions containing active promoter sites and active enhancer sites are identified by alignment. Preferably, a region containing a promoter is designated a gene region, and a region containing an enhancer sequence is designated a regulatory region. Then, using the chromatin interaction signal from step v), the chromatin interaction frequency signal between gene regions and regulatory regions is analyzed (that is, the number of chromatin interactions between specific regions, which may for example be expressed as the number of reads in the Hi-C data whose two ends fall in the respective regions). When a contact signal of recognizable intensity exists between a gene region and a regulatory region, a gene-regulation-associated chromatin interaction is considered to be present.
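The labeling and pairing logic above might be sketched as below. Several things here are assumptions made for illustration: peaks are `(chrom, start, end)` tuples with half-open coordinates, `contacts` is a dictionary keyed by a sorted pair of `(chrom, bin)` tuples, and `min_count` stands in for the "recognizable intensity" criterion, which the text does not quantify.

```python
def bins_covering(peaks, bin_size=40_000):
    """Return the set of (chrom, bin_index) bins touched by any peak."""
    covered = set()
    for chrom, start, end in peaks:
        for index in range(start // bin_size, (end - 1) // bin_size + 1):
            covered.add((chrom, index))
    return covered


def gene_regulatory_pairs(promoter_peaks, enhancer_peaks, contacts,
                          min_count=1, bin_size=40_000):
    """List (gene_bin, regulatory_bin, count) for pairs with enough contacts.

    min_count is a hypothetical threshold, not a value given in the source.
    """
    gene_bins = bins_covering(promoter_peaks, bin_size)
    regulatory_bins = bins_covering(enhancer_peaks, bin_size)
    pairs = []
    for gene_bin in sorted(gene_bins):
        for reg_bin in sorted(regulatory_bins):
            # Only compare distinct bins on the same chromosome.
            if gene_bin[0] != reg_bin[0] or gene_bin == reg_bin:
                continue
            count = contacts.get(tuple(sorted((gene_bin, reg_bin))), 0)
            if count >= min_count:
                pairs.append((gene_bin, reg_bin, count))
    return pairs
```

A bin can carry both labels if it holds a promoter and a distal enhancer; the sketch keeps both roles and simply skips pairing a bin with itself.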
The term "differential gene-regulatory chromatin interaction" is sometimes also referred to herein as "difference in chromatin interaction of the transcriptional regulatory region", "difference in chromatin interaction located in the transcriptional regulatory region", or "difference in chromatin interaction present in the transcriptional regulatory region"; these expressions have the same meaning. It is obtained by comparing the gene-regulation-associated chromatin interactions of the sample in the first state with those of the sample in the second state; interactions showing a significant difference are identified as differential gene-regulatory chromatin interactions (differential gene-regulatory interaction). "Significant" preferably means statistically significant, for example p<0.05 or p<0.01 when a hypothesis test is used.
The term "gene-regulatory chromatin interaction" refers to those gene-regulation-associated chromatin interactions that differ significantly between the sample in the first state and the sample in the second state; they are thus a subset of the gene-regulation-associated chromatin interactions. For the gene-regulatory chromatin interactions so identified, the link between chromatin interaction and gene regulation is more certain.
In effect, the "differential gene-regulatory chromatin interactions" identified above can be regarded as the result of comparing "gene-regulatory chromatin interactions" between states. According to the type of difference, gene-regulatory chromatin interactions fall into two types: interactions that are enhanced in the sample in the second state relative to the sample in the first state (assigned to the Gain group in the examples of the present invention) and/or interactions that are weakened (assigned to the Loss group in the examples of the present invention).
The term "identifiable difference in gene behavior" refers to a difference, related to the properties, state, or the like of genes, that can be observed qualitatively or quantitatively between samples in different states. The "genes" here are not a particular subset or genes preselected by human intervention, but the entire set of genes observed to show identifiable behavioral differences in a quantitative or qualitative analysis. For clarity, these "genes" are sometimes called "alternative genes" or "candidate genes"; note, however, that the use of these expressions herein does not imply that the genes belong to a range preselected by a person.
The term "binding motif" refers to an element present on genomic DNA that can be bound by a trans-acting factor, such as a transcription factor, to regulate a target gene, for example to regulate expression of an effector gene of the present invention.
The term "difference in binding motif distribution" refers to a difference, between samples in different states, in the number, position, or presence or absence of all or some binding motifs, or to such a difference for some or specific binding motifs located in a region of interest.
The term "chromatin interaction" refers to a long-range interaction between different chromatin sites, which gives rise to higher-order chromatin conformation and thereby maintains chromatin structure or promotes gene expression.
The term "chromatin interaction frequency", also called "Hi-C interaction frequency" or "Hi-C contact frequency" herein, refers to the signal of interaction between different regions detected when searching for chromatin interactions in a chromatin conformation analysis. It is expressed as the number of reads in the Hi-C data whose two ends fall in the two regions concerned.
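As an illustration of this definition, the sketch below tallies read pairs by the pair of bins in which their two ends fall. The input layout (a list of `((chrom, pos), (chrom, pos))` tuples) and the function name are assumptions for the example, not a format specified in the source.

```python
from collections import Counter


def contact_counts(read_pairs, bin_size=40_000):
    """Count read pairs per unordered pair of (chrom, bin_index) bins."""
    counts = Counter()
    for (chrom1, pos1), (chrom2, pos2) in read_pairs:
        end_a = (chrom1, pos1 // bin_size)
        end_b = (chrom2, pos2 // bin_size)
        # Sort the two ends so that (a, b) and (b, a) share one key.
        counts[tuple(sorted((end_a, end_b)))] += 1
    return counts
```

The count stored for a bin pair is then the chromatin interaction frequency between those two regions in the sense defined above.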
The term "open chromatin region sequence" refers to a DNA sequence in chromatin that is exposed, for example because no nucleosome is bound, and that can be bound by trans-acting factors such as transcription factors.
The term "ChIP-seq" refers to a technique that combines chromatin immunoprecipitation (ChIP) with high-throughput sequencing to efficiently detect, genome-wide, the DNA segments that interact with histones, transcription factors, and the like. The principle is as follows: DNA fragments bound by the protein of interest are first specifically enriched by chromatin immunoprecipitation (ChIP), purified, and used for library construction; the enriched DNA fragments are then subjected to high-throughput sequencing. The millions of sequence reads obtained are then mapped precisely to the genome, yielding genome-wide information on the DNA segments that interact with histones, transcription factors, and the like.
The term "chromatin conformation capture technology" refers to any technique that can determine the relationships between different spatial positions of chromatin and thereby establish three-dimensional chromatin structure information. It includes the basic 3C technique (Chromosome Conformation Capture) as well as chromatin conformation capture techniques combined with high-throughput sequencing.
The term "high-throughput chromatin conformation capture technology" refers to methods that combine high-throughput sequencing with bioinformatic analysis to analyze the spatial relationships of chromatin DNA across the whole genome and to obtain high-resolution three-dimensional chromatin structure and chromatin interaction information. Herein, the technology includes at least Hi-C; in situ Hi-C, an improved technique based on Hi-C; and BL-Hi-C, obtained by further introducing a bridge linker into the in situ Hi-C method. The ChIA-PET method is also regarded herein as a high-throughput chromatin conformation capture technology.
The term "ATAC-seq" refers to a molecular biology technique for studying chromatin accessibility, consisting of two parts: the ATAC experiment and high-throughput sequencing. The key step of the ATAC-seq experiment is the action of the transposase Tn5 on the genomic DNA of the sample. The transposon is preferentially inserted into genomic regions that are generally free of nucleosomes (nucleosome-free regions) or into exposed DNA segments. Enrichment of sequences from certain loci in the genome therefore indicates that no nucleosomes are present in that region and that the region is in a loose, exposed state accessible to nuclear machinery such as DNA-binding proteins, which provides information on the transcriptional activity of the chromatin segment. ATAC-seq uses a mutated hyperactive transposase that efficiently cuts exposed DNA and simultaneously ligates adapters of specific sequence. The adapter-ligated DNA fragments are isolated, amplified by PCR, and used for high-throughput sequencing.
Examples
The following examples use ATRA-induced differentiation of HL-60 cells to illustrate how the method of the present disclosure identifies genes that play an important regulatory role in that differentiation process and how the corresponding analyses are performed. Those skilled in the art will understand that the method of the present disclosure is not limited to the methods exemplified in the examples, but is applicable to the identification and analysis of relevant target regulatory genes in any samples in two different states.
Example 1 Cell culture and ATRA induction
HL-60 cells were purchased from the National Experimental Cell Resource Sharing Platform (Peking Union Medical College, Beijing, China). Cells were maintained in RPMI-1640 medium (Gibco, USA) supplemented with 10% fetal bovine serum (FBS, Gibco, USA), 50 units/mL penicillin and streptomycin (Gibco, USA), and non-essential amino acids (Gibco, USA).
For granulocytic differentiation, HL-60 cells at 2×10^5 cells/mL were treated with 1 μM ATRA (from a 1 mM stock solution in ethanol; Sigma, USA) for 4 days (the ATRA group); cells treated with an equal amount of ethanol served as the control group. The medium was replaced on day 2, with ATRA or ethanol added at the same time.
Example 2 Obtaining differential RNA expression information for control and ATRA group cells
Method: RNA-seq
Procedure
Total RNA was extracted from control cells and ATRA-treated cells using the TRIZOL method (Ambion, USA). Library construction and sequencing were performed by Annoroad (安诺优达, China).
For RNA-seq analysis, adapter sequences were first removed, the reads were then aligned to the reference genome hg19 with Bowtie, and reads from ribosomal RNA were filtered out. The remaining reads were aligned to the transcriptome and quantified with RSEM v1.2.7. The annotation file was downloaded from the University of California, Santa Cruz (UCSC) Genome Browser for the human hg19 assembly. Differential gene expression was calculated with the DESeq2 package, version 1.4.5, based on the mean of the gene-wise dispersion estimates. Genes with significantly different expression were identified using an adjusted p-value cutoff of 0.01 and a log2 fold change greater than 0.9. Gene Ontology analysis was performed with DAVID. RNA-seq plots were drawn with the ggplot2 package in R version 3.3.1.
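The final selection step can be illustrated with a short sketch. It assumes the differential-expression results have already been exported as a mapping from gene name to `(log2_fold_change, adjusted_p)`; that layout, the function name, the strict inequality on the adjusted p-value, and the symmetric treatment of down-regulated genes are all assumptions for the example rather than details stated in the source.

```python
def select_degs(results, padj_cutoff=0.01, lfc_cutoff=0.9):
    """Split genes into up- and down-regulated lists using the stated cutoffs.

    results maps gene -> (log2_fold_change, adjusted_p); adjusted_p may be None
    when no p-value was computed for that gene.
    """
    up, down = [], []
    for gene, (log2_fc, padj) in results.items():
        if padj is None or padj >= padj_cutoff:
            continue
        if log2_fc > lfc_cutoff:
            up.append(gene)
        elif log2_fc < -lfc_cutoff:
            # Assumed mirror-image threshold for down-regulated genes.
            down.append(gene)
    return sorted(up), sorted(down)
```

Genes that pass the significance cutoff but whose fold change lies between -0.9 and 0.9 on the log2 scale are left out of both lists.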
Results
Analysis of differentially expressed genes (DEGs) identified 941 up-regulated genes and 611 down-regulated genes after ATRA induction (Figure 2A). GO analysis showed significant enrichment of genes in the "immune response" and "leukocyte activation" categories, which is consistent with the terminal differentiation of neutrophils (Figure 2B).
Example 3 Chromatin interaction levels within TADs are closely related to gene expression levels and to the activity of gene promoters and enhancers
Method: BL-Hi-C
Procedure
Library construction: Cells were treated with 1% formaldehyde to cross-link protein to protein and protein to DNA, and were then resuspended in lysis buffer (50 mM HEPES-KOH, 150 mM NaCl, 1 mM EDTA, 1% Triton X-100, and 0.1% SDS). The genome was then digested with the enzyme HaeIII into blunt-ended fragments. The blunt ends of the DNA fragments were A-tailed (treated with adenine) and ligated to a biotin-containing bridge linker at 16°C for 4 hours, and unligated DNA fragments were digested with exonuclease (NEB). Next, the cells were digested with proteinase K (Ambion) overnight, and DNA was extracted with phenol-chloroform (Solarbio) and purified by ethanol precipitation. The DNA was then fragmented with an S220 focused ultrasonicator (Covaris), and the biotin-labeled DNA fragments were captured with streptavidin-coated Dynabeads M280 (Thermo Fisher). Illumina sequencing libraries were prepared on the beads and amplified by PCR. After purification with AMPure XP beads (Beckman, Germany), the libraries were sequenced on an Illumina HiSeq 2500 sequencer.
Hi-C data analysis: First, the bridge linker sequence (CGCGATATCTTATCTGACT or GTCAGATAAGATATCGCGT) was removed from the reads; if a complete linker split a read into two segments, the 5' segment was retained. Second, the processed reads were mapped to the human genome assembly hg19, and duplicates were removed. Third, a distance threshold for read pairs was estimated from the DNA strand information, and, combining read-pair and strand information, the interaction pairs were classified into the following types: intact fragments (no ligation within the read pair), self-ligations, inter-chromosomal ligations, and intra-chromosomal ligations. For a restriction enzyme that recognizes a 4-base-pair site, the threshold for intra-chromosomal ligations is approximately 3 kb.
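The first step, linker removal, can be sketched as follows. This is a simplified illustration that assumes exact string matching of the two linker sequences given above; practical trimming tools usually tolerate mismatches, and the function name is hypothetical.

```python
# The two bridge linker sequences quoted in the text.
LINKERS = ("CGCGATATCTTATCTGACT", "GTCAGATAAGATATCGCGT")


def trim_bridge_linker(read):
    """Return the segment 5' of the first complete linker, or the read unchanged."""
    hits = [pos for pos in (read.find(linker) for linker in LINKERS) if pos != -1]
    if not hits:
        return read
    # Keep only the sequence upstream (5') of the earliest linker occurrence.
    return read[:min(hits)]
```

A read that begins with the linker yields an empty string, which a real pipeline would presumably discard before mapping.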
Hi-C data correction: Systematic biases were corrected by iterative correction (ICE), after which interaction matrices were generated at 40 kb resolution.
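To convey the idea behind iterative correction, the sketch below repeatedly divides each matrix entry by the relative coverage of its row and column until all rows have roughly equal sums. It is a bare-bones illustration under stated assumptions (a small symmetric matrix stored as nested lists, a fixed number of iterations, no filtering of poorly covered bins), not the implementation used in the source.

```python
def iterative_correction(matrix, n_iter=50):
    """Balance a symmetric contact matrix so that non-empty rows have equal sums.

    Returns the corrected matrix and the accumulated per-bin bias factors.
    """
    size = len(matrix)
    corrected = [list(map(float, row)) for row in matrix]
    bias = [1.0] * size
    for _ in range(n_iter):
        row_sums = [sum(row) for row in corrected]
        non_empty = [s for s in row_sums if s > 0]
        if not non_empty:
            break
        mean_sum = sum(non_empty) / len(non_empty)
        # Relative coverage of each bin; empty bins are left unscaled.
        scale = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(size):
            for j in range(size):
                corrected[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
    return corrected, bias
```

In the example below the diagonal is set to zero, an assumption made here because strongly diagonal-dominated matrices can make this simple update oscillate.

```python
raw = [[0, 4, 2],
       [4, 0, 6],
       [2, 6, 0]]
balanced, bias = iterative_correction(raw)
row_sums = [sum(row) for row in balanced]
```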
For TAD classification, we used the clustering-based Hi-C domain finder (CHDF) method.
Calculating fold changes in TAD interactions: To calculate the fold change in Hi-C counts inside and outside each TAD upon ATRA induction, the fold change between replicates was calculated first. For each TAD, the fold changes (between replicates) of the control and ATRA group cells were combined to generate a background distribution (computed separately for internal and external fold changes). The fold change between ATRA-treated cells and control cells was then placed in the background distribution to obtain a p-value based on its position in that distribution. TADs with p<0.05 in both replicates were defined as significantly changed TADs.
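One way to read "a p-value based on its position in the background distribution" is as an empirical p-value. The sketch below takes that reading and works with log2 fold changes; the two-sided comparison on absolute values and the add-one correction are assumptions chosen for the illustration, since the text does not specify how the position is converted into a p-value.

```python
def empirical_p_value(observed_log2_fc, background_log2_fcs):
    """Fraction of background fold changes at least as extreme as the observed one.

    Uses an add-one correction so that the p-value is never exactly zero.
    """
    extreme = sum(1 for value in background_log2_fcs
                  if abs(value) >= abs(observed_log2_fc))
    return (extreme + 1) / (len(background_log2_fcs) + 1)


def significantly_changed(p_rep1, p_rep2, alpha=0.05):
    """A TAD counts as changed only if both replicates pass the cutoff."""
    return p_rep1 < alpha and p_rep2 < alpha
```

With a background of only a handful of values the smallest attainable p-value is large, so a reasonably sized background distribution is needed before the 0.05 cutoff becomes reachable.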
Method: ChIP-seq library construction and data analysis
Procedure
ATRA-treated HL-60 cells and control cells were cross-linked with 1% formaldehyde. The cell membranes were then lysed with lysis buffer (50 mM HEPES-KOH, 150 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate, and 1% SDS). Chromatin was resuspended in FA lysis buffer (50 mM HEPES-KOH, 150 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate, and 0.1% SDS) and fragmented with an ultrasonic processor (Cole-Parmer, USA). Immunoprecipitation was carried out overnight with Dynabeads (Thermo Fisher) pre-incubated with H3K4me3 and H3K27ac antibodies (Abcam, England). After washing and DNA purification, libraries were constructed with the TruePrep DNA Library Preparation Kit (Vazyme, China) according to the manufacturer's manual. Libraries were sequenced on an Illumina HiSeq 2500 sequencer.
For ChIP-seq analysis, adapter sequences were first removed. Sequencing reads were then aligned to the human genome assembly hg19 with Bowtie. Histone modification peaks were called with MACS2 using the parameters '-g hs --nomodel --broad'. Peaks present in both replicates (bedtools, minimum overlap of 1 bp) were considered confident peaks. The confident peaks from control and ATRA-treated cells were then compared to distinguish ATRA-specific peaks, control-specific peaks, and overlapping peaks. Peak comparison was performed with intersectBed from bedtools.
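The overlap logic behind the confident peaks and the group comparison can be sketched in plain Python. This is not a substitute for bedtools; it assumes peaks are `(chrom, start, end)` tuples with half-open coordinates, reports peaks from the first input (similar in spirit to reporting each overlapping entry once), and uses a quadratic scan that is only suitable for small examples.

```python
def overlaps(peak_a, peak_b, min_overlap=1):
    """True if two (chrom, start, end) peaks share at least min_overlap bases."""
    same_chrom = peak_a[0] == peak_b[0]
    shared = min(peak_a[2], peak_b[2]) - max(peak_a[1], peak_b[1])
    return same_chrom and shared >= min_overlap


def confident_peaks(replicate1, replicate2):
    """Peaks of replicate 1 that overlap any peak of replicate 2."""
    return [p for p in replicate1 if any(overlaps(p, q) for q in replicate2)]


def compare_conditions(control, treated):
    """Split peaks into control-specific, treatment-specific and overlapping sets."""
    control_only = [p for p in control if not any(overlaps(p, q) for q in treated)]
    treated_only = [p for p in treated if not any(overlaps(p, q) for q in control)]
    shared = [p for p in control if any(overlaps(p, q) for q in treated)]
    return control_only, treated_only, shared
```

Two peaks that merely touch end to start share zero bases and are therefore not counted as overlapping under the 1 bp minimum.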
Analysis
TADs containing expressed genes (n=3362) were ranked by the change in intra-TAD Hi-C interaction frequency and classified according to the statistical distribution of Hi-C interaction frequency, and the distribution of expressed genes in these TADs was calculated (Figures 3A, 3B). The results show that, after ATRA induction, TADs with marked changes in internal interactions were more likely to contain differentially expressed genes than TADs without marked changes. Moreover, TADs with increased internal chromatin interactions were enriched for up-regulated differentially expressed genes, whereas TADs with decreased internal chromatin interactions were enriched for down-regulated differentially expressed genes (Figure 3B). These results indicate a positive correlation between changes in gene expression within a TAD and the internal chromatin interaction frequency.
Further, to qualitatively characterize the epigenetic state within TADs, we performed ChIP-seq in control and ATRA-treated cells using antibodies against H3K4me3 and H3K27ac, which mainly mark active promoters and enhancers. Calculating the change in ChIP-seq signal within TADs showed that TADs with increased internal chromatin interactions also had increased levels of H3K4me3 and H3K27ac, whereas TADs with decreased chromatin interactions showed the opposite change (Figure 4). This suggests that changes in epigenetic activity within TADs may influence chromatin conformation changes and differential gene expression.
This example shows that the level of chromatin interaction within TADs is closely linked to gene regulation and can be further used to search for target functional genes.
Example 4 "Differential gene-regulatory chromatin interactions" effectively indicate differences in gene expression
Building on Example 3, and in order to better quantify the level of chromatin interaction within TADs for use in gene identification, the ChIP-seq data of Example 3 were used first: 12295 H3K4me3 peaks and 12493 H3K27ac peaks were identified in ATRA group cells, and 14263 H3K4me3 peaks and 22149 H3K27ac peaks were identified in control cells (Figure 5A). H3K4me3 peaks represent active promoters, and H3K27ac peaks represent actively transcribed regions and enhancers. By calculating the distance from each H3K27ac peak to its nearest transcription start site (TSS), we found that ATRA-specific and/or control-specific peaks were often farther from the TSS than peaks shared by the two groups (Figure 5B), indicating that more distal regulatory changes occur after ATRA induction.
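The distance calculation mentioned above can be sketched with a binary search over sorted TSS positions. The example assumes a single chromosome, a non-empty sorted list of TSS coordinates, and that each peak is represented by its center; these choices and the function name are illustrative assumptions.

```python
from bisect import bisect_left


def distance_to_nearest_tss(peak_center, sorted_tss):
    """Distance from a peak center to the closest TSS on the same chromosome.

    sorted_tss must be a non-empty list of TSS positions in ascending order.
    """
    index = bisect_left(sorted_tss, peak_center)
    # Only the TSS just before and just after the peak can be the nearest one.
    neighbours = sorted_tss[max(index - 1, 0):index + 1]
    return min(abs(peak_center - tss) for tss in neighbours)
```

Comparing the resulting distance distributions for condition-specific and shared peaks is then a matter of applying the function to each peak set.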
Chromatin interactions can bring distal regulatory elements close to promoters and transcription start sites for transcription initiation. To determine further how changes in chromatin interactions affect gene expression, and which genes are affected, the following steps were carried out:
First, the whole genome was divided into bins of 40 kb. Next, a bin containing the promoter of an expressed gene (i.e., an H3K4me3 peak) was labeled a "gene region", and a bin containing an H3K27ac peak distal to a promoter (Figure 5C) was labeled a "regulatory region". The Hi-C interactions (Hi-C reads) between gene regions and regulatory regions were then used to represent the strength of gene-regulation-associated chromatin interactions. Accordingly, we used the Hi-C data to generate a chromatin interaction matrix at 40 kb resolution, in which the count in the corresponding element represents the strength of interaction between a gene region and a regulatory region. For example, if the i-th bin is a gene region and the j-th bin is a regulatory region, the count at [i, j] represents a gene-regulation-associated chromatin interaction. Gene-regulation-associated chromatin interactions within the same TAD were used as input for differential interaction analysis. The differential interaction analysis was based on the MA-plot method and a random sampling model, namely:
The Hi-C experiment is regarded as a sampling of chromatin interactions across many cells, so the reads for the interaction between two regions in the Hi-C data follow a binomial distribution. Let $C_1$ and $C_2$ denote the counts of a particular gene-regulatory chromatin interaction obtained from control and ATRA-treated cells, respectively, with $C_i \sim \mathrm{Binomial}(n_i, p_i)$, $i = 1, 2$, where $n_i$ is the total number of Hi-C counts and $p_i$ is the probability that a count comes from that gene-regulatory chromatin interaction. We define $M = (\log_2 C_1 - \log_2 C_2)/2$ and $A = (\log_2 C_1 + \log_2 C_2)/2$. Under the random sampling assumption, given $A = a$ (where $a$ is an observed value of $A$), the conditional distribution of $M$ is approximately normal. For each gene-regulatory chromatin interaction on the MA plot, we tested $H_0: p_1 = p_2$ against $H_1: p_1 \neq p_2$. A p-value was then assigned based on the conditional normal distribution. The analysis was performed with the DEGseq package in R version 3.3.1, with the parameter set to "MARS".
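For concreteness, the two MA-plot coordinates can be computed as below. The sketch follows the definitions exactly as written in the paragraph above, including the factor of 1/2 in M (many MA-plot conventions define M as the plain log ratio); it does not reproduce the conditional normal test itself, and the function name is hypothetical.

```python
import math


def ma_values(count_control, count_atra):
    """Return (M, A) for a pair of positive contact counts, as defined in the text."""
    if count_control <= 0 or count_atra <= 0:
        raise ValueError("both counts must be positive to take logarithms")
    log_c1 = math.log2(count_control)
    log_c2 = math.log2(count_atra)
    m_value = (log_c1 - log_c2) / 2
    a_value = (log_c1 + log_c2) / 2
    return m_value, a_value
```

With this sign convention, an interaction that is stronger in ATRA-treated cells than in control cells has a negative M.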
Results: Differences in gene-regulatory chromatin interactions were selected according to whether the difference in Hi-C contacts between control and ATRA-treated cells was significant (Benjamini-corrected p < 0.001). This identified 422 pairs of enhanced gene regulation-related chromatin interactions (the "Gain" group) and 330 pairs of weakened (or reduced) gene regulation-related chromatin interactions (the "Loss" group) (Fig. 5C).
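The selection step can be illustrated with a short Python sketch. Two assumptions are made here and should be read as such: "Benjamini correction" is taken to mean the Benjamini-Hochberg procedure, and the per-interaction p-value comes from a plain two-proportion z-test of $H_0: p_1 = p_2$, used as a simple stand-in for the conditional-normal model of DEGseq. All function names are hypothetical.

```python
import math


def two_proportion_pvalue(c1, n1, c2, n2):
    """Two-sided z-test of H0: p1 == p2 (stand-in, not the DEGseq MARS model)."""
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (c1 / n1 - c2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify(interactions, n_control, n_atra, alpha=0.001):
    """Map each pair to 'Gain', 'Loss' or 'NS' (not significant).

    interactions: {pair: (control_count, atra_count)}
    """
    pairs = list(interactions)
    pvals = [
        two_proportion_pvalue(interactions[p][0], n_control,
                              interactions[p][1], n_atra)
        for p in pairs
    ]
    calls = {}
    for pair, q in zip(pairs, benjamini_hochberg(pvals)):
        c_control, c_atra = interactions[pair]
        if q >= alpha:
            calls[pair] = "NS"
        elif c_atra / n_atra > c_control / n_control:
            calls[pair] = "Gain"  # stronger contact after ATRA treatment
        else:
            calls[pair] = "Loss"  # weaker contact after ATRA treatment
    return calls
```

Comparing count fractions rather than raw counts keeps the Gain/Loss call independent of sequencing depth.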
Furthermore, in the Gain group, the fold changes of both the H3K27ac signal and the H3K4me3 signal in the corresponding gene regions increased significantly, whereas in the Loss group the two signals showed the opposite trend (Fig. 5D). These results can at least partly explain the positive correlation between active histone state and chromatin interaction strength, as well as the positive correlation with differential gene expression. In total, 430 and 323 genes were involved in enhanced and reduced gene regulation-related chromatin interactions, respectively. Combining these with the RNA expression data showed that, upon ATRA induction, 164 and 61 of these genes, respectively, were differentially expressed (Fig. 5E), proportions significantly higher than the overall proportion (1649 differentially expressed genes among 12266 genes). Thus, interactions in the Gain or Loss group can, to a large extent, lead to up-regulation or down-regulation, respectively, of the corresponding genes.
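The proportions quoted above can be checked with a few lines of Python. The text does not state which significance test was used; the hypergeometric upper-tail probability below is one common choice and is shown only as an assumption, as is treating the 430 and 323 genes as draws from the 12266-gene background.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(k, draws, successes, population):
    """P(X >= k) when `draws` genes are sampled without replacement."""
    total = comb(population, draws)
    upper = min(draws, successes)
    tail = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(k, upper + 1)
    )
    return float(Fraction(tail, total))  # exact ratio, rounded once at the end


background = 1649 / 12266    # overall fraction of differentially expressed genes
gain_fraction = 164 / 430    # Gain-group genes that are differentially expressed
loss_fraction = 61 / 323     # Loss-group genes that are differentially expressed

p_gain = hypergeom_upper_tail(164, 430, 1649, 12266)
p_loss = hypergeom_upper_tail(61, 323, 1649, 12266)
```

Rounded to three decimals, the fractions are 0.381 (Gain), 0.189 (Loss) and 0.134 (background).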
Example 5 Differences in gene-regulatory chromatin interactions are associated with chromatin accessibility
Method: ATAC-seq
Procedure
The control and ATRA-treated samples were each placed on ice for 10 minutes in lysis buffer (10 mM Tris-HCl (pH 7.4), 10 mM NaCl, 3 mM MgCl2 and NP-40) to prepare nuclei, and were centrifuged immediately after cell lysis to remove the supernatant. The nuclei were then incubated with Tn5 transposome and tagmentation buffer (Vazyme, China) at 37 °C for 30 minutes. After tagmentation, stop buffer was added directly to the reaction to terminate it. The library was then amplified by 12 cycles of PCR. After PCR, the library was purified with 1.2× AMPure beads (Beckman, Germany) and sequenced on an Illumina HiSeq 2500 sequencer.
Adapter sequences were removed and the reads were aligned to the human genome (hg19). ATAC-seq peaks were called with MACS2 using default parameters, followed by peak comparison. To identify sequence motifs enriched in ATAC-seq peaks, findMotifsGenome.pl from the HOMER package was used. annotatePeaks.pl was used to identify the specific peaks containing given motifs. For GREAT analysis of ATAC-seq peaks, see McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010 May 2;28(5):495–501.
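The text does not say how the peak comparison was carried out. One simple reading, sketched below in Python as an assumption, is that a peak is condition-specific when no peak from the other condition overlaps it. Peaks are hypothetical `(chrom, start, end)` tuples with half-open coordinates, and the quadratic scan is only meant for illustration.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def specific_peaks(peaks, other_peaks):
    """Peaks that have no overlapping peak in the other condition."""
    return [p for p in peaks if not any(overlaps(p, q) for q in other_peaks)]


# Toy peak sets (hypothetical coordinates).
control_peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
atra_peaks = [("chr1", 150, 250), ("chr2", 500, 600)]

control_specific = specific_peaks(control_peaks, atra_peaks)
atra_specific = specific_peaks(atra_peaks, control_peaks)
```

For genome-scale peak sets a sorted sweep or an interval tree would replace the nested scan.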
Results
ATAC-seq peaks located in the gene regions and regulatory regions of differential gene-regulatory chromatin interactions were counted. ATRA-specific and control-specific peaks were found to be enriched in enhanced (Gain) and weakened (Loss) interactions, respectively (Figs. 6C and 6D). Regulatory regions showed a stronger enrichment tendency than gene regions, suggesting that changes in TF binding at open chromatin regions, particularly distal regulatory regions, regulate the formation of chromatin interactions.
To characterize the transcription factor binding state after ATRA induction, we used HOMER to perform motif analysis on the ATAC-seq peaks specific to the control group and to the ATRA-induced group (see Example 5 for details). Most transcription factors, including CTCF, PU.1, RUNX and CEBP, were highly similar between ATRA-treated and control cells (Fig. 6E). Analysis of the ATAC data showed a slight up-regulation of PU.1 mRNA expression (~1.9-fold). Among the RUNX family members, only RUNX3 showed a significant change in mRNA level after ATRA treatment (~3.2-fold, up-regulated), suggesting that it may play a regulatory role in ATRA induction. Notably, the GATA binding motif was enriched only in control cells (Fig. 6E), and the GATA2 mRNA level was significantly down-regulated after differentiation (~0.06-fold), suggesting that loss of GATA2 binding is associated with the ATRA induction process.
Example 6 Identification of key gene transcription associated with chromatin structural changes by integrating multi-omics data in the ATRA induction model
The experimental procedures and result analysis for Hi-C and ChIP-seq are described in Examples 3 and 4, which yielded the enhanced gene regulation-related chromatin interaction group (Gain) (Fig. 7A) and the weakened gene-regulatory chromatin interaction group (Loss) (Fig. 7B). Motif analysis of the ATAC-seq peaks in these two interaction groups was then performed with HOMER (see Example 5 for details).
The results showed that transcription factors with binding sites, such as PU.1, RUNX and JUNB, were enriched in both groups, with no significant difference in enrichment between the Gain and Loss groups, suggesting that these transcription factors are only weakly associated with changes in chromatin interactions. Notably, however, GATA motif sequences (such as GATA1 and GATA2) were significantly enriched only in the regulatory regions of the Loss group, indicating that GATA transcription factors play a distinctive role in chromatin interactions. Taken together with the RNA-seq data, in which the GATA2 mRNA level was significantly down-regulated after differentiation (~0.06-fold), loss of GATA2 binding may contribute to the ATRA induction process. Thus, by integrating the multi-omics data above, GATA2 was successfully identified as a candidate gene.
Example 7 Discovery of further related transcription factors
To further describe the relationship between transcription factors and differentially expressed genes during differentiation, we mapped the above transcription factors and differentially expressed genes onto the HTRI TF-Target network (Fig. 8A). In the resulting subnetwork, GATA2 was the hub node with the highest degree. In addition, GATA2 showed interactions with most of the transcription factors enriched in ATAC-seq peaks and with known key regulators of granulocyte differentiation (Fig. 8B). In summary, by integrating chromatin accessibility information with the transcriptional regulatory network, we found that GATA2 may act as an important transcription factor in ATRA-induced HL-60 differentiation.
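Finding the hub node of a TF-target subnetwork amounts to counting each node's distinct neighbors. A minimal Python sketch follows. The edge list is a toy example with placeholder node names, not data from the HTRI network, and the function names are hypothetical.

```python
from collections import defaultdict


def degrees(edges):
    """Count distinct neighbors per node in an undirected view of the network."""
    neighbors = defaultdict(set)
    for tf, target in edges:
        neighbors[tf].add(target)
        neighbors[target].add(tf)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


def hub(edges):
    """Return the node with the highest degree."""
    node_degrees = degrees(edges)
    return max(node_degrees, key=node_degrees.get)


# Toy TF -> target edges (placeholder names only).
toy_edges = [
    ("TF_A", "gene_1"),
    ("TF_A", "gene_2"),
    ("TF_A", "TF_B"),
    ("TF_B", "gene_2"),
    ("TF_C", "gene_3"),
]
top_node = hub(toy_edges)
```

Whether the degree in the text counts incoming edges, outgoing edges or both is not stated, so the undirected count here is an assumption.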
Example 8 ATRA induction reduces the chromatin interactions between the GATA2 promoter and regulatory regions
Based on the analysis of differential gene-regulatory chromatin interactions, we observed that the interaction between the gene region containing Gata2 and the upstream regulatory region decreased significantly after ATRA stimulation (Figs. 9A and 9B). To describe the chromatin conformation associated with Gata2 in detail, we next performed 4C chromatin conformation capture in control and ATRA-treated cells, using the Gata2 promoter as bait.
The procedure was as follows:
First, cells were cross-linked with 1% formaldehyde, and nuclei were isolated by resuspension in lysis buffer (500 μl of 10 mM Tris-HCl pH 8.0, 10 mM NaCl, 0.2% Igepal CA-630, and 50 μl of protease inhibitor). The nuclei were then washed with 1× NEBuffer 2 (NEB, UK) and treated with 0.3% SDS at 65 °C. They were digested with HindIII overnight at 37 °C, followed by proximity ligation at 4 °C for 4 hours. After cross-links were reversed by treatment with proteinase K (Ambion, USA), DNA was purified by phenol-chloroform (Solarbio) extraction combined with ethanol precipitation. The second digestion was carried out overnight with DpnII. A second ligation and ethanol precipitation were then performed to extract the DNA. Finally, the DNA was purified with the QIAquick PCR Purification Kit (QIAGEN, Germany) according to the manufacturer's protocol. After PCR, the 4C library was purified with AMPure beads (Beckman, Germany) and sequenced on an Illumina HiSeq 2500 sequencer.
For 4C-seq data analysis, adapter sequences were first removed using the cutadapt software in SAMtools. Reads were mapped to the human genome (hg19) using Bowtie. The mapped data were then processed with the r3Cseq package in R 3.3.1, using RPM normalization.
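RPM (reads per million) normalization rescales each count by the total number of mapped reads so that libraries of different depth can be compared. The Python sketch below shows the arithmetic only. It is not the r3Cseq implementation, whose details may differ, and the function and fragment names are hypothetical.

```python
def rpm(counts):
    """Scale raw read counts to reads per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {fragment: n * 1_000_000 / total for fragment, n in counts.items()}


def fold_change(treated_rpm, control_rpm, fragment):
    """Ratio of normalized signal (treated / control) at one fragment."""
    return treated_rpm[fragment] / control_rpm[fragment]


# Toy counts (hypothetical): reads at one enhancer fragment and elsewhere.
control_rpm = rpm({"enhancer": 200, "other": 800})
atra_rpm = rpm({"enhancer": 100, "other": 900})
ratio = fold_change(atra_rpm, control_rpm, "enhancer")
```

A ratio below 1 indicates a weaker normalized contact signal in the treated sample than in the control.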
Results
In control cells, we found that the Gata2 promoter has strong chromatin interactions with three upstream enhancers marked by H3K27ac peaks (chr3:128240590-128254410, 128262419-128292429 and 128309790-128334446). Enhancer E3 (about 80 kb upstream of the GATA2 promoter) lies very close to a known enhancer, which confirms the reliability of the 4C data. After ATRA induction, the interaction strength between the Gata2 promoter and the upstream regions decreased to varying degrees (0.54-, 0.46- and 0.4-fold for E1, E2 and E3, respectively), consistent with the decrease in H3K27ac (Fig. 10A). Motif analysis showed that these regions contain PU.1/RUNX1 and GATA motif sequences, respectively, and an open chromatin state was observed only in control cells (Fig. 10A). Loss of binding of the key transcription factors (PU.1/RUNX1/GATA) at the distal regulatory regions disrupted the chromatin loop and suppressed GATA2 expression. To further confirm the disappearance of the chromatin loop after ATRA induction, we performed three-dimensional DNA fluorescence in situ hybridization (FISH) in control and ATRA-treated cells. The probe signals for the Gata2 promoter and enhancer E3 overlapped more in control cells than in ATRA-treated cells (Fig. 10B). In addition, the Pearson correlation coefficient between the two probes was higher in control cells than in ATRA-treated cells, further verifying the disruption of the chromatin loop caused by ATRA induction (Fig. 10C).
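The Pearson correlation coefficient used to compare the two probe signals can be computed as below. How the intensities were sampled is not described in the text; this Python sketch assumes two equal-length lists of intensities, for example one value per pixel in each probe channel, and the function name is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Values near 1 indicate that the two signals rise and fall together, which is how a higher coefficient in control cells supports closer spatial proximity of promoter and enhancer.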
Example 9 Changes in chromatin interaction (loss of a chromatin loop) suppress ZBTB16 mRNA expression
In the transcriptional regulatory network mentioned above (Fig. 8A), another important gene, which encodes the zinc finger protein ZBTB16 (also known as PLZF), was involved in differential gene-regulatory chromatin interactions. After ATRA induction, both the expression level and the chromatin interactions of this gene decreased significantly (Fig. 9C). A previous study (Tang Z, Luo OJ, Li X, Zheng M, Zhu JJ, Szalaj P, et al. CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription. Cell. 2015 Dec 17;163(7):1611–1627) used ChIA-PET in K562 cells to identify three upstream and two downstream CTCF-bound anchors near the Zbtb16 locus. To determine whether the chromatin loops found by ChIA-PET change under ATRA induction, we performed 4C analysis using the five ChIA-PET anchors as baits (Fig. 11A). The result for one of the 3' anchors showed that the chromatin loop between the 5' and 3' ends of Zbtb16 disappeared, accompanied by a significant decrease in ZBTB16 mRNA (Fig. 9D). This result is consistent with the earlier prediction that the 5' and 3' chromatin loop maintains a high level of gene expression. ATAC-seq peaks were observed at the 5' end of Zbtb16 in both ATRA-treated and control cells. Unexpectedly, only the peak in ATRA-treated cells was enriched for the PU.1 motif, suggesting that PU.1 may bind the 5' anchor after ATRA treatment.
Based on the results of Examples 8 and 9, we propose two models to explain the changes in chromatin structure, transcription factor binding and gene expression at the GATA2 and Zbtb16 regions during ATRA-induced differentiation. At the GATA2 region, PU.1, RUNX1 and GATA2 bind the upstream enhancers, maintaining the chromatin loop and promoting transcription (Fig. 11B). ATRA treatment causes loss of transcription factor binding, which disrupts the chromatin loop and thereby suppresses Gata2 transcription. At the Zbtb16 region, the 5' and 3' loop maintains continuous transcription of Zbtb16 in control cells. After ATRA treatment, in contrast to Gata2, binding of PU.1 leads to disruption of the chromatin loop and thereby suppresses Zbtb16 transcription (Fig. 11C).

Claims (19)

  1. A method for identifying an effector gene of a sample state transition, wherein expression of the effector gene is affected by changes in chromatin interactions during the sample state transition, the method comprising the following steps:
    (1) comparing samples in a first state and a second state to obtain at least the following difference information: a difference in identifiable gene behavior, and a difference in chromatin interactions present in gene transcriptional regulatory regions; and
    (2) correlating the difference information obtained in step (1) to obtain the difference in identifiable gene behavior that is associated with the difference in chromatin interactions in transcriptional regulatory regions during the state transition, thereby identifying the effector gene.
  2. The method according to claim 1, wherein the sample is a cell.
  3. The method according to claim 1 or 2, wherein the difference in identifiable gene behavior comprises a difference in gene expression level and/or a difference in the distribution of binding motifs in the genomic sequence of gene regulatory regions; preferably, the difference in gene expression level is a difference in mRNA expression level or a difference in protein expression level.
  4. The method according to any one of claims 1 to 3, wherein the difference in chromatin interactions present in gene transcriptional regulatory regions in step (1) is obtained by the following steps:
    (a) identifying the sites of promoters and/or enhancers that are in an active state in the sample genome;
    (b) identifying the regions where all chromatin interactions occur;
    (c) integrating the information obtained in steps (a) and (b) to obtain the chromatin interactions present between active promoters and enhancers, i.e., the chromatin interactions present in gene transcriptional regulatory regions; and
    (d) comparing the chromatin interactions present in gene transcriptional regulatory regions between different samples to obtain the difference in chromatin interactions present in gene transcriptional regulatory regions.
  5. The method according to any one of claims 1 to 4, wherein the difference in identifiable gene behavior in step (1) is obtained by the following steps:
    i) obtaining samples in the first state and the second state;
    ii) taking a portion of the samples in the first state and the second state, performing transcriptional expression analysis on each, and comparing the difference in mRNA expression level between the samples; preferably, the transcriptional expression analysis uses RNA sequencing, i.e., RNA-seq.
  6. The method according to claim 5, further comprising:
    iii) taking a portion of the samples in the first state and the second state, analyzing the sequences of open chromatin regions in each, preferably by ATAC-seq, and analyzing the transcription factor binding motifs distributed in the open chromatin region sequences, preferably further comparing the difference in the distribution of transcription factor binding motifs between the samples in the first state and the second state.
  7. The method according to any one of claims 1 to 6, wherein the difference in chromatin interactions present in gene transcriptional regulatory regions in step (1) is analyzed by the following steps:
    iv) taking a portion of the samples in the first state and the second state and identifying the promoters and/or enhancers that are in an active state in each, preferably by ChIP-seq, wherein the antibodies used in the ChIP-seq are preferably antibodies that bind H3K4me3 and H3K27ac, which bind H3K4me3 and H3K27ac respectively to form signal peaks representing active promoter and enhancer sites, respectively;
    v) taking another portion of the samples in the first state and the second state and obtaining genome-wide chromatin interaction information by a chromatin conformation capture technique, preferably a high-throughput chromatin conformation capture technique such as Hi-C, in situ Hi-C, BL-Hi-C or ChIA-PET; or obtaining local chromatin interaction information by 4C or 5C;
    vi) dividing the reference genome sequence into regions of a given size, preferably between 1 and 40 kb, for example 1 kb, 5 kb, 10 kb, 15 kb, 20 kb, 25 kb, 30 kb, 35 kb or 40 kb, and, based on the active promoter and enhancer information obtained in step iv), obtaining by alignment the regions containing active promoter sites and active enhancer sites, respectively, wherein a region containing a promoter is preferably named a gene region and a region containing an enhancer sequence is preferably named a regulatory region;
    subsequently, combining the chromatin interaction frequency signals obtained in step v) to identify the chromatin interaction frequency signals that occur between gene regions and regulatory regions, thereby obtaining gene regulation-related chromatin interactions; and then comparing the gene regulation-related chromatin interactions between the samples in the first state and the second state, wherein those with a statistically significant difference are identified as differential gene-regulatory chromatin interactions (differential gene-regulatory interaction), including gene-regulatory chromosome interactions that are enhanced and/or gene-regulatory chromosome interactions that are weakened in the sample in the second state relative to the sample in the first state.
  8. The method according to any one of claims 1 to 7, wherein step (2) specifically comprises the following steps:
    a) combining the difference in gene expression level with the difference in gene-regulatory chromatin interactions, and selecting genes that, in samples in different states, lie within enhanced or weakened gene-regulatory chromatin interactions and whose expression level also changes significantly; or
    b) combining the difference in the distribution of transcription factor binding motifs in genomic transcriptional regulatory regions with the information on the difference in gene-regulatory chromatin interactions, and selecting genes that, in samples in different states, lie within enhanced or weakened gene-regulatory chromatin interactions and whose distribution of transcription factor binding motifs also changes significantly; or
    c) combining the difference in gene expression level and the difference in the distribution of transcription factor binding motifs with the information on the difference in gene-regulatory chromatin interactions, and selecting genes that, in samples in different states, lie within enhanced or weakened gene-regulatory chromatin interactions, whose distribution of binding motifs in genomic transcriptional regulatory regions changes significantly, and whose expression level also changes significantly.
  9. The method according to any one of claims 1 to 8, further comprising a step of performing a functional study on the selected effector gene to determine its function.
  10. The method according to any one of claims 1 to 9, further comprising a step of identifying a gene-regulatory chromatin interaction, namely taking a chromatin interaction capable of affecting expression of the effector gene identified in step (2) as the gene-regulatory chromatin interaction.
  11. The method according to any one of claims 1 to 10, wherein the sample state transition is achieved by chemical reagent induction, natural differentiation and/or physical stimulation.
  12. A method for identifying a chromatin interaction that regulates a sample state transition, comprising the steps according to any one of claims 1 to 11.
  13. A method for identifying a regulatory factor of the chromatin interactions involved in a sample state transition, comprising the steps according to any one of claims 1 to 11.
  14. A method for identifying a substance that regulates chromatin interactions, comprising: identifying an effector gene of chromatin interactions using the method according to any one of claims 1 to 9 or 11, or identifying a gene-regulatory chromatin interaction using the method according to claim 10 or 11; then contacting a test substance with the sample and analyzing the change in the effector gene or the gene-regulatory interaction.
  15. A system for identifying an effector gene of a sample state transition, wherein expression of the effector gene is affected by changes in chromatin interactions during the sample state transition, the system comprising the following modules:
    (1) a module for analyzing differences in identifiable gene behavior;
    (2) a module for analyzing differences in gene-regulatory chromatin interactions in transcriptional regulatory regions; and
    (3) an effector gene identification module;
    wherein the system is capable of obtaining the difference in identifiable gene behavior associated with the difference in chromatin interactions, thereby obtaining the effector gene of the sample state transition that is affected by chromatin interactions;
    preferably, the system further comprises a gene-regulatory chromatin interaction identification module for identifying gene-regulatory chromatin interactions capable of affecting expression of the effector gene.
  16. The system according to claim 15, wherein the module for analyzing differences in identifiable gene behavior is capable of analyzing a difference in gene expression level and/or a difference in the distribution of transcription factor binding motifs in the genomic sequence of the transcriptional regulatory regions of genes.
  17. The system according to claim 15 or 16, wherein the module for analyzing differences in chromatin interactions in transcriptional regulatory regions is capable of performing the following analysis:
    (a) identifying the sites of promoters and/or enhancers that are in an active state in the sample genome;
    (b) identifying the regions where all chromatin interactions occur;
    (c) integrating the information obtained in steps (a) and (b) to obtain the chromatin interactions present between active promoters and enhancers, i.e., the chromatin interactions present in transcriptional regulatory regions; and
    (d) comparing the chromatin interactions present in transcriptional regulatory regions between different samples to obtain the difference in chromatin interactions present in transcriptional regulatory regions.
  18. The system according to any one of claims 15 to 17, wherein the effector gene identification module is capable of performing the following analysis:
    a) combining the difference in gene expression level with the difference in gene-regulatory chromatin interactions, and selecting genes that, in samples in different states, lie within enhanced or weakened gene-regulatory chromatin interactions and whose expression level also changes significantly; or
    b) combining the difference in the distribution of transcription factor binding motifs in genomic transcriptional regulatory regions with the information on the difference in gene-regulatory chromatin interactions, and selecting genes that, in samples in different states, lie within enhanced or weakened gene-regulatory chromatin interactions and whose distribution of transcription factor binding motifs also changes significantly; or
    c) combining the difference in gene expression level and the difference in the distribution of transcription factor binding motifs with the information on the difference in gene-regulatory chromatin interactions, and selecting genes that, in samples in different states, lie within enhanced or weakened gene-regulatory chromatin interactions, whose distribution of binding motifs in genomic transcriptional regulatory regions changes significantly, and whose expression level also changes significantly.
  19. A detection kit comprising the reagents used in the method according to any one of claims 1 to 14.
PCT/CN2018/124761 2018-01-05 2018-12-28 Method and system for identifying gene-regulatory chromatin interaction and application thereof WO2019134586A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810011140.7A CN108220394B (en) 2018-01-05 2018-01-05 Identification method and system for gene regulatory chromatin interaction and application thereof
CN201810011140.7 2018-01-05

Publications (1)

Publication Number Publication Date
WO2019134586A1 (en) 2019-07-11

Family

ID=62642997

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124761 WO2019134586A1 (en) 2018-01-05 2018-12-28 Method and system for identifying gene-regulatory chromatin interaction and application thereof

Country Status (2)

Country Link
CN (1) CN108220394B (en)
WO (1) WO2019134586A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108220394B (en) * 2018-01-05 2021-03-23 清华大学 Identification method and system for gene regulatory chromatin interaction and application thereof
CN109033751B (en) * 2018-07-20 2021-07-27 东南大学 Function prediction method for non-coding region mononucleotide genome variation
CN109448783B (en) * 2018-08-07 2022-05-13 清华大学 Analysis method of chromatin topological structure domain boundary
CN109837335A (en) * 2019-03-20 2019-06-04 福建省农业科学院食用菌研究所(福建省蘑菇菌种研究推广站) Method for screening functional genes of edible and medicinal fungi by combining ATAC-seq and RNA-seq
CN110544509B (en) * 2019-08-20 2021-06-11 广州基迪奥生物科技有限公司 Single-cell ATAC-seq data analysis method
CN112562783A (en) * 2019-09-26 2021-03-26 北京百迈客生物科技有限公司 Method for mining functional gene by combining three-dimensional structure difference identification of genome and transcriptome gene expression level difference analysis
CN112011625B (en) * 2020-09-02 2023-08-11 武汉爱基百客生物科技有限公司 Detection method for evaluating enrichment result of porcine histone modification
CN112365920B (en) * 2020-09-30 2024-04-02 中国农业科学院蜜蜂研究所 Method for identifying bee differentiation key genes, identified genes and application
CN115786501B (en) * 2022-07-02 2023-06-16 武汉大学 Enhancer functional site related to colorectal cancer early screening and auxiliary diagnosis and application thereof
CN115651975B (en) * 2022-11-17 2023-03-17 四川大学 Pre-screening method, system and storage medium for hyperuricemia kidney disease pathogenic factor

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2017025594A1 (en) * 2015-08-12 2017-02-16 Cemm Forschungszentrum Für Molekulare Medizin Gmbh Methods for studying nucleic acids
CN106754868A (en) * 2016-11-29 2017-05-31 武汉菲沙基因信息有限公司 A kind of method of the DNA fragmentation interacted in capture Matrix attachment region
CN107119120A (en) * 2017-05-04 2017-09-01 河海大学常州校区 A kind of key effect molecular detecting method based on chromatin 3D conformation technologies
CN108220394A (en) * 2018-01-05 2018-06-29 清华大学 Identification method and system for gene regulatory chromatin interaction and application thereof

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
GB0810051D0 (en) * 2008-06-02 2008-07-09 Oxford Biodynamics Ltd Method of diagnosis
GB2517936B (en) * 2013-09-05 2016-10-19 Babraham Inst Chromosome conformation capture method including selection and enrichment steps

Non-Patent Citations (1)

Title
PAN, YOUFU: "Advances in Chromatin Interaction Studies", JOURNAL OF ZUNYI MEDICAL UNIVERSITY, vol. 37, no. 5, 31 October 2014 (2014-10-31), pages 470 - 478 *

Also Published As

Publication number Publication date
CN108220394A (en) 2018-06-29
CN108220394B (en) 2021-03-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18898104

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18898104

Country of ref document: EP

Kind code of ref document: A1