WO2021224233A1 - Method - Google Patents

Info

Publication number
WO2021224233A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
dna
minutes
dna fragments
hours
Prior art date
Application number
PCT/EP2021/061685
Other languages
French (fr)
Inventor
Mehmet A. YILMAZ
Erik C. SPLINTER
Katie J. CLARKE
Joost Fransciscus SWENNENHUIS
Original Assignee
Cergentis B.V.
Priority date
Filing date
Publication date
Priority claimed from GBGB2006546.2A (GB202006546D0)
Priority claimed from GBGB2010492.3A (GB202010492D0)
Priority claimed from GBGB2101819.7A (GB202101819D0)
Application filed by Cergentis B.V.
Publication of WO2021224233A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6853 Nucleic acid amplification reactions using modified primers or templates
    • C12Q1/6855 Ligating adaptors
    • C12Q1/6869 Methods for sequencing

Definitions

  • the present invention relates to the field of molecular biology.
  • the present invention relates to methods for enzymatically treating polynucleotides from preserved, cross-linked tissue samples, such as formalin-fixed, paraffin-embedded (FFPE) samples.
  • the invention relates to the sequencing of DNA, for example to strategies for determining a DNA sequence of a genomic region of interest.
  • the invention relates to the determination of the sequence of parts of a genome that are in a spatial configuration with each other.
  • the invention also provides methods that enable the recovery of cross-linked DNA and soluble RNA from a single starting sample.
  • tissue samples, such as formalin-fixed, paraffin-embedded (FFPE) samples, are a common sample type in both clinical and research settings.
  • biopsies and surgical samples are often prepared as FFPE tissue blocks. Accordingly, these samples provide a notable resource for determining genetic information; for example to identify the presence of mutations from e.g. a tumour sample.
  • Examples of methods which may be performed on preserved, cross-linked tissue samples include proximity ligation sequencing methods in which crosslinked DNA fragments that originate from a genomic region of interest remain in proximity of each other because they are crosslinked. When these crosslinked DNA fragments are subsequently ligated, DNA fragments of the genomic region of interest, which are in the proximity of each other due to the crosslinks, are ligated. By determining (at least part of) the sequence of ligated fragments that comprise a fragment comprising a target nucleotide sequence, sequences of DNA fragments within the spatial surrounding of the genomic region of interest are obtained. Each individual target nucleotide sequence is likely to be crosslinked to multiple other DNA fragments.
  • DNA fragment may be ligated to a fragment comprising the target nucleotide sequence.
  • a sequence of the genomic region of interest may be built and the presence of mutations identified.
  • the present invention is based on the inventors’ surprising determination that the use of certain conditions for a permeabilization incubation prior to enzyme treatment improves the ability of the enzyme(s) to act on the cross-linked DNA. This is considered to result in higher quality data from downstream analysis. Without wishing to be bound by theory, it is considered that the permeabilization incubation improves the ability of enzymes to access the cross-linked DNA, thus improving the ability of the enzyme(s) to act on the DNA.
  • act on DNA refers to enzymes that directly alter the structure of a DNA molecule itself, for example in contrast to enzymes which act on proteins associated with DNA.
  • enzymes that act on DNA in the context of the present invention include restriction enzymes, S1 nuclease, DNase I, DNA ligase, DNA nickase, and DNA polymerase.
  • the present invention provides a method for determining at least part of the sequences of DNA fragments from a sample of fragmented cross-linked DNA; which comprises the following steps: a) providing a sample of fragmented cross-linked DNA; b) permeabilizing the sample by incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours; c) optionally, further fragmenting the cross-linked DNA; d) optionally, repairing the ends of DNA fragments; e) ligating the cross-linked DNA fragments; f) reversing the cross-linking; and g) determining at least part of the sequences of the DNA fragments.
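Steps a) to g) recited above can be written down as an ordered list with the optional steps flagged. The following is only an illustrative sketch: the `Step` class, the `WORKFLOW` tuple and the `planned_steps` helper are hypothetical names introduced here, not part of the described method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str          # step letter as used in the text
    description: str    # paraphrase of the recited step
    optional: bool = False


# Steps a) to g) in the order recited in the text (hypothetical representation).
WORKFLOW = (
    Step("a", "provide a sample of fragmented cross-linked DNA"),
    Step("b", "permeabilize at about 75-100 °C for about 1 minute to about 4 hours"),
    Step("c", "further fragment the cross-linked DNA", optional=True),
    Step("d", "repair the ends of DNA fragments", optional=True),
    Step("e", "ligate the cross-linked DNA fragments"),
    Step("f", "reverse the cross-linking"),
    Step("g", "determine at least part of the sequences of the DNA fragments"),
)


def planned_steps(include_optional=()):
    """Return step labels in order, keeping optional steps only if requested."""
    return [s.label for s in WORKFLOW
            if not s.optional or s.label in include_optional]
```

Calling `planned_steps()` yields only the mandatory steps, while `planned_steps({"c", "d"})` yields all seven in order.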
  • the present invention relates to a method for determining at least part of the sequences of DNA fragments from a sample of fragmented cross-linked DNA; which comprises the following steps: a) providing a sample of fragmented cross-linked DNA; b) permeabilizing the sample by incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours; c) optionally, further fragmenting the crosslinked DNA; d) optionally, repairing the ends of DNA fragments to facilitate ligation; e) ligating the fragmented crosslinked DNA; f) reversing the crosslinking; g) optionally fragmenting the DNA of step f), h) optionally, ligating the fragmented DNA of step f) or g) to at least one adaptor; i) optionally, (1) amplifying the ligated DNA fragments of step f) or g) comprising the target nucleotide sequence using at least one primer which hybridises to the target nucleo
  • the invention provides a method for determining the presence or absence of a mutation in a genomic region of interest comprising a target nucleotide sequence, comprising the steps of: a) providing a sample of fragmented cross-linked DNA; b) permeabilizing the sample by incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours; c) optionally, further fragmenting the crosslinked DNA; d) optionally, repairing the ends of DNA fragments; e) ligating the fragmented crosslinked DNA; f) reversing the crosslinking; g) optionally fragmenting the DNA of step f), h) optionally, ligating the fragmented DNA of step f) or g) to at least one adaptor; i) optionally, (1) amplifying the ligated DNA fragments of step f) or g) comprising the target nucleotide sequence using at least one primer which hybridises to the target nucleotide sequence, or
  • the present invention also provides methods that enable the simultaneous recovery of cross-linked DNA and soluble RNA from a single starting sample, such as a FFPE sample.
  • the present invention provides a method for recovering soluble RNA and cross-linked DNA from a sample comprising cross-linked cells or tissue; which method comprises a step of permeabilizing the sample by an incubation at about 75 °C to about 100 °C for about 1 minute to about 4 hours.
  • the cross-linked DNA may subsequently be used in a method for determining at least part of the sequences of DNA fragments from the cross-linked DNA or a method for determining the presence or absence of a mutation in a genomic region of interest; as described herein.
  • Such methods may be advantageous, for example, in allowing recovery of cross-linked DNA and soluble RNA from a single starting sample, in particular in the same step of a method.
  • This may be particularly advantageous for FFPE samples, for example.
  • FFPE samples represent a precious resource from both clinical and research settings.
  • the ability to recover RNA and cross-linked DNA from a FFPE sample may therefore provide particular advantages by enabling the sequencing and analysis of both the cross-linked DNA and corresponding RNA from the same sample.
  • the ability to analyze both RNA and cross-linked DNA from a single starting sample may be advantageous in enabling the expression of genes with identified mutations (e.g. SNVs or structural changes) to be determined.
  • Figure 2 For two FFPE samples subjected to the indicated permeabilization conditions, an increase in the fraction of covered nucleotides in a 100 kb region surrounding the viewpoint was observed for both the 80 °C and 90 °C incubations compared to the aliquots incubated at 37 °C or 65 °C. Aliquots of Sample 1 were subjected to 37, 65 and 80 °C incubations for both 30 (blue) and 60 minutes (orange); for Sample 2, an additional 90 °C incubation was included. The fraction of covered nucleotides in the 100 kb region surrounding the location of the primer set is shown on the y-axis.
  • Figure 3 Longer permeabilization incubation at high temperatures increases enzyme accessibility. Aliquots of Samples 1 and 2 were subjected to 30, 60, 120, 150, 180, 210 and 240 minutes of incubation at 80 °C; aliquots of Sample 3 were subjected to an incubation for 15, 30, 45 and 60 minutes at 90 °C. The fraction of covered nucleotides in the 100 kb region surrounding the location of the primer set is shown on the y-axis.
  • Figure 4 The effect of the permeabilization time for 30, 60, 120, 180 and 240 minutes at 80°C on various FFPE sample qualities.
  • Figure 5 DNA yields (ng) retrieved from each experimental condition for three different samples. Stars refer to the conditions that perform best when considering the performance of the three FFPE samples of different qualities.
  • Figure 6 The effect of the permeabilization time and temperature on the fraction of covered nucleotides in the target region. Stars refer to the conditions that are best performing when considering the performance of the three different FFPE samples.
  • Figure 7 Relative RNA Yield in Supernatant from FFPE Samples. Normalised to RNA yield in unincubated samples to determine relative increase in RNA release. Four different FFPE samples were used to allow for averages to be calculated.
  • Figure 8 Quality Analysis of RNA isolated from Samples Subjected to Different Incubation Conditions.
  • A. Scale diagram of the size and distance between three different primer pairs from the TBP RNA. Three different primer pairs were designed to be located at different distances from the poly A tail of the RNA. The further the primer pair is from the poly A tail, the more susceptible to degradation the RNA is.
  • B. 15 minute incubation C. 30 minute incubation
  • Figure 9 Overview of the workflow of an illustrative method of the present invention. Graphical overview of the methods used to integrate an RNA isolation step into the FFPE-TLA protocol.
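The metric plotted in several of the figures above, the fraction of covered nucleotides in a 100 kb region around the viewpoint, can be sketched as follows. This is a minimal illustration under stated assumptions: reads are given as half-open `(start, end)` intervals on one chromosome, and the 100 kb window is centred on the viewpoint (the text does not say how the window is placed). The function name and the example coordinates are hypothetical.

```python
def fraction_covered(intervals, region_start, region_end):
    """Fraction of positions in [region_start, region_end) covered by >= 1 interval.

    intervals: iterable of (start, end) half-open alignments on one chromosome.
    """
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")

    # Clip every interval to the region and drop those that fall outside it.
    clipped = sorted(
        (max(s, region_start), min(e, region_end))
        for s, e in intervals
        if min(e, region_end) > max(s, region_start)
    )

    # Merge overlapping intervals and sum the merged lengths.
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start

    return covered / length


# Hypothetical viewpoint at position 1,000,000 with a centred 100 kb window.
viewpoint = 1_000_000
window = (viewpoint - 50_000, viewpoint + 50_000)
reads = [(950_000, 960_000), (955_000, 970_000), (1_040_000, 1_060_000)]
fraction = fraction_covered(reads, *window)
```

Overlapping reads are merged before counting, so a position covered many times still contributes once; the metric reflects breadth of coverage, not depth.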
  • the present methods comprise a step of permeabilizing a sample of cross-linked DNA by incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours prior to enzyme treatment.
  • a permeabilization step allows recovery of RNA and cross-linked DNA from the same starting sample, in particular recovery of RNA and cross-linked DNA from the same starting FFPE sample. This allows for the integration of an RNA isolation procedure into, for example, a proximity ligation protocol on an FFPE sample without sacrificing the quality of data extracted from the DNA sequences.
  • the methods may comprise incubating the sample at about 75 °C to about 100 °C.
  • the methods may comprise incubating the sample at about 75 °C to about 100 °C, about 78 °C to about 100 °C, or about 80 °C to about 100 °C, about 80 °C to about 95 °C, about 80 °C to about 90 °C, or 85 °C to about 90 °C.
  • the methods may comprise incubating the sample at about 78 °C, about 79 °C, about 80 °C, about 81 °C, about 82 °C, about 83 °C, about 84 °C, or about 85 °C.
  • the methods may comprise incubating the sample at about 84 °C, about 85 °C, about 86 °C, about 87 °C, about 88 °C, about 89 °C, about 90 °C, about 91 °C, about 92 °C, about 93 °C, about 94 °C, about 95 °C, about 96 °C, about 97 °C, about 98 °C, about 99 °C or about 100 °C.
  • the methods may comprise incubating the sample at about 84 °C, about 85 °C, about 86 °C, about 87 °C, about 88 °C, about 89 °C, about 90 °C, about 91 °C, about 92 °C, about 93 °C, about 94 °C, or about 95 °C.
  • the methods may comprise incubating the sample at about 84 °C, about 85 °C, about 86 °C, about 87 °C, about 88 °C, about 89 °C, or about 90 °C.
  • the methods may comprise incubating the sample at about 80 °C.
  • the methods may comprise incubating the sample at about 90 °C.
  • the methods may comprise incubating the sample at about 95 °C.
  • the methods may comprise incubating the sample at about 98 °C.
  • the present methods may comprise incubating the sample for about 1 minute to about 4 hours.
  • the present methods may comprise incubating the sample for about 30 minutes to about 4 hours.
  • the present methods may comprise incubating the sample for about 30 minutes to about 3 hours.
  • the present methods may comprise incubating the sample for about 1 hour to about 3 hours.
  • the present methods may comprise incubating the sample for about 2 hours to about 3 hours.
  • the present methods may comprise incubating the sample for about 2 hours or about 3 hours.
  • the present methods may comprise incubating the sample for about 2 hours.
  • the present methods may comprise incubating the sample for about 3 hours.
  • the present methods may comprise incubating the sample for about 30 minutes to about 1 hour.
  • the present methods may comprise incubating the sample for about 30 minutes.
  • the present methods may comprise incubating the sample at about 75 °C to about 100 °C for about 30 minutes to about 4 hours.
  • the present methods may comprise incubating the sample at about 75 °C to about 85 °C for about 1 hour to about 3 hours.
  • the present methods may comprise incubating the sample at 75 °C to about 85 °C for about 2 hours to about 3 hours.
  • the present methods may comprise incubating the sample at about 75 °C to about 85 °C for about 30 minutes to about 1 hour.
  • the present methods may comprise incubating the sample at about 75 °C to about 85 °C for about 30 minutes.
  • the present methods may comprise incubating the sample at 75 °C to about 85 °C for about 3 hours.
  • the present methods may comprise incubating the sample at 75 °C to about 85 °C for about 2 hours.
  • the present methods may comprise incubating the sample at about 78 °C to about 83 °C for about 30 minutes to about 4 hours.
  • the present methods may comprise incubating the sample at about 78 °C to about 83 °C for about 1 hour to about 4 hours.
  • the present methods may comprise incubating the sample at 78 °C to about 83 °C for about 2 hours to about 3 hours.
  • the present methods may comprise incubating the sample at 78 °C to about 83 °C for about 3 hours.
  • the present methods may comprise incubating the sample at 78 °C to about 83 °C for about 2 hours.
  • the present methods may comprise incubating the sample at about 78 °C to about 83 °C for about 30 minutes to about 1 hour.
  • the present methods may comprise incubating the sample at about 78 °C to about 83 °C for about 30 minutes.
  • the present methods may comprise incubating the sample at about 80 °C for about 30 minutes to about 4 hours.
  • the present methods may comprise incubating the sample at about 80 °C for about 30 minutes to about 3 hours.
  • the present methods may comprise incubating the sample at about 80 °C for about 1 hour to about 3 hours.
  • the present methods may comprise incubating the sample at about 80 °C for about 2 hours to about 3 hours.
  • the present methods may comprise incubating the sample at about 80 °C for about 2 hours or about 3 hours.
  • the present methods may comprise incubating the sample at about 80 °C for about 3 hours.
  • the present methods may comprise incubating the sample at about 80 °C for about 30 minutes to about 1 hour.
  • the present methods may comprise incubating the sample at about 80 °C for about 30 minutes.
  • the present methods may comprise incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours.
  • the present methods may comprise incubating the sample at about 75 °C to about 100 °C for about 30 minutes to about 4 hours.
  • the present methods may comprise incubating the sample at about 75 °C to about 100 °C for about 30 minutes to about 2 hours.
  • the present methods may comprise incubating the sample at about 75 °C to about 100 °C for about 30 minutes to about 1 hour.
  • the present methods may comprise incubating the sample at about 75 °C to about 90 °C for about 30 minutes to about 4 hours.
  • the present methods may comprise incubating the sample at about 75 °C to about 90 °C for about 30 minutes to about 2 hours.
  • the present methods may comprise incubating the sample at about 75 °C to about 90 °C for about 30 minutes to about 1 hour.
  • the present methods may comprise incubating the sample at about 75 °C to about 90 °C for about 1 hour.
  • the present methods may comprise incubating the sample at about 80 °C to about 100 °C for about 30 minutes to about 1 hour.
  • the present methods may comprise incubating the sample at about 85 °C to about 100 °C for about 30 minutes to about 1 hour.
  • the present methods may comprise incubating the sample at about 90 °C to about 100 °C for about 30 minutes to about 1 hour.
  • the present methods may comprise incubating the sample at about 80 °C to about 90 °C for about 30 minutes to about 1 hour.
  • the present methods may comprise incubating the sample at about 85 °C to about 90 °C for about 30 minutes to about 1 hour.
  • the present methods may comprise incubating the sample at about 75 °C to about 85 °C for about 1 hour to about 3 hours.
  • the present methods may comprise incubating the sample at 75 °C to about 85 °C for about 2 hours to about 3 hours.
  • the present methods may comprise incubating the sample at about 90 °C for about 1 minute to about 2 hours.
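The broadest window recited above (about 75 °C to about 100 °C for about 1 minute to about 4 hours) can be expressed as a simple range check. This is a sketch only: the function name and its default arguments are hypothetical, and the word "about" is not quantified in the text, so no tolerance is modelled here.

```python
def within_permeabilization_window(temp_c, minutes,
                                   temp_range=(75.0, 100.0),
                                   time_range=(1.0, 240.0)):
    """Return True if (temp_c, minutes) lies inside the given window.

    Defaults mirror the broadest range in the text: 75-100 °C and
    1 minute to 4 hours (240 minutes). Bounds are treated as inclusive,
    which is an assumption.
    """
    low_t, high_t = temp_range
    low_m, high_m = time_range
    return low_t <= temp_c <= high_t and low_m <= minutes <= high_m
```

Narrower embodiments can be checked by passing other ranges, for example `temp_range=(78.0, 83.0), time_range=(30.0, 240.0)` for the about 78 °C to about 83 °C, 30 minutes to 4 hours variant.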
  • a relatively high incubation temperature e.g. about 90 °C to about 100 °C
  • a relatively short time period e.g. about 1 minute to about 1 hour
  • the present invention provides a particularly high incubation temperature (e.g. about 95 °C to about 100 °C) for a very short time period (e.g. about 1 minute, about 5 minutes, about 10 minutes, about 15 minutes, or about 30 minutes).
  • the method may comprise incubating the sample at about 90 °C to about 100 °C for about 1 minute to about 2 hours.
  • the method may comprise incubating the sample at about 90 °C to about 100 °C for about 1 minute to about 1 hour.
  • the method may comprise incubating the sample at about 90 °C to about 100 °C for about 1 minute to about 30 minutes.
  • the method may comprise incubating the sample at about 90 °C for about 5 minutes to about 2 hours.
  • the method may comprise incubating the sample at about 90 °C for about 5 minutes to about 90 minutes, about 5 minutes to about 60 minutes or about 5 minutes to about 30 minutes.
  • the method may comprise incubating the sample at about 90 °C for about 30 minutes.
  • the method may comprise incubating the sample at about 95 °C for about 1 minute to about 1 hour.
  • the method may comprise incubating the sample at about 95 °C for about 1 minute to about 30 minutes, about 5 minutes to about 30 minutes or about 15 minutes to about 30 minutes.
  • the method may comprise incubating the sample at about 95 °C for about 20 minutes.
  • the method may comprise incubating the sample at about 98 °C for about 1 minute to about 45 minutes.
  • the method may comprise incubating the sample at about 98 °C for about 10 to 30 minutes, or about 10 minutes to about 20 minutes.
  • Preferably, the method may comprise incubating the sample at about 98 °C for about 12 minutes.
  • the method may comprise incubating the sample under conditions that may be advantageous for RNA quantity and/or quality recovery.
  • the permeabilization step may comprise incubating the sample at about 90 °C to 95 °C for about 15 to 60 minutes.
  • the permeabilization step may comprise incubating the sample at about 90 °C for about 15 to 60 minutes, preferably about 90 °C for about 30 to 60 minutes.
  • the present methods may provide particularly advantageous permeabilization conditions based on the quality of the cross-linked DNA in the sample, in particular in a FFPE sample.
  • a permeabilization incubation from about 1 hour to about 3 hours according to the invention may be performed.
  • a permeabilization incubation of less than about 1 hour at about 78 °C to about 85 °C, for example about 30 minutes to 1 hour, for example about 30 minutes according to the invention may be performed.
  • DNA quality as used herein may be determined based on the average DNA fragment size of a cross-linked sample prior to any optional further fragmentation steps as described herein. Average DNA fragment size may be determined using methods which are known in the art, for example agarose gel electrophoresis, TapeStation, Bioanalyzer or devices using similar technology. A high-quality sample may have an average DNA fragment size of greater than 3000 base pairs. A high-quality sample may have an average DNA fragment size of greater than 5000 base pairs.
  • a high-quality sample may have an average DNA fragment size of between about 3000 base pairs and about 10000 base pairs, between about 5000 base pairs and about 10000 base pairs or between about 8000 base pairs and about 10000 base pairs.
  • a high-quality sample may have an average DNA fragment size of about 3000 base pairs, about 5000 base pairs, about 8000 base pairs, or about 10000 base pairs.
  • An average-quality sample may have an average DNA fragment size of less than 3000 base pairs.
  • An average-quality sample may have an average DNA fragment size of between about 1500 base pairs and about 3000 base pairs, between about 800 base pairs and about 3000 base pairs or between about 450 base pairs and about 3000 base pairs.
  • an average-quality sample may have an average DNA fragment size of about 3000 base pairs, about 1500 base pairs, about 800 base pairs, or about 450 base pairs.
  • a low-quality sample may have an average DNA fragment size of less than about 450 base pairs.
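The quality tiers described above can be summarised as a small classifier on average fragment size. This is a sketch under assumptions: the text gives overlapping "about" values at the 3000 bp and 450 bp boundaries, so assigning exactly 3000 bp to "high" and exactly 450 bp to "average" is a choice made here for illustration, and the function name is hypothetical.

```python
def classify_dna_quality(avg_fragment_bp):
    """Map an average DNA fragment size (base pairs) to a quality tier.

    Thresholds follow the text: high above roughly 3000 bp, average between
    roughly 450 and 3000 bp, low below roughly 450 bp. Boundary handling
    (>= 3000 is "high", >= 450 is "average") is an assumption.
    """
    if avg_fragment_bp <= 0:
        raise ValueError("average fragment size must be positive")
    if avg_fragment_bp >= 3000:
        return "high"
    if avg_fragment_bp >= 450:
        return "average"
    return "low"
```

In practice the input would come from a measurement such as gel electrophoresis or a fragment analyzer trace taken before any optional further fragmentation.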
  • the present methods may comprise incubating the sample at about 78 °C to about 85 °C, for example at about 80 °C to about 85 °C for about 1 hour to about 2 hours.
  • Such embodiments may provide a particular advantage of allowing a suitable permeabilization of a range of different sample types (e.g. samples with different DNA quality). These conditions may therefore be broadly applied by a user without requiring an additional step to determine the DNA quality of the sample before performing the permeabilisation. This may be particularly advantageous in allowing a standard set of conditions to be applied within a process.
  • the present permeabilization step may be performed in a detergent buffer, for example in a buffer comprising sodium dodecyl sulfate (SDS).
  • the buffer may comprise 0.1-0.3% SDS, for example.
  • Triton-X or NP-40 may subsequently be used to sequester the SDS used in the permeabilization step.
  • a cell lysis step refers to an incubation which results in the polynucleotides, in particular genomic DNA and more particularly cross-linked genomic DNA, for subsequent method steps being released as soluble polynucleotides in the resulting supernatant when a standard centrifugation is performed immediately following the incubation step (e.g. 10 minutes at 13,000 rpm at 4°C).
  • the polynucleotides, in particular genomic DNA and more particularly cross-linked genomic DNA for subsequent method steps, are retained in the pellet when a standard centrifugation is performed immediately following the present permeabilization incubation step (e.g. 10 minutes at 13,000 rpm at 4°C).
  • a cell lysis step refers to an incubation that results in genomic DNA, particularly cross-linked genomic DNA, for subsequent method steps being released as soluble polynucleotides in the resulting supernatant when a standard centrifugation is performed immediately following the incubation step (e.g. 1-10 minutes at 10,000-13,000 rpm at 4°C).
  • genomic DNA, and more particularly cross-linked genomic DNA for subsequent method steps, is retained in the pellet when a standard centrifugation is performed immediately following the incubation step (e.g. 1-10 minutes at 10,000-13,000 rpm at 4°C).
  • At least one DNA-acting enzyme as defined herein may be applied to the sample prior to any proteinase treatment (e.g. proteinase K treatment).
  • it is an advantage of the present invention that it may allow DNA-acting enzymes to be applied to the sample prior to a proteinase (e.g. proteinase K) treatment.
  • proteinase K treatment may cause DNA fragments to be released from the insoluble sample by degrading the proteins to which the DNA is cross-linked.
  • the present permeabilization step is considered particularly effective because it retains the necessary complexity of DNA/protein cross-links whilst improving the ability of DNA-acting enzymes to access the cross-linked DNA.
  • the permeabilization step of the present invention does not include a proteinase (e.g. proteinase K) treatment.
  • the permeabilization step of the present invention does not comprise the addition/presence of a proteinase (e.g. proteinase K) to the sample prior to or during the permeabilization incubation.
  • previous methods for recovery of RNA from a cross-linked sample typically comprise a step of adding a proteinase (e.g. proteinase K).
  • This typically involves the addition of proteinase K at concentrations/conditions that result in DNA molecules that were connected via a protein bridge being separated.
  • standard proteinase K treatments which are not required in the present methods, contribute to an incompatibility with co-recovery of cross-linked DNA and soluble RNA.
  • Such a proteinase K treatment step is not required in the methods of the present invention.
  • a sample of cross-linked polynucleotides refers to a sample comprising polynucleotides (e.g. DNA and/or RNA) which has been subjected to cross-linking.
  • Samples may be taken from a patient and/or from diseased tissue, and may also be derived from other organisms or from separate sections of the same organism, such as samples from one patient, one sample from healthy tissue and one sample from diseased tissue. Samples may thus be analysed according to the invention and compared with a reference sample, or different samples may be analysed and compared with each other. For example, from a patient being suspected of having breast cancer, a biopsy may be obtained from the suspected tumour. Another biopsy may be obtained from non-diseased tissue.
  • Genomic regions of interest may be the BRCA1 and BRCA2 genes, which are 83 and 86 kb long, respectively.
  • the present methods are for use with samples that have undergone a heavy cross-linking procedure (e.g. a fixation procedure).
  • Suitable cross-linked samples include, but are not limited to, samples cross-linked with formalin or formaldehyde.
  • the sample is a formalin cross-linked sample.
  • the sample may be a paraffin embedded sample.
  • the sample may be a Formalin-Fixed Paraffin-Embedded (FFPE) sample.
  • the sample may be a tissue sample.
  • the sample may be a tumour sample.
  • the sample may be a FFPE tumour sample.
  • the sample may be a slice or a puncture from a FFPE sample.
  • the cross-linked polynucleotides may be cross-linked DNA.
  • the cross-linked DNA may be part of a chromatin complex.
  • a sample of crosslinked DNA is sample DNA which has been subjected to crosslinking.
  • Crosslinking the sample DNA has the effect that the three-dimensional state of the DNA within the sample remains largely intact. This way, DNA strands that are in physical proximity of each other remain in each other's vicinity.
  • the sample is not placed on a flat surface or a solid supporting base during the present permeabilization step.
  • the sample is not placed on a flat surface of a solid supporting base during the permeabilization step.
  • the sample is suspended in solution during the permeabilization step of the present invention.
  • Embodiments of the present methods in which the sample is paraffin embedded may comprise an initial paraffin removal step.
  • Suitable methods for paraffin removal include, for example, xylene treatment, sonication and/or heating the sample for a short time period (e.g. 80°C for around 3 minutes).
  • the methods may also comprise a sonication step prior to the present permeabilization step.
  • a sonication step may be particularly advantageous in embodiments where soluble RNA is to be isolated following the permeabilization, as it may further aid release of the RNA.
  • Methods and techniques for suitable sonication are well known in the art.
  • the sonication step is a mild sonication.
  • the sonication should not result in cell lysis - as defined herein.
  • the aim of the sonication step is to add energy/heat to dissolve and solubilize any left over paraffin and disrupt the tissue.
  • the DNA and/or RNA should essentially not be sheared by the sonication.
  • tissue should essentially not be solubilized by the sonication.
  • the tissue should essentially disperse into smaller insoluble pieces.
  • An exemplary sonication step uses a Covaris M220 for 300 seconds, Duty factor 20%, Power 75 Watts, 200 cycles/burst at 20°C.
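The exemplary sonication settings above can be captured as a small configuration object, which also makes it easy to estimate how much acoustic energy such a mild treatment delivers. This is a rough sketch: it assumes that the quoted "Power 75 Watts" is the peak power and that average power equals peak power multiplied by the duty factor, neither of which is stated in the text. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SonicationSettings:
    duration_s: float        # treatment time in seconds
    duty_factor: float       # fraction of time the transducer is active (0-1)
    power_w: float           # quoted power in watts (assumed to be peak power)
    cycles_per_burst: int
    temperature_c: float

    def average_power_w(self):
        """Average power, assuming average = peak power x duty factor."""
        return self.power_w * self.duty_factor

    def delivered_energy_j(self):
        """Approximate energy in joules over the whole treatment."""
        return self.average_power_w() * self.duration_s


# Values taken from the exemplary step in the text.
EXEMPLARY = SonicationSettings(
    duration_s=300, duty_factor=0.20, power_w=75,
    cycles_per_burst=200, temperature_c=20,
)
```

Under these assumptions the exemplary step corresponds to roughly 15 W average power and roughly 4500 J in total, which gives a single number to compare against harsher, shearing protocols.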
  • the present invention provides methods that enable the simultaneous recovery of cross-linked DNA and soluble RNA from a single starting sample, such as a FFPE sample.
  • the present invention provides a method for recovering soluble RNA and cross-linked DNA from a sample comprising cross-linked cells or tissue; which method comprises a step of permeabilizing the sample by an incubation at about 75 °C to about 100 °C for about 1 minute to about 4 hours.
  • the RNA may be separated from the cross-linked DNA following the permeabilization step.
  • the separation may be achieved, for example, using standard centrifugation parameters for separating soluble and insoluble components. For example, centrifugation may be performed for 1-10 minutes at 10,000-13,000 g.
  • the recovered RNA may be purified (e.g. following its removal in the supernatant after centrifugation) using known RNA purification methods, including, for example, commercially available RNA purification kits. Such methods may comprise the use of magnetic beads or spin columns, for example.
  • the recovered or purified RNA may be subject to standard RNA analysis methods including, for example, amplification and/or sequencing of the RNA.
  • Such methods are well-known in the art and include, but are not limited to, reverse-transcription PCR, quantitative PCR and next generation sequencing (RNA-Seq).
  • the cross-linked DNA isolated from the same starting sample as the RNA may subsequently be used in a method for determining at least part of the sequences of DNA fragments from the cross-linked DNA or a method for determining the presence or absence of a mutation in a genomic region of interest; as described herein.
  • genomic region of interest is a DNA sequence of an organism of which it is desirable to determine, at least part of, the DNA sequence.
  • a genomic region which is suspected of comprising an allele associated with a disease may be a genomic region of interest.
  • allele(s) means any of one or more alternative forms of a gene at a particular locus.
  • loci (plural of locus): positions on a chromosome.
  • One allele is present on each chromosome of the pair of homologous chromosomes.
  • two alleles and thus two separate (different) genomic regions of interest may exist.
  • a “nucleic acid” or “polynucleotide” as referred to herein may include any polymer or oligomer of pyrimidine and purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively.
  • the present invention contemplates any deoxyribonucleotide, ribonucleotide or peptide nucleic acid component, and any chemical variants thereof, such as methylated, hydroxymethylated or glycosylated forms of these bases, and the like.
  • the polymers or oligomers may be heterogeneous or homogenous in composition, and may be isolated from naturally occurring sources or may be artificially or synthetically produced.
  • the nucleic acids may be DNA or RNA, or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states.
  • identifier is a short sequence that can be added to an adaptor or a primer or included in its sequence or otherwise used as label to provide a unique identifier.
  • Such a sequence identifier (or tag) can be a unique base sequence of varying but defined length, typically from 4-16 bp, used for identifying a specific nucleic acid sample. For instance, 4 bp tags allow 4^4 = 256 different tags. Typical examples are ZIP sequences, known in the art as commonly used tags for unique detection by hybridization. Identifiers are useful according to the invention, as by using such an identifier, the origin of a (PCR) sample can be determined upon further processing.
  • the different nucleic acid samples may be identified using different identifiers. For instance, as according to the invention sequencing may be performed using high throughput sequencing, multiple samples may be combined. Identifiers may then assist in identifying the sequences corresponding to the different samples. Identifiers may also be included in adaptors for ligation to DNA fragments assisting in DNA fragment sequences identification. Identifiers preferably differ from each other by at least two base pairs and preferably do not contain two identical consecutive bases to prevent misreads. The identifier function can sometimes be combined with other functionalities such as adaptors or primers.
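By way of illustration only, the identifier constraints stated above (a defined length, at least two differing bases between any two identifiers, and no two identical consecutive bases) can be sketched as follows. The function names and the greedy selection strategy are assumptions made for this sketch and are not taken from the description.

```python
from itertools import product

BASES = "ACGT"

def has_adjacent_repeat(tag):
    """True if the tag contains two identical consecutive bases (e.g. 'AAGT')."""
    return any(a == b for a, b in zip(tag, tag[1:]))

def hamming(a, b):
    """Number of positions at which two equal-length tags differ."""
    return sum(x != y for x, y in zip(a, b))

def candidate_tags(length):
    """All tags of the given length without identical consecutive bases."""
    tags = ("".join(p) for p in product(BASES, repeat=length))
    return [t for t in tags if not has_adjacent_repeat(t)]

def select_tags(length, min_distance=2):
    """Greedily keep tags differing from every kept tag in at least min_distance bases."""
    chosen = []
    for tag in candidate_tags(length):
        if all(hamming(tag, kept) >= min_distance for kept in chosen):
            chosen.append(tag)
    return chosen

all_tags_4bp = 4 ** 4      # number of possible 4 bp tags before filtering
tags = select_tags(4)      # a subset satisfying both preferences
```

The greedy pass is only one possible way to pick a compatible set; it does not attempt to find the largest such set.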
  • By “aligning” and “alignment” is meant the comparison of two or more nucleotide sequences based on the presence of short or long stretches of identical or similar nucleotides. Methods and computer programs for alignment are well known in the art.
  • “Fragmenting” includes any technique that, when applied to polynucleotides, in particular DNA which may be crosslinked DNA or not, results in fragments. Techniques well known in the art are sonication, shearing and/or enzymatic restriction, but other techniques can also be envisaged.
  • DNA is cross-linked during the preparation of FFPE samples, in particular because of the formalin fixation performed. Further, the preparation of FFPE samples results in fragmentation of the DNA.
  • the DNA extracted from an FFPE sample may have an average fragment size of around 50 bases, around 100 bases, around 200 bases, around 400 bases, around 800 bases, around 1500 bases, around 3000 bases, around 8000 bases or around 10000 bases.
  • DNA fragments that originate from a genomic region of interest remain in proximity of each other because they are crosslinked.
  • DNA fragments of the genomic region of interest which are in the proximity of each other due to the crosslinks, are ligated.
  • This type of ligation may also be referred to as proximity ligation.
  • DNA fragments comprising the target nucleotide sequence may ligate with DNA fragments within a large linear distance on sequence level.
  • Each individual target nucleotide sequence is likely to be crosslinked to multiple other DNA fragments. As a consequence, often more than one DNA fragment may be ligated to a fragment comprising the target nucleotide sequence.
  • a sequence of the genomic region of interest may be built.
  • a DNA fragment ligated with the fragment comprising the target nucleotide sequence includes any fragment which may be present in ligated DNA fragments.
  • the sample for use in the present methods comprises fragmented polynucleotides, in particular fragmented DNA.
  • samples of crosslinked DNA may be further fragmented in embodiments of the present invention.
  • the fragmenting may comprise sonication, and may be followed by enzymatic DNA end repair. Sonication results in the fragmenting of DNA at random sites, yielding ends which can be either blunt or can have 3’- or 5’-overhangs. As these DNA breakage points occur randomly, the DNA may be repaired (enzymatically), filling in possible 3’- or 5’-overhangs, such that DNA fragments are obtained which have blunt ends that allow ligation of the fragments to adaptors and/or to each other in the subsequent step. Alternatively, the overhangs may also be made blunt ended by removing overhanging nucleotides, using e.g. exonucleases.
  • the fragmenting may comprise fragmenting with one or more restriction enzymes, or combinations thereof.
  • Fragmenting with a restriction enzyme is advantageous as it may allow control of the average fragment size.
  • the fragments that are formed may have compatible overhangs or blunt ends that allow ligation of the fragments in the subsequent step.
  • the fragmenting may be performed using S1 nuclease to generate blunt ended fragments.
  • the fragmenting may be performed using DNase I.
  • restriction enzymes with different recognition sites may be used. This is advantageous because by using different restriction enzymes having different recognition sites, different DNA fragments can be obtained from each subsample.
  • a “restriction endonuclease” or “restriction enzyme” is an enzyme that recognizes a specific nucleotide sequence (recognition site) in a double-stranded DNA molecule, and will cleave both strands of the DNA molecule at or near every recognition site, leaving a blunt or a 3’- or 5’- overhanging end.
  • the specific nucleotide sequence which is recognized may determine the frequency of cleaving, e.g. a nucleotide sequence of 6 nucleotides occurs on average every 4096 nucleotides, whereas a nucleotide sequence of 4 nucleotides occurs much more frequently, on average every 256 nucleotides.
LIGATION
  • the present methods may comprise a “ligation” step.
  • “Ligating” involves the joining of separate DNA fragments.
  • the DNA fragments may be blunt ended, or may have compatible overhangs (sticky overhangs) such that the overhangs can hybridise with each other.
  • the joining of the DNA fragments may be enzymatic, with a ligase enzyme, DNA ligase.
  • a non-enzymatic ligation may also be used, as long as DNA fragments are joined, i.e. forming a covalent bond.
  • a phosphodiester bond between the hydroxyl and phosphate group of the separate strands is formed.
  • a fragment comprising a target nucleotide sequence may be crosslinked to multiple other DNA fragments, more than one DNA fragment may be ligated to the fragment comprising the target nucleotide sequence. This may result in combinations of DNA fragments which are in proximity of each other as they are held together by the cross links. Different combinations and/or order of the DNA fragments in ligated DNA fragments may be formed.
  • the recognition site of the restriction enzyme is known, which makes it possible to identify the fragments, as remains of restriction enzyme recognition sites, or reconstituted recognition sites, may indicate the separation between different DNA fragments.
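As a minimal sketch of how a known recognition site can mark the separation between fragments, a sequenced ligation product may be split at each occurrence of the site. The four-base site and the read below are hypothetical values chosen for illustration; real data would also require handling of partial or mutated sites, which is not attempted here.

```python
def split_ligation_product(read, site):
    """Split a read at every occurrence of the recognition site.

    The site itself is dropped; only the segments between sites are returned.
    """
    return [segment for segment in read.split(site) if segment]

SITE = "CATG"  # illustrative four-base recognition site (assumption)
read = "AAACCC" + SITE + "GGGTTT" + SITE + "ACAC"
fragments = split_ligation_product(read, SITE)
```

Each returned segment is a candidate DNA fragment that can then be aligned separately.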
  • the ligation step may be performed in the presence of an adaptor, ligating adaptor sequences in between fragments.
  • the adaptor may be ligated in a separate step. This may be advantageous because the different fragments can be easily identified by identifying the adaptor sequences which are located in between the fragments.
  • the ligated DNA fragments may optionally be further fragmented, preferably with a restriction enzyme.
  • the optional first and second fragmenting step may be aimed at obtaining ligated DNA fragments of a size which is compatible with the subsequent amplification step and/or sequence determination step.
  • a second fragmenting step preferably with an enzyme may result in ligated fragment ends which are compatible with the optional ligation of an adaptor.
  • the second fragmenting step may be performed after reversing the crosslinking, however, it is also possible to perform the second fragmenting step and/or ligation step while the DNA fragments are still crosslinked.
  • the restriction enzyme recognition site of the second fragmentation step is longer than the recognition site of the restriction enzyme used in the first fragmentation step.
  • the second enzyme thus cuts at a lower frequency than the first enzyme. This means that the average DNA fragment size after the first fragmentation is smaller than the average fragment size obtained after the second fragmentation step. This way, in the first fragmenting step, relatively small fragments are formed, which are subsequently ligated. As the second restriction enzyme cuts less frequently, most of the DNA fragments may not comprise the restriction recognition site of the second restriction enzyme. Thus, when the ligated DNA fragments are subsequently fragmented in the second fragmentation step, many of the initial DNA fragments may remain intact.
  • if the first optional fragmenting step were less frequent than the second optional fragmenting step, the result would be that the initial fragments are generally further fragmented, which may result in the loss of relatively large DNA sequences that are useful for building a contig.
  • the first optional fragmenting step is more frequent as compared to the second optional fragmenting step, such that DNA fragments may largely remain intact, i.e. are largely not further fragmented in the second optional fragmentation step.
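The relation between recognition site length and cutting frequency can be illustrated with a short calculation. The sketch below assumes a random sequence with equal base frequencies, which real genomes only approximate; the function names are illustrative.

```python
def expected_spacing(site_length):
    """Average distance between recognition sites in a random sequence."""
    return 4 ** site_length

def fraction_intact(fragment_length, second_site_length):
    """Approximate probability that a fragment contains no site for the second enzyme."""
    positions = max(fragment_length - second_site_length + 1, 0)
    p_site = 1 / 4 ** second_site_length
    return (1 - p_site) ** positions

first = expected_spacing(4)    # 4-base site: on average every 256 nucleotides
second = expected_spacing(6)   # 6-base site: on average every 4096 nucleotides

# A typical 256 nt fragment from the first step rarely contains a 6-base site,
# so it is likely to survive the second fragmentation step intact.
intact = fraction_intact(256, 6)
```

Under these assumptions, most fragments produced by a frequent first cutter remain intact when a less frequent second cutter is used, which is the behaviour described above.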
  • Reversing crosslinking is used herein to refer to breaking the crosslinks such that the DNA that has been crosslinked is no longer crosslinked and is suitable for subsequent amplification and/or sequencing steps. For example, performing a proteinase K treatment on a sample DNA that has been crosslinked with formaldehyde will digest the protein present in the sample. Because the crosslinked DNA is connected indirectly via protein, the protease treatment in itself may reverse the crosslinking between the DNA. However, the protein fragments that remain connected to the DNA may hamper subsequent sequencing and/or amplification. Hence, reversing the connections between the DNA and the protein may also result in “reversing crosslinking”.
  • the DNA-crosslinker-protein connection may be reversed through a heating step for example by incubating at 70°C.
  • any “reversing crosslinking” method may be contemplated wherein the DNA strands that are connected in a crosslinked sample become suitable for sequencing and/or amplification.
  • the reverse crosslinking results in a pool of ligated DNA fragments that comprise two or more fragments, preferably three or more.
  • a subpopulation of the pool of ligated DNA fragments comprises a DNA fragment which comprises the target nucleotide sequence.
  • An “adaptor” is a short double-stranded oligonucleotide molecule with a limited number of base pairs, e.g. about 10 to about 30 base pairs in length, which is designed such that it can be ligated to the ends of fragments.
  • Adaptors are generally composed of two synthetic oligonucleotides which have nucleotide sequences which are partially complementary to each other. When mixing the two synthetic oligonucleotides in solution under appropriate conditions, they will anneal to each other forming a double-stranded structure.
  • one end of the adaptor molecule may be designed such that it is compatible with the end of a restriction fragment and can be ligated thereto; the other end of the adaptor can be designed so that it cannot be ligated, but this need not be the case, for instance when an adaptor is to be ligated in between DNA fragments.
  • At least one adaptor is optionally ligated to the ligated DNA fragments.
  • the ends of the ligated DNA fragments need to be compatible with ligation of such an adaptor.
  • the ligated DNA fragments may be linear DNA
  • ligation of an adaptor may provide for a primer hybridisation sequence.
  • the adaptor sequence ligated with ligated DNA fragments comprising the target nucleotide sequence provides DNA molecules which may be amplified using PCR.
  • DNA comprising the target nucleotide sequence may be amplified using at least one oligonucleotide primer which hybridises to the target nucleotide sequence, and at least one additional primer which hybridises to the at least one adaptor.
  • the DNA comprising the target nucleotide sequence may also be amplified using at least one oligonucleotide primer which hybridises to the target nucleotide sequence.
  • “Amplifying” refers to a polynucleotide amplification reaction, namely, a population of polynucleotides that are replicated from one or more starting sequences. Amplifying may refer to a variety of amplification reactions, including but not limited to polymerase chain reaction (PCR), linear polymerase reactions, nucleic acid sequence-based amplification, rolling circle amplification and like reactions.
  • Oligonucleotide primers in general, refer to strands of nucleotides which can prime the synthesis of DNA. DNA polymerase cannot synthesize DNA de novo without primers. A primer hybridises to the DNA, i.e. base pairs are formed. Nucleotides that can form base pairs, that are complementary to one another, are e.g. cytosine and guanine, thymine and adenine, adenine and uracil, guanine and uracil. The complementarity between the primer and the existing DNA strand does not have to be 100%, i.e. not all bases of a primer need to base pair with the existing DNA strand.
  • nucleotides are incorporated using the existing strand as a template (template directed DNA synthesis).
  • we may refer to the synthetic oligonucleotide molecules which are used in an amplification reaction as “primers”.
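The statement that a primer need not be 100% complementary to the existing DNA strand can be illustrated with a small calculation of the fraction of primer bases that pair with a template site. This sketch considers only the canonical DNA pairs (cytosine with guanine, thymine with adenine) and equal-length sequences written 5' to 3'; the function names are assumptions for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Sequence of the strand that base pairs with seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def paired_fraction(primer, template_site):
    """Fraction of primer bases that base pair with an equally long template site."""
    if len(primer) != len(template_site):
        raise ValueError("primer and template site must have the same length")
    perfect_primer = reverse_complement(template_site)
    matches = sum(p == q for p, q in zip(primer, perfect_primer))
    return matches / len(primer)

site = "ATGCCG"
perfect = paired_fraction("CGGCAT", site)    # fully complementary primer
imperfect = paired_fraction("CGGCAA", site)  # one mismatching base
```

Whether a given fraction is sufficient for priming depends on reaction conditions, which this sketch does not model.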
  • “Sequencing” refers to determining the order of nucleotides (base sequences) in a nucleic acid sample, e.g. DNA or RNA. Many techniques are available, such as Sanger sequencing and high throughput sequencing technologies such as those offered by Roche, Illumina and Applied Biosystems.
  • the sequence of the (amplified) ligated DNA fragments for example comprising the target nucleotide sequence is determined. Determining the sequence is preferably performed using high throughput sequencing technology, as this is more convenient and allows a high number of sequences to be determined to cover the complete genomic region of interest. From these determined sequences a contig may be built of the genomic region of interest. When sequences of the DNA fragments are determined, overlapping reads may be obtained from which the genomic region of interest may be built. In case the DNA fragments were obtained by random fragmentation, the random nature of the fragmentation step already may result in DNA fragments which when sequenced results in overlapping reads. By increasing the sample size, e.g. increasing the number of cells analysed, the reliability of the genomic region of interest that is built may be increased.
  • sequencing adaptors may be ligated to the (amplified) ligated DNA fragments.
  • the linear or circularized fragment is amplified, by using for example PCR as described herein, the amplified product is linear, allowing the ligation of the adaptors.
  • Suitable ends may be provided for ligating adaptor sequences (e.g. blunt, complementary staggered ends).
  • primer(s) used for PCR or other amplification method may include adaptor sequences, such that amplified products with adaptor sequences are formed in the amplification.
  • the circularized fragment may be fragmented, preferably by using for example a restriction enzyme in between primer binding sites for the inverse PCR reaction, such that DNA fragments ligated with the DNA fragment comprising the target nucleotide sequence remain intact.
  • Sequencing adaptors may also be included in the ligation steps of the methods of the invention. These sequencing adaptors may be included as part of the adaptor sequences of the adaptors that may already optionally be used in these steps and/or separate sequencing adaptors may be provided in these steps in addition.
  • long reads may be generated in the high throughput sequencing method used.
  • Long reads may allow one to read across multiple DNA fragments of ligated DNA fragments. This way, initial DNA fragments or DNA fragments generated during optional further fragmentation in step (b) may be identified.
  • DNA fragment sequences may be compared to a reference sequence and/or compared with each other. For example, such DNA fragment sequences may be used for determining the ratio of fragments of cells carrying a genetic mutation.
  • by also sequencing the DNA fragments adjacent to such sequences, unique ligated DNA fragments may be identified. This is in particular the case when DNA fragments were obtained in step by random fragmentation.
  • Such short reads may involve additional processing steps such that separate ligated DNA fragments when fragmented, are ligated or equipped with identifiers, such that from the short reads, contigs may be built for the ligated DNA fragments.
  • Such high throughput sequencing technologies involving short sequence reads may involve paired end sequencing.
  • the short reads from both ends of a DNA molecule used for sequencing which DNA molecule may comprise different DNA fragments, may allow coupling of DNA fragments that were ligated. This is because two sequence reads can be coupled spanning a relatively large DNA sequence relative to the sequence that was determined from both ends. This way, contigs may be built for the (amplified) ligated DNA fragments.
  • short reads may be contemplated without identifying DNA fragments, because from the short sequence reads a genomic region of interest may be built, especially when the genomic region of interest has been amplified. Information regarding DNA fragments and/or separate genomic regions of interest (for instance of a diploid cell) may be lost, but DNA mutations may still be identified.
  • the step of determining at least part of the sequence of the (amplified) ligated DNA sequence may comprise short sequence reads, but preferably longer sequence reads are determined such that DNA fragment sequences may be identified.
  • a contig is used in connection with DNA sequence analysis, and refers to reassembled contiguous stretches of DNA derived from two or more DNA fragments, preferably three or more DNA fragments, having contiguous nucleotide sequences.
  • a contig may be a set of overlapping DNA fragments that provides a (partial) contiguous sequence of a genomic region of interest.
  • a contig may also be a set of DNA fragments that, when aligned to a reference sequence, may form a contiguous nucleotide sequence.
  • the term “contig” encompasses a series of (ligated) DNA fragment(s) which are ordered in such a way as to have sequence overlap of each (ligated) DNA fragment(s) with at least one of its neighbours.
  • the linked or coupled (ligated) DNA fragment(s) may be ordered either manually or, preferably, using appropriate computer programs such as FPC, PHRAP, CAP3 etc, and may also be grouped into separate contigs.
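To make the notion of ordering fragments by sequence overlap concrete, a simplified greedy overlap merge is sketched below. It is not the algorithm of FPC, PHRAP or CAP3; it assumes error-free fragments on the same strand, and the minimum overlap of 3 bases is an arbitrary illustrative value.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_contigs(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: fragments stay in separate contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags

contigs = greedy_contigs(["ATGCGT", "CGTTAC", "TACGGA"])
```

Fragments without sufficient overlap are returned as separate contigs, mirroring the grouping into separate contigs mentioned above.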
  • the present methods may optionally comprise a step of circularizing the DNA fragments generated, for example following reverse cross-linking.
  • a restriction enzyme is used for the second fragmentation, as a restriction enzyme allows control of the fragmentation step and results, if an appropriate restriction enzyme is chosen, in compatible ends of the ligated DNA fragments that are favourable for ligation of the compatible ends, resulting in circularized ligated DNA fragments.
  • DNA fragmented using other methods, e.g. shearing and/or sonication and subsequent enzymatic DNA end repair, such that blunt ended double-stranded DNA is formed, may also be ligated to form circularized DNA.
  • the optional first and second fragmenting steps are aimed at obtaining ligated DNA fragments which are compatible with the subsequent circularization, amplification step and/or sequence determination step.
  • the optional first and second fragmenting steps comprise restriction enzymes
  • the first and second restriction enzyme may be chosen as described herein.
  • Circularization involves the ligation of the ends of the ligated DNA fragments such that a closed circle is formed.
  • the circularized DNA comprising ligated DNA fragments which comprise the target nucleotide sequence may subsequently be amplified using at least one primer which hybridises to the target nucleotide sequence.
  • prior to the amplification step, reversing the crosslinking is required, as crosslinked DNA may hamper or prevent amplification.
  • two primers are used that hybridise to the target nucleotide sequence in an inverse PCR reaction. In this way, DNA fragments of the circularized DNA, which are ligated with the DNA fragment comprising the target nucleotide sequence, may be amplified.
  • Size selection involves techniques with which particular size ranges of molecules, e.g. (ligated) DNA fragments or amplified (ligated) DNA fragments, are selected. Techniques that can be used are for instance gel electrophoresis, size exclusion, gel extraction chromatography, but are not limited thereto, as long as molecules with a particular size can be selected, such a technique will suffice.
  • a size selection step may be performed prior to or after the amplification.
  • a size selection step may be performed using gel extraction chromatography, gel electrophoresis or density gradient centrifugation, which are methods generally known in the art.
  • DNA is selected of a size between 20-20,0000 base pairs, preferably 50-10,0000 base pairs, most preferably between 100-3,000 base pairs.
  • a size separation step allows selection of (amplified) ligated DNA fragments in a size range that may be optimal for PCR amplification and/or optimal for the sequencing of long reads by next generation sequencing.
  • SMRT™ (Single Molecule Real Time)
  • when the ploidy in a cell of a genomic region of interest is greater than 1, for each ploidy a sequence may be determined, the presence or absence of a mutation determined and/or a contig may be built.
  • because the genomic environment of any given target site in the genome mostly consists of DNA genome sequences that are physically close to the target sequence on the linear chromosome template, it allows the reconstruction of each particular chromosome template.
  • when the ploidy of a genomic region of interest is greater than 1, multiple genomic regions of interest are present in a cell (or equivalent thereof). These multiple genomic regions of interest generally do not occupy the same space, i.e. they are separated in space.
  • when a sample of crosslinked DNA of such a cell is fragmented, from each genomic region of interest in a cell a corresponding DNA fragment comprising the target nucleotide sequence will be formed. These DNA fragments will each ligate with DNA fragments in their proximity. Ligated DNA fragments will thus be representative of the different genomic regions of interest.
  • the step determining the sequence of at least part of the sequences of DNA fragments, determining the presence or absence of a mutation or of building a contig comprises the steps of:
  • the step 2) of assigning the fragments to a genomic region comprises identifying the different ligation products and coupling of the different ligation products comprising the DNA fragments.
  • the methods of the present invention may be used for haplotyping.
  • For instance, in case a sample of crosslinked DNA is provided which comprises a heterogeneous cell population (e.g. cells with different origin or cells from an organism which comprises normal cells and genetically mutated cells (e.g. cancer cells)), for each genomic region of interest corresponding to a different genomic environment (which may e.g. be different genomic environments in a cell or different genomic environments from different cells) a sequence may be determined, the presence or absence of a mutation determined and/or a contig may be built.
  • Methods are provided for identifying the presence or absence of a genetic mutation.
  • the method for identifying the presence or absence of a genetic mutation may comprise the steps of any of methods of the invention as described above and further identifying the presence or absence of a genetic mutation in a sequence determined.
  • Genetic mutations can be identified, for instance, by comparing sequences determined for multiple samples. In case one (or more) of the samples comprises a genetic mutation, this may be observed as the sequence is different when compared to the sequence of the other samples, i.e. the presence of a genetic mutation is identified. In case no sequence differences between the samples are observed, the absence of a genetic mutation is identified.
  • a reference sequence may also be used to which the sequence may be aligned. When the sequence of the sample is different from the reference sequence, a genetic mutation is observed, i.e. the presence of a genetic mutation is identified. In case no sequence differences between the sample or samples and the reference sequence are observed, the absence of a genetic mutation is identified.
  • when DNA fragment sequences are aligned, with each other or with a reference sequence, the presence or absence of a genetic mutation may be identified.
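A minimal sketch of the comparison against a reference sequence is given below. It assumes the sample sequence has already been aligned to the reference without gaps, so only substitutions are reported; insertions, deletions and rearrangements would require a real alignment tool. The function name and example sequences are illustrative.

```python
def find_substitutions(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch (1-based)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]

reference = "ACGTAC"
no_mutation = find_substitutions(reference, "ACGTAC")   # empty list: absence identified
mutation = find_substitutions(reference, "ACTTAC")      # one difference at position 3
```

An empty result corresponds to the absence of a genetic mutation in the compared region; any entry corresponds to its presence.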
  • an identified genetic mutation may be a single nucleotide variant (SNV) (for example a single nucleotide polymorphism (SNP)), an insertion, an inversion and/or a translocation.
  • the number of fragments and/or ligation products from a sample carrying the deletion and/or insertion may be compared with a reference sample in order to identify the deletion and/or insertion.
  • a deletion, insertion, inversion and/or translocation may also be identified based on the presence of chromosomal breakpoints in analyzed fragments.
  • the presence or absence of methylated nucleotides is determined in DNA fragments, ligated DNA fragments, and/or genomic regions of interest.
  • the DNA may be treated with bisulphite.
  • Treatment of DNA with bisulphite converts cytosine residues to uracil, but leaves 5-methylcytosine residues unaffected.
  • bisulphite treatment introduces specific changes in the DNA sequence that depend on the methylation status of individual cytosine residues, yielding single-nucleotide resolution information about the methylation status of a segment of DNA.
  • sequences from a plurality of samples treated with bisulphite may also be aligned, or a sequence from a sample treated with bisulphite may be aligned to a reference sequence.
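The effect of bisulphite treatment on a sequence, and the way methylation status can be read back by alignment to a reference, can be sketched in silico as follows. Converted cytosines are written as T because uracil is read as thymine after amplification; the positions are 0-based and the sequences are made-up examples. Only a single strand is modelled.

```python
def bisulphite_convert(sequence, methylated_positions):
    """Convert unmethylated C to T; leave methylated C (5-methylcytosine) as C."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and index not in methylated else base
        for index, base in enumerate(sequence)
    )

def call_methylated(reference, converted):
    """Positions where a reference C is still read as C after conversion."""
    return [
        index
        for index, (ref_base, read_base) in enumerate(zip(reference, converted))
        if ref_base == "C" and read_base == "C"
    ]

reference = "ACGCTC"
converted = bisulphite_convert(reference, methylated_positions=[1])
methylated = call_methylated(reference, converted)
```

Comparing the converted read with the untreated reference recovers the methylated positions at single-nucleotide resolution.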
  • the primer sequence may be removed prior to the high throughput sequencing step.
  • primers are used carrying a moiety, e.g. biotin, for the optional purification of (amplified) ligated DNA fragments through binding to a solid support.
  • the ligated DNA fragments comprising the target nucleotide sequence may be captured with a hybridisation probe (or capture probe) that hybridises to a target nucleotide sequence.
  • the hybridisation probe may be attached directly to a solid support, or may comprise a moiety, e.g. biotin, to allow binding to a solid support suitable for capturing biotin moieties (e.g. beads coated with streptavidin).
  • the ligated DNA fragments comprising a target nucleotide sequence are captured, thus allowing separation of ligated DNA fragments comprising the target nucleotide sequence from ligated DNA fragments not comprising the target nucleotide sequence.
  • a capturing step allows enrichment for ligated DNA fragments comprising the target nucleotide sequence.
  • an amplification step is performed, which is also an enrichment step; alternatively, a capture step with a probe directed to a target nucleotide sequence may be performed.
  • a capture probe for a target nucleotide sequence may be used for capturing.
  • more than one probe may be used for multiple target nucleotide sequences.
  • one primer of one of the 5 target nucleotide sequences may be used as a capture probe (A, B, C, D or E).
  • the 5 primers may be used in a combined fashion (A, B, C, D and E) capturing the genomic region of interest.
  • an amplification step and capture step may be combined, e.g. first performing a capture step and then an amplification step or vice versa.
  • a capture probe may be used that hybridises to an adaptor sequence comprised in (amplified) ligated DNA fragments.
  • FFPE quality is measured by the average length of the DNA following DNA isolation and ranged from 450 to 10000 bp. As shown in Figure 4, for samples with lower quality the fraction of covered bases generally increases with longer incubation times, with an optimum between 120 and 240 minutes. The fraction of bases covered using FFPE samples with higher quality, >5000 bp, benefits from shorter incubation times, with an optimum between 30 and 120 min.
  • the fraction of covered bases in a lOOkb window surrounding the location of a viewpoint is a measure for the performance of the FFPE targeted proximity-ligation protocol. Performance using 37 or 65°C incubation temperatures during the permeabilization step yielded insufficient quality data, even after lengthy incubations of 4 to 6 hours. In the experiments presented above it was determined that the fraction of covered bases and thus the quality of the proximity ligation experiment starting from FFPE tissue samples benefits of a prolonged incubation at an elevated temperature compared to conventional permeabilization protocols. Though different optimal incubation times were less effective for different qualities FFPE samples, a protocol of 120 minutes incubation at 80°C yields in all tested cases good quality data.
  • Proximity ligation technologies rely on the ability to use of enzymes (DNA (restriction) endonucleases, DNA end-repair enzymes, DNA ligase and others) on crosslinked samples.
  • in under-treated samples, enzymes may have difficulty accessing the DNA and thus may not be able to optimally digest and ligate DNA.
  • in an over-treated sample, DNA may be present as single-stranded DNA molecules, hampering effective enzymatic activity, and/or proximity information maintained by the formaldehyde-induced crosslinks may be (partially) lost due to an undesired reversal of the crosslinks.
  • FIG. 1 shows the fraction of covered nucleotides in the 100kb region surrounding the location of the primer set after 30- and 60-minute incubation at different temperatures for two different FFPE samples tested.
  • the preferred incubation time was first determined using 10 µm sections of a single FFPE tissue block. Sections were deparaffinized and permeabilized at 7 different incubation times varying from 30 minutes to 240 minutes at 80°C and incubations from 15 to 60 minutes at 90°C. After the permeabilization step all samples were equally treated according to a protocol using linear PCR and prepped for next-generation sequencing (NGS). Following NGS, the % of co-captured bases was determined for each condition.
  • Figure 3 shows the fraction of covered nucleotides in the 100kb region surrounding the location of the primer set.
  • S1 nuclease was used.
  • S1 processes ssDNA, which is used here to increase the number of blunt-ended dsDNA fragments that can be ligated during the proximity ligation step. Similar to the NlaIII experiments in Example 1, the percentage of covered bases in the regions flanking the sequence used for enrichment is a direct measure of the influence of the permeabilization step on the accessibility of the S1 nuclease enzyme to the crosslinked DNA.
  • DNA yields for each condition were measured and quality of the proximity reshuffling was determined following NGS adapter ligation and hybrid capture using a panel spanning BRCA1 and BRCA2, about 100kb probed region each. After sequencing on an Illumina MiniSeq machine, the percentage of co-captured bases was determined for each sample within a 1200kb region surrounding the location of the probes (viewpoint) used for enrichment.
  • the effect of the permeabilization condition is measured on two points: 1) effect on DNA yield and 2) effect on covered bases surrounding the viewpoints.
  • Table 1 DNA yields (ng) retrieved from each experimental condition.
  • Permeability of the sample is measured using the fraction of covered bases in the target region which are presented in Figure 6.
  • the following permeabilization conditions are optimal when combining yield and experimental quality results: 2 hours at 80°C or 30min at 90°C or 20min (15-30min) at 95°C or 12min (10-15min) at 98°C.
  • Example 3 Single-step recovery of soluble RNA and cross-linked DNA using the present permeabilization conditions
  • Deparaffinized samples were sonicated on a Covaris M220 for 300 seconds, Duty factor 20%, Power 75 Watts, 200 cycles/burst at 20°C.
  • the samples were then permeabilised by incubation for 15, 30, 60 or 120 min at 65°C, 80°C, 90°C or 95°C with SDS, shaking at 800 rpm.
  • the permeabilised samples were then centrifuged for 1 min at 10000 x g.
  • RNA Quantity: it was determined that, following centrifugation, soluble RNA was released into the supernatant whilst cross-linked DNA - in particular cross-linked genomic DNA - was retained in the pellet.
  • RNA released in the supernatant was purified using magnetic beads (e.g. from the truXTRAC FFPE RNA Plus Kit (Covaris)) or spin columns (e.g. from the PureLink RNA mini kit (Thermo Fisher Scientific)). All purification methods isolated sufficient concentrations of RNA from the supernatant for sufficient cDNA to be transcribed, even if there was a low starting concentration of RNA. As expected, FFPE samples yielded less GAPDH than the cell samples.
  • the quality of the RNA in each sample was determined by reverse transcribing the RNA then assessing the presence of amplicons from one housekeeping gene (TATA binding protein; TBP) at different distances from the poly A tail.
  • Figure 9A indicates the relative size and position of the primer pairs used. If there is a large reduction in amplifiability of all amplicons, it is evident that the reverse transcriptase was inhibited in all sections of the housekeeping gene. If the amplicons furthest from the poly A tail are less abundant, then it can be assumed that the longer RNA strands have been degraded or suboptimally de-crosslinked. The greater the difference in amplification, the worse the quality of template RNA.
  • Figure 9B shows the difference in amplifiability when incubated for only 15 minutes at varying temperatures.
  • the difference in amplifiability decreases, indicating an increase in RNA quality.
  • Incubation at 95°C causes a decrease in amplifiability of the second amplicon, TBP2, indicating a decrease in RNA quality.
  • This pattern is the same when incubation is increased to 30 minutes and 60 minutes (Figure 9C and 9D). Only when incubation is increased to 120 minutes is a greater difference in amplification observed, which is more extreme as the temperature increases (Figure 9E).
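The performance measure used throughout the examples above, the fraction of covered bases in a window surrounding a viewpoint, can be illustrated with a minimal sketch. The function name, the half-open 0-based interval convention and the choice of a window centred on the viewpoint (50 kb on each side for a 100kb window) are assumptions made for illustration; the source does not specify how the window is placed or how coverage was computed.

```python
from typing import Iterable, Tuple


def covered_fraction(
    intervals: Iterable[Tuple[int, int]],
    viewpoint: int,
    flank: int = 50_000,
) -> float:
    """Fraction of bases in [viewpoint - flank, viewpoint + flank) that are
    covered by at least one interval.

    Intervals are half-open (start, end) positions on one chromosome.
    Centring the window on the viewpoint is an assumption of this sketch.
    """
    start, end = viewpoint - flank, viewpoint + flank
    # Keep only intervals overlapping the window, clipped to its bounds.
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in intervals
        if e > start and s < end
    )
    covered = 0
    cur_s = cur_e = None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            # Gap before this interval: close the running merged block.
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            # Overlapping or adjacent: extend the running merged block.
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return covered / (end - start)
```

Overlapping reads are merged before counting, so a base covered many times contributes once, which matches the idea of a "fraction of covered bases" rather than a read depth.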

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Genetics & Genomics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention provides a method for permeabilizing a sample comprising cross-linked polynucleotides, the method comprising incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours.

Description

METHOD
FIELD OF THE INVENTION
The present invention relates to the field of molecular biology. In particular, the present invention relates to methods for enzymatically treating polynucleotides from preserved, cross-linked tissue samples - such as formalin-fixed, paraffin-embedded samples (FFPE). In particular embodiments, the invention relates to the sequencing of DNA, for example to strategies for determining a DNA sequence of a genomic region of interest. In particular the invention relates to the determination of the sequence of parts of a genome that are in a spatial configuration with each other. The invention also provides methods that enable the recovery of cross-linked DNA and soluble RNA from a single starting sample.
BACKGROUND TO THE INVENTION
Preserved, cross-linked tissue samples - such as formalin-fixed, paraffin-embedded samples (FFPE) - are a common sample type in both clinical and research settings. By way of example, biopsies and surgical samples are often prepared as FFPE tissue blocks. Accordingly, these samples provide a notable resource for determining genetic information; for example to identify the presence of mutations from e.g. a tumour sample.
Examples of methods which may be performed on preserved, cross-linked tissue samples include proximity ligation sequencing methods in which crosslinked DNA fragments that originate from a genomic region of interest remain in proximity of each other because they are crosslinked. When these crosslinked DNA fragments are subsequently ligated, DNA fragments of the genomic region of interest, which are in the proximity of each other due to the crosslinks, are ligated. By determining (at least part of) the sequence of ligated fragments that comprise a fragment comprising a target nucleotide sequence, sequences of DNA fragments within the spatial surrounding of the genomic region of interest are obtained. Each individual target nucleotide sequence is likely to be crosslinked to multiple other DNA fragments. As a consequence, often more than one DNA fragment may be ligated to a fragment comprising the target nucleotide sequence. By combining (partial) sequences of the (amplified) ligated DNA fragments that were ligated with a fragment comprising the target nucleotide sequence, a sequence of the genomic region of interest may be built and the presence of mutations identified.
However, the fixation conditions applied during the production of such preserved, cross-linked samples are often harsh and result in heavy cross-linking, including the generation of heavily cross-linked DNA. This heavy cross-linking of DNA can interfere with downstream molecular analysis. In particular, it may interfere with and/or inhibit enzyme activity on the sample. There is thus a need for methods and approaches for processing preserved, cross-linked samples which provide polynucleotides (e.g. DNA and/or RNA) at suitable yields and integrity for downstream molecular analysis.
SUMMARY OF THE INVENTION
The present invention is based on the inventors’ surprising determination that the use of certain conditions for a permeabilization incubation prior to enzyme treatment improves the ability of the enzyme(s) to act on the cross-linked DNA. This is considered to result in higher quality data from downstream analysis. Without wishing to be bound by theory, it is considered that the permeabilization incubation improves the ability of enzymes to access the cross-linked DNA, thus improving the ability of the enzyme(s) to act on the DNA.
As used herein, “act on DNA” refers to enzymes that directly alter the structure of a DNA molecule itself, for example in contrast to enzymes which act on proteins associated with DNA. Examples of enzymes that act on DNA in the context of the present invention include restriction enzymes, S1 nuclease, DNaseI, DNA ligase, DNA nickase, and DNA polymerase.
Accordingly, in a first aspect, the present invention provides a method for determining at least part of the sequences of DNA fragments from a sample of fragmented cross-linked DNA; which comprises the following steps: a) providing a sample of fragmented cross-linked DNA; b) permeabilizing the sample by incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours; c) optionally, further fragmenting the cross-linked DNA; d) optionally, repairing the ends of DNA fragments; e) ligating the cross-linked DNA fragments; f) reversing the cross-linking; and g) determining at least part of the sequences of the DNA fragments.
In a further aspect, the present invention relates to a method for determining at least part of the sequences of DNA fragments from a sample of fragmented cross-linked DNA; which comprises the following steps: a) providing a sample of fragmented cross-linked DNA; b) permeabilizing the sample by incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours; c) optionally, further fragmenting the crosslinked DNA; d) optionally, repairing the ends of DNA fragments to facilitate ligation; e) ligating the fragmented crosslinked DNA; f) reversing the crosslinking; g) optionally fragmenting the DNA of step f); h) optionally, ligating the fragmented DNA of step f) or g) to at least one adaptor; i) optionally, (1) amplifying the ligated DNA fragments of step f) or g) comprising the target nucleotide sequence using at least one primer which hybridises to the target nucleotide sequence, or amplifying the ligated DNA fragments of step g) using at least one oligonucleotide primer which hybridises to the target nucleotide sequence and at least one oligonucleotide primer which hybridises to the at least one adaptor and/or (2) capturing the ligated DNA fragments of step f) to h) comprising the target nucleotide sequence using a capture probe to separate (amplified) ligated DNA fragments comprising the target nucleotide sequence from (amplified) ligated DNA fragments not comprising the target nucleotide sequence; j) determining at least part of the sequence of the (amplified) ligated DNA fragments of step f), g), h) or i) comprising the target nucleotide sequence preferably using high throughput sequencing.
In a further aspect, the invention provides a method for determining the presence or absence of a mutation in a genomic region of interest comprising a target nucleotide sequence, comprising the steps of: a) providing a sample of fragmented cross-linked DNA; b) permeabilizing the sample by incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours; c) optionally, further fragmenting the crosslinked DNA; d) optionally, repairing the ends of DNA fragments; e) ligating the fragmented crosslinked DNA; f) reversing the crosslinking; g) optionally fragmenting the DNA of step f); h) optionally, ligating the fragmented DNA of step f) or g) to at least one adaptor; i) optionally, (1) amplifying the ligated DNA fragments of step f) or g) comprising the target nucleotide sequence using at least one primer which hybridises to the target nucleotide sequence, or amplifying the ligated DNA fragments of step g) using at least one oligonucleotide primer which hybridises to the target nucleotide sequence and at least one oligonucleotide primer which hybridises to the at least one adaptor and/or (2) capturing the ligated DNA fragments of step f) to h) comprising the target nucleotide sequence using a capture probe to separate (amplified) ligated DNA fragments comprising the target nucleotide sequence from (amplified) ligated DNA fragments not comprising the target nucleotide sequence; j) determining at least part of the sequence of the (amplified) ligated DNA fragments of step f), g), h) or i) comprising the target nucleotide sequence preferably using high throughput sequencing; k) aligning the determined sequences of the (amplified) ligated DNA fragments to a reference sequence; l) identifying the presence or absence of a genetic mutation in the determined sequences.
In a further aspect, the present invention also provides methods that enable the simultaneous recovery of cross-linked DNA and soluble RNA from a single starting sample, such as a FFPE sample.
Accordingly, in one aspect the present invention provides a method for recovering soluble RNA and cross-linked DNA from a sample comprising cross-linked cells or tissue; which method comprises a step of permeabilizing the sample by an incubation at about 75 °C to about 100 °C for about 1 minute to about 4 hours.
In some embodiments, the cross-linked DNA may subsequently be used in a method for determining at least part of the sequences of DNA fragments from the cross-linked DNA or a method for determining the presence or absence of a mutation in a genomic region of interest; as described herein.
Such methods may be advantageous, for example, in allowing recovery of cross-linked DNA and soluble RNA from a single starting sample, in particular in the same step of a method. This may be particularly advantageous for FFPE samples, for example. As noted above, FFPE samples represent a precious resource from both clinical and research settings. The ability to recover RNA and cross-linked DNA from a FFPE sample may therefore provide particular advantages by enabling the sequencing and analysis of both the cross-linked DNA and corresponding RNA from the same sample. Further, the ability to analyze both RNA and cross-linked DNA from a single starting sample may be advantageous in enabling the expression of genes with identified mutations (e.g. SNVs or structural changes) to be determined.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1: Summary of an exemplary method
Figure 2: For two FFPE samples subjected to the indicated permeabilization conditions an increase in the fraction of covered nucleotides in a 100kb region surrounding the viewpoint was observed for both the 80 and 90°C incubations compared to the aliquots incubated at 37 or 65°C. Aliquots of Sample 1 were subjected to 37, 65 and 80°C incubations for both 30 (blue) and 60 minutes (orange); for Sample 2 an additional 90°C incubation was included. The fraction of covered nucleotides in the 100kb region surrounding the location of the primer set is shown on the y-axis.
Figure 3: Longer permeabilization incubation at high temperatures increases enzyme accessibility. Aliquots of Sample 1 and 2 were subjected to 30, 60, 120, 150, 180, 210 and 240 minutes incubation at 80°C; aliquots of Sample 3 were subjected to an incubation for 15, 30, 45 and 60 minutes at 90°C. The fraction of covered nucleotides in the 100kb region surrounding the location of the primer set is shown on the y-axis.
Figure 4: The effect of the permeabilization time for 30, 60, 120, 180 and 240 minutes at 80°C on various FFPE sample qualities.
Figure 5: DNA yields (ng) retrieved from each experimental condition for three different samples. Stars refer to the conditions that are best performing when considering the performance of the three FFPE samples from different qualities.
Figure 6: The effect of the permeabilization time and temperature on the fraction of covered nucleotides in the target region. Stars refer to the conditions that are best performing when considering the performance of the three different FFPE samples.
Figure 7: Relative RNA Yield in Supernatant from FFPE Samples. Normalised to RNA yield in unincubated samples to determine relative increase in RNA release. Four different FFPE samples were used to allow for averages to be calculated.
Figure 8: Quality Analysis of RNA isolated from Samples Subjected to Different Incubation Conditions. A. Scale diagram of the size and distance between three different primer pairs from the TBP RNA. Three different primer pairs were designed to be located at different distances from the poly A tail of the RNA. The further the primer pair is from the poly A tail, the more susceptible to degradation the RNA is. B. 15 minute incubation. C. 30 minute incubation. D. 60 minute incubation. E. 120 minute incubation. ΔCt is TBP2-TBP1 or TBP3-TBP1. Fold change was calculated using the formula 2^(ΔCt). Two FFPE samples were used, each in triplicate. The greater the difference in amplification, the worse the quality of template RNA.
Figure 9: Overview of the workflow of an illustrative method of the present invention. Graphical overview of the methods used to integrate an RNA isolation step into the FFPE-TLA protocol.
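The fold-change calculation given in the RNA quality caption (ΔCt taken as the Ct of a distal amplicon, TBP2 or TBP3, minus the Ct of TBP1, and fold change as 2 raised to the power ΔCt) can be sketched as follows. The function names and the Ct values in the usage lines are hypothetical and are not data from the experiments.

```python
def delta_ct(ct_distal: float, ct_tbp1: float) -> float:
    """ΔCt as defined in the caption: TBP2 - TBP1 or TBP3 - TBP1."""
    return ct_distal - ct_tbp1


def fold_change(d_ct: float) -> float:
    """Fold difference in amplification, computed as 2^(ΔCt)."""
    return 2.0 ** d_ct


# Hypothetical Ct values, for illustration only.
example_delta = delta_ct(ct_distal=30.0, ct_tbp1=28.0)
example_fold = fold_change(example_delta)
```

With these illustrative inputs ΔCt is 2 and the fold change is 4; under the caption's interpretation, a larger value indicates a greater difference in amplification and therefore poorer template RNA quality.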
DETAILED DESCRIPTION OF THE INVENTION
PERMEABILIZATION
The present methods comprise a step of permeabilizing a sample of cross-linked DNA by incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours prior to enzyme treatment.
It has been surprisingly shown that including a permeabilization step according to the present invention prior to enzyme treatment improves the quality of data that may be generated from proximity ligation-based sequencing methods, in particular when applied to a FFPE sample.
Further, the use of a permeabilization step according to the present invention allows recovery of RNA and cross-linked DNA from the same starting sample, in particular recovery of RNA and cross-linked DNA from the same starting FFPE sample. This allows for the integration of an RNA isolation procedure into, for example, a proximity ligation protocol on an FFPE sample without sacrificing the quality of data extracted from the DNA sequences.
Suitably, the methods may comprise incubating the sample at about 75 °C to about 100 °C, about 78 °C to about 100 °C, about 80 °C to about 100 °C, about 80 °C to about 95 °C, about 80 °C to about 90 °C, or about 85 °C to about 90 °C.
Suitably, the methods may comprise incubating the sample at about 78 °C, about 79 °C, about 80 °C, about 81 °C, about 82 °C, about 83 °C, about 84 °C, or about 85 °C.
Suitably, the methods may comprise incubating the sample at about 84 °C, about 85 °C, about 86 °C, about 87 °C, about 88 °C, about 89 °C, about 90 °C, about 91 °C, about 92 °C, about 93 °C, about 94 °C, about 95 °C, about 96 °C, about 97 °C, about 98 °C, about 99 °C or about 100 °C.
Suitably, the methods may comprise incubating the sample at about 84 °C, about 85 °C, about 86 °C, about 87 °C, about 88 °C, about 89 °C, about 90 °C, about 91 °C, about 92 °C, about 93 °C, about 94 °C, or about 95 °C. Suitably, the methods may comprise incubating the sample at about 84 °C, about 85 °C, about 86 °C, about 87 °C, about 88 °C, about 89 °C, or about 90 °C.
Suitably, the methods may comprise incubating the sample at about 80 °C.
Suitably, the methods may comprise incubating the sample at about 90 °C.
Suitably, the methods may comprise incubating the sample at about 95 °C.
Suitably, the methods may comprise incubating the sample at about 98 °C.
The present methods may comprise incubating the sample for about 1 minute to about 4 hours. The present methods may comprise incubating the sample for about 30 minutes to about 4 hours. The present methods may comprise incubating the sample for about 30 minutes to about 3 hours. The present methods may comprise incubating the sample for about 1 hour to about 3 hours.
The present methods may comprise incubating the sample for about 2 hours to about 3 hours.
The present methods may comprise incubating the sample for about 2 hours or about 3 hours.
The present methods may comprise incubating the sample for about 2 hours.
The present methods may comprise incubating the sample for about 3 hours.
The present methods may comprise incubating the sample for about 30 minutes to about 1 hour. The present methods may comprise incubating the sample for about 30 minutes. The present methods may comprise incubating the sample at about 75 °C to about 100 °C for about 30 minutes to about 4 hours.
The present methods may comprise incubating the sample at about 75 °C to about 85 °C for about 1 hour to about 3 hours. The present methods may comprise incubating the sample at 75 °C to about 85 °C for about 2 hours to about 3 hours.
The present methods may comprise incubating the sample at about 75 °C to about 85 °C for about 30 minutes to about 1 hour. The present methods may comprise incubating the sample at about 75 °C to about 85 °C for about 30 minutes.
The present methods may comprise incubating the sample at 75 °C to about 85 °C for about 3 hours.
The present methods may comprise incubating the sample at 75 °C to about 85 °C for about 2 hours.
The present methods may comprise incubating the sample at about 78 °C to about 83 °C for about 30 minutes to about 4 hours. The present methods may comprise incubating the sample at about 78 °C to about 83 °C for about 1 hour to about 4 hours. The present methods may comprise incubating the sample at 78 °C to about 83 °C for about 2 hours to about 3 hours. The present methods may comprise incubating the sample at 78 °C to about 83 °C for about 3 hours. Preferably, the present methods may comprise incubating the sample at 78 °C to about 83 °C for about 2 hours.
The present methods may comprise incubating the sample at about 78 °C to about 83 °C for about 30 minutes to about 1 hour. The present methods may comprise incubating the sample at about 78 °C to about 83 °C for about 30 minutes.
The present methods may comprise incubating the sample at about 80 °C for about 30 minutes to about 4 hours. The present methods may comprise incubating the sample at about 80 °C for about 30 minutes to about 3 hours. The present methods may comprise incubating the sample at about 80 °C for about 1 hour to about 3 hours. The present methods may comprise incubating the sample at about 80 °C for about 2 hours to about 3 hours. The present methods may comprise incubating the sample at about 80 °C for about 2 hours or about 3 hours.
The present methods may comprise incubating the sample at about 80 °C for about 3 hours.
The present methods may comprise incubating the sample at about 80 °C for about 30 minutes to about 1 hour. The present methods may comprise incubating the sample at about 80 °C for about 30 minutes.
The present methods may comprise incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours. The present methods may comprise incubating the sample at about 75 °C to about 100 °C for about 30 minutes to about 4 hours. The present methods may comprise incubating the sample at about 75 °C to about 100 °C for about 30 minutes to about 2 hours. The present methods may comprise incubating the sample at about 75 °C to about 100 °C for about 30 minutes to about 1 hour.
The present methods may comprise incubating the sample at about 75 °C to about 90 °C for about 30 minutes to about 4 hours. The present methods may comprise incubating the sample at about 75 °C to about 90 °C for about 30 minutes to about 2 hours. The present methods may comprise incubating the sample at about 75 °C to about 90 °C for about 30 minutes to about 1 hour. The present methods may comprise incubating the sample at about 75 °C to about 90 °C for about 1 hour.
The present methods may comprise incubating the sample at about 80 °C to about 100 °C for about 30 minutes to about 1 hour. The present methods may comprise incubating the sample at about 85 °C to about 100 °C for about 30 minutes to about 1 hour. The present methods may comprise incubating the sample at about 90 °C to about 100 °C for about 30 minutes to about 1 hour.
The present methods may comprise incubating the sample at about 80 °C to about 90 °C for about 30 minutes to about 1 hour. The present methods may comprise incubating the sample at about 85 °C to about 90 °C for about 30 minutes to about 1 hour.
Without wishing to be bound by theory, lower temperatures encompassed by the present methods may be most effective when used with a longer time period. For example, the present methods may comprise incubating the sample at about 75 °C to about 85 °C for about 1 hour to about 3 hours. Preferably, the present methods may comprise incubating the sample at 75 °C to about 85 °C for about 2 hours to about 3 hours.
Suitably, the present methods may comprise incubating the sample at about 90 °C for about 1 minute to about 2 hours. Such embodiments, with a relatively high incubation temperature (e.g. about 90 °C to about 100 °C) for a relatively short time period (e.g. about 1 minute to about 1 hour) may provide a particular advantage of allowing a suitable permeabilization within a shorter incubation period and thus contribute to shortening the total process time.
In certain embodiments, the present invention provides a particularly high incubation temperature (e.g. about 95 °C to about 100 °C) for a very short time period (e.g. about 1 minute, about 5 minutes, about 10 minutes, about 15 minutes, or about 30 minutes). Such embodiments may provide an advantage of allowing a suitable permeabilization within an even shorter incubation period and thus contribute to shortening the total process time.
In such embodiments, the method may comprise incubating the sample at about 90 °C to about 100 °C for about 1 minute to about 2 hours. Preferably, in such embodiments, the method may comprise incubating the sample at about 90 °C to about 100 °C for about 1 minute to about 1 hour. Even more preferably, in such embodiments the method may comprise incubating the sample at about 90 °C to about 100 °C for about 1 minute to about 30 minutes.
The method may comprise incubating the sample at about 90 °C for about 5 minutes to about 2 hours. For example, the method may comprise incubating the sample at about 90 °C for about 5 minutes to about 90 minutes, about 5 minutes to about 60 minutes or about 5 minutes to about 30 minutes. Preferably, the method may comprise incubating the sample at about 90 °C for about 30 minutes.
The method may comprise incubating the sample at about 95 °C for about 1 minute to about 1 hour. For example, the method may comprise incubating the sample at about 95 °C for about 1 minute to about 30 minutes, about 5 minutes to about 30 minutes or about 15 minutes to about 30 minutes. Preferably the method may comprise incubating the sample at about 95 °C for about 20 minutes.
The method may comprise incubating the sample at about 98 °C for about 1 minute to about 45 minutes. For example, the method may comprise incubating the sample at about 98 °C for about 10 to 30 minutes, or about 10 minutes to about 20 minutes. Preferably, the method may comprise incubating the sample at about 98 °C for about 12 minutes.
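The temperature-specific incubation windows stated above (about 80 °C, 90 °C, 95 °C and 98 °C, each with a broad range and a preferred time) can be collected into a small lookup, sketched below. The names `Window`, `CONDITIONS` and `check_incubation` are hypothetical, and treating every "about" value as an exact bound and keying on whole-degree temperatures are simplifying assumptions of this sketch, not part of the described method.

```python
from typing import NamedTuple


class Window(NamedTuple):
    min_minutes: float
    max_minutes: float
    preferred_minutes: float


# Ranges and preferred times as stated in the text; "about" is treated
# as an exact bound here, which is an assumption of this sketch.
CONDITIONS = {
    80: Window(30, 240, 120),  # about 30 min to about 4 h; about 2 h preferred
    90: Window(5, 120, 30),    # about 5 min to about 2 h; about 30 min preferred
    95: Window(1, 60, 20),     # about 1 min to about 1 h; about 20 min preferred
    98: Window(1, 45, 12),     # about 1 min to about 45 min; about 12 min preferred
}


def check_incubation(temp_c: int, minutes: float) -> str:
    """Classify a (temperature, time) pair against the table above."""
    window = CONDITIONS.get(temp_c)
    if window is None:
        return "no entry"
    if not window.min_minutes <= minutes <= window.max_minutes:
        return "outside range"
    if minutes == window.preferred_minutes:
        return "preferred"
    return "within range"
```

For example, `check_incubation(95, 20)` returns `"preferred"`, while `check_incubation(98, 60)` returns `"outside range"` because 60 minutes exceeds the stated 45-minute upper bound at about 98 °C.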
The method may comprise incubating the sample under conditions that may be advantageous for RNA quantity and/or quality recovery.
Suitably, the permeabilization step may comprise incubating the sample at about 90 °C to 95 °C for about 15 to 60 minutes.
Suitably, the permeabilization step may comprise incubating the sample at about 90 °C for about 15 to 60 minutes, preferably about 90 °C for about 30 to 60 minutes.
The present methods may provide particularly advantageous permeabilization conditions based on the quality of the cross-linked DNA in the sample, in particular in a FFPE sample.
In particular, for samples determined to comprise “average-quality DNA” or “poor-quality DNA” a permeabilization incubation from about 1 hour to about 3 hours according to the invention may be performed.
For samples determined to comprise “high-quality DNA” a permeabilization incubation of less than about 1 hour at about 78 °C to about 85 °C, for example about 30 minutes to 1 hour, for example about 30 minutes according to the invention may be performed.
“DNA quality” as used herein may be determined based on the average DNA fragment size of a cross- linked sample prior to any optional further fragmentation steps as described herein. Average DNA fragment size may be determined using methods which are known in the art, for example agarose gel electrophoresis, Tape station, bioanalyzer or devices using similar technology. A high-quality sample may have an average DNA fragment size of greater than 3000 base pairs. A high-quality sample may have an average DNA fragment size of greater than 5000 base pairs.
A high-quality sample may have an average DNA fragment size of between about 3000 base pairs and about 10000 base pairs, between about 5000 base pairs and about 10000 base pairs or between about 8000 base pairs and about 10000 base pairs.
A high-quality sample may have an average DNA fragment size of about 3000 base pairs, about 5000 base pairs, about 8000 base pairs, or about 10000 base pairs.
An average-quality sample may have an average DNA fragment size of less than 3000 base pairs.
An average-quality sample may have an average DNA fragment size of between about 1500 base pairs and about 3000 base pairs, between about 800 base pairs and about 3000 base pairs or between about 450 base pairs and about 3000 base pairs.
An average-quality sample may have an average DNA fragment size of about 3000 base pairs, about 1500 base pairs, about 800 base pairs, or about 450 base pairs.
A low-quality sample may have an average DNA fragment size of less than about 450 base pairs.
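The size thresholds and incubation windows described above can be summarised as a simple lookup. The following is a minimal sketch only: the function name, the dictionary and the treatment of the "about" cut-offs as sharp boundaries (with exactly 3000 base pairs counted as high quality) are illustrative assumptions, not part of the described method.

```python
# Exemplary permeabilization windows in minutes, taken from the ranges given
# in the text; treating "about" values as exact limits is an assumption.
PERMEABILIZATION_MINUTES = {
    "high": (30, 60),      # less than about 1 hour, e.g. 30 minutes to 1 hour
    "average": (60, 180),  # about 1 hour to about 3 hours
    "low": (60, 180),      # about 1 hour to about 3 hours
}


def classify_dna_quality(avg_fragment_bp: float) -> str:
    """Classify a cross-linked sample by its average DNA fragment size (bp).

    Assumed boundaries: < 450 bp is low, 450 to < 3000 bp is average,
    and 3000 bp or more is high.
    """
    if avg_fragment_bp <= 0:
        raise ValueError("average fragment size must be positive")
    if avg_fragment_bp < 450:
        return "low"
    if avg_fragment_bp < 3000:
        return "average"
    return "high"


if __name__ == "__main__":
    for size in (200, 1500, 8000):
        quality = classify_dna_quality(size)
        print(size, quality, PERMEABILIZATION_MINUTES[quality])
```

A user who prefers not to measure fragment size at all could instead apply the single broadly applicable condition described in the following paragraph.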
Suitably, the present methods may comprise incubating the sample at about 78 °C to about 85 °C, for example at about 80 °C to about 85 °C for about 1 hour to about 2 hours. Such embodiments may provide a particular advantage of allowing a suitable permeabilization of a range of different sample types (e.g. samples with different DNA quality). These conditions may therefore be broadly applied by a user without requiring an additional step to determine the DNA quality of the sample before performing the permeabilization. This may be particularly advantageous in allowing a standard set of conditions to be applied within a process.
Suitably, the present permeabilization step may be performed in a detergent buffer, for example in a buffer comprising sodium dodecyl sulfate (SDS). The buffer may comprise 0.1-0.3% SDS, for example. Triton-X or NP-40 may subsequently be used to sequester the SDS used in the permeabilization step.
Suitably, the present permeabilization step does not result in cell lysis. Suitably, a cell lysis step refers to an incubation which results in the polynucleotides, in particular genomic DNA and more particularly cross-linked genomic DNA, for subsequent method steps being released as soluble polynucleotides in the resulting supernatant when a standard centrifugation is performed immediately following the incubation step (e.g. 10 minutes at 13,000 rpm at 4°C). In contrast, suitably the polynucleotides, in particular genomic DNA and more particularly cross-linked genomic DNA, for subsequent method steps are retained in the pellet when a standard centrifugation is performed immediately following the present permeabilization incubation step (e.g. 10 minutes at 13,000 rpm at 4°C).
Preferably, in the context of the present invention, a cell lysis step refers to an incubation that results in genomic DNA, particularly cross-linked genomic DNA, for subsequent method steps being released as soluble polynucleotides in the resulting supernatant when a standard centrifugation is performed immediately following the incubation step (e.g. 1-10 minutes at 10,000-13,000 rpm at 4°C). In contrast, in the present methods genomic DNA, and more particularly cross-linked genomic DNA, for subsequent method steps is retained in the pellet when a standard centrifugation is performed immediately following the incubation step (e.g. 1-10 minutes at 10,000-13,000 rpm at 4°C).
Suitably, following the permeabilization step of the present invention at least one DNA-acting enzyme as defined herein may be applied to the sample prior to any proteinase treatment (e.g. proteinase K treatment). Accordingly, it is an advantage of the present invention that it may allow DNA-acting enzymes to be applied to the sample prior to a proteinase (e.g. proteinase K) treatment. Without wishing to be bound by theory, it is considered that this feature may be advantageous because proteinase K treatment may cause DNA fragments to be released from the insoluble sample by degrading the proteins to which the DNA is cross-linked. For a proximity ligation to take place it is important for fragments to remain (at least partially) intact and remain connected via crosslinks - as achieved by the parameters of the present permeabilization step. This retention of crosslinks is important to provide the necessary complexity of cross-linked DNA fragments for effective proximity ligation analysis. Thus, the present permeabilization step is considered particularly effective because it retains the necessary complexity of DNA/protein cross-links whilst improving the ability of DNA-acting enzymes to access the cross-linked DNA. Suitably, the permeabilization step of the present invention does not include a proteinase (e.g. proteinase K) treatment. In particular, the permeabilization step of the present invention does not comprise the addition/presence of a proteinase (e.g. proteinase K) to the sample prior to or during the permeabilization incubation.
Further, previous methods for recovery of RNA from a cross-linked sample typically comprise a step of adding a proteinase (e.g. proteinase K). This typically involves the addition of proteinase K at concentrations/conditions that result in DNA molecules that were connected via a protein bridge being separated. Thus standard proteinase K treatments, which are not required in the present methods, contribute to an incompatibility with co-recovery of cross-linked DNA and soluble RNA. Such a proteinase K treatment step is not required in the methods of the present invention.
SAMPLE
A sample of cross-linked polynucleotides (e.g. DNA and/or RNA) refers to a sample comprising polynucleotides (e.g. DNA and/or RNA) which has been subjected to cross-linking. Samples may be taken from a patient and/or from diseased tissue, and may also be derived from other organisms or from separate sections of the same organism, such as samples from one patient, one sample from healthy tissue and one sample from diseased tissue. Samples may thus be analysed according to the invention and compared with a reference sample, or different samples may be analysed and compared with each other. For example, from a patient suspected of having breast cancer, a biopsy may be obtained from the suspected tumour. Another biopsy may be obtained from non-diseased tissue. Both tissue biopsies may be analysed according to the invention. Genomic regions of interest may be the BRCA1 and BRCA2 genes, which are 83 and 86 kb long. By determining the genomic region of interest sequence according to the invention and comparing the genomic region sequences of the different biopsies with each other and/or with a reference BRCA gene sequence, genetic mutations may be found that will assist in diagnosing the patient and/or determining treatment of the patient and/or predicting prognosis of disease progression.
Suitably, the present methods are for use with samples that have undergone a heavy cross-linking procedure (e.g. a fixation procedure).
Examples of suitable cross-linked samples are known in the art and include, but are not limited to, samples cross-linked with formalin or formaldehyde.
Preferably, the sample is a formalin cross-linked sample. Suitably, the sample may be a paraffin embedded sample. In particular, the sample may be a Formalin-Fixed Paraffin-Embedded (FFPE) sample.
The sample may be a tissue sample. The sample may be a tumour sample.
Suitably, the sample may be a FFPE tumour sample.
The sample may be a slice or a puncture from a FFPE sample.
Suitably, the cross-linked polynucleotides may be cross-linked DNA. Suitably, the cross-linked DNA may be part of a chromatin complex.
A “sample of crosslinked DNA” is a sample DNA which has been subjected to crosslinking. Crosslinking the sample DNA has the effect that the three-dimensional state of the DNA within the sample remains largely intact. This way, DNA strands that are in physical proximity of each other remain in each other's vicinity. Suitably, the sample is not placed on a flat surface or a solid supporting base during the present permeabilization step. For example, the sample is not placed on a flat surface of a solid supporting base during the permeabilization step. Suitably, the sample is suspended in solution during the permeabilization step of the present invention.
PARAFFIN REMOVAL
Embodiments of the present methods in which the sample is paraffin embedded, for example an FFPE sample, may comprise an initial paraffin removal step. Suitable methods for paraffin removal are known in the art and include, for example, xylene treatment, sonication and/or heating the sample for a short time period (e.g. 80°C for around 3 minutes).
The methods may also comprise a sonication step prior to the present permeabilization step. Such a sonication step may be particularly advantageous in embodiments where soluble RNA is to be isolated following the permeabilization, as it may further aid release of the RNA. Methods and techniques for suitable sonication are well known in the art. Suitably, the sonication step is a mild sonication. For example, the sonication should not result in cell lysis - as defined herein. The aim of the sonication step is to add energy/heat to dissolve and solubilize any leftover paraffin and disrupt the tissue. The DNA and/or RNA should essentially not be sheared by the sonication. Further, the tissue should essentially not be solubilized by the sonication. The tissue should essentially disperse into smaller insoluble pieces. An exemplary sonication step uses a Covaris M220 for 300 seconds, Duty factor 20%, Power 75 Watts, 200 cycles/burst at 20°C.
RNA RECOVERY
In some aspects, the present invention provides methods that enable the simultaneous recovery of cross-linked DNA and soluble RNA from a single starting sample, such as a FFPE sample.
Accordingly, in one aspect the present invention provides a method for recovering soluble RNA and cross-linked DNA from a sample comprising cross-linked cells or tissue; which method comprises a step of permeabilizing the sample by an incubation at about 75 °C to about 100 °C for about 1 minute to about 4 hours.
The RNA may be separated from the cross-linked DNA following the permeabilization step. The separation may be achieved, for example, using standard centrifugation parameters for separating soluble and insoluble components. For example, centrifugation may be performed for 1-10 minutes at 10,000-13,000 g.
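Centrifugation settings appear in this description both as rotor speed (rpm) and as relative centrifugal force (g). The two are related through the rotor radius by the standard relation RCF = 1.118 x 10^-5 x r x rpm^2, with r in centimetres. The following is a minimal sketch of that conversion; the 7 cm rotor radius is a hypothetical value chosen for illustration and is not specified in the text.

```python
import math

# Standard conversion factor for a radius in cm and a speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius = 7.0  # hypothetical rotor radius in cm
    print(round(rcf_from_rpm(13_000, radius)), "x g at 13,000 rpm")
    print(round(rpm_from_rcf(10_000, radius)), "rpm for 10,000 x g")
```

Because the result depends on the rotor, the radius of the instrument actually used should be substituted before relying on either figure.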
The recovered RNA may be purified (e.g. following its removal in the supernatant after centrifugation) using known RNA purification methods, including, for example, commercially available RNA purification kits. Such methods may comprise the use of magnetic beads or spin columns, for example.
The recovered or purified RNA may be subject to standard RNA analysis methods including, for example, amplification and/or sequencing of the RNA. Such methods are well-known in the art and include, but are not limited to, reverse-transcription PCR, quantitative PCR and next generation sequencing (RNA-Seq).
In some embodiments, the cross-linked DNA isolated from the same starting sample as the RNA may subsequently be used in a method for determining at least part of the sequences of DNA fragments from the cross-linked DNA or a method for determining the presence or absence of a mutation in a genomic region of interest; as described herein.
DETERMINING THE SEQUENCE AND/OR PRESENCE OF A MUTATION
Methods for determining at least part of the sequences of DNA fragments and determining the presence or absence of a mutation in a genomic region of interest are described in WO 2012/005595; for example.
A “genomic region of interest” according to the invention is a DNA sequence of an organism of which it is desirable to determine, at least part of, the DNA sequence. For instance, a genomic region which is suspected of comprising an allele associated with a disease may be a genomic region of interest. As used herein, the term “allele(s)” means any of one or more alternative forms of a gene at a particular locus. In a diploid cell of an organism, alleles of a given gene are located at a specific location, or locus (loci plural) on a chromosome. One allele is present on each chromosome of the pair of homologous chromosomes. Thus, in a diploid cell, two alleles and thus two separate (different) genomic regions of interest may exist.
A “nucleic acid” or “polynucleotide” as referred to herein may include any polymer or oligomer of pyrimidine and purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively. The present invention contemplates any deoxyribonucleotide, ribonucleotide or peptide nucleic acid component, and any chemical variants thereof, such as methylated, hydroxymethylated or glycosylated forms of these bases, and the like. The polymers or oligomers may be heterogeneous or homogenous in composition, and may be isolated from naturally occurring sources or may be artificially or synthetically produced. In addition, the nucleic acids may be DNA or RNA, or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states.
An “identifier” is a short sequence that can be added to an adaptor or a primer or included in its sequence or otherwise used as label to provide a unique identifier. Such a sequence identifier (or tag) can be a unique base sequence of varying but defined length, typically from 4-16 bp used for identifying a specific nucleic acid sample. For instance, 4 bp tags allow 4^4 = 256 different tags. Typical examples are ZIP sequences, known in the art as commonly used tags for unique detection by hybridization. Identifiers are useful according to the invention, as by using such an identifier, the origin of a (PCR) sample can be determined upon further processing. In the case of combining processed products originating from different nucleic acid samples, the different nucleic acid samples may be identified using different identifiers. For instance, as according to the invention sequencing may be performed using high throughput sequencing, multiple samples may be combined. Identifiers may then assist in identifying the sequences corresponding to the different samples. Identifiers may also be included in adaptors for ligation to DNA fragments assisting in DNA fragment sequences identification. Identifiers preferably differ from each other by at least two base pairs and preferably do not contain two identical consecutive bases to prevent misreads. The identifier function can sometimes be combined with other functionalities such as adaptors or primers.
By the terms “aligning” and “alignment” is meant the comparison of two or more nucleotide sequences based on the presence of short or long stretches of identical or similar nucleotides. Methods and computer programs for alignment are well known in the art.
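The identifier preferences stated above (tags differing from each other by at least two bases and containing no two identical consecutive bases) can be checked programmatically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the greedy selection is just one simple way to obtain a compatible tag set; it does not guarantee the largest possible set.

```python
from itertools import product
from typing import List

BASES = "ACGT"


def has_repeated_neighbour(tag: str) -> bool:
    """True if the tag contains two identical consecutive bases."""
    return any(a == b for a, b in zip(tag, tag[1:]))


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pick_identifiers(length: int, min_distance: int = 2) -> List[str]:
    """Greedily choose tags of the given length that have no repeated
    neighbouring base and differ pairwise in at least min_distance positions."""
    chosen: List[str] = []
    for candidate in map("".join, product(BASES, repeat=length)):
        if has_repeated_neighbour(candidate):
            continue
        if all(hamming(candidate, tag) >= min_distance for tag in chosen):
            chosen.append(candidate)
    return chosen


if __name__ == "__main__":
    print(len(BASES) ** 4, "possible 4-base tags before filtering")
    tags = pick_identifiers(4)
    print(len(tags), "tags selected, for example", tags[:5])
```

Requiring a distance of at least two means that a single sequencing error in a tag cannot turn it into another valid tag.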
FRAGMENTING
“Fragmenting” includes any technique that, when applied to polynucleotides, in particular DNA which may be crosslinked DNA or not, results in fragments. Techniques well known in the art are sonication, shearing and/or enzymatic restriction, but other techniques can also be envisaged.
DNA is cross-linked during the preparation of FFPE samples, in particular because of the formalin fixation performed. Further, the preparation of FFPE samples results in fragmentation of the DNA. The DNA extracted from an FFPE sample may have an average fragment size of around 50 bases, around 100 bases, around 200 bases, around 400 bases, around 800 bases, around 1500 bases, around 3000 bases, around 8000 bases or around 10000 bases.
Accordingly, further fragmentation steps of the present methods may be optional. For example, in embodiments where the sample is an FFPE sample any further fragmentation step may be optional.
In particular, by fragmenting a sample of crosslinked DNA, the DNA fragments that originate from a genomic region of interest remain in proximity of each other because they are crosslinked. When these crosslinked DNA fragments are subsequently ligated, DNA fragments of the genomic region of interest, which are in the proximity of each other due to the crosslinks, are ligated. This type of ligation may also be referred to as proximity ligation. DNA fragments comprising the target nucleotide sequence may ligate with DNA fragments within a large linear distance on sequence level. By determining (at least part of) the sequence of ligated fragments that comprise the fragment comprising a target nucleotide sequence, sequences of DNA fragments within the spatial surrounding of the genomic region of interest are obtained. Each individual target nucleotide sequence is likely to be crosslinked to multiple other DNA fragments. As a consequence, often more than one DNA fragment may be ligated to a fragment comprising the target nucleotide sequence. By combining (partial) sequences of the (amplified) ligated DNA fragments that were ligated with a fragment comprising the target nucleotide sequence, a sequence of the genomic region of interest may be built. A DNA fragment ligated with the fragment comprising the target nucleotide sequence includes any fragment which may be present in ligated DNA fragments.
The sample for use in the present methods comprises fragmented polynucleotides, in particular fragmented DNA. In addition the samples of crosslinked DNA may be further fragmented in embodiments of the present invention. By fragmenting the crosslinked DNA, DNA fragments are produced which are held together by the crosslinks.
The fragmenting may comprise sonication, and may be followed by enzymatic DNA end repair. Sonication results in the fragmenting of DNA at random sites, producing ends which can be either blunt or can have 3’- or 5’-overhangs. As these DNA breakage points occur randomly, the DNA may be repaired (enzymatically), filling in possible 3’- or 5’-overhangs, such that DNA fragments are obtained which have blunt ends that allow ligation of the fragments to adaptors and/or to each other in the subsequent step. Alternatively, the overhangs may also be made blunt ended by removing overhanging nucleotides, using e.g. exonucleases. The fragmenting may comprise fragmenting with one or more restriction enzymes, or combinations thereof. Fragmenting with a restriction enzyme is advantageous as it may allow control of the average fragment size. The fragments that are formed may have compatible overhangs or blunt ends that allow ligation of the fragments in the subsequent step. Suitably, the fragmenting may be performed using S1 nuclease to generate blunt ended fragments. Suitably, the fragmenting may be performed using DNase I. Furthermore, when dividing a sample of cross-linked DNA into a plurality of subsamples, for each subsample restriction enzymes with different recognition sites may be used. This is advantageous because by using different restriction enzymes having different recognition sites, different DNA fragments can be obtained from each subsample.
A “restriction endonuclease” or “restriction enzyme” is an enzyme that recognizes a specific nucleotide sequence (recognition site) in a double-stranded DNA molecule, and will cleave both strands of the DNA molecule at or near every recognition site, leaving a blunt or a 3’- or 5’- overhanging end. The specific nucleotide sequence which is recognized may determine the frequency of cleaving, e.g. a nucleotide sequence of 6 nucleotides occurs on average every 4096 nucleotides, whereas a nucleotide sequence of 4 nucleotides occurs much more frequently, on average every 256 nucleotides.
LIGATION
The present methods may comprise a “ligation” step. “Ligating” involves the joining of separate DNA fragments. The DNA fragments may be blunt ended, or may have compatible overhangs (sticky overhangs) such that the overhangs can hybridise with each other. The joining of the DNA fragments may be enzymatic, with a ligase enzyme, DNA ligase. However, a non-enzymatic ligation may also be used, as long as DNA fragments are joined, i.e. forming a covalent bond. Typically a phosphodiester bond between the hydroxyl and phosphate group of the separate strands is formed.
Since a fragment comprising a target nucleotide sequence may be crosslinked to multiple other DNA fragments, more than one DNA fragment may be ligated to the fragment comprising the target nucleotide sequence. This may result in combinations of DNA fragments which are in proximity of each other as they are held together by the cross links. Different combinations and/or order of the DNA fragments in ligated DNA fragments may be formed. In case the DNA fragments are obtained via enzymatic restriction, the recognition site of the restriction enzyme is known, which makes it possible to identify the fragments as remains of or reconstituted restriction enzyme recognition sites may indicate the separation between different DNA fragments. Irrespective of what fragmenting method is used, the ligation step may be performed in the presence of an adaptor, ligating adaptor sequences in between fragments. Alternatively the adaptor may be ligated in a separate step. This may be advantageous because the different fragments can be easily identified by identifying the adaptor sequences which are located in between the fragments.
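Where the fragments were produced by enzymatic restriction, the recognition sites remaining in a ligated product mark the boundaries between its constituent fragments. The following is a minimal sketch of that idea, assuming a hypothetical 4-base recognition site (`CATG`), exact string matching and an invented example sequence; real reads would additionally require handling of sequencing errors and partially reconstituted sites.

```python
from typing import List


def split_at_sites(ligated: str, site: str) -> List[str]:
    """Split a ligated sequence into candidate fragments at each occurrence
    of the recognition site. The site itself is not included in the
    returned fragments, and empty pieces are discarded."""
    if not site:
        raise ValueError("recognition site must not be empty")
    return [piece for piece in ligated.upper().split(site.upper()) if piece]


if __name__ == "__main__":
    read = "AAACATGTTTTCATGGG"  # invented ligated product
    print(split_at_sites(read, "CATG"))
```

Each returned piece can then be aligned separately to a reference to determine which genomic fragments were joined in the ligated product.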
The ligated DNA fragments may optionally be further fragmented, preferably with a restriction enzyme. The optional first and second fragmenting step may be aimed at obtaining ligated DNA fragments of a size which is compatible with the subsequent amplification step and/or sequence determination step. In addition, a second fragmenting step, preferably with an enzyme may result in ligated fragment ends which are compatible with the optional ligation of an adaptor. The second fragmenting step may be performed after reversing the crosslinking, however, it is also possible to perform the second fragmenting step and/or ligation step while the DNA fragments are still crosslinked.
In case the first and second fragmenting steps each comprise restriction enzymes, it is preferred that the restriction enzyme recognition site of the second fragmentation step is longer than the recognition site of the restriction enzyme used in the first fragmentation step. The second enzyme thus cuts at a lower frequency than the first enzyme. This means that the average DNA fragment size after the first fragmentation is smaller than the average fragment size obtained after the second fragmentation step. This way, in the first fragmenting step, relatively small fragments are formed, which are subsequently ligated. As the second restriction enzyme cuts less frequently, most of the DNA fragments may not comprise the restriction recognition site of the second restriction enzyme. Thus, when the ligated DNA fragments are subsequently fragmented in the second fragmentation step, many of the initial DNA fragments may remain intact. This is useful because the combined sequences of the initial DNA fragments may be used to build a contig for the genomic region of interest. If the first optional fragmenting step is less frequent than the second optional fragmenting step, the result would be that the initial fragments are generally further fragmented, which may result in the loss of relatively large DNA sequences that are useful for building a contig. Thus, irrespective of which method would be used for the first and second optional fragmenting steps, it is preferred that the first optional fragmenting step is more frequent as compared to the second optional fragmenting step, such that DNA fragments may largely remain intact, i.e. are largely not further fragmented in the second optional fragmentation step.
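The relationship between recognition site length and cutting frequency that underlies this preference can be illustrated numerically. The following is a minimal sketch, assuming a random sequence in which all four bases are equally frequent (real genomes deviate from this); the function names and the 100,000 bp region are hypothetical.

```python
def expected_spacing(site_length: int) -> int:
    """Expected average distance in bp between occurrences of a recognition
    site of the given length, assuming equal base frequencies."""
    if site_length < 1:
        raise ValueError("site length must be at least 1")
    return 4 ** site_length


def expected_cuts(region_bp: int, site_length: int) -> float:
    """Expected number of recognition sites within a region of region_bp."""
    return region_bp / expected_spacing(site_length)


if __name__ == "__main__":
    region = 100_000  # hypothetical region size in bp
    for n in (4, 6):
        print(n, "bp site: one site per", expected_spacing(n), "bp,",
              round(expected_cuts(region, n), 1), "expected sites in region")
```

Under this assumption a 6-base site is expected 16 times less often than a 4-base site, which is why most fragments produced by a 4-base cutter would not contain a 6-base cutter site.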
REVERSING CROSS-LINKING
“Reversing crosslinking” is used herein to refer to breaking the crosslinks such that the DNA that has been crosslinked is no longer crosslinked and is suitable for subsequent amplification and/or sequencing steps. For example, performing a proteinase K treatment on a sample DNA that has been crosslinked with formaldehyde will digest the protein present in the sample. Because the crosslinked DNA is connected indirectly via protein, the protease treatment in itself may reverse the crosslinking between the DNA. However, the protein fragments that remain connected to the DNA may hamper subsequent sequencing and/or amplification. Hence, reversing the connections between the DNA and the protein may also result in “reversing crosslinking”. The DNA-crosslinker-protein connection may be reversed through a heating step, for example by incubating at 70°C. As large amounts of protein are present in a sample DNA, it is often desirable to digest the protein with a protease in addition. Hence, any “reversing crosslinking” method may be contemplated wherein the DNA strands that are connected in a crosslinked sample become suitable for sequencing and/or amplification.
The reverse crosslinking results in a pool of ligated DNA fragments that comprise two or more fragments, preferably three or more. A subpopulation of the pool of ligated DNA fragments comprises a DNA fragment which comprises the target nucleotide sequence. By reversing the crosslinking, the structural/spatial fixation of the DNA is released and the DNA sequence becomes available for subsequent steps, e.g. amplification and/or sequencing, as crosslinked DNA may not be a suitable substrate for such steps. The subsequent steps may be performed after the reversal of the crosslinking, however, the ligation and optional second fragmentation steps may also be performed while the ligated DNA fragments are still in the crosslinked state.
ADAPTOR
An “adaptor” is a short double-stranded oligonucleotide molecule with a limited number of base pairs, e.g. about 10 to about 30 base pairs in length, which is designed such that it can be ligated to the ends of fragments. Adaptors are generally composed of two synthetic oligonucleotides which have nucleotide sequences which are partially complementary to each other. When mixing the two synthetic oligonucleotides in solution under appropriate conditions, they will anneal to each other forming a double-stranded structure. After annealing, one end of the adaptor molecule may be designed such that it is compatible with the end of a restriction fragment and can be ligated thereto; the other end of the adaptor can be designed so that it cannot be ligated, but this need not be the case, for instance when an adaptor is to be ligated in between DNA fragments.
At least one adaptor is optionally ligated to the ligated DNA fragments. The ends of the ligated DNA fragments need to be compatible with ligation of such an adaptor. As the ligated DNA fragments may be linear DNA, ligation of an adaptor may provide for a primer hybridisation sequence. The adaptor sequence ligated with ligated DNA fragments comprising the target nucleotide sequence provides DNA molecules which may be amplified using PCR.
AMPLIFICATION
In the present methods, DNA comprising the target nucleotide sequence may be amplified using at least one oligonucleotide primer which hybridises to the target nucleotide sequence, and at least one additional primer which hybridises to the at least one adaptor. As the step of ligating an adaptor is optional, the DNA comprising the target nucleotide sequence may also be amplified using at least one oligonucleotide primer which hybridises to the target nucleotide sequence.
“Amplifying” refers to a polynucleotide amplification reaction, namely, a population of polynucleotides that are replicated from one or more starting sequences. Amplifying may refer to a variety of amplification reactions, including but not limited to polymerase chain reaction (PCR), linear polymerase reactions, nucleic acid sequence-based amplification, rolling circle amplification and like reactions.
“Oligonucleotide primers”, in general, refer to strands of nucleotides which can prime the synthesis of DNA. DNA polymerase cannot synthesize DNA de novo without primers. A primer hybridises to the DNA, i.e. base pairs are formed. Nucleotides that can form base pairs, that are complementary to one another, are e.g. cytosine and guanine, thymine and adenine, adenine and uracil, guanine and uracil. The complementarity between the primer and the existing DNA strand does not have to be 100%, i.e. not all bases of a primer need to base pair with the existing DNA strand. From the 3’-end of a primer hybridised with the existing DNA strand, nucleotides are incorporated using the existing strand as a template (template directed DNA synthesis). We may refer to the synthetic oligonucleotide molecules which are used in an amplification reaction as “primers”.
SEQUENCING
“Sequencing” refers to determining the order of nucleotides (base sequences) in a nucleic acid sample, e.g. DNA or RNA. Many techniques are available, such as Sanger sequencing and high throughput sequencing technologies such as those offered by Roche, Illumina and Applied Biosystems.
The sequence of the (amplified) ligated DNA fragments, for example comprising the target nucleotide sequence, is determined. Determining the sequence is preferably performed using high throughput sequencing technology, as this is more convenient and allows a high number of sequences to be determined to cover the complete genomic region of interest. From these determined sequences a contig may be built of the genomic region of interest. When sequences of the DNA fragments are determined, overlapping reads may be obtained from which the genomic region of interest may be built. In case the DNA fragments were obtained by random fragmentation, the random nature of the fragmentation step may already result in DNA fragments which, when sequenced, yield overlapping reads. By increasing the sample size, e.g. increasing the number of cells analysed, the reliability of the genomic region of interest that is built may be increased.
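Building a contig from overlapping reads can be illustrated with a toy example. The following is a minimal sketch of a greedy overlap merge, assuming error-free reads on the same strand and invented sequences; it is not the assembly method of the invention, and practical assemblers additionally handle sequencing errors, reverse complements and repeats. The function names and the minimum overlap of three bases are illustrative assumptions.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b,
    provided it is at least min_len long; otherwise 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_contigs(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of reads with the longest overlap until no
    overlaps of at least min_len remain. Returns the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge; contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


if __name__ == "__main__":
    print(greedy_contigs(["ATGCGT", "CGTACC", "ACCGGA"]))
```

Reads that share no sufficient overlap are returned as separate contigs, which mirrors the point that deeper or more random coverage increases the chance of a single continuous sequence.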
In general, sequencing adaptors may be ligated to the (amplified) ligated DNA fragments. In case the linear or circularized fragment is amplified, by using for example PCR as described herein, the amplified product is linear, allowing the ligation of the adaptors. Suitable ends may be provided for ligating adaptor sequences (e.g. blunt, complementary staggered ends). Alternatively, primer(s) used for PCR or other amplification method, may include adaptor sequences, such that amplified products with adaptor sequences are formed in the amplification. In case the circularized fragment is not amplified, the circularized fragment may be fragmented, preferably by using for example a restriction enzyme in between primer binding sites for the inverse PCR reaction, such that DNA fragments ligated with the DNA fragment comprising the target nucleotide sequence remain intact. Sequencing adaptors may also be included in the ligation steps of the methods of the invention. These sequencing adaptors may be included as part of the adaptor sequences of the adaptors that may already optionally be used in these steps and/or separate sequence adaptors may be provided in these steps in addition.
Preferably long reads may be generated in the high throughput sequencing method used. Long reads may allow one to read across multiple DNA fragments of ligated DNA fragments. This way, initial DNA fragments or DNA fragments generated during optional further fragmentation in step (b) may be identified. DNA fragment sequences may be compared to a reference sequence and/or compared with each other. For example, such DNA fragment sequences may be used for determining the ratio of fragments of cells carrying a genetic mutation. By sequencing also DNA fragment sequences of DNA fragments adjacent to such sequences, unique ligated DNA fragments may be identified. This is in particular the case when DNA fragments were obtained by random fragmentation. The chance that two cells will provide for the exact same DNA fragment is very small, let alone that the DNA fragment ends to which such a fragment is ligated will be the same. Thus, by identifying DNA fragments this way, the ratio of cells and/or genomic regions of interest comprising a particular mutation may be determined.
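The idea of using fragment ends to recognise unique ligated DNA fragments, and then estimating the proportion carrying a mutation, can be sketched as follows. This is an illustration under stated assumptions only: the `LigationRead` record, its fields and the coordinates are hypothetical, reads are assumed to be already aligned, and reads sharing the same target and neighbour fragment ends are treated as copies of one original ligation product.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class LigationRead(NamedTuple):
    """Hypothetical summary of one aligned read across a ligation junction."""
    target_ends: Tuple[int, int]     # start/end of the fragment with the target
    neighbour_ends: Tuple[int, int]  # start/end of the fragment ligated to it
    has_mutation: bool               # whether the target fragment is mutated


def mutant_fraction(reads: Iterable[LigationRead]) -> float:
    """Collapse reads with identical fragment ends into unique ligation
    products and return the fraction of those products carrying the mutation.
    If duplicates disagree, the first call seen is kept (a simplification)."""
    unique: Dict[Tuple[Tuple[int, int], Tuple[int, int]], bool] = {}
    for read in reads:
        key = (read.target_ends, read.neighbour_ends)
        unique.setdefault(key, read.has_mutation)
    if not unique:
        raise ValueError("no reads supplied")
    return sum(unique.values()) / len(unique)


if __name__ == "__main__":
    example = [
        LigationRead((100, 420), (9000, 9300), True),
        LigationRead((100, 420), (9000, 9300), True),   # amplification copy
        LigationRead((130, 410), (5000, 5250), False),
        LigationRead((90, 450), (7000, 7400), False),
    ]
    print(mutant_fraction(example))
```

Collapsing copies before counting avoids over-representing a product that was amplified many times from a single original molecule.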
Hence, it is not required to provide for a complete sequence of the ligated DNA fragments. It is preferred to at least sequence across (multiple) DNA fragments, i.e. three or more fragments, such that DNA fragment sequences are determined.
It may also be contemplated to read even shorter sequences, for instance, short reads of 50-100 nucleotides. In such a scenario, it is preferred to fragment the (amplified) ligated DNA in smaller fragments, which may be subsequently ligated with an appropriate adaptor suitable for the high throughput sequencing method. In case a standard sequencing protocol would be used, this may mean that the information regarding the ligated DNA fragments may be lost. With short reads it may not be possible to identify a complete DNA fragment sequence. In case such short reads are contemplated, it may be envisioned to provide additional processing steps such that separate ligated DNA fragments when fragmented, are ligated or equipped with identifiers, such that from the short reads, contigs may be built for the ligated DNA fragments. Such high throughput sequencing technologies involving short sequence reads may involve paired end sequencing. By using paired end sequencing and short sequence reads, the short reads from both ends of a DNA molecule used for sequencing, which DNA molecule may comprise different DNA fragments, may allow coupling of DNA fragments that were ligated. This is because two sequence reads can be coupled spanning a relatively large DNA sequence relative to the sequence that was determined from both ends. This way, contigs may be built for the (amplified) ligated DNA fragments.
However, using short reads may be contemplated without identifying DNA fragments, because from the short sequence reads a genomic region of interest may be built, especially when the genomic region of interest has been amplified. Information regarding DNA fragments and/or separate genomic region of interests (for instance of a diploid cell) may be lost, but DNA mutations may still be identified.
Thus, the step of determining at least part of the sequence of the (amplified) ligated DNA sequence, may comprise short sequence reads, but preferably longer sequence reads are determined such that DNA fragment sequences may be identified. In addition, it may also be contemplated to use different high throughput sequencing strategies for the (amplified) ligated DNA fragments, e.g. combining short sequence reads from paired end sequencing with the ends relatively far apart with longer sequence reads. This way, contigs may be built for the (amplified) ligated DNA fragments.
CONTIG
The term "contig" is used in connection with DNA sequence analysis, and refers to reassembled contiguous stretches of DNA derived from two or more DNA fragments, preferably three or more DNA fragments, having contiguous nucleotide sequences. Thus, a contig may be a set of overlapping DNA fragments that provides a (partial) contiguous sequence of a genomic region of interest. A contig may also be a set of DNA fragments that, when aligned to a reference sequence, may form a contiguous nucleotide sequence. For example, the term "contig" encompasses a series of (ligated) DNA fragment(s) which are ordered in such a way as to have sequence overlap of each (ligated) DNA fragment(s) with at least one of its neighbours. The linked or coupled (ligated) DNA fragment(s), may be ordered either manually or, preferably, using appropriate computer programs such as FPC, PHRAP, CAP3 etc, and may also be grouped into separate contigs.
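As an illustration of a contig formed by DNA fragments aligned to a reference sequence, the following minimal sketch merges fragment alignments into contiguous stretches. It assumes each fragment is given as a half-open (start, end) coordinate pair on the reference and treats fragments that overlap or abut as contiguous; it is not a substitute for assembly programs such as FPC, PHRAP or CAP3, and the function name is hypothetical.

```python
def build_contigs(intervals):
    """Merge (start, end) fragment alignments into contiguous stretches.

    Intervals are half-open reference coordinates. Fragments that overlap
    or directly abut the current stretch extend it; otherwise a new
    stretch is started.
    """
    contigs = []
    for start, end in sorted(intervals):
        if contigs and start <= contigs[-1][1]:
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            contigs.append([start, end])
    return [tuple(c) for c in contigs]

print(build_contigs([(0, 100), (80, 200), (500, 600), (200, 250)]))
```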
CIRCULARIZATION
The present methods may optionally comprise a step of circularizing the DNA fragments generated, for example following reverse cross-linking.
Suitably, a restriction enzyme is used for the second fragmentation, as a restriction enzyme allows control of the fragmentation step and results, if an appropriate restriction enzyme is chosen, in compatible ends of the ligated DNA fragments that are favourable for ligation of the compatible ends, resulting in circularized ligated DNA fragments. However, other fragmentation methods may also be used, e.g. shearing and/or sonication followed by enzymatic DNA end repair, such that blunt-ended double-stranded DNA is formed, which may likewise be ligated to form circularized DNA.
In these embodiments, the optional first and second fragmenting steps are aimed at obtaining ligated DNA fragments which are compatible with the subsequent circularization, amplification step and/or sequence determination step. In case the optional first and second fragmenting steps comprise restriction enzymes, the first and second restriction enzyme may be chosen as described herein.
It may be advantageous to reverse the crosslinking before the circularization, because it may be unfavourable to circularize DNA while it is still crosslinked. However, circularization may also be performed while the ligated DNA fragments are crosslinked. It may even be possible that an additional circularization step is not required, as during the ligation step, circularized ligated DNA fragments are already formed, and hence circularization would occur simultaneously with ligation. However, it is preferred to perform an additional circularization step. Circularization involves the ligation of the ends of the ligated DNA fragments such that a closed circle is formed.
The circularized DNA comprising ligated DNA fragments which comprise the target nucleotide sequence, may subsequently be amplified using at least one primer which hybridises to the target nucleotide sequence. For the amplification step, reversing the crosslinking is required, as crosslinked DNA may hamper or prevent amplification. Preferably two primers are used that hybridise to the target nucleotide sequence in an inverse PCR reaction. In this way, DNA fragments of the circularized DNA, which are ligated with the DNA fragment comprising the target nucleotide sequence, may be amplified.
SIZE SELECTION
“Size selection” involves techniques with which particular size ranges of molecules, e.g. (ligated) DNA fragments or amplified (ligated) DNA fragments, are selected. Techniques that can be used include, for instance, gel electrophoresis, size exclusion and gel extraction chromatography, but are not limited thereto; any technique with which molecules of a particular size can be selected will suffice.
Prior to or after the amplification, according to the methods of the invention, a size selection step may be performed. Such a size selection step may be performed using gel extraction chromatography, gel electrophoresis or density gradient centrifugation, which are methods generally known in the art. Preferably DNA is selected of a size between 20-20,0000 base pairs, preferably 50-10,0000 base pairs, most preferably between 100-3,000 base pairs. A size selection step allows selection of (amplified) ligated DNA fragments in a size range that may be optimal for PCR amplification and/or optimal for the sequencing of long reads by next generation sequencing. Sequencing of reads of 500 nucleotides is currently commercially available, and recent advances such as the Single Molecule Real Time (SMRT™) DNA Sequencing technology developed by Pacific Biosciences (http://www.pacificbiosciences.com/) indicate that reads of 1,000 to 10,000 nucleotides are within reach.
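Although size selection is a physical separation step, its effect on a population of fragments can be sketched computationally. The example below assumes a list of fragment lengths in base pairs and applies the most preferred range of 100-3,000 base pairs, with both limits included; the function name and default parameters are illustrative.

```python
def size_select(fragment_lengths, min_bp=100, max_bp=3000):
    """Keep only fragment lengths within [min_bp, max_bp], limits included."""
    return [n for n in fragment_lengths if min_bp <= n <= max_bp]

print(size_select([50, 100, 1500, 3000, 3001]))
```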
PLOIDY
In case the ploidy in a cell of a genomic region of interest is greater than 1, for each ploidy a sequence may be determined, the presence or absence of a mutation determined and/or a contig may be built.
Since the genomic environment of any given target site in the genome mostly consists of DNA genome sequences that are physically close to the target sequence on the linear chromosome template, it allows the reconstruction of each particular chromosome template. In case the ploidy of a genomic region of interest is greater than 1, multiple genomic regions of interest are present in a cell (or equivalent thereof). These multiple genomic regions of interest generally do not occupy the same space, i.e. they are separated in space. When a sample of crosslinked DNA of such a cell is fragmented, from each genomic region of interest in a cell a corresponding DNA fragment comprising the target nucleotide sequence will be formed. These DNA fragments will each ligate with DNA fragments in their proximity. Ligated DNA fragments will thus be representative of the different genomic regions of interest. For instance, in case the ploidy is two, when two fragments each having a unique mutation, and separated by 1 MB, would be found together in ligated DNA fragments, it may be concluded that these two fragments are from the same genomic region of interest. Thus, in this scenario, two fragments were identified, and are both assigned to the same genomic region. Thus, when building a contig from the sequences of identified fragments, these two fragments carrying a mutation would be used for building a contig for one particular genomic region, while the contig built for the other genomic region would not carry the mutations.
Thus, according to the methods of the invention, the step of determining the sequence of at least part of the sequences of DNA fragments, determining the presence or absence of a mutation or of building a contig comprises the steps of:
1) identifying the fragments;
2) assigning the fragments to a genomic region for a particular allele.
As a further example, consider the case in which three fragments each comprising a unique mutation occur (A*, B* and C*) and the ploidy of the genomic region of interest is two. This time, ligation products comprising two of the mutated fragments are identified, one ligation product comprising A*B* and one comprising A*C*. Ligation products comprising the non-mutated fragments, BC and AC, are also identified. In this scenario, the ligated DNA fragments A*B* and A*C* are coupled by fragment A*, and ligated DNA fragments BC and AC are coupled by fragment C. In this scenario DNA fragments A*, B* and C* are assigned to the same genomic region, while A, B and C are assigned to the other genomic region. Thus, accordingly, the step 2) of assigning the fragments to a genomic region comprises identifying the different ligation products and coupling of the different ligation products comprising the DNA fragments.
In other words, the methods of the present invention may be used for haplotyping.
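The coupling of ligation products through shared fragments, as in the A*, B*, C* scenario above, can be sketched as finding connected groups of fragments. The following minimal example uses a union-find structure; the function name and the representation of ligation products as tuples of fragment labels are assumptions made for illustration.

```python
def assign_fragments(ligation_products):
    """Group fragments that are coupled, directly or via shared fragments,
    by the ligation products in which they were found together."""
    parent = {}

    def find(x):
        # Return the representative of x's group, shortening paths on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for product in ligation_products:
        first = find(product[0])
        for frag in product[1:]:
            parent[find(frag)] = first

    groups = {}
    for frag in parent:
        groups.setdefault(find(frag), set()).add(frag)
    return sorted(sorted(g) for g in groups.values())

products = [("A*", "B*"), ("A*", "C*"), ("B", "C"), ("A", "C")]
print(assign_fragments(products))
```

With these four ligation products, the mutated fragments form one group and the non-mutated fragments the other, corresponding to the two genomic regions.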
Likewise, the same would apply for heterogeneous cell populations. For instance, in case a sample of crosslinked DNA is provided which comprises a heterogeneous cell population (e.g. cells with different origin or cells from an organism which comprises normal cells and genetically mutated cells (e.g. cancer cells)), for each genomic region of interest corresponding to different genomic environment (which may e.g. be different genomic environments in a cell or different genomic environments from different cells) a sequence may be determined, the presence or absence of a mutation determined and/or a contig may be built.
MUTATIONS
Methods are provided for identifying the presence or absence of a genetic mutation.
Suitably, the method for identifying the presence or absence of a genetic mutation, may comprise the steps of any of the methods of the invention as described above and further identifying the presence or absence of a genetic mutation in a sequence determined.
Genetic mutations can be identified for instance by comparing sequences determined for multiple samples. In case one (or more) of the samples comprises a genetic mutation, this may be observed as the sequence is different when compared to the sequence of the other samples, i.e. the presence of a genetic mutation is identified. In case no sequence differences between the samples are observed, the absence of a genetic mutation is identified. Alternatively, a reference sequence may also be used to which the sequence may be aligned. When the sequence of the sample is different from the reference sequence, a genetic mutation is observed, i.e. the presence of a genetic mutation is identified. In case no sequence differences between the sample or samples and the reference sequence are observed, the absence of a genetic mutation is identified.
It is not required to build a contig for identifying the presence or absence of a genetic mutation. As long as DNA fragments sequences may be aligned, with each other or with a reference sequence, the presence or absence of a genetic mutation may be identified.
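A minimal sketch of the comparison with a reference sequence is given below. It assumes the sample sequence has already been aligned to the reference without gaps, so that both strings have equal length, and it therefore only reports single nucleotide differences; insertions, deletions and rearrangements would require a real aligner. The function name is hypothetical.

```python
def find_variants(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch
    between two gap-free, equal-length aligned sequences (0-based positions)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]

variants = find_variants("ACGTACGT", "ACGTTCGT")
print(variants if variants else "no mutation identified")
```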
In the methods according to the invention, an identified genetic mutation may be a single nucleotide variant (SNV), for example a single nucleotide polymorphism (SNP), an insertion, an inversion and/or a translocation. In case a deletion and/or insertion is observed, the number of fragments and/or ligation products from a sample carrying the deletion and/or insertion may be compared with a reference sample in order to identify the deletion and/or insertion. A deletion, insertion, inversion and/or translocation may also be identified based on the presence of chromosomal breakpoints in analyzed fragments.
In another embodiment, in the methods as described above, the presence or absence of methylated nucleotides is determined in DNA fragments, ligated DNA fragments, and/or genomic regions of interest. For example, the DNA may be treated with bisulphite. Treatment of DNA with bisulphite converts cytosine residues to uracil, but leaves 5-methylcytosine residues unaffected. Thus, bisulphite treatment introduces specific changes in the DNA sequence that depend on the methylation status of individual cytosine residues, yielding single-nucleotide resolution information about the methylation status of a segment of DNA. By dividing samples into subsamples, wherein one of the subsamples is treated, and the other is not, methylated nucleotides may be identified. Alternatively, sequences from a plurality of samples treated with bisulphite may also be aligned, or a sequence from a sample treated with bisulphite may be aligned to a reference sequence.
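The comparison of a bisulphite-treated subsample with an untreated subsample can be sketched as follows. The example assumes gap-free, equal-length aligned reads from the same strand and assumes that converted cytosines are read as thymine after amplification and sequencing; the function name is illustrative.

```python
def methylated_cytosines(untreated, treated):
    """Return 0-based positions where a cytosine in the untreated read is
    still read as cytosine after bisulphite treatment (i.e. was protected)."""
    if len(untreated) != len(treated):
        raise ValueError("reads must be aligned to equal length")
    return [i for i, (u, t) in enumerate(zip(untreated, treated))
            if u == "C" and t == "C"]

# Cytosines at positions 1 and 5 are converted; the one at position 4 is not.
print(methylated_cytosines("ACGTCCGA", "ATGTCTGA"))
```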
When analyzing (short) sequence reads, it may be of interest to prevent sequencing the primers used. Thus, in an alternative method, the primer sequence may be removed prior to the high throughput sequencing step.
In an alternative embodiment, in any of the methods as described herein, primers are used carrying a moiety, e.g. biotin, for the optional purification of (amplified) ligated DNA fragments through binding to a solid support. Suitably, the ligated DNA fragments comprising the target nucleotide sequence may be captured with a hybridisation probe (or capture probe) that hybridises to a target nucleotide sequence. The hybridisation probe may be attached directly to a solid support, or may comprise a moiety, e.g. biotin, to allow binding to a solid support suitable for capturing biotin moieties (e.g. beads coated with streptavidin). In any case, the ligated DNA fragments comprising a target nucleotide sequence are captured, thus allowing separation of ligated DNA fragments comprising the target nucleotide sequence from ligated DNA fragments not comprising the target nucleotide sequence. Hence, such a capturing step allows enrichment for ligated DNA fragments comprising the target nucleotide sequence. Hence, wherever throughout the invention an amplification step is performed, which is also an enrichment step, a capture step with a probe directed to a target nucleotide sequence may alternatively be performed. For a genomic region of interest at least one capture probe for a target nucleotide sequence may be used for capturing. For a genomic region of interest more than one probe may be used for multiple target nucleotide sequences. For example, similar to as described for the BRCA1 gene, one primer of one of the 5 target nucleotide sequences may be used as a capture probe (A, B, C, D or E). Alternatively, the 5 primers may be used in a combined fashion (A, B, C, D and E) capturing the genomic region of interest.
In one embodiment an amplification step and capture step may be combined, e.g. first performing a capture step and then an amplification step or vice versa.
In one embodiment, a capture probe may be used that hybridises to an adaptor sequence comprised in (amplified) ligated DNA fragments.
This disclosure is not limited by the exemplary methods and materials disclosed herein, and any methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of this disclosure. Numeric ranges are inclusive of the numbers defining the range. Unless otherwise indicated, any nucleic acid sequences are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within this disclosure. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within this disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in this disclosure. It must be noted that as used herein and in the appended claims, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise.
The terms "comprising", "comprises" and "comprised of" as used herein are synonymous with "including", "includes" or "containing", "contains", and are inclusive or open-ended and do not exclude additional, non-recited members, elements or method steps. The terms "comprising", "comprises" and "comprised of" also include the term "consisting of".
The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that such publications constitute prior art to the claims appended hereto.
The invention will now be further described by way of Examples, which are meant to serve to assist one of ordinary skill in the art in carrying out the invention and are not intended in any way to limit the scope of the invention.
EXAMPLES
Example 1 - Determination of appropriate permeabilization conditions for FFPE sample proximity ligation technologies
Initial studies to determine the incubation condition that is most suited for obtaining high quality results from proximity ligation experiments on FFPE samples involved incubation of two FFPE samples at 37, 65, 80 and 90°C for either 30 or 60 minutes (Figure 2). Following enrichment, next-generation sequencing (NGS) and analysis it was observed that the 80°C and 90°C permeabilization reactions yielded a higher fraction of covered nucleotides surrounding the used viewpoint compared to the 37 and 65°C incubation conditions.
Given the observed improvement using either the 80°C or the 90°C incubation condition, these became the focus for further optimization experiments.
Subsequently, a 30, 60, 120, 150, 180, 210 and a 240 minute incubation was performed at 80°C for two FFPE samples and a 15, 30, 45 and 60 minute incubation at 90°C was performed for a single FFPE sample (see Figure 3).
Following an 80°C incubation the fraction of co-captured bases surrounding the viewpoint increased in both FFPE samples from 30 up to 180 minutes, after which it gradually decreased (Figure 3). Following a 90°C incubation the fraction of covered bases increases with longer incubation times, and a given level of covered bases is reached faster than following the 80°C incubation. These data suggest using 90°C would allow a decrease in incubation time. Subsequent investigations focused on testing incubation conditions at 80°C. The effect of permeabilization incubation time at 80°C on various FFPE samples was tested using 16 FFPE samples with different DNA qualities using 5 different incubation times. Here FFPE quality is measured by the average length of the DNA following DNA isolation and ranged from 450 to 10000 bp. As shown in Figure 4, for samples with lower quality the fraction of covered bases generally increases with longer incubation times, with an optimum between 120 and 240 minutes. The fraction of bases covered using FFPE samples with higher quality, >5000 bp, benefits from shorter incubation times, with an optimum between 30 and 120 min.
Conclusion
The fraction of covered bases in a 100 kb window surrounding the location of a viewpoint is a measure for the performance of the FFPE targeted proximity-ligation protocol. Incubation at 37 or 65°C during the permeabilization step yielded data of insufficient quality, even after lengthy incubations of 4 to 6 hours. In the experiments presented above it was determined that the fraction of covered bases, and thus the quality of the proximity ligation experiment starting from FFPE tissue samples, benefits from a prolonged incubation at an elevated temperature compared to conventional permeabilization protocols. Though the optimal incubation time differed between FFPE samples of different quality, a protocol of 120 minutes incubation at 80°C yielded good quality data in all tested cases.
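The performance measure used in this Example, the fraction of covered bases in a window surrounding the viewpoint, can be sketched as follows. The example assumes that read alignments are available as half-open (start, end) coordinates on the same chromosome as the viewpoint and that the window is centred on the viewpoint; the centring and the function name are assumptions, as the exact placement of the window is not specified above.

```python
def fraction_covered(read_intervals, viewpoint, window=100_000):
    """Fraction of bases in a window centred on the viewpoint that are
    covered by at least one read. Intervals are half-open (start, end)."""
    lo = viewpoint - window // 2
    hi = lo + window
    covered = 0
    cursor = lo  # everything below cursor has already been counted
    for start, end in sorted(read_intervals):
        start = max(start, cursor)
        end = min(end, hi)
        if end > start:
            covered += end - start
            cursor = end
    return covered / window

reads = [(0, 10_000), (5_000, 20_000), (90_000, 120_000)]
print(fraction_covered(reads, viewpoint=50_000))
```

Overlapping reads are counted once, and reads extending beyond the window are clipped to it.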
Methods
Single 10 µm sections of an FFPE tissue block were used. Deparaffinization was performed by heating the samples for 3 minutes at 80°C followed by centrifugation for 2 minutes. Paraffin wax was manually removed after which samples were treated for 30 and 60 minutes at 37, 65 and 80°C with SDS shaking at 800rpm. 10 µm sections of a different FFPE block were treated for 30 and 60 minutes at 37, 65, 80°C and 90°C. Following the permeabilization step samples were treated by NlaIII digestion, ligation, NGS adapter ligation and linear PCR. After sequencing on an Illumina MiniSeq machine, the percentage of co-captured bases of each sample within a 100 kb region surrounding the location of the primer set (viewpoint) used for enrichment was determined. Figure 1 shows an overview of the method.
Proximity ligation technologies rely on the ability to use enzymes (DNA (restriction) endonucleases, DNA end-repair enzymes, DNA ligase and others) on crosslinked samples. In under-treated samples enzymes may have difficulty accessing the DNA and thus not be able to optimally digest and ligate DNA. In an over-treated sample DNA may be present as single-stranded DNA molecules hampering effective enzymatic activity and/or proximity information maintained by the formaldehyde induced crosslinks may be (partially) lost due to an undesired reversal of the crosslinks. The success of a permeabilization step in the context of a proximity ligation technology may therefore be measurable in the completeness of coverage (% bases covered) in the region flanking the PCR primer set. Figure 2 shows the fraction of covered nucleotides in the 100 kb region surrounding the location of the primer set after 30- and 60-minute incubation at different temperatures for two different FFPE samples tested.
The preferred incubation time was first determined using 10 µm sections of a single FFPE tissue block. Sections were deparaffinized and permeabilized at 7 different incubation times varying from 30 minutes to 240 minutes at 80°C and incubations from 15 to 60 minutes at 90°C. After the permeabilization step all samples were equally treated according to a protocol using linear PCR and prepped for next-generation sequencing (NGS). Following NGS, the % of co-captured bases was determined for each condition. Figure 3 shows the fraction of covered nucleotides in the 100 kb region surrounding the location of the primer set.
The effect of the permeabilization time at 80°C on various FFPE sample qualities was tested using single 10 µm sections of 16 different FFPE tissue blocks. Sections were deparaffinized and permeabilized for 30 and 60 minutes at 37, 65 and 80°C with SDS shaking at 800rpm for 30, 60, 120, 180, 240 minutes. The FFPE tissues ranged in quality from low to high. The lowest quality FFPE yields DNA with an average size of 450 bp and the best quality FFPE yields DNA with an average size of 10000 bp. After the permeabilization step all samples were equally treated according to a protocol using linear PCR and prepped for sequencing. After sequencing on an Illumina MiniSeq machine, the number of co-captured bases was determined for each sample. Figure 4 shows the fraction of covered bases in the 100 kb region surrounding the location of the primer set. The x-axes show the average length of the DNA fragments of these tissues after the treatment.
Example 2 - Investigation of further parameters and conditions
Further investigations were performed to determine if further increasing the incubation temperature would allow the incubation time to be reduced. S1 nuclease was used. S1 processes ssDNA, which here is used to increase the number of blunt ended dsDNA fragments that can be ligated during the proximity ligation step. Similar to the NlaIII experiments in Example 1, the percentage of covered bases in the regions flanking the sequence used for enrichment is a direct measure for the influence of the permeabilization step on the accessibility of the S1 nuclease enzyme to the crosslinked DNA.
Methods
Single 10 µm sequential sections of three FFPE tissue blocks of varying quality (DNA fragment length) were used per condition. Deparaffinization was performed by heating the samples for 3 minutes at 80°C followed by centrifugation for 2 minutes. Paraffin wax was manually removed after which samples were treated for 120min at 80°C, 15, 30, 60 and 120min at 90°C, 5, 15, 30 and 60 minutes at 95 and 98°C with SDS shaking at 800rpm. Following the permeabilization step samples were treated similarly with regard to S1 digestion, ligation, reversal of the crosslinks and DNA purification. DNA yields for each condition were measured and quality of the proximity reshuffling was determined following NGS adapter ligation and hybrid capture using a panel spanning BRCA1 and BRCA2, about 100 kb probed region each. After sequencing on an Illumina MiniSeq machine, the percentage of co-captured bases was determined for each sample within a 1200kb region surrounding the location of the probes (viewpoint) used for enrichment.
Results
The effect of the permeabilization condition is measured on two points: 1) effect on DNA yield and 2) effect on covered bases surrounding the viewpoints. First, the DNA yields were considered; the values are presented in Table 1 and in Figure 5.
Table 1: DNA yields (ng) retrieved from each experimental condition.
Figure imgf000031_0001
Though the absolute yield for the three different FFPE samples varies, yields up to 15min incubation are comparable between the different incubation temperatures. Within each temperature series a decline in yield is observed with an increase in incubation time. This could be due to DNA degradation induced by the prolonged incubation at high temperature but is more likely explained by an increase in DNA fragments for which crosslinks are reversed during the incubation, releasing them from the insoluble sample into the soluble fraction which is removed in the procedure. For the proximity ligation to take place it is important for fragments to remain (at least partially) intact and remain connected via crosslinks. In that respect the optimal incubation time at 90°C lies around 30min, at 95°C around 5 to 30min and at 98°C around 5 to 15min.
Permeability of the sample is measured using the fraction of covered bases in the target region which are presented in Figure 6.
For the different temperatures a similar maximum fraction of covered bases is observed, indicating that temperatures up to at least 98°C can be used for permeabilization. The optimal incubation time differs per sample and differs per temperature. In agreement with the data presented above, optimal incubation times for higher quality samples are shorter compared to those for the poorer quality samples (FI 62 vs FI 80 respectively). All tested conditions resulted in data of sufficient quality, but when aiming to define the incubation conditions that result in the highest quality data for all sample qualities, the conditions are 30min incubation at 90°C, 15-30min at 95°C or 15min incubation at 98°C.
Conclusions
The following permeabilization conditions are optimal when combining yield and experimental quality results: 2 hours at 80°C or 30min at 90°C or 20min (15-30min) at 95°C or 12min (10-15min) at 98°C.
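For illustration only, the conditions listed in this conclusion can be encoded as a small lookup table. The sketch below assumes that the single values given for 80°C and 90°C are treated as exact time points and that the bracketed ranges for 95°C and 98°C are inclusive; the names `RECOMMENDED` and `is_recommended` are hypothetical.

```python
# Temperature (°C) -> (minimum, maximum) incubation time in minutes,
# taken from the conclusion of Example 2. Single values are encoded as
# a range with equal limits (an assumption made for this sketch).
RECOMMENDED = {
    80: (120, 120),
    90: (30, 30),
    95: (15, 30),
    98: (10, 15),
}

def is_recommended(temp_c, minutes):
    """True if the (temperature, time) pair falls within a listed condition."""
    window = RECOMMENDED.get(temp_c)
    return window is not None and window[0] <= minutes <= window[1]

print(is_recommended(95, 20), is_recommended(98, 30))
```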
Example 3 - Single-step recovery of soluble RNA and cross-linked DNA using the present permeabilization conditions
Deparaffinized samples were sonicated on a Covaris M220 for 300 seconds, Duty factor 20%, Power 75 Watts, 200 cycles/burst at 20°C. The samples were then permeabilised by incubation for 15, 30, 60 or 120min at 65°C, 80°C, 90°C or 95°C with SDS shaking at 800rpm. The permeabilised samples were then centrifuged for 1 min at 10000 x g.
Surprisingly, the inventors determined that following centrifugation, soluble RNA was released into the supernatant whilst cross-linked DNA - in particular cross-linked genomic DNA - was retained in the pellet. RNA released in the supernatant was purified using magnetic beads (e.g. from the truXTRAC FFPE RNA Plus Kit (Covaris)) or spin columns (e.g. from the PureLink RNA mini kit (Thermo Fisher Scientific)). All purification methods isolated sufficient concentrations of RNA from the supernatant for sufficient cDNA to be transcribed, even if there was a low starting concentration of RNA. As expected, FFPE samples yielded less GAPDH than the cell samples.
RNA Quantity
Incubation at 65°C caused a very slight increase in RNA release; however, this was not significant (Figure 7). 15 minutes of incubation resulted in a plateau of RNA release even when the incubation time was increased to 120 minutes. Degradation of RNA can be seen in samples incubated for 120 minutes at temperatures of 90 and 95°C. The greatest increase in RNA release can be seen in samples incubated for 30 to 60 minutes at 90 to 95°C. However, there was a slight decrease in RNA yield when incubated for 120 minutes at 95°C, indicating slight RNA degradation.
The results in Figure 8 show that suitable incubation conditions include 90°C for 60 minutes, 90°C for 30 minutes and 80°C for 60 minutes. The ideal conditions appear to be 30 to 60 minutes at 90 to 95°C.
RNA Quality
The quality of the RNA in each sample was determined by reverse transcribing the RNA then assessing the presence of amplicons from one housekeeping gene (TATA binding protein; TBP) at different distances from the poly A tail. Figure 9A indicates the relative size and position of the primer pairs used. If there is a large reduction in amplifiability of all amplicons, it is evident that the reverse transcriptase was inhibited in all sections of the housekeeping gene. If the amplicons furthest from the poly A tail are less abundant, then it can be assumed that the longer RNA strands have been degraded or suboptimally de-crosslinked. The greater the difference in amplification, the worse the quality of template RNA.
Figure 9B shows the difference in amplifiability when incubated for only 15 minutes at varying temperatures. When the temperature increases, the difference in amplifiability decreases, indicating an increase in RNA quality. Incubation at 95°C causes a decrease in amplifiability of the second amplicon, TBP2, indicating a decrease in RNA quality. This pattern is the same when incubation is increased to 30 minutes and 60 minutes (Figure 9C and 9D). Only when incubation is increased to 120 minutes is a greater difference in amplification observed, which is more extreme as the temperature increases (Figure 9E).
Overall, the best RNA quality is retained when the samples are incubated at 90°C for 15 to 60 minutes. A two-way ANOVA was conducted on these data points and no significant difference was detected when samples were incubated within this “ideal window”. This conclusion is also supported by the absolute data values.
These data demonstrate that the present permeabilization conditions can be used to isolate soluble RNA and cross-linked genomic DNA from the same FFPE sample. This is in contrast to commercially available kits, which result in essentially complete reversal of crosslinking in both RNA and genomic DNA and are thus unsuitable for methods requiring proximity ligation techniques to be applied to the genomic DNA. At least in part, the commercially available kits typically involve the addition of proteinase K at concentrations/conditions that result in DNA molecules that were connected via a protein bridge being separated. Thus these standard proteinase K treatments, which are not required in the present methods, contribute to the incompatibility with co-recovery of cross-linked DNA and soluble RNA.
All publications mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described methods and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in molecular biology or related fields are intended to be within the scope of the following claims.

Claims

1. A method for determining at least part of the sequences of DNA fragments from a sample of fragmented cross-linked DNA; which comprises the following steps: a) providing a sample of fragmented cross-linked DNA; b) permeabilizing the sample; c) optionally, further fragmenting the cross-linked DNA; d) optionally, repairing the ends of DNA fragments; e) ligating the cross-linked DNA fragments; f) reversing the cross-linking; and g) determining at least part of the sequences of the DNA fragments; wherein step (b) comprises incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours.
2. A method for determining the presence or absence of a mutation in a genomic region of interest comprising a target nucleotide sequence, comprising the steps of: a) providing a sample of fragmented cross-linked DNA; b) permeabilizing the sample; c) optionally, further fragmenting the crosslinked DNA; d) optionally, repairing the ends of DNA fragments; e) ligating the fragmented crosslinked DNA; f) reversing the crosslinking; g) optionally, fragmenting the DNA of step f); h) optionally, ligating the fragmented DNA of step f) or g) to at least one adaptor; i) optionally, (1) amplifying the ligated DNA fragments of step f) or g) comprising the target nucleotide sequence using at least one primer which hybridises to the target nucleotide sequence, or amplifying the ligated DNA fragments of step g) using at least one oligonucleotide primer which hybridises to the target nucleotide sequence and at least one oligonucleotide primer which hybridises to the at least one adaptor and/or (2) capturing the ligated DNA fragments of step f) to h) comprising the target nucleotide sequence using a capture probe to separate (amplified) ligated DNA fragments comprising the target nucleotide sequence from (amplified) ligated DNA fragments not comprising the target nucleotide sequence; j) determining at least part of the sequence of the (amplified) ligated DNA fragments of step f), g), h) or i) comprising the target nucleotide sequence preferably using high throughput sequencing; k) aligning the determined sequences of the (amplified) ligated DNA fragments to a reference sequence; and l) identifying the presence or absence of a genetic mutation in the determined sequences; wherein step (b) comprises incubating the sample at about 75 °C to about 100 °C for about 1 minute to about 4 hours.
3. The method according to claim 1 or 2 wherein step (b) comprises incubating the sample for about 5 minutes to about 3 hours, about 10 minutes to about 3 hours, about 30 minutes to about 3 hours, about 1 hour to about 3 hours, or from about 2 hours to about 3 hours.
4. The method according to claim 1 or 2 wherein step (b) comprises incubating the sample at about 75 °C to about 100 °C for about 30 minutes to about 4 hours.
5. The method according to claim 3 or 4 wherein step (b) comprises incubating the sample for about 2 hours.
6. The method according to claim 5 wherein step (b) comprises incubating the sample at about 80 °C.
7. The method according to claim 1 or 2 wherein step (b) comprises incubating the sample for less than about 1 hour, preferably for about 10 to about 30 minutes.
8. The method according to claim 7 wherein step (b) comprises incubating the sample at about 90 °C to about 100 °C.
9. The method according to claim 1 or 2 wherein step (b) comprises incubating the sample at:
(i) about 80 °C for about 30 minutes to about 4 hours;
(ii) about 90 °C for about 5 minutes to about 2 hours;
(iii) about 95 °C for about 1 minute to about 1 hour; or
(iv) about 98 °C for about 1 minute to about 45 minutes.
10. The method according to claim 9 wherein step (b) comprises incubating the sample at about 80 °C for about 2 hours.
11. The method according to claim 9 wherein step (b) comprises incubating the sample at about 90 °C for about 30 minutes.
12. The method according to claim 9 wherein step (b) comprises incubating the sample at about 95 °C for about 20 minutes.
13. The method according to claim 9 wherein step (b) comprises incubating the sample at about 98 °C for about 12 minutes.
14. The method according to any preceding claim wherein the optional further fragmentation of step (c) comprises treating the sample with a restriction enzyme.
15. The method according to any preceding claim wherein the cross-linked sample is a tissue sample.
16. The method according to claim 15 wherein the tissue sample is a tumour sample.
17. The method according to any preceding claim wherein the cross-linked sample is a Formalin-Fixed Paraffin-Embedded (FFPE) sample.
18. The method according to any preceding claim wherein there is no cell lysis performed following the permeabilization step.
19. The method according to any preceding claim, further comprising building a contig of the genomic region of interest from the determined sequences.
20. The method according to any of claims 2 to 19 wherein the mutation is a SNV, a deletion, an insertion, an inversion and/or a translocation.
21. The method according to claim 20 wherein a deletion, insertion, inversion and/or translocation is identified based on the presence of chromosomal breakpoints in analyzed fragments.
22. A method for recovering RNA and cross-linked DNA from a sample comprising cross-linked cells or tissue; which comprises the step of permeabilizing the sample by an incubation at about 75 °C to about 100 °C for about 1 minute to about 4 hours.
23. The method according to claim 22, further comprising a step of separating the RNA from the cross-linked DNA after the permeabilization step.
24. The method according to claim 23, wherein the separation step comprises centrifugation of the permeabilized sample.
25. The method according to any of claims 22 to 24 wherein the RNA is released or releasable as soluble RNA following the permeabilization step.
26. The method according to any of claims 22 to 25 wherein the sample is a cross-linked tissue sample.
27. The method according to any of claims 22 to 26 wherein the sample comprises fragmented cross-linked DNA.
28. The method according to claim 26 or 27 wherein the cross-linked sample is a Formalin-Fixed Paraffin-Embedded (FFPE) sample.
29. The method according to claim 27 or 28 wherein there is no cell lysis step performed following the permeabilization step.
30. The method according to any of claims 22 to 29 wherein the method further comprises sequencing and/or amplifying at least part of the recovered RNA.
31. The method according to any of claims 22 to 30 wherein the method further comprises subjecting at least part of the recovered RNA to reverse-transcription PCR, quantitative PCR and/or next generation sequencing.
32. The method according to any of claims 22 to 31, wherein the method further comprises determining at least part of the sequences of DNA fragments from the cross-linked DNA by a method which comprises the following steps: a) optionally, fragmenting the cross-linked DNA; b) optionally, repairing the ends of DNA fragments; c) ligating the cross-linked DNA fragments; d) reversing the cross-linking; and e) determining at least part of the sequences of the DNA fragments.
33. The method according to any of claims 22 to 31, wherein the method further comprises determining the presence or absence of a mutation in a genomic region of interest in the cross-linked DNA, wherein the genomic region of interest comprises a target nucleotide sequence, comprising the steps of: a) optionally, fragmenting the crosslinked DNA; b) optionally, repairing the ends of DNA fragments; c) ligating the fragmented crosslinked DNA; d) reversing the crosslinking; e) optionally, fragmenting the DNA of step d); f) optionally, ligating the fragmented DNA of step d) or e) to at least one adaptor; g) optionally, (1) amplifying the ligated DNA fragments of step d) or e) comprising the target nucleotide sequence using at least one primer which hybridises to the target nucleotide sequence, or amplifying the ligated DNA fragments of step e) using at least one oligonucleotide primer which hybridises to the target nucleotide sequence and at least one oligonucleotide primer which hybridises to the at least one adaptor and/or (2) capturing the ligated DNA fragments of step d) to f) comprising the target nucleotide sequence using a capture probe to separate (amplified) ligated DNA fragments comprising the target nucleotide sequence from (amplified) ligated DNA fragments not comprising the target nucleotide sequence; j) determining at least part of the sequence of the (amplified) ligated DNA fragments of step d), e), f) or g) comprising the target nucleotide sequence preferably using high throughput sequencing; k) aligning the determined sequences of the (amplified) ligated DNA fragments to a reference sequence; and l) identifying the presence or absence of a genetic mutation in the determined sequences.
34. The method of claim 32 or 33, wherein the sample comprises fragmented cross-linked DNA and step (a) is an optional further fragmentation step.
35. The method according to any of claims 22 to 34 wherein the permeabilization step comprises incubating the sample for about 5 minutes to about 3 hours, about 10 minutes to about 3 hours, about 30 minutes to about 3 hours, about 1 hour to about 3 hours, or from about 2 hours to about 3 hours.
36. The method according to any of claims 22 to 34 wherein the permeabilization step comprises incubating the sample at about 75 °C to about 100 °C for about 30 minutes to about 4 hours.
37. The method according to claim 35 or 36 wherein the permeabilization step comprises incubating the sample for about 2 hours.
38. The method according to claim 37 wherein the permeabilization step comprises incubating the sample at about 80 °C.
39. The method according to any of claims 22 to 34 wherein the permeabilization step comprises incubating the sample for less than about 1 hour, preferably for about 10 to about 30 minutes.
40. The method according to claim 39 wherein the permeabilization step comprises incubating the sample at about 90 °C to about 100 °C.
41. The method according to any of claims 22 to 34 wherein the permeabilization step comprises incubating the sample at:
(i) about 80 °C for about 30 minutes to about 4 hours;
(ii) about 90 °C for about 5 minutes to about 2 hours;
(iii) about 95 °C for about 1 minute to about 1 hour; or
(iv) about 98 °C for about 1 minute to about 45 minutes.
42. The method according to claim 41 wherein the permeabilization step comprises incubating the sample at about 80 °C for about 1 to 2 hours.
43. The method according to claim 41 wherein the permeabilization step comprises incubating the sample at about 90 °C to 95°C for about 15 to 60 minutes.
44. The method according to claim 41 wherein the permeabilization step comprises incubating the sample at about 90 °C for about 15 to 60 minutes, preferably about 90 °C for about 30 to 60 minutes.

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
GB2006546.2 2020-05-04
GBGB2006546.2A GB202006546D0 (en) 2020-05-04 2020-05-04 Method
GBGB2010492.3A GB202010492D0 (en) 2020-07-08 2020-07-08 Method
GB2010492.3 2020-07-08
GBGB2101819.7A GB202101819D0 (en) 2021-02-10 2021-02-10 Method
GB2101819.7 2021-02-10

Publications (1)

Publication Number Publication Date
WO2021224233A1 true WO2021224233A1 (en) 2021-11-11

Family

ID=75787119

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/061685 WO2021224233A1 (en) 2020-05-04 2021-05-04 Method

Country Status (1)

Country Link
WO (1) WO2021224233A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008021419A2 (en) * 2006-08-17 2008-02-21 Panomics, Inc. Nucleic acid quantitation from tissue slides
US20090286305A1 (en) * 2005-04-15 2009-11-19 Wei-Sing Chu Method for non-destructive macromolecule extraction from biological samples on slide
WO2012005595A2 (en) 2010-07-09 2012-01-12 Wouter Leonard De Laat V3-d genomic region of interest sequencing strategies
CN105838783A (en) * 2016-03-22 2016-08-10 北京交通大学 Rapid separation and detection kit and method for fragmented crosslinked DNA
WO2017136198A1 (en) * 2016-02-01 2017-08-10 Mayo Foundation For Medical Education And Research Methods and materials for extracting chromatin

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GULISA TURASHVILI ET AL: "Nucleic acid quantity and quality from paraffin blocks: Defining optimal fixation, processing and DNA/RNA extraction techniques", EXPERIMENTAL AND MOLECULAR PATHOLOGY, vol. 92, no. 1, 21 September 2011 (2011-09-21), pages 33 - 43, XP028464113, ISSN: 0014-4800, [retrieved on 20110921], DOI: 10.1016/J.YEXMP.2011.09.013 *
PAULA J P DE VREE ET AL: "Targeted sequencing by proximity ligation for comprehensive variant detection and local haplotyping", NATURE BIOTECHNOLOGY, vol. 32, no. 10, 17 August 2014 (2014-08-17), pages 1019 - 1025, XP055201940, ISSN: 1087-0156, DOI: 10.1038/nbt.2959 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21723247

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21723247

Country of ref document: EP

Kind code of ref document: A1