WO2021019197A1 - Methods of identifying microsatellite instability - Google Patents

Methods of identifying microsatellite instability

Info

Publication number
WO2021019197A1
Authority
WO
WIPO (PCT)
Prior art keywords
microsatellite
loci
msi
markers
listed
Prior art date
Application number
PCT/GB2019/052148
Other languages
French (fr)
Inventor
John Burn
Michael Stewart JACKSON
Francisco Mauro SANTIBANEZ-KOREF
Richard GALLON
Harsh SHETH
Original Assignee
University Of Newcastle Upon Tyne
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University Of Newcastle Upon Tyne
Priority to PCT/GB2019/052148
Publication of WO2021019197A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6827 Hybridisation assays for detection of mutation or polymorphism

Definitions

  • the invention provides novel methods for evaluating levels of microsatellite instability in a sample or evaluating the biological significance of sequence variation identified during sequencing. Corresponding kits are also provided.
  • Microsatellites are regions of genomic DNA comprising simple repetitive sequences where 1-6 bp long units are tandemly repeated, often 5-50 times. Microsatellite loci are classified based on the length of the smallest repetitive unit. For example, loci with repetitive units of 1 to 5 base pairs in length are termed “mono-nucleotide”, “di-nucleotide”, “tri-nucleotide”, “tetra-nucleotide”, and “penta-nucleotide” repeat loci, respectively.
  • Microsatellite loci in normal genomic DNA of most diploid species, including humans, are present in two copies, or alleles. By and large, microsatellite alleles are maintained at constant length in a given individual and its descendants.
  • However, microsatellites are known to be unstable during meiotic and mitotic replication in eukaryotes and prokaryotes. Instability in the length of microsatellites has been observed in some tumours.
  • Factors which affect the stability of microsatellites include the length of the microsatellite, repeat unit length, base composition, and the sequence surrounding the microsatellite. For example, dinucleotide repeats tend to be more mutable than tetra nucleotide repeats of the same length.
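The classification by smallest repetitive unit described above can be illustrated with a minimal Python sketch. The function names and the "unclassified" fallback are assumptions made for illustration, and the sketch only handles perfect tandem repeats with at least two copies of the unit; it is not part of the described method.

```python
# Names for repeat classes, keyed by the length of the smallest repeat unit.
# Only the 1-5 bp classes named in the text are included.
REPEAT_CLASS = {
    1: "mono-nucleotide",
    2: "di-nucleotide",
    3: "tri-nucleotide",
    4: "tetra-nucleotide",
    5: "penta-nucleotide",
}


def smallest_repeat_unit(seq):
    """Return the shortest unit that, repeated at least twice, rebuilds seq.

    Returns None when seq is not a perfect tandem repeat.
    """
    n = len(seq)
    for size in range(1, n // 2 + 1):
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return seq[:size]
    return None


def classify_repeat(seq):
    """Name the repeat class of a perfect tandem repeat, else 'unclassified'."""
    unit = smallest_repeat_unit(seq.upper())
    if unit is None:
        return "unclassified"
    return REPEAT_CLASS.get(len(unit), "unclassified")


print(classify_repeat("AAAAAAAAAA"))  # a 10 bp run of A
print(classify_repeat("CACACACA"))    # four copies of CA
```

Real microsatellites may contain interruptions, which this sketch deliberately ignores.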
  • MSI Microsatellite instability
  • MMR mismatch repair
  • Mismatch repair (MMR) deficiency affects approximately one in six colorectal cancers (CRCs) (Boland et al., 1998).
  • Lynch syndrome (LS), an inherited predisposition to cancer caused by germline pathogenic variants affecting one allele of an MMR gene, accounts for approximately one in five MMR deficient CRCs (Hampel et al., 2008).
  • Assessment of MSI (or MMR) status can inform patient management and is recommended in all CRCs by national and international guidelines to screen for LS (Balmana et al., 2013; Stoffel et al., 2015; Newland et al., 2017).
  • MMR status of samples is commonly assessed by immunohistochemistry (IHC) of MMR proteins, or PCR fragment length analysis (FLA) of microsatellites to detect increased MSI.
  • IHC immunohistochemistry
  • FLA PCR fragment length analysis
  • MSI-H high levels of MSI
  • MSI-H is defined by mutation of ≥30-40% of microsatellites analysed (Boland et al., 1998).
  • FLA has been shown to be reliable when sample tumour cell content is ≥10% (Berg et al., 2000), and IHC can detect focal MMR deficiency (Chapusot et al., 2002). Both are also considered to be relatively cheap and cost-effective for LS screening (Snowsill et al., 2014).
  • Uptake of MMR deficiency testing has been poor; only 28% of 152,993 CRC cases were analysed during 2010-2012 in the USA (Shaikh et al., 2018), with a similar proportion being analysed in the UK. This is despite guidelines recommending testing and estimates that only 1.2% of LS gene carriers were known to clinical services in the US in 2011 (Hampel & de la Chapelle, 2011). It is estimated that only 5% of carriers are currently known in the UK.
  • Sequencing-based MSI assays determine the mutation status of microsatellites and then use the proportion of microsatellites that are mutated to classify a sample. Sensitivities and specificities >95% have been reported when comparing the performance of several such classifiers using microsatellites captured by gene panel sequencing (Kautto et al., 2016; Zhu et al., 2018), and such methods can identify samples misclassified by conventional MMR deficiency tests, highlighting that there is no gold standard reference method (Hause et al., 2016). However, the high cost of gene panel sequencing (Marino et al., 2018) may be a barrier to its widespread deployment for MSI testing, or for the detection of LS by MMR gene sequencing.
  • Targeted sequencing-based MSI assays using a specific panel of microsatellites have been developed that, similar to gene panel-based classifiers, classify samples by the proportion of microsatellites that are mutated (Hempelmann et al., 2015; Hempelmann et al., 2018; Waalkes et al., 2018).
  • different marker proportions are used as a threshold with different marker sets (Hempelmann et al., 2015; Kautto et al.
  • thresholds can be uncertain when relatively few microsatellites (<20) are analysed (Hempelmann et al., 2015). Variable or indeterminate thresholds can be compensated for by larger marker panels, albeit with increased sequencing costs. Ideally such assays need to be competitive with both more expensive, and more comprehensive, gene panel sequencing, as well as the cheaper, and lower throughput, methods of IHC and FLA.
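The proportion-based classification used by the sequencing-based assays discussed above can be sketched as follows. This is a minimal illustration only: the function name, the marker names and the default threshold of 0.3 are assumptions, and, as noted above, the appropriate threshold differs between marker sets.

```python
def classify_by_proportion(mutated, threshold=0.3):
    """Classify a sample from per-microsatellite mutation calls.

    mutated   -- dict mapping a marker name to True (mutated) or False
    threshold -- proportion of mutated markers at or above which the
                 sample is labelled MSI-H (an illustrative assumption)
    """
    if not mutated:
        raise ValueError("at least one microsatellite call is required")
    proportion = sum(bool(call) for call in mutated.values()) / len(mutated)
    label = "MSI-H" if proportion >= threshold else "not MSI-H"
    return proportion, label


# Hypothetical calls for five hypothetical markers.
calls = {"m1": True, "m2": True, "m3": False, "m4": False, "m5": False}
print(classify_by_proportion(calls))
```

With few markers the proportion can only take a handful of values, which is one way to see why thresholds become uncertain for small panels.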
  • the inventors have previously used amplicon sequencing of short (7-12bp), monomorphic (i.e. where no length variation has been reported in the human population), mono-nucleotide repeats (MNRs) to classify the MSI status of CRCs, without needing matched normal tissue (see WO 2018/037231 and Redford et al., 2018).
  • Short MNRs were selected as longer (>15bp) microsatellites are associated with increased PCR and sequencing error (Fazekas et al., 2010), and it has been reported that 9-15bp microsatellites give the greatest differences in mutation frequencies between MSI-H and MSS samples using NGS (Maruvka et al., 2017).
  • Their previous method for MSI detection accounts for the individual sensitivity and specificity of each marker and achieved >97% accuracy in 209 CRCs with only 17 markers, using FLA as the reference method.
  • the inventors have now developed a new MSI marker set with 24 short MNRs (listed in Table 1 below) that has been tested and validated on CRC patient samples. They have shown that the 24 MSI marker panel provided herein achieves 100% accuracy in detecting MSI in real CRC patient samples and is therefore suitable for clinical cancer diagnostics.
  • To ensure that the new MSI marker set is suitable for clinical practice, the inventors followed joint guidelines from the Association for Molecular Pathology and the College of American Pathologists (Jennings et al., 2017). This includes validation of diagnostic accuracy using independent sample cohorts, assessment of reproducibility and detection limits, definition of quality control criteria, and deployment in an independent diagnostic laboratory.
  • the new MSI marker set described herein is therefore ready and suitable for clinical use.
  • the inventors have also performed further analysis on subsets of the MSI markers used in the validated 24 marker set described herein. Surprisingly, the inventors have found that 100% detection accuracy may also be achieved when the number of MSI markers used is significantly reduced. Backward-forward stepwise selection was used to identify a 6-marker subset of equal accuracy to the 24 marker panel. Additional computational analysis was also performed to identify other 3 to 6 marker subsets that may also be equally informative.
  • MSI marker panels with a small number of highly informative markers are therefore also provided herein, which can be used to classify MSI status in a variety of different samples, including tissue or liquid biopsies. Further details of such subsets of MSI markers are provided below.
  • the invention is based on amplifying and sequencing of a plurality of MSI marker loci to classify MSI status in a sample.
  • the inventors have demonstrated the invention using single molecule molecular inversion probes (smMIPs) (Hiatt et al., 2013) to amplify the MSI marker loci in multiplex.
  • smMIPs single molecule molecular inversion probes
  • Multiplex amplification and sequencing techniques are particularly advantageous because they allow for automated sequence analysis and high throughput diagnostics.
  • any other suitable means for amplifying and sequencing the informative MSI markers described herein may also be used (e.g. conventional PCR may be used).
  • the invention is therefore not limited to using smMIPs or any other specific probes or primers described herein. Other appropriate methods are described in more detail below.
  • a method for evaluating levels of microsatellite instability in a sample comprising:
  • (i) comprise GM07 and up to four other microsatellite loci listed in Table 1; or
  • (ii) comprise LR44 and up to four other microsatellite loci listed in Table 1; or
  • (iii) comprise LR52 and up to twenty-three other microsatellite loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44;
  • a method for evaluating the biological significance of sequence variation identified during sequencing comprising:
  • (i) comprise GM07 and up to four other microsatellite loci listed in Table 1; or
  • (ii) comprise LR44 and up to four other microsatellite loci listed in Table 1; or
  • (iii) comprise LR52 and up to twenty-three other microsatellite loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44;
  • each microsatellite locus has a single nucleotide polymorphism (SNP) locus within a short distance of the microsatellite locus and said amplifying step amplifies both the microsatellite locus and the associated SNP in a single amplicon;
  • SNP single nucleotide polymorphism
  • LR52 and up to twenty-three other microsatellite loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44; for evaluating levels of microsatellite instability in a sample, or for evaluating the biological significance of sequence variation identified during sequencing of a sample is also provided.
  • the one or more microsatellite loci may comprise LR52 and up to 22 other microsatellite loci listed in Table 1, optionally wherein the up to 22 other microsatellite loci include GM07 and/or LR44.
  • the one or more microsatellite loci may comprise LR52 and up to 15 other microsatellite loci from Table 1, optionally wherein the up to 15 other microsatellite loci include GM07 and/or LR44.
  • the one or more microsatellite loci may comprise LR52 and up to 9 other microsatellite loci from Table 1, optionally wherein the up to 9 other microsatellite loci include GM07 and/or LR44.
  • the one or more microsatellite loci may comprise LR52 and from 2 to 9 other microsatellite loci from Table 1, optionally wherein the 2 to 9 other microsatellite loci include GM07 and/or LR44.
  • step (a) may comprise amplifying three or more microsatellite loci listed in Table 1.
  • the three or more microsatellite loci may comprise two or three markers selected from: GM07, LR44 and LR52.
  • the three or more microsatellite loci may comprise or consist of a microsatellite loci combination listed in Table 6, optionally wherein step (a) comprises amplifying a total of up to 16, 12 or 10 microsatellite loci.
  • the three or more microsatellite loci may comprise or consist of the combination of three microsatellite loci listed in Table 6 (3mer), optionally wherein step (a) comprises amplifying a total of up to 16, 12 or 10 microsatellite loci.
  • the three or more microsatellite loci may comprise or consist of a combination of four microsatellite loci listed in Table 6 (4mers), optionally wherein step (a) comprises amplifying a total of up to 16, 12 or 10 microsatellite loci.
  • the three or more microsatellite loci may comprise or consist of a combination of five microsatellite loci listed in Table 6 (5mers), optionally wherein step (a) comprises amplifying a total of up to 16, 12 or 10 microsatellite loci.
  • the three or more microsatellite loci may comprise or consist of a combination of six microsatellite loci listed in Table 6 (6mers), optionally wherein step (a) comprises amplifying a total of up to 16, 12 or 10 microsatellite loci.
  • the three or more microsatellite loci may comprise GM07, GM11, GM14, LR36, LR44 and LR52.
  • step (a) may comprise amplifying a total of up to 16 microsatellite loci.
  • step (a) may comprise amplifying a total of up to 10 microsatellite loci.
  • step (a) may comprise amplifying a total of from 3 to 10 microsatellite loci.
  • the sample may be a tissue or biological fluid sample.
  • the sample may be from a subject that is suspected of having, at risk of having, or being predisposed to cancer, optionally wherein the cancer is colorectal cancer or Lynch syndrome.
  • the methods described above may be for use in identifying mismatch repair defects, wherein deviation from the predetermined sequences for two or more microsatellite mono-nucleotide repeat loci is indicative of a mismatch repair defect.
  • the methods described above may be for use in identifying MSI-H, wherein deviation from the predetermined sequences for two or more microsatellite mono-nucleotide repeat loci is indicative of the sample having high levels of microsatellite instability (MSI-H).
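The "deviation from the predetermined sequences for two or more loci" criterion can be sketched in Python as below. This is a simplified illustration under stated assumptions: each locus is reduced to a single observed sequence, the locus names and sequences are placeholders rather than real loci, and the function names are hypothetical.

```python
def deviating_loci(observed, predetermined):
    """Return the sorted names of loci whose observed sequence differs
    from the predetermined sequence for that locus."""
    return sorted(
        locus
        for locus, sequence in observed.items()
        if locus in predetermined and sequence != predetermined[locus]
    )


def indicates_instability(observed, predetermined, min_loci=2):
    """True when at least min_loci loci deviate (two, per the text above)."""
    return len(deviating_loci(observed, predetermined)) >= min_loci


# Placeholder sequences only; these are NOT the sequences of any real locus.
predetermined = {"locus_1": "A" * 10, "locus_2": "T" * 9, "locus_3": "A" * 12}
observed = {"locus_1": "A" * 9, "locus_2": "T" * 9, "locus_3": "A" * 11}

print(deviating_loci(observed, predetermined))
print(indicates_instability(observed, predetermined))
```

In practice a locus is covered by many reads, so a real pipeline would first summarise the reads per locus before a comparison of this kind.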
  • kits are also provided, wherein the kit is for amplifying:
  • LR52 and up to twenty-three other microsatellite mono-nucleotide repeat loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44,
  • kit comprises primers and/or probes for specifically amplifying the microsatellite loci of (i), (ii) or (iii).
  • the kit may be for amplifying LR52 and up to 22 other loci listed in Table 1, optionally wherein the up to 22 other microsatellite loci include GM07 and/or LR44.
  • the kit may further comprise a thermostable polymerase and/or dNTPs or analogs thereof, optionally wherein the dNTPs or analogs thereof are labeled.
  • nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation. It is to be understood that this invention is not limited to the particular methodology, protocols, and reagents described, as these may vary depending upon the context in which they are used by those of skill in the art.
  • Figure 1 shows MSI classification of CRCs. MSI classifier scores versus diagnosis by the MSI Analysis System v1.2 (Promega) for CRCs analysed in (A) the training cohort, and (B) the validation cohort.
  • Figure 2 shows assay robustness to sample heterogeneity.
  • (A) Classifier scores from mixtures of MSI-H cell line and microsatellite stable (MSS) peripheral blood lymphocytes (PBL) DNA samples.
  • Figure 3 shows assay robustness to variation in quantity of sample DNA.
  • Figure 4 shows assay validation in an independent laboratory. MSI classifier scores versus diagnosis by the MSI Analysis System v1.2 (Promega) for 23 CRCs tested by the Northern Genetics Service (Newcastle Hospitals NHS Foundation Trust, Newcastle, UK).
  • Figure 5 shows validation using a DNA and read mixing series.
  • (A) Observed versus expected relative frequency of microsatellite length mutations in the three replicate sample mixture series, ranging from 0.78-50.00% MSI-H cell DNA.
  • (B) Classifier scores of simulated and empirical sample mixture series, ranging from 0.00-100.00% MSI-H cell DNA.
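As an arithmetic illustration of the mixture series in Figure 5, the sketch below assumes a two-fold serial dilution starting at 50% MSI-H cell DNA, which would reach 0.78% after six halvings; the source does not state the dilution scheme, so this is an assumption. The expected mutant frequency is modelled as a simple linear mix of the frequencies in the pure MSI-H and MSS components, which is likewise an illustrative assumption, as are the function names.

```python
def twofold_series(start_percent=50.0, steps=7):
    """Percentages of MSI-H DNA in an assumed two-fold dilution series."""
    return [start_percent / 2 ** k for k in range(steps)]


def expected_frequency(msih_fraction, freq_in_msih, freq_in_mss=0.0):
    """Expected relative frequency of a length mutation in a mixture.

    msih_fraction -- fraction (0 to 1) of DNA from the MSI-H component
    freq_in_msih  -- mutation frequency in the pure MSI-H component
    freq_in_mss   -- mutation frequency in the pure MSS component
    """
    return msih_fraction * freq_in_msih + (1.0 - msih_fraction) * freq_in_mss


for percent in twofold_series():
    # 0.8 is a made-up mutation frequency for the pure MSI-H component.
    expected = expected_frequency(percent / 100.0, freq_in_msih=0.8)
    print(f"{percent:6.2f}% MSI-H DNA -> expected frequency {expected:.4f}")
```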
  • Figure 6 shows validation using a DNA and read dilution series.
  • (A) Visualisation of amplicons from the template DNA dilution series. Agarose gel electrophoresis of amplicons generated from 9 samples by the MSI assay. Amplicons are visible at 240-270bp. Primer dimers are visible at 80bp. Marker (M): GeneRuler 1 kb Plus (ThermoFisher). Top panel: cell line controls. Middle panel: formalin fixed paraffin embedded (FFPE) MSI-high CRCs. Bottom panel: FFPE MSS CRCs.
  • (B) Correlation of the number of molecular barcodes detected, and the input quantity of template DNA.
  • (C) Comparison of empirically observed and simulated sample dilution series, and the association between molecular barcodes/marker and classifier score.
  • Figure 7 shows the influence of subset size on marker accuracy. Shown is the distribution of accuracies of all combinations of the microsatellite loci markers of Table 1, from the markers on their own (“1”) to combinations of 6 markers (“6”).
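The kind of exhaustive subset evaluation summarised in Figure 7 can be sketched as follows. For 24 markers there are 134,596 combinations of six, so enumeration is feasible. The scoring rule used here (call a sample MSI-H when at least half of the markers in the subset are mutated) is a deliberately simple stand-in and is not the inventors' classifier; the marker names, sample names and data are all hypothetical.

```python
from itertools import combinations


def subset_accuracy(subset, calls, truth):
    """Fraction of samples classified correctly using only `subset`.

    calls -- dict: sample -> dict: marker -> 1 (mutated) or 0
    truth -- dict: sample -> True if the reference status is MSI-H
    """
    correct = 0
    for sample, is_msih in truth.items():
        mutated = sum(calls[sample][marker] for marker in subset)
        predicted = 2 * mutated >= len(subset)  # stand-in rule: at least half
        correct += predicted == is_msih
    return correct / len(truth)


def accuracies_by_size(markers, calls, truth, max_size):
    """Accuracy of every marker combination of size 1..max_size."""
    return {
        size: {
            combo: subset_accuracy(combo, calls, truth)
            for combo in combinations(markers, size)
        }
        for size in range(1, max_size + 1)
    }


# Toy data: three hypothetical markers and four hypothetical samples.
markers = ["A", "B", "C"]
calls = {
    "s1": {"A": 1, "B": 1, "C": 0},
    "s2": {"A": 1, "B": 0, "C": 1},
    "s3": {"A": 0, "B": 0, "C": 0},
    "s4": {"A": 0, "B": 1, "C": 0},
}
truth = {"s1": True, "s2": True, "s3": False, "s4": False}

for size, results in accuracies_by_size(markers, calls, truth, 3).items():
    print(size, results)
```

Grouping the results by subset size gives, for each size, the distribution of accuracies that a figure of this type would plot.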
  • microsatellite mono-nucleotide repeat loci GM07, LR44 and/or LR52 are particularly informative markers for evaluating microsatellite instability in a sample.
  • the invention therefore provides novel methods for evaluating levels of microsatellite instability (MSI) in a sample using a minimal MSI marker set, comprising one or more of the microsatellite mono-nucleotide repeat loci GM07, LR44 and/or LR52.
  • MSI microsatellite instability
  • SNP single nucleotide polymorphism
  • the SNPs are typically within 80 base pairs of the microsatellite locus of interest, for example within 50 base pairs. The SNP may even be within 30 base pairs of the microsatellite locus of interest.
  • the single nucleotide polymorphism (SNP) has a minor allele frequency above 0.05.
  • the single nucleotide polymorphism (SNP) has a high heterozygosity.
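The SNP criteria above (distance to the microsatellite locus and minor allele frequency) can be expressed as a simple filter. The sketch below is illustrative only: the `Snp` record, the coordinates and the SNP names are hypothetical, and the heterozygosity formula 2p(1-p) assumes a biallelic SNP in Hardy-Weinberg equilibrium, which the source does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    name: str        # hypothetical identifier
    position: int    # coordinate on the same reference as the locus
    maf: float       # minor allele frequency


def distance_to_locus(position, start, end):
    """Distance in bp from a position to the interval [start, end]."""
    if position < start:
        return start - position
    if position > end:
        return position - end
    return 0


def candidate_snps(snps, start, end, max_distance=80, min_maf=0.05):
    """SNPs within max_distance bp of the locus with MAF above min_maf."""
    return [
        snp
        for snp in snps
        if distance_to_locus(snp.position, start, end) <= max_distance
        and snp.maf > min_maf
    ]


def expected_heterozygosity(maf):
    """2p(1-p) for a biallelic SNP, assuming Hardy-Weinberg equilibrium."""
    return 2.0 * maf * (1.0 - maf)


# Hypothetical microsatellite at positions 1000-1009 and three made-up SNPs.
snps = [Snp("snp_a", 960, 0.30), Snp("snp_b", 1200, 0.40), Snp("snp_c", 1030, 0.01)]
for snp in candidate_snps(snps, start=1000, end=1009):
    print(snp.name, round(expected_heterozygosity(snp.maf), 2))
```

Passing `max_distance=50` or `max_distance=30` applies the tighter distances mentioned above.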
  • the methods provided herein are particularly useful as they provide an accurate means for MSI classification using a minimal number of MSI markers. This provides a new means for reducing cost and complexity of MSI testing, whilst maintaining accuracy.
  • the methods are particularly useful for assessing MSI in a sample from a human subject.
  • the sample comprises nucleic acid (DNA) comprising the loci of interest.
  • the sample may be a tissue sample (e.g. tumour tissue sample) or a biological fluid sample that has been obtained from a human subject.
  • a biological fluid sample encompasses any fluid sample (e.g. liquid biopsy) obtained from the subject.
  • Suitable biological fluid samples include e.g. blood, serum, plasma, urine etc.
  • Testing biological samples using the methods described herein may be particularly useful e.g. for early cancer detection in those at high risk or monitoring for disease recurrence (by assessing circulating tumour or cell free DNA). Methods for obtaining appropriate samples from a subject are well known in the art.
  • the methods provided herein are particularly useful for assessing MSI in a sample obtained from a subject (e.g. human subject).
  • the subject may be a subject that is suspected of being at risk of developing cancer, e.g. colorectal cancer or Lynch syndrome.
  • the subject may be known to be at risk of developing cancer, e.g. colorectal cancer or Lynch syndrome.
  • the subject may be suspected of being predisposed to cancer, e.g. colorectal cancer or Lynch syndrome, or they may be known to be predisposed to cancer, e.g. colorectal cancer or Lynch syndrome.
  • the subject may have cancer e.g. colorectal cancer or Lynch syndrome already.
  • the methods may also be used to diagnose cancer and/or monitor disease recurrence after cancer treatment. Based on the MSI classification, an appropriate class of therapeutics or specific therapeutics for treatment of the cancer may be selected, for example pembrolizumab, irinotecan, bevacizumab, cisplatin, carboplatin or 5-fluorouracil, all of which have been suggested as showing differential effectiveness between the treatment of MMR defective or MSI-H cancers compared to MSS cancers.
  • the methods of the invention may also be useful as companion diagnostics for immune checkpoint blockade therapy.
  • the methods described herein assess MSI using one or more microsatellite mono-nucleotide repeat loci listed in Table 1 below.
  • the invention may therefore utilise one or more (i.e. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24) markers that are present within the validated 24 marker panel of Table 1.
  • GM07, LR44 and LR52 are particularly informative markers for MSI, and accordingly the methods described herein use at least one of these markers. In some examples, it may be preferable to use at least two of these markers, for example, GM07 and LR44. Alternatively, GM07 and LR52, or LR44 and LR52, may be used.
  • the inventors have shown that a combination of GM07, LR52 and LR44 as a triplet of markers on its own can give an MSI classification accuracy of 1. Furthermore, the inventors have shown that all combinations of 6 markers (from those listed in Table 1) that have an MSI classification accuracy >0.999 contain at least one of GM07, LR52 and LR44.
  • the marker combinations provided herein can therefore achieve a clinically acceptable MSI classification accuracy with significantly fewer markers than was previously understood to be necessary, meaning that the associated methods and kits can be significantly cheaper and more efficient.
  • the marker combinations provided herein allow for an MSI classification accuracy of at least 0.9, preferably at least 0.95, more preferably at least 0.999 or 1.
  • the marker combinations provided herein are therefore particularly advantageous in achieving a clinically acceptable MSI classification accuracy.
  • a combination of GM07, LR44 and LR52 may be used as the minimal marker set for accurate MSI classification.
  • a combination of all three informative markers i.e. GM07, LR44 and LR52
  • two or even only one of the three informative markers i.e. two or one of GM07, LR44 and/or LR52
  • the total number of microsatellite loci used e.g.
  • the core markers GM07, LR52 and LR44 provide particular benefits when included in the marker panels.
  • the three core markers as a triplet can give an accuracy of 1, even in the absence of additional microsatellite loci, so that the skilled person will understand that although only one or two of the core markers may need to be provided in larger marker sets (of 4, 5, 6, 7 or more markers selected from Table 1), it may be beneficial to include all three of the core markers in the loci selected for methods of the invention, particularly when few (e.g. only 3 or 4) microsatellite markers of Table 1 are to be used.
  • Methods for evaluating MSI levels may therefore comprise amplifying and sequencing a minimal set of MSI markers that are selected from: (i) GM07 and up to four other microsatellite loci listed in Table 1; or (ii) LR44 and up to four other microsatellite loci listed in Table 1; or (iii) LR52 and up to twenty-three other microsatellite loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44.
  • an evaluation of MSI levels may be performed using GM07 as a marker alone, or in combination with up to four other markers listed in Table 1.
  • GM07 was found by the inventors to be the marker that was present most frequently in the combinations of three, four, five or six markers from Table 1 that were able to classify MSI status with a high level (>0.999) of accuracy.
  • Although GM07 may not provide this level of accuracy on its own, it may be sufficiently informative on its own as an initial (e.g. preliminary) screen for MSI status.
  • it may be combined with other MSI markers (e.g.
  • GM07 may be combined with 0, 1, 2, 3, or 4 additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • GM07 may be combined with 2, 3, or 4 additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • GM07 may also be combined with LR44 and/or LR52 (optionally in combination with other markers from Table 1, and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • an MSI marker panel comprising GM07 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than GM07), which may include up to four other markers from the markers listed in Table 1.
  • the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including GM07).
  • the total number of markers within the panel may be up to 14, up to 10, or up to 6 markers (including GM07).
  • the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative markers such as GM07 are used.
  • LR44 was also found by the inventors to be a marker that was frequently present in the combinations of three, four, five or six markers from Table 1 that were able to classify MSI status with a high level (>0.999) of accuracy. Although LR44 may not provide this level of accuracy on its own, it may be sufficiently informative on its own as an initial (e.g. preliminary) screen for MSI status. Alternatively, it may be combined with other MSI markers (e.g.
  • LR44 may be combined with 0, 1, 2, 3, or 4 additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). For example, LR44 may be combined with 2, 3, or 4 additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • LR44 may also be combined with GM07 and/or LR52 (optionally in combination with other markers from Table 1, and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • an MSI marker panel comprising LR44 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than LR44), which may include up to four other markers from the markers listed in Table 1.
  • the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including LR44).
  • the total number of markers within the panel may be up to 14, up to 10, or up to 6 markers (including LR44).
  • the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative markers such as LR44 are used.
  • an evaluation of MSI levels may be performed using LR52 as a marker alone, or in combination with up to 23 or up to 22 other markers listed in Table 1.
  • LR52 was also found by the inventors to be a marker that was frequently present in the combinations of three, four, five or six markers from Table 1 that were able to classify MSI status with a high level (>0.999) of accuracy. Although LR52 may not provide this level of accuracy on its own, it may be sufficiently informative on its own as an initial (e.g. preliminary) screen for MSI status. Alternatively, it may be combined with other MSI markers (e.g.
  • LR52 may be combined with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 or 22 (and optionally with all 23) additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • LR52 may be combined with 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 or 22 (and optionally with all 23) additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • an MSI marker panel comprising LR52 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than LR52), which may include up to 22 (or optionally up to 23) other markers from the markers listed in Table 1.
  • the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including LR52).
  • the total number of markers within the panel may be up to 14, up to 10, or up to 6 markers (including LR52).
  • the overall cost and simplicity of the MSI classification method is also reduced. This reduction is possible when highly informative markers such as LR52 are used.
  • An evaluation of MSI levels may be performed using LR52 in combination with GM07 and/or LR52 (optionally in combination with other markers from table 1 , and/orwith any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • LR52 and GM07 may be combined with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or 21 (and optionally with all 22) additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • a MSI marker panel comprising LR52 and GM07 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than LR52 and GM07), which may include up to 21 (or optionally up to 22) other markers from the markers listed in Table 1.
  • the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including LR52 and GM07).
  • the total number of markers within the panel may be up to 14, up to 10, or up to 6 markers (including LR52 and GM07).
  • the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative markers such as LR52 and GM07 are used.
  • An evaluation of MSI levels may alternatively be performed using LR52 in combination with LR44 (optionally in combination with other markers from Table 1, and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • LR52 and LR44 may be combined with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or 21 (and optionally with all 22) additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • a MSI marker panel comprising LR52 and LR44 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than LR52 and LR44), which may include up to 21 (or optionally up to 22) other markers from the markers listed in Table 1.
  • the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including LR52 and LR44).
  • the total number of markers within the panel may be up to 14, up to 10, or up to 6 markers (including LR52 and LR44).
  • the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative markers such as LR52 and LR44 are used.
  • An evaluation of MSI levels may alternatively be performed using LR52 in combination with both LR44 and GM07 (optionally in combination with other markers from Table 1, and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • LR52, LR44 and GM07 may be combined with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 (and optionally with all 21) additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • a MSI marker panel comprising LR52, GM07 and LR44 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than LR52, GM07 and LR44), which may include up to 20 (or optionally up to 21) other markers from the markers listed in Table 1.
  • the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including LR52, GM07 and LR44).
  • the total number of markers within the panel may be up to 14, up to 10, or up to 6 markers (including LR52, GM07 and LR44).
  • as the total number of markers may be reduced, the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative markers such as LR52, GM07 and LR44 are used.
  • one, two or all three of the following loci from Table 1 are not used as markers in the methods described herein: LR48, and/or LR17, and/or LR43.
  • the methods described herein do not include amplifying LR48 (on its own or in combination with LR17).
  • the methods described herein do not include amplifying LR17 (on its own, or in combination with LR43).
  • the methods described herein do not include amplifying LR43 (on its own or in combination with LR48).
  • the methods described herein do not include amplifying any of LR43, LR48, or LR17.
  • Useful combinations of MSI markers from those listed in Table 1 are provided in Table 6, which provides different three, four, five and six marker combinations that can be used as a “core set” of markers (in other words, a minimal set of markers) for MSI classification because, on their own, these combinations of three, four, five and six markers are able to accurately classify MSI status in a sample.
  • Any of the three, four, five or six marker combinations shown in Table 6 may be used in the methods described herein, either on their own, or in combination with any of the other markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status) to improve MSI classification accuracy and/or provide some redundancy within the MSI classification method.
  • any of the combinations provided in Table 6 may be combined with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or 21 additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • a MSI marker panel comprising any of the combinations listed in Table 6 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than the core markers of the combination listed in Table 6), which may include up to 20 (or optionally 21) other markers from the markers listed in Table 1.
  • the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including the core markers of the combination listed in Table 6).
  • as the total number of markers is reduced, the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative combinations of markers such as those listed in Table 6 are used.
  • a 6-marker panel of GM07, GM11, GM14, LR36, LR44 and LR52 may be used.
  • GM07, GM11, GM14, LR36, LR44 and LR52 may be used in the methods described herein, either on its own, or in combination with any of the other markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status) to improve MSI classification accuracy and/or provide some redundancy within the MSI classification method.
  • the combination of GM07, GM11, GM14, LR36, LR44 and LR52 may be combined with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 or 17 (and optionally all 18) additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status).
  • a MSI marker panel comprising combination of GM07, GM11, GM14, LR36, LR44 and LR52 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than combination of GM07, GM11, GM14, LR36, LR44 and LR52), which may include up to 17 (or optionally 18) other markers from the markers listed in Table 1.
  • the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6 etc markers (including GM07, GM11, GM14, LR36, LR44 and LR52).
  • markers including GM07, GM11, GM14, LR36, LR44 and LR52.
  • the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative combinations of markers such as GM07, GM11, GM14, LR36, LR44 and LR52 are used.
  • the inventors also describe how they validated a 24-marker panel (using the 24 markers listed in Table 1). Therefore, in the methods and kits described herein the 24 marker set of Table 1 may be used.
  • the combination of 24 markers listed in Table 1 may be used in the methods described herein, either on its own, or in combination with any of the other markers that are not listed in Table 1 but that may provide additional information on MSI status to improve MSI classification accuracy and/or provide some redundancy within the MSI classification method.
  • the number of microsatellite loci of Table 1 may be limited, for example fewer than 16 of the microsatellite loci of Table 1 may be included, or fewer than 12, or fewer than 10. From 3 to 10 microsatellite loci may be selected from Table 1 for use in the methods and kits of the invention, for example from 3 to 8.
  • the methods described herein recite the minimum (and in some cases also maximum) number of loci listed in Table 1 that can be used. For the avoidance of doubt, where a maximum number of loci from Table 1 is recited (e.g. “up to 4 other microsatellite loci listed in Table 1”, “up to twenty-three other microsatellite loci listed in Table 1”, “up to 22 other microsatellite loci listed in Table 1”, “up to 15 loci listed in Table 1”, “up to 9 other microsatellite loci listed in Table 1” etc) this maximum number only refers to the markers listed in Table 1. In other words, additional markers that are not listed in Table 1 that may also be useful (e.g. because they provide information about some other aspect of the subject or sample) may be included in the methods recited herein. A non-limiting example of such a marker is AP003532_2.
  • the methods provided herein amplify the selected loci (optionally together with an associated SNP) to generate a microsatellite amplicon, which is subsequently sequenced.
  • molecular inversion probes (MIPs)
  • single-molecule molecular inversion probes (smMIPs)
  • any other appropriate technique for amplifying the selected loci may be used.
  • Alternative appropriate methods are well known in the art and include conventional PCR. In other words, the methods may use any appropriate nucleic acid sequence (e.g. primer and/or probe) that enables amplification of the selected marker loci.
  • the amplification step may amplify each selected locus individually (in a separate reaction), or may comprise co-amplifying some or all of the selected loci in a multiplex amplification reaction.
  • Suitable primers and/or probes may be selected for the chosen method using standard techniques. For example, in methods wherein a single nucleotide polymorphism (SNP) within a short distance of the selected microsatellite loci is to be amplified together with the locus in order to generate a single amplicon encompassing the locus plus SNP, primers and/or probes that amplify both the microsatellite loci and the SNP within a short distance of the microsatellite loci need to be used.
  • the methods provided herein subsequently sequence the microsatellite amplicons generated in the amplification step.
  • Any appropriate sequencing method may be used. For example, high throughput or next generation sequencing, sequencing-by-synthesis, ion semiconductor sequencing or ion torrent sequencing and/or pyrosequencing may be used.
  • the methods provided herein then compare the sequences from the microsatellite amplicons to predetermined sequences and determine any deviation, indicative of instability, from the predetermined sequences.
  • deviation may be in the form of an insertion or deletion when compared to the predetermined sequences.
  • the methods described herein may include the step of determining allelic imbalance. Assessing whether length variants are concentrated in sequence reads from one SNP allele offers an additional criterion to differentiate between PCR artefacts and mutations that occur in vivo, and can provide additional discrimination between MSI and MSS samples. This is because PCR artefacts are likely to affect both alleles equally, whereas instability is a stochastic event affecting a single allele at a time. This can lead to bias in the levels of instability observed between the alleles at a single microsatellite marker, even if both are unstable.
  • the methods may therefore allow for the evaluation of the biological significance of any microsatellite instability in the sample, and may comprise amplifying both the microsatellite loci and an SNP within a short distance of it in a single amplicon, using the primers, and, for heterozygous SNPs, determining whether there is a bias between indel frequencies for the two alleles of the sample.
  • (i) comprise GM07 and up to four other microsatellite loci listed in Table 1; or
  • (ii) comprise LR44 and up to four other microsatellite loci listed in Table 1; or
  • each microsatellite locus has a single nucleotide polymorphism (SNP) locus within a short distance of the microsatellite locus and said amplifying step amplifies both the microsatellite locus and the associated SNP in a single amplicon;
  • the preferred features of this method e.g. details of the microsatellite loci, amplification step, etc, may be in accordance with those described above for the methods for evaluating levels of microsatellite instability in a sample, since the methods are clearly related.
  • a method as above may be useful for identifying mismatch repair defects, wherein deviation from the predetermined sequences for two or more microsatellite mono-nucleotide repeat loci is indicative of a mismatch repair defect.
  • a method as above may be useful for identifying MSI-H, wherein deviation from the predetermined sequences for two or more microsatellite mono-nucleotide repeat loci is indicative of the sample having high levels of microsatellite instability (MSI-H).
  • a kit is also provided herein for use in the methods of the invention.
  • the kit may comprise primers and/or probes for amplifying microsatellite loci in accordance with the above.
  • the kit may also comprise a thermostable polymerase and/or labeled dNTPs or analogs thereof.
  • the labeled dNTPs or analogs thereof may be fluorescently labeled.
  • the kit may comprise, as well as the primers and/or probes for amplifying the microsatellite loci in accordance with the methods of the invention, reagents necessary for carrying out the methods of the invention, for example enzymes, dNTP mixes, buffers, PCR reaction mixes, chelating agents and/or nuclease-free water.
  • the kit may comprise instructions for carrying out a method of the invention.
  • MSI markers can be used to accurately classify samples as either MSI or MSS.
  • the former corresponds to samples classified as MSI-H by fragment analysis while the latter includes MSS samples and samples with low levels of instability (MSI-L).
  • knowing whether tumours have resulted from a breakdown in mismatch repair is important in clinical management of the individual and can help prevent future cancers in those families where there is a germline molecular defect.
  • a scalable, reliable MSI test will have clinical utility while modest costs and the ability to link this analysis to routine pathology assessment will help to ensure rapid adoption and facilitate further molecular approaches to tumour profiling and precision medical care.
  • “microsatellite” or “microsatellite regions” as used herein refers to mono-, di-, tri-, tetra-, penta- or hexanucleotide repeats in a nucleotide sequence, consisting of at least two repeat units and with a minimal length of 6 bases.
  • a particular subclass of microsatellites includes the homopolymers.
  • microsatellite mono-nucleotide repeat loci “microsatellite loci”, “microsatellite MNR loci” and “marker” are used interchangeably herein, unless the context specifically indicates otherwise.
  • “Homopolymer” as used herein refers to a microsatellite region that is a mononucleotide repeat of at least 6 bases; in other words, a stretch of at least 6 consecutive A, C, T or G residues if looking at the DNA level. Most particularly, when determining microsatellites, one looks at genomic DNA of a subject (or of genomic DNA of a cancer present in the subject).
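  • The homopolymer definition above (a run of at least 6 identical A, C, G or T residues) can be illustrated with a short sketch. The function name `find_homopolymers` and the example sequence are hypothetical and are not taken from Table 1; this is an illustration of the definition only, not the marker selection procedure used by the inventors.

```python
import re

# Runs of six or more identical bases (A, C, G or T), per the definition above.
HOMOPOLYMER = re.compile(r"A{6,}|C{6,}|G{6,}|T{6,}")


def find_homopolymers(sequence):
    """Return (start, base, length) for each homopolymer of at least 6 bases."""
    hits = []
    for match in HOMOPOLYMER.finditer(sequence.upper()):
        run = match.group(0)
        hits.append((match.start(), run[0], len(run)))
    return hits


# Hypothetical sequence: an A8 run, a C5 run (too short to qualify) and a T7 run.
example = "GATTAC" + "A" * 8 + "GCT" + "C" * 5 + "G" + "T" * 7 + "AG"
for start, base, length in find_homopolymers(example):
    print(f"{base}{length} homopolymer at position {start}")
```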
  • MSI status refers to the presence of microsatellite instability (MSI), a germline or somatic change in the number of repeated DNA nucleotide units in microsatellites.
  • MSI status can be one of three discrete classes: MSI-H (also referred to as MSI-high, MSI positive or MSI); MSI-L (also referred to as MSI-low); or microsatellite stable (MSS), also referred to as absence of MSI.
  • typically, to be classified as MSI-H, at least 30% of the markers used to classify MSI status need to score positive, while for the MSS classification, 0% score positive. If an intermediate number of markers scores positive, the tumour is classified as MSI-L.
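  • The conventional three-class rule described above can be written as a small function. This is a minimal sketch of that threshold rule only; it is not the probabilistic classifier used in the methods described herein, and the function name `classify_msi_status` is an assumption made for illustration.

```python
def classify_msi_status(marker_calls):
    """Classify a sample from per-marker calls (True = marker scores positive)."""
    if not marker_calls:
        raise ValueError("at least one marker call is required")
    positives = sum(bool(call) for call in marker_calls)
    if positives == 0:
        return "MSS"  # no marker scores positive
    # Integer comparison avoids floating-point issues at exactly 30%.
    if positives * 100 >= 30 * len(marker_calls):
        return "MSI-H"  # at least 30% of markers score positive
    return "MSI-L"  # intermediate number of positive markers


# Hypothetical calls for a 10-marker panel with one positive marker.
print(classify_msi_status([True] + [False] * 9))
```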
  • An “indel” as used herein refers to a mutation class that includes insertions, deletions, and combinations thereof.
  • An indel in a microsatellite region results in a net gain or loss of nucleotides.
  • the presence of an indel can be established by comparing it to DNA in which the indel is not present (e.g. comparing DNA from a tumour sample to germline DNA from the subject with the tumour), or, especially in case of monomorphic microsatellites or homopolymers, by comparing it to the known length of the microsatellite, particularly by counting the number of repeated units.
  • cancer refers to different diseases involving unregulated cell growth, also referred to as malignant neoplasm.
  • the term “tumour” is used as a synonym in the application. It is envisaged that this term covers all solid tumour types (carcinoma, sarcoma, blastoma), but it also explicitly encompasses non-solid cancer types such as leukemia.
  • a “tumour sample” encompasses both solid tumour samples (e.g. tissue biopsies) as well as biological fluid samples (e.g. those that have been obtained or isolated from a bodily fluid such as urine, blood, plasma, serum etc).
  • the sample can be described as a “sample of tumour DNA”.
  • the tumour DNA may be present within a bodily fluid such as urine, blood, plasma, serum etc and may be isolated from the bodily fluid prior to performing the methods described herein. Any appropriate method for obtaining or isolating the tumour DNA may be used. Several appropriate methods are well known in the art.
  • a sample of tumour DNA has at one point been isolated from a subject, particularly a subject with cancer.
  • it has undergone one or more forms of pre-treatment (e.g. lysis, fractionation, separation, purification) in order for the DNA to be sequenced, although it is also envisaged that DNA from an untreated sample is sequenced.
  • the noun “subject” refers to an individual vertebrate, more particularly an individual mammal, most particularly an individual human being.
  • A “subject” as used herein is typically a human, but can also be a mammal, particularly domestic animals such as cats, dogs, rabbits, guinea pigs, ferrets, rats, mice, and the like, or farm animals like horses, cows, pigs, goats, sheep, llamas, and the like.
  • a subject can also be a non-mammalian vertebrate, like a fish, reptile, amphibian or bird; in essence any animal which can develop cancer fulfils the definition.
  • Lynch syndrome refers to an autosomal dominant genetic condition which has a high risk of colon cancer as well as other cancers including endometrium, ovary, stomach, small intestine, hepatobiliary tract, upper urinary tract, brain, and skin cancer.
  • the increased risk for these cancers is due to inherited mutations that impair DNA mismatch repair.
  • the old name for the condition is HNPCC.
  • As used herein, the terms “microsatellite loci”, “repeat” and “marker” are used interchangeably where the context allows.
  • “GM07” and “GM7” are used interchangeably herein.
  • “AP003532_2” and “AP0035322” are also interchangeable.
  • the inventors have amplified and sequenced 24 microsatellite markers and trained a classifier using 98 CRCs, which accommodates marker specific sensitivities to MSI. Sample classification achieved 100% concordance with the MSI Analysis System v1.2 (Promega) in three independent cohorts, totalling 220 CRCs. The inventors have therefore demonstrated the clinical validity of a specific 24 microsatellite marker panel described herein.
  • the markers of the 24 microsatellite loci marker panel are listed in Table 1.
  • the inventors then used backward-forward stepwise selection to identify a 6-marker subset (from the 24 original markers) which shows equal accuracy to the 24-marker panel. Surprisingly, assessment of assay detection limits tested herein showed that the 6-marker subset is only marginally less robust to sample variables than the 24-marker panel.
  • the 6-marker subset therefore represents a very useful reduced marker panel for accurate detection of MSI in a sample.
  • the markers of the 6 microsatellite marker panel are GM07, GM11, GM14, LR36, LR44 and LR52 (see Table 1).
  • the inventors have performed further computational analysis of the original 24 marker panel to identify particularly informative individual markers and combinations. They have surprisingly identified that MSI classification is still accurate when the marker number is further reduced, and that the markers GM07, LR44 and/or LR52 are particularly informative.
  • GM07, LR44 and/or LR52 are much more discriminating than other markers and are present at a much higher frequency in small marker sets which have high MSI classification accuracy.
  • the presence of one, two or three of these markers in an MSI marker panel enables a reduced total number of markers to be used without loss of MSI classification accuracy.
  • the original 24 marker panel was subdivided into different subgroups of three, four, five or six markers and the MSI classification accuracy for each subgroup of markers was determined, using the original data generated for each of these markers in the original 24 marker dataset.
  • the inventors combined the 24 microsatellite loci markers of Table 1 into subgroups of 6 markers, resulting in 134,596 different 6-marker subgroup combinations. Of these 134,596 different 6-marker subgroup combinations only 1,132 (0.8%) gave an MSI classification accuracy of >0.999.
  • GM07, LR52 and LR44 are all present at frequencies of 0.75 or above, with no other marker reaching a frequency of 0.3. As the number of markers used decreases, the requirement for using one or more of these 3 markers (GM07/LR52/LR44) increases, and they are the markers present in the only 3mer combination that is accurate out of 2024 possible triplets (Table 2 columns 8-9). Histograms summarizing the accuracy of all markers singly, and in all combinations of 2, 3, 4, 5 and 6 markers, are presented in Figure 7. Only 0.05% of 3mers (1/2024), 0.2% of 4mers (22/10626), 0.4% of 5mers (162/42504), and 0.8% of 6mers (1132/134596) give an accuracy of >0.999.
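  • The subset counts quoted above follow directly from the number of ways of choosing 3, 4, 5 or 6 markers out of 24. The sketch below reproduces that arithmetic; the function name `subset_count` is an assumption, and the counts of accurate combinations are simply those reported in the text.

```python
from math import comb

TOTAL_MARKERS = 24

# Number of combinations with accuracy >0.999 per subset size, as reported above.
ACCURATE_COMBINATIONS = {3: 1, 4: 22, 5: 162, 6: 1132}


def subset_count(size, total=TOTAL_MARKERS):
    """Number of distinct marker subsets of the given size."""
    return comb(total, size)


for size, n_accurate in ACCURATE_COMBINATIONS.items():
    n_subsets = subset_count(size)
    share = 100 * n_accurate / n_subsets
    print(f"{size} markers: {n_accurate}/{n_subsets} accurate ({share:.2f}%)")
```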
  • Table 2 Representation of individual markers among six, five, four and three marker combinations with accuracy >0.999. Number of appearances and frequencies are shown.
  • GM07, LR44 and/or LR52 are particularly useful markers for MSI.
  • Reduced MSI marker panels and assays are therefore provided herein using at least one, at least two or all three of GM07, LR44 and/or LR52 for accurate MSI classification.
  • Table 3 Occurrence of triplet marker combinations within all 6 marker combinations with >0.999 accuracy. Information on triplets with a frequency below 0.05 is aggregated for space considerations.
  • MSI classification is accurate and reproducible using a 24 MSI marker panel or a 6 MSI marker panel
  • smMIPs were designed for the 17 short MNR markers previously described in the singleplex assay of Redford et al (2016). Two markers had to be excluded from the panel due to poor amplification by the smMIP protocol. To supplement this reduced panel, smMIPs were successfully designed for an additional 9 markers taken from the extended set of Redford et al (2016), giving a total of 24 short MNR markers (data not shown). Having defined the marker panel, a training cohort of 51 MSI-H and 47 MSS CRCs was used to estimate classifier parameters. Reclassification of the training samples using these parameters achieved 100% sensitivity (95% CIs: 93.0-100.0%) and 100% specificity (95% CIs: 92.5-100.0%) (Figure 1A).
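  • The confidence limits quoted above are consistent with an exact (Clopper-Pearson) binomial interval for the case where every sample is classified correctly, although the text does not state which interval method was used. Under that assumption, the lower limit for n out of n correct reduces to (alpha/2)^(1/n). The function name below is hypothetical.

```python
def exact_lower_limit_all_correct(n, confidence=0.95):
    """Lower Clopper-Pearson limit when all n samples are classified correctly.

    Assumes an exact binomial interval; the upper limit is 1.0 in this case.
    """
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


# Cohort sizes from the training set described above.
for label, n in (("sensitivity, 51 MSI-H", 51), ("specificity, 47 MSS", 47)):
    lower = 100 * exact_lower_limit_all_correct(n)
    print(f"{label}: {lower:.1f}-100.0%")
```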
  • MSI classification detects 3% MSI-H cell line DNA in sample mixtures
  • MSI classification is accurate when 10 or more molecules are sequenced per marker
  • the MSI assay presented here achieved 100% accuracy of MSI classification in 220 CRCs, relative to the MSI Analysis System v1.2 (Promega), using only tumour DNA and as few as 6 microsatellite markers.
  • We found no improvement to classifier performance using molecular barcodes for sequencing error correction (Hiatt et al., 2013). This is likely due to our use of short MNRs with flanking SNPs, selected from genome-wide data, and our classification method. Shorter microsatellites have lower PCR and sequencing error rates compared to longer microsatellites (Fazekas et al., 2010), while the SNPs flanking the microsatellites provide additional discrimination between error and true microsatellite mutations.
  • Marker number has a significant impact on cost, and with only 6 markers, plus BRAF c.1799 for streamlined LS screening (Newland et al., 2017), reagent cost estimates range from £5.50-6.77 per sample, depending on the capacity of the MiSeq kit used.
  • the 24 marker set may, however, be preferred for a variety of reasons. It could provide protection against allele or marker drop out, due to technical variation, somatic events within tumours, or population specific sequence variants on marker length or stability. It may also enhance the clinical utility of the assay as it increases the power of the internal sample traceability provided by the SNPs linked to each marker. For instance, using the allele frequencies observed in the training cohort, the probability of any two individuals sharing the same genotype is 3.8x10^-3 from the 6 marker subset, but 3.6x10^-10 when 24 markers are used (Tables 10A, 10B and 10C).
  • the clinical demand for MSI analysis may increase, driven by the need to predict patient response to immune checkpoint blockade therapy across multiple cancer types (Le et al., 2017).
  • the frequency of mutations in non-coding microsatellites has been shown to be equivalent between different cancer types (Cortes-Ciriano et al., 2017), and the 24 marker panel can detect CMMRD from PBL DNA (Gallon et al., 2019), making it likely that the assay will be suitable for MSI detection in extra-colonic tissues.
  • the MSI assay outlined here is accurate, reproducible, robust to sample heterogeneity, and includes both internal quality controls and sample identification.
  • the automatable laboratory workflow and analysis, and the need for as few as 6 microsatellite markers at moderate read depths provides a cheap and scalable option for high-throughput MMR deficiency testing.
  • the greatly reduced marker number also means that the assay is potentially suitable for application to the detection of MSI within liquid biopsy material, for example to allow early detection or monitoring of tumours with MSI.
  • H9 embryonic stem cell line (WiCell) DNA was a gift from L. Lako (Newcastle University, UK), and used as an MSS control.
  • HCT116 (CCL-247, ATCC) and K562 (CCL-243, ATCC) cells were gifted by J. Irving (Newcastle University, UK).
  • HCT116 and K562 cells were grown in RPMI growth medium containing 2mM L-glutamine (Gibco), 10% fetal bovine serum (Gibco), 60µg/ml penicillin and 100µg/ml streptomycin (Gibco) at 37°C and 5% CO2.
  • HCT116 cells were passaged or harvested at 80-90% confluence by decanting expired growth medium, washing in 5ml PBS (Gibco), and detaching the cells using 0.05% Trypsin-EDTA (Gibco).
  • K562 cells were passaged or harvested at a density of 1x10^6 cells/ml.
  • DNA extracted from HCT116 CRC cell line (MLH1 deficient) was used as an MSI-H control.
  • DNA extracted from K562 chronic myeloid leukaemia cell line was used as an MSS control.
  • MSI-H and MSS samples were created using HCT116 and PBL DNAs (Table 9).
  • 9 samples comprising 3 fresh tissues (HCT116, H9, and K562 cell lines) and 6 FFPE tissues (3 MSI-H CRCs: N021, N068, and N073, and 3 MSS CRCs: N033, N036, and N056), were 2-fold serially diluted in 10mM Tris-HCl pH 8.5.
  • the marker panel includes 24 MNRs, previously published by Redford et al (2016), for MSI classification, as well as BRAF c.1799 to screen for sporadic MSI-H CRCs (Newland et al., 2017). MIPgen (Boyle et al., 2014) was used to generate smMIP sequences for each marker.
  • MIPgen parameters were: tag size 6,0, minimum capture size 120, and maximum capture size 150.
  • smMIP designs were selected by the following criteria: no common single nucleotide polymorphisms (SNPs) in the smMIP extension or ligation arms, logistic score >0.8, and successful amplification of loci. Marker loci and smMIP sequences are detailed in Table 7.
  • the probability of any two patients having the same genotype can be calculated from the observed frequencies of each SNP genotype in a marker.
  • the product of the probabilities for each SNP can be used to determine the probability of any two patients having the same genotype across multiple markers; to avoid the potential of linkage disequilibrium where a marker has multiple SNPs, the SNP with the lowest match probability was used.
  • for the 24-marker panel, the probability of two patients having the same genotype is 3.58x10^-10.
  • for the 6-marker subset, the probability of two patients having the same genotype is 3.82x10^-3.
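  • The match-probability calculation described above can be sketched as follows. The text states only that the probability is calculated from observed genotype frequencies and multiplied across markers; taking the per-SNP match probability as the sum of squared genotype frequencies is an assumption made here, as are the function names and the example frequencies (which are not the values of Tables 10A-10C).

```python
from math import prod


def genotype_match_probability(genotype_freqs):
    """Probability that two random individuals share a genotype at one SNP.

    Assumption: sum of squared genotype frequencies (two independent draws).
    """
    return sum(freq * freq for freq in genotype_freqs)


def panel_match_probability(per_marker_freqs):
    """Product of per-marker match probabilities, one SNP per marker."""
    return prod(genotype_match_probability(freqs) for freqs in per_marker_freqs)


# Hypothetical genotype frequencies (hom-ref, het, hom-alt) for three markers.
panel = [(0.25, 0.50, 0.25), (0.49, 0.42, 0.09), (0.36, 0.48, 0.16)]
print(f"{panel_match_probability(panel):.3e}")
```

  • Because each factor is below 1 whenever a SNP is polymorphic, adding markers can only lower the match probability, which is the traceability advantage of the larger panel.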
  • smMIPs and primers for amplification and sequencing (Table 12 below), were synthesised by and purchased from Metabion.
  • smMIP phosphorylation and pooling: smMIPs were individually phosphorylated using 10U of T4 Polynucleotide Kinase (NEB), 1X T4 DNA Ligase buffer (NEB), and 1 mM of un-phosphorylated smMIP in a 100µl reaction volume, and incubated at 37°C for 45 minutes, followed by 80°C for 20 minutes.
  • Phosphorylated smMIPs were pooled, with specified volumes for each smMIP to equalise the number of reads from each marker locus, and diluted using TE buffer (Sigma) to an average concentration of 0.1 nM (Table 11).
  • Table 11 Read-balancing smMIPs; 100-fold dilution to achieve average smMIP concentration of 0.1nM. Read-balancing the smMIP pool reduced the coefficient of variation of reads assigned to different markers from 68% to 35%.
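  • The read-balancing idea can be illustrated with a short sketch. The text does not state how the pooling volumes in Table 11 were derived; scaling each smMIP volume inversely to its observed read count is an assumption made here, as are the function names, the use of the population standard deviation, and the example read counts.

```python
from statistics import mean, pstdev


def coefficient_of_variation(read_counts):
    """Coefficient of variation (%) of reads per marker (population SD)."""
    return 100.0 * pstdev(read_counts) / mean(read_counts)


def balancing_volumes(read_counts, base_volume=1.0):
    """Relative pooling volumes, inversely proportional to observed reads.

    Assumption: markers with fewer reads receive proportionally more probe.
    """
    average = mean(read_counts)
    return [base_volume * average / count for count in read_counts]


# Hypothetical reads per marker from an unbalanced pilot pool.
pilot_reads = [500, 1000, 1500, 1000]
print(f"CV before balancing: {coefficient_of_variation(pilot_reads):.1f}%")
print([round(volume, 2) for volume in balancing_volumes(pilot_reads)])
```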
  • thermocycler SensoQuest thermocycler
  • Herculase II Polymerase was used during extension and amplification steps for increased fidelity of microsatellite replication (Fazekas et al., 2010).
  • the thermocycler programme used 98°C for 2 minutes, 30 cycles of 98°C for 15 seconds, 60°C for 30 seconds and 72°C for 30 seconds, followed by 72°C for 2 minutes.
  • sample DNA 100ng was used as template unless stated otherwise: the input quantity of CRC sample DNA varied depending on quantity available (Table 8).
  • smMIP reaction products 240-270bp were analysed using 3% Agarose gel electrophoresis at 80mV for 60 minutes, or a QIAxcel (Qiagen).
  • Library preparation and sequencing: smMIP amplicons were purified using Agencourt AMPure XP Beads (Beckman Coulter), diluted to 4nM in 10mM Tris pH 8.5, and pooled in equal volumes. Libraries were sequenced on a MiSeq (Illumina) using the GenerateFastq workflow, paired end sequencing and custom sequencing primers (Hiatt et al., 2013); sequencing run statistics are presented in Table 13. Fastq files are available from the EMBL-EBI European Nucleotide Archive, accession number PRJEB28394.
  • Sequence analysis and MSI classification were carried out as described by Redford et al (2016), which is incorporated herein by reference in its entirety; see also WO2018/037231, which is also incorporated herein by reference in its entirety.
  • sequencing reads were aligned to the hg19 reference genome using BWA mem (BWA v0.6.2) (Li & Durbin, 2010).
  • smMIP-based sequencing assesses the regions of interest in both orientations, and only base calls supported by both reads of a pair were processed further.
  • the MSI classifier uses both the frequency and allelic bias of deletions in the microsatellite markers to type each sample. The deletion frequency was defined as the proportion of reads that have a microsatellite length less than the reference length.
  • the allelic bias of deletions i.e. whether deletions are preferentially observed in reads carrying one of the SNP alleles, was assessed using Fisher's Exact test p value.
  • deletion frequency and allelic bias were dichotomised into two binary traits; deletion frequency is assessed by whether it is above or below the 95th percentile of the training MSS samples, and allelic bias is assessed by whether the p value is above or below 0.05.
  • a training cohort of samples was used to estimate the probabilities of observing the different combinations for each marker in MSI-H and MSS tumours.
  • the (posterior) probability that a new sample is MSI- H versus MSS can then be estimated from its microsatellite deletion frequencies, and the allelic bias of deletions, using a naive Bayes approach.
  • a prior probability of 0.85 that a sample is MSS was used.
  • the assay score represents the decadic logarithm of the odds that a sample is MSI-H versus MSS. Scores >0 classify a sample as MSI-H, and scores <0 classify a sample as MSS.
  • the number of molecular barcodes detected can be used to assess the accuracy of the sample dilution series.
  • One sample shows a lower than expected number of molecular barcodes from 6.25ng of template DNA, which is also visible from a reduced intensity of amplicon in the gel image ( Figure 6A), suggesting there was an error in reaction preparation.
  • Additional sample dilution series were simulated by resampling of reads. For each marker in a sample, reads were grouped by molecular barcode, and the microsatellite length and SNP genotype associated with that molecular barcode was summarised from that found in the majority of reads in the group. A predetermined number of molecular barcodes was selected to simulate sample dilution. Subsequently, reads for the simulated sample were generated to a depth equal to that of the original sample, with each read having a defined microsatellite length and SNP genotype by random sampling of the selected molecular barcodes.
  • MIPgen optimized modeling and design of molecular inversion probes for targeted resequencing.
  • Hereditary colorectal cancer syndromes American Society of Clinical Oncology Clinical
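The classification steps summarised in the points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the inventors' implementation: the function names `msi_score` and `classify`, the data layout, the marker names and every probability in `EXAMPLE_LIKELIHOODS` are hypothetical. Only the two binary traits per marker, the prior probability of 0.85 for MSS, the base-10 logarithm of the odds and the threshold at zero are taken from the text.

```python
import math

# Hypothetical per-marker probabilities of each (high deletion frequency,
# allelic bias) combination, as would be estimated from a training cohort.
_EXAMPLE_TABLE = {
    "MSI-H": {(True, True): 0.60, (True, False): 0.20,
              (False, True): 0.10, (False, False): 0.10},
    "MSS":   {(True, True): 0.01, (True, False): 0.04,
              (False, True): 0.05, (False, False): 0.90},
}
EXAMPLE_LIKELIHOODS = {"marker_A": _EXAMPLE_TABLE, "marker_B": _EXAMPLE_TABLE}


def msi_score(observations, likelihoods, prior_mss=0.85):
    """Return the log10 posterior odds of MSI-H versus MSS (naive Bayes).

    observations: {marker: (high_deletion_frequency, allelic_bias)} booleans.
    likelihoods:  {marker: {"MSI-H": {traits: p}, "MSS": {traits: p}}}.
    """
    # Start from the prior odds of MSI-H versus MSS.
    log_odds = math.log10((1.0 - prior_mss) / prior_mss)
    for marker, traits in observations.items():
        p_msi = likelihoods[marker]["MSI-H"][traits]
        p_mss = likelihoods[marker]["MSS"][traits]
        # Markers are treated as independent, so their log ratios add up.
        log_odds += math.log10(p_msi / p_mss)
    return log_odds


def classify(score):
    # The text defines >0 as MSI-H and <0 as MSS; treating a score of
    # exactly 0 as MSS is an assumption of this sketch.
    return "MSI-H" if score > 0 else "MSS"
```

With the hypothetical table above, a sample showing both traits at both markers scores well above zero, while a sample showing neither trait scores below zero. A real table would need every probability to be non-zero, for example by smoothing the training counts.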

Abstract

The invention provides novel methods for evaluating levels of microsatellite instability in a sample or evaluating the biological significance of sequence variation identified during sequencing. Corresponding kits are also provided.

Description

METHODS OF IDENTIFYING MICROSATELLITE INSTABILITY
Field of the invention
The invention provides novel methods for evaluating levels of microsatellite instability in a sample or evaluating the biological significance of sequence variation identified during sequencing. Corresponding kits are also provided.
Background to the invention
Microsatellites are regions of genomic DNA comprising simple repetitive sequences where 1-6bp long units are tandemly repeated, often 5-50 times. Microsatellite loci are classified based on the length of the smallest repetitive unit. For example, loci with repetitive units of 1 to 5 base pairs in length are termed “mono-nucleotide”, “di-nucleotide”, “tri-nucleotide”, “tetra-nucleotide”, and “penta-nucleotide” repeat loci, respectively.
Microsatellite loci in normal genomic DNA of most diploid species, including humans, are present in two copies, or alleles. By and large, microsatellite alleles are normally maintained at constant length in a given individual and its descendants. However, microsatellites are known to be unstable during meiotic and mitotic replication in eukaryotes and prokaryotes. Instability in the length of microsatellites has been observed in some tumours. Factors which affect the stability of microsatellites include the length of the microsatellite, repeat unit length, base composition, and the sequence surrounding the microsatellite. For example, dinucleotide repeats tend to be more mutable than tetranucleotide repeats of the same length. Microsatellite instability (MSI) often occurs due to a failure to correct DNA replication errors as a result of defects in mismatch repair (MMR) genes. Testing for MSI in tumours is therefore used to identify MMR gene defects. MSI has also been shown to predict tumour response to immune checkpoint blockade therapy, irrespective of tissue of origin (Le et al., 2017). Accordingly, detecting MSI may be useful for several different therapeutic applications.
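The naming rule in the preceding paragraph can be illustrated with a short sketch. The function names and the example tracts are hypothetical and serve only to show how the smallest repetitive unit determines the class of a locus.

```python
UNIT_NAMES = {
    1: "mono-nucleotide",
    2: "di-nucleotide",
    3: "tri-nucleotide",
    4: "tetra-nucleotide",
    5: "penta-nucleotide",
}


def smallest_repeat_unit(tract):
    """Return the shortest unit whose tandem repetition spells out `tract`."""
    for size in range(1, len(tract) + 1):
        if len(tract) % size == 0 and tract[:size] * (len(tract) // size) == tract:
            return tract[:size]
    return tract  # only reached for an empty tract


def classify_locus(tract):
    """Name a repeat tract by the length of its smallest repetitive unit."""
    unit = smallest_repeat_unit(tract)
    return UNIT_NAMES.get(len(unit), f"{len(unit)}-nucleotide")
```

For instance, a run of eight adenines is classed by its one-base unit, whereas `CACACACA` is classed by the two-base unit `CA`.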
Mismatch repair (MMR) deficiency affects approximately one in six colorectal cancers (CRCs) (Boland et al., 1998). Lynch syndrome (LS), an inherited predisposition to cancer caused by germline pathogenic variants affecting one allele of an MMR gene, accounts for approximately one in five MMR deficient CRCs (Hampel et al., 2008). Assessment of MSI (or MMR) status can inform patient management and is recommended in all CRCs by national and international guidelines to screen for LS (Balmana et al., 2013; Stoffel et al., 2015; Newland et al., 2017). Once identified, LS patients benefit from surveillance colonoscopy, prophylactic surgery, and chemoprevention (Burn et al., 2011; Vasen et al., 2013). MMR status of samples (such as tumour samples) is commonly assessed by immunohistochemistry (IHC) of MMR proteins, or PCR fragment length analysis (FLA) of microsatellites to detect increased MSI. MMR deficiency is inferred from the absence of at least one MMR protein, or high levels of MSI (MSI-H). Typically, MSI-H is defined by mutation of ≥30-40% of microsatellites analysed (Boland et al., 1998). These methods are highly sensitive and specific, with reported sensitivities and specificities of 93% and 95% for IHC of all four MMR proteins (Shia, 2008), and 97% and 100% for FLA of mononucleotide repeats (MNRs) (Bacher et al., 2004). IHC and FLA also perform well with respect to other demands of diagnostic tests. FLA is considered highly reproducible, with 98% concordance of results observed between independent laboratories (Zhang, 2008), although IHC shows some heterogeneity due to discordant interpretation of variable staining, and use of different antibodies (Shia, 2008). FLA has been shown to be reliable when sample tumour cell content is ≥10% (Berg et al., 2000), and IHC can detect focal MMR deficiency (Chapusot et al., 2002). Both are also considered to be relatively cheap and cost-effective for LS screening (Snowsill et al., 2014).
However, the uptake of MMR deficiency testing has been poor; only 28% of 152,993 CRC cases were analysed during 2010-2012 in the USA (Shaikh et al., 2018), with a similar proportion being analysed in the UK. This is despite guidelines recommending testing and estimates that only 1.2% of LS gene carriers were known to clinical services in the US in 2011 (Hampel & de la Chapelle, 2011). It is estimated that only 5% of carriers are currently known in the UK.
Sequencing-based MSI assays determine the mutation status of microsatellites and then use the proportion of microsatellites that are mutated to classify a sample. Sensitivities and specificities >95% have been reported when comparing the performance of several such classifiers using microsatellites captured by gene panel sequencing (Kautto et al., 2016; Zhu et al., 2018), and such methods can identify samples misclassified by conventional MMR deficiency tests, highlighting that there is no gold standard reference method (Hause et al., 2016). However, the high cost of gene panel sequencing (Marino et al., 2018) may be a barrier to its widespread deployment for MSI testing, or for the detection of LS by MMR gene sequencing. Targeted sequencing-based MSI assays using a specific panel of microsatellites have been developed that, similar to gene panel-based classifiers, classify samples by the proportion of microsatellites that are mutated (Hempelmann et al., 2015; Hempelmann et al., 2018; Waalkes et al., 2018). However, even when using the same classification method, different marker proportions are used as a threshold with different marker sets (Hempelmann et al., 2015; Kautto et al., 2016; Hempelmann et al., 2018; Waalkes et al., 2018; Zhu et al., 2018), and thresholds can be uncertain when relatively few microsatellites (<20) are analysed (Hempelmann et al., 2015). Variable or indeterminate thresholds can be compensated for by larger marker panels, albeit with increased sequencing costs. Ideally such assays need to be competitive with both more expensive, and more comprehensive, gene panel sequencing, as well as the cheaper, and lower throughput, methods of IHC and FLA.
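The proportion-based classification used by the prior assays discussed above can be sketched as follows. The function names, the marker names and the default threshold are assumptions made for illustration; the threshold in particular is panel specific, which is the difficulty the paragraph describes.

```python
def proportion_mutated(marker_status):
    """marker_status maps each marker name to True if it was called mutated."""
    if not marker_status:
        raise ValueError("at least one marker is required")
    return sum(1 for mutated in marker_status.values() if mutated) / len(marker_status)


def classify_by_proportion(marker_status, threshold=0.3):
    # The threshold is an assumption: 0.3 is the lower end of the 30-40%
    # range cited in the background, and other panels use other values.
    if proportion_mutated(marker_status) >= threshold:
        return "MSI-H"
    return "MSS"
```

A sample with two of five markers mutated is called MSI-H at a threshold of 0.3 but MSS at a threshold of 0.5, which illustrates why an uncertain threshold matters most for small panels.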
There is a need for improved methods for identifying microsatellite instability in a sample.
Summary of the invention
The inventors have previously used amplicon sequencing of short (7-12bp), monomorphic (i.e. where no length variation has been reported in the human population), mono-nucleotide repeats (MNRs) to classify the MSI status of CRCs, without needing matched normal tissue (see WO 2018/037231 and Redford et al., 2018). Short MNRs were selected as longer (>15bp) microsatellites are associated with increased PCR and sequencing error (Fazekas et al., 2010), and it has been reported that 9-15bp microsatellites give the greatest differences in mutation frequencies between MSI-H and MSS samples using NGS (Maruvka et al., 2017). Their previous method for MSI detection accounts for the individual sensitivity and specificity of each marker and achieved >97% accuracy in 209 CRCs with only 17 markers, using FLA as the reference method.
The inventors have now developed a new MSI marker set with 24 short MNRs (listed in Table 1 below) that has been tested and validated on CRC patient samples. They have shown that the 24 MSI marker panel provided herein achieves 100% accuracy in detecting MSI in real CRC patient samples and is therefore suitable for clinical cancer diagnostics. To establish that the new MSI marker set is suitable for clinical practice, the inventors followed joint guidelines from the Association for Molecular Pathology and the College of American Pathologists (Jennings et al., 2017). This includes validation of diagnostic accuracy using independent sample cohorts, assessment of reproducibility and detection limits, definition of quality control criteria, and deployment in an independent diagnostic laboratory. The new MSI marker set described herein is therefore ready and suitable for clinical use.
The inventors have also performed further analysis on subsets of the MSI markers used in the validated 24 marker set described herein. Surprisingly, the inventors have found that 100% detection accuracy may also be achieved when the number of MSI markers used is significantly reduced. Backward-forward stepwise selection was used to identify a 6-marker subset of equal accuracy to the 24 marker panel. Additional computational analysis was also performed to identify other 3 to 6 marker subsets that may also be equally informative. Surprisingly, the inventors demonstrate herein that, of the 24 MSI markers tested, GM07, LR44 and/or LR52 are particularly informative, and therefore, provided at least one of these markers is present, a reduced marker set (of as few as three markers, four markers, five markers, six markers etc) can be used for MSI classification with at least 95% accuracy. MSI marker panels with a small number of highly informative markers are therefore also provided herein, which can be used to classify MSI status in a variety of different samples, including tissue or liquid biopsies. Further details of such subsets of MSI markers are provided below.
The invention is based on amplifying and sequencing a plurality of MSI marker loci to classify MSI status in a sample. The inventors have demonstrated the invention using single molecule molecular inversion probes (smMIPs) (Hiatt et al., 2013) to amplify the MSI marker loci in multiplex. Multiplex amplification and sequencing techniques are particularly advantageous because they allow for automated sequence analysis and high throughput diagnostics. However, as would be clear to a person of skill in the art, any other suitable means for amplifying and sequencing the informative MSI markers described herein may also be used (e.g. conventional PCR may be used). The invention is therefore not limited to using smMIPs or any other specific probes or primers described herein. Other appropriate methods are described in more detail below.
A method for evaluating levels of microsatellite instability in a sample is provided herein comprising:
(a) amplifying from the sample one or more microsatellite mono-nucleotide repeat loci to give microsatellite amplicons, wherein the one or more microsatellite loci:
(i) comprise GM07 and up to four other microsatellite loci listed in Table 1; or
(ii) comprise LR44 and up to four other microsatellite loci listed in Table 1; or
(iii) comprise LR52 and up to twenty-three other microsatellite loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44;
(b) sequencing the microsatellite amplicons; and
(c) comparing the sequences from the microsatellite amplicons to predetermined sequences and determining any deviation, indicative of instability, from the predetermined sequences.
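Step (c) can be illustrated for the simplest case, in which the predetermined sequence is summarised by its reference microsatellite length. The sketch below is hypothetical in its function name and input format, and assumes the repeat length has already been extracted from each read; it reports the proportion of reads shorter or longer than the reference.

```python
from collections import Counter


def length_deviation(read_lengths, reference_length):
    """Summarise deviation of observed repeat lengths from the reference."""
    if not read_lengths:
        raise ValueError("no reads supplied")
    counts = Counter(read_lengths)
    total = len(read_lengths)
    # Reads with a shorter repeat than the reference are counted as deletions.
    deletions = sum(n for length, n in counts.items() if length < reference_length)
    # Reads with a longer repeat than the reference are counted as insertions.
    insertions = sum(n for length, n in counts.items() if length > reference_length)
    return {
        "deletion_frequency": deletions / total,
        "insertion_frequency": insertions / total,
    }
```

For ten reads at a 10bp reference locus, of which three carry a 9bp repeat and one an 11bp repeat, the deletion frequency is 0.3 and the insertion frequency is 0.1.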
A method for evaluating the biological significance of sequence variation identified during sequencing is also provided, comprising:
(a) amplifying from the sample one or more microsatellite mono-nucleotide repeat loci to give microsatellite amplicons, wherein the one or more microsatellite loci:
(i) comprise GM07 and up to four other microsatellite loci listed in Table 1; or
(ii) comprise LR44 and up to four other microsatellite loci listed in Table 1; or
(iii) comprise LR52 and up to twenty-three other microsatellite loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44;
and wherein each microsatellite locus has a single nucleotide polymorphism (SNP) locus within a short distance of the microsatellite locus and said amplifying step amplifies both the microsatellite locus and the associated SNP in a single amplicon;
(b) sequencing the microsatellite amplicons;
(c) comparing the sequences from the microsatellite amplicons to predetermined sequences (e.g. wild type sequences) and determining any deviation, indicative of instability, from the predetermined sequences; and
(d) for heterozygous SNPs, determining whether there is a bias between indel frequencies for the two alleles.
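Step (d) can be illustrated with Fisher's Exact test on a 2x2 table of reads, split by SNP allele and by whether the read carries a deletion. The sketch below is a plain standard-library implementation written for illustration; the function names are hypothetical, and the 0.05 cut-off follows the dichotomisation described for the classifier.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact p value for the 2x2 table [[a, b], [c, d]].

    Rows are the two SNP alleles; columns are reads with and without a deletion.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x deletion reads on the first allele.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def has_allelic_bias(del_a, intact_a, del_b, intact_b, alpha=0.05):
    """True if deletions are preferentially seen on one SNP allele."""
    return fisher_exact_two_sided(del_a, intact_a, del_b, intact_b) < alpha
```

Deletions that arise from amplification or sequencing error would be expected on both alleles at similar rates, whereas a clonal deletion in tumour DNA sits on one allele, which is what a small p value suggests.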
Use of:
(i) GM07 and up to four other microsatellite loci listed in Table 1; or
(ii) LR44 and up to four other microsatellite loci listed in Table 1; or
(iii) LR52 and up to twenty-three other microsatellite loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44;
for evaluating levels of microsatellite instability in a sample, or for evaluating the biological significance of sequence variation identified during sequencing of a sample is also provided.
Suitably, the one or more microsatellite loci may comprise LR52 and up to 22 other microsatellite loci listed in Table 1, optionally wherein the up to 22 other microsatellite loci include GM07 and/or LR44.
Suitably, the one or more microsatellite loci may comprise LR52 and up to 15 other microsatellite loci from Table 1, optionally wherein the up to 15 other microsatellite loci include GM07 and/or LR44.
Suitably, the one or more microsatellite loci may comprise LR52 and up to 9 other microsatellite loci from Table 1, optionally wherein the up to 9 other microsatellite loci include GM07 and/or LR44.
Suitably, the one or more microsatellite loci may comprise LR52 and from 2 to 9 other microsatellite loci from Table 1, optionally wherein the 2 to 9 other microsatellite loci include GM07 and/or LR44.
Suitably, step (a) may comprise amplifying three or more microsatellite loci listed in Table 1.
Suitably, the three or more microsatellite loci may comprise two or three markers selected from: GM07, LR44 and LR52.
Suitably, the three or more microsatellite loci may comprise or consist of a microsatellite loci combination listed in Table 6, optionally wherein step (a) comprises amplifying a total of up to 16, 12 or 10 microsatellite loci.
For example, the three or more microsatellite loci may comprise or consist of the combination of three microsatellite loci listed in Table 6 (3mer), optionally wherein step (a) comprises amplifying a total of up to 16, 12 or 10 microsatellite loci.
For example, the three or more microsatellite loci may comprise or consist of a combination of four microsatellite loci listed in Table 6 (4mers), optionally wherein step (a) comprises amplifying a total of up to 16, 12 or 10 microsatellite loci.
For example, the three or more microsatellite loci may comprise or consist of a combination of five microsatellite loci listed in Table 6 (5mers), optionally wherein step (a) comprises amplifying a total of up to 16, 12 or 10 microsatellite loci.
For example, the three or more microsatellite loci may comprise or consist of a combination of six microsatellite loci listed in Table 6 (6mers), optionally wherein step (a) comprises amplifying a total of up to 16, 12 or 10 microsatellite loci.
Suitably, the three or more microsatellite loci may comprise GM07, GM11, GM14, LR36, LR44 and LR52.
Suitably, step (a) may comprise amplifying a total of up to 16 microsatellite loci.
Suitably, step (a) may comprise amplifying a total of up to 10 microsatellite loci.
Suitably, step (a) may comprise amplifying a total of from 3 to 10 microsatellite loci.
Suitably, the sample may be a tissue or biological fluid sample. Suitably, the sample may be from a subject that is suspected of having, at risk of having, or being predisposed to cancer, optionally wherein the cancer is colorectal cancer or Lynch syndrome.
The methods described above may be for use in identifying mismatch repair defects, wherein deviation from the predetermined sequences for two or more microsatellite mono-nucleotide repeat loci is indicative of a mismatch repair defect.
The methods described above may be for use in identifying MSI-H, wherein deviation from the predetermined sequences for two or more microsatellite mono-nucleotide repeat loci is indicative of the sample having high levels of microsatellite instability (MSI-H).
A kit is also provided, wherein the kit is for amplifying:
(i) GM07 and up to four other microsatellite mono-nucleotide repeat loci listed in Table 1; or
(ii) LR44 and up to four other microsatellite mono-nucleotide repeat loci listed in Table 1; or
(iii) LR52 and up to twenty-three other microsatellite mono-nucleotide repeat loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44,
wherein the kit comprises primers and/or probes for specifically amplifying the microsatellite loci of (i), (ii) or (iii).
Suitably, the kit may be for amplifying LR52 and up to 22 other loci listed in Table 1, optionally wherein the up to 22 other microsatellite loci include GM07 and/or LR44.
Suitably, the kit may further comprise a thermostable polymerase and/or dNTPs or analogs thereof, optionally wherein the dNTPs or analogs thereof are labeled.
Throughout the description and claims of this specification, the words “comprise” and “contain” and variations of them mean “including but not limited to”, and they are not intended to (and do not) exclude other moieties, additives, components, integers or steps.
Furthermore, the terms first, second, third and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein.
The following terms or definitions are provided solely to aid in the understanding of the invention. Unless specifically defined herein, all terms used herein have the same meaning as they would to one skilled in the art of the present invention. Practitioners are particularly directed to Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Press, Plainview, N.Y. (1989); and Ausubel et al., Current Protocols in Molecular Biology (Supplement 47), John Wiley & Sons, New York (1999), for definitions and terms of the art. As a further example, Singleton and Sainsbury, Dictionary of Microbiology and Molecular Biology, 2d Ed., John Wiley and Sons, NY (1994); and Hale and Margham, The Harper Collins Dictionary of Biology, Harper Perennial, NY (1991) provide those of skill in the art with a general dictionary of many of the terms used in the invention. Although any methods and materials similar or equivalent to those described herein find use in the practice of the present invention, the preferred methods and materials are described herein.
Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise. Accordingly, as used herein, the singular terms "a", "an," and "the" include the plural reference unless the context clearly indicates otherwise.
Unless otherwise indicated, nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively. It is to be understood that this invention is not limited to the particular methodology, protocols, and reagents described, as these may vary, depending upon the context they are used by those of skill in the art.
Features, integers, characteristics, compounds, chemical moieties or groups described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein unless incompatible therewith.
The entire disclosures of the issued patents, published patent applications, and other publications that are cited herein are hereby incorporated by reference to the same extent as if each was specifically and individually indicated to be incorporated by reference. In the case of any inconsistencies, the present disclosure will prevail. Various aspects of the invention are described in further detail below.
Brief description of the figures
In order to provide a better understanding of the present invention, embodiments will be described by way of example only and with reference to the following figures in which:
Figure 1 shows MSI classification of CRCs. MSI classifier scores versus diagnosis by the MSI Analysis System v1.2 (Promega) for CRCs analysed in (A) the training cohort, and (B) the validation cohort.
Figure 2 shows assay robustness to sample heterogeneity. (A) Classifier scores from mixtures of MSI-H cell line and microsatellite stable (MSS) peripheral blood lymphocytes (PBL) DNA samples. (B) The proportion of correctly classified samples from 2400 simulated mixture series from the validation cohort reads (dotted line = 0.95).
Figure 3 shows assay robustness to variation in quantity of sample DNA. (A) Classifier scores from serial dilution of 9 samples, using 3.13-100ng of template DNA (dotted line = 75 molecular barcodes per marker). (B) The proportion of correctly classified samples from 60 simulated dilution series per sample in the validation cohort (dotted line = 0.95).
Figure 4 shows assay validation in an independent laboratory. MSI classifier scores versus diagnosis by the MSI Analysis System v1.2 (Promega) for 23 CRCs tested by the Northern Genetics Service (Newcastle Hospitals NHS Foundation Trust, Newcastle, UK).
Figure 5 shows validation using a DNA and read mixing series. (A) Observed versus expected relative frequency of microsatellite length mutations in the three replicate sample mixture series, ranging from 0.78-50.00% MSI-H cell DNA. (B) Classifier scores of simulated and empirical sample mixture series, ranging from 0.00-100.00% MSI-H cell DNA.
Figure 6 shows validation using a DNA and read dilution series. (A) Visualisation of amplicons from the template DNA dilution series. Agarose gel electrophoresis of amplicons generated from 9 samples by the MSI assay. Amplicons are visible at 240-270bp. Primer dimers are visible at 80bp. Marker (M): GeneRuler 1 kb Plus (ThermoFisher). Top panel: cell line controls. Middle panel: formalin fixed paraffin embedded (FFPE) MSI-high CRCs. Bottom panel: FFPE MSS CRCs. (B) Correlation of the number of molecular barcodes detected, and the input quantity of template DNA. (C) Comparison of empirically observed and simulated sample dilution series, and the association between molecular barcodes/marker and classifier score.
Figure 7 shows the influence of subset size on marker accuracy. Shown is the distribution of accuracies of all combinations of the microsatellite loci markers of Table 1, from the markers on their own (“1”) to combinations of 6 markers (“6”).
Detailed description
A panel of 24 MSI markers is provided herein which has been validated on clinical samples. Surprisingly, it was found that within the panel of 24 MSI markers (listed in Table 1), microsatellite mono-nucleotide repeat loci GM07, LR44 and/or LR52 are particularly informative markers for evaluating microsatellite instability in a sample.
This discovery allows for the design and implementation of new MSI screening methods using a smaller number of MSI markers than previously thought possible. The invention therefore provides novel methods for evaluating levels of microsatellite instability (MSI) in a sample using a minimal MSI marker set, comprising one or more of the microsatellite mono-nucleotide repeat loci GM07, LR44 and/or LR52. In addition, it provides novel methods for evaluating the biological significance of sequence variation identified using a minimal MSI marker set comprising one or more of these markers, together with corresponding single nucleotide polymorphism (SNP) loci located within a short distance of these microsatellite loci. The methods described herein can be used to differentiate between amplification (e.g. PCR) and sequencing errors and MSI induced indels/mutations when using those markers. The SNPs are typically within 80 base pairs of the microsatellite locus of interest, for example within 50 base pairs. The SNP may even be within 30 base pairs of the microsatellite locus of interest. Preferably the single nucleotide polymorphism (SNP) has a minor allele frequency above 0.05. Preferably the single nucleotide polymorphism (SNP) has a high heterozygosity.
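The SNP preferences stated above (a distance of at most 80 base pairs, a minor allele frequency above 0.05, and high heterozygosity) can be sketched as a simple filter. The `CandidateSnp` type, the function names and the example values are hypothetical. The expected heterozygosity of 2p(1-p) assumes a biallelic SNP in Hardy-Weinberg equilibrium, which is an assumption of this sketch and is not stated in the specification.

```python
from dataclasses import dataclass


@dataclass
class CandidateSnp:
    name: str
    distance_bp: int  # distance from the microsatellite locus of interest
    maf: float        # minor allele frequency


def expected_heterozygosity(maf):
    # Assumes a biallelic SNP in Hardy-Weinberg equilibrium.
    return 2.0 * maf * (1.0 - maf)


def select_snps(candidates, max_distance_bp=80, min_maf=0.05):
    """Keep SNPs close to the locus and common enough, most heterozygous first."""
    kept = [
        snp for snp in candidates
        if snp.distance_bp <= max_distance_bp and snp.maf > min_maf
    ]
    return sorted(kept, key=lambda snp: expected_heterozygosity(snp.maf), reverse=True)
```

Ranking by heterozygosity reflects the practical point that allelic bias can only be assessed in samples that are heterozygous for the linked SNP.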
The methods provided herein are particularly useful as they provide an accurate means for MSI classification using a minimal number of MSI markers. This provides a new means for reducing cost and complexity of MSI testing, whilst maintaining accuracy.
The methods are particularly useful for assessing MSI in a sample from a human subject. The sample comprises nucleic acid (DNA) comprising the loci of interest. The sample may be a tissue sample (e.g. tumour tissue sample) or a biological fluid sample that has been obtained from a human subject. In this context, a biological fluid sample encompasses any fluid sample (e.g. liquid biopsy) obtained from the subject. Suitable biological fluid samples include e.g. blood, serum, plasma, urine etc. Testing biological samples using the methods described herein may be particularly useful e.g. for early cancer detection in those at high risk or monitoring for disease recurrence (by assessing circulating tumour or cell free DNA). Methods for obtaining appropriate samples from a subject are well known in the art.
The methods provided herein are particularly useful for assessing MSI in a sample obtained from a subject (e.g. human subject). The subject may be a subject that is suspected of being at risk of developing cancer, e.g. colorectal cancer or Lynch syndrome. Alternatively, the subject may be known to be at risk of developing cancer, e.g. colorectal cancer or Lynch syndrome. For example, the subject may be suspected of being predisposed to cancer, e.g. colorectal cancer or Lynch syndrome, or they may be known to be predisposed to cancer, e.g. colorectal cancer or Lynch syndrome. In another example, the subject may have cancer e.g. colorectal cancer or Lynch syndrome already. The methods may also be used to diagnose cancer and/or monitor disease recurrence after cancer treatment. Based on the MSI classification, an appropriate class of therapeutics or specific therapeutics for treatment of the cancer may be selected, for example pembrolizumab, irinotecan, bevacizumab, cisplatin, carboplatin or 5-fluorouracil, all of which have been suggested as showing differential effectiveness between the treatment of MMR defective or MSI-H cancers compared to MSS cancers. The methods of the invention may also be useful as companion diagnostics for immune checkpoint blockade therapy.
The methods described herein assess MSI using one or more microsatellite mono-nucleotide repeat loci listed in Table 1 below. The invention may therefore utilise one or more (i.e. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24) markers that are present within the validated 24 marker panel of Table 1.
As will be understood by a person of skill in the art, the minimum number of microsatellite mono-nucleotide repeat loci that can be used in the methods described herein to accurately evaluate MSI levels will depend on the combination of loci selected, as some loci are more informative than others. As described in detail below, the inventors have shown that GM07, LR44 and LR52 are particularly informative markers for MSI, and accordingly the methods described herein use at least one of these markers. In some examples, it may be preferable to use at least two of these markers, for example, GM07 and LR44. Alternatively, GM07 and LR52, or LR44 and LR52, may be used.
Advantageously, the inventors have shown that a combination of GM07, LR52 and LR44 as a triplet of markers on its own can give an MSI classification accuracy of 1. Furthermore, the inventors have shown that all combinations of 6 markers (from those listed in Table 1) that have an MSI classification accuracy >0.999 contain at least one of GM07, LR52 and LR44. The marker combinations provided herein can therefore achieve a clinically acceptable MSI classification accuracy with significantly fewer markers than was previously understood to be necessary, meaning that the associated methods and kits can be significantly cheaper and more efficient.
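To give a sense of the search space behind statements of this kind, the sketch below enumerates marker subsets of a 24-marker panel and counts how many contain at least one of the three core markers. It does not compute any accuracy. The 21 `other_NN` names are placeholders standing in for the remaining Table 1 loci, and the function name is hypothetical.

```python
from itertools import combinations

CORE_MARKERS = frozenset({"GM07", "LR44", "LR52"})
# Placeholder names stand in for the other 21 loci of the 24-marker panel.
PANEL = sorted(CORE_MARKERS) + [f"other_{i:02d}" for i in range(1, 22)]


def count_subsets(panel, size, core=CORE_MARKERS):
    """Return (all subsets of this size, subsets containing a core marker)."""
    total = 0
    with_core = 0
    for subset in combinations(panel, size):
        total += 1
        if core.intersection(subset):
            with_core += 1
    return total, with_core
```

Calling `count_subsets(PANEL, size)` for sizes 1 to 6 covers the subset sizes shown in Figure 7; each subset would then be scored by whichever accuracy measure is being used.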
Advantageously, the marker combinations provided herein allow for an MSI classification accuracy of at least 0.9, preferably at least 0.95, more preferably at least 0.999 or 1.
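By way of non-limiting illustration only, the size of the search space behind these statements can be sketched as follows. The sketch merely counts the 3-marker and 6-marker subsets of a 24-marker panel and how many of them contain at least one of GM07, LR44 and LR52; it does not compute any classification accuracy. Only the nine marker names that appear in this text are real; the `PLACEHOLDER_xx` names are hypothetical stand-ins for the remaining Table 1 entries, and the function name is an assumption made for illustration.

```python
from itertools import combinations
from math import comb

CORE = {"GM07", "LR44", "LR52"}

# Marker names mentioned in the text. The remaining Table 1 entries are not
# reproduced here, so hypothetical placeholder names pad the panel to 24.
NAMED = ["GM07", "GM11", "GM14", "LR36", "LR44", "LR52", "LR17", "LR43", "LR48"]
PANEL = NAMED + [f"PLACEHOLDER_{i:02d}" for i in range(1, 24 - len(NAMED) + 1)]


def count_with_core(panel, size, core=CORE):
    """Count subsets of `size` markers that include at least one core marker."""
    return sum(1 for combo in combinations(panel, size) if core & set(combo))


total_6 = comb(len(PANEL), 6)           # all 6-marker subsets of the panel
with_core_6 = count_with_core(PANEL, 6)  # those containing GM07, LR44 or LR52
```

Counting in this way only describes how many candidate panels exist; which of them reach a given accuracy has to be established experimentally, as in the Examples.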
Accordingly, in one example, a combination of GM07, LR44 and LR52 may be used as the minimal marker set for accurate MSI classification. A combination of all three informative markers (i.e. GM07, LR44 and LR52) may be preferable when the total number of microsatellite mono-nucleotide repeat loci used in the method is relatively low (e.g. when only three loci from Table 1 are tested in the method). Conversely, two or even only one of the three informative markers (i.e. two or one of GM07, LR44 and/or LR52) may be sufficient when the total number of microsatellite loci used (e.g. from those listed in Table 1) is more than three, for example when the total number of microsatellite loci used (e.g. from those listed in Table 1) is six or more, ten or more, twenty or more etc. The skilled person will appreciate that when additional markers are used (e.g. 6 markers rather than 3), there will be greater flexibility in which markers can be used, with numerous combinations of “weaker” markers able to compensate for the greater accuracy provided to the marker panel by alternative use of just one of the “stronger”, core, markers in the panel.
The core markers GM07, LR52 and LR44 provide particular benefits when included in the marker panels. The three core markers as a triplet can give an accuracy of 1, even in the absence of additional microsatellite loci, so that the skilled person will understand that although only one or two of the core markers may need to be provided in larger marker sets (of 4, 5, 6, 7 plus markers selected from Table 1), it may be beneficial to include all three of the core markers in the loci selected for methods of the invention, particularly when few (e.g. only 3 or 4) microsatellite markers of Table 1 are to be used. Methods for evaluating MSI levels may therefore comprise amplifying and sequencing a minimal set of MSI markers that are selected from: (i) GM07 and up to four other microsatellite loci listed in Table 1; or (ii) LR44 and up to four other microsatellite loci listed in Table 1; or (iii) LR52 and up to twenty-three other microsatellite loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44.
In other words, an evaluation of MSI levels may be performed using GM07 as a marker alone, or in combination with up to four other markers listed in Table 1. GM07 was found by the inventors to be the marker that was present most frequently in the combinations of three, four, five or six markers from Table 1 that were able to classify MSI status with a high level (>0.999) of accuracy. Although GM07 may not provide this level of accuracy on its own, it may be sufficiently informative on its own as an initial (e.g. preliminary) screen for MSI status. Alternatively, it may be combined with other MSI markers (e.g. up to four other markers from those listed in Table 1, and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status) to improve MSI classification accuracy and/or provide some redundancy within the MSI classification method. In other words, GM07 may be combined with 0, 1, 2, 3, or 4 additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). For example, GM07 may be combined with 2, 3, or 4 additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). As will be described in more detail below, GM07 may also be combined with LR44 and/or LR52 (optionally in combination with other markers from Table 1, and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). An MSI marker panel comprising GM07 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than GM07), which may include up to four other markers from the markers listed in Table 1.
In some examples, the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including GM07). For example, the total number of markers within the panel may be up to 14, up to 10, or up to 6 markers (including GM07). As would be clear to a person of skill in the art, as the total number of markers is reduced, the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative markers such as GM07 are used.
Alternatively, an evaluation of MSI levels may be performed using LR44 as a marker alone, or in combination with up to four other markers listed in Table 1. LR44 was also found by the inventors to be a marker that was frequently present in the combinations of three, four, five or six markers from Table 1 that were able to classify MSI status with a high level (>0.999) of accuracy. Although LR44 may not provide this level of accuracy on its own, it may be sufficiently informative on its own as an initial (e.g. preliminary) screen for MSI status. Alternatively, it may be combined with other MSI markers (e.g. up to four other markers from those listed in Table 1, and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status) to improve MSI classification accuracy and/or provide some redundancy within the MSI classification method. In other words, LR44 may be combined with 0, 1, 2, 3, or 4 additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). For example, LR44 may be combined with 2, 3, or 4 additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). As will be described in more detail below, LR44 may also be combined with GM07 and/or LR52 (optionally in combination with other markers from Table 1, and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). An MSI marker panel comprising LR44 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than LR44), which may include up to four other markers from the markers listed in Table 1.
In some examples, the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including LR44). For example, the total number of markers within the panel may be up to 14, up to 10, or up to 6 markers (including LR44). As would be clear to a person of skill in the art, as the total number of markers is reduced, the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative markers such as LR44 are used.
Alternatively, an evaluation of MSI levels may be performed using LR52 as a marker alone, or in combination with up to 23 or up to 22 other markers listed in Table 1. LR52 was also found by the inventors to be a marker that was frequently present in the combinations of three, four, five or six markers from Table 1 that were able to classify MSI status with a high level (>0.999) of accuracy. Although LR52 may not provide this level of accuracy on its own, it may be sufficiently informative on its own as an initial (e.g. preliminary) screen for MSI status. Alternatively, it may be combined with other MSI markers (e.g. up to 23 or up to 22 other markers from those listed in Table 1, and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status) to improve MSI classification accuracy and/or provide some redundancy within the MSI classification method. In other words, LR52 may be combined with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 or 22 (and optionally with all 23) additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). For example, LR52 may be combined with 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 or 22 (and optionally with all 23) additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). An MSI marker panel comprising LR52 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than LR52), which may include up to 22 (or optionally up to 23) other markers from the markers listed in Table 1.
In some examples, the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including LR52). For example, the total number of markers within the panel may be up to 14, up to 10, or up to 6 markers (including LR52). As would be clear to a person of skill in the art, as the total number of markers is reduced, the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative markers such as LR52 are used.
An evaluation of MSI levels may be performed using LR52 in combination with GM07 (optionally in combination with other markers from Table 1, and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). In other words, LR52 and GM07 may be combined with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or 21 (and optionally with all 22) additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). An MSI marker panel comprising LR52 and GM07 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than LR52 and GM07), which may include up to 21 (or optionally up to 22) other markers from the markers listed in Table 1. In some examples, the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including LR52 and GM07). For example, the total number of markers within the panel may be up to 14, up to 10, or up to 6 markers (including LR52 and GM07). As would be clear to a person of skill in the art, as the total number of markers is reduced, the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative markers such as LR52 and GM07 are used.
An evaluation of MSI levels may alternatively be performed using LR52 in combination with LR44 (optionally in combination with other markers from Table 1, and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). In other words, LR52 and LR44 may be combined with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or 21 (and optionally with all 22) additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). An MSI marker panel comprising LR52 and LR44 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than LR52 and LR44), which may include up to 21 (or optionally up to 22) other markers from the markers listed in Table 1. In some examples, the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including LR52 and LR44). For example, the total number of markers within the panel may be up to 14, up to 10, or up to 6 markers (including LR52 and LR44). As would be clear to a person of skill in the art, as the total number of markers is reduced, the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative markers such as LR52 and LR44 are used.
An evaluation of MSI levels may alternatively be performed using LR52 in combination with both LR44 and GM07 (optionally in combination with other markers from Table 1, and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). In other words, LR52, LR44 and GM07 may be combined with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 (and optionally with all 21) additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). An MSI marker panel comprising LR52, GM07 and LR44 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than LR52, GM07 and LR44), which may include up to 20 (or optionally up to 21) other markers from the markers listed in Table 1. In some examples, the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including LR52, GM07 and LR44). For example, the total number of markers within the panel may be up to 14, up to 10, or up to 6 markers (including LR52, GM07 and LR44). As would be clear to a person of skill in the art, as the total number of markers is reduced, the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative markers such as LR52, GM07 and LR44 are used.
In some examples, one, two or all three of the following loci from Table 1 are not used as markers in the methods described herein: LR48, and/or LR17, and/or LR43. In other words, in one example, the methods described herein do not include amplifying LR48 (on its own or in combination with LR17). In another example, the methods described herein do not include amplifying LR17 (on its own, or in combination with LR43). In another example, the methods described herein do not include amplifying LR43 (on its own or in combination with LR48). In another example, the methods described herein do not include amplifying any of LR43, LR48, or LR17.
Useful combinations of MSI markers from those listed in Table 1 are provided in Table 6, which provides different three, four, five and six marker combinations that can be used as a “core set” of markers (in other words, a minimal set of markers) for MSI classification because, on their own, these combinations of three, four, five and six markers are able to accurately classify MSI status in a sample. Any of the three, four, five or six marker combinations shown in Table 6 may be used in the methods described herein, either on their own, or in combination with any of the other markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status) to improve MSI classification accuracy and/or provide some redundancy within the MSI classification method. In other words, any of the combinations provided in Table 6 may be combined with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or 21 additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). An MSI marker panel comprising any of the combinations listed in Table 6 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than the core markers of the combination listed in Table 6), which may include up to 20 (or optionally 21) other markers from the markers listed in Table 1. In some examples, the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6, up to 5, up to 4, up to 3 etc markers (including the core markers of the combination listed in Table 6). As would be clear to a person of skill in the art, as the total number of markers is reduced, the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative combinations of markers such as those listed in Table 6 are used.
In the Examples below the inventors describe how they identified and validated a 6-marker panel of GM07, GM11, GM14, LR36, LR44 and LR52. Therefore, in the methods and kits described herein a core marker set of GM07, GM11, GM14, LR36, LR44 and LR52 may be used. The combination of GM07, GM11, GM14, LR36, LR44 and LR52 may be used in the methods described herein, either on its own, or in combination with any of the other markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status) to improve MSI classification accuracy and/or provide some redundancy within the MSI classification method. In other words, the combination of GM07, GM11, GM14, LR36, LR44 and LR52 may be combined with 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 or 17 (and optionally all 18) additional markers listed in Table 1 (and/or with any other markers that are not listed in Table 1 but that may provide additional information on MSI status). An MSI marker panel comprising the combination of GM07, GM11, GM14, LR36, LR44 and LR52 may therefore have a minimum of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 etc additional markers (other than the combination of GM07, GM11, GM14, LR36, LR44 and LR52), which may include up to 17 (or optionally 18) other markers from the markers listed in Table 1. In some examples, the total number of markers within the panel would be up to 24, up to 23, up to 20, up to 16, up to 14, up to 12, up to 10, up to 8, up to 6 etc markers (including GM07, GM11, GM14, LR36, LR44 and LR52). As would be clear to a person of skill in the art, as the total number of markers is reduced, the overall cost and complexity of the MSI classification method are also reduced. This reduction is possible when highly informative combinations of markers such as GM07, GM11, GM14, LR36, LR44 and LR52 are used.
In the Examples below the inventors also describe how they validated a 24-marker panel (using the 24 markers listed in Table 1). Therefore, in the methods and kits described herein the 24 marker set of Table 1 may be used. The combination of 24 markers listed in Table 1 may be used in the methods described herein, either on its own, or in combination with any of the other markers that are not listed in Table 1 but that may provide additional information on MSI status to improve MSI classification accuracy and/or provide some redundancy within the MSI classification method.
One of the benefits of the invention, compared to previously proposed methods and kits using MSI markers, is that surprisingly fewer markers may be used, leading to cost savings and efficiency increases. Therefore, the number of microsatellite loci of Table 1 that are used may be limited, for example fewer than 16 of the microsatellite loci of Table 1 may be included, or fewer than 12, or fewer than 10. From 3 to 10 microsatellite loci may be selected from Table 1 for use in the methods and kits of the invention, for example from 3 to 8.
The methods described herein recite the minimum (and in some cases also maximum) number of loci listed in Table 1 that can be used. For the avoidance of doubt, where a maximum number of loci from Table 1 is recited (e.g. “up to 4 other microsatellite loci listed in Table 1”, “up to twenty-three other microsatellite loci listed in Table 1”, “up to 22 other microsatellite loci listed in Table 1”, “up to 15 loci listed in Table 1”, “up to 9 other microsatellite loci listed in Table 1” etc) this maximum number only refers to the markers listed in Table 1. In other words, additional markers that are not listed in Table 1 that may also be useful (e.g. because they provide information about some other aspect of the subject or sample) may be included in the methods recited herein. A non-limiting example of such a marker is AP003532_2.
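By way of non-limiting illustration only, the counting rule set out above can be sketched as a small check of a candidate panel against options (i) to (iii). The function name, the dictionary of limits and the partial `TABLE1_SUBSET` (which contains only those Table 1 marker names that are mentioned in this text) are assumptions introduced for the sketch.

```python
# Maximum number of "other" Table 1 loci permitted alongside each core marker,
# following options (i) to (iii) recited above.
CORE_LIMITS = {"GM07": 4, "LR44": 4, "LR52": 23}

# Partial, illustrative stand-in for Table 1 (only names mentioned in the text).
TABLE1_SUBSET = {"GM07", "GM11", "GM14", "LR36", "LR44", "LR52",
                 "LR17", "LR43", "LR48"}


def matching_options(panel, table1=TABLE1_SUBSET):
    """Return the core markers whose option (i)-(iii) the panel satisfies.

    Only loci listed in Table 1 count towards the recited maximum; markers
    outside Table 1 (e.g. AP003532_2) are ignored when counting.
    """
    in_table = set(panel) & set(table1)
    matches = []
    for core, max_others in CORE_LIMITS.items():
        others = len(in_table) - 1  # Table 1 loci other than the core marker
        if core in in_table and others <= max_others:
            matches.append(core)
    return matches
```

For example, under these assumptions a panel of GM07, LR44, LR52 and AP003532_2 satisfies all three options, because AP003532_2 is not a Table 1 locus and is therefore not counted.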
The methods provided herein amplify the selected loci (optionally together with an associated SNP) to generate a microsatellite amplicon, which is subsequently sequenced. Although the invention is exemplified herein using molecular inversion probes (MIPs; e.g. single-molecule molecular inversion probes (smMIPs)) to amplify the selected marker loci, any other appropriate technique for amplifying the selected loci may be used. Alternative appropriate methods are well known in the art and include conventional PCR. In other words, the methods may use any appropriate nucleic acid sequence (e.g. primer and/or probe) that enables amplification of the selected marker loci. The amplification step may amplify each selected locus individually (in a separate reaction), or may comprise co-amplifying some or all of the selected loci in a multiplex amplification reaction. Suitable primers and/or probes may be selected for the chosen method using standard techniques. For example, in methods wherein a single nucleotide polymorphism (SNP) within a short distance of the selected microsatellite locus is to be amplified together with the locus in order to generate a single amplicon encompassing the locus plus SNP, primers and/or probes that amplify both the microsatellite locus and the SNP within a short distance of it need to be used.
The methods provided herein subsequently sequence the microsatellite amplicons generated in the amplification step. Any appropriate sequencing method may be used. For example, high throughput or next generation sequencing, sequencing-by-synthesis, ion semiconductor sequencing or ion torrent sequencing and/or pyrosequencing may be used.
The methods provided herein then compare the sequences from the microsatellite amplicons to predetermined sequences and determine any deviation, indicative of instability, from the predetermined sequences. In this context, deviation may be in the form of an insertion or deletion when compared to the predetermined sequences. Methods for detecting an insertion or deletion are well known in the art.
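By way of non-limiting illustration only, the comparison step can be sketched for a mono-nucleotide repeat by measuring the repeat length in a sequence read and subtracting the predetermined (reference) length. The function names, the use of flanking sequences to locate the repeat, and the example sequences are assumptions made for the sketch; the flanks are assumed not to end or begin with the repeated base.

```python
import re


def repeat_length(read, left_flank, right_flank, base):
    """Length of the run of `base` between the two flanks, or None if not found."""
    pattern = (re.escape(left_flank) + "(" + re.escape(base) + "*)"
               + re.escape(right_flank))
    match = re.search(pattern, read)
    return len(match.group(1)) if match else None


def length_deviation(read, left_flank, right_flank, base, reference_length):
    """Observed minus reference repeat length.

    A negative value corresponds to a deletion, a positive value to an
    insertion, and zero to no deviation from the predetermined sequence.
    """
    observed = repeat_length(read, left_flank, right_flank, base)
    return None if observed is None else observed - reference_length


# Hypothetical read carrying a 10-base A repeat where 12 bases are expected.
example_read = "TTGCGT" + "A" * 10 + "CGTTGG"
example_deviation = length_deviation(example_read, "GCGT", "CGTT", "A", 12)
```

In practice reads would first be aligned and filtered; the sketch only shows how a length deviation at one locus may be expressed once the repeat has been located.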
The methods described herein may include the step of determining allelic imbalance. Assessing whether length variants are concentrated in sequence reads from one SNP allele offers an additional criterion to differentiate between PCR artefacts and mutations that occur in vivo, and can provide additional discrimination between MSI and MSS samples. This is because PCR artefacts are likely to affect both alleles equally, whereas instability is a stochastic event affecting a single allele at a time. This can lead to bias in the levels of instability observed between the alleles at a single microsatellite marker, even if both are unstable. The methods may therefore allow for the evaluation of the biological significance of any microsatellite instability in the sample, and may comprise amplifying both the microsatellite locus and an SNP within a short distance of it in a single amplicon, using the primers, and, for heterozygous SNPs, determining whether there is a bias between indel frequencies for the two alleles of the sample.
Methods for evaluating the biological significance of sequence variation identified during sequencing are therefore provided herein, comprising:
(a) amplifying from the sample one or more microsatellite mono-nucleotide repeat loci to give microsatellite amplicons, wherein the one or more microsatellite loci:
(i) comprise GM07 and up to four other microsatellite loci listed in Table 1; or
(ii) comprise LR44 and up to four other microsatellite loci listed in Table 1; or
(iii) comprise LR52 and up to twenty-three other microsatellite loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44;
and wherein each microsatellite locus has a single nucleotide polymorphism (SNP) locus within a short distance of the microsatellite locus and said amplifying step amplifies both the microsatellite locus and the associated SNP in a single amplicon;
(b) sequencing the microsatellite amplicons;
(c) comparing the sequences from the microsatellite amplicons to predetermined sequences (e.g. wild type sequences) and determining any deviation, indicative of instability, from the predetermined sequences; and
(d) for heterozygous SNPs, determining whether there is a bias between indel frequencies for the two alleles.
The preferred features of this method, e.g. details of the microsatellite loci, amplification step, etc, may be in accordance with those described above for the methods for evaluating levels of microsatellite instability in a sample, since the methods are clearly related.
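By way of non-limiting illustration only, step (d) can be sketched as a comparison of indel-carrying and indel-free read counts between the two alleles of a heterozygous SNP. The text above does not prescribe a particular statistical test; the use of a two-sided Fisher's exact test, the 0.05 threshold, the function names and the read counts in the usage note are all assumptions made for the sketch.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x indel reads on the first allele.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))


def allelic_bias(allele_a, allele_b, alpha=0.05):
    """Each allele is (reads_with_indel, reads_without_indel).

    Returns the p-value and whether the indel frequencies differ between
    the two alleles at the chosen (assumed) significance threshold.
    """
    p_value = fisher_exact_two_sided(allele_a[0], allele_a[1],
                                     allele_b[0], allele_b[1])
    return p_value, p_value < alpha
```

For instance, hypothetical counts of 40 indel reads out of 100 on one allele and 5 out of 100 on the other would be flagged as biased by this sketch, whereas equal frequencies on both alleles, as expected for PCR artefacts, would not.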
A method as above may be useful for identifying mismatch repair defects, wherein deviation from the predetermined sequences for two or more microsatellite mono-nucleotide repeat loci is indicative of a mismatch repair defect.
A method as above may be useful for identifying MSI-H, wherein deviation from the predetermined sequences for two or more microsatellite mono-nucleotide repeat loci is indicative of the sample having high levels of microsatellite instability (MSI-H).
A kit is also provided herein for use in the methods of the invention. The kit may comprise primers and/or probes for amplifying microsatellite loci in accordance with the above. The kit may also comprise a thermostable polymerase and/or labeled dNTPs or analogs thereof. The labeled dNTPs or analogs thereof may be fluorescently labeled. The kit may comprise, as well as the primers and/or probes for amplifying the microsatellite loci in accordance with the methods of the invention, reagents necessary for carrying out the methods of the invention, for example enzymes, dNTP mixes, buffers, PCR reaction mixes, chelating agents and/or nuclease-free water. The kit may comprise instructions for carrying out a method of the invention.
The combinations of MSI markers provided herein can be used to accurately classify samples as either MSI or MSS. The former corresponds to samples classified as MSI-H by fragment analysis while the latter includes MSS samples and samples with low levels of instability (MSI-L). Identifying that the markers GM07, LR44 and LR52 are particularly informative for accurate MSI classification is advantageous, as it provides new options for MSI marker panels and MSI screening methods with a reduced number of total markers. The methods provided herein are advantageous due to their simplicity and reduced cost. Furthermore, accurate MSI classification using the methods provided herein does not require control normal DNA. Establishing whether tumours have resulted from a breakdown in mismatch repair is important in clinical management of the individual and can help prevent future cancers in those families where there is a germline molecular defect. A scalable, reliable MSI test will have clinical utility while modest costs and the ability to link this analysis to routine pathology assessment will help to ensure rapid adoption and facilitate further molecular approaches to tumour profiling and precision medical care.
Definitions
The term “microsatellite” or “microsatellite regions” as used herein refers to mono-, di-, tri-, tetra-, penta- or hexanucleotide repeats in a nucleotide sequence, consisting of at least two repeat units and with a minimal length of 6 bases. A particular subclass of microsatellites includes the homopolymers.
The terms “microsatellite mono-nucleotide repeat loci”, “microsatellite loci”, “microsatellite MNR loci” and “marker” are used interchangeably herein, unless the context specifically indicates otherwise.
“Homopolymer” as used herein refers to a microsatellite region that is a mononucleotide repeat of at least 6 bases; in other words, a stretch of at least 6 consecutive A, C, T or G residues if looking at the DNA level. Most particularly, when determining microsatellites, one looks at genomic DNA of a subject (or of genomic DNA of a cancer present in the subject).
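By way of non-limiting illustration only, the definition of a homopolymer given above (a run of at least 6 identical A, C, G or T residues) can be sketched as a simple search over a DNA sequence. The function name and the example sequence are assumptions made for the sketch.

```python
import re


def find_homopolymers(sequence, min_length=6):
    """Return (start, base, length) for each run of >= min_length identical bases."""
    # One base followed by at least (min_length - 1) repeats of that same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(sequence.upper())]


# Hypothetical sequence: an 8-base T run qualifies, a 5-base A run does not.
example_sequence = "GGA" + "T" * 8 + "CC" + "A" * 5 + "GC"
example_runs = find_homopolymers(example_sequence)
```

Start positions in this sketch are 0-based offsets within the supplied sequence.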
The term “MSI status” as used in the application refers to the presence of microsatellite instability (MSI), a germline or somatic change in the number of repeated DNA nucleotide units in microsatellites. MSI status can be one of three discrete classes: MSI-H, also referred to as MSI-high, MSI positive or MSI, MSI-L, also referred to as MSI-low, or microsatellite stable (MSS), also referred to as absence of MSI. Typically, to be classified as MSI-H, at least 30% of the markers used to classify MSI status need to score positive, while for the MSS classification, 0% score positive. If an intermediate number of markers scores positive, the tumour is classified as MSI-L. However, classification criteria can vary depending upon the number of markers used, and scoring method. Alternatively, only the difference between presence and absence of microsatellite instability is assessed, in which case the status is either presence of MSI or absence of MSI (=MSS).
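By way of non-limiting illustration only, the typical three-class scheme described above can be sketched as follows. The function name and the default threshold parameter are assumptions made for the sketch and, as noted above, classification criteria can vary with the number of markers used and the scoring method.

```python
def msi_status(positive_markers, total_markers, high_fraction=0.30):
    """Classify a sample as MSS, MSI-L or MSI-H from the count of positive markers."""
    if total_markers <= 0:
        raise ValueError("at least one marker is required")
    if not 0 <= positive_markers <= total_markers:
        raise ValueError("positive_markers must lie between 0 and total_markers")
    if positive_markers == 0:
        return "MSS"      # no marker scores positive
    if positive_markers / total_markers >= high_fraction:
        return "MSI-H"    # at least 30% of markers score positive
    return "MSI-L"        # an intermediate number of markers scores positive
```

Where only presence or absence of MSI is assessed, the MSI-L and MSS outputs of this sketch may be treated together as MSS, consistent with the two-class classification described elsewhere herein.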
An “indel” as used herein refers to a mutation class that includes insertions, deletions, and combinations thereof. An indel in a microsatellite region results in a net gain or loss of nucleotides. The presence of an indel can be established by comparing it to DNA in which the indel is not present (e.g. comparing DNA from a tumour sample to germline DNA from the subject with the tumour), or, especially in case of monomorphic microsatellites or homopolymers, by comparing it to the known length of the microsatellite, particularly by counting the number of repeated units.
The term “cancer” as used herein, refers to different diseases involving unregulated cell growth, also referred to as malignant neoplasm. The term “tumour” is used as a synonym in the application. It is envisaged that this term covers all solid tumour types (carcinoma, sarcoma, blastoma), but it also explicitly encompasses non-solid cancer types such as leukemia. Thus, a “tumour sample” encompasses both solid tumour samples (e.g. tissue biopsies) as well as biological fluid samples (e.g. those that have been obtained or isolated from a bodily fluid such as urine, blood, plasma, serum etc). As would be clearly understood by a person of skill in the art, the sample can be described as a “sample of tumour DNA”. The tumour DNA may be present within a bodily fluid such as urine, blood, plasma, serum etc and may be isolated from the bodily fluid prior to performing the methods described herein. Any appropriate method for obtaining or isolating the tumour DNA may be used. Several appropriate methods are well known in the art. Typically, a sample of tumour DNA has at one point been isolated from a subject, particularly a subject with cancer. Optionally, it has undergone one or more forms of pre-treatment (e.g. lysis, fractionation, separation, purification) in order for the DNA to be sequenced, although it is also envisaged that DNA from an untreated sample is sequenced.
As used herein, the noun “subject” refers to an individual vertebrate, more particularly an individual mammal, most particularly an individual human being. A “subject” as used herein is typically a human, but can also be a mammal, particularly domestic animals such as cats, dogs, rabbits, guinea pigs, ferrets, rats, mice, and the like, or farm animals like horses, cows, pigs, goats, sheep, llamas, and the like. A subject can also be a non-mammalian vertebrate, like a fish, reptile, amphibian or bird; in essence any animal which can develop cancer fulfils the definition.
The term“Lynch syndrome” as used herein refers to an autosomal dominant genetic condition which has a high risk of colon cancer as well as other cancers including endometrium, ovary, stomach, small intestine, hepatobiliary tract, upper urinary tract, brain, and skin cancer. The increased risk for these cancers is due to inherited mutations that impair DNA mismatch repair. The old name for the condition is HNPCC.
As used herein, the terms “microsatellite loci”, or “repeat” and “marker” are used interchangeably where the context allows.
As used herein, the terms“GM07” and“GM7” are used interchangeably herein. Similarly, the terms“AP003532_2” and“AP0035322” are also interchangeable.
Aspects of the invention are demonstrated by the following non-limiting examples.
EXAMPLES
The inventors have amplified and sequenced 24 microsatellite markers and trained a classifier using 98 CRCs, which accommodates marker specific sensitivities to MSI. Sample classification achieved 100% concordance with the MSI Analysis System v1.2 (Promega) in three independent cohorts, totalling 220 CRCs. The inventors have therefore demonstrated the clinical validity of a specific 24 microsatellite marker panel described herein. The markers of the 24 microsatellite loci marker panel are listed in Table 1.
Figure imgf000025_0001
Figure imgf000026_0001
Table 1 - 24 microsatellite loci marker panel
The inventors then used backward-forward stepwise selection to identify a 6-marker subset (from the 24 original markers) which shows equal accuracy to the 24-marker panel. Surprisingly, assessment of assay detection limits tested herein showed that the 6-marker subset is only marginally less robust to sample variables than the 24-marker panel. The 6-marker subset therefore represents a very useful reduced marker panel for accurate detection of MSI in a sample. The markers of the 6 microsatellite marker panel are GM07, GM11, GM14, LR36, LR44 and LR52 (see Table 1).
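The specification names backward-forward stepwise selection without giving its details. The following Python sketch shows one common form of the procedure, under stated assumptions: the function names are hypothetical, markers are dropped when accuracy does not fall and re-added only when accuracy strictly improves, and `toy_accuracy` is an invented scoring function standing in for classifier accuracy on a training cohort. It is not the inventors' implementation.

```python
def stepwise_select(markers, accuracy):
    """Backward-forward stepwise selection (illustrative sketch).

    markers  : iterable of marker names
    accuracy : callable taking a frozenset of markers and returning a score
    """
    all_markers = set(markers)
    selected = set(all_markers)
    changed = True
    while changed:
        changed = False
        # Backward step: drop a marker if accuracy does not decrease.
        for m in sorted(selected):
            if len(selected) == 1:
                break
            trial = selected - {m}
            if accuracy(frozenset(trial)) >= accuracy(frozenset(selected)):
                selected = trial
                changed = True
        # Forward step: re-add an excluded marker only if accuracy strictly improves.
        for m in sorted(all_markers - selected):
            trial = selected | {m}
            if accuracy(frozenset(trial)) > accuracy(frozenset(selected)):
                selected = trial
                changed = True
    return frozenset(selected)


# Toy scoring function, invented purely for illustration.
INFORMATIVE = {"GM07", "LR44", "LR52"}


def toy_accuracy(subset):
    return len(subset & INFORMATIVE) / len(INFORMATIVE)


panel = ["GM07", "GM11", "GM14", "LR36", "LR44", "LR52"]
print(sorted(stepwise_select(panel, toy_accuracy)))
```

Each pass either raises the score or keeps it while shrinking the set, so the loop terminates. With real data the scoring callable would re-train and evaluate the classifier for each candidate subset.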
The inventors have performed further computational analysis of the original 24 marker panel to identify particularly informative individual markers and combinations. They have surprisingly identified that MSI classification is still accurate when the marker number is further reduced, and that the markers GM07, LR44 and/or LR52 are particularly informative.
Based on their analysis, the inventors have identified that GM07, LR44 and/or LR52 are much more discriminating than other markers and are present at a much higher frequency in small marker sets which have high MSI classification accuracy. Advantageously, the presence of one, two or three of these markers in an MSI marker panel enables a reduced total number of markers to be used without loss of MSI classification accuracy.
The original 24 marker panel was subdivided into different subgroups of three, four, five or six markers and the MSI classification accuracy for each subgroup of markers was determined, using the original data generated for each of these markers in the original 24 marker dataset. To exemplify the methodology used, the inventors combined the 24 microsatellite loci markers of Table 1 into subgroups of 6 markers, resulting in 134,596 different 6-marker subgroup combinations. Of these 134,596 different 6-marker subgroup combinations only 1132 (0.8%) gave an MSI classification accuracy of >0.999. Surprisingly, when the frequency of each marker in the accurate 6mer combinations was analysed, three markers were clearly present far more often than the others: GM07 (present in ~90% of the 6 marker sets), LR52 (present in ~75% of the 6 marker sets), and LR44 (present in ~71% of the 6 marker sets) - see Table 2 columns 1-3. Notably, there is a big drop to the 4th most common marker (LR17, ~36%). Importantly, all of the 1132 combinations (i.e. having an accuracy >0.999) contain at least one of these core markers. A strikingly similar pattern was observed when 5mer and 4mer combinations were analysed (see Table 2 columns 4-7). GM07, LR52 and LR44 are all present at frequencies of 0.75 or above, with no other marker reaching a frequency of 0.3. As the number of markers used decreases, the requirement for using one or more of these 3 markers (GM07 / LR52 / LR44) increases, and they are the markers present in the only 3mer combination that is accurate out of 2024 possible triplets (Table 2 columns 8-9). Histograms summarizing the accuracy of all markers singly, and in all combinations of 2, 3, 4, 5 and 6 markers, are presented in Figure 7. Only 0.05% of 3mers (1/2024), 0.2% of 4mers (22/10626), 0.4% of 5mers (162/42504), and 0.8% of 6mers (1132/134596) give an accuracy of >0.999.
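The subset counts quoted above follow from the binomial coefficient C(24, k). The short Python check below recomputes the totals and the accurate fractions. The numbers of accurate combinations are taken from the text; the names `N_MARKERS`, `ACCURATE_COUNTS` and `accurate_fraction` are introduced here for illustration only.

```python
from math import comb

N_MARKERS = 24
# Number of k-marker combinations with accuracy >0.999, as stated in the text.
ACCURATE_COUNTS = {3: 1, 4: 22, 5: 162, 6: 1132}


def accurate_fraction(k):
    """Return (total k-marker subsets of the panel, fraction that are accurate)."""
    total = comb(N_MARKERS, k)
    return total, ACCURATE_COUNTS[k] / total


for k in sorted(ACCURATE_COUNTS):
    total, fraction = accurate_fraction(k)
    print(f"{k}-marker subsets: {total} in total, {100 * fraction:.2f}% accurate")
```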
Figure imgf000027_0001
Table 2 - Representation of individual markers among six, five, four and three marker combinations with accuracy >0.999. Number of appearances and frequencies are shown.
Furthermore, when the representation of triplets is analysed within larger marker combinations (4-6 markers) with accuracy >0.999, this triplet (GM07, LR52, LR44) is present more often than any other triplet within all six marker, five marker and four marker combinations (frequencies of 0.421, 0.5 and 0.591, see Tables 3 to 5). A full list of all 3-6 marker combinations with accuracy >0.999 is given in Table 6.
The inventors have therefore identified that GM07, LR44 and/or LR52 are particularly useful markers for MSI. Reduced MSI marker panels and assays are therefore provided herein using at least one, at least two or all three of GM07, LR44 and/or LR52 for accurate MSI classification.
Figure imgf000028_0001
Figure imgf000029_0001
Table 3: Occurrence of triplet marker combinations within all 6 marker combinations with >0.999 accuracy. Information on triplets with a frequency below 0.05 is aggregated for space considerations.
Figure imgf000029_0002
Figure imgf000030_0001
Table 4: Occurrence of triplet marker combinations within all 5 marker combinations with >0.999 accuracy. Information on triplets with a frequency below 0.05 is aggregated for space considerations.
Figure imgf000031_0001
Table 5: Occurrence of triplet marker combinations within all 4 marker combinations with >0.999 accuracy. Information on triplets with a frequency below 0.05 is aggregated for space considerations.
Figure imgf000031_0002
Figure imgf000032_0001
Figure imgf000033_0001
Figure imgf000034_0001
Figure imgf000035_0001
Figure imgf000036_0001
Figure imgf000037_0001
Figure imgf000038_0001
Figure imgf000039_0001
Figure imgf000040_0001
Figure imgf000041_0001
Figure imgf000042_0001
Figure imgf000043_0001
Figure imgf000044_0001
Figure imgf000045_0001
Figure imgf000046_0001
Figure imgf000047_0001
Figure imgf000048_0001
Figure imgf000049_0001
Table 6. All 3mer, 4mer, 5mer and 6mer marker combinations with accuracy >0.999.
Details of the experiments that were performed by the inventors to validate the 24 MSI marker panel and 6 MSI marker panel described herein are provided below.
MSI classification is accurate and reproducible using a 24 MSI marker panel or a 6 MSI marker panel
smMIPs were designed for the 17 short MNR markers previously described in the singleplex assay of Redford et al (2018). Two markers had to be excluded from the panel due to poor amplification by the smMIP protocol. To supplement this reduced panel, smMIPs were successfully designed for an additional 9 markers taken from the extended set of Redford et al (2018), giving a total of 24 short MNR markers (data not shown). Having defined the marker panel, a training cohort of 51 MSI-H and 47 MSS CRCs was used to estimate classifier parameters. Reclassification of the training samples using these parameters achieved 100% sensitivity (95% CIs: 93.0-100.0%) and 100% specificity (95% CIs: 92.5-100.0%) (Figure 1A). Data filtering using smMIP molecular barcodes to reduce sequencing error (Hiatt et al., 2013) did not improve sample separation by the classifier, and therefore was not employed for MSI classification. The 15 markers remaining from the 17-marker panel of Redford et al (2018) also achieved 100% sensitivity and specificity (data not shown), indicating redundancy in the marker panel. Backward-forward stepwise selection was used to define a subset of 6 short MNRs (consisting of GM07, GM11, GM14, LR36, LR44, and LR52, as described in Table 1) with accuracy equal to the 24-marker panel (Figure 1A).
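The confidence limits quoted for the training cohort can be checked with a short calculation. The specification does not name the interval method; the sketch below assumes a two-sided exact (Clopper-Pearson) binomial interval, which has a closed form when every sample is classified correctly and which reproduces the lower bounds quoted for 51 MSI-H and 47 MSS samples. The function name is hypothetical.

```python
def exact_lower_bound_all_correct(n, alpha=0.05):
    """Lower limit of the two-sided Clopper-Pearson interval when n of n are correct.

    For x = n successes the exact lower limit is (alpha / 2) ** (1 / n),
    and the upper limit is 1.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (alpha / 2) ** (1 / n)


for label, n in [("sensitivity, 51 MSI-H", 51), ("specificity, 47 MSS", 47)]:
    lower = exact_lower_bound_all_correct(n)
    print(f"{label}: 95% CI {100 * lower:.1f}-100.0%")
```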
An independent validation cohort of 50 MSI-H and 49 MSS was sequenced and analysed, and 100% sensitivity (95% CIs: 92.9-100.0%) and 100% specificity (95% CIs: 92.8-100.0%) was again achieved using all 24 markers, and the 6-marker subset (Figure 1B). To assess assay reproducibility, 16 MSI-H and 16 MSS CRCs from the validation cohort were amplified, sequenced, and classified a second time. Classification was 100% concordant, and scores were strongly correlated between sample repeats, using both 24 markers (b = 0.97, R² = 0.97), and the 6-marker subset (b = 1.01, R² = 0.97).
MSI classification detects 3% MSI-H cell line DNA in sample mixtures
To estimate the minimum MMR deficient tumour cell content required for a CRC to be classified as MSI-H, we mixed DNA from HCT116 (a clonal, MMR deficient CRC cell line) with DNA from non-neoplastic PBLs to create, in triplicate, sample-mixtures containing 0.78-100% DNA from MSI-H cells. Across the mixtures, the observed and the theoretically expected proportion of reads containing insertion-deletion mutations in each microsatellite were strongly correlated (b = 1.03, R² = 0.99), giving confidence in mixing accuracy. Mixtures containing ≥3.13% and ≥6.25% DNA from MSI-H cells were classified as MSI-H using the 24, and 6 marker sets, respectively (Figure 2A), results which are better than or equivalent to FLA (Table 7).
Figure imgf000050_0001
Table 7: Sample mixture series classification by Promega MSI Analysis System v1.2. † The clinical scientist called a marker as uncertain if they could not determine its mutation status with confidence.
The impact of sample heterogeneity on classification was further investigated in silico by randomly selecting sequencing reads from MSI-H and MSS samples, and mixing them in predetermined proportions to create simulated samples. Scores from simulated samples were strongly correlated with scores from the mixing of DNAs (b = 0.97, R² = 0.98). Mixing reads from all pairwise combinations of MSI-H and MSS samples from the validation cohort revealed that >95% of mixtures containing ≥25.0% reads from an MSI-H CRC were classified as MSI-H using the 24 marker panel, while ≥27.5% of MSI-H CRC reads were needed to achieve the same level of classification accuracy using the 6 marker panel (Figure 2B). As reads from MSI-H CRCs are derived from heterogeneous mixtures of tumour and normal tissue, this supports the conclusion that the MSI classifier is robust to low MMR deficient tumour cell content.
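The read-mixing simulation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the inventors' code: each read is reduced to its observed microsatellite length, the total read count per marker is set to that of the MSI-H sample (as stated in the Materials and Methods), MSI-H reads are drawn without replacement and MSS reads with replacement, and the function name `mix_marker_reads` is hypothetical.

```python
import random


def mix_marker_reads(msi_reads, mss_reads, msi_fraction, rng):
    """Simulate one marker of a mixed sample.

    msi_reads, mss_reads : lists of per-read observations (e.g. repeat lengths)
    msi_fraction         : desired proportion of reads from the MSI-H sample
    rng                  : random.Random instance, for reproducibility
    """
    if not 0.0 <= msi_fraction <= 1.0:
        raise ValueError("msi_fraction must lie between 0 and 1")
    total = len(msi_reads)                       # depth matches the MSI-H sample
    n_msi = round(total * msi_fraction)
    n_mss = total - n_msi
    mixed = rng.sample(msi_reads, n_msi)         # without replacement (assumption)
    mixed += rng.choices(mss_reads, k=n_mss)     # with replacement (assumption)
    rng.shuffle(mixed)
    return mixed


# Invented example data: MSI-H reads carry a 2 bp deletion at a 12 bp repeat.
msi = [10] * 100
mss = [12] * 80
simulated = mix_marker_reads(msi, mss, 0.25, random.Random(1))
print(simulated.count(10) / len(simulated))
```

Repeating this for every marker and every MSI-H/MSS sample pair, then scoring each simulated sample, gives the kind of detection-limit curve summarised in Figure 2B.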
MSI classification is accurate when 10 or more molecules are sequenced per marker
Whilst we found no improvement to classifier performance using molecular barcodes to correct sequencing error, molecular barcodes can be used to estimate the number of template molecules sequenced to provide a quality control metric (Jennings et al., 2017). To establish this, and to investigate the relationship between the number of template molecules sequenced and the accuracy of classification, the inventors created 2-fold dilution series of 9 samples, with template quantities ranging from 3.13-100ng of DNA per reaction. A strong correlation between the input quantity of template DNA and the number of molecular barcodes detected across the 9 samples (R² = 0.99-1.00) confirmed the accuracy of dilution. Using the 24 marker panel, all samples with a mean number of molecular barcodes per marker ≥75 were correctly classified, and among these there was no correlation between the number of molecular barcodes detected and any change in classifier score, relative to the baseline score from 100ng of template DNA (R² = 0.10, p = 0.09; Figure 3A). However, below 75 molecular barcodes per marker, there was marked variability in score for three samples (Figure 3A). Results were similar using the 6 marker subset, except that one MSS sample with a mean number of molecular barcodes detected per marker ≥75 was misclassified (Figure 3A). In agreement with these estimates, only one sample from the training and validation cohorts had a mean number of molecular barcodes detected per marker <75 (Table 8), and all samples were correctly classified with either the 24, or 6, marker panels.
To explore the minimum number of template molecules that need to be sequenced for accurate classification, we also performed an in silico resampling of sequencing data. Analysis of the 9 sample dilution series gave a strong correlation between classifier scores from empirical observations and from resampling (b = 0.92, R² = 0.96). Resampling of the CRCs included in the validation cohort was used to increase the number of observations, and it was found that, using the 24 and 6 marker panels, sequencing of ≥10 and ≥15 molecular barcodes per marker, respectively, gave correct classification of >95% of samples (Figure 3B). It should be noted that these estimates were obtained from resampling high quality sequencing data. Therefore, the mean of 75 molecular barcodes per marker obtained empirically provides a more conservative threshold for diagnostic use.
Validation in an independent clinical laboratory
Assessment of an assay's performance in an independent clinical laboratory supports that it is a reproducible method, suitable for wider adoption (Jennings et al., 2017). To test this, the inventors' smMIP-based MSI assay was set up by the Northern Genetics Service (Newcastle Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK) by providing protocols and smMIP and primer oligonucleotides. All other reagents and equipment were distinct from those used during assay development, and the personnel running the assay were independent from our research team. Once established, a further 23 independent CRCs were analysed using the assay, and it again achieved 100% sensitivity (95% CIs: 79.4-100.0%) and 100% specificity (95% CIs: 59.0-100.0%) relative to the MSI Analysis System v1.2, when classifying samples with both the 24, and 6, marker panels (Figure 4). Although four samples had <75 molecular barcodes per marker detected (Table 8), they were accurately classified in agreement with read sampling predictions, and so were not re-sequenced at a higher depth.
Discussion
The MSI assay presented here achieved 100% accuracy of MSI classification in 220 CRCs, relative to the MSI Analysis System v1.2 (Promega), using only tumour DNA and as few as 6 microsatellite markers. We found no improvement to classifier performance using molecular barcodes for sequencing error correction (Hiatt et al., 2013). This is likely due to our use of short MNRs with flanking SNPs, selected from genome-wide data, and our classification method. Shorter microsatellites have lower PCR and sequencing error rates compared to longer microsatellites (Fazekas et al., 2010), while the SNPs flanking the microsatellites provide additional discrimination between error and true microsatellite mutations. Classification by a naive Bayesian approach accounts for individual marker sensitivity, specificity, and sequencing error rate (Redford et al., 2018). However, molecular barcodes are used in our assay to provide a quality control metric by estimating the number of independent molecules sequenced (Jennings et al., 2017). We have also shown, previously, that molecular barcodes are useful for the detection of much lower frequency microsatellite variants, found in the PBLs of patients with constitutional mismatch repair deficiency (CMMRD).
To show that the assay is suitable for clinical practice, clinical validity was tested according to published guidelines (Jennings et al., 2017). Accuracy was 100% across three cohorts of clinical samples, which included poor quality DNA samples from FFPE tissue, and 23 CRCs analysed by an independent diagnostic laboratory. 100% classification concordance was observed in repeat testing, and robustness to sample heterogeneity was assessed, with detection of 3% or 6% MSI-H cell line DNA in sample mixtures, for the 24 and 6 marker panels, respectively. Depending on the marker panel used, it is estimated that 10 or 15 molecular barcodes per marker are required for correct classification of >95% of samples. It is, therefore, possible to accurately determine MSI status using only 6 markers, a fraction of the number required by other NGS-based MSI assays (Kautto et al., 2016; Waalkes et al., 2018; Zhu et al., 2018); only a small reduction in assay robustness is observed when using this subset rather than the 24 marker panel. The requirement of other NGS-based MSI classifiers for larger marker panels may be explained by the classification method, as assessing the proportion of mutated microsatellites gives equal diagnostic weight to each marker, and does not account for the variable influence of MMR deficiency on the mutation of individual microsatellites (Dietmaier et al., 1997).
Marker number has a significant impact on cost, and with only 6 markers, plus BRAF c.1799 for streamlined LS screening (Newland et al., 2017), reagent cost estimates range from £5.50-6.77 per sample, depending on the capacity of the MiSeq kit used. The 24 marker set may, however, be preferred for a variety of reasons. It could provide protection against allele or marker drop out due to technical variation, somatic events within tumours, or the effects of population-specific sequence variants on marker length or stability. It may also enhance the clinical utility of the assay as it increases the power of the internal sample traceability provided by the SNPs linked to each marker. For instance, using the allele frequencies observed in the training cohort, the probability of any two individuals sharing the same genotype is 3.8×10⁻³ from the 6 marker subset, but 3.6×10⁻¹⁰ when 24 markers are used (Tables 10A, 10B and 10C).
The clinical demand for MSI analysis may increase, driven by the need to predict patient response to immune checkpoint blockade therapy across multiple cancer types (Le et al., 2017). The frequency of mutations in non-coding microsatellites has been shown to be equivalent between different cancer types (Cortes-Ciriano et al., 2017), and the 24 marker panel can detect CMMRD from PBL DNA (Gallon et al., 2019), making it likely that the assay will be suitable for MSI detection in extra-colonic tissues.
In summary, the MSI assay outlined here is accurate, reproducible, robust to sample heterogeneity, and includes both internal quality controls and sample identification. The automatable laboratory workflow and analysis, and the need for as few as 6 microsatellite markers at moderate read depths, provide a cheap and scalable option for high-throughput MMR deficiency testing. The greatly reduced marker number also means that the assay is potentially suitable for application to the detection of MSI within liquid biopsy material, for example to allow early detection or monitoring of tumours with MSI.
Materials and Methods
Unless stated otherwise, manufacturer's protocols were followed for each kit or reagent.
Patient samples
Nineteen and 73 CRC DNAs were provided by the Department of Molecular Pathology, University of Edinburgh, UK, and the Oncogenetics and Hereditary Cancer Group, Complejo Hospitalario de Navarra, Spain, respectively. These 92 samples were residual stocks from Redford et al (2018). An additional 128 CRC DNAs or CRC formalin-fixed paraffin-embedded (FFPE) tissue samples were provided by the Northern Genetics Service, Newcastle Hospitals NHS Foundation Trust, UK. Nineteen DNAs extracted from peripheral blood leukocytes (PBLs), from patients consenting to sample-use in assay development, were gifted by K. Wimmer (Medical University of Innsbruck, Austria) and used as microsatellite stable (MSS) controls.
All CRC samples (Table 8) were independently tested for MMR deficiency by the contributing laboratory using the MSI Analysis System v1.2 (Promega); samples with one mutant marker (MSI low) were considered equivalent to MSS samples (Halford et al., 2002).
All samples were anonymised by the contributing laboratory and analysed following approval by the NHS Health Research Authority Research Ethics Committee (13/LO/1514).
Figure imgf000054_0001
Figure imgf000055_0001
Figure imgf000056_0001
Figure imgf000057_0001
Figure imgf000058_0001
Figure imgf000059_0001
Figure imgf000060_0001
Table 8. CRC sample details and assay scores.†MSI status as determined by the contributing pathology laboratory, using the MSI Analysis System v1.2 (Promega). ǂBRAF V600E testing was conducted in a subset of MSI-high samples by the contributing pathology laboratory, using High Resolution Melt Curve analysis (Nikiforov et al., 2009). Indept Lab = independent laboratory.
Cell lines and culture
H9 embryonic stem cell line (WiCell) DNA was a gift from L. Lako (Newcastle University, UK), and used as an MSS control. HCT116 (CCL-247, ATCC) and K562 (CCL-243, ATCC) cells were gifted by J. Irving (Newcastle University, UK). HCT116 and K562 cells were grown in RPMI growth medium containing 2mM L-glutamine (Gibco), 10% fetal bovine serum (Gibco), 60mg/ml penicillin and 100mg/ml streptomycin (Gibco) at 37°C and 5% CO2. HCT116 cells were passaged or harvested at 80-90% confluence by decanting expired growth medium, washing in 5ml PBS (Gibco), and detaching the cells using 0.05% Trypsin-EDTA (Gibco). K562 cells were passaged or harvested at a density of 1×10⁶ cells/ml. DNA extracted from the HCT116 CRC cell line (MLH1 deficient) was used as an MSI-H control. DNA extracted from the K562 chronic myeloid leukaemia cell line was used as an MSS control.
Sample mixtures and dilutions
Mixtures of MSI-H and MSS samples were created using HCT116 and PBL DNAs (Table 9). Nine samples, comprising 3 fresh tissues (HCT116, H9, and K562 cell lines) and 6 FFPE tissues (3 MSI-H CRCs: N021, N068, and N073, and 3 MSS CRCs: N033, N036, and N056), were 2-fold serially diluted in 10mM Tris-HCl pH 8.5.
Figure imgf000061_0002
Table 9 - Generation of sample mixtures with varying MSI-H content.
Assessing the effect of DNA mixing
Assuming there is no selective amplification of microsatellite alleles from MSI-H or MSS cells, the observed relative frequency of microsatellite length mutations in DNA mixtures of HCT116 (MSI-H cell line) and PBLs (MSS cells) can be compared to the expected frequency (fmix) to assess the accuracy of mixing. fmix was calculated from the mean frequencies of microsatellite length mutations observed from HCT116 DNA (fMSI) and PBLs (fMSS), and the proportion of MSI-H cell DNA in the mixture (pmix), using the equation:
$$f_{mix} = p_{mix} \cdot f_{MSI} + (1 - p_{mix}) \cdot f_{MSS}$$
The observed frequencies were strongly correlated with expected frequencies (b = 1.03, R² = 0.99, Figure 5A).
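As a worked illustration of the mixing equation, the Python sketch below evaluates the expected mutant-read frequency across a 2-fold mixture series. The values chosen for fMSI and fMSS are invented for the example, and the series of eight 2-fold steps is an assumption inferred from the stated 0.78-100% range; the function name is hypothetical.

```python
def expected_mutant_frequency(p_mix, f_msi, f_mss):
    """Expected frequency of length-mutant reads in a DNA mixture.

    p_mix : proportion of DNA from MSI-H cells (0 to 1)
    f_msi : mutant-read frequency observed in the pure MSI-H sample
    f_mss : mutant-read frequency observed in the pure MSS sample
    """
    if not 0.0 <= p_mix <= 1.0:
        raise ValueError("p_mix must lie between 0 and 1")
    return p_mix * f_msi + (1.0 - p_mix) * f_mss


# Hypothetical frequencies for a single marker.
F_MSI, F_MSS = 0.60, 0.02

for step in range(8):                      # 100%, 50%, ..., 0.78% MSI-H DNA
    p = 1.0 / 2 ** step
    f = expected_mutant_frequency(p, F_MSI, F_MSS)
    print(f"{100 * p:6.2f}% MSI-H DNA -> expected mutant frequency {f:.4f}")
```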
Simulating additional samples of varying MSI-H cell content
Additional samples of varying MSI-H content were simulated by mixing reads from one MSI-H sample and one MSS sample. For each marker in a simulated sample, reads were randomly mixed in the desired proportion, with the total number of reads per marker equal to the reads per marker of the MSI-H sample used. Classifier scores of simulated and empirical sample mixtures were strongly correlated using data from HCT116 and PBLs (b = 0.97, R² = 0.98; Figure 5B), supporting the validity of the method.
DNA extraction and quantification
DNA was extracted from FFPE CRC tissue using the GeneRead DNA FFPE Kit (Qiagen). DNA was extracted from cell lines using the Wizard Genomic DNA Purification Kit (Promega). DNAs were quantified using QuBit 2.0 Fluorometer (Invitrogen) and QuBit dsDNA BR/HS Kits (Invitrogen).
Markers and smMIP design
The marker panel includes 24 MNRs, previously published by Redford et al (2018), for MSI classification, as well as BRAF c.1799 to screen for sporadic MSI-H CRCs (Newland et al., 2017). MIPgen (Boyle et al., 2014) was used to generate smMIP sequences for each marker.
MIPgen parameters were: tag size 6,0, minimum capture size 120, and maximum capture size 150. smMIP designs were selected by the following criteria: no common single nucleotide polymorphisms (SNPs) in the smMIP extension or ligation arms, logistic score >0.8, and successful amplification of loci. Marker loci and smMIP sequences are detailed in Table 10.
Figure imgf000062_0001
Figure imgf000063_0001
Table 10 - Marker loci, smMIPs and SNP frequencies; (A) Marker loci of the invention; BRAF is included for reference, though it is not a microsatellite marker of the invention. † Chromosomal coordinates are specified from reference genome hg19.
Figure imgf000063_0002
Figure imgf000064_0001
Table 10 contd - Marker loci, smMIPs and SNP frequencies; (B) SNP/variant ID, SNP/variant locus, reference allele and variant allele information relating to the microsatellite loci of the invention; BRAF is also included for reference
Figure imgf000064_0002
Figure imgf000065_0001
Table 10 contd - Marker loci, smMIPs and SNP allele frequencies; (C) smMIPs of the microsatellite loci of the invention; BRAF is also included for reference
Figure imgf000065_0002
Figure imgf000066_0001
Table 10 contd - Marker loci, smMIPs and SNP frequencies; (C) Allele frequencies of SNPs associated with the microsatellite loci of the invention; equivalent information for BRAF is also included for reference. ǂ The probability of any two patients having the same genotype can be calculated from the observed frequencies of each SNP genotype in a marker. The product of the probabilities for each SNP can be used to determine the probability of any two patients having the same genotype across multiple markers; to avoid the potential of linkage disequilibrium where a marker has multiple SNPs, the SNP with the lowest match probability was used. Using the 24 marker panel, the probability of two patients having the same genotype is 3.58×10⁻¹⁰. Using the 6 marker panel, the probability of two patients having the same genotype is 3.82×10⁻³.
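The genotype match probability described in the footnote above can be sketched as follows. The specification does not give the formula; this sketch assumes the standard calculation in which the probability that two unrelated individuals share a genotype at one SNP is the sum of the squared genotype frequencies, and it follows the text in taking one SNP per marker (the one with the lowest match probability) and multiplying across markers. All genotype frequencies and function names below are invented for illustration.

```python
from math import prod


def snp_match_probability(genotype_freqs):
    """Probability that two individuals share a genotype at one SNP.

    genotype_freqs : dict mapping genotype (e.g. "AG") to observed frequency or count
    """
    total = sum(genotype_freqs.values())
    return sum((f / total) ** 2 for f in genotype_freqs.values())


def panel_match_probability(markers):
    """Match probability across a panel.

    markers : list of markers, each a list of per-SNP genotype frequency dicts.
    For a marker with several SNPs, only the SNP with the lowest match
    probability is used, to avoid double counting linked SNPs.
    """
    return prod(min(snp_match_probability(snp) for snp in snps) for snps in markers)


# Hypothetical two-marker panel; the second marker has two linked SNPs.
example_panel = [
    [{"AA": 0.25, "AG": 0.50, "GG": 0.25}],
    [{"CC": 0.81, "CT": 0.18, "TT": 0.01}, {"GG": 0.49, "GT": 0.42, "TT": 0.09}],
]
print(panel_match_probability(example_panel))
```

Because each additional marker multiplies in a factor below one, the match probability falls quickly as markers are added, which is consistent with the large difference reported between the 6 and 24 marker panels.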
Oligonucleotide synthesis
smMIPs, and primers for amplification and sequencing (Table 12 below), were synthesised by and purchased from Metabion.
smMIP phosphorylation and pooling
smMIPs were individually phosphorylated using 10U of T4 Polynucleotide Kinase (NEB), 1X T4 DNA Ligase buffer (NEB), and 1 mM of un-phosphorylated smMIP in a 100µl reaction volume, and incubated at 37°C for 45 minutes, followed by 80°C for 20 minutes. Phosphorylated smMIPs were pooled, with specified volumes for each smMIP to equalise the number of reads from each marker locus, and diluted using TE buffer (Sigma) to an average concentration of 0.1 nM (Table 11).
Figure imgf000067_0001
Figure imgf000068_0002
Figure imgf000068_0001
Table 11 - Read-balancing smMIPs; † 100-fold dilution to achieve average smMIP concentration of 0.1nM; Read-balancing the smMIP pool reduced the coefficient of variance of reads assigned to different markers from 68% to 35%.
smMIP amplification
smMIP-multiplexed amplification was based on Hiatt et al (2013) using a SensoQuest thermocycler (SensoQuest GmbH), with minor modifications to the protocol. Herculase II Polymerase (Agilent) was used during extension and amplification steps for increased fidelity of microsatellite replication (Fazekas et al., 2010). For amplification, the thermocycler programme used 98°C for 2 minutes, 30 cycles of 98°C for 15 seconds, 60°C for 30 seconds and 72°C for 30 seconds, followed by 72°C for 2 minutes. 100ng of sample DNA was used as template unless stated otherwise: the input quantity of CRC sample DNA varied depending on quantity available (Table 8). smMIP reaction products (240-270bp) were analysed using 3% agarose gel electrophoresis at 80mV for 60 minutes, or a QIAxcel (Qiagen).
Library preparation and sequencing
smMIP amplicons were purified using Agencourt AMPure XP Beads (Beckman Coulter), diluted to 4nM in 10mM Tris pH 8.5, and pooled in equal volumes. Libraries were sequenced on a MiSeq (Illumina) using the GenerateFastq workflow, paired end sequencing and custom sequencing primers (Hiatt et al., 2013); sequencing run statistics are presented in Table 13. Fastq files are available from the EMBL-EBI European Nucleotide Archive, accession number PRJEB28394.
Figure imgf000069_0001
Table 12A - Oligonucleotide sequences (primers); (A) General single-molecule molecular inversion probe (smMIP) sequences
Figure imgf000069_0002
Figure imgf000070_0001
Table 12B - Oligonucleotide sequences (primers); (B) Sample Index IDs and Sequences
Figure imgf000071_0001
Table 13 - Sequencing Runs
Sequence analysis and MSI classification
Sequence analysis and MSI classification were carried out as described by Redford et al (2018), which is incorporated herein by reference in its entirety; see also WO2018/037231, which is also incorporated herein by reference in its entirety. In brief, sequencing reads were aligned to the hg19 reference genome using BWA mem (BWA v0.6.2) (Li & Durbin, 2010). smMIP-based sequencing assesses the regions of interest in both orientations, and only base calls supported by both reads of a pair were processed further. The MSI classifier uses both the frequency and allelic bias of deletions in the microsatellite markers to type each sample. The deletion frequency was defined as the proportion of reads that have a microsatellite length less than the reference length. For samples heterozygous at the neighbouring SNP, the allelic bias of deletions, i.e. whether deletions are preferentially observed in reads carrying one of the SNP alleles, was assessed using the p value from Fisher's Exact test. For each marker, deletion frequency and allelic bias were dichotomised into two binary traits; deletion frequency is assessed by whether it is above or below the 95th percentile of the training MSS samples, and allelic bias is assessed by whether the p value is above or below 0.05. A training cohort of samples was used to estimate the probabilities of observing the different combinations for each marker in MSI-H and MSS tumours. The (posterior) probability that a new sample is MSI-H versus MSS can then be estimated from its microsatellite deletion frequencies, and the allelic bias of deletions, using a naive Bayes approach. A prior probability of 0.85 that a sample is MSS was used. The assay score represents the decadic logarithm of the odds that a sample is MSI-H versus MSS. Scores >0 classify a sample as MSI-H, and scores <0 classify a sample as MSS.
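The scoring scheme outlined above can be sketched in Python. This is a minimal illustration under stated assumptions and not the classifier of Redford et al (2018): the function names and data layout are hypothetical, the conditional probabilities are invented, samples without an allelic-bias p value (e.g. homozygous at the SNP) are coded here as "no bias", and the trained probabilities are assumed to be non-zero.

```python
from math import log10

PRIOR_MSS = 0.85  # prior probability that a sample is MSS, as stated in the text


def dichotomise(deletion_freq, freq_threshold, bias_p_value=None):
    """Convert one marker's measurements into two binary traits.

    freq_threshold : 95th percentile of deletion frequency in training MSS samples
    bias_p_value   : Fisher's exact test p value, or None if unavailable
    """
    high_frequency = deletion_freq > freq_threshold
    biased = bias_p_value is not None and bias_p_value < 0.05
    return (high_frequency, biased)


def msi_score(observations, likelihoods, prior_mss=PRIOR_MSS):
    """log10 odds that a sample is MSI-H versus MSS (naive Bayes).

    observations : {marker: (high_frequency, biased)}
    likelihoods  : {marker: {"MSI-H": {state: p}, "MSS": {state: p}}}
    """
    score = log10((1.0 - prior_mss) / prior_mss)          # prior odds
    for marker, state in observations.items():
        p_msi = likelihoods[marker]["MSI-H"][state]
        p_mss = likelihoods[marker]["MSS"][state]
        score += log10(p_msi) - log10(p_mss)              # per-marker likelihood ratio
    return score


# Invented trained probabilities for a single marker.
state = dichotomise(deletion_freq=0.30, freq_threshold=0.05, bias_p_value=0.001)
trained = {"GM07": {"MSI-H": {state: 0.80}, "MSS": {state: 0.08}}}
score = msi_score({"GM07": state}, trained)
print("MSI-H" if score > 0 else "MSS", round(score, 3))
```

Because markers contribute through their own likelihood ratios rather than through a simple count of mutated loci, a highly discriminating marker can shift the score more than a weak one, which is the property the text relies on when reducing the panel size.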
Assessing the effect of sample dilutions on the number of molecular barcodes detected
Assuming that each molecular barcode originates from a distinct template DNA molecule, the number of molecular barcodes detected can be used to assess the accuracy of the sample dilution series. The number of molecular barcodes detected was correlated with the quantity of template DNA for each of the 9 samples (b = 0.84-0.96, R² = 0.99-1.00, Figure 6B), suggesting dilutions were accurate. One sample shows a lower than expected number of molecular barcodes from 6.25ng of template DNA, which is also visible from a reduced intensity of amplicon in the gel image (Figure 6A), suggesting there was an error in reaction preparation.
Simulating sample dilutions to decrease molecular barcodes detected
Additional sample dilution series were simulated by resampling of reads. For each marker in a sample, reads were grouped by molecular barcode, and the microsatellite length and SNP genotype associated with that molecular barcode was summarised from that found in the majority of reads in the group. A predetermined number of molecular barcodes was selected to simulate sample dilution. Subsequently, reads for the simulated sample were generated to a depth equal to that of the original sample, with each read having a defined microsatellite length and SNP genotype by random sampling of the selected molecular barcodes.
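The resampling procedure just described can be sketched in Python for a single marker. This is an illustrative sketch under stated assumptions: each read is represented as a (barcode, microsatellite length, SNP allele) tuple, the consensus for a barcode is taken as the most frequent (length, allele) pair among its reads, barcodes are selected without replacement, and reads are then drawn from the selected barcodes with replacement. The function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def consensus_by_barcode(reads):
    """Summarise each molecular barcode by its most frequent (length, allele) pair."""
    groups = defaultdict(list)
    for barcode, length, allele in reads:
        groups[barcode].append((length, allele))
    return {bc: Counter(obs).most_common(1)[0][0] for bc, obs in groups.items()}


def simulate_dilution(reads, n_barcodes, rng):
    """Simulate a diluted sample that retains only n_barcodes template molecules.

    The simulated sample has the same read depth as the original.
    """
    consensus = consensus_by_barcode(reads)
    if n_barcodes > len(consensus):
        raise ValueError("cannot select more barcodes than were observed")
    kept = rng.sample(sorted(consensus), n_barcodes)      # choose template molecules
    drawn = rng.choices(kept, k=len(reads))               # regenerate reads to full depth
    return [(bc, *consensus[bc]) for bc in drawn]


# Invented example: three template molecules, ten reads.
example_reads = (
    [("AAA", 10, "A")] * 3 + [("AAA", 9, "A")]
    + [("CCC", 12, "G")] * 2
    + [("GGG", 11, "A")] * 4
)
diluted = simulate_dilution(example_reads, 2, random.Random(0))
print(len(diluted), sorted({bc for bc, _, _ in diluted}))
```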
For each of the 9 samples that were empirically analysed, 20 simulated dilution series were generated, using 7, 1 1 , 17, 25, 38, 57, 86, 129, 194, 291 , 437, 656 and 985 molecular barcodes per marker, and classifier scores were modelled by non-linear regression using cubic splines, such that a simulated score could be predicted for any given number of molecular barcodes per marker. Classifier scores of simulated and empirical sample dilutions, of the same mean number of molecular barcodes per marker, were strongly correlated using data from the 9 samples (b = 0.92, R2 = 0.96; Figure 6C), supporting the validity of the method.
Statistics and graphics
Analyses were performed in R v3.3.1, and graphs were generated with the R package ggplot2.
It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims, are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an" (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of "two recitations," without other modifiers, means at least two recitations, or two or more recitations). It will be appreciated that various embodiments of the present disclosure have been described herein for purposes of illustration, and that various modifications may be made without departing from the scope and spirit of the present disclosure.
Accordingly, the various embodiments disclosed herein are not intended to be limiting, but instead the scope of the invention is indicated by the following claims, taking account of equivalents where permitted by law.
References
Bacher, J., Flanagan, L., Smalley, R., Nassif, N., Burgart, L., Halberg, R., . . . Thibodeau, S. (2004). Development of a fluorescent multiplex assay for detection of MSI-High tumors. Disease Markers, 20(4-5), 237-250. doi:10.1155/2004/136734
Balmana, J., Balaguer, F., Cervantes, A., & Arnold, D. (2013). Familial risk-colorectal cancer: ESMO Clinical Practice Guidelines. Annals of Oncology, 24 Suppl 6, vi73-80. doi:10.1093/annonc/mdt209
Berg, K., Glaser, C., Thompson, R., Hamilton, S., Griffin, C., & Eshleman, J. (2000). Detection of microsatellite instability by fluorescence multiplex polymerase chain reaction. Journal of Molecular Diagnostics, 2(1), 20-28. doi:10.1016/S1525-1578(10)60611-3
Boland, C., Thibodeau, S., Hamilton, S., Sidransky, D., Eshleman, J., Burt, R., . . . Srivastava, S. (1998). A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Research, 58(22), 5248-5257.
Boyle, E., O'Roak, B., Martin, B., Kumar, A., & Shendure, J. (2014). MIPgen: optimized modeling and design of molecular inversion probes for targeted resequencing. Bioinformatics, 30(18), 2670-2672. doi:10.1093/bioinformatics/btu353
Burn, J., Gerdes, A., Macrae, F., Mecklin, J., Moeslein, G., Olschwang, S., . . . Bishop, D. (2011). Long-term effect of aspirin on cancer risk in carriers of hereditary colorectal cancer: an analysis from the CAPP2 randomised controlled trial. The Lancet, 378(9809), 2081-2087. doi:10.1016/S0140-6736(11)61049-0
Chapusot, C., Martin, L., Bouvier, A., Bonithon-Kopp, C., Ecarnot-Laubriet, A., Rageot, D., . . . Piard, F. (2002). Microsatellite instability and intratumoural heterogeneity in 100 right-sided sporadic colon carcinomas. British Journal of Cancer, 87(4), 400-404. doi:10.1038/sj.bjc.6600474
Cortes-Ciriano, I., Lee, S., Park, W., Kim, T., & Park, P. (2017). A molecular portrait of microsatellite instability across multiple cancers. Nature Communications, 8, 15180. doi:10.1038/ncomms15180
Dietmaier, W., Wallinger, S., Bocker, T., Kullmann, F., Fishel, R., & Ruschoff, J. (1997). Diagnostic microsatellite instability: definition and correlation with mismatch repair protein expression. Cancer Research, 57(21), 4749-4756.
Fazekas, A., Steeves, R., & Newmaster, S. (2010). Improving sequencing quality from PCR products containing long mononucleotide repeats. Biotechniques, 48(4), 277-285. doi:10.2144/000113369
Gallon, R., Muhlegger, B., Wenzel, S., Sheth, H., Hayes, C., Aretz, S., . . . Wimmer, K. (2019). A sensitive and scalable microsatellite instability assay to diagnose constitutional mismatch repair deficiency by sequencing of peripheral blood leukocytes. Human Mutation, 40(5), 649-655. doi:10.1002/humu.23721
Halford, S., Sasieni, P., Rowan, A., Wasan, H., Bodmer, W., Talbot, I., . . . Tomlinson, I. (2002). Low-level microsatellite instability occurs in most colorectal cancers and is a nonrandomly distributed quantitative trait. Cancer Research, 62(1), 53-57.
Hampel, H., & de la Chapelle, A. (2011). The search for unaffected individuals with Lynch syndrome: do the ends justify the means? Cancer Prevention Research, 4(1), 1-5. doi:10.1158/1940-6207.CAPR-10-0345
Hampel, H., Frankel, W., Martin, E., Arnold, M., Khanduja, K., Kuebler, P., . . . de la Chapelle, A. (2008). Feasibility of screening for Lynch syndrome among patients with colorectal cancer. Journal of Clinical Oncology, 26(35), 5783-5788. doi:10.1200/JCO.2008.17.5950
Hampel, H., Pearlman, R., Beightol, M., Zhao, W., Jones, D., Frankel, W., . . . Pritchard, C. (2018). Assessment of Tumor Sequencing as a Replacement for Lynch Syndrome Screening and Current Molecular Tests for Patients With Colorectal Cancer. JAMA Oncology, 4(6), 806-813. doi:10.1001/jamaoncol.2018.0104
Hause, R., Pritchard, C., Shendure, J., & Salipante, S. (2016). Classification and characterization of microsatellite instability across 18 cancer types. Nature Medicine, 22(11), 1342-1350. doi:10.1038/nm.4191
Hempelmann, J., Scroggins, S., Pritchard, C., & Salipante, S. (2015). MSIplus for Integrated Colorectal Cancer Molecular Testing by Next-Generation Sequencing. Journal of Molecular Diagnostics, 17(6), 705-714. doi:10.1016/j.jmoldx.2015.05.008
Hempelmann, J., Lockwood, C., Konnick, E., Schweizer, M., Antonarakis, E., Lotan, T., . . . Pritchard, C. (2018). Microsatellite instability in prostate cancer by PCR or next-generation sequencing. Journal for Immunotherapy of Cancer, 6(1), 29. doi:10.1186/s40425-018-0341-y
Hiatt, J., Pritchard, C., Salipante, S., O'Roak, B., & Shendure, J. (2013). Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation. Genome Research, 23(5), 843-854. doi:10.1101/gr.147686.112
Jennings, L., Arcila, M., Corless, C., Kamel-Reid, S., Lubin, I., Pfeifer, J., . . . Nikiforova, M. (2017). Guidelines for Validation of Next-Generation Sequencing-Based Oncology Panels: A Joint Consensus Recommendation of the Association for Molecular Pathology and College of American Pathologists. Journal of Molecular Diagnostics, 19(3), 341-365. doi:10.1016/j.jmoldx.2017.01.011
Kautto, E., Bonneville, R., Miya, J., Yu, L., Krook, M., Reeser, J., & Roychowdhury, S. (2016). Performance evaluation for rapid detection of pan-cancer microsatellite instability with MANTIS. Oncotarget, 8(5), 7452-7463. doi:10.18632/oncotarget.13918
Le, D., Durham, J., Smith, K., Wang, H., Bartlett, B., Aulakh, L., . . . Diaz, L. J. (2017). Mismatch repair deficiency predicts response of solid tumors to PD-1 blockade. Science, 357(6349), 409-413. doi:10.1126/science.aan6733
Li, H., & Durbin, R. (2010). Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26(5), 589-595. doi:10.1093/bioinformatics/btp698
Marino, P., Touzani, R., Perrier, L., Rouleau, E., Kossi, D., Zhaomin, Z., . . . Baffert, S. (2018). Cost of cancer diagnosis using next-generation sequencing targeted gene panels in routine practice: a nationwide French study. European Journal of Human Genetics, 26(3), 314-323. doi:10.1038/s41431-017-0081-3
Maruvka, Y., Mouw, K., Karlic, R., Parasuraman, P., Kamburov, A., Polak, P., . . . Getz, G. (2017). Analysis of somatic microsatellite indels identifies driver events in human tumors. Nature Biotechnology, 35(10), 951-959. doi:10.1038/nbt.3966
May, A., Abeln, S., Buijs, M., Heringa, J., Crielaard, W., & Brandt, B. (2015). NGS-eval: NGS Error analysis and novel sequence VAriant detection tooL. Nucleic Acids Research, 43(W1), W301-305. doi:10.1093/nar/gkv346
Newland, A., Kroese, M., Akehurst, R., Bagshaw, J., Chambers, P., Crawford, S., . . . Fernley, R. (2017). Molecular testing strategies for Lynch syndrome in people with colorectal cancer (Diagnostics Guidance 27). National Institute for Health and Care Excellence Diagnostics https://www.nice.org.uk/guidance/dg27
Nikiforov, Y., Steward, D., Robinson-Smith, T., Haugen, B., Klopper, J., Zhu, Z., . . . Nikiforova, M. (2009). Molecular testing for mutations in improving the fine-needle aspiration diagnosis of thyroid nodules. The Journal of Clinical Endocrinology and Metabolism, 94(6), 2092-2098. doi:10.1210/jc.2009-0247
Redford, L., Alhilal, G., Needham, S., O'Brien, O., Coaker, J., Tyson, J., . . . Burn, J. (2018). A novel panel of short mononucleotide repeats linked to informative polymorphisms enabling effective high volume low cost discrimination between mismatch repair deficient and proficient tumours. PLoS One, 13(8), e0203052. doi:10.1371/journal.pone.0203052
Shaikh, T., Handorf, E., Meyer, J., Hall, M., & Esnaola, N. (2018). Mismatch Repair Deficiency Testing in Patients With Colorectal Cancer and Nonadherence to Testing Guidelines in Young Adults. JAMA Oncology, 4(2), e173580. doi:10.1001/jamaoncol.2017.3580
Shia, J. (2008). Immunohistochemistry versus Microsatellite Instability Testing For Screening Colorectal Cancer Patients at Risk For Hereditary Nonpolyposis Colorectal Cancer Syndrome. Part I. The Utility of Immunohistochemistry. Journal of Molecular Diagnostics, 10(4), 293-300. doi:10.2353/jmoldx.2008.080031
Snowsill, T., Huxley, N., Hoyle, M., Jones-Hughes, T., Coelho, H., Cooper, C., . . . Hyde, C. (2014). A systematic review and economic evaluation of diagnostic strategies for Lynch syndrome. Health Technology Assessment, 18(58), 1-406. doi:10.3310/hta18580
Stoffel, E., Mangu, P., Gruber, S., Hamilton, S., Kalady, M., Lau, M., . . . Limburg, P. (2015). Hereditary colorectal cancer syndromes: American Society of Clinical Oncology Clinical Practice Guideline endorsement of the familial risk-colorectal cancer: European Society for Medical Oncology Clinical Practice Guidelines. Journal of Clinical Oncology, 33(2), 209-217. doi:10.1200/jco.2014.58.1322
Vasen, H., Blanco, I., Aktan-Collan, K., Gopie, J., Alonso, A., Aretz, S., . . . Moslein, G. (2013). Revised guidelines for the clinical management of Lynch syndrome (HNPCC): recommendations by a group of European experts. Gut, 62(6), 812-823. doi:10.1136/gutjnl-2012-304356
Waalkes, A., Smith, N., Penewit, K., Hempelmann, J., Konnick, E., Hause, R., . . . Salipante, S. (2018). Accurate Pan-Cancer Molecular Diagnosis of Microsatellite Instability by Single-Molecule Molecular Inversion Probe Capture and High-Throughput Sequencing. Clinical Chemistry, 64(6), 950-958. doi:10.1373/clinchem.2017.285981
Zhang, L. (2008). Immunohistochemistry versus microsatellite instability testing for screening colorectal cancer patients at risk for hereditary nonpolyposis colorectal cancer syndrome. Part II. The utility of microsatellite instability testing. Journal of Molecular Diagnostics, 10(4), 301-307. doi:10.2353/jmoldx.2008.080062
Zhu, L., Huang, Y., Fang, X., Liu, C., Deng, W., Zhong, C., . . . Yuan, Y. (2018). A Novel and Reliable Method to Detect Microsatellite Instability in Colorectal Cancer by Next-Generation Sequencing. Journal of Molecular Diagnostics, 20(2), 225-231. doi:10.1016/j.jmoldx.2017.11.007

Claims

1. A method for evaluating levels of microsatellite instability in a sample comprising:
(a) amplifying from the sample one or more microsatellite mono-nucleotide repeat loci to give microsatellite amplicons, wherein the one or more microsatellite loci:
(i) comprise GM07 and up to four other microsatellite loci listed in Table 1; or
(ii) comprise LR44 and up to four other microsatellite loci listed in Table 1; or
(iii) comprise LR52 and up to twenty-three other microsatellite loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44;
(b) sequencing the microsatellite amplicons; and
(c) comparing the sequences from the microsatellite amplicons to predetermined sequences and determining any deviation, indicative of instability, from the predetermined sequences.
2. A method for evaluating the biological significance of sequence variation identified during sequencing, comprising:
(a) amplifying from the sample one or more microsatellite mono-nucleotide repeat loci to give microsatellite amplicons, wherein the one or more microsatellite loci:
(i) comprise GM07 and up to four other microsatellite loci listed in Table 1; or
(ii) comprise LR44 and up to four other microsatellite loci listed in Table 1; or
(iii) comprise LR52 and up to twenty-three other microsatellite loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44;
and wherein each microsatellite locus has a single nucleotide polymorphism (SNP) locus within a short distance of the microsatellite locus and said amplifying step amplifies both the microsatellite locus and the associated SNP in a single amplicon;
(b) sequencing the microsatellite amplicons;
(c) comparing the sequences from the microsatellite amplicons to predetermined sequences (e.g. wild type sequences) and determining any deviation, indicative of instability, from the predetermined sequences; and
(d) for heterozygous SNPs, determining whether there is a bias between indel frequencies for the two alleles.
3. The method of claim 1 or 2, wherein the one or more microsatellite loci comprise LR52 and up to 22 other microsatellite loci listed in Table 1, optionally wherein the up to 22 other microsatellite loci include GM07 and/or LR44.
4. The method of any preceding claim, wherein the one or more microsatellite loci comprise LR52 and up to 15 other microsatellite loci from Table 1, optionally wherein the up to 15 other microsatellite loci include GM07 and/or LR44.
5. The method of claim 4, wherein the one or more microsatellite loci comprise LR52 and up to 9 other microsatellite loci from Table 1, optionally wherein the up to 9 other microsatellite loci include GM07 and/or LR44.
6. The method of claim 5, wherein the one or more microsatellite loci comprise LR52 and from 2 to 9 other microsatellite loci from Table 1, optionally wherein the 2 to 9 other microsatellite loci include GM07 and/or LR44.
7. The method of any preceding claim, wherein step (a) comprises amplifying three or more microsatellite loci listed in Table 1.
8. The method of claim 7, wherein the three or more microsatellite loci comprise two or three markers selected from: GM07, LR44 and LR52.
9. The method of claim 7 or 8, wherein the three or more microsatellite loci comprise or consist of a microsatellite loci combination listed in Table 6.
10. The method of claim 9, wherein the three or more microsatellite loci comprise GM07, GM11, GM14, LR36, LR44 and LR52.
11. The method of any preceding claim, wherein step (a) comprises amplifying a total of up to 16 microsatellite loci.
12. The method of claim 11, wherein step (a) comprises amplifying a total of up to 10 microsatellite loci.
13. The method of claim 12, wherein step (a) comprises amplifying a total of from 3 to 10 microsatellite loci.
14. The method of any preceding claim, wherein the sample is a tissue or biological fluid sample.
15. The method of claim 14, wherein the sample is from a subject that is suspected of having, at risk of having, or being predisposed to cancer, optionally wherein the cancer is colorectal cancer or Lynch syndrome.
16. A method according to any preceding claim for use in identifying mismatch repair defects, wherein deviation from the predetermined sequences for two or more microsatellite mono-nucleotide repeat loci is indicative of a mismatch repair defect.
17. A method according to any preceding claim for use in identifying MSI-H, wherein deviation from the predetermined sequences for two or more microsatellite mono-nucleotide repeat loci is indicative of the sample having high levels of microsatellite instability (MSI-H).
18. A kit for amplifying:
(i) GM07 and up to four other microsatellite mono-nucleotide repeat loci listed in Table 1; or
(ii) LR44 and up to four other microsatellite mono-nucleotide repeat loci listed in Table 1; or
(iii) LR52 and up to twenty-three other microsatellite mono-nucleotide repeat loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44,
wherein the kit comprises primers and/or probes for specifically amplifying the microsatellite loci of (i), (ii) or (iii).
19. The kit of claim 18, wherein the kit is for amplifying LR52 and up to 22 other loci listed in Table 1, optionally wherein the up to 22 other microsatellite loci include GM07 and/or LR44.
20. The kit of claim 18 or 19, wherein the kit further comprises a thermostable polymerase and/or dNTPs or analogs thereof, optionally wherein the dNTPs or analogs thereof are labeled.
21. Use of:
(i) GM07 and up to four other microsatellite loci listed in Table 1; or
(ii) LR44 and up to four other microsatellite loci listed in Table 1; or
(iii) LR52 and up to twenty-three other microsatellite loci listed in Table 1, optionally wherein the up to twenty-three other microsatellite loci include GM07 and/or LR44;
for evaluating levels of microsatellite instability in a sample, or for evaluating the biological significance of sequence variation identified during sequencing of a sample.
PCT/GB2019/052148 2019-07-31 2019-07-31 Methods of identifying microsatellite instability WO2021019197A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/GB2019/052148 WO2021019197A1 (en) 2019-07-31 2019-07-31 Methods of identifying microsatellite instability

Publications (1)

Publication Number Publication Date
WO2021019197A1 (en) 2021-02-04

Family

ID=67551570

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2019/052148 WO2021019197A1 (en) 2019-07-31 2019-07-31 Methods of identifying microsatellite instability

Country Status (1)

Country Link
WO (1) WO2021019197A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012158385A1 (en) * 2011-05-16 2012-11-22 Baylor Research Institute Detecting dna mismatch repair-deficient colorectal cancers
WO2018037231A1 (en) 2016-08-24 2018-03-01 The University Of Newcastle Upon Tyne Methods of identifying microsatellite instability

Non-Patent Citations (44)

* Cited by examiner, † Cited by third party
Title
AUSUBEL ET AL.: "Current Protocols in Molecular Biology", 1999, JOHN WILEY & SONS
BACHER, J.FLANAGAN, L.SMALLEY, R.NASSIF, N.BURGART, L.HALBERG, R.THIBODEAU, S.: "Development of a fluorescent multiplex assay for detection of MSI-High tumors", DISEASE MARKERS, vol. 20, no. 4-5, 2004, pages 237 - 250, XP009089880
BALMANA, J.BALAGUER, F.CERVANTES, A.ARNOLD, D.: "Familial risk-colorectal cancer: ESMO Clinical Practice Guidelines", ANNALS OF ONCOLOGY, vol. 24, no. 6, 2013, pages vi73 - 80
BERG, K.GLASER, C.THOMPSON, R.HAMILTON, S.GRIFFIN, C.ESHLEMAN, J.: "Detection of microsatellite instability by fluorescence multiplex polymerase chain reaction", JOURNAL OF MOLECULAR DIAGNOSTICS, vol. 2, no. 1, 2000, pages 20 - 28
BOLAND, C.THIBODEAU, S.HAMILTON, S.SIDRANSKY, D.ESHLEMAN, J.BURT, R.SRIVASTAVA, S.: "A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer", CANCER RESEARCH, vol. 58, no. 22, 1998, pages 5248 - 5257
BOYLE, E.O'ROAK, B.MARTIN, B.KUMAR, A.SHENDURE, J.: "MIPgen: optimized modeling and design of molecular inversion probes for targeted resequencing", BIOINFORMATICS, vol. 30, no. 18, 2014, pages 2670 - 2672, XP055310504, doi:10.1093/bioinformatics/btu353
BURN, J.GERDES, A.MACRAE, F.MECKLIN, J.MOESLEIN, G.OLSCHWANG, S.BISHOP, D.: "Long-term effect of aspirin on cancer risk in carriers of hereditary colorectal cancer: an analysis from the CAPP2 randomised controlled trial", THE LANCET, vol. 378, no. 9809, 2011, pages 2081 - 2087, XP055085755, doi:10.1016/S0140-6736(11)61049-0
CHAPUSOT, C.MARTIN, L.BOUVIER, A.BONITHON-KOPP, C.ECARNOT-LAUBRIET, A.RAGEOT, D.PIARD, F.: "Microsatellite instability and intratumoural heterogeneity in 100 right-sided sporadic colon carcinomas", BRITISH JOURNAL OF CANCER, vol. 87, no. 4, 2002, pages 400 - 404
CORTES-CIRIANO, I.LEE, S.PARK, W.KIM, T.PARK, P.: "A molecular portrait of microsatellite instability across multiple cancers", NATURE COMMUNICATIONS, vol. 8, 2017, pages 15180, XP055598140, doi:10.1038/ncomms15180
DIETMAIER, W.WALLINGER, S.BOCKER, T.KULLMANN, F.FISHEL, R.RUSCHOFF, J.: "Diagnostic microsatellite instability: definition and correlation with mismatch repair protein expression", CANCER RESEARCH, vol. 57, no. 21, 1997, pages 4749 - 4756, XP002199560
FAZEKAS, A.STEEVES, R.NEWMASTER, S.: "Improving sequencing quality from PCR products containing long mononucleotide repeats", BIOTECHNIQUES, vol. 48, no. 4, 2010, pages 277 - 285, XP055290741, doi:10.2144/000113369
GALLON, R.MUHLEGGER, B.WENZEL, S.SHETH, H.HAYES, C.ARETZ, S.WIMMER, K.: "A sensitive and scalable microsatellite instability assay to diagnose constitutional mismatch repair deficiency by sequencing of peripheral blood leukocytes", HUMAN MUTATION, vol. 40, no. 5, 2019, pages 649 - 655
HALEMARHAM: "The Harper Collins Dictionary of Biology", 1991, HARPER PERENNIAL
HALFORD, S.SASIENI, P.ROWAN, A.WASAN, H.BODMER, W.TALBOT, I.TOMLINSON, I.: "Low-level microsatellite instability occurs in most colorectal cancers and is a nonrandomly distributed quantitative trait", CANCER RESEARCH, vol. 62, no. 1, 2002, pages 53 - 57
HAMPEL, H.DE LA CHAPELLE, A.: "The search for unaffected individuals with Lynch syndrome: do the ends justify the means?", CANCER PREVENTION RESEARCH, vol. 4, no. 1, 2011, pages 1 - 5
HAMPEL, H.FRANKEL, W.MARTIN, E.ARNOLD, M.KHANDUJA, K.KUEBLER, P.DE LA CHAPELLE, A.: "Feasibility of screening for Lynch syndrome among patients with colorectal cancer", JOURNAL OF CLINICAL ONCOLOGY, vol. 26, no. 35, 2008, pages 5783 - 5788
HAMPEL, H.PEARLMAN, R.BEIGHTOL, M.ZHAO, W.JONES, D.FRANKEL, W.PRITCHARD, C.: "Assessment of Tumor Sequencing as a Replacement for Lynch Syndrome Screening and Current Molecular Tests for Patients With Colorectal Cancer", JAMA ONCOLOGY, vol. 4, no. 6, 2018, pages 806 - 813
HAUSE, R.PRITCHARD, C.SHENDURE, J.SALIPANTE, S.: "Classification and characterization of microsatellite instability across 18 cancer types", NATURE MEDICINE, vol. 22, no. 11, 2016, pages 1342 - 1350, XP055494424, doi:10.1038/nm.4191
HEMPELMANN, J.LOCKWOOD, C.KONNICK, E.SCHWEIZER, M.ANTONARAKIS, E.LOTAN, T.PRITCHARD, C.: "Microsatellite instability in prostate cancer by PCR or next-generation sequencing", JOURNAL FOR IMMUNOTHERAPY OF CANCER, vol. 6, no. 1, 2018, pages 29, XP021255477, doi:10.1186/s40425-018-0341-y
HEMPELMANN, J.SCROGGINS, S.PRITCHARD, C.SALIPANTE, S.: "MSIplus for Integrated Colorectal Cancer Molecular Testing by Next-Generation Sequencing", JOURNAL OF MOLECULAR DIAGNOSTICS, vol. 17, no. 6, 2015, pages 705 - 714
HIATT, J.PRITCHARD, C.SALIPANTE, S.O'ROAK, B.SHENDURE, J.: "Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation", GENOME RESEARCH, vol. 23, no. 5, 2013, pages 843 - 854, XP055225609, doi:10.1101/gr.147686.112
JENNINGS, L.ARCILA, M.CORLESS, C.KAMEL-REID, S.LUBIN, I.PFEIFER, J.NIKIFOROVA, M.: "Guidelines for Validation of Next-Generation Sequencing-Based Oncology Panels: A Joint Consensus Recommendation of the Association for Molecular Pathology and College of American Pathologists", JOURNAL OF MOLECULAR DIAGNOSTICS, vol. 19, no. 3, 2017, pages 341 - 365
KAUTTO, E.BONNEVILLE, R.MIYA, J.YU, L.KROOK, M.REESER, J.ROYCHOWDHURY, S.: "Performance evaluation for rapid detection of pan-cancer microsatellite instability with MANTIS", ONCOTARGET, vol. 8, no. 5, 2016, pages 7452 - 7463
LE, D.DURHAM, J.SMITH, K.WANG, H.BARTLETT, B.AULAKH, L.DIAZ, L. J.: "Mismatch repair deficiency predicts response of solid tumors to PD-1 blockade", SCIENCE, vol. 357, no. 6349, 2017, pages 409 - 413
LI, H.DURBIN, R.: "Fast and accurate long-read alignment with Burrows-Wheeler transform", BIOINFORMATICS, vol. 26, no. 5, 2010, pages 589 - 595
LISA REDFORD ET AL: "A novel panel of short mononucleotide repeats linked to informative polymorphisms enabling effective high volume low cost discrimination between mismatch repair deficient and proficient tumours", PLOS ONE, vol. 13, no. 8, 29 August 2018 (2018-08-29), pages e0203052, XP055651331, DOI: 10.1371/journal.pone.0203052 *
LISA REDFORD ET AL: "A novel panel of short mononucleotide repeats linked to informative polymorphisms enabling effective high volume low cost discrimination between mismatch repair deficient and proficient tumours. Supplementary Table S2", PLOS ONE, VOL.13, ISSUE 8, 29 August 2018 (2018-08-29), XP055677956, Retrieved from the Internet <URL:https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0203052#sec016> [retrieved on 20200319], DOI: 10.1371/journal.pone.0203052 *
LIZHEN ZHU ET AL: "A Novel and Reliable Method to Detect Microsatellite Instability in Colorectal Cancer by Next-Generation Sequencing", THE JOURNAL OF MOLECULAR DIAGNOSTICS, vol. 20, no. 2, 1 March 2018 (2018-03-01), US, pages 225 - 231, XP055651393, ISSN: 1525-1578, DOI: 10.1016/j.jmoldx.2017.11.007 *
MARINO, P.TOUZANI, R.PERRIER, L.ROULEAU, E.KOSSI, D.ZHAOMIN, Z.BAFFERT, S.: "Cost of cancer diagnosis using next-generation sequencing targeted gene panels in routine practice: a nationwide French study", EUROPEAN JOURNAL OF HUMAN GENETICS, vol. 26, no. 3, 2018, pages 314 - 323, XP036822847, doi:10.1038/s41431-017-0081-3
MARUVKA, Y.MOUW, K.KARLIC, R.PARASURAMAN, P.KAMBUROV, A.POLAK, P.GETZ, G.: "Analysis of somatic microsatellite indels identifies driver events in human tumors", NATURE BIOTECHNOLOGY, vol. 35, no. 10, 2017, pages 951 - 959, XP055481846, doi:10.1038/nbt.3966
MAY, A.ABELN, S.BUIJS, M.HERINGA, J.CRIELAARD, W.BRANDT, B.: "NGS-eval: NGS Error analysis and novel sequence VAriant detection tooL", NUCLEIC ACIDS RESEARCH, vol. 43, no. W1, 2015, pages W301 - 305
NEWLAND, A.KROESE, M.AKEHURST, R.BAGSHAW, J.CHAMBERS, P.CRAWFORD, S.FERNLEY, R.: "Molecular testing strategies for Lynch syndrome in people with colorectal cancer (Diagnostics Guidance 27)", NATIONAL INSTITUTE FOR HEALTH AND CARE EXCELLENCE DIAGNOSTICS, 2017, Retrieved from the Internet <URL:https://www.nice.org.uk/guidance/dg27>
NIKIFOROV, Y.STEWARD, D.ROBINSON-SMITH, T.HAUGEN, B.KLOPPER, J.ZHU, Z.NIKIFOROVA, M.: "Molecular testing for mutations in improving the fine-needle aspiration diagnosis of thyroid nodules", THE JOURNAL OF CLINICAL ENDOCRINOLOGY AND METABOLISM, vol. 94, no. 6, 2009, pages 2092 - 2098, XP002654267, doi:10.1210/JC.2009-0247
REDFORD, L.ALHILAL, G.NEEDHAM, S.O'BRIEN, O.COAKER, J.TYSON, J.BURN, J.: "A novel panel of short mononucleotide repeats linked to informative polymorphisms enabling effective high volume low cost discrimination between mismatch repair deficient and proficient tumours", PLOS ONE, vol. 13, no. 8, 2018, pages e0203052
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR PRESS
SHAIKH, T.HANDORF, E.MEYER, J.HALL, M.ESNAOLA, N.: "Mismatch Repair Deficiency Testing in Patients With Colorectal Cancer and Nonadherence to Testing Guidelines in Young Adults", JAMA ONCOLOGY, vol. 4, no. 2, 2018, pages e173580
SHIA, J: "Immunohistochemistry versus Microsatellite Instability Testing For Screening Colorectal Cancer Patients at Risk For Hereditary Nonpolyposis Colorectal Cancer Syndrome. Part I. The Utility of Immunohistochemistry", JOURNAL OF MOLECULAR DIAGNOSTICS, vol. 10, no. 4, 2008, pages 293 - 300, XP008098090, doi:10.2353/jmoldx.2008.080062
SINGLETONSAINSBURY: "Dictionary of Microbiology and Molecular Biology", 1994, JOHN WILEY AND SONS
SNOWSILL, T.HUXLEY, N.HOYLE, M.JONES-HUGHES, T.COELHO, H.COOPER, C.HYDE, C.: "A systematic review and economic evaluation of diagnostic strategies for Lynch syndrome", HEALTH TECHNOLOGY ASSESSMENT, vol. 18, no. 58, 2014, pages 1 - 406
STOFFEL, E.MANGU, P.GRUBER, S.HAMILTON, S.KALADY, M.LAU, M.LIMBURG, P.: "Hereditary colorectal cancer syndromes: American Society of Clinical Oncology Clinical Practice Guideline endorsement of the familial risk-colorectal cancer: European Society for Medical Oncology Clinical Practice Guidelines", JOURNAL OF CLINICAL ONCOLOGY, vol. 33, no. 2, 2015, pages 209 - 217
VASEN, H.BLANCO, I.AKTAN-COLLAN, K.GOPIE, J.ALONSO, A.ARETZ, S.MOSLEIN, G.: "Revised guidelines for the clinical management of Lynch syndrome (HNPCC): recommendations by a group of European experts", GUT, vol. 62, no. 6, 2013, pages 812 - 823
WAALKES, A.SMITH, N.PENEWIT, K.HEMPELMANN, J.KONNICK, E.HAUSE, R.SALIPANTE, S.: "Accurate Pan-Cancer Molecular Diagnosis of Microsatellite Instability by Single-Molecule Molecular Inversion Probe Capture and High-Throughput Sequencing", CLINICAL CHEMISTRY, vol. 64, no. 6, 2018, pages 950 - 958
ZHANG, L.: "Immunohistochemistry versus microsatellite instability testing for screening colorectal cancer patients at risk for hereditary nonpolyposis colorectal cancer syndrome. Part II. The utility of microsatellite instability testing", JOURNAL OF MOLECULAR DIAGNOSTICS, vol. 10, no. 4, 2008, pages 301 - 307, XP008098090, doi:10.2353/jmoldx.2008.080062
ZHU, L.; HUANG, Y.; FANG, X.; LIU, C.; DENG, W.; ZHONG, C.; YUAN, Y.: "A Novel and Reliable Method to Detect Microsatellite Instability in Colorectal Cancer by Next-Generation Sequencing", JOURNAL OF MOLECULAR DIAGNOSTICS, vol. 20, no. 2, 2018, pages 225 - 231

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021183683A1 (en) * 2020-03-12 2021-09-16 Personal Genome Diagnostics Inc. Microsatellite instability signatures
GB202114136D0 (en) 2021-10-01 2021-11-17 Cancer Research Tech Ltd Microsatellite markers
WO2023052795A1 (en) 2021-10-01 2023-04-06 Cancer Research Technology Limited Microsatellite markers

Similar Documents

Publication Publication Date Title
Jennings et al. Guidelines for validation of next-generation sequencing–based oncology panels: a joint consensus recommendation of the Association for Molecular Pathology and College of American Pathologists
Ciampi et al. Genetic landscape of somatic mutations in a large cohort of sporadic medullary thyroid carcinomas studied by next-generation targeted sequencing
US11978535B2 (en) Methods of detecting somatic and germline variants in impure tumors
Zaliova et al. ETV6/RUNX1‐like acute lymphoblastic leukemia: a novel B‐cell precursor leukemia subtype associated with the CD27/CD44 immunophenotype
AU2017316709B2 (en) Methods of identifying microsatellite instability
CN114959918A (en) Methods and systems for assessing tumor mutational burden
US20190348149A1 (en) Validation methods and systems for sequence variant calls
CN114026646A (en) System and method for assessing tumor score
Gallon et al. Sequencing‐based microsatellite instability testing using as few as six markers for high‐throughput clinical diagnostics
WO2017112738A1 (en) Methods for measuring microsatellite instability
Zhao et al. TruSight oncology 500: enabling comprehensive genomic profiling and biomarker reporting with targeted sequencing
WO2022182878A1 (en) Methods for detection of donor-derived cell-free dna in transplant recipients of multiple organs
CN114867870A (en) Method for determining the presence or absence of Minimal Residual Disease (MRD) in a subject whose disease has been treated
WO2021019197A1 (en) Methods of identifying microsatellite instability
Kristensen et al. Targeted ultradeep next‐generation sequencing as a method for KIT D816V mutation analysis in mastocytosis
Kantorova et al. TP53 mutation analysis in chronic lymphocytic leukemia: comparison of different detection methods
US20230002831A1 (en) Methods and compositions for analyses of cancer
WO2016057852A1 (en) Markers for hematological cancers
EP4359569A1 (en) Systems and methods for evaluating tumor fraction
US11702692B2 (en) Method of treatment of disease and method for quantifying the level of minimal residual disease in a subject
US20220316015A1 (en) Method for determining if a tumor has a mutation in a microsatellite
Heider Detection of trace levels of circulating tumour DNA in early stage non-small cell lung cancer
WO2023052795A1 (en) Microsatellite markers
Temple-Smolkin et al. Guidelines for Validation of Next-Generation Sequencing–Based Oncology Panels
Trujillano Lidón Mendelian disease gene identification and diagnosis using targeted next generation sequencing

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19752234; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19752234; Country of ref document: EP; Kind code of ref document: A1)