WO2020063964A1 - Method for detecting microsatellite stability status and genomic changes in plasma based on next-generation sequencing - Google Patents

Publication number
WO2020063964A1
Authority
WO
WIPO (PCT)
Prior art keywords
microsatellite
cancer
sample
msi
site
Prior art date
Application number
PCT/CN2019/109036
Other languages
English (en)
French (fr)
Inventor
汉雨生
刘成林
张之宏
张周
段飞蝶
Original Assignee
广州燃石医学检验所有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201811149011.0A (CN109207594B)
Priority claimed from CN201811149015.9A (CN109182525B)
Application filed by 广州燃石医学检验所有限公司
Priority to AU2019351522A (AU2019351522A1)
Priority to US17/281,071 (US20210355544A1)
Priority to CA3114465A (CA3114465A1)
Priority to JP2021517643A (JP2022503916A)
Priority to BR112021005966-0A (BR112021005966B1)
Priority to EP19867890.6A (EP3859010A4)
Publication of WO2020063964A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/40 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/40 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to mechanical, radiation or invasive therapies, e.g. surgery, laser therapy, dialysis or acupuncture
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • The present invention relates to a combination of biomarkers, a kit for detecting the same, a method for detecting microsatellite stability status in a plasma sample, and their use in non-invasive diagnosis, prognostic assessment, treatment selection, or genetic screening of cancer, preferably colorectal cancer (such as colon cancer), gastric cancer, or endometrial cancer.
  • A microsatellite is a short tandem repeat DNA sequence or a single-nucleotide repeat region in the genome.
  • When DNA methylation or a gene mutation causes loss of a mismatch repair (MMR) gene, mismatches in microsatellite repeat sequences (microsatellite mutations) can arise, shortening or extending the repeats and producing microsatellite instability (MSI).
  • MSI: microsatellite instability
  • MSI-H: microsatellite instability-high
  • MSI-L: microsatellite instability-low
  • MSS: microsatellite stable
  • MSI-H is involved in the occurrence and development of malignant tumors and is closely related to the occurrence of colorectal cancer (such as bowel cancer), gastric cancer, and endometrial cancer.
  • MSI-H is present in approximately 15% of colorectal cancer patients and in more than 90% of patients with hereditary nonpolyposis colorectal cancer (HNPCC), indicating that MSI-H can serve as an important marker for detecting HNPCC. Compared with microsatellite stable (MSS) colorectal cancer, colorectal cancer with MSI-H has a better prognosis and a different drug response, suggesting that MSI-H can be used as an independent predictor of colorectal cancer prognosis. MSI testing is therefore of great significance for patients with colorectal cancer.
  • Current MSI testing methods are limited to tissue testing.
  • MMR gene tests performed in domestic hospitals usually cover only MLH1 and MSH2; some also include MSH6 and PMS2.
  • Positive MMR test results agree poorly with MSI test results. Only a few hospitals perform MSI status detection by PCR with capillary electrophoresis in-house; most send samples out for testing.
  • This method usually selects 5-11 single-nucleotide repeat sites about 25 bp long. After PCR amplification, the fragment length distribution is measured by capillary electrophoresis to determine the microsatellite (in)stability status of the sample. It is the current gold-standard detection method.
  • Tissue MSI detection methods based on next-generation sequencing have been shown to agree very closely with PCR-MSI; they can profile genomic alterations while determining MSI status and so provide richer information for cancer diagnosis.
  • However, these methods all require a sufficient proportion of tumor cells.
  • Because circulating tumor DNA (ctDNA) in plasma is scarce, tissue-based methods cannot be applied to plasma.
  • Blood-based tumor tests are non-invasive, real-time, and not tied to a specific tissue, features that tissue testing lacks, and they have important clinical significance. There is therefore an urgent need in the art for plasma-based MSI detection methods, especially for non-invasive diagnosis, prognostic evaluation, treatment plan selection, or genetic screening of cancer, preferably colorectal cancer (such as bowel cancer), gastric cancer, or endometrial cancer.
  • The present application provides a plasma MSI detection method for the first time. Compared with tissue MSI detection, the plasma MSI detection of the present application is non-invasive, real-time, and non-tissue-specific, and can detect multiple lesions in advance.
  • The method of the present invention can detect microsatellite status in a plasma sample with very low ctDNA content, filling the gap in detecting microsatellite status from plasma samples.
  • Detection is fast, does not rely on matched white blood cell samples, and costs less, and it can determine the microsatellite stability (MS) status of a sample with high accuracy, sensitivity, and specificity.
  • MS: microsatellite stability
  • The detection method of the present application can also be used for non-invasive diagnosis, prognostic evaluation, or treatment plan selection for patients with cancer, preferably colorectal cancer (such as bowel cancer), gastric cancer, or endometrial cancer.
  • This application relates to the following aspects:
  • The application provides a biomarker combination that includes one or more of the eight microsatellite loci shown in Table 1.
  • The present application provides a biomarker combination comprising a combination of microsatellite sites and one or more genes, wherein the microsatellite sites comprise any one or more of the 8 microsatellite sites shown in claim 1, and the one or more genes are any one or more of the following 41 genes:
  • The present invention provides a kit for detecting microsatellite stability status in a plasma sample, characterized in that the kit includes a detection reagent for a biomarker combination of the present application.
  • The present invention provides a kit for non-invasive diagnosis, prognostic assessment, treatment selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer, characterized in that the kit includes a detection reagent for a biomarker combination of the present application.
  • The plasma sample is a cancer plasma sample, preferably a colorectal cancer plasma sample (such as a colon cancer plasma sample), a gastric cancer plasma sample, or an endometrial cancer plasma sample.
  • The microsatellite stability status includes the microsatellite instability-high (MSI-H), microsatellite instability-low (MSI-L), and microsatellite stable (MSS) types.
  • The detection reagent is a reagent for performing next-generation sequencing (NGS).
  • NGS: next-generation sequencing
  • The present application also relates to the use of a biomarker combination for detecting microsatellite stability status in a plasma sample.
  • The plasma sample is a cancer plasma sample, preferably a colorectal cancer plasma sample (such as a colon cancer plasma sample), a gastric cancer plasma sample, or an endometrial cancer plasma sample.
  • The microsatellite stability status includes the microsatellite instability-high (MSI-H), microsatellite instability-low (MSI-L), and microsatellite stable (MSS) types.
  • The present application also relates to the use of a combination of biomarkers in non-invasive diagnosis, prognostic evaluation, treatment plan selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.
  • The present application provides a method for determining microsatellite marker sites that can be used to detect microsatellite stability status in a plasma sample, which includes the following steps:
  • The MSS length feature is the narrowest continuous length range for which the corresponding sequencing fragments in MSS samples exceed 75% of all sequencing fragments supporting the site;
  • The MSI-H length feature is a continuous length range that is highly discriminative between MSS and MSI-H samples, such that a) the sequencing fragments falling in this range account for less than 0.2% of all sequencing fragments at the site in MSS samples, and b) they account for more than 50% of all sequencing fragments at the site in MSI-H samples;
  • Microsatellite sites with the above features are microsatellite detection marker sites.
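  • The three thresholds above (more than 75%, less than 0.2%, more than 50%) can be checked mechanically once read counts per repeat length are available. The following is a minimal sketch under stated assumptions: the histogram representation (`dict` of repeat length to read count), the function names, and the inclusive range convention are all hypothetical and are not taken from the application.

```python
from typing import Dict, Tuple

def fraction_in_range(hist: Dict[int, int], lo: int, hi: int) -> float:
    """Fraction of reads whose repeat length lies in the inclusive range [lo, hi]."""
    total = sum(hist.values())
    if total == 0:
        return 0.0
    return sum(count for length, count in hist.items() if lo <= length <= hi) / total

def is_marker_site(mss_hist: Dict[int, int],
                   msih_hist: Dict[int, int],
                   mss_range: Tuple[int, int],
                   msih_range: Tuple[int, int]) -> bool:
    """Apply the three thresholds stated in the text to one candidate site."""
    mss_ok = fraction_in_range(mss_hist, *mss_range) > 0.75    # MSS length feature
    a_ok = fraction_in_range(mss_hist, *msih_range) < 0.002    # criterion a)
    b_ok = fraction_in_range(msih_hist, *msih_range) > 0.50    # criterion b)
    return mss_ok and a_ok and b_ok

# Hypothetical read counts, for illustration only.
mss_example = {15: 1, 22: 100, 23: 400, 24: 400, 25: 99}
msih_example = {14: 300, 15: 300, 23: 400}
print(is_marker_site(mss_example, msih_example, (22, 25), (0, 16)))
```

  • How the candidate MSI-H range is searched for is not specified here; the sketch only evaluates a range that has already been proposed.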
  • The sample comprises a sample from normal white blood cells and tissues of a patient with cancer; preferably the cancer is colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.
  • The microsatellite sites determined by the method for determining microsatellite marker sites of the present application include one or more of the eight microsatellite sites described in Table 1.
  • The microsatellite stability status detection is used for non-invasive diagnosis, prognostic evaluation, treatment plan selection, or genetic screening of cancer, preferably colorectal cancer (such as colon cancer), gastric cancer, or endometrial cancer.
  • The present application provides a method for determining the stability status of microsatellite sites from a plasma sample of a cancer patient based on a second-generation high-throughput sequencing method, which includes the following steps:
  • The Zscore is evaluated by H_s,
  • N is the total number of sequencing fragments (reads) in the MSI-H state and MSS state repeat sequence length sets;
  • K is the total number of sequencing fragments in the MSI-H state repeat sequence length set;
  • N-K is the total number of sequencing fragments in the MSS state repeat sequence length set;
  • n and k are the corresponding numbers of sequencing fragments in the test sample.
  • MSscore is calculated based on the following formula:
  • the cancer is colorectal cancer (eg, bowel cancer), gastric cancer, or endometrial cancer.
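  • The definitions of N, K, n, and k above match the parameters of a hypergeometric test, but the formulas for H_s, Zscore, and MSscore are not reproduced in this text. The sketch below is therefore only one plausible reading and rests on several assumptions: that H_s is the negative log10 of the upper-tail hypergeometric probability, and that Zscore standardizes H_s against values from MSS reference samples. The only part taken from the application is that MSscore sums the Zscores of all sites, as stated in the device description. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K marked, n drawn)."""
    lower = max(k, n - (N - K), 0)
    upper = min(n, K)
    favourable = sum(math.comb(K, i) * math.comb(N - K, n - i)
                     for i in range(lower, upper + 1))
    return favourable / math.comb(N, n)

def enrichment(N: int, K: int, n: int, k: int) -> float:
    """Assumed form of H_s: -log10 of the tail probability (floored to avoid log of 0)."""
    return -math.log10(max(hypergeom_tail(N, K, n, k), 1e-300))

def zscore(h_sample: float, h_reference: list) -> float:
    """Assumed standardization of H_s against MSS reference values for the same site."""
    return (h_sample - mean(h_reference)) / stdev(h_reference)

def ms_score(zscores: list) -> float:
    """MSscore as the sum of per-site Zscores."""
    return sum(zscores)
```

  • Exact integer arithmetic via `math.comb` keeps the tail probability stable for small counts; for plasma-scale read depths a log-space implementation would likely be preferable.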
  • The present application provides a method for detecting a patient's microsatellite stability status and disease-related genetic variation based on second-generation high-throughput sequencing, in order to provide clinical guidance on risk control, treatment, and/or prognosis for the patient or family, which includes the following steps:
  • The disease is cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.
  • The present application also relates to a kit for one of the various methods of the present application, comprising a reagent for detecting the plurality of microsatellite sites.
  • The present application also provides a device for determining microsatellite marker sites for detecting microsatellite stability status in a plasma sample, wherein the device includes:
  • A sequencing data reading module, used to read the sample sequencing data obtained by and stored in the sequencing equipment;
  • A microsatellite marker site detection module, used to identify all microsatellite sites in the sequenced region of the sample from the sample sequencing data;
  • A repeat sequence length type determination module, used to count, for any microsatellite site i, the number of sequencing fragments (reads) of each repeat sequence length type, using the sample sequencing data read by the sequencing data reading module;
  • A determination module, configured to determine whether any microsatellite site i is a microsatellite marker site; the determination module includes a first analysis module, a second analysis module, and a third analysis module;
  • The first analysis module is used to determine the repeat sequence length feature of a site in the microsatellite stable (MSS) state, where the MSS length feature is the narrowest continuous length range, and to determine whether the corresponding sequencing fragments in the MSS sample exceed 75% of all sequencing fragments supporting the site. A positive result is recorded as "+" and a negative result as "-".
  • MSS: microsatellite stable
  • The second analysis module is used to determine the repeat length feature of the site in the microsatellite instability-high (MSI-H) state, where the MSI-H length feature is a continuous length range that is highly discriminative between MSS and MSI-H samples, and to determine whether a) the sequenced fragments in this range account for less than 0.2% of all sequenced fragments at the site in the MSS sample, and b) the sequenced fragments in this range account for more than 50% of all sequenced fragments at the site in the MSI-H sample. For each of a) and b), a positive result is recorded as "+" and a negative result as "-".
  • The third analysis module is used to analyze the results of the first and second analysis modules. When three positive results (three "+") are obtained, microsatellite site i is determined to be a microsatellite marker site.
  • The sample includes a sample from normal white blood cells and tissues of a cancer patient; the cancer is preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.
  • The microsatellite sites determined by the above device include one or more of the eight microsatellite sites described in Table 1.
  • The microsatellite stability status detection is used for non-invasive diagnosis, prognostic assessment, treatment selection, or genetic screening of cancer, preferably colorectal cancer (such as bowel cancer), gastric cancer, or endometrial cancer.
  • The present application also relates to a device for determining the stability status of microsatellite sites from a plasma sample of a cancer patient based on a second-generation high-throughput sequencing method, characterized in that the device includes:
  • A sequencing data reading module, used to read the sample sequencing data obtained by and stored in the sequencing equipment;
  • A repeat sequence length feature determination module, configured to obtain from the sample sequencing data the repeat sequence length features of multiple microsatellite sites in the plasma sample under test and in MSS plasma samples used as reference samples, the multiple microsatellite sites including one or more of the eight microsatellite sites shown in Table 1;
  • An enrichment index calculation module, used to calculate the enrichment index Zscore of each microsatellite site;
  • A microsatellite status index calculation module, used to sum the enrichment indices Zscore of all microsatellite sites to obtain the index MSscore for judging the microsatellite status of the sample;
  • A threshold calculation module, configured to calculate the mean and standard deviation (SD) of the MSscore of the MSS plasma samples used as reference samples, and to use mean + 3SD as the threshold cutoff;
  • A microsatellite site stability status determination module, used to compare the index MSscore with the threshold cutoff. For a plasma sample from a cancer patient, when MSscore > cutoff the sample is determined to be MSI-H, and when MSscore ≤ cutoff the sample is determined to be MSS.
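  • The threshold and determination modules described above amount to a mean + 3SD cutoff over the MSS reference MSscores followed by a comparison. A minimal sketch follows; the function names are hypothetical, and the use of the sample standard deviation (rather than the population standard deviation) is an assumption, since the text does not say which is meant.

```python
from statistics import mean, stdev

def cutoff_from_reference(reference_ms_scores: list) -> float:
    """Threshold = mean + 3 * SD of MSscores from MSS reference plasma samples.

    Assumption: SD is the sample standard deviation (statistics.stdev).
    """
    return mean(reference_ms_scores) + 3 * stdev(reference_ms_scores)

def classify(ms_score: float, cutoff: float) -> str:
    """MSI-H when MSscore > cutoff, otherwise MSS."""
    return "MSI-H" if ms_score > cutoff else "MSS"

# Hypothetical reference scores, for illustration only.
reference = [1.0, 2.0, 3.0]
threshold = cutoff_from_reference(reference)
print(threshold, classify(6.2, threshold), classify(4.0, threshold))
```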
  • In the device for determining the stability status of microsatellite sites from a plasma sample of a cancer patient based on a second-generation high-throughput sequencing method, the Zscore is evaluated by H_s,
  • N is the total number of sequencing fragments (reads) in the MSI-H state and MSS state repeat sequence length sets;
  • K is the total number of sequencing fragments in the MSI-H state repeat sequence length set;
  • N-K is the total number of sequencing fragments in the MSS state repeat sequence length set;
  • n and k are the corresponding numbers of sequencing fragments in the test sample.
  • MSscore is calculated based on the following formula:
  • the disease is cancer, preferably colorectal cancer (such as bowel cancer), gastric cancer, or endometrial cancer.
  • FIG. 1 (A) Distribution of read counts by repeat sequence length for the microsatellite marker site bMS-BR1 in pure MSI-H cancer cell and leukocyte samples. The blue box marks the MSS characteristic range of the site, 22-25 bp, and the red box marks the MSI-H characteristic range of the site, ≤16 bp. (B) Distribution of read counts by repeat sequence length for a non-marker site in MSI-H cancer cell and leukocyte samples. Although the repeat sequence at this site is shortened by about 2 bp, this difference cannot be distinguished from capture fluctuation in leukocytes when the tumor ctDNA content is very low; the site has no repeat sequence length type that occurs at high frequency only in MSI-H samples.
  • FIG. 1 bMSISEA detection effect.
  • (B) Correlation between maxAF and MSscore for 44 MSI-H samples; red dots indicate MSscore > 15, for which the sample was judged to be MSI-H, and blue dots indicate that the MSscore did not reach the threshold, for which the sample was judged to be MSS;
  • (C) Correlation between detection sensitivity and maxAF, based on simulated samples. The results are based on 350 simulated samples with graded ctDNA content. Each value on the horizontal axis includes only samples whose maxAF is greater than that value, and the vertical axis is the MSI-H detection sensitivity. When maxAF > 0.2%, the MSI-H detection sensitivity is higher than 93%; when maxAF > 0.5%, the sensitivity is higher than 98%.
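  • The curve described for panel (C) plots, for each maxAF value, the sensitivity computed over only those MSI-H samples whose maxAF exceeds that value. The sketch below shows that calculation under stated assumptions: the input layout (pairs of maxAF and whether the sample was called MSI-H) and the function name are hypothetical, and the numbers are invented for illustration rather than taken from the 350 simulated samples.

```python
from typing import List, Optional, Tuple

def sensitivity_above(samples: List[Tuple[float, bool]],
                      min_max_af: float) -> Optional[float]:
    """Sensitivity among true MSI-H samples whose maxAF is greater than min_max_af.

    Each sample is (maxAF as a fraction, called_MSI_H). Returns None when no
    sample passes the maxAF filter.
    """
    calls = [called for max_af, called in samples if max_af > min_max_af]
    if not calls:
        return None
    return sum(calls) / len(calls)

# Hypothetical true MSI-H samples: (maxAF, detected as MSI-H?)
simulated = [(0.001, False), (0.003, True), (0.006, True), (0.010, True)]
for threshold in (0.0005, 0.002, 0.005):
    print(threshold, sensitivity_above(simulated, threshold))
```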
  • The present application provides, for the first time, a method for detecting microsatellite stability and disease-related genes from plasma based on next-generation sequencing and, based on this detection method, obtains highly sensitive and specific MSI sites related to cancer, preferably colorectal cancer (such as bowel cancer), gastric cancer, or endometrial cancer.
  • The present invention establishes a method for determining microsatellite marker sites that can be used to detect microsatellite status from plasma samples.
  • The invention also achieves simultaneous detection of multiple microsatellite loci and multiple disease-related genes in a sample, and can provide more comprehensive conclusions and recommendations on prognosis, treatment, screening, and the like for the tested sample.
  • The present application provides a plasma MSI detection method for the first time. Compared with tissue MSI detection, the plasma MSI detection of the present application is non-invasive, real-time, non-tissue-specific, and the like.
  • The method of the present invention can detect microsatellite status in a plasma sample with very low ctDNA content, filling a gap in detecting microsatellite status from plasma samples, and can achieve high accuracy for samples with ctDNA content higher than 0.4%. Detection is fast, does not rely on matched white blood cell samples, and costs less, and it can determine the microsatellite stability (MS) status of the sample with high sensitivity and specificity.
  • The detection method of the present application can also be used in non-invasive diagnosis, prognostic evaluation, or treatment selection for patients with cancer, preferably colorectal cancer (such as bowel cancer), gastric cancer, or endometrial cancer.
  • The present application also provides a device for determining microsatellite marker sites for detecting microsatellite stability status in a plasma sample, and a device for determining the stability status of microsatellite sites from a plasma sample of a cancer patient based on a second-generation high-throughput sequencing method.
  • The inventors of the present invention have found that, in microsatellite instability-high samples, microsatellite loci expand or contract by a large number of repeat units because of erroneous DNA replication.
  • These length types are used to characterize the repeat length of a locus in the MSI-H state.
  • The specific marker site selection criteria are as follows: a) in MSS samples, the sequenced fragments within the repeat sequence length range make up less than 0.2% of all sequenced fragments at the site, and b) in MSI-H samples, the sequenced fragments within the range make up more than 50% of all sequenced fragments supporting the site.
  • This length range is defined as the repeat length feature of the site in the MSI-H state.
  • The method of this application for determining the stability status of microsatellite loci in plasma samples from cancer patients based on second-generation high-throughput sequencing is the plasma microsatellite instability detection technology bMSISEA. Its main strategy is first to use tissue samples to find sites whose sequencing read coverage patterns differ markedly between the MSI-H and MSS states, and to describe the main read types supported at each site in the two states. It then analyzes the enrichment of MSI-H-characteristic reads at each marker site to evaluate the site's instability, and from this obtains the microsatellite status of the sample.
  • The method for determining the stability status of microsatellite loci in a plasma sample from a cancer patient includes the following steps: 1) data preparation, including sample preparation, detection of microsatellite loci in the sequenced region, and statistics on the repeat sequence length types of each locus; 2) selection of marker sites and description of site features; 3) enrichment analysis of microsatellite instability features; 4) evaluation of the average fluctuation level of the enrichment index of each site; 5) construction of the MSscore from the relative level of the enrichment index of the plasma sample under test, and determination of the MS status of the sample.
  • The tissue sample capture steps are as follows: the QIAamp DNA FFPE Tissue Kit (QIAGEN: 56404) is used to extract DNA from tumor tissue and from normal tissue adjacent to the cancer, respectively.
  • The Qubit dsDNA HS Assay Kit (ThermoFisher: Q32854) was used with a Qubit 3.0 fluorometer for accurate quantification.
  • The DNA was physically fragmented into 180-250 bp fragments using a Covaris M220 ultrasonicator (Covaris: PN500295), then end-repaired and phosphorylated, with deoxyadenosine added to the 3′ end, and ligated to adapters.
  • The adapter-ligated DNA was purified with Agencourt AMPure XP paramagnetic beads and pre-amplified by PCR.
  • The purified amplification product was hybridized with a customized Agilent multi-biotin-labeled probe set (the panel design includes exon and partial intron region sequences of 41 genes).
  • The hybridized fragments were specifically eluted and, after PCR enrichment and amplification, quantified and measured for fragment length distribution.
  • Second-generation sequencing was performed using an Illumina NovaSeq 6000 sequencer (catalog number: 20012850) with a sequencing depth of 1000X.
  • The steps for capturing a blood sample are as follows. First, a nucleic acid extraction reagent is used to separately extract plasma cell-free DNA and paired peripheral blood leukocyte genomic DNA, and the leukocyte genomic DNA is fragmented. A whole-genome pre-library is then prepared by adapter ligation, PCR amplification, and other steps. The pre-library is hybridized with biotin-labeled RNA probes of specific sequence to capture part of the exon and intron regions of 41 genes in the human genome (full coding region, exon-intron junction region, UTR region, and promoter region). Streptavidin magnetic beads are used to enrich the DNA fragments captured by the probes, and the enriched DNA fragments are used as templates to amplify the final library. After quantification and quality control, the final library is sequenced on an Illumina NovaSeq sequencer at a depth of 15000X.
  • VarScanfpfilter will remove sites with low coverage depth (tissue: below 50x, plasma below 500x, and leukocytes below 20x); for indel and single point mutation, at least 5 and 8 are required, respectively. Variant reads.
  • the microsatellite instability detection algorithm bMSISEA requires only binary sequence alignment (BAM) files for cancer plasma samples.
  • BAM binary sequence alignment
  • The baseline construction process also requires BAM files for the following samples: enough paired MSI-H cancer tissue and normal samples (more than 50), enough leukocyte samples (more than 100), and enough MSS plasma samples (more than 100).
  • This method first uses MSIsensor (v0.5) to obtain all microsatellite sites in the sequenced region whose length is greater than 10 and whose repeat unit is 1 bp, and counts the number of reads covering each repeat length at each site.
  • MSIsensor counts the reads covering each length type at a site as follows: for each microsatellite site, it first looks up the site's position and flanking sequences in the human genome, and builds, as a search dictionary, all sequences in which the two flanking sequences are joined by a central repeat of length 1 to L-10 bp, where L is the read length.
  • Example: a mononucleotide microsatellite site on chromosome 1 (14T, where T is the repeated base and 14 is the number of repeats).
  • Its flanking sequences are ATTCC and GCTTT.
  • The search dictionary constructed contains ATTCCTGCTTT (repeat length 1), ATTCCTTGCTTT (repeat length 2), ATTCCTTTGCTTT (repeat length 3), and so on.
  • the sequencing fragments are most likely to cover one or two types of repeat sequence lengths corresponding to the sample genotype.
  • This step is based on leukocyte samples and describes, for each site in the normal state, the repeat length types on which reads fall with high probability; these serve as the repeat length feature of the site in the MSS state.
  • For each leukocyte sample at each site, find the minimal range of consecutive lengths such that the corresponding reads exceed 75% of all reads supporting the site. This range of consecutive lengths is called the peak region of the sample at that site.
  • For each site, the repeat length range selected as the peak region in at least 25% of the leukocyte samples is used as the repeat length feature of the site in the MSS state.
  • the microsatellite sites have a large number of repeated sequences that expand or contract due to erroneous DNA replication.
  • This step is based on paired MSI-H cancer tissue and adjacent normal tissue samples and describes the repeat length types that occur in large numbers of reads in the MSI-H state and differ from the normal state; these serve as the repeat length feature of the site in the MSI-H state. Because a cancer tissue sample is a mixture of cancer cells and normal cells, the first step of the method estimates the proportion of tumor cells in the sample.
  • Specifically, the reads falling on the MSS-state repeat length types are counted at each site in the cancer tissue and in the adjacent normal tissue; assuming that the MSS-state reads in the cancer tissue sample come entirely from its normal cells, a linear model is built to estimate the tumor cell proportion u.
  • In the second step, the total read counts of the cancer tissue and the paired normal tissue are normalized, and then u times the corresponding counts of the paired normal tissue are subtracted from the read count of each repeat length at each site of the cancer tissue, giving an estimate of the repeat length statistics of pure MSI-H cancer cells.
  • the total number of sequencing fragments supported by the repetitive sequence length range is less than 0.2% of the total number of sequenced fragments at the site in the MSS sample, and accounts for more than 50% of the total number of sequenced fragments at the site in the MSI-H sample.
  • Table 1 lists eight microsatellite detection marker sites selected for microsatellite status detection according to the above methods.
  • Figure 1(A) shows the marker site bMS-BR1, whose MSS-state repeat length feature ranges from 22 to 25 bp and whose MSI-H feature ranges from 1 to 16 bp.
  • Figure 1(B) shows the coverage profile of a non-marker site in the two types of samples. Although the repeat length at this site is shortened by about 2 bp in the MSI-H state relative to MSS samples, this change cannot be distinguished from the capture fluctuation of the leukocytes themselves when the ctDNA content is very low. The site therefore does not meet the marker site screening criteria and cannot be used to determine the microsatellite status of a sample.
  • Enrichment of MSI-H features in plasma samples is analyzed against a background given by the read counts of normal leukocyte samples in the MSS and MSI-H length feature sets. This step is based on a large number of normal leukocyte samples: the total numbers of reads falling in the MSI-H-state and MSS-state repeat length sets are computed and denoted K and N-K, respectively.
  • For the plasma sample, the numbers of reads falling in the MSI-H-state and MSS-state repeat length sets, k and n-k, are computed likewise. If the sample is MSS, its read profile is consistent with that of the leukocyte samples and follows a hypergeometric distribution.
  • H_s = -log(P_s(X > k_s)).
  • The fluctuation range of the enrichment index at each site was obtained.
  • The Zscore of the enrichment index at each site is calculated from this fluctuation level, and all Zscores are summed to obtain the index MSscore used to determine the microsatellite status of the sample.
  • The total number of reads with repeat length 1-16 bp, K, was 504, and the total number of reads with repeat length 1-16 bp or 22-25 bp, N, was 190,588.
  • The fluctuation level of H_s was evaluated from MSS plasma samples, as shown in Table 1, and the Zscore of this site was 108.6. The other sites are calculated in the same way. Finally, all Zscores are added, giving a final MSscore of 355.3 for the sample.
  • This sample also carried a likely pathogenic somatic frameshift mutation in MLH1, p.D214fs; pathogenic or likely pathogenic mutations in PIK3CA, KRAS, and PTEN; mutations of unknown pathogenicity in BRCA2, STK11, and PMS1;
  • and benign mutations in some of the other genes covered by the kit.
  • For a plasma sample, the mean and standard deviation (SD) of the MSscore values of MSS plasma samples are calculated, and mean + 3SD is used as the threshold (cutoff). When MSscore > cutoff, the sample is determined to be MSI-H; when MSscore ≤ cutoff, the sample is determined to be MSS.
  • The NGS detection method is based on differences in repeat length and determines the microsatellite status of the sample from 22 marker sites.
  • For each marker site, the method determines the range of repeat lengths in which reads are concentrated in the MSS state, and evaluates the change in the percentage of the site's reads that fall within that range.
  • mean - 3SD is used as the threshold. If this percentage for the sample at the site is below the threshold, the site is judged unstable. If unstable sites make up less than 15% of all sites, the sample is judged MSS; if more than 40%, MSI-H; and if in between, MSI-L.
  • This detection method is described in Patent Application No. 201710061152.6. In addition, IHC assessment was performed on the histopathological sections.
  • The IHC method uses immunohistochemistry to detect the MMR proteins, namely the expression of MLH1, PMS2, MSH2, and MSH6. If any one of the proteins is absent, the sample is determined to be dMMR; if no protein is absent, it is determined to be pMMR. dMMR patients usually present as MSI-H because of the abnormal mismatch repair mechanism.
  • the sensitivity and specificity of the bMSISEA detection method are shown in Table 2 by comparing the results of bMSISEA-based detection of the 127 plasma samples with those of their matched tissues.
  • sensitivity sensitivity
  • specificity specificity
  • PPV positive predictive value
  • NPV negative predictive value
  • accuracy accuracy
  • the calculation method is as follows:
  • TP, TN, FP, and FN denote the numbers of true positives (tissue and plasma results are both MSI-H), true negatives (tissue and plasma results are both MSS), false positives (tissue result is MSS, plasma result is MSI-H), and false negatives (tissue result is MSI-H, plasma result is MSS).
  • Figure 2(A) shows the MSscore distribution for MSI detection in 127 bowel cancer plasma samples. With the bMSISEA method, the MSscore of all 83 MSS samples was below 15, a specificity of 100%. The MSscore of 23 of the 44 MSI-H samples was above 15, a sensitivity of 52.3%.
  • Figure 2(B) shows the correlation between maxAF and the MSscore of MSI-H samples. Considering only samples with maxAF > 0.2%, 15 of 16 MSI-H samples have an MSscore above 15, a sensitivity of 93.8%.
  • Detection sensitivity is affected by ctDNA content. Therefore, starting from real clinical plasma and leukocyte samples, this experiment additionally constructed a set of 350 simulated samples spanning a gradient of ctDNA content, to evaluate the sensitivity of plasma-based microsatellite instability detection at different ctDNA contents.
  • The ctDNA content of a cancer sample can be estimated from the sample's maximum somatic mutation allele frequency (maxAF).
  • When maxAF > 0.2%, the sensitivity of MSI-H detection is above 93%; when maxAF > 0.5%, it is above 98%.
  • Although MSI-H detection is limited when the ctDNA content is too low, once the ctDNA content reaches the stable detection range (maxAF > 0.2%), the bMSISEA method can determine the microsatellite stability (MS) status of a sample with high accuracy and sensitivity, which makes non-invasive plasma testing of MS status possible.
  • For plasma samples with maxAF > 0.2% (corresponding to a ctDNA content of roughly more than 0.4%), the bMSISEA method achieves sensitivity matching tissue testing and very high specificity.
  • Compared with tissue MSI testing, the plasma MSI testing of the present application has the advantages unique to liquid biopsy, including non-invasive diagnosis, independence from tissue type, and detection of multiple lesions.
  • The bMSISEA detection process does not rely on paired leukocyte samples. It determines the microsatellite status of the sample while detecting mutations, which is cheaper and faster.

Abstract

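In one aspect, the present invention discloses a biomarker panel, a kit for detecting it, and their use in detecting microsatellite instability (MSI) in plasma samples and in the non-invasive diagnosis, prognosis evaluation, treatment selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer. In another aspect, the present invention provides a next-generation sequencing (NGS)-based method for detecting microsatellite instability (MSI) and disease-related gene variants in plasma, a device for carrying out the method, and in particular the use of the detection method in the non-invasive diagnosis, prognosis evaluation, treatment selection, or genetic screening of patients with cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer. The present application provides, for the first time, a plasma MSI detection method that can determine the microsatellite (MS) status of a sample with high accuracy and high sensitivity.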

Description

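A next-generation sequencing-based method for detecting microsatellite stability status and genomic changes in plasma
This application claims priority to Application No. 201811149011.0, filed on September 29, 2018 and entitled "A next-generation sequencing-based method for detecting microsatellite stability status and genomic changes in plasma", and to Application No. 201811149015.9, filed on September 29, 2018 and entitled "A microsatellite biomarker panel, detection kit, and use thereof".
Field of the Invention
The present invention relates to a biomarker panel, a kit for detecting it, and their use in a method for detecting microsatellite stability status in plasma samples and in the non-invasive diagnosis, prognosis evaluation, treatment selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.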
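Background of the Invention
Microsatellites are short repetitive DNA sequences or mononucleotide regions contained in genes. In tumor cells, when DNA methylation or gene mutation leads to loss of mismatch repair genes, mismatches can arise in microsatellite repeats (microsatellite mutations), shortening or lengthening the sequence and thereby causing microsatellite instability (MSI). According to the degree of instability, samples are classified as microsatellite instability-high (MSI-H), microsatellite instability-low (MSI-L), or microsatellite stable (MSS).
Numerous studies have shown that MSI is involved in the onset and progression of malignant tumors and is closely associated with colorectal cancer (e.g., bowel cancer), gastric cancer, endometrial cancer, and others. For example, about 15% of colorectal cancer patients exhibit MSI-H, and more than 90% of patients with typical hereditary nonpolyposis colorectal cancer (HNPCC) exhibit MSI-H, indicating that MSI-H can serve as an important marker for identifying HNPCC patients. Compared with MSS (microsatellite stable) colorectal cancer, colorectal cancer patients carrying MSI-H have a better prognosis, and the two groups also respond differently to drugs, suggesting that MSI-H can serve as an independent predictor of colorectal cancer prognosis. MSI testing is therefore of great significance for colorectal cancer patients.
The 2016 edition of the National Comprehensive Cancer Network (NCCN, 2016 Version 2) guidelines for colorectal cancer treatment stated explicitly for the first time that "all patients with a history of colon/rectal cancer should be tested for MMR (mismatch repair) or MSI", because stage II colorectal cancer with MSI-H (high microsatellite instability) has a good prognosis (5-year overall survival of 80% with surgery alone) and does not benefit from 5-FU adjuvant chemotherapy (which is instead harmful). The guidelines also recommended, for the first time, the PD-1 monoclonal antibodies pembrolizumab and nivolumab for last-line treatment of mCRC with the dMMR/MSI-H molecular phenotype, which underscores the importance of MMR and MSI testing in advanced colorectal cancer. Meanwhile, because many genes are associated with hereditary colorectal cancer, the latest 2016 NCCN guidelines for genetic risk assessment of colorectal cancer recommend multi-gene panel sequencing as the initial test for patients with a clear family history and for their relatives.
In 2017, Merck's PD-1 monoclonal antibody Keytruda was approved by the US FDA for the treatment of patients with solid tumors carrying MSI-H or mismatch repair deficiency (dMMR), again demonstrating that MSI-H can serve as a pan-cancer marker independent of tumor site. MSI testing in cancer is therefore essential.
At present, MSI testing is limited to tissue-based assays. For example, the MMR gene tests offered in domestic hospitals usually cover only MLH1 and MSH2, with some also covering MSH6 and PMS2, and their positive results agree poorly with MSI test results. Only very few hospitals perform MSI status testing by PCR combined with capillary electrophoresis, and most of these tests are sent out to external laboratories. This method usually selects 5 to 11 mononucleotide repeat sites about 25 bp long and, after PCR amplification, measures their length distribution by capillary electrophoresis to determine the microsatellite (in)stability status of the sample. It is the current gold-standard method. Recently, NGS-based tissue MSI detection methods have been shown to agree very closely with PCR-MSI; they can determine MSI status while also profiling the genome, providing richer information for cancer diagnosis. However, all of these methods require a sufficient proportion of tumor cells. Because circulating tumor DNA (ctDNA) in plasma is very scarce, tissue-based methods cannot be applied to plasma.
Blood-based tumor testing is non-invasive, real-time, and not tissue-specific, features that tissue testing lacks, and it is of great clinical significance. There is therefore an urgent need in the art for a plasma-based MSI detection method, in particular a blood-based tumor MSI test for use in the non-invasive diagnosis, prognosis evaluation, treatment selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.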
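Summary of the Invention
The present application provides a plasma MSI detection method for the first time. Compared with tissue MSI testing, the plasma MSI testing of the present application is non-invasive, real-time, and not tissue-specific, and it can detect multiple lesions early. In addition, the method of the present invention can determine microsatellite status in plasma samples with very low ctDNA content, filling the gap in plasma-based microsatellite status testing. It is fast, does not rely on matched leukocyte samples, costs less, and can determine the microsatellite stability (MS) status of a sample with high accuracy, high sensitivity, and high specificity.
The detection method of the present application can also be used in the non-invasive diagnosis, prognosis evaluation, or treatment selection of patients with cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.
Specifically, the present application relates to the following aspects:
In one aspect, the present application provides a biomarker panel comprising one or more of the eight microsatellite sites shown in Table 1.
In another aspect, the present application provides a biomarker panel comprising a combination of microsatellite sites and one or more genes, wherein the microsatellite sites comprise the eight microsatellite sites shown in claim 1 or any one or more of them, and the one or more genes are any one or more of the following 41 genes:
AKT1, APC, ATM, BLM, BMPR1A, BRAF, BRCA1, BRCA2, CDH1, CHEK2, CYP2D6, DPYD, EGFR, EPCAM, ERBB2, GALNT12, GREM1, HRAS, KIT, KRAS, MET, MLH1, MSH2, MSH6, MUTYH, NRAS, PDGFRA, PIK3CA, PMS1, PMS2, POLD1, POLE, PTCH1, PTEN, SDHB, SDHC, SDHD, SMAD4, STK11, TP53, UGT1A1.
In another aspect, the present invention provides a kit for detecting microsatellite stability status in a plasma sample, characterized in that the kit comprises detection reagents for the biomarker panel of the present application.
In yet another aspect, the present invention provides a kit for the non-invasive diagnosis, prognosis evaluation, treatment selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer, characterized in that the kit comprises detection reagents for the biomarker panel of the present application.
Preferably, in the kit provided by the present application, the plasma sample is a cancer plasma sample, preferably a colorectal cancer plasma sample, for example a bowel cancer plasma sample, a gastric cancer plasma sample, or an endometrial cancer plasma sample. More preferably, the microsatellite stability status includes microsatellite instability-high (MSI-H), microsatellite instability-low (MSI-L), and microsatellite stable (MSS).
In one embodiment, in the kit provided by the present application, the detection reagents are reagents for next-generation sequencing (NGS).
The present application further relates to the use of the biomarker panel in detecting microsatellite stability status in a plasma sample. Preferably, the plasma sample is a cancer plasma sample, preferably a colorectal cancer plasma sample, for example a bowel cancer plasma sample, a gastric cancer plasma sample, or an endometrial cancer plasma sample. More preferably, the microsatellite stability status includes microsatellite instability-high (MSI-H), microsatellite instability-low (MSI-L), and microsatellite stable (MSS).
The present application also relates to the use of the biomarker panel in the non-invasive diagnosis, prognosis evaluation, treatment selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.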
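In one aspect, the present application provides a method for determining microsatellite marker sites that can be used to detect microsatellite stability status in plasma samples, comprising the following steps:
1) detecting the microsatellite sites in the sequenced region of the samples;
2) for any microsatellite site i, counting from the NGS data the number of sequencing reads of each repeat length type;
3) for any microsatellite site, determining the repeat length feature of the site in the microsatellite stable (MSS) state and the repeat length feature of the site in the microsatellite instability-high (MSI-H) state; wherein the MSS length feature is a minimal range of consecutive lengths such that, in MSS samples, the corresponding reads exceed 75% of all reads supporting the site; and the MSI-H length feature is a range of consecutive lengths that discriminates strongly between MSS and MSI-H samples, such that a) the reads supported by the range account for less than 0.2% of all reads at the site in MSS samples, and b) they account for 50% or more of all reads at the site in MSI-H samples,
a microsatellite site having the above features being a microsatellite detection marker site.
In one embodiment, in the method for determining microsatellite marker sites, the samples include samples from normal leukocytes and from the tissue of cancer patients, the cancer preferably being colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer. Preferably, the microsatellite sites determined by this method comprise one or more of the eight microsatellite sites described in Table 1.
More preferably, in the method for determining microsatellite marker sites, the detection of microsatellite stability status is used for the non-invasive diagnosis, prognosis evaluation, treatment selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.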
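In one aspect, the present application provides an NGS-based method for determining the stability status of microsatellite sites from a plasma sample of a cancer patient, comprising the following steps:
1) determining by NGS the repeat length features of a plurality of microsatellite sites in the plasma sample and in MSS plasma samples used as reference samples, the plurality of microsatellite sites including one or more microsatellite sites selected from the eight microsatellite sites shown in Table 1;
2) for each microsatellite site in 1), calculating its enrichment index Zscore;
3) summing the enrichment indices Zscore of all microsatellite sites to obtain the index MSscore used to determine the microsatellite status of the sample;
4) calculating the mean and standard deviation (SD) of the MSscore of the MSS plasma reference samples, and taking mean + 3SD as the threshold (cutoff);
5) for a plasma sample from a cancer patient, determining the sample to be MSI-H when its MSscore > cutoff, and MSS when its MSscore ≤ cutoff.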
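In one embodiment, in the NGS-based method for determining the stability status of microsatellite sites from a plasma sample of a cancer patient, the Zscore is evaluated from H_s,
$$H_s = -\log\left(P_s(X > k_s)\right),$$
and
$$P_s(X > k_s) = \sum_{j=k_s+1}^{\min(n,K)} \frac{\binom{K}{j}\binom{N-K}{n-j}}{\binom{N}{n}},$$
where, at site s, N is the total number of reads in the MSI-H-state and MSS-state repeat length sets, K is the total number of reads in the MSI-H-state repeat length set, and N-K is the total number of reads in the MSS-state repeat length set. Correspondingly, n and k (written k_s above) are the numbers of the corresponding reads in the sample under test.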
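In one embodiment, in the NGS-based method for determining the stability status of microsatellite sites from a plasma sample of a cancer patient, MSscore is calculated by the following formula:
$$\mathrm{MSscore} = \sum_{s} \mathrm{Zscore}_s, \qquad \mathrm{Zscore}_s = \frac{H_s - \mathrm{mean}\left(H_s^{\mathrm{MSS}}\right)}{\mathrm{sd}\left(H_s^{\mathrm{MSS}}\right)},$$
where the mean and standard deviation of H_s are taken over the MSS plasma reference samples.
Preferably, the cancer is colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.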
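In yet another aspect, the present application provides an NGS-based method for detecting a patient's microsatellite stability status and disease-related gene variants in order to provide clinical guidance on risk control, treatment, and/or prognosis for the patient or the patient's family, comprising the following steps:
(1) simultaneously detecting the plurality of microsatellite sites as described in claim 15;
(2) determining the microsatellite site stability status of the sample according to the method of any one of claims 15-18;
(3) obtaining the detection results for the one or more disease-related genes from the sequencing results;
(4) combining the results of steps (2) and (3) to provide clinical guidance on risk control, treatment, and/or prognosis for the patient or the patient's family.
Preferably, in this method, the disease is cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.
In yet another aspect, the present application also relates to a kit for use in any of the methods of the present application, comprising reagents for detecting the plurality of microsatellite sites.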
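In another aspect, the present application also provides a device for determining microsatellite marker sites for use in detecting microsatellite stability status in plasma samples, characterized in that the device comprises:
a sequencing data reading module for reading sample sequencing data obtained and stored in a sequencing device;
a microsatellite marker site detection module for analyzing the sample sequencing data to detect all microsatellite sites in the sequenced region of the sample;
a repeat length type determination module for counting, for any microsatellite site i, the number of sequencing reads of each repeat length type from the sample sequencing data read by the sequencing data reading module; and
a determination module for determining whether any microsatellite site i is a microsatellite marker site, the determination module comprising a first analysis module, a second analysis module, and a third analysis module, wherein
the first analysis module is used to determine the repeat length feature of the site in the microsatellite stable (MSS) state and to determine whether the corresponding reads in MSS samples exceed 75% of all reads supporting the site, wherein the MSS length feature is a minimal range of consecutive lengths; a positive result is recorded as "+" and a negative result as "-";
the second analysis module is used to determine the repeat length feature of the site in the microsatellite instability-high (MSI-H) state, wherein the MSI-H length feature is a range of consecutive lengths that discriminates strongly between MSS and MSI-H samples, and to determine a) whether the reads within that range account for less than 0.2% of all reads at the site in MSS samples, a positive result being recorded as "+" and a negative result as "-",
and b) whether the reads within that range account for 50% or more of all reads at the site in MSI-H samples, a positive result being recorded as "+" and a negative result as "-"; and
the third analysis module is used to analyze the results of the first and second analysis modules and, when three positive results (three "+") are obtained, to determine that microsatellite site i is a microsatellite marker site.
Preferably, in this device, the samples include samples from normal leukocytes and from the tissue of cancer patients, the cancer preferably being colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer. More preferably, the microsatellite sites determined by the device comprise one or more of the eight microsatellite sites described in Table 1.
In one embodiment, in this device, the detection of microsatellite stability status is used for the non-invasive diagnosis, prognosis evaluation, treatment selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.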
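In yet another aspect, the present invention also relates to an NGS-based device for determining the stability status of microsatellite sites from a plasma sample of a cancer patient, characterized in that the device comprises:
a sequencing data reading module for reading sample sequencing data obtained and stored in a sequencing device;
a repeat length feature determination module for deriving from the sample sequencing data the repeat length features of a plurality of microsatellite sites in the plasma sample and in MSS plasma samples used as reference samples, the plurality of microsatellite sites including one or more microsatellite sites selected from the eight microsatellite sites shown in Table 1;
an enrichment index calculation module for calculating the enrichment index Zscore of each microsatellite site;
a microsatellite status index calculation module for summing the enrichment indices Zscore of all microsatellite sites to obtain the index MSscore used to determine the microsatellite status of the sample;
a threshold calculation module for calculating the mean and standard deviation (SD) of the MSscore of the MSS plasma reference samples and taking mean + 3SD as the threshold (cutoff); and
a microsatellite site stability status determination module for comparing the index MSscore with the threshold cutoff and, for a plasma sample from a cancer patient, determining the sample to be MSI-H when its MSscore > cutoff and MSS when its MSscore ≤ cutoff.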
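In one embodiment, the NGS-based device for determining the stability status of microsatellite sites from a plasma sample of a cancer patient is characterized in that the Zscore is evaluated from H_s,
$$H_s = -\log\left(P_s(X > k_s)\right),$$
and
$$P_s(X > k_s) = \sum_{j=k_s+1}^{\min(n,K)} \frac{\binom{K}{j}\binom{N-K}{n-j}}{\binom{N}{n}},$$
where, at site s, N is the total number of reads in the MSI-H-state and MSS-state repeat length sets, K is the total number of reads in the MSI-H-state repeat length set, and N-K is the total number of reads in the MSS-state repeat length set. Correspondingly, n and k (written k_s above) are the numbers of the corresponding reads in the sample under test.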
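Preferably, in the above device for determining the stability status of microsatellite sites, MSscore is calculated by the following formula:
$$\mathrm{MSscore} = \sum_{s} \mathrm{Zscore}_s, \qquad \mathrm{Zscore}_s = \frac{H_s - \mathrm{mean}\left(H_s^{\mathrm{MSS}}\right)}{\mathrm{sd}\left(H_s^{\mathrm{MSS}}\right)},$$
where the mean and standard deviation of H_s are taken over the MSS plasma reference samples.
More preferably, in the above device for determining the stability status of microsatellite sites, the disease is cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.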
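Brief Description of the Drawings
Figure 1. (A) Distribution of the number of sequencing reads of each repeat length at the microsatellite marker site bMS-BR1 in pure MSI-H cancer cells and in leukocyte samples. The blue box marks the MSS feature range of the site, 22-25 bp, and the red box marks its MSI-H feature range, <16 bp. (B) Distribution of the number of reads of each repeat length at a non-marker site in pure MSI-H cancer cells and in leukocyte samples. Although the repeat length at this site is shortened by about 2 bp, when the tumor ctDNA content is very low this difference cannot be distinguished from the capture fluctuation of the leukocytes themselves, and there is no repeat length type that occurs at high frequency only in MSI-H samples.
Figure 2. Performance of bMSISEA. (A) MSscore distribution of 127 bowel cancer plasma samples, whose MS status was determined from paired tissue; the set comprises 44 MSI-H samples and 83 MSS samples. A plasma sample is classified as MSI-H when its MSscore is above cutoff = 15, and as MSS when its MSscore is less than or equal to 15. (B) Correlation between maxAF and MSscore for the 44 MSI-H samples; red dots indicate MSscore > 15 (sample classified as MSI-H), and blue dots indicate that the MSscore does not reach the threshold (sample classified as MSS). (C) Correlation between detection sensitivity and maxAF in simulated samples. Results are based on 350 simulated samples spanning a gradient of ctDNA content; the horizontal axis indicates that only samples with maxAF above the given value are counted, and the vertical axis is the sensitivity of MSI-H detection. When maxAF > 0.2%, the sensitivity of MSI-H detection is above 93%; when maxAF > 0.5%, it is above 98%.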
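Detailed Description of the Invention
The present application provides, for the first time, an NGS-based method for detecting microsatellite stability status and disease-related genes in plasma, and, based on this detection method, has obtained highly sensitive and specific MSI sites for detecting cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.
In addition, the present invention establishes a method for determining microsatellite marker sites that can be used to detect microsatellite status from plasma samples. The present invention also achieves simultaneous detection of multiple microsatellite sites and multiple disease-related genes in a sample, so that more comprehensive conclusions and recommendations on prognosis, treatment, screening, and related matters can be given for the sample tested.
The present application therefore provides a plasma MSI detection method for the first time. Compared with tissue MSI testing, the plasma MSI testing of the present application is non-invasive, real-time, and not tissue-specific. In addition, the method of the present invention can determine microsatellite status in plasma samples with very low ctDNA content, filling the gap in plasma-based microsatellite status testing; it achieves a very high accuracy for samples with ctDNA content above 0.4%, is fast, does not rely on matched leukocyte samples, costs less, and can determine the microsatellite stability (MS) status of a sample with high sensitivity and high specificity.
The detection method of the present application can also be used in the non-invasive diagnosis, prognosis evaluation, or treatment selection of patients with cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.
The present application also provides a device for determining microsatellite marker sites for use in detecting microsatellite stability status in plasma samples, and an NGS-based device for determining the stability status of microsatellite sites from a plasma sample of a cancer patient.
The inventors found that, in microsatellite instability-high samples, erroneous DNA replication causes extensive expansion or contraction of the repeats at microsatellite sites. Accordingly, the repeat length types of sequencing reads in MSI-H tissue samples and in normal leukocyte samples are compared to find the repeat length types that occur in large numbers in MSI-H tissue samples but only rarely in normal leukocyte samples; these serve as the repeat length feature of the site in the MSI-H state.
The specific criteria for selecting marker sites are as follows: a) in MSS samples, reads within the repeat length range account for less than 0.2% of all reads at the site, and b) in MSI-H samples, reads within the range account for 50% or more of the reads supporting the site; this length range is defined as the repeat length feature of the site in the MSI-H state. These two conditions ensure that, at extremely low ctDNA content, the reads covering the MSI-H length feature come almost entirely from tumor DNA.
Based on this selection, the inventors identified eight microsatellite marker sites (see Table 1 for details).
Table 1. Information on the microsatellite detection marker sites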
Figure PCTCN2019109036-appb-000005
Figure PCTCN2019109036-appb-000006
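The method of the present application for determining, by NGS, the stability status of microsatellite sites in a plasma sample from a cancer patient is the plasma microsatellite instability detection technique bMSISEA. Its main strategy is first to use tissue samples to find marker sites whose read coverage profiles differ sharply between the MSI-H and MSS states and to describe the main length types of the reads supporting each site in the two states; then, for each marker site, an enrichment analysis of the MSI-H read features is performed to assess its instability, from which the microsatellite status of the sample is determined.
The method comprises the following steps: 1) data preparation, including sample preparation, detection of microsatellite sites in the sequenced region, and counting of the repeat length types at each site; 2) screening of marker sites and description of site features; 3) enrichment analysis of microsatellite instability features; 4) evaluation of the average fluctuation level of the enrichment index at each site; 5) construction of the MSscore from the relative level of the enrichment indices of the plasma sample under test, and determination of the MS status of the sample.
The following examples are provided to aid understanding of the present invention, the true scope of which is set out in the appended claims. It should be understood that the methods given can be modified without departing from the spirit of the invention.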
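Examples
1. Data preparation: gene panel testing by NGS, with the following steps:
Tissue samples were captured as follows. DNA was extracted separately from tumor tissue and adjacent normal tissue using the QIAamp DNA FFPE Tissue Kit (QIAGEN: 56404) and quantified accurately with the dsDNA HS assay kit (ThermoFisher: Q32854) supplied for the Qubit 3.0 fluorometer. The DNA was then physically fragmented to 180-250 bp with a Covaris M220 ultrasonicator (Covaris: PN500295), followed by end repair, phosphorylation, addition of deoxyadenosine to the 3' end, and adapter ligation. The adapter-ligated DNA was purified with Agencourt AMPure XP paramagnetic beads and pre-amplified with PCR polymerase; the purified amplification product was hybridized with a custom multiplexed biotin-labeled probe set from Agilent (the panel design covers the exons and part of the intron sequences of 41 genes). Successfully hybridized fragments were specifically eluted, enriched and amplified with PCR polymerase, and then quantified and measured for fragment length distribution. NGS was performed on an Illumina NovaSeq 6000 sequencer (catalog number: 20012850) at a sequencing depth of 1000X.
Blood samples were captured as follows. First, a nucleic acid extraction reagent was used to extract plasma cell-free DNA and paired peripheral blood leukocyte genomic DNA separately, and the leukocyte genomic DNA was fragmented. A whole-genome pre-library was then prepared by adapter ligation, PCR amplification, and related steps, and hybridized with biotin-labeled RNA probes of specific sequence to capture part of the exon and intron regions of 41 genes in the human genome (full coding regions, exon-intron junctions, UTRs, and promoter regions). Streptavidin magnetic beads were used to enrich the DNA fragments captured by the probes, and the enriched fragments served as templates for amplifying the final library. After quantification and quality control, the final library was sequenced on an Illumina NovaSeq sequencer at a sequencing depth of 15000X.
Finally, the reads obtained were aligned to the human genome (version hg19) with BWA 0.7.10; local realignment was performed with GATK 3.2, variant calling with VarScan 2.4.3, and variant annotation with ANNOVAR and SnpEff 4.3. For variant calling, VarScan fpfilter removes sites with insufficient coverage depth (tissue: below 50x; plasma: below 500x; leukocytes: below 20x); insertions/deletions (indels) and single-nucleotide variants require at least 5 and 8 variant-supporting reads, respectively.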
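The depth and read-support thresholds stated above can be illustrated with a minimal sketch. It is not VarScan's fpfilter implementation; the `VariantCall` record and the function name are hypothetical, and only the numeric thresholds come from the text.

```python
from dataclasses import dataclass

# Minimum coverage depth per sample type (values from the text).
MIN_DEPTH = {"tissue": 50, "plasma": 500, "leukocyte": 20}
# Minimum number of variant-supporting reads per variant type (values from the text).
MIN_VARIANT_READS = {"indel": 5, "snv": 8}

@dataclass
class VariantCall:
    """Hypothetical record for one candidate variant."""
    sample_type: str    # "tissue", "plasma" or "leukocyte"
    variant_type: str   # "indel" or "snv"
    depth: int          # total coverage depth at the site
    variant_reads: int  # reads supporting the variant allele

def passes_filter(call: VariantCall) -> bool:
    """Keep a call only if depth and variant read support reach the thresholds."""
    if call.depth < MIN_DEPTH[call.sample_type]:
        return False
    return call.variant_reads >= MIN_VARIANT_READS[call.variant_type]
```

For instance, a plasma single-nucleotide variant at 15000x depth with 8 supporting reads is kept, while the same variant at 499x depth is dropped.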
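2. Counting repeat length types at microsatellite sites from next-generation sequencing (NGS) data
The detection step of the microsatellite instability detection algorithm bMSISEA requires only the binary sequence alignment (BAM) file of the cancer plasma sample. Baseline construction additionally requires BAM files for the following samples: enough paired MSI-H cancer tissue and normal samples (more than 50), enough leukocyte samples (more than 100), and enough MSS plasma samples (more than 100).
The method first uses MSIsensor (v0.5) to obtain all microsatellite sites in the sequenced region whose length is greater than 10 and whose repeat unit is 1 bp, and counts the number of reads covering each repeat length at each site.
MSIsensor counts the reads covering each length type at a site as follows. For each microsatellite site, it first looks up the site's position and flanking sequences in the human genome, and builds, as a search dictionary, all sequences in which the two flanking sequences are joined by a central repeat of length 1 to L-10 bp, where L is the read length. For example, for a mononucleotide microsatellite site on chromosome 1 (14T, where T is the repeated base and 14 is the number of repeats) whose flanking sequences are ATTCC and GCTTT, the search dictionary contains ATTCCTGCTTT (repeat length 1), ATTCCTTGCTTT (repeat length 2), ATTCCTTTGCTTT (repeat length 3), and so on. Read pairs with at least one end located within 2 kb of the site are then extracted from the sample's BAM file and compared against the sequences in the site's search dictionary. The number of reads covering each length in the dictionary is counted, giving a histogram of read counts over all length types at the site.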
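The dictionary construction and counting described above can be sketched as follows. This is an illustrative reimplementation on plain strings, not MSIsensor's code; the function names are hypothetical, and reads are assumed to be already extracted from the BAM file. It also assumes the flanking sequences do not begin or end with the repeat unit, so that at most one dictionary entry matches a read.

```python
from collections import Counter

def build_search_dictionary(left_flank, right_flank, unit, read_length):
    """Map each candidate sequence (left flank + repeat + right flank) to its repeat count."""
    max_repeat = read_length - 10  # the text searches repeat lengths 1 .. L-10
    return {left_flank + unit * n + right_flank: n for n in range(1, max_repeat + 1)}

def repeat_length_histogram(reads, dictionary):
    """Count, for each repeat length, the reads containing the matching dictionary sequence."""
    histogram = Counter()
    for read in reads:
        for sequence, repeat_count in dictionary.items():
            if sequence in read:
                histogram[repeat_count] += 1
                break  # the flanks bound the repeat, so no other entry can match
    return histogram

# Flanks and repeat unit from the example in the text; the reads are made up.
dictionary = build_search_dictionary("ATTCC", "GCTTT", "T", read_length=20)
histogram = repeat_length_histogram(
    ["GGATTCCTTTGCTTTAA", "ATTCCTTGCTTT", "ACGTACGT"], dictionary
)
```

Here the first read supports repeat length 3, the second supports repeat length 2, and the third does not cover the site.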
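3. Screening of marker sites for microsatellite instability
3.1 Repeat length feature of a site in the MSS state
At the microsatellite sites of a normal sample, reads fall with high probability on the one or two repeat length types corresponding to the sample's genotype. This step is based on leukocyte samples and describes, for each site in the normal state, the repeat length types on which reads fall with high probability; these serve as the repeat length feature of the site in the MSS state. For each leukocyte sample at each site, the minimal range of consecutive lengths is found such that the corresponding reads exceed 75% of all reads supporting the site; this range is called the peak region of the sample at that site. For each site, the repeat length range selected as the peak region in at least 25% of the leukocyte samples is taken as the repeat length feature of the site in the MSS state.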
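A minimal sketch of the peak-region search and the 25% voting rule described above, under the assumption that each sample's data at a site is a mapping from repeat length to read count. The function names are hypothetical, and ties between equally short windows are resolved in favour of the shorter repeat lengths, which the text does not specify.

```python
from collections import Counter

def peak_region(hist, min_fraction=0.75):
    """Return (start, end) of the shortest run of consecutive repeat lengths
    whose reads exceed min_fraction of all reads at the site."""
    total = sum(hist.values())
    if total == 0:
        return None
    longest = max(hist)
    best = None
    for start in range(min(hist), longest + 1):
        running = 0
        for end in range(start, longest + 1):
            running += hist.get(end, 0)
            if running > min_fraction * total:
                if best is None or end - start < best[1] - best[0]:
                    best = (start, end)
                break  # extending further only widens the window
    return best

def mss_feature(histograms, min_sample_fraction=0.25):
    """Repeat lengths that lie in the peak region of at least 25% of the samples."""
    votes = Counter()
    for hist in histograms:
        region = peak_region(hist)
        if region is not None:
            votes.update(range(region[0], region[1] + 1))
    needed = min_sample_fraction * len(histograms)
    return sorted(length for length, count in votes.items() if count >= needed)
```

With a made-up histogram `{21: 5, 22: 30, 23: 40, 24: 20, 25: 5}`, the window 22-24 holds 90 of 100 reads and is the shortest one exceeding 75%.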
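3.2 Repeat length feature of a site in the MSI-H state and selection of marker sites
In microsatellite instability-high samples, erroneous DNA replication causes extensive expansion or contraction of the repeats at microsatellite sites; here we focus on contraction at sites with long repeats. This step is based on paired MSI-H cancer tissue and adjacent normal tissue samples and describes the repeat length types that occur in large numbers of reads in the MSI-H state and differ from the normal state; these serve as the repeat length feature of the site in the MSI-H state. Because a cancer tissue sample is a mixture of cancer cells and normal cells, the first step of the method estimates the proportion of tumor cells in the sample. Specifically, the reads falling on the MSS-state repeat length types are counted at each site in the cancer tissue and in the adjacent normal tissue; assuming that the MSS-state reads in the cancer tissue sample come entirely from its normal cells, a linear model is built to estimate the tumor cell proportion u. In the second step, the total read counts of the cancer tissue and the paired normal tissue are normalized, and then u times the corresponding counts of the paired normal tissue are subtracted from the read count of each repeat length at each site of the cancer tissue, giving an estimate of the repeat length statistics of pure MSI-H cancer cells.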
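The text does not give the form of the linear model, so the following is only one possible reading, stated with its assumptions: the per-site fraction of MSS-length reads in the tumor sample is regressed on the same fraction in the paired normal sample with a least-squares line through the origin; the slope is interpreted as the contribution of normal cells, and that share of the (depth-normalized) normal histogram is subtracted from the tumor histogram. How this coefficient relates to the symbol u in the text is an assumption, and all function names are hypothetical.

```python
def normal_contribution(tumor_mss_fractions, normal_mss_fractions):
    """Least-squares slope through the origin: tumor ≈ b * normal, over all sites."""
    numerator = sum(t * n for t, n in zip(tumor_mss_fractions, normal_mss_fractions))
    denominator = sum(n * n for n in normal_mss_fractions)
    return numerator / denominator

def pure_tumor_histogram(tumor_hist, normal_hist, b):
    """Subtract the estimated normal-cell contribution from a tumor repeat-length histogram."""
    # Normalize the normal sample to the tumor sample's total read count at this site.
    scale = sum(tumor_hist.values()) / sum(normal_hist.values())
    lengths = sorted(set(tumor_hist) | set(normal_hist))
    return {
        length: max(0.0, tumor_hist.get(length, 0) - b * scale * normal_hist.get(length, 0))
        for length in lengths
    }
```

In a made-up example where half of the tumor sample's reads sit at the normal length 23 and half at length 15, a slope of 0.5 removes the reads at length 23 and leaves only the contracted length.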
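For all microsatellite sites, based on the repeat length statistics of pure MSI-H cancer cells, sites with the following feature are selected as bMSISEA marker sites, and the corresponding repeat length range is taken as the repeat length feature of the site in the MSI-H state: the reads supported by the repeat length range account for less than 0.2% of all reads at the site in MSS samples, and for 50% or more of all reads at the site in MSI-H samples. These two conditions ensure that, at extremely low ctDNA content, the reads covering the MSI-H length feature come almost entirely from cancer DNA.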
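The two screening conditions can be written as a short check. This is a sketch with hypothetical names; it assumes histograms that map repeat length to read count and reads "50% or more" as an inclusive bound.

```python
def is_marker_site(mss_hist, msih_hist, length_range,
                   max_mss_fraction=0.002, min_msih_fraction=0.5):
    """Apply the two marker-site conditions to a candidate repeat-length range."""
    low, high = length_range

    def fraction_in_range(hist):
        total = sum(hist.values())
        inside = sum(count for length, count in hist.items() if low <= length <= high)
        return inside / total if total else 0.0

    # a) rare in MSS samples, and b) dominant in pure MSI-H cancer cells
    return (fraction_in_range(mss_hist) < max_mss_fraction
            and fraction_in_range(msih_hist) >= min_msih_fraction)
```

A site where 0.1% of MSS reads and 60% of MSI-H reads fall in the range 1-16 bp passes; one where 0.5% of MSS reads fall in that range does not.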
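Table 1 lists the eight microsatellite detection marker sites selected by the above method for microsatellite status detection. Figure 1(A) shows the marker site bMS-BR1, whose MSS-state repeat length feature ranges from 22 to 25 bp and whose MSI-H feature ranges from 1 to 16 bp. Figure 1(B) shows the coverage profile of a non-marker site in the two types of samples. Although the repeat length at this site is shortened by about 2 bp in the MSI-H state relative to MSS samples, this change cannot be distinguished from the capture fluctuation of the leukocytes themselves when the ctDNA content is very low; the site does not meet the marker site screening criteria and cannot be used to determine the microsatellite status of a sample.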
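4. Enrichment analysis of MSI features
For each marker site, enrichment of MSI-H features in a plasma sample is analyzed against a background given by the read counts of normal leukocyte samples in the MSS and MSI-H length feature sets. This step is based on a large number of normal leukocyte samples: the total numbers of reads falling in the MSI-H-state and MSS-state repeat length sets are computed and denoted K and N-K, respectively. For the plasma sample, the numbers of reads falling in the MSI-H-state and MSS-state repeat length sets, k and n-k, are computed likewise. If the sample is MSS, its read profile is consistent with that of the leukocyte samples and follows a hypergeometric distribution, so that at site s (with k written as k_s)
$$P_s(X > k_s) = \sum_{j=k_s+1}^{\min(n,K)} \frac{\binom{K}{j}\binom{N-K}{n-j}}{\binom{N}{n}}.$$
The enrichment index of the site can therefore be evaluated by H_s, where H_s = -log(P_s(X > k_s)).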
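As an illustration, the enrichment index can be computed in log space so that very small tail probabilities do not underflow. This is a sketch, not the authors' implementation; the function names are hypothetical, and the use of the natural logarithm is an assumption, since the text does not state the base.

```python
import math

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def enrichment_index(N, K, n, k):
    """H = -log P(X > k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    log_terms = [
        log_comb(K, j) + log_comb(N - K, n - j) - log_comb(N, n)
        for j in range(k + 1, min(n, K) + 1)
        if n - j <= N - K  # skip outcomes that are impossible
    ]
    if not log_terms:
        return math.inf  # P(X > k) is exactly zero
    largest = max(log_terms)
    # log-sum-exp of the tail probabilities
    log_p = largest + math.log(sum(math.exp(t - largest) for t in log_terms))
    return -log_p
```

With the background counts from the bMS-BR1 example in the text, a call would look like `enrichment_index(190588, 504, 1308, 65)`; a larger k for the same N, K, and n gives a larger index.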
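Further, the fluctuation range of the enrichment index at each site is obtained from a large number of MSS plasma samples. For a plasma sample under test, the Zscore of the enrichment index at each site is calculated from this fluctuation level, and all Zscores are summed to obtain the index MSscore used to determine the microsatellite status of the sample:
$$\mathrm{MSscore} = \sum_{s} \mathrm{Zscore}_s, \qquad \mathrm{Zscore}_s = \frac{H_s - \mathrm{mean}\left(H_s^{\mathrm{MSS}}\right)}{\mathrm{sd}\left(H_s^{\mathrm{MSS}}\right)},$$
where the mean and standard deviation of H_s are taken over the MSS plasma samples.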
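The per-site standardization and summation can be sketched as below. The function names and the dictionary layout are hypothetical; the sketch assumes the fluctuation level is the sample standard deviation of the enrichment index across MSS plasma reference samples.

```python
from statistics import mean, stdev

def site_zscore(h_sample, h_reference):
    """Standardize a site's enrichment index against MSS plasma reference values."""
    return (h_sample - mean(h_reference)) / stdev(h_reference)

def ms_score(h_by_site, reference_by_site):
    """Sum the per-site Zscores into the sample-level MSscore."""
    return sum(
        site_zscore(h_by_site[site], reference_by_site[site]) for site in h_by_site
    )
```

For example, with reference values `[1.0, 2.0, 3.0]` at a site (mean 2, SD 1), a sample value of 5.0 gives a Zscore of 3.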
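Take the bMS-BR1 site as an example. Based on 100 WBC samples, the total number of reads with repeat length 1-16 bp, K, is 504, and the total number of reads with repeat length 1-16 bp or 22-25 bp, N, is 190588. For a sample under test, the total number of reads at this site with repeat length 1-16 bp, k, is 65, and the total number with repeat length 1-16 bp or 22-25 bp, n, is 1308, so that H_s = -log(P_s(X > k_s)) = -log(P_s(X > 65)) = 140.6. The fluctuation level of H_s is then evaluated from MSS plasma samples, as shown in Table 1,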
Figure PCTCN2019109036-appb-000009
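giving a Zscore of 108.6 for this site. The other sites are calculated in the same way, and finally all Zscores are added, giving a final MSscore of 355.3 for the sample. This sample also carried a likely pathogenic somatic frameshift mutation in MLH1, p.D214fs; pathogenic or likely pathogenic mutations in PIK3CA, KRAS, and PTEN; mutations of unknown pathogenicity in BRCA2, STK11, and PMS1; and benign mutations in some of the other genes covered by the kit.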
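5. Determination of the microsatellite status of cancer samples
For a plasma sample, the mean and standard deviation (SD) of the MSscore values of MSS plasma samples are calculated, and mean + 3SD is used as the threshold (cutoff). When MSscore > cutoff, the sample is determined to be MSI-H; when MSscore ≤ cutoff, the sample is determined to be MSS.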
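The threshold rule can be sketched in a few lines. The function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def msscore_cutoff(reference_msscores):
    """Cutoff = mean + 3 SD of the MSscore values of MSS plasma reference samples."""
    return mean(reference_msscores) + 3 * stdev(reference_msscores)

def classify_plasma_sample(msscore, cutoff):
    """MSI-H when MSscore exceeds the cutoff, MSS otherwise (including equality)."""
    return "MSI-H" if msscore > cutoff else "MSS"
```

With the cutoff of 15 reported for Figure 2, the example sample with MSscore 355.3 would be classified as MSI-H, and a sample scoring exactly 15 as MSS.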
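6. Results of plasma microsatellite instability detection with bMSISEA
Using the bMSISEA microsatellite detection technique with the eight microsatellite marker sites listed in Table 1 and the detection kit, we performed NGS testing, covering both mutations and microsatellite status, on 127 real clinical bowel cancer plasma samples. The microsatellite status of each sample was established from the patient's paired tissue sample and confirmed by both IHC and NGS-MSI; the set comprised 44 MSI-H samples and 83 MSS samples. Tissue testing was performed as follows. The NGS method is based on differences in repeat length and determines the microsatellite status of the sample from 22 marker sites. For each marker site, the method determines the range of repeat lengths in which reads are concentrated in the MSS state and evaluates the change in the percentage of the site's reads that fall within that range; with mean - 3SD as the threshold, a site is judged unstable if this percentage for the sample under test is below the threshold. If unstable sites make up less than 15% of all sites, the sample is judged MSS; if more than 40%, MSI-H; and if in between, MSI-L. This detection method is described in Patent Application No. 201710061152.6. In addition, IHC assessment was performed on the histopathological sections. The IHC method uses immunohistochemistry to detect the MMR proteins, namely the expression of MLH1, PMS2, MSH2, and MSH6; if any one of the proteins is absent, the sample is determined to be dMMR, and if no protein is absent, it is determined to be pMMR. dMMR patients usually present as MSI-H because of the abnormal mismatch repair mechanism.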
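The two tissue-based reference rules described above can be sketched as follows. The function names are hypothetical; the thresholds (15%, 40%) and the four MMR proteins come from the text.

```python
def tissue_ms_status(n_unstable_sites, n_sites=22):
    """Classify a tissue sample from the fraction of unstable marker sites."""
    fraction = n_unstable_sites / n_sites
    if fraction < 0.15:
        return "MSS"
    if fraction > 0.40:
        return "MSI-H"
    return "MSI-L"

def mmr_status(protein_expressed):
    """dMMR if any of the four MMR proteins is absent on IHC, otherwise pMMR."""
    proteins = ("MLH1", "PMS2", "MSH2", "MSH6")
    return "pMMR" if all(protein_expressed[p] for p in proteins) else "dMMR"
```

For example, 3 unstable sites out of 22 (about 13.6%) gives MSS, 5 out of 22 gives MSI-L, and 9 out of 22 (about 40.9%) gives MSI-H.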
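Comparing the bMSISEA results for these 127 plasma samples with the results for their paired tissues gives the sensitivity and specificity of the bMSISEA method shown in Table 2.
Table 2. bMSISEA results for 127 bowel cancer plasma samples (with tissue results as the reference)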
Figure PCTCN2019109036-appb-000010
With sufficient ctDNA (maxAF > 0.2%), the accuracy of plasma MSI detection reaches 98.5%
Figure PCTCN2019109036-appb-000011
Figure PCTCN2019109036-appb-000012
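*The tissue-based microsatellite status was confirmed by both NGS and IHC. The performance measures are sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy, calculated as follows:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{PPV} = \frac{TP}{TP + FP}$$
$$\mathrm{NPV} = \frac{TN}{TN + FN}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP, TN, FP, and FN denote the numbers of true positives (tissue and plasma results are both MSI-H), true negatives (tissue and plasma results are both MSS), false positives (tissue result is MSS, plasma result is MSI-H), and false negatives (tissue result is MSI-H, plasma result is MSS).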
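The standard definitions of these measures can be checked against the counts reported in this document for all 127 samples (23 of 44 MSI-H samples and all 83 MSS samples correctly classified). The function name is hypothetical.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, PPV, NPV, and accuracy from the four confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# All 127 plasma samples: TP = 23, FN = 21, TN = 83, FP = 0.
all_samples = diagnostic_metrics(tp=23, tn=83, fp=0, fn=21)
```

These counts reproduce the overall sensitivity of 52.3%, specificity of 100%, and accuracy of 83.5% reported in the text.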
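As Table 2 shows, the specificity of plasma-based MSI-H detection is 100%. When all samples are included without filtering, the overall sensitivity is only 52.3% and the accuracy 83.5%, because most samples have extremely low ctDNA content. By contrast, when only plasma samples with maxAF > 0.2% (ctDNA > 0.4%) are selected, the sensitivity is 93.8% and the accuracy 98.5%. In fact, when only samples with maxAF > 0.5% are selected from this group, the accuracy is 100%. Thus, while maintaining specificity, bMSISEA is sufficiently sensitive when the plasma contains enough ctDNA.
More detailed results are shown in Figure 2. Figure 2(A) shows the MSscore distribution for MSI detection in the 127 bowel cancer plasma samples. With the bMSISEA method, the MSscore of all 83 MSS samples was below 15, a specificity of 100%. The MSscore of 23 of the 44 MSI-H samples was above 15, a sensitivity of 52.3%. To account for differences in ctDNA content between samples, Figure 2(B) shows the correlation between maxAF and the MSscore of MSI-H samples; considering only samples with maxAF > 0.2%, 15 of 16 MSI-H samples have an MSscore above 15, a sensitivity of 93.8%.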
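7. Simulation experiment confirming the effect of plasma ctDNA content on detection sensitivity
Because the ctDNA content of plasma is generally very low, detection sensitivity is affected by ctDNA content. Therefore, starting from real clinical plasma and leukocyte samples, this experiment additionally constructed a set of 350 simulated samples spanning a gradient of ctDNA content, to evaluate the sensitivity of plasma-based microsatellite instability detection at different ctDNA contents. Here, the ctDNA content of a cancer sample can be estimated from the sample's maximum somatic mutation allele frequency (maxAF).
We selected 18 pairs of matched plasma and leukocyte samples, mixed the BAM files of the plasma and leukocyte samples in proportions based on the maxAF of the plasma sample, and downsampled the mixture to the depth of the original plasma sample, simulating 350 samples across a gradient of ctDNA content in order to evaluate the sensitivity of detection in plasma samples with different ctDNA contents. The simulated samples went through the same mutation detection pipeline as the real clinical samples to determine their maxAF. As shown in Figure 2(C), the horizontal axis indicates that only samples with maxAF above the given threshold are counted, and the vertical axis is the sensitivity of MSI-H detection. When maxAF > 0.2%, the sensitivity of MSI-H detection is above 93%; when maxAF > 0.5%, it is above 98%. Although MSI-H detection is limited when the ctDNA content is too low, once the ctDNA content reaches the stable detection range (maxAF > 0.2%), the bMSISEA method can determine the microsatellite stability (MS) status of a sample with high accuracy and sensitivity, which makes non-invasive plasma testing of MS status possible.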
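The dilution step can be sketched on lists of read identifiers rather than BAM files. How the mixing proportion was derived from maxAF is not spelled out in the text; the sketch assumes that keeping a fraction target/original of plasma reads and filling the remainder with leukocyte reads scales maxAF proportionally. All function names are hypothetical.

```python
import random

def mixing_plan(original_maxaf, target_maxaf, total_reads):
    """Number of plasma and leukocyte reads needed to dilute maxAF to the target."""
    if not 0 < target_maxaf <= original_maxaf:
        raise ValueError("target maxAF must be positive and not exceed the original")
    plasma_fraction = target_maxaf / original_maxaf
    n_plasma = round(total_reads * plasma_fraction)
    return n_plasma, total_reads - n_plasma

def simulate_sample(plasma_reads, wbc_reads, original_maxaf, target_maxaf, seed=0):
    """Mix plasma and leukocyte reads and keep the original plasma read count."""
    rng = random.Random(seed)
    n_plasma, n_wbc = mixing_plan(original_maxaf, target_maxaf, len(plasma_reads))
    # random.sample draws without replacement; the leukocyte pool must be large enough
    return rng.sample(plasma_reads, n_plasma) + rng.sample(wbc_reads, n_wbc)
```

For example, diluting a plasma sample with maxAF 2% down to 0.5% at 1000 reads keeps 250 plasma reads and adds 750 leukocyte reads.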
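Thus, for plasma samples with maxAF > 0.2% (corresponding to a ctDNA content of roughly more than 0.4%), the bMSISEA method achieves sensitivity matching tissue testing and very high specificity. Compared with tissue MSI testing, the plasma MSI testing of the present application has the advantages unique to liquid biopsy, including non-invasive diagnosis, independence from tissue type, and detection of multiple lesions. The bMSISEA detection process does not rely on paired leukocyte samples and determines the microsatellite status of the sample while detecting mutations, which is cheaper and faster.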

Claims (30)

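  1. A biomarker panel comprising one or more of the eight microsatellite sites shown in Table 1.
  2. A biomarker panel comprising a combination of microsatellite sites and one or more genes, wherein the microsatellite sites comprise the eight microsatellite sites shown in claim 1 or any one or more of them, and the one or more genes are any one or more of the following 41 genes: AKT1, APC, ATM, BLM, BMPR1A, BRAF, BRCA1, BRCA2, CDH1, CHEK2, CYP2D6, DPYD, EGFR, EPCAM, ERBB2, GALNT12, GREM1, HRAS, KIT, KRAS, MET, MLH1, MSH2, MSH6, MUTYH, NRAS, PDGFRA, PIK3CA, PMS1, PMS2, POLD1, POLE, PTCH1, PTEN, SDHB, SDHC, SDHD, SMAD4, STK11, TP53, UGT1A1.
  3. A kit for detecting microsatellite stability status in a plasma sample, characterized in that the kit comprises detection reagents for the biomarker panel of claim 1 or 2.
  4. A kit for the non-invasive diagnosis, prognosis evaluation, treatment selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer, characterized in that the kit comprises detection reagents for the biomarker panel of claim 1 or 2.
  5. The kit of claim 3 or 4, wherein the plasma sample is a cancer plasma sample, preferably a colorectal cancer plasma sample, for example a bowel cancer plasma sample, a gastric cancer plasma sample, or an endometrial cancer plasma sample.
  6. The kit of claim 3, wherein the microsatellite stability status includes microsatellite instability-high (MSI-H), microsatellite instability-low (MSI-L), and microsatellite stable (MSS).
  7. The kit of any one of claims 3-6, wherein the detection reagents are reagents for next-generation sequencing (NGS).
  8. Use of the biomarker panel of claim 1 or 2 in detecting microsatellite stability status in a plasma sample.
  9. The use of claim 8, wherein the plasma sample is a cancer plasma sample, preferably a colorectal cancer plasma sample, for example a bowel cancer plasma sample, a gastric cancer plasma sample, or an endometrial cancer plasma sample.
  10. The use of claim 9, wherein the microsatellite stability status includes microsatellite instability-high (MSI-H), microsatellite instability-low (MSI-L), and microsatellite stable (MSS).
  11. Use of the biomarker panel of claim 1 or 2 in the non-invasive diagnosis, prognosis evaluation, treatment selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.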
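  12. A method for determining microsatellite marker sites that can be used for microsatellite instability detection in plasma samples, comprising the following steps:
    1) detecting the microsatellite sites in the sequenced region of the samples;
    2) for any microsatellite site i, counting from the NGS data the number of sequencing reads of each repeat length type;
    3) for any microsatellite site, determining the repeat length feature of the site in the microsatellite stable (MSS) state and the repeat length feature of the site in the microsatellite instability-high (MSI-H) state; wherein the MSS length feature is a minimal range of consecutive lengths such that, in MSS samples, the corresponding reads exceed 75% of all reads supporting the site; and the MSI-H length feature is a range of consecutive lengths that discriminates strongly between MSS and MSI-H samples, such that a) the reads supported by the range account for less than 0.2% of all reads at the site in MSS samples, and b) they account for 50% or more of all reads at the site in MSI-H samples,
    a microsatellite site having the above features being a microsatellite detection marker site.
  13. The method of claim 12, wherein the samples include samples from normal leukocytes and from the tissue of cancer patients, the cancer preferably being colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.
  14. Microsatellite sites determined by the method of claim 12, comprising one or more of the eight microsatellite sites described in Table 1.
  15. The method of any one of claims 12-14, wherein the microsatellite instability detection is used for the non-invasive diagnosis, prognosis evaluation, treatment selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.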
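  16. An NGS-based method for determining the stability status of microsatellite sites from a plasma sample of a cancer patient, comprising the following steps:
    1) determining by NGS the repeat length features of a plurality of microsatellite sites in the plasma sample and in MSS plasma samples used as reference samples, the plurality of microsatellite sites including one or more microsatellite sites selected from the eight microsatellite sites shown in Table 1;
    2) for each microsatellite site in 1), calculating its enrichment index Zscore;
    3) summing the enrichment indices Zscore of all microsatellite sites to obtain the index MSscore used to determine the microsatellite status of the sample;
    4) calculating the mean and standard deviation (SD) of the MSscore of the MSS plasma reference samples, and taking mean + 3SD as the threshold (cutoff);
    5) for a plasma sample from a cancer patient, determining the sample to be MSI-H when its MSscore > cutoff, and MSS when its MSscore ≤ cutoff.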
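  17. The method of claim 16, wherein the Zscore is evaluated from H_s,
    $$H_s = -\log\left(P_s(X > k_s)\right),$$
    and
    $$P_s(X > k_s) = \sum_{j=k_s+1}^{\min(n,K)} \frac{\binom{K}{j}\binom{N-K}{n-j}}{\binom{N}{n}},$$
    where, at site s, N is the total number of reads in the MSI-H-state and MSS-state repeat length sets, K is the total number of reads in the MSI-H-state repeat length set, and N-K is the total number of reads in the MSS-state repeat length set. Correspondingly, n and k (written k_s above) are the numbers of the corresponding reads in the sample under test.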
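  18. The method of claim 16, wherein MSscore is calculated by the following formula:
    $$\mathrm{MSscore} = \sum_{s} \mathrm{Zscore}_s, \qquad \mathrm{Zscore}_s = \frac{H_s - \mathrm{mean}\left(H_s^{\mathrm{MSS}}\right)}{\mathrm{sd}\left(H_s^{\mathrm{MSS}}\right)}$$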
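  19. The method of claim 16, wherein the cancer is colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.
  20. An NGS-based method for detecting a patient's microsatellite instability and disease-related gene variants in order to provide clinical guidance on risk control, treatment, and/or prognosis for the patient or the patient's family, comprising the following steps:
    (1) simultaneously detecting the plurality of microsatellite sites as described in claim 16;
    (2) determining the microsatellite site stability status of the sample according to the method of any one of claims 5-8;
    (3) obtaining the detection results for the one or more disease-related genes from the sequencing results;
    (4) combining the results of steps (2) and (3) to provide clinical guidance on risk control, treatment, and/or prognosis for the patient or the patient's family.
  21. The method of claim 20, wherein the disease is cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.
  22. A kit for use in the method of any one of claims 12-20, comprising reagents for detecting the plurality of microsatellite sites.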
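  23. A device for determining microsatellite marker sites for use in microsatellite instability detection in plasma samples, characterized in that the device comprises:
    a sequencing data reading module for reading sample sequencing data obtained and stored in a sequencing device;
    a microsatellite marker site detection module for analyzing the sample sequencing data to detect all microsatellite sites in the sequenced region of the sample;
    a repeat length type determination module for counting, for any microsatellite site i, the number of sequencing reads of each repeat length type from the sample sequencing data read by the sequencing data reading module; and
    a determination module for determining whether any microsatellite site i is a microsatellite marker site, the determination module comprising a first analysis module, a second analysis module, and a third analysis module, wherein
    the first analysis module is used to determine the repeat length feature of the site in the microsatellite stable (MSS) state and to determine whether the corresponding reads in MSS samples exceed 75% of all reads supporting the site, wherein the MSS length feature is a minimal range of consecutive lengths; a positive result is recorded as "+" and a negative result as "-";
    the second analysis module is used to determine the repeat length feature of the site in the microsatellite instability-high (MSI-H) state, wherein the MSI-H length feature is a range of consecutive lengths that discriminates strongly between MSS and MSI-H samples, and to determine a) whether the reads within that range account for less than 0.2% of all reads at the site in MSS samples, a positive result being recorded as "+" and a negative result as "-",
    and b) whether the reads within that range account for 50% or more of all reads at the site in MSI-H samples, a positive result being recorded as "+" and a negative result as "-"; and
    the third analysis module is used to analyze the results of the first and second analysis modules and, when three positive results (three "+") are obtained, to determine that microsatellite site i is a microsatellite marker site.
  24. The device of claim 23, wherein the samples include samples from normal leukocytes and from the tissue of cancer patients, the cancer preferably being colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.
  25. Microsatellite sites determined by the device of claim 23, comprising one or more of the eight microsatellite sites described in Table 1.
  26. The device of claim 23, wherein the microsatellite instability detection is used for the non-invasive diagnosis, prognosis evaluation, treatment selection, or genetic screening of cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer, or endometrial cancer.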
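  27. A device for determining the stability status of microsatellite sites from a plasma sample of a cancer patient based on next-generation high-throughput sequencing, characterized in that the device comprises:
    a sequencing data reading module for reading sample sequencing data obtained from and stored in a sequencing apparatus;
    a repeat-length feature determination module for analyzing the sample sequencing data to obtain the repeat-sequence length features of a plurality of microsatellite sites in the plasma sample and in MSS plasma samples serving as reference samples, the plurality of microsatellite sites comprising one or more microsatellite sites selected from the 8 microsatellite sites shown in Table 1;
    an enrichment index calculation module for calculating the enrichment index Zscore of a microsatellite site;
    a microsatellite status index calculation module for summing the enrichment indices Zscore of all microsatellite sites to obtain the index MSscore used to judge the microsatellite status of the sample;
    a threshold calculation module for calculating the mean and the standard deviation SD of the MSscore of the MSS plasma samples serving as reference samples, and taking mean+3SD as the threshold cutoff;
    a microsatellite site stability status determination module for comparing the index MSscore with the threshold cutoff: for a plasma sample from a cancer patient, the sample is determined to be MSI-H when its MSscore>cutoff, and MSS when its MSscore≤cutoff.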
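  28. The device of claim 27, characterized in that the Zscore is evaluated by H_s,
    H_s = -log(P_s(X > k_s)),
    and
    Figure PCTCN2019109036-appb-100003
    wherein N is the total number of reads in the MSI-H-state and MSS-state repeat-length sets, K is the total number of reads in the MSI-H-state repeat-length set, and N-K is the total number of reads in the MSS-state repeat-length set; correspondingly, n and k are the numbers of the respective reads in the sample to be tested.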
  29. The device of claim 27, characterized in that the MSscore is calculated based on the following formula:
    Figure PCTCN2019109036-appb-100004
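  30. The device of claim 27, characterized in that the disease is cancer, preferably colorectal cancer (e.g., bowel cancer), gastric cancer or endometrial cancer.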
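
As an illustration only, the marker-site test of claim 23 and the scoring rule of claims 16-17 (mirrored by device claims 27-28) can be sketched in Python. The sketch rests on assumptions that the claims as reproduced here do not confirm: the formula image for P_s is not available in this text, so the probability is assumed to be an upper hypergeometric tail, which is what the definitions of N, K, n and k suggest; the logarithm is taken as natural; SD is taken as the sample standard deviation; "50% or more" is read as inclusive; and MSscore is taken as the plain sum stated in claim 16, since the formula image of claim 18 is likewise unavailable. All function and parameter names are hypothetical.

```python
import math
from statistics import mean, stdev


def is_marker_site(mss_frac_in_mss, msih_frac_in_mss, msih_frac_in_msih):
    """Apply the three '+' checks of claim 23 to one site.

    Arguments are read fractions (0..1) at that site:
      mss_frac_in_mss   - reads in the MSS length range, in MSS samples
      msih_frac_in_mss  - reads in the MSI-H length range, in MSS samples
      msih_frac_in_msih - reads in the MSI-H length range, in MSI-H samples
    """
    check_1 = mss_frac_in_mss > 0.75       # greater than 75%
    check_2a = msih_frac_in_mss < 0.002    # less than 0.2%
    check_2b = msih_frac_in_msih >= 0.50   # 50% or more (assumed inclusive)
    return check_1 and check_2a and check_2b


def hypergeom_upper_tail(N, K, n, k):
    """P(X > k) for X ~ Hypergeometric(N, K, n). ASSUMED form of P_s."""
    total = math.comb(N, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k + 1, min(n, K) + 1))
    return hits / total


def enrichment_score(N, K, n, k):
    """H_s = -log(P_s(X > k_s)); the natural log is an assumption."""
    p = hypergeom_upper_tail(N, K, n, k)
    return math.inf if p == 0 else -math.log(p)


def ms_score(site_scores):
    """MSscore as the sum of per-site enrichment indices (claim 16, step 3)."""
    return sum(site_scores)


def cutoff_from_reference(reference_ms_scores):
    """mean + 3*SD over MSS reference plasma samples (claim 16, step 4)."""
    return mean(reference_ms_scores) + 3 * stdev(reference_ms_scores)


def call_status(sample_ms_score, cutoff):
    """MSI-H if MSscore > cutoff, otherwise MSS (claim 16, step 5)."""
    return "MSI-H" if sample_ms_score > cutoff else "MSS"
```

With hypothetical reference MSscores of 1.0, 2.0 and 3.0, the cutoff would be 2.0 + 3 × 1.0 = 5.0, so a sample scoring exactly 5.0 would still be called MSS, because the claim requires MSscore to be strictly greater than the cutoff for an MSI-H call.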
PCT/CN2019/109036 2018-09-29 2019-09-29 Method for detecting microsatellite stability status and genomic changes from plasma based on next-generation sequencing WO2020063964A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
AU2019351522A AU2019351522A1 (en) 2018-09-29 2019-09-29 Second generation sequencing-based method for detecting microsatellite stability and genome changes by means of plasma
US17/281,071 US20210355544A1 (en) 2018-09-29 2019-09-29 Second generation sequencing-based method for detecting microsatellite stability and genome changes by means of plasma
CA3114465A CA3114465A1 (en) 2018-09-29 2019-09-29 Next-generation sequencing-based method for detection of microsatellites stability and genomic changes in plasma samples
JP2021517643A JP2022503916A (ja) 2018-09-29 2019-09-29 Next-generation sequencing-based method for detecting microsatellite stability and genomic changes in plasma samples
BR112021005966-0A BR112021005966B1 (pt) 2018-09-29 2019-09-29 Kits, uses of a biomarker panel, methods for determining the stability of microsatellite loci and for detecting microsatellite instability, as well as device
EP19867890.6A EP3859010A4 (en) 2018-09-29 2019-09-29 Second generation sequencing-based method for detecting microsatellite stability and genome changes by means of plasma

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201811149011.0 2018-09-29
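CN201811149011.0A CN109207594B (zh) 2018-09-29 2018-09-29 Method for detecting microsatellite stability status and genomic changes from plasma based on next-generation sequencing
CN201811149015.9A CN109182525B (zh) 2018-09-29 2018-09-29 Microsatellite biomarker combination, detection kit and use thereof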
CN201811149015.9 2018-09-29

Publications (1)

Publication Number Publication Date
WO2020063964A1 (zh) 2020-04-02

Family

ID=69951215

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/109036 WO2020063964A1 (zh) 2018-09-29 2019-09-29 Method for detecting microsatellite stability status and genomic changes from plasma based on next-generation sequencing

Country Status (6)

Country Link
US (1) US20210355544A1 (zh)
EP (1) EP3859010A4 (zh)
JP (1) JP2022503916A (zh)
AU (1) AU2019351522A1 (zh)
CA (1) CA3114465A1 (zh)
WO (1) WO2020063964A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116705157A (zh) * 2022-03-28 2023-09-05 北京吉因加医学检验实验室有限公司 Method and device for detecting microsatellite status of plasma samples based on next-generation sequencing

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
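CN114220483B (zh) * 2021-12-03 2022-11-15 广州达安临床检验中心有限公司 Markers, use, method and device for microsatellite stability status detection
CN115954049B (zh) * 2023-03-13 2023-05-09 广州迈景基因医学科技有限公司 Method, system and storage medium for detecting the status of microsatellite instability sites
CN116543835B (zh) * 2023-04-21 2024-02-06 苏州吉因加生物医学工程有限公司 Method and device for detecting microsatellite status of plasma samples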

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
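CN106755501A (zh) * 2017-01-25 2017-05-31 广州燃石医学检验所有限公司 Method for simultaneously detecting microsatellite site stability and genomic changes based on next-generation sequencing
CN107513565A (zh) * 2017-09-06 2017-12-26 南京世和基因生物技术有限公司 Microsatellite instability site combination, detection kit and application thereof
CN109182525A (zh) * 2018-09-29 2019-01-11 广州燃石医学检验所有限公司 Microsatellite biomarker combination, detection kit and use thereof
CN109207594A (zh) * 2018-09-29 2019-01-15 广州燃石医学检验所有限公司 Method for detecting microsatellite stability status and genomic changes from plasma based on next-generation sequencing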

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"National Comprehensive Cancer Network (NCCN, 2016 Version 2) guidelines", 2016
See also references of EP3859010A4

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
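CN116705157A (zh) * 2022-03-28 2023-09-05 北京吉因加医学检验实验室有限公司 Method and device for detecting microsatellite status of plasma samples based on next-generation sequencing
CN116705157B (zh) * 2022-03-28 2024-01-30 北京吉因加医学检验实验室有限公司 Method and device for detecting microsatellite status of plasma samples based on next-generation sequencing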

Also Published As

Publication number Publication date
CA3114465A1 (en) 2020-04-02
EP3859010A4 (en) 2022-06-29
AU2019351522A1 (en) 2021-05-27
US20210355544A1 (en) 2021-11-18
EP3859010A1 (en) 2021-08-04
BR112021005966A2 (pt) 2021-06-29
JP2022503916A (ja) 2022-01-12

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application; Ref document number: 19867890; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase; Ref document number: 3114465; Country of ref document: CA; Ref document number: 2021517643; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase; Ref country code: DE
REG Reference to national code; Ref country code: BR; Ref legal event code: B01A; Ref document number: 112021005966; Country of ref document: BR
WWE WIPO information: entry into national phase; Ref document number: 2019867890; Country of ref document: EP
ENP Entry into the national phase; Ref document number: 2019867890; Country of ref document: EP; Effective date: 20210429
ENP Entry into the national phase; Ref document number: 2019351522; Country of ref document: AU; Date of ref document: 20190929; Kind code of ref document: A
ENP Entry into the national phase; Ref document number: 112021005966; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20210326
Effective date: 20210326