CN114155911A - Method and system for correcting tumor mutation load - Google Patents

Publication number
CN114155911A
Authority
CN
China
Prior art keywords
tumor
mutation load
subject
mutation
sequencing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111492879.2A
Other languages
Chinese (zh)
Inventor
杨玲
郝时光
付骁睿
易玉婷
刘涛
管彦芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Genehome Technology Co ltd
Original Assignee
Shenzhen Genehome Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Genehome Technology Co ltd
Priority to CN202111492879.2A
Publication of CN114155911A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50: Mutagenesis
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B20/30: Detection of binding sites or motifs
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method and system for correcting tumor mutation load, the method comprising: correcting, by means of a formula, the tumor mutation load obtained from the sequencing data of a test sample taken from a subject, to obtain a corrected tumor mutation load. Correcting the tumor mutation load allows the efficacy of drug treatment to be distinguished more accurately.

Description

Method and system for correcting tumor mutation load
Technical Field
The invention relates to the field of gene detection, in particular to a method and a system for correcting tumor mutation load.
Background
In patients not selected by biomarkers, the response rate to PD-L1 inhibitors and/or PD-1 inhibitors is only about 20%. Indicators that predict the efficacy of these inhibitors are therefore an active area of research.
Detection of PD-L1 expression is required for first-line immune checkpoint inhibitor monotherapy in non-small cell lung cancer. However, PD-L1 as an indicator has limitations. First, PD-L1 expression is applicable only to a limited range of cancer types and drugs. Second, PD-L1 expression is unevenly distributed in tumor tissue, which easily causes false-negative results. In addition, results are inconsistent across detection antibodies and platforms. Finally, obtaining sufficient tumor tissue for molecular testing is often difficult in clinical practice, particularly for patients with advanced disease.
Tumor mutation load is a potential efficacy predictor under current study. It is defined as the number of somatic mutations in the coding region detected per million bases (Mb) of the genomic region, specifically the total number of somatic base substitutions (SNVs) and insertion or deletion variations (Indels) in the coding region. Studies have shown that tumor mutation load can be used to estimate the overall neoantigen load of a patient (Rizvi, N.A. et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science 348, 124-128, 2015). Studies also show that the objective response rate (ORR) and the durable clinical benefit rate (DCB%) of patients with high tumor mutation load are superior to those of patients with low tumor mutation load. Tumor mutation load can be detected from tumor tissue or from plasma ctDNA. Compared with tissue sections, blood test results are more stable and are not affected by sampling bias. Plasma-based DNA detection techniques are therefore increasingly common (Wan, J.C. et al. Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nat. Rev. Cancer 17, 223-238, 2017).
However, methods and thresholds for assessing tumor mutational burden are still under investigation.
Disclosure of Invention
According to a first aspect, in an embodiment, there is provided a method of correcting tumor mutation load, comprising: correcting the tumor mutation load obtained from the sequencing data of a test sample taken from a subject by using the following formula to obtain the corrected tumor mutation load:
$$\text{corrected tumor mutation load} = \frac{\text{tumor mutation load}}{\text{maximum somatic mutation frequency}} \times \alpha \qquad (1)$$
In formula (1), α is a positive number.
According to a second aspect, in an embodiment, there is provided a prediction method comprising: obtaining the corrected tumor mutation load of the sequencing data of a test sample of a subject according to the method of the first aspect, and predicting the subject to be a high tumor mutation load subject or a low tumor mutation load subject according to the magnitude relationship between the corrected tumor mutation load and a threshold.
According to a third aspect, in an embodiment, there is provided a system for correcting tumor mutation load, comprising: a mutation load correction device, configured to correct the tumor mutation load obtained from the sequencing data of a test sample by using the following formula:
$$\text{corrected tumor mutation load} = \frac{\text{tumor mutation load}}{\text{maximum somatic mutation frequency}} \times \alpha \qquad (1)$$
In formula (1), α is a positive number.
According to a fourth aspect, in an embodiment, there is provided a prediction system comprising: a prediction device, configured to obtain the corrected tumor mutation load of the sequencing data of a test sample of a subject according to the method of the first aspect, and to predict the subject to be a high tumor mutation load subject or a low tumor mutation load subject according to the magnitude relationship between the corrected tumor mutation load and a threshold.
According to a fifth aspect, in an embodiment, there is provided an apparatus comprising:
a memory for storing a program;
a processor for implementing the method according to the first aspect and/or the second aspect by executing the program stored in the memory.
According to a sixth aspect, in an embodiment, there is provided a computer readable storage medium having a program stored thereon, the program being executable by a processor to implement the method of the first and/or second aspect.
The method and system for correcting tumor mutation load correct the tumor mutation load and thereby allow the efficacy of drug treatment to be distinguished more accurately.
Drawings
FIG. 1 is a flow chart of sample detection according to an embodiment.
Detailed Description
The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. Like elements in different embodiments are given like associated reference numbers. In the following description, numerous details are set forth to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted, or replaced with other elements, materials, or methods, in different situations. In some instances, certain operations related to the present application are not shown or described in detail, to avoid obscuring the core of the present application with excessive description. A detailed description of these operations is not necessary, because those skilled in the art can fully understand them from the description in the specification and from general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Likewise, the steps or actions in the method descriptions may be reordered or interchanged in ways that are apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings serve only to describe certain embodiments and do not imply a required order, unless it is otherwise stated that a certain order must be followed.
The numbering of the components as such, e.g., "first", "second", etc., is used herein only to distinguish the objects as described, and does not have any sequential or technical meaning. The terms "connected" and "coupled" when used herein, unless otherwise indicated, include both direct and indirect connections (couplings).
Definitions
Herein, unless otherwise stated, cfDNA (circulating free DNA or cell-free DNA) refers to partially degraded, endogenous DNA that is free of cells and found in circulating blood or other body fluids.
Herein, unless otherwise indicated, ctDNA (circulating tumor DNA) refers to the tumor-derived DNA fragments within cfDNA, usually originating from a primary tumor or a new metastatic tumor, which are shed after tumor cells rupture and enter the peripheral blood circulation or other body fluids.
Herein, PR refers to partial response: the sum of the diameters of the target lesions is reduced by at least 30%, with the baseline sum of target lesion diameters as reference, and the reduction is maintained for at least 4 weeks.
Herein, PD refers to progressive disease: the sum of all target lesion diameters is increased by at least 20%, taking as reference the smallest sum of target lesion diameters (including the baseline sum); in addition, the absolute value of the sum of the target lesion diameters must increase by at least 5 mm. The appearance of new lesions also counts as progression.
Herein, SD refers to stable disease: taking the smallest sum of target lesion diameters as reference, the target lesions show neither enough shrinkage to qualify as PR nor enough increase to qualify as PD. Herein, RECIST 1.1 is used to assess efficacy, and DCB (durable clinical benefit) is defined as PR or SD lasting more than 6 months.
As used herein, "highest somatic mutation frequency," also referred to as "highest somatic mutation abundance," refers to the maximum value of the mutation frequency of the detected somatic mutations. For example, when three somatic mutations were detected in sample a, the mutation frequencies were 5%, 10% and 20%, respectively, and the highest somatic mutation frequency in sample a was 20%.
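The definition above can be illustrated with a minimal Python sketch. The function name and the choice to represent frequencies as fractions (0.20 for 20%) are assumptions made here for illustration; they are not taken from the patent.

```python
def highest_somatic_mutation_frequency(frequencies):
    """Return the largest mutation frequency among the detected somatic mutations.

    `frequencies` holds one value per detected somatic mutation,
    expressed as a fraction (assumption: 0.20 means 20%).
    """
    if not frequencies:
        raise ValueError("no somatic mutations were detected")
    return max(frequencies)


# Sample A from the text: three somatic mutations at 5%, 10% and 20%.
sample_a = [0.05, 0.10, 0.20]
print(highest_somatic_mutation_frequency(sample_a))  # prints 0.2
```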
As used herein, "tumor mutational burden" refers to the total number of somatic SNV and Indel variations detected per million bases.
As used herein, "driver mutation" refers to a mutation that drives cancer progression.
As used herein, "Indel," short for insertion-deletion, refers to a mutation that involves the insertion or deletion of nucleotides in the genome of a subject.
As used herein, "mutation" refers to a variation from a known reference sequence and includes mutations such as Single Nucleotide Variants (SNVs), copy number variants or variations (CNVs)/aberrations, insertions or deletions (indels), gene fusions, transversions, translocations, frameshifts, duplications, and epigenetic variants. The mutation may be a germline mutation or a somatic mutation. In some embodiments, the reference sequence for comparison purposes is a wild-type genomic sequence, typically the human genome, of the species of the subject providing the test sample.
As used herein, "single nucleotide variant" or "SNV" refers to a mutation or variation of a single nucleotide that occurs at a particular location in a genome.
As used herein, "second-generation sequencing," also known as "next-generation sequencing" (NGS), has increased throughput compared with traditional Sanger-based and capillary electrophoresis-based methods; for example, it can produce hundreds of thousands of relatively short sequence reads at a time. Some examples of next-generation sequencing techniques include, but are not limited to, sequencing-by-synthesis, sequencing-by-ligation, and sequencing-by-hybridization. In one embodiment, the basic principle of second-generation sequencing is as follows: the 3'-OH of each dNTP is modified with an azide group, the RTG (Reversible Terminating Group); the 4 bases are each linked to a different fluorescent molecule; during DNA synthesis, the RTG plays a role similar to that of a ddNTP and stops the reaction; after each synthesis reaction is terminated and the signal is read, the RTG and the fluorescent molecule are removed and the next cycle is performed (see website: https:// www.jianshu.com/p/c9ade91 accessed). In one embodiment, second-generation sequencing includes, but is not limited to, the Illumina cyclic SBS method, the BGI DNA nanoball amplification technology, and the like, and second-generation sequencing platforms include, but are not limited to, the Geneseq 2000 sequencing platform, the MGISEQ-T7 sequencing platform, Illumina sequencing platforms, and the like.
Antibodies against programmed death receptor 1 (PD-1) and programmed death ligand 1 (PD-L1) prevent activation of the immunosuppressive pathway exploited by tumor cells. PD-1/PD-L1 immunotherapy is a new immunotherapy of great current interest. It aims to fight tumors with the body's own immune system: by blocking the PD-1/PD-L1 signaling pathway, it strips tumor cells of their ability to protect themselves. It has the potential to treat many types of tumor and is expected to substantially improve the overall survival of tumor patients.
PD-1, named programmed cell death receptor 1 for its involvement in classical programmed cell death, is a member of the B7-CD28 receptor family. It is expressed as an immune checkpoint on activated CD4+ and CD8+ T cells, natural killer (NK) T cells, B cells, activated monocytes, and dendritic cells. PD-1 has two ligands, PD-L1 and PD-L2 (B7-DC). PD-L1 is its cognate ligand, also known as cluster of differentiation 274 (CD274) or B7 homolog 1 (B7-H1), a protein encoded in humans by the CD274 gene. It is constitutively expressed at low levels on antigen-presenting cells, vascular endothelial cells, islet cells, and at immune-privileged sites (placenta, testis, eye), and it is also expressed in a variety of malignancies. PD-L2 is the other ligand of PD-1 and is found on activated dendritic cells and macrophages.
The physiological role of PD-1 is to ensure T cell homeostasis by limiting T cell activation and proliferation. Thus, the ligand PD-L1 binds to PD-1 expressed on the surface of activated T cells to produce an inhibitory signal, reducing cytokine production and T cell proliferation. PD-1 is involved in maintaining peripheral self-tolerance by inhibiting the activation and/or proliferation of autoreactive T cells. That is, when PD-1 binds to its cognate ligand PD-L1, the immune response is suppressed, resulting in immune tolerance and the prevention of damage to normal tissue. Impairment of PD-1/PD-L1 signaling may lead to the development of autoimmune diseases. Signaling through the immune checkpoint receptor PD-1 and PD-L1 is now recognized as a key pathway that modulates the balance between immune activation and tolerance. In tumors, the binding of PD-L1 expressed on tumor cells to PD-1 expressed on T lymphocytes inhibits T cell proliferation and cytokine secretion and increases regulatory T cells (Tregs), which together result in immune tolerance. Studies in solid tumors have demonstrated that PD-1/PD-L1 signaling allows tumors to escape immune surveillance by transforming the tumor microenvironment into a tumor-protective, immunosuppressive environment. In particular, expression of PD-L1 on tumor cells inhibits T cell activation and cytotoxic T lymphocyte (CTL)-mediated tumor lysis. PD-1/PD-L1 signaling promotes tumor growth while also suppressing effector cell-mediated anti-tumor immune responses. These effector cells can remain functional and might be reactivated by blocking the PD-1/PD-L1 axis. The binding of PD-L1 to PD-1 also produces a reverse signal in tumor cells, promoting tumor cell survival and inducing resistance to chemotherapy.
Blocking PD-1/PD-L1 signaling with clinically relevant anti-PD-1 and/or anti-PD-L1 monoclonal antibodies can restore the immune response and has achieved significant clinical effects in solid tumors (including melanoma and lung cancer), providing a very promising new immunotherapeutic strategy for the treatment of malignancies.
In view of the deficiencies of the prior art, there is a need in the art for an effective method of assessing the detected tumor mutational burden to achieve a better predictive effect of therapeutic efficacy.
According to a first aspect, in an embodiment, there is provided a method of correcting tumor mutation load, comprising: correcting the tumor mutation load obtained from the sequencing data of a test sample taken from a subject by using the following formula to obtain the corrected tumor mutation load:
$$\text{corrected tumor mutation load} = \frac{\text{tumor mutation load}}{\text{maximum somatic mutation frequency}} \times \alpha \qquad (1)$$
In formula (1), α is a positive number. Because the maximum somatic mutation frequency may be relatively small, directly dividing the tumor mutation load by it can give a large value; multiplying by the constant α rescales the corrected tumor mutation load to an order of magnitude that is convenient to read and interpret. The specific value of α can be chosen according to the ratio of the tumor mutation load to the maximum somatic mutation frequency, and should not be changed once determined.
In one embodiment, α is a positive number.
In one embodiment, 0 < α < 1.
In one embodiment, α includes, but is not limited to, 0.1.
In one embodiment, the maximum somatic mutation frequency of the sequencing data of the test sample is greater than or equal to a threshold; in other words, only samples whose maximum somatic mutation frequency is greater than or equal to the threshold (for example, 2%) are corrected.
In one embodiment, the threshold may be 2%. In other words, the tumor mutation load of test samples with a maximum somatic mutation frequency of < 2% is not corrected, for two reasons. First, samples with a low maximum somatic mutation frequency mostly show good efficacy, so no further predictive indicator is needed. Second, correcting a sample with a maximum somatic mutation frequency of < 2% by this method yields a high value, which easily causes false positives.
The specific threshold may be set as needed, for example 1% or another value.
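The correction and its frequency gate can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function name is hypothetical, frequencies are assumed to be fractions (0.02 for 2%), and returning `None` for an uncorrected sample is a design choice made here. The defaults α = 0.1 and 2% are the example values given in the text.

```python
from typing import Optional

ALPHA = 0.1                 # example value from the text; kept fixed once chosen
MIN_MAX_FREQUENCY = 0.02    # example 2% gate from the text, as a fraction (assumption)


def corrected_tumor_mutation_load(
    tmb: float,
    max_somatic_frequency: float,
    alpha: float = ALPHA,
    min_max_frequency: float = MIN_MAX_FREQUENCY,
) -> Optional[float]:
    """Apply corrected TMB = TMB / maximum somatic mutation frequency * alpha.

    Returns None when the maximum somatic mutation frequency is below the
    gate, because such samples are left uncorrected.
    """
    if alpha <= 0:
        raise ValueError("alpha must be a positive number")
    if max_somatic_frequency < min_max_frequency:
        return None  # not corrected: the frequency is below the threshold
    return tmb / max_somatic_frequency * alpha


# Hypothetical sample: TMB of 8 mutations/Mb, maximum frequency 20%.
print(corrected_tumor_mutation_load(8.0, 0.20))
# Hypothetical sample below the 2% gate: left uncorrected.
print(corrected_tumor_mutation_load(8.0, 0.01))
```

Because the corrected value scales with the unit chosen for the frequency (fraction or percent), that unit and α would need to be fixed together before any threshold is derived.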
In one embodiment, the frequency of somatic mutation is greater than or equal to 1 ‰.
In one embodiment, the frequency of somatic mutation is greater than or equal to 5 ‰.
In one embodiment, the somatic mutations are SNVs (single nucleotide variations) and Indels (insertions/deletions).
In one embodiment, the somatic mutation comprises a synonymous mutation.
A synonymous mutation is a mutation of a base pair in a DNA fragment that does not alter the encoded amino acid, because the codon at that position is degenerate: the codons before and after the mutation encode the same amino acid. For example, CTA and CTG both encode leucine, so a mutation of A to G at the third position is a synonymous mutation.
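The CTA/CTG example can be checked with a short Python sketch that builds the standard genetic code and compares the amino acids encoded before and after a mutation. The helper name `is_synonymous` is hypothetical, and the sketch assumes the standard nuclear code and a codon given on the coding strand.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """True when both codons encode the same amino acid (or both are stop codons)."""
    return CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]


print(is_synonymous("CTA", "CTG"))  # prints True: both encode leucine
print(is_synonymous("CTA", "CCA"))  # prints False: leucine becomes proline
```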
In one embodiment, the somatic mutation does not comprise a driver mutation.
In one embodiment, the sequencing data is cfDNA sequencing data.
In one embodiment, the sequencing data is second generation sequencing data.
In one embodiment, the method for detecting the tumor mutation load comprises: extracting the somatic mutations of the coding region from the sequencing data of the test sample, determining the number of somatic mutations (i.e., the mutation number), and calculating the tumor mutation load according to the following formula:
$$\text{tumor mutation load} = \frac{\text{mutation number}}{\text{sequencing length (Mb)}}$$
In one embodiment, the mutation number refers to the number of somatic mutations.
In one embodiment, the number of somatic mutations refers to the number of somatic mutations in the coding region.
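As a worked illustration of the ratio above, the following Python sketch divides a mutation count by a sequencing length expressed in megabases. The function name and the choice to pass the length in base pairs are assumptions made for illustration, and the example numbers are hypothetical.

```python
def tumor_mutation_load(n_somatic_mutations: int, sequencing_length_bp: int) -> float:
    """Somatic mutations in the coding region per megabase of sequenced region."""
    if sequencing_length_bp <= 0:
        raise ValueError("sequencing length must be positive")
    sequencing_length_mb = sequencing_length_bp / 1_000_000
    return n_somatic_mutations / sequencing_length_mb


# Hypothetical sample: 12 coding-region somatic mutations over a 1.5 Mb region.
print(tumor_mutation_load(12, 1_500_000))  # prints 8.0
```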
In one embodiment, the sequencing length is the size of the region targeted by sequencing, calculated as the total length covered by the probes designed to capture the genes of interest, with the overlaps between probes removed.
In one embodiment, the probes that capture the genes of a target region are sample probes, and the probes that capture the genes of the whole genome are whole-genome probes.
In one embodiment, the region targeted by sequencing is the target region captured by the sample probes when a targeted tumor mutation load is detected, or the corresponding region captured by the whole-genome probes when a whole-genome tumor mutation load is detected.
In one embodiment, the sequencing length is the sequencing length of the coding region in Mb, i.e., megabases.
In one embodiment, the sequencing length is the size of the coding region targeted for capture by the probes used for sequencing.
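One way to compute a sequencing length with probe overlaps removed is to merge overlapping probe intervals per chromosome and sum the merged lengths. The Python sketch below assumes probes are given as (chromosome, start, end) tuples with half-open coordinates, as in BED files; that representation, the function name, and the probe coordinates are all hypothetical.

```python
def merged_length_bp(probes):
    """Total number of bases covered by the probes, counting overlaps only once.

    `probes` is an iterable of (chrom, start, end) with half-open coordinates.
    """
    by_chrom = {}
    for chrom, start, end in probes:
        by_chrom.setdefault(chrom, []).append((start, end))

    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or adjacent probe: extend the current merged span.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Two overlapping probes on chr1 (100-320 after merging) and one on chr2.
probes = [("chr1", 100, 220), ("chr1", 200, 320), ("chr2", 50, 170)]
print(merged_length_bp(probes))  # prints 340
```

Dividing the result by 1,000,000 gives the sequencing length in Mb.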
In some embodiments, the tumor mutational burden comprises a number of somatic mutations within a region of the tumor genome. In some embodiments, the somatic mutation comprises a DNA alteration in a non-germline cell and is typically present in a cancer cell. In some embodiments, somatic mutations can be detected in tumor tissue samples and/or body fluid samples.
In some embodiments, somatic mutations refer to mutations that occur in somatic cells other than sex cells, i.e., that do not result in a genetic alteration in the offspring, but that may result in a change in the genetic structure of certain cells of the present generation.
In one embodiment, the sequencing data is region-capture second-generation sequencing data, i.e., data obtained by probe capture followed by second-generation sequencing.
In one embodiment, the sample to be tested includes, but is not limited to, a body fluid sample.
In one embodiment, the body fluid sample includes, but is not limited to, at least one of blood, plasma.
According to a second aspect, in an embodiment, there is provided a prediction method comprising: obtaining the corrected tumor mutation load of the sequencing data of a test sample of a subject according to the method of the first aspect, and predicting the subject to be a high tumor mutation load subject or a low tumor mutation load subject according to the magnitude relationship between the corrected tumor mutation load and a threshold.
It should be noted that this prediction is only an intermediate reference result and cannot be used as a final diagnosis. In actual diagnosis, a doctor usually needs to make a comprehensive judgment based on the subject's clinical symptoms, medical history, family history, and other test results. For example, for a patient with non-small cell lung cancer, the results of X-ray examination, bronchoscopy, cytological examination, chest examination, ECT examination, mediastinal examination, and the like usually need to be combined to reach the final diagnosis. Therefore, the method for predicting therapeutic efficacy provided by the present invention is neither a method for diagnosing diseases nor a method for treating diseases.
In one embodiment, the subject is predicted to be a high tumor mutation burden subject if the corrected tumor mutation burden is greater than or equal to the threshold value, otherwise, the subject is predicted to be a low tumor mutation burden subject.
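The decision rule above amounts to a single comparison, sketched here in Python. The function name and the string labels are hypothetical; the threshold of 4.0 in the example is an arbitrary illustrative number, not a value from the patent.

```python
def predict_tmb_group(corrected_tmb: float, threshold: float) -> str:
    """Label a subject 'high' when the corrected TMB reaches the threshold, else 'low'."""
    return "high" if corrected_tmb >= threshold else "low"


# Hypothetical threshold of 4.0; a value equal to the threshold counts as high.
print(predict_tmb_group(4.0, 4.0))  # prints high
print(predict_tmb_group(3.9, 4.0))  # prints low
```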
For patients with high tumor mutation load, the method can predict that the patients can respond well to immunotherapy drugs, namely the treatment effect is good; one major application scenario is to test patients prior to immunotherapy, and in the case of patients with high tumor mutation load, specific immunotherapeutic drugs may be considered in their treatment regimen.
In one embodiment, the method further comprises predicting the treatment effect of the drug on the disease according to the size relation between the corrected tumor mutation load and the threshold value.
In one embodiment, the upper quartile (or another quantile) of the corrected tumor mutation load is typically used as the threshold when efficacy information is insufficient; in some embodiments, if complete efficacy results are available, an efficacy analysis may be performed by the method of the invention, and the value that minimizes the HR (hazard ratio) is selected as the threshold.
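The upper-quartile option can be sketched with the Python standard library. The text does not say which quantile convention is used, so `method="inclusive"` below is an assumption, as are the function name and the cohort values; the hazard-ratio option would need survival data and is not sketched.

```python
import statistics


def upper_quartile_threshold(corrected_tmb_values):
    """Return the third quartile (75th percentile) of a cohort's corrected TMB values."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 and Q3.
    _q1, _q2, q3 = statistics.quantiles(corrected_tmb_values, n=4, method="inclusive")
    return q3


# Hypothetical cohort of corrected TMB values.
cohort = [1.0, 2.0, 3.0, 4.0, 5.0]
threshold = upper_quartile_threshold(cohort)
high_group = [value for value in cohort if value >= threshold]
print(threshold, high_group)
```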
In one embodiment, the disease includes, but is not limited to, cancer. Cancer is generally referred to as a malignancy.
In one embodiment, the cancer includes, but is not limited to, at least one of non-small cell lung cancer, melanoma, colorectal cancer, bladder cancer, endometrial cancer, cervical cancer, and the like. This is merely an exemplary list and the present invention is applicable to a variety of cancers.
In one embodiment, the drug includes, but is not limited to, at least one of PD-1 inhibitors, PD-L1 inhibitors, and the like, and specifically may include, but is not limited to, at least one of pembrolizumab, atezolizumab, and the like. This is merely an exemplary list of drugs whose efficacy can be predicted using tumor mutation load.
According to a third aspect, in an embodiment, there is provided a system for correcting tumor mutation load, comprising: a mutation load correction device, configured to correct the tumor mutation load obtained from the sequencing data of a test sample by using the following formula:
$$\text{corrected tumor mutation load} = \frac{\text{tumor mutation load}}{\text{maximum somatic mutation frequency}} \times \alpha \qquad (1)$$
In formula (1), α is a positive number.
According to a fourth aspect, in an embodiment, there is provided a prediction system comprising: a prediction device, configured to obtain the corrected tumor mutation load of the sequencing data of a test sample of a subject according to the method of the first aspect, and to predict the subject to be a high tumor mutation load subject or a low tumor mutation load subject according to the magnitude relationship between the corrected tumor mutation load and a threshold.
According to a fifth aspect, in an embodiment, there is provided an apparatus comprising:
a memory for storing a program;
a processor for implementing the method according to the first aspect and/or the second aspect by executing the program stored in the memory.
According to a sixth aspect, in an embodiment, there is provided a computer readable storage medium having a program stored thereon, the program being executable by a processor to implement the method of the first and/or second aspect.
According to a seventh aspect, in one embodiment, the present invention provides a method for calculating the tumor mutation load of a plasma sample based on second-generation sequencing, the method comprising:
1) obtaining a DNA sample library of a subject;
2) hybridizing the DNA sample library with targeted capture probes, sequencing the captured sequences, and determining the somatic mutations in the coding region, thereby determining the number of somatic mutations, wherein the somatic mutations are SNVs and Indels, include synonymous mutations, and do not include driver mutations;
wherein, for the plasma sample, the abundance (or frequency) of a detected mutation is ≥ 1‰, preferably ≥ 5‰;
3) calculating the tumor mutation load according to the following formula:
$$\text{tumor mutation load} = \frac{\text{number of coding-region mutations}}{\text{size of the coding region (Mb)}}$$
wherein the coding-region mutations are the somatic mutations of step 2);
the size of the coding region is the size of the coding region covered by the targeted capture probes in step 2) (unit: Mb, i.e., megabases).
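The counting rules of steps 2) and 3) can be sketched in Python as a filter followed by a ratio. The `Variant` record, its field names, and the function names are hypothetical and stand in for whatever a variant-calling pipeline would actually emit. Frequencies are assumed to be fractions, so 1‰ is 0.001. Synonymous mutations are counted, so no filter is applied on them.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    kind: str               # e.g. "SNV", "Indel", "CNV"
    frequency: float        # mutation abundance as a fraction (assumption)
    is_somatic: bool        # False for germline variants
    in_coding_region: bool
    is_driver: bool


def is_countable(variant: Variant, min_frequency: float = 0.001) -> bool:
    """Somatic coding-region SNV or Indel, not a driver, frequency >= min_frequency."""
    return (
        variant.is_somatic
        and variant.in_coding_region
        and variant.kind in {"SNV", "Indel"}
        and not variant.is_driver
        and variant.frequency >= min_frequency
    )


def plasma_tumor_mutation_load(variants, coding_region_mb: float,
                               min_frequency: float = 0.001) -> float:
    """Number of countable mutations divided by the captured coding-region size in Mb."""
    if coding_region_mb <= 0:
        raise ValueError("coding region size must be positive")
    n_mutations = sum(is_countable(v, min_frequency) for v in variants)
    return n_mutations / coding_region_mb


# Hypothetical calls: only the first two variants satisfy every rule.
calls = [
    Variant("SNV", 0.050, True, True, False),
    Variant("Indel", 0.004, True, True, False),
    Variant("SNV", 0.0005, True, True, False),   # below 1 per thousand
    Variant("SNV", 0.200, True, True, True),     # driver mutation, excluded
    Variant("SNV", 0.480, False, True, False),   # germline, excluded
    Variant("CNV", 0.100, True, True, False),    # not an SNV or Indel
]
print(plasma_tumor_mutation_load(calls, coding_region_mb=1.0))  # prints 2.0
```

Passing `min_frequency=0.005` applies the preferred 5‰ cutoff instead.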
According to an eighth aspect, in one embodiment, the invention provides a method for correcting the tumor mutation load of a plasma sample based on second-generation sequencing, the method comprising:
1) obtaining the tumor mutation load as described in the seventh aspect;
2) calculating the corrected tumor mutation load according to the following formula:
$$\text{corrected tumor mutation load} = \frac{\text{tumor mutation load}}{\text{maximum somatic mutation frequency}} \times \alpha$$
wherein α is a positive number.
In one embodiment, the method further comprises a quality control step for the plasma sample: the tumor mutation load of samples with a maximum somatic mutation abundance (or frequency) of less than 2% is not corrected.
Example 1
Detection, calculation and correction of tumor mutation load of non-small cell lung cancer plasma sample and determination of threshold value
In this example, a peripheral blood sample was taken from a patient with non-small cell lung cancer (the type of cancer was determined by a physician who performed a clinical diagnosis, which was determined prior to taking the sample for testing in this example). 96 plasma samples of the primary-diagnosis non-small cell lung cancer are taken, the tumor mutation load is detected, calculated and corrected, and the threshold value for distinguishing the tumor mutation load is determined according to the upper quartile of the total samples.
1. DNA extraction
Whole blood was first separated into plasma and blood cells. Specifically, 10 mL of peripheral blood was collected and separated promptly (within 4 h for EDTA anticoagulant tubes; within 72 h for Streck tubes) by the following steps:
First, the blood was centrifuged at 1600 × g for 10 min at 4 ℃, and the supernatant was dispensed into 1.5 mL or 2.0 mL centrifuge tubes. After plasma separation, the middle-layer and bottom-layer blood cells were kept for use as normal controls. The supernatant was then centrifuged at 16000 × g for 10 min at 4 ℃ to remove residual cells and transferred to new 1.5 mL or 2.0 mL centrifuge tubes to obtain the desired plasma.
This example uses the gDNA sequencing results of blood cells as a control for excluding germline mutations. Plasma cfDNA was extracted according to the instructions of the QIAamp Circulating Nucleic Acid Kit (Qiagen). Blood-cell gDNA was extracted according to the instructions of the QIAamp DNA Mini Kit. The extracted DNA was then quantified; more than 100 ng of blood-cell gDNA and more than 25 ng of plasma cfDNA were required.
2. Library construction
For blood-cell gDNA, the gDNA was first fragmented to 200-250 bp, and the sample library was then constructed according to the instructions of the [trademark image not reproduced] Ultra™ II DNA library construction kit. For cfDNA isolated from plasma, the sample library was constructed according to the instructions of the same kit.
2.1 End repair and addition of "A"
The end-repair and "A"-addition reaction was prepared as follows:
TABLE 1
Component | Single reaction volume (μL)
End Prep Reaction Buffer | 7
End Prep Enzyme Mix | 3
cfDNA or fragmented DNA | 50
Total volume | 60
In Table 1, both the End Prep Reaction Buffer and the End Prep Enzyme Mix are reagents from the [trademark image not reproduced] Ultra™ II DNA library construction kit.
The mixture in Table 1 was vortexed thoroughly, centrifuged, and then incubated on a thermostatic mixer, first at 20 ℃ for 30 min and then at 65 ℃ for 30 min. After incubation, the mixture was cooled to room temperature and centrifuged briefly in a high-speed centrifuge to bring liquid on the tube wall to the bottom of the tube.
2.2 Adapter (linker) ligation
The adapter ligation reaction premix (Premix) was prepared according to the following table.
TABLE 2
[Table 2 is provided as an image in the original publication and is not reproduced in this text.]
The volume of adapter added depends on the initial amount of DNA, as shown in the following table.
TABLE 3
Sample type | Library input amount | Volume of 15 μM adapter (μL)
Blood cells | 800 ng | 4
cfDNA | >25 ng | 4
Then, 31 μL of the adapter ligation premix and the corresponding volume of adapter were added in sequence to the reaction tube, the volume was brought to 95 μL with ddH2O, and the mixture was vortexed thoroughly and centrifuged. The tube was incubated on the thermostatic mixer at 20 ℃ for 15 min. After incubation, it was centrifuged briefly in a high-speed microcentrifuge to bring liquid on the tube wall to the bottom of the tube.
After the ligation reaction was completed, the adapter-ligated product was purified with magnetic beads and finally redissolved in 25 μL of TE buffer (pH 8.0).
2.3 Pre-capture PCR (Non-C-PCR) to introduce the index
The reaction components were added to the PCR tube in the order shown below, and negative and positive controls were set up.
TABLE 4
Component | Single reaction volume (μL)
index Primer/i7Primer-P7 (10 μM) | 2.5
index Primer/i7Primer-P5 (10 μM) | 2.5
KAPA HiFi HotStart Ready Mix | 25
Adapter-Ligated library | 20
Total volume | 50
The mixture was vortexed thoroughly and centrifuged briefly to bring liquid on the tube wall to the bottom of the tube.
2.3.1 Number of PCR cycles by sample type
TABLE 5
Sample type | Number of PCR cycles
Blood cells | 5
cfDNA | 8
2.3.2 PCR program
The PCR program for the Gene + Seq2000 sequencer was as follows:
TABLE 6
[Table 6 is provided as an image in the original publication and is not reproduced in this text.]
The Non-C-PCR product was purified and finally dissolved in 31 μL of TE buffer (pH 8.0). The purified product was quantified with a microplate reader or a Qubit HS assay, and a small aliquot of the library was analyzed on a LabChip GX Touch microfluidic capillary electrophoresis system to assess fragment size.
3. Target sequence enrichment and sequencing
After the library passed quality control, hybridization capture was performed with the targeted capture probes disclosed in paragraphs 67 to 89 of the published Chinese patent "Sequence combination for detecting tumor mutation load and its design method" (application publication No. CN109427412A), following the instructions provided by the chip manufacturer (Integrated DNA Technologies, IDT). Finally, the product was eluted and redissolved in 21 μL of ddH2O, with the hybridization elution magnetic beads carried along.
3.1 Amplification of hybridization capture products
3.1.1 The magnetic beads from the previous step were removed, and the product was then purified with magnetic beads and finally redissolved in 25 μL of ddH2O. Quality control (QC) was performed; libraries with a concentration above 5 ng/μL were judged qualified and proceeded to sequencing.
3.1.2 Sequencing was performed on a Gene + Seq2000 sequencer according to the manufacturer's instructions. The required data volumes were: an effective depth of 250× for blood cell samples and an effective depth of 1000× for plasma samples.
4. Information analysis
The information analysis follows paragraphs 116 to 125 of the specification of the published Chinese patent "A low-frequency mutation enrichment sequencing method of free target DNA in plasma" (application publication No. CN105063208A), that is, the information analysis flow (RealSeq Pipeline) of the plasma cfDNA low-frequency mutation enrichment sequencing technology ER-Sequence (Enrichment & Rallele Sequence). The specific method is as follows:
4.1 the sequence bases at the two ends of each insert are used as tags, where an insert is a DNA fragment ligated to the adapter primers in the library, and each fragment yields one pair of paired-end sequencing reads; the first 12 bp of read 1 and the first 12 bp of read 2 of a read pair are taken as tags and concatenated, alphabetically smaller tag first, into a 24 bp index for that read pair; a pair whose read 1 tag comes first is marked as positive strand, and a pair whose read 2 tag comes first is marked as reverse strand;
4.2 the indexes are externally sorted so that all duplicate reads sequenced from the same DNA template are gathered together;
4.3 the gathered reads sharing the same index are subjected to center clustering: each large cluster with the same index is split into several small clusters according to the Hamming distance between sequences, such that the Hamming distance between any two read pairs in a small cluster does not exceed 10, thereby distinguishing reads that share an index but derive from different DNA templates;
4.4 the duplicate clusters of the same DNA template obtained in step 4.3 are screened, and a cluster proceeds to subsequent analysis only if the numbers of its positive-strand and reverse-strand reads both reach more than 2 pairs;
4.5 error correction is performed on the clusters that meet the condition in step 4.4, generating a pair of error-free new reads. For each sequenced base of the DNA template, if one base type reaches 80% agreement among the positive-strand reads and also reaches 80% agreement among the reverse-strand reads, that base type is recorded in the new read; otherwise the base is recorded as N. This yields a new read representing the original DNA template sequence;
4.6 the reads are aligned to the reference genome GRCh37 (hg19) with the bwa mem algorithm, and reads with a mapping quality below 30 are filtered out;
4.7 duplicates among the reads obtained in step 4.6 are marked with MarkDuplicates from Picard;
4.8 following GATK best practices, indels at read ends are corrected with IndelRealigner from GATK, and base quality values are recalibrated with BaseRecalibrator from GATK;
4.9 statistics are computed on the reads obtained in step 4.8 to obtain the base type distribution at each site in the capture region, together with the coverage size, average sequencing depth, positive/negative strand pairing rate, and low-frequency mutation rate of the target region;
4.10 Call SNV/Indel: based on a comparison of the test sample with the control sample, somatic SNV/Indel calling (not involving SV or CNV analysis) is performed with Mutect2 and the RealDaler process; the screening parameters are: variant rate at the site in the control ≤ 2%, number of variant reads after error correction ≥ 2, and mutation prediction p value ≤ 0.05;
4.11 variant annotation: each variant is annotated with its function, the number of supporting variant reads, the variant frequency, the amino acid change, and its occurrences in variant databases.
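Steps 4.1, 4.4 and 4.5 above can be illustrated with a short Python sketch. It is a simplified reading of the text, not the RealSeq pipeline itself: the function names are hypothetical, reads are assumed to be equal-length strings already oriented to the same template coordinates, tied tags are labeled positive strand by assumption, and `min_pairs=2` treats the per-strand support requirement as inclusive, which the source wording leaves open.

```python
from collections import Counter

TAG_LEN = 12  # first 12 bases of each read form the tag (step 4.1)


def make_index(read1: str, read2: str) -> tuple[str, str]:
    """Return (24 bp index, strand label) for one read pair."""
    tag1, tag2 = read1[:TAG_LEN], read2[:TAG_LEN]
    if len(tag1) < TAG_LEN or len(tag2) < TAG_LEN:
        raise ValueError("both reads must be at least 12 bases long")
    # The alphabetically smaller tag goes first. "+" means the read 1 tag leads.
    if tag1 <= tag2:
        return tag1 + tag2, "+"
    return tag2 + tag1, "-"


def strand_consensus_base(bases, min_agreement=0.8):
    """Most frequent base if it reaches min_agreement, otherwise None."""
    if not bases:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_agreement else None


def duplex_consensus(plus_reads, minus_reads, min_pairs=2, min_agreement=0.8):
    """Collapse one template cluster into a consensus sequence (steps 4.4-4.5).

    Returns None when either strand has too little support. A position is
    written as "N" unless both strands independently agree on the same base.
    """
    if len(plus_reads) < min_pairs or len(minus_reads) < min_pairs:
        return None
    length = len(plus_reads[0])
    if any(len(r) != length for r in list(plus_reads) + list(minus_reads)):
        raise ValueError("all reads in a cluster must have the same length")
    consensus = []
    for i in range(length):
        plus = strand_consensus_base([r[i] for r in plus_reads], min_agreement)
        minus = strand_consensus_base([r[i] for r in minus_reads], min_agreement)
        consensus.append(plus if plus is not None and plus == minus else "N")
    return "".join(consensus)


index, strand = make_index("ACGTACGTACGTTTT", "AAAAAAAAAAAAGGG")
print(index, strand)
print(duplex_consensus(["ACGT", "ACGT"], ["ACGT", "ACGA"]))
```

Requiring agreement on both strands is what suppresses single-strand artifacts: an error present in only one orientation cannot reach the 80% bar on the other strand and is masked as N.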
5. Calculation of tumor mutational burden
5.1 given the effective depth requirement for plasma sample sequencing data, only SNVs and Indels with a frequency of at least 5‰ were included in the calculation;
5.2 the coding region size used to calculate TMB in this example was 1 Mb.
In this example, the formula for correcting the tumor mutation load is as follows:
[Formula image not reproduced in this text]
6. Detection results
The detection results of this example are as follows:
TABLE 7
[Table 7 is provided as images in the original publication and is not reproduced in this text.]
The samples in Table 7 are samples without efficacy information, i.e., the subjects from whom the samples were taken had not been treated with the inhibitor. The data in this table are used to assess the overall proportion of TMB-H.
Since the upper quartile of the tumor mutation load before correction of the plasma sample of non-small cell lung cancer was 9.36 and the upper quartile of the tumor mutation load after correction was 12.15, in this example, 9 and 12 were selected as thresholds for distinguishing the tumor mutation load before and after correction, respectively, and specifically, subjects having a tumor mutation load before correction of not less than 9 or a tumor mutation load after correction of not less than 12 were judged as subjects having a high tumor mutation load.
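The thresholding logic described above (take the upper quartile of the cohort, then call a subject high when the value is not less than the threshold) can be sketched as follows. The function names are hypothetical, and the quartile interpolation method is an assumption, since the source does not say which convention produced 9.36 and 12.15; different conventions give slightly different values on small cohorts.

```python
import statistics


def upper_quartile(values):
    """Third quartile using the 'inclusive' interpolation of the standard library.

    The interpolation method is an assumption; the source does not state one.
    """
    return statistics.quantiles(values, n=4, method="inclusive")[2]


def classify(tmb, threshold):
    """Label a subject 'high' when tmb is not less than the threshold."""
    return "high" if tmb >= threshold else "low"


# Toy cohort for illustration only; these are not values from Table 7.
cohort = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(upper_quartile(cohort))

# Thresholds chosen in this example: 9 before correction, 12 after correction.
print(classify(9.0, threshold=9), classify(11.5, threshold=12))
```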
Example 2
Detection, calculation and correction of tumor mutation load of non-small cell lung cancer plasma sample and curative effect prediction analysis
In this example, the detection, calculation and correction of tumor mutation load of a plasma sample of non-small cell lung cancer were carried out in reference to example 1.
The detection results of this example are as follows:
TABLE 8
[Table 8 is provided as images in the original publication and is not reproduced in this text.]
The drugs received by the subjects from whom the samples in the above table were taken include atezolizumab, durvalumab, nivolumab, BGB-A317, IBI308, ipilimumab, Jun Zi A, Jun Zi B, and Baiji.
As can be seen from the above table, when the pre-correction tumor mutation load of 9 was selected as the threshold, the hazard ratio of the high tumor mutation load patients to the low tumor mutation load patients among the 50 non-small cell lung cancer patients of this example was 0.78 (95% CI 0.42-1.45, log-rank p = 0.3966); when the post-correction tumor mutation load of 12 was selected as the threshold, the hazard ratio was 0.48 (95% CI 0.27-0.85, log-rank p = 0.0060). Before correction, the DCB of patients with high and low tumor mutation load was 38% (12/32) and 17% (3/18), respectively (Fisher's exact test p = 0.1990); after correction, the DCB of patients with high and low tumor mutation load was 45% (10/22) and 18% (5/28), respectively (Fisher's exact test p = 0.0608). These results indicate that the corrected tumor mutation load predicts efficacy better than the uncorrected tumor mutation load.
Previous studies and the clinical trial data of the inhibitors administered to the patients in this example have shown that these drugs are more effective in patients with a high tumor mutation load. Therefore, the smaller the hazard ratio between the high and low mutation load groups, and the higher the DCB in the high mutation load group relative to the low mutation load group, the more accurate the prediction method of this example.
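The DCB comparison between the high and low groups is a 2×2 contingency test. As an illustration of how such a p value is obtained, the sketch below implements a two-sided Fisher's exact test with only the standard library. It is not the software used in this example, the function name is hypothetical, and it uses the common convention of summing all tables no more probable than the observed one; other two-sided conventions exist and can give slightly different values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. high / low mutation load); columns are outcomes
    (e.g. DCB / no DCB). Sums the hypergeometric probabilities of every
    table with the same margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Counts reported in this example: DCB in 12 of 32 high and 3 of 18 low
# patients before correction; 10 of 22 high and 5 of 28 low after correction.
p_before = fisher_exact_two_sided(12, 20, 3, 15)
p_after = fisher_exact_two_sided(10, 12, 5, 23)
print(round(p_before, 4), round(p_after, 4))
```

The printed values can be compared with the p values reported above; small differences would point to a different two-sided convention rather than different data.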
In survival analysis, the hazard ratio (also translated as risk ratio) is the ratio of the hazard rates corresponding to two levels of an explanatory variable. For example, if patients in the treatment group of a drug trial die at twice the rate per unit time of patients in the control group, the hazard ratio is 2. The hazard ratio was calculated with GraphPad Prism 8 software or the R packages survminer and survival, using the log-rank method. This calculation is standard in the field.
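The definition above can be made concrete with a deliberately simplified sketch that assumes a constant hazard in each group, so that the hazard rate is just events divided by total follow-up time. This is not the log-rank-based estimate produced by the software named above, and the function names and numbers are illustrative only.

```python
def hazard_rate(events, person_time):
    """Events per unit of follow-up time, assuming a constant hazard."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return events / person_time


def hazard_ratio(events_a, time_a, events_b, time_b):
    """Ratio of the hazard rate in group A to that in group B."""
    rate_b = hazard_rate(events_b, time_b)
    if rate_b == 0:
        raise ValueError("group B has no events; the ratio is undefined")
    return hazard_rate(events_a, time_a) / rate_b


# Toy numbers: 20 deaths vs 10 deaths over the same 100 patient-months.
print(hazard_ratio(20, 100.0, 10, 100.0))
```

A ratio below 1, as reported for the high versus low mutation load groups in this example, means the first group experiences events at a lower rate than the second.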
This example performs the correction by dividing bTMB by MSAF, whereas the prior art performs the correction using only low-frequency mutations (4.5% or less than 5%). The advantage of this embodiment is that, at a comparable hazard ratio, the positive populations screened by the two approaches are complementary; that is, this embodiment can identify potentially benefiting patients who are not identified by the prior art.
Example 3
Detection, calculation and correction of tumor mutation load in esophageal cancer plasma samples
In this example, the detection, calculation and correction of tumor mutation load in esophageal cancer plasma samples were performed according to example 1.
In this example, the tumor mutation load thresholds before and after correction for esophageal cancer were those set in Example 1.
The detection results of this example are as follows:
TABLE 9
[Table 9 is provided as images in the original publication and is not reproduced in this text.]
When the subjects from whom the samples in the above table were taken received drug therapy, the drug used was Innovent's IBI308, which is a PD-1 antibody.
As can be seen from the above table, when the pre-correction tumor mutation load of 9 was selected as the threshold, the hazard ratio of the high tumor mutation load patients to the low tumor mutation load patients among the 54 esophageal cancer patients of this example was 0.82 (95% CI 0.46-1.47, log-rank p = 0.4523); when the post-correction tumor mutation load of 12 was selected as the threshold, the hazard ratio was 0.52 (95% CI 0.27-1.00, log-rank p = 0.0777). Before correction, the DCB of patients with high and low tumor mutation load was 24% (8/33) and 10% (2/21), respectively (Fisher's exact test p = 0.2838); after correction, the DCB of patients with high and low tumor mutation load was 33% (3/9) and 16% (7/45), respectively (Fisher's exact test p = 0.3425). The above results indicate that the tumor mutation load after correction has better efficacy prediction performance than before correction.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims (10)

1. A method of correcting a tumor mutational burden, comprising: correcting the tumor mutation load obtained from sequencing data of a test sample taken from a subject, using the following formula, to obtain a corrected tumor mutation load:
[Formula (1): image not reproduced in this text]
in the formula (1), α is a positive number.
2. The method of claim 1, wherein in formula (1), 0 < α < 1;
preferably, in formula (1), α is 0.1.
3. The method of claim 1, wherein the maximum somatic mutation frequency of the sequencing data of the test sample is greater than or equal to a threshold value;
preferably, the threshold is 2%.
4. The method according to claim 1, wherein the frequency of somatic mutations is greater than or equal to 1‰, preferably greater than or equal to 5‰;
the somatic mutations comprise SNV and Indel;
the somatic mutation comprises a synonymous mutation;
the somatic mutation does not comprise a driver mutation;
the sequencing data is cfDNA sequencing data;
the sequencing data comprises second generation sequencing data;
the detection method of the tumor mutation load comprises the following steps: according to the sequencing data of a sample to be detected, extracting the somatic mutation of a coding region, determining the number of the somatic mutation, and calculating the tumor mutation load according to the following formula:
$$\text{tumor mutation load} = \frac{\text{mutation number}}{\text{sequencing length}}$$
the mutation number refers to the somatic mutation number, and the sequencing length is the size of a sequencing region;
the sequencing length is the size of a coding region targeted and captured by a probe used for sequencing;
the unit of the sequencing length is Mb, i.e., megabases;
the sample to be tested comprises a body fluid sample;
the body fluid sample comprises at least one of blood and plasma.
5. A prediction method, comprising: obtaining the corrected tumor mutation load of the sequencing data of a test sample of a subject by the method according to any one of claims 1 to 4, and predicting the subject to be a subject with a high tumor mutation load or a subject with a low tumor mutation load based on the magnitude relationship between the corrected tumor mutation load and a threshold.
6. The prediction method of claim 5, wherein if the corrected tumor mutation load is greater than or equal to the threshold value, the subject is predicted to be a high tumor mutation load subject, otherwise, the subject is predicted to be a low tumor mutation load subject;
the prediction method also comprises the step of predicting the treatment effect of the medicine on the disease according to the size relation between the tumor mutation load and the threshold value;
the disease includes cancer;
the cancer comprises at least one of non-small cell lung cancer, melanoma, colorectal cancer, bladder cancer, endometrial cancer, and cervical cancer;
the medicament comprises at least one of a PD-1 inhibitor and a PD-L1 inhibitor.
7. A system for correcting a tumor mutational burden, comprising: the mutation load correction device is used for correcting the tumor mutation load according to the sequencing data of the sample to be detected by using the following formula:
[Formula (1): image not reproduced in this text]
in the formula (1), α is a positive number.
8. A prediction system, comprising: a prediction device for obtaining the corrected tumor mutation load of the sequencing data of a test sample of a subject according to the method for correcting the tumor mutation load of any one of claims 1 to 4, and for predicting the subject to be a high tumor mutation load subject or a low tumor mutation load subject according to the magnitude relationship between the tumor mutation load and a threshold.
9. An apparatus, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory to implement the method for correcting a tumor mutational burden as set forth in any one of claims 1 to 4 or the prediction method as set forth in any one of claims 5 to 6.
10. A computer-readable storage medium having stored thereon a program executable by a processor to perform the method of correcting a tumor mutational burden as defined in any one of claims 1 to 4 or the prediction method as defined in any one of claims 5 to 6.
CN202111492879.2A 2021-12-08 2021-12-08 Method and system for correcting tumor mutation load Pending CN114155911A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111492879.2A CN114155911A (en) 2021-12-08 2021-12-08 Method and system for correcting tumor mutation load

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111492879.2A CN114155911A (en) 2021-12-08 2021-12-08 Method and system for correcting tumor mutation load

Publications (1)

Publication Number Publication Date
CN114155911A true CN114155911A (en) 2022-03-08

Family

ID=80453841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111492879.2A Pending CN114155911A (en) 2021-12-08 2021-12-08 Method and system for correcting tumor mutation load

Country Status (1)

Country Link
CN (1) CN114155911A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518118 4th floor, building 3, the first branch of Zhongcheng Life Science Park, Zhongxing Road, Kengzi street, Pingshan District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen jiyinga Information Technology Co.,Ltd.

Address before: 518118 4th floor, building 3, the first branch of Zhongcheng Life Science Park, Zhongxing Road, Kengzi street, Pingshan District, Shenzhen City, Guangdong Province

Applicant before: Shenzhen genehome Technology Co.,Ltd.