EP3844761A1 - Microsatellite instability detection in cell-free DNA - Google Patents
Info
- Publication number
- EP3844761A1 (application EP19769633.9A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- microsatellite
- sample
- loci
- nucleic acid
- score
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Concepts
- 208000032818 Microsatellite Instability Diseases 0.000 title claims abstract description 324
- 108020004414 DNA Proteins 0.000 title claims description 91
- 238000001514 detection method Methods 0.000 title claims description 56
- 108091092878 Microsatellite Proteins 0.000 claims abstract description 401
- 238000000034 method Methods 0.000 claims abstract description 246
- 150000007523 nucleic acids Chemical class 0.000 claims description 454
- 102000039446 nucleic acids Human genes 0.000 claims description 441
- 108020004707 nucleic acids Proteins 0.000 claims description 441
- 206010028980 Neoplasm Diseases 0.000 claims description 248
- 238000012163 sequencing technique Methods 0.000 claims description 175
- 230000003252 repetitive effect Effects 0.000 claims description 148
- 201000011510 cancer Diseases 0.000 claims description 90
- 210000001519 tissue Anatomy 0.000 claims description 71
- 238000002560 therapeutic procedure Methods 0.000 claims description 62
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 claims description 58
- 238000012360 testing method Methods 0.000 claims description 57
- 201000010099 disease Diseases 0.000 claims description 52
- 108091081062 Repeated sequence (DNA) Proteins 0.000 claims description 47
- 108700028369 Alleles Proteins 0.000 claims description 34
- 238000012549 training Methods 0.000 claims description 24
- 238000009169 immunotherapy Methods 0.000 claims description 23
- 210000002381 plasma Anatomy 0.000 claims description 22
- 230000000392 somatic effect Effects 0.000 claims description 21
- 239000012634 fragment Substances 0.000 claims description 16
- 230000035945 sensitivity Effects 0.000 claims description 16
- 208000001333 Colorectal Neoplasms Diseases 0.000 claims description 15
- -1 PD-2 Proteins 0.000 claims description 15
- 210000004369 blood Anatomy 0.000 claims description 15
- 239000008280 blood Substances 0.000 claims description 15
- 206010009944 Colon cancer Diseases 0.000 claims description 14
- 208000002154 non-small cell lung carcinoma Diseases 0.000 claims description 13
- 206010044412 transitional cell carcinoma Diseases 0.000 claims description 13
- 208000029729 tumor suppressor gene on chromosome 11 Diseases 0.000 claims description 13
- 238000004422 calculation algorithm Methods 0.000 claims description 12
- 208000020816 lung neoplasm Diseases 0.000 claims description 12
- 208000005718 Stomach Neoplasms Diseases 0.000 claims description 11
- 206010073071 hepatocellular carcinoma Diseases 0.000 claims description 11
- 206010069754 Acquired gene mutation Diseases 0.000 claims description 10
- 208000000236 Prostatic Neoplasms Diseases 0.000 claims description 10
- 210000001744 T-lymphocyte Anatomy 0.000 claims description 10
- 239000012530 fluid Substances 0.000 claims description 10
- 206010017758 gastric cancer Diseases 0.000 claims description 10
- 230000037439 somatic mutation Effects 0.000 claims description 10
- 206010058467 Lung neoplasm malignant Diseases 0.000 claims description 9
- 230000001684 chronic effect Effects 0.000 claims description 9
- 230000000694 effects Effects 0.000 claims description 9
- 201000011243 gastrointestinal stromal tumor Diseases 0.000 claims description 9
- 230000036541 health Effects 0.000 claims description 9
- 201000005202 lung cancer Diseases 0.000 claims description 9
- 206010014759 Endometrial neoplasm Diseases 0.000 claims description 8
- 101001137987 Homo sapiens Lymphocyte activation gene 3 protein Proteins 0.000 claims description 8
- 229940076838 Immune checkpoint inhibitor Drugs 0.000 claims description 8
- 206010060862 Prostate cancer Diseases 0.000 claims description 8
- 108091081021 Sense strand Proteins 0.000 claims description 8
- 230000000692 anti-sense effect Effects 0.000 claims description 8
- 208000006990 cholangiocarcinoma Diseases 0.000 claims description 8
- 239000012274 immune-checkpoint protein inhibitor Substances 0.000 claims description 8
- 201000001441 melanoma Diseases 0.000 claims description 8
- 206010006187 Breast cancer Diseases 0.000 claims description 7
- 208000026310 Breast neoplasm Diseases 0.000 claims description 7
- 208000008051 Hereditary Nonpolyposis Colorectal Neoplasms Diseases 0.000 claims description 7
- 102000002698 KIR Receptors Human genes 0.000 claims description 7
- 108010043610 KIR Receptors Proteins 0.000 claims description 7
- 201000005027 Lynch syndrome Diseases 0.000 claims description 7
- 101100407308 Mus musculus Pdcd1lg2 gene Proteins 0.000 claims description 7
- 108700030875 Programmed Cell Death 1 Ligand 2 Proteins 0.000 claims description 7
- 102100024213 Programmed cell death 1 ligand 2 Human genes 0.000 claims description 7
- 208000006265 Renal cell carcinoma Diseases 0.000 claims description 7
- 210000002966 serum Anatomy 0.000 claims description 7
- 201000011549 stomach cancer Diseases 0.000 claims description 7
- 208000024893 Acute lymphoblastic leukemia Diseases 0.000 claims description 6
- 208000014697 Acute lymphocytic leukaemia Diseases 0.000 claims description 6
- 108010074708 B7-H1 Antigen Proteins 0.000 claims description 6
- 206010051066 Gastrointestinal stromal tumour Diseases 0.000 claims description 6
- 208000002454 Nasopharyngeal Carcinoma Diseases 0.000 claims description 6
- 206010061306 Nasopharyngeal cancer Diseases 0.000 claims description 6
- 208000015914 Non-Hodgkin lymphomas Diseases 0.000 claims description 6
- 208000006664 Precursor Cell Lymphoblastic Leukemia-Lymphoma Diseases 0.000 claims description 6
- 102100024216 Programmed cell death 1 ligand 1 Human genes 0.000 claims description 6
- 101710089372 Programmed cell death protein 1 Proteins 0.000 claims description 6
- 208000000102 Squamous Cell Carcinoma of Head and Neck Diseases 0.000 claims description 6
- 201000011216 nasopharynx carcinoma Diseases 0.000 claims description 6
- 210000002700 urine Anatomy 0.000 claims description 6
- 208000023747 urothelial carcinoma Diseases 0.000 claims description 6
- 206010005003 Bladder cancer Diseases 0.000 claims description 5
- 201000009030 Carcinoma Diseases 0.000 claims description 5
- 206010014733 Endometrial cancer Diseases 0.000 claims description 5
- 206010061902 Pancreatic neoplasm Diseases 0.000 claims description 5
- 201000008275 breast carcinoma Diseases 0.000 claims description 5
- 208000029742 colonic neoplasm Diseases 0.000 claims description 5
- 231100000844 hepatocellular carcinoma Toxicity 0.000 claims description 5
- 201000005825 prostate adenocarcinoma Diseases 0.000 claims description 5
- 208000003174 Brain Neoplasms Diseases 0.000 claims description 4
- 102100027207 CD27 antigen Human genes 0.000 claims description 4
- 208000017095 Hereditary nonpolyposis colon cancer Diseases 0.000 claims description 4
- 101000914511 Homo sapiens CD27 antigen Proteins 0.000 claims description 4
- 101000851370 Homo sapiens Tumor necrosis factor receptor superfamily member 9 Proteins 0.000 claims description 4
- 238000007476 Maximum Likelihood Methods 0.000 claims description 4
- 206010027406 Mesothelioma Diseases 0.000 claims description 4
- 206010033128 Ovarian cancer Diseases 0.000 claims description 4
- 206010036790 Productive cough Diseases 0.000 claims description 4
- 208000015634 Rectal Neoplasms Diseases 0.000 claims description 4
- 208000000453 Skin Neoplasms Diseases 0.000 claims description 4
- 102100036856 Tumor necrosis factor receptor superfamily member 9 Human genes 0.000 claims description 4
- 208000007097 Urinary Bladder Neoplasms Diseases 0.000 claims description 4
- 201000003914 endometrial carcinoma Diseases 0.000 claims description 4
- 208000032839 leukemia Diseases 0.000 claims description 4
- 208000014018 liver neoplasm Diseases 0.000 claims description 4
- 210000003296 saliva Anatomy 0.000 claims description 4
- 210000000582 semen Anatomy 0.000 claims description 4
- 210000003802 sputum Anatomy 0.000 claims description 4
- 208000024794 sputum Diseases 0.000 claims description 4
- 210000001179 synovial fluid Anatomy 0.000 claims description 4
- 208000031261 Acute myeloid leukaemia Diseases 0.000 claims description 3
- 208000036764 Adenocarcinoma of the esophagus Diseases 0.000 claims description 3
- 206010003571 Astrocytoma Diseases 0.000 claims description 3
- 208000003950 B-cell lymphoma Diseases 0.000 claims description 3
- 102100035875 C-C chemokine receptor type 5 Human genes 0.000 claims description 3
- 101710149870 C-C chemokine receptor type 5 Proteins 0.000 claims description 3
- 101150013553 CD40 gene Proteins 0.000 claims description 3
- 108010021064 CTLA-4 Antigen Proteins 0.000 claims description 3
- 102000008203 CTLA-4 Antigen Human genes 0.000 claims description 3
- 229940045513 CTLA4 antagonist Drugs 0.000 claims description 3
- 208000010667 Carcinoma of liver and intrahepatic biliary tract Diseases 0.000 claims description 3
- 206010008342 Cervix carcinoma Diseases 0.000 claims description 3
- 208000030808 Clear cell renal carcinoma Diseases 0.000 claims description 3
- 206010052360 Colorectal adenocarcinoma Diseases 0.000 claims description 3
- 102000004127 Cytokines Human genes 0.000 claims description 3
- 108090000695 Cytokines Proteins 0.000 claims description 3
- 208000000461 Esophageal Neoplasms Diseases 0.000 claims description 3
- 206010018338 Glioma Diseases 0.000 claims description 3
- 206010073069 Hepatic cancer Diseases 0.000 claims description 3
- 101000868279 Homo sapiens Leukocyte surface antigen CD47 Proteins 0.000 claims description 3
- 208000031671 Large B-Cell Diffuse Lymphoma Diseases 0.000 claims description 3
- 102100032913 Leukocyte surface antigen CD47 Human genes 0.000 claims description 3
- 208000025205 Mantle-Cell Lymphoma Diseases 0.000 claims description 3
- 208000034578 Multiple myelomas Diseases 0.000 claims description 3
- 206010029260 Neuroblastoma Diseases 0.000 claims description 3
- 206010030137 Oesophageal adenocarcinoma Diseases 0.000 claims description 3
- 206010030155 Oesophageal carcinoma Diseases 0.000 claims description 3
- 206010061534 Oesophageal squamous cell carcinoma Diseases 0.000 claims description 3
- 206010031096 Oropharyngeal cancer Diseases 0.000 claims description 3
- 206010057444 Oropharyngeal neoplasm Diseases 0.000 claims description 3
- 208000027190 Peripheral T-cell lymphomas Diseases 0.000 claims description 3
- 206010035226 Plasma cell myeloma Diseases 0.000 claims description 3
- 208000032758 Precursor T-lymphoblastic lymphoma/leukaemia Diseases 0.000 claims description 3
- 206010054184 Small intestine carcinoma Diseases 0.000 claims description 3
- 208000034254 Squamous cell carcinoma of the cervix uteri Diseases 0.000 claims description 3
- 208000036765 Squamous cell carcinoma of the esophagus Diseases 0.000 claims description 3
- 208000031672 T-Cell Peripheral Lymphoma Diseases 0.000 claims description 3
- 208000029052 T-cell acute lymphoblastic leukemia Diseases 0.000 claims description 3
- 206010042971 T-cell lymphoma Diseases 0.000 claims description 3
- 102100040245 Tumor necrosis factor receptor superfamily member 5 Human genes 0.000 claims description 3
- 208000006105 Uterine Cervical Neoplasms Diseases 0.000 claims description 3
- 208000002495 Uterine Neoplasms Diseases 0.000 claims description 3
- 201000005969 Uveal melanoma Diseases 0.000 claims description 3
- 208000008383 Wilms tumor Diseases 0.000 claims description 3
- 208000006336 acinar cell carcinoma Diseases 0.000 claims description 3
- 201000009036 biliary tract cancer Diseases 0.000 claims description 3
- 208000020790 biliary tract neoplasm Diseases 0.000 claims description 3
- 201000010881 cervical cancer Diseases 0.000 claims description 3
- 201000006612 cervical squamous cell carcinoma Diseases 0.000 claims description 3
- 206010073251 clear cell renal cell carcinoma Diseases 0.000 claims description 3
- 201000010989 colorectal carcinoma Diseases 0.000 claims description 3
- 208000035250 cutaneous malignant susceptibility to 1 melanoma Diseases 0.000 claims description 3
- 208000030381 cutaneous melanoma Diseases 0.000 claims description 3
- 206010012818 diffuse large B-cell lymphoma Diseases 0.000 claims description 3
- 201000000330 endometrial stromal sarcoma Diseases 0.000 claims description 3
- 208000029179 endometrioid stromal sarcoma Diseases 0.000 claims description 3
- 208000028653 esophageal adenocarcinoma Diseases 0.000 claims description 3
- 201000004101 esophageal cancer Diseases 0.000 claims description 3
- 208000007276 esophageal squamous cell carcinoma Diseases 0.000 claims description 3
- 210000003608 fece Anatomy 0.000 claims description 3
- 201000008396 gallbladder adenocarcinoma Diseases 0.000 claims description 3
- 201000010175 gallbladder cancer Diseases 0.000 claims description 3
- 201000007487 gallbladder carcinoma Diseases 0.000 claims description 3
- 208000010749 gastric carcinoma Diseases 0.000 claims description 3
- 208000006359 hepatoblastoma Diseases 0.000 claims description 3
- 201000007270 liver cancer Diseases 0.000 claims description 3
- 201000002250 liver carcinoma Diseases 0.000 claims description 3
- 230000000527 lymphocytic effect Effects 0.000 claims description 3
- 208000015486 malignant pancreatic neoplasm Diseases 0.000 claims description 3
- 201000008026 nephroblastoma Diseases 0.000 claims description 3
- 201000011330 nonpapillary renal cell carcinoma Diseases 0.000 claims description 3
- 201000002575 ocular melanoma Diseases 0.000 claims description 3
- 208000010655 oral cavity squamous cell carcinoma Diseases 0.000 claims description 3
- 201000006958 oropharynx cancer Diseases 0.000 claims description 3
- 201000008968 osteosarcoma Diseases 0.000 claims description 3
- 201000002528 pancreatic cancer Diseases 0.000 claims description 3
- 208000008443 pancreatic carcinoma Diseases 0.000 claims description 3
- 201000008129 pancreatic ductal adenocarcinoma Diseases 0.000 claims description 3
- 230000000770 proinflammatory effect Effects 0.000 claims description 3
- 206010038038 rectal cancer Diseases 0.000 claims description 3
- 201000001275 rectum cancer Diseases 0.000 claims description 3
- 201000000849 skin cancer Diseases 0.000 claims description 3
- 201000003708 skin melanoma Diseases 0.000 claims description 3
- 201000000498 stomach carcinoma Diseases 0.000 claims description 3
- 201000005112 urinary bladder cancer Diseases 0.000 claims description 3
- 206010046766 uterine cancer Diseases 0.000 claims description 3
- 208000037965 uterine sarcoma Diseases 0.000 claims description 3
- 238000007482 whole exome sequencing Methods 0.000 claims description 3
- 238000012070 whole genome sequencing analysis Methods 0.000 claims description 3
- 102000017578 LAG3 Human genes 0.000 claims 1
- 239000000523 sample Substances 0.000 description 366
- 125000003729 nucleotide group Chemical group 0.000 description 63
- 239000002773 nucleotide Substances 0.000 description 60
- 238000007481 next generation sequencing Methods 0.000 description 48
- 238000003199 nucleic acid amplification method Methods 0.000 description 43
- 230000003321 amplification Effects 0.000 description 40
- 238000003752 polymerase chain reaction Methods 0.000 description 34
- 210000004027 cell Anatomy 0.000 description 32
- 238000004458 analytical method Methods 0.000 description 31
- 238000004891 communication Methods 0.000 description 31
- 238000003556 assay Methods 0.000 description 27
- 230000035772 mutation Effects 0.000 description 25
- 238000002360 preparation method Methods 0.000 description 22
- 238000011282 treatment Methods 0.000 description 21
- 102000037982 Immune checkpoint proteins Human genes 0.000 description 19
- 108091008036 Immune checkpoint proteins Proteins 0.000 description 19
- 238000013459 approach Methods 0.000 description 19
- 108090000623 proteins and genes Proteins 0.000 description 19
- 229940126546 immune checkpoint molecule Drugs 0.000 description 18
- 238000006243 chemical reaction Methods 0.000 description 17
- 238000003364 immunohistochemistry Methods 0.000 description 17
- 239000000463 material Substances 0.000 description 16
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 15
- 238000010200 validation analysis Methods 0.000 description 15
- 102000040430 polynucleotide Human genes 0.000 description 14
- 108091033319 polynucleotide Proteins 0.000 description 14
- 239000002157 polynucleotide Substances 0.000 description 14
- 238000012545 processing Methods 0.000 description 14
- 230000004044 response Effects 0.000 description 14
- 239000000090 biomarker Substances 0.000 description 12
- 101000914484 Homo sapiens T-lymphocyte activation antigen CD80 Proteins 0.000 description 11
- 102100027222 T-lymphocyte activation antigen CD80 Human genes 0.000 description 11
- 230000000295 complement effect Effects 0.000 description 11
- 230000005746 immune checkpoint blockade Effects 0.000 description 11
- 230000002401 inhibitory effect Effects 0.000 description 11
- 101000883798 Homo sapiens Probable ATP-dependent RNA helicase DDX53 Proteins 0.000 description 10
- 102100038236 Probable ATP-dependent RNA helicase DDX53 Human genes 0.000 description 10
- 238000003205 genotyping method Methods 0.000 description 10
- 229960002621 pembrolizumab Drugs 0.000 description 10
- 210000001124 body fluid Anatomy 0.000 description 9
- 210000004602 germ cell Anatomy 0.000 description 9
- 230000033607 mismatch repair Effects 0.000 description 9
- 230000000875 corresponding effect Effects 0.000 description 8
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 8
- UYTPUPDQBNUYGX-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 8
- 238000003786 synthesis reaction Methods 0.000 description 8
- 108091093088 Amplicon Proteins 0.000 description 7
- 102100039498 Cytotoxic T-lymphocyte protein 4 Human genes 0.000 description 7
- 101000889276 Homo sapiens Cytotoxic T-lymphocyte protein 4 Proteins 0.000 description 7
- 102100020862 Lymphocyte activation gene 3 protein Human genes 0.000 description 7
- 108091028043 Nucleic acid sequence Proteins 0.000 description 7
- 108091034117 Oligonucleotide Proteins 0.000 description 7
- 239000000556 agonist Substances 0.000 description 7
- 239000003795 chemical substances by application Substances 0.000 description 7
- 238000009826 distribution Methods 0.000 description 7
- 239000002955 immunomodulating agent Substances 0.000 description 7
- 238000004088 simulation Methods 0.000 description 7
- 238000012546 transfer Methods 0.000 description 7
- 102000053602 DNA Human genes 0.000 description 6
- 241001465754 Metazoa Species 0.000 description 6
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 6
- 230000015572 biosynthetic process Effects 0.000 description 6
- 238000007405 data analysis Methods 0.000 description 6
- 238000013461 design Methods 0.000 description 6
- 208000035475 disorder Diseases 0.000 description 6
- 238000009396 hybridization Methods 0.000 description 6
- 239000013610 patient sample Substances 0.000 description 6
- 238000004393 prognosis Methods 0.000 description 6
- 238000012175 pyrosequencing Methods 0.000 description 6
- 238000007841 sequencing by ligation Methods 0.000 description 6
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 6
- 102000004190 Enzymes Human genes 0.000 description 5
- 108090000790 Enzymes Proteins 0.000 description 5
- 101000914514 Homo sapiens T-cell-specific surface glycoprotein CD28 Proteins 0.000 description 5
- 230000005867 T cell response Effects 0.000 description 5
- 102100027213 T-cell-specific surface glycoprotein CD28 Human genes 0.000 description 5
- 239000005557 antagonist Substances 0.000 description 5
- 239000000427 antigen Substances 0.000 description 5
- 108091007433 antigens Proteins 0.000 description 5
- 102000036639 antigens Human genes 0.000 description 5
- 230000008901 benefit Effects 0.000 description 5
- 238000012937 correction Methods 0.000 description 5
- 239000003814 drug Substances 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 5
- 230000002068 genetic effect Effects 0.000 description 5
- 239000003446 ligand Substances 0.000 description 5
- 230000001394 metastastic effect Effects 0.000 description 5
- 206010061289 metastatic neoplasm Diseases 0.000 description 5
- 238000012552 review Methods 0.000 description 5
- 229930024421 Adenine Natural products 0.000 description 4
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 4
- 102100031351 Galectin-9 Human genes 0.000 description 4
- 101710121810 Galectin-9 Proteins 0.000 description 4
- 102100034458 Hepatitis A virus cellular receptor 2 Human genes 0.000 description 4
- 229960000643 adenine Drugs 0.000 description 4
- 230000004075 alteration Effects 0.000 description 4
- 238000001574 biopsy Methods 0.000 description 4
- 230000001413 cellular effect Effects 0.000 description 4
- 210000001175 cerebrospinal fluid Anatomy 0.000 description 4
- 229940104302 cytosine Drugs 0.000 description 4
- 230000002357 endometrial effect Effects 0.000 description 4
- 238000007672 fourth generation sequencing Methods 0.000 description 4
- 108020001507 fusion proteins Proteins 0.000 description 4
- 102000037865 fusion proteins Human genes 0.000 description 4
- 230000014509 gene expression Effects 0.000 description 4
- 210000000987 immune system Anatomy 0.000 description 4
- 239000000203 mixture Substances 0.000 description 4
- 230000000869 mutational effect Effects 0.000 description 4
- 229960003301 nivolumab Drugs 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 239000000047 product Substances 0.000 description 4
- 239000004065 semiconductor Substances 0.000 description 4
- 239000006228 supernatant Substances 0.000 description 4
- 238000004448 titration Methods 0.000 description 4
- 240000005020 Acaciella glauca Species 0.000 description 3
- 101150051188 Adora2a gene Proteins 0.000 description 3
- 238000000729 Fisher's exact test Methods 0.000 description 3
- 206010051922 Hereditary non-polyposis colorectal cancer syndrome Diseases 0.000 description 3
- 101001068133 Homo sapiens Hepatitis A virus cellular receptor 2 Proteins 0.000 description 3
- 230000004888 barrier function Effects 0.000 description 3
- 230000005540 biological transmission Effects 0.000 description 3
- 239000010839 body fluid Substances 0.000 description 3
- 208000035269 cancer or benign tumor Diseases 0.000 description 3
- 238000012512 characterization method Methods 0.000 description 3
- 238000002512 chemotherapy Methods 0.000 description 3
- 210000000349 chromosome Anatomy 0.000 description 3
- 238000004590 computer program Methods 0.000 description 3
- 239000013068 control sample Substances 0.000 description 3
- 230000007423 decrease Effects 0.000 description 3
- 238000012217 deletion Methods 0.000 description 3
- 230000004927 fusion Effects 0.000 description 3
- 201000000459 head and neck squamous cell carcinoma Diseases 0.000 description 3
- 238000012432 intermediate storage Methods 0.000 description 3
- 238000011528 liquid biopsy Methods 0.000 description 3
- 210000004072 lung Anatomy 0.000 description 3
- 238000013507 mapping Methods 0.000 description 3
- 230000007246 mechanism Effects 0.000 description 3
- 230000001404 mediated effect Effects 0.000 description 3
- 239000002777 nucleoside Substances 0.000 description 3
- 125000003835 nucleoside group Chemical group 0.000 description 3
- 238000011275 oncology therapy Methods 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 230000036961 partial effect Effects 0.000 description 3
- 230000037361 pathway Effects 0.000 description 3
- 229920000642 polymer Polymers 0.000 description 3
- 239000000092 prognostic biomarker Substances 0.000 description 3
- 235000003499 redwood Nutrition 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 241000894007 species Species 0.000 description 3
- 238000013179 statistical model Methods 0.000 description 3
- 238000002626 targeted therapy Methods 0.000 description 3
- 230000008685 targeting Effects 0.000 description 3
- 229940124597 therapeutic agent Drugs 0.000 description 3
- 230000001225 therapeutic effect Effects 0.000 description 3
- 229940113082 thymine Drugs 0.000 description 3
- 229940035893 uracil Drugs 0.000 description 3
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 2
- 208000010507 Adenocarcinoma of Lung Diseases 0.000 description 2
- 206010052747 Adenocarcinoma pancreas Diseases 0.000 description 2
- 208000023275 Autoimmune disease Diseases 0.000 description 2
- 241000283690 Bos taurus Species 0.000 description 2
- 108091061744 Cell-free fetal DNA Proteins 0.000 description 2
- 108091026890 Coding region Proteins 0.000 description 2
- 206010052358 Colorectal cancer metastatic Diseases 0.000 description 2
- 108091035707 Consensus sequence Proteins 0.000 description 2
- 241000196324 Embryophyta Species 0.000 description 2
- 108060002716 Exonuclease Proteins 0.000 description 2
- 102100023600 Fibroblast growth factor receptor 2 Human genes 0.000 description 2
- 101710182389 Fibroblast growth factor receptor 2 Proteins 0.000 description 2
- 206010062878 Gastrooesophageal cancer Diseases 0.000 description 2
- 102000053646 Inducible T-Cell Co-Stimulator Human genes 0.000 description 2
- 108700013161 Inducible T-Cell Co-Stimulator Proteins 0.000 description 2
- 102000037984 Inhibitory immune checkpoint proteins Human genes 0.000 description 2
- 108091008026 Inhibitory immune checkpoint proteins Proteins 0.000 description 2
- 206010025323 Lymphomas Diseases 0.000 description 2
- 108091092919 Minisatellite Proteins 0.000 description 2
- 206010061535 Ovarian neoplasm Diseases 0.000 description 2
- 108091007412 Piwi-interacting RNA Proteins 0.000 description 2
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N Silicium dioxide Chemical compound O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 description 2
- 108020004682 Single-Stranded DNA Proteins 0.000 description 2
- 108020003224 Small Nucleolar RNA Proteins 0.000 description 2
- 102000042773 Small Nucleolar RNA Human genes 0.000 description 2
- 230000006044 T cell activation Effects 0.000 description 2
- 108091008874 T cell receptors Proteins 0.000 description 2
- 102000016266 T-Cell Antigen Receptors Human genes 0.000 description 2
- IQFYYKKMVGJFEH-XLPZGREQSA-N Thymidine Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](CO)[C@@H](O)C1 IQFYYKKMVGJFEH-XLPZGREQSA-N 0.000 description 2
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 2
- 230000002159 abnormal effect Effects 0.000 description 2
- OIRDTQYFTABQOQ-KQYNXXCUSA-N adenosine Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O OIRDTQYFTABQOQ-KQYNXXCUSA-N 0.000 description 2
- 208000037844 advanced solid tumor Diseases 0.000 description 2
- 210000000612 antigen-presenting cell Anatomy 0.000 description 2
- 229950002916 avelumab Drugs 0.000 description 2
- 238000003766 bioinformatics method Methods 0.000 description 2
- 201000001531 bladder carcinoma Diseases 0.000 description 2
- 238000002619 cancer immunotherapy Methods 0.000 description 2
- 238000005251 capillar electrophoresis Methods 0.000 description 2
- 230000010261 cell growth Effects 0.000 description 2
- 210000003169 central nervous system Anatomy 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 239000003153 chemical reaction reagent Substances 0.000 description 2
- 201000010897 colon adenocarcinoma Diseases 0.000 description 2
- 239000012228 culture supernatant Substances 0.000 description 2
- 230000007812 deficiency Effects 0.000 description 2
- 230000002950 deficient Effects 0.000 description 2
- 230000001934 delay Effects 0.000 description 2
- 230000037430 deletion Effects 0.000 description 2
- 238000004925 denaturation Methods 0.000 description 2
- 230000036425 denaturation Effects 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 238000003745 diagnosis Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000004069 differentiation Effects 0.000 description 2
- 229940079593 drug Drugs 0.000 description 2
- 229950009791 durvalumab Drugs 0.000 description 2
- 230000002708 enhancing effect Effects 0.000 description 2
- 230000001747 exhibiting effect Effects 0.000 description 2
- 102000013165 exonuclease Human genes 0.000 description 2
- 210000003722 extracellular fluid Anatomy 0.000 description 2
- 230000001605 fetal effect Effects 0.000 description 2
- 201000006585 gastric adenocarcinoma Diseases 0.000 description 2
- 201000006974 gastroesophageal cancer Diseases 0.000 description 2
- 201000010536 head and neck cancer Diseases 0.000 description 2
- 208000014829 head and neck neoplasm Diseases 0.000 description 2
- 238000012165 high-throughput sequencing Methods 0.000 description 2
- 229920001519 homopolymer Polymers 0.000 description 2
- 210000002865 immune cell Anatomy 0.000 description 2
- 230000028993 immune response Effects 0.000 description 2
- 238000000126 in silico method Methods 0.000 description 2
- 230000037431 insertion Effects 0.000 description 2
- 238000003780 insertion Methods 0.000 description 2
- 150000002500 ions Chemical class 0.000 description 2
- 229960005386 ipilimumab Drugs 0.000 description 2
- 238000005304 joining Methods 0.000 description 2
- 201000005249 lung adenocarcinoma Diseases 0.000 description 2
- 201000005243 lung squamous cell carcinoma Diseases 0.000 description 2
- 230000003211 malignant effect Effects 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 230000008774 maternal effect Effects 0.000 description 2
- 239000000178 monomer Substances 0.000 description 2
- 239000002674 ointment Substances 0.000 description 2
- 201000002094 pancreatic adenocarcinoma Diseases 0.000 description 2
- 210000005259 peripheral blood Anatomy 0.000 description 2
- 239000011886 peripheral blood Substances 0.000 description 2
- 230000002093 peripheral effect Effects 0.000 description 2
- BASFCYQUMIYNBI-UHFFFAOYSA-N platinum Chemical compound [Pt] BASFCYQUMIYNBI-UHFFFAOYSA-N 0.000 description 2
- 102000004169 proteins and genes Human genes 0.000 description 2
- 239000011541 reaction mixture Substances 0.000 description 2
- 230000009467 reduction Effects 0.000 description 2
- 230000010076 replication Effects 0.000 description 2
- 229920002477 rna polymer Polymers 0.000 description 2
- 230000011664 signaling Effects 0.000 description 2
- 239000004055 small Interfering RNA Substances 0.000 description 2
- 239000007787 solid Substances 0.000 description 2
- 239000000243 solution Substances 0.000 description 2
- 238000000638 solvent extraction Methods 0.000 description 2
- 238000003860 storage Methods 0.000 description 2
- 239000000126 substance Substances 0.000 description 2
- 230000000699 topical effect Effects 0.000 description 2
- 208000010570 urinary bladder carcinoma Diseases 0.000 description 2
- YKBGVTZYEHREMT-KVQBGUIXSA-N 2'-deoxyguanosine Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@H]1C[C@H](O)[C@@H](CO)O1 YKBGVTZYEHREMT-KVQBGUIXSA-N 0.000 description 1
- WAVYAFBQOXCGSZ-UHFFFAOYSA-N 2-fluoropyrimidine Chemical compound FC1=NC=CC=N1 WAVYAFBQOXCGSZ-UHFFFAOYSA-N 0.000 description 1
- 208000010543 22q11.2 deletion syndrome Diseases 0.000 description 1
- CKTSBUTUHBMZGZ-ULQXZJNLSA-N 4-amino-1-[(2r,4s,5r)-4-hydroxy-5-(hydroxymethyl)oxolan-2-yl]-5-tritiopyrimidin-2-one Chemical compound O=C1N=C(N)C([3H])=CN1[C@@H]1O[C@H](CO)[C@@H](O)C1 CKTSBUTUHBMZGZ-ULQXZJNLSA-N 0.000 description 1
- 102100023990 60S ribosomal protein L17 Human genes 0.000 description 1
- 101150113019 74 gene Proteins 0.000 description 1
- 102000007471 Adenosine A2A receptor Human genes 0.000 description 1
- 108010085277 Adenosine A2A receptor Proteins 0.000 description 1
- 208000002485 Adiposis dolorosa Diseases 0.000 description 1
- 208000003343 Antiphospholipid Syndrome Diseases 0.000 description 1
- 206010003445 Ascites Diseases 0.000 description 1
- 206010003805 Autism Diseases 0.000 description 1
- 208000020706 Autistic disease Diseases 0.000 description 1
- 208000010061 Autosomal Dominant Polycystic Kidney Diseases 0.000 description 1
- 241000271566 Aves Species 0.000 description 1
- 206010005949 Bone cancer Diseases 0.000 description 1
- 208000018084 Bone neoplasm Diseases 0.000 description 1
- 102100038078 CD276 antigen Human genes 0.000 description 1
- 101710185679 CD276 antigen Proteins 0.000 description 1
- 102100025221 CD70 antigen Human genes 0.000 description 1
- 206010008723 Chondrodystrophy Diseases 0.000 description 1
- 208000006545 Chronic Obstructive Pulmonary Disease Diseases 0.000 description 1
- 208000005443 Circulating Neoplastic Cells Diseases 0.000 description 1
- 206010010099 Combined immunodeficiency Diseases 0.000 description 1
- RYGMFSIKBFXOCR-UHFFFAOYSA-N Copper Chemical compound [Cu] RYGMFSIKBFXOCR-UHFFFAOYSA-N 0.000 description 1
- 102000012437 Copper-Transporting ATPases Human genes 0.000 description 1
- 208000011231 Crohn disease Diseases 0.000 description 1
- NZNMSOFKMUBTKW-UHFFFAOYSA-N Cyclohexanecarboxylic acid Natural products OC(=O)C1CCCCC1 NZNMSOFKMUBTKW-UHFFFAOYSA-N 0.000 description 1
- 201000003883 Cystic fibrosis Diseases 0.000 description 1
- 102100034157 DNA mismatch repair protein Msh2 Human genes 0.000 description 1
- 102100021147 DNA mismatch repair protein Msh6 Human genes 0.000 description 1
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 1
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 1
- 206010061818 Disease progression Diseases 0.000 description 1
- 201000010374 Down Syndrome Diseases 0.000 description 1
- 201000000913 Duane retraction syndrome Diseases 0.000 description 1
- 208000020129 Duane syndrome Diseases 0.000 description 1
- 206010013801 Duchenne Muscular Dystrophy Diseases 0.000 description 1
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 1
- 102000012804 EPCAM Human genes 0.000 description 1
- 101150084967 EPCAM gene Proteins 0.000 description 1
- 101150029707 ERBB2 gene Proteins 0.000 description 1
- 241000283086 Equidae Species 0.000 description 1
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 1
- 108091008794 FGF receptors Proteins 0.000 description 1
- 206010016207 Familial Mediterranean fever Diseases 0.000 description 1
- 208000001914 Fragile X syndrome Diseases 0.000 description 1
- 201000003741 Gastrointestinal carcinoma Diseases 0.000 description 1
- 208000015872 Gaucher disease Diseases 0.000 description 1
- 208000018565 Hemochromatosis Diseases 0.000 description 1
- 208000031220 Hemophilia Diseases 0.000 description 1
- 208000009292 Hemophilia A Diseases 0.000 description 1
- 101710083479 Hepatitis A virus cellular receptor 2 homolog Proteins 0.000 description 1
- 208000002972 Hepatolenticular Degeneration Diseases 0.000 description 1
- 101000934356 Homo sapiens CD70 antigen Proteins 0.000 description 1
- 101001134036 Homo sapiens DNA mismatch repair protein Msh2 Proteins 0.000 description 1
- 101000968658 Homo sapiens DNA mismatch repair protein Msh6 Proteins 0.000 description 1
- 101001019455 Homo sapiens ICOS ligand Proteins 0.000 description 1
- 101000598160 Homo sapiens Nuclear mitotic apparatus protein 1 Proteins 0.000 description 1
- 101001012157 Homo sapiens Receptor tyrosine-protein kinase erbB-2 Proteins 0.000 description 1
- 101000638251 Homo sapiens Tumor necrosis factor ligand superfamily member 9 Proteins 0.000 description 1
- 208000023105 Huntington disease Diseases 0.000 description 1
- 208000025500 Hutchinson-Gilford progeria syndrome Diseases 0.000 description 1
- 206010020608 Hypercoagulation Diseases 0.000 description 1
- 208000000563 Hyperlipoproteinemia Type II Diseases 0.000 description 1
- 102100034980 ICOS ligand Human genes 0.000 description 1
- 208000026350 Inborn Genetic disease Diseases 0.000 description 1
- 108090001005 Interleukin-6 Proteins 0.000 description 1
- 208000005016 Intestinal Neoplasms Diseases 0.000 description 1
- 208000008839 Kidney Neoplasms Diseases 0.000 description 1
- 208000017924 Klinefelter Syndrome Diseases 0.000 description 1
- 108020005198 Long Noncoding RNA Proteins 0.000 description 1
- 102100024640 Low-density lipoprotein receptor Human genes 0.000 description 1
- 229910015837 MSH2 Inorganic materials 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 208000001826 Marfan syndrome Diseases 0.000 description 1
- 206010027476 Metastases Diseases 0.000 description 1
- 206010063916 Metastatic gastric cancer Diseases 0.000 description 1
- 108010074346 Mismatch Repair Endonuclease PMS2 Proteins 0.000 description 1
- 102000008071 Mismatch Repair Endonuclease PMS2 Human genes 0.000 description 1
- 108020005196 Mitochondrial DNA Proteins 0.000 description 1
- 208000003445 Mouth Neoplasms Diseases 0.000 description 1
- 102000013609 MutL Protein Homolog 1 Human genes 0.000 description 1
- 108010026664 MutL Protein Homolog 1 Proteins 0.000 description 1
- 206010068871 Myotonic dystrophy Diseases 0.000 description 1
- 208000009905 Neurofibromatoses Diseases 0.000 description 1
- 206010029748 Noonan syndrome Diseases 0.000 description 1
- 208000010505 Nose Neoplasms Diseases 0.000 description 1
- 108091093105 Nuclear DNA Proteins 0.000 description 1
- 102100036961 Nuclear mitotic apparatus protein 1 Human genes 0.000 description 1
- 102000004473 OX40 Ligand Human genes 0.000 description 1
- 108010042215 OX40 Ligand Proteins 0.000 description 1
- 108020005187 Oligonucleotide Probes Proteins 0.000 description 1
- 108700020796 Oncogene Proteins 0.000 description 1
- 206010031243 Osteogenesis imperfecta Diseases 0.000 description 1
- 238000012408 PCR amplification Methods 0.000 description 1
- 239000012269 PD-1/PD-L1 inhibitor Substances 0.000 description 1
- 208000018737 Parkinson disease Diseases 0.000 description 1
- 201000011252 Phenylketonuria Diseases 0.000 description 1
- 208000002151 Pleural effusion Diseases 0.000 description 1
- 208000019222 Poland syndrome Diseases 0.000 description 1
- 241000097929 Porphyria Species 0.000 description 1
- 208000010642 Porphyrias Diseases 0.000 description 1
- 241001237728 Precis Species 0.000 description 1
- 241000288906 Primates Species 0.000 description 1
- 208000007932 Progeria Diseases 0.000 description 1
- 238000003559 RNA-seq method Methods 0.000 description 1
- 102100030086 Receptor tyrosine-protein kinase erbB-2 Human genes 0.000 description 1
- 208000007014 Retinitis pigmentosa Diseases 0.000 description 1
- 206010039491 Sarcoma Diseases 0.000 description 1
- 108020004487 Satellite DNA Proteins 0.000 description 1
- 108020004459 Small interfering RNA Proteins 0.000 description 1
- 208000032383 Soft tissue cancer Diseases 0.000 description 1
- 238000000692 Student's t-test Methods 0.000 description 1
- 241000282887 Suidae Species 0.000 description 1
- 101150057140 TACSTD1 gene Proteins 0.000 description 1
- 108091046869 Telomeric non-coding RNA Proteins 0.000 description 1
- 208000002903 Thalassemia Diseases 0.000 description 1
- 206010043515 Throat cancer Diseases 0.000 description 1
- 208000024770 Thyroid neoplasm Diseases 0.000 description 1
- 206010068233 Trimethylaminuria Diseases 0.000 description 1
- 108060008682 Tumor Necrosis Factor Proteins 0.000 description 1
- 102100040247 Tumor necrosis factor Human genes 0.000 description 1
- 102100032101 Tumor necrosis factor ligand superfamily member 9 Human genes 0.000 description 1
- 208000026928 Turner syndrome Diseases 0.000 description 1
- 206010045261 Type IIa hyperlipidaemia Diseases 0.000 description 1
- 108010079206 V-Set Domain-Containing T-Cell Activation Inhibitor 1 Proteins 0.000 description 1
- 102100038929 V-set domain-containing T-cell activation inhibitor 1 Human genes 0.000 description 1
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 1
- 201000007960 WAGR syndrome Diseases 0.000 description 1
- 208000018839 Wilson disease Diseases 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- 208000008919 achondroplasia Diseases 0.000 description 1
- 230000003213 activating effect Effects 0.000 description 1
- 208000006682 alpha 1-Antitrypsin Deficiency Diseases 0.000 description 1
- AFVLVVWMAFSXCK-VMPITWQZSA-N alpha-cyano-4-hydroxycinnamic acid Chemical compound OC(=O)C(\C#N)=C\C1=CC=C(O)C=C1 AFVLVVWMAFSXCK-VMPITWQZSA-N 0.000 description 1
- 239000012491 analyte Substances 0.000 description 1
- 239000012805 animal sample Substances 0.000 description 1
- 238000000137 annealing Methods 0.000 description 1
- 230000005809 anti-tumor immunity Effects 0.000 description 1
- 230000005975 antitumor immune response Effects 0.000 description 1
- 230000006907 apoptotic process Effects 0.000 description 1
- 239000007900 aqueous suspension Substances 0.000 description 1
- 229960003852 atezolizumab Drugs 0.000 description 1
- 208000022185 autosomal dominant polycystic kidney disease Diseases 0.000 description 1
- 239000011324 bead Substances 0.000 description 1
- 239000012472 biological sample Substances 0.000 description 1
- 229960002685 biotin Drugs 0.000 description 1
- 235000020958 biotin Nutrition 0.000 description 1
- 239000011616 biotin Substances 0.000 description 1
- 201000011263 bladder neck cancer Diseases 0.000 description 1
- 210000001772 blood platelet Anatomy 0.000 description 1
- 210000001185 bone marrow Anatomy 0.000 description 1
- 239000000872 buffer Substances 0.000 description 1
- 230000005907 cancer growth Effects 0.000 description 1
- 229940022399 cancer vaccine Drugs 0.000 description 1
- 238000009566 cancer vaccine Methods 0.000 description 1
- 239000002775 capsule Substances 0.000 description 1
- 230000003197 catalytic effect Effects 0.000 description 1
- 230000030833 cell death Effects 0.000 description 1
- 238000005119 centrifugation Methods 0.000 description 1
- 210000000038 chest Anatomy 0.000 description 1
- 208000013056 classic Hodgkin lymphoma Diseases 0.000 description 1
- 230000001010 compromised effect Effects 0.000 description 1
- 239000000356 contaminant Substances 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 230000000139 costimulatory effect Effects 0.000 description 1
- 108091008034 costimulatory receptors Proteins 0.000 description 1
- 210000001151 cytotoxic T lymphocyte Anatomy 0.000 description 1
- 235000013365 dairy product Nutrition 0.000 description 1
- 238000007418 data mining Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 230000002939 deleterious effect Effects 0.000 description 1
- 239000005549 deoxyribonucleoside Substances 0.000 description 1
- 230000029087 digestion Effects 0.000 description 1
- 239000003085 diluting agent Substances 0.000 description 1
- 238000010790 dilution Methods 0.000 description 1
- 239000012895 dilution Substances 0.000 description 1
- 230000005750 disease progression Effects 0.000 description 1
- 238000006073 displacement reaction Methods 0.000 description 1
- 230000002222 downregulating effect Effects 0.000 description 1
- 230000003828 downregulation Effects 0.000 description 1
- 238000001493 electron microscopy Methods 0.000 description 1
- 239000000839 emulsion Substances 0.000 description 1
- 210000002889 endothelial cell Anatomy 0.000 description 1
- 230000002255 enzymatic effect Effects 0.000 description 1
- 230000008995 epigenetic change Effects 0.000 description 1
- 230000001973 epigenetic effect Effects 0.000 description 1
- 230000004049 epigenetic modification Effects 0.000 description 1
- 210000003743 erythrocyte Anatomy 0.000 description 1
- 230000007717 exclusion Effects 0.000 description 1
- 210000001723 extracellular space Anatomy 0.000 description 1
- 108010091897 factor V Leiden Proteins 0.000 description 1
- 201000001386 familial hypercholesterolemia Diseases 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 102000052178 fibroblast growth factor receptor activity proteins Human genes 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 238000013100 final test Methods 0.000 description 1
- 238000004374 forensic analysis Methods 0.000 description 1
- 230000037433 frameshift Effects 0.000 description 1
- 230000002496 gastric effect Effects 0.000 description 1
- 239000000499 gel Substances 0.000 description 1
- 208000016361 genetic disease Diseases 0.000 description 1
- 230000037442 genomic alteration Effects 0.000 description 1
- 208000005017 glioblastoma Diseases 0.000 description 1
- 239000008187 granular material Substances 0.000 description 1
- 230000012010 growth Effects 0.000 description 1
- 201000005787 hematologic cancer Diseases 0.000 description 1
- 208000024200 hematopoietic and lymphoid system neoplasm Diseases 0.000 description 1
- 230000011132 hemopoiesis Effects 0.000 description 1
- 229920000140 heteropolymer Polymers 0.000 description 1
- 208000009624 holoprosencephaly Diseases 0.000 description 1
- 108091008039 hormone receptors Proteins 0.000 description 1
- 125000004435 hydrogen atom Chemical group [H]* 0.000 description 1
- 125000002887 hydroxy group Chemical group [H]O* 0.000 description 1
- 230000001900 immune effect Effects 0.000 description 1
- 230000001771 impaired effect Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000000338 in vitro Methods 0.000 description 1
- 238000001727 in vivo Methods 0.000 description 1
- 230000028709 inflammatory response Effects 0.000 description 1
- 239000003112 inhibitor Substances 0.000 description 1
- 230000005764 inhibitory process Effects 0.000 description 1
- 201000002313 intestinal cancer Diseases 0.000 description 1
- 230000010189 intracellular transport Effects 0.000 description 1
- 238000007918 intramuscular administration Methods 0.000 description 1
- 238000007912 intraperitoneal administration Methods 0.000 description 1
- 238000007913 intrathecal administration Methods 0.000 description 1
- 230000002601 intratumoral effect Effects 0.000 description 1
- 238000007915 intraurethral administration Methods 0.000 description 1
- 238000001990 intravenous administration Methods 0.000 description 1
- 238000011835 investigation Methods 0.000 description 1
- 210000003734 kidney Anatomy 0.000 description 1
- 230000003902 lesion Effects 0.000 description 1
- 210000000265 leukocyte Anatomy 0.000 description 1
- 238000007834 ligase chain reaction Methods 0.000 description 1
- 230000000670 limiting effect Effects 0.000 description 1
- 210000004185 liver Anatomy 0.000 description 1
- 230000004807 localization Effects 0.000 description 1
- 230000001926 lymphatic effect Effects 0.000 description 1
- 230000002934 lysing effect Effects 0.000 description 1
- 238000007403 mPCR Methods 0.000 description 1
- 238000007726 management method Methods 0.000 description 1
- 239000003550 marker Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 208000010658 metastatic prostate carcinoma Diseases 0.000 description 1
- 108091070501 miRNA Proteins 0.000 description 1
- 239000002679 microRNA Substances 0.000 description 1
- 238000002493 microarray Methods 0.000 description 1
- 230000003278 mimic effect Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 231100000310 mutation rate increase Toxicity 0.000 description 1
- 230000017074 necrotic cell death Effects 0.000 description 1
- 230000006855 networking Effects 0.000 description 1
- 201000002120 neuroendocrine carcinoma Diseases 0.000 description 1
- 201000004931 neurofibromatosis Diseases 0.000 description 1
- 239000002751 oligonucleotide probe Substances 0.000 description 1
- 229940037201 oris Drugs 0.000 description 1
- 230000002611 ovarian Effects 0.000 description 1
- 229940121653 pd-1/pd-l1 inhibitor Drugs 0.000 description 1
- 238000011338 personalized therapy Methods 0.000 description 1
- 239000008194 pharmaceutical composition Substances 0.000 description 1
- 239000012071 phase Substances 0.000 description 1
- RLZZZVKAURTHCP-UHFFFAOYSA-N phenanthrene-3,4-diol Chemical compound C1=CC=C2C3=C(O)C(O)=CC=C3C=CC2=C1 RLZZZVKAURTHCP-UHFFFAOYSA-N 0.000 description 1
- 238000013439 planning Methods 0.000 description 1
- 229910052697 platinum Inorganic materials 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 238000009258 post-therapy Methods 0.000 description 1
- 244000144977 poultry Species 0.000 description 1
- 238000002203 pretreatment Methods 0.000 description 1
- 210000002307 prostate Anatomy 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 238000011002 quantification Methods 0.000 description 1
- 238000001959 radiotherapy Methods 0.000 description 1
- 230000008707 rearrangement Effects 0.000 description 1
- 239000013074 reference sample Substances 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
- 239000002342 ribonucleoside Substances 0.000 description 1
- 150000003839 salts Chemical class 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 238000007480 sanger sequencing Methods 0.000 description 1
- 238000010845 search algorithm Methods 0.000 description 1
- 230000028327 secretion Effects 0.000 description 1
- 208000007056 sickle cell anemia Diseases 0.000 description 1
- 239000000377 silicon dioxide Substances 0.000 description 1
- 239000002689 soil Substances 0.000 description 1
- 239000007790 solid phase Substances 0.000 description 1
- 208000002320 spinal muscular atrophy Diseases 0.000 description 1
- 239000007921 spray Substances 0.000 description 1
- 238000010186 staining Methods 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 238000007920 subcutaneous administration Methods 0.000 description 1
- 239000000829 suppository Substances 0.000 description 1
- 210000004243 sweat Anatomy 0.000 description 1
- 208000011580 syndromic disease Diseases 0.000 description 1
- 208000020408 systemic-onset juvenile idiopathic arthritis Diseases 0.000 description 1
- 239000003826 tablet Substances 0.000 description 1
- 229940066453 tecentriq Drugs 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 201000005665 thrombophilia Diseases 0.000 description 1
- 210000003813 thumb Anatomy 0.000 description 1
- 230000000451 tissue damage Effects 0.000 description 1
- 231100000827 tissue damage Toxicity 0.000 description 1
- 238000013518 transcription Methods 0.000 description 1
- 230000035897 transcription Effects 0.000 description 1
- 230000005945 translocation Effects 0.000 description 1
- 238000012384 transportation and delivery Methods 0.000 description 1
- 238000013274 transthoracic needle biopsy Methods 0.000 description 1
- 210000004881 tumor cell Anatomy 0.000 description 1
- 210000003171 tumor-infiltrating lymphocyte Anatomy 0.000 description 1
- 238000011144 upstream manufacturing Methods 0.000 description 1
- 210000003932 urinary bladder Anatomy 0.000 description 1
- 201000003701 uterine corpus endometrial carcinoma Diseases 0.000 description 1
- 201000000866 velocardiofacial syndrome Diseases 0.000 description 1
- 238000005406 washing Methods 0.000 description 1
- 229940055760 yervoy Drugs 0.000 description 1
- XOOUIPVCVHRTMJ-UHFFFAOYSA-L zinc stearate Chemical compound [Zn+2].CCCCCCCCCCCCCCCCCC([O-])=O.CCCCCCCCCCCCCCCCCC([O-])=O XOOUIPVCVHRTMJ-UHFFFAOYSA-L 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
Definitions
- Repetitive nucleic acid elements are patterns of nucleotides (DNA or RNA) that occur in multiple copies throughout eukaryotic and prokaryotic genomes. Examples of such repetitive elements include microsatellites, short tandem repeats (STRs), and minisatellites, among others. Microsatellites typically include repeat units of less than 10 base pairs. STRs generally include repeat units of two to thirteen nucleotides that are often repeated hundreds of times in a given stretch of nuclear DNA. STR analysis is a common tool used in forensic analysis. Minisatellites are repetitive elements that typically have repeat units from about 10 to 60 base pairs.
- Microsatellites are highly polymorphic DNA-repeat regions.
- Microsatellite instability is a guideline-recommended biomarker used in assessment of prognosis and treatment choices, including checkpoint inhibitors recently approved for the treatment of cancers with MSI high (MSI-H) status.
- Plasma-based next generation DNA sequencing (NGS) tests are increasingly used for comprehensive genomic profiling of cancer; however, methods to detect MSI status from cell-free DNA (cfDNA) data are underdeveloped. Additionally, the impact of variable tumor shedding on MSI detection has not been previously evaluated.
- This application discloses methods, computer readable media, and systems that are useful in determining the microsatellite and/or other repetitive DNA instability status of cell-free DNA (cfDNA) samples from patients and which help guide disease prognosis and treatment decisions.
- cfDNA: cell-free DNA
- PCR: polymerase chain reaction
- The present disclosure provides a method of determining a repetitive nucleic acid instability status of a nucleic acid sample.
- The method includes (a) quantifying a number of different repeat lengths present at each of a plurality of repetitive nucleic acid loci from sequence information to generate a site score for each of the plurality of the repetitive nucleic acid loci.
- The sequence information is from a population of repetitive nucleic acid loci in the nucleic acid sample.
- The method also includes (b) calling a given repetitive nucleic acid locus as being unstable when the site score of the given repetitive nucleic acid locus exceeds a site specific trained threshold for the given repetitive nucleic acid locus to generate a repetitive nucleic acid instability score comprising a number of unstable repetitive nucleic acid loci from the plurality of the repetitive nucleic acid loci.
- The method also includes (c) classifying the repetitive nucleic acid instability status of the nucleic acid sample as being unstable when the repetitive nucleic acid instability score exceeds a population trained threshold for the population of repetitive nucleic acid loci in the nucleic acid sample, thereby determining the repetitive nucleic acid instability status of the nucleic acid sample.
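Steps (a) through (c) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the site score is taken to be simply the count of distinct repeat lengths observed at a locus, and the locus names, observed lengths, and threshold values are hypothetical. In practice the site specific and population thresholds would be trained rather than set by hand.

```python
from typing import Dict, List, Tuple


def site_score(repeat_lengths: List[int]) -> int:
    """Assumed site score: number of different repeat lengths seen at one locus."""
    return len(set(repeat_lengths))


def instability_score(
    lengths_by_locus: Dict[str, List[int]],
    site_thresholds: Dict[str, float],
) -> int:
    """Count loci whose site score exceeds that locus's own threshold."""
    return sum(
        1
        for locus, lengths in lengths_by_locus.items()
        if site_score(lengths) > site_thresholds[locus]
    )


def classify_sample(
    lengths_by_locus: Dict[str, List[int]],
    site_thresholds: Dict[str, float],
    population_threshold: float,
) -> Tuple[str, int]:
    """Return (status, instability score) for one sample."""
    score = instability_score(lengths_by_locus, site_thresholds)
    status = "unstable" if score > population_threshold else "stable"
    return status, score


# Hypothetical observations: repeat lengths (in repeat units) per locus.
observed = {
    "locus_A": [10, 10, 11, 12, 13],  # 4 distinct lengths
    "locus_B": [20, 20, 20],          # 1 distinct length
    "locus_C": [14, 15, 16, 17],      # 4 distinct lengths
}
# Hypothetical thresholds standing in for trained values.
thresholds = {"locus_A": 2, "locus_B": 2, "locus_C": 3}

status, score = classify_sample(observed, thresholds, population_threshold=1)
print(status, score)
```

Keeping the per-locus call separate from the sample-level classification mirrors the two thresholds in the text: one threshold per locus, and one for the population of loci as a whole.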
- The present disclosure provides a method of determining a repetitive DNA instability status of a sample (e.g., a cell-free DNA (cfDNA) sample).
- The method includes (a) quantifying a number of different repeat lengths present at each of a plurality of repetitive DNA loci from sequence information to generate a site score for each of the plurality of the repetitive DNA loci.
- The sequence information is from a population of repetitive DNA loci in the sample.
- The method also includes (b) comparing the site score of a given repetitive DNA locus to a site specific trained threshold for the given repetitive DNA locus for each of the plurality of the repetitive DNA loci.
- The method further includes (c) calling the given repetitive DNA locus as being unstable when the site score of the given repetitive DNA locus exceeds the site specific trained threshold for the given repetitive DNA locus to generate a repetitive DNA instability score comprising a number of unstable repetitive DNA loci from the plurality of the repetitive DNA loci.
- The method also includes (d) classifying the repetitive DNA instability status of the sample as being unstable when the repetitive DNA instability score exceeds a population trained threshold for the population of repetitive DNA loci in the sample, thereby determining the repetitive DNA instability status of the sample.
- The methods disclosed herein are typically at least partially computer implemented.
- The present disclosure provides a method of determining a microsatellite instability (MSI) status of a sample.
- The method includes (a) quantifying a number of different repeat lengths present at each of a plurality of microsatellite loci from sequence information to generate a site score for each of the plurality of the microsatellite loci, in which the sequence information is from a population of microsatellite loci in the sample.
- The method also includes (b) comparing the site score of a given microsatellite locus to a site specific trained threshold for the given microsatellite locus for each of the plurality of the microsatellite loci.
- The method further includes (c) calling the given microsatellite locus as being unstable when the site score of the given microsatellite locus exceeds the site specific trained threshold for the given microsatellite locus to generate a microsatellite instability score comprising a number of unstable microsatellite loci from the plurality of the microsatellite loci.
- The method also includes (d) classifying the MSI status of the sample as being unstable when the microsatellite instability score exceeds a population trained threshold for the population of microsatellite loci in the sample, thereby determining the MSI status of the sample.
- the present disclosure provides a method of determining a microsatellite instability (MSI) status of a sample.
- the method includes (a) receiving sequence information from a population of microsatellite loci in the sample, and (b) quantifying a number of different repeat lengths present at each of a plurality of the microsatellite loci from the sequence information to generate a site score for each of the plurality of the microsatellite loci.
- the method also includes (c) comparing the site score of a given microsatellite locus to a site specific trained threshold for the given microsatellite locus for each of the plurality of the microsatellite loci.
- the method further includes (d) calling the given microsatellite locus as being unstable when the site score of the given microsatellite locus exceeds the site specific trained threshold for the given microsatellite locus to generate a microsatellite instability score comprising a number of unstable microsatellite loci from the plurality of the microsatellite loci.
- the method also includes (e) classifying the MSI status of the sample as being unstable when the microsatellite instability score exceeds a population trained threshold for the population of microsatellite loci in the sample, thereby determining the MSI status of the sample.
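The locus-calling and sample-classification steps above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `classify_msi`, the locus names, and all numeric scores and thresholds are hypothetical, and the site scores are assumed to have been computed already.

```python
from typing import Dict, Tuple


def classify_msi(site_scores: Dict[str, float],
                 site_thresholds: Dict[str, float],
                 population_threshold: float) -> Tuple[int, str]:
    """Return (MSI score, MSI status) for one sample."""
    # A locus is called unstable when its site score exceeds
    # the site specific trained threshold for that locus.
    unstable_loci = [locus for locus, score in site_scores.items()
                     if score > site_thresholds[locus]]
    # The MSI score is the number of unstable loci.
    msi_score = len(unstable_loci)
    # The sample is unstable when the MSI score exceeds the
    # population trained threshold; otherwise it is stable.
    status = "unstable" if msi_score > population_threshold else "stable"
    return msi_score, status


# Hypothetical site scores and thresholds, for illustration only.
scores = {"locus_A": 12.4, "locus_B": 0.3, "locus_C": 7.9, "locus_D": -0.8}
thresholds = {"locus_A": 5.0, "locus_B": 5.0, "locus_C": 6.5, "locus_D": 4.0}
print(*classify_msi(scores, thresholds, population_threshold=1))  # prints: 2 unstable
```

A score equal to a threshold is treated as not exceeding it, which matches the "below or at the population trained threshold" wording used for stable samples.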
- the present disclosure provides a method of identifying one or more customized therapies for treating a disease in a subject.
- the method includes (a) quantifying a number of different repeat lengths present at each of a plurality of microsatellite loci from sequence information to generate a site score for each of the plurality of the microsatellite loci in which the sequence information is from a population of microsatellite loci in a sample.
- the method also includes (b) comparing the site score of a given microsatellite locus to a site specific trained threshold for the given microsatellite locus for each of the plurality of the microsatellite loci.
- the method further includes (c) calling the given microsatellite locus as being unstable when the site score of the given microsatellite locus exceeds the site specific trained threshold for the given microsatellite locus to generate a microsatellite instability score comprising a number of unstable microsatellite loci from the plurality of the microsatellite loci.
- the method also includes (d) classifying the MSI status of the sample as being unstable when the microsatellite instability score exceeds a population trained threshold for the population of microsatellite loci in the sample to identify an unstable sample.
- the method also includes (e) comparing the microsatellite instability status of the sample to one or more comparator results that are indexed with one or more therapies to identify one or more customized therapies for treating the disease in the subject.
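Step (e) can be pictured as a lookup in a table of comparator results keyed by MSI status. The sketch below assumes that a "substantial match" is an exact match of the status label, which is a simplification; the table contents and therapy names are hypothetical placeholders, not clinical recommendations.

```python
from typing import Dict, List

# Hypothetical comparator results indexed with placeholder therapy names.
COMPARATOR_RESULTS: Dict[str, List[str]] = {
    "unstable": ["hypothetical_therapy_1", "hypothetical_therapy_2"],
    "stable": [],
}


def identify_therapies(msi_status: str,
                       comparators: Dict[str, List[str]]) -> List[str]:
    """Return the therapies indexed with the matching comparator result."""
    # Assumption: a "substantial match" is an exact match of the status label.
    return list(comparators.get(msi_status, []))


print(identify_therapies("unstable", COMPARATOR_RESULTS))
```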
- the present disclosure provides a method of treating a disease in a subject.
- the method includes (a) quantifying a number of different repeat lengths present at each of a plurality of microsatellite loci from sequence information to generate a site score for each of the plurality of the microsatellite loci, wherein the sequence information is from a population of microsatellite loci in a sample.
- the method also includes (b) comparing the site score of a given microsatellite locus to a site specific trained threshold for the given microsatellite locus for each of the plurality of the microsatellite loci.
- the method further includes (c) calling the given microsatellite locus as being unstable when the site score of the given microsatellite locus exceeds the site specific trained threshold for the given microsatellite locus to generate a microsatellite instability score comprising a number of unstable microsatellite loci from the plurality of the microsatellite loci.
- the method also includes (d) classifying the MSI status of the sample as being unstable when the microsatellite instability score exceeds a population trained threshold for the population of microsatellite loci in the sample to identify an unstable sample.
- the method also includes (e) comparing the microsatellite instability status of the sample to one or more comparator results that are indexed with one or more therapies to identify one or more customized therapies for treating the disease in the subject.
- the method also includes (f) administering at least one of the identified customized therapies to the subject when there is a substantial match between the microsatellite instability status of the sample and the comparator results, thereby treating the disease in the subject.
- the present disclosure provides a method of treating a disease in a subject.
- the method includes administering one or more customized therapies to the subject, thereby treating the disease in the subject, in which the customized therapies have been identified by : (a) quantifying a number of different repeat lengths present at each of a plurality of microsatellite loci from sequence information to generate a site score for each of the plurality of the microsatellite loci, wherein the sequence information is from a population of microsatellite loci in a sample.
- the method also includes (b) comparing the site score of a given microsatellite locus to a site specific trained threshold for the given microsatellite locus for each of the plurality of the microsatellite loci.
- the method further includes (c) calling the given microsatellite locus as being unstable when the site score of the given microsatellite locus exceeds the site specific trained threshold for the given microsatellite locus to generate a microsatellite instability score comprising a number of unstable microsatellite loci from the plurality of the microsatellite loci.
- the method also includes (d) classifying the MSI status of the sample as being unstable when the microsatellite instability score exceeds a population trained threshold for the population of microsatellite loci in the sample to identify an unstable sample.
- the method further includes (e) comparing the microsatellite instability status of the sample to one or more comparator results that are indexed with one or more therapies.
- the method also includes (f) identifying one or more customized therapies for treating the disease in the subject when there is a substantial match between the microsatellite instability status of the sample and the comparator results.
- the site scores of the plurality of the microsatellite loci comprise likelihood scores.
- the likelihood scores comprise probabilistic log likelihood-based scores that discriminate biological signal derived from a number of nucleic acid fragments (in some embodiments - cfDNA fragments) of somatic origin in the sample from noise arising from post-sample collection artifacts in the sample.
- the methods include determining the probabilistic log likelihood-based score for an individual microsatellite locus in the sequence information from the sample using at least two parameters in which at least a first parameter comprises allele frequencies and at least a second parameter comprises at least one error mode.
- the allele frequencies comprise frequencies of nucleic acids comprising different repeat lengths in the sequence information from the sample.
- the at least one error mode comprises a random error mode and a strand specific error mode.
- the site scores of the plurality of the microsatellite loci comprise a difference between or ratio of: (a) a score measuring a support of observed sequences for a null hypothesis that the given microsatellite locus is stable, and (b) a score measuring a support of observed sequences for an alternate hypothesis that the given microsatellite locus is unstable.
- the site scores of the plurality of the microsatellite loci are generated using one or more of: a likelihood criterion, a log-likelihood criterion, a posterior probability criterion, an Akaike information criterion (AIC), a Bayesian information criterion, and/or the like.
- the site scores of the plurality of the microsatellite loci comprise Akaike Information Criterion (AIC)-based site scores that test for a presence of somatic indels at the plurality of the microsatellite loci.
- AIC Akaike Information Criterion
- the methods include estimating the parameters of the model using a maximum likelihood estimation (MLE).
- MLE maximum likelihood estimation
- the methods include determining the MLE using a Nelder-Mead algorithm.
- the methods include calculating a null hypothesis score (e.g., a score measuring a support of observed sequences for a null hypothesis that the given microsatellite locus is stable) of the model using the formula of:
- $\mathrm{AIC}_0 = k - \log(\Pr(\mathrm{obs} \mid \beta, \gamma))$
- the methods include calculating an alternate hypothesis score (e.g., a score measuring a support of observed sequences for an alternate hypothesis that the given microsatellite locus is unstable) of the model using the formula of:
- $\mathrm{AIC}_{\min} = \min_{\alpha} \left( k - \log(\Pr(\mathrm{obs} \mid \alpha, \beta, \gamma)) \right)$
- $\mathrm{AIC}_{\min}$ is the alternate hypothesis score, and $\min_{\alpha}$ denotes minimizing over all values of $\alpha$.
- k is the number of parameters used in the model
- Pr is the probability
- obs comprises repeat lengths of observed sequencing reads covering the given microsatellite locus, $\beta$ is at least one strand specific error parameter, $\gamma$ is at least one random error parameter, and $\alpha$ is at least one allele frequency, wherein $\alpha$ is a vector of allele frequencies such that the sum of the one or more $\alpha_i$ is equal to one.
- obs is a number of observed sequencing reads covering the given microsatellite locus
- the methods include detecting change in the model to determine site scores (i.e., dAIC) using the formula of:
- $\mathrm{dAIC} = \mathrm{AIC}_0 - \mathrm{AIC}_{\min}$.
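The dAIC site score can be illustrated with a deliberately simplified model. The sketch below rests on several assumptions that are not from the disclosure: only read-level errors of one repeat unit are modeled, with fixed rates `g_plus` and `g_minus`; the strand specific error parameter is omitted; the alternate hypothesis allows a single extra allele at frequency `a`; a plain grid search over `a` stands in for the Nelder-Mead maximum likelihood estimation; repeat lengths that the error model cannot explain receive a small floor probability `EPS`; and the parameter counts `k_null` and `k_alt` are illustrative.

```python
import math
from typing import Dict

EPS = 1e-9  # assumed floor probability for lengths outside the error model


def read_prob(observed: int, true_len: int, g_plus: float, g_minus: float) -> float:
    """Probability of observing a repeat length given the true allele length."""
    if observed == true_len:
        return 1.0 - g_plus - g_minus
    if observed == true_len + 1:
        return g_plus   # read-level error: one repeat unit longer
    if observed == true_len - 1:
        return g_minus  # read-level error: one repeat unit shorter
    return EPS


def log_likelihood(counts: Dict[int, int], ref_len: int, alt_len: int,
                   a: float, g_plus: float, g_minus: float) -> float:
    """log Pr(obs) for a mixture of the reference allele and one alternate allele."""
    total = 0.0
    for length, n in counts.items():
        p = ((1.0 - a) * read_prob(length, ref_len, g_plus, g_minus)
             + a * read_prob(length, alt_len, g_plus, g_minus))
        total += n * math.log(p)
    return total


def d_aic(counts: Dict[int, int], ref_len: int, g_plus: float = 0.01,
          g_minus: float = 0.01, k_null: int = 2, k_alt: int = 3,
          grid: int = 1000) -> float:
    """dAIC = AIC_0 - AIC_min, using AIC = k - log Pr(obs)."""
    null_ll = log_likelihood(counts, ref_len, ref_len, 0.0, g_plus, g_minus)
    aic_0 = k_null - null_ll
    aic_min = k_alt - null_ll  # alternate model evaluated at a = 0
    for alt_len in counts:
        if alt_len == ref_len:
            continue
        for i in range(grid + 1):  # grid search in place of Nelder-Mead
            ll = log_likelihood(counts, ref_len, alt_len, i / grid, g_plus, g_minus)
            aic_min = min(aic_min, k_alt - ll)
    return aic_0 - aic_min


# Hypothetical read counts by observed repeat length at one locus.
stable_locus = {10: 980, 9: 12, 11: 8}
unstable_locus = {10: 900, 9: 15, 8: 80, 7: 5}
print(d_aic(stable_locus, ref_len=10), d_aic(unstable_locus, ref_len=10))
```

In this toy setting the stable locus yields a dAIC near or below zero, because the extra allele frequency parameter does not pay for itself, while the locus with many reads two units shorter than the reference yields a large positive dAIC.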
- $\gamma$ (the random error parameter) comprises: (a) a rate of read-level errors where a microsatellite length observed within a sequencing read is one repeat unit longer than an expected microsatellite length for a strand of an originating nucleic acid molecule; and/or (b) a rate of read-level errors where a microsatellite length observed within a sequencing read is one repeat unit shorter than an expected microsatellite length for a strand of an originating nucleic acid molecule.
- $\beta$ (the strand specific error parameter) comprises: (a) a rate of strand-level errors where an expected microsatellite length of a sense strand is one repeat unit longer than an expected microsatellite length of an originating nucleic acid molecule; (b) a rate of strand-level errors where an expected microsatellite length of an antisense strand is one repeat unit longer than an expected microsatellite length of an originating nucleic acid molecule; (c) a rate of strand-level errors where an expected microsatellite length of a sense strand is one repeat unit shorter than an expected microsatellite length of an originating nucleic acid molecule; and/or (d) a rate of strand-level errors where an expected microsatellite length of an antisense strand is one repeat unit shorter than an expected microsatellite length of an originating nucleic acid molecule.
- the method includes calling the given microsatellite locus as being unstable when the site score of the given microsatellite
- an AIC-based site score is calculated using the formula of:
- $\mathrm{AIC}_0$ and $\mathrm{AIC}_{\min}$ are calculated using the above formula.
- a mutant allele fraction (MAF) of the sample is estimated.
- a tumor fraction of the sample (e.g., a cfDNA sample) is estimated.
- the tumor fraction comprises a maximum mutant allele fraction (MAF) of all somatic mutations identified in the nucleic acids in the sample (e.g. cfDNA sample).
- the tumor fraction is below about 0.05%, about 0.1%, about 0.2%, about 0.5%, about 1%, about 2%, about 3%, about 4%, about 5%, about 6%, about 7%, about 8%, about 9%, about 10%, about 11%, about 12%, about 13%, about 14%, or about 15% of all nucleic acids in the sample (e.g. cfDNA sample).
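Where the tumor fraction is taken as the maximum mutant allele fraction across the somatic mutations identified in a sample, the estimate reduces to a single maximum. The sketch below uses hypothetical variant labels and MAF values; the function name `estimate_tumor_fraction` is an illustrative assumption.

```python
from typing import Dict


def estimate_tumor_fraction(somatic_mafs: Dict[str, float]) -> float:
    """Tumor fraction as the maximum MAF over all identified somatic mutations."""
    # Returns 0.0 when no somatic mutations were identified (an assumption).
    return max(somatic_mafs.values(), default=0.0)


# Hypothetical somatic mutations and their mutant allele fractions.
mafs = {"variant_1": 0.012, "variant_2": 0.042, "variant_3": 0.003}
print(estimate_tumor_fraction(mafs))  # prints: 0.042
```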
- the plurality of microsatellite loci comprises all of the population of microsatellite loci, whereas in other embodiments, the plurality of microsatellite loci comprises a subset of the population of microsatellite loci.
- the methods include determining the site specific trained threshold and/or the population trained threshold from sequence information from a population of microsatellite loci in one or more training DNA samples.
- the training DNA samples comprise non-tumor cfDNA training samples and/or DNA from one or more tumor types.
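The text does not state how the trained thresholds are derived from the training samples. One plausible approach, shown here purely as an assumption, sets each site specific threshold to a high nearest-rank quantile of that locus's site scores across non-tumor training samples, and sets the population threshold to the largest number of unstable loci seen in any non-tumor training sample. All names and values are hypothetical.

```python
import math
from typing import Dict, List


def train_site_thresholds(normal_scores: Dict[str, List[float]],
                          quantile: float = 0.99) -> Dict[str, float]:
    """Per-locus threshold as a nearest-rank quantile of non-tumor site scores."""
    thresholds = {}
    for locus, scores in normal_scores.items():
        ordered = sorted(scores)
        rank = max(1, math.ceil(quantile * len(ordered)))  # 1-based nearest rank
        thresholds[locus] = ordered[rank - 1]
    return thresholds


def train_population_threshold(normal_samples: List[Dict[str, float]],
                               site_thresholds: Dict[str, float]) -> int:
    """Largest count of unstable loci observed in any non-tumor training sample."""
    return max(
        sum(score > site_thresholds[locus] for locus, score in sample.items())
        for sample in normal_samples
    )


# Hypothetical site scores for two loci across five non-tumor training samples.
normal_scores = {
    "locus_A": [-1.0, -0.5, 0.2, 0.4, 1.1],
    "locus_B": [-0.9, -0.2, 0.0, 0.3, 2.5],
}
site_thresholds = train_site_thresholds(normal_scores, quantile=0.8)
samples = [{locus: vals[i] for locus, vals in normal_scores.items()} for i in range(5)]
print(site_thresholds, train_population_threshold(samples, site_thresholds))
```

Choosing a lower quantile makes each locus easier to call unstable and pushes the population threshold up to compensate, so the two thresholds are best tuned together against the desired specificity.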
- the methods comprise a sensitivity of at least about 94% at a limit of detection (LOD) of about 0.1% to about 0.4% tumor fraction of nucleic acids in the sample.
- the methods comprise analytical specificity of at least about 99% for non-tumor DNA in the sample.
- the determined MSI status of the sample comprises at least about 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% concordance with a corresponding MSI status of the sample determined using a PCR-based MSI assessment technique across a tumor fraction range of about 1% to about 15%. In some of these embodiments, the concordance is 100%.
- the methods include classifying the MSI status of the sample as MSI-high (MSI-H) when the microsatellite instability score is greater than about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 15, about 20, about 30, about 40, about 50, or more unstable microsatellite loci from the plurality of the microsatellite loci.
- the methods include classifying the MSI status of the sample as MSI-high (MSI-H) when the number of unstable microsatellite loci comprises about 0.1%, about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 15%, about 20%, or about 25% of the plurality of the microsatellite loci.
- the number of different repeat lengths comprises a frequency of each different repeat length present at each of the plurality of microsatellite loci.
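As a small illustration of this quantification, the sketch below tallies the repeat lengths observed in reads covering one locus and converts the tallies to frequencies. It assumes the repeat length of each read has already been measured upstream; the function name and the example lengths are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def repeat_length_frequencies(observed_lengths: Iterable[int]) -> Dict[int, float]:
    """Frequency of each distinct repeat length observed at one locus."""
    counts = Counter(observed_lengths)
    total = sum(counts.values())
    if total == 0:
        return {}  # no reads cover the locus
    return {length: n / total for length, n in sorted(counts.items())}


# Hypothetical repeat lengths measured in eight reads at one locus.
freqs = repeat_length_frequencies([10, 10, 10, 9, 10, 11, 10, 10])
print(len(freqs), freqs)  # prints: 3 {9: 0.125, 10: 0.75, 11: 0.125}
```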
- the present disclosure includes methods of selecting customized therapies for treating disease in subjects, and/or methods of treating disease in subjects.
- the disease comprises a cancer comprising at least one tumor type selected from the group consisting of, but not limited to: biliary tract cancer, bladder cancer, transitional cell carcinoma, urothelial carcinoma, brain cancer, gliomas, astrocytomas, breast carcinoma, metaplastic carcinoma, cervical cancer, cervical squamous cell carcinoma, rectal cancer, colorectal carcinoma, colon cancer, hereditary nonpolyposis colorectal cancer, colorectal adenocarcinomas, gastrointestinal stromal tumors (GISTs), endometrial carcinoma, endometrial stromal sarcomas, esophageal cancer, esophageal squamous cell carcinoma, esophageal adenocarcinoma, ocular melanoma, uveal melanoma, gallbladder carcinomas
- the therapies comprise at least one immunotherapy (e.g., checkpoint inhibitor antibody, autologous cytotoxic T cells, personalized cancer vaccine, etc.).
- the immunotherapy comprises an antibody against PD-1, PD-2, PD-L1, PD-L2, CTLA-4, OX40, B7.1, B7He, LAG3, CD137, KIR, CCR5, CD27, CD40, or CD47.
- the immunotherapy comprises administration of a pro-inflammatory cytokine against at least one tumor type.
- the immunotherapy comprises administration of T cells against at least one tumor type.
- the methods include obtaining the sample from a subject.
- the sample is tissue, blood, plasma, serum, sputum, urine, semen, vaginal fluid, feces, synovial fluid, spinal fluid, saliva, and/or the like.
- the subject is a mammalian subject (e.g., a human subject).
- the sample is blood.
- the sample is plasma.
- the sample is serum.
- the sample comprises cell-free DNA (i.e., a cfDNA sample).
- the cfDNA sample comprises circulating tumor nucleic acids.
- the methods include receiving the sequence information generated from the sample in which the sequence information comprises sequencing reads from the population of microsatellite loci in the sample. In some embodiments, the methods include amplifying one or more segments of nucleic acids in the sample to generate at least one amplified nucleic acid. In certain embodiments, the methods include sequencing nucleic acids from the sample to generate the sequence information. In some embodiments, the sample can be a cfDNA sample. In these embodiments, the sequence information comprises cfDNA sequencing reads from the population of microsatellite loci in the cfDNA sample.
- the sequence information is obtained from targeted segments of nucleic acids in the sample in which the targeted segments are obtained by selectively enriching one or more regions from the nucleic acids in the sample prior to sequencing.
- the methods include amplifying the obtained targeted segments prior to sequencing. In these embodiments, the methods typically include attaching one or more adapters comprising molecular barcodes to the nucleic acids prior to amplification. In some embodiments, the methods include attaching one or more sample indexes via amplification prior to the sequencing.
- any nucleic acid sequencing technique is optionally used or adapted for use in performing the methods disclosed herein.
- the sequencing is optionally selected from targeted sequencing, intron sequencing, exome sequencing, whole genome sequencing, and/or the like.
- the sequencing is targeted sequencing.
- the methods include sequencing at least about 50, about 100, about 150, about 200, about 250, about 500, about 750, about 1,000, about 1,500, about 2,000, or more targeted genomic regions in the nucleic acids of the sample to generate the sequence information.
- the present disclosure provides a system, comprising a controller comprising, or capable of accessing, computer readable media comprising non-transitory computer-executable instructions which, when executed by at least one electronic processor perform at least: (a) receiving sequence information from a population of microsatellite loci in a sample; (b) quantifying a number of different repeat lengths present at each of a plurality of the microsatellite loci from the sequence information to generate a site score for each of the plurality of the microsatellite loci; (c) comparing the site score of a given microsatellite locus to a site specific trained threshold for the given microsatellite locus for each of the plurality of the microsatellite loci; (d) calling the given microsatellite locus as being unstable when the site score of the given microsatellite locus exceeds the site specific trained threshold for the given microsatellite locus to generate a microsatellite instability score comprising a number
- the system includes a nucleic acid sequencer operably connected to the controller, which nucleic acid sequencer is configured to provide the sequence information from the population of microsatellite loci in the sample.
- the nucleic acid sequencer is configured to perform pyrosequencing, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, sequencing-by-synthesis, sequencing-by-ligation, or sequencing-by-hybridization on the nucleic acids to generate sequencing reads.
- the system includes a sample preparation component operably connected to the controller, which sample preparation component is configured to prepare the sample (in some cases, cfDNA sample) to be sequenced by a nucleic acid sequencer.
- the sample preparation component is configured to selectively enrich regions from the nucleic acids in the sample.
- the sample preparation component is configured to attach one or more adapters comprising molecular barcodes to the nucleic acids.
- the system includes a nucleic acid amplification component operably connected to the controller, which nucleic acid amplification component is configured to amplify the DNA (in some cases, cfDNA).
- the nucleic acid amplification component is configured to amplify selectively enriched regions from the nucleic acids in the sample.
- the system includes a material transfer component operably connected to the controller, which material transfer component is configured to transfer one or more materials between a nucleic acid sequencer and a sample preparation component.
- the system includes a database operably connected to the controller, which database comprises one or more comparator results that are indexed with one or more therapies, and wherein the electronic processor further performs at least: (f) comparing the microsatellite instability status of the sample to one or more comparator results, wherein a substantial match between the microsatellite instability score and the comparator results indicates a predicted response to therapy for a subject.
- the present disclosure provides computer readable media comprising non-transitory computer-executable instructions which, when executed by at least one electronic processor perform at least: (a) receiving sequence information from a population of microsatellite loci in a sample; (b) quantifying a number of different repeat lengths present at each of a plurality of the microsatellite loci from the sequence information to generate a site score for each of the plurality of the microsatellite loci; (c) comparing the site score of a given microsatellite locus to a site specific trained threshold for the given microsatellite locus for each of the plurality of the microsatellite loci; (d) calling the given microsatellite locus as being unstable when the site score of the given microsatellite locus exceeds the site specific trained threshold for the given microsatellite locus to generate a microsatellite instability score comprising a number of unstable microsatellite loci from the plurality of the
- the site scores of the plurality of the microsatellite loci comprise likelihood scores.
- the likelihood scores comprise probabilistic log likelihood-based scores that discriminate biological signal derived from a number of nucleic acid fragments (in some embodiments - cfDNA fragments) of somatic origin in the sample from noise arising from post-sample collection artifacts in the sample.
- the probabilistic log likelihood-based score for an individual microsatellite locus in the sequence information from the sample is typically determined using at least two parameters, wherein at least a first parameter comprises allele frequencies and at least a second parameter comprises at least one error mode.
- the allele frequencies comprise frequencies of nucleic acids comprising different repeat lengths in the sequence information from the sample.
- the at least one error mode typically comprises a random error mode and a strand specific error mode.
- the site scores of the plurality of the microsatellite loci comprise a difference between or ratio of: (a) a score measuring a support of observed sequences for a null hypothesis that the given microsatellite locus is stable, and (b) a score measuring a support of observed sequences for an alternate hypothesis that the given microsatellite locus is unstable.
- the site scores of the plurality of the microsatellite loci are generated using one or more statistical model selection criteria, such as a likelihood criterion, a log-likelihood criterion, a posterior probability criterion, an Akaike information criterion (AIC), a Bayesian information criterion, and/or the like.
- AIC Akaike information criterion
- the site scores of the plurality of the microsatellite loci comprise Akaike Information Criterion (AIC)-based site scores that test for a presence of somatic indels at the plurality of the microsatellite loci.
- AIC Akaike Information Criterion
- a given AIC-based site score is calculated using the formula of:
- the parameters of the model are estimated using a maximum likelihood estimation (MLE).
- MLE is determined using a Nelder-Mead algorithm.
- a null hypothesis score of the model is calculated using the formula of:
- $\mathrm{AIC}_0 = k - \log(\Pr(\mathrm{obs} \mid \beta, \gamma))$
- $\mathrm{AIC}_0$ is the null hypothesis score
- k is the number of parameters used in the model
- Pr is the probability
- obs comprises repeat lengths of observed sequencing reads covering the given microsatellite locus
- $\beta$ is at least one strand specific error parameter
- $\gamma$ is at least one random error parameter.
- an alternate hypothesis score of the model is calculated using the formula of:
- $\mathrm{AIC}_{\min} = \min_{\alpha} \left( k - \log(\Pr(\mathrm{obs} \mid \alpha, \beta, \gamma)) \right)$
- k is the number of parameters used in the model
- Pr is the probability
- obs comprises repeat lengths of observed sequencing reads covering the given microsatellite locus
- $\beta$ is at least one strand specific error parameter
- $\gamma$ is at least one random error parameter
- $\alpha$ is at least one allele frequency, wherein $\alpha$ is a vector of allele frequencies such that the sum of the one or more $\alpha_i$ is equal to one.
- change in the model is typically detected to determine site scores using the formula of:
- $\mathrm{dAIC} = \mathrm{AIC}_0 - \mathrm{AIC}_{\min}$.
- $\gamma$ (the random error parameter) comprises: (a) a rate of read-level errors where a microsatellite length observed within a sequencing read is one repeat unit longer than an expected microsatellite length for a strand of an originating nucleic acid molecule; and/or (b) a rate of read-level errors where a microsatellite length observed within a sequencing read is one repeat unit shorter than an expected microsatellite length for a strand of an originating nucleic acid molecule.
- $\beta$ (the strand specific error parameter) comprises: (a) a rate of strand-level errors where an expected microsatellite length of a sense strand is one repeat unit longer than an expected microsatellite length of an originating nucleic acid molecule; (b) a rate of strand-level errors where an expected microsatellite length of an antisense strand is one repeat unit longer than an expected microsatellite length of an originating nucleic acid molecule; (c) a rate of strand-level errors where an expected microsatellite length of a sense strand is one repeat unit shorter than an expected microsatellite length of an originating nucleic acid molecule; and/or (d) a rate of strand-level errors where an expected microsatellite length of an antisense strand is one repeat unit shorter than an expected microsatellite length of an originating nucleic acid molecule.
- the given microsatellite locus is called as being unstable when the site score of the given microsatellite locus statistically exceeds the site specific trained threshold for the given microsatellite locus.
- a tumor fraction that comprises a maximum mutant allele fraction (MAF) of all somatic mutations identified in the nucleic acids in the sample is estimated.
- the tumor fraction is below about 0.05%, about 0.1%, about 0.2%, about 0.5%, about 1%, about 2%, about 3%, about 4%, about 5%, about 6%, about 7%, about 8%, about 9%, about 10%, about 11%, about 12%, about 13%, about 14%, or about 15% of all nucleic acids in the sample.
- the plurality of microsatellite loci comprises all of the population of microsatellite loci, whereas in other embodiments, the plurality of microsatellite loci comprises a subset of the population of microsatellite loci.
- the site specific trained threshold and/or the population trained threshold is determined from sequence information from a population of microsatellite loci in one or more training DNA samples.
- the MSI status of the sample is classified as MSI-high (MSI-H) when the microsatellite instability score is greater than about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 15, about 16, about 17, about 18, about 19, about 20, about 30, about 40, about 50, or more unstable microsatellite loci from the plurality of the microsatellite loci.
- the MSI status of the sample is classified as MSI-high (MSI-H) when the number of unstable microsatellite loci comprises about 0.1%, about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 15%, about 20%, or about 25% of the plurality of the microsatellite loci.
- the present disclosure provides a system, comprising a communication interface that obtains, over a communication network, sequencing information from one or more nucleic acids in a sample from the subject; and a computer in communication with the communication interface, wherein the computer comprises at least one computer processor and a computer readable medium comprising machine-executable code that, upon execution by at least one computer processor, implements a method comprising: (a) receiving sequence information from a population of microsatellite loci in a sample; (b) quantifying a number of different repeat lengths present at each of a plurality of the microsatellite loci from the sequence information to generate a site score for each of the plurality of the microsatellite loci; (c) comparing the site score of a given microsatellite locus to a site specific trained threshold for the given microsatellite locus for each of the plurality of the microsatellite loci; (d) calling the given microsatellite locus as being unstable when the
- the sequence information is provided by a nucleic acid sequencer.
- the nucleic acid sequencer performs pyrosequencing, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, sequencing-by-synthesis, sequencing-by-ligation, sequencing-by-hybridization, and/or another sequencing technique on the nucleic acids to generate sequencing reads.
- the nucleic acid sequencer uses a clonal single molecule array derived from the sequencing library to generate the sequencing reads.
- the nucleic acid sequencer comprises a chip having an array of microwells for sequencing the sequencing library to generate the sequencing reads.
- the computer readable medium of the systems disclosed herein typically includes a memory, a hard drive or a computer server.
- the communication network includes one or more computer servers capable of distributed computing.
- the distributed computing is cloud computing.
- the computer is located on a computer server that is remotely located from the nucleic acid sequencer.
- the systems disclosed herein include an electronic display in communication with the computer over a network, wherein the electronic display comprises a user interface for displaying results upon implementing (i) - (iv).
- the user interface is a graphical user interface (GUI) or web-based user interface.
- GUI graphical user interface
- the electronic display is in a personal computer.
- the electronic display is in an internet enabled computer.
- the internet enabled computer is located at a location remote from the computer.
- the computer readable medium comprises a memory, a hard drive or a computer server.
- the communication network comprises a telecommunication network, an internet, an extranet, or an intranet.
- the results of the systems and methods disclosed herein are used as an input to generate a report.
- the report may be in a paper or electronic format.
- the MSI score and/or MSI status obtained by the methods and systems disclosed herein can be displayed directly in such a report.
- diagnostic information or the one or more customized therapies based on the MSI status can be included in the report.
- the report is communicated to the subject (e.g. a patient) or health care provider.
- the methods, systems or computer readable media further comprise classifying the repetitive nucleic acid instability status of the nucleic acid sample as being stable if the repetitive nucleic acid instability score is below or at the population trained threshold for the population of repetitive nucleic acid loci in the nucleic acid sample.
- the methods, systems or computer readable media further comprise classifying the repetitive DNA instability status of the sample as being stable if the repetitive DNA instability score is below or at the population trained threshold for the population of repetitive DNA loci in the sample.
- the methods, systems or computer readable media further comprise classifying the microsatellite instability status of the sample as being stable if the microsatellite instability score is below or at the population trained threshold for the population of microsatellite loci in the sample.
- Figure 1 is a flow chart that schematically depicts exemplary method steps of determining microsatellite instability (MSI) status according to some embodiments of the invention.
- MSI microsatellite instability
- Figure 2 is a schematic diagram of an exemplary system suitable for use with certain embodiments of the invention.
- Figure 3 is a plot of limit of detection (LoD) for simulated samples (probability of detection (y-axis); mutant allele fraction (MAF) (x-axis)).
- Figure 4A is a plot of MSI score (y-axis) by flowcell (x-axis).
- Figure 4B is a plot of MSI score (y-axis) by reference sample (x-axis).
- Figure 5A (somatic max-MAF (y-axis); MSI score (x-axis)) and 5B (somatic max-MAF (y-axis); MSI score (x-axis)) are plots showing that tumor fraction does not correlate with MSI scores.
- Figures 6A and 6B are plots showing technical features of microsatellite detection.
- Figure 6A is a plot showing hierarchical clustering of Akaike Information Criterion scores for 99 candidate microsatellite loci from cfDNA sequencing results from 84 healthy donors. Loci with poor unique molecule coverage are shown in black, while loci with excessive technical artifact are shown in dark grey. Robust but consistent measurements of microsatellite repeat length, defining an informative site, are shown in light gray. Arrows indicate three Bethesda loci included in this study.
- Figure 6B is a plot showing observed error rate reduction associated with each component of Digital Sequencing.
- Figures 7A-7C are plots showing analytical validation of ctDNA MSI detection. Observed MSI detection rate was plotted by titration level (grey dots), and probit regression was used to determine the 95% limit of detection for 5 ng (Figure 7A) and 30 ng (Figure 7B) cfDNA inputs.
- Figure 7C is a plot showing sample-level MSI scores for 499 independent replicates of two microsatellite-stable (MSS) and two MSI-H contrived materials run across 499 separate sequencing runs. Dashed line indicates the sample-level threshold for MSI detection.
- Figure 8 is a plot showing precision studies using a contrived sample at three input levels (5, 10, and 30 ng) processed in triplicate within run and between runs. Each greyscale shade represents a different run.
- Figures 10A-10C show concordance data of ctDNA MSI status with tissue testing.
- Figure 10A is a plot showing sample-level MSI scores for 1137 cfDNA samples categorized by tissue test result and observed tumor fraction. Dashed line indicates the sample level threshold for MSI detection.
- Figure 10B is a plot showing concordance result categorized by tissue test methodology.
- Figure 10C is a table showing descriptive statistics for the evaluable unique patient cohort.
- Figures 11A-11C are plots showing ctDNA MSI landscape across 28,459 clinical samples.
- Figure 11A is a plot in which the positive axis reports ctDNA MSI prevalence across the 16 most prevalent tumor types in the sample set. The negative axis reports the tissue MSI prevalence across the same tumor types based on Craig et al. (52). The total number of samples is reported for each tumor type, with the number of MSI-H samples in parentheses.
- Figure 11B is a plot showing sample-level MSI scores by tumor type for tumor types with > 5 MSI-H samples. Dashed line indicates the sample-level threshold for MSI detection.
- Figure 11C is a plot showing frequency of individual microsatellite sites contributing to MSI-H samples by tumor type for tumor types with > 5 MSI-H samples.
- UCEC uterine corpus endometrial carcinoma
- STAD stomach adenocarcinoma
- COAD colon adenocarcinoma
- PRAD prostate adenocarcinoma
- COUP cancer of unknown primary
- BLCA bladder carcinoma
- CHCA cholangiocarcinoma
- HNSC head and neck squamous cell carcinoma
- LUSC lung squamous cell carcinoma
- BRST breast carcinoma
- PANC pancreatic adenocarcinoma
- LUNG lung cancer, not otherwise specified
- LIHC liver hepatocellular carcinoma
- KIRC kidney renal cell carcinoma
- LUAD lung adenocarcinoma.
- Figures 12A and 12B are plots showing tumor mutation burden by MSI status. Number of single nucleotide variants (SNVs) (Figure 12A) and indels (Figure 12B) detected per sample categorized by MSI status across 278 MSI-H and 28,181 MSS samples.
- Figures 13A-13E show clinical outcome data to immune checkpoint blockade (ICB) therapy in ctDNA MSI-H patients.
- Figure 13A is a swimmer plot of duration of pembrolizumab therapy in months.
- Figures 13B-13E show baseline (Figures 13B and 13C) and post-therapy (Figures 13D and 13E) CT (Figures 13B and 13D) and gastroendoscopy (Figures 13C and 13E) results.
- “about” or “approximately” as applied to one or more values or elements of interest refers to a value or element that is similar to a stated reference value or element.
- the term “about” or “approximately” refers to a range of values or elements that falls within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value or element unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value or element).
- Adapter refers to a short nucleic acid (e.g., less than about 500 nucleotides, less than about 100 nucleotides, or less than about 50 nucleotides in length) that is typically at least partially double-stranded and used to link to either or both ends of a given sample nucleic acid molecule.
- Adapters can include nucleic acid primer binding sites to permit amplification of a nucleic acid molecule flanked by adapters at both ends, and/or a sequencing primer binding site, including primer binding sites for sequencing applications, such as various next-generation sequencing (NGS) applications.
- Adapters can also include binding sites for capture probes, such as an oligonucleotide attached to a flow cell support or the like.
- Adapters can also include a nucleic acid tag as described herein. Nucleic acid tags are typically positioned relative to amplification primer and sequencing primer binding sites, such that a nucleic acid tag is included in amplicons and sequence reads of a given nucleic acid molecule.
- the same or different adapters can be linked to the respective ends of a nucleic acid molecule. In some embodiments, an adapter of the same sequence is linked to the respective ends of the nucleic acid molecule except that the nucleic acid tag differs.
- the adapter is a Y-shaped adapter in which one end is blunt ended or tailed as described herein, for joining to a nucleic acid molecule, which is also blunt ended or tailed with one or more complementary nucleotides.
- an adapter is a bell-shaped adapter that includes a blunt or tailed end for joining to a nucleic acid molecule to be analyzed.
- Other examples of adapters include T-tailed and C-tailed adapters.
- Administer: As used herein, “administer” or “administering” a therapeutic agent (e.g., an immunological therapeutic agent) to a subject means to give, apply or bring the composition into contact with the subject. Administration can be accomplished by any of a number of routes, including, for example, topical, oral, subcutaneous, intramuscular, intraperitoneal, intravenous, intrathecal and intradermal.
- Akaike Information Criterion: As used herein, “Akaike information criterion” or “AIC” refers to a criterion for selecting a statistical model from among a finite set of models and includes a penalty term for the number of parameters in the model. In some embodiments, the model with the lowest AIC is selected.
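For illustration, the standard form of the criterion is AIC = 2k - 2 ln(L), where k is the number of model parameters and L is the maximized likelihood. The sketch below computes it and picks the lowest-AIC candidate; the function names and the candidate dictionary layout are assumptions made for this example, not part of the disclosure.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Standard AIC: 2k - 2*ln(L). The 2k term is the parameter penalty."""
    return 2 * n_params - 2 * log_likelihood


def select_model(candidates: Dict[str, Tuple[float, int]]) -> str:
    """Return the name of the candidate with the lowest AIC.

    `candidates` maps a hypothetical model name to
    (maximized log-likelihood, number of parameters).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))
```

A model with a slightly better fit can still lose to a simpler one when the improvement in log-likelihood does not offset the penalty for extra parameters.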
- Allele Frequency refers to the relative frequency of an allele at a particular locus in a population or in a given subject. Allele frequency is typically expressed as a fraction or percentage.
- “amplify” or “amplification” in the context of nucleic acids refers to the production of multiple copies of a polynucleotide, or a portion of the polynucleotide, typically starting from a small amount of the polynucleotide (e.g., a single polynucleotide molecule), where the amplification products or amplicons are generally detectable. Amplification of polynucleotides encompasses a variety of chemical and enzymatic processes.
- Barcode: As used herein, “barcode” or “molecular barcode” in the context of nucleic acids refers to a nucleic acid molecule comprising a sequence that can serve as a molecular identifier. For example, individual “barcode” sequences are typically added to each DNA fragment during next-generation sequencing (NGS) library preparation so that each sequencing read can be identified and sorted before the final data analysis.
- Cancer Type refers to a type or subtype of cancer defined, e.g., by histopathology. Cancer type can be defined by any conventional criterion, such as on the basis of occurrence in a given tissue (e.g., blood cancers, central nervous system (CNS), brain cancers, lung cancers (small cell and non-small cell), skin cancers, nose cancers, throat cancers, liver cancers, bone cancers, lymphomas, pancreatic cancers, bowel cancers, rectal cancers, thyroid cancers, bladder cancers, kidney cancers, mouth cancers, stomach cancers, breast cancers, prostate cancers, ovarian cancers, lung cancers, intestinal cancers, soft tissue cancers, neuroendocrine cancers, gastroesophageal cancers, head and neck cancers, gynecological cancers, colorectal cancers, urothelial cancers, solid state cancers, heterogeneous
- Cell-free nucleic acid refers to nucleic acids not contained within or otherwise bound to a cell or, in some embodiments, nucleic acids naturally remaining in a sample following the removal of intact cells.
- Cell-free nucleic acids can include, for example, all non-encapsulated nucleic acids sourced from a bodily fluid (e.g., blood, plasma, serum, urine, cerebrospinal fluid (CSF), etc.) from a subject.
- Cell-free nucleic acids include DNA (cfDNA), RNA (cfRNA), and hybrids thereof, including genomic DNA, mitochondrial DNA, circulating DNA, siRNA, miRNA, circulating RNA (cRNA), tRNA, rRNA, small nucleolar RNA (snoRNA), Piwi-interacting RNA (piRNA), long non-coding RNA (long ncRNA), and/or fragments of any of these.
- Cell-free nucleic acids can be double-stranded, single-stranded, or a hybrid thereof.
- a cell-free nucleic acid can be released into bodily fluid through secretion or cell death processes, e.g., cellular necrosis, apoptosis, or the like.
- cell-free nucleic acids are released into bodily fluid from cancer cells, e.g., circulating tumor DNA (ctDNA). Others are released from healthy cells. CtDNA can be non-encapsulated tumor-derived fragmented DNA.
- Another example of cell-free nucleic acids is fetal DNA circulating freely in the maternal blood stream, also called cell-free fetal DNA (cffDNA).
- a cell-free nucleic acid can have one or more epigenetic modifications, for example, a cell-free nucleic acid can be acetylated, 5-methylated, ubiquitylated, phosphorylated, sumoylated, ribosylated, and/or citrullinated.
- Comparator result means a result or set of results to which a given test sample or test result can be compared to identify one or more likely properties of the test sample or result, and/or one or more possible prognostic outcomes and/or one or more customized therapies for the subject from whom the test sample was taken or otherwise derived. Comparator results are typically obtained from a set of reference samples (e.g., from subjects having the same disease or cancer type as the test subject and/or from subjects who are receiving, or who have received, the same therapy as the test subject). In certain embodiments, for example, a microsatellite instability status of the sample (e.g., a cfDNA test sample) is compared with comparator results to identify substantial matches between the microsatellite instability status of the cfDNA test sample and the microsatellite instability status determined for a set of reference samples.
- the microsatellite instability scores determined for the set of reference samples are typically indexed with one or more customized therapies. Thus, when a substantial match is identified, the corresponding customized therapies are thereby also identified as potential therapeutic pathways for the subject from whom the test sample was taken.
- control sample or“control DNA sample” refers to a sample of known composition and/or having known properties and/or known parameters (e.g., known tumor fraction, known coverage, known microsatellite instability score, and/or the like) that is analyzed along with or compared to test samples in order to evaluate the accuracy of an analytical procedure.
- Coverage refers to the number of nucleic acid molecules that represent a particular base position.
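As a minimal sketch of this definition, coverage at a base position can be counted from the aligned spans of the molecules. The half-open `(start, end)` interval convention and the function name are assumptions chosen for this example.

```python
from typing import Iterable, Tuple


def coverage_at(position: int, molecules: Iterable[Tuple[int, int]]) -> int:
    """Count molecules whose aligned span includes `position`.

    Each molecule is a hypothetical half-open interval (start, end),
    so a molecule covers positions start, start + 1, ..., end - 1.
    """
    return sum(1 for start, end in molecules if start <= position < end)
```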
- Customized therapy refers to a therapy that is associated with a desired therapeutic outcome for a subject or population of subjects selected based on a given criterion, e.g. having a given microsatellite instability status or being within a defined range of microsatellite instability scores.
- “deoxyribonucleic acid” or “DNA” refers to a natural or modified nucleotide which has a hydrogen group at the 2'-position of the sugar moiety.
- DNA typically includes a chain of nucleotides comprising four types of nucleotide bases: adenine (A), thymine (T), cytosine (C), and guanine (G).
- “ribonucleic acid” or “RNA” refers to a natural or modified nucleotide which has a hydroxyl group at the 2'-position of the sugar moiety.
- RNA typically includes a chain of nucleotides comprising four types of nucleotides: A, uracil (U), G, and C.
- nucleic acid sequencing data denotes any information or data that is indicative of the order and identity of the nucleotide bases (e.g., adenine, guanine, cytosine, and thymine or uracil) in a molecule (e.g., a whole genome, whole transcriptome, exome, oligonucleotide, polynucleotide, or fragment) of a nucleic acid such as DNA or RNA.
- sequence information obtained using all available varieties of techniques, platforms or technologies, including, but not limited to: capillary electrophoresis, microarrays, ligation-based systems, polymerase-based systems, hybridization-based systems, direct or indirect nucleotide identification systems, pyrosequencing, ion- or pH-based detection systems, and electronic signature-based systems.
- Immunotherapy refers to treatment with one or more agents that act to stimulate the immune system so as to kill or at least to inhibit growth of cancer cells, and preferably to reduce further growth of the cancer, reduce the size of the cancer and/or eliminate the cancer. Some such agents bind to a target present on cancer cells; some bind to a target present on immune cells and not on cancer cells; some bind to a target present on both cancer cells and immune cells. Such agents include, but are not limited to, checkpoint inhibitors and/or antibodies.
- Checkpoint inhibitors are inhibitors of pathways of the immune system that maintain self-tolerance and modulate the duration and amplitude of physiological immune responses in peripheral tissues to minimize collateral tissue damage (see, e.g., Pardoll, Nature Reviews Cancer 12, 252-264 (2012)).
- Exemplary agents include antibodies against any of PD-1, PD-2, PD-L1, PD-L2, CTLA-4, OX40, B7.1, B7He, LAG3, CD137, KIR, CCR5, CD27, CD40, or CD47.
- Other exemplary agents include proinflammatory cytokines, such as IL-1β, IL-6, and TNF-α.
- Other exemplary agents are T-cells activated against a tumor, such as T-cells activated by expressing a chimeric antigen targeting a tumor antigen recognized by the T-cell.
- Indel refers to a mutation that involves the insertion or deletion of one or more nucleotides in the genome of a subject.
- indexed refers to a first element (e.g., microsatellite instability score) linked to a second element (e.g., a given therapy).
- “instability status” or “instability score” in the context of repetitive nucleic acids refers to a measure or determination of whether a given repetitive nucleic acid locus or population of repetitive nucleic acid loci in one or more nucleic acid samples exhibit a level or degree of mutation (e.g., variable repeat length, etc.) above, at, or below a threshold level determined for that locus or population of loci.
- instability status and instability score are not interchangeable but are rather related concepts. The instability status is based on the instability score.
- Limit of Detection: As used herein, “limit of detection” or “LoD” means the smallest amount of a substance (e.g., a nucleic acid) in a sample that can be measured by a given assay or analytical approach.
- Maximum MAF: As used herein, “maximum MAF” or “max MAF” refers to the maximum MAF of all somatic variants in a sample.
- Microsatellite refers to a repetitive nucleic acid having repeat units of less than about 10 base pairs or nucleotides in length.
- Minisatellite refers to a repetitive nucleic acid having repeat units from about 10 to about 60 base pairs or nucleotides in length.
- mutant allele fraction refers to the fraction of nucleic acid molecules harboring an allelic alteration or mutation at a given genomic position. MAF is generally expressed as a fraction or a percentage. For example, an MAF is typically less than about 0.5, 0.1, 0.05, or 0.01 (i.e., less than about 50%, 10%, 5%, or 1%) of all somatic variants or alleles present at a given locus.
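The two quantities defined above (MAF at a position and max MAF across a sample) can be sketched as follows. The function names, the use of molecule counts as inputs, and the choice to return 0.0 when no somatic variants are present are assumptions made for this example.

```python
from typing import Iterable


def mutant_allele_fraction(mutant_molecules: int, total_molecules: int) -> float:
    """Fraction of molecules at a genomic position that carry the alteration."""
    if total_molecules <= 0:
        raise ValueError("total_molecules must be positive")
    return mutant_molecules / total_molecules


def max_maf(somatic_mafs: Iterable[float]) -> float:
    """Maximum MAF over all somatic variants in a sample.

    Returns 0.0 for a sample with no somatic variants (an assumption
    of this sketch, not something stated in the text).
    """
    return max(somatic_mafs, default=0.0)
```

For example, 5 mutant molecules out of 1,000 covering a position gives an MAF of 0.005, or 0.5%.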
- Mutation refers to a variation from a known reference sequence and includes mutations such as, for example, single nucleotide variants (SNVs), copy number variants or variations (CNVs)/aberrations, insertions or deletions (indels), gene fusions, transversions, translocations, frame shifts, duplications, repeat expansions, and epigenetic variants.
- a mutation can be a germline or somatic mutation.
- a reference sequence for purposes of comparison is a wildtype genomic sequence of the species of the subject providing a test sample, typically the human genome.
- Neoplasm: As used herein, the terms “neoplasm” and “tumor” are used interchangeably. They refer to abnormal growth of cells in a subject. A neoplasm or tumor can be benign, potentially malignant, or malignant. A malignant tumor is referred to as a cancer or a cancerous tumor.
- next generation sequencing refers to sequencing technologies having increased throughput as compared to traditional Sanger- and capillary electrophoresis-based approaches, for example, with the ability to generate hundreds of thousands of relatively small sequence reads at a time.
- next generation sequencing techniques include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization.
- nucleic acid tag refers to a short nucleic acid (e.g., less than about 500 nucleotides, about 100 nucleotides, about 50 nucleotides, or about 10 nucleotides in length), used to distinguish nucleic acids from different samples (e.g., representing a sample index), or different nucleic acid molecules in the same sample (e.g., representing a molecular barcode), of different types, or which have undergone different processing.
- the nucleic acid tag comprises a predetermined, fixed, non-random, random or semi-random oligonucleotide sequence.
- nucleic acid tags may be used to label different nucleic acid molecules or different nucleic acid samples or sub-samples.
- Nucleic acid tags can be single-stranded, double-stranded, or at least partially double-stranded. Nucleic acid tags optionally have the same length or varied lengths. Nucleic acid tags can also include double-stranded molecules having one or more blunt-ends, include 5’ or 3’ single-stranded regions (e.g., an overhang), and/or include one or more other single-stranded regions at other locations within a given molecule. Nucleic acid tags can be attached to one end or to both ends of the other nucleic acids (e.g., sample nucleic acids to be amplified and/or sequenced).
- Nucleic acid tags can be decoded to reveal information such as the sample of origin, form, or processing of a given nucleic acid.
- nucleic acid tags can also be used to enable pooling and/or parallel processing of multiple samples comprising nucleic acids bearing different molecular barcodes and/or sample indexes in which the nucleic acids are subsequently being deconvolved by detecting (e.g., reading) the nucleic acid tags.
- Nucleic acid tags can also be referred to as identifiers (e.g. molecular identifier, sample identifier).
- nucleic acid tags can be used as molecular barcodes (e.g., to distinguish between different molecules or amplicons of different parent molecules in the same sample or sub-sample). This includes, for example, uniquely tagging different nucleic acid molecules in a given sample, or non-uniquely tagging such molecules.
- tags (i.e., molecular barcodes) can be used in combination with endogenous sequence information (for example, start and/or stop positions where they map to a selected reference genome, a sub-sequence of one or both ends of a sequence, and/or length of a sequence).
- a sufficient number of different molecular barcodes are used such that there is a low probability (e.g., less than about a 10%, less than about a 5%, less than about a 1%, or less than about a 0.1% chance) that any two molecules may have the same endogenous sequence information (e.g., start and/or stop positions, subsequences of one or both ends of a sequence, and/or lengths) and also have the same molecular barcode.
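One way to picture how barcodes and endogenous sequence information work together is to group reads by the combination of both, so that reads sharing a key are treated as copies of one parent molecule. This is a simplified sketch: the read dictionary layout and field names (`id`, `start`, `end`, `barcode`) are hypothetical, and real pipelines typically also tolerate sequencing errors in the barcode.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_into_families(
    reads: Iterable[Dict],
) -> Dict[Tuple[int, int, str], List[str]]:
    """Group read ids by (start, end, barcode).

    Reads that share mapped start/stop positions AND the same barcode
    are placed in the same family; a difference in any of the three
    fields separates them.
    """
    families: Dict[Tuple[int, int, str], List[str]] = defaultdict(list)
    for read in reads:
        key = (read["start"], read["end"], read["barcode"])
        families[key].append(read["id"])
    return dict(families)
```

With enough distinct barcodes, two different parent molecules that happen to share start and stop positions are unlikely to also share a barcode, which is the low-collision property described above.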
- polynucleotide refers to a linear polymer of nucleosides (including deoxyribonucleosides, ribonucleosides, or analogs thereof) joined by internucleosidic linkages.
- a polynucleotide comprises at least three nucleosides. Oligonucleotides often range in size from a few monomeric units, e.g. 3-4, to hundreds of monomeric units.
- a polynucleotide is represented by a sequence of letters, such as “ATGCCTG,” it will be understood that the nucleotides are in 5’ -> 3’ order from left to right and that in the case of DNA, “A” denotes deoxyadenosine, “C” denotes deoxycytidine, “G” denotes deoxyguanosine, and “T” denotes deoxythymidine, unless otherwise noted.
- the letters A, C, G, and T may be used to refer to the bases themselves, to nucleosides, or to nucleotides comprising the bases, as is standard in the art.
- “population trained threshold” in the context of repetitive nucleic acids refers to a separately determined aggregate maximum number of unstable repetitive nucleic acid loci (e.g., a number of unstable microsatellite loci) expected to be observed in a training DNA sample (e.g., a non-tumor sample, a tumor sample, etc.) that includes those loci.
- a population trained threshold is typically used to characterize an experimentally determined repetitive nucleic acid instability score for a particular sample.
- Processing: As used herein, the terms “processing”, “calculating”, and “comparing” can be used interchangeably. In certain applications, the terms refer to determining a difference, e.g., a difference in number or sequence. For example, repetitive DNA instability score (e.g., microsatellite instability score), gene expression, copy number variation (CNV), indel, and/or single nucleotide variant (SNV) values or sequences can be processed.
- Reference Sequence: As used herein, “reference sequence” refers to a known sequence used for purposes of comparison with experimentally determined sequences. For example, a known sequence can be an entire genome, a chromosome, or any segment thereof.
- a reference sequence typically includes at least about 20, at least about 50, at least about 100, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, at least about 500, at least about 1000, or more nucleotides.
- a reference sequence can align with a single contiguous sequence of a genome or chromosome or can include non-contiguous segments that align with different regions of a genome or chromosome.
- Exemplary reference sequences include, for example, human genomes such as hg19 and hg38.
- repeat length in the context of repetitive nucleic acids refers to the number of repeat units present at a given repetitive nucleic acid locus. To illustrate, the following single-stranded nucleic acid strand has a repeat length of eight:
- repeat unit in the context of repetitive nucleic acids refers to the individual nucleotide pattern or motif (e.g., homopolymer or heteropolymer) that is repeated at a given repetitive nucleic acid locus.
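To make the relationship between repeat unit and repeat length concrete, the sketch below counts the longest run of consecutive copies of a repeat unit in a sequence. The function name and the example sequences used with it are hypothetical and are not the illustration from the original text.

```python
def repeat_length(sequence: str, repeat_unit: str) -> int:
    """Return the largest number of back-to-back copies of `repeat_unit`.

    Every start position is tried, so runs that begin at any offset
    in the sequence are found.
    """
    if not repeat_unit:
        raise ValueError("repeat_unit must be non-empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Extend the run while the next chunk is another copy of the unit.
        while sequence.startswith(repeat_unit, pos):
            run += 1
            pos += len(repeat_unit)
        best = max(best, run)
    return best
```

A dinucleotide unit such as "CA" repeated eight times between flanking bases yields a repeat length of eight, and a homopolymer is simply the case where the repeat unit is a single base.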
- Repetitive nucleic acid or “repetitive element” refers to a recurring pattern of nucleotides that is present in multiple copies throughout a given genome and/or a population of genomes. Repetitive nucleic acids include repetitive DNA and repetitive RNA.
- Non-limiting examples of repetitive nucleic acids include microsatellites, terminal repeats, tandem repeats, minisatellites, satellite DNA, interspersed repeats, transposable elements (e.g., DNA transposons, retrotransposons (e.g., LTR-retrotransposons (HERVs)), etc.), clustered regularly interspaced short palindromic repeats (CRISPR), direct repeats, inverted repeats, mirror repeats, and everted repeats.
- Repetitive Nucleic Acid Instability Score refers to an aggregate number of repetitive nucleic acid loci from a population of repetitive nucleic acid loci in a given sample that are called or otherwise determined to be unstable. This repetitive nucleic acid instability score is a sample-level score (or sample score) and is different from the site score, which is specific to the locus.
- sample means anything capable of being analyzed by the methods and/or systems disclosed herein.
- Sensitivity: As used herein, “sensitivity” means the probability of detecting the presence of a mutation at a given MAF and coverage.
- Sequencing: As used herein, “sequencing” refers to any of a number of technologies used to determine the sequence (e.g., the identity and order of monomer units) of a biomolecule, e.g., a nucleic acid such as DNA or RNA.
- Exemplary sequencing methods include, but are not limited to, targeted sequencing, single molecule real-time sequencing, exon or exome sequencing, intron sequencing, electron microscopy-based sequencing, panel sequencing, transistor-mediated sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, sequencing by hybridization, pyrosequencing, capillary electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, emulsion PCR, co-amplification at lower denaturation temperature-PCR (COLD-PCR), multiplex PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, and a combination thereof.
- sequence information in the context of a nucleic acid polymer means the order and identity of monomer units (e.g., nucleotides, etc.) in that polymer.
- Site Score refers to a measure of likelihood of presence of additional repeat lengths apart from germline repeat length at a given repetitive nucleic acid locus in a sample.
- a site score is determined for a given locus by calculating a delta Akaike information criterion (dAIC) for the locus.
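A minimal sketch of a delta AIC calculation is shown below. It assumes, purely for illustration, that the two models being compared are (a) a model in which only the germline repeat length is present and (b) a model that also allows additional repeat lengths, and that the delta is taken as AIC(a) minus AIC(b). The text does not specify the models or the sign convention, so both are assumptions of this example, as are the function names.

```python
from typing import Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Standard AIC: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(germline_only: Tuple[float, int],
              with_additional_lengths: Tuple[float, int]) -> float:
    """Hypothetical site score: AIC(germline-only) - AIC(additional lengths).

    Each argument is (maximized log-likelihood, number of parameters).
    Under this assumed sign convention, a larger positive value means the
    data are better explained when additional repeat lengths are allowed.
    """
    return aic(*germline_only) - aic(*with_additional_lengths)
```

Because AIC penalizes parameters, the richer model only wins when its gain in log-likelihood outweighs the cost of the extra repeat-length parameters.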
- Site Specific Trained Threshold refers to a separately determined maximum value of a site score for a given repetitive nucleic acid locus (e.g., a given microsatellite locus) such that this locus is stable.
- Somatic mutation means a mutation in the genome that occurs after conception. Somatic mutations can occur in any cell of the body except germ cells and accordingly, are not passed on to progeny.
- Specificity in the context of a diagnostic analysis or assay refers to the extent to which the analysis or assay detects an intended target analyte to the exclusion of other components of a given sample.
- Substantial Match means that at least a first value or element is at least approximately equal to at least a second value or element. In certain embodiments, for example, customized therapies are identified when there is at least a substantial or approximate match between a microsatellite instability score and a comparator result.
- Subject refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species, or other organism, such as a plant.
- a subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian or a human.
- Animals include farm animals (e.g., production cattle, dairy cattle, poultry, horses, pigs, and the like), sport animals, and companion animals (e.g., pets or support animals).
- a subject can be a healthy individual, an individual that has or is suspected of having a disease or a predisposition to the disease, or an individual that is in need of therapy or suspected of needing therapy.
- the terms “individual” or “patient” are intended to be interchangeable with “subject.”
- a subject can be an individual who has been diagnosed with having a cancer, is going to receive a cancer therapy, and/or has received at least one cancer therapy.
- the subject can be in remission of a cancer.
- the subject can be an individual who is diagnosed as having an autoimmune disease.
- the subject can be a female individual who is pregnant or who is planning on getting pregnant, who may have been diagnosed with or suspected of having a disease, e.g., a cancer or an autoimmune disease.
- Threshold refers to a separately determined value used to characterize or classify experimentally determined values.
- training DNA sample refers to a DNA sample used in estimating the site specific trained threshold and population trained threshold.
- the training DNA sample dataset comprises one or more training DNA samples.
- the training DNA samples comprise one or more normal DNA samples and/or tumor DNA samples.
- the training DNA samples comprise one or more samples with MSI-High and/or MSI-Low/MSS status.
- Tumor Fraction refers to the estimate of the fraction of nucleic acid molecules derived from a tumor in a given sample.
- the tumor fraction of a sample can be a measure derived from the max MAF of the sample or coverage of the sample or length of the cfDNA fragments in the sample or any other selected feature of the sample.
- the tumor fraction of a sample is equal to the max MAF of the sample.
- Unstable: As used herein, “unstable” or “instability” in the context of repetitive nucleic acids refers to a level of mutation (e.g., indels or the like) observed at a given repetitive nucleic acid locus or in a given population of repetitive nucleic acid loci in a nucleic acid sample (e.g., a cfDNA sample) that exceeds a threshold (e.g., a site specific trained threshold - locus level; a population trained threshold - sample level; or the like).
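Putting the locus-level definitions together, a locus can be called unstable when its site score exceeds its site specific trained threshold, and the sample-level instability score is then the number of loci called unstable. The sketch below assumes site scores and thresholds are supplied as dictionaries keyed by hypothetical locus names.

```python
from typing import Dict, List


def unstable_loci(site_scores: Dict[str, float],
                  site_thresholds: Dict[str, float]) -> List[str]:
    """Return loci whose site score exceeds their site specific threshold."""
    return [
        locus
        for locus, score in site_scores.items()
        if score > site_thresholds[locus]
    ]


def instability_score(site_scores: Dict[str, float],
                      site_thresholds: Dict[str, float]) -> int:
    """Sample-level score: the count of loci called unstable."""
    return len(unstable_loci(site_scores, site_thresholds))
```

A locus whose site score equals its threshold is not counted, since the threshold is defined as the maximum site score at which the locus is still stable.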
- Cancer encompasses a large group of genetic diseases with the common characteristics of abnormal cell growth and the potential to metastasize beyond the cells’ site of origin within the body.
- the underlying molecular basis of the disease are mutations and/or epigenetic changes that lead to a transformed cellular phenotype, whether those deleterious changes were acquired through heredity or have a somatic basis. To complicate matters, these molecular changes typically vary, not only among patients having the same type of cancer, but even within a given patient’s own tumor.
- MMR impaired DNA mismatch repair
- drugs such as pembrolizumab (Keytruda®), which is used to treat advanced melanoma, head and neck squamous cell carcinoma, non-small cell lung cancer (NSCLC), and classical Hodgkin lymphoma.
- This disclosure provides methods, computer readable media, and systems that are useful in determining and analyzing MSI in patient samples, especially cell-free DNA (cfDNA) samples.
- the MSI status determined using these methods and related aspects helps guide disease prognosis and treatment decisions.
- the results achieved with the methods and related aspects disclosed herein generally have a high degree of concordance with, for example, those obtained using more conventional PCR-based MSI assessment approaches.
- This application discloses various methods of accurately determining the microsatellite instability (MSI) status and/or other repetitive DNA instability status of samples (especially, cell-free DNA (cfDNA) samples).
- the methods of assessing MSI status include targeted sequencing of cfDNA, for example, using the Digital Sequencing platform from Guardant Health, Inc. (Redwood City, CA, USA), which allows broad coverage of simple repeats where microsatellite instability can occur across a wide range of cancer types.
- Digital Sequencing platform is an NGS panel of cancer-related genes utilizing high-quality sequencing of cell-free DNA (which could comprise circulating tumor DNA) isolated from a simple, non-invasive blood draw.
- FIG. 1 provides a flow chart that schematically depicts exemplary method steps of determining the MSI status according to some embodiments of the invention.
- method 100 includes quantifying a number of different repeat lengths present at each of a plurality of microsatellite loci from sequence information to generate a site score for each of the plurality of the microsatellite loci in step 110.
- the sequence information is typically obtained from a population of microsatellite loci in a cfDNA sample.
- the number of different repeat lengths present at a given microsatellite locus is quantified using a probabilistic log likelihood-based site score in some embodiments.
- other quantification approaches are also optionally utilized so long as they too accurately discriminate biological signal derived from relatively small numbers of cfDNA fragments of somatic origin from noise arising, for example, from technical or post-sample collection artifacts (e.g., amplification artifacts, sequencing artifacts, and the like) in samples.
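The quantification in step 110 starts from a tally of the repeat lengths seen in reads covering a locus. The following is a minimal sketch of that tally only, not the probabilistic site score itself. The function names, the toy reads, and the choice to measure the longest uninterrupted run of the motif are assumptions made for illustration.

```python
import re
from collections import Counter

def repeat_length(read: str, motif: str) -> int:
    """Return the longest uninterrupted run of `motif` in `read`, in repeat units."""
    runs = re.findall(f"(?:{re.escape(motif)})+", read)
    return max((len(run) // len(motif) for run in runs), default=0)

def repeat_length_histogram(reads, motif):
    """Count how many reads support each observed repeat length at one locus."""
    return Counter(repeat_length(read, motif) for read in reads)

# Hypothetical reads covering a mononucleotide (A)n locus with flanking sequence.
reads = [
    "GGC" + "A" * 12 + "TCG",
    "GGC" + "A" * 12 + "TCG",
    "GGC" + "A" * 11 + "TCG",
]
hist = repeat_length_histogram(reads, "A")
n_distinct_lengths = len(hist)  # number of different repeat lengths observed
```

A real pipeline would work from aligned reads and locus coordinates rather than raw strings, and would feed these counts into a site score that separates somatic signal from amplification and sequencing artifacts.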
- Method 100 also includes comparing the site score of a given microsatellite locus to a site specific trained threshold for that specific microsatellite locus in step 112.
- the experimentally determined site score of a particular locus and its corresponding site specific trained threshold are typically compared for each of the plurality of the microsatellite loci.
- the site specific trained threshold of a given locus is generally a predetermined value for that particular locus derived from a population of training DNA samples, such as a cohort of normal or non-tumor cfDNA samples.
- method 100 further includes calling a given microsatellite locus as being unstable when the site score (e.g., a likelihood score or the like) of that given microsatellite locus exceeds (e.g., is statistically greater than) the site specific trained threshold for that given microsatellite locus in step 114. Based upon these comparisons, a microsatellite instability score is generated, which includes the number of microsatellite loci called as being unstable from the plurality of the microsatellite loci (e.g., is an overall or aggregate MSI score for the sample).
- method 100 also includes classifying the MSI status of the cfDNA sample as being unstable when the microsatellite instability score exceeds a population trained threshold for the population of microsatellite loci in the cfDNA sample to thereby identify an unstable cfDNA sample (e.g., score or predict the sample as being MSI-High) in step 116.
- the MSI status of a sample is determined by the presence of a minimum number of unstable microsatellite loci in certain embodiments.
- the population trained threshold is generally a predetermined value derived from a population of training DNA samples, such as a cohort of normal or non-tumor cfDNA samples.
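Steps 112 through 116 can be sketched as two comparisons: each site score against its own site specific trained threshold, then the count of unstable loci against the population trained threshold. The locus names, scores, and threshold values below are hypothetical, and the strict "greater than" comparison is an assumption standing in for "exceeds".

```python
def call_unstable_loci(site_scores, site_thresholds):
    """Steps 112-114: return loci whose site score exceeds their own trained threshold."""
    return [
        locus
        for locus, score in site_scores.items()
        if score > site_thresholds[locus]
    ]

def classify_msi(site_scores, site_thresholds, population_threshold):
    """Step 116: count unstable loci and compare with the population trained threshold."""
    unstable = call_unstable_loci(site_scores, site_thresholds)
    msi_score = len(unstable)  # microsatellite instability score for the sample
    status = "MSI-High" if msi_score > population_threshold else "MSI-Low/MSS"
    return msi_score, status

# Hypothetical values for illustration only.
site_scores = {"locus_1": 12.0, "locus_2": 0.5, "locus_3": 8.0}
site_thresholds = {"locus_1": 5.0, "locus_2": 5.0, "locus_3": 5.0}
msi_score, status = classify_msi(site_scores, site_thresholds, population_threshold=1)
```

Keeping the locus-level call separate from the sample-level call mirrors the two kinds of threshold described above, so either can be retrained without touching the other.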
- thresholds are determined or otherwise derived from at least one training DNA sample dataset.
- a training DNA sample dataset typically includes from at least about 25 to at least about 30,000 or more training samples.
- the training DNA sample dataset includes about 50, 75, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,500, 5,000, 7,500, 10,000, 15,000, 20,000, 25,000, 50,000, 100,000, 1,000,000, or more training DNA samples.
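The text states that thresholds are derived from a training DNA sample dataset but does not say how. One plausible approach, shown here purely as an assumption, is to take a high percentile of the scores observed in normal training samples. The nearest-rank percentile, the 99th percentile default, and all function names are illustrative choices, not the disclosed procedure.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def train_site_thresholds(normal_scores_by_locus, q=99.0):
    """Assumed rule: one threshold per locus from normal-sample site scores."""
    return {
        locus: percentile(scores, q)
        for locus, scores in normal_scores_by_locus.items()
    }

def train_population_threshold(unstable_counts_in_normals, q=99.0):
    """Assumed rule: sample-level threshold from unstable-locus counts in normal samples."""
    return percentile(unstable_counts_in_normals, q)

# Hypothetical training data for illustration only.
example_thresholds = train_site_thresholds({"locus_a": [0.1, 0.4, 0.2]}, q=100)
```

With a percentile rule, larger training datasets give a more stable estimate of the tail, which is consistent with the wide range of dataset sizes listed above.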
- method 100 includes additional upstream and/or downstream steps.
- method 100 starts in step 102 with providing the sample from the subject in step 104 (e.g., providing a blood sample taken from the subject).
- the workflow of method 100 also typically includes amplifying nucleic acids in the sample to generate amplified nucleic acids in step 106 and sequencing the amplified nucleic acids to generate sequence information in step 108, before quantifying the number of different repeat lengths present at each of a plurality of microsatellite loci from the sequence information in step 110. Nucleic acid amplification (including related sample preparation), nucleic acid sequencing, and related data analysis are described further herein.
- method 100 includes various steps that are downstream from the identification of unstable cfDNA samples in step 116. Some examples of these include comparing the microsatellite instability status of a cfDNA sample to comparator results that are indexed with therapies to identify customized therapies for treating the disease (e.g., cancer or another genetic-based disease, disorder, or condition) in the subject in step 118. In other exemplary embodiments, method 100 also includes administering at least one of the identified customized therapies to the subject when there is a substantial match between the microsatellite instability status of the sample and the comparator results in step 120 before ending in step 122 (e.g., to treat cancer or another disease, disorder, or condition of the subject).
- site scores of microsatellite loci optionally include likelihood scores.
- likelihood scores include probabilistic log likelihood-based scores.
- the methods include determining the probabilistic log likelihood-based score for an individual microsatellite locus in sequence information obtained from a sample using various parameters, such as allele frequencies and one or more error modes (e.g., random error modes, strand specific error mode, and/or the like). Allele frequencies generally include observed frequencies of nucleic acids having different repeat lengths at a given microsatellite locus in sequence information obtained from the sample.
- a site score of a particular microsatellite locus includes a difference between or a ratio of: (a) a score measuring a support of observed nucleic acid sequences for a null hypothesis that the given microsatellite locus is stable, and (b) a score measuring a support of observed nucleic acid sequences for an alternate hypothesis that the given microsatellite locus is unstable.
- The null hypothesis is the hypothesis with the minimum AIC score among all hypotheses that assume the site is stable, and the alternate hypothesis is the hypothesis with the minimum AIC score among all hypotheses that assume the site is unstable.
- site scores are generated using various measures of model accuracy, such as a likelihood criterion, a log-likelihood criterion, a posterior probability criterion, an Akaike information criterion (AIC), a Bayesian information criterion, and/or the like. Additional details regarding statistical modeling, including measures of statistical model accuracy, that are optionally adapted for use in performing the methods disclosed herein are provided in, for example, Bruce, Practical Statistics for Data Scientists: 50 Essential Concepts, 1st Ed., O'Reilly Media (2017), Freedman et al., Statistics, 4th Ed., W. W.
- a site score of a given microsatellite locus optionally includes an AIC-based site score that tests for a presence of somatic indels at that microsatellite locus.
- a given AIC-based site score is calculated using the formula of:
- the methods include estimating the parameters of the model using a maximum likelihood estimation (MLE) (e.g., using a Nelder-Mead algorithm or another simplex search algorithm).
- the methods optionally include calculating a null hypothesis of the model using the formula of:
- $\mathrm{AIC}_0 = k - \log\left(\Pr(\mathrm{obs} \mid \beta, \gamma)\right)$
- the methods include calculating an alternate hypothesis of the model using the formula of:
- $\mathrm{AIC}_{\min} = \min_{\alpha}\left(k - \log\left(\Pr(\mathrm{obs} \mid \alpha, \beta, \gamma)\right)\right)$
- $\mathrm{AIC}_{\min}$ is the alternate hypothesis, and $\min_{\alpha}$ denotes the effect of minimizing over all values of $\alpha$.
- $k$ is the number of parameters used in the model
- $\Pr$ is probability
- obs is a number of observed sequencing reads covering the given microsatellite locus
- $\beta$ is at least one strand specific error parameter
- $\gamma$ is at least one random error parameter
- $\alpha$ is at least one allele frequency, wherein $\alpha$ is a vector of allele frequencies such that the sum of one or more $\alpha_i$ is equal to one.
- Changes in the model used to determine site scores ($\Delta\mathrm{AIC}$) are typically detected using the formula of:
- $\Delta\mathrm{AIC} = \mathrm{AIC}_0 - \mathrm{AIC}_{\min}$.
- the parameter $\gamma$ includes (a) a rate of read-level errors where a microsatellite length observed within a sequencing read is one repeat unit longer than an expected microsatellite length for a strand of an originating nucleic acid molecule, and/or (b) a rate of read-level errors where a microsatellite length observed within a sequencing read is one repeat unit shorter than an expected microsatellite length for a strand of an originating nucleic acid molecule.
- the parameter $\beta$ includes (a) a rate of strand-level errors where an expected microsatellite length of a sense strand is one repeat unit longer than an expected microsatellite length of an originating nucleic acid molecule, (b) a rate of strand-level errors where an expected microsatellite length of an antisense strand is one repeat unit longer than an expected microsatellite length of an originating nucleic acid molecule, (c) a rate of strand-level errors where an expected microsatellite length of a sense strand is one repeat unit shorter than an expected microsatellite length of an originating nucleic acid molecule, and/or (d) a rate of strand-level errors where an expected microsatellite length of an antisense strand is one repeat unit shorter than an expected microsatellite length of an originating nucleic acid molecule.
- an AIC-based site score is calculated using the formula of:
- $\mathrm{AIC}_0$ and $\mathrm{AIC}_{\min}$ are calculated using the above formulas.
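The difference between the null and alternate AIC values can be illustrated with a deliberately simplified model. The sketch below is an assumption-heavy toy, not the disclosed model: it keeps only two read-level error rates (one unit shorter, one unit longer), omits the strand specific error parameters, considers a single candidate somatic allele that is one repeat unit shorter than the reference, and replaces Nelder-Mead estimation with a plain grid search over the allele frequency. It follows the k minus log-probability form of the formulas above. All names and numbers are hypothetical.

```python
import math

def log_likelihood(counts, alpha, g_short, g_long):
    """Log-likelihood of observed repeat-length offsets (multinomial constant omitted).

    counts: {observed offset in repeat units: number of reads}
    alpha:  {true offset in repeat units: allele frequency}, summing to one
    g_short, g_long: assumed read-level error rates for reads one unit
                     shorter or longer than the originating molecule
    """
    err = {-1: g_short, 0: 1.0 - g_short - g_long, 1: g_long}
    total = 0.0
    for observed, n_reads in counts.items():
        p = sum(freq * err.get(observed - true, 0.0) for true, freq in alpha.items())
        total += n_reads * math.log(max(p, 1e-12))  # floor avoids log(0)
    return total

def delta_aic(counts, g_short=0.02, g_long=0.01, grid=200):
    """Return AIC_0 - AIC_min for a one-unit-deletion alternate hypothesis."""
    k_null, k_alt = 2, 3  # two error parameters; the alternate adds one allele frequency
    aic_0 = k_null - log_likelihood(counts, {0: 1.0}, g_short, g_long)
    aic_min = min(
        k_alt - log_likelihood(counts, {0: 1.0 - a, -1: a}, g_short, g_long)
        for a in (i / grid for i in range(1, grid))  # grid search over allele frequency
    )
    return aic_0 - aic_min

# Hypothetical read counts keyed by offset from the reference repeat length.
stable = {0: 970, -1: 20, 1: 10}    # matches the assumed error rates exactly
shifted = {0: 870, -1: 120, 1: 10}  # excess of reads one unit shorter
```

In this toy, `delta_aic(stable)` is negative because the extra parameter is penalized without improving the fit, while `delta_aic(shifted)` is positive because a deleted allele explains the excess of shorter reads. The resulting value would then be compared with the site specific trained threshold for that locus.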
- samples analyzed using the methods described herein typically include various mutant allele fractions (MAFs) (e.g., sample fractions exhibiting different repeat lengths at a specific microsatellite locus or other allelic alterations).
- samples include a tumor fraction in some embodiments.
- the maximum MAF serves as an approximation of the tumor fraction in a given sample.
- the tumor fraction is typically below about 0.05%, about 0.1%, about 0.2%, about 0.5%, about 1%, about 2%, about 3%, about 4%, about 5%, about 6%, about 7%, about 8%, about 9%, about 10%, about 11%, about 12%, about 13%, about 14%, or about 15% of all nucleic acids in the sample.
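The approximation of tumor fraction by the maximum MAF reduces to a single maximum over the per-variant allele fractions. A minimal sketch, assuming MAFs are supplied as fractions between 0 and 1 and that an empty list should yield 0. The function name and example values are hypothetical.

```python
def estimate_tumor_fraction(mafs):
    """Approximate tumor fraction as the maximum mutant allele fraction (0.0 if none)."""
    return max(mafs, default=0.0)

# Hypothetical per-variant MAFs for one sample, expressed as fractions.
sample_mafs = [0.002, 0.014, 0.009]
tumor_fraction = estimate_tumor_fraction(sample_mafs)
```

As the definition of tumor fraction notes, coverage or cfDNA fragment length could be used instead. The maximum MAF is simply the most direct of the listed options.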
- the methods disclosed herein typically include a sensitivity of at least about 94% at a limit of detection (LOD) of about a 0.2% tumor fraction of nucleic acids in a given sample.
- the methods also generally have a specificity of at least about 99% for non-tumor DNA in the sample.
- the determined MSI status of a sample also typically has at least about 95%, 96%, 97%, 98%, or 99% concordance with a corresponding MSI status of the sample determined using a standard PCR-based MSI assessment technique across a tumor fraction range of about 1.4% to about 15%. In some embodiments, this concordance is 100%.
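Sensitivity, specificity, and concordance figures of this kind are computed from paired calls on the same samples. The sketch below shows the arithmetic only. The call labels, the example data, and the treatment of the PCR-based result as the reference are assumptions for illustration.

```python
def concordance(test_calls, reference_calls):
    """Fraction of samples where the two methods give the same MSI status."""
    if not test_calls or len(test_calls) != len(reference_calls):
        raise ValueError("call lists must be non-empty and of equal length")
    agree = sum(t == r for t, r in zip(test_calls, reference_calls))
    return agree / len(test_calls)

def sensitivity_specificity(test_calls, reference_calls, positive="MSI-High"):
    """Sensitivity and specificity of test calls against reference calls."""
    pairs = list(zip(test_calls, reference_calls))
    tp = sum(t == positive and r == positive for t, r in pairs)
    fn = sum(t != positive and r == positive for t, r in pairs)
    tn = sum(t != positive and r != positive for t, r in pairs)
    fp = sum(t == positive and r != positive for t, r in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both positive and negative samples")
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical paired calls: cfDNA-based method versus a PCR-based reference.
cfdna_calls = ["MSI-High", "MSI-Low/MSS", "MSI-High", "MSI-Low/MSS"]
pcr_calls = ["MSI-High", "MSI-Low/MSS", "MSI-Low/MSS", "MSI-Low/MSS"]
agreement = concordance(cfdna_calls, pcr_calls)
sens, spec = sensitivity_specificity(cfdna_calls, pcr_calls)
```

Reporting these values per tumor fraction range, as the text does, only requires filtering the paired calls by each sample's estimated tumor fraction before calling the functions.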
- the MSI status of a particular sample is classified as MSI-high (MSI-H) when the microsatellite instability score for the sample is greater than about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 30, about 40, about 50, about 60, about 70, about 80, about 90, about 100 or more than 100 unstable microsatellite loci in that sample.
- the population trained threshold used to determine the instability status (e.g., MSI status) of a sample is about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 30, about 40, about 50, about 60, about 70, about 80, about 90, about 100 or more than 100 unstable repetitive nucleic acid (e.g. microsatellite) loci.
- the population trained threshold for the sample is about 5 unstable microsatellite loci.
- the population trained threshold for the sample is about 6 unstable repetitive nucleic acid loci.
- the population trained threshold for the sample is about 10 unstable repetitive nucleic acid loci.
- the population trained threshold for the sample is about 15 unstable repetitive nucleic acid loci. In some embodiments, the population trained threshold for the sample is about 16 unstable repetitive nucleic acid loci. In some embodiments, the population trained threshold for the sample is about 20 unstable repetitive nucleic acid loci. In some embodiments, the population trained threshold for the sample is about 25 unstable repetitive nucleic acid loci. In some embodiments, the population trained threshold for the sample is about 26 unstable repetitive nucleic acid loci. In some embodiments, the population trained threshold for the sample is about 30 unstable repetitive nucleic acid loci. In some embodiments, the population trained threshold for the sample is about 35 unstable repetitive nucleic acid loci. In some embodiments, the population trained threshold for the sample is about 36 unstable repetitive nucleic acid loci.
- the population trained threshold for the sample is about 40 unstable repetitive nucleic acid loci. In some embodiments, the population trained threshold for the sample is about 45 unstable repetitive nucleic acid loci. In some embodiments, the population trained threshold for the sample is about 46 unstable repetitive nucleic acid loci. In some embodiments, the population trained threshold for the sample is about 50 unstable repetitive nucleic acid loci. In some embodiments, the repetitive nucleic acid loci can be microsatellite loci.
- the MSI status of a given sample is classified as MSI-H when the number of unstable microsatellite loci comprises about 0.1%, about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 15%, about 20%, or about 25% of all microsatellite loci evaluated in that sample.
- microsatellite loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 50 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 60 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 70 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 80 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 90 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 100 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 200 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 300 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 400 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 500 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 1000 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 1100 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 1200 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample. In some embodiments, about 1300 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample. In some embodiments, about 1400 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample. In some embodiments, at least 1500 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- about 1600 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample. In some embodiments, at least 1700 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample. In some embodiments, at least 1800 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample. In some embodiments, at least 1900 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- At least 2000 repetitive nucleic acid loci are used in determining the repetitive nucleic acid instability (e.g. MSI) status of a given sample.
- the repetitive nucleic acid loci can be microsatellite loci.
- the repetitive nucleic acid instability status can be MSI status
- the methods include obtaining the sample from a subject.
- a sample type is optionally utilized.
- the sample is tissue, blood, plasma, serum, sputum, urine, semen, vaginal fluid, feces, synovial fluid, spinal fluid, saliva, and/or the like. Additional exemplary sample types that are optionally utilized are described further herein.
- the subject is a mammalian subject (e.g., a human subject).
- any type of nucleic acid (e.g., DNA and/or RNA)
- cell-free nucleic acids (e.g., cfDNA of tumor origin, fetal origin, maternal origin, and/or the like)
- cellular nucleic acids including circulating tumor cells (e.g., obtained by lysing intact cells in a sample), circulating tumor nucleic acids, and the like.
- the sample comprises cell-free DNA (cfDNA sample).
- the cfDNA sample comprises circulating tumor nucleic acids.
- the methods disclosed in this application generally include obtaining sequence information from nucleic acids in samples taken from subjects.
- the sequence information is obtained from targeted segments of the nucleic acids.
- the targeted segments can include at least 10, at least 50, at least 100, at least 500, at least 1000, at least 2000, at least 5000, at least 10,000, at least 20,000 or at least 50,000 (e.g., 25, 50, 75, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 25,000, 30,000, 35,000, 40,000, 45,000) different or overlapping genomic regions.
- the targeted segments comprise selected regions of at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, at least 200, at least 300, at least 400, at least 500, at least 600 or at least 700 genes. In some embodiments, the targeted segments comprise selected regions of at least 70 genes. In some embodiments, the targeted segments comprise regions of at least 500 genes.
- the methods also typically include various sample or library preparation steps to prepare nucleic acids for sequencing.
- sample preparation techniques are well-known to persons skilled in the art. Essentially any of those techniques are used, or adapted for use, in performing the methods described herein.
- typical steps to prepare nucleic acids for sequencing include tagging nucleic acids with molecular identifiers or barcodes, adding adapters (e.g., which may include the barcodes), amplifying the nucleic acids one or more times, enriching for targeted segments of the nucleic acids (e.g., using various target capturing strategies, etc.), and/or the like.
- nucleic acid sample/library preparation is described further herein. Additional details regarding nucleic acid sample/library preparation are also described in, for example, van Dijk et al., Library preparation methods for next-generation sequencing: Tone down the bias, Experimental Cell Research, 322(1): 12-20 (2014), Micic (Ed.), Sample Preparation Techniques for Soil, Plant, and Animal Samples (Springer Protocols Handbooks), 1st Ed., Humana Press (2016), and Chiu, Next-Generation Sequencing and Sequence Data Analysis, Bentham Science Publishers (2016), which are each incorporated by reference in their entirety.
- Microsatellite and/or other repetitive nucleic acid instability status determined by the methods disclosed herein are optionally used to diagnose the presence of a disease or condition, particularly cancer, in a subject, to characterize such a disease or condition (e.g., to stage a given cancer, to determine the heterogeneity of a cancer, and the like), to monitor response to treatment, to evaluate the potential risk of developing a given disease or condition, and/or to assess the prognosis of the disease or condition.
- Microsatellite and/or other repetitive nucleic acid instability status are also optionally used for characterizing a specific form of cancer.
- microsatellite and/or other repetitive nucleic acid instability status data may allow for the characterization of specific sub-types of cancer to thereby assist with diagnosis and treatment selection.
- This information may also provide a subject or healthcare practitioner with clues regarding the prognosis of a specific type of cancer, and enable a subject and/or healthcare practitioner to adapt treatment options in accordance with the progress of the disease.
- Some cancers become more aggressive and genetically unstable as they progress. Other tumors remain benign, inactive or dormant.
- Microsatellite and/or other repetitive nucleic acid instability status can also be useful in determining disease progression and/or in monitoring recurrence.
- a successful treatment may initially increase the observed microsatellite and/or other repetitive nucleic acid instability as an increased number of cancer cells die and shed nucleic acids.
- the microsatellite and/or other repetitive nucleic acid instability will then typically decrease as the tumor continues to reduce in size.
- a successful treatment may also decrease microsatellite and/or other repetitive nucleic acid instability without an initial increase in such instability.
- microsatellite and/or other repetitive nucleic acid instability status may be used to monitor residual disease or recurrence of disease in a patient.
- a sample can be any biological sample isolated from a subject.
- Samples can include body tissues, whole blood, platelets, serum, plasma, stool, red blood cells, white blood cells or leucocytes, endothelial cells, tissue biopsies (e.g., biopsies from known or suspected solid tumors), cerebrospinal fluid, synovial fluid, lymphatic fluid, ascites fluid, interstitial or extracellular fluid (e.g., fluid from intercellular spaces), gingival fluid, crevicular fluid, bone marrow, pleural effusions, saliva, mucus, sputum, semen, sweat, and urine. Samples are preferably body fluids, particularly blood and fractions thereof, and urine.
- Such samples include nucleic acids shed from tumors.
- the nucleic acids can include DNA and RNA and can be in double and single-stranded forms.
- a sample can be in the form originally isolated from a subject or can have been subjected to further processing to remove or add components, such as cells, enrich for one component relative to another, or convert one form of nucleic acid to another, such as RNA to DNA or single-stranded nucleic acids to double-stranded.
- a body fluid sample for analysis is plasma or serum containing cell-free nucleic acids, e.g., cell-free DNA (cfDNA).
- the sample volume of body fluid taken from a subject depends on the desired read depth for sequenced regions.
- Exemplary volumes are about 0.4-40 ml, about 5-20 ml, about 10-20 ml.
- the volume can be about 0.5 ml, about 1 ml, about 5 ml, about 10 ml, about 20 ml, about 30 ml, about 40 ml, or more milliliters.
- a volume of sampled plasma is typically between about 5 ml to about 20 ml.
- the sample can comprise various amounts of nucleic acid. Typically, the amount of nucleic acid in a given sample is equated with multiple genome equivalents. For example, a sample of about 30 ng DNA can contain about 10,000 ($10^4$) haploid human genome equivalents and, in the case of cfDNA, about 200 billion ($2 \times 10^{11}$) individual polynucleotide molecules. Similarly, a sample of about 100 ng of DNA can contain about 30,000 haploid human genome equivalents and, in the case of cfDNA, about 600 billion individual molecules.
- a sample comprises nucleic acids from different sources, e.g., from cells and from cell-free sources (e.g., blood samples, etc.).
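The genome-equivalent and molecule-count figures quoted for 30 ng and 100 ng of DNA can be checked with back-of-the-envelope arithmetic. The constants below are assumptions not stated in the text: roughly 3.3 pg per haploid human genome, a typical cfDNA fragment of about 170 base pairs, and about 650 g/mol per double-stranded base pair.

```python
AVOGADRO = 6.022e23        # molecules per mole
HAPLOID_GENOME_PG = 3.3    # assumed approximate mass of one haploid human genome, in pg
MEAN_FRAGMENT_BP = 170     # assumed typical cfDNA fragment length, in base pairs
G_PER_MOL_PER_BP = 650.0   # assumed average molar mass of one double-stranded base pair

def haploid_genome_equivalents(mass_ng):
    """Number of haploid human genome equivalents in `mass_ng` nanograms of DNA."""
    return mass_ng * 1000.0 / HAPLOID_GENOME_PG  # 1 ng = 1000 pg

def fragment_count(mass_ng):
    """Approximate number of cfDNA fragments in `mass_ng` nanograms."""
    moles = mass_ng * 1e-9 / (MEAN_FRAGMENT_BP * G_PER_MOL_PER_BP)
    return moles * AVOGADRO

ge_30 = haploid_genome_equivalents(30)   # on the order of 10^4
frags_30 = fragment_count(30)            # on the order of 2 x 10^11
ge_100 = haploid_genome_equivalents(100)
frags_100 = fragment_count(100)
```

Under these assumptions, 30 ng gives roughly 9,000 genome equivalents and about 1.6 x 10^11 fragments, and 100 ng gives roughly 30,000 genome equivalents and about 5.5 x 10^11 fragments, which agrees with the rounded values in the text.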
- a sample includes nucleic acids carrying mutations.
- a sample optionally comprises DNA carrying germline mutations and/or somatic mutations.
- a sample comprises DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations).
- Exemplary amounts of cell-free nucleic acids in a sample before amplification typically range from about 1 femtogram (fg) to about 1 microgram (µg), e.g., about 1 picogram (pg) to about 200 nanogram (ng), about 1 ng to about 100 ng, about 10 ng to about 1000 ng.
- a sample includes up to about 600 ng, up to about 500 ng, up to about 400 ng, up to about 300 ng, up to about 200 ng, up to about 100 ng, up to about 50 ng, or up to about 20 ng of cell-free nucleic acid molecules.
- the amount is at least about 1 fg, at least about 10 fg, at least about 100 fg, at least about 1 pg, at least about 10 pg, at least about 100 pg, at least about 1 ng, at least about 10 ng, at least about 100 ng, at least about 150 ng, or at least about 200 ng of cell-free nucleic acid molecules.
- the amount is up to about 1 fg, about 10 fg, about 100 fg, about 1 pg, about 10 pg, about 100 pg, about 1 ng, about 10 ng, about 100 ng, about 150 ng, or about 200 ng of cell-free nucleic acid molecules.
- methods include obtaining between about 5 ng to about 30 ng of cell-free nucleic acid molecules from samples. In certain embodiments, methods include obtaining between about 5 ng to about 100 ng of cell-free nucleic acid molecules from samples. In certain embodiments, methods include obtaining between about 5 ng to about 150 ng of cell-free nucleic acid molecules from samples. In certain embodiments, methods include obtaining between about 5 ng to about 200 ng of cell-free nucleic acid molecules from samples. In some embodiments, the amount is up to about 100 ng of cell-free nucleic acid molecules from samples. In some embodiments, the amount is up to about 150 ng of cell-free nucleic acid molecules from samples.
- the amount is up to about 200 ng of cell-free nucleic acid molecules from samples. In some embodiments, the amount is up to about 250 ng of cell-free nucleic acid molecules from samples. In some embodiments, the amount is up to about 300 ng of cell-free nucleic acid molecules from samples. In some embodiments, methods include obtaining between about 1 fg to about 200 ng cell-free nucleic acid molecules from samples.
- Cell-free nucleic acids typically have a size distribution of between about 100 nucleotides in length and about 500 nucleotides in length, with molecules of about 110 nucleotides in length to about 230 nucleotides in length representing about 90% of molecules in the sample, with a mode of about 168 nucleotides in length and a second minor peak in a range between about 240 to about 440 nucleotides in length.
- cell-free nucleic acids are from about 160 to about 180 nucleotides in length, or from about 320 to about 360 nucleotides in length, or from about 440 to about 480 nucleotides in length.
- cell-free nucleic acids are isolated from bodily fluids through a partitioning step in which cell-free nucleic acids, as found in solution, are separated from intact cells and other non-soluble components of the bodily fluid.
- partitioning includes techniques such as centrifugation or filtration.
- cells in bodily fluids are lysed, and cell-free and cellular nucleic acids processed together.
- cell-free nucleic acids are precipitated with, for example, an alcohol.
- additional clean up steps are used, such as silica-based columns to remove contaminants or salts.
- Non-specific bulk carrier nucleic acids are optionally added throughout the reaction to optimize certain aspects of the exemplary procedure, such as yield.
- samples typically include various forms of nucleic acids including double-stranded DNA, single-stranded DNA and/or single-stranded RNA.
- single stranded DNA and/or single stranded RNA are converted to double stranded forms so that they are included in subsequent processing and analysis steps.
- the nucleic acid molecules may be tagged with sample indexes and/or molecular barcodes (referred to generally as “tags”). Tags may be incorporated into or otherwise joined to adapters by chemical synthesis, ligation (e.g., blunt-end ligation or sticky-end ligation), or overlap extension polymerase chain reaction (PCR), among other methods.
- Such adapters may be ultimately joined to the target nucleic acid molecule.
- one or more rounds of amplification cycles are generally applied to introduce sample indexes to a nucleic acid molecule using conventional nucleic acid amplification methods.
- the amplifications may be conducted in one or more reaction mixtures (e.g., a plurality of microwells in an array).
- Molecular barcodes and/or sample indexes may be introduced simultaneously, or in any sequential order.
- molecular barcodes and/or sample indexes are introduced prior to and/or after sequence capturing steps are performed.
- only the molecular barcodes are introduced prior to probe capturing and the sample indexes are introduced after sequence capturing steps are performed.
- both the molecular barcodes and the sample indexes are introduced prior to performing probe-based capturing steps.
- the sample indexes are introduced after sequence capturing steps are performed.
- molecular barcodes are incorporated into the nucleic acid molecules (e.g., cfDNA molecules) in a sample through adapters via ligation (e.g., blunt-end ligation or sticky-end ligation).
- sample indexes are incorporated into the nucleic acid molecules (e.g., cfDNA molecules) in a sample through overlap extension polymerase chain reaction (PCR).
- sequence capturing protocols involve introducing a single-stranded nucleic acid molecule complementary to a targeted nucleic acid sequence, e.g., a coding sequence of a genomic region in which mutation is associated with a cancer type.
- the tags may be located at one end or at both ends of the sample nucleic acid molecule.
- tags are predetermined or random or semi-random sequence oligonucleotides.
- the tags may be less than about 500, 200, 100, 50, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotides in length.
- the tags may be linked to sample nucleic acids randomly or non-randomly.
- each sample is uniquely tagged with a sample index or a combination of sample indexes.
- each nucleic acid molecule of a sample or sub-sample is uniquely tagged with a molecular barcode or a combination of molecular barcodes.
- a plurality of molecular barcodes may be used such that molecular barcodes are not necessarily unique to one another in the plurality (e.g., non-unique molecular barcodes).
- molecular barcodes are generally attached (e.g., by ligation) to individual molecules such that the combination of the molecular barcode and the sequence it may be attached to creates a unique sequence that may be individually tracked.
- Detection of non-uniquely tagged molecular barcodes in combination with endogenous sequence information typically allows for the assignment of a unique identity to a particular molecule.
- the length, or number of base pairs, of an individual sequence read is also optionally used to assign a unique identity to a given molecule.
- fragments from a single strand of nucleic acid having been assigned a unique identity may thereby permit subsequent identification of fragments from the parent strand, and/or a complementary strand.
- molecular barcodes are introduced at an expected ratio of a set of identifiers (e.g., a combination of unique or non-unique molecular barcodes) to molecules in a sample.
- One example format uses from about 2 to about 1,000,000 different molecular barcodes, or from about 5 to about 150 different molecular barcodes, or from about 20 to about 50 different molecular barcodes. Alternatively, from about 25 to about 1,000,000 different molecular barcodes may be used.
- the molecular barcodes can be ligated to both ends of a target molecule. For example, 20-50 x 20-50 molecular barcodes can be used. In some embodiments, 20-50 different molecular barcodes can be used.
- 5-100 different molecular barcodes can be used. In some embodiments, 5-150 molecular barcodes can be used. In some embodiments, 5-200 different molecular barcodes can be used. Such numbers of identifiers are typically sufficient for different molecules having the same start and stop points to have a high probability (e.g., at least 94%, 99.5%, 99.99%, or 99.999%) of receiving different combinations of identifiers. In some embodiments, about 80%, about 90%, about 95%, or about 99% of molecules have the same combinations of molecular barcodes.
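The claim that a modest number of barcodes gives a high probability of distinct combinations can be checked with a birthday-problem calculation. The sketch below is illustrative only and is not taken from the disclosure. It assumes that each end of a molecule independently receives one barcode chosen uniformly at random, so that `n_barcodes` barcodes give `n_barcodes * n_barcodes` ordered pairs. The function name `prob_all_distinct` is hypothetical.

```python
from math import prod


def prob_all_distinct(n_barcodes: int, n_molecules: int) -> float:
    """Probability that n_molecules molecules sharing the same start and stop
    points all receive different barcode pairs.

    Assumption (not from the source): each end independently gets one of
    n_barcodes barcodes, uniformly at random.
    """
    combos = n_barcodes * n_barcodes  # ordered pairs, one barcode per end
    if n_molecules > combos:
        return 0.0  # pigeonhole: at least two molecules must collide
    # Birthday-problem product: the i-th molecule must avoid i used pairs.
    return prod((combos - i) / combos for i in range(n_molecules))


if __name__ == "__main__":
    for n in (20, 50):
        print(n, prob_all_distinct(n, 2))
```

Under these assumptions, two molecules tagged from a set of 50 barcodes at each end receive different pairs with probability 2499/2500. Real ligation efficiencies are not uniform, so the figure is an upper-bound style estimate rather than a measured value.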
- the assignment of unique or non-unique molecular barcodes in reactions is performed using methods and systems described in, for example, U.S. Patent Application Nos. 20010053519, 20030152490, and 20110160078, and U.S. Patent Nos. 6,582,908, 7,537,898, 9,598,731, and 9,902,992, each of which is hereby incorporated by reference in its entirety.
- different nucleic acid molecules of a sample may be identified using only endogenous sequence information (e.g., start and/or stop positions, sub-sequences of one or both ends of a sequence, and/or lengths).
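As a rough illustration of how a non-unique molecular barcode combined with endogenous sequence information (start and stop positions) can give each original molecule a unique identity, reads can be grouped on that combined key. This is a minimal sketch under stated assumptions: the `Read` record, its field names, and `group_into_families` are hypothetical and do not describe the pipeline of the disclosure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    barcode: str   # molecular barcode(s) read from the adapter(s)
    start: int     # alignment start on the reference
    stop: int      # alignment stop on the reference
    sequence: str  # read sequence


Key = Tuple[str, int, int]


def group_into_families(reads: Iterable[Read]) -> Dict[Key, List[Read]]:
    """Group reads that share barcode, start and stop into one family.

    Each family is treated as the amplification products of one original
    molecule. The barcode alone need not be unique across the sample.
    """
    families: Dict[Key, List[Read]] = defaultdict(list)
    for read in reads:
        families[(read.barcode, read.start, read.stop)].append(read)
    return dict(families)


if __name__ == "__main__":
    reads = [
        Read("AAC", 100, 260, "ACGT"),
        Read("AAC", 100, 260, "ACGT"),
        Read("AAC", 140, 300, "TTGA"),  # same barcode, different molecule
    ]
    for key, members in group_into_families(reads).items():
        print(key, len(members))
```

The third read reuses barcode `AAC` but maps to different coordinates, so it lands in its own family. Exact matching on coordinates is a simplification; a production pipeline would likely tolerate small alignment differences.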
- Sample nucleic acids flanked by adapters are typically amplified by PCR and other amplification methods using nucleic acid primers binding to primer binding sites in adapters flanking a DNA molecule to be amplified.
- amplification methods involve cycles of extension, denaturation and annealing resulting from thermocycling, or can be isothermal as, for example, in transcription mediated amplification.
- Other exemplary amplification methods that are optionally utilized include the ligase chain reaction, strand displacement amplification, nucleic acid sequence-based amplification, and self-sustained sequence-based replication, among other approaches.
- One or more rounds of amplification cycles are generally applied to introduce sample indexes to a nucleic acid molecule using conventional nucleic acid amplification methods.
- the amplifications are typically conducted in one or more reaction mixtures.
- Molecular tags and sample indexes/tags are optionally introduced simultaneously, or in any sequential order.
- molecular tags and sample indexes/tags are introduced prior to and/or after nucleic acid molecule capturing steps (i.e., nucleic acid enrichment) are performed.
- only the molecular tags are introduced prior to probe capturing and the sample indexes/tags are introduced after sequence capturing steps are performed.
- both the molecular tags and the sample indexes/tags are introduced prior to performing probe-based capturing steps.
- the sample indexes/tags are introduced after sequence capturing steps are performed.
- sequence capturing protocols involve introducing a single-stranded nucleic acid molecule complementary to a targeted nucleic acid sequence, e.g., a coding sequence of a genomic region in which mutation is associated with a cancer type.
- the amplification reactions generate a plurality of non-uniquely or uniquely tagged nucleic acid amplicons with molecular tags and sample indexes/tags at size ranging from about 200 nucleotides (nt) to about 700 nt, from 250 nt to about 350 nt, or from about 320 nt to about 550 nt.
- the amplicons have a size of about 300 nt. In some embodiments, the amplicons have a size of about 500 nt.
- sequences are enriched prior to sequencing the nucleic acids. Enrichment is optionally performed for specific target regions ("target sequences").
- targeted regions of interest may be enriched with nucleic acid capture probes ("baits") selected for one or more bait set panels using a differential tiling and capture scheme.
- a differential tiling and capture scheme generally uses bait sets of different relative concentrations to differentially tile (e.g., at different "resolutions") across genomic regions associated with the baits, subject to a set of constraints (e.g., sequencer constraints such as sequencing load, utility of each bait, etc.), and capture the targeted nucleic acids at a desired level for downstream sequencing.
- targeted genomic regions of interest optionally include natural or synthetic nucleotide sequences of the nucleic acid construct.
- biotin-labeled beads with probes to one or more regions of interest can be used to capture target sequences, and optionally followed by amplification of those regions, to enrich for the regions of interest.
- Sequence capture typically involves the use of oligonucleotide probes that hybridize to the target nucleic acid sequence.
- a probe set strategy involves tiling the probes across a region of interest.
- Such probes can be, for example, from about 60 to about 120 nucleotides in length.
- the set can have a depth of about 2x, 3x, 4x, 5x, 6x, 8x, 9x, 10x, 15x, 20x, 30x, 40x, 50x or more than 50x.
- the effectiveness of sequence capture generally depends, in part, on the length of the sequence in the target molecule that is complementary (or nearly complementary) to the sequence of the probe.
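Tiling probes across a region at a given depth can be sketched as placing fixed-length probes at a regular step, so that each interior base is covered by about `depth` probes. The function below is an illustrative assumption, not the differential tiling scheme of the disclosure. The name `tile_probes`, the half-open coordinate convention, and the uniform step are all hypothetical choices.

```python
from typing import List, Tuple


def tile_probes(region_start: int, region_end: int,
                probe_len: int = 120, depth: int = 2) -> List[Tuple[int, int]]:
    """Return half-open (start, end) probe intervals tiling a region.

    Probes are spaced probe_len // depth apart, so interior bases are
    covered by about `depth` probes. A final probe is placed flush with
    the region end. If the region is shorter than one probe, the single
    probe returned extends past region_end.
    """
    if depth < 1 or probe_len < depth or region_end <= region_start:
        raise ValueError("invalid tiling parameters")
    step = probe_len // depth
    probes = []
    start = region_start
    while start + probe_len < region_end:
        probes.append((start, start + probe_len))
        start += step
    last_start = max(region_start, region_end - probe_len)
    probes.append((last_start, last_start + probe_len))
    return probes


if __name__ == "__main__":
    print(tile_probes(0, 240, probe_len=120, depth=2))
```

With a 240-base region, 120-nucleotide probes and 2x depth, this yields probes starting at 0, 60 and 120. Bases near the region edges are covered by fewer probes than interior bases, which is one reason real designs often pad the target region.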
- Sample nucleic acids, optionally flanked by adapters, with or without prior amplification are generally subject to sequencing.
- Sequencing methods or commercially available formats that are optionally utilized include, for example, Sanger sequencing, high-throughput sequencing, pyrosequencing, sequencing-by-synthesis, single-molecule sequencing, nanopore-based sequencing, semiconductor sequencing, sequencing-by-ligation, sequencing-by-hybridization, RNA-Seq (Illumina), Digital Gene Expression (Helicos), next generation sequencing (NGS), Single Molecule Sequencing by Synthesis (SMSS) (Helicos), massively-parallel sequencing, Clonal Single Molecule Array (Solexa), shotgun sequencing, Ion Torrent, Oxford Nanopore, Roche Genia, Maxam-Gilbert sequencing, primer walking, sequencing using PacBio, SOLiD, Ion Torrent, or nanopore platforms. Sequencing reactions can be performed in a variety of sample processing units, which may include multiple lanes, multiple channels, multiple wells,
- the sequencing reactions can be performed on one or more nucleic acid fragment types or regions known to contain markers (e.g., microsatellites and/or other repetitive nucleic acid elements) of cancer or of other diseases.
- the sequencing reactions can also be performed on any nucleic acid fragment present in the sample.
- the sequencing reactions may provide for sequence coverage of at least about 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 99.9% or 100% of the genome. In other cases, sequence coverage may be less than about 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 99.9% or 100% of the genome.
- sequence coverage of the genome may be less than about 0.01%, 0.02%, 0.05%, 0.1%, 0.2%, 0.5%, 1%, 2% or 5% of the genome. In some embodiments, sequence coverage of the genome may be less than about 0.01% of the genome. In some embodiments, sequence coverage of the genome may be less than about 0.02% of the genome. In some embodiments, sequence coverage of the genome may be less than about 0.05% of the genome. In some embodiments, sequence coverage of the genome may be less than about 0.1% of the genome. In some embodiments, sequence coverage of the genome may be less than about 0.2% of the genome. In some embodiments, sequence coverage of the genome may be less than about 0.5% of the genome.
- sequence coverage of the genome may be less than about 1% of the genome. In some embodiments, sequence coverage of the genome may be less than about 2% of the genome. In some embodiments, sequence coverage of the genome may be less than about 5% of the genome. In some embodiments, sequence coverage of the genome may be at least about 5% of the genome. In some embodiments, sequence coverage of the genome may be at least about 10% of the genome. In some embodiments, sequence coverage of the genome may be at least about 20% of the genome.
- Simultaneous sequencing reactions may be performed using multiplex sequencing techniques.
- cell-free polynucleotides are sequenced with at least about 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions.
- cell-free polynucleotides are sequenced with less than about 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions. Sequencing reactions are typically performed sequentially or simultaneously. Subsequent data analysis is generally performed on all or part of the sequencing reactions.
- data analysis is performed on at least about 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions. In other embodiments, data analysis may be performed on less than about 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions.
- An exemplary read depth is from about 1000 to about 50000 reads per locus (base position) or >50,000 reads per locus.
- a nucleic acid population is prepared for sequencing by enzymatically forming blunt-ends on double-stranded nucleic acids with single-stranded overhangs at one or both ends.
- the population is typically treated with an enzyme having a 5’-3’ DNA polymerase activity and a 3’-5’ exonuclease activity in the presence of the nucleotides (e.g., A, C, G and T or U) in dNTP form.
- Exemplary enzymes or catalytic fragments thereof that are optionally used include Klenow large fragment and T4 polymerase.
- the enzyme typically extends the recessed 3’ end on the opposing strand until it is flush with the 5’ end to produce a blunt end.
- the enzyme generally digests from the 3’ end up to and sometimes beyond the 5’ end of the opposing strand. If this digestion proceeds beyond the 5’ end of the opposing strand, the gap can be filled in by an enzyme having the same polymerase activity that is used for 5’ overhangs.
- the formation of blunt ends on double-stranded nucleic acids facilitates, for example, the attachment of adapters and subsequent amplification.
- nucleic acid populations are subject to additional processing, such as the conversion of single-stranded nucleic acids to double-stranded and/or conversion of RNA to DNA. These forms of nucleic acid are also optionally linked to adapters and amplified.
- nucleic acids subject to the process of forming blunt-ends described above, and optionally other nucleic acids in a sample can be sequenced to produce sequenced nucleic acids.
- a sequenced nucleic acid can refer either to the sequence of a nucleic acid (i.e., sequence information) or a nucleic acid whose sequence has been determined. Sequencing can be performed so as to provide sequence data of individual nucleic acid molecules in a sample either directly or indirectly from a consensus sequence of amplification products of an individual nucleic acid molecule in the sample.
- double-stranded nucleic acids with single-stranded overhangs in a sample after blunt-end formation are linked at both ends to adapters including barcodes, and the sequencing determines nucleic acid sequences as well as in-line barcodes introduced by the adapters.
- the blunt-end DNA molecules are optionally ligated to a blunt end of an at least partially double-stranded adapter (e.g., a Y shaped or bell-shaped adapter).
- blunt ends of sample nucleic acids and adapters can be tailed with complementary nucleotides to facilitate ligation (e.g., sticky end ligation).
- the nucleic acid sample is typically contacted with a sufficient number of adapters such that there is a low probability (e.g., < 1% or < 0.1%) that any two copies of the same nucleic acid receive the same combination of adapter barcodes from the adapters linked at both ends.
- the use of adapters in this manner permits identification of families of nucleic acid sequences with the same start and stop points on a reference nucleic acid and linked to the same combination of barcodes. Such a family represents sequences of amplification products of a nucleic acid in the sample before amplification.
- sequences of family members can be compiled to derive consensus nucleotide(s) or a complete consensus sequence for a nucleic acid molecule in the original sample, as modified by blunt end formation and adapter attachment.
- the nucleotide occupying a specified position of a nucleic acid in the sample is determined to be the consensus of nucleotides occupying that corresponding position in family member sequences.
- Families can include sequences of one or both strands of a double-stranded nucleic acid.
- when members of a family include sequences of both strands from a double-stranded nucleic acid, sequences of one strand are converted to their complement for purposes of compiling all sequences to derive consensus nucleotide(s) or sequences.
- Some families include only a single member sequence. In this case, this sequence can be taken as the sequence of a nucleic acid in the sample before amplification. Alternatively, families with only a single member sequence can be eliminated from subsequent analysis.
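The consensus step described above can be sketched as a per-position majority vote over family members, after converting reverse-strand members to their complement. This is a simplified illustration, not the method of the disclosure: it assumes members are already aligned to the same start position, ignores base qualities and indels, and breaks ties in favor of the base seen first. The names `consensus` and `min_family_size` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def consensus(family: List[Tuple[str, bool]],
              min_family_size: int = 1) -> Optional[str]:
    """Derive a consensus sequence for one family.

    family: (sequence, is_reverse_strand) pairs for reads assigned to the
    same original molecule. Returns None when the family is smaller than
    min_family_size, e.g. to drop single-member families.
    """
    if not family or len(family) < min_family_size:
        return None
    # Put every member on the same strand before voting.
    oriented = [reverse_complement(s) if rev else s for s, rev in family]
    length = min(len(s) for s in oriented)  # vote only where all members overlap
    called = []
    for i in range(length):
        counts = Counter(s[i] for s in oriented)
        called.append(counts.most_common(1)[0][0])
    return "".join(called)


if __name__ == "__main__":
    fam = [("ACGT", False), ("ACGA", False), ("ACGT", False)]
    print(consensus(fam))
```

Setting `min_family_size=2` implements the alternative in which families with only a single member sequence are eliminated from subsequent analysis; the default keeps them and takes the lone read as the molecule's sequence.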
- Nucleotide variations in sequenced nucleic acids can be determined by comparing sequenced nucleic acids with a reference sequence.
- the reference sequence is often a known sequence, e.g., a known whole or partial genome sequence from a subject (e.g., a whole genome sequence of a human subject).
- the reference sequence can be, for example, hg19 or hg38.
- the sequenced nucleic acids can represent sequences determined directly for a nucleic acid in a sample, or a consensus of sequences of amplification products of such a nucleic acid, as described above. A comparison can be performed at one or more designated positions on a reference sequence.
- a subset of sequenced nucleic acids can be identified including a position corresponding with a designated position of the reference sequence when the respective sequences are maximally aligned. Within such a subset it can be determined which, if any, sequenced nucleic acids include a nucleotide variation at the designated position, and optionally which, if any, include a reference nucleotide (i.e., same as in the reference sequence). If the number of sequenced nucleic acids in the subset including a nucleotide variant exceeds a selected threshold, then a variant nucleotide can be called at the designated position.
- the threshold can be a simple number, such as at least 1, 2, 3, 4, 5, 6, 7, 9, or 10 sequenced nucleic acids within the subset including the nucleotide variant, or it can be a ratio, such as at least 0.5, 1, 2, 3, 4, 5, 10, 15, or 20 of sequenced nucleic acids within the subset that include the nucleotide variant, among other possibilities.
- the comparison can be repeated for any designated position of interest in the reference sequence. Sometimes a comparison can be performed for designated positions occupying at least about 20, 100, 200, or 300 contiguous positions on a reference sequence, e.g., about 20-500, or about 50-300 contiguous positions.
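The threshold-based call at a designated position can be sketched as counting non-reference bases among the sequenced nucleic acids that cover the position and reporting those that meet a count threshold, a fraction threshold, or both. This is a minimal sketch under stated assumptions: the function name `call_variants`, its parameters, and the default thresholds are hypothetical, and "at least" is implemented as `>=`.

```python
from collections import Counter
from typing import List, Sequence


def call_variants(bases: Sequence[str], ref_base: str,
                  min_count: int = 2, min_fraction: float = 0.0) -> List[str]:
    """Call variant nucleotides at one designated reference position.

    bases: the base observed at this position in each sequenced nucleic
    acid (or consensus sequence) covering it.
    A variant base is called when it is seen in at least min_count
    molecules and in at least min_fraction of all covering molecules.
    """
    total = len(bases)
    if total == 0:
        return []
    counts = Counter(b for b in bases if b != ref_base)
    return sorted(
        base for base, n in counts.items()
        if n >= min_count and n / total >= min_fraction
    )


if __name__ == "__main__":
    pileup = list("AAAAAAGG")  # eight molecules covering the position
    print(call_variants(pileup, ref_base="A", min_count=2))
```

Repeating the call over a run of contiguous positions, as the text describes, amounts to applying the same function to each column of the aligned subset.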
- nucleic acid sequencing includes the formats and applications described herein. Additional details regarding nucleic acid sequencing, including the formats and applications described herein are also provided in, for example, Levy et al., Annual Review of Genomics and Human Genetics, 17: 95-115 (2016), Liu et al., J. of Biomedicine and Biotechnology, Volume 2012, Article ID 251364: 1-11 (2012), Voelkerding et al., Clinical Chem., 55: 641-658 (2009), MacLean et al., Nature Rev. Microbiol., 7: 287-296 (2009), Astier et al., J Am Chem Soc., 128(5):1705-10 (2006), U.S. Pat. No. 6,210,891, U.S. Pat. No. 6,258,568, U.S.
- the test subject’s microsatellite and/or other repetitive nucleic acid instability status and comparator results are measured across, for example, the entire genome or entire exome, whereas in other embodiments, those markers are measured based, for example, upon a subset or targeted regions of the genome or exome, which are optionally extrapolated to determine, for example, microsatellite instability for the whole genome or whole exome.
- test subject microsatellite and/or other repetitive nucleic acid instability status and comparator microsatellite and/or other repetitive nucleic acid instability status are measured by determining the mutational count or load in a predetermined or selected set of genes or genomic regions. Essentially any gene (e.g., oncogene) is optionally selected for such analysis.
- the selected genes or genomic regions include at least about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1,500, 2,000 or more selected genes or genomic regions.
- the selected genes or genomic regions optionally include one or more genes listed in Table 1.
- the methods and systems disclosed herein are used to identify customized therapies to treat a given disease, disorder or condition in patients.
- the disease under consideration is a type of cancer.
- cancers include biliary tract cancer, bladder cancer, transitional cell carcinoma, urothelial carcinoma, brain cancer, gliomas, astrocytomas, breast carcinoma, metaplastic carcinoma, cervical cancer, cervical squamous cell carcinoma, rectal cancer, colorectal carcinoma, colon cancer, hereditary nonpolyposis colorectal cancer, colorectal adenocarcinomas, gastrointestinal stromal tumors (GISTs), endometrial carcinoma, endometrial stromal sarcomas, esophageal cancer, esophageal squamous cell carcinoma, esophageal adenocarcinoma, ocular melanoma, uveal melanoma, gallbladder carcinomas, gallbladder adeno
- prostate cancer, prostate adenocarcinoma, skin cancer, melanoma, malignant melanoma, cutaneous melanoma, small intestine carcinomas, stomach cancer, gastric carcinoma, gastrointestinal stromal tumor (GIST), uterine cancer, or uterine sarcoma.
- Non-limiting examples of other genetic-based diseases, disorders, or conditions that are optionally evaluated using the methods and systems disclosed herein include achondroplasia, alpha-1 antitrypsin deficiency, antiphospholipid syndrome, autism, autosomal dominant polycystic kidney disease, Charcot-Marie-Tooth (CMT), cri du chat, Crohn's disease, cystic fibrosis, Dercum disease, Down syndrome, Duane syndrome, Duchenne muscular dystrophy, Factor V Leiden thrombophilia, familial hypercholesterolemia, familial Mediterranean fever, fragile X syndrome, Gaucher disease, hemochromatosis, hemophilia, holoprosencephaly, Huntington's disease, Klinefelter syndrome, Marfan syndrome, myotonic dystrophy, neurofibromatosis, Noonan syndrome, osteogenesis imperfecta, Parkinson's disease, phenylketonuria, Poland anomaly, porphyria, progeria, retin
- the methods disclosed herein relate to identifying and administering customized therapies to patients having a given microsatellite and/or other repetitive nucleic acid instability status.
- any cancer therapy e.g., surgical therapy, radiation therapy, chemotherapy, and/or the like
- customized therapies include at least one immunotherapy (or an immunotherapeutic agent).
- Immunotherapy refers generally to methods of enhancing an immune response against a given cancer type.
- immunotherapy refers to methods of enhancing a T cell response against a tumor or cancer.
- the immunotherapy or immunotherapeutic agent targets an immune checkpoint molecule.
- Certain tumors are able to evade the immune system by co-opting an immune checkpoint pathway.
- targeting immune checkpoints has emerged as an effective approach for countering a tumor’s ability to evade the immune system and activating anti-tumor immunity against certain cancers. Pardoll, Nature Reviews Cancer , 2012, 12:252-264.
- the immune checkpoint molecule is an inhibitory molecule that reduces a signal involved in the T cell response to antigen.
- CTLA4 is expressed on T cells and plays a role in downregulating T cell activation by binding to CD80 (aka B7.1) or CD86 (aka B7.2) on antigen presenting cells.
- PD-1 is another inhibitory checkpoint molecule that is expressed on T cells. PD-1 limits the activity of T cells in peripheral tissues during an inflammatory response.
- the ligand for PD-1 (PD-L1 or PD-L2) is commonly upregulated on the surface of many different tumors, resulting in the downregulation of anti-tumor immune responses in the tumor microenvironment.
- the inhibitory immune checkpoint molecule is CTLA4 or PD-1.
- the inhibitory immune checkpoint molecule is a ligand for PD-1, such as PD-L1 or PD-L2.
- the inhibitory immune checkpoint molecule is a ligand for CTLA4, such as CD80 or CD86.
- the inhibitory immune checkpoint molecule is lymphocyte activation gene 3 (LAG3), killer cell immunoglobulin like receptor (KIR), T cell membrane protein 3 (TIM3), galectin 9 (GAL9), or adenosine A2a receptor (A2aR).
- the immunotherapy or immunotherapeutic agent is an antagonist of an inhibitory immune checkpoint molecule.
- the inhibitory immune checkpoint molecule is PD-1.
- the inhibitory immune checkpoint molecule is PD-L1.
- the antagonist of the inhibitory immune checkpoint molecule is an antibody (e.g., a monoclonal antibody).
- the antibody or monoclonal antibody is an anti-CTLA4, anti-PD-1, anti-PD-L1, or anti-PD-L2 antibody.
- the antibody is a monoclonal anti-PD-1 antibody.
- the antibody is a monoclonal anti-PD-L1 antibody.
- the monoclonal antibody is a combination of an anti-CTLA4 antibody and an anti-PD-1 antibody, an anti-CTLA4 antibody and an anti-PD-L1 antibody, or an anti-PD-L1 antibody and an anti-PD-1 antibody.
- the anti-PD-1 antibody is one or more of pembrolizumab (Keytruda®) or nivolumab (Opdivo®).
- the anti-CTLA4 antibody is ipilimumab (Yervoy®).
- the anti-PD-L1 antibody is one or more of atezolizumab (Tecentriq®), avelumab (Bavencio®), or durvalumab (Imfinzi®).
- the immunotherapy or immunotherapeutic agent is an antagonist (e.g. antibody) against CD80, CD86, LAG3, KIR, TIM3, GAL9, or A2aR.
- the antagonist is a soluble version of the inhibitory immune checkpoint molecule, such as a soluble fusion protein comprising the extracellular domain of the inhibitory immune checkpoint molecule and an Fc domain of an antibody.
- the soluble fusion protein comprises the extracellular domain of CTLA4, PD-1, PD-L1, or PD-L2.
- the soluble fusion protein comprises the extracellular domain of CD80, CD86, LAG3, KIR, TIM3, GAL9, or A2aR.
- the soluble fusion protein comprises the extracellular domain of PD-L2 or LAG3.
- the immune checkpoint molecule is a co-stimulatory molecule that amplifies a signal involved in a T cell response to an antigen.
- CD28 is a co-stimulatory receptor expressed on T cells.
- CTLA4 is able to counteract or regulate the co-stimulatory signaling mediated by CD28.
- the immune checkpoint molecule is a co-stimulatory molecule selected from CD28, inducible T cell co-stimulator (ICOS), CD137, OX40, or CD27.
- the immune checkpoint molecule is a ligand of a co-stimulatory molecule, including, for example, CD80, CD86, B7RP1, B7-H3, B7-H4, CD137L, OX40L, or CD70.
- the immunotherapy or immunotherapeutic agent is an agonist of a co-stimulatory checkpoint molecule.
- the agonist of the co-stimulatory checkpoint molecule is an agonist antibody and preferably is a monoclonal antibody.
- the agonist antibody or monoclonal antibody is an anti-CD28 antibody.
- the agonist antibody or monoclonal antibody is an anti-ICOS, anti-CD137, anti-OX40, or anti-CD27 antibody.
- the agonist antibody or monoclonal antibody is an anti-CD80, anti-CD86, anti-B7RP1, anti-B7-H3, anti-B7-H4, anti-CD137L, anti-OX40L, or anti-CD70 antibody.
- the customized therapies described herein are typically administered parenterally (e.g., intravenously or subcutaneously).
- Pharmaceutical compositions containing the immunotherapeutic agent are typically administered intravenously.
- Certain therapeutic agents are administered orally.
- customized therapies e.g., immunotherapeutic agents, etc.
- the present disclosure also provides various systems and computer program products or machine readable media.
- the methods described herein are optionally performed or facilitated at least in part using systems, distributed computing hardware and applications (e.g., cloud computing services), electronic communication networks, communication interfaces, computer program products, machine readable media, electronic storage media, software (e.g., machine-executable code or logic instructions) and/or the like.
- Figure 2 provides a schematic diagram of an exemplary system suitable for use with implementing at least aspects of the methods disclosed in this application.
- system 200 includes at least one controller or computer, e.g., server 202 (e.g., a search engine server), which includes processor 204 and memory, storage device, or memory component 206, and one or more other communication devices 214 and 216 (e.g., client-side computer terminals, telephones, tablets, laptops, other mobile devices, etc.) positioned remote from and in communication with the remote server 202, through electronic communication network 212, such as the internet or other internetwork.
- Communication devices 214 and 216 typically include an electronic display (e.g., an internet enabled computer or the like) in communication with, e.g., server 202 computer over network 212 in which the electronic display comprises a user interface (e.g., a graphical user interface (GUI), a web-based user interface, and/or the like) for displaying results upon implementing the methods described herein.
- communication networks also encompass the physical transfer of data from one location to another, for example, using a hard drive, thumb drive, or other data storage mechanism.
- System 200 also includes program product 208 stored on a computer or machine readable medium, such as, for example, one or more of various types of memory, such as memory 206 of server 202, that is readable by the server 202, to facilitate, for example, a guided search application or other executable by one or more other communication devices, such as 214 (schematically shown as a desktop or personal computer) and 216 (schematically shown as a tablet computer).
- system 200 optionally also includes at least one database server, such as, for example, server 210 associated with an online website having data stored thereon (e.g., control sample or comparator result data, indexed customized therapies, etc.) searchable either directly or through search engine server 202.
- System 200 optionally also includes one or more other servers positioned remotely from server 202, each of which are optionally associated with one or more database servers 210 located remotely or located local to each of the other servers.
- the other servers can beneficially provide service to geographically remote users and enhance geographically distributed operations.
- memory 206 of the server 202 optionally includes volatile and/or nonvolatile memory including, for example, RAM, ROM, and magnetic or optical disks, among others. It is also understood by those of ordinary skill in the art that although illustrated as a single server, the illustrated configuration of server 202 is given only by way of example and that other types of servers or computers configured according to various other methodologies or architectures can also be used.
- Server 202 shown schematically in Figure 2 represents a server or server cluster or server farm and is not limited to any individual physical server. The server site may be deployed as a server farm or server cluster managed by a server hosting provider. The number of servers and their architecture and configuration may be increased based on usage, demand and capacity requirements for the system 200.
- network 212 can include an internet, intranet, a telecommunication network, an extranet, or world wide web of a plurality of computers/servers in communication with one or more other computers through a communication network, and/or portions of a local or other area network.
- exemplary program product or machine readable medium 208 is optionally in the form of microcode, programs, cloud computing format, routines, and/or symbolic languages that provide one or more sets of ordered operations that control the functioning of the hardware and direct its operation.
- Program product 208 according to an exemplary embodiment, also need not reside in its entirety in volatile memory, but can be selectively loaded, as necessary, according to various methodologies as known and understood by those of ordinary skill in the art.
- The term "computer-readable medium" or "machine-readable medium" refers to any medium that participates in providing instructions to a processor for execution.
- computer-readable medium encompasses distribution media, cloud computing formats, intermediate storage media, execution memory of a computer, and any other medium or device capable of storing program product 208 implementing the functionality or processes of various embodiments of the present disclosure, for example, for reading by a computer.
- a "computer-readable medium” or “machine-readable medium” may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.
- Non-volatile media includes, for example, optical or magnetic disks.
- Volatile media includes dynamic memory, such as the main memory of a given system.
- Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise a bus.
- Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications, among others.
- Exemplary forms of computer-readable media include a floppy disk, a flexible disk, hard disk, magnetic tape, a flash drive, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.
- Program product 208 is optionally copied from the computer-readable medium to a hard disk or a similar intermediate storage medium.
- When program product 208, or portions thereof, is to be run, it is optionally loaded from its distribution medium, its intermediate storage medium, or the like into the execution memory of one or more computers, configuring the computer(s) to act in accordance with the functionality or method of various embodiments. All such operations are well known to those of ordinary skill in the art of, for example, computer systems.
- this application provides systems that include one or more processors, and one or more memory components in communication with the processor.
- the memory component typically includes one or more instructions that, when executed, cause the processor to provide information that causes sequence information, microsatellite and/or other repetitive nucleic acid instability status, comparator results, customized therapies, and/or the like to be displayed (e.g., via communication devices 214, 216, or the like) and/or receive information from other system components and/or from a system user (e.g., via communication devices 214, 216, or the like).
- program product 208 includes non-transitory computer-executable instructions which, when executed by electronic processor 204 perform at least: (i) receiving sequence information from a population of microsatellite loci in a sample, (ii) quantifying a number of different repeat lengths present at each of a plurality of the microsatellite loci from the sequence information to generate a site score for each of the plurality of the microsatellite loci, (iii) comparing the site score of a given microsatellite locus to a site specific trained threshold for the given microsatellite locus for each of the plurality of the microsatellite loci, (iv) calling the given microsatellite locus as being unstable when the site score of the given microsatellite locus exceeds the site specific trained threshold for the given microsatellite locus to generate a microsatellite instability score comprising a number of unstable microsatellite loci from the plurality of the microsatellite
- System 200 also typically includes additional system components that are configured to perform various aspects of the methods described herein.
- one or more of these additional system components are positioned remote from and in communication with the remote server 202 through electronic communication network 212, whereas in other embodiments, one or more of these additional system components are positioned local, and in communication with server 202 (i.e., in the absence of electronic communication network 212) or directly with, for example, desktop computer 214.
- sample preparation component 218 is operably connected (directly or indirectly (e.g., via electronic communication network 212)) to controller 202.
- Sample preparation component 218 is configured to prepare the nucleic acids in samples (e.g., prepare libraries of nucleic acids) to be amplified and/or sequenced by a nucleic acid amplification component (e.g., a thermal cycler, etc.) and/or a nucleic acid sequencer.
- Sample preparation component 218 is configured to isolate nucleic acids from other components in a sample, to attach one or more adapters comprising barcodes to nucleic acids as described herein, to selectively enrich one or more regions from a genome or transcriptome prior to sequencing, and/or the like.
- system 200 also includes nucleic acid amplification component 220 (e.g., a thermal cycler, etc.) operably connected (directly or indirectly (e.g., via electronic communication network 212)) to controller 202.
- Nucleic acid amplification component 220 is configured to amplify nucleic acids in samples from subjects.
- nucleic acid amplification component 220 is optionally configured to amplify selectively enriched regions from a genome or transcriptome in the samples as described herein.
- System 200 also typically includes at least one nucleic acid sequencer 222 operably connected (directly or indirectly (e.g., via electronic communication network 212)) to controller 202.
- Nucleic acid sequencer 222 is configured to provide the sequence information from nucleic acids (e.g., amplified nucleic acids) in samples from subjects.
- Nucleic acid sequencer 222 is optionally configured to perform pyrosequencing, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, sequencing-by-synthesis, sequencing-by-ligation, sequencing-by-hybridization, or other techniques on the nucleic acids to generate sequencing reads.
- nucleic acid sequencer 222 is configured to group sequence reads into families of sequence reads, each family comprising sequence reads generated from a nucleic acid in a given sample.
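- The grouping of sequence reads into families can be sketched as follows. This is a minimal illustration only: it assumes each read carries a molecular barcode and mapped start/end coordinates, and that reads sharing all three derive from the same original nucleic acid molecule. The field names (`barcode`, `start`, `end`, `sequence`) and the example reads are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def group_into_families(reads):
    """Group reads into families keyed by (barcode, start, end).

    Assumption for illustration: reads with the same barcode and the same
    mapped coordinates were generated from the same original molecule.
    """
    families = defaultdict(list)
    for read in reads:
        key = (read["barcode"], read["start"], read["end"])
        families[key].append(read["sequence"])
    return dict(families)


# Hypothetical reads: the first two share barcode and coordinates.
reads = [
    {"barcode": "AACG", "start": 100, "end": 265, "sequence": "ACGTAAAAAAAAAT"},
    {"barcode": "AACG", "start": 100, "end": 265, "sequence": "ACGTAAAAAAAAAT"},
    {"barcode": "TTGC", "start": 100, "end": 265, "sequence": "ACGTAAAAAAAAT"},
    {"barcode": "AACG", "start": 140, "end": 300, "sequence": "GGCTAAAAAAAAAT"},
]
families = group_into_families(reads)
for key, members in families.items():
    print(key, len(members))
```

- Keying on coordinates as well as the barcode is one possible design choice; it keeps distinct molecules that happen to share a barcode in separate families.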
- nucleic acid sequencer 222 uses a clonal single molecule array derived from the sequencing library to generate the sequencing reads.
- nucleic acid sequencer 222 includes at least one chip having an array of microwells for sequencing a sequencing library to generate sequencing reads.
- system 200 typically also includes material transfer component 224 operably connected (directly or indirectly (e.g., via electronic communication network 212)) to controller 202.
- Material transfer component 224 is configured to transfer one or more materials (e.g., nucleic acid samples, amplicons, reagents, and/or the like) to and/or from nucleic acid sequencer 222, sample preparation component 218, and nucleic acid amplification component 220.
- MSI high (MSI-H) samples were computationally simulated with variable tumor fractions and numbers of unstable sites using non-tumor samples as background. The distribution observed in a cohort of 3000 samples of different cancer types was used as a prior for the number of unstable sites. This analysis demonstrated a sensitivity of 94% at limit of detection (LoD) of 0.2% of tumor content. The expected specificity of the method to determine MSI status according to an embodiment described herein on non-tumor donor samples was 99.999%. Comparison of these results against standard or conventional PCR-based MSI assessment showed 100% concordance across a tumor content range of 1.4%-15%.
- MSI-H 145 microsatellite stable (MSS)
- MSI calls generated according to an embodiment described herein showed 100% concordance with the standard PCR-based MSI assessment.
- Digital Sequencing clinical platform is an NGS panel of cancer-related genes utilizing high-quality sequencing of cell-free DNA (which could comprise circulating tumor DNA) isolated from a simple, non-invasive blood draw.
- Digital Sequencing employs pre-sequencing preparation of a digital library of individually tagged cfDNA molecules combined with post-sequencing bioinformatic reconstruction to eliminate nearly all false positives. Sequence information was obtained using targeted sequencing of cfDNA in the samples. Site scores (dAIC) were determined for 61 of the most informative microsatellite loci in each sample. The tumor fraction of samples ranged from 0.5% to 15%.
- the site scores were compared with corresponding site specific trained thresholds for each sample to identify the number of unstable microsatellite loci in each sample.
- the number of unstable microsatellite loci identified in a given sample was used as the microsatellite instability score (i.e., MSI sample score) for that particular sample.
- A population trained threshold for the 61 microsatellite loci in the samples was determined in which a microsatellite instability score that was greater than or equal to five was predicted to classify a sample as being MSI-High (MSI-H), whereas a microsatellite instability score that was less than or equal to four was predicted to classify a sample as being microsatellite stable (MSS).
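- The calling logic described above (site score versus site specific trained threshold, then the count of unstable loci versus the population trained threshold of five) can be sketched as follows. The locus names, site scores, and per-locus threshold values are hypothetical placeholders; only the 61-locus count and the cutoff of five come from the text.

```python
def msi_sample_score(site_scores, site_thresholds):
    """Count loci whose site score exceeds that locus's own trained threshold."""
    return sum(
        1
        for locus, score in site_scores.items()
        if score > site_thresholds[locus]
    )


def classify_sample(msi_score, population_threshold=5):
    """MSI-H when the score is >= the population threshold, otherwise MSS."""
    return "MSI-H" if msi_score >= population_threshold else "MSS"


# Hypothetical data: 61 loci, each with a placeholder threshold of 10.0.
loci = [f"locus_{i}" for i in range(61)]
site_thresholds = {locus: 10.0 for locus in loci}

# Six loci exceed their thresholds in this made-up sample.
unstable_sample = {locus: (25.0 if i < 6 else 0.0) for i, locus in enumerate(loci)}
# Four loci exceed their thresholds in this made-up sample.
stable_sample = {locus: (25.0 if i < 4 else 0.0) for i, locus in enumerate(loci)}

unstable_score = msi_sample_score(unstable_sample, site_thresholds)
stable_score = msi_sample_score(stable_sample, site_thresholds)
print(unstable_score, classify_sample(unstable_score))
print(stable_score, classify_sample(stable_score))
```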
- Microsatellite instability is a guideline-recommended biomarker with prognostic significance in a variety of tumor types as well as predictive significance for treatment with immune checkpoint inhibitors.
- Microsatellite instability detection has relied on testing tumor tissue by PCR or immunohistochemistry.
- next generation sequencing (NGS) methods have been developed that also rely on availability of tumor tissue.
- A plasma-based MSI detection method could provide non-invasive, real-time assessment of MSI status.
- Guardant Health's large panel cell-free DNA (cfDNA) NGS assay assesses 500 cancer-associated genes to identify genomic alterations and tumor mutation burden (TMB).
- the panel can detect microsatellite instability high (MSI-high) status based on somatic changes in >1,000 MSI sites.
- the analytical validation presented in this example has four main components that serve to determine the performance of the 500 cancer-associated gene cfDNA NGS assay for MSI-high detection: accuracy, limit of detection (LoD), precision, and Limit of Blank (LoB).
- Accuracy analysis used 258 samples from 3 sources, with MSI status predicted by the 500 cancer-associated gene cfDNA NGS assay compared with truth based on tissue MSI status determined with an orthogonal method.
- 36 collaborator samples with tissue MSI status (truth: tissue MSI status), 121 healthy donors (truth: microsatellite stable, MSS), and 101 samples sequenced by the 500 cancer-associated gene cfDNA NGS assay (large panel assay) and the 73 cancer-associated gene cfDNA NGS assay (small panel assay, MSI status as truth) were used.
- Reproducibility and repeatability analysis used 2 sets of replicates (56 replicates in total). MSI status and MSI scores were compared within run and between runs. LoD was obtained both by simulation. LoB was calculated using healthy donor samples and known MSS samples.
- Of 13 MSI-high samples based on tissue MSI status, 12 were called MSI-High by the large panel assay. All MSS/MSI-low (MSI-L) samples were correctly detected (Table 2).
- The 12 samples detected as MSI-high by the large panel assay also met the small panel assay thresholds for being called MSI-high.
- MSI numeric scores are ⁇ 4 within and between runs (Figure 4A). 10 MSS/MSI-low samples have 2-3 replicates (32 replicates in total) tested in the same flow cell. All replicates are detected as MSS/MSI-low with the 500 cancer-associated gene cfDNA NGS assay. MSI scores are ⁇ 3 within each sample.
- MSI scores are plotted vs. max mutant allele fraction (MAF) of somatic calls (Figures 5A and 5B) in more than 2,000 large panel assay samples, showing that tumor fraction (as measured by MAF) did not correlate with MSI status.
- MSI high detection with the 500 cancer-associated gene cfDNA NGS assay showed high sensitivity (> 90%) and specificity (100%). Repeatability and reproducibility within and across runs were high. The LoD of MSI high detection was 0.1% MAF. The LoB study showed a false positive rate of 0%.
- The 500 cancer-associated gene cfDNA NGS assay provides a reliable prediction of MSI high status from cfDNA, which can inform physicians' treatment decisions without the need for tissue samples.
- Microsatellite instability is a National Comprehensive Cancer Network (NCCN) clinical practice guidelines-recommended biomarker in at least nine cancer types— cervical, cholangiocarcinoma, colorectal, endometrial, esophageal and esophagogastric, gastric, ovarian, pancreatic, and prostate cancers (1-9)— due to its importance as a predictive biomarker for response to immune checkpoint blockade (ICB) as exemplified by pancancer approval of pembrolizumab (10, 11). Detection of MSI in a patient with advanced cancer can also alert the clinician to evaluate the patient’s asymptomatic family members for hereditary cancer risk.
- MSI is the archetypical manifestation of defective DNA mismatch repair (dMMR), which leads to dramatically increased mutation rates throughout the genome, including gain and/or loss of nucleotides within repeating motifs known as microsatellite tracts, from which the entity derives its name.
- MSI is most prevalent in endometrial, colorectal, and gastroesophageal cancers, where it can be a sequela of sporadic mutations in MMR-related genes or a manifestation of Lynch syndrome, a hereditary cancer predisposition syndrome most commonly caused by germline mutations in MLH1, MSH2, MSH6, PMS2, or EPCAM (12).
- landscape analyses have shown that MSI also occurs at non-negligible rates in most other solid tumors, including common tumor types such as lung, prostate, and breast cancer (13).
- MSI predicts clinical benefit from ICB with PD-1/PD-L1 inhibitors, which has led to the approval of these agents in several indications when MSI is present, including nivolumab ± ipilimumab for MSI-High (MSI-H, positive for MSI) metastatic colorectal cancer and pembrolizumab for unresectable or metastatic MSI-H solid tumors following progression on prior approved therapies (14).
- MSI also has prognostic significance, most notably in colorectal cancer (CRC), where testing is recommended in clinical practice guidelines for all patients (3,15).
- MSI testing is most commonly performed via polymerase chain reaction (PCR) and/or immunohistochemistry (IHC) analysis of tumor tissue specimens.
- Testing of newly-obtained tissue specimens can also result in significant delays associated with biopsy scheduling and failure and is additionally associated with risk and cost due to procedure complications.
- invasive tissue acquisition procedures are contraindicated in many heavily pre-treated and/or frail patients.
- the rapidly growing number of biomarkers and diversification of testing options creates daunting complexity for already over-burdened physicians.
- liquid biopsies have successfully addressed such barriers in many genotyping indications by enabling minimally-invasive profiling of contemporaneous tumor DNA.
- Liquid biopsies thus expand patient access to standard-of-care targeted therapies, including ICBs, by identifying patients whose tumors harbor biomarkers of interest not otherwise identifiable due to tissue sampling limitations and do so more rapidly than typical tissue testing (25).
- comprehensive liquid biopsies can provide all guideline recommended somatic genomic biomarker information for all adult solid tumors in a single test. In this study, it was sought to enhance the utility of a previously validated ctDNA-based genotyping test through the addition of MSI detection.
- The Guardant Health small panel cell-free DNA (cfDNA) NGS assay is a 74-gene panel previously validated for detection of SNVs, indels, CNAs, and fusions in all guideline-recommended indications for advanced solid tumors (26,27).
- the assay initially incorporated 99 putative microsatellite loci consisting of short tandem repeats (STRs) of length 7 or more, which were selected to include sites susceptible to instability across multiple tumor types, including three of the five Bethesda panel sites (BAT-25, BAT-26, and NR-21). The remaining two Bethesda sites (NR-24 and MONO-27) were not included due to extremely low mappability of the regions. Coverage and noise profiles at these sites were assessed using sequencing data from a set of 84 healthy donor samples, to exclude uninformative sites from the final MSI detection algorithm.
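- As a rough illustration of the "repeat length of 7 or more" criterion, the sketch below scans a sequence for mononucleotide runs of at least seven bases. It is not the site selection procedure used for the assay, which also considered mappability, coverage, and noise; the function name and the example sequence are hypothetical, and other repeat unit sizes are not handled.

```python
import re


def find_mononucleotide_repeats(sequence, min_length=7):
    """Return (start, base, run_length) for each homopolymer run >= min_length.

    The pattern matches one base followed by at least (min_length - 1)
    further copies of that same base via a backreference.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (match.start(), match.group(1), len(match.group(0)))
        for match in pattern.finditer(sequence.upper())
    ]


# Hypothetical sequence: an A run of 8, a T run of 6 (too short), a G run of 7.
example = "GCGT" + "A" * 8 + "CG" + "T" * 6 + "C" + "G" * 7
repeats = find_mononucleotide_repeats(example)
print(repeats)
```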
- MSI detection is based on integrating observed read sequences with molecular barcoding information into a single probabilistic model that compares the likelihood of observed data under PCR and sequencing noise assumptions with that under somatic MSI instability assumption.
- Each of the individual sites is scored independently using Akaike Information Criterion (AIC) (28).
- The AIC model generates a locus score (ranging from 0 to infinity), reflecting the likelihood that observed variability at any given microsatellite locus is due to biological instability vs. noise, and a locus is considered unstable if its score (i.e., site score) is above a site specific trained threshold.
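- One way such an AIC-based locus score could be computed is sketched below. The likelihood model is not specified in this passage, so everything in the sketch is an assumption made for illustration: the noise-only model is a fixed multinomial over repeat lengths (zero free parameters), the instability model mixes that noise distribution with one extra somatic allele at fraction f (two free parameters, fitted by a coarse grid search), and the score is the AIC difference floored at zero. The function names, the grid of fractions, and the example counts are all hypothetical.

```python
import math


def log_likelihood(counts, probs):
    """Multinomial log-likelihood of repeat-length counts, up to a constant."""
    return sum(n * math.log(probs[length]) for length, n in counts.items() if n > 0)


def locus_score(counts, noise_probs, fractions=(0.001, 0.005, 0.01, 0.05, 0.1, 0.2)):
    """dAIC = AIC(noise only) - AIC(noise + one somatic allele), floored at 0.

    Assumes every repeat length in `counts` has a nonzero probability in
    `noise_probs`. AIC = 2k - 2 ln L, with k = 0 for the noise-only model
    and k = 2 (allele length and fraction) for the instability model.
    """
    aic_noise = -2.0 * log_likelihood(counts, noise_probs)
    best_aic_msi = math.inf
    for allele in noise_probs:  # candidate somatic repeat length
        for f in fractions:  # candidate somatic allele fraction
            mixed = {
                length: (1.0 - f) * p + (f if length == allele else 0.0)
                for length, p in noise_probs.items()
            }
            aic_msi = 2 * 2 - 2.0 * log_likelihood(counts, mixed)
            best_aic_msi = min(best_aic_msi, aic_msi)
    return max(0.0, aic_noise - best_aic_msi)


# Hypothetical noise profile for a 12-repeat locus (probabilities sum to 1).
noise_probs = {9: 0.005, 10: 0.015, 11: 0.06, 12: 0.90, 13: 0.02}
# Made-up molecule counts per repeat length.
stable_counts = {9: 5, 10: 15, 11: 60, 12: 900, 13: 20}
unstable_counts = {9: 60, 10: 15, 11: 60, 12: 850, 13: 15}

stable_score = locus_score(stable_counts, noise_probs)
unstable_score = locus_score(unstable_counts, noise_probs)
print(stable_score, unstable_score)
```

- In this sketch the penalty term 2k prevents the more flexible instability model from winning when the data are already explained by noise, which is why counts that match the noise profile score zero.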
- the thresholds for individual loci and total MSI score per sample were established using permutation-based simulations with data from healthy donor samples varying the frequencies of molecules with different repeat lengths and the error parameters at individual loci, as well as the overall number of unstable loci within a simulated sample. Through this approach, simulations were used here in order to interrogate 100,000 combinations of microsatellite lengths and unstable locus numbers, which allows assessment of a diverse landscape of scenarios, some of which may not be represented in a non-simulated dataset.
- MSI-Low a category defined by the observation of a single unstable Bethesda locus using PCR methods
- MSI algorithm development and training was performed using simulated data as well as a set of 84 healthy donor samples.
- The clinical validation study included 1145 archived samples (residual plasma and/or cell-free DNA) collected and processed as part of routine standard of care clinical testing in the Guardant Health CLIA laboratory as previously described (26), or archival patient plasma samples collected in EDTA blood collection tubes. Twenty healthy donor samples were also used for the analytical specificity study. Contrived samples used in the analytical validation studies comprise cfDNA pools extracted from cell line supernatants and healthy donor plasma.
- Cell-free DNA prepared from culture supernatants from the following cell lines was used (ATCC, Inc.): KM12, NCI-H660, HCC1419, NCI-H2228, NCI-H1650, NCI-H1648, NCI-H1975, NCI-H1993, NCI-H596, HCC78, GM12878, MCF-7.
- cfDNA isolated from cell line culture supernatant mimics the fragment size and mechanisms of extracellular release (29), library conversion, and sequencing properties of patient-derived cfDNA, while also providing a renewable source of well-defined material of sufficient quantity to support the high material demands of studies such as limit of detection and precision.
- Cell-free DNA was extracted from plasma samples or cell line supernatants (QIAamp Circulating Nucleic Acid Kit, Qiagen, Inc.), and up to 30 ng of extracted cfDNA was labeled with non-random oligonucleotide barcodes (IDT, Inc.), followed by library preparation, hybrid capture enrichment (Agilent Technologies, Inc.), and sequencing by paired-end synthesis (NextSeq 500/550 or HiSeq 2500, Illumina, Inc.) as previously described (26). Bioinformatics analysis and variant detection were performed as previously described (26).
- Targeted tumor fractions were verified using known germline variants unique to the titrant and diluent materials. Assessment of repeatability (within-run precision) and reproducibility (between-run precision) was based on clinical and contrived model samples. Six of the clinical samples for precision (three MSI-H and three MSS) were selected with max MAF values of 1-2%, representing approximately 2-3x the predicted LoD at 5 ng. MSI analytical specificity was determined by analyzing 20 healthy donor samples and 245 known MSS contrived samples.
- Tissue-based MSI status was derived from IHC, PCR, or, less commonly, NGS.
- Clinical outcome data were extracted from patient medical records and deidentified by the treating physician.
- the cohort comprised 28459 consecutive advanced cancer patient samples tested using the 73 cancer-associated gene cfDNA NGS assay (small panel assay) in the course of their clinical care. All analyses were conducted with de-identified data and according to an IRB-approved protocol.
- MSI-H in this cohort was assessed across 16 primary tumor types: bladder carcinoma, breast carcinoma, cholangiocarcinoma, colon adenocarcinoma, cancer of unknown primary, head and neck squamous cell carcinoma, hepatocellular carcinoma, lung adenocarcinoma, lung cancer not otherwise specified, lung squamous cell carcinoma, “other” cancer diagnosis, pancreatic adenocarcinoma, prostate adenocarcinoma, stomach adenocarcinoma, and uterine endometrial carcinoma.
- MSI detection presents additional challenges due to the need for 1) efficient molecular capture, sequencing, and mapping of repetitive genomic regions that accurately reflect MSI status; 2) error correction and variant detection within repetitive regions; and 3) differentiation of signal due to MSI from non-MSI somatic variation and the strong PCR slippage artifacts at sites typically impacted by somatic instability.
- PCR error is typically at least an order of magnitude higher than typical sequencing error rates in homopolymeric sites, necessitating iterative site selection and optimal use of molecular barcoding to achieve relevant signal-to-noise detection ratios across a large number of candidate microsatellite sites.
- Tissue sequencing panels often comprise sufficient informative microsatellite loci simply due to large panel size and longer DNA fragment lengths (13,32).
- the moderate size of the ctDNA panel utilized here and short cell-free DNA (cfDNA) fragment lengths require purposeful microsatellite selection and inclusion.
- Literature and tissue sequencing compendia were used to evaluate candidate sites to provide pan-cancer MSI detection with minimal background noise. The list of candidate loci was further refined based on the performance criteria referenced above using healthy donor cfDNA.
- 90 microsatellite loci were selected for inclusion in the final test version: 89 mononucleotide repeats and a single trinucleotide repeat, all of which comprise repeat lengths of 7 or above. Assessment of unique molecule coverage distribution demonstrated that 65% of these loci have coverage above 0.5X median sample coverage.
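- The coverage statement above amounts to counting the loci whose unique molecule coverage exceeds half of the median sample coverage. A minimal sketch follows; the locus names and coverage values are hypothetical, and the median sample coverage is assumed to be supplied by the caller rather than derived from the loci themselves.

```python
def fraction_above_relative_coverage(locus_coverage, median_sample_coverage, factor=0.5):
    """Fraction of loci with coverage above factor * median sample coverage."""
    cutoff = factor * median_sample_coverage
    n_above = sum(1 for coverage in locus_coverage.values() if coverage > cutoff)
    return n_above / len(locus_coverage)


# Hypothetical unique-molecule coverage at four loci; cutoff is 0.5 * 1600 = 800.
locus_coverage = {"locus_a": 1200, "locus_b": 400, "locus_c": 900, "locus_d": 2000}
fraction = fraction_above_relative_coverage(locus_coverage, median_sample_coverage=1600)
print(fraction)
```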
- In addition to effective molecular capture and mapping, MSI detection also entails highly accurate differentiation of cancer-related signal from background noise due to sequencing and polymerase errors at the very low allele fractions at which ctDNA is typically found (26,27,34).
- the same repetitive genomic context that makes microsatellite candidates informative for MSI detection due to polymerase slippage during in vivo cellular replication also makes them particularly susceptible to the same polymerase slippage during in vitro library preparation and sequencing, resulting in high levels of technical noise.
- Digital Sequencing error correction was used to define true biological insertion-deletion events at microsatellite loci at high fidelity as previously described (26,27).
- Digital Sequencing platform is an NGS panel of cancer-related genes utilizing high-quality sequencing of cell-free DNA (which could comprise circulating tumor DNA) isolated from a simple, non-invasive blood draw.
- Digital Sequencing employs pre-sequencing preparation of a digital library of individually tagged cfDNA molecules combined with post-sequencing bioinformatic reconstruction to eliminate nearly all false positives.
- PCR and/or NGS tissue testing supported the ctDNA NGS results rather than the tissue IHC in 5 of 12 discordances. Together, these data support a previous report that IHC may not be as reliable in MSI determination as the PCR diagnostic archetype (36).
- MSI-H prevalence among tumor types also closely reflected that observed in tissue-based analyses (Figure 11A); as expected, MSI-H was most prevalent in endometrial, colorectal, and gastric cancers, whereas other tumors such as lung, bladder, and head and neck cancers demonstrated lower prevalence. Specific exceptions to previous MSI-H prevalence estimates included marginally lower prevalence in endometrial, colorectal, and gastric cancers, and marginally higher prevalence in prostate cancer.
- This example demonstrates robust analytical performance for MSI detection on a ctDNA panel previously validated for detection of the other four variant types in all guideline-recommended indications (26).
- the analytical sensitivity for MSI detection in contrived samples demonstrated reproducible detection to 0.1%, congruent with previous reports of similar sensitivity for indels and SNVs (26).
- this example assessed the performance of ctDNA MSI testing in 1145 samples with orthogonal tissue MSI, which constitutes the largest ctDNA-tissue MSI concordance cohort yet described.
- ctDNA MSI assessment demonstrated high PPV (95%), which compares favorably to the PPV of 90-92% reported for local vs. central tissue-based MSI assessment (36), and high PPA (87%) in the evaluable population, which is consistent with previous studies examining concordance of plasma and tissue genotyping for other variant types (25,26,45,46).
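- For reference, positive predictive value (PPV) and positive percent agreement (PPA) are computed from a concordance table as shown in the sketch below, with tissue status treated as the comparator. The counts are hypothetical and chosen only to show the arithmetic; they are not the study's data.

```python
def positive_predictive_value(true_pos, false_pos):
    """PPV: share of plasma MSI-H calls that the tissue comparator confirms."""
    return true_pos / (true_pos + false_pos)


def positive_percent_agreement(true_pos, false_neg):
    """PPA: share of tissue MSI-H cases that plasma testing also calls MSI-H."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts (not from the study).
true_pos, false_pos, false_neg = 19, 1, 3
ppv = positive_predictive_value(true_pos, false_pos)
ppa = positive_percent_agreement(true_pos, false_neg)
print(f"PPV = {ppv:.1%}, PPA = {ppa:.1%}")
```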
- Factors that can contribute to incomplete concordance may include tumor heterogeneity, differential shedding by the primary vs. metastatic lesions, temporal discordance of tissue and plasma collection, and low tumor shedding by some tumors (40,44,47-49).
- A gastric cancer patient identified as MSI-H by plasma and by pentaplex PCR in this report was previously reported to harbor discrete tumor populations of MSS and MSI-H disease as assessed by both IHC and PCR performed on tissue (40).
- The observation of non-trivial discordance between PCR and IHC tissue methods in this report highlights the importance of accurate MSI testing, as inaccurate MSI testing has been reported as a primary source of ICB failure (36).
- This example presents the first ctDNA-based landscape analysis of MSI in a large advanced pancancer cohort.
- the prevalence in CRC and endometrial cancer is lower than what has been reported for tissue (13), which most likely reflects the fact that the tissue-based landscape analyses include large numbers of early stage MSI-H tumors, which have a better prognosis (15) and are less likely to be part of the advanced cancer population tested with ctDNA.
- The higher prevalence of MSI-H prostate cancer is attributable to increased representation of MSI-H disease in advanced patients; two recent studies focusing on MSI status in advanced prostate cancer have shown MSI-H prevalence of 3.1% and 3.8% in that patient population (50,51), which is similar to the 2.6% observed in this study.
- landscape analysis did not reveal tumor type-specific patterns of microsatellite instability. However, this does not preclude the possibility that in plasma, similar to what has been shown in tissue (37,52), tumor-type specific patterns could emerge with the assessment of a larger number of microsatellite loci and larger numbers of representative MSI-H samples.
- A cfDNA-based targeted NGS panel has been developed and validated that accurately assesses MSI status while also providing comprehensive tumor genotyping, allowing pan-solid tumor guideline-complete testing from a single peripheral blood draw with high sensitivity, specificity and precision.
- Clinical validation using comparison to tissue testing, population-level prevalence analyses, and the first reported outcomes for cfDNA MSI-H patients treated with ICB therapy supported the clinical accuracy and relevance of this approach.
- Such simultaneous characterization of MSI status and tumor genotype from a simple peripheral blood draw has the potential to expand access to both targeted therapy and immunotherapies to all advanced cancer patients including those for whom current tissue-based testing paradigms are inadequate.
- NCCN Clinical Practice Guidelines in Oncology NCCN Guidelines 2018. Available from: https://www.nccn.org.
- DNA profiling of metastatic prostate cancer reveals microsatellite instability, structural rearrangements and clonal hematopoiesis. Genome Med. 2018;10:85.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Chemical & Material Sciences (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- General Health & Medical Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Evolutionary Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Data Mining & Analysis (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Organic Chemistry (AREA)
- Databases & Information Systems (AREA)
- Public Health (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Wood Science & Technology (AREA)
- Pathology (AREA)
- Zoology (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- General Engineering & Computer Science (AREA)
- Hospice & Palliative Care (AREA)
- Biochemistry (AREA)
- Oncology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Medicines Containing Material From Animals Or Micro-Organisms (AREA)
- Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862726182P | 2018-08-31 | 2018-08-31 | |
US201962823578P | 2019-03-25 | 2019-03-25 | |
US201962857048P | 2019-06-04 | 2019-06-04 | |
PCT/US2019/048999 WO2020047378A1 (en) | 2018-08-31 | 2019-08-30 | Microsatellite instability detection in cell-free dna |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3844761A1 true EP3844761A1 (de) | 2021-07-07 |
Family
ID=67982149
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP19769633.9A Pending EP3844761A1 (de) | 2018-08-31 | 2019-08-30 | Microsatellite instability detection in cell-free DNA |
Country Status (9)
Country | Link |
---|---|
US (2) | US11773451B2 (de) |
EP (1) | EP3844761A1 (de) |
JP (1) | JP2021535489A (de) |
KR (1) | KR20210052511A (de) |
CN (1) | CN112930569A (de) |
AU (1) | AU2019328344A1 (de) |
CA (1) | CA3109539A1 (de) |
SG (1) | SG11202101400UA (de) |
WO (1) | WO2020047378A1 (de) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4118238A4 (de) * | 2020-03-12 | 2024-04-17 | Personal Genome Diagnostics Inc | Microsatellite instability signatures |
WO2022124575A1 (ko) * | 2020-12-07 | 2022-06-16 | (주)디엑솜 | Method for diagnosing microsatellite instability using the coefficient of variation of sequence length in microsatellite regions |
WO2023287410A1 (en) * | 2021-07-14 | 2023-01-19 | Foundation Medicine, Inc. | Methods and systems for determining microsatellite instability |
WO2023017402A1 (en) * | 2021-08-09 | 2023-02-16 | Canexia Health Inc. | Methods for identifying microsatellite instability high (msi-h) in dna samples |
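KR20230023276A (ko) * | 2021-08-10 | 2023-02-17 | (주)디엑솜 | Method for diagnosing microsatellite instability using the rate of change of sequence length in microsatellite regions |
KR20230023278A (ko) * | 2021-08-10 | 2023-02-17 | (주)디엑솜 | Method for diagnosing microsatellite instability using the difference between the maximum and minimum sequence length in microsatellite regions |
CN113744251B (zh) * | 2021-09-07 | 2023-08-29 | 上海桐树生物科技有限公司 | Method for predicting microsatellite instability from pathology images based on a self-attention mechanism |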
US20240003888A1 (en) | 2022-05-17 | 2024-01-04 | Guardant Health, Inc. | Methods for identifying druggable targets and treating cancer |
CN117809744A (zh) * | 2023-04-21 | 2024-04-02 | 苏州吉因加生物医学工程有限公司 | Method, device, and storage medium for screening MSI feature loci |
CN116312781B (zh) * | 2023-05-17 | 2023-08-18 | 普瑞基准科技(北京)有限公司 | Machine-learning-based method and system for assessing genomic instability |
Family Cites Families (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6582908B2 (en) | 1990-12-06 | 2003-06-24 | Affymetrix, Inc. | Oligonucleotides |
US20030017081A1 (en) | 1994-02-10 | 2003-01-23 | Affymetrix, Inc. | Method and apparatus for imaging a sample on a device |
ATE226983T1 (de) | 1994-08-19 | 2002-11-15 | Pe Corp Ny | Coupled amplification and ligation method |
GB9620209D0 (en) | 1996-09-27 | 1996-11-13 | Cemu Bioteknik Ab | Method of sequencing DNA |
GB9626815D0 (en) | 1996-12-23 | 1997-02-12 | Cemu Bioteknik Ab | Method of sequencing DNA |
US6969488B2 (en) | 1998-05-22 | 2005-11-29 | Solexa, Inc. | System and apparatus for sequential processing of analytes |
AR021833A1 (es) | 1998-09-30 | 2002-08-07 | Applied Research Systems | Methods of nucleic acid amplification and sequencing |
US6818395B1 (en) | 1999-06-28 | 2004-11-16 | California Institute Of Technology | Methods and apparatus for analyzing polynucleotide sequences |
US7501245B2 (en) | 1999-06-28 | 2009-03-10 | Helicos Biosciences Corp. | Methods and apparatuses for analyzing polynucleotide sequences |
AU7537200A (en) | 1999-09-29 | 2001-04-30 | Solexa Ltd. | Polynucleotide sequencing |
CN100462433C (zh) | 2000-07-07 | 2009-02-18 | 维西根生物技术公司 | Real-time sequence determination |
US7208271B2 (en) | 2001-11-28 | 2007-04-24 | Applera Corporation | Compositions and methods of selective nucleic acid isolation |
US7169560B2 (en) | 2003-11-12 | 2007-01-30 | Helicos Biosciences Corporation | Short cycle methods for sequencing polynucleotides |
US7170050B2 (en) | 2004-09-17 | 2007-01-30 | Pacific Biosciences Of California, Inc. | Apparatus and methods for optical analysis of molecules |
CA2579150C (en) | 2004-09-17 | 2014-11-25 | Pacific Biosciences Of California, Inc. | Apparatus and method for analysis of molecules |
US7482120B2 (en) | 2005-01-28 | 2009-01-27 | Helicos Biosciences Corporation | Methods and compositions for improving fidelity in a nucleic acid synthesis reaction |
US7282337B1 (en) | 2006-04-14 | 2007-10-16 | Helicos Biosciences Corporation | Methods for increasing accuracy of nucleic acid sequencing |
EP2218794A1 (de) * | 2009-02-13 | 2010-08-18 | Alphagenics International SA | Detection of instability in regions of genomic DNA containing simple tandem repeats |
WO2010129787A2 (en) * | 2009-05-08 | 2010-11-11 | The Johns Hopkins University | Single molecule spectroscopy for analysis of cell-free nucleic acid biomarkers |
US8835358B2 (en) | 2009-12-15 | 2014-09-16 | Cellular Research, Inc. | Digital counting of individual molecules by stochastic attachment of diverse labels |
AU2011252795B2 (en) | 2010-05-14 | 2015-09-03 | Dana-Farber Cancer Institute, Inc. | Compositions and methods of identifying tumor specific neoantigens |
KR20190002733A (ko) | 2010-12-30 | 2019-01-08 | 파운데이션 메디신 인코포레이티드 | Optimization of multigene analysis of tumor samples |
WO2013153130A1 (en) * | 2012-04-10 | 2013-10-17 | Vib Vzw | Novel markers for detecting microsatellite instability in cancer and determining synthetic lethality with inhibition of the dna base excision repair pathway |
IL305303A (en) | 2012-09-04 | 2023-10-01 | Guardant Health Inc | Systems and methods for detecting rare mutations and changes in number of copies |
US20160040229A1 (en) | 2013-08-16 | 2016-02-11 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US20140235456A1 (en) * | 2012-12-17 | 2014-08-21 | Virginia Tech Intellectual Properties, Inc. | Methods and Compositions for Identifying Global Microsatellite Instability and for Characterizing Informative Microsatellite Loci |
MX2016008771A (es) | 2014-01-02 | 2016-12-20 | Memorial Sloan Kettering Cancer Center | Determinants of cancer response to immunotherapy |
US20150292033A1 (en) | 2014-04-10 | 2015-10-15 | Dana-Farber Cancer Institute, Inc. | Method of determining cancer prognosis |
SG10201914022QA (en) | 2014-11-13 | 2020-03-30 | Univ Johns Hopkins | Checkpoint blockade and microsatellite instability |
MA40737A (fr) | 2014-11-21 | 2017-07-04 | Memorial Sloan Kettering Cancer Center | Determinants of cancer response to immunotherapy by PD-1 blockade |
US10364468B2 (en) | 2016-01-13 | 2019-07-30 | Seven Bridges Genomics Inc. | Systems and methods for analyzing circulating tumor DNA |
EP3405574A4 (de) * | 2016-01-22 | 2019-10-02 | Grail, Inc. | Variant-based disease diagnostics and tracking |
JP2019509282A (ja) | 2016-02-29 | 2019-04-04 | ファウンデーション・メディシン・インコーポレイテッド | Methods of treating cancer |
KR20180119632A (ko) | 2016-02-29 | 2018-11-02 | 제넨테크, 인크. | Therapeutic and diagnostic methods for cancer |
EP3423828A4 (de) | 2016-02-29 | 2019-11-13 | Foundation Medicine, Inc. | Methods and systems for evaluating tumor mutational burden |
TWI822521B (zh) | 2016-05-13 | 2023-11-11 | 美商再生元醫藥公司 | Methods of treating skin cancer by administering a PD-1 inhibitor |
US10294518B2 (en) | 2016-09-16 | 2019-05-21 | Fluxion Biosciences, Inc. | Methods and systems for ultra-sensitive detection of genomic alterations |
CA3038712A1 (en) | 2016-10-06 | 2018-04-12 | Genentech, Inc. | Therapeutic and diagnostic methods for cancer |
AU2017355732A1 (en) | 2016-11-07 | 2019-05-09 | Grail, Llc | Methods of identifying somatic mutational signatures for early cancer detection |
CN106834479A (zh) * | 2017-02-16 | 2017-06-13 | 凯杰(苏州)转化医学研究有限公司 | System for analyzing microsatellite instability status in tumor immunotherapy |
US20200024669A1 (en) | 2017-03-20 | 2020-01-23 | Caris Mpi, Inc. | Genomic stability profiling |
KR20200093518A (ko) | 2017-07-21 | 2020-08-05 | 제넨테크, 인크. | Therapeutic and diagnostic methods for cancer |
CN107526944B (zh) * | 2017-09-06 | 2018-08-24 | 南京世和基因生物技术有限公司 | Sequencing data analysis method, device, and computer-readable medium for microsatellite instability |
EP3717520A4 (de) | 2017-12-01 | 2021-08-18 | Personal Genome Diagnostics Inc. | Methods for detecting microsatellite instability |
US20190206513A1 (en) | 2017-12-29 | 2019-07-04 | Grail, Inc. | Microsatellite instability detection |
CN112955570A (zh) | 2018-09-14 | 2021-06-11 | 莱森特生物公司 | Methods and systems for assessing microsatellite instability |
EP4008005A4 (de) | 2019-08-01 | 2023-09-27 | Tempus Labs, Inc. | Methods and systems for detecting microsatellite instability of a cancer in a liquid biopsy assay |
- 2019
  - 2019-08-30 AU AU2019328344A patent/AU2019328344A1/en active Pending
  - 2019-08-30 EP EP19769633.9A patent/EP3844761A1/de active Pending
  - 2019-08-30 JP JP2021510455A patent/JP2021535489A/ja active Pending
  - 2019-08-30 SG SG11202101400UA patent/SG11202101400UA/en unknown
  - 2019-08-30 WO PCT/US2019/048999 patent/WO2020047378A1/en unknown
  - 2019-08-30 CA CA3109539A patent/CA3109539A1/en active Pending
  - 2019-08-30 KR KR1020217009381A patent/KR20210052511A/ko active Search and Examination
  - 2019-08-30 CN CN201980071631.3A patent/CN112930569A/zh active Pending
- 2020
  - 2020-06-19 US US16/907,034 patent/US11773451B2/en active Active
- 2023
  - 2023-08-25 US US18/456,362 patent/US20230416843A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2020047378A1 (en) | 2020-03-05 |
KR20210052511A (ko) | 2021-05-10 |
CA3109539A1 (en) | 2020-03-05 |
SG11202101400UA (en) | 2021-03-30 |
CN112930569A (zh) | 2021-06-08 |
US20210363586A1 (en) | 2021-11-25 |
US11773451B2 (en) | 2023-10-03 |
US20230416843A1 (en) | 2023-12-28 |
JP2021535489A (ja) | 2021-12-16 |
AU2019328344A1 (en) | 2021-04-08 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11773451B2 (en) | Microsatellite instability detection in cell-free DNA | |
JP7466519B2 (ja) | Methods and systems for adjusting tumor mutational burden by tumor fraction and coverage | |
US20190385700A1 (en) | METHODS AND SYSTEMS FOR DETERMINING THE CELLULAR ORIGIN OF CELL-FREE NUCLEIC ACIDS | |
KR20220011140A (ko) | Systems and methods for evaluating tumor fraction | |
Zhao et al. | TruSight oncology 500: enabling comprehensive genomic profiling and biomarker reporting with targeted sequencing | |
US20220028494A1 (en) | Methods and systems for determining the cellular origin of cell-free dna | |
Nassar et al. | Epigenomic charting and functional annotation of risk loci in renal cell carcinoma | |
US20210398610A1 (en) | Significance modeling of clonal-level absence of target variants | |
US20200020416A1 (en) | Methods for detecting and suppressing alignment errors caused by fusion events | |
US20220344004A1 (en) | Detecting the presence of a tumor based on off-target polynucleotide sequencing data | |
US20220411876A1 (en) | Methods and related aspects for analyzing molecular response | |
RU2811503C2 (ru) | Methods for detecting and monitoring cancer by personalized detection of circulating tumor DNA | |
WO2023168300A1 (en) | Methods for analyzing cytosine methylation and hydroxymethylation | |
Legal Events
Code | Title | Description |
---|---|---|
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: UNKNOWN |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20210331 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
DAV | Request for validation of the European patent (deleted) | |
DAX | Request for extension of the European patent (deleted) | |
PUAG | Search results despatched under Rule 164(2) EPC together with communication from examining division | Free format text: ORIGINAL CODE: 0009017 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
RAP3 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: GUARDANT HEALTH, INC. |
17Q | First examination report despatched | Effective date: 20240312 |
B565 | Issuance of search results under Rule 164(2) EPC | Effective date: 20240312 |
RIC1 | Information provided on IPC code assigned before grant | Ipc: G16B 20/20 20190101ALI20240306BHEP; Ipc: G16B 40/20 20190101AFI20240306BHEP |