WO2023277932A1 - Detection of human leukocyte antigen loss of heterozygosity - Google Patents
- Publication number
- WO2023277932A1 (international application PCT/US2021/042039)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- hla
- allele
- tumor
- loh
- coverage
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- HLA Human Leukocyte Antigen Class I
- a computer-implemented method of detecting loss of heterozygosity (LOH) of a human leukocyte antigen (HLA) gene in a subject includes: obtaining HLA coverage feature metrics of a biological sample; providing one or more of the HLA coverage feature metrics to a three-class HLA LOH modeling process trained to classify the biological sample into one of three LOH classes (no LOH, partial LOH, or clonal LOH); determining, using the three-class HLA LOH modeling process, the LOH class for the HLA gene; and generating and storing a report of the determined LOH class for the HLA gene.
- LOH loss of heterozygosity
- the three-class HLA LOH modeling process is a sequential two-stage modeling process having a first LOH classifier model stage and a second LOH classifier model stage, wherein providing the one or more of the HLA coverage feature metrics to the three-class HLA LOH modeling process includes: providing at least one of the HLA coverage feature metrics to the first LOH classifier model stage and determining either no LOH or LOH for the sample; and, in response to determining LOH for the sample, providing at least one of the HLA coverage feature metrics to the second LOH classifier model stage and determining the LOH class as either partial LOH or clonal LOH for the sample.
- the one or more of the HLA coverage feature metrics includes: read depth of coverage of a candidate HLA allele of the HLA gene; a ratio of a B allele frequency (BAF) of a stable allele in a tumor sample of the biological sample to the BAF of the stable allele in a normal sample of the biological sample; a difference between a log ratio (logR) of coverage of the stable allele between the tumor sample and the normal sample and a logR of coverage of a lost HLA allele of the HLA gene between the tumor sample and the normal sample; tumor purity; a ratio of a BAF of the lost allele in the tumor sample to the BAF of the lost allele in the normal sample; and the quotient (observed logR difference - expected logR difference) / expected logR difference, where the expected logR difference is based on tumor purity.
- BAF B allele frequency
- the observed logR difference is the difference between the logR of coverage of the stable allele and the logR of coverage of the lost allele.
- the observed logR difference is an average of log(coverage in tumor / coverage in normal), calculated for at least one nucleotide position in an HLA gene.
- the log(coverage in tumor / coverage in normal) is calculated for nucleotide positions having a coverage of at least 40 sequence reads.
- the observed logR difference is an average of log(coverage in tumor / coverage in normal * match ratio), calculated for at least one nucleotide position in an HLA gene, wherein the match ratio is either the ratio of the number of HLA reads in the normal sample to the number of HLA reads in the tumor sample or the ratio of the number of unique reads in the normal sample to the number of unique reads in the tumor sample.
- the log(coverage in tumor / coverage in normal * match ratio) is calculated for nucleotide positions having a coverage of at least 40 sequence reads.
- the observed logR difference is the cumulative area between the logR line associated with a first allele and the logR line associated with a second allele.
- the expected logR difference is log2(1 - tumor purity), where tumor purity is a value between 0 and 1.
- the method further includes, before running the modeling process: for each gene, calculating a ratio of the BAF of a first allele in the tumor sample to the BAF of the first allele in the normal sample and a ratio of the BAF of a second allele in the tumor sample to the BAF of the second allele in the normal sample; and comparing the two ratios and selecting the allele with the lower ratio as the allele more likely to be lost.
- obtaining HLA coverage feature metrics of the biological sample includes: receiving next generation sequencing data generated from the biological sample of the subject; aligning the next generation sequencing data against a reference genome to determine a mapped reads dataset and an unmapped reads dataset; providing at least the unmapped reads dataset to an HLA typing process to identify at least one candidate HLA allele for the HLA gene; identifying an HLA sequence associated with each identified candidate HLA allele; creating an HLA reference genome using each identified HLA sequence; aligning the next generation sequencing data against the HLA reference genome and adjusting the HLA reference genome to account for a variant identified during the aligning; and aligning the next generation sequencing data against the adjusted HLA reference genome and, in response, determining the HLA coverage feature metrics associated with one or more identified candidate HLA alleles.
- obtaining HLA coverage feature metrics of the biological sample includes: receiving normal next generation sequencing data generated from a buffy coat preparation of a blood sample of the subject; aligning the normal next generation sequencing data against a reference genome to determine a normal mapped reads dataset and a normal unmapped reads dataset; receiving tumor next generation sequencing data generated from a tumor specimen of the subject; providing at least a portion of the normal unmapped reads dataset to an HLA typing process to identify at least one candidate HLA allele for the HLA gene; identifying an HLA sequence associated with each identified candidate HLA allele; creating an HLA reference genome using each identified HLA sequence; aligning the normal next generation sequencing dataset against the HLA reference genome and adjusting the HLA reference genome to account for a variant identified during the aligning; and aligning the normal next generation sequencing dataset against the adjusted HLA reference genome and aligning the tumor next generation sequencing dataset against the adjusted HLA reference genome to determine the HLA coverage feature metrics associated with the identified candidate HLA alleles.
- determining the LOH class for the HLA gene includes applying a logistic regression model to the obtained HLA coverage feature metrics.
- the one or more of the HLA coverage feature metrics includes: read depth of coverage of a candidate allele of the HLA gene; a ratio of a B allele frequency (BAF) of a stable allele in a tumor sample of the biological sample to the BAF of the stable allele in a normal sample of the biological sample; a difference between a log ratio (logR) of coverage of the stable allele between the tumor sample and the normal sample and a logR of coverage of a lost HLA allele of the HLA gene between the tumor sample and the normal sample; tumor purity; a ratio of a BAF of the lost allele in the tumor sample to the BAF of the lost allele in the normal sample; and the quotient (observed logR difference - expected logR difference) / expected logR difference, where the expected logR difference is based on tumor purity.
- the next generation sequencing data is generated using short-read sequencing.
- a method for determining loss of heterozygosity for the HLA-A, HLA-B, and HLA-C genes, for the HLA-E, HLA-F, and HLA-G genes, or for the DRA, DRB1, DQA1, DQB1, DPA1, and DPB1 genes applies the methods herein to each gene.
- at least a portion of the read data includes forward reads from paired-end reads.
- the HLA typing process applies an OptiType HLA typing algorithm or a Kourami HLA typing algorithm.
- the HLA reference genome further includes at least one HLA pseudogene sequence.
- providing at least a portion of the normal unmapped reads dataset to the HLA typing process to identify at least one candidate HLA allele for the HLA gene includes providing at least a portion of the normal unmapped reads dataset and a portion of the normal mapped reads dataset to the HLA typing process.
- aligning the tumor next generation sequencing dataset against the adjusted HLA reference genome to determine the HLA coverage feature metrics includes filtering the tumor next generation sequencing dataset.
- filtering the tumor next generation sequencing dataset includes removing reads that are not properly aligned, removing duplicate reads, and/or removing a read based on an edit distance associated with the read.
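The three filters named above can be sketched as follows. The `AlignedRead` record is a hypothetical stand-in. In practice these fields would come from BAM records; for example, pysam's `AlignedSegment` exposes `is_proper_pair`, `is_duplicate`, and `get_tag("NM")` for the edit distance. The default `max_edit_distance` of 2 is an arbitrary placeholder, not a value from the source.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AlignedRead:
    """Hypothetical minimal view of an aligned read."""
    name: str
    is_proper_pair: bool
    is_duplicate: bool
    edit_distance: int  # NM: mismatches plus inserted/deleted bases


def filter_tumor_reads(reads: Iterable[AlignedRead],
                       max_edit_distance: int = 2) -> List[AlignedRead]:
    """Keep reads that are properly paired, not duplicates, and close to the reference."""
    kept = []
    for read in reads:
        if not read.is_proper_pair:
            continue  # mates not aligned in the expected orientation/distance
        if read.is_duplicate:
            continue  # PCR or optical duplicate
        if read.edit_distance > max_edit_distance:
            continue  # too divergent from the adjusted HLA reference
        kept.append(read)
    return kept


reads = [
    AlignedRead("r1", True, False, 0),
    AlignedRead("r2", False, False, 0),
    AlignedRead("r3", True, True, 1),
    AlignedRead("r4", True, False, 5),
]
kept_names = [r.name for r in filter_tumor_reads(reads)]
```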
- the tumor specimen is a solid tumor specimen.
- the tumor specimen is a cell free DNA (cfDNA) specimen.
- the tumor specimen is a lung tumor specimen, a metastatic specimen, a colorectal tumor specimen, or a pancreatic tumor specimen.
- the method is implemented on one or more microservices.
- the method further includes: for the biological sample containing cancer, when it is determined that the biological sample has an LOH class of no LOH in the HLA gene, treating the cancer by administering a checkpoint inhibitor therapy to the subject.
- the checkpoint inhibitor therapy is selected from the group consisting of an anti-CTLA-4 therapy, an anti-PD-1 therapy, and an anti-PD-L1 therapy.
- the biological sample is selected from the group consisting of a tumor specimen and a buffy coat preparation.
- FIG. 1 illustrates an example workflow 10 for next generation sequencing, bioinformatics processing, and report generation, in an example.
- FIG. 2 illustrates a schematic of an example process for Human Leukocyte Antigen Class I (HLA) detection and analysis.
- FIG. 3 illustrates an example process schematic for data flow for an HLA typing model and a model of loss of heterozygosity (LOH) in HLA genes (collectively, the HLA and HLA-LOH model).
- FIG. 4 illustrates an example HLA typing report, generated in an example.
- FIGS. 5A, 5B, and 5C collectively illustrate plots of coverage metrics calculated for different examples of the techniques herein, some in comparison to non-technique examples, and some without the filter steps.
- FIG. 5A shows data that were calculated using all disclosed steps and features.
- FIG. 5B shows data calculated without aligning discarded/unmapped reads to HLA genes
- FIG. 5C shows data calculated without replacing the HLA reference sequences with the variants detected in the sequence data generated by the patient sample.
- Light colors (lighter blue and lighter red) indicate areas of low coverage and black dots indicate positions where the sequences of the two alleles diverge from one another.
- FIG. 6 illustrates an example shallow decision tree showing the use of coverage metrics to predict HLA-LOH.
- FIGS. 7A and 7B collectively illustrate the results of an optional biological assay used to validate the predictions of the HLA and LOH model.
- FIGS. 8A, 8B, and 8C collectively illustrate coverage metrics plots calculated by the methods disclosed herein for different types of tissues.
- FIG. 8A shows coverage data calculated for the non-cancer sample.
- FIG. 8B shows coverage data calculated for the cancer sample tissue extracted from the same patient as the non-cancer sample.
- FIG. 8C shows coverage data for a tumor organoid derived from the cancer sample tissue.
- FIGS. 9A, 9B, 9C, and 9D collectively illustrate how various model features lead to more robust alignments and less noisy signal for downstream analysis by comparing plots of coverage metrics calculated for different examples of the techniques herein with coverage metrics calculated for non-technique examples, and some without the filter steps.
- FIG. 10 illustrates an example system for HLA and HLA-LOH analysis that may be implemented on a network accessible processing system for performing the processes described herein.
- FIG. 11 illustrates how HLA-LOH can potentially lead to escape of immune pressure.
- FIG. 12 illustrates relative differences in allele coverage metrics calculated in order to detect HLA-LOH, including B allele frequencies (BAF) and Log Coverage ratios, between the Tumor and Normal sample.
- the cancer specimen analyzed for these results represents a case of strong HLA-LOH.
- the allele predicted to have been lost and the allele predicted to be stable are highlighted in red and blue, respectively.
- Light colors indicate areas of low coverage and black dots indicate positions where the sequences of the two alleles diverge from one another.
- FIG. 13 is a table showing the percent and number of samples in the xT 500 cohort predicted to have HLA-LOH by the model, categorized by cancer type.
- FIG. 14 illustrates predicted HLA-LOH status among all samples in the xT 500 cohort.
- Each column represents a sample, with the LOH status of each HLA gene (HLA-A, HLA-B, or HLA-C as denoted by the y-axis label) shown as Predicted LOH (red), Predicted Stable (blue), or Homozygous (grey).
- FIG. 15 illustrates the association or lack of association between Tumor Mutational Burden (TMB) and LOH status. These charts compare the log-normalized TMB between samples with no HLA-LOH (blue) and samples with predicted HLA-LOH (red). Significance was determined by Student's t-test.
- FIG. 16 is a schematic of an example process for determining HLA LOH status in a three-class classification process having two classification stages.
- FIG. 17 is a schematic of an example HLA LOH classification stage of FIG. 16.
- FIG. 18 illustrates normal sample plots of (i) read coverage (number of reads) on the y-axis for two different alleles (B*44:02 (red data points) and B*07:02 (blue data points)) as a function of nucleotide position, (ii) BAF for the two alleles as a function of nucleotide position, and (iii) the log ratio of read coverage in the tumor sample to read coverage in the normal sample as a function of nucleotide position.
- FIG. 19 illustrates tumor sample plots of (i) read coverage (number of reads) on the y-axis for two different alleles (B*44:02 (red data points) and B*07:02 (blue data points)) as a function of nucleotide position, (ii) BAF for the two alleles as a function of nucleotide position, and (iii) the difference between the log ratios of the two alleles as a function of nucleotide position, illustrating a partial LOH example.
- FIG. 20 illustrates normal and tumor sample plots of (i) read coverage (number of reads) on the y-axis for two different alleles (B*44:02 (red data points) and B*07:02 (blue data points)) as a function of nucleotide position, (ii) the log ratio of read coverage in the tumor sample to read coverage in the normal sample as a function of nucleotide position, and (iii) the difference between the log ratios of the two alleles as a function of nucleotide position, illustrating a clonal LOH example.
- “Pseudogene” means a non-functional HLA gene (for example, HLA-Y) and/or an HLA gene that isn’t expressed. HLA pseudogenes may not impact a patient’s health, immune system activity and/or control of cancer cells, but these pseudogenes may have genetic sequences that are similar to the genetic sequences of functional HLA genes, such that sequence reads from HLA pseudogenes could potentially align to functional HLA genes.
- “Genetic analyzer” means a device, system, and/or methods for determining the characteristics (including sequences) of nucleic acid molecules (including DNA, RNA, etc.) present in biological specimens (including tumors, biopsies, tumor organoids, blood samples, saliva samples, or other tissues or fluids).
- “Targeted Panel” means a combination of probes for next-generation sequencing of a patient’s biological specimens (including tumors, biopsies, tumor organoids, blood samples, saliva samples, or other tissues or fluids) which are selected to map one or more loci on one or more chromosomes.
- “Sequencing probe” means a collection of chemicals which attach to a locus of a chromosome based on the expected sequence of nucleotides at the RNA or DNA present at that locus.
- “RNA read count” means the read counts of RNA or cDNA generated from a genetic analyzer.
- “Bioinformatics pipeline” means a series of processing stages of a pipeline to instantiate bioinformatics reporting regarding next-generation sequencing results of a patient’s tumor or normal tissue or bodily fluids to extract and report on variants present in the patient’s genome.
- “Genetic profile” means a combination of one or more variants, RNA transcriptomes, or other informative genetic characteristics determined for a patient from next-generation sequencing.
- “Genetic sequence” means a recordation of a series of nucleotides present in a patient’s RNA or DNA as determined from sequencing the patient’s tissue or fluids.
- “Variant” means a difference in a genetic sequence or genetic profile when compared to a reference genetic sequence or expected genetic profile.
- “Expression level” means the number of copies of an RNA or protein molecule generated by a gene or other genetic locus, which may be defined by a chromosomal location or other genetic mapping indicator.
- “Gene product” means a molecule (including a protein or RNA molecule) generated by the manipulation (including transcription) of the gene or other genetic locus, which may be defined by a chromosomal location or other genetic mapping indicator.
- Class I genes include HLA-A, -B, and -C, as well as the non-classical MHC-Ib genes HLA-E, -F, and -G.
- Class II genes include DRA, DRB1, DQA1, DQB1, DPA1, and DPB1. Multiple alleles exist for each genetic locus.
- the polymorphic nature of HLA is an important evolutionary development, as it allows the population to display a wide range of antigens to the immune system.
- the large degree of polymorphism at the Class I and Class II loci poses a significant challenge for detecting mutation and loss of heterozygosity.
- the HLA-LOH processes herein may be executed on one or more network accessible computer processing systems, including network accessible devices communicatively coupled to other computer systems, such as other NGS systems.
- the processes initially include receiving genetic material (DNA or RNA) that has been isolated from a patient specimen and sequenced, for example, using an NGS technique.
- the processes may receive only the sequence data.
- the specimen may be any biological sample obtained from the patient, such as a tissue sample (e.g., tumor tissue from a biopsy), a cell sample, blood, saliva, urine, and the like.
- Both cancer and non-cancer specimens may be isolated and sequenced by the computer processing systems performing the HLA-LOH processes, and such systems may store the sequence data in a set of data files for the cancer specimens and a set of data files for non-cancer specimens. Each file may be configured to store the sequence of each detected read and the number of times (counts) that a sequence was detected.
- Example data file formats include a BCL file or a FASTQ file, where the FASTQ format further includes a per-base quality score for each read.
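A FASTQ record consists of four lines: an `@` header, the bases, a `+` separator, and one quality character per base. The following minimal reader illustrates this structure. It assumes Phred+33 encoding, which is the modern Illumina convention, and unwrapped records. It is not a full validator. The function name `read_fastq` is hypothetical.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, per-base Phred scores) from a 4-line FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Phred+33 encoding: quality score = ASCII code - 33
        yield header[1:].split()[0], seq, [ord(c) - 33 for c in qual]


records = list(read_fastq(io.StringIO("@r1 1:N:0\nACGT\n+\nII#5\n")))
```

Here `records` holds one tuple, `("r1", "ACGT", [40, 40, 2, 20])`. The `#` character encodes a low-quality base (Q2).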
- the computer processing systems may pre-process the sequence data by filtering and/or cleaning the data and align that pre-processed data against a reference genome, for example, using a bioinformatics pipeline executed using the computer processing system.
- the reference genome build is the hg19 genome (see, e.g., GenBank assembly accession: GCA_000001405.1).
- the hg19 genome contains only one allele for each HLA gene; therefore, many reads derived from the HLA genes may not map to hg19.
- the normalization and alignment for sequence data occurs for both cancer and non-cancer specimens, yielding a set of output files for cancer specimens and a set of output files for non-cancer specimens.
- the output files may store genetic positions indicating the location in the reference genome that matches the sequence of each read, and additional information relating to mapping attributes and mapping quality of each read.
- Example file formats include a Binary Alignment Map (BAM) file.
- Unmapped reads (that is, reads that do not match the genome with quality scores that exceed quality thresholds) are stored in the BAM file with corresponding read flags indicating that the read did not map successfully. This may be due to high numbers of mismatched bases or a high degree of multimapping. In some examples, reads bearing this unmapped flag are generally excluded from downstream analysis (variant calling, etc.).
- FIG. 1 illustrates an example workflow 10 for next generation sequencing, bioinformatics processing, and report generation, in an example.
- cancer samples and non-cancer samples may be processed by DNA next generation sequencing (NGS) 12, designed to sequence either the whole exome or a targeted panel of cancer-related genes, to generate DNA sequencing data, and the DNA sequencing data may be processed by a bioinformatics pipeline 14 to generate HLA-LOH results (among other outputs) for each sample.
- the cancer sample may be a tissue sample or blood sample containing cancer cells.
- a tumor organoid sample may be processed instead of the patient cancer sample.
- germline (“normal”, non-cancerous) DNA may be extracted from either blood (for example, if a patient has cancer that is not a blood cancer) or saliva (for example, if a patient has blood cancer).
- Normal blood samples may be collected from patients (for example, in PAXgene Blood DNA Tubes) and saliva samples may be collected from patients (for example, in Oragene DNA Saliva Kits).
- Blood cancer samples may be collected from patients (for example, in EDTA collection tubes).
- Macrodissected FFPE tissue sections (which may be mounted on a histopathology slide) from solid tumor samples may be analyzed by pathologists to determine overall tumor amount in the sample and percent tumor cellularity as a ratio of tumor to normal nuclei.
- background tissue may be excluded or removed such that the section meets a tumor purity threshold (in one example, at least 20% of the nuclei in the section are tumor nuclei).
- DNA may be isolated from blood samples, saliva samples, and tissue sections using commercially available reagents, including proteinase K to generate a liquid solution of DNA.
- Each solution of isolated DNA may be subjected to a quality control protocol to determine the concentration and/or quantity of the DNA molecules in the solution, which may include the use of a fluorescent dye and a fluorescence microplate reader, standard spectrofluorometer, or filter fluorometer.
- isolated DNA molecules may be mechanically sheared to an average length using an ultrasonicator (for example, a Covaris ultrasonicator).
- the DNA molecules may also be analyzed to determine their fragment size, which may be done through gel electrophoresis techniques and may include the use of a device such as a LabChip GX Touch.
- DNA libraries may be prepared from the isolated DNA, for example, using the KAPA Hyper Prep Kit, a New England Biolabs (NEB) kit, or a similar kit.
- DNA library preparation may include the ligation of adapters onto the DNA molecules.
- UDI adapters including Roche SeqCap dual end adapters, or UMI adapters (for example, full length or stubby Y adapters) may be ligated to the DNA molecules.
- adapters are nucleic acid molecules that may serve as barcodes to identify DNA molecules according to the sample from which they were derived and/or to facilitate the downstream bioinformatics processing and/or the next generation sequencing reaction.
- the sequence of nucleotides in the adapters may be specific to a sample in order to distinguish samples.
- the adapters may facilitate the binding of the DNA molecules to anchor oligonucleotide molecules on the sequencer flow cell and may serve as a seed for the sequencing process by providing a starting point for the sequencing reaction.
- DNA libraries may be amplified and purified using reagents, for example, Axygen MAG PCR clean up beads. Then the concentration and/or quantity of the DNA molecules may be quantified using a fluorescent dye and a fluorescence microplate reader, standard spectrofluorometer, or filter fluorometer.
- DNA libraries may be pooled (two or more DNA libraries may be mixed to create a pool) and treated with reagents to reduce off-target capture, for example, Human COT-1 and/or IDT xGen Universal Blockers. Pools may be dried in a vacufuge and resuspended. DNA libraries or pools may be hybridized to a probe set, for example, a probe set specific to a panel that includes approximately 100, 600, 1,000, 10,000, etc.
- Example probe sets include IDT xGen Exome Research Panel v1.0 probes, IDT xGen Exome Research Panel v2.0 probes, other IDT probe panels, Roche probe panels, another probe panel that captures the human exome, or another probe panel.
- DNA libraries or pools may be amplified with commercially available reagents, for example, the KAPA HiFi HotStart ReadyMix.
- Pools may be incubated in an incubator, PCR machine, water bath, or other temperature modulating device to allow probes to hybridize. Pools may then be mixed with Streptavidin-coated beads or another means for capturing hybridized DNA-probe molecules, especially DNA molecules representing exons of the human genome and/or genes selected for a genetic panel.
- Pools may be amplified and purified more than once using commercially available reagents, for example, the KAPA HiFi Library Amplification kit and Axygen MAG PCR clean up beads, respectively.
- the pools or DNA libraries may be analyzed to determine the concentration or quantity of DNA molecules, for example by using a fluorescent dye (for example, PicoGreen pool quantification) and a fluorescence microplate reader, standard spectrofluorometer, or filter fluorometer.
- the DNA library preparation and/or whole exome capture steps of the process 12 may be performed partially or wholly with an automated system, using a liquid handling robot (for example, a SciClone NGSx).
- the library amplification may be performed on a device, for example, an Illumina C-Bot2, and the resulting flow cell containing amplified target-captured DNA libraries may be sequenced on a next generation sequencer, for example, an Illumina HiSeq 4000 or an Illumina NovaSeq 6000, to a unique on-target depth selected by the user, for example, 300x, 400x, 500x, 10,000x, etc. Samples may be further assessed for uniformity, with each sample required to have 95% of all targeted bp sequenced to a minimum depth selected by the user, for example, 300x.
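The uniformity criterion above (95% of targeted bases at or above a user-selected minimum depth such as 300x) reduces to a simple fraction check. The following sketch assumes per-base depths for the targeted positions have already been computed. The function name is hypothetical.

```python
from typing import Sequence


def passes_uniformity(depths: Sequence[int], min_depth: int = 300,
                      min_fraction: float = 0.95) -> bool:
    """True if at least `min_fraction` of targeted bases reach `min_depth`."""
    if not depths:
        raise ValueError("no targeted positions supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths) >= min_fraction
```

For example, a 100-bp target with 95 bases at 300x passes, but one with only 94 such bases fails.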
- the next generation sequencer may generate a FASTQ, BCL, or other file for each flow cell or each patient sample.
- the bioinformatics pipeline 14 may filter FASTQ data obtained from the NGS Lab process 12.
- Filtering FASTQ data may include correcting sequencer errors and removing (trimming) low quality sequences or bases, adapter sequences, contaminations, chimeric reads, overrepresented sequences, biases caused by library preparation, amplification, or capture, and other errors.
- Entire reads, individual nucleotides, or multiple nucleotides that are likely to have errors may be discarded based on the quality rating associated with the read in the FASTQ file, the known error rate of the sequencer, and/or a comparison between each nucleotide in the read and one or more nucleotides in other reads that have been aligned to the same location in the reference genome. Filtering may be done in part or in its entirety by various software tools, for example, Skewer (see doi.org/10.1186/1471-2105-15-182).
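Quality-based discarding of nucleotides and whole reads can be illustrated with a simple 3'-end trim. This is a deliberately simplified stand-in and not what Skewer does internally, since Skewer is primarily an adapter trimmer. It removes trailing bases below a Phred threshold and drops the read if too little sequence remains. The thresholds `min_q=20` and `min_len=30` are placeholder assumptions.

```python
from typing import List, Optional, Tuple


def trim_3prime(seq: str, quals: List[int], min_q: int = 20,
                min_len: int = 30) -> Optional[Tuple[str, List[int]]]:
    """Trim low-quality bases from the 3' end; return None to discard the read."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1  # walk back over trailing low-quality bases
    if end < min_len:
        return None  # too short after trimming: discard the entire read
    return seq[:end], quals[:end]
```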
- FASTQ files may be analyzed for rapid assessment of quality control and reads, for example, by sequencing data QC software such as AfterQC, Kraken, RNA-SeQC, or FastQC (see Illumina BaseSpace Labs or illumina.com/products/by-type/informatics-products/basespace-sequence-hub/apps/fastqc.html), or another similar software program.
- each read in the file may be aligned to the location in the human genome having a sequence that best matches the sequence of nucleotides in the read.
- There are many software programs designed to align reads, for example, Novoalign (Novocraft, Inc.), Bowtie, Burrows-Wheeler Aligner (BWA), programs that use a Smith-Waterman algorithm, etc.
- Alignment may be directed using a reference genome (for example, hg 19, GRCh38, hg38, GRCh37, other reference genomes developed by the Genome Reference Consortium, etc.) by comparing the nucleotide sequences in each read with portions of the nucleotide sequence in the reference genome to determine the portion of the reference genome sequence that is most likely to correspond to the sequence in the read.
- the alignment may generate a SAM file, which stores the reference-genome coordinates of each aligned read (its start position and, through its CIGAR string, its extent), from which the coverage (number of reads) at each nucleotide in the reference genome can be computed.
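Per-nucleotide coverage is derived from read alignment spans rather than stored directly; tools such as `samtools depth` compute it from BAM files. The following sketch uses a difference array over 0-based, half-open `[start, end)` spans. It assumes reads have no internal deletions or splices, which a full implementation would take from the CIGAR string. The function name is hypothetical.

```python
from typing import List, Tuple


def per_base_coverage(spans: List[Tuple[int, int]], ref_length: int) -> List[int]:
    """Depth at each reference position from 0-based, half-open read spans."""
    diff = [0] * (ref_length + 1)
    for start, end in spans:
        diff[start] += 1  # a read begins covering here
        diff[end] -= 1    # ...and stops covering here
    coverage, running = [], 0
    for pos in range(ref_length):
        running += diff[pos]
        coverage.append(running)
    return coverage
```

With two reads spanning `[0, 3)` and `[2, 5)` on a 6-bp reference, the result is `[1, 1, 2, 1, 1, 0]`.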
- the SAM files may be converted to BAM files, BAM files may be sorted, and duplicate reads may be marked for deletion, resulting in de-duplicated BAM files.
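Duplicate marking can be illustrated with a heavily simplified, single-end version of the idea used by tools such as Picard MarkDuplicates or SAMBLASTER. Reads that share chromosome, 5' position, and strand are treated as copies of one molecule, and only the copy with the highest summed base quality is kept. This sketch omits mate coordinates and soft-clip adjustment. The dictionary keys (`chrom`, `five_prime`, `strand`, `qual_sum`, `name`) are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int, str]


def mark_duplicates(reads: List[dict]) -> List[str]:
    """Return names of reads to mark as duplicates (simplified, single-end)."""
    best: Dict[Key, dict] = {}
    duplicates: List[str] = []
    for read in reads:
        key = (read["chrom"], read["five_prime"], read["strand"])
        current = best.get(key)
        if current is None:
            best[key] = read  # first read seen for this molecule
        elif read["qual_sum"] > current["qual_sum"]:
            duplicates.append(current["name"])  # demote the previous best
            best[key] = read
        else:
            duplicates.append(read["name"])
    return duplicates


reads = [
    {"name": "a", "chrom": "chr6", "five_prime": 100, "strand": "+", "qual_sum": 500},
    {"name": "b", "chrom": "chr6", "five_prime": 100, "strand": "+", "qual_sum": 900},
    {"name": "c", "chrom": "chr6", "five_prime": 100, "strand": "-", "qual_sum": 400},
    {"name": "d", "chrom": "chr6", "five_prime": 100, "strand": "+", "qual_sum": 300},
]
dups = mark_duplicates(reads)
```

Read `c` survives because it lies on the opposite strand. Reads `a` and `d` are marked as duplicates of the higher-quality read `b`.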
- a BAM file may contain reads from both a cancer sample and a normal sample, and these samples may be derived from the same patient.
- a matched tumor-normal oncology targeted panel single-site Next Generation Sequencing (NGS) assay may be used for pre-processing.
- the assay is a laboratory-developed test (LDT).
- the assay is a marketed assay approved by a regulatory body.
- the assay may include reagents, software, instruments, and procedures for testing DNA extracted from formalin-fixed, paraffin-embedded (FFPE) tumor specimens and matched normal blood or saliva specimens.
- the assay is designed to detect and identify somatic alterations for use and interpretation by qualified healthcare professionals to aid in the clinical management of previously diagnosed cancer patients with solid malignant neoplasms.
- the assay is a next generation sequencing-based in vitro diagnostic device intended for use in the detection of substitutions (single nucleotide variants (SNVs) and multi-nucleotide variants (MNVs)) and insertion and deletion alterations (INDELs) in 648 genes, as well as microsatellite instability (MSI) status using DNA isolated from formalin-fixed paraffin embedded (FFPE) tumor tissue specimens, and matched normal specimens, from previously diagnosed cancer patients.
- the assay may provide tumor mutation profiling to be used by qualified health care professionals in accordance with professional guidelines in oncology for patients with malignant neoplasms.
- the assay workflow includes sample processing through to the completion of sequencing and creation of an aligned BAM file for patient-matched tumor and normal samples.
- HLA-LOH determination involves novel bioinformatics pipeline software to add a parallel analysis of sequencing results to support HLA-LOH determination.
- the sequencing assay includes DNA extraction from FFPE tissue samples and matched normal saliva or blood samples. Extracted DNA undergoes whole-genome shotgun library construction and hybridization-based capture of specified regions from 648 cancer-related genes (including intronic overhang and selected promoter regions), 196 loci for microsatellite instability (MSI), and the sequencing probes also include probes specifically designed to efficiently capture a diverse array of HLA alleles.
- the systems and methods described herein may be used to determine whether a patient sample has HLA-LOH, for example.
- BAM files may be analyzed to detect genetic variants, including single nucleotide variants (SNVs), copy number variants (CNVs), gene rearrangements, etc.
- de-duplicated BAM files and a variant call format (VCF) file generated from the variant calling pipeline may be used to compute read depth and variation in heterozygous germline SNVs between the tumor and normal samples (or between the tumor sample and a pool of process-matched normal controls for tumor-only cases when the matched normal sample is not available).
- Circular binary segmentation may be applied and segments may be selected with highly differential log2 ratios between the tumor and its comparator (matched normal or normal pool).
- Approximate integer copy number may be assessed from a combination of differential coverage in segmented regions and an estimate of stromal admixture (for example, tumor purity, or the portion of a sample that is tumor vs. non-tumor) generated by analysis of heterozygous germline SNVs.
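To show how heterozygous germline SNVs carry purity information, consider a simplified model. This model is an assumption for illustration and not the pipeline's own estimator. Assume a region where tumor cells have lost one copy and there are no other copy-number changes. With purity p, the lost allele is present only in normal cells (1 − p), while the retained allele is present in all cells (1). The lost-allele BAF is therefore b = (1 − p)/(2 − p), which inverts to p = (1 − 2b)/(1 − b).

```python
def expected_lost_allele_baf(purity: float) -> float:
    """Expected BAF of the lost allele under a clonal single-copy loss."""
    # Lost-allele copies: (1 - p) from normal cells; retained: (1 - p) + p = 1.
    return (1.0 - purity) / (2.0 - purity)


def purity_from_lost_allele_baf(baf: float) -> float:
    """Invert the model: b(2 - p) = 1 - p  =>  p = (1 - 2b) / (1 - b)."""
    if not 0.0 <= baf <= 0.5:
        raise ValueError("under this model the lost-allele BAF lies in [0, 0.5]")
    return (1.0 - 2.0 * baf) / (1.0 - baf)
```

At 50% purity the lost allele's expected BAF is 1/3, and an observed BAF of 1/3 maps back to 50% purity.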
- the copy number status of chromosome (chr) 6 and/or arms or other portions of chr 6 in the tumor sample and/or the normal sample may be detected by the bioinformatics pipeline and/or received by the systems and methods.
- tumor FASTQ files may be aligned against the human reference genome using BWA for DNA files.
- DNA reads may be sorted and duplicates may be marked with software, for example, SAMBLASTER.
- Discordant and split reads may be further identified and separated.
- These data may be read into a software, for example, LUMPY, for structural variant detection.
- Structural alterations may be grouped by type, recurrence, and presence and stored within a database and displayed through a fusion viewer software tool.
- the fusion viewer software tool may reference a database, for example, Ensembl, to determine the gene and proximal exons surrounding the breakpoint for any possible transcript generated across the breakpoint.
- the fusion viewer tool may then place the breakpoint 5’ or 3’ to the subsequent exon in the direction of transcription. For inversions, this orientation may be reversed for the inverted gene.
- the translated amino acid sequences may be generated for both genes in the chimeric protein, and a plot may be generated containing the remaining functional domains for each protein, as returned from a database, for example, Uniprot.
- a report generation process 16 may be used for variant classification and reporting.
- the process 16 may detect variants and investigate detected variants following criteria from known evolutionary models, functional data, clinical data, literature, and other research endeavors, including tumor organoid experiments.
- variants may be prioritized and classified based on known gene-disease relationships, hotspot regions within genes, internal and external somatic databases, primary literature, and other features of somatic drivers.
- Variants may be added to a patient (or sample, for example, organoid sample) report based on recommendations from the AMP/ASCO/CAP guidelines. Additional guidelines may be followed. Briefly, pathogenic variants with therapeutic, diagnostic, or prognostic significance may be prioritized in the report.
- Non-actionable pathogenic variants may be included as biologically relevant, followed by variants of uncertain significance. Translocations may be reported based on features of known gene fusions, relevant breakpoints, and biological relevance. Evidence may be curated from public and private databases or research and presented as 1) consensus guidelines, 2) clinical research, or 3) case studies, with a link to the supporting literature. Germline alterations may be reported as secondary findings in a subset of genes for consenting patients. These may include genes recommended by the ACMG and additional genes associated with cancer predisposition or drug resistance.
- For detecting microsatellite instability (MSI) status, the probes used during library preparation before sequencing may target microsatellite regions (for example, approximately 40, 50, 60, 100, or 1,000 regions).
- a MSI classification algorithm classifies tumors into three categories: microsatellite instability-high (MSI-H), microsatellite stable (MSS), or microsatellite equivocal (MSE).
- MSI testing for paired tumor-normal patients may use reads mapped to the microsatellite loci with at least five, ten, fifteen, etc. bp flanking the microsatellite region. A minimum read threshold may be used. For example, at least 10, 20, 30, etc. mapping reads in both the tumor and normal samples may be required for the locus to be included in the analysis.
- a minimum coverage threshold may be used. For example, at least 10, 15, 20, etc. of the total microsatellites on the panel may be required to reach the minimum coverage.
- Each locus may be individually tested for instability, as measured by changes in the number of nucleotide base repeats in tumor data compared to normal data, for example, using the Kolmogorov-Smirnov test. If p < 0.05, the locus may be considered unstable.
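The per-locus test can be sketched as a two-sample Kolmogorov-Smirnov comparison of repeat-count distributions. In practice `scipy.stats.ks_2samp` would be the usual choice. To stay dependency-free, this sketch computes the D statistic directly and uses the classic asymptotic series approximation for the p-value. That approximation is conservative when, as here, the data are discrete with many ties. The function names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def kolmogorov_q(lam: float) -> float:
    """Asymptotic Kolmogorov survival function Q_KS(lambda)."""
    if lam < 0.1:
        return 1.0  # the series converges slowly here and Q_KS is ~1
    total = sum(2.0 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, 101))
    return min(max(total, 0.0), 1.0)


def ks_two_sample(tumor: Sequence[int], normal: Sequence[int]) -> Tuple[float, float]:
    """Return (D, approximate p-value) comparing repeat-length distributions."""
    a, b = sorted(tumor), sorted(normal)
    n, m = len(a), len(b)
    # D is the largest gap between the two empirical CDFs (attained at data points).
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    return d, kolmogorov_q(lam)
```

A locus whose tumor repeat counts are shifted well away from the normal counts yields D = 1 and a p-value far below 0.05, so it would be flagged as unstable.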
- the proportion of unstable microsatellite loci may be fed into a logistic regression classifier trained on samples from various cancer types, especially cancer types which have clinically determined MSI statuses, for example, colorectal and endometrial cohorts.
- the mean and variance for the number of repeats may be calculated for each microsatellite locus.
- a vector containing the mean and variance data may be put into a support vector machine classification algorithm. Both algorithms may return the probability of the patient being MSI-H as an output which may be compared to a threshold value.
- If there is a >70% probability of MSI-H status, the sample may be classified as MSI-H. If there is a 30-70% probability of MSI-H status, the test results may be too ambiguous to interpret, and the sample may be classified as MSE. If there is a <30% probability of MSI-H status, the sample may be considered MSS.
- a patient report may be generated by the report generation process 16.
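The three-way mapping from the classifier's MSI-H probability to a reported status can be written as a small function. The source does not say how the exact 30% and 70% boundaries are assigned. This sketch treats both boundary values as MSE, which is an assumption.

```python
def classify_msi(prob_msi_h: float) -> str:
    """Map a classifier's MSI-H probability to MSI-H, MSE, or MSS."""
    if not 0.0 <= prob_msi_h <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if prob_msi_h > 0.70:
        return "MSI-H"
    if prob_msi_h < 0.30:
        return "MSS"
    return "MSE"  # 30-70%: too ambiguous to interpret
```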
- the report may be presented to a patient, physician, medical personnel, or researcher in a digital copy (for example, a JSON object, pdf file, or an image on a website or portal), a hard copy (for example, printed on paper or another tangible medium), as audio (for example, recorded or streaming audio), or in another format.
- the report may include information related to the lost or present HLA alleles, including clinical trials for which the patient is eligible, therapies that may match the patient (for example, the systems and methods may be used as a companion diagnostic for these therapies) and/or adverse effects predicted if the patient receives a given therapy, based on the present or lost HLA alleles in the patient’s tumor (obtained using a process 24).
- the report may include information related to whether the patient's tumor is potentially resistant to HLA-restricted immunotherapies (for example, cellular TCR therapies, vaccines, and immunotherapies designed to be most efficacious in the presence of a particular HLA allele or alleles, etc.).
- the report may include information related to whether the patient’s tumor is potentially a good candidate for HLA-restricted immunotherapies (for example, cellular TCR therapies, vaccines, and immunotherapies designed to be most efficacious in the absence of a particular HLA allele or alleles, etc.).
- the report may state that the patient may not respond to immunotherapies that target HLA alleles that have been lost in the patient sample and may or may not be eligible for clinical trials listing the loss or presence of those HLA alleles as inclusion or exclusion criteria (obtained using a process 26).
- treatments (for example, immunotherapies) based on any HLA alleles present in the patient sample may be matched to the patient (for example, the systems and methods may be used as a companion diagnostic for these treatments), and the patient may be eligible for clinical trials listing present HLA alleles as inclusion criteria and may not be eligible for clinical trials listing present HLA alleles as exclusion criteria (as obtained using process 26).
- the report may further include the copy number status of chr 6 and/or arms or portions of chr 6 in the tumor sample and/or normal sample.
- the report may infer HLA-LOH for that sample.
- information related to a loss of a portion of chr 6 does not specify which copy of an HLA allele was contained on the lost copy of a portion of chr 6 but provides supporting evidence that one of the HLA alleles was lost.
- the allele-specific systems and methods described herein conclude that coverage of Allele B is lower than coverage of Allele A, but the coverage of Allele B is close to the threshold for calling LOH, resulting in an equivocal LOH call, which may be caused by standard variability in coverage or may reflect a partial loss or actual loss of the HLA allele.
- the chr6 LOH status serves as an orthogonal way to confirm the loss or presence of the HLA allele.
- the HLA allele that was called as equivocal loss status by the systems and methods described herein may be called as LOH.
- the HLA allele with an equivocal LOH call may be determined to be present.
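The way chr6 LOH status is used to settle an equivocal allele-level call can be sketched as follows. The decision rule is an assumption inferred from the description: chr6 evidence does not say which allele was lost, so it is only used to break a tie for an allele that the allele-specific analysis already flagged as equivocal, and clear calls are left unchanged. The function name and call labels are hypothetical.

```python
from typing import Optional


def resolve_hla_call(allele_call: str, chr6_loh: Optional[bool]) -> str:
    """Combine an allele-specific HLA-LOH call with chr6 LOH evidence.

    allele_call: "LOH", "present", or "equivocal" from the allele-specific model.
    chr6_loh: True if the chr6 region spanning the HLA genes shows LOH,
              False if it does not, None if no chr6 data are available.
    """
    if allele_call != "equivocal":
        return allele_call            # clear calls are left unchanged
    if chr6_loh is None:
        return "equivocal"            # no orthogonal evidence to break the tie
    return "LOH" if chr6_loh else "present"


# Allele B had lower coverage than Allele A but sat near the LOH threshold:
print(resolve_hla_call("equivocal", chr6_loh=True))
```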
- the HLA-LOH results may be used to analyze a database of clinical data, especially to determine whether there is a trend showing that a therapy slowed cancer progression in other patients having the same or similar lost/present status as the results for a given HLA allele.
- the LOH results may also be used to design tumor organoid experiments.
- an organoid may be genetically engineered to have the same HLA alleles present as a patient and may be observed after exposure to a therapy to determine whether the therapy can reduce the growth rate of the organoid, and thus may be likely to reduce the progression of cancer in the patient associated with the specimen.
- FIG. 2 illustrates an overall schematic of an example process 100 for HLA detection and analysis that may be performed by an HLA and HLA-LOH analysis system, such as that shown in FIG. 10.
- the HLA and HLA-LOH analysis system accesses stored genomic sequence data collected from normal tissue and from cancer tissue. More specifically, in the illustrated example, the process 100 accesses BAM files 102 containing sequence data from non-cancer specimens, stored in a normal BAM file 104, and/or from cancer specimens, stored in a tumor BAM file 106.
- the process 100 retrieves normal tissue (or blood) HLA mapping reads 108 from the normal BAM file 104 and tumor tissue HLA mapping reads 110 from the tumor BAM file 106.
- the normal tissue HLA mapping reads and the tumor tissue HLA mapping reads, from files 108 and 110, respectively, are communicated to or accessed by an alignment process 112.
- the alignment process 112 aligns tumor tissue data from the BAM file 106, i.e., the tumor HLA mapped reads 110, with normal tissue data from the BAM file 104, i.e., the normal HLA mapped reads 108.
- the alignment process 112 applies one or more read filters to the BAM file data prior to alignment. These filters may be applied to both the normal tissue and the tumor tissue HLA mapped reads, or to only one of them.
- the filters may be stored in a hierarchical manner by the HLA and HLA-LOH analysis system, where the system applies the filters in order of ranking, with higher ranked filters applied before lower ranked filters, and, in some examples, with an assessment of filter performance, whereby if a higher ranked filter achieves a desired filtering result, lower ranked filters are not executed by the system.
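A minimal sketch of ranked filtering with early stopping, assuming each filter is a predicate that keeps or drops a read, that a larger rank number means a higher-ranked filter, and that the "desired filtering result" is supplied as a caller-defined check. All names here are hypothetical; the source does not specify the filters or the success criterion.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Read = dict                       # hypothetical read record
Filter = Callable[[Read], bool]   # returns True to keep the read


def apply_ranked_filters(
    reads: Sequence[Read],
    ranked_filters: Iterable[Tuple[int, Filter]],
    is_sufficient: Callable[[List[Read]], bool],
) -> List[Read]:
    """Apply filters from highest to lowest rank, stopping once the
    filtered reads satisfy the caller's 'desired result' check."""
    kept = list(reads)
    for _rank, keep in sorted(ranked_filters, key=lambda rf: rf[0], reverse=True):
        kept = [read for read in kept if keep(read)]
        if is_sufficient(kept):
            break                 # lower-ranked filters are not executed
    return kept


reads = [{"dup": True, "mapq": 60}, {"dup": False, "mapq": 10}, {"dup": False, "mapq": 60}]
filters = [(1, lambda r: r["mapq"] >= 30), (2, lambda r: not r["dup"])]
print(apply_ranked_filters(reads, filters, lambda rs: all(not r["dup"] for r in rs)))
```

Here the duplicate filter (rank 2) already produces a duplicate-free set, so the lower-ranked mapping-quality filter is skipped.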
- the output from the alignment process 112 is provided to a coverage statistics process 114 that compares the aligned HLA mapped reads for normal and tumor tissue and calculates coverage metrics for each allele for the normal tissue and tumor tissue data.
- the process 114 generates a report in the form of HLA allele-based coverage data 116, where that report may be stored in the system, displayed to medical personnel, and/or sent to a network-connected device, database, etc. In this way, the processes 112, 114, and 116 form an example HLA typing process.
- the HLA allele-based coverage data 116 is provided to an HLA-LOH process 118, which in the illustrated example is configured to receive other data, such as copy number data, tumor purity data, tumor ploidy data, and/or genome-wide LOH predictions (collectively data 120), and apply integrated metrics for performing an HLA-LOH classification on the received HLA allele-based coverage data.
- the data 120 may be generated by an external pathology system communicatively connected to the bioinformatics pipeline 14, e.g., the computing device 402.
- generating the data 120 may comprise a manual or automated assessment of one or more histopathology slides associated with the HLA allele-based coverage data 116.
- the data 120 may be wholly or partially generated from a module within bioinformatics pipeline 14, e.g., the computing device 402.
- the bioinformatics pipeline module may generate data 120 based on DNA-seq data, RNA-seq data, methylation data, and/or another type of bioinformatics data, and the generating may comprise a deconvolution process.
- the process 100 includes analyzing the BAM files 102 and additionally retrieving unmapped/discarded reads (i.e., reads from a BAM file that are either assigned locations within HLA gene loci or flagged as unmapped).
- the HLA and HLA-LOH analysis system executes a preprocessing script that formats the unmapped reads (and the HLA mapped reads) from the BAM files 104 and 106 into two FASTQ files, which are fed into the next process. One FASTQ file contains the forward read from each paired-end read, while the other FASTQ file contains the reverse read from each paired-end read.
- the pairs are listed in corresponding order in the files, so the first read in the first FASTQ file will be the pair of the first read in the second FASTQ file.
- both forward and reverse reads could be included in the same FASTQ file as alternating sequences that share a similar read name.
- single read sequencing data could be included in a single FASTQ, or paired reads could be considered independent, disregarding their forward or reverse status and included in a single FASTQ.
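Writing paired reads into two order-matched FASTQ files can be sketched in plain Python. The record layout `(name, mate, sequence, qualities)` and the function name are hypothetical, and the sketch assumes sequences are already in their original sequencing orientation (reads aligned to the reverse strand are stored reverse-complemented in a BAM file; pysam's `AlignedSegment.get_forward_sequence()` returns them in the original orientation). Reads whose mate is missing are dropped so that the n-th entry of each file belongs to the same pair.

```python
from typing import Dict, Iterable, List, Tuple

# (read name, mate number 1 or 2, sequence, base-quality string)
Record = Tuple[str, int, str, str]


def to_paired_fastq(records: Iterable[Record]) -> Tuple[str, str]:
    """Return (forward FASTQ text, reverse FASTQ text) with mates in matching order."""
    mates: Dict[str, Dict[int, Tuple[str, str]]] = {}
    order: List[str] = []
    for name, mate, seq, qual in records:
        if name not in mates:
            mates[name] = {}
            order.append(name)          # remember first-seen order of pairs
        mates[name][mate] = (seq, qual)

    r1_lines: List[str] = []
    r2_lines: List[str] = []
    for name in order:
        pair = mates[name]
        if 1 in pair and 2 in pair:     # skip reads whose mate is missing
            for mate, lines in ((1, r1_lines), (2, r2_lines)):
                seq, qual = pair[mate]
                lines.append(f"@{name}/{mate}\n{seq}\n+\n{qual}\n")
    return "".join(r1_lines), "".join(r2_lines)


fwd, rev = to_paired_fastq([("r1", 2, "TT", "II"), ("r1", 1, "AA", "II"), ("r2", 1, "CC", "II")])
print(fwd, rev, sep="")
```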
- sequencing data from a panel of exemplary normal specimens may be used.
- sequencing data from the panel of normal specimens having HLA genetic sequences most similar to the patient’s cancer sample may be selected to create an HLA-matched panel of normal specimens.
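The source does not define how HLA similarity between the patient and a panel normal is measured. One simple possibility, shown below as an assumption, is to count shared alleles (homozygous alleles counted twice) and keep the k best-matching normals; a sequence-level distance could be substituted. Function names and specimen IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def shared_alleles(a: List[str], b: List[str]) -> int:
    """Number of alleles two genotypes share, respecting allele copy counts."""
    return sum((Counter(a) & Counter(b)).values())


def hla_matched_panel(patient: List[str], panel: Dict[str, List[str]], k: int) -> List[str]:
    """IDs of the k panel normals sharing the most alleles with the patient."""
    ranked = sorted(panel, key=lambda sid: shared_alleles(patient, panel[sid]), reverse=True)
    return ranked[:k]


patient = ["A*02:01", "A*01:01", "B*07:02", "B*07:02", "C*07:01", "C*07:02"]
panel = {
    "N1": ["A*02:01", "A*03:01", "B*08:01", "B*44:02", "C*07:01", "C*05:01"],
    "N2": ["A*02:01", "A*01:01", "B*07:02", "B*35:01", "C*07:01", "C*07:02"],
    "N3": ["A*11:01", "A*24:02", "B*15:01", "B*40:01", "C*03:04", "C*04:01"],
}
print(hla_matched_panel(patient, panel, k=2))
```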
- FIG. 3 illustrates example process 200 for the data flow for the HLA typing and the HLA-LOH model that may be implemented through the process 100.
- the two FASTQ files may be used for both HLA typing to generate HLA type, and for the LOH model, which also receives the HLA type/patient reference as input.
- BAM files 202 (such as files 102) are accessed on the HLA and HLA-LOH analysis system.
- BAM files 202 may be stored on the system, generated from tissue and/or blood biological samples from a subject and from populations of subjects, or generated remotely and accessed by the system, for example, through a bioinformatics pipeline that includes network accessible NGS systems or databases.
- FASTQ files 204 are generated from the BAM files 202.
- the FASTQ files 204 may include a FASTQ file that contains all of the forward reads from each paired-end read, and another FASTQ file that contains the reverse read from each paired-end read.
- the FASTQ files 204 may consist of a single FASTQ file that contains single end reads, or paired end reads that are being considered as independent reads.
- the FASTQ files 204 are provided to two different processes, an HLA typing process 206 and an HLA-LOH process 208.
- the HLA typing process 206 generates candidate alleles, in the form of HLA type data 210, from the subject’s sequence data in the BAM files 202.
- the HLA-LOH process 208 generates HLA-LOH data 212 for the subject’s sequence data.
- Each of the HLA type data 210 and the HLA-LOH data 212 may be stored by the HLA and HLA-LOH analysis system and reported to clinicians or other personnel.
- an alignment is performed on the sequencing data in the BAM files 202, wherein the sequencing data is aligned against a reference genome. Further, the genomic positions in the reference genome to which the mapped reads align are determined. Unmapped reads in the next generation sequencing data are also identified, and the mapped reads data and unmapped reads data are stored in one or more FASTQ files 204 having sequence reads.
- sequence read FASTQ files 204 are fed to the processes 206 and 208.
- the process 206 identifies candidate HLA alleles and stores the candidate HLA alleles as the HLA type data 210 in an HLA reference file.
- the HLA type data 210 from the process 206 is additionally fed to the HLA-LOH process 208, which determines the HLA-LOH status for each identified HLA allele.
- the data 210 and 212 are then stored and a report of the HLA-LOH statuses for each of the HLA alleles may be generated.
- an HLA typing algorithm which may include the Optitype HLA Typing algorithm (Szolek et al., OptiType: precision HLA typing from next-generation sequencing data, Bioinformatics 2014, which is hereby incorporated by reference and in its entirety for all purposes) or the Kourami HLA typing algorithm (Lee et al., Kourami: graph-guided assembly for novel human leukocyte antigen allele discovery, Genome Biology 2018, which is hereby incorporated by reference and in its entirety for all purposes), may be applied to the two FASTQ files 204 input to the HLA typing process.
- the HLA typing algorithm finds mapped reads (pairs of reads) and analyzes them to predict which HLA alleles the patient has. For example, the HLA typing algorithm generates a list of predicted HLA alleles for the sample, based on reads that map to either the original reference HLA or any known HLA genetic sequence, including those in the international ImMunoGeneTics (IMGT) database. In one example, the sequences of some of the most common Class I HLA alleles are well-characterized and available to download through the IMGT (imgt.org). In one example, there are at least 40,000 known HLA genetic sequences.
- the Optitype HLA Typing algorithm is used.
- the Optitype HLA Typing algorithm works on the premise that the correct genotype explains the source of more reads than any other genotype, where an allele is said to explain a read if the read is aligned to it with no more mismatches than to any other allele.
- the HLA Typing algorithm finds the allele combination that maximizes the number of reads explained.
- the HLA Typing algorithm includes three main steps. First, reads are mapped against a carefully constructed HLA allele reference. Because only exon 2 and exon 3 subsequences are available for all alleles, only these regions are considered during read mapping, so that no allele is disadvantaged because of incomplete sequence information.
- the HLA Typing algorithm may also include flanking intronic regions and a process to impute missing sequence data based on phylogenetic information.
- a binary matrix is generated indicating which alleles a specific read could be aligned to with the least number of mismatches.
- a special case of the set cover problem is formulated as an integer linear program (ILP) that selects up to two alleles for each locus simultaneously, maximizing the number of mapped reads that can be explained by the predicted genotype.
- the minor loci HLA-G, -H, and -J are considered during optimization, as long subsequences of these minor loci show high similarity with the major loci, occasionally causing ambiguous read alignments.
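The objective described above (choose the allele pair that explains the most reads, where a read is explained by the alleles it aligns to with the fewest mismatches) can be illustrated for a single locus by exhaustive search. This is a swapped-in technique: OptiType solves an ILP over the loci jointly, which may include additional terms, whereas this sketch simply enumerates every pair (homozygous pairs included) and is only practical for short candidate lists. Input structures and names are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Optional, Set, Tuple


def explains_matrix(mismatches: List[Dict[str, Optional[int]]]) -> List[Set[str]]:
    """For each read, the set of alleles it aligns to with the fewest mismatches.

    mismatches[i][allele] is read i's mismatch count against that allele,
    or None if the read does not align to it.
    """
    explained = []
    for row in mismatches:
        hits = {allele: m for allele, m in row.items() if m is not None}
        best = min(hits.values()) if hits else None
        explained.append({allele for allele, m in hits.items() if m == best})
    return explained


def best_genotype(alleles: List[str], explained: List[Set[str]]) -> Tuple[Tuple[str, str], int]:
    """Return the allele pair (homozygous allowed) explaining the most reads."""
    best_pair, best_count = None, -1
    for pair in combinations_with_replacement(sorted(alleles), 2):
        count = sum(1 for read_alleles in explained if read_alleles & set(pair))
        if count > best_count:          # ties keep the first pair found
            best_pair, best_count = pair, count
    return best_pair, best_count


reads = [
    {"A*01:01": 0, "A*02:01": 1, "A*03:01": 2},
    {"A*01:01": 2, "A*02:01": 0, "A*03:01": None},
    {"A*01:01": 1, "A*02:01": 1, "A*03:01": 0},
]
print(best_genotype(["A*01:01", "A*02:01", "A*03:01"], explains_matrix(reads)))
```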
- the Kourami HLA typing algorithm is a graph-guided assembly technique for classical HLA genes, which can construct allele sequences given high-coverage whole-genome sequencing data.
- the Kourami HLA typing algorithm takes advantage of partial-order graphs (POGs) to capture all known alleles.
- the Kourami HLA typing algorithm further modifies the graph to include variants found in the sequencing data so that the graph includes the paths of true alleles.
- a comprehensive reference panel is created from a combined multiple sequence alignment (MSA) of both full-length and exon-only known alleles for each HLA locus. Reads mapped to all known HLA loci in the human reference genome are extracted and aligned to the comprehensive reference panel.
- Gene-wise POGs are constructed using the combined MSAs.
- the alignments of the extracted reads are projected onto the graphs so that each read alignment is stored as a path in the graphs and the read depths on the edges naturally become edge weights.
- when these read- or read-pair-backed paths connect two or more neighboring heterozygous sites of two alleles, they provide phasing information.
- the graphs are modified by adding nodes and edges to incorporate differences found by the alignment, such as substitutions and indels. Note that a sequence of an allele may be encoded as a path through the entire graph.
- the Kourami HLA typing algorithm formulates the problem of constructing the best pair of HLA allele sequences as finding the pair of paths through the graph.
- in finding the pair, the Kourami HLA typing algorithm considers consistent phasing information from the reads and coverage using base quality scores. Additionally, the pair of paths may be identical, to permit homozygous alleles.
- Table 1 includes 150 examples of Class I HLA alleles.
- the HLA alleles identified are HLA-A Allele 1: A*02:01, HLA-A Allele 2: A*01:01, HLA-B Allele 1: B*07:02, HLA-B Allele 2: B*07:02, HLA-C Allele 1: C*07:01, HLA-C Allele 2: C*07:02.
- the HLA typing algorithm generates an accession number, which allows the user to retrieve an allele sequence.
- the output from the HLA typing algorithm is provided to downstream HLA-LOH models, e.g., the process 208.
- the process 100 uses the list of predicted HLA alleles, such as data 210, to create a preliminary HLA reference file composed of reference sequences of the patient’s predicted HLA alleles and all HLA pseudogenes.
- the HLA reference file is automatically generated.
- the HLA reference file may be automatically generated by pulling sequences from the Optitype (github) source code, especially the Optitype database/reference library (including the IMGT dataset) or Kourami reference library based on allele and accession number, for example using a data converter to maintain allele nomenclature consistency.
- predicted Class I HLA type data 122 is obtained and an HLA reference file is generated at a process 124, by adjusting to match the predicted HLA alleles of the non-cancer specimen.
- the process 124 generates a patient-specific HLA reference file by writing the sequence associated with each of the patient’s predicted Class I HLA types to a FASTA file.
- a FASTA file is essentially a text file in which lines alternate between a sequence name (by convention, a line starting with the > symbol followed by the sequence name, for example, HLA00001) and the nucleotide sequence corresponding to that sequence name.
- the process 124 writes the name and sequence for each predicted Class I HLA type as well as the pseudogenes.
- the output from the process 124 is an HLA reference file as a FASTA file that, in various embodiments, is then converted or indexed to a novoalign index file for alignment to generate a .nix file.
- the .nix file is a specialized format that allows novoalign software to more quickly and efficiently align reads. If the patient is homozygous for a given allele, it is included only once in the reference.
- This HLA reference file may then be a patient-specific HLA reference file.
- the HLA reference file is a sequence file that includes the patient’s predicted HLA class I genes and all nonclassical HLA genes and HLA pseudogenes to ensure that a read maps to the correct gene, even though there is high homology from gene to gene.
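A minimal sketch of building the patient-specific reference as FASTA text, assuming allele sequences are available in a dictionary keyed by allele name (the text also describes using accession numbers such as HLA00001 as sequence names). Homozygous alleles are written once, and non-classical genes and pseudogenes are appended. The function name and the short sequences in the usage example are placeholders, not real HLA sequences.

```python
from typing import Dict, Iterable


def write_patient_reference(
    predicted_alleles: Iterable[str],
    allele_seqs: Dict[str, str],
    extra_seqs: Dict[str, str],
) -> str:
    """Build FASTA text with each predicted allele once, plus extra sequences.

    predicted_alleles may repeat an allele (homozygous locus); repeats are
    written only once. extra_seqs holds non-classical genes and pseudogenes.
    """
    records = [(allele, allele_seqs[allele])
               for allele in dict.fromkeys(predicted_alleles)]  # de-duplicate, keep order
    records.extend(extra_seqs.items())
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


fasta = write_patient_reference(
    ["A*02:01", "A*01:01", "B*07:02", "B*07:02"],              # B*07:02 is homozygous
    {"A*02:01": "ACGT", "A*01:01": "ACGA", "B*07:02": "TTGA"},  # placeholder sequences
    {"HLA-H": "GGCC"},                                          # placeholder pseudogene
)
print(fasta)
```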
- the HLA reference file is expanded to include class II HLA genes.
- a process 126 aligns HLA mapping reads, along with unmapped/discarded reads (from the two paired-end FASTQ files mentioned above), to the predicted patient reference file (which is the FASTA file that has been indexed to be a .nix file), for example, using Novoalign to generate a BAM file.
- the process 126 may filter the BAM file (in one example, by using pySAM) using various filtering criteria, such as, for example, checking that: (1) the read is properly paired, (2) the read is not qc_fail (failed by quality control checks), (3) the read is not a duplicate, (4) the edit distance to the reference sequence of the predicted allele is less than or equal to 2, (5) the read has less than or equal to 2 insertions compared to the reference sequence of the predicted allele, (6) the read has less than or equal to 2 deletions compared to the reference sequence of the predicted allele, and/or (7) both ends of the paired read map to the same predicted allele.
- a filtered BAM file is generated as a result.
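The filtering criteria listed above can be sketched as a predicate over a pysam-style aligned read. The sketch uses pysam `AlignedSegment` attribute names (`is_proper_pair`, `is_qcfail`, `is_duplicate`, `cigartuples`, `reference_id`, `next_reference_id`, `has_tag`/`get_tag`), reads the edit distance from the `NM` tag, and counts insertions and deletions as CIGAR events rather than bases, which is an assumption. The strict settings described later in the text are expressed as zero tolerances, with "not mapped more than once" approximated as "not secondary or supplementary" (also an assumption). The function name is hypothetical.

```python
BAM_CINS, BAM_CDEL = 1, 2   # CIGAR operation codes for insertion/deletion (SAM spec, pysam)


def passes_hla_filters(read, max_edit_distance=2, max_insertions=2,
                       max_deletions=2, require_unique=False):
    """Return True if a pysam-style aligned read passes the HLA read filters."""
    if not read.is_proper_pair or read.is_qcfail or read.is_duplicate:
        return False
    # NM tag = edit distance between the read and the predicted allele sequence.
    if not read.has_tag("NM") or read.get_tag("NM") > max_edit_distance:
        return False
    # Count insertion and deletion events in the CIGAR (assumption: events, not bases).
    ops = [op for op, _length in (read.cigartuples or [])]
    if ops.count(BAM_CINS) > max_insertions or ops.count(BAM_CDEL) > max_deletions:
        return False
    # Both ends of the pair must map to the same predicted allele.
    if read.reference_id != read.next_reference_id:
        return False
    # Proxy for "not mapped more than once" (assumption).
    if require_unique and (read.is_secondary or read.is_supplementary):
        return False
    return True


# Strict settings for the final alignment: no edits, no indels, unique mapping.
STRICT = dict(max_edit_distance=0, max_insertions=0, max_deletions=0, require_unique=True)

# Hypothetical usage with pysam (file names are placeholders):
# import pysam
# with pysam.AlignmentFile("aligned.bam", "rb") as bam, \
#      pysam.AlignmentFile("filtered.bam", "wb", template=bam) as out:
#     for read in bam.fetch(until_eof=True):
#         if passes_hla_filters(read):
#             out.write(read)
```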
- the process 126 may apply a variant calling process performed on the filtered alignment file (for example, the filtered BAM file), using freebayes (available from github), to identify any nucleotide positions where the patient’s HLA sequences diverge from the HLA reference.
- the variant calling included the following criteria: the sequence data must include at least 3 reads supporting the variant (indicating that the patient has an alternate allele, meaning a sequence that is not identical to the reference sequence of the predicted allele), and fewer than 5 reads supporting the reference sequence of the predicted allele.
- a process 128 updates the patient specific reference by replacing portions of the reference sequences with the variant sequences that are supported by at least 3 reads at the genomic positions of those variants to generate an updated patient HLA reference file.
- the updated patient HLA reference sequence file has been adjusted to match the exact nucleotide sequence of the non-cancer specimen HLA genes.
- the sequence is contained in a FASTA file that is then converted to a novoalign index file. If the patient is homozygous for a given allele, the sequence is included only once in the reference.
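Updating the reference from called variants can be sketched as below. The support rule (at least 3 variant-supporting reads and fewer than 5 reference-supporting reads) is taken from the text; the `Variant` record, 0-based coordinates, and function names are hypothetical, and in practice the read counts would come from the variant caller's output (freebayes reports them as AO and RO). Variants are applied from the 3' end backwards so that an insertion or deletion does not shift the coordinates of variants not yet applied.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    pos: int        # 0-based position on the allele reference
    ref: str        # reference bases at pos
    alt: str        # bases observed in the patient's normal sample
    alt_reads: int  # reads supporting alt
    ref_reads: int  # reads supporting ref


def passes_support(v: Variant, min_alt: int = 3, max_ref: int = 5) -> bool:
    """At least `min_alt` alt-supporting reads and fewer than `max_ref` ref-supporting reads."""
    return v.alt_reads >= min_alt and v.ref_reads < max_ref


def update_reference(seq: str, variants: List[Variant]) -> str:
    """Substitute supported variants into an allele reference sequence."""
    for v in sorted(variants, key=lambda var: var.pos, reverse=True):
        if not passes_support(v):
            continue
        if seq[v.pos:v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"reference mismatch at position {v.pos}")
        seq = seq[:v.pos] + v.alt + seq[v.pos + len(v.ref):]
    return seq


variants = [
    Variant(1, "C", "T", alt_reads=10, ref_reads=0),   # supported SNV
    Variant(5, "C", "G", alt_reads=2, ref_reads=0),    # too few alt reads: ignored
    Variant(6, "G", "GA", alt_reads=4, ref_reads=1),   # supported insertion
]
print(update_reference("ACGTACGT", variants))
```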
- the updated HLA reference file may then be sent to the process 112.
- a Novoalign alignment of HLA mapping reads is repeated along with aligning unmapped/discarded reads to the updated reference file (if updates were made).
- Strict filtering may be used, including read is properly paired; read is not qc_fail; read is not a duplicate; edit distance to reference is 0; read has zero insertions to reference; read has zero deletions to reference; read is not mapped more than once.
- the process 112 aligns HLA mapping reads, along with unmapped/discarded reads, to the patient HLA reference sequence (the updated HLA reference sequence data from process 128) using Novoalign and filters reads with pySAM, using strict filtering criteria, to generate a cancer specimen BAM file.
- the process 114 receives the aligned HLA mapping reads and data from the process 112 and calculates coverage (for example, the number of reads that map to a single nucleotide position) for normal HLA reads.
- coverage may be inferred for nucleotide positions located between two appropriately-oriented paired reads, for example, if the two non-overlapping reads that comprise a paired-end read do not explicitly include a nucleotide position, but flank the nucleotide position, the presence of a molecule containing this intervening nucleotide position can be inferred, and thus the paired-end read may be included in the coverage metrics calculation for that nucleotide position.
- this paired-end read would count as a read that maps to the nucleotide position even though the nucleotide position is located between the two ends of the paired-end read.
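Inferring coverage across the unsequenced gap between mates amounts to counting each read pair once over its whole fragment span. A minimal sketch, assuming 0-based half-open mate coordinates on a single allele and that orientation checks were done upstream; a difference array keeps the work linear in the number of pairs plus the allele length. Names are hypothetical.

```python
from typing import Iterable, List, Tuple

# (mate1_start, mate1_end, mate2_start, mate2_end), 0-based half-open coordinates
Pair = Tuple[int, int, int, int]


def fragment_coverage(pairs: Iterable[Pair], allele_length: int) -> List[int]:
    """Per-position coverage that counts each read pair once across its fragment,
    including positions between two non-overlapping mates."""
    diff = [0] * (allele_length + 1)
    for s1, e1, s2, e2 in pairs:
        start = max(0, min(s1, s2))
        end = min(allele_length, max(e1, e2))
        if start < end:
            diff[start] += 1          # fragment opens here
            diff[end] -= 1            # and closes just before `end`
    coverage, running = [], 0
    for delta in diff[:-1]:
        running += delta
        coverage.append(running)
    return coverage


# Mates at 0-3 and 6-9 also cover the unsequenced positions 3-5.
print(fragment_coverage([(0, 3, 6, 9), (2, 5, 4, 7)], allele_length=10))
```

Overlapping mates (the second pair) are counted once per position rather than twice, so each sequenced molecule contributes at most one read to a position's coverage.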
- the process 114 uses bedtools to assess coverage across each of the predicted HLA alleles in the non-cancer specimen BAM file. The result is a Table of Positional Coverage across each HLA allele in the non-cancer specimen.
- the process 114 generates a csv file (116) with the number of reads that uniquely map to a specific HLA allele at each nucleotide position along that allele in the non-cancer specimen.
- each column in the csv file represents a nucleotide position in an HLA gene and each row represents an allele.
- Each entry is a number representing the number of reads at that nucleotide position for that allele.
- the process 114 further calculates coverage for tumor HLA reads, e.g., using bedtools to assess coverage across each of the predicted HLA alleles in the cancer specimen BAM file.
- the result is a Table of Positional Coverage across each HLA allele in the cancer specimen, generating a csv file (116) with the number of reads that uniquely map to a specific HLA allele at each nucleotide position along that allele in the cancer specimen.
- the positional coverage for both the non-cancer and cancer specimen are contained in one csv file.
- row 1 may represent allele A in the normal sample
- row 2 may represent allele B in the normal sample
- row 3 may represent allele A in the tumor sample
- row 4 may represent allele B in the tumor sample.
- the cancer specimen is circulating tumor DNA (ctDNA) obtained from a blood sample and the coverage obtained from NGS analysis of ctDNA may differ from coverage obtained from NGS analysis of a specimen that contains solid tumor tissue or cancerous blood cells. The calculation of coverage metrics may be adjusted accordingly.
- the process 114 combines data from the Table of Positional Coverage across each HLA allele in the non-cancer specimen and the Table of Positional Coverage across each HLA allele in the cancer specimen to generate higher-level features that describe relative changes in coverage between the non-cancer specimen and the cancer specimen, and a Combined Coverage Metrics Table (for example, using functions from Python packages such as pandas, NumPy, and SciPy).
- This process 114 may generate a Combined Coverage Metrics Table, in the form of an expanded csv file that contains positional statistics on not only coverage depth but features including allelic frequencies of each allele, log ratios of each allele between tumor and normal, and areas of low sequencing coverage (See FIG. 9 for more details).
- the process 114 may also generate a Summary Statistics Table, in the form of a csv file where each row is an HLA gene and the columns contain summary statistics describing the differences in allele-level coverage that will be used to make HLA-LOH determinations.
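The kinds of features named above (tumor-versus-normal log ratios per allele, tumor allelic fraction, and a mask of low-coverage positions) can be sketched with NumPy for one HLA gene. This is an illustration only: the library-size normalization by sample-wide read counts, the pseudocount, the minimum-depth cut-off, and all feature names are assumptions and are not claimed to match the features in the Summary Statistics Table.

```python
import numpy as np


def coverage_features(normal_a, normal_b, tumor_a, tumor_b,
                      normal_total_reads, tumor_total_reads,
                      min_depth=10, pseudocount=0.5):
    """Positional and summary features comparing the two alleles of one HLA gene.

    Inputs are per-position read counts (equal-length sequences). The
    *_total_reads arguments are sample-wide read counts used for
    library-size normalization (an assumption of this sketch).
    """
    na, nb, ta, tb = (np.asarray(x, dtype=float)
                      for x in (normal_a, normal_b, tumor_a, tumor_b))
    scale = np.log2(tumor_total_reads / normal_total_reads)
    logr_a = np.log2((ta + pseudocount) / (na + pseudocount)) - scale
    logr_b = np.log2((tb + pseudocount) / (nb + pseudocount)) - scale
    baf_tumor = ta / np.maximum(ta + tb, 1.0)     # tumor allelic fraction of allele A
    ok = (na >= min_depth) & (nb >= min_depth)     # positions with adequate normal coverage

    def masked_mean(values):
        return float(values[ok].mean()) if ok.any() else float("nan")

    return {
        "logR_a": logr_a,
        "logR_b": logr_b,
        "baf_tumor": baf_tumor,
        "low_coverage": ~ok,
        "mean_logR_a": masked_mean(logr_a),
        "mean_logR_b": masked_mean(logr_b),
        "mean_logR_diff": masked_mean(logr_a - logr_b),
    }


features = coverage_features([20, 20, 20], [20, 20, 20], [20, 20, 20], [10, 10, 10],
                             normal_total_reads=1000, tumor_total_reads=1000)
print(features["mean_logR_b"], features["mean_logR_diff"])
```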
- FIG. 4 illustrates an example output report displaying the results of HLA-LOH classification.
- in this example, there are two detected copy losses (HLA-LOH) for HLA class I genes.
- the lost alleles are an HLA-A allele, HLA-A*02:01, and an HLA-B allele, HLA-B*45:02.
- No HLA-C alleles or HLA class II genes are reported lost in this example. All HLA alleles without the copy loss designation have been detected as present in the specimen.
- the report may include information related to the lost or present HLA alleles, including clinical trials for which the patient is eligible, therapies that may match the patient and/or adverse effects predicted if the patient receives a given therapy, based on the present or lost HLA alleles in the patient’s tumor.
- the report may include information related to whether the patient's tumor is potentially resistant to HLA-restricted immunotherapies.
- the report may state that the patient may not respond to immunotherapies based on those lost HLA alleles, may not be eligible for clinical trials listing those lost HLA alleles as inclusion criteria, and may be eligible for clinical trials listing those lost HLA alleles as exclusion criteria.
- immunotherapies based on any present HLA alleles may be matched to the patient and the patient may be eligible for clinical trials listing present HLA alleles as inclusion criteria, and may not be eligible for clinical trials listing present HLA alleles as exclusion criteria.
- FIGS. 5A-5C are plots of combined coverage metrics for different examples of the techniques herein, some in comparison to non-technique examples and some without the filter steps (see FIGS. 9A-9D for more details).
- FIG. 5A shows data that were calculated using all disclosed steps and features
- FIG. 5B shows data calculated without aligning discarded/unmapped reads to HLA genes
- FIG. 5C shows data calculated without replacing the HLA reference sequences with the variants detected in the sequence data generated by the patient sample.
- the process 100 may determine and report LOH Status for each HLA allele in the cancer (tumor) sample, with reference to the non-cancer (normal) sample.
- the process 118 may report all HLA alleles present in the tumor sample (known as stable alleles, versus lost alleles that are missing, absent, or detected with low coverage from the tumor sample) or, the process 118 may compare to a normal sample from at least one distinct patient, where the sample(s) may have matched HLA types similar to the HLA types in the tumor sample to control for sequencing bias caused by hybrid capture, GC content, etc.
- the purer a tumor sample is, the stronger and more easily detectable the signal for a lost allele will be. As tumor purity decreases, the signal becomes increasingly hard to distinguish from background noise.
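A simple mixing model makes the purity effect concrete. Under the toy assumptions that normal cells carry one copy of each allele and tumor cells carry no copy of the lost allele and one (or more) copies of the retained allele, the expected log2 ratio of lost-allele to retained-allele coverage is log2(1 - purity) for a single retained copy: -1 at 50% purity but only about -0.32 at 20% purity. Ploidy effects and coverage noise are ignored, and the function name is hypothetical.

```python
import math


def expected_allele_logratio(purity: float, retained_copies: int = 1) -> float:
    """Expected log2(coverage of lost allele / coverage of retained allele).

    Toy model: normal cells carry one copy of each allele; tumor cells carry
    zero copies of the lost allele and `retained_copies` of the other allele.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    lost = 1.0 - purity                               # only normal cells contribute
    retained = (1.0 - purity) + purity * retained_copies
    return math.log2(lost / retained) if lost > 0 else float("-inf")


for p in (0.9, 0.5, 0.2, 0.05):
    print(f"purity {p:.2f}: {expected_allele_logratio(p):+.2f}")
```

Setting `retained_copies=2` models copy-neutral LOH, where the retained allele is duplicated and the expected ratio becomes (1 - purity) / (1 + purity).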
- the features from the Summary Statistics Table (116) are input into a machine learning classification model (of process 118) that returns a likelihood of LOH.
- alleles with a likelihood of LOH greater than 50% are reported as LOH.
- LOH Status Predictions for each allele in the predicted HLA alleles are determined by the process 118 using a Shallow Decision Tree machine learning model.
- FIG. 6 illustrates an example shallow decision tree 300 that may be executed by the process 118.
- the first line of each node (represented by a box in FIG. 6) is the name of a feature that corresponds to a statistic selected from the Summary Statistics Table (116) and a cut-off threshold against which the sample’s value for that feature is compared. If the value of the sample meets or does not meet the threshold criterion, the sample is sorted into the corresponding branch of the decision tree.
- for example, if the delta_expected_difference_logR of a sample is less than or equal to 0.123, the mean_difference_logR of the sample is then compared to a set threshold, etc.
- the other lines of text in a box may indicate the Gini index value for that node, the number of samples (which may mean the number of HLA genes that were analyzed for LOH) sorted by that node, and during model training, “value” may act as a confusion matrix by indicating the number of samples (HLA genes) that were sorted into that node and that had manual annotations of either loss (right number) or stable (left number) HLA status.
- the decision tree 300 is shallow, with few nodes, to avoid overfitting; decisions are based on features from the Summary Statistics Table (116), and the features or threshold values may change.
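Inference with a shallow tree is a short walk from the root to a leaf. In this sketch, only the root feature and its 0.123 threshold come from the example above; the second feature's threshold, the leaf likelihoods, and which side of the root leads to a loss call are placeholders. Left branches are taken when the value is less than or equal to the threshold (the scikit-learn convention), and the >50% LOH rule from the text is applied to the leaf value.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    feature: Optional[str] = None     # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None     # taken when value <= threshold
    right: Optional["Node"] = None    # taken when value > threshold
    p_loss: float = 0.0               # leaf only: likelihood of LOH


def predict_loss(node: Node, features: Dict[str, float]) -> float:
    """Walk from the root to a leaf and return that leaf's LOH likelihood."""
    while node.feature is not None:
        node = node.left if features[node.feature] <= node.threshold else node.right
    return node.p_loss


# Hypothetical tree: only the root feature and 0.123 threshold come from the text;
# the 0.5 threshold and all leaf values are placeholders.
TREE = Node("delta_expected_difference_logR", 0.123,
            left=Node("mean_difference_logR", 0.5,
                      left=Node(p_loss=0.05), right=Node(p_loss=0.60)),
            right=Node(p_loss=0.95))


def call_loh(features: Dict[str, float]) -> str:
    """Report LOH when the tree's likelihood exceeds 50%."""
    return "LOH" if predict_loss(TREE, features) > 0.5 else "stable"


print(call_loh({"delta_expected_difference_logR": 0.05, "mean_difference_logR": 0.1}))
```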
- a decision tree that is shallow may be easier to interpret, making it easier to explain the classification of a patient or specimen, for example, if a physician calls to ask about a “borderline” allele.
- the classification models of process 118 may be particularly configured to reduce processing time and increase the speed by which particular alleles can be classified, for faster ultimate diagnosis. These decision tree models are also typically more resilient to variations in upstream sample analysis.
- decision tree outputs are more discrete; for example, the three possible decision tree outputs could be clear loss of an HLA allele, clear stability of an HLA allele, or one intermediate state. Another example may include more than one intermediate state.
- LOH Status Predictions from the process 118 may be determined using other decisional techniques, such as Random Forest methods which may be slightly more accurate, and may yield a more continuous distribution of probabilities/likelihoods, for example, 75% likelihood of a loss of an HLA allele.
- the process 118 may apply a coverage threshold, such that any HLA allele with coverage below a threshold is reported by the process 118 as a loss of heterozygosity for that allele.
- the process 118 may be configured such that the threshold may be specific to the testing panel used for NGS sequencing.
- the coverage threshold below which an allele is reported as lost may be approximately 75 reads for an example targeted (approximately 600-gene) genomic sequencing panel or 35 reads for an example whole-exome sequencing panel, where the process reports each allele as either stable or lost.
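A minimal sketch of a panel-specific coverage cut-off, using the approximate values given above (75 reads for the targeted panel, 35 reads for whole exome). The panel keys are hypothetical labels, and the per-allele coverage value is assumed to be a single summary number such as mean reads per position; the text does not specify which summary is used.

```python
from typing import Dict

# Approximate panel-specific cut-offs from the text (reads); keys are hypothetical labels.
LOSS_THRESHOLDS: Dict[str, float] = {
    "targeted_600_gene": 75,
    "whole_exome": 35,
}


def call_by_coverage(allele_coverage: Dict[str, float], panel: str) -> Dict[str, str]:
    """Report each allele as 'lost' if its tumor coverage is below the panel's threshold."""
    threshold = LOSS_THRESHOLDS[panel]
    return {allele: ("lost" if cov < threshold else "stable")
            for allele, cov in allele_coverage.items()}


print(call_by_coverage({"A*02:01": 40, "A*03:01": 210}, "targeted_600_gene"))
```

The same 40-read allele would be called stable on the whole-exome panel, which is why the cut-off is tied to the sequencing panel.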
- the model may report an equivocal or uncertain status for an allele in a specimen that is not obviously stable (present in the specimen) or lost (absent from the specimen).
- coverage metrics for an allele may fall in the middle of the distribution of coverage metrics values observed from all specimens, placing the coverage metrics in a range where the allele has a roughly equivalent probability of being either lost or stable.
- the process 100 may match a patient with clinical trials and/or a therapy/therapies that are likely to eliminate the cancer cells, based on HLA alleles that are present in cancer sample as predicted by the HLA LOH model. This may help a physician make a therapy decision or identify a matched set of possible therapies or clinical trials in which the patient may participate.
- the clinical trials are matched to the patient’s HLA LOH results based on the trials having inclusion/exclusion criteria based on the presence of specific HLA alleles in tumor or cancer cells.
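Matching trials on HLA criteria can be sketched as set operations on the alleles retained in the tumor. The rule that a trial requires at least one of its listed inclusion alleles and none of its exclusion alleles is an assumption, as are the `Trial` structure and trial IDs; the example reuses the lost alleles from the example report (HLA-A*02:01 and HLA-B*45:02) with a hypothetical set of retained alleles.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Trial:
    trial_id: str
    include_any: Set[str] = field(default_factory=set)  # tumor must retain at least one
    exclude_any: Set[str] = field(default_factory=set)  # tumor must retain none


def eligible_trials(present_in_tumor: Set[str], trials: List[Trial]) -> List[str]:
    """IDs of trials whose HLA criteria are met by the alleles retained in the tumor."""
    matched = []
    for t in trials:
        if t.include_any and not (t.include_any & present_in_tumor):
            continue                      # required allele was lost
        if t.exclude_any & present_in_tumor:
            continue                      # an excluded allele is still present
        matched.append(t.trial_id)
    return matched


# A*02:01 and B*45:02 were lost; the retained alleles below are hypothetical.
present = {"A*03:01", "B*07:02", "C*07:01", "C*07:02"}
trials = [
    Trial("TRIAL-1", include_any={"A*02:01"}),
    Trial("TRIAL-2", exclude_any={"A*02:01"}),
    Trial("TRIAL-3", include_any={"A*03:01"}),
    Trial("TRIAL-4", include_any={"A*03:01"}, exclude_any={"B*07:02"}),
]
print(eligible_trials(present, trials))
```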
- a biological assay to test for the presence of any of the alleles is performed.
- an assay which may include fluorescence activated cell sorting (FACS)
- antibodies (for example, one detecting HLA allele A*02, one detecting A*03, and one detecting B*07) may be used to confirm the presence or the absence of various HLA alleles.
- Antibodies directed to other alleles are known in the art, and additional antibodies to detect other HLA alleles are in development.
- the techniques described herein were used to analyze a patient non-cancer sample, a patient cancer sample, and a tumor organoid (T.O.) derived from the patient cancer sample and predicted that the cancer sample and T.O. had lost an A*02 HLA allele but maintained a stable A*03 HLA allele (see FIGS. 8A-8C).
- FACS was used on the T.O. to detect the presence of these two HLA alleles, and the results are shown in FIGS. 7A and 7B.
- FIGS. 7A and 7B include the following FACS plots: the top row shows FACS results from an anti-A*03 antibody assay (FIG. 7A) and the bottom row shows FACS results from an anti-A*02 antibody assay (FIG. 7B). From left to right in each row, there is a plot for a negative A*02 control sample, a plot for the tumor organoid sample, and a plot for a positive A*02 control. The upper half of each plot indicates which cells bound the pan HLA Class-I antibody, indicating that those cells were expressing HLA Class-I molecules.
- the right half of each plot indicates which cells bound either the anti-A*03 antibody (top row) or the anti-A*02 antibody (bottom row), indicating that those cells expressed the allele targeted by the antibody used to generate that plot.
- Horizontal and vertical lines within the plots indicate the cut-offs that define the quadrants, and the numbers in the outer corners of the plots indicate the percentage of all data points in the plot that are located in each quadrant.
- Each of the plots shows a cell population that expressed HLA Class-I molecules, demonstrated by the data points being located in the upper two quadrants of each plot.
- the A*02 negative control and the tumor organoid plots in the bottom row show a cell population that is not expressing the A*02 allele, demonstrated by the data points being located in the left two quadrants of the plots. All remaining plots show a cell population that expressed either the A*02 allele (bottom row plots) or the A*03 allele (top row plots), demonstrated by the data points being located in the right two quadrants of each plot.
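The quadrant percentages printed in the corners of the FACS plots follow from two gating cut-offs. The sketch below assumes events are (allele-specific signal, pan Class-I signal) pairs and that events exactly on a cut-off count as positive; the function name and the event values are made up for illustration.

```python
from typing import Dict, Sequence, Tuple


def quadrant_percentages(events: Sequence[Tuple[float, float]],
                         x_cut: float, y_cut: float) -> Dict[str, float]:
    """Percentage of events falling in each quadrant of a two-marker FACS plot.

    x: allele-specific antibody signal (right of x_cut = expresses the allele)
    y: pan HLA Class-I antibody signal (above y_cut = expresses HLA Class-I)
    Events exactly on a cut-off are counted on the positive side (an assumption).
    """
    if not events:
        raise ValueError("no events to gate")
    counts = {"upper_left": 0, "upper_right": 0, "lower_left": 0, "lower_right": 0}
    for x, y in events:
        row = "upper" if y >= y_cut else "lower"
        col = "right" if x >= x_cut else "left"
        counts[f"{row}_{col}"] += 1
    return {quadrant: 100.0 * n / len(events) for quadrant, n in counts.items()}


events = [(5, 900), (8, 850), (700, 920), (3, 10)]
print(quadrant_percentages(events, x_cut=100, y_cut=100))
# {'upper_left': 50.0, 'upper_right': 25.0, 'lower_left': 25.0, 'lower_right': 0.0}
```

In this toy data most events are Class-I positive but allele negative (upper left), the pattern described for the A*02 negative control and the tumor organoid in the bottom row.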
- T.O. genetic material may be sequenced to generate T.O. sequence data
- the HLA LOH model may be used on the T.O. sequence data.
- FIGS. 8A-8C show examples of plots for different types of tissues.
- FIG. 8A shows coverage data calculated by the methods disclosed herein for the non-cancer sample tissue.
- FIG. 8B shows coverage data calculated by the methods disclosed herein for the cancer sample tissue.
- FIG. 8C shows coverage data calculated by the methods disclosed herein for a tumor organoid derived from the cancer sample tissue.
- FIG. 8A shows approximately equivalent coverage for two HLA alleles (A*02:01 shown in red data points and A*03:01 shown in blue data points) in the non-cancer tissue.
- FIG. 8B shows reduced coverage for the A*02:01 allele.
- the remaining sequence reads from the cancer tissue that map to the A*02:01 allele may be explained by the presence of non-cancer cells in the cancer sample, reflecting the heterogeneity of cancer samples that do not have 100% tumor purity.
- the complete loss of the A*02:01 allele in the T.O. may reflect the absence of non-cancer cells in the T.O., which indicates that the T.O. has 100% “tumor purity”.
- FIGS. 9A-9D illustrate example plots of coverage (number of reads) on the y-axis (plots in the top row) or the fraction of cancer specimen coverage divided by non-cancer specimen coverage (B allele fraction) on the y-axis (plots in the bottom row).
- HLA alleles are plotted as data points in shades of red or shades of blue, depending on which allele each data point is associated with.
- the two alleles are B*44:03 (red data points) and B*15:10 (blue data points).
- lighter shades of red or blue indicate that coverage at that nucleotide position was below a user-determined threshold, and data corresponding to reads mapping to those positions were excluded from downstream summary statistic calculations.
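Excluding low-depth positions before summarizing can be sketched as a mask. The function names and the choice of the mean as the summary statistic are assumptions; the text only states that positions below a user-determined threshold are excluded.

```python
import math
from typing import List, Optional, Sequence


def mask_low_depth(depths: Sequence[int], min_depth: int) -> List[Optional[int]]:
    """Replace positions whose depth is below the user-set threshold with None."""
    return [d if d >= min_depth else None for d in depths]


def mean_of_kept(values: Sequence[Optional[float]]) -> float:
    """Summary statistic computed only over positions that passed the mask."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else math.nan


masked = mask_low_depth([120, 8, 95, 3], min_depth=10)
print(masked)                # [120, None, 95, None]
print(mean_of_kept(masked))  # 107.5
```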
- the Optitype dataset is optimized to have consistent sequence lengths across alleles, inferring intronic sequence where it is missing, which reduces the need to normalize LOH signal across sequences of highly divergent lengths (e.g., if one allele is 1400 bp and the other is only 400 bp).
- the present technique first performs an alignment step using the patient’s normal NGS data, allowing for some degree of mismatch. By performing variant calling against the initial HLA reference, positions where the NGS data does not support the initially chosen reference can be identified. The reference can then be updated and the alignment repeated with the more appropriate reference sequence.
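The iterative reference update can be sketched as a loop that applies variant calls and repeats until the reads agree with the reference. Alignment and variant calling are replaced by a hypothetical `call_variants` callback, and only single-base substitutions are handled; a real pipeline would use an aligner and a variant caller.

```python
from typing import Callable, Dict

# Hypothetical interface: maps 0-based reference positions to the base that
# the patient's normal reads support instead of the reference base.
VariantCaller = Callable[[str], Dict[int, str]]


def update_reference(reference: str, variants: Dict[int, str]) -> str:
    """Apply single-base substitutions to a reference sequence."""
    seq = list(reference)
    for pos, alt in variants.items():
        if not 0 <= pos < len(seq) or len(alt) != 1:
            raise ValueError(f"unsupported variant {pos}:{alt}")
        seq[pos] = alt
    return "".join(seq)


def refine_reference(reference: str, call_variants: VariantCaller,
                     max_rounds: int = 3) -> str:
    """Re-align and re-call until the reads no longer disagree with the reference."""
    for _ in range(max_rounds):
        variants = call_variants(reference)  # stands in for align + variant calling
        if not variants:
            break
        reference = update_reference(reference, variants)
    return reference


# Toy caller: pretends the normal reads support the sequence "ACGTAC".
def toy_caller(ref: str) -> Dict[int, str]:
    return {i: t for i, (r, t) in enumerate(zip(ref, "ACGTAC")) if r != t}


print(refine_reference("ACCTTC", toy_caller))  # ACGTAC
```

Capping the number of rounds guards against a caller that never converges.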
- although HLA-A and HLA-C are the most divergent, most alleles of these two genes still share greater than 90% homology with one another across their most polymorphic regions (exons 2 and 3). Because of this homology, including all of the patient's alleles in the mapping reference ensures that reads do not erroneously cross-map between HLA genes or multimap to two HLA genes and skew coverage metrics.
- the loss of heterozygosity determination may hinge on whether a particular HLA allele shows a loss of coverage in a tumor sample relative to its matched normal control. This calculation may include normalizing the read counts between the normal and tumor NGS data, because they may have been sequenced at different depths.
- the metric used for normalization may include the number of unique reads mapping to the HLA reference, total reads, total mapped reads, or total mapped reads minus duplicates.
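Depth normalization before comparing tumor and normal coverage can be sketched as a log2 ratio of library-scaled counts. The function name is hypothetical; `tumor_library` and `normal_library` stand for whichever normalization metric listed above is chosen. The sketch rejects zero counts rather than adding a pseudocount, which is a design choice, not something the text specifies.

```python
import math


def normalized_logr(tumor_reads: float, normal_reads: float,
                    tumor_library: float, normal_library: float) -> float:
    """log2(tumor / normal) allele coverage after scaling each by its library size.

    tumor_library / normal_library: the chosen normalization metric, e.g. unique
    reads mapping to the HLA reference or total mapped reads minus duplicates.
    """
    if tumor_reads <= 0 or normal_reads <= 0:
        raise ValueError("read counts must be positive to take a log ratio")
    return math.log2((tumor_reads / tumor_library) / (normal_reads / normal_library))


# Tumor sequenced at twice the depth of the normal: raw counts differ, logR does not.
print(normalized_logr(200, 100, 2_000_000, 1_000_000))  # 0.0
# Half the normalized coverage in the tumor.
print(normalized_logr(50, 100, 1_000_000, 1_000_000))   # -1.0
```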
- the power to distinguish LOH is primarily a function of coverage and of the tumor purity estimate.
- these area-based metrics, when integrated with depth and coverage features, also incorporate some measure of how confident the model is in its ability to resolve the two alleles (e.g., a higher area-based score means there are more positions that meet the read depth threshold and diverge between the two alleles).
- the difference in area between the variant allele frequency (VAF) curves may be used as a feature; the B allele frequency (BAF) at any given position is the proportion of reads supporting each allele.
- the area between the two BAF curves defines how much the NGS reads have been skewed towards a particular allele.
- the BAF is almost 1.0 and 0 for the stable and lost allele, respectively.
- the tumor-specific difference in BAF is a highly sensitive metric of allele loss.
- the BAF will fluctuate across the length of a gene but generally lands around 0.5 for each allele, although one allele may be slightly better covered than the other (possibly due to better homology with sequencing probes).
- the method arrives at a feature that is robust to noise and still very sensitive to allelic imbalance.
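The BAF-based feature can be sketched as the mean gap between the two alleles' BAF curves in the tumor, minus the same quantity in the matched normal to cancel baseline skew such as probe bias. Using a per-position mean rather than a raw area (so alleles of different lengths stay comparable) and assuming the positions of the two alleles are already paired are both assumptions; the function names are hypothetical.

```python
from typing import Sequence


def baf_imbalance(a_reads: Sequence[int], b_reads: Sequence[int]) -> float:
    """Mean distance between the two alleles' BAF curves over covered positions.

    At each paired position BAF_A = a / (a + b) and BAF_B = 1 - BAF_A, so
    |BAF_A - BAF_B| = |a - b| / (a + b): 0 when balanced, 1 when one allele only.
    """
    gaps = [abs(a - b) / (a + b) for a, b in zip(a_reads, b_reads) if a + b > 0]
    if not gaps:
        raise ValueError("no covered positions")
    return sum(gaps) / len(gaps)


def tumor_specific_imbalance(tumor_a: Sequence[int], tumor_b: Sequence[int],
                             normal_a: Sequence[int], normal_b: Sequence[int]) -> float:
    """Tumor imbalance minus the matched normal's baseline imbalance."""
    return baf_imbalance(tumor_a, tumor_b) - baf_imbalance(normal_a, normal_b)


# Normal is mildly skewed at one position; tumor has nearly lost allele B.
print(round(tumor_specific_imbalance([100, 90], [0, 10], [50, 60], [50, 40]), 3))  # 0.8
```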
- Tumor samples that are prepared for sequencing by NGS are generally heterogeneous and contain a mixture of tumor cells, healthy stroma and immune cells. As a result, a fully clonal loss may not necessarily appear as full loss of one allele sequence. For the sequencing specimen, it is advantageous to account for tumor purity when determining how much loss would be expected.
- Tumor purity may be estimated by methods that include but are not limited to assessing a histopathological slide corresponding to the sample that was sequenced by NGS, by analyzing DNA sequence data, or by analyzing RNA sequence data.
- the expected difference in logR may be defined as log2(1 - tumor purity).
- an areawise difference between the observed difference in logR value and the expected difference in logR value for a complete LOH sample, defined in this patent as delta_expected_difference_logR, may be determined. By comparing the observed difference in logR to the expected value generated from the tumor purity estimate, the method more effectively determines whether the loss of HLA reads observed in the tumor sample represents a loss on par with clonal LOH.
- a loss of heterozygosity in a specific HLA gene (such as HLA-A, HLA-B, or HLA-C) in a cancer specimen may be determined in accordance with a threshold value, for instance, when a significant difference exists between the read counts of the first tumor allele for the HLA gene and the read counts of the second tumor allele for the HLA gene.
- a significant difference may exist, for instance, if the difference between the read counts of the first tumor allele for the HLA gene and the read counts of the second tumor allele for the HLA gene is significantly more than the difference between the read counts of the first normal allele for such HLA gene and the read counts of the second normal allele for such HLA gene.
- “Significantly more” may be confirmed, for instance, when the delta_expected_difference_logR value for the HLA gene is significant. For instance, the delta_expected_difference_logR value may be significant if it is between 0 and -2. “Significantly more” may also be confirmed, for instance in circumstances where LOH is partial rather than complete, when the delta_expected_difference_logR value for the HLA gene is between 0 and 0.1, between 0 and 0.2, between 0 and 0.25, between 0 and 0.5, or between 0 and 1.
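The comparison of observed and expected logR can be sketched as follows. The expected difference is log2(1 - tumor purity) as defined above; the "areawise" difference is approximated here as the mean per-position gap, and treating the lower-coverage allele as the candidate lost allele is an assumption. The significance band defaults to the 0 to -2 range quoted above, and the function names are hypothetical.

```python
import math
from typing import Sequence


def expected_difference_logr(tumor_purity: float) -> float:
    """Expected logR difference for a complete (clonal) LOH: log2(1 - purity)."""
    if not 0.0 <= tumor_purity < 1.0:
        raise ValueError("tumor purity must be in [0, 1)")
    return math.log2(1.0 - tumor_purity)


def delta_expected_difference_logr(logr_lost: Sequence[float],
                                   logr_kept: Sequence[float],
                                   tumor_purity: float) -> float:
    """Mean per-position gap between observed and expected logR difference.

    logr_lost / logr_kept: per-position tumor-vs-normal logR for the candidate
    lost allele and the other allele, already paired by position.
    """
    observed = [lost - kept for lost, kept in zip(logr_lost, logr_kept)]
    if not observed:
        raise ValueError("need at least one paired position")
    expected = expected_difference_logr(tumor_purity)
    return sum(o - expected for o in observed) / len(observed)


def is_within(delta: float, low: float = -2.0, high: float = 0.0) -> bool:
    """True when delta falls in the [low, high] band used as the significance test."""
    return low <= delta <= high


purity = 0.75  # expected difference = log2(0.25) = -2.0
d = delta_expected_difference_logr([-2.0, -2.2], [0.0, 0.0], purity)
print(round(d, 3), is_within(d))              # -0.1 True
partial = delta_expected_difference_logr([-1.0], [0.0], purity)
print(partial, is_within(partial, 0.0, 1.0))  # 1.0 True
```

A delta near 0 means the observed loss matches what clonal LOH would produce at this purity; a positive delta means less loss than expected, which is where the partial-LOH bands apply.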
- Determination of whether an HLA gene suffers a LOH can help further determine whether certain treatment options may be appropriate for patients.
- treating the cancer by administering a therapy known to be effective against HLA-heterozygous cancers may be appropriate.
- a checkpoint inhibitor therapy may be appropriate for a subject with an HLA-heterozygous cancer.
- the checkpoint inhibitor therapy may be selected from the group consisting of an anti-CTLA-4 therapy, an anti-PD-1 therapy, and an anti-PD-L1 therapy, for example.
- Examples may include ipilimumab, nivolumab, pembrolizumab, pidilizumab, atezolizumab, and/or tremelimumab, and may include combination therapies, such as nivolumab + ipilimumab.
- a cancer vaccine may be appropriate, such as a cancer vaccine targeted to a specific HLA allele.
- one example is a peptide cancer vaccine available through Shiga University to treat HLA-A*02-positive advanced non-small cell lung cancer (NCT01069640).
- Another example is a peptide cancer vaccine available through Shiga University to treat HLA-A*24-positive advanced small cell lung cancer (NCT01069653).
- FIG. 10 illustrates an example system 400 for HLA and HLA-LOH analysis that may be implemented on a network accessible processing system for performing the processes described herein.
- the system 400 may be part of a precision medicine platform.
- the example system may be part of an NGS system or implemented on one or more network accessible processing systems (e.g., servers) communicatively coupled to an NGS system, a network accessible sequencing database, digital reporting system, or other processing system.
- the HLA and HLA-LOH analysis system 400 may be configured for performing the methods described herein including those of processes 100 and 200.
- the system 400 may include a computing device 402, and more particularly may be implemented on one or more processing units 404, e.g., Central Processing Units (CPUs), and/or on one or more Graphics Processing Units (GPUs) 406, including clusters of CPUs and/or GPUs.
- Features and functions described may be stored on and implemented from one or more non-transitory computer-readable media 408 of the computing device.
- the computer-readable media 408 may include, for example, an operating system 410 and software modules, or “engines,” that implement the methods described herein, including those of processes 100 and 200 and other processes illustrated and described herein.
- the computer-readable media 408 stores an HLA analysis system 412 for performing the HLA typing processes and HLA-LOH processes described herein.
- the HLA analysis system 412 includes an HLA typing process 414 and an HLA-LOH process 416, both similar to those described in examples of FIGS. 2 and 3.
- An HLA-LOH report generator 418 is configured to store and generate HLA allele predictions and LOH allele reports, also in accordance with the examples herein.
- the computer-readable media 408 may store sequence data processing instructions, including BAM file analysis instructions, sequence data filtering instructions, FASTQ file generation instructions, and normalization process instructions for implementing the techniques herein.
- the computing device 402 may be a distributed computing system, such as an Amazon Web Services cloud computing solution.
- the computing device 402 may be implemented on one network accessible processing device 450 or distributed across multiple such devices 450, 452, 454, etc.
- the computing device 402 includes a network interface 420 communicatively coupled to network 422, for communicating to and/or from a portable personal computer, smart phone, electronic document, tablet, and/or desktop personal computer, or other computing devices for communicating overlay maps, predicted tile classifications and locations, predicted cell classifications and locations, etc. Such information may also be stored in a database 424.
- the computing device 402 further includes an I/O interface 426 connected to devices, such as digital displays 428 for displaying generated overlay maps, user input devices 430, etc.
- a dashboard generator 432 may be used to generate GUI and/or other digital displays allowing a user to review and interact with and adjust generated HLA allele reports and HLA-LOH allele reports.
- the network 422 may be a public network such as the Internet, a private network such as that of a research institution or a corporation, or any combination thereof.
- Networks can include local area networks (LANs), wide area networks (WANs), cellular, satellite, or other network infrastructure, whether wireless or wired.
- the networks can utilize communications protocols, including packet-based and/or datagram-based protocols such as Internet protocol (IP), transmission control protocol (TCP), user datagram protocol (UDP), or other types of protocols.
- the networks can include a number of devices that facilitate network communications and/or form a hardware basis for the networks, such as switches, routers, gateways, access points (such as a wireless access point as shown), firewalls, base stations, repeaters, backbone devices, etc.
- the computer-readable media 408 may include executable computer-readable code stored thereon for programming a computer (e.g., comprising a processor(s) and GPU(s)) to perform the techniques herein.
- Examples of such computer-readable storage media include a hard disk, a CD-ROM, digital versatile disks (DVDs), an optical storage device, a magnetic storage device, a ROM (Read Only Memory), a PROM (Programmable Read Only Memory), an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read Only Memory) and a Flash memory.
- the processing units of the computing device may represent a CPU-type processing unit, a GPU-type processing unit, a field-programmable gate array (FPGA), another class of digital signal processor (DSP), or other hardware logic components that can be driven by a CPU.
- To investigate the prevalence of HLA-LOH, we utilized the specialized pipeline described above to detect HLA-LOH by DNA next-generation sequencing (NGS).
- Class I HLA alleles are highly polymorphic and most individuals have two distinct alleles for each HLA gene. Each allele allows for presentation of a unique pool of short peptides (approximately 8-11 amino acids in length) derived from the cellular products being made by each cell in the body.
- HLA Loss of Heterozygosity is a potential escape mechanism for tumors under immune pressure, where tumors can lose one copy of HLA and thereby avoid presenting potent neoepitopes. (See FIG. 11 and Tran et al., New England Journal of Medicine 2016;
- HLA LOH could be an especially important escape mechanism to identify in target populations.
- the HLA-LOH process 100 was used.
- the HLA-LOH process 100 takes as inputs BAM files 102 from a matched tumor and normal sample, two-digit HLA types 122 (similar to those generated by Optitype/Kourami/etc.), and tumor purity and ploidy information 120 (see FIG. 2).
- a full length HLA sequence is not required.
- the process 100 then maps all HLA mapping reads as well as all unmapped reads to a new HLA reference 124 and 126. After accounting for potential germline variants present in the sample’s HLA genes, it updates alignments and determines allele-specific coverage. By comparing changes in coverage between alleles, in the context of the expected tumor purity, the process 100 then determines, at 128, whether any reduction in allele coverage is consistent with a clonal loss of a specific HLA allele.
- the output of the HLA-LOH process 100 is a prediction of LOH status for HLA-A, HLA-B, and HLA-C genes.
- HLA LOH occurs across the entire locus.
- FIGS. 7A and 7B are flow cytometry experiment results showing the expression of the stable and lost allele relative to a pan HLA antibody, gated on live cells.
- MLAs include supervised algorithms (such as algorithms where the features/classifications in the data set are annotated) using linear regression, logistic regression, decision trees, classification and regression trees, Naive Bayes, nearest neighbor clustering; unsupervised algorithms (such as algorithms where no features/classification in the data set are annotated) using Apriori, k-means clustering, principal component analysis, random forest, adaptive boosting; and semi-supervised algorithms (such as algorithms where certain features/classifications in the data set are annotated) using generative approaches (such as mixture of Gaussian distributions, mixture of multinomial distributions, hidden Markov models), low density separation, graph-based approaches (such as mincut, harmonic function, manifold regularization), heuristic approaches, or support vector machines.
- NNs include conditional random fields, convolutional neural networks, attention-based neural networks, long short-term memory networks, or other neural models where the training data set includes a plurality of samples and RNA expression data for each sample. While MLA and neural networks identify distinct approaches to machine learning, the terms may be used interchangeably herein. Thus, a mention of MLA may include a corresponding NN or a mention of NN may include a corresponding MLA.
- Training may include identifying common expression characteristics shared across RNA gene expressions in tissue normal samples, primary samples, and metastatic samples, such that the MLA may predict the ratio of metastatic tumor to background tissue and identify which portion of an input RNA expression set may be attributed to the tumor and which portion may be attributed to the background tissue.
- Common expression characteristics may include which genes are expected to be overexpressed, expressed, and/or underexpressed for each type of tissue and/or tumor and may be identified for each k cluster as the corresponding genes.
- the annotations provided for each sample would be a full transcriptome gene expression dataset, cancer type, tissue site, and background tissue percentage.
- an implementation of one or more embodiments of the methods and systems as described above may include microservices constituting a digital and laboratory health care platform supporting detection of LOH in a cancer specimen, especially in HLA genes.
- Embodiments may include a single microservice for executing and delivering HLA LOH detection or may include a plurality of microservices each having a particular role which together implement one or more of the embodiments above.
- a first microservice may execute alignment of reads to HLA genes in order to deliver HLA reference sequences to a second microservice for calculating coverage metrics.
- the second microservice may calculate coverage metrics and deliver them according to an embodiment, above.
- a third microservice may receive coverage metrics from the second microservice and may execute HLA LOH modeling to deliver an LOH status for each HLA allele in a specimen.
- microservices may be part of an order management system that orchestrates the sequence of events as needed at the appropriate time and in the appropriate order necessary to instantiate embodiments above.
- a microservices-based order management system is disclosed, for example, in U.S. Prov. Patent Application No. 62/873,693, titled “Adaptive Order Fulfillment and Tracking Methods and
- an order management system may notify the first microservice that an order for HLA typing has been received and is ready for processing.
- the first microservice may execute and notify the order management system once the delivery of HLA typing is ready for the second microservice.
- the order management system may identify that execution parameters (prerequisites) for the second microservice are satisfied, including that the first microservice has completed, and notify the second microservice that it may continue processing the order to calculate coverage metrics according to an embodiment, above.
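The order-management flow can be sketched as a small dependency-driven orchestrator. The `OrderManager` class, the service names, and the stub services (with made-up coverage values and the 75-read threshold quoted earlier) are hypothetical; the sketch shows only the prerequisite check, not messaging, persistence, or the order management system referenced above.

```python
from typing import Callable, Dict, List, Sequence

Service = Callable[[dict], dict]


class OrderManager:
    """Hypothetical orchestrator: runs each microservice once its prerequisites finish."""

    def __init__(self) -> None:
        self.services: Dict[str, Service] = {}
        self.prereqs: Dict[str, List[str]] = {}

    def register(self, name: str, service: Service, prereqs: Sequence[str] = ()) -> None:
        self.services[name] = service
        self.prereqs[name] = list(prereqs)

    def process(self, order: dict) -> List[str]:
        """Run services in dependency order and return the execution order."""
        done: List[str] = []
        pending = list(self.services)
        while pending:
            ready = [n for n in pending if all(p in done for p in self.prereqs[n])]
            if not ready:
                raise RuntimeError(f"prerequisites never satisfied for {pending}")
            for name in ready:
                order.update(self.services[name](order))  # returning = notifying completion
                done.append(name)
                pending.remove(name)
        return done


om = OrderManager()
# Registration order is deliberately scrambled; prerequisites decide execution order.
om.register("loh", lambda o: {"loh": {a: ("lost" if c < 75 else "stable")
                                      for a, c in o["coverage"].items()}},
            prereqs=["coverage"])
om.register("coverage", lambda o: {"coverage": {a: (40 if a == "A*02" else 120)
                                                for a in o["alleles"]}},
            prereqs=["typing"])
om.register("typing", lambda o: {"alleles": ["A*02", "A*03"]})
order: dict = {}
print(om.process(order))  # ['typing', 'coverage', 'loh']
print(order["loh"])       # {'A*02': 'lost', 'A*03': 'stable'}
```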
- the genetic analyzer system may include targeted panels and/or sequencing probes.
- a targeted panel is disclosed, for example, in U.S. Prov. Patent Application No. 62/902,950, titled “System and Method for Expanding Clinical Options for Cancer Patients using Integrated Genomic Profiling”, and filed 9/19/19, which is incorporated herein by reference and in its entirety for all purposes.
- targeted panels may enable the delivery of next generation sequencing results for HLA LOH detection according to an embodiment, above.
- An example of the design of next-generation sequencing probes is disclosed, for example, in U.S. Prov. Patent Application No. 62/924,073, titled “Systems and Methods for Next Generation Sequencing Uniform Probe Design”, and filed 10/21/19, which is incorporated herein by reference and in its entirety for all purposes.
- the methods and systems described above may be utilized after completion or substantial completion of the systems and methods utilized in the bioinformatics pipeline.
- the bioinformatics pipeline may receive next-generation genetic sequencing results and return a set of binary files, such as one or more BAM files, reflecting DNA and/or RNA read counts aligned to a reference genome.
- the methods and systems described above may be utilized, for example, to ingest the DNA and/or RNA read counts and produce HLA LOH detection as a result.
- any RNA read counts may be normalized before processing embodiments as described above.
- An example of an RNA data normalizer is disclosed, for example, in U.S. Patent Application No. 16/581,706, titled “Methods of Normalizing and Correcting RNA Expression Data”, and filed 9/24/19, which is incorporated herein by reference and in its entirety for all purposes.
- any system and method for deconvoluting may be utilized for analyzing genetic data associated with a specimen having two or more biological components to determine the contribution of each component to the genetic data and/or determine what genetic data would be associated with any component of the specimen if it were purified.
- An example of a genetic data deconvoluter is disclosed, for example, in U.S. Patent Application No. 16/732,229 and PCT19/69161, both titled “Transcriptome Deconvolution of Metastatic Tissue Samples”, and filed 12/31/19, U.S. Prov. Patent Application No. 62/924,054, titled “Calculating Cell-type RNA Profiles for Diagnosis and Treatment”, and filed 10/21/19, and U.S. Prov. Patent Application No. 62/944,995, titled “Rapid Deconvolution of Bulk RNA Transcriptomes for Large Data Sets (Including Transcriptomes of Specimens Having Two or More Tissue Types)”, and filed 12/6/19, which are incorporated herein by reference and in their entirety for all purposes.
- RNA expression levels may be adjusted to be expressed as a value relative to a reference expression level, which is often done in order to prepare multiple RNA expression data sets for analysis to avoid artifacts caused when the data sets have differences because they have not been generated by using the same methods, equipment, and/or reagents.
- An example of an automated RNA expression caller is disclosed, for example, in U.S. Prov. Patent Application No. 62/943,712, titled “Systems and Methods for Automating RNA Expression Calls in a Cancer Prediction Pipeline”, and filed 12/4/19, which is incorporated herein by reference and in its entirety for all purposes.
- the digital and laboratory health care platform may further include one or more insight engines to deliver information, characteristics, or determinations related to a disease state that may be based on genetic and/or clinical data associated with a patient and/or specimen.
- exemplary insight engines may include a tumor of unknown origin engine, a tumor mutational burden engine, a PD-L1 status engine, a homologous recombination deficiency engine, a cellular pathway activation report engine, an immune infiltration engine, a microsatellite instability engine, a pathogen infection status engine, and so forth.
- An example tumor of unknown origin engine is disclosed, for example, in U.S. Prov. Patent Application No.
- An example of a PD-L1 status engine is disclosed, for example, in U.S. Prov. Patent Application No. 62/854,400, titled “A Pan-Cancer Model to Predict The PD-L1 Status of a Cancer Cell Sample Using RNA Expression Data and Other Patient Data”, and filed 5/30/19, which is incorporated herein by reference and in its entirety for all purposes.
- An additional example of a PD-L1 status engine is disclosed, for example, in U.S. Prov. Patent Application No. 62/824,039, titled “PD-L1 Prediction Using H&E Slide Images”, and filed 3/26/19, which is incorporated herein by reference and in its entirety for all purposes.
- An example of a homologous recombination deficiency engine is disclosed, for example, in U.S. Prov. Patent Application No.
- the methods and systems described above may be utilized to create a summary report of a patient’s genetic profile and the results of one or more insight engines for presentation to a physician.
- the report may provide to the physician information about the extent to which the specimen that was sequenced contained tumor or normal tissue from a first organ, a second organ, a third organ, and so forth.
- the report may provide a genetic profile for each of the tissue types, tumors, or organs in the specimen.
- the genetic profile may represent genetic sequences present in the tissue type, tumor, or organ and may include variants, expression levels, information about gene products, or other information that could be derived from genetic analysis of a tissue, tumor, or organ.
- the report may include therapies and/or clinical trials matched based on a portion or all of the genetic profile or insight engine findings and summaries.
- the therapies may be matched according to the systems and methods disclosed in U.S. Prov. Patent Application No. 62/804,724, titled “Therapeutic Suggestion Improvements Gained Through Genomic Biomarker Matching Plus Clinical History”, filed 2/12/2019, which is incorporated herein by reference and in its entirety for all purposes.
- the clinical trials may be matched according to the systems and methods disclosed in U.S. Prov. Patent Application No. 62/855,913, titled “Systems and Methods of Clinical Trial Evaluation”, filed 5/31/2019, which is incorporated herein by reference and in its entirety for all purposes.
- the report may include a comparison of the results to a database of results from many specimens.
- An example of methods and systems for comparing results to a database of results are disclosed in U.S. Prov. Patent Application No. 62/786,739, titled “A Method and Process for Predicting and Analyzing Patient Cohort Response, Progression and Survival”, and filed 12/31/18, which is incorporated herein by reference and in its entirety for all purposes.
- the information may be used, sometimes in conjunction with similar information from additional specimens and/or clinical response information, to discover biomarkers or design a clinical trial.
- the methods and systems may be used to further evaluate genetic sequencing data derived from an organoid to provide information about the extent to which the organoid that was sequenced contained a first cell type, a second cell type, a third cell type, and so forth.
- the report may provide a genetic profile for each of the cell types in the specimen.
- the genetic profile may represent genetic sequences present in a given cell type and may include variants, expression levels, information about gene products, or other information that could be derived from genetic analysis of a cell.
- the report may include therapies matched based on a portion or all of the deconvoluted information.
- organoids may be cultured and tested according to the systems and methods disclosed in U.S. Patent Application No. 16/693,117, titled “Tumor Organoid Culture Compositions, Systems, and Methods”, filed 11/22/2019; U.S. Prov. Patent Application No. 62/924,621, titled “Systems and Methods for Predicting Therapeutic Sensitivity”, filed 10/22/2019; and U.S. Prov. Patent Application No. 62/944,292, titled “Large Scale Phenotypic Organoid Analysis”, filed 12/5/2019, which are incorporated herein by reference and in their entirety for all purposes.
- the digital and laboratory health care platform further includes application of one or more of the above in combination with or as part of a medical device or a laboratory developed test that is generally targeted to medical care and research; the results of such laboratory developed tests or medical devices may be enhanced and personalized through the use of artificial intelligence.
- An example of laboratory developed tests, especially those that may be enhanced by artificial intelligence, is disclosed, for example, in U.S. Provisional Patent Application No. 62/924,515, titled “Artificial Intelligence Assisted Precision Medicine Enhancements to Standardized Laboratory Diagnostic Testing”, and filed 10/22/19, which is incorporated herein by reference and in its entirety for all purposes.
- techniques herein may be extended to include LOH loss type classifications, using a two-layer HLA LOH classifier model developed to classify specimens as having an LOH status of loss or no loss and, for specimens having a loss of heterozygosity, to further classify whether the loss is a complete (clonal) loss of heterozygosity (for example, nearly all of the cancer cells in a specimen are predicted to have LOH) or a partial loss of heterozygosity (for example, only a portion of the cancer cells in a specimen are predicted to have LOH).
- the result is a three class model of HLA LOH status.
- these techniques can be agnostic to the type of initial HLA LOH classifier used.
- these techniques can be implemented with input classification data from the HLA LOH classifications described above with reference to FIGS. 2 and 3 and/or input classification data developed using other types of HLA LOH classifiers.
- these techniques can advantageously identify mis-classified HLA LOH status and correctly reclassify, e.g., detecting previously classified no loss of heterozygosity specimens as partial loss of heterozygosity thereby allowing for more accurate classification results that lead to more accurate decisions on matched therapy types and/or determination of meeting eligibility criteria in order to match clinical trials to a patient or specimen.
- FIG. 16 illustrates an example process 500 for determining LOH status, as may be executed by the bioinformatics pipeline 14, the computing device 400, and in particular the HLA analysis system 412.
- Biological sample data 502 is provided to an HLA LOH classification process 504 that identifies the sample as corresponding to one of two LOH classifications: loss of heterozygosity or no loss of heterozygosity.
- The data 502 may include data in accordance with examples herein, such as HLA reads, HLA reference data, alignment data between the two, coverage feature metrics (e.g., statistics), and/or allelic imbalance determinations.
- The process 504 may be implemented by the LOH modeling processes of processes 100 and 200 of FIGS.
- Process 118 determines LOH status for an entire sample and/or for each HLA allele in the sample by reference to a normal sample.
- The classification process 504 generates one of two classification outputs that may be reported: a “No loss” (no loss of heterozygosity, or no LOH) classification 506 or a “Loss” (loss of heterozygosity, or LOH) classification 508.
- This determination corresponds to a first layer 510 of the two-layer configuration of the process 500.
- For samples receiving the “Loss” classification 508, the HLA reads, HLA reference data, alignment data between the two, coverage feature metrics (e.g., statistics), and/or allelic imbalance determinations may be provided to a second layer 512 containing a second HLA LOH classification process 514 designed to classify the sample as having a partial LOH 516 or a clonal LOH 518.
- The first layer 510 may be implemented in accordance with example methods such as those described above with reference to FIGS. 1-3. However, an advantage of some embodiments herein is that the second layer 512 may be implemented agnostically with respect to the source of the initial two-state LOH classification data provided to it.
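The two-layer configuration can be sketched as a simple cascade. This is a minimal Python sketch, not the patented implementation: `classify_hla_loh`, the feature mapping, and the use of callables that return probabilities are illustrative assumptions. The 0.5 cutoffs mirror the example thresholds given for the processes 504 and 514 in this description; treating a first-layer probability of exactly 0.5 as no loss is an assumption, since the text leaves that case unspecified. Because each layer is just a callable, any upstream two-state classifier can be plugged into the first layer.

```python
from typing import Callable, Mapping

# Hypothetical feature container: any mapping of feature names to values.
Features = Mapping[str, float]


def classify_hla_loh(
    features: Features,
    first_layer: Callable[[Features], float],
    second_layer: Callable[[Features], float],
    loss_threshold: float = 0.5,
    clonal_threshold: float = 0.5,
) -> str:
    """Two-layer cascade: loss vs. no loss, then partial vs. clonal loss.

    `first_layer` returns a probability of LOH (allelic imbalance) and
    `second_layer` returns a probability of clonal LOH.
    """
    if first_layer(features) <= loss_threshold:
        return "no loss"          # classification 506
    # Only samples called "Loss" (classification 508) reach the second layer 512.
    if second_layer(features) > clonal_threshold:
        return "clonal loss"      # classification 518
    return "partial loss"         # classification 516
```

For example, `classify_hla_loh({}, lambda f: 0.8, lambda f: 0.3)` returns `"partial loss"`.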
- FIG. 17 illustrates an example process 600 that may be implemented by the processes 504 and 514.
- Initial HLA reads are identified or otherwise obtained, at a block 602.
- At a block 604, a patient-specific HLA reference (genome or partial genome) is generated using normal HLA mapping reads, and at a block 606 the HLA reads are aligned to that HLA reference.
- Coverage feature metrics from that alignment are computed at a block 608, and allelic imbalance is determined at a block 610, from which LOH status is determined at a block 612.
- The block 612 may be implemented as a two-stage LOH classification process that executes the processes 504 and 514, e.g., using models based on any of the coverage features and other metrics of the processes 504 and 514.
- In some examples, the HLA coverage feature metrics described herein may be received at the block 608 without needing to be determined.
- That is, the process 600 may be truncated to start at the block 608 with the receipt of HLA coverage feature metrics that have been determined by an external source based on any number of sequence read alignment and filtering processes.
- HLA-LOH for the HLA-A gene
- The HLA-A*02:01 allele is discussed to illustrate analysis for LOH of a particular HLA allele.
- However, this same process applies to determining HLA-LOH for any HLA gene (i.e., HLA-A, HLA-B, or HLA-C) or for any allele of those genes.
- In an example, identifying HLA reads at block 602 was performed as follows. Because a number of informative HLA reads lack sufficient homology to the standard human reference genome to map successfully during routine analysis, specimen sequence reads of interest for HLA LOH determination were collected from two sources: reads already mapped to an HLA gene in the hg19-aligned BAM output, and unmapped reads from the BAM output that align to a reference file containing a large number of HLA alleles collected from a database, such as the IEDB database. These reads are then combined into a single file for each of the normal sample and the tumor sample, referred to as normal HLA mapping reads and tumor HLA mapping reads, respectively. Reads from the received sample are compared against these combined reads to identify the HLA reads for analysis.
- In order to assess relative coverage across a patient’s two HLA-A alleles, and ultimately to determine whether one has been lost based on HLA-A coverage in a matched tumor sample, the block 604 first identifies the sequences of the two HLA-A alleles that are present. In an example, this may be achieved using OptiType, in accordance with examples described hereinabove.
- Normal HLA mapping reads are passed into OptiType, which returns the pair of HLA sequences that explains the greatest proportion of HLA mapping reads with the least amount of error (or a single sequence in the case of a homozygous sample).
- Each reference allele contains intron 1, exon 2, intron 2, exon 3, and intron 3 of the allele, together referred to as the HLA Region of Interest.
- The reference file generated includes the sequences that were determined for the HLA-A, HLA-B, and HLA-C genes, as well as a pool of non-classical HLA genes and HLA pseudogenes, to minimize issues that may arise from homology between these genes and HLA-A.
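A minimal sketch of assembling the per-allele HLA Region of Interest and writing a small multi-record FASTA reference, assuming the intron and exon sequences of each allele are already available (for example, from an HLA allele database). The segment keys (`intron1`, `exon2`, and so on), the function names, and the FASTA layout are hypothetical; non-classical HLA genes and pseudogenes could be added as extra records in the same way.

```python
from typing import Dict

# Segments that make up the HLA Region of Interest, in genomic order.
ROI_SEGMENTS = ("intron1", "exon2", "intron2", "exon3", "intron3")


def build_region_of_interest(segments: Dict[str, str]) -> str:
    """Concatenate one allele's segment sequences into its Region of Interest."""
    missing = [name for name in ROI_SEGMENTS if name not in segments]
    if missing:
        raise KeyError(f"missing segments: {missing}")
    # Upper-case so soft-masked (lower-case) bases do not differ from the rest.
    return "".join(segments[name].upper() for name in ROI_SEGMENTS)


def build_reference(alleles: Dict[str, Dict[str, str]]) -> str:
    """Render a multi-record FASTA string with one record per allele."""
    lines = []
    for allele_name, segments in alleles.items():
        lines.append(f">{allele_name}")
        lines.append(build_region_of_interest(segments))
    return "\n".join(lines) + "\n"
```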
- At block 606, HLA-A mapping reads are re-aligned to the HLA-A-specific reference generated at block 604, e.g., using Novoalign.
- Novoalign may be executed with parameters that allow a read to be mapped to more than one location, provided those locations have equivalent mapping qualities (as opposed to one location being selected at random).
- Reads that have more than one mismatch, insertion, or deletion relative to the reference may then be removed.
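The post-alignment filter can be sketched as follows, assuming each aligned read has already been summarized into counts of mismatches, insertions, and deletions (in practice these would be derived from the alignment record, e.g., its CIGAR string and NM/MD tags). The `AlignedRead` record and `filter_reads` function are hypothetical, and the sketch reads "more than one mismatch, insertion, or deletion" as more than one such event in total, which is one possible interpretation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AlignedRead:
    # Hypothetical minimal record of edit events relative to the HLA reference.
    name: str
    mismatches: int
    insertions: int
    deletions: int

    @property
    def edit_events(self) -> int:
        return self.mismatches + self.insertions + self.deletions


def filter_reads(reads: Iterable[AlignedRead], max_events: int = 1) -> List[AlignedRead]:
    """Keep reads with at most `max_events` mismatches, insertions, and deletions combined."""
    return [read for read in reads if read.edit_events <= max_events]
```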
- The HLA reference file may be assessed to determine whether the sequences present are fully supported by the reads in the sample. For example, FreeBayes (v1.1.0) may be used to detect any positions in the HLA reference file where another germline sequence is better supported by the sequencing results.
- The reference may be updated in cases where at least 40 reads cover a position and fewer than 5 of those reads support the current reference sequence at that position. In any case, such information may be provided to the block 604 for updating the HLA reference file, or the HLA reference file may be updated at the block 606. In cases where reference updates are needed, the alignment and post-alignment filtering described above are repeated at the block 606.
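A simplified sketch of the reference-update rule at a single position, assuming per-position base counts from the realigned reads are available as a `collections.Counter`. It handles single-base substitutions only (a variant caller such as FreeBayes also handles indels and haplotypes), and choosing the most-supported alternative base is an illustrative assumption; `proposed_reference_base` is a hypothetical helper.

```python
from collections import Counter
from typing import Optional

MIN_DEPTH = 40        # at least 40 reads must cover the position
MAX_REF_SUPPORT = 5   # fewer than 5 reads may support the current reference base


def proposed_reference_base(ref_base: str, observed_bases: Counter) -> Optional[str]:
    """Return a replacement base for one reference position, or None to keep it."""
    depth = sum(observed_bases.values())
    if depth < MIN_DEPTH or observed_bases[ref_base] >= MAX_REF_SUPPORT:
        return None
    alternatives = [(base, count) for base, count in observed_bases.items() if base != ref_base]
    if not alternatives:
        return None
    # Hypothetical choice: adopt the best-supported alternative base.
    best_base, _ = max(alternatives, key=lambda item: item[1])
    return best_base
```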
- In an example, computing allele coverage feature metrics and normalization may be performed at block 608 as follows. Following alignment and filtering at block 606, Bedtools (v2.26.0) was used to calculate the number of reads that support each allele at each position across a region of interest. In some embodiments, the block 608 can further perform de-noising across the region of interest. For example, to minimize the effect of fluctuations in coverage, the block 608 may be configured to apply a fourth-order Savitzky-Golay filter with a window length set in base pairs (e.g., 801 bp) to all coverage values.
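The de-noising step might look like the following sketch using SciPy's `savgol_filter`, with the fourth-order polynomial and 801 bp window from the example above. Shrinking the window for regions shorter than 801 positions and forcing an odd window length are assumptions added so the sketch also runs on short inputs; `smooth_coverage` is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_coverage(depth, window_bp: int = 801, polyorder: int = 4) -> np.ndarray:
    """Savitzky-Golay smoothing of per-position read depth for one allele."""
    depth = np.asarray(depth, dtype=float)
    # The window cannot exceed the signal length and is kept odd.
    window = min(window_bp, depth.size)
    if window % 2 == 0:
        window -= 1
    # A window no longer than the polynomial order cannot be fitted; return a copy.
    if window <= polyorder:
        return depth.copy()
    return savgol_filter(depth, window_length=window, polyorder=polyorder)
```

Because a Savitzky-Golay filter preserves polynomials up to its order, a smooth coverage trend passes through unchanged while position-to-position noise is damped.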
- The block 608 then uses the coverage depth of each allele along the length of the region of interest to generate a number of higher-order coverage feature metrics.
- These higher-order features include the B allele frequency (BAF), i.e., the proportion of reads supporting each allele at each position.
- The features further include the log ratio (logR), i.e., the ratio of coverage for an allele between the tumor and normal samples. A negative log ratio indicates that the allele is less abundant in the tumor than in the normal sample. This ratio may be calculated as log2(tumor_read_depth / normal_read_depth × normalization_factor), where the normalization factor is the ratio of the number of mapped paired primary reads in the final normal alignment file to that in the final tumor alignment file.
- The allele with the lower mean logR value is designated the “target allele”, and a further determination procedure is used to determine whether the target allele has undergone loss of heterozygosity relative to the “stable” allele.
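The BAF, logR, and target-allele definitions above can be sketched in NumPy as follows, assuming per-position depth arrays for each allele in the tumor and normal samples are already available (e.g., from the smoothed coverage). The normalization factor is taken here as normal mapped paired primary reads divided by tumor mapped paired primary reads; the function names are illustrative.

```python
import numpy as np


def baf(depth_allele, depth_other) -> np.ndarray:
    """BAF: fraction of reads at each position that support the first allele."""
    a = np.asarray(depth_allele, dtype=float)
    b = np.asarray(depth_other, dtype=float)
    total = a + b
    # Positions with no reads get NaN instead of a division-by-zero warning.
    return np.divide(a, total, out=np.full(total.shape, np.nan), where=total > 0)


def log_ratio(tumor_depth, normal_depth, normal_mapped_reads: int, tumor_mapped_reads: int) -> np.ndarray:
    """Per-position logR = log2(tumor_depth / normal_depth * normalization_factor)."""
    factor = normal_mapped_reads / tumor_mapped_reads
    tumor = np.asarray(tumor_depth, dtype=float)
    normal = np.asarray(normal_depth, dtype=float)
    # Assumes positions with zero depth were masked out upstream.
    return np.log2(tumor / normal * factor)


def pick_target_allele(mean_logr: dict) -> tuple:
    """Return (target, stable): the target allele has the lower mean logR."""
    target = min(mean_logr, key=mean_logr.get)
    stable = next(allele for allele in mean_logr if allele != target)
    return target, stable
```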
- Each of the processes 504 and 514 may contain a trained classifier model.
- The process 504, for example, may be a classifier trained to determine allelic imbalance, whereby all samples with partial or clonal loss of heterozygosity are collectively classified separately from samples classified as having no loss of heterozygosity.
- The process 514 may be a classifier trained for sequential assessment using specific data for tumor and normal samples, with the application of predetermined thresholds. The coverage features and thresholds applied by each of the processes 504 and 514 may be established empirically using training data.
- The processes 504 and 514 may be implemented by respective logistic regression models.
- In an example, coverage feature selection and threshold determination for the models of the processes 504 and 514 were established empirically using a training set of 189 samples across 34 cancer types that underwent manual classification by two expert reviewers to annotate partial and clonal LOH.
- Of these, 477 loci (295 with no loss, 92 with a partial loss, and 90 with a clonal loss) across 186 samples with concordant results and tumor purity >30% were selected for training.
- Initial performance of both models was evaluated on a hold-out (validation) dataset of 203 loci across 77 samples (128 with no loss, 37 with a partial loss, and 37 with a clonal loss).
- The process 504 performs classification for allelic imbalance based on three coverage feature metrics, where training data corresponding to each feature, individually and/or collectively, is used to train the allelic imbalance model of the process 504.
- The first feature is the ratio of the B allele frequency (BAF) of the stable allele between the tumor and normal samples, which captures the magnitude of the LOH signal between a tumor sample and a normal sample. While the BAF in the normal sample should always be approximately 0.5, in samples with clonal LOH the BAF of the retained allele in the tumor will increase substantially as tumor purity increases, for example. Training data that includes normal samples and tumor samples with different BAF values, in particular at different tumor purity levels, may therefore be used during model training.
- The second feature is the mean difference in logR values between the target allele and the stable allele.
- The logR value represents the change in coverage for a given allele from the normal sample to the tumor sample. For an allele that has undergone LOH, the logR will decrease significantly, while the logR of the corresponding stable allele will generally increase slightly. The difference between these two values represents the magnitude of the total change.
- Training data that includes normal samples and tumor samples with alleles at different coverage levels may be used during model training.
- The third feature is tumor purity.
- Training data may include samples of different tumor purity, as determined by a pathologist. As tumor purity approaches the limit of detection, there may be a greater degree of uncertainty around the determination of allelic imbalance.
- The allelic imbalance model of the process 504 returns a probability of allelic imbalance.
- In an example, samples with a probability of less than 0.5 are classified as LOH negative (classification 506), and samples with a probability of greater than 0.5 are classified as LOH positive (classification 508) and assessed by a second classification model at the process 514.
- These probability thresholds are determined by the model training and therefore may differ from those listed. Further, the model training process may determine different cutoff probabilities for the classifications 506 and 508.
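A minimal sketch of the first-layer allelic imbalance model as a logistic regression over the three features described above. The coefficient values are placeholders invented for illustration only (the description states that the real features, coefficients, and thresholds are learned from training data), and the sign convention for the logR feature (target minus stable) is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class AllelicImbalanceFeatures:
    stable_baf_ratio: float   # BAF of the stable allele, tumor / normal
    mean_logr_diff: float     # mean logR(target) - mean logR(stable)
    tumor_purity: float       # fraction between 0 and 1


# Hypothetical coefficients for illustration only; real values come from training.
WEIGHTS = {"stable_baf_ratio": 4.0, "mean_logr_diff": -3.0, "tumor_purity": 1.0}
INTERCEPT = -5.0


def allelic_imbalance_probability(f: AllelicImbalanceFeatures) -> float:
    """Logistic regression: p = 1 / (1 + exp(-(b0 + sum(w_i * x_i))))."""
    z = INTERCEPT + sum(weight * getattr(f, name) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def first_layer_call(f: AllelicImbalanceFeatures, threshold: float = 0.5) -> str:
    """Map the probability to the reported first-layer classification."""
    return "Loss" if allelic_imbalance_probability(f) > threshold else "No loss"
```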
- The process 514 uses an LOH modeling process independent from that of the process 504, namely a clonal LOH model that uses three coverage feature metrics to classify the loss of heterozygosity as either clonal or partial LOH.
- The coverage features of the process 514 may include the expression (observed logR difference - expected logR difference) / expected logR difference, i.e., the ratio of the difference between the observed and expected logR differences to the expected logR difference.
- The difference in logR values between the target and stable alleles represents the magnitude of the loss event.
- The expected logR difference can be calculated as log2(1 - TP), where TP is the tumor purity.
- The ratio of the observed to the expected logR difference describes whether the observed loss event meets or exceeds the expected loss for a clonal LOH event.
- In an example, the observed logR difference is the difference between the logR of coverage of the stable allele and the logR of coverage of the lost allele.
- In an example, the observed logR difference is an average of log(coverage in tumor / coverage in normal), calculated for at least one nucleotide position in an HLA gene.
- The log(coverage in tumor / coverage in normal) may be calculated for nucleotide positions having a coverage of at least 40 sequence reads.
- In another example, the observed logR difference is an average of log(coverage in tumor / coverage in normal × match ratio), calculated for at least one nucleotide position in an HLA gene, wherein the match ratio is the ratio of the number of HLA reads in the normal sample to the number of HLA reads in the tumor sample, or the ratio of the number of unique reads in the normal sample to the number of unique reads in the tumor sample.
- The log(coverage in tumor / coverage in normal × match ratio) may be calculated for nucleotide positions having a coverage of at least 40 sequence reads.
- In another example, the observed logR difference is the cumulative area between the logR line associated with a first allele and the logR line associated with a second allele.
- The expected logR difference is log2(1 - tumor purity), where tumor purity is a value between 0 and 1.
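The first clonal-model feature can be sketched as follows. Assumptions: the 40-read minimum is applied to both tumor and normal coverage, log base 2 is used throughout, and the observed difference is taken as the mean logR of the target allele minus that of the stable allele, so that it carries the same (negative) sign as the expected value log2(1 - TP). Tumor purity must lie strictly between 0 and 1 here, because log2(1 - TP) is undefined at TP = 1 and zero at TP = 0, which would make the ratio divide by zero. Function names are illustrative.

```python
import numpy as np

MIN_COVERAGE = 40  # minimum reads for a position to contribute (applied to tumor and normal here)


def mean_logr(tumor_cov, normal_cov, match_ratio: float = 1.0) -> float:
    """Average of log2(tumor / normal * match_ratio) over well-covered positions."""
    tumor = np.asarray(tumor_cov, dtype=float)
    normal = np.asarray(normal_cov, dtype=float)
    keep = (tumor >= MIN_COVERAGE) & (normal >= MIN_COVERAGE)
    if not keep.any():
        raise ValueError("no position reaches the minimum coverage")
    return float(np.mean(np.log2(tumor[keep] / normal[keep] * match_ratio)))


def expected_logr_difference(tumor_purity: float) -> float:
    """log2(1 - TP): under a simple model, only normal cells still carry a clonally lost allele."""
    if not 0.0 < tumor_purity < 1.0:
        raise ValueError("tumor purity must be strictly between 0 and 1")
    return float(np.log2(1.0 - tumor_purity))


def relative_logr_deviation(observed_diff: float, tumor_purity: float) -> float:
    """(observed - expected) / expected logR difference."""
    expected = expected_logr_difference(tumor_purity)
    return (observed_diff - expected) / expected
```

With TP = 0.5 the expected difference is -1: an observed difference of -1 gives 0 (exactly the clonal expectation), -2 gives 1 (exceeds it), and -0.5 gives -0.5 (a smaller loss, as might be seen with partial LOH).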
- In some examples, the process 514 may calculate, for each allele, the ratio of the allele's BAF in the tumor sample to its BAF in the normal sample. The process 514 may then compare these ratios and select the allele with the lowest ratio as the allele more likely to have been lost. The process 514 may make this determination before determining the LOH classification.
- The coverage features of the process 514 may further include the ratio of the BAF of the target allele between the tumor and normal samples. This feature captures the magnitude of the LOH signal between the tumor and normal samples. While the BAF in the normal sample should always be approximately 0.5, the BAF of the target allele in the tumor will decrease substantially in samples with clonal LOH as tumor purity increases. Additionally, the coverage features of the process 514 may include tumor purity: as tumor purity approaches the limit of detection, there may be a greater degree of uncertainty around the determined classification.
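Selecting the allele more likely to have been lost can be sketched as below. Summarizing each allele's per-position BAF values by their mean before taking the tumor/normal ratio is an assumption, since the description does not say how positions are aggregated; the function names are illustrative.

```python
import numpy as np


def baf_ratio(tumor_baf, normal_baf) -> float:
    """Mean tumor BAF divided by mean normal BAF for one allele (NaNs ignored)."""
    return float(np.nanmean(tumor_baf) / np.nanmean(normal_baf))


def likely_lost_allele(bafs: dict) -> str:
    """Return the allele whose tumor/normal BAF ratio is lowest.

    `bafs` maps allele name -> (tumor BAF values, normal BAF values).
    """
    ratios = {allele: baf_ratio(tumor, normal) for allele, (tumor, normal) in bafs.items()}
    return min(ratios, key=ratios.get)
```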
- The clonal LOH model returns a probability of clonal loss of heterozygosity. If the probability of clonal LOH is greater than 0.5, the process 514 returns a result of clonal LOH for the target allele. For example, if the target allele is A*02:01, the process 514 returns a status of “A*02:01 LOH Positive”, corresponding to classification 518. If the probability is 0.5 or less, the process 514 returns a result of partial LOH for the target allele, e.g., “A*02:01 LOH Partial”, corresponding to classification 516.
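The reporting rule for the second layer reduces to a threshold on the clonal-LOH probability. A minimal sketch with a hypothetical function name; a probability of exactly 0.5 maps to partial LOH, as stated above.

```python
def second_layer_status(target_allele: str, p_clonal: float, threshold: float = 0.5) -> str:
    """Map the clonal-LOH probability to the reported status string."""
    if not 0.0 <= p_clonal <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    suffix = "LOH Positive" if p_clonal > threshold else "LOH Partial"
    return f"{target_allele} {suffix}"
```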
- FIG. 18 illustrates three plots.
- The top plot shows read coverage (number of reads, y-axis) for two different alleles, B*44:02 (red data points) and B*07:02 (blue data points), as a function of nucleotide position for a normal sample.
- The middle plot shows the BAF of the two alleles as a function of nucleotide position for the normal sample.
- The bottom plot shows the log ratio of read coverage in the tumor sample to read coverage in the normal sample as a function of nucleotide position.
- FIG. 19 illustrates three plots.
- The top plot shows read coverage (number of reads, y-axis) for the same two alleles, B*44:02 (red data points) and B*07:02 (blue data points), as a function of nucleotide position for a tumor sample.
- The middle plot shows the BAF of the two alleles as a function of nucleotide position for the tumor sample.
- The bottom plot shows the difference between the log ratios of the two alleles as a function of nucleotide position and illustrates a partial LOH classification example.
- FIG. 20 illustrates four plots. The two top plots show read coverage as a function of nucleotide position for the normal and tumor samples, respectively.
- One bottom plot shows the log ratio of read coverage in the tumor sample to read coverage in the normal sample as a function of nucleotide position.
- The other bottom plot shows the difference between the log ratios of the two alleles as a function of nucleotide position and illustrates a clonal loss classification example.
- In each allele plot, a gray line represents the read coverage after a smoothing filter (in these examples, a Savitzky-Golay filter) was applied. Smoothing the read coverage reduces noise in the downstream coverage features.
- The two-layer clonal LOH determination process 500 can be used with any number of cancer types to provide decision support for identifying targeted therapies.
- For example, the HLA-LOH determination may be performed for patients having a colorectal cancer diagnosis.
- The process 500 may be focused on an HLA-LOH determination for one specific HLA-A allele (HLA-A*02:01) and may be used as a companion diagnostic, for example, to a chimeric antigen receptor (CAR) T-cell therapy or another therapy indicated for treatment of tumors having HLA-LOH (including HLA-LOH of a specific HLA allele).
- In an example, the CAR therapy is targeted to the tumor-specific antigen CEA, a well-known tumor-selective antigen that is highly expressed in all colorectal cancers and a subset of other epithelial neoplasms.
- The CAR may further comprise a synthetic AND/NOT logic gate system that responds to two antigens in the body.
- For example, the CAR includes an activating receptor (for example, an A module) that can bind to CEA and a blocking receptor (for example, a B module) that binds to an HLA allele and blocks T-cell activation.
- In an example, the HLA allele is the HLA-A*02 allele.
- Identifying a sample as having no LOH, clonal LOH, or partial LOH for the HLA-A*02:01 allele can be used to determine whether to use CAR therapy or, as may be the case with partial LOH, a combination therapy that combines a therapy directed at a subpopulation of cancer cells having HLA LOH with another therapy directed at a subpopulation of cancer cells without HLA LOH.
- Routines, subroutines, applications, or instructions may constitute either software (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware.
- In hardware, such routines are tangible units capable of performing certain operations and may be configured or arranged in a certain manner.
- In example embodiments, one or more computer systems (e.g., a standalone, client, or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
- A hardware module may be implemented mechanically or electronically.
- For example, a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a microcontroller, field programmable gate array (FPGA), or application-specific integrated circuit (ASIC)) to perform certain operations.
- A hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
- Accordingly, the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein.
- Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a processor configured using software, the processor may be configured as respective different hardware modules at different times.
- Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
- Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connects the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
- The various operations of the example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations.
- Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions.
- The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.
- Similarly, the methods or routines described herein may be at least partially processor-implemented. For example, at least some of the operations of a method can be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but also deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment, or a server farm), while in other embodiments the processors may be distributed across a number of locations.
- In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
- As used herein, any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment.
- The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
- Some embodiments may be described using the expressions “coupled” and “connected” along with their derivatives.
- For example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact.
- The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other but still co-operate or interact with each other.
- The embodiments are not limited in this context.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA3219608A CA3219608A1 (en) | 2021-06-28 | 2021-07-16 | Detection of human leukocyte antigen loss of heterozygosity |
EP21948675.0A EP4363616A1 (en) | 2021-06-28 | 2021-07-16 | Detection of human leukocyte antigen loss of heterozygosity |
AU2021454223A AU2021454223A1 (en) | 2021-06-28 | 2021-07-16 | Detection of human leukocyte antigen loss of heterozygosity |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/304,940 US11475978B2 (en) | 2019-02-12 | 2021-06-28 | Detection of human leukocyte antigen loss of heterozygosity |
US17/304,940 | 2021-06-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023277932A1 true WO2023277932A1 (en) | 2023-01-05 |
Family ID: 84690558
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2021/042039 WO2023277932A1 (en) | 2021-06-28 | 2021-07-16 | Detection of human leukocyte antigen loss of heterozygosity |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP4363616A1 (en) |
AU (1) | AU2021454223A1 (en) |
CA (1) | CA3219608A1 (en) |
WO (1) | WO2023277932A1 (en) |
2021
- 2021-07-16: AU application AU2021454223A (AU2021454223A1), active, pending
- 2021-07-16: EP application EP21948675.0A (EP4363616A1), active, pending
- 2021-07-16: CA application CA3219608A (CA3219608A1), active, pending
- 2021-07-16: PCT application PCT/US2021/042039 (WO2023277932A1), active application filing
Also Published As
Publication number | Publication date |
---|---|
EP4363616A1 (en) | 2024-05-08 |
CA3219608A1 (en) | 2023-01-08 |
AU2021454223A1 (en) | 2024-01-18 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21948675; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 3219608; Country of ref document: CA |
WWE | WIPO information: entry into national phase | Ref document number: 2021454223 (AU2021454223); Country of ref document: AU |
ENP | Entry into the national phase | Ref document number: 2021454223; Country of ref document: AU; Date of ref document: 20210716; Kind code of ref document: A |
WWE | WIPO information: entry into national phase | Ref document number: 2021948675; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: 2021948675; Country of ref document: EP; Effective date: 20240129 |