CN113316818A - Method for identifying neoantigens
- Publication number
- CN113316818A, CN202080008090.2A
- Authority
- CN
- China
- Prior art keywords
- tumor
- sequencing
- gene
- mutation
- somatic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The present invention relates to the field of tumor immunotherapy. In particular, the invention provides methods and devices for identifying tumor-specific neoantigens in a patient. The neoantigens identified by the methods or devices of the invention can be used to develop vaccines or T cell therapies against the tumor.
Description
The present invention relates to the field of tumor immunotherapy. In particular, the invention provides methods and devices for identifying tumor-specific neoantigens in a patient. The neoantigens identified by the methods or devices of the invention can be used to develop vaccines or T cell therapies against the tumor.
Background
Cancer is characterized by abnormal cell proliferation. The success of conventional therapy depends on the type of cancer and the stage at which it is detected. Many treatments involve expensive and painful surgery and chemotherapy and are often unsuccessful, or only moderately prolong the life of the patient. Promising therapies being developed include tumor vaccines or T cell therapies targeting tumor antigens, which enable the patient's immune system to distinguish between tumor and healthy cells and elicit the patient's immune response.
Neoantigens are a class of immunogens associated with mutations that are specific to both the patient and the tumor. They have shown considerable promise as targets for anti-tumor immunization technologies, such as personalized tumor vaccines.
Although strategies exist for identifying candidate neoantigens by sequencing and HLA typing, they suffer from drawbacks such as a high false-positive rate and a narrow applicable population, which severely limit the development of neoantigen-based anti-tumor vaccines. Thus, there remains a need in the art for new methods for identifying neoantigens.
Brief description of the invention
In one aspect, the present invention provides a method of identifying a tumor neoantigen in a subject, the method comprising the steps of:
(a) analyzing the whole exome sequencing results of the tumor tissue or cells and the normal tissue or cells of the subject to identify tumor-specific somatic mutations;
(b) analyzing the transcriptome sequencing results of the tumor tissue or cells of the subject to further screen the somatic mutations identified in step (a);
(c) analyzing the whole exome sequencing results of the normal tissue or cells of the subject to perform HLA typing of the subject;
(d) analyzing the binding of the mutant peptide corresponding to the somatic mutation to MHC based on the results of steps (b) and (c), thereby screening candidate tumor-specific neoantigens.
In another aspect, the present invention provides a device for identifying a tumor neoantigen in a subject, the device comprising: a memory for storing a program; a processor for implementing the method for identifying a tumor neoantigen in a subject of the present invention by executing the program stored in the memory.
In another aspect, the present invention provides a computer-readable storage medium comprising a program executable by a processor to perform the method of the present invention for identifying a tumor neoantigen in a subject.
In another aspect, the present invention provides a device for identifying tumor neoantigens in a subject, the device comprising the following four modules: a somatic mutation identification module (I) for identifying tumor-specific somatic mutations based on the whole exome sequencing results of the tumor tissue or cells and normal tissue or cells of the subject; a tumor-specific somatic mutation screening module (II) for further screening the tumor-specific somatic mutations based on the transcriptome sequencing results of the tumor tissue or cells of the subject; an HLA typing module (III) for HLA typing based on the whole exome sequencing results of the normal tissue or cells of the subject; and a tumor neoantigen prediction module (IV).
In another aspect, the invention provides a tumor neoantigen identified according to the method or device of the invention.
In another aspect, the invention provides a pharmaceutical composition comprising a tumor neoantigen identified according to the method or device of the invention, and a pharmaceutically acceptable carrier.
In another aspect, the invention also provides the use of a tumor neoantigen identified according to the method or device of the invention or a pharmaceutical composition of the invention in the manufacture of a medicament for the treatment and/or prevention of cancer.
In another aspect, the present invention provides a method of treating cancer in a subject, the method comprising:
a) identifying at least one tumor neoantigen of the subject by the method or device of the invention;
b) producing the at least one tumor neoantigen identified in step a); and
c) administering to said subject said at least one tumor neoantigen produced in step b).
FIG. 1 is a flow chart showing the method of identifying neoantigens according to the present invention.
FIG. 2 shows the candidate neoantigens of H22 cells. Neoantigens with RPKM > 0 in H22 cells are shown, with the red line representing the threshold RPKM = 1. Neoantigens with RPKM ≥ 1 were selected as candidates.
Figure 3 shows the animal pharmacodynamics experimental protocol.
Figure 4 shows the grouping of H22 subcutaneous tumor-bearing mice. On day 5 of subcutaneous tumor growth, tumor size was measured with a vernier caliper and the mice were grouped after volume calculation. A total of 6 groups were set: ctrl, poly I:C, SLPs, anti-PD1, poly I:C + anti-PD1, and SLPs + anti-PD1, with 6 animals per group at the start. SLPs are synthesized 25-amino-acid long peptides of the H22 neoantigens, and poly I:C is the adjuvant. Tumor volume calculation formula: $V_{\text{tumor}} = (L_{\text{long}} \times L_{\text{short}}^{2}) \times 1/2$, where $L_{\text{long}}$ and $L_{\text{short}}$ are the long and short diameters of the tumor.
FIG. 5 shows the tumor growth curves of H22 subcutaneous tumor-bearing mice. Tumors were first measured 5 days after tumor inoculation and every 3 days thereafter. After data collection was complete, a tumor growth curve was plotted for each individual mouse. During the experiment, when the tumor volume of a mouse reached 2000-3000 mm³, the experiment was stopped for that mouse and the mouse was sacrificed.
Figure 6 shows pictures of H22 subcutaneous tumor-bearing mice. Mice were sacrificed and photographed 26 days after tumor inoculation. Tumor-bearing pictures of each mouse of each group are shown.
FIG. 7 shows the design of the stacked ASPs corresponding to each SLP. For each SLP, 4 stacked assay peptides (ASPs) of 15 amino acids each were designed. The ASP design is illustrated with H22 SLP1 (Bard1) as an example, with the mutated amino acid marked in red.
FIG. 8 shows the results of the IFN-γ ELISPOT assay of mouse splenocytes. The ASPs of SLP1-17 were used to stimulate mouse splenocytes, and IFN-γ secretion by the splenocytes was measured in vitro by the ELISPOT method. Each dot in the figure represents the number of IFN-γ-secreting cells per 5 × 10^5 splenocytes of one mouse. Each dot under "All SLPs" represents the total number of IFN-γ-secreting cells per 5 × 10^5 splenocytes of one mouse that responded to all SLPs. The groups on the abscissa, from left to right, are ctrl, poly I:C, SLPs, anti-PD1, poly I:C + anti-PD1, and SLPs + anti-PD1.
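The tumor volume formula given with Figure 4 and the stopping rule given with FIG. 5 can be checked with a short calculation. The following is only a sketch: it takes the formula as V = (L_long × L_short²) × 1/2, assumes caliper readings in millimetres, and uses the lower bound of the 2000-3000 mm³ range as the stopping threshold. The function names and the example measurements are hypothetical and are not data from the experiments.

```python
def tumor_volume(long_diameter_mm: float, short_diameter_mm: float) -> float:
    """Tumor volume in mm^3 from V = (L_long * L_short^2) * 1/2."""
    if long_diameter_mm < 0 or short_diameter_mm < 0:
        raise ValueError("diameters must be non-negative")
    return long_diameter_mm * short_diameter_mm ** 2 / 2


def reached_endpoint(volume_mm3: float, threshold_mm3: float = 2000.0) -> bool:
    """True once the volume reaches the (assumed) lower bound of the stop range."""
    return volume_mm3 >= threshold_mm3


# Hypothetical caliper reading: 20 mm long diameter, 15 mm short diameter.
volume = tumor_volume(20.0, 15.0)
print(volume)                    # 2250.0
print(reached_endpoint(volume))  # True
```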
Detailed Description
In one aspect, the present invention provides a method of identifying a tumor neoantigen in a subject, the method comprising the steps of:
(a) analyzing the whole exome sequencing results of the tumor tissue or cells and the normal tissue or cells of the subject to identify tumor-specific somatic mutations;
(b) analyzing the transcriptome sequencing results of the tumor tissue or cells of the subject to further screen the somatic mutations identified in step (a);
(c) analyzing the whole exome sequencing results of the normal tissue or cells of the subject to perform HLA typing of the subject;
(d) analyzing the binding of the mutant peptide corresponding to the somatic mutation to MHC based on the results of steps (b) and (c), thereby screening candidate tumor-specific neoantigens.
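Steps (a) to (d) can be read as a chain of filters. The following is a minimal sketch of that reading, not the implementation of the invention. It assumes that variant calling, annotation, expression quantification, HLA typing and MHC binding prediction have already been run by external tools, and that their outputs are available as plain Python collections. The record fields, the gene and peptide names, the RPKM cutoff of 1 (borrowed from FIG. 2) and the 500 nM affinity cutoff (a common convention, assumed here) are all illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


@dataclass(frozen=True)
class Candidate:
    variant: Variant
    gene: str
    mutant_peptide: str


def identify_neoantigens(
    tumor_variants: Set[Variant],
    normal_variants: Set[Variant],
    annotated: Iterable[Candidate],
    rpkm_by_gene: Dict[str, float],
    hla_alleles: List[str],
    affinity_nm: Dict[Tuple[str, str], float],
    min_rpkm: float = 1.0,            # assumed expression cutoff
    max_affinity_nm: float = 500.0,   # assumed binding cutoff
) -> List[Candidate]:
    # Step (a): variants present in the tumor but absent from matched normal.
    somatic = tumor_variants - normal_variants
    kept = []
    for cand in annotated:
        if cand.variant not in somatic:
            continue
        # Step (b): drop mutations in genes not expressed in the tumor.
        if rpkm_by_gene.get(cand.gene, 0.0) < min_rpkm:
            continue
        # Step (d): best predicted affinity over the alleles typed in step (c).
        best = min(
            (affinity_nm.get((cand.mutant_peptide, allele), float("inf"))
             for allele in hla_alleles),
            default=float("inf"),
        )
        if best <= max_affinity_nm:
            kept.append(cand)
    return kept


# Hypothetical inputs for illustration only.
v1, v2, v3, v4 = (("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"),
                  ("chr3", 300, "C", "T"), ("chr4", 400, "T", "G"))
result = identify_neoantigens(
    tumor_variants={v1, v2, v3, v4},
    normal_variants={v3},  # also seen in normal tissue, so not somatic
    annotated=[
        Candidate(v1, "GENE1", "AAAAWAAAA"),
        Candidate(v2, "GENE2", "CCCCYCCCC"),
        Candidate(v3, "GENE3", "DDDDFDDDD"),
        Candidate(v4, "GENE4", "EEEEHEEEE"),
    ],
    rpkm_by_gene={"GENE1": 5.2, "GENE2": 0.3, "GENE3": 8.0, "GENE4": 3.1},
    hla_alleles=["HLA-A*02:01"],
    affinity_nm={("AAAAWAAAA", "HLA-A*02:01"): 120.0,
                 ("CCCCYCCCC", "HLA-A*02:01"): 50.0,
                 ("EEEEHEEEE", "HLA-A*02:01"): 5000.0},
)
print([c.gene for c in result])  # ['GENE1']
```

In this toy input, GENE3 is removed in step (a), GENE2 in step (b), and GENE4 in step (d), which leaves GENE1 as the only candidate.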
In some embodiments of this aspect of the invention, the sequencing is high-throughput sequencing, also known as next-generation sequencing ("NGS") or second-generation sequencing. NGS produces thousands to millions of sequences simultaneously in a parallel sequencing process. It is distinguished from "Sanger sequencing" (first-generation sequencing), which is based on electrophoretic separation of chain-termination products in a single sequencing reaction. Sequencing platforms that can be used for the NGS of the present invention are commercially available and include, but are not limited to, Roche/454 FLX, Illumina/Solexa Genome Analyzer, and the Applied Biosystems SOLiD system, among others.
Exome sequencing is a genome analysis method in which the DNA of the exonic regions of the whole genome is captured and enriched by sequence capture technology and then subjected to high-throughput sequencing. Because it is highly sensitive to both common and rare variants, sequencing only 2% of the genome is sufficient to discover most disease-related variants in exon regions.
Transcriptome sequencing uses a second-generation sequencing platform to obtain almost all transcripts and gene sequences of a specific cell or tissue of a given species in a given state. It can be used to study gene expression levels, gene function, gene structure, alternative splicing, the prediction of new transcripts, and the like.
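The expression measure used for the threshold in FIG. 2 is RPKM (reads per kilobase of transcript per million mapped reads). This passage does not state how it is computed, so the sketch below assumes the standard definition, RPKM = 10⁹ × C / (N × L), where C is the number of reads mapped to the gene, N the total number of mapped reads, and L the gene length in base pairs. The function name and the example counts are hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000 bp gene in a 20-million-read library.
value = rpkm(500, 2_000, 20_000_000)
print(value)         # 12.5
print(value >= 1.0)  # True: passes an RPKM >= 1 cutoff
```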
The normal tissue or cell may be any non-neoplastic tissue or cell, such as peripheral blood (for non-hematologic cancers) or a tissue adjacent to a cancer, preferably peripheral blood.
The tumor tissue or cells include, but are not limited to, the following tumor tissues or cells: liver cancer, lung cancer, ovarian cancer, colon cancer, rectal cancer, melanoma, kidney cancer, bladder cancer, prostate cancer, breast cancer, lymphoma, hematological malignancies, head and neck cancer, glioma, stomach cancer, nasopharyngeal cancer, laryngeal cancer, pancreatic cancer, cervical cancer, esophageal cancer, small intestine cancer, chronic or acute leukemia, and osteosarcoma.
"Somatic mutation" refers to a mutation that occurs in a somatic cell of an organism, i.e., in a cell other than a germ cell. Somatic mutations are not passed on to offspring, but may alter the phenotype of the organism in which they occur, for example by giving rise to a tumor. Somatic mutations generally refer to nucleotide mutations in a DNA sequence. However, as will be understood by those skilled in the art, the term may also refer to the corresponding amino acid mutations in a particular context.
As used herein, the term "antigen" refers to a substance, such as a polypeptide, that induces an immune response. As used herein, the term "neoantigen" is an antigen having at least one alteration that makes it different from the corresponding wild-type parent antigen, e.g., the alteration is a tumor-specific somatic mutation. As used herein, the term "tumor neoantigen" or "tumor-specific neoantigen" is a neoantigen that is present in a tumor cell or tissue of a subject but is substantially absent from a normal cell or tissue of the subject. The term "neoantigen" can be a full-length protein, or a portion thereof that comprises the alteration. For example, a "tumor neoantigen" can be a polypeptide (mutant peptide) comprising a tumor-specific somatic mutation, particularly a polypeptide that is immunogenic (e.g., comprises a T cell epitope), which is truncated from the full-length protein. The polypeptide may be about 8 to about 35 amino acids in length, e.g., 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 amino acids, or any range therebetween.
As used herein, "subject" means a mammal, including a rodent or primate, e.g., mouse, rat, monkey, human. Preferably the subject is a human.
As used herein, "MHC" refers to the major histocompatibility complex. Human MHC is also called HLA (human leukocyte antigen). It will be understood by those skilled in the art that the invention is not limited to humans; when applied to other species, "HLA typing" as used herein refers, in essence, to MHC typing.
As mentioned above, the method of identifying a tumor neoantigen of the present invention comprises four main steps, of which the first (step (a)) is aimed at the accurate analysis of tumor-specific somatic mutations in tumor tissue.
By analyzing paired normal and tumor tissues or cells, mutations that are found only in the tumor tissue, or whose frequency is significantly higher in the tumor tissue than in the normal tissue of the same individual, are considered tumor-specific somatic mutations. In general, tumor genomes are highly dynamic, change as the disease progresses, and are highly heterogeneous. In tumor genome sequencing, the tumor cell purity of many samples does not reach 80%, and for some samples it is even lower. These factors make tumor-specific somatic mutations difficult to detect accurately.
Various algorithms for detecting tumor somatic mutations based on different principles have been published, including but not limited to the following. 1) Strelka uses a novel Bayesian approach that treats the allele frequencies of the tumor and the adjacent normal tissue as continuous values: the normal tissue is modeled as a mixture of germline variation and noise, and the tumor tissue as a mixture of the normal sample and somatic variation. Strelka can therefore maintain high sensitivity even for impure samples. Strelka first searches for candidate indels for subsequent realignment (indel realignment); it then computes somatic variant probabilities from the realigned information and applies a series of filters to obtain reliable somatic mutation calls. 2) MuTect2 is based on the GATK HaplotypeCaller module and identifies regions with clear evidence of variation, called ActiveRegions, for further analysis. The algorithm then builds a De Bruijn-like graph to reassemble each ActiveRegion, detects the haplotypes that may be present, and realigns them using the Smith-Waterman algorithm. Using the PairHMM algorithm, each read in the ActiveRegion is aligned pairwise against each haplotype to generate a haplotype likelihood matrix. This matrix is then transformed into allele likelihoods for each candidate variant position, from which the probability of a somatic mutation at each position is inferred. 3) Sentieon TNhaplotyper follows the same detection principle as MuTect2: the tumor sample and the matched adjacent normal sample are co-realigned, and somatic SNVs and indels are then called on the resulting BAM file using the Sentieon TNhaplotyper module. However, each of the above methods suffers from a high false positive rate and poor accuracy.
Aiming at the defects of the existing tumor somatic mutation detection, the inventor constructs a set of analysis flow and strategy which can effectively reduce the false positive rate of detection and improve the accuracy of somatic mutation detection.
First, the inventors selected a plurality of methods based on different principles, used each of them to detect somatic mutations in the tumor tissue from the high-throughput sequencing data, and then took the intersection of the independently obtained somatic mutation calls, thereby greatly reducing the false positive rate of detection. The methods for detecting somatic mutations include, but are not limited to, Strelka1 (see https://academic.oup.com/bioinformatics/article/28/14/1811/218573), Strelka2 (see https://www.nature.com/articles/s41592-018-0051-x), VarScan (see http://varscan.sourceforge.net), Mutect2 (see http://www.broadinstitute.org/cancer/cga/mut), and/or MuSE (see https://bioinformatics.mdanderson.org/main/MuSE). Other methods of detecting somatic mutations known in the art may also be applied to the present invention.
In some embodiments of the methods of the invention, step (a) identifies tumor-specific somatic mutations from the whole exome sequencing results independently by each of at least 3, at least 4, or at least 5 different methods, e.g., 3, 4, 5, 6, 7, 8, 9, or 10 or more different methods.
In some embodiments, step (a) identifies tumor-specific somatic mutations from the whole exome sequencing results independently by each of at least 3 different methods, and selects the tumor-specific somatic mutations that were identified by all of the at least 3 different methods. For example, the at least 3 different methods are selected from the group consisting of Strelka1, Strelka2, VarScan, Mutect2, and MuSE.
In some preferred embodiments, the tumor-specific somatic mutations are identified using at least 5 different methods, e.g., the at least 5 different methods include Strelka1, Strelka2, VarScan, Mutect2, and MuSE. However, other methods known in the art for detecting somatic mutations may be further included.
In addition, the parameters of each method can be adjusted as needed to raise the detection threshold, thereby further reducing the false positive rate of detection.
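The intersection strategy described above can be sketched as follows. This is a minimal illustration, not the inventors' implementation: it assumes each caller's output has already been parsed into a set of (chromosome, position, reference allele, alternate allele) tuples, and the function name `consensus_calls` and the example variants are hypothetical.

```python
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(calls_by_method: Dict[str, Set[Variant]],
                    min_methods: int = 3) -> Set[Variant]:
    """Return only the variants reported by every one of the supplied methods."""
    if len(calls_by_method) < max(min_methods, 1):
        raise ValueError(f"at least {min_methods} independent methods are required")
    return set.intersection(*calls_by_method.values())


# Hypothetical calls from three callers (illustrative data only).
example_calls = {
    "Strelka2": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "Mutect2": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "A")},
    "MuSE": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
}
consensus = consensus_calls(example_calls)
print(consensus)
```

Only the variant reported by all three hypothetical callers survives; variants reported by one or two callers are discarded, which is what drives the false positive rate down at some cost in sensitivity.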
More importantly, the present inventors have surprisingly found that tumor-specific somatic mutations can be identified more accurately by further filtering the results against a specific set of filtering criteria. Thus, in some embodiments, step (a) further screens for somatic mutations that meet the following criteria:
1) the sequencing depth of the tumor tissue or cell and of the normal tissue or cell is greater than or equal to 10;
2) in the sequencing data of the tumor tissue or cell, the number of reads comprising the mutation is greater than or equal to 3;
3) in the sequencing data of the tumor tissue or cell, the allele frequency of the mutation is greater than 0.1;
4) in the sequencing data of the normal tissue or cell, the allele frequency of the mutation is less than or equal to 0.01; and
5) the allele frequency of the mutation is less than 0.01 in the whole exome sequencing results of normal tissues or cells from at least 100, at least 200, at least 300 or more, for example 200 to 300, normal subjects.
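The five criteria above can be expressed as a single predicate. The following is a sketch under the assumption that the per-mutation statistics have already been computed; the class name `CandidateMutation` and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CandidateMutation:
    tumor_depth: int        # sequencing depth at the site in the tumor sample
    normal_depth: int       # sequencing depth at the site in the normal sample
    tumor_alt_reads: int    # tumor reads that contain the mutation
    tumor_af: float         # allele frequency of the mutation in the tumor sample
    normal_af: float        # allele frequency of the mutation in the matched normal sample
    panel_af: float         # allele frequency in the panel of normal-subject exomes


def passes_filters(m: CandidateMutation) -> bool:
    """Apply criteria 1) to 5) exactly as listed in the text."""
    return (
        m.tumor_depth >= 10 and m.normal_depth >= 10   # criterion 1
        and m.tumor_alt_reads >= 3                      # criterion 2
        and m.tumor_af > 0.1                            # criterion 3
        and m.normal_af <= 0.01                         # criterion 4
        and m.panel_af < 0.01                           # criterion 5
    )


# Illustrative values only.
kept = CandidateMutation(60, 45, 12, 0.20, 0.0, 0.0)
dropped = CandidateMutation(60, 45, 12, 0.20, 0.05, 0.0)  # present in the matched normal
print(passes_filters(kept), passes_filters(dropped))
```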
As used herein, "sequencing depth" refers to the ratio of the total number of bases obtained by sequencing to the size (number of bases) of the genome or target region being sequenced. For example, if a target region 1000 bp in length is sequenced to give a total of 200 reads, each 50 bp in length, the sequencing depth is 200 × 50 bp / 1000 bp = 10.
As used herein, "allele frequency" refers to the proportion of a particular variant among all alleles at that variant site in a sample. For example, in the sequencing data of a sample, the ratio of the number of reads that contain a particular variant to the total number of reads covering that site is the allele frequency of the variant.
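The two definitions above reduce to simple ratios. The sketch below reproduces the worked depth example from the text; the function names and the allele frequency example values are illustrative assumptions.

```python
def sequencing_depth(n_reads: int, read_length_bp: int, target_length_bp: int) -> float:
    """Total sequenced bases divided by the size of the target region."""
    return n_reads * read_length_bp / target_length_bp


def allele_frequency(variant_reads: int, total_reads_at_site: int) -> float:
    """Fraction of the reads covering a site that carry the variant."""
    if total_reads_at_site == 0:
        raise ValueError("site has no coverage")
    return variant_reads / total_reads_at_site


depth = sequencing_depth(n_reads=200, read_length_bp=50, target_length_bp=1000)
af = allele_frequency(variant_reads=12, total_reads_at_site=60)  # illustrative counts
print(depth, af)
```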
The second step (step (b)) of the method for identifying a tumor neoantigen of the present invention is to further screen the candidate somatic mutation sites using information such as the gene expression level and the predicted effect of the mutation on gene function.
In this step, each somatic mutation obtained in the first step is annotated, based on the NCBI human genome annotation information database, at the gene structure level and at the mutation function level (i.e., the level at which the mutation affects the coding function of the gene).
In the NCBI annotation database, the annotations of mutation sites at the gene structure level include: exonic, splicing, ncRNA, UTR5/UTR3, intron, upstream/downstream, intergenic, and unknown. In some embodiments of the methods of the invention, the screening priority order is: exonic > splicing > ncRNA > UTR5/UTR3 > intron > upstream/downstream > intergenic > unknown.
In the NCBI annotation database, the annotations describing how a mutation site affects the coding function of the gene include: stopgain, stoploss, nonsynonymous SNV, synonymous SNV, and unknown. In some embodiments of the methods of the invention, the screening priority order is: stopgain > stoploss > nonsynonymous SNV > synonymous SNV > unknown.
In some preferred embodiments, somatic mutations are selected whose gene-structure-level annotation is exonic and whose coding-function-level annotation is nonsynonymous SNV (non-synonymous single nucleotide variant).
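The preferred selection described above amounts to a filter on two annotation fields. The sketch below assumes each mutation is a dictionary with hypothetical keys `region` (gene structure level) and `effect` (coding function level); these keys are placeholders and do not correspond to the column names of any particular annotation tool.

```python
from typing import Dict, List


def select_coding_changes(annotated: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep mutations annotated as exonic and as nonsynonymous SNV."""
    return [
        m for m in annotated
        if m["region"] == "exonic" and m["effect"] == "nonsynonymous SNV"
    ]


# Illustrative annotations only.
annotated_mutations = [
    {"id": "m1", "region": "exonic", "effect": "nonsynonymous SNV"},
    {"id": "m2", "region": "exonic", "effect": "synonymous SNV"},
    {"id": "m3", "region": "intron", "effect": "unknown"},
]
selected = select_coding_changes(annotated_mutations)
print([m["id"] for m in selected])
```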
In addition, based on transcriptome sequencing data of the tumor tissue or cells, the expression levels of all of the approximately 30,000 protein-coding genes annotated in the NCBI human genome annotation information database can be determined. Thus, this step may also include selecting somatic mutations based on gene expression levels.
In some embodiments, somatic mutations are selected that are located within a highly expressed gene; for example, the highly expressed gene has an RPKM (Reads Per Kilobase per Million mapped reads) greater than or equal to 1. RPKM is the number of reads mapped to the gene (exons) divided by the product of the total number of reads mapped to the genome (in millions) and the length of the gene (exons) (in kb).
Through the above steps, tumor-specific somatic mutations that are located within highly expressed genes and that alter the amino acid sequence can be identified. Thus, in some embodiments, the somatic mutation of the present invention is a mutation located in the protein coding sequence of a highly expressed gene, and which results in an amino acid mutation.
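The RPKM definition and the expression threshold used above can be sketched as follows. The function names and the read counts in the example are illustrative assumptions; only the formula and the threshold of 1 come from the text.

```python
def rpkm(gene_reads: int, total_mapped_reads: int, gene_length_bp: int) -> float:
    """Reads Per Kilobase per Million mapped reads.

    gene_reads / ((total_mapped_reads / 1e6) * (gene_length_bp / 1e3))
    """
    if total_mapped_reads <= 0 or gene_length_bp <= 0:
        raise ValueError("total_mapped_reads and gene_length_bp must be positive")
    return gene_reads * 1e9 / (total_mapped_reads * gene_length_bp)


def is_highly_expressed(value: float, threshold: float = 1.0) -> bool:
    """A gene counts as highly expressed when its RPKM is >= the threshold."""
    return value >= threshold


# Illustrative numbers: 500 reads on a 2,500 bp gene, 20 million mapped reads.
example_rpkm = rpkm(gene_reads=500, total_mapped_reads=20_000_000, gene_length_bp=2_500)
print(example_rpkm, is_highly_expressed(example_rpkm))
```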
In addition, based on transcriptome sequencing data of tumor tissues or cells, the expression level of HLA gene, CD4 gene and/or CD8 gene of the subject can also be evaluated to determine whether the subject is suitable for immunotherapy with tumor neoantigens.
Thus, in some embodiments, step b) further comprises assessing the expression level of an HLA gene, a CD4 gene and/or a CD8 gene in the subject.
The third step (step (c)) of the method of identifying a tumor neoantigen of the present invention is to HLA-type the subject based on the whole exome sequencing results of normal tissues or cells of the subject.
HLA typing remains a challenge in medicine today. Clinically, the current gold-standard method for HLA typing recommended by the World Health Organization (WHO) is PCR-SBT, but this method suffers from ambiguous (non-unique) typing results, low resolution (4 digits), long turnaround time (15 to 20 days), high cost (2000 yuan per sample), and similar problems.
In the present invention, HLA typing is performed using exome sequencing data from normal tissue or cells (e.g., peripheral blood) of the subject. Information on every allele at all currently known HLA class I/II loci is integrated, and the exome sequencing data are aligned and analyzed with high precision at two levels, the amino acid sequence and the nucleotide sequence. As a result, typing of the HLA class I/II loci achieves a resolution of 6 digits or more (2 × 3), the analysis takes no more than 3 hours, and the accuracy exceeds 98% (concordance with the results of the "gold standard" PCR-SBT technique).
In some embodiments, at least one or more, preferably all, of the following tools are used for HLA typing in step (c): ATHLATES (http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/athlates), HLA-HD (https://www.genome.med.kyoto-u.ac.jp/HLA-HD/), HLAVBseq (http://nagasakilab.csml.org/HLA), seq2HLA (http://bitbucket.org/sebastian_boegel/seq2HLA), and HLAminer (http://www.bcgsc.ca/platform/bioinfo/software/hlaminer).
The fourth step (step (d)) of the method for identifying a tumor neoantigen of the present invention is to predict, based on the analysis results of the preceding three steps, tumor neoantigens for the specific HLA type from the selected tumor-specific somatic mutations that are located in highly expressed genes and alter the amino acid sequence.
In some embodiments, step (d) comprises:
d1) extracting an amino acid sequence corresponding to the somatic mutation, thereby obtaining a mutant peptide corresponding to the somatic mutation;
d2) based on the HLA typing results of step (c), scoring and ranking the extracted mutant peptides independently by MHC binding affinity, MHC binding stability, proteasome digestion, mass spectrometry data, respectively; and
d3) based on the results of step d2), candidate tumor neoantigens are selected by scoring and ranking the mutant peptides by geometric mean.
As used herein, an "amino acid sequence or mutant peptide" corresponding to the somatic mutation refers to an amino acid sequence or peptide comprising the amino acid mutation resulting from the somatic mutation, which is encoded by a nucleotide sequence in the genome of the subject comprising the somatic mutation.
In some embodiments, an amino acid sequence of about 8 to 35 amino acids, preferably about 15 to 27 amino acids, corresponding to said somatic mutation is extracted in d1). For example, for each tumor-specific somatic mutation identified through the preceding steps, a series of mutant peptides of about 8 to about 35 amino acids in length corresponding to the somatic mutation can be obtained by extracting, from the amino acid sequence of the protein encoded by the nucleotide sequence in the genome of the subject that includes the somatic mutation, the entire amino acid sequence centered on the corresponding mutant amino acid (i.e., the mutant amino acid resulting from the somatic mutation) and extending forward and/or backward by about 7 to about 17 amino acids. Preferably, for each tumor-specific somatic mutation identified by the preceding steps, the entire amino acid sequence centered on the corresponding mutated amino acid and extending forward and backward by about 7 to about 13 amino acids can be extracted, thereby obtaining a series of mutant peptides of about 15 to about 27 amino acids in length corresponding to the somatic mutation.
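The window extraction described above can be sketched for the simplest case, a single amino acid substitution. This is an illustration under stated assumptions: the function name `mutant_peptide`, the toy protein sequence, and the choice to truncate the window at the protein termini are all hypothetical; a flank of 12 residues on each side gives a 25-residue peptide.

```python
def mutant_peptide(protein: str, position: int, mutant_aa: str, flank: int = 12) -> str:
    """Return the peptide centered on a substituted residue.

    protein   -- wild-type amino acid sequence (one-letter codes)
    position  -- 1-based position of the substituted residue
    mutant_aa -- amino acid introduced by the somatic mutation
    flank     -- residues kept on each side (the window is truncated at the termini)
    """
    if not 1 <= position <= len(protein):
        raise ValueError("position is outside the protein sequence")
    index = position - 1
    mutated = protein[:index] + mutant_aa + protein[index + 1:]
    start = max(0, index - flank)
    end = min(len(mutated), index + flank + 1)
    return mutated[start:end]


# Toy 20-residue sequence, used only to make the indexing visible.
toy_protein = "ACDEFGHIKLMNPQRSTVWY"
central = mutant_peptide(toy_protein, position=10, mutant_aa="R", flank=3)
near_terminus = mutant_peptide(toy_protein, position=2, mutant_aa="R", flank=3)
print(central, near_terminus)
```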
The obtained mutant peptides are then scored and ranked independently for their likelihood of being candidate neoantigens, for the corresponding HLA type determined by the foregoing steps, from the perspective of MHC binding affinity, MHC binding stability, proteasome cleavage (i.e., whether the mutant peptide can be produced by proteasome cleavage), and mass spectrometry data, respectively.
In some embodiments, the extracted mutant peptides are scored and ranked in step d2) using one or more methods/tools selected from the group consisting of NetMHCcons (http://www.cbs.dtu.dk/services/NetMHCcons), NetMHC (http://www.cbs.dtu.dk/services/NetMHC), NetMHCpan (http://www.cbs.dtu.dk/services/NetMHCpan), PickPocket (http://www.cbs.dtu.dk/services/PickPocket), MHCflurry (https://www.sciencedirect.com/science/article/pii/S2405471218302321?dgcid=rss_sd_all), NetMHCstab (http://www.cbs.dtu.dk/services/NetMHCstab-1.0), and NetChop (www.cbs.dtu.dk/services/NetChop). For example, the binding affinity of a mutant peptide to a particular MHC can be analyzed using the NetMHCcons, NetMHC, NetMHCpan, and/or PickPocket tools; the NetMHCstab tool can be used to analyze the binding stability of mutant peptides to a specific MHC; MHCflurry can be used to predict binding of mutant peptides to MHC based on mass spectrometry data; and NetChop can be used to analyze the likelihood that a mutant peptide is generated by proteasome cleavage.
Finally, the mutant peptides are given an overall score and ranking by the geometric mean method, based on the prediction results from the different perspectives. For example, if for a particular mutant peptide the MHC binding affinity rank is 3, the MHC binding stability rank is 2, the proteasome cleavage rank is 2, and the mass spectrometry data rank is 4, then the geometric mean rank is $\sqrt[4]{3 \times 2 \times 2 \times 4} \approx 2.63$. The mutant peptides can be ranked according to the geometric mean score, and candidate tumor neoantigens selected therefrom.
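The geometric mean ranking can be sketched as follows, using the ranks 3, 2, 2 and 4 from the example in the text. The function names and the other two peptides are hypothetical, and the sketch assumes that a lower rank is better, so peptides are ordered by ascending geometric mean.

```python
import math
from typing import Dict, List, Sequence, Tuple


def geometric_mean(ranks: Sequence[float]) -> float:
    """n-th root of the product of n positive ranks."""
    if not ranks or any(r <= 0 for r in ranks):
        raise ValueError("ranks must be a non-empty sequence of positive numbers")
    return math.prod(ranks) ** (1.0 / len(ranks))


def rank_peptides(ranks_by_peptide: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Order peptides by ascending geometric mean of their per-criterion ranks."""
    scored = [(name, geometric_mean(r)) for name, r in ranks_by_peptide.items()]
    return sorted(scored, key=lambda item: item[1])


# Ranks are (affinity, stability, proteasome cleavage, mass spectrometry).
example_ranks = {
    "peptide_A": [3, 2, 2, 4],   # the example from the text
    "peptide_B": [1, 1, 3, 1],   # illustrative
    "peptide_C": [5, 4, 4, 5],   # illustrative
}
ordered = rank_peptides(example_ranks)
for name, score in ordered:
    print(name, round(score, 2))
```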
By the method, the tumor neoantigen can be identified with higher accuracy, and the false positive rate is obviously reduced.
Those skilled in the art will appreciate that all or part of the functions of the above-described method steps may be implemented by hardware, or may be implemented by a computer program. When all or part of the functions of the above method steps are implemented by means of a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
In a second aspect, the present invention provides a device for identifying a tumor neoantigen in a subject, the device comprising the following four modules: tumor specific somatic mutation identification module I); tumor specific somatic mutation screening module II); HLA typing module III); and tumor neoantigen prediction module IV).
Wherein the tumor specific somatic mutation identification module I) identifies tumor specific somatic mutations based on whole exome sequencing results of the tumor tissue or cells and normal tissue or cells of the subject.
In some embodiments, the tumor-specific somatic mutation identification module I) identifies a somatic mutation from the whole exome sequencing results by at least 3 different methods independently and respectively, and selects a somatic mutation that was identified in all of the at least 3 different methods. For example, the at least 3 different methods are selected from the group consisting of Strelka1, Strelka2, VarScan, Mutect2, and MuSE.
In some preferred embodiments, the tumor-specific somatic mutation identification module I) identifies the somatic mutations using at least 5 different methods, e.g., the at least 5 different methods include Strelka1, Strelka2, VarScan, Mutect2, and MuSE. However, other methods known in the art for detecting somatic mutations may be further included.
In some embodiments, the tumor-specific somatic mutation identification module I) further screens for somatic mutations that meet the following criteria:
1) the sequencing depth of the tumor tissue or cell and of the normal tissue or cell is greater than or equal to 10;
2) in the sequencing data of the tumor tissue or cell, the number of reads comprising the mutation is greater than or equal to 3;
3) in the sequencing data of the tumor tissue or cell, the allele frequency of the mutation is greater than 0.1;
4) in the sequencing data of the normal tissue or cell, the allele frequency of the mutation is less than or equal to 0.01; and
5) the allele frequency of the mutation is less than 0.01 in the whole exome sequencing results of normal tissues or cells from at least 100, at least 200, at least 300 or more, for example 200 to 300, normal subjects.
The tumor-specific somatic mutation screening module II) further screens the tumor-specific somatic mutations based on the transcriptome sequencing results of the tumor tissue or cells of the subject.
In some embodiments, the tumor-specific somatic mutation screening module II) selects a somatic mutation based on the gene expression level. In some embodiments, it selects for somatic mutations that are located within a highly expressed gene, e.g., the highly expressed gene has an RPKM of 1 or greater.
In some embodiments, the tumor-specific somatic mutation screening module II) selects the somatic mutations at the gene structure level and at the level of the effect on gene coding function, e.g., selecting somatic mutations whose gene-structure-level annotation is exonic and whose coding-function-level annotation is nonsynonymous SNV.
In some embodiments, tumor-specific somatic mutation screening module II) also optionally evaluates the expression level of HLA gene, CD4 gene, and/or CD8 gene in the subject.
HLA typing module III) HLA typing is performed based on the whole exome sequencing results of normal tissues or cells of the subject.
In some embodiments, the HLA typing module III) performs HLA typing using at least the following tools: ATHLATES, HLA-HD, HLAVBseq, seq2HLA and HLAminer.
The tumor neoantigen prediction module IV) predicts tumor neoantigens based on the results of the preceding three modules.
In some embodiments, tumor neoantigen prediction module IV):
extracts an amino acid sequence corresponding to the somatic mutation, thereby obtaining a mutant peptide corresponding to the somatic mutation, for example, extracts an amino acid sequence of about 8 to 35 amino acids, preferably about 15 to 27 amino acids, for example, 25 amino acids, corresponding to the somatic mutation;
based on the HLA typing results, scores and ranks the extracted mutant peptides independently by MHC binding affinity, MHC binding stability, proteasome cleavage, and mass spectrometry data, respectively; and
gives the mutant peptides an overall score and ranking by the geometric mean method, thereby selecting candidate tumor neoantigens.
In some embodiments, the extracted mutant peptides are scored and ordered using one or more selected from NetMHCcons, NetMHC, NetMHCpan, PickPocket, MHCflurry, netMHCstab, NetChop.
In another aspect, the present invention also provides a device for identifying a tumor neoantigen in a subject, the device comprising: a memory for storing a program; a processor for implementing the method of the first aspect of the invention by executing the program stored in the memory.
In another aspect, the invention also provides a computer readable storage medium comprising a program executable by a processor to perform the method of the first aspect of the invention.
In another aspect, the invention provides a tumor neoantigen identified according to the method or device of the invention.
In another aspect, the invention provides a pharmaceutical composition comprising a tumor neoantigen identified according to the method or device of the invention, and a pharmaceutically acceptable carrier.
As used herein, a "pharmaceutically acceptable carrier" is a substance that can be added to an active pharmaceutical ingredient to help formulate or stabilize the formulation without causing significant adverse toxicological effects to the patient, including, but not limited to, disintegrants, binders, fillers, buffers, isotonic agents, stabilizers, antioxidants, surfactants, or lubricants.
In another aspect, the invention also provides the use of a tumor neoantigen identified according to the method or device of the invention or a pharmaceutical composition of the invention in the manufacture of a medicament for the treatment and/or prevention of cancer.
In some embodiments, the medicament is a tumor vaccine. In some embodiments, the vaccine is a therapeutic vaccine.
In some embodiments, the pharmaceutical composition or the medicament further comprises an adjuvant. For example, the adjuvant is poly I:C.
In another aspect, the present invention provides a method of treating cancer in a subject, the method comprising:
a) identifying at least one tumor neoantigen of the subject by the method or device of the invention;
b) generating at least one tumor neoantigen identified in step a); and
c) administering to said subject said at least one tumor neoantigen produced in step b).
In some embodiments, a plurality of tumor neoantigens are identified, generated, and administered, e.g., at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, or even more tumor neoantigens.
In some embodiments, the tumor neoantigen is administered with an adjuvant. For example, the adjuvant is poly I:C.
In some preferred embodiments, the method further comprises administering to the subject an immune checkpoint inhibitor. The immune checkpoint inhibitors include, but are not limited to, PD1 antibodies, PDL1 antibodies, CTLA-4 antibodies, and the like.
In various aspects and embodiments herein, the cancer includes, but is not limited to, liver cancer, lung cancer, ovarian cancer, colon cancer, rectal cancer, melanoma, kidney cancer, bladder cancer, prostate cancer, breast cancer, lymphoma, hematologic malignancies, head and neck cancer, glioma, stomach cancer, nasopharyngeal cancer, laryngeal cancer, pancreatic cancer, cervical cancer, esophageal cancer, small bowel cancer, chronic or acute leukemia, and osteosarcoma.
In this context, the term "and/or" covers all combinations of items connected by the term, which should be taken as if each combination had been listed individually herein. For example, "A and/or B" encompasses "A", "A and B", and "B". For example, "A, B and/or C" encompasses "A", "B", "C", "A and B", "A and C", "B and C", and "A and B and C".
The invention is explained in more detail below with reference to specific embodiments and the drawing. However, it should not be construed as limiting the invention.
This study takes a mouse liver cancer model as an example and identifies liver cancer-specific neoantigens starting from the second-generation whole exome sequencing results of tumor tissue and peripheral blood and the second-generation transcriptome sequencing results.
Example 1, accurate analysis of specific somatic mutations in tumor tissues:
1.1 summary of public databases and publicly published algorithms required for this example
TABLE 1
1.2 the specific method steps:
1) Acquisition and description of raw sequencing data (raw data) for tumor tissue samples and peripheral blood control samples: whole exome sequencing is a genome analysis method in which the DNA of the exonic regions of the whole genome is captured and enriched by sequence capture technology and then subjected to high-throughput sequencing. Because it has high sensitivity to both common and rare variants, only 2% of the genome needs to be sequenced to discover most disease-related variants in exon regions. Whole exome sequencing is highly targeted, provides deep coverage and high data accuracy, and is simple, economical and efficient.
Tumor tissue samples and peripheral blood control samples were obtained and subjected to high-throughput exome sequencing on the Illumina platform. The resulting raw image data files were converted into sequencing reads (Sequenced Reads) by CASAVA base calling, and the results were stored in FASTQ (fq for short) file format; these are referred to as raw reads.
The FASTQ file contains the name of each read, the base sequence, and its corresponding sequencing quality information. In the FASTQ format file, each base corresponds to a base Quality character, and the sequencing Quality Score (Phred Quality Score) is obtained by subtracting 33 from the ASCII code value corresponding to each base Quality character. Different Phred Quality Score represents different base sequencing error rates, e.g., values of 20 and 30 for Phred Quality Score indicate a base sequencing error rate of 1.0% and 0.1%, respectively. The FASTQ format is exemplified as follows:
(1) the first line starts with "@", followed by Illumina sequencing tag Identifiers (Sequence Identifiers) and descriptors (optional section);
(2) the second row is a base sequence;
(3) the third line begins with "+", followed by the Illumina sequencing tag identifier (optional);
(4) the fourth row is the sequencing quality value of the corresponding base, and the value of ASCII corresponding to each character in the row is subtracted by 33, namely the sequencing quality value of the corresponding base in the second row is obtained.
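The four-line record layout and the quality decoding described above can be sketched as follows. The record below is made up for illustration, and the function names are hypothetical; the sketch assumes the Phred+33 encoding stated in the text, under which a score Q corresponds to an error probability of 10^(-Q/10).

```python
from typing import List


def phred_scores(quality_line: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string: ASCII code of each character minus 33."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q: int) -> float:
    """Base-calling error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


# A made-up four-line FASTQ record (identifier, bases, separator, qualities).
record = ["@read_1 example", "ACGT", "+", "II5+"]
identifier, bases, separator, qualities = record
scores = phred_scores(qualities)
print(scores)
print([error_probability(q) for q in (20, 30)])
```

In this record the characters `I`, `5` and `+` decode to quality scores 40, 20 and 10 respectively, and scores of 20 and 30 correspond to error rates of 1% and 0.1%, matching the values given in the text.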
2) Quality control and filtering of raw sequencing data (clean data): the quality of the raw sequencing data was assessed using FastQC. The raw sequencing data were then processed using Trim Galore with the following criteria: adapter sequences and low-quality bases with a Q value below 20 were removed from the 3' end, and reads shorter than 70 bp after trimming were discarded, yielding clean, high-quality reads (clean data) for subsequent analysis.
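The trimming criteria above can be illustrated with a deliberately simplified sketch. It is not the algorithm used by Trim Galore: it only strips trailing bases whose Phred score is below 20 and discards reads that end up shorter than 70 bp, and it omits adapter removal entirely. The function name `trim_3prime` and the example reads are hypothetical.

```python
from typing import Optional, Tuple


def trim_3prime(seq: str, qual: str, min_q: int = 20, min_len: int = 70,
                offset: int = 33) -> Optional[Tuple[str, str]]:
    """Strip trailing low-quality bases; return None if the read becomes too short.

    Simplification: bases are removed from the 3' end one at a time while their
    Phred score (ASCII code minus offset) is below min_q.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], qual[:end]


# 'I' encodes Q40 and '#' encodes Q2 under Phred+33 (illustrative reads).
kept_read = trim_3prime("A" * 80, "I" * 75 + "#" * 5)
discarded_read = trim_3prime("A" * 72, "I" * 65 + "#" * 7)
print(len(kept_read[0]), discarded_read)
```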
3) Alignment of sequencing data to the reference genome (alignment): the high-quality sequencing data that passed quality control were aligned to the reference genome using Bowtie2. The alignment results were sorted, and duplicate reads were marked and removed.
4) Analysis of somatic mutations in the tumor tissue sample sequencing data by comparison with the sequencing results of the peripheral blood control sample: somatic mutations in the tumor tissue were detected separately using the Strelka1, Strelka2, VarScan, Mutect2 (Sentieon) and MuSE analysis algorithms, and the intersection of the 5 independently obtained sets of somatic mutation calls was then taken, which greatly reduces the false positive rate of detection. In addition, the parameters of each algorithm were adjusted to raise the detection threshold, further reducing the false positive rate of detection.
5) Integration and filtering of the results of the 5 independent algorithms (consensus and filtering): the intersection of the somatic mutation calls from the 5 independent analyses above was taken and then filtered to obtain high-quality somatic mutations in the tumor tissue. The filtering criteria were as follows: (i) sequencing depth ≥ 10 in both the tumor tissue and peripheral blood samples; (ii) in the tumor sample data, the number of reads supporting the variant ≥ 3 (after duplicate removal); (iii) in the tumor sample data, the allele frequency of the variant > 0.1; (iv) in the peripheral blood sample sequencing data, the allele frequency of the variant < 0.01; (v) the frequency of the variant < 0.01 in the exome sequencing data of peripheral blood from 100 normal individuals established by the inventors.
6) The selected somatic mutations were verified by first-generation (Sanger) sequencing. The results show that the false positive rate of somatic mutations identified by the method of the invention is 2-3 times lower than that of prior-art methods.
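Steps 4) and 5) above amount to a set intersection followed by per-variant threshold checks. The following is a minimal Python sketch of that logic, not the inventors' implementation: the `Evidence` record, its field names, and the function names are hypothetical, and whether each threshold boundary is inclusive or exclusive is an assumption based on the criteria as listed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


@dataclass
class Evidence:
    """Hypothetical per-variant read evidence; field names are illustrative."""
    tumor_depth: int       # reads covering the site in the tumor sample
    normal_depth: int      # reads covering the site in the blood control
    tumor_alt_reads: int   # de-duplicated tumor reads supporting the variant
    normal_alt_reads: int  # control reads supporting the variant
    panel_freq: float      # variant frequency in the panel of normal exomes


def consensus(calls: Dict[str, Iterable[Variant]]) -> Set[Variant]:
    """Keep only variants reported by every caller (set intersection)."""
    sets = [set(v) for v in calls.values()]
    return set.intersection(*sets) if sets else set()


def passes_filters(e: Evidence) -> bool:
    """Apply criteria (i)-(v); boundary inclusivity is an assumption."""
    if e.tumor_depth < 10 or e.normal_depth < 10:      # (i) depth in both samples
        return False
    if e.tumor_alt_reads < 3:                          # (ii) supporting reads
        return False
    if e.tumor_alt_reads / e.tumor_depth < 0.1:        # (iii) tumor allele frequency
        return False
    if e.normal_alt_reads / e.normal_depth >= 0.01:    # (iv) control allele frequency
        return False
    return e.panel_freq < 0.01                         # (v) panel-of-normals frequency


def high_quality_calls(calls: Dict[str, Iterable[Variant]],
                       evidence: Dict[Variant, Evidence]) -> Set[Variant]:
    """Intersect the call sets, then keep variants whose evidence passes all filters."""
    return {v for v in consensus(calls) if v in evidence and passes_filters(evidence[v])}
```

Taking the strict intersection trades sensitivity for specificity: a true mutation missed by any one caller is dropped, which is consistent with the stated aim of lowering the false positive rate.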
Example 2 Screening of somatic mutation sites based on gene expression level and predicted function of the mutated gene
For each somatic mutation detected in Example 1, gene-based and region-based annotation of the mutation site was performed using the NCBI human genome annotation database.
(1) Annotation information and priority order at the gene structure level: exonic > ncRNA > UTR5/UTR3 > intronic > upstream/downstream > intergenic > unknown.
(2) Annotation information and priority order affecting gene coding function: stopgain > stoploss > nonsynonymous SNV > synonymous SNV > unknown.
In the present invention, only mutations annotated as nonsynonymous SNV (level of effect on gene coding function) and located in exonic regions (gene structure level) were selected.
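The selection rule above can be illustrated with a short Python sketch. It assumes each variant carries one region label and a list of effect labels (one per overlapping transcript); resolving several effects by the priority order in (2) is an assumption about how the order is applied, and the function names are hypothetical.

```python
from typing import List

# Priority order of coding-function annotations, highest priority first,
# as listed in item (2) above.
EFFECT_PRIORITY = ["stopgain", "stoploss", "nonsynonymous SNV",
                   "synonymous SNV", "unknown"]


def top_effect(effects: List[str]) -> str:
    """Return the highest-priority effect reported for one variant.

    Labels outside the known list are treated as "unknown".
    """
    known = [e if e in EFFECT_PRIORITY else "unknown" for e in effects]
    return min(known, key=EFFECT_PRIORITY.index) if known else "unknown"


def is_selected(region: str, effects: List[str]) -> bool:
    """Keep a variant only if it is exonic and its top effect is a nonsynonymous SNV."""
    return region == "exonic" and top_effect(effects) == "nonsynonymous SNV"
```

Under this reading, a variant that is a stopgain in any transcript is not selected, because stopgain outranks nonsynonymous SNV in the priority order.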
At the same time, the expression levels of all protein-coding genes annotated in the NCBI human genome annotation database were measured from the transcriptome sequencing data, in order to:
(1) further select somatic mutations located in genes with medium or high expression (RPKM ≥ 1);
(2) evaluate the expression levels of the HLA genes, CD4 and CD8 in the sample.
In this way, somatic mutations that change the protein coding sequence and lie in genes with medium or high expression are selected, and the expression levels of the HLA genes, CD4 and CD8 are evaluated to judge whether the patient is currently suitable for tumor neoantigen immunotherapy.
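For reference, RPKM (reads per kilobase of transcript per million mapped reads) is computed as 10^9 × C / (N × L), where C is the number of reads mapped to the gene, N the total number of mapped reads, and L the gene length in bp. The sketch below shows the calculation and the RPKM ≥ 1 screen; the function and parameter names are hypothetical, and the input dictionaries are assumed to come from an upstream read-counting step not shown here.

```python
from typing import Dict, Set


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expressed_genes(counts: Dict[str, int],
                    lengths_bp: Dict[str, int],
                    total_mapped_reads: int,
                    threshold: float = 1.0) -> Set[str]:
    """Return the genes whose RPKM is at or above the threshold."""
    return {gene for gene, count in counts.items()
            if rpkm(count, lengths_bp[gene], total_mapped_reads) >= threshold}
```

For example, 500 reads on a 2,000 bp gene in a library of 20 million mapped reads give an RPKM of 12.5, which passes the screen.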
2.1 summary of public databases and published algorithms required for this example
TABLE 2
2.2 the specific method steps:
1) Acquisition and description of raw transcriptome sequencing data from the tumor tissue sample (raw data): a tumor tissue sample was obtained, mRNA was captured via its characteristic poly(A) sequence, and next-generation sequencing was performed. The raw image data files produced by high-throughput sequencing (Illumina) were converted into sequenced reads by CASAVA base calling, and the results were stored in FASTQ (fq) format; these are referred to as Raw Reads.
2) Quality control and filtering of raw sequencing data (clean data): the quality of the raw sequencing data was assessed with FastQC. The raw data were then processed with Trim Galore using the following criteria: adapter sequences and low-quality bases with a Q value below 20 were trimmed from the 3' end, and reads shorter than 70 bp after trimming were discarded, yielding clean, high-quality reads for subsequent analysis (Clean data).
3) Alignment of sequencing data to the reference genome (alignment): the high-quality reads that passed quality control were aligned to the reference genome with TopHat2, and the alignments were sorted.
4) Analysis of gene expression level (gene expression information): the expression level of each gene was evaluated by calculating the RPKM value.
5) Functional annotation of somatic mutations (mutation annotation): for each somatic mutation, gene-based and region-based annotation of the mutation site was performed using the NCBI genome annotation database. Only mutations annotated as nonsynonymous SNV (level of effect on gene coding function) and located in exonic regions (gene structure level) were selected.
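The quality-control criteria in step 2) above (3' trimming at Q < 20, then discarding reads shorter than 70 bp) can be illustrated with a deliberately simplified Python sketch. It is a stand-in for illustration only: it does not reproduce the trimming algorithm of the tool named in the text, it omits adapter removal, and it assumes Phred+33 quality encoding. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def phred33(quality_string: str) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_read(seq: str, quals: List[int],
              min_q: int = 20, min_len: int = 70) -> Optional[Tuple[str, List[int]]]:
    """Strip trailing bases with quality below min_q, then apply the length filter.

    Returns the trimmed (sequence, qualities) pair, or None if the read
    is shorter than min_len after trimming.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], quals[:end]
```

A read of 80 bases whose last 5 bases have Q = 10 is trimmed to 75 bases and kept; if the last 20 bases were low quality, the remaining 60 bases would fall under the 70 bp limit and the read would be discarded.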
The results are shown in the following table:
Somatic mutations in the mouse models were further screened based on gene expression levels and annotation information. The number of somatic mutations retained for each model after this screening is shown in bold.
TABLE 3
Example 3 HLA-I/II typing of test samples based on peripheral blood exon sequencing data
3.1 summary of public databases and published algorithms required for this example
TABLE 4
3.2 the specific method steps:
1) raw sequencing data acquisition and quality control and filtration: the same as in example 1.
2) Using the information in 5 different HLA genotype databases, the sequencing reads were strictly aligned to the annotated HLA gene regions and HLA typing was performed. The final HLA type was determined from the combined results of the 5 databases.
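The text does not state how the 5 typing results are combined. One plausible reading is a per-locus majority vote, sketched below in Python as an assumption rather than the method actually used; the function name, the input layout, and the `min_support` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Genotype = Tuple[str, str]


def consensus_hla(calls_by_tool: Dict[str, Dict[str, Genotype]],
                  min_support: int = 3) -> Dict[str, Optional[Genotype]]:
    """Majority vote per HLA locus across typing tools.

    calls_by_tool maps tool name -> {locus: (allele1, allele2)}.
    A locus is reported only if at least min_support tools agree on the
    same unordered allele pair; otherwise it is left unresolved (None).
    """
    loci = {locus for calls in calls_by_tool.values() for locus in calls}
    result: Dict[str, Optional[Genotype]] = {}
    for locus in sorted(loci):
        votes = Counter(tuple(sorted(calls[locus]))
                        for calls in calls_by_tool.values() if locus in calls)
        genotype, support = votes.most_common(1)[0]
        result[locus] = genotype if support >= min_support else None
    return result
```

Sorting each allele pair before counting makes the vote independent of the order in which a tool reports the two alleles.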
The results are shown in the table below. The method of the present invention typed more than 6 (2 × 3) HLA loci in 8 individuals, with an accuracy above 98% relative to the gold-standard PCR-SBT technique.
TABLE 5
The two columns shown in bold are the typing results for PCR-SBT.
Example 4 Screening of personalized tumor neoepitopes using an optimized computational model platform
In this example, based on the analysis results of the first 3 examples, tumor neoantigens were predicted for the specific HLA types from the selected somatic mutations, i.e. those that alter the protein coding sequence and lie in genes with medium or high expression. A multi-angle analysis and integrated prediction strategy was adopted. Although this strategy filters out some true positives, the retained neoantigens are screened more accurately and with a low false positive rate. Tumor-specific neoantigens were predicted independently from four angles: binding affinity, binding stability, proteasomal cleavage and mass spectrometry data. The independent results were then integrated to select neoantigens that perform well from several angles. Finally, the predicted neoantigens were ranked using the geometric mean.
4.1 summary of public databases and publicly published algorithms required for this example
TABLE 6
4.2 the specific method steps:
1) For the somatic mutation sites obtained in Examples 1-3, amino acid sequences were extracted by extending 7-13 aa upstream and downstream of each missense mutation site in the protein coding region, with the mutation site at the center.
2) Combined with the predicted HLA types, tumor-specific neoantigens were predicted independently in terms of binding affinity, binding stability, proteasomal cleavage and mass spectrometry data using NetMHCcons, NetMHC, NetMHCpan, PickPocket, MHCflurry, netMHCstab and NetChop, respectively, and ranked from most to least likely.
3) Finally, the predicted neoantigens were given an overall ranking by taking the geometric mean of their rankings from the different methods.
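The integration in step 3) can be sketched as a geometric mean over per-criterion ranks. The Python sketch below assumes each criterion supplies an ordered list of peptides (best first) and that only peptides ranked by every criterion are scored; both are assumptions, and the function name is hypothetical.

```python
import math
from typing import Dict, List, Tuple


def geometric_mean_rank(rankings: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Combine per-criterion rankings into one ordering.

    rankings maps criterion name -> peptides ordered from best to worst.
    Each peptide present in every list is scored by the geometric mean of
    its 1-based ranks; a lower score is better. Ties are broken by name.
    """
    if not rankings:
        return []
    lists = list(rankings.values())
    shared = set.intersection(*(set(lst) for lst in lists))
    scores = {}
    for peptide in shared:
        ranks = [lst.index(peptide) + 1 for lst in lists]
        scores[peptide] = math.exp(sum(math.log(r) for r in ranks) / len(ranks))
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))
```

Compared with an arithmetic mean, the geometric mean rewards a peptide that ranks near the top under several criteria and is less dominated by a single poor rank.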
Example 5 validation of the method and Effect of neoantigen identification based on H22 mouse tumor model
Screening of tumor neoantigen
Screening for tumor somatic mutations by Whole Exon Sequencing (WES)
First, mouse hepatocellular carcinoma (HCC) H22 cells were purchased from the Fudan cell bank (FDCC), and genomic DNA was extracted from cells at passage 10. Balb/C mice, the strain from which H22 cells were derived, aged 6-8 weeks, were purchased from Beijing Vital River, and genomic DNA was extracted from mouse tail tissue. The genomic DNA samples were then subjected to 200× WES by Novogene (Shanghai). The raw sequencing data were analyzed bioinformatically as described in the examples above: the Balb/C sequence was used as the wild-type control to call somatic mutations in H22 cells and calculate their allele frequencies, and the MHC class I typing of Balb/C mice and H22 cells was determined. The results show that H22 cells carry amino acid mutations in 108 genes, i.e. H22 cells contain 108 candidate neoantigens, and that the MHC class I molecules of both H22 cells and Balb/C mice are of the H2-Kd type.
RNA sequencing (RNA-seq) to detect Gene expression levels
H22 cells at passages 9 and 10 were subjected to RNA-seq to measure gene mRNA expression levels, and the mean of the two passages was taken as a proxy for the protein expression level. mRNA expression is given as an RPKM value; the larger the value, the higher the expression.
MHC class I molecule affinity prediction for neoantigenic peptides
The immunogenicity of a peptide depends on the ability of MHC class I/II molecules to present it (characterized by the affinity of the peptide for the MHC molecule and the stability of the MHC-peptide complex) and on the ability of the TCR to recognize the MHC-peptide complex. Peptides presented by MHC I are recognized by CD8 T cells, so predicting a peptide's MHC I affinity helps predict its ability to activate a CD8 T cell immune response.
Some representative results of the above three steps are shown in the following table.
TABLE 7
Selection of candidate pool of neoantigens
Tumor cells usually contain multiple neoantigens (e.g., 108 in H22), and gene expression level is the first factor in deciding whether a neoantigen is a suitable vaccine target. Based on the RNA-seq data, the candidate neoantigen pool of H22 cells was screened using RPKM ≥ 1 as the criterion, yielding 23 candidate neoantigens in total (figure 2).
Second, design of the neoantigen vaccine
In general, several new CD8 T cell epitopes containing the mutation site, usually 8-13 amino acids long, may exist around an amino acid mutation site. Any of these epitopes may be the one that CD8 T cells successfully target, so, to cover them as fully as possible, the immunogen for a single mutation site was designed as follows: based on the protein amino acid sequence, a 25-amino-acid long peptide centered on the mutation site and extending 12 amino acids on each side was used as the immunogen. For the 23 neoantigens screened in the H22 model, the long peptide sequences were determined and submitted to GL Biochem (Shanghai) Ltd. for synthesis; 17 of the 23 long peptides were successfully synthesized. For treatment, the selected neoantigen long peptides were combined to obtain the long-peptide vaccine. The long peptide sequences and their synthesis are given below.
TABLE 8
Gene | Immunogen long peptide (25 amino acids) | Gene | Immunogen long peptide (25 amino acids) |
---|---|---|---|
Abcc4 | ETLDLSWYLGIYTGLTAVTVLFGIA | Lcp1 | VNIGAEDLKEGKLYLVLGLLWQVIK |
Agap3 | NKEWKKKYVTLCGNGLLTYHPSLHD | Polrmt | QEFVWEASHYLVCQVFKSLQEMFTS |
Bard1 | CSRCANILKEPVYLGGCEHIFCSGC | Rnf121 | QLLDWLRYLVAWKPVIIGLVQGISY |
Cep192 | VLESLDSAYHQRTHLESELSQLACS | Sdcbp | KVDKVIQAQTAYFANPASQAFVLVD |
Dhodh | DLSTQTIREMYARTQGTIPIIGVGG | Sec23ip | YLFALQSHLCYWESEDTALLLLKEI |
Dhx37 | YQEIVETTKMYMNGVSTVEIQWIPS | Sestd1 | EEIESQHSEWFALYVELNQQIAALL |
Endog | ELRSYVMPNAPVNETIPLERFLVPI | Slc25a37 | RLQMYNSQHQSALSCIRTVWRTEGL |
Eya3 | HILSVPVSETTYSGQTQYQTLQQSQ | Snd1 | LEEKERSASYKPMFVTEITDDLHFY |
Fbxo4 | QLGSTDHYWNKTVRDPILWRYFLLR | Srr | EDEIKYATQLVWERMKLLIEPTAGV |
Hipk1 | AIKILKNHPSYASQGQIEVSILSRL | Tiam1 | FRFRCYLASLQGWELPNPKRLLAFA |
Khnyn | VDFILQREPYCRYINQLSEALLSLN | Vps33a | AAHLSYGRVNLNALREAVRRELREF |
Kpnb1 | HTSKFYAKGALQCLVPILTQTLTKQ |
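The design rule for the immunogens in Table 8 (mutation site at the center, 12 residues on each side, 25 residues in total) can be sketched in Python as follows. The function name and arguments are hypothetical, and truncating the window at the protein termini is an assumption, since the text does not say how mutations near the ends of a protein are handled.

```python
def mutant_window(protein: str, position: int, mutant_aa: str, flank: int = 12) -> str:
    """Return the mutant peptide centered on a substituted residue.

    position is the 1-based index of the mutated residue in the wild-type
    protein. The window spans `flank` residues on each side, so flank=12
    gives a 25-mer; it is shortened if the site is near either terminus.
    """
    if not 1 <= position <= len(protein):
        raise ValueError("position is outside the protein sequence")
    if len(mutant_aa) != 1:
        raise ValueError("mutant_aa must be a single residue")
    index = position - 1
    mutated = protein[:index] + mutant_aa + protein[index + 1:]
    start = max(0, index - flank)
    end = min(len(mutated), index + flank + 1)
    return mutated[start:end]
```

With a smaller `flank`, the same function yields the shorter windows used for epitope prediction; for instance, `flank=7` returns a 15-residue window around the mutation.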
Third, evaluation of animal pharmacodynamics
The specific pharmacodynamic protocol is shown in figure 3.
Disease model establishment
To establish the H22 tumor-bearing mouse model, 36 SPF-grade Balb/C mice (6-8 weeks old, female) were purchased from Beijing Vital River and housed in the SPF animal facility of the institute of medicine at the Zhangjiang campus of Fudan University. Within one week, once the mice had returned to a normal state, tumor cell inoculation was started.
On the day of inoculation, H22 cells cultured in suspension in vitro were collected and centrifuged, the cell pellet was washed twice with sterile PBS, and the cells were finally resuspended in sterile PBS at 2 × 10^7 cells/ml and kept on ice. For subcutaneous H22 tumor inoculation, 0.1 ml of cell suspension (about 2 × 10^6 cells) was injected subcutaneously into the right flank with a syringe. After inoculation, subcutaneous tumor growth was observed daily; distinct nodules had formed by day 3, and by day 5 red, swollen nodules of about 5 mm × 7 mm had formed, indicating that the subcutaneous tumor model was successfully established.
Grouping of tumor-bearing mice and treatment results
5 days after tumor inoculation, the length and width of the tumor nodules were measured with a vernier caliper and the tumor volume of each mouse was calculated. The 36 mice were then divided into 6 groups according to tumor volume; the tumor volumes were relatively uniform across the groups (fig. 4).
The first dose was given 6 days after tumor inoculation, and further doses were given on days 9, 13 and 16, for a total of 4 doses. The three single agents (SLPs vaccine, anti-PD1 and poly I:C) were prepared at each administration. To prepare the SLPs vaccine, 2 mg of SLP dry powder was dissolved in 0.1 ml DMSO (SIGMA), and 0.3 ml of 1640 medium (GIBCO, without serum or antibiotics) was added to give a 5 mg/ml stock solution, which was aliquoted and stored at -80 °C. The 17 SLPs were then mixed at 20 μg per SLP per dose, the adjuvant poly I:C (Sigma, 5 mg/ml) was added at 50 μg per dose, and the volume was made up to 0.2 ml per dose with sterile PBS. anti-PD1 (BE0146) was purchased from BIOXCELL, USA, and the anti-PD1 stock was diluted to 1 mg/ml with the pH 7.0 buffer (IP0070) recommended by the same company. For the poly I:C preparation, the solvent was made up as for the SLP preparation, and poly I:C and PBS were then added in the same amounts as for the SLPs vaccine. SLPs and poly I:C were administered by subcutaneous injection (s.c.) on the flank opposite the tumor, and anti-PD1 was administered by intraperitoneal injection (i.p.).
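The volumes implied by this preparation can be checked with a little arithmetic, using the fact that 1 mg/ml equals 1 μg/μl. The Python sketch below works through one dose; the function names are hypothetical, and it assumes every SLP stock is at the stated 5 mg/ml.

```python
from typing import Dict


def stock_conc_ug_per_ul(mass_ug: float, dmso_ul: float, medium_ul: float) -> float:
    """Concentration of the SLP stock solution in ug/ul (numerically equal to mg/ml)."""
    return mass_ug / (dmso_ul + medium_ul)


def dose_volumes_ul(n_slps: int = 17,
                    slp_ug: float = 20.0,
                    slp_stock_ug_per_ul: float = 5.0,
                    poly_ic_ug: float = 50.0,
                    poly_ic_stock_ug_per_ul: float = 5.0,
                    final_ul: float = 200.0) -> Dict[str, float]:
    """Volumes (in ul) of each component needed for one 0.2 ml dose."""
    per_slp = slp_ug / slp_stock_ug_per_ul            # stock volume per peptide
    slp_total = n_slps * per_slp                      # all peptides together
    poly_ic = poly_ic_ug / poly_ic_stock_ug_per_ul    # adjuvant volume
    pbs = final_ul - slp_total - poly_ic              # PBS to reach the final volume
    if pbs < 0:
        raise ValueError("components exceed the final dose volume")
    return {"per_slp_ul": per_slp, "slp_total_ul": slp_total,
            "poly_ic_ul": poly_ic, "pbs_ul": pbs}
```

With the figures in the text, 2 mg in 0.4 ml gives the stated 5 mg/ml stock; each dose then takes 4 μl of each SLP stock (68 μl for 17 SLPs, i.e. 340 μg of peptide), 10 μl of poly I:C, and 122 μl of PBS.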
Tumor size was measured every 3 days, and data from 8 time points in total were used to plot the tumor growth curve of each mouse (fig. 5). After the last measurement, the mice were sacrificed and the tumor-bearing mice were photographed (fig. 6). Samples were then collected.
IFN-γ ELISPOT results for splenocytes of tumor-bearing mice after treatment
After the mice were sacrificed, the spleen of each mouse was collected and a single-cell suspension of splenocytes was prepared. All 6 mice of the SLPs + anti-PD1 group and 4 mice from each of the remaining 5 groups (the 2 with the largest and the 2 with the smallest tumors), 26 mice in total, were selected, and the neoantigen-specific T cell immune response in their spleens was measured by IFN-γ ELISPOT. The SLPs vaccine contains 17 neoantigens, numbered as shown in Table 9. To detect the immune response against each SLP, 4 overlapping detection peptides (ASPs) were designed and synthesized for each of the 17 SLPs (FIG. 7). Before the ELISPOT assay, splenocytes were stimulated with ASPs in 96-well plates over an extended period to expand antigen-specific T cells, as follows: the 4 overlapping ASPs of one SLP were mixed in equal proportions to an ASP concentration of 4 μg/ml; 50 μl of the ASP mixture was mixed with an equal volume containing 5 × 10^5 splenocytes; IL-2 was added to a final concentration of 20 U/ml; half of the medium was replaced every 3 days with fresh medium containing 2× ASPs and 2× IL-2; and ELISPOT was performed after 11 days of stimulation. For detection, 50 μl of the stimulated culture was mixed with 50 μl of 2× ASPs and incubated overnight in a 96-well plate (BD, 51-2447KC) coated with anti-IFN-γ (BD, 51-2525KC), and IFN-γ secretion was then detected with an anti-IFN-γ detection antibody (BD, 51-1818KZ) according to the kit instructions (BD, 551083). The IFN-γ ELISPOT results show that, overall, the SLPs + anti-PD1 group had markedly more IFN-γ-secreting splenocytes than the other groups, most notably for SLP1/2/6/7/15/16/17; comparing poly I:C + anti-PD1 with SLPs + anti-PD1, the SLPs + anti-PD1 group produced more IFN-γ-secreting splenocytes for every SLP except SLP5 (FIG. 8).
TABLE 9. SLP numbering of the H22 neoantigens and the corresponding genes
SLP numbering | SLP1 | SLP2 | SLP3 | SLP4 | SLP5 | SLP6 |
Gene | Bard1 | Cep192 | Dhodh | Endog | Eya3 | Fbxo4 |
SLP numbering | SLP7 | SLP8 | SLP9 | SLP10 | SLP11 | SLP12 |
Gene | Hipk1 | Kpnb1 | Lcp1 | Rnf121 | Sdcbp | Sestd1 |
SLP numbering | SLP13 | SLP14 | SLP15 | SLP16 | SLP17 | |
Gene | Slc25a37 | Snd1 | Srr | Tiam1 | Vps33a |
Discussion of results
The above results show that: 1. SLPs alone slightly inhibited tumor growth in the early and middle stages (days 7-14); 2. anti-PD1 alone markedly inhibited tumor growth, and the tumors disappeared in 2 of 6 mice; 3. poly I:C, alone or in combination, did not affect tumor growth; 4. SLPs + anti-PD1 treatment strongly inhibited tumor growth in the early and middle stages, and the effect persisted until the tumors disappeared; in the end the tumor escaped and grew in only 1 mouse, and even there it remained significantly inhibited; 5. SLPs + anti-PD1 treatment produced more IFN-γ-positive splenocytes specific for the neoantigen SLPs, suggesting that this treatment achieves tumor clearance by inducing antigen-specific immune responses. These results indicate that the method for identifying neoantigens of the present invention can effectively compute and screen out immunogenic neoantigens.
Claims (26)
- A method of identifying a tumor neoantigen in a subject, the method comprising the steps of: (a) analyzing the whole exome sequencing results of the tumor tissue or cells and the normal tissue or cells of the subject to identify tumor-specific somatic mutations; (b) analyzing the transcriptome sequencing results of the tumor tissue or cells of the subject to further screen the somatic mutations identified in step (a); (c) analyzing the whole exome sequencing results of the normal tissue or cells of the subject to perform HLA typing of the subject; (d) analyzing the binding of the mutant peptides corresponding to the somatic mutations to MHC based on the results of steps (b) and (c), thereby screening candidate tumor-specific neoantigens.
- The method of claim 1, wherein step (a) separately identifies somatic mutations from the whole exome sequencing results by at least 3 different methods, and selects for somatic mutations that were all identified in the at least 3 different methods, for example the at least 3 different methods are selected from the group consisting of Strelka1, Strelka2, VarScan, Mutect2, and MuSE.
- The method of claim 2, wherein step (a) identifies the somatic mutation using at least 5 different methods, e.g., the at least 5 different methods include Strelka1, Strelka2, VarScan, Mutect2, and MuSE.
- The method of any one of claims 1-3, wherein step (a) further screens for somatic mutations meeting the following criteria: 1) the sequencing depth of the tumor tissue or cells and of the normal tissue or cells is greater than or equal to 10; 2) in the sequencing data of the tumor tissue or cells, the number of reads comprising the mutation is greater than or equal to 3; 3) in the sequencing data of the tumor tissue or cells, the allele frequency of the mutation is greater than 0.1; 4) in the sequencing data of the normal tissue or cells, the allele frequency of the mutation is less than or equal to 0.01; and 5) the allele frequency of the mutation is less than 0.01 in the whole exome sequencing results of normal tissue or cells from at least 100 normal subjects.
- The method of any one of claims 1-4, wherein step (b) comprises selecting a somatic mutation based on the level of gene expression.
- The method of claim 5, wherein somatic mutations located within highly expressed genes are selected, preferably the highly expressed genes have an RPKM of greater than or equal to 1.
- The method of any one of claims 1 to 6, wherein step (b) comprises selecting said somatic mutations at the gene structure level and at the level of effect on gene coding function, preferably selecting somatic mutations whose gene structure annotation is exonic and whose annotation for effect on gene coding function is nonsynonymous SNV.
- The method of any one of claims 1-7, wherein step b) further comprises assessing the expression level of an HLA gene, a CD4 gene and/or a CD8 gene in the subject.
- The method of any one of claims 1-8, wherein in step (c) at least the following databases are used for HLA typing: ATHLATES, HLA-HD, HLAVBseq, seq2HLA and HLAminer.
- The method of any one of claims 1-9, wherein step (d) comprises:d1) extracting an amino acid sequence corresponding to the somatic mutation, for example, an amino acid sequence of about 8 to 35 amino acids, preferably about 15 to 27 amino acids, corresponding to the somatic mutation, thereby obtaining a mutant peptide corresponding to the somatic mutation;d2) based on the HLA typing results of step (c), scoring and ranking the extracted mutant peptides independently by MHC binding affinity, MHC binding stability, proteasome digestion, mass spectrometry data, respectively; andd3) based on the results of step d2), candidate tumor neoantigens are selected by scoring and ranking the mutant peptides by geometric mean.
- The method of claim 10, wherein the extracted mutant peptides are scored and ranked in step d2) using one or more selected from the group consisting of NetMHCcons, NetMHC, NetMHCpan, PickPocket, MHCflurry, netMHCstab and NetChop.
- A device for identifying a tumor neoantigen in a subject, the device comprising: a memory for storing a program; a processor for implementing the method of any one of claims 1 to 11 by executing a program stored in the memory.
- A computer readable storage medium comprising a program executable by a processor to implement the method of any one of claims 1-11.
- A device for identifying tumor neoantigens in a subject, the device comprising the following four modules: a somatic mutation identification module I) for identifying tumor-specific somatic mutations based on the whole exome sequencing results of the tumor tissue or cells and the normal tissue or cells of the subject; a tumor-specific somatic mutation screening module II) for further screening the tumor-specific somatic mutations based on the transcriptome sequencing results of the tumor tissue or cells of the subject; an HLA typing module III) for HLA typing based on the whole exome sequencing results of the normal tissue or cells of the subject; and a tumor neoantigen prediction module IV).
- The apparatus of claim 14, wherein the somatic mutation identification module I) identifies the somatic mutations from the whole exome sequencing results by at least 3 different methods, respectively independently, and selects for the somatic mutations that were all identified in the at least 3 different methods, e.g. the at least 3 different methods are selected from the group consisting of Strelka1, Strelka2, VarScan, Mutect2 and MuSE.
- The device of claim 15, wherein the somatic mutation identification module I) identifies the somatic mutations using at least 5 different methods, e.g., the at least 5 different methods include Strelka1, Strelka2, VarScan, Mutect2, and MuSE.
- The device of any one of claims 14-16, wherein the somatic mutation identification module I) further screens for somatic mutations meeting the following criteria: 1) the sequencing depth of the tumor tissue or cells and of the normal tissue or cells is greater than or equal to 10; 2) in the sequencing data of the tumor tissue or cells, the number of reads comprising the mutation is greater than or equal to 3; 3) in the sequencing data of the tumor tissue or cells, the allele frequency of the mutation is greater than 0.1; 4) in the sequencing data of the normal tissue or cells, the allele frequency of the mutation is less than or equal to 0.01; and 5) the allele frequency of the mutation is less than 0.01 in the whole exome next-generation sequencing results of normal tissue or cells from at least 100, at least 200, at least 300 or more, for example 200-300, normal subjects.
- The device of any one of claims 14-17, wherein the tumor specific somatic mutation screening module II) selects a somatic mutation based on gene expression level.
- The apparatus of claim 18, wherein somatic mutations within highly expressed genes are selected, e.g., the highly expressed genes have an RPKM of greater than or equal to 1.
- The device of any one of claims 14-19, wherein the tumor-specific somatic mutation screening module II) selects said somatic mutations at the gene structure level and at the level of effect on gene coding function, e.g. selects somatic mutations whose gene structure annotation is exonic and whose annotation for effect on gene coding function is nonsynonymous SNV.
- The device of any one of claims 14-20, wherein the tumor-specific somatic mutation screening module II) further assesses the expression level of an HLA gene, a CD4 gene and/or a CD8 gene in said subject.
- The device of any one of claims 14-21, wherein the HLA typing module III) performs HLA typing using at least the following databases: ATHLATES, HLA-HD, HLAVBseq, seq2HLA and HLAminer.
- The device of any one of claims 14-22, wherein the tumor neoantigen prediction module IV): extracts an amino acid sequence corresponding to the somatic mutation, for example, an amino acid sequence of about 8 to 35 amino acids, preferably about 15 to 27 amino acids, corresponding to the somatic mutation, thereby obtaining a mutant peptide corresponding to the somatic mutation; based on the HLA typing results, scores and ranks the extracted mutant peptides independently by MHC binding affinity, MHC binding stability, proteasomal cleavage and mass spectrometry data, respectively; and produces an overall score and ranking of the mutant peptides by the geometric mean method, thereby selecting candidate tumor neoantigens.
- The device of claim 23, wherein the extracted mutant peptides are scored and ordered using NetMHCcons, NetMHC, NetMHCpan, PickPocket, MHCflurry, netMHCstab, NetChop, respectively.
- A method of treating cancer in a subject, the method comprising: a) identifying at least one tumor neoantigen of the subject by the method of any one of claims 1-11; b) producing the at least one tumor neoantigen identified in step a); and c) administering to said subject said at least one tumor neoantigen produced in step b).
- The method of claim 25, wherein the method further comprises administering to the subject an immune checkpoint inhibitor.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2019101985821 | 2019-03-15 | ||
CN201910198582.1A CN111696628A (en) | 2019-03-15 | 2019-03-15 | Method for identifying neoantigens |
PCT/CN2020/079131 WO2020187143A1 (en) | 2019-03-15 | 2020-03-13 | Method for identifying neoantigens |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113316818A true CN113316818A (en) | 2021-08-27 |
CN113316818B CN113316818B (en) | 2024-04-02 |
Family
ID=72475381
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910198582.1A Pending CN111696628A (en) | 2019-03-15 | 2019-03-15 | Method for identifying neoantigens |
CN202080008090.2A Active CN113316818B (en) | 2019-03-15 | 2020-03-13 | Method for identifying neoantigen |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910198582.1A Pending CN111696628A (en) | 2019-03-15 | 2019-03-15 | Method for identifying neoantigens |
Country Status (2)
Country | Link |
---|---|
CN (2) | CN111696628A (en) |
WO (1) | WO2020187143A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118351934A (en) * | 2024-04-26 | 2024-07-16 | 广州润生细胞医药科技有限责任公司 | Tumor neoantigen recognition method and system based on second-generation sequencing data |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170016075A1 (en) * | 2015-07-14 | 2017-01-19 | Personal Genome Diagnostics, Inc. | Neoantigen analysis |
WO2017024006A1 (en) * | 2015-08-03 | 2017-02-09 | The Johns Hopkins University | Personalized, allogeneic cell therapy of cancer |
US20170199961A1 (en) * | 2015-12-16 | 2017-07-13 | Gritstone Oncology, Inc. | Neoantigen Identification, Manufacture, and Use |
CN107636162A (en) * | 2015-06-01 | 2018-01-26 | 加利福尼亚技术学院 | With the composition and method of the antigen selection T cell for special group |
CN107704727A (en) * | 2017-11-03 | 2018-02-16 | 杭州风起智能科技有限公司 | Neoantigen Activity Prediction and sort method based on tumour neoantigen characteristic value |
CN108388773A (en) * | 2018-02-01 | 2018-08-10 | 杭州纽安津生物科技有限公司 | A kind of identification method of tumor neogenetic antigen |
CN108491689A (en) * | 2018-02-01 | 2018-09-04 | 杭州纽安津生物科技有限公司 | Tumour neoantigen identification method based on transcript profile |
WO2018183544A1 (en) * | 2017-03-31 | 2018-10-04 | Dana-Farber Cancer Institute, Inc. | Method for identification of retained intron tumor neoantigens from patient transcriptome |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108796055B (en) * | 2018-06-12 | 2022-04-08 | 深圳裕策生物科技有限公司 | Method, device and storage medium for detecting tumor neoantigen based on second-generation sequencing |
CN109021062B (en) * | 2018-08-06 | 2021-08-20 | 倍而达药业(苏州)有限公司 | Screening method of tumor neoantigen |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107636162A (en) * | 2015-06-01 | 2018-01-26 | 加利福尼亚技术学院 | With the composition and method of the antigen selection T cell for special group |
US20170016075A1 (en) * | 2015-07-14 | 2017-01-19 | Personal Genome Diagnostics, Inc. | Neoantigen analysis |
WO2017024006A1 (en) * | 2015-08-03 | 2017-02-09 | The Johns Hopkins University | Personalized, allogeneic cell therapy of cancer |
US20170199961A1 (en) * | 2015-12-16 | 2017-07-13 | Gritstone Oncology, Inc. | Neoantigen Identification, Manufacture, and Use |
CN108601731A (en) * | 2015-12-16 | 2018-09-28 | 磨石肿瘤生物技术公司 | Discriminating, manufacture and the use of neoantigen |
WO2018183544A1 (en) * | 2017-03-31 | 2018-10-04 | Dana-Farber Cancer Institute, Inc. | Method for identification of retained intron tumor neoantigens from patient transcriptome |
CN107704727A (en) * | 2017-11-03 | 2018-02-16 | 杭州风起智能科技有限公司 | Neoantigen activity prediction and ranking method based on tumor neoantigen feature values |
CN108388773A (en) * | 2018-02-01 | 2018-08-10 | 杭州纽安津生物科技有限公司 | Method for identifying tumor neoantigens |
CN108491689A (en) * | 2018-02-01 | 2018-09-04 | 杭州纽安津生物科技有限公司 | Transcriptome-based method for identifying tumor neoantigens |
Non-Patent Citations (2)
Title |
---|
JEANNE MENEZ-JAMET et al.: "Optimized tumor cryptic peptides: the basis for universal neoantigen-like tumor vaccines", ANNALS OF TRANSLATIONAL MEDICINE, vol. 14, no. 4, 22 April 2016 (2016-04-22), pages 1 - 11 * |
GAO ZHIBO et al.: "Precision tumor immunotherapy and genetic testing", 生物产业技术, vol. 02, 28 February 2017 (2017-02-28), pages 27 - 33 * |
Also Published As
Publication number | Publication date |
---|---|
WO2020187143A1 (en) | 2020-09-24 |
CN113316818B (en) | 2024-04-02 |
CN111696628A (en) | 2020-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7530455B2 (en) | | Identification, production, and use of neoantigens |
EP2872653B1 (en) | | Personalized cancer vaccines and adoptive immune cell therapies |
TWI765875B (en) | | Neoantigen identification, manufacture, and use |
CN108796055B (en) | | Method, device and storage medium for detecting tumor neoantigen based on second-generation sequencing |
JP7034931B2 (en) | | Improved compositions and methods for viral delivery of neoepitope and their use |
JP2019513021A (en) | | Arrangement and sequence of sequences for neoepitope presentation |
JP2019511907A (en) | | High-throughput identification of patient-specific neoepitopes as therapeutic targets for cancer immunotherapy |
KR20240023699A (en) | | Compositions and methods for viral cancer neoepitopes |
US12080382B2 (en) | | Viral neoepitopes and uses thereof |
CN110799196A (en) | | System for ranking immunogenic cancer-specific epitopes |
CN113316818B (en) | | Method for identifying neoantigen |
KR20210064229A (en) | | Epitope targeting method and system for neoantigen-based immunotherapy |
WO2023068931A1 (en) | | Cancer neoantigens |
US20220296642A1 (en) | | Methods of Making Therapeutic T Lymphocytes |
US20240321392A1 (en) | | Viral Neoepitopes and Uses Thereof |
US20230197192A1 (en) | | Selecting neoantigens for personalized cancer vaccine |
EA046410B1 (en) | | IMMUNOGENETIC SCREENING TEST FOR CANCER |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20220816; Address after: Unit 202-1, No. 817, Lianting Road, Xiang'an District, Xiamen City, Fujian Province; Applicant after: Trace Biomedical Technology (Xiamen) Co., Ltd.; Address before: 14/F, Far East Development Building, 121 Des Voeux Road Central, Hong Kong, China; Applicant before: Mark Zhun Biotechnology Co.,Ltd. |
GR01 | Patent grant | ||