IL311528A - Gan-cnn for mhc peptide binding prediction - Google Patents

Gan-cnn for mhc peptide binding prediction

Info

Publication number
IL311528A
Authority
IL
Israel
Prior art keywords
mhc
polypeptide
positive
cnn
gan
Prior art date
Application number
IL311528A
Other languages
Hebrew (he)
Original Assignee
Regeneron Pharma
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Regeneron Pharma
Publication of IL311528A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/094 Adversarial learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/40 Searching chemical structures or physicochemical data
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C60/00 Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C99/00 Subject matter not provided for in other groups of this subclass

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Genetics & Genomics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Epidemiology (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Peptides Or Proteins (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Claims (16)

1. A computer-implemented method for classifying data, comprising: presenting, by a computing device, a dataset to a convolutional neural network (CNN), wherein the dataset comprises a plurality of candidate polypeptide-MHC-I interactions, and wherein the CNN is trained based on positive simulated polypeptide-major histocompatibility complex class I (MHC-I) interaction data, positive real polypeptide-MHC-I interaction data, and negative real polypeptide-MHC-I interaction data; and classifying, by the CNN, at least one candidate polypeptide-MHC-I interaction of the plurality of candidate polypeptide-MHC-I interactions as positive or negative.
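Illustrative note on claim 1 (not part of the claims): the claim does not say how a candidate interaction is encoded or scored. The sketch below assumes a one-hot encoding over the 20 standard amino acids, a single convolution kernel followed by max pooling, and a 0.5 decision threshold. The names `one_hot`, `conv1d_max`, and `classify`, the kernel weights, the bias, and the threshold are all hypothetical stand-ins for a trained CNN.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)


def one_hot(peptide):
    """Encode a peptide as a list of 20-dimensional one-hot rows."""
    rows = []
    for residue in peptide:
        row = [0.0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(residue)] = 1.0  # ValueError on unknown residues
        rows.append(row)
    return rows


def conv1d_max(encoded, kernel):
    """Slide a (width x 20) kernel along the peptide and keep the max response."""
    width = len(kernel)
    responses = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            row = encoded[start + offset]
            total += sum(x * w for x, w in zip(row, kernel[offset]))
        responses.append(total)
    return max(responses)


def classify(peptide, kernel, bias=0.0, threshold=0.5):
    """Return (label, score) for one candidate; the threshold is an assumption."""
    activation = conv1d_max(one_hot(peptide), kernel) + bias
    score = 1.0 / (1.0 + math.exp(-activation))  # sigmoid
    label = "positive" if score >= threshold else "negative"
    return label, score
```

A real model would learn many kernels from the training data described in the claim; the single kernel here only shows where the positive or negative decision comes from.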
2. The computer-implemented method of claim 1, further comprising:
    a. generating, via a GAN generator, increasingly accurate positive simulated polypeptide-MHC-I interaction data until a GAN discriminator classifies the positive simulated polypeptide-MHC-I interaction data as positive;
    b. presenting the positive simulated polypeptide-MHC-I interaction data, positive real polypeptide-MHC-I interaction data, and negative real polypeptide-MHC-I interaction data to the CNN, until the CNN classifies each type of data as positive or negative;
    c. presenting the positive real data and the negative real data to the CNN to generate prediction scores; and
    d. determining based on the prediction scores, whether the GAN is trained or not trained, and when the GAN is not trained, repeating steps a-c until a determination is made, based on the prediction scores, that the GAN is trained.
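Illustrative note on claim 2 (not part of the claims): steps a-d form an outer loop around GAN training, CNN training, and scoring. The sketch below shows only that control flow. The function name `train_until_accepted`, its four callable arguments, and the `max_rounds` safety cap are hypothetical; the claim itself simply repeats until the GAN is determined to be trained.

```python
def train_until_accepted(train_gan, train_cnn, score_real_data,
                         gan_is_trained, max_rounds=10):
    """Repeat steps a-c until step d accepts the prediction scores."""
    for round_number in range(1, max_rounds + 1):
        generator = train_gan()            # step a: fit the generator/discriminator
        cnn = train_cnn(generator)         # step b: fit the CNN on simulated + real data
        scores = score_real_data(cnn)      # step c: prediction scores on real data
        if gan_is_trained(scores):         # step d: decide from the scores
            return generator, cnn, round_number
    # max_rounds is an added safeguard, not something the claim recites
    raise RuntimeError(f"GAN not accepted as trained after {max_rounds} rounds")
```

Passing the four stages in as callables keeps the loop independent of any particular GAN or CNN library.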
3. The computer-implemented method of claim 2, wherein generating the increasingly accurate positive simulated polypeptide-MHC-I interaction data until the GAN discriminator classifies the positive simulated polypeptide-MHC-I interaction data as positive comprises:
    e. generating, by the GAN generator according to a set of GAN parameters, a first simulated dataset comprising simulated positive polypeptide-MHC-I interactions for a MHC allele;
    f. combining the first simulated dataset with the positive real polypeptide-MHC-I interactions for the MHC allele, and the negative real polypeptide-MHC-I interactions for the MHC allele to create a GAN training dataset;
    g. determining, by a discriminator according to a decision boundary, whether a respective polypeptide-MHC-I interaction for the MHC allele in the GAN training dataset is simulated positive, real positive, or real negative;
    h. adjusting, based on accuracy of the determination by the discriminator, one or more of the set of GAN parameters or the decision boundary; and
    i. repeating steps e-h until a first stop criterion is satisfied.
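Illustrative note on claim 3 (not part of the claims): steps e-i can be read as a loop that builds a three-way labelled dataset, measures how well the discriminator tells the three sources apart, and adjusts parameters. The sketch below assumes string labels and callables for the generator, discriminator, adjustment, and stop criterion. Every name, and the `max_iters` cap, is hypothetical.

```python
SIMULATED_POSITIVE = "simulated_positive"
REAL_POSITIVE = "real_positive"
REAL_NEGATIVE = "real_negative"


def build_gan_training_set(simulated_pos, real_pos, real_neg):
    """Step f: merge the three sources, keeping a source label per interaction."""
    return ([(x, SIMULATED_POSITIVE) for x in simulated_pos]
            + [(x, REAL_POSITIVE) for x in real_pos]
            + [(x, REAL_NEGATIVE) for x in real_neg])


def discriminator_accuracy(dataset, discriminate):
    """Step g: fraction of interactions whose source is identified correctly."""
    correct = sum(1 for x, label in dataset if discriminate(x) == label)
    return correct / len(dataset)


def train_gan(generate, discriminate, adjust, real_pos, real_neg,
              stop, max_iters=1000):
    """Steps e-i: generate, combine, discriminate, adjust, repeat."""
    accuracy = 0.0
    for iteration in range(1, max_iters + 1):
        simulated = generate()                                            # step e
        dataset = build_gan_training_set(simulated, real_pos, real_neg)   # step f
        accuracy = discriminator_accuracy(dataset, discriminate)          # step g
        adjust(accuracy)                                                  # step h
        if stop(iteration, accuracy):                                     # step i
            return iteration, accuracy
    return max_iters, accuracy  # cap reached; the cap is an added safeguard
```

The claim leaves the first stop criterion open, so `stop` receives both the iteration count and the accuracy and can implement either kind of rule.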
4. The computer-implemented method of claim 3, wherein presenting the positive simulated polypeptide-MHC-I interaction data, the positive real polypeptide-MHC-I interaction data, and the negative real polypeptide-MHC-I interaction data to the convolutional neural network (CNN), until the CNN classifies respective polypeptide-MHC-I interaction data as positive or negative comprises:
    j. generating, by the GAN generator according to the set of GAN parameters, a second simulated dataset comprising simulated positive polypeptide-MHC-I interactions for the MHC allele;
    k. combining the second simulated dataset, the positive real polypeptide-MHC-I interactions for the MHC allele, and the negative real polypeptide-MHC-I interactions for the MHC allele to create a CNN training dataset;
    l. presenting the CNN training dataset to the convolutional neural network (CNN);
    m. classifying, by the CNN according to a set of CNN parameters, a respective polypeptide-MHC-I interaction for the MHC allele in the CNN training dataset as positive or negative;
    n. adjusting, based on accuracy of the classification by the CNN, one or more of the set of CNN parameters; and
    o. repeating steps l-n until a second stop criterion is satisfied.
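Illustrative note on claim 4 (not part of the claims): steps k-o describe a present, classify, adjust loop. To keep the sketch dependency-free, it swaps the CNN for a logistic-regression classifier over flattened one-hot features; that substitution, the binary labelling of simulated positives as positive, the learning rate, and the accuracy-based stop rule are all assumptions, and every function name is hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def featurize(peptide):
    """Flattened one-hot encoding (assumes all peptides share one length)."""
    features = []
    for residue in peptide:
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return features


def build_cnn_training_set(simulated_pos, real_pos, real_neg):
    """Step k: simulated and real positives get label 1, real negatives 0."""
    return ([(p, 1) for p in simulated_pos] + [(p, 1) for p in real_pos]
            + [(p, 0) for p in real_neg])


def predict(weights, bias, peptide):
    z = bias + sum(w * x for w, x in zip(weights, featurize(peptide)))
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(dataset, learning_rate=0.5, target_accuracy=1.0,
                     max_epochs=200):
    """Steps l-o with a logistic-regression stand-in for the CNN."""
    weights = [0.0] * len(featurize(dataset[0][0]))
    bias, accuracy = 0.0, 0.0
    for _ in range(max_epochs):
        correct = 0
        for peptide, label in dataset:                     # step l: present
            prob = predict(weights, bias, peptide)         # step m: classify
            correct += int((prob >= 0.5) == bool(label))
            error = label - prob                           # step n: adjust
            for i, x in enumerate(featurize(peptide)):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
        accuracy = correct / len(dataset)  # measured while the weights update
        if accuracy >= target_accuracy:                    # step o: stop rule
            break
    return weights, bias, accuracy
```

Swapping `predict` and the update in step n for a convolutional model would leave the surrounding loop unchanged.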
5. The computer-implemented method of claim 4, wherein presenting the positive real polypeptide-MHC-I interaction data and the negative real polypeptide-MHC-I interaction data to the CNN to generate prediction scores comprises: classifying, by the CNN according to the set of CNN parameters, a respective polypeptide-MHC-I interaction for the MHC allele as positive or negative.
6. The computer-implemented method of claim 5, wherein determining, based on the prediction scores, whether the GAN is trained comprises determining accuracy of the classification by the CNN, wherein when the accuracy of the classification satisfies a third stop criterion, outputting the GAN and the CNN.
7. The computer-implemented method of claim 5, wherein determining, based on the prediction scores, whether the GAN is trained comprises determining accuracy of the classification by the CNN, wherein when the accuracy of the classification does not satisfy a third stop criterion, returning to step a.
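Illustrative note on claims 5-7 (not part of the claims): the prediction scores on real data are turned into an accuracy, which is then compared with the third stop criterion. The sketch assumes scores in [0, 1], a 0.5 classification threshold, and a minimum-accuracy criterion of 0.9. These values and the names `classification_accuracy` and `next_action` are hypothetical.

```python
def classification_accuracy(scores, labels, threshold=0.5):
    """Fraction of real interactions the CNN classifies correctly."""
    predictions = [score >= threshold for score in scores]
    hits = sum(1 for pred, label in zip(predictions, labels)
               if pred == bool(label))
    return hits / len(labels)


def next_action(scores, labels, min_accuracy=0.9):
    """Claim 6: output the GAN and CNN; claim 7: return to step a."""
    if classification_accuracy(scores, labels) >= min_accuracy:
        return "output_gan_and_cnn"
    return "return_to_step_a"
```

For example, scores of 0.9, 0.8, 0.2, and 0.1 against labels 1, 1, 0, 0 give an accuracy of 1.0 and the "output" branch under these assumed thresholds.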
8. The computer-implemented method of claim 3, wherein the set of GAN parameters comprises one or more of allele type, allele length, generating category, model complexity, learning rate, or batch size.
9. The computer-implemented method of claim 8, wherein the allele type comprises one or more of HLA-A, HLA-B, HLA-C, or a subtype thereof.
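Illustrative note on claims 8 and 9 (not part of the claims): the listed GAN parameters could be grouped in a small configuration record. The field types, the validation rules, and the example values below are assumptions; the claims name the parameters but give neither types nor values, and they require only "one or more" of them.

```python
from dataclasses import dataclass

HLA_PREFIXES = ("HLA-A", "HLA-B", "HLA-C")  # allele types named in claim 9


@dataclass(frozen=True)
class GanParameters:
    allele_type: str          # e.g. an HLA-A, HLA-B, or HLA-C subtype
    allele_length: int        # "allele length" as worded in claim 8
    generating_category: str  # free-text category (type assumed)
    model_complexity: int     # assumed to be an integer knob
    learning_rate: float
    batch_size: int

    def __post_init__(self):
        if not self.allele_type.startswith(HLA_PREFIXES):
            raise ValueError(f"unsupported allele type: {self.allele_type}")
        if self.learning_rate <= 0 or self.batch_size <= 0:
            raise ValueError("learning_rate and batch_size must be positive")
```

A hypothetical instance would be `GanParameters("HLA-A*02:01", 9, "binding", 2, 1e-4, 64)`; none of these values comes from the patent.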
10. The computer-implemented method of claim 1, further comprising: synthesizing a polypeptide from the at least one candidate polypeptide-MHC-I interaction classified as a positive polypeptide-MHC-I interaction.
11. A polypeptide produced by the method of claim 10.
12. The computer-implemented method of claim 10, wherein the polypeptide is a tumor specific antigen.
13. The computer-implemented method of claim 10, wherein the polypeptide comprises an amino acid sequence that specifically binds to an MHC-I protein encoded by a selected MHC allele.
14. The computer-implemented method of claim 2, wherein generating the increasingly accurate positive simulated polypeptide-MHC-I interaction data until the GAN discriminator classifies the positive simulated polypeptide-MHC-I interaction data as positive comprises: iteratively executing the GAN discriminator in order to increase a likelihood of giving a high probability to positive real polypeptide-MHC-I interaction data, a low probability to the positive simulated polypeptide-MHC-I interaction data, and a low probability to the negative real polypeptide-MHC-I interaction data; and iteratively executing the GAN generator in order to increase a probability of the positive simulated polypeptide-MHC-I interaction data being rated highly.
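Illustrative note on claim 14 (not part of the claims): one common way to express these two goals is with log-likelihood objectives, as in a standard GAN extended with a real-negative term. The sketch below assumes that formulation; the claim does not recite a specific loss. Inputs are discriminator output probabilities strictly between 0 and 1, and both function names are hypothetical.

```python
import math


def discriminator_objective(p_real_pos, p_sim_pos, p_real_neg):
    """Mean log-likelihood the discriminator tries to increase:
    high probability for real positives, low for simulated positives
    and for real negatives."""
    terms = ([math.log(p) for p in p_real_pos]
             + [math.log(1.0 - p) for p in p_sim_pos]
             + [math.log(1.0 - p) for p in p_real_neg])
    return sum(terms) / len(terms)


def generator_objective(p_sim_pos):
    """Mean log-probability the generator tries to increase: simulated
    positives being rated highly by the discriminator."""
    return sum(math.log(p) for p in p_sim_pos) / len(p_sim_pos)
```

The two objectives pull in opposite directions on the simulated data, which is the adversarial element of the alternating updates described in the claim.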
15. An apparatus configured for performing the method of any one of claims 1- and 14.
16. A computer readable medium (CRM) configured for performing the method of any one of claims 1-9 and 14.

For the Applicants, REINHOLD COHN AND PARTNERS
By: Dr. Sheila Zrihan-Licht, Patent Attorney, Partner SZR
IL311528A 2018-02-17 2019-02-18 Gan-cnn for mhc peptide binding prediction IL311528A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862631710P 2018-02-17 2018-02-17
PCT/US2019/018434 WO2019161342A1 (en) 2018-02-17 2019-02-18 Gan-cnn for mhc peptide binding prediction

Publications (1)

Publication Number Publication Date
IL311528A (en) 2024-05-01

Family

ID=65686006

Family Applications (2)

Application Number Title Priority Date Filing Date
IL311528A IL311528A (en) 2018-02-17 2019-02-18 Gan-cnn for mhc peptide binding prediction
IL276730A IL276730B1 (en) 2018-02-17 2019-02-18 Gan-cnn for mhc peptide binding prediction

Family Applications After (1)

Application Number Title Priority Date Filing Date
IL276730A IL276730B1 (en) 2018-02-17 2019-02-18 Gan-cnn for mhc peptide binding prediction

Country Status (11)

Country Link
US (1) US20190259474A1 (en)
EP (1) EP3753022A1 (en)
JP (2) JP7047115B2 (en)
KR (2) KR102607567B1 (en)
CN (1) CN112119464A (en)
AU (2) AU2019221793B2 (en)
CA (1) CA3091480A1 (en)
IL (2) IL311528A (en)
MX (1) MX2020008597A (en)
SG (1) SG11202007854QA (en)
WO (1) WO2019161342A1 (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201718756D0 (en) * 2017-11-13 2017-12-27 Cambridge Bio-Augmentation Systems Ltd Neural interface
US10706534B2 (en) * 2017-07-26 2020-07-07 Scott Anderson Middlebrooks Method and apparatus for classifying a data point in imaging data
US11704573B2 (en) * 2019-03-25 2023-07-18 Here Global B.V. Method, apparatus, and computer program product for identifying and compensating content contributors
US20200379814A1 (en) * 2019-05-29 2020-12-03 Advanced Micro Devices, Inc. Computer resource scheduling using generative adversarial networks
CA3142888A1 (en) * 2019-06-12 2020-12-17 Quantum-Si Incorporated Techniques for protein identification using machine learning and related systems and methods
CN110598786B (en) * 2019-09-09 2022-01-07 京东方科技集团股份有限公司 Neural network training method, semantic classification method and semantic classification device
US20210150270A1 (en) * 2019-11-19 2021-05-20 International Business Machines Corporation Mathematical function defined natural language annotation
CN110875790A (en) * 2019-11-19 2020-03-10 上海大学 Wireless channel modeling implementation method based on generation countermeasure network
WO2021099584A1 (en) * 2019-11-22 2021-05-27 F. Hoffmann-La Roche Ag Multiple instance learner for tissue image classification
JP7419534B2 (en) * 2019-12-12 2024-01-22 ジャスト-エヴォテック バイオロジクス,インコーポレイテッド Generation of protein sequences using machine learning techniques based on template protein sequences
CN111063391B (en) * 2019-12-20 2023-04-25 海南大学 Non-culturable microorganism screening system based on generation type countermeasure network principle
CN111402113B (en) * 2020-03-09 2021-10-15 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
WO2021195155A1 (en) * 2020-03-23 2021-09-30 Genentech, Inc. Estimating pharmacokinetic parameters using deep learning
US20210295173A1 (en) * 2020-03-23 2021-09-23 Samsung Electronics Co., Ltd. Method and apparatus for data-free network quantization and compression with adversarial knowledge distillation
US10902291B1 (en) * 2020-08-04 2021-01-26 Superb Ai Co., Ltd. Methods for training auto labeling device and performing auto labeling related to segmentation while performing automatic verification by using uncertainty scores and devices using the same
US10885387B1 (en) * 2020-08-04 2021-01-05 SUPERB AI CO., LTD. Methods for training auto-labeling device and performing auto-labeling by using hybrid classification and devices using the same
EP4205125A4 (en) * 2020-08-28 2024-02-21 Just-Evotec Biologics, Inc. Implementing a generative machine learning architecture to produce training data for a classification model
CN112597705B (en) * 2020-12-28 2022-05-24 哈尔滨工业大学 Multi-feature health factor fusion method based on SCVNN
CN112309497B (en) * 2020-12-28 2021-04-02 武汉金开瑞生物工程有限公司 Method and device for predicting protein structure based on Cycle-GAN
KR102519341B1 (en) * 2021-03-18 2023-04-06 재단법인한국조선해양기자재연구원 Early detection system for uneven tire wear by real-time noise analysis and method thereof
US20220328127A1 (en) * 2021-04-05 2022-10-13 Nec Laboratories America, Inc. Peptide based vaccine generation system with dual projection generative adversarial networks
US20220319635A1 (en) * 2021-04-05 2022-10-06 Nec Laboratories America, Inc. Generating minority-class examples for training data
US20230083313A1 (en) * 2021-09-13 2023-03-16 Nec Laboratories America, Inc. Peptide search system for immunotherapy
KR102507111B1 (en) * 2022-03-29 2023-03-07 주식회사 네오젠티씨 Apparatus and method for determining reliability of immunopeptidome information

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8121797B2 (en) * 2007-01-12 2012-02-21 Microsoft Corporation T-cell epitope prediction
US9805305B2 (en) * 2015-08-07 2017-10-31 Yahoo Holdings, Inc. Boosted deep convolutional neural networks (CNNs)
WO2018022752A1 (en) 2016-07-27 2018-02-01 James R. Glidewell Dental Ceramics, Inc. Dental cad automation using deep learning
CN106845471A (en) * 2017-02-20 2017-06-13 深圳市唯特视科技有限公司 A kind of vision significance Forecasting Methodology based on generation confrontation network
CN107590518A (en) 2017-08-14 2018-01-16 华南理工大学 A kind of confrontation network training method of multiple features study

Also Published As

Publication number Publication date
MX2020008597A (en) 2020-12-11
JP7459159B2 (en) 2024-04-01
RU2020130420A (en) 2022-03-17
CA3091480A1 (en) 2019-08-22
RU2020130420A3 (en) 2022-03-17
AU2022221568A1 (en) 2022-09-22
EP3753022A1 (en) 2020-12-23
AU2019221793A1 (en) 2020-09-17
KR20200125948A (en) 2020-11-05
AU2019221793B2 (en) 2022-09-15
IL276730B1 (en) 2024-04-01
KR20230164757A (en) 2023-12-04
SG11202007854QA (en) 2020-09-29
KR102607567B1 (en) 2023-12-01
AU2022221568B2 (en) 2024-06-13
WO2019161342A1 (en) 2019-08-22
US20190259474A1 (en) 2019-08-22
JP7047115B2 (en) 2022-04-04
JP2022101551A (en) 2022-07-06
CN112119464A (en) 2020-12-22
IL276730A (en) 2020-09-30
JP2021514086A (en) 2021-06-03

Similar Documents

Publication Publication Date Title
IL311528A (en) Gan-cnn for mhc peptide binding prediction
US11348570B2 (en) Method for generating style statement, method and apparatus for training model, and computer device
US10832685B2 (en) Speech processing device, speech processing method, and computer program product
AU2018232914A1 (en) Techniques for correcting linguistic training bias in training data
WO2016037350A1 (en) Learning student dnn via output distribution
KR20210124111A (en) Method and apparatus for training model, device, medium and program product
JP2019028839A (en) Classifier, method for learning of classifier, and method for classification by classifier
WO2022116440A1 (en) Model training method, apparatus and device
US11545238B2 (en) Machine learning method for protein modelling to design engineered peptides
US20200234196A1 (en) Machine learning method, computer-readable recording medium, and machine learning apparatus
WO2023088174A1 (en) Target detection method and apparatus
Moryossef et al. Improving quality and efficiency in plan-based neural data-to-text generation
US20200364520A1 (en) Counter rare training date for artificial intelligence
EP4162408A1 (en) Method and apparatus for enhancing performance of machine learning classification task
Zou et al. SVM learning from imbalanced data by GA sampling for protein domain prediction
WO2021234365A1 (en) Optimising a neural network
RU2022120739A (en) GAN-CNN FOR MHC-PEPTIDE BINDING PREDICTION
JP2017083990A5 (en)
CN115497564A (en) Antigen identification model establishing method and antigen identification method
JP7161974B2 (en) Quality control method
US12014728B2 (en) Dynamic combination of acoustic model states
US11238275B2 (en) Computer vision image feature identification via multi-label few-shot model
CN110162629B (en) Text classification method based on multi-base model framework
CN110515837B (en) Test case sequencing method based on EFSM model and cluster analysis
JP2021197164A (en) Information processing device, information processing method, and computer readable storage media