CN117546242A - Protein structure-based protein language models - Google Patents

Protein structure-based protein language models

Info

Publication number
CN117546242A
Authority
CN
China
Prior art keywords
amino acid
pathogenicity
protein
amino acids
computer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280043979.3A
Other languages
English (en)
Chinese (zh)
Inventor
T. Hamp
H. Gao
K-H. Farh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inmair Ltd
Original Assignee
Inmair Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/533,091 (US11538555B1)
Application filed by Inmair Ltd
Priority claimed from PCT/US2022/045825 (WO2023059752A1)
Publication of CN117546242A
Current legal status: Pending

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 - ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 - Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 - Supervised data analysis
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 - ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/20 - Heterogeneous data integration

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Probability & Statistics with Applications (AREA)
  • Physiology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
CN202280043979.3A 2021-10-06 2022-10-05 Protein structure-based protein language models Pending CN117546242A (zh)

Applications Claiming Priority (13)

Application Number Priority Date Filing Date Title
US202163253122P 2021-10-06 2021-10-06
US63/253122 2021-10-06
US202163281579P 2021-11-19 2021-11-19
US202163281592P 2021-11-19 2021-11-19
US63/281579 2021-11-19
US63/281592 2021-11-19
US17/533091 2021-11-22
US17/533,091 US11538555B1 (en) 2021-10-06 2021-11-22 Protein structure-based protein language models
US17/953293 2022-09-26
US17/953286 2022-09-26
US17/953,286 US20230108241A1 (en) 2021-10-06 2022-09-26 Predicting variant pathogenicity from evolutionary conservation using three-dimensional (3d) protein structure voxels
US17/953,293 US20230108368A1 (en) 2021-10-06 2022-09-26 Combined and transfer learning of a variant pathogenicity predictor using gapped and non-gapped protein samples
PCT/US2022/045825 WO2023059752A1 (en) 2021-10-06 2022-10-05 Protein structure-based protein language models

Publications (1)

Publication Number Publication Date
CN117546242A (zh) 2024-02-09

Family

ID=89808344

Family Applications (2)

Application Number Title Priority Date Filing Date
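CN202280043979.3A Pending CN117546242A (zh) 2021-10-06 2022-10-05 Protein structure-based protein language models
CN202280046302.5A Pending CN117642824A (zh) 2021-10-06 2022-10-05 Predicting variant pathogenicity from evolutionary conservation using three-dimensional (3D) protein structure voxels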

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202280046302.5A Pending CN117642824A (zh) 2021-10-06 2022-10-05 Predicting variant pathogenicity from evolutionary conservation using three-dimensional (3D) protein structure voxels

Country Status (4)

Country Link
EP (3) EP4413577A1
JP (3) JP2024538478A
KR (3) KR20240088641A
CN (2) CN117546242A

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
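CN117121110A (zh) * 2021-04-15 2023-11-24 Illumina, Inc. Efficient voxelization for deep learning
CN118629516A (zh) * 2024-05-17 2024-09-10 Anhui Agricultural University Neuropeptide prediction method and system based on multimodal features and a Siamese network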

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
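CN119560009B (zh) * 2025-01-22 2025-06-24 Zhejiang University of Technology System and method for predicting associations between protein post-translational modifications and diseases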

Also Published As

Publication number Publication date
JP2024538478A (ja) 2024-10-23
EP4413575A1 (en) 2024-08-14
KR20240082270A (ko) 2024-06-10
KR20240082269A (ko) 2024-06-10
EP4413577A1 (en) 2024-08-14
EP4413576A1 (en) 2024-08-14
CN117642824A (zh) 2024-03-01
KR20240088641A (ko) 2024-06-20
JP2024538477A (ja) 2024-10-23
JP2024538475A (ja) 2024-10-23

Similar Documents

Publication Publication Date Title
US12444482B2 (en) Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks
CN117546242A (zh) Protein structure-based protein language models
US11515010B2 (en) Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3D) protein structures
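KR20230170680A (ko) Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks
JP7755105B2 (ja) Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3D) protein structures
CN117581302A (zh) Combined and transfer learning of a variant pathogenicity predictor using gapped and non-gapped protein samples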
US20230343413A1 (en) Protein structure-based protein language models
US20230108368A1 (en) Combined and transfer learning of a variant pathogenicity predictor using gapped and non-gapped protein samples
US20230047347A1 (en) Deep neural network-based variant pathogenicity prediction
CN117178327A (zh) Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks
WO2023059750A1 (en) Combined and transfer learning of a variant pathogenicity predictor using gapped and non-gapped protein samples

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination