IL307671A - Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3d) protein structures

Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3d) protein structures

Info

Publication number
IL307671A
Authority
IL
Israel
Prior art keywords
amino acid
amino acids
voxel
voxels
nearest
Prior art date
Application number
IL307671A
Other languages
Hebrew (he)
Original Assignee
Illumina Inc
Illumina Cambridge Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/232,056 (US20220336054A1)
Priority claimed from US17/703,958 (US20220336057A1)
Application filed by Illumina Inc, Illumina Cambridge Ltd
Publication of IL307671A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical & Material Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Genetics & Genomics (AREA)
  • Epidemiology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Analytical Chemistry (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Peptides Or Proteins (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Claims (20)

1. A system, comprising: memory storing amino acid-wise distance channels for a plurality of amino acids in an amino acid sequence of a protein, wherein each of the amino acid-wise distance channels has voxel-wise distance values for voxels in a plurality of voxels, and wherein the voxel-wise distance values specify distances from corresponding voxels in the plurality of voxels to atoms of corresponding amino acids in the plurality of amino acids; and a neural network-based variant pathogenicity classifier, running on at least one processor coupled to the memory, wherein the neural network-based variant pathogenicity classifier is trained to: process as input a tensor that includes the amino acid-wise distance channels and an alternative allele amino acid of the protein expressed by a variant, and classify the variant as benign or pathogenic based at least in part on the tensor.
2. The system of claim 1, further comprising a distance channels generator, running on at least one processor coupled to the memory, that centers a voxel grid of the voxels on an alpha-carbon atom of respective residues of the corresponding amino acids and calculates the voxel-wise distance values by specifying a distance between centers of the voxels in the voxel grid and the atoms of the corresponding amino acids.
3. The system of claim 2, wherein the distance channels generator centers the voxel grid on an alpha-carbon atom of a residue of a particular amino acid that corresponds to at least one variant amino acid in the protein.
4. The system of claim 3, further configured to encode, in the tensor, a directionality of the corresponding amino acids and a position of the particular amino acid by multiplying, with a directionality parameter, voxel-wise distance values for preceding amino acids that precede the particular amino acid.
5. The system of claim 3, wherein the distances are nearest-atom distances from corresponding voxel centers in the voxel grid to nearest atoms of the corresponding amino acids.
6. The system of claim 5, wherein the corresponding amino acids have alpha-carbon atoms, wherein the distances are nearest-alpha-carbon atom distances from the corresponding voxel centers to nearest alpha-carbon atoms of the corresponding amino acids.
7. The system of claim 5, wherein the corresponding amino acids have beta-carbon atoms, wherein the distances are nearest-beta-carbon atom distances from the corresponding voxel centers to nearest beta-carbon atoms of the corresponding amino acids.
8. The system of claim 5, wherein the corresponding amino acids have backbone atoms, wherein the distances are nearest-backbone atom distances from the corresponding voxel centers to nearest backbone atoms of the corresponding amino acids.
9. The system of claim 3, further configured to encode, in the tensor, a nearest atom channel that specifies a distance from each voxel to a nearest atom, wherein the nearest atom is selected irrespective of an amino acid to which the nearest atom belongs and atomic elements of the amino acid.
10. The system of claim 1, wherein the tensor further includes evolutionary profiles that specify conservation levels of the corresponding amino acids across a plurality of species with sequences that are homologous to the amino acid sequence of the protein.
11. The system of claim 10, further comprising an evolutionary profiles generator, running on at least one processor coupled to the memory, that, for each of the voxels, uses a multi-sequence alignment to determine pan-amino acid conservation frequencies, selects a nearest atom across the plurality of amino acids and atom categories, selects a pan-amino acid conservation frequencies sequence for a residue of an amino acid that includes the nearest atom, voxelizes the pan-amino acid conservation frequencies for the residue of the amino acid, and makes the pan-amino acid conservation frequencies sequence available as one of the evolutionary profiles.
12. The system of claim 11, wherein the pan-amino acid conservation frequencies sequence is configured for a particular position of the residue as observed in the plurality of species.
13. The system of claim 11, wherein the evolutionary profiles generator, for each of the voxels, uses the multi-sequence alignment to determine per-amino acid conservation frequencies, selects respective nearest atoms in respective ones of the plurality of amino acids, selects respective per-amino acid conservation frequencies for respective residues of the plurality of amino acids that include the respective nearest atoms, voxelizes the per-amino acid conservation frequencies for the respective residues of the plurality of amino acids, and makes the per-amino acid conservation frequencies available as one of the evolutionary profiles.
14. A computer-implemented method, comprising: storing amino acid-wise distance channels for a plurality of amino acids in an amino acid sequence of a protein, wherein each of the amino acid-wise distance channels has voxel-wise distance values for voxels in a plurality of voxels, and wherein the voxel-wise distance values specify distances from corresponding voxels in the plurality of voxels to atoms of corresponding amino acids in the plurality of amino acids; processing as input a tensor that includes the amino acid-wise distance channels and an alternative allele amino acid of the protein expressed by a variant; and classifying the variant as benign or pathogenic based at least in part on the tensor.
15. The computer-implemented method of claim 14, wherein the tensor further includes an absentee atom channel that specifies atoms not found within a predefined radius of a voxel center, wherein the absentee atom channel is one-hot encoded.
16. The computer-implemented method of claim 14, wherein the tensor further includes a one-hot encoding of the alternative allele amino acid that is voxel-wise encoded to each of the amino acid-wise distance channels.
17. The computer-implemented method of claim 14, wherein the tensor further includes a reference allele amino acid in the amino acid sequence of the protein.
18. The computer-implemented method of claim 17, wherein the tensor further includes a one-hot encoding of the reference allele amino acid that is voxel-wise encoded to each of the amino acid-wise distance channels.
19. The computer-implemented method of claim 14, wherein the tensor further includes one or more of: annotation channels for the corresponding amino acids that annotate characteristics of the corresponding amino acids, wherein the annotation channels are one-hot encoded in the tensor; or structure confidence channels for the corresponding amino acids that specify quality of respective structures of the corresponding amino acids.
20. A non-transitory computer readable medium storing instructions that, when executed by at least a processor, cause a system to perform actions comprising: storing amino acid-wise distance channels for a plurality of amino acids in an amino acid sequence of a protein, wherein each of the amino acid-wise distance channels has voxel-wise distance values for voxels in a plurality of voxels, and wherein the voxel-wise distance values specify distances from corresponding voxels in the plurality of voxels to atoms of corresponding amino acids in the plurality of amino acids; processing as input a tensor that includes the amino acid-wise distance channels and an alternative allele amino acid of the protein expressed by a variant; and classifying the variant as benign or pathogenic based at least in part on the tensor.
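
Illustrative sketch (not part of the claims): the input tensor recited in claims 1, 2, 5, 9 and 16 can be pictured with a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation. The function names, the 3x3x3 grid, the 2.0 voxel spacing, the amino acid ordering, the use of infinity for amino acids that have no atoms in the structure, and the choice to append the alternative-allele one-hot vector to every voxel are all hypothetical choices made for illustration.

```python
import math
from itertools import product

# Assumption: the 20 standard amino acids in one-letter code; the order is arbitrary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def voxel_centers(center, grid_size=3, voxel_size=2.0):
    """Centers of a cubic voxel grid centered on `center` (e.g. an alpha-carbon)."""
    half = (grid_size - 1) / 2.0
    offsets = [(i - half) * voxel_size for i in range(grid_size)]
    return [
        (center[0] + dx, center[1] + dy, center[2] + dz)
        for dx, dy, dz in product(offsets, repeat=3)
    ]


def distance_channels(atoms, centers):
    """One channel per amino acid: nearest-atom distance from each voxel center.

    `atoms` is a list of (amino_acid_letter, (x, y, z)) pairs.
    Amino acids with no atoms keep math.inf (an illustrative placeholder).
    """
    channels = {aa: [math.inf] * len(centers) for aa in AMINO_ACIDS}
    for aa, position in atoms:
        channel = channels[aa]
        for v, voxel_center in enumerate(centers):
            d = math.dist(voxel_center, position)
            if d < channel[v]:
                channel[v] = d
    return channels


def nearest_atom_channel(channels, n_voxels):
    """Distance from each voxel to its nearest atom, regardless of amino acid."""
    return [min(channels[aa][v] for aa in AMINO_ACIDS) for v in range(n_voxels)]


def one_hot(amino_acid):
    """One-hot encoding of a single amino acid over AMINO_ACIDS."""
    return [1.0 if aa == amino_acid else 0.0 for aa in AMINO_ACIDS]


def build_tensor(atoms, center, alt_amino_acid, grid_size=3, voxel_size=2.0):
    """Per voxel: 20 distance values followed by the alternative-allele one-hot."""
    centers = voxel_centers(center, grid_size, voxel_size)
    channels = distance_channels(atoms, centers)
    alt = one_hot(alt_amino_acid)
    return [
        [channels[aa][v] for aa in AMINO_ACIDS] + alt
        for v in range(len(centers))
    ]


# Toy structure: two alanine atoms and one glycine atom (made-up coordinates).
toy_atoms = [("A", (0.0, 0.0, 0.0)), ("G", (2.0, 0.0, 0.0)), ("A", (0.0, 2.0, 0.0))]
toy_tensor = build_tensor(toy_atoms, center=(0.0, 0.0, 0.0), alt_amino_acid="V")
```

In this sketch each voxel carries 40 values: 20 nearest-atom distances, one per amino acid, followed by the 20-element one-hot vector of the alternative allele. A classifier such as the convolutional network recited in claim 1 would consume a tensor of this kind after reshaping it to the grid dimensions; the network itself is outside the scope of the sketch.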

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US202163175495P 2021-04-15 2021-04-15
US17/232,056 US20220336054A1 (en) 2021-04-15 2021-04-15 Deep Convolutional Neural Networks to Predict Variant Pathogenicity using Three-Dimensional (3D) Protein Structures
US202163175767P 2021-04-16 2021-04-16
US17/468,411 US11515010B2 (en) 2021-04-15 2021-09-07 Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3D) protein structures
US17/703,958 US20220336057A1 (en) 2021-04-15 2022-03-24 Efficient voxelization for deep learning
US17/703,935 US20220336056A1 (en) 2021-04-15 2022-03-24 Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks
PCT/US2022/024913 WO2022221589A1 (en) 2021-04-15 2022-04-14 Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3d) protein structures

Publications (1)

Publication Number Publication Date
IL307671A (en) 2023-12-01

Family

ID=81580106

Family Applications (1)

Application Number Title Priority Date Filing Date
IL307671A IL307671A (en) 2021-04-15 2022-04-14 Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3d) protein structures

Country Status (8)

Country Link
EP (1) EP4323990A1 (en)
JP (1) JP2024513994A (en)
KR (1) KR20230171930A (en)
AU (1) AU2022256491A1 (en)
BR (1) BR112023021302A2 (en)
CA (1) CA3215462A1 (en)
IL (1) IL307671A (en)
WO (2) WO2022221587A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116153435B (en) * 2023-04-21 2023-08-11 山东大学齐鲁医院 Polypeptide prediction method and system based on coloring and three-dimensional structure

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10423861B2 (en) * 2017-10-16 2019-09-24 Illumina, Inc. Deep learning-based techniques for training deep convolutional neural networks
WO2019084559A1 (en) * 2017-10-27 2019-05-02 Apostle, Inc. Predicting cancer-related pathogenic impact of somatic mutations using deep learning-based methods
CN110245685B (en) * 2019-05-15 2022-03-25 清华大学 Method, system and storage medium for predicting pathogenicity of genome single-site variation

Also Published As

Publication number Publication date
BR112023021302A2 (en) 2023-12-19
CA3215462A1 (en) 2022-10-20
WO2022221589A1 (en) 2022-10-20
JP2024513994A (en) 2024-03-27
KR20230171930A (en) 2023-12-21
AU2022256491A1 (en) 2023-10-26
WO2022221587A1 (en) 2022-10-20
EP4323990A1 (en) 2024-02-21

Similar Documents

Publication Publication Date Title
CN107085716B (en) Cross-view gait recognition method based on multi-task generation countermeasure network
JP6425219B2 (en) Learning Based Segmentation for Video Coding
Jin et al. CNN oriented fast QTBT partition algorithm for JVET intra coding
CN107371022B (en) Inter-frame coding unit rapid dividing method applied to HEVC medical image lossless coding
CN104427345B (en) Acquisition methods, acquisition device, Video Codec and its method of motion vector
CN111462261B (en) Fast CU partitioning and intra-frame decision method for H.266/VVC
JP2012508883A5 (en)
Gong et al. Real-time stereo matching using orthogonal reliability-based dynamic programming
IL307671A (en) Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3d) protein structures
CN108960486B (en) Interactive set evolution method for predicting adaptive value based on gray support vector regression
Lin et al. Anchor assisted experience replay for online class-incremental learning
CN114821237A (en) Unsupervised ship re-identification method and system based on multi-stage comparison learning
Cai et al. A novel video coding strategy in HEVC for object detection
Maag et al. Improving video instance segmentation by light-weight temporal uncertainty estimates
CN104125470B (en) A kind of method of transmitting video data
IL307661A (en) Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks
CN113269104A (en) Group abnormal behavior identification method, system, storage medium and equipment
CN114667732A (en) Method for predicting attribute information, encoder, decoder, and storage medium
EP3304489B1 (en) An image processing apparatus and method
CN104125471B (en) A kind of video image compressing method
Montazeri Memetic algorithm image enhancement for preserving mean brightness without losing image features
Niu et al. Improving post-training quantization on object detection with task loss-guided lp metric
Liu et al. Color image segmentation using multilevel thresholding-cooperative bacterial foraging algorithm
RU2023125430A (en) MULTI-CHANNEL PROTEIN VOXELIZATION FOR PATHOGENICITY VARIANT PREDICTION USING DEEP CONVOLUTIONAL NEURAL NETWORKS
Jiang et al. Hierarchical binary classification for monocular depth estimation