IL307661A - Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks

Info

Publication number
IL307661A
Authority
IL
Israel
Prior art keywords
amino acid
dimensional
voxel
voxels
atoms
Prior art date
Application number
IL307661A
Other languages
Hebrew (he)
Original Assignee
Illumina Inc
Illumina Cambridge Ltd
Priority date
2021-04-15
Filing date
2022-04-14
Publication date
2023-12-01
Priority claimed from US17/703,935 (published as US20220336056A1)
Application filed by Illumina Inc, Illumina Cambridge Ltd
Publication of IL307661A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Genetics & Genomics (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Epidemiology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Processing (AREA)
  • Image Generation (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Claims (20)

1. A system comprising:
    a voxelizer that accesses a three-dimensional structure of a reference amino acid sequence of a protein, and fits a three-dimensional grid of voxels on atoms in the three-dimensional structure on an amino acid-basis to generate amino acid-wise distance channels, wherein each of the amino acid-wise distance channels has a three-dimensional distance value for each voxel in the three-dimensional grid of voxels, and wherein the three-dimensional distance value specifies a distance from a corresponding voxel in the three-dimensional grid of voxels to atoms of a corresponding reference amino acid in the reference amino acid sequence;
    an alternative allele encoder that encodes an alternative allele amino acid to each voxel in the three-dimensional grid of voxels, wherein the alternative allele amino acid is a three-dimensional representation of a one-hot encoding of a variant amino acid expressed by a variant nucleotide;
    an evolutionary conservation encoder that encodes an evolutionary conservation sequence to each voxel in the three-dimensional grid of voxels, wherein the evolutionary conservation sequence is a three-dimensional representation of amino acid-specific conservation frequencies across a plurality of species, and wherein the amino acid-specific conservation frequencies are selected in dependence upon amino acid proximity to the corresponding voxel; and
    a convolutional neural network configured to:
        apply three-dimensional convolutions to a tensor that includes the amino acid-wise distance channels encoded with the alternative allele amino acid and respective evolutionary conservation sequences, and
        determine a pathogenicity of the variant nucleotide based at least in part on the tensor.
2. The system of claim 1, wherein the voxelizer centers the three-dimensional grid of voxels on an alpha-carbon atom of respective residues of reference amino acids in the reference amino acid sequence.
3. The system of claim 2, wherein the voxelizer centers the three-dimensional grid of voxels on an alpha-carbon atom of a residue of a particular reference amino acid positioned at the variant amino acid.
4. The system of claim 3, further configured to encode, in the tensor, a directionality of the reference amino acids in the reference amino acid sequence and a position of the particular reference amino acid by multiplying, with a directionality parameter, three-dimensional distance values for those reference amino acids that precede the particular reference amino acid.
5. The system of any of claims 1-4, wherein the distances from corresponding voxels to atoms are nearest-atom distances from corresponding voxel centers in the three-dimensional grid of voxels to nearest atoms of the corresponding reference amino acids.
6. The system of any of claims 2-5, wherein the reference amino acids have alpha-carbon atoms, wherein the distances from corresponding voxels to the atoms are nearest-alpha-carbon atom distances from corresponding voxel centers to nearest alpha-carbon atoms of the corresponding reference amino acids.
7. The system of any of claims 1-6, wherein the reference amino acids have beta-carbon atoms, wherein the distances from corresponding voxels to atoms are nearest-beta-carbon atom distances from corresponding voxel centers to nearest beta-carbon atoms of the corresponding reference amino acids.
8. The system of any of claims 1-6, wherein the reference amino acids have backbone atoms, wherein the distances from corresponding voxels to atoms are nearest-backbone atom distances from corresponding voxel centers to nearest backbone atoms of the corresponding reference amino acids.
9. The system of any of claims 1-6, wherein the reference amino acids have sidechain atoms, wherein the distances from corresponding voxels to atoms are nearest-sidechain atom distances from corresponding voxel centers to nearest sidechain atoms of the corresponding reference amino acids.
10. The system of any of claims 1-9, further configured to encode, in the tensor, a nearest atom channel that specifies a distance from each voxel to a nearest atom, wherein the nearest atom is selected irrespective of an amino acid to which the nearest atom belongs and atomic elements of the amino acid.
11. The system of any of claims 1-10, further comprising a reference allele encoder that voxel-wise encodes a reference allele amino acid to each voxel in the three-dimensional grid of voxels.
12. The system of claim 11, wherein the reference allele amino acid is a three-dimensional representation of a one-hot encoding of a reference amino acid that experiences the variant amino acid.
13. The system of any of claims 1-12, wherein the amino acid-specific conservation frequencies specify conservation levels of respective amino acids across the plurality of species.
14. The system of any of claims 1-13, further comprising an annotations encoder that voxel-wise encodes one or more annotation channels to each voxel in the three-dimensional grid of voxels, and wherein the one or more annotation channels are three-dimensional representations of a one-hot encoding of residue annotations.
15. The system of any of claims 1-14, further comprising a structure confidence encoder that voxel-wise encodes one or more structure confidence channels to each voxel in the three-dimensional grid of voxels, and wherein the one or more structure confidence channels are three-dimensional representations of confidence scores that specify quality of respective residue structures.
16. A computer-implemented method comprising:
    accessing a three-dimensional structure of a reference amino acid sequence of a protein, and fitting a three-dimensional grid of voxels on atoms in the three-dimensional structure on an amino acid-basis to generate amino acid-wise distance channels, wherein each of the amino acid-wise distance channels has a three-dimensional distance value for each voxel in the three-dimensional grid of voxels, and wherein the three-dimensional distance value specifies a distance from a corresponding voxel in the three-dimensional grid of voxels to atoms of a corresponding reference amino acid in the reference amino acid sequence;
    encoding an alternative allele channel to each voxel in the three-dimensional grid of voxels, wherein the alternative allele channel is a three-dimensional representation of a one-hot encoding of a variant amino acid expressed by a variant nucleotide;
    encoding an evolutionary conservation channel to each sequence of three-dimensional distance values across the amino acid-wise distance channels on a voxel position-basis, wherein the evolutionary conservation channel is a three-dimensional representation of amino acid-specific conservation frequencies across a plurality of species, and wherein the amino acid-specific conservation frequencies are selected in dependence upon amino acid proximity to the corresponding voxel;
    applying three-dimensional convolutions to a tensor that includes the amino acid-wise distance channels encoded with the alternative allele channel and respective evolutionary conservation channels; and
    determining a pathogenicity of the variant nucleotide based at least in part on the tensor.
17. The computer-implemented method of claim 16, further comprising: selecting a nearest atom to the corresponding voxel across the reference amino acids and atom categories, selecting pan-amino acid conservation frequencies for a residue of a reference amino acid that includes the nearest atom, and using a three-dimensional representation of the pan-amino acid conservation frequencies as the evolutionary conservation channel.
18. The computer-implemented method of claim 17, wherein the pan-amino acid conservation frequencies are configured for a particular position of the residue as observed in the plurality of species.
19. The computer-implemented method of claim 16, further comprising: selecting respective nearest atoms to the corresponding voxel in respective reference amino acids, selecting respective per-amino acid conservation frequencies for respective residues of the respective reference amino acids that include the respective nearest atoms, and using a three-dimensional representation of the respective per-amino acid conservation frequencies as the evolutionary conservation channel.
20. A non-transitory computer readable medium storing instructions that, when executed by at least a processor, cause a system to perform actions comprising:
    accessing a three-dimensional structure of a reference amino acid sequence of a protein, and fitting a three-dimensional grid of voxels on atoms in the three-dimensional structure on an amino acid-basis to generate amino acid-wise distance channels, wherein each of the amino acid-wise distance channels has a three-dimensional distance value for each voxel in the three-dimensional grid of voxels, and wherein the three-dimensional distance value specifies a distance from a corresponding voxel in the three-dimensional grid of voxels to atoms of a corresponding reference amino acid in the reference amino acid sequence;
    encoding an alternative allele channel to each voxel in the three-dimensional grid of voxels, wherein the alternative allele channel is a three-dimensional representation of a one-hot encoding of a variant amino acid expressed by a variant nucleotide;
    encoding an evolutionary conservation channel to each sequence of three-dimensional distance values across the amino acid-wise distance channels on a voxel position-basis, wherein the evolutionary conservation channel is a three-dimensional representation of amino acid-specific conservation frequencies across a plurality of species, and wherein the amino acid-specific conservation frequencies are selected in dependence upon amino acid proximity to the corresponding voxel;
    applying three-dimensional convolutions to a tensor that includes the amino acid-wise distance channels encoded with the alternative allele channel and respective evolutionary conservation channels; and
    determining a pathogenicity of the variant nucleotide based at least in part on the tensor.
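
As a rough illustration of the input channels recited in claims 1, 5, and 17, the following Python sketch builds a small voxel grid, nearest-atom distance channels, a one-hot alternative allele encoding repeated at every voxel, and conservation channels taken from the residue that owns each voxel's nearest atom. It is a minimal sketch under stated assumptions, not the claimed implementation: the grid size, voxel size, and distance cap, the reading of "amino acid-wise" as one channel per amino acid type, all function names, and the toy atoms and frequencies are hypothetical. The convolutional network that would consume the tensor is not shown.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def voxel_centers(center, grid_size=3, voxel_size=2.0):
    """Centers of a cubic voxel grid centered on `center`, e.g. an alpha-carbon."""
    half = (grid_size - 1) / 2.0
    offsets = [(i - half) * voxel_size for i in range(grid_size)]
    return [(center[0] + dx, center[1] + dy, center[2] + dz)
            for dx, dy, dz in product(offsets, repeat=3)]


def distance_channels(atoms, centers, max_dist=10.0):
    """One channel per amino acid: nearest-atom distance from each voxel center.

    `atoms` holds (residue_index, amino_acid, (x, y, z)) tuples. Capping at
    `max_dist` when no atom of that amino acid is closer is an assumption.
    """
    channels = [[max_dist] * len(centers) for _ in AMINO_ACIDS]
    for _, amino_acid, xyz in atoms:
        channel = channels[AMINO_ACIDS.index(amino_acid)]
        for v, c in enumerate(centers):
            channel[v] = min(channel[v], math.dist(xyz, c))
    return channels


def one_hot_channels(amino_acid, n_voxels):
    """Repeat a one-hot amino acid encoding at every voxel (20 constant channels)."""
    return [[1.0 if aa == amino_acid else 0.0] * n_voxels for aa in AMINO_ACIDS]


def conservation_channels(atoms, centers, conservation):
    """Per voxel, copy the frequencies of the residue owning the nearest atom."""
    channels = [[0.0] * len(centers) for _ in AMINO_ACIDS]
    for v, c in enumerate(centers):
        residue_index, _, _ = min(atoms, key=lambda atom: math.dist(atom[2], c))
        for k, freq in enumerate(conservation[residue_index]):
            channels[k][v] = freq
    return channels


# Toy input: an alanine at the grid center and a glycine 2 angstroms away.
atoms = [(0, "A", (0.0, 0.0, 0.0)), (1, "G", (2.0, 0.0, 0.0))]
conservation = {0: [0.05] * 20, 1: [0.0] * 19 + [1.0]}  # made-up frequencies
centers = voxel_centers((0.0, 0.0, 0.0))
tensor = (distance_channels(atoms, centers)
          + one_hot_channels("V", len(centers))  # alternative allele
          + conservation_channels(atoms, centers, conservation))
print(len(tensor), len(tensor[0]))  # channels, voxels
```

Each channel is stored as a flat list over voxels to keep the sketch dependency-free; a practical version would more likely hold the tensor as a channels x depth x height x width array so that three-dimensional convolutions can be applied directly.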
IL307661A 2021-04-15 2022-04-14 Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks IL307661A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202163175495P 2021-04-15 2021-04-15
US202163175767P 2021-04-16 2021-04-16
US17/703,935 US20220336056A1 (en) 2021-04-15 2022-03-24 Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks
US17/703,958 US20220336057A1 (en) 2021-04-15 2022-03-24 Efficient voxelization for deep learning
PCT/US2022/024916 WO2022221591A1 (en) 2021-04-15 2022-04-14 Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks

Publications (1)

Publication Number Publication Date
IL307661A (en) 2023-12-01

Family

ID=81448684

Family Applications (2)

Application Number Priority Date Filing Date Title
IL307661A (en) 2021-04-15 2022-04-14 Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks
IL307667A (en) 2021-04-15 2022-04-14 Efficient voxelization for deep learning

Family Applications After (1)

Application Number Priority Date Filing Date Title
IL307667A (en) 2021-04-15 2022-04-14 Efficient voxelization for deep learning

Country Status (9)

Country Link
EP (2) EP4323991A1 (en)
JP (2) JP2024514894A (en)
KR (2) KR20230170680A (en)
AU (2) AU2022258691A1 (en)
BR (2) BR112023021266A2 (en)
CA (2) CA3215514A1 (en)
IL (2) IL307661A (en)
MX (2) MX2023012226A (en)
WO (2) WO2022221593A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116153404B (en) * 2023-02-28 2023-08-15 成都信息工程大学 Single-cell ATAC-seq data analysis method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3622521A1 (en) * 2017-10-16 2020-03-18 Illumina, Inc. Deep convolutional neural networks for variant classification
EP3704640A4 (en) * 2017-10-27 2021-08-18 Apostle, Inc. Predicting cancer-related pathogenic impact of somatic mutations using deep learning-based methods
CN110245685B (en) * 2019-05-15 2022-03-25 清华大学 Method, system and storage medium for predicting pathogenicity of genome single-site variation

Also Published As

Publication number Publication date
CA3215520A1 (en) 2022-10-20
EP4323991A1 (en) 2024-02-21
JP2024514894A (en) 2024-04-03
WO2022221593A1 (en) 2022-10-20
MX2023012227A (en) 2024-01-08
MX2023012226A (en) 2024-01-08
CA3215514A1 (en) 2022-10-20
IL307667A (en) 2023-12-01
WO2022221591A1 (en) 2022-10-20
EP4323989A1 (en) 2024-02-21
JP2024513995A (en) 2024-03-27
KR20230170680A (en) 2023-12-19
BR112023021266A2 (en) 2023-12-12
KR20230170679A (en) 2023-12-19
BR112023021343A2 (en) 2023-12-19
AU2022259667A1 (en) 2023-10-26
AU2022258691A1 (en) 2023-10-26

Similar Documents

Publication Publication Date Title
JP7466849B2 (en) TRISOUP node size per slice
KR101958674B1 (en) Actually-measured marine environment data assimilation method based on sequence recursive filtering three-dimensional variation
RU2012127528A (en) METHOD FOR CODING / DECODING OF A MULTI-FULL VIDEO SEQUENCE ON THE BASIS OF ADAPTIVE LOCAL CORRECTION OF BRIGHTNESS OF FRAME FRAMES WITHOUT TRANSFER OF ADDITIONAL PARAMETERS (OPTIONS)
CN104462015B (en) Process the fractional order linear discrete system state updating method of non-gaussian Lévy noises
IL307661A (en) Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks
FI3922025T3 (en) Systems, apparatus and methods for inter prediction refinement with optical flow
US20120191428A1 (en) Apparatus and method for predicting total nitrogen using general water quality data
CN109993364A (en) A kind of prediction technique and device of natural gas gas consumption
CN102857778A (en) System and method for 3D (three-dimensional) video conversion and method and device for selecting key frame in 3D video conversion
CN108960486B (en) Interactive set evolution method for predicting adaptive value based on gray support vector regression
CN110212592A (en) Fired power generating unit Load Regulation maximum rate estimation method and system based on piecewise linearity expression
CN112970254A (en) Rate distortion optimization method and device and computer readable storage medium
CN114444584A (en) Informmer model improvement method and long sequence time sequence prediction method and system
CN114667732A (en) Method for predicting attribute information, encoder, decoder, and storage medium
IL307671A (en) Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3d) protein structures
CN103634600A (en) Video coding mode selection method and system based on SSIM evaluation
CN113916347A (en) Seawater sound velocity profile continuation method and device
CN109286817B (en) Method for processing quantization distortion information of DCT (discrete cosine transformation) coefficient in video coding
KR102070145B1 (en) Parameter determination device, method, program and recording medium
RU2023125430A (en) MULTI-CHANNEL PROTEIN VOXELIZATION FOR PATHOGENICITY VARIANT PREDICTION USING DEEP CONVOLUTIONAL NEURAL NETWORKS
CN114009014A (en) Color component prediction method, encoder, decoder, and computer storage medium
CN116611493A (en) Hardware perception hybrid precision quantization method and system based on greedy search
CN111107359A (en) Intra-frame prediction coding unit dividing method suitable for HEVC standard
KR20170098278A (en) Coding device, decoding device, method thereof, program and recording medium
TW201740727A (en) Methods for RDO (Rate-Distortion Optimization) based on curve fittings and apparatuses using the same