IL307661A - Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks - Google Patents
Info
- Publication number
- IL307661A
- Authority
- IL
- Israel
- Prior art keywords
- amino acid
- dimensional
- voxel
- voxels
- atoms
- Prior art date
ID | Concept | Category | Sections | Count |
---|---|---|---|---|
230000007918 | pathogenicity | Effects | title, claims | 4 |
108090000623 | proteins and genes | Proteins | title, claims | 4 |
102000004169 | proteins and genes | Human genes | title, claims | 4 |
238000013527 | convolutional neural network | Methods | title, claims | 2 |
150000001413 | amino acids | Chemical class | claims | 46 |
125000004429 | atom | Chemical group | claims | 27 |
125000002924 | primary amino group ([H]N([H])*) | Chemical group | claims | 14 |
108700028369 | Alleles | Proteins | claims | 13 |
125000003275 | alpha amino acid group | Chemical group | claims | 8 |
239000002773 | nucleotide | Substances | claims | 6 |
125000003729 | nucleotide group | Chemical group | claims | 6 |
229910052799 | carbon | Inorganic materials | claims | 4 |
238000000034 | method | Methods | claims | 4 |
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Genetics & Genomics (AREA)
- Databases & Information Systems (AREA)
- Bioethics (AREA)
- Public Health (AREA)
- Crystallography & Structural Chemistry (AREA)
- Epidemiology (AREA)
- Investigating Or Analysing Biological Materials (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Image Processing (AREA)
- Image Generation (AREA)
- Magnetic Resonance Imaging Apparatus (AREA)
Claims (20)
1. A system comprising: a voxelizer that accesses a three-dimensional structure of a reference amino acid sequence of a protein, and fits a three-dimensional grid of voxels on atoms in the three-dimensional structure on an amino acid-basis to generate amino acid-wise distance channels, wherein each of the amino acid-wise distance channels has a three-dimensional distance value for each voxel in the three-dimensional grid of voxels, and wherein the three-dimensional distance value specifies a distance from a corresponding voxel in the three-dimensional grid of voxels to atoms of a corresponding reference amino acid in the reference amino acid sequence; an alternative allele encoder that encodes an alternative allele amino acid to each voxel in the three-dimensional grid of voxels, wherein the alternative allele amino acid is a three-dimensional representation of a one-hot encoding of a variant amino acid expressed by a variant nucleotide; an evolutionary conservation encoder that encodes an evolutionary conservation sequence to each voxel in the three-dimensional grid of voxels, wherein the evolutionary conservation sequence is a three-dimensional representation of amino acid-specific conservation frequencies across a plurality of species, and wherein the amino acid-specific conservation frequencies are selected in dependence upon amino acid proximity to the corresponding voxel; and a convolutional neural network configured to: apply three-dimensional convolutions to a tensor that includes the amino acid-wise distance channels encoded with the alternative allele amino acid and respective evolutionary conservation sequences, and determine a pathogenicity of the variant nucleotide based at least in part on the tensor.
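The voxelizer of claim 1 can be illustrated with a short plain-Python sketch. This is a minimal illustration under stated assumptions, not the applicant's implementation: the 3 x 3 x 3 grid, the 1 Å voxel edge, and the function names `voxel_centers` and `distance_channels` are hypothetical. Atoms are grouped under arbitrary labels; a label could stand for a single residue or for all residues of one amino acid type, depending on how "amino acid-wise" is read. Each channel stores, for every voxel, the distance from the voxel center to the nearest atom in that group.

```python
import math
from itertools import product


def voxel_centers(center, n, size):
    """Centers of an n x n x n grid of cubic voxels (edge length `size`) centered on `center`."""
    offset = (n - 1) / 2
    return [
        tuple(c + (i - offset) * size for c, i in zip(center, idx))
        for idx in product(range(n), repeat=3)
    ]


def distance_channels(atom_groups, center, n=3, size=1.0):
    """Return {label: [nearest-atom distance for each voxel]} for each group of atom coordinates."""
    centers = voxel_centers(center, n, size)
    return {
        label: [min(math.dist(v, a) for a in atoms) for v in centers]
        for label, atoms in atom_groups.items()
    }


# Toy example (hypothetical coordinates in angstroms): two atom groups, grid centered on the origin.
channels = distance_channels({"A": [(0.0, 0.0, 0.0)], "G": [(3.0, 0.0, 0.0)]}, (0.0, 0.0, 0.0))
```

Voxels are enumerated with the first grid axis varying slowest, so index 13 is the central voxel of the 3 x 3 x 3 grid. A production pipeline would more likely vectorize this with an array library; the loop form is kept here for readability.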
2. The system of claim 1, wherein the voxelizer centers the three-dimensional grid of voxels on an alpha-carbon atom of respective residues of reference amino acids in the reference amino acid sequence.
3. The system of claim 2, wherein the voxelizer centers the three-dimensional grid of voxels on an alpha-carbon atom of a residue of a particular reference amino acid positioned at the variant amino acid.
4. The system of claim 3, further configured to encode, in the tensor, a directionality of the reference amino acids in the reference amino acid sequence and a position of the particular reference amino acid by multiplying, with a directionality parameter, three-dimensional distance values for those reference amino acids that precede the particular reference amino acid.
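Claim 4 encodes sequence direction by multiplying the distance values of residues that precede the variant residue by a directionality parameter. The sketch below assumes one distance channel per residue, stored in sequence order, and uses -1.0 as the parameter so that upstream residues carry negative distances. Both the value and the name `encode_directionality` are hypothetical; the claim does not fix them.

```python
def encode_directionality(channels, variant_index, direction=-1.0):
    """Scale the distance channels of residues that precede the variant residue by `direction`.

    `channels` is a list of per-residue distance channels in sequence order; channels at or
    after `variant_index` are copied unchanged.
    """
    return [
        [d * direction for d in channel] if i < variant_index else list(channel)
        for i, channel in enumerate(channels)
    ]


signed = encode_directionality([[1.0, 2.0], [0.0, 1.5], [4.0, 4.5]], variant_index=1)
# signed == [[-1.0, -2.0], [0.0, 1.5], [4.0, 4.5]]
```

With a sign flip, the magnitude of each distance is preserved while the network can still tell residues before the variant position from those after it.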
5. The system of any of claims 1-4, wherein the distances from corresponding voxels to atoms are nearest-atom distances from corresponding voxel centers in the three-dimensional grid of voxels to nearest atoms of the corresponding reference amino acids.
6. The system of any of claims 2-5, wherein the reference amino acids have alpha-carbon atoms, wherein the distances from corresponding voxels to the atoms are nearest-alpha-carbon atom distances from corresponding voxel centers to nearest alpha-carbon atoms of the corresponding reference amino acids.
7. The system of any of claims 1-6, wherein the reference amino acids have beta-carbon atoms, wherein the distances from corresponding voxels to atoms are nearest-beta-carbon atom distances from corresponding voxel centers to nearest beta-carbon atoms of the corresponding reference amino acids.
8. The system of any of claims 1-6, wherein the reference amino acids have backbone atoms, wherein the distances from corresponding voxels to atoms are nearest-backbone atom distances from corresponding voxel centers to nearest backbone atoms of the corresponding reference amino acids.
9. The system of any of claims 1-6, wherein the reference amino acids have sidechain atoms, wherein the distances from corresponding voxels to atoms are nearest-sidechain atom distances from corresponding voxel centers to nearest sidechain atoms of the corresponding reference amino acids.
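Claims 5 to 9 differ only in which atoms of a residue count when measuring the voxel-to-residue distance. A minimal sketch, assuming residues are given as dictionaries keyed by PDB-style atom names (N, CA, C, O for the backbone, CB for the beta carbon); the category names and the `nearest_distance` helper are hypothetical. Glycine has no beta carbon, so the sketch returns `None` for that case; how the applicant fills such voxels is not stated here.

```python
import math

BACKBONE = {"N", "CA", "C", "O"}

# Hypothetical atom-category filters keyed by PDB-style atom names.
CATEGORIES = {
    "any": lambda name: True,
    "alpha_carbon": lambda name: name == "CA",
    "beta_carbon": lambda name: name == "CB",
    "backbone": lambda name: name in BACKBONE,
    "sidechain": lambda name: name not in BACKBONE,
}


def nearest_distance(voxel, residue_atoms, category):
    """Distance from a voxel center to the residue's nearest atom in `category`, or None if it has none."""
    keep = CATEGORIES[category]
    dists = [math.dist(voxel, xyz) for name, xyz in residue_atoms.items() if keep(name)]
    return min(dists) if dists else None


# Toy coordinates (not real residue geometry).
alanine = {"N": (0.0, 0.0, 0.0), "CA": (1.0, 0.0, 0.0), "C": (2.0, 0.0, 0.0),
           "O": (3.0, 0.0, 0.0), "CB": (1.0, 1.0, 0.0)}
glycine = {"N": (0.0, 0.0, 0.0), "CA": (1.0, 0.0, 0.0), "C": (2.0, 0.0, 0.0), "O": (3.0, 0.0, 0.0)}
```

For the voxel at (1, 2, 0), the alanine's beta carbon is 1 Å away while its alpha carbon is 2 Å away, so the "beta_carbon" and "alpha_carbon" channels record different values for the same residue.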
10. The system of any of claims 1-9, further configured to encode, in the tensor, a nearest atom channel that specifies a distance from each voxel to a nearest atom, wherein the nearest atom is selected irrespective of an amino acid to which the nearest atom belongs and atomic elements of the amino acid.
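The nearest-atom channel of claim 10 ignores which residue an atom belongs to and which element it is: each voxel simply stores the distance to the closest atom anywhere in the structure. A minimal sketch, assuming the structure is a list of per-residue atom-coordinate lists; the name `nearest_atom_channel` is hypothetical.

```python
import math


def nearest_atom_channel(voxel_centers, residues):
    """For each voxel, the distance to the nearest atom in the whole structure."""
    # Flatten all residues: residue identity and element are deliberately discarded.
    all_atoms = [xyz for atoms in residues for xyz in atoms]
    return [min(math.dist(v, a) for a in all_atoms) for v in voxel_centers]


nearest = nearest_atom_channel(
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]],
)
# nearest == [0.0, 1.0, 2.0]
```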
11. The system of any of claims 1-10, further comprising a reference allele encoder that voxel-wise encodes a reference allele amino acid to each voxel in the three-dimensional grid of voxels.
12. The system of claim 11, wherein the reference allele amino acid is a three-dimensional representation of a one-hot encoding of a reference amino acid that experiences the variant amino acid.
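Both the alternative allele (claim 1) and the reference allele (claims 11 and 12) are described as three-dimensional representations of a one-hot encoding. One straightforward reading is to broadcast the same one-hot vector to every voxel, giving one constant channel per amino acid. The sketch below assumes the 20 standard amino acids in alphabetical one-letter order; that ordering and the helper names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard one-letter codes; this ordering is an assumption


def one_hot(aa):
    """One-hot vector over the 20 standard amino acids."""
    if len(aa) != 1 or aa not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid: {aa!r}")
    return [1.0 if code == aa else 0.0 for code in AMINO_ACIDS]


def allele_channels(aa, n_voxels):
    """Broadcast the one-hot vector to every voxel: 20 channels, each constant across the grid."""
    return [[bit] * n_voxels for bit in one_hot(aa)]


alt = allele_channels("C", 27)  # e.g. a cysteine variant on a 3 x 3 x 3 grid
```

Broadcasting looks redundant, but it lets the allele identity sit in the same voxel-aligned tensor as the distance channels, so every 3D convolution window sees it.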
13. The system of any of claims 1-12, wherein the amino acid-specific conservation frequencies specify conservation levels of respective amino acids across the plurality of species.
14. The system of any of claims 1-13, further comprising an annotations encoder that voxel-wise encodes one or more annotation channels to each voxel in the three-dimensional grid of voxels, and wherein the one or more annotation channels are three-dimensional representations of a one-hot encoding of residue annotations.
15. The system of any of claims 1-14, further comprising a structure confidence encoder that voxel-wise encodes one or more structure confidence channels to each voxel in the three-dimensional grid of voxels, and wherein the one or more structure confidence channels are three-dimensional representations of confidence scores that specify quality of respective residue structures.
16. A computer-implemented method comprising: accessing a three-dimensional structure of a reference amino acid sequence of a protein, and fitting a three-dimensional grid of voxels on atoms in the three-dimensional structure on an amino acid-basis to generate amino acid-wise distance channels, wherein each of the amino acid-wise distance channels has a three-dimensional distance value for each voxel in the three-dimensional grid of voxels, and wherein the three-dimensional distance value specifies a distance from a corresponding voxel in the three-dimensional grid of voxels to atoms of a corresponding reference amino acid in the reference amino acid sequence; encoding an alternative allele channel to each voxel in the three-dimensional grid of voxels, wherein the alternative allele channel is a three-dimensional representation of a one-hot encoding of a variant amino acid expressed by a variant nucleotide; encoding an evolutionary conservation channel to each sequence of three-dimensional distance values across the amino acid-wise distance channels on a voxel position-basis, wherein the evolutionary conservation channel is a three-dimensional representation of amino acid-specific conservation frequencies across a plurality of species, and wherein the amino acid-specific conservation frequencies are selected in dependence upon amino acid proximity to the corresponding voxel; applying three-dimensional convolutions to a tensor that includes the amino acid-wise distance channels encoded with the alternative allele channel and respective evolutionary conservation channels; and determining a pathogenicity of the variant nucleotide based at least in part on the tensor.
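Claim 16 applies three-dimensional convolutions to the multi-channel tensor. The naive sketch below shows what a single output channel of such a convolution computes (stride 1, no padding). Like deep learning frameworks, it computes a cross-correlation rather than a flipped-kernel convolution. It is illustrative only; a real model would use a framework layer such as PyTorch's `torch.nn.Conv3d`, and the function name `conv3d_valid` is hypothetical.

```python
def conv3d_valid(x, w, bias=0.0):
    """Single-output-channel 3D convolution, stride 1, no padding.

    x: input as nested lists [C][D][H][W]; w: kernel as [C][k][k][k].
    Returns [D-k+1][H-k+1][W-k+1].
    """
    C, D, H, W = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    k = len(w[0])
    out = []
    for z in range(D - k + 1):
        plane = []
        for y in range(H - k + 1):
            row = []
            for xx in range(W - k + 1):
                s = bias
                # Sum over every input channel and every kernel offset.
                for c in range(C):
                    for dz in range(k):
                        for dy in range(k):
                            for dx in range(k):
                                s += x[c][z + dz][y + dy][xx + dx] * w[c][dz][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


x = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(2)]  # 2 channels, 3x3x3
w = [[[[1.0] * 2 for _ in range(2)] for _ in range(2)] for _ in range(2)]  # 2 channels, 2x2x2
out = conv3d_valid(x, w)
```

Each output value mixes all input channels, so distance, allele, and conservation channels are combined at every spatial position.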
17. The computer-implemented method of claim 16, further comprising: selecting a nearest atom to the corresponding voxel across the reference amino acids and atom categories, selecting pan-amino acid conservation frequencies for a residue of a reference amino acid that includes the nearest atom, and using a three-dimensional representation of the pan-amino acid conservation frequencies as the evolutionary conservation channel.
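Claims 17 and 18 select conservation values through the single nearest atom: for each voxel, find the closest atom across all residues and atom categories, then copy the conservation frequency vector of the residue that owns it. A minimal sketch, assuming per-residue atom lists and a parallel list of frequency vectors (for example, one frequency per amino acid at that position across species); the data layout and the name `pan_conservation_channels` are hypothetical.

```python
import math


def pan_conservation_channels(voxel_centers, residues, conservation):
    """Conservation channels chosen by the residue that owns each voxel's nearest atom.

    residues: list of atom-coordinate lists, one per reference residue (hypothetical format).
    conservation: per-residue frequency vectors, aligned with `residues`.
    Returns one channel per frequency, each holding one value per voxel.
    """
    per_voxel = []
    for v in voxel_centers:
        # Nearest atom across all residues and atom categories; ties go to the lower residue index.
        _, owner = min(
            (math.dist(v, atom), i) for i, atoms in enumerate(residues) for atom in atoms
        )
        per_voxel.append(conservation[owner])
    return [[vec[j] for vec in per_voxel] for j in range(len(conservation[0]))]


channels = pan_conservation_channels(
    [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [[(0.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)]],
    [[0.9, 0.1], [0.2, 0.8]],
)
# channels == [[0.9, 0.2], [0.1, 0.8]]
```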
18. The computer-implemented method of claim 17, wherein the pan-amino acid conservation frequencies are configured for a particular position of the residue as observed in the plurality of species.
19. The computer-implemented method of claim 16, further comprising: selecting respective nearest atoms to the corresponding voxel in respective reference amino acids, selecting respective per-amino acid conservation frequencies for respective residues of the respective reference amino acids that include the respective nearest atoms, and using a three-dimensional representation of the respective per-amino acid conservation frequencies as the evolutionary conservation channel.
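Claim 19 differs from claim 17 in that a nearest atom is selected separately within each reference amino acid rather than once across the whole structure. The sketch below shows one possible reading, grouping residues by amino acid type: for each type, it finds the residue of that type whose atom is nearest to the voxel and reports that residue's conservation frequency for the same type. The residue dictionary layout, the 0.0 fill value for absent types, and the name `per_aa_conservation` are all assumptions.

```python
import math


def per_aa_conservation(voxel, residues, amino_acids):
    """One conservation value per amino acid type for a single voxel.

    residues: hypothetical list of dicts with keys "aa" (one-letter type), "atoms" (xyz list)
              and "freqs" (dict: amino acid -> frequency at this position across species).
    Types absent from the structure get 0.0 (an arbitrary fill value).
    """
    values = []
    for aa in amino_acids:
        best = None  # (distance, frequency) for the closest residue of this type so far
        for res in residues:
            if res["aa"] != aa:
                continue
            d = min(math.dist(voxel, xyz) for xyz in res["atoms"])
            if best is None or d < best[0]:
                best = (d, res["freqs"].get(aa, 0.0))
        values.append(best[1] if best else 0.0)
    return values


residues = [
    {"aa": "A", "atoms": [(0.0, 0.0, 0.0)], "freqs": {"A": 0.9}},
    {"aa": "A", "atoms": [(5.0, 0.0, 0.0)], "freqs": {"A": 0.4}},
    {"aa": "G", "atoms": [(1.0, 0.0, 0.0)], "freqs": {"G": 0.6}},
]
values = per_aa_conservation((4.0, 0.0, 0.0), residues, "AGW")
# values == [0.4, 0.6, 0.0]
```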
20. A non-transitory computer readable medium storing instructions that, when executed by at least a processor, cause a system to perform actions comprising: accessing a three-dimensional structure of a reference amino acid sequence of a protein, and fitting a three-dimensional grid of voxels on atoms in the three-dimensional structure on an amino acid-basis to generate amino acid-wise distance channels, wherein each of the amino acid-wise distance channels has a three-dimensional distance value for each voxel in the three-dimensional grid of voxels, and wherein the three-dimensional distance value specifies a distance from a corresponding voxel in the three-dimensional grid of voxels to atoms of a corresponding reference amino acid in the reference amino acid sequence; encoding an alternative allele channel to each voxel in the three-dimensional grid of voxels, wherein the alternative allele channel is a three-dimensional representation of a one-hot encoding of a variant amino acid expressed by a variant nucleotide; encoding an evolutionary conservation channel to each sequence of three-dimensional distance values across the amino acid-wise distance channels on a voxel position-basis, wherein the evolutionary conservation channel is a three-dimensional representation of amino acid-specific conservation frequencies across a plurality of species, and wherein the amino acid-specific conservation frequencies are selected in dependence upon amino acid proximity to the corresponding voxel; applying three-dimensional convolutions to a tensor that includes the amino acid-wise distance channels encoded with the alternative allele channel and respective evolutionary conservation channels; and determining a pathogenicity of the variant nucleotide based at least in part on the tensor.
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163175495P | 2021-04-15 | 2021-04-15 | |
US202163175767P | 2021-04-16 | 2021-04-16 | |
US17/703,935 US20220336056A1 | 2021-04-15 | 2022-03-24 | Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks |
US17/703,958 US20220336057A1 | 2021-04-15 | 2022-03-24 | Efficient voxelization for deep learning |
PCT/US2022/024916 WO2022221591A1 | 2021-04-15 | 2022-04-14 | Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
IL307661A | 2023-12-01 |
Family ID: 81448684
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IL307661A | 2021-04-15 | 2022-04-14 | Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks |
IL307667A | 2021-04-15 | 2022-04-14 | Efficient voxelization for deep learning |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IL307667A | 2021-04-15 | 2022-04-14 | Efficient voxelization for deep learning |
Country Status (9)
Country | Link |
---|---|
EP (2) | EP4323991A1 |
JP (2) | JP2024514894A |
KR (2) | KR20230170680A |
AU (2) | AU2022258691A1 |
BR (2) | BR112023021266A2 |
CA (2) | CA3215514A1 |
IL (2) | IL307661A |
MX (2) | MX2023012226A |
WO (2) | WO2022221593A1 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116153404B * | 2023-02-28 | 2023-08-15 | Chengdu University of Information Technology | Single-cell ATAC-seq data analysis method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3622521A1 * | 2017-10-16 | 2020-03-18 | Illumina, Inc. | Deep convolutional neural networks for variant classification |
EP3704640A4 * | 2017-10-27 | 2021-08-18 | Apostle, Inc. | Predicting cancer-related pathogenic impact of somatic mutations using deep learning-based methods |
CN110245685B * | 2019-05-15 | 2022-03-25 | Tsinghua University | Method, system and storage medium for predicting pathogenicity of genome single-site variation |
2022
Date | Office | Application | Publication | Status |
---|---|---|---|---|
2022-04-14 | KR | KR1020237034825A | KR20230170680A | Unknown |
2022-04-14 | WO | PCT/US2022/024918 | WO2022221593A1 | Active application filing |
2022-04-14 | CA | CA3215514A | CA3215514A1 | Active, pending |
2022-04-14 | AU | AU2022258691A | AU2022258691A1 | Active, pending |
2022-04-14 | AU | AU2022259667A | AU2022259667A1 | Active, pending |
2022-04-14 | JP | JP2023563036A | JP2024514894A | Active, pending |
2022-04-14 | JP | JP2023563033A | JP2024513995A | Active, pending |
2022-04-14 | KR | KR1020237034824A | KR20230170679A | Unknown |
2022-04-14 | WO | PCT/US2022/024916 | WO2022221591A1 | Active application filing |
2022-04-14 | IL | IL307661A | IL307661A | Unknown |
2022-04-14 | IL | IL307667A | IL307667A | Unknown |
2022-04-14 | MX | MX2023012226A | MX2023012226A | Unknown |
2022-04-14 | EP | EP22726207.8A | EP4323991A1 | Active, pending |
2022-04-14 | BR | BR112023021266A | BR112023021266A2 | Unknown |
2022-04-14 | MX | MX2023012227A | MX2023012227A | Unknown |
2022-04-14 | EP | EP22720250.4A | EP4323989A1 | Active, pending |
2022-04-14 | BR | BR112023021343A | BR112023021343A2 | Unknown |
2022-04-14 | CA | CA3215520A | CA3215520A1 | Active, pending |
Also Published As
Publication number | Publication date |
---|---|
CA3215520A1 | 2022-10-20 |
EP4323991A1 | 2024-02-21 |
JP2024514894A | 2024-04-03 |
WO2022221593A1 | 2022-10-20 |
MX2023012227A | 2024-01-08 |
MX2023012226A | 2024-01-08 |
CA3215514A1 | 2022-10-20 |
IL307667A | 2023-12-01 |
WO2022221591A1 | 2022-10-20 |
EP4323989A1 | 2024-02-21 |
JP2024513995A | 2024-03-27 |
KR20230170680A | 2023-12-19 |
BR112023021266A2 | 2023-12-12 |
KR20230170679A | 2023-12-19 |
BR112023021343A2 | 2023-12-19 |
AU2022259667A1 | 2023-10-26 |
AU2022258691A1 | 2023-10-26 |