IL307671B2 - Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3d) protein structures
- Publication number
- IL307671B2
- Authority
- IL
- Israel
- Prior art keywords
- amino acid
- amino acids
- atom
- nearest
- voxel
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Chemical & Material Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Bioethics (AREA)
- Analytical Chemistry (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Genetics & Genomics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Crystallography & Structural Chemistry (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Investigating Or Analysing Biological Materials (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Peptides Or Proteins (AREA)
Claims (30)
- 1. A system, comprising: memory storing amino acid-wise distance channels for a plurality of amino acids in an amino acid sequence of a protein, wherein each of the amino acid-wise distance channels has voxel-wise distance values for voxels in a plurality of voxels, and wherein the voxel-wise distance values specify distances from corresponding voxels in the plurality of voxels to atoms of corresponding amino acids in the plurality of amino acids; and a neural network-based variant pathogenicity classifier, running on at least one processor coupled to the memory, wherein the variant pathogenicity classifier is trained to process as input a tensor that includes the amino acid-wise distance channels and an alternative allele amino acid of the protein expressed by a variant, and classify the variant as benign or pathogenic based at least in part on the tensor.
- 2. The system of claim 1, further comprising a distance channels generator, running on at least one processor coupled to the memory, that centers a voxel grid of the voxels on an alpha-carbon atom of respective residues of the corresponding amino acids and calculates the voxel-wise distance values by specifying a distance between centers of the voxels in the voxel grid and the atoms of the corresponding amino acids.
- 3. The system of claim 2, wherein the distance channels generator centers the voxel grid on an alpha-carbon atom of a residue of a particular amino acid that corresponds to at least one variant amino acid in the protein.
- 4. The system of claim 3, further configured to encode, in the tensor, a directionality of the corresponding amino acids and a position of the particular amino acid by multiplying, with a directionality parameter, voxel-wise distance values for preceding amino acids that precede the particular amino acid.
- 5. The system of claim 3, wherein the distances are nearest-atom distances from corresponding voxel centers in the voxel grid to nearest atoms of the corresponding amino acids.
- 6. The system of claim 5, wherein the nearest-atom distances are Euclidean distances.
- 7. The system of claim 6, wherein the nearest-atom distances are normalized by dividing the Euclidean distances with a maximum nearest-atom distance.
- 8. The system of claim 5, wherein the corresponding amino acids have alpha-carbon atoms, wherein the distances are nearest-alpha-carbon atom distances from the corresponding voxel centers to nearest alpha-carbon atoms of the corresponding amino acids.
- 9. The system of claim 5, wherein the corresponding amino acids have beta-carbon atoms, wherein the distances are nearest-beta-carbon atom distances from the corresponding voxel centers to nearest beta-carbon atoms of the corresponding amino acids.
- 10. The system of claim 5, wherein the corresponding amino acids have backbone atoms, wherein the distances are nearest-backbone atom distances from the corresponding voxel centers to nearest backbone atoms of the corresponding amino acids.
- 11. The system of claim 5, wherein the corresponding amino acids have sidechain atoms, wherein the distances are nearest-sidechain atom distances from the corresponding voxel centers to nearest sidechain atoms of the corresponding amino acids.
- 12. The system of claim 3, further configured to encode, in the tensor, a nearest atom channel that specifies a distance from each voxel to a nearest atom, wherein the nearest atom is selected irrespective of the amino acids and atomic elements of the amino acids.
- 13. The system of claim 12, wherein the distance is a Euclidean distance.
- 14. The system of claim 13, wherein the distance is normalized by dividing the Euclidean distance with a maximum distance.
- 15. The system of claim 12, wherein the amino acids include non-standard amino acids.
- 16. The system of claim 1, wherein the tensor further includes an absentee atom channel that specifies atoms not found within a predefined radius of a voxel center, wherein the absentee atom channel is one-hot encoded.
- 17. The system of claim 1, wherein the tensor further includes a one-hot encoding of the alternative allele amino acid that is voxel-wise encoded to each of the amino acid-wise distance channels.
- 18. The system of claim 1, wherein the tensor further includes a reference allele amino acid in the amino acid sequence of the protein.
- 19. The system of claim 18, wherein the tensor further includes a one-hot encoding of the reference allele amino acid that is voxel-wise encoded to each of the amino acid-wise distance channels.
- 20. The system of claim 1, wherein the tensor further includes evolutionary profiles that specify conservation levels of the corresponding amino acids across a plurality of species with sequences that are homologous to the amino acid sequence of the protein.
- 21. The system of claim 20, further comprising an evolutionary profiles generator, running on at least one processor coupled to the memory, that, for each of the voxels, uses a multi-sequence alignment to determine pan-amino acid conservation frequencies, selects a nearest atom across the amino acids and the atom categories, selects a pan-amino acid conservation frequencies sequence for a residue of an amino acid that includes the nearest atom, voxelizes the pan-amino acid conservation frequencies for the residue of the amino acid, and makes the pan-amino acid conservation frequencies sequence available as one of the evolutionary profiles.
- 22. The system of claim 21, wherein the pan-amino acid conservation frequencies sequence is configured for a particular position of the residue as observed in the plurality of species.
- 23. The system of claim 21, wherein the pan-amino acid conservation frequencies sequence specifies whether there is a missing conservation frequency for a particular amino acid.
- 24. The system of claim 21, wherein the evolutionary profiles generator, for each of the voxels, uses the multi-sequence alignment to determine per-amino acid conservation frequencies, selects respective nearest atoms in respective ones of the amino acids, selects respective per-amino acid conservation frequencies for respective residues of the amino acids that include the nearest atoms, voxelizes the per-amino acid conservation frequencies for the respective residues of the amino acids, and makes the per-amino acid conservation frequencies available as one of the evolutionary profiles.
- 25. The system of claim 24, wherein the per-amino acid conservation frequencies are configured for a particular position of the residues as observed in the plurality of species.
- 26. The system of claim 24, wherein the per-amino acid conservation frequencies specify whether there is a missing conservation frequency for a particular amino acid.
- 27. The system of claim 1, wherein the tensor further includes annotation channels for the corresponding amino acids that annotate characteristics of the corresponding amino acids, wherein the annotation channels are one-hot encoded in the tensor.
- 28. The system of claim 1, wherein the tensor further includes structure confidence channels for the corresponding amino acids that specify quality of respective structures of the corresponding amino acids.
- 29. A system, comprising: memory storing atom category-wise distance channels for amino acids in an amino acid sequence of a protein, wherein the amino acids have atoms for a plurality of atom categories, wherein atom categories in the plurality of atom categories specify atomic elements of the amino acids, wherein each of the atom category-wise distance channels has voxel-wise distance values for voxels in a plurality of voxels, and wherein the voxel-wise distance values specify distances from corresponding voxels in the plurality of voxels to atoms in corresponding atom categories in the plurality of atom categories; and a neural network-based variant pathogenicity classifier, running on at least one processor coupled to the memory, wherein the variant pathogenicity classifier is trained to process as input a tensor that includes the atom category-wise distance channels and an alternative allele amino acid of the protein expressed by a variant, and classify the variant as benign or pathogenic based at least in part on the tensor.
- 30. A computer-implemented method, comprising: storing amino acid-wise distance channels for a plurality of amino acids in an amino acid sequence of a protein, wherein each of the amino acid-wise distance channels has voxel-wise distance values for voxels in a plurality of voxels, and wherein the voxel-wise distance values specify distances from corresponding voxels in the plurality of voxels to atoms of corresponding amino acids in the plurality of amino acids; processing as input a tensor that includes the amino acid-wise distance channels and an alternative allele of the protein expressed by a variant; and classifying the variant as benign or pathogenic based at least in part on the tensor.
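As an illustration only, the input tensor recited in claims 1 to 8 and 17 might be assembled along the following lines. This is a minimal sketch under stated assumptions, not the implementation disclosed in the patent: it treats each distance channel as covering one amino acid type, uses only alpha-carbon coordinates, and caps distances at a maximum before normalizing. The function names, the 3x3x3 grid, the 2.0 voxel size, the 6.0 maximum distance, and the -1.0 directionality parameter are all hypothetical choices made for the example. The classifier itself is not shown.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def voxel_centers(center, grid_size=3, voxel_size=2.0):
    """Centers of a cubic voxel grid whose middle sits on `center`."""
    half = (grid_size - 1) / 2.0
    offsets = [(i - half) * voxel_size for i in range(grid_size)]
    cx, cy, cz = center
    return [(cx + dx, cy + dy, cz + dz)
            for dx, dy, dz in product(offsets, repeat=3)]


def build_tensor(residues, variant_index, alt_aa,
                 grid_size=3, voxel_size=2.0,
                 max_distance=6.0, directionality=-1.0):
    """Return one feature vector per voxel.

    residues: list of (one_letter_code, alpha_carbon_xyz) in sequence order.
    All default parameter values are assumptions made for this sketch.
    """
    # Center the grid on the alpha-carbon of the variant residue (claim 3).
    centers = voxel_centers(residues[variant_index][1], grid_size, voxel_size)
    # One-hot encoding of the alternative allele, repeated at every voxel (claim 17).
    alt_one_hot = [1.0 if aa == alt_aa else 0.0 for aa in AMINO_ACIDS]

    tensor = []
    for voxel in centers:
        channels = []
        for aa in AMINO_ACIDS:
            best = None  # (distance, residue index) of the nearest alpha-carbon
            for idx, (code, ca) in enumerate(residues):
                if code != aa:
                    continue
                d = math.dist(voxel, ca)  # Euclidean distance (claim 6)
                if best is None or d < best[0]:
                    best = (d, idx)
            if best is None:
                value = 1.0  # assumption: absent amino acid counts as maximally far
            else:
                # Normalize by the maximum distance (claim 7); the cap is an assumption.
                value = min(best[0], max_distance) / max_distance
                # Scale distances to residues preceding the variant (claim 4).
                if best[1] < variant_index:
                    value *= directionality
            channels.append(value)
        tensor.append(channels + alt_one_hot)
    return tensor


# Toy three-residue chain G-A-L with the variant at the middle residue.
residues = [("G", (-2.0, 0.0, 0.0)), ("A", (0.0, 0.0, 0.0)), ("L", (2.0, 0.0, 0.0))]
tensor = build_tensor(residues, variant_index=1, alt_aa="V")
print(len(tensor), len(tensor[0]))  # 27 40
```

In this toy case each of the 27 voxels carries 20 distance values followed by 20 one-hot values. At the central voxel the glycine channel is negative because that residue precedes the variant position, while the leucine channel has the same magnitude with a positive sign.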
Applications Claiming Priority (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163175495P | 2021-04-15 | 2021-04-15 | |
| US17/232,056 US12217829B2 (en) | 2021-04-15 | 2021-04-15 | Artificial intelligence-based analysis of protein three-dimensional (3D) structures |
| US202163175767P | 2021-04-16 | 2021-04-16 | |
| US17/468,411 US11515010B2 (en) | 2021-04-15 | 2021-09-07 | Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3D) protein structures |
| US17/703,958 US20220336057A1 (en) | 2021-04-15 | 2022-03-24 | Efficient voxelization for deep learning |
| US17/703,935 US12444482B2 (en) | 2021-04-15 | 2022-03-24 | Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks |
| PCT/US2022/024913 WO2022221589A1 (en) | 2021-04-15 | 2022-04-14 | Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3d) protein structures |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| IL307671A (en) | 2023-12-01 |
| IL307671B1 (en) | 2025-06-01 |
| IL307671B2 (en) | 2025-10-01 |
Family
ID=81580106
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IL307671A IL307671B2 (en) | 2021-04-15 | 2022-04-14 | Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3d) protein structures |
Country Status (9)
| Country | Link |
|---|---|
| EP (1) | EP4323990A1 (en) |
| JP (2) | JP7712387B2 (en) |
| KR (1) | KR20230171930A (en) |
| AU (1) | AU2022256491A1 (en) |
| BR (1) | BR112023021302A2 (en) |
| CA (1) | CA3215462A1 (en) |
| IL (1) | IL307671B2 (en) |
| MX (1) | MX2023012228A (en) |
| WO (2) | WO2022221587A1 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116153435B (en) * | 2023-04-21 | 2023-08-11 | Qilu Hospital of Shandong University | Polypeptide prediction method and system based on coloring and three-dimensional structure |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9373059B1 (en) * | 2014-05-05 | 2016-06-21 | Atomwise Inc. | Systems and methods for applying a convolutional network to spatial data |
| US10423861B2 (en) * | 2017-10-16 | 2019-09-24 | Illumina, Inc. | Deep learning-based techniques for training deep convolutional neural networks |
| WO2019084559A1 (en) * | 2017-10-27 | 2019-05-02 | Apostle, Inc. | Predicting cancer-related pathogenic impact of somatic mutations using deep learning-based methods |
| US11210554B2 (en) * | 2019-03-21 | 2021-12-28 | Illumina, Inc. | Artificial intelligence-based generation of sequencing metadata |
| CN110245685B (en) * | 2019-05-15 | 2022-03-25 | 清华大学 | Method, system and storage medium for predicting pathogenicity of genome single site variant |
| US12217829B2 (en) * | 2021-04-15 | 2025-02-04 | Illumina, Inc. | Artificial intelligence-based analysis of protein three-dimensional (3D) structures |
| KR20230170680A (en) * | 2021-04-15 | 2023-12-19 | Illumina, Inc. | Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks |
| Date | Office | Application | Publication | Status |
|---|---|---|---|---|
| 2022-04-14 | IL | IL307671A | IL307671B2 (en) | Unknown |
| 2022-04-14 | CA | CA3215462A | CA3215462A1 (en) | Pending |
| 2022-04-14 | JP | JP2023563032A | JP7712387B2 (en) | Active |
| 2022-04-14 | EP | EP22721220.6A | EP4323990A1 (en) | Pending |
| 2022-04-14 | WO | PCT/US2022/024911 | WO2022221587A1 (en) | Ceased |
| 2022-04-14 | BR | BR112023021302A | BR112023021302A2 (en) | Application Discontinuation |
| 2022-04-14 | MX | MX2023012228A | MX2023012228A (en) | Unknown |
| 2022-04-14 | KR | KR1020237034175A | KR20230171930A (en) | Pending |
| 2022-04-14 | AU | AU2022256491A | AU2022256491A1 (en) | Pending |
| 2022-04-14 | WO | PCT/US2022/024913 | WO2022221589A1 (en) | Ceased |
| 2025-07-10 | JP | JP2025116672A | JP7755105B2 (en) | Active |
Also Published As
| Publication number | Publication date |
|---|---|
| MX2023012228A (en) | 2024-01-08 |
| BR112023021302A2 (en) | 2023-12-19 |
| CA3215462A1 (en) | 2022-10-20 |
| WO2022221587A1 (en) | 2022-10-20 |
| JP2025148468A (en) | 2025-10-07 |
| WO2022221589A1 (en) | 2022-10-20 |
| KR20230171930A (en) | 2023-12-21 |
| EP4323990A1 (en) | 2024-02-21 |
| JP7712387B2 (en) | 2025-07-23 |
| IL307671B1 (en) | 2025-06-01 |
| JP2024513994A (en) | 2024-03-27 |
| AU2022256491A1 (en) | 2023-10-26 |
| JP7755105B2 (en) | 2025-10-15 |
| IL307671A (en) | 2023-12-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10909409B2 (en) | | System and method for blind image quality assessment |
| CN107085716B (en) | | Cross-view gait recognition method based on multi-task generation countermeasure network |
| Lin et al. | | Anchor assisted experience replay for online class-incremental learning |
| US6600830B1 (en) | | Method and system of automatically extracting facial features |
| JP2012508883A5 (en) | | |
| EP3186963A1 (en) | | Learning-based partitioning for video encoding |
| Ahmmed et al. | | Tumor detection in brain MRI image using template based K-means and Fuzzy C-means clustering algorithm |
| Gong et al. | | Real-time stereo matching using orthogonal reliability-based dynamic programming |
| IL307671B2 (en) | | Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3d) protein structures |
| Kaushik et al. | | Medical image segmentation using genetic algorithm |
| CN105405152B (en) | | Adaptive scale method for tracking target based on structuring support vector machines |
| CN104125470B (en) | | A kind of method of transmitting video data |
| IL307661A (en) | | Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks |
| Thota et al. | | Genetic algorithm based feature selection and optimized edge detection for brain tumor detection |
| Wen et al. | | Paired decision trees for fast intra decision in H.266/VVC |
| Park et al. | | Two-stream decoder feature normality estimating network for industrial anomaly detection |
| CN104125471B (en) | | A kind of video image compressing method |
| CN109035264B (en) | | Method for implementing image threshold segmentation in quantum state space |
| Delibasis et al. | | Multimodal genetic algorithms-based algorithm for automatic point correspondence |
| Elghareb et al. | | Self-supervised Prototype Learning for Spatio-Temporal Enhanced Ultrasound-based Prostate Cancer Detection |
| JPWO2022221589A5 (en) | | |
| Liu et al. | | Color image segmentation using multilevel thresholding-cooperative bacterial foraging algorithm |
| Shang et al. | | An improved OTSU method based on Genetic Algorithm |
| US8391365B2 (en) | | Motion estimator and a motion estimation method |
| RU2023125430A (en) | | MULTI-CHANNEL PROTEIN VOXELIZATION FOR PATHOGENICITY VARIANT PREDICTION USING DEEP CONVOLUTIONAL NEURAL NETWORKS |