IL307667A - Efficient voxelization for deep learning - Google Patents

Efficient voxelization for deep learning

Info

Publication number
IL307667A
Authority
IL
Israel
Prior art keywords
cell
atoms
coordinates
mapping
voxels
Prior art date
Application number
IL307667A
Other languages
Hebrew (he)
Original Assignee
Illumina Inc
Illumina Cambridge Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-04-15
Filing date
2022-04-14
Publication date
2023-12-01
Priority claimed from US17/703,935 (US20220336056A1)
Application filed by Illumina Inc, Illumina Cambridge Ltd
Publication of IL307667A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Genetics & Genomics (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Epidemiology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Processing (AREA)
  • Image Generation (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Claims (20)

1. A computer-implemented method of efficiently determining which elements of a sequence are nearest to uniformly spaced cells in a grid, wherein the elements have element coordinates, and cells have dimension-wise cell indices and cell coordinates, the computer-implemented method comprising: generating an element-to-cells mapping that maps, to each of the elements, a subset of the cells, wherein the subset of the cells mapped to a particular element in the sequence includes a nearest cell in the grid and one or more neighborhood cells in the grid, wherein the nearest cell is selected based on matching element coordinates of the particular element to the cell coordinates, and wherein the one or more neighborhood cells are contiguously adjacent to the nearest cell and selected based on being within a distance proximity range from the particular element; generating a cell-to-elements mapping that maps, to each of the cells, a subset of the elements, wherein the subset of the elements mapped to a particular cell in the grid includes those elements in the sequence that are mapped to the particular cell by the element-to-cells mapping; and using the cell-to-elements mapping to determine, for each of the cells, a nearest element in the sequence, wherein the nearest element to the particular cell is determined based on distances between the particular cell and the elements in the subset of the elements.
2. The computer-implemented method of claim 1, wherein matching the element coordinates of the particular element to the cell coordinates further includes truncating a decimal portion of the element coordinates to generate truncated element coordinates.
3. The computer-implemented method of claim 1 or 2, wherein matching the element coordinates of the particular element to the cell coordinates further includes: for a first dimension, matching a first truncated element coordinate in the truncated element coordinates to a first cell coordinate of a first cell in the grid, and selecting a first dimension index of the first cell; for a second dimension, matching a second truncated element coordinate in the truncated element coordinates to a second cell coordinate of a second cell in the grid, and selecting a second dimension index of the second cell; for a third dimension, matching a third truncated element coordinate in the truncated element coordinates to a third cell coordinate of a third cell in the grid, and selecting a third dimension index of the third cell; using the selected first dimension index, the selected second dimension index, and the selected third dimension index to generate an accumulated sum based on position-wise weighting the selected first dimension index, the selected second dimension index, and the selected third dimension index by powers of a radix; and using the accumulated sum as a cell index for selection of the nearest cell.
4. The computer-implemented method of any of claims 1-3, wherein the distances are calculated between cell coordinates of the particular cell and the element coordinates of the elements in the subset of the elements.
5. The computer-implemented method of any of claims 1-4, wherein the sequence is a protein sequence of amino acids.
6. The computer-implemented method of claim 5, wherein the elements are atoms of the amino acids from the protein sequence of amino acids.
7. The computer-implemented method of claim 6, wherein generating the element-to-cells mapping, generating the cell-to-elements mapping, and using the cell-to-elements mapping to determine, for each of the cells, the nearest element have a runtime complexity of O(a * f + v), wherein: a is a number of the atoms of the amino acids, f is a number of the amino acids, v is a number of the cells, and * is a multiplication operation.
8. The computer-implemented method of claim 6 or 7, wherein the atoms include alpha carbon atoms.
9. The computer-implemented method of any of claims 6-8, wherein the atoms include beta carbon atoms.
10. The computer-implemented method of any of claims 6-9, wherein the atoms include non-carbon atoms.
11. The computer-implemented method of any of claims 1-10, wherein the cells are three-dimensional voxels.
12. The computer-implemented method of any of claims 1-11, wherein the cell coordinates are three-dimensional coordinates.
13. The computer-implemented method of any of claims 1-12, wherein the element coordinates are three-dimensional coordinates.
14. The computer-implemented method of any of claims 1-13, wherein the one or more neighborhood cells are selected based on being within an index adjacency range from the nearest cell.
15. The computer-implemented method of any of claims 1-14, wherein the one or more neighborhood cells are selected based on being within a cell neighborhood in the grid that includes the nearest cell.
16. The computer-implemented method of any of claims 1-15, wherein the sequence includes M elements, wherein the subset of the elements includes N elements, and wherein M >> N.
17. A computer-implemented method of efficiently determining which atoms in a protein are nearest to voxels in a grid, wherein the atoms have three-dimensional (3D) atom coordinates, and the voxels have 3D voxel coordinates, including: generating an atom-to-voxels mapping that maps, to each of the atoms, a containing voxel selected based on matching 3D atom coordinates of a particular atom of the protein to the 3D voxel coordinates in the grid; generating a voxel-to-atoms mapping that maps, to each of the voxels, a subset of the atoms, wherein the subset of the atoms mapped to a particular voxel in the grid includes those atoms in the protein that are mapped to the particular voxel by the atom-to-voxels mapping; and using the voxel-to-atoms mapping to determine, for each of the voxels, a nearest atom in the protein.
18. The computer-implemented method of claim 17, wherein generating the atom-to-voxels mapping, generating the voxel-to-atoms mapping, and using the voxel-to-atoms mapping to determine, for each of the voxels, the nearest atom have a runtime complexity of O(number of atoms).
19. A system for efficiently determining which atoms in a protein are nearest to voxels in a grid, wherein the atoms have three-dimensional (3D) atom coordinates, and the voxels have 3D voxel coordinates, the system including: at least a processor; and a non-transitory computer readable storage medium storing instructions that, when executed by at least the processor, cause the system to: generate an atom-to-voxels mapping that maps, to each of the atoms, a containing voxel selected based on matching 3D atom coordinates of a particular atom of the protein to the 3D voxel coordinates in the grid; generate a voxel-to-atoms mapping that maps, to each of the voxels, a subset of the atoms, wherein the subset of the atoms mapped to a particular voxel in the grid includes those atoms in the protein that are mapped to the particular voxel by the atom-to-voxels mapping; and use the voxel-to-atoms mapping to determine, for each of the voxels, a nearest atom in the protein.
20. The system of claim 19, further storing instructions that, when executed by at least the processor, cause the system to: generate the atom-to-voxels mapping using a runtime complexity determined by a number of the atoms; generate the voxel-to-atoms mapping using the runtime complexity determined by the number of the atoms; and use the voxel-to-atoms mapping to determine, for each of the voxels, the nearest atom in the protein based on distances between each voxel of the voxels and each atom in the subset of the atoms.
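The two-stage mapping recited in claims 1-3 and 14 can be illustrated with a minimal Python sketch. It is not the patented implementation. The following are assumptions made only for illustration: the grid is an n x n x n block of unit cells, cell (i, j, k) spans [i, i+1) along each axis, the cell coordinate is the cell center, element coordinates are non-negative, and the neighborhood is chosen by index adjacency rather than by a distance threshold. The names `cell_index`, `nearest_element_per_cell`, and `radius` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def cell_index(i, j, k, n):
    """Linear cell index: per-dimension indices weighted by powers of the radix n."""
    return i * n * n + j * n + k


def nearest_element_per_cell(elements, n, radius=1):
    """Return {cell_index: (element_index, distance)} for an n x n x n unit grid.

    Assumptions (not from the source): cell centers sit at (i+0.5, j+0.5, k+0.5),
    coordinates are non-negative, and neighborhood cells are those within
    `radius` index steps of the containing cell.
    """
    # Stage 1: element-to-cells mapping.
    element_to_cells = {}
    for e, (x, y, z) in enumerate(elements):
        # Truncating the decimal portion yields the containing cell's indices.
        ci, cj, ck = int(x), int(y), int(z)
        cells = []
        for di, dj, dk in product(range(-radius, radius + 1), repeat=3):
            i, j, k = ci + di, cj + dj, ck + dk
            if 0 <= i < n and 0 <= j < n and 0 <= k < n:
                cells.append(cell_index(i, j, k, n))
        element_to_cells[e] = cells

    # Stage 2: cell-to-elements mapping, the inverse of stage 1.
    cell_to_elements = defaultdict(list)
    for e, cells in element_to_cells.items():
        for c in cells:
            cell_to_elements[c].append(e)

    # Stage 3: each cell compares distances only against its own candidates.
    nearest = {}
    for c, candidates in cell_to_elements.items():
        i, rem = divmod(c, n * n)
        j, k = divmod(rem, n)
        center = (i + 0.5, j + 0.5, k + 0.5)
        best = min(candidates, key=lambda e: math.dist(center, elements[e]))
        nearest[c] = (best, math.dist(center, elements[best]))
    return nearest


if __name__ == "__main__":
    atoms = [(0.5, 0.5, 0.5), (2.2, 2.2, 2.2)]
    for cell, (atom, dist) in sorted(nearest_element_per_cell(atoms, n=3).items()):
        print(cell, atom, round(dist, 3))
```

In this sketch, cells that no element maps to are simply absent from the result, and the reported element is the nearest among a cell's candidates, which matches the globally nearest element only when that element falls inside the chosen neighborhood. Setting `radius=0` reduces stage 1 to the containing-voxel mapping described in claims 17 and 19.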
IL307667A 2021-04-15 2022-04-14 Efficient voxelization for deep learning IL307667A (en)

Applications Claiming Priority (5)

Application Number  Publication           Priority Date  Filing Date  Title
US202163175495P                           2021-04-15     2021-04-15
US202163175767P                           2021-04-16     2021-04-16
US17/703,935        US20220336056A1 (en)  2021-04-15     2022-03-24   Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks
US17/703,958        US20220336057A1 (en)  2021-04-15     2022-03-24   Efficient voxelization for deep learning
PCT/US2022/024918   WO2022221593A1 (en)   2021-04-15     2022-04-14   Efficient voxelization for deep learning

Publications (1)

Publication Number  Publication Date
IL307667A (en)      2023-12-01

Family

ID=81448684

Family Applications (2)

Application Number  Publication     Priority Date  Filing Date  Title
IL307661A           IL307661A (en)  2021-04-15     2022-04-14   Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks
IL307667A           IL307667A (en)  2021-04-15     2022-04-14   Efficient voxelization for deep learning

Family Applications Before (1)

Application Number  Publication     Priority Date  Filing Date  Title
IL307661A           IL307661A (en)  2021-04-15     2022-04-14   Multi-channel protein voxelization to predict variant pathogenicity using deep convolutional neural networks

Country Status (9)

Country Link
EP (2) EP4323991A1 (en)
JP (2) JP2024514894A (en)
KR (2) KR20230170680A (en)
AU (2) AU2022258691A1 (en)
BR (2) BR112023021266A2 (en)
CA (2) CA3215514A1 (en)
IL (2) IL307661A (en)
MX (2) MX2023012226A (en)
WO (2) WO2022221593A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116153404B (en) * 2023-02-28 2023-08-15 Chengdu University of Information Technology Single-cell ATAC-seq data analysis method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3622521A1 (en) * 2017-10-16 2020-03-18 Illumina, Inc. Deep convolutional neural networks for variant classification
EP3704640A4 (en) * 2017-10-27 2021-08-18 Apostle, Inc. Predicting cancer-related pathogenic impact of somatic mutations using deep learning-based methods
CN110245685B (en) * 2019-05-15 2022-03-25 Tsinghua University Method, system and storage medium for predicting pathogenicity of genome single-site variation

Also Published As

Publication number Publication date
CA3215520A1 (en) 2022-10-20
EP4323991A1 (en) 2024-02-21
JP2024514894A (en) 2024-04-03
WO2022221593A1 (en) 2022-10-20
MX2023012227A (en) 2024-01-08
MX2023012226A (en) 2024-01-08
CA3215514A1 (en) 2022-10-20
WO2022221591A1 (en) 2022-10-20
EP4323989A1 (en) 2024-02-21
JP2024513995A (en) 2024-03-27
KR20230170680A (en) 2023-12-19
BR112023021266A2 (en) 2023-12-12
KR20230170679A (en) 2023-12-19
BR112023021343A2 (en) 2023-12-19
AU2022259667A1 (en) 2023-10-26
AU2022258691A1 (en) 2023-10-26
IL307661A (en) 2023-12-01

Similar Documents

Publication Publication Date Title
Ivey et al. Accurate interface normal and curvature estimates on three-dimensional unstructured non-convex polyhedral meshes
US8412492B2 (en) System and method for fitting feature elements using a point-cloud of an object
CN110033519B (en) Three-dimensional modeling method, device and system based on implicit function and storage medium
US20150220812A1 (en) Point cloud simplification
IL307667A (en) Efficient voxelization for deep learning
Koehl Fast recursive computation of 3d geometric moments from surface meshes
Patera et al. A comparison of fundamental methods for iso-surface extraction
Miranda et al. Mesh generation on high-curvature surfaces based on a background quadtree structure
Guerreiro et al. Greedy hypervolume subset selection in the three-objective case
EP3726477A1 (en) Chamber reconstruction from a partial volume
Miranda et al. Surface mesh regeneration considering curvatures
RU2023125247A (en) PRODUCTIVE VOXELIZATION FOR DEEP LEARNING
Kim et al. Efficient encoding and decoding extended geocodes for massive point cloud data
Weber et al. Topological cacti: Visualizing contour-based statistics
WO2022263939A1 (en) Smooth surfaces via nets of geodesics
Cavoretto et al. Landmark-based registration using a local radial basis function transformation
Saracevic et al. Method for finding and storing optimal triangulations based on square matrix
Hasanah et al. Development of software for making contour plot using matlab to be used for teaching purpose
Boes et al. Multiple organ definition in CT using a Bayesian approach for 3D model fitting
Wang et al. Grid generation on NURBS surfaces developed for ship hull form optimization
US9582912B2 (en) Device for aiding the production of a mesh of a geometric domain
US8180607B2 (en) Acoustic modeling method
Paiva et al. Approximating implicit curves on triangulations with affine arithmetic
Schwartz et al. Faster approximations of shortest geodesic paths on polyhedra through adaptive priority queue
Pinskiy et al. A hierarchical error controlled octree data structure for large-scale visualization