WO2022218509A1 - A method for predicting an effect of a gene variant on an organism by means of a data processing system and a corresponding data processing system - Google Patents
- Publication number
- WO2022218509A1 (application PCT/EP2021/059567)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- gene
- variant
- graph
- variants
- genes
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
- G16B5/20—Probabilistic models
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/10—Ontologies; Annotations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/20—Heterogeneous data integration
Definitions
- the present invention relates to a method for predicting an effect of a gene variant on an organism by means of a data processing system and a data processing system for carrying out this method.
- US 2019/0139622 A1 discloses a method and a system for predicting effects of perturbations to an organism.
- the method discloses that a neural network is trained to classify the effects of perturbations to a gene or other features of the organism. After training the graph neural network is configured to predict activity of a new strain having one or more modifications to the gene.
- the prior art reference EP 3 514 798 A1 discloses a system for predicting genetic variants with a machine learning model.
- the prior art discloses an automated computational system for predicting information about genetic variants.
- the method comprises a microprocessor, determining the functionality for each gene based on the genetic variant data and also generating a weighted genetic network comprising the plurality of genes of the genome having connections between them.
- the method also comprises a regression model explaining the type of variant affecting genes.
- WO 2016/172 464 A1 discloses a method for predicting gene dysfunction caused by a defined genetic mutation in the genome of an organism. This reference also discloses a variant-gene graph and a variant category, either benign or pathogenic, based on a trained machine learning model. This prior art does not disclose identifying the category of a newly added variant to be analyzed.
- the prior art reference US 2016/0371431 A discloses a method of predicting pathogenicity of genetic sequence variants. It also discloses that after the machine learning model is trained and has categorized the variant with respect to category of disease causing variant or not, it will identify or predict the variant pathogenicity of newly added variant.
- the prior art reference does not disclose a gene interaction network.
- the aforementioned object is accomplished by a method for predicting an effect of a gene variant on an organism by means of a data processing system, comprising the following steps:
- a data processing system for carrying out the method for predicting an effect of a gene variant on an organism comprising:
- - creating means for creating a variant-gene graph by connecting each gene variant to one or more genes to which said gene variant belongs and by connecting each gene to one or more other genes according to a pre-definable rule;
- - feeding means for feeding a new gene variant to the graph neural network for predicting by the graph neural network whether the new gene variant is benign or pathogenic.
- According to the invention it has been recognized that it is possible to realize a very high prediction accuracy by simply providing a particularly suitable graph neural network model, training set and procedure.
- benign and pathogenic gene variants are provided or collected from a suitable source. This means that relevant data and/or features of such gene variants are provided or collected for further use in the method.
- a suitable variant-gene graph is created by a) connecting each gene variant to one or more genes to which this gene variant belongs and by b) connecting each gene to one or more other genes according to a pre-definable rule. Then, with such a variant-gene graph a graph neural network is trained.
- a new or unknown gene variant is fed to the graph neural network for predicting by the graph neural network whether the new or unknown gene variant is benign or pathogenic. All or some of the method steps can be performed or supported by the data processing system, e.g. a computer.
- This graph neural network approach operates on a heterogeneous graph with genes and gene variants. This graph is created by assigning gene variants to genes and connecting genes with an existing gene-gene interaction network. The invention improves the prediction accuracy and allows experts to interpret the prediction by inspecting which gene variants and genes had a large effect on a prediction. The prediction of effects of new observed gene variants is possible with very high accuracy.
- the provided or collected benign and pathogenic gene variants can be provided or collected from one or more databases comprising data or features of gene variants.
- a large amount of gene variants can be used for realizing a high prediction accuracy by simple means.
- labeling for each variant to which gene or genes it belongs can be based on suitable coordinates. Benign and pathogenic gene variants can be assigned to the closest gene or genes in a related genome. This simplifies the method and provides a realization of a high prediction accuracy.
- the pre-definable rule can comprise connecting each gene to every other gene.
- the pre-definable rule can comprise connecting each gene to one or more other genes which is or are connected to said gene based on one or more predefined biological interactions.
- the one or more predefined biological interactions can simply be retrieved from a biological database or from a gene-gene interaction graph of a biological database.
- At least one feature can be collected for at least one or each gene variant, wherein preferably the at least one feature can be the output of another variant prediction model that does not use a graph.
- the at least one or each gene variant can be represented by a feature vector.
- At least one feature can be collected for at least one or each gene.
- at least one or each gene can be specified by such a feature.
- At least one or each gene can be represented by a N dimensional vector, wherein N is an integer. This provides a very simple and clear representation.
- the N dimensional vector can be a randomly initialized vector, which is optimized in the training step.
- Such a type of vector is very suitable for effectively performing the method.
- the N dimensional vector can comprise at least one collected feature and/or is a concatenation of a randomly initialized vector, which is trainable, with one or more collected gene features. Also such a type of vector is very suitable for effectively performing the method.
- each gene variant in the training set can have a definable label, e.g. 0 for benign and 1 for pathogenic.
- one or more parameters of the graph neural network can be updated using gradient descent. This proceeding supports an increase of the likelihood for gene variants in the training set to obtain the correct label from the network.
- an explanation for the prediction of a gene variant or variants can be provided by returning which other gene variant or gene variants and/or which gene or genes the graph neural network has utilized to arrive at the prediction, wherein preferably the impact can be provided, for example to an expert, that the gene variant or gene variants and/or gene or genes had on the prediction.
- This proceeding provides a high degree of information to a user of the method.
- a graph neural network approach operates on a heterogeneous graph with genes and gene variants.
- the graph can be created by assigning variants to genes and connecting genes with an existing gene-gene interaction network.
- the graph neural network can be trained to aggregate information between genes, and between genes and gene variants. Gene variants can exchange information via the genes they connect to. This method improves the prediction accuracy and allows experts to interpret the prediction by inspecting which gene variants and genes had a large effect on a prediction.
- all embodiments of the present invention provide a variant effect prediction with graph neural network, VEGN.
- a graph can consist of a set of nodes and a set of edges, where an edge holds between the two nodes.
- a gene variant or variant is a genetic variation in a genome that differs from the reference genome. Such a variant can be identified to belong to a certain gene or genes by assigning it to the nearest gene - or genes in the case of equal distance - in the genome coordinate. Given a set of variants and the set of genes they belong to, the union of this set is the set of nodes in a graph. For each variant there is an edge to the genes it belongs to. For edges between genes, we consider two options: (1) the edges are given as input, e.g. a domain expert labelled the edges; (2) we assume an edge exists from each gene to every other gene.
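The graph construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: variants and genes lie on a single chromosome, a gene is given as a (start, end) interval, all supplied genes become nodes, and the function names, gene names and coordinates are hypothetical.

```python
from itertools import permutations


def nearest_genes(position, genes):
    """Return the names of the gene(s) closest to a variant position.

    genes maps a gene name to its (start, end) interval; the distance is 0
    when the position lies inside the gene. Ties are kept, so a variant can
    belong to more than one gene.
    """
    def distance(interval):
        start, end = interval
        if start <= position <= end:
            return 0
        return min(abs(position - start), abs(position - end))

    best = min(distance(iv) for iv in genes.values())
    return sorted(name for name, iv in genes.items() if distance(iv) == best)


def build_variant_gene_graph(variants, genes, gene_edges=None):
    """Build the node and edge sets of a variant-gene graph.

    variants:   dict variant id -> genomic position
    genes:      dict gene name -> (start, end)
    gene_edges: optional iterable of (gene, gene) pairs (option 1);
                if None, every gene is linked to every other gene (option 2).
    """
    nodes = set(variants) | set(genes)
    variant_edges = {
        (v, g) for v, pos in variants.items() for g in nearest_genes(pos, genes)
    }
    if gene_edges is None:
        gene_gene = set(permutations(genes, 2))
    else:
        gene_gene = set(gene_edges)
    return nodes, variant_edges, gene_gene


# Hypothetical coordinates: v2 is equally far from both genes.
genes = {"GENE_A": (100, 200), "GENE_B": (300, 400)}
variants = {"v1": 150, "v2": 250, "v3": 420}
nodes, variant_edges, gene_gene = build_variant_gene_graph(variants, genes)
```

Passing an explicit `gene_edges` list corresponds to option (1), where the edges are given as input; omitting it corresponds to option (2).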
- a graph neural network (GNN) with weights w can be trained.
- each variant has a feature vector (e.g. predicted variant effect on transcription factor binding, on splicing, conservation score) and a classification label, e.g. 0 or 1 for benign or pathogenic.
- the GNN itself can take various forms, e.g. it could be a graph attention network, see Velickovic et al. 2018, Graph Attention Networks, International Conference on Learning Representations, ICLR. Furthermore, we can learn one joint GNN or we could learn a different GNN depending on the edge type, e.g. a different network is learnt for gene-variant, variant-gene and gene-gene edges. Furthermore, if we assume that each gene has an edge to every other gene, then we learn the strength of each edge. This can be done with a fully connected neural network, e.g. using a Transformer, see Vaswani et al. 2017, Attention Is All You Need, Neural Information Processing Systems, NeurIPS. The fully connected neural network can then be used for the edge type gene-gene, whereas a GNN can be used for other edge types. This allows us to combine a given graph structure and a learnt graph structure in one joint neural network.
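As an illustration of learning edge strengths when every gene is connected to every other gene, the following minimal sketch computes scaled dot-product attention weights between gene vectors. It is not the embodiment's network: the function names, the example gene names and the two-dimensional embeddings are hypothetical, and the learned query and key projections of a full Transformer are omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gene_edge_strengths(embeddings):
    """Return a dict (source, target) -> attention weight for all gene pairs.

    embeddings: dict gene name -> list of floats (same length for all genes).
    For each source gene, the weights over all other genes sum to 1.
    """
    dim = len(next(iter(embeddings.values())))
    strengths = {}
    for src, query in embeddings.items():
        others = [g for g in embeddings if g != src]
        # Scaled dot product between the source gene and every other gene.
        scores = [
            sum(a * b for a, b in zip(query, embeddings[g])) / math.sqrt(dim)
            for g in others
        ]
        for g, weight in zip(others, softmax(scores)):
            strengths[(src, g)] = weight
    return strengths


# Hypothetical gene embeddings, for illustration only.
embeddings = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.9, 0.1],
    "GENE_C": [-1.0, 0.5],
}
strengths = gene_edge_strengths(embeddings)
```

In a trained model the embeddings would be optimized together with the rest of the network, so that the resulting weights form an application-specific gene-gene graph.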
- for a variant v, VEGN predicts the probability that the variant is disease-causing (pathogenic): P(pathogenic).
- the graph neural network model with weights w can be trained with standard stochastic gradient descent and a cross entropy loss function:
- $L(w) = -\sum_{i=1}^{m} \left[ y_i \log P_i(\text{pathogenic}) + (1 - y_i) \log\left(1 - P_i(\text{pathogenic})\right) \right]$, where $y_i$ is the label of the variant $v_i$ in the training data, pathogenic being 1 and benign being 0, $P_i(\text{pathogenic})$ is the prediction for $v_i$, and $i$ is an integer running from 1 to $m$.
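The cross entropy loss can be evaluated as in the following minimal sketch. It assumes the standard form with a leading minus sign, summed over the training variants; the function name and the clamping constant `eps` are illustrative choices and not part of the disclosure.

```python
import math


def cross_entropy_loss(labels, probabilities, eps=1e-12):
    """Summed binary cross entropy over the training variants.

    labels:        list of 0 (benign) or 1 (pathogenic)
    probabilities: list of predicted P_i(pathogenic), each in [0, 1]
    eps:           clamp that avoids log(0); an implementation detail
    """
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total


labels = [1, 0, 1]
good = cross_entropy_loss(labels, [0.9, 0.1, 0.8])  # predictions close to the labels
bad = cross_entropy_loss(labels, [0.4, 0.6, 0.3])   # predictions far from the labels
```

Predictions that agree with the labels yield a smaller loss, which is what gradient descent on the weights w exploits.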
- Embodiments can formulate variant effect prediction as a graph via gene attachments and can learn a graph neural network.
- Embodiments can learn an application specific gene-gene interaction graph.
- Embodiments can combine a given graph structure with a learnt graph structure in one joint neural network.
- Embodiments can explain a prediction of a variant by providing the variants and genes that the graph neural network utilized and the impact they had on the prediction.
- An embodiment can comprise a method for predicting what effect a human’s gene variant will have on their body.
- the method can comprise the steps of:
- each variant can be connected to one or more genes based on step
- each gene can be either i. connected to every other gene ii. connected to the genes identified in step 3) if step 3) is present.
- the feature could be the output of another variant prediction model that does not use a graph.
- Each variant can be represented by the feature vector collected in step 5).
- Each gene can be represented by a N dimensional vector, which may be either one of the below or a concatenation: a. A randomly initialized vector, which can be optimized in the training process. b. The gene features collected in step 6). c. A concatenation of the randomly initialized vector, which is trainable, with gene features collected in step 6).
- each variant in the training set can have a label, e.g. 0 for benign and 1 for pathogenic.
- the model’s parameters can be updated using gradient descent in order to increase the likelihood for variants in the training set to obtain the correct label from the network.
- Previous methods classify each variant in isolation. By treating the problem as a graph where variants are linked to each other via genes and by automatically learning a gene-gene network, embodiments of the present method can learn a graph neural network that greatly improves the accuracy of the variant prediction.
- Fig. 1 shows in a diagram the overall architecture of an embodiment of the present invention
- Fig. 2 shows in a diagram a further embodiment of the present invention
- Fig. 3 shows in a block diagram a further embodiment of the present invention
- Fig. 4 shows in a block diagram a further embodiment of the present invention.
- Fig. 1 shows in a diagram the overall architecture of an embodiment of the present invention, concretely a VEGN.
- the goal is to classify gene variants - in short form: variants - which are denoted by triangles. Variants are associated with a gene, denoted by circles, and a gene-gene network is either given or learnt. Based on this, a GNN can be learnt. New variants are added to the graph via the gene they attach to. Given a new variant’s feature vector, the GNN classifies the new variants and can give an explanation of which other variants and genes were relevant for the classification.
- Fig. 2 shows in a diagram a further embodiment of the present invention.
- Here is shown a concrete instantiation with a different GNN for each edge type: The goal is to classify variants which are denoted by triangles, e.g. as benign 0 or pathogenic 1. Variants are associated with a gene, denoted by circles, and a gene-gene network is either given or learnt. Based on this, a GNN can be learnt. This can either be one joint GNN or different GNNs can be learnt for different edges. E.g. for the three different edge types - “gene has variant”, “gene interacts with gene” and “variant in gene” - separate GNN layers are instantiated and learnt.
- Arrows within a layer indicate the direction of information flow, where the hidden representation of the arrow's source is used to update the hidden representation of the arrow's target.
- the arrows represent the weights of the GNN that is learnt and these weights are shared within this layer, i.e. for “variant in gene”, each variant has its own feature vector and to this the same GNN layer's weights are applied to update the target hidden representation.
- the hidden representations of each layer are aggregated, e.g. by sum.
- a classification layer, e.g. via a sigmoid function, determines the likelihood of a variant being benign or pathogenic.
- weights can be updated via a loss function and backpropagation.
- new variants can be added to the graph via the gene they attach to.
- the learnt weights can be applied in a forward pass to derive a prediction.
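The forward pass described for Fig. 2 can be sketched as below: one weight matrix per edge type is shared by all edges of that type, messages are aggregated by summation, and a sigmoid layer yields P(pathogenic). This is a minimal sketch, not the embodiment's network; the ReLU non-linearity, the absence of self-loops, and all node names and weight values are assumptions made for illustration.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def message_passing_layer(hidden, edges_by_type, weights_by_type):
    """One round of message passing with a separate weight matrix per edge type.

    hidden:          dict node -> hidden vector
    edges_by_type:   dict edge type -> list of (source, target) pairs
    weights_by_type: dict edge type -> square weight matrix shared by all
                     edges of that type
    Messages arriving at a node are summed over all edge types, then a ReLU
    is applied to the result.
    """
    dim = len(next(iter(hidden.values())))
    updated = {node: [0.0] * dim for node in hidden}
    for edge_type, edges in edges_by_type.items():
        for source, target in edges:
            message = matvec(weights_by_type[edge_type], hidden[source])
            updated[target] = [u + m for u, m in zip(updated[target], message)]
    return {node: [max(0.0, x) for x in vec] for node, vec in updated.items()}


def classify(hidden_vector, readout, bias=0.0):
    """Sigmoid classification layer returning P(pathogenic)."""
    logit = sum(w * x for w, x in zip(readout, hidden_vector)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical two-variant, two-gene graph with hand-picked weights.
hidden = {"v1": [1.0, 0.0], "v2": [0.0, 1.0], "g1": [0.5, 0.5], "g2": [0.2, 0.8]}
edges_by_type = {
    "variant in gene": [("v1", "g1"), ("v2", "g2")],
    "gene interacts with gene": [("g1", "g2"), ("g2", "g1")],
    "gene has variant": [("g1", "v1"), ("g2", "v2")],
}
identity = [[1.0, 0.0], [0.0, 1.0]]
weights_by_type = {
    "variant in gene": identity,
    "gene interacts with gene": [[0.5, 0.0], [0.0, 0.5]],
    "gene has variant": identity,
}
updated = message_passing_layer(hidden, edges_by_type, weights_by_type)
p_v1 = classify(updated["v1"], readout=[1.0, -1.0])
```

Stacking two such layers lets information from one variant reach another variant through the genes they attach to.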
- VEGN or embodiments of the present invention can be used to prioritize a short list of variants for a clinician to manually inspect.
- Fig. 3 shows in a block diagram such a further embodiment of the present invention.
- patients first have their genome sequenced with whole genome sequencing or whole exome sequencing.
- a list of variants is generated through variant calling on the sequencing data.
- VEGN or embodiments of the present invention can be applied to each of the variants to predict a disease-relevance score P(pathogenic).
- the variants can then be sorted based on the score in descending order.
- the top k variants, wherein k is an integer, are selected for further manual investigation by domain experts. The value of k depends on the available resources.
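The sorting and selection steps can be illustrated with the short sketch below. The function name, variant identifiers and scores are hypothetical; the scores stand in for the P(pathogenic) values predicted per variant.

```python
def top_k_variants(scores, k):
    """Return the k variant ids with the highest P(pathogenic), best first.

    scores: dict variant id -> predicted P(pathogenic)
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [variant for variant, _ in ranked[:k]]


# Hypothetical scores for four called variants.
scores = {"v1": 0.12, "v2": 0.91, "v3": 0.47, "v4": 0.88}
shortlist = top_k_variants(scores, k=2)
```

The resulting short list is what would be handed to domain experts for manual inspection.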
- Neoantigens are antigens found specifically in tumor samples. They are products of tumor-specific variants. Due to the tumor-specificity of neoantigens, they are frequently used as targets for immunotherapy. Existing neoantigen selection pipelines typically do not consider the effects of variants. VEGN or embodiments of the present invention can help to prioritize and select the most biologically relevant variants.
- Fig. 4 shows in a block diagram such a further embodiment of the present invention.
- tumor samples are whole genome sequenced or whole exome sequenced.
- a list of missense variants is generated through variant calling on the sequencing data.
- VEGN or embodiments of the present invention can be applied to each of the variants to predict a disease-relevance score P(pathogenic). The variants can then be sorted based on the score in descending order.
- the predicted disease-causing probabilities are combined with other evidence in an existing neoantigen discovery pipeline to select for neoantigens.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Biotechnology (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Public Health (AREA)
- Bioethics (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Probability & Statistics with Applications (AREA)
- Molecular Biology (AREA)
- Physiology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
For realizing a very high prediction accuracy by simple means a method for predicting an effect of a gene variant on an organism by means of a data processing system is provided, comprising the steps: providing or collecting benign and pathogenic gene variants; creating a variant-gene graph by connecting each gene variant to one or more genes to which it belongs and by connecting each gene to one or more other genes according to a pre-definable rule; training a graph neural network, GNN, on the variant-gene graph; and feeding a new gene variant to the graph neural network for predicting by the graph neural network whether the new gene variant is benign or pathogenic. Further, a corresponding data processing system for carrying out the above method for predicting an effect of a gene variant on an organism is provided.
Description
A METHOD FOR PREDICTING AN EFFECT OF A GENE VARIANT ON AN ORGANISM BY MEANS OF A DATA PROCESSING SYSTEM AND A CORRESPONDING DATA PROCESSING SYSTEM
The present invention relates to a method for predicting an effect of a gene variant on an organism by means of a data processing system and a data processing system for carrying out this method.
Genetic mutations can cause disease by disrupting normal gene function. However, identifying the disease-causing mutations from millions of genetic variants within an individual patient is challenging. Computational methods which can prioritize disease-causing mutations have enormous applications. It is well known that genes function through a complex regulatory network.
Methods for predicting an effect of a gene variant on an organism by means of a data processing system and corresponding data processing systems are known from prior art. Corresponding prior art documents are listed as follows:
US 2016/0357903 A1 - A framework for determining the relative effect of genetic variants.
US 2015/0066378 A1 - Identifying Possible Disease-Causing Genetic Variants by Machine Learning Classification.
US 2019/0114547 A1 - Deep Learning-Based Splice Site Classification.
Further, US 2019/0139622 A1 discloses a method and a system for predicting effects of perturbations to an organism. The method discloses that a neural network is trained to classify the effects of perturbations to a gene or other features of the organism. After training the graph neural network is configured to predict activity of a new strain having one or more modifications to the gene.
The prior art reference EP 3 514 798 A1 discloses a system for predicting genetic variants with a machine learning model. The prior art discloses an automated computational system for predicting information about genetic variants. The method comprises a microprocessor, determining the functionality for each gene based on the genetic variant data and also generating a weighted genetic network comprising the plurality of genes of the genome having connections between them. The method also comprises a regression model explaining the type of variant affecting genes.
The prior art reference WO 2016/172 464 A1 discloses a method for predicting gene dysfunction caused by a defined genetic mutation in the genome of an organism. This reference also discloses a variant-gene graph and a variant category, either benign or pathogenic, based on a trained machine learning model. This prior art does not disclose identifying the category of a newly added variant to be analyzed.
The prior art reference US 2016/0371431 A discloses a method of predicting pathogenicity of genetic sequence variants. It also discloses that after the machine learning model is trained and has categorized the variant with respect to category of disease causing variant or not, it will identify or predict the variant pathogenicity of newly added variant. The prior art reference does not disclose a gene interaction network.
It is an object of the present invention to improve and further develop a method for predicting an effect of a gene variant on an organism by means of a data processing system and a corresponding data processing system for realizing a very high prediction accuracy by simple means.
In accordance with the invention, the aforementioned object is accomplished by a method for predicting an effect of a gene variant on an organism by means of a data processing system, comprising the following steps:
- providing or collecting benign and pathogenic gene variants;
- creating a variant-gene graph by connecting each gene variant to one or more genes to which said gene variant belongs and by connecting each gene to one or more other genes according to a pre-definable rule;
- training a graph neural network, GNN, on the variant-gene graph; and
- feeding a new gene variant to the graph neural network for predicting by the graph neural network whether the new gene variant is benign or pathogenic.
Further, the aforementioned object is accomplished by a data processing system for carrying out the method for predicting an effect of a gene variant on an organism, comprising:
- providing or collecting means for providing or collecting benign and pathogenic gene variants;
- creating means for creating a variant-gene graph by connecting each gene variant to one or more genes to which said gene variant belongs and by connecting each gene to one or more other genes according to a pre-definable rule;
- training means for training a graph neural network, GNN, on the variant-gene graph; and
- feeding means for feeding a new gene variant to the graph neural network for predicting by the graph neural network whether the new gene variant is benign or pathogenic.
According to the invention it has been recognized that it is possible to realize a very high prediction accuracy by simply providing a particularly suitable graph neural network model, training set and procedure. Firstly, benign and pathogenic gene variants are provided or collected from a suitable source. This means that relevant data and/or features of such gene variants are provided or collected for further use in the method. In a next step, a suitable variant-gene graph is created by a) connecting each gene variant to one or more genes to which this gene variant belongs and by b) connecting each gene to one or more other genes according to a pre-definable rule. Then, a graph neural network is trained on this variant-gene graph. In a last step, a new or unknown gene variant is fed to the graph neural network, which predicts whether the new or unknown gene variant is benign or pathogenic. All or some of the method steps can be performed or supported by the data processing system, e.g. a computer.
This graph neural network approach operates on a heterogeneous graph with genes and gene variants. This graph is created by assigning gene variants to genes and connecting genes with an existing gene-gene interaction network. The invention improves the prediction accuracy and allows experts to interpret the prediction by inspecting which gene variants and genes had a large effect on a prediction. The prediction of effects of new observed gene variants is possible with very high accuracy.
Thus, on the basis of the invention a method and system are provided which realize a very high prediction accuracy by simple means.
According to an embodiment of the invention the provided or collected benign and pathogenic gene variants can be provided or collected from one or more databases comprising data or features of gene variants. There can be a definable communication between the data processing system and one or more database for the step of providing or collecting the gene variants. Thus, a large amount of gene variants can be used for realizing a high prediction accuracy by simple means.
Within a further embodiment labeling for each variant to which gene or genes it belongs can be based on suitable coordinates. Benign and pathogenic gene variants can be assigned to the closest gene or genes in a related genome. This simplifies the method and provides a realization of a high prediction accuracy.
According to a further embodiment the pre-definable rule can comprise connecting each gene to every other gene. Alternatively, the pre-definable rule can comprise connecting each gene to one or more other genes which is or are connected to said gene based on one or more predefined biological interactions. Preferably, the one or more predefined biological interactions can simply be retrieved from a biological database or from a gene-gene interaction graph of a biological database.
Within a further embodiment at least one feature can be collected for at least one or each gene variant, wherein preferably the at least one feature can be the output of another variant prediction model that does not use a graph.
According to a further embodiment and for providing a simple and effective representation the at least one or each gene variant can be represented by a feature vector.
Within a further embodiment at least one feature can be collected for at least one or each gene. As a result, at least one or each gene can be specified by such a feature.
According to a further embodiment at least one or each gene can be represented by an N-dimensional vector, wherein N is an integer. This provides a very simple and clear representation.
Within a further embodiment the N-dimensional vector can be a randomly initialized vector, which is optimized in the training step. Such a type of vector is very suitable for effectively performing the method.
Within a further embodiment the N-dimensional vector can comprise at least one collected feature and/or can be a concatenation of a randomly initialized vector, which is trainable, with one or more collected gene features. Such a type of vector is also very suitable for effectively performing the method.
According to a further embodiment, for a training set in the training step, each gene variant in the training set can have a definable label, e.g. 0 for benign and 1 for pathogenic. By means of such a label very efficient and structured prediction with high prediction accuracy is possible.
Within a further embodiment in the training step one or more parameters of the graph neural network can be updated using gradient descent. This procedure increases the likelihood that gene variants in the training set obtain the correct label from the network.
According to a further embodiment and for further enhancing the prediction accuracy, an explanation for the prediction of a gene variant or variants can be provided by returning which other gene variant or gene variants and/or which gene or genes the graph neural network has utilized to arrive at the prediction, wherein preferably the impact that the gene variant or gene variants and/or gene or genes had on the prediction can be provided, for example to an expert. This procedure provides a high degree of information to a user of the method.
Advantages and aspects of embodiments of the present invention are summarized and further explained as follows:
According to embodiments of the present invention a graph neural network approach is proposed that operates on a heterogeneous graph with genes and gene variants. The graph can be created by assigning variants to genes and connecting genes with an existing gene-gene interaction network. The graph neural network can be trained to aggregate information between genes, and between genes and gene variants. Gene variants can exchange information via the genes they connect to. This method improves the prediction accuracy and allows experts to interpret the prediction by inspecting which gene variants and genes had a large effect on a prediction.
Generally, all embodiments of the present invention provide a variant effect prediction with graph neural network, VEGN.
Predicting variant effects is a long-standing problem in genetics. Previous Artificial Intelligence, AI, systems that perform variant effect prediction do this by predicting each gene variant in isolation. Even though work has been done to interpret variant effects in the context of biological regulatory networks, no existing method can effectively integrate the gene-variant and gene-gene networks to predict the effect of newly observed gene variants. In embodiments of the present invention, we learn the high order relationships between variants and between genes and variants with a graph neural network. Compared to existing approaches, VEGN has two advantages. First, VEGN considers variant effects in the context of a biological network instead of as isolated events. Such an approach enables VEGN to capture potential remote - trans - effects of variants on indirectly connected genes. Second, existing annotations of disease variants are sparse and are focused on some well-studied genes. VEGN enables information flow from well-studied genes to less-studied genes through the biological network. It also considers the correlated disease-causing status for variants within the same functional module in a biological network. The present approach greatly improves the prediction accuracy compared to previous methods.
In embodiments of this invention, we formulate variant effect prediction as a graph. A graph can consist of a set of nodes and a set of edges, where each edge connects two nodes. A gene variant or variant is a genetic variation in a genome that differs from the reference genome. Such a variant can be identified to belong to a certain gene or genes by assigning it to the nearest gene - or genes in the case of equal distance - in the genome coordinates. Given a set of variants and the set of genes they belong to, the union of these two sets is the set of nodes in a graph. For each variant there is an edge to the genes it belongs to. For edges between genes, we consider two options: (1) the edges are given as input, e.g. a domain expert labelled the edges; (2) we assume an edge exists from each gene to every other gene.
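The graph construction described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the implementation of the embodiments: genes and variants are reduced to single positions on one chromosome, and the names `build_variant_gene_graph`, `gene_pos` and `variant_pos` as well as the example data are hypothetical.

```python
from itertools import combinations


def build_variant_gene_graph(gene_pos, variant_pos, gene_edges=None):
    """Return (nodes, edges) of a variant-gene graph.

    gene_pos / variant_pos map a name to a position on a single chromosome
    (a simplifying assumption; real coordinates include chromosome and span).
    gene_edges is an optional iterable of (gene, gene) pairs (option 1);
    if it is None, every gene is connected to every other gene (option 2).
    """
    edges = set()
    for variant, pos in variant_pos.items():
        # Distance from this variant to every gene.
        dist = {gene: abs(pos - gpos) for gene, gpos in gene_pos.items()}
        nearest = min(dist.values())
        # Attach the variant to the nearest gene, or to all genes tied for nearest.
        for gene, d in dist.items():
            if d == nearest:
                edges.add((variant, gene))
    if gene_edges is None:
        gene_edges = combinations(sorted(gene_pos), 2)
    for a, b in gene_edges:
        edges.add((a, b))
    nodes = set(gene_pos) | set(variant_pos)
    return nodes, edges


# Hypothetical toy data: v2 is equally far from G1 and G2.
genes = {"G1": 100, "G2": 200, "G3": 400}
variants = {"v1": 110, "v2": 150, "v3": 390}
nodes, edges = build_variant_gene_graph(genes, variants)
```

In this toy example the variant `v2` is attached to both `G1` and `G2`, which corresponds to the equal-distance case mentioned above.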
Based on this graph, a graph neural network, GNN, with weights w can be trained. For this, we assume we are given a training set where each variant has a feature vector - e.g. predicted variant effect on transcription factor binding, on splicing, conservation score - and a classification label, e.g. 0 or 1 for benign or pathogenic. For each variant in the training data, we can utilize the associated classification label to define a loss - e.g. binary cross entropy loss - and use - stochastic - gradient descent and backpropagation to update the weights w of the GNN. Once trained, new variants can be added to the graph and applying the GNN will classify the variant, e.g. as benign or pathogenic.
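The training procedure can be illustrated with a deliberately small stand-in model. The sketch below is an assumption-laden simplification rather than the GNN of the embodiments: a single round of mean aggregation through the gene replaces the message passing, a single linear layer with a sigmoid replaces the network, and all names and data (`gene_context`, `train`, `predict`, the feature values) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gene_context(features, variant_gene):
    """Mean feature vector of the variants attached to each gene."""
    sums, counts = {}, {}
    for v, g in variant_gene.items():
        acc = sums.setdefault(g, [0.0] * len(features[v]))
        for k, x in enumerate(features[v]):
            acc[k] += x
        counts[g] = counts.get(g, 0) + 1
    return {g: [s / counts[g] for s in acc] for g, acc in sums.items()}


def model_input(v, features, variant_gene, ctx):
    # The variant's own features concatenated with its gene's context.
    return features[v] + ctx[variant_gene[v]]


def train(features, labels, variant_gene, epochs=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    ctx = gene_context(features, variant_gene)
    dim = 2 * len(next(iter(features.values())))
    w, b = [0.0] * dim, 0.0
    order = sorted(labels)
    losses = []
    for _ in range(epochs):
        rng.shuffle(order)  # stochastic gradient descent: one variant at a time
        total = 0.0
        for v in order:
            x, y = model_input(v, features, variant_gene, ctx), labels[v]
            p = sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            grad = p - y  # gradient of the cross entropy loss w.r.t. the logit
            w = [wk - lr * grad * xk for wk, xk in zip(w, x)]
            b -= lr * grad
        losses.append(total / len(order))
    return w, b, losses


def predict(v, features, variant_gene, w, b):
    ctx = gene_context(features, variant_gene)
    x = model_input(v, features, variant_gene, ctx)
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)


# Hypothetical one-dimensional features (e.g. a conservation score).
features = {"v1": [0.9], "v2": [0.8], "v3": [0.1], "v4": [0.2], "v_new": [0.7]}
variant_gene = {"v1": "G1", "v2": "G1", "v3": "G2", "v4": "G2", "v_new": "G1"}
labels = {"v1": 1, "v2": 1, "v3": 0, "v4": 0}

train_features = {v: features[v] for v in labels}
train_graph = {v: variant_gene[v] for v in labels}
w, b, losses = train(train_features, labels, train_graph)
# After training, the new variant is added to the graph and classified.
p_new = predict("v_new", features, variant_gene, w, b)
```

The new variant is only added to the graph after training, mirroring the order described above; its prediction is influenced both by its own feature and by the variants already attached to the same gene.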
The GNN itself can take various forms, e.g. it could be a graph attention network, see Velickovic et al. 2018, Graph Attention Networks, International Conference on Learning Representations, ICLR. Furthermore, we can learn one joint GNN or we could learn a different GNN depending on the edge type, e.g. a different network is learnt for gene-variant, variant-gene and gene-gene edges. Furthermore, if we assume that each gene has an edge to every other gene, then we learn the strength of each edge. This can be done with a fully connected neural network, e.g. using a Transformer, see Vaswani et al. 2017, Attention is All you Need, Neural Information Processing Systems, NeurIPS. The fully connected neural network can then be used for the edge type gene-gene, whereas a GNN can be used for other edge types. This allows us to combine a given graph structure and a learnt graph structure in one joint neural network.
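How edge strengths between all gene pairs might be obtained from gene representations can be sketched with scaled dot-product attention, the scoring rule used in Transformers. The sketch is a simplification under stated assumptions: the gene vectors serve directly as queries and keys, without the learnt projection matrices a Transformer would use, and the function name and example vectors are hypothetical.

```python
import math


def attention_edge_strengths(embeddings):
    """Softmax-normalised scaled dot-product scores from each gene to every other gene."""
    names = sorted(embeddings)
    d = len(embeddings[names[0]])
    strengths = {}
    for gi in names:
        others = [gj for gj in names if gj != gi]
        scores = []
        for gj in others:
            dot = sum(a * b for a, b in zip(embeddings[gi], embeddings[gj]))
            scores.append(dot / math.sqrt(d))
        # Numerically stable softmax over the scores of all other genes.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        strengths[gi] = {gj: e / z for gj, e in zip(others, exps)}
    return strengths


# Hypothetical two-dimensional gene vectors.
gene_vectors = {"G1": [1.0, 0.0], "G2": [0.9, 0.1], "G3": [-1.0, 0.0]}
strengths = attention_edge_strengths(gene_vectors)
```

Because the gene vectors are trainable, the resulting edge strengths change during training, which is one way to read the "learnt graph structure" mentioned above.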
When classifying a new variant, information flows via the graph neural network from the variant’s feature vector to the gene the variant is attached to and from there to other parts of the graph. This enables us to track which genes and other variants had an influence on the prediction. This can be done by using an explanation method suitable for GNNs, e.g. GNNExplainer, see Ying et al. 2019, GNNExplainer: Generating Explanations for Graph Neural Networks, Neural Information Processing Systems, NeurIPS. This is a powerful advantage of embodiments of our invention because it allows us to explain the model’s variant effect prediction to a domain expert by providing the information on which variants and genes had an impact on the prediction. This information may help the domain experts to discover new disease associated genes or non-additive effects of variant combinations.
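The idea of reporting which other variants influenced a prediction can be illustrated with a simple occlusion test: each other variant is removed in turn and the change in the prediction is recorded as its impact. This is not GNNExplainer, which learns masks over edges and features; it is a swapped-in, simpler technique. The toy model `predict`, its fixed weights and the scalar features are hypothetical stand-ins for a trained GNN.

```python
import math


def predict(target, features, variant_gene, w_self=2.0, w_ctx=3.0, bias=-2.5):
    """Toy stand-in for a trained GNN: the target's own feature plus the
    mean feature of the other variants attached to the same gene."""
    gene = variant_gene[target]
    nbrs = [v for v, g in variant_gene.items()
            if g == gene and v != target and v in features]
    ctx = sum(features[v] for v in nbrs) / len(nbrs) if nbrs else 0.0
    z = w_self * features[target] + w_ctx * ctx + bias
    return 1.0 / (1.0 + math.exp(-z))


def occlusion_impacts(target, features, variant_gene):
    """Impact of each other variant = prediction with it minus prediction without it."""
    base = predict(target, features, variant_gene)
    impacts = {}
    for v in features:
        if v == target:
            continue
        reduced = {k: x for k, x in features.items() if k != v}
        impacts[v] = base - predict(target, reduced, variant_gene)
    return base, impacts


# Hypothetical scalar features; "c" sits on a different gene than "new".
features = {"new": 0.6, "a": 0.9, "b": 0.1, "c": 0.8}
variant_gene = {"new": "G1", "a": "G1", "b": "G1", "c": "G2"}
base, impacts = occlusion_impacts("new", features, variant_gene)
```

In this toy setting, variant `a` pushes the prediction towards pathogenic, variant `b` pushes it towards benign, and variant `c` has no impact because the toy model contains no gene-gene edges.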
For each variant v, VEGN predicts the probability of the variant being disease-causing (pathogenic): P(pathogenic). The graph neural network model with weights w can be trained with standard stochastic gradient descent and a cross entropy loss function:
$$L(w) = -\sum_{i} \left[ y_i \log P_i(\text{pathogenic}) + (1 - y_i) \cdot \log\left(1 - P_i(\text{pathogenic})\right) \right],$$
where $y_i$ is the label of the variant $v_i$ in the training data, pathogenic being 1 and benign being 0, $P_i(\text{pathogenic})$ is the prediction for $v_i$ and where i is an integer.
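For orientation, the gradient that drives the weight update can be written out for a single training variant. The derivation assumes, beyond what is stated above, that the prediction is obtained by applying a sigmoid to a scalar network output $z_i$, and that the per-variant loss $\ell_i$ is the negative log-likelihood of the label:

$$
\begin{aligned}
P_i(\text{pathogenic}) &= \sigma(z_i) = \frac{1}{1 + e^{-z_i}}, \qquad \frac{d\sigma}{dz_i} = \sigma(z_i)\bigl(1 - \sigma(z_i)\bigr), \\
\ell_i &= -\bigl[\, y_i \log \sigma(z_i) + (1 - y_i) \log\bigl(1 - \sigma(z_i)\bigr) \bigr], \\
\frac{\partial \ell_i}{\partial z_i} &= -y_i \bigl(1 - \sigma(z_i)\bigr) + (1 - y_i)\, \sigma(z_i) = P_i(\text{pathogenic}) - y_i, \\
\frac{\partial \ell_i}{\partial w} &= \bigl(P_i(\text{pathogenic}) - y_i\bigr) \frac{\partial z_i}{\partial w}.
\end{aligned}
$$

The update therefore scales with the difference between the predicted probability and the label: a pathogenic variant predicted at 0.9 contributes a factor of -0.1, whereas the same variant predicted at 0.2 contributes a factor of -0.8.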
Empirical validation of our method shows large improvements over the previous state of the art, both in terms of average precision and area under the curve.
Further advantages and aspects of embodiments of the present invention can be summarized as follows:
1) Embodiments can formulate variant effect prediction as a graph via gene attachments and can learn a graph neural network.
2) Embodiments can learn an application specific gene-gene interaction graph.
3) Embodiments can combine a given graph structure with a learnt graph structure in one joint neural network.
4) Embodiments can explain a prediction of a variant by providing the variants and genes that influenced the prediction and the impact they had on it.
Further aspects and advantages of embodiments of the method and data processing system according to the present invention can be summarized as follows:
An embodiment can comprise a method for predicting what effect a human’s gene variant will have on their body. The method can comprise the steps of:
1) Collecting existing benign and pathogenic variants from databases.
2) Labeling for each variant to which genes it belongs, based on the coordinates of the variant and the coordinates of the genes. Variants can be assigned to the closest genes in the genome.
3) Optional: For each gene, labeling which other genes are connected to it based on biological interactions, e.g. retrieved from a gene-gene interaction graph of a biological database.
4) Creating a variant-gene graph, where:
   a. each variant can be connected to one or more genes based on step 2)
   b. each gene can be either
      i. connected to every other gene
      ii. connected to the genes identified in step 3) if step 3) is present.
5) Collecting features for each variant. For example, the feature could be the output of another variant prediction model that does not use a graph.
6) Optional: Collecting features for each gene.
7) Each variant can be represented by the feature vector collected in step 5).
8) Each gene can be represented by an N-dimensional vector, which may be either one of the below or a concatenation:
   a. A randomly initialized vector, which can be optimized in the training process.
   b. The gene features collected in step 6).
   c. A concatenation of the randomly initialized vector, which is trainable, with gene features collected in step 6).
9) Training a graph neural network model on the graph defined in step 4), where
   a. for a training set, each variant in the training set can have a label, e.g. 0 for benign and 1 for pathogenic.
   b. the model’s parameters can be updated using gradient descent in order to increase the likelihood for variants in the training set to obtain the correct label from the network.
10) Once the model is trained, giving a new, previously unseen variant to the model to have the model predict whether the variant is benign or pathogenic.
11) Optional: Providing an explanation for the prediction by returning which other variants and which genes the model utilized to arrive at the prediction.
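The three options for representing a gene in step 8) can be sketched in a few lines. The function name `init_gene_vectors`, the Gaussian initialization with standard deviation 0.1 and the example features are assumptions made for illustration; the steps above do not prescribe them.

```python
import random


def init_gene_vectors(genes, n_random, gene_features=None, seed=0):
    """Build one vector per gene.

    n_random > 0 and gene_features is None   -> option a (random, trainable)
    n_random == 0 and gene_features given    -> option b (collected features)
    n_random > 0 and gene_features given     -> option c (concatenation)
    The first n_random entries of each vector are the trainable part.
    """
    rng = random.Random(seed)
    vectors = {}
    for g in genes:
        trainable = [rng.gauss(0.0, 0.1) for _ in range(n_random)]
        fixed = list(gene_features[g]) if gene_features else []
        vectors[g] = trainable + fixed
    return vectors


# Hypothetical collected gene features (two numbers per gene).
collected = {"G1": [0.3, 1.0], "G2": [0.7, 0.0]}
option_a = init_gene_vectors(["G1", "G2"], n_random=4)
option_b = init_gene_vectors(["G1", "G2"], n_random=0, gene_features=collected)
option_c = init_gene_vectors(["G1", "G2"], n_random=4, gene_features=collected)
```

In option c only the first four entries would be updated during training, while the collected features stay fixed.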
Previous methods classify each variant in isolation. By treating the problem as a graph where variants are linked to each other via genes and by automatically learning a gene-gene network, embodiments of the present method can learn a graph neural network that greatly improves the accuracy of the variant prediction.
There are several ways to design and further develop the teaching of the present invention in an advantageous way. To this end, reference is made to the following explanation of examples of embodiments of the invention, illustrated by the drawing. In the drawing
Fig. 1 shows in a diagram the overall architecture of an embodiment of the present invention,
Fig. 2 shows in a diagram a further embodiment of the present invention,
Fig. 3 shows in a block diagram a further embodiment of the present invention, and
Fig. 4 shows in a block diagram a further embodiment of the present invention.
Fig. 1 shows in a diagram the overall architecture of an embodiment of the present invention, concretely a VEGN. The goal is to classify gene variants - in short form: variants - which are denoted by triangles. Variants are associated with a gene, denoted by circles, and a gene-gene network is either given or learnt. Based on this, a GNN can be learnt. New variants are added to the graph via the gene they attach to. Given a new variant’s feature vector, the GNN classifies the new variants and can give an explanation of which other variants and genes were relevant for the classification.
Fig. 2 shows in a diagram a further embodiment of the present invention. Here is shown a concrete instantiation with a different GNN for each edge type: The goal is to classify variants which are denoted by triangles, e.g. as benign 0 or pathogenic 1. Variants are associated with a gene, denoted by circles, and a gene-gene network is either given or learnt. Based on this, a GNN can be learnt. This can either be one joint GNN or different GNNs can be learnt for different edges. For example, for the three different edge types - “gene has variant”, “gene interacts with gene” and “variant in gene” - separate GNN layers are instantiated and learnt. Arrows within a layer indicate the direction of information flow, where the hidden representation of the arrow's source is used to update the hidden representation of the arrow's target. Within a layer the arrows represent the weights of the GNN that is learnt and these weights are shared within this layer, i.e. for “variant in gene”, each variant has its own feature vector and to this the same GNN layer's weights are applied to update the target hidden representation. The hidden representations of each layer are aggregated, e.g. by sum. Finally, there is one further GNN layer where information flows from the gene to the variant. Based on this update, a classification layer, e.g. via a sigmoid function, determines the likelihood of a variant being benign or pathogenic. During training, the true label of a variant v is observed and weights can be updated via a loss function and backpropagation. During test time, new variants can be added to the graph via the gene they attach to. Based on the features associated with the variant, the learnt weights can be applied in a forward pass to derive a prediction.
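The forward pass described for Fig. 2 can be sketched as follows. This is an illustrative simplification, not the architecture of the figure: all hidden representations have dimension two, messages are averaged per target, a residual self-connection and a tanh non-linearity are assumed, and the weight matrices are fixed hypothetical numbers standing in for learnt weights.

```python
import math


def matvec(W, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def message_pass(h, edges_by_type, weights):
    """One round of message passing with one weight matrix per edge type.

    For every edge type, each target averages W_type @ h[source] over its
    incoming edges; the per-type results are summed onto the target.
    """
    out = {n: list(vec) for n, vec in h.items()}  # residual self-connection (assumption)
    for etype, edges in edges_by_type.items():
        incoming = {}
        for src, tgt in edges:
            # The same weights are shared by all edges of this type.
            incoming.setdefault(tgt, []).append(matvec(weights[etype], h[src]))
        for tgt, msgs in incoming.items():
            mean = [sum(col) / len(msgs) for col in zip(*msgs)]
            out[tgt] = [a + m for a, m in zip(out[tgt], mean)]
    return {n: [math.tanh(x) for x in vec] for n, vec in out.items()}


def classify(h_variant, w_out, b_out):
    """Classification layer: sigmoid of a linear read-out."""
    z = sum(w * x for w, x in zip(w_out, h_variant)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical initial representations: gene vectors and variant feature vectors.
h = {"G1": [0.1, -0.2], "G2": [0.0, 0.3],
     "v1": [0.9, 0.4], "v2": [0.2, 0.1], "v_new": [0.8, 0.5]}
gene_has_variant = [("G1", "v1"), ("G1", "v_new"), ("G2", "v2")]
stage1_edges = {
    "variant in gene": [("v1", "G1"), ("v_new", "G1"), ("v2", "G2")],
    "gene interacts with gene": [("G1", "G2"), ("G2", "G1")],
    "gene has variant": gene_has_variant,
}
W1 = {
    "variant in gene": [[0.5, 0.0], [0.0, 0.5]],
    "gene interacts with gene": [[0.2, 0.1], [-0.1, 0.2]],
    "gene has variant": [[0.3, 0.0], [0.0, 0.3]],
}
W2 = {"gene has variant": [[0.4, -0.2], [0.1, 0.6]]}

h1 = message_pass(h, stage1_edges, W1)                        # three edge-type layers, summed
h2 = message_pass(h1, {"gene has variant": gene_has_variant}, W2)  # final gene-to-variant layer
p_new = classify(h2["v_new"], w_out=[1.5, -0.5], b_out=-0.2)
```

During training, the entries of `W1`, `W2`, `w_out` and `b_out` would be the quantities updated by backpropagation; at test time they are applied unchanged to the newly attached variant.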
Further embodiments:
1. Genetic diagnostics for patients
Each individual has millions of genetic variants. Even though such variants can be identified with high-throughput sequencing and bioinformatics variant calling methods, it is challenging to prioritize potential disease-causing variants. VEGN or embodiments of the present invention can be used to prioritize a short list of variants for a clinician to inspect manually.
Fig. 3 shows in a block diagram such a further embodiment of the present invention. In genetic diagnosis, patients first have their genome sequenced with whole genome sequencing or whole exome sequencing. A list of variants is generated through variant calling on the sequencing data. VEGN or embodiments of the present invention can be applied to each of the variants to predict a disease-relevance score P(pathogenic). The variants can then be sorted based on the score in descending order. The top k variants, wherein k is an integer, are selected for further manual investigation by domain experts. The value of k depends on the available resources.
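The prioritization step of this workflow amounts to sorting by score and truncating the list. A minimal sketch, with hypothetical variant identifiers and scores standing in for the output of the trained model:

```python
def top_k_variants(scores, k):
    """Return the k variants with the highest P(pathogenic), in descending order."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical disease-relevance scores for a patient's called variants.
scores = {"var_a": 0.91, "var_b": 0.12, "var_c": 0.97, "var_d": 0.45}
shortlist = top_k_variants(scores, k=2)
```

Here `shortlist` contains `var_c` followed by `var_a`, which would be the two variants handed to the domain experts.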
2. Neoantigen selection
Neoantigens are antigens found specifically in tumor samples. They are products of tumor-specific variants. Due to the tumor-specificity of neoantigens, they are frequently used as targets for immunotherapy. Existing neoantigen selection pipelines typically do not consider the effects of variants. VEGN or embodiments of the present invention can help to prioritize and select the most biologically relevant variants.
Fig. 4 shows in a block diagram such a further embodiment of the present invention. In neoantigen discovery, tumor samples are whole genome sequenced or whole exome sequenced. A list of missense variants is generated through variant calling on the sequencing data. VEGN or embodiments of the present invention can be applied to each of the variants to predict a disease-relevance score P(pathogenic). The variants can then be sorted based on the score in descending order. The predicted disease-causing probabilities are combined with other evidence in an existing neoantigen discovery pipeline to select for neoantigens.
Many modifications and other embodiments of the invention set forth herein will come to mind to the one skilled in the art to which the invention pertains having the benefit of the teachings presented in the foregoing description and the associated drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Claims
1. A method for predicting an effect of a gene variant on an organism by means of a data processing system, comprising the following steps:
- providing or collecting benign and pathogenic gene variants;
- creating a variant-gene graph by connecting each gene variant to one or more genes to which said gene variant belongs and by connecting each gene to one or more other genes according to a pre-definable rule;
- training a graph neural network, GNN, on the variant-gene graph; and
- feeding a new gene variant to the graph neural network for predicting by the graph neural network whether the new gene variant is benign or pathogenic.
2. A method according to claim 1, wherein the provided or collected benign and pathogenic gene variants are provided or collected from one or more databases.
3. A method according to claim 1 or 2, wherein the benign and pathogenic gene variants are assigned to the closest gene or genes in a related genome.
4. A method according to one of claims 1 to 3, wherein the pre-definable rule comprises connecting each gene to every other gene or connecting each gene to one or more other genes which is or are connected to said gene based on one or more predefined biological interactions, wherein preferably the one or more predefined biological interactions are retrieved from a biological database or from a gene-gene interaction graph of a biological database.
5. A method according to one of claims 1 to 4, wherein at least one feature is collected for at least one or each gene variant, wherein preferably the at least one feature is the output of another variant prediction model that does not use a graph.
6. A method according to one of claims 1 to 5, wherein at least one or each gene variant is represented by a feature vector.
7. A method according to one of claims 1 to 6, wherein at least one feature is collected for at least one or each gene.
8. A method according to one of claims 1 to 7, wherein at least one or each gene is represented by a N dimensional vector.
9. A method according to claim 8, wherein the N dimensional vector is a randomly initialized vector, which is optimized in the training step.
10. A method according to claim 8 or 9, wherein the N dimensional vector comprises at least one collected feature.
11. A method according to one of claims 8 to 10, wherein the N dimensional vector is a concatenation of a randomly initialized vector, which is trainable, with one or more collected gene features.
12. A method according to one of claims 1 to 11, wherein for a training set in the training step, each gene variant in the training set has a definable label, e.g. 0 for benign and 1 for pathogenic.
13. A method according to one of claims 1 to 12, wherein in the training step one or more parameters of the graph neural network are updated using gradient descent.
14. A method according to one of claims 1 to 13, wherein an explanation for the prediction of a gene variant or variants is provided by returning which other gene variant or gene variants and/or which gene or genes the graph neural network has utilized to arrive at the prediction, wherein preferably the impact is provided that the gene variant or gene variants and/or gene or genes had on the prediction.
15. A data processing system for carrying out the method for predicting an effect of a gene variant on an organism according to any one of claims 1 to 14, comprising:
- providing or collecting means for providing or collecting benign and pathogenic gene variants;
- creating means for creating a variant-gene graph by connecting each gene variant to one or more genes to which said gene variant belongs and by connecting each gene to one or more other genes according to a pre-definable rule;
- training means for training a graph neural network, GNN, on the variant-gene graph; and
- feeding means for feeding a new gene variant to the graph neural network for predicting by the graph neural network whether the new gene variant is benign or pathogenic.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2021/059567 WO2022218509A1 (en) | 2021-04-13 | 2021-04-13 | A method for predicting an effect of a gene variant on an organism by means of a data processing system and a corresponding data processing system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022218509A1 (en) | 2022-10-20 |
Family
ID=75674774
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2021/059567 WO2022218509A1 (en) | 2021-04-13 | 2021-04-13 | A method for predicting an effect of a gene variant on an organism by means of a data processing system and a corresponding data processing system |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022218509A1 (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150066378A1 (en) | 2013-08-27 | 2015-03-05 | Tute Genomics | Identifying Possible Disease-Causing Genetic Variants by Machine Learning Classification |
WO2016172464A1 (en) | 2015-04-22 | 2016-10-27 | Genepeeks, Inc. | Device, system and method for assessing risk of variant-specific gene dysfunction |
US20160357903A1 (en) | 2013-09-20 | 2016-12-08 | University Of Washington Through Its Center For Commercialization | A framework for determining the relative effect of genetic variants |
US20160371431A1 (en) | 2015-06-22 | 2016-12-22 | Counsyl, Inc. | Methods of predicting pathogenicity of genetic sequence variants |
US20190114547A1 (en) | 2017-10-16 | 2019-04-18 | Illumina, Inc. | Deep Learning-Based Splice Site Classification |
US20190139622A1 (en) | 2017-08-03 | 2019-05-09 | Zymergen, Inc. | Graph neural networks for representing microorganisms |
EP3514798A1 (en) | 2011-10-31 | 2019-07-24 | The Scripps Research Institute | Systems and methods for genomic annotation and distributed variant interpretation |
-
2021
- 2021-04-13 WO PCT/EP2021/059567 patent/WO2022218509A1/en active Application Filing
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3514798A1 (en) | 2011-10-31 | 2019-07-24 | The Scripps Research Institute | Systems and methods for genomic annotation and distributed variant interpretation |
US20150066378A1 (en) | 2013-08-27 | 2015-03-05 | Tute Genomics | Identifying Possible Disease-Causing Genetic Variants by Machine Learning Classification |
US20160357903A1 (en) | 2013-09-20 | 2016-12-08 | University Of Washington Through Its Center For Commercialization | A framework for determining the relative effect of genetic variants |
WO2016172464A1 (en) | 2015-04-22 | 2016-10-27 | Genepeeks, Inc. | Device, system and method for assessing risk of variant-specific gene dysfunction |
US20160371431A1 (en) | 2015-06-22 | 2016-12-22 | Counsyl, Inc. | Methods of predicting pathogenicity of genetic sequence variants |
WO2016209999A1 (en) * | 2015-06-22 | 2016-12-29 | Counsyl, Inc. | Methods of predicting pathogenicity of genetic sequence variants |
US20190139622A1 (en) | 2017-08-03 | 2019-05-09 | Zymergen, Inc. | Graph neural networks for representing microorganisms |
US20190114547A1 (en) | 2017-10-16 | 2019-04-18 | Illumina, Inc. | Deep Learning-Based Splice Site Classification |
Non-Patent Citations (11)
Title |
---|
CHEREDA HRYHORII ET AL: "Explaining decisions of graph convolutional neural networks: patient-specific molecular subnetworks responsible for metastasis prediction in breast cancer", GENOME MEDICINE, vol. 13, no. 1, 11 March 2021 (2021-03-11), pages 42, XP055872471, Retrieved from the Internet <URL:https://genomemedicine.biomedcentral.com/track/pdf/10.1186/s13073-021-00845-7.pdf> [retrieved on 20211214], DOI: 10.1186/s13073-021-00845-7 * |
ERASLAN GÖKCEN ET AL: "Deep learning: new computational modelling techniques for genomics", NATURE REVIEWS GENETICS, NATURE PUBLISHING GROUP, GB, vol. 20, no. 7, 10 April 2019 (2019-04-10), pages 389 - 403, XP036813365, ISSN: 1471-0056, [retrieved on 20190410], DOI: 10.1038/S41576-019-0122-6 * |
KRZYSZTOF CHOROMANSKI ET AL: "Rethinking Attention with Performers", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 9 March 2021 (2021-03-09), XP081897794 * |
PETAR VELIČKOVIĆ ET AL: "GRAPH ATTENTION NETWORKS", 4 February 2018 (2018-02-04), XP055703475, Retrieved from the Internet <URL:https://arxiv.org/pdf/1710.10903.pdf> [retrieved on 20200610] * |
SCHULTE-SASSE ROMAN ET AL: "Graph Convolutional Networks Improve the Prediction of Cancer Driver Genes", 9 September 2019, ADVANCES IN DATABASES AND INFORMATION SYSTEMS; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NOTES COMPUTER], SPRINGER INTERNATIONAL PUBLISHING, CHAM, PAGE(S) 658 - 668, ISBN: 978-3-319-10403-4, XP047520829 * |
SHARAD VIKRAM ET AL: "SSCM: A method to analyze and predict the pathogenicity of sequence variants", BIORXIV, 26 June 2015 (2015-06-26), XP055546969, Retrieved from the Internet <URL:https://www.biorxiv.org/content/biorxiv/early/2015/06/26/021527.full.pdf> [retrieved on 20211214], DOI: 10.1101/021527 * |
SUNDARAM LAKSSHMAN ET AL: "Predicting the clinical impact of human mutation with deep neural networks", NATURE GENETICS, NATURE PUBLISHING GROUP US, NEW YORK, vol. 50, no. 8, 23 July 2018 (2018-07-23), pages 1161 - 1170, XP036902750, ISSN: 1061-4036, [retrieved on 20180723], DOI: 10.1038/S41588-018-0167-Z * |
TIANWEI YUE ET AL: "Deep Learning for Genomics: A Concise Overview", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 2 February 2018 (2018-02-02), XP080857057 * |
VASWANI ET AL.: "Attention is All you Need", NEURAL INFORMATION PROCESSING SYSTEMS, NEURIPS, 2017 |
VELICKOVIC ET AL.: "Graph Attention Networks", INTERNATIONAL CONFERENCE ON LEARNING REPRESENTATIONS, ICLR, 2018 |
YING ET AL.: "GNNExplainer: Generating Explanations for Graph Neural Networks", NEURAL INFORMATION PROCESSING SYSTEMS, NEURIPS, 2019 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21721407 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21721407 Country of ref document: EP Kind code of ref document: A1 |