US11636309B2 - Systems and methods for modeling probability distributions - Google Patents
- Publication number
- US11636309B2 (application US16/249,854)
- Authority
- US
- United States
- Prior art keywords
- layer
- rbm
- visible
- data
- hidden
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
- Hierarchy: G: PHYSICS › G06: COMPUTING; CALCULATING OR COUNTING › G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/0445
- G06N3/088: Non-supervised learning, e.g. competitive learning (under G06N3/00: Computing arrangements based on biological models › G06N3/02: Neural networks › G06N3/08: Learning methods)
- G06N3/044: Recurrent networks, e.g. Hopfield networks (under G06N3/04: Architecture, e.g. interconnection topology)
- G06N3/045: Combinations of networks (under G06N3/04)
- G06N3/0454
- G06N3/047: Probabilistic or stochastic networks (under G06N3/04)
- G06N3/0472
- G06N3/048: Activation functions (under G06N3/04)
- G06N3/084: Backpropagation, e.g. using gradient descent (under G06N3/08)
- G06N7/01: Probabilistic graphical models, e.g. probabilistic networks (under G06N7/00: Computing arrangements based on specific mathematical models)
Definitions
- the present invention generally relates to modeling probability distributions and more specifically relates to training and implementing a Boltzmann machine to accurately model complex probability distributions.
- One embodiment includes a method for training a restricted Boltzmann machine (RBM), wherein the method includes generating, from a first set of visible values, a set of hidden values in a hidden layer of a RBM and generating a second set of visible values in a visible layer of the RBM based on the generated set of hidden values.
- RBM restricted Boltzmann machine
- the method also includes computing a set of likelihood gradients based on at least one of the first set of visible values and the generated set of visible values, computing a set of adversarial gradients using an adversarial model based on at least one of the set of hidden values and the set of visible values and computing a set of compound gradients based on the set of likelihood gradients and the set of adversarial gradients.
- the method includes updating the RBM based on the set of compound gradients.
- the visible layer of the RBM includes a composite layer composed of a plurality of sub-layers for different data types.
- the plurality of sub-layers includes at least one of a Bernoulli layer, an Ising layer, a one-hot layer, a von Mises-Fisher layer, a Gaussian layer, a ReLU layer, a clipped ReLU layer, a Student-t layer, an ordinal layer, an exponential layer, and a composite layer.
- the RBM is a deep Boltzmann machine (DBM), wherein the hidden layer is one of a plurality of hidden layers.
- DBM deep Boltzmann machine
- the RBM is a first RBM and the hidden layer is a first hidden layer of the plurality of hidden layers.
- the method further includes sampling the hidden layer from the first RBM, stacking the visible layer and the hidden layer from the first RBM into a vector, training a second RBM, and generating the DBM by copying weights from the first and second RBMs to the DBM.
- the vector is a visible layer of the second RBM.
- the method further includes steps for receiving a phenotype vector for a patient, using the RBM to generate a time progression of a disease, and treating the patient based on the generated time progression.
- the visible layer and the hidden layer are for a first time instance, wherein the hidden layer is further connected to a second hidden layer that incorporates data from a different second time instance.
- the visible layer is a composite layer that includes data for a plurality of different time instances.
- computing the set of likelihood gradients includes performing Gibbs sampling.
- the set of compound gradients are weighted averages of the set of likelihood gradients and the set of adversarial gradients.
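A minimal sketch of such a weighted average, assuming the likelihood and adversarial gradients are stored as NumPy arrays keyed by parameter name, and using a single hypothetical mixing weight `adversary_weight` shared by all parameters:

```python
import numpy as np

def compound_gradients(likelihood_grads, adversarial_grads, adversary_weight=0.5):
    """Per-parameter weighted average of two gradient dictionaries.

    adversary_weight is a hypothetical hyperparameter in [0, 1]:
    0 recovers pure likelihood training, 1 recovers pure adversarial training.
    """
    if not 0.0 <= adversary_weight <= 1.0:
        raise ValueError("adversary_weight must lie in [0, 1]")
    return {
        name: (1.0 - adversary_weight) * likelihood_grads[name]
        + adversary_weight * adversarial_grads[name]
        for name in likelihood_grads
    }
```

Keeping the weight in [0, 1] makes the result a convex combination, so the compound gradient always lies between the two individual gradients.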
- the method further includes steps for training the adversarial model by drawing data samples based on authentic data, drawing fantasy samples from the RBM, and training the adversarial model based on its ability to distinguish between the data samples and the fantasy samples.
- training the adversarial model includes measuring a probability that a particular sample is drawn from either the authentic data or the RBM.
- the adversarial model is one of a fully-connected classifier, a logistic regression model, a nearest neighbor classifier, and a random forest.
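As an illustration of the logistic regression option, the sketch below trains a critic to separate authentic samples from fantasy samples and reports, for any sample, the probability that it was drawn from the data. It assumes NumPy; the function names, learning rate, and step count are hypothetical, and any of the other listed classifiers could play the same role.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_critic(data_samples, fantasy_samples, lr=0.1, steps=500):
    """Fit a logistic-regression critic by gradient descent on the mean
    binary cross-entropy. Label 1 = authentic data, label 0 = RBM fantasy."""
    x = np.vstack([data_samples, fantasy_samples])
    y = np.concatenate([np.ones(len(data_samples)), np.zeros(len(fantasy_samples))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(x @ w + b)               # current P(sample is authentic)
        w -= lr * x.T @ (p - y) / len(y)     # cross-entropy gradient w.r.t. w
        b -= lr * np.mean(p - y)             # cross-entropy gradient w.r.t. b
    return w, b

def prob_authentic(w, b, samples):
    """Probability the critic assigns to each sample being drawn from the data."""
    return sigmoid(samples @ w + b)
```

Fantasy samples that the critic scores close to 1 are ones it cannot tell apart from real data.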
- the method further includes steps for using the RBM to generate a set of samples of a target population.
- computing a set of likelihood gradients includes computing a convex combination of a Monte Carlo estimate and a mean field estimate.
- computing a set of likelihood gradients includes initializing a plurality of samples and initializing an inverse temperature for each sample of the plurality of samples. For each sample of the plurality of samples, computing a set of likelihood gradients further includes updating the inverse temperature by sampling from an autocorrelated Gamma distribution, and updating the sample using Gibbs sampling.
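The loop below is one way to sketch this procedure for a binary RBM, assuming NumPy. The text above does not specify how the autocorrelated Gamma distribution is constructed, so the update used here (a convex blend of the previous inverse temperature and a fresh Gamma draw, with hypothetical parameters `mean_beta`, `shape`, and `rho`) is an illustrative stand-in rather than the claimed method. Sampling at inverse temperature β targets a distribution proportional to exp(-βE), which scales the logits of both conditionals by β.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def temperature_driven_gibbs(W, a, b, v0, n_steps, rng, mean_beta=1.0, shape=10.0, rho=0.9):
    """Gibbs sampling for a binary RBM where every chain carries its own inverse
    temperature beta. The autocorrelated beta update is a hypothetical stand-in."""
    v = np.asarray(v0, dtype=float).copy()
    n_chains = v.shape[0]
    beta = np.full(n_chains, mean_beta)                      # one inverse temperature per sample
    for _ in range(n_steps):
        fresh = rng.gamma(shape, mean_beta / shape, size=n_chains)  # Gamma draw with mean mean_beta
        beta = rho * beta + (1.0 - rho) * fresh               # autocorrelated and always positive
        scale = beta[:, None]                                 # exp(-beta E) scales every logit by beta
        h = (rng.random((n_chains, W.shape[1])) < sigmoid(scale * (b + v @ W))).astype(float)
        v = (rng.random((n_chains, W.shape[0])) < sigmoid(scale * (a + h @ W.T))).astype(float)
    return v, beta
```

Chains whose β drifts below 1 see a flattened energy landscape and can cross between modes more easily, while β near 1 recovers ordinary Gibbs sampling.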
- FIG. 1 illustrates a system that provides for the gathering and distribution of data for modeling probability distributions in accordance with some embodiments of the invention.
- FIG. 2 illustrates a data processing element for training and utilizing a stochastic model.
- FIG. 3 illustrates a data processing application for training and utilizing a stochastic model.
- FIG. 4 conceptually illustrates a process for preparing data for analysis.
- FIG. 5 illustrates data structures for implementing a generalized Boltzmann Machine in accordance with certain embodiments of the invention.
- FIG. 6 illustrates a bimodal distribution and a smoothed, spread distribution that is learned by an RBM in accordance with several embodiments of the invention.
- FIG. 7 illustrates an architecture for a generalized Restricted Boltzmann Machine in accordance with some embodiments of the invention.
- FIG. 8 illustrates a schema for implementing a generalized Boltzmann Machine in accordance with certain embodiments of the invention.
- FIG. 9 illustrates an architecture for a generalized Deep Boltzmann Machine in accordance with certain embodiments of the invention.
- FIG. 10 conceptually illustrates a process for reverse layerwise training in accordance with an embodiment of the invention.
- FIG. 11 illustrates an architecture for a generalized Deep Temporal Boltzmann Machine in accordance with many embodiments of the invention.
- FIG. 12 conceptually illustrates a process for training a Boltzmann Encoded Adversarial Machine in accordance with some embodiments of the invention.
- FIG. 13 illustrates resulting samples drawn from RBMs trained to maximize log likelihood and from RBMs trained as BEAMs.
- FIG. 14 illustrates results of training a BEAM on a 2D mixture of Gaussians in accordance with a number of embodiments of the invention.
- FIG. 15 illustrates an architecture for implementing a Boltzmann Encoded Adversarial Machine in accordance with a number of embodiments of the invention.
- FIG. 16 illustrates a comparison between samples drawn from a Boltzmann machine with regular Gibbs sampling to those drawn using Temperature Driven Sampling.
- FIG. 17 illustrates a comparison between fantasy particles generated by GRBMs trained on the MNIST dataset using regular Gibbs sampling to those using TDS.
- Machine learning is one potential approach to modeling complex probability distributions.
- many examples are described with reference to medical applications, but one skilled in the art will recognize that techniques described herein can be readily applied in a variety of different fields including (but not limited to) health informatics, image/audio processing, marketing, sociology, and lab research.
- One of the most pressing problems is that one often has little, or no, labeled data that directly addresses a particular question of interest.
- in a supervised learning setting, one would give the therapeutic to many patients and observe how each patient responds, then use these data to build a model that predicts how a new patient will respond to the therapeutic.
- a nearest neighbor classifier would look through the pool of previously treated patients to find a patient that is most similar to the new patient, then it would predict the new patient's response based on the previously treated patient's response.
- supervised learning requires significant amounts of labeled data and, particularly where sample sizes are small or labeled data is not readily available, unsupervised learning is critical to the successful application of machine learning.
- medical data can include a variety of different types of information from a variety of different sources, including (but not limited to) demographic information (e.g., a patient's age, ethnicity, etc.), diagnoses (e.g., binary codes that describe whether or not a patient has a particular disease), laboratory values (e.g., results from laboratory tests, such as blood tests), doctor's notes (e.g., handwritten notes taken by a physician or entered into a medical records system), images (e.g., x-rays, CT scans, MRIs, etc.), and 'omics data (e.g., data from DNA sequencing studies that describe a patient's genetic background, the expression of his/her genes, etc.).
- Some of these data are binary, some are continuous, and some are categorical. Integrating all of these different types and sources of data is critical, but treating a variety of data types with traditional approaches to machine learning is quite challenging. Typically, the data have to be heavily pre-processed so that all of the features used for machine learning are of the same type. Data pre-processing steps can take up a large portion of an analyst's time in training and implementing a machine learning model.
- any algorithm needs to be able to learn from data where there are missing observations in the training set.
- the algorithm needs to be able to make predictions even when it is only presented with a subset of input observations. That is, one needs to be able to express any conditional relationship from the joint probability distribution.
- GANs Generative Adversarial Networks
- GANs, in their traditional formulation, use a generator that transforms random Gaussian noise into a visible vector through a feed-forward neural network. Models with this formulation can be trained using standard back-propagation.
- GAN training tends to be unstable—requiring a careful balance between training of the generator and the discriminator (or critic).
- it is not possible to generate samples from arbitrary conditional distributions with GANs, and it can be very difficult to apply GANs to problems involving heterogeneous datasets with different data types and missing observations.
- Many embodiments of the invention provide novel and innovative systems and methods for the use of heterogeneous, irregular, and unlabeled data to train and implement stochastic, unsupervised machine learning models of complex probability distributions.
- Network 100 includes a communications network 160 .
- the communications network 160 is a network such as the Internet that allows devices connected to the network 160 to communicate with other connected devices.
- Server systems 110 , 140 , and 170 are connected to the network 160 .
- Each of the server systems 110 , 140 , and 170 is a group of one or more servers communicatively connected to one another via internal networks that execute processes that provide cloud services to users over the network 160 .
- cloud services are one or more applications that are executed by one or more server systems to provide data and/or executable applications to devices over a network.
- the server systems 110 , 140 , and 170 are shown each having three servers in the internal network. However, the server systems 110 , 140 and 170 may include any number of servers and any additional number of server systems may be connected to the network 160 to provide cloud services.
- a network that uses systems and methods that model complex probability distributions in accordance with an embodiment of the invention may be provided by a process (or a set of processes) being executed on a single server system and/or a group of server systems communicating over network 160 .
- the personal devices 180 are shown as desktop computers that are connected via a conventional “wired” connection to the network 160 .
- the personal device 180 may be a desktop computer, a laptop computer, a smart television, an entertainment gaming console, or any other device that connects to the network 160 via a “wired” connection.
- the mobile device 120 connects to network 160 using a wireless connection.
- a wireless connection is a connection that uses Radio Frequency (RF) signals, Infrared signals, or any other form of wireless signaling to connect to the network 160 .
- RF Radio Frequency
- the mobile device 120 is a mobile telephone.
- mobile device 120 may be a mobile phone.
- PDA Personal Digital Assistant
- data processing element 200 is one or more of a server system and/or personal devices within a networked system similar to the system described with reference to FIG. 1 .
- Data processing element 200 includes a processor (or set of processors) 210 , network interface 225 , and memory 230 .
- the network interface 225 is capable of sending and receiving data across a network over a network connection.
- the network interface 225 is in communication with the memory 230 .
- memory 230 is any form of storage configured to store a variety of data, including, but not limited to, a data processing application 232 , data files 234 , and model parameters 236 .
- Data processing application 232 in accordance with some embodiments of the invention directs the processor 210 to perform a variety of processes, such as (but not limited to) using data from data files 234 to update model parameters 236 in order to model complex probability distributions.
- data processing element 300 includes a data gathering engine 310 , database 320 , a model trainer 330 , a generative model 340 , a discriminator model 350 , and a simulator engine 360 .
- Model trainer 330 includes a schema processor 332 and a sampling engine 334 .
- Data processing applications in accordance with many embodiments of the invention process data to train stochastic models that can be used to model complex probability distributions.
- Data gathering engines in accordance with many embodiments of the invention gather data from various sources in various formats.
- the gathered data in accordance with many embodiments of the invention include data that may be heterogeneous (e.g., data with various types, ranges, and constraints) and/or incomplete.
- data gathering engines are further used to pre-process the data to facilitate the training of the model.
- pre-processing in accordance with some embodiments of the invention is automatically performed based on a datatype and/or a schema associated with each data input.
- bodies of unstructured text are processed in a variety of ways, such as (but not limited to) vectorization (e.g., using word2vec), summarization, sentiment analysis, and/or keyword analysis.
- Other pre-processing steps can include (but are not limited to) normalization, smoothing, filtering, and aggregation.
- the pre-processing is performed using various machine learning techniques, including (but not limited to) Restricted Boltzmann machines, support vector machines, recurrent neural networks, and convolutional neural networks.
- Databases in accordance with various embodiments of the invention store data for use by data processing applications, including (but not limited to) input data, pre-processed data, model parameters, schemas, output data, and simulated data.
- databases are located on separate machines (e.g., in cloud storage, server farms, networked databases, etc.) from a data processing application.
- Model trainers in accordance with a number of embodiments of the invention are used to train generative and/or discriminator models.
- model trainers utilize schema processors to build the generator and/or discriminator models based on schemas that are defined for the various data available to the system.
- Schema processors in accordance with some embodiments of the invention build composite layers for a generative model (e.g., restricted Boltzmann machine) that are made up of several different layers for handling different types of data in different ways.
- model trainers train the generative and discriminator models by optimizing a compound objective function based on a log-likelihood and adversarial objectives.
- Training generative models in accordance with certain embodiments of the invention utilizes sampling engines to draw samples from the models to measure the probability distributions of the data and/or the models.
- Various methods for sampling from such models to train and/or draw generated samples from a model are described in greater detail below.
- generative models are trained to model complex probability distributions, which can be used to generate predictions/simulations of various probability distributions.
- Discriminator models discriminate between data-based samples and model-generated samples based on the visible and/or hidden states.
- Simulator engines in accordance with several embodiments of the invention are used to generate simulations of complex probability distributions.
- simulator engines are used to simulate patient populations, disease progressions, and/or predicted responses to various treatments.
- Simulator engines in accordance with several embodiments of the invention use a sampling engine for drawing samples from the generative models that simulate the probability distribution of the data.
- the data in accordance with several embodiments of the invention is pre-processed in order to simplify the data. Unlike other pre-processing which is often highly manual and specific to the data, this can be performed automatically based on the type of data, without additional input from another person.
- Unstructured data in accordance with many embodiments of the invention can include various types of data that can be pre-processed in order to speed up processing and/or to reduce the memory requirements for storing the relevant data. Examples of such data can include (but are not limited to) bodies of text, signal processing data, audio data, and image data. Processing unstructured data in accordance with many embodiments of the invention can include (but is not limited to) feature identification, summarization, keyword detection, sentiment analysis, and signal analysis.
- the process 400 reorders ( 410 ) the data based on a schema.
- processes reorder the data based on the different data types defined in schemas by grouping similar data types to allow for efficient processing of the data types.
- the process 400 in accordance with some embodiments of the invention rescales ( 415 ) the data to prevent the overrepresentation of certain data elements based purely on the scale of the measurements.
- Process 400 then routes ( 420 ) the pre-processed data to the sublayers of a Boltzmann machine that are structured based on data types identified in the schema. Examples of Boltzmann machine structures and architectures are described in greater detail below.
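A minimal sketch of steps 410 to 420, assuming records arrive as Python dictionaries and that the schema is a plain mapping from column name to data type. The schema contents, the function name, and the choice of standardization as the rescaling step are all hypothetical.

```python
import numpy as np

# Hypothetical schema mapping each column to the data type of the sub-layer it feeds.
SCHEMA = {"age": "gaussian", "smoker": "bernoulli", "bmi": "gaussian", "diabetic": "bernoulli"}

def reorder_and_rescale(records, schema):
    """Group columns by data type (step 410), standardize continuous columns
    (step 415), and return one block per type for routing to the matching
    sub-layer (step 420). Missing values become NaN and are ignored in the statistics."""
    groups = {}
    for name, dtype in schema.items():
        groups.setdefault(dtype, []).append(name)
    routed = {}
    for dtype, names in groups.items():
        block = np.array([[row.get(n, np.nan) for n in names] for row in records], dtype=float)
        if dtype == "gaussian":
            mean = np.nanmean(block, axis=0)
            std = np.nanstd(block, axis=0)
            block = (block - mean) / np.where(std > 0, std, 1.0)  # avoid dividing by zero
        routed[dtype] = (names, block)
    return routed
```

Binary columns pass through unchanged, so only measurements whose magnitude depends on units are rescaled.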
- the data is pre-processed into temporally sequenced data structures for inputs to a deep temporal Boltzmann machine. Deep temporal Boltzmann machines are described in further detail below.
- Temporal data structures for inputs to a Boltzmann machine in accordance with a number of embodiments of the invention are illustrated in FIG. 5 .
- the example of FIG. 5 shows three data structures 510 , 520 , and 530 .
- Each of the data structures represents a set of the data values captured at a particular time (i.e., times $t_0$, $t_1$, and $t_n$).
- certain traits e.g., gender, ethnicity, birthdate, etc.
- other characteristics e.g., test results, medical scans, etc.
- the example further shows that certain data may be missing for some fields for certain times for certain individuals.
- each individual is assigned a separate identification number in order to maintain patient confidential information.
- FIG. 6 illustrates a bimodal distribution 610 and the smoothed, spread-out distribution 620 learned by an RBM. While RBMs can produce such reasonable approximations, they can struggle with finer, more complex distributions.
- BEAM Boltzmann Encoded Adversarial Machine
- in phase I, the therapeutic is given to healthy volunteers to assess its safety.
- in phase II, the therapeutic is given to approximately 100 patients to obtain initial estimates for safety and efficacy.
- in phase III, the therapeutic is given to a few hundred to a few thousand patients to rigorously investigate the efficacy of the drug.
- Before phase II, there is no in-human data on the effect of the investigational drug for the desired indication, making supervised learning impossible. After phase II, there is some in-human data on the effect of the investigational drug, but the sample size is quite limited, rendering supervised learning techniques ineffective. For comparison, a phase II clinical trial may have 100-200 patients, whereas a typical application of machine learning in computer vision may use millions of labeled images. As in many situations with limited data, the lack of large labeled datasets for many important problems means that health informatics must rely heavily on methods for unsupervised learning.
- RBMs Restricted Boltzmann Machine
- the visible layer v describes the observed data.
- the hidden layer h consists of a set of unobserved latent variables that capture the interactions between the visible units.
- E(v,h) is called the energy function
- the integral operator $\int dx$ is used to denote either standard integration or a sum over all of the elements in a discrete set.
- both the visible and hidden units are binary. Each can only take on the values 0 or 1.
- the energy function can be written as $E(v,h) = -\sum_i a_i v_i - \sum_\mu b_\mu h_\mu - \sum_{i\mu} v_i W_{i\mu} h_\mu$.
- a key feature of an RBM is that it is easy to compute the conditional probabilities
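For a binary RBM with energy $E(v,h) = -a^T v - b^T h - v^T W h$, the conditional probabilities factorize over units and reduce to logistic sigmoids of each unit's bias plus its weighted input, for example $p(h_\mu = 1 \mid v) = \mathrm{sigmoid}(b_\mu + \sum_i v_i W_{i\mu})$. The sketch below assumes NumPy, visible biases `a`, hidden biases `b`, and a weight matrix `W` of shape (visible, hidden); the function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def energy(v, h, W, a, b):
    """E(v, h) = -a.v - b.h - v^T W h for a binary RBM."""
    return -(v @ a) - (h @ b) - v @ W @ h

def p_h_given_v(v, W, b):
    """p(h_mu = 1 | v); hidden units are conditionally independent given v."""
    return sigmoid(b + v @ W)

def p_v_given_h(h, W, a):
    """p(v_i = 1 | h); visible units are conditionally independent given h."""
    return sigmoid(a + W @ h)
```

Because there are no hidden-hidden or visible-visible connections, an entire layer can be sampled in one vectorized step, which is what makes block Gibbs sampling cheap.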
- $\langle \log p(v) \rangle_{\text{data}} = \langle \log \int dh\, p(v,h) \rangle_{\text{data}}$.
- $\langle \cdot \rangle_{\text{data}}$ denotes an average over all of the observed samples.
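- the derivative of the log-likelihood with respect to some parameter $\theta$ of the model is: $\frac{\partial}{\partial\theta}\langle \log p(v) \rangle_{\text{data}} = -\left\langle \langle \partial_\theta E(v,h) \rangle_{p(h \mid v)} \right\rangle_{\text{data}} + \langle \partial_\theta E(v,h) \rangle_{p(v,h)}$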
- K-step PCD is similar: First, samples from the model are initialized using a batch of data. The samples are updated for k steps, the gradients are computed, and the parameters are updated. In contrast to CD, the samples from the model are never re-initialized.
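A minimal k-step PCD update for a binary RBM, assuming NumPy and plain stochastic gradient ascent on the log-likelihood. The persistent chains are seeded from a batch of data once and then carried from one update to the next; the function name and learning rate are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pcd_update(W, a, b, v_data, chains, k, lr, rng):
    """One k-step persistent contrastive divergence update for a binary RBM.
    W, a, b are float arrays updated in place; `chains` are the persistent
    fantasy particles, which are advanced but never reset to data."""
    ph_data = sigmoid(b + v_data @ W)            # positive phase: E[h | v] on the data
    v = chains
    for _ in range(k):                           # negative phase: k Gibbs steps
        h = (rng.random((v.shape[0], W.shape[1])) < sigmoid(b + v @ W)).astype(float)
        v = (rng.random((v.shape[0], W.shape[0])) < sigmoid(a + h @ W.T)).astype(float)
    ph_model = sigmoid(b + v @ W)
    # log-likelihood gradient = <-dE/dtheta>_data - <-dE/dtheta>_model
    W += lr * (v_data.T @ ph_data / len(v_data) - v.T @ ph_model / len(v))
    a += lr * (v_data.mean(axis=0) - v.mean(axis=0))
    b += lr * (ph_data.mean(axis=0) - ph_model.mean(axis=0))
    return v                                     # pass these chains to the next call
```

Usage: seed `chains = data[:n].copy()` once, then call `chains = pcd_update(W, a, b, batch, chains, k, lr, rng)` for every minibatch. Resetting `chains` to the current batch on each call would turn this into ordinary CD.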
- Many architectures of Boltzmann machines in accordance with several embodiments of the invention utilize sampling to compute derivatives for training the Boltzmann machines. Various methods for sampling in accordance with several embodiments of the invention are described in greater detail below.
- A generalized RBM in accordance with a number of embodiments of the invention is illustrated in FIG. 7 .
- the example of FIG. 7 shows a generalized RBM 700 with a visible layer 710 and a hidden layer 720 .
- the visible layer 710 is a composite layer composed of several nodes of various types (i.e., continuous, categorical, and binary).
- the nodes of visible layer 710 are connected to nodes of hidden layer 720 .
- Hidden layers of generalized RBMs in accordance with several embodiments of the invention operate as a low dimensional representation of individuals (e.g., patients in a clinical trial) based on the compiled inputs to a composite visible layer.
- Generalized RBMs in accordance with a number of embodiments of the invention are trained with an energy function.
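- $E(v,h) = -a(v) - b(h) - \left(\frac{v}{\sigma^{2}}\right)^{T} W \left(\frac{h}{\tau^{2}}\right) \qquad (9)$
- where $a(\cdot)$ and $b(\cdot)$ are arbitrary functions, $\sigma > 0$ and $\tau > 0$ are scale parameters of the visible and hidden layers, respectively, and the divisions are elementwise.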
- Different functions are used to represent different types of data. Examples of layer types used for modeling various types of data are described below.
- Bernoulli Layer: A Bernoulli layer is used to represent binary data $v_i \in \{0, 1\}$.
- An Ising layer is a symmetrized Bernoulli layer for visible units $v_i \in \{-1, +1\}$.
- One-hot layers are commonly used to represent categorical variables.
- Gaussian Layer: A Gaussian layer represents data where $v_i \in \mathbb{R}$.
- the bias function is $a(v) = -\sum_i \frac{(v_i - \bar{v}_i)^2}{2\sigma_i^2}$.
- Both the location, $\bar{v}_i$, and scale, $\sigma_i$, parameters of the layer are generally trainable. In practice, it helps to parameterize the model in terms of $\log \sigma_i$ to ensure that the scale parameter stays positive.
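A short sketch of the Gaussian bias function parameterized by $\log \sigma_i$, assuming NumPy and hypothetical argument names. Because $\sigma_i = \exp(\log \sigma_i)$, an unconstrained gradient step on `log_sigma` can never produce a non-positive scale.

```python
import numpy as np

def gaussian_bias(v, loc, log_sigma):
    """a(v) = -sum_i (v_i - loc_i)^2 / (2 sigma_i^2) with sigma_i = exp(log_sigma_i)."""
    sigma = np.exp(log_sigma)            # positive for any real log_sigma
    return -np.sum((v - loc) ** 2 / (2.0 * sigma ** 2), axis=-1)

def gaussian_bias_grad_log_sigma(v, loc, log_sigma):
    """d a(v) / d log_sigma_i = (v_i - loc_i)^2 / sigma_i^2."""
    return (v - loc) ** 2 * np.exp(-2.0 * log_sigma)
```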
- ReLU Layer: A Rectified Linear Unit (ReLU) layer represents data where $v_i \in \mathbb{R}$ with $v_i \geq v_i^{\text{low}}$.
- a ReLU layer is essentially a one-sided truncated Gaussian layer. The bias function is $a(v) = -\sum_i \frac{(v_i - \bar{v}_i)^2}{2\sigma_i^2}$ over the domain $v_i \geq v_i^{\text{low}}$.
- the location, $\bar{v}_i$, and scale, $\sigma_i$, parameters of the layer are generally trainable, whereas $v_i^{\text{low}}$ is typically specified before training. In practice, it helps to parameterize the model in terms of $\log \sigma_i$ to ensure that the scale parameter stays positive.
- a Clipped Rectified Linear Unit (Clipped ReLU) layer represents data where $v_i \in \mathbb{R}$ with $v_i^{\text{low}} \leq v_i \leq v_i^{\text{high}}$.
- a Clipped ReLU layer is essentially a two-sided truncated Gaussian layer. The bias function is $a(v) = -\sum_i \frac{(v_i - \bar{v}_i)^2}{2\sigma_i^2}$ over the domain $v_i^{\text{low}} \leq v_i \leq v_i^{\text{high}}$.
- $v_i^{\text{high}}$ and $v_i^{\text{low}}$ are typically specified before training. In practice, it helps to parameterize the model in terms of $\log \sigma_i$ to ensure that the scale parameter stays positive.
- Student-t Layer: A Student-t distribution is similar to a Gaussian distribution, but has fatter tails. In a variety of embodiments, the Student-t layer is implemented implicitly.
- the layer has three parameters: a location parameter $\bar{v}_i$ that controls the mean, a scale parameter that controls the variance, and a degrees-of-freedom parameter $d_i$ that controls the thickness of the tails.
- the layer is defined by drawing a variance $\sigma_i^2 \sim \mathrm{InverseGamma}$
- $a(v) = -\sum_i \frac{(v_i - \bar{v}_i)^2}{2\sigma_i^2}$.
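One way to see why a Student-t layer can be implicit is the standard Gaussian scale mixture: if $\sigma_i^2 \sim \mathrm{InverseGamma}(d_i/2,\, d_i s_i^2/2)$ and $v_i \mid \sigma_i^2 \sim \mathcal{N}(\bar{v}_i, \sigma_i^2)$, then marginally $v_i$ follows a Student-t distribution with $d_i$ degrees of freedom, location $\bar{v}_i$, and scale $s_i$. The InverseGamma parameters and the symbol $s_i$ are assumptions, since they are not given above. A sampling sketch with NumPy:

```python
import numpy as np

def sample_student_t(loc, scale, dof, size, rng):
    """Student-t samples via a Gaussian scale mixture:
    sigma^2 ~ InverseGamma(dof/2, dof*scale^2/2), then v ~ Normal(loc, sigma^2)."""
    # If G ~ Gamma(shape=alpha, rate=beta) then 1/G ~ InverseGamma(alpha, beta).
    # NumPy's Generator.gamma takes a scale argument, which equals 1/rate.
    var = 1.0 / rng.gamma(dof / 2.0, 2.0 / (dof * scale ** 2), size=size)
    return loc + np.sqrt(var) * rng.standard_normal(size)
```

For $d_i > 2$ the resulting variance is $s_i^2 d_i / (d_i - 2)$, larger than $s_i^2$ because of the fatter tails.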
- An Ordinal layer is a generalization of a Bernoulli layer that is used to represent integer-valued data $v_i \in \{0, 1, \ldots, N_i\}$.
- the upper value N i is specified ahead of time.
- A Gaussian-Ordinal layer is a generalization of an Ordinal layer that is used to represent integer-valued data $v_i \in \{0, 1, \ldots, N_i\}$ with a more flexible distribution.
- the bias function is $a(v) = -\sum_i \frac{(v_i - \bar{v}_i)^2}{2\sigma_i^2}$.
- the upper value N i is specified ahead of time.
- Exponential Layer: An exponential layer represents data where $v_i \in \mathbb{R}_+$.
- exponential layers have some constraints because $a_i + \sum_\mu W_{i\mu} h_\mu > 0$ must hold for all values of the connected hidden units. Typically, this limits the types of layers that can be connected to an exponential layer, and requires ensuring that all of the weights are positive.
- a composite layer is not a mathematical object per se as was the case for the previously described layer types. Instead, a composite layer is a software implementation for combining multiple sub-layers of different types to create a meta-layer that can model heterogeneous data.
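A composite layer can be sketched as a container that assigns each sub-layer a slice of the visible vector and dispatches sampling by data type. The class, its `sample` method, and the two sub-layer kinds shown are hypothetical simplifications, assuming NumPy; real sub-layers would also carry their own trainable parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class CompositeLayer:
    """Hypothetical composite visible layer: each sub-layer owns a slice of the
    visible vector and samples it according to its own data type."""

    def __init__(self, sublayers):
        # sublayers: list of (name, kind, slice); kinds sketched: "bernoulli", "gaussian"
        self.sublayers = sublayers

    def sample(self, field, rng, sigma=1.0):
        """Sample visible units given the field induced by the hidden layer.
        For Bernoulli units the field is a logit; for Gaussian units it is the
        conditional mean, with a shared standard deviation `sigma` (an assumption)."""
        field = np.asarray(field, dtype=float)
        out = np.empty_like(field)
        for name, kind, sl in self.sublayers:
            f = field[:, sl]
            if kind == "bernoulli":
                out[:, sl] = (rng.random(f.shape) < sigmoid(f)).astype(float)
            elif kind == "gaussian":
                out[:, sl] = f + sigma * rng.standard_normal(f.shape)
            else:
                raise ValueError(f"sub-layer {name!r}: unsupported type {kind!r}")
        return out
```

Usage: `CompositeLayer([("smoker", "bernoulli", slice(0, 2)), ("labs", "gaussian", slice(2, 4))])` lets binary and continuous columns share one visible vector while each is sampled by its own rule.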
- Schema 800 in accordance with several embodiments of the invention is conceptually illustrated in FIG. 8 .
- Schema 800 includes descriptions of different layers of a generalized RBM.
- a schema allows for a model to be tuned to handle particular types of data, without requiring burdensome pre-processing by a person.
- the different layers allow for heterogeneous data of different types that may be incomplete and/or irregular.
- DBMs Generalized Deep Boltzmann Machines
- Deep learning refers to an approach to machine learning where the model processes the data through a series of transformations. The goal is to enable the model to learn to construct appropriate features rather than requiring the researcher to craft features using prior knowledge.
- a generalized Deep Boltzmann Machine is essentially a stack of RBMs.
- a generalized DBM in accordance with some embodiments of the invention is illustrated in FIG. 9 .
- the generalized DBM 900 shows a visible layer 910 connected to a hidden layer 920 .
- Hidden layer 920 is further connected to another hidden layer 930 .
- the visible layer 910 is encoded to hidden layer 920 , which then operates like a visible layer for the next hidden layer 930 .
- a DBM can, in principle, be trained in the same way as an RBM.
- DBMs are often trained using a greedy layer-wise process. Examples of greedy layer-wise processes are described in R. Salakhutdinov and G. Hinton, "Deep Boltzmann Machines," in Artificial Intelligence and Statistics (2009), pp. 448-455, which is incorporated by reference herein.
- forward layerwise training of a DBM proceeds by training a sequence of RBMs with energy functions:
- methods in accordance with many embodiments of the invention train DBMs in reverse, starting with the deepest hidden layer $h_L$ and working backwards towards $v$. This ensures that the deepest hidden layer must contain as much information about the visible layer as possible.
- the reverse layerwise training procedure makes use of the fact that a three-layer DBM with connectivity $v$–$h_1$–$h_2$ is the same as a two-layer RBM with connectivity $[v, h_2]$–$h_1$, allowing RBMs with Composite Layers to pass information backwards down the connectivity graph of the DBM.
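The equivalence can be checked numerically for binary units, assuming the standard DBM energy $E = -a^T v - b_1^T h_1 - b_2^T h_2 - v^T W_1 h_1 - h_1^T W_2 h_2$. Stacking $[v, h_2]$ into one visible vector, with biases $[a, b_2]$ and weights $[W_1; W_2^T]$, reproduces the same energy for every configuration. Function names are illustrative.

```python
import numpy as np

def dbm3_energy(v, h1, h2, a, b1, b2, W1, W2):
    """Energy of a three-layer DBM with connectivity v - h1 - h2."""
    return -(v @ a) - (h1 @ b1) - (h2 @ b2) - v @ W1 @ h1 - h1 @ W2 @ h2

def stacked_rbm_energy(v, h1, h2, a, b1, b2, W1, W2):
    """The same energy viewed as an RBM whose visible layer is the composite [v, h2]."""
    x = np.concatenate([v, h2])      # composite visible layer [v, h2]
    a_x = np.concatenate([a, b2])    # its biases
    W_x = np.vstack([W1, W2.T])      # h2 couples to h1 through W2^T
    return -(x @ a_x) - (h1 @ b1) - x @ W_x @ h1
```

Because the two energies agree everywhere, the distributions they define are identical, which is why an RBM trained on $[v, h_2]$ can supply the $W_1$ and $W_2$ blocks of the DBM.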
- Process 1000 trains (1005) a first RBM with connectivity $v$-$h_L$.
- Process 1000 samples (1010) $h_L \sim p(h_L \mid v)$.
- the process then stacks (1015) $v$ and $h_L$ into a vector $[v, h_L]$ and trains (1020) a second RBM with connectivity $[v, h_L]$-$h_{L-1}$.
- Process 1000 determines (1025) whether $[v, h_2]$-$h_1$ has been reached. When it has not been reached, process 1000 returns to step 1005.
- When process 1000 determines that $[v, h_2]$-$h_1$ has been reached, the process copies (1030) the weights from each of these intermediate RBMs into their respective positions in the DBM. In some embodiments, DBMs can then be fine-tuned by regular end-to-end training.
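The reverse layerwise procedure of process 1000 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes binary (Bernoulli) layers, uses one-step contrastive divergence (CD-1) as a stand-in RBM trainer, and the names `train_rbm`, `sample_hidden`, and `reverse_layerwise` are hypothetical. The rule used for copying weights into the DBM (keep the $h_{l+1}$ block of each composite RBM, plus the $v$ block of the last one) is also an assumption, since the text only says the weights go into "their respective positions"; biases are not copied.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, rng, epochs=30, lr=0.05):
    """Train a binary RBM with CD-1; a simple stand-in for any RBM trainer."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = sigmoid(b + data @ W)                      # p(h = 1 | data)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(a + h0 @ W.T)                      # one Gibbs step back to v
        ph1 = sigmoid(b + pv1 @ W)
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
        a += lr * (data - pv1).mean(axis=0)
        b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b


def sample_hidden(rbm, x, rng):
    """Draw h ~ p(h | x) from a trained binary RBM."""
    W, _, b = rbm
    p = sigmoid(b + x @ W)
    return (rng.random(p.shape) < p).astype(float)


def reverse_layerwise(v, hidden_sizes, rng):
    """Reverse layerwise pre-training for a DBM v-h_1-...-h_L (L >= 2).

    hidden_sizes[l - 1] is the number of units in h_l. Returns DBM weight
    blocks keyed by (lower layer, upper layer), with layer 0 = v.
    """
    L = len(hidden_sizes)
    n_v = v.shape[1]
    rbm = train_rbm(v, hidden_sizes[L - 1], rng)        # (1005) train v-h_L
    h_upper = sample_hidden(rbm, v, rng)                # (1010) h_L ~ p(h_L | v)
    dbm = {}
    for l in range(L - 1, 0, -1):                       # l = L-1, ..., 1
        composite = np.hstack([v, h_upper])             # (1015) stack [v, h_{l+1}]
        rbm = train_rbm(composite, hidden_sizes[l - 1], rng)  # (1020) [v, h_{l+1}]-h_l
        W = rbm[0]
        # (1030) copy weights; this block mapping is an assumption of the sketch
        dbm[(l, l + 1)] = W[n_v:].T                     # h_l x h_{l+1} block
        if l == 1:
            dbm[(0, 1)] = W[:n_v]                       # v x h_1 block
        h_upper = sample_hidden(rbm, composite, rng)    # h_l ~ p(h_l | v, h_{l+1})
    return dbm
```

For example, with three hidden layers the sketch trains RBMs $v$-$h_3$, $[v, h_3]$-$h_2$, and $[v, h_2]$-$h_1$, and returns weight blocks keyed by `(0, 1)`, `(1, 2)`, and `(2, 3)`, where layer 0 is $v$. Under this assumed mapping, the first RBM is used only to generate samples of $h_L$.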
- An autoregressive Deep Boltzmann Machine (ADBM) is a DBM where the hidden layers have undirected edges connecting neighboring time points. As a result, an ADBM relates nodes to their values at previous time points.
- a generalized ADBM in accordance with some embodiments of the invention is illustrated in FIG. 11 .
- the generalized ADBM 1100 shows a visible layer 1110 at time t connected to a hidden layer 1120 , also at time t.
- Hidden layer 1120 is further connected to another hidden layer 1130 that incorporates data that is offset from time $t$ by a time lag.
- an ADBM is a model for entire sequences that describes the joint probability distribution of the whole sequence, $p(v(0), v(1), \ldots)$.
- let $x(t) = [v(t), h_1(t), \ldots, h_L(t)]$ denote the state of all of the layers at time $t$.
- let $E_{\mathrm{DBM}}(x(t))$ be the energy of a DBM, given by
- ADBMs, as described in the previous section, are able to capture correlations through time, but they are often unable to represent non-stationary distributions or distributions with drift. For example, most patients with a degenerative disease will tend to worsen over time, an effect that the ADBM cannot capture. To capture this effect, many embodiments of the invention implement a Generalized Conditional Boltzmann Machine (GCBM).
- GCBM: Generalized Conditional Boltzmann Machine
- this model can be constructed from two DBMs.
- a time-independent DBM, $p_0$, can be trained on all of the data.
- a time-dependent DBM can be trained on a Composite Layer created by joining all of the neighboring time points $[v(t), v(t-1)]$.
- the second DBM describes the joint distribution $p(v(t), v(t-1))$, which makes it possible to compute conditional distributions such as $p(v(t) \mid v(t-1))$.
- the second DBM can be trained on a Composite Layer that can be readily extended to include multiple time lags, e.g., $[v(t), v(t-1), \ldots, v(t-n)]$.
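Building the Composite Layer $[v(t), v(t-1), \ldots, v(t-n)]$ amounts to stacking lagged copies of each trajectory side by side. A minimal NumPy sketch follows; the function name `lagged_composite` and the (time, features) array layout are assumptions made for illustration.

```python
import numpy as np


def lagged_composite(trajectory, n_lags):
    """Stack [v(t), v(t-1), ..., v(t-n_lags)] for every t >= n_lags.

    trajectory: array of shape (T, d), one row per time point.
    Returns an array of shape (T - n_lags, (n_lags + 1) * d);
    row i corresponds to time t = i + n_lags.
    """
    T = trajectory.shape[0]
    if not 0 <= n_lags < T:
        raise ValueError("n_lags must satisfy 0 <= n_lags < number of time points")
    # lag 0 gives v(t), lag 1 gives v(t-1), and so on; all blocks have T - n_lags rows
    blocks = [trajectory[n_lags - lag : T - lag] for lag in range(n_lags + 1)]
    return np.hstack(blocks)
```

With `n_lags=1` this produces the two-time-point layer $[v(t), v(t-1)]$; rows built from many patients' trajectories could then be concatenated with `np.vstack` to form a training set for the time-dependent DBM.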
- a machine learning model is generative if it learns to draw new samples from an unknown probability distribution.
- Generative models can be used to learn useful representations of data and/or to enable simulations of systems with unknown, or very complicated, mechanistic laws.
- a generative model defined by some model parameters $\theta$ describes the probability of observing some variable $v$. Therefore, training a generative model involves minimizing a distance between the distribution of the data, $p_d(v)$, and the distribution defined by the model, $p_\theta(v)$.
- the traditional method for training a Boltzmann machine maximizes the log-likelihood, which is equivalent to minimizing the forward Kullback-Leibler (KL) divergence: $D_{KL}(p_d \| p_\theta) = \int dv\, p_d(v) \log \frac{p_d(v)}{p_\theta(v)}$.
- the forward KL divergence, $D_{KL}(p_d \| p_\theta)$, accumulates differences between the data and model distributions weighted by the probability under the data distribution.
- the reverse KL divergence, $D_{KL}(p_\theta \| p_d)$, accumulates differences between the data and model distributions weighted by the probability under the model distribution.
- RBMs can be trained using a novel type of f-divergence, called the discriminator divergence:
- the function that defines the discriminator divergence is
- a generator that is able to trick the discriminator so that $p(\text{data} \mid v) = 1/2$ everywhere attains a discriminator divergence of zero.
- the discriminator divergence closely mirrors the reverse KL divergence and strongly punishes models that overestimate the probability of the data.
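To make the difference between the forward KL, reverse KL, and discriminator divergences concrete, the sketch below evaluates them for small discrete distributions. It assumes the Bayes-optimal discriminator for equal numbers of data and model samples, $p(\text{data} \mid v) = p_d(v) / (p_d(v) + p_\theta(v))$, substituted into Eq. (16); the function names and the example distributions are hypothetical.

```python
import numpy as np


def kl(p, q):
    """D_KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def discriminator_divergence(p_data, p_model):
    """Eq. (16) with p(data | v) = p_d / (p_d + p_theta) (equal class priors assumed)."""
    p_data, p_model = np.asarray(p_data, float), np.asarray(p_model, float)
    mask = p_model > 0
    p_data_given_v = p_data[mask] / (p_data[mask] + p_model[mask])
    return float(-np.log(2) - np.sum(p_model[mask] * np.log(p_data_given_v)))


p_d = np.array([0.49, 0.02, 0.49])        # bimodal "data" distribution
spread = np.full(3, 1 / 3)                # model that spreads over the support
mode_drop = np.array([0.98, 0.01, 0.01])  # model that concentrates on one mode
for name, q in [("spread", spread), ("mode_drop", mode_drop)]:
    print(name, kl(p_d, q), kl(q, p_d), discriminator_divergence(p_d, q))
```

In this toy case the mode-dropping model has a larger forward KL divergence than the spread-out model but a smaller reverse KL divergence, and the discriminator divergence ranks the two models the same way as the reverse KL divergence.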
- BEAM: Boltzmann Encoded Adversarial Machine
- A process for training an adversarial model in accordance with some embodiments of the invention is conceptually illustrated in FIG. 12.
- the process 1200 draws (1205) samples from a model, such as (but not limited to) the Boltzmann machines described above. Samples can be drawn from a model using a variety of methods, including (but not limited to) k-step Gibbs sampling and TDS.
- the process 1200 then computes ( 1210 ) gradients based on the drawn samples.
- Process 1200 trains ( 1215 ) a discriminator based on the drawn samples and computes an adversarial gradient based on the classification of the samples, as either drawn from the model or drawn from the data.
- the process 1200 then computes ( 1220 ) a full compound gradient and updates ( 1225 ) the model parameters using the full gradient.
- FIG. 13 presents some comparisons between Boltzmann machines trained to maximize log likelihood and those trained as BEAMs.
- the examples of this figure illustrate three multimodal data distributions: a bimodal mixture of Gaussians in 1 dimension (1310), a mixture of 8 Gaussians arranged in a circle in 2 dimensions (1320), and a mixture of 25 Gaussians arranged in a grid in 2 dimensions (1330). Problems similar to the 2-dimensional mixture-of-Gaussians examples are commonly used for testing GANs.
- the regular Boltzmann machine learns a model with a reasonably good likelihood by spreading the probability over the support of the data distribution.
- the Boltzmann machines trained as BEAMs learn to reproduce the data distributions very accurately.
- the first panel 1405 illustrates estimates of the forward KL divergence, $D_{KL}(p_d \| p_\theta)$, and the reverse KL divergence, $D_{KL}(p_\theta \| p_d)$, per training epoch.
- the first panel 1405 illustrates that training an RBM as a BEAM decreases both the forward and reverse KL divergences.
- the second panel 1410 illustrates distributions of fantasy particles at various epochs during training. In the early stages of training, the BEAM fantasy particles are spread out across the support of the data distribution capturing the modes near the edge of the grid. These early epochs resemble the distributions obtained with GANs, which also concentrate density in the modes near the edge of the grid. As training progresses, the BEAM progressively learns to capture the modes near the center of the grid.
- An architecture of a Boltzmann Encoded Adversarial Machine (BEAM) in accordance with some embodiments of the invention is illustrated in FIG. 15.
- the illustrated example shows two steps of the BEAM architecture.
- a generator (e.g., an RBM)
- Generators in accordance with a number of embodiments of the invention are trained to encode input data by passing the input data through the visible layer to be encoded in a set of nodes of a hidden layer.
- Generators in accordance with several embodiments of the invention are trained with an objective to generate realistic samples from a complex distribution.
- objective functions for training generators can include a contribution from an adversarial loss generated by a critic (or discriminator).
- a hidden layer of the generator feeds into a classifier of a discriminator (or critic) that evaluates the hidden layers to distinguish samples drawn from the data from samples drawn from the model using tied weights learned by the generator. Therefore, the discriminator (or adversary) is constructed by encoding the visible units using a single forward pass through the layers of the generator and then applying a classifier (e.g., logistic regression, nearest neighbor classifiers, and random forest) trained to discriminate between samples from the data and samples from the model.
- the adversary uses the same architecture and weights as the RBM, and encodes visible units into hidden unit activations. These hidden unit activations, computed for both the data and fantasy particles sampled from the RBM, are used by a critic to estimate the distance between the data and model distributions.
- the critic can be any function of the visible and hidden units.
- methods in accordance with several embodiments of the invention use a critic that is monotonically related to $p(\text{data} \mid v)$.
- the discriminator divergence suggests that one could use $\log p(\text{data} \mid v)$ as the critic.
- the optimal discriminator can be approximated as a function of the hidden unit activations, $p(\text{data} \mid v) \approx g(\mathbb{E}_{p_\theta(h \mid v)}[h])$.
- the function g(•) could be implemented by a neural network, as in most GANs, or using a simpler algorithm such as a random forest or nearest neighbor classifier.
- a simple approximation to the optimal discriminator can be sufficient because the classifier can operate on the hidden unit activities of the RBM generator rather than the visible units. Therefore, the optimal critic can be approximated using nearest neighbor methods.
- p(x) is estimated at an arbitrary point x based on a k-nearest-neighbor estimate.
- methods in accordance with some embodiments of the invention fix some positive integer k and compute the k nearest neighbors to x in X.
- $d_k$ is defined to be the distance between $x$ and the furthest of the nearest neighbors, and the density $p(x)$ is estimated to be the density of the uniform distribution on a ball of radius $d_k$. That is,
- let $X = \{v_1, \ldots, v_{2N}\}$ be a collection of samples where exactly half are drawn from $p_\theta$ and half from $p_d$.
- the nearest neighbors can be computed from a cached minibatch of samples from the model combined with a minibatch of samples from the training dataset.
- the distance-weighted nearest-neighbor critic is a generalization which adds some continuity to the nearest-neighbor critic by applying an inverse distance weighting to the ratio count. Specifically, let $\{d_0, \ldots, d_k\}$ be the distances of the $k$ nearest neighbors, with $\{d_0, \ldots, d_j\}$ the distances for the neighbors originating from the data samples and $\{d_{j+1}, \ldots, d_k\}$ the distances for the neighbors originating from the model samples.
- the distance-weighted nearest-neighbor critic can be defined as:
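A nearest-neighbor critic can be sketched as follows. Following the description, the points being compared would be hidden-unit activations $\mathbb{E}_{p_\theta(h \mid v)}[h]$ computed for data samples and fantasy particles; the sketch simply takes arrays of such vectors. The unweighted critic returns the fraction of the $k$ nearest neighbors that come from the data, as an estimate of $p(\text{data} \mid x)$. Because the defining equation of the distance-weighted critic is not reproduced in this text, the weighted branch uses an assumed form with weights $1/(d + \epsilon)$, where $\epsilon$ plays the role of the small regularizing parameter mentioned in the Description; treat it as illustrative only. The function name `knn_critic` is hypothetical.

```python
import numpy as np


def knn_critic(x, data_feats, model_feats, k=5, eps=1e-3, weighted=False):
    """Estimate p(data | x) from the k nearest neighbours of x in the pooled set
    of data and model feature vectors (equal set sizes assumed)."""
    pooled = np.vstack([data_feats, model_feats])
    is_data = np.concatenate([np.ones(len(data_feats)), np.zeros(len(model_feats))])
    dist = np.linalg.norm(pooled - x, axis=1)
    nearest = np.argsort(dist)[:k]
    if not weighted:
        # fraction of the k neighbours that originate from the data
        return float(is_data[nearest].mean())
    # assumed inverse-distance weighting; eps keeps the weights finite
    w = 1.0 / (dist[nearest] + eps)
    return float(np.sum(w * is_data[nearest]) / np.sum(w))
```

A value near 1 means the point looks like data and a value near 0 means it looks like a model sample; the log of such an estimate (clipped away from zero) could serve as the critic value.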
- BEAMs use the RBM as both the generator and as a feature extractor for the adversary.
- this double usage allows a single set of fantasy particles to be reused for multiple steps of the training algorithm. Specifically, a single set of $M$ persistent fantasy particles is updated $k$ times per gradient evaluation. In many embodiments, the same set of fantasy particles is used to compute the log-likelihood derivative and the adversarial derivative. Then, these fantasy particles can replace the fantasy particles from the previous gradient evaluation in the nearest neighbor estimates of the critic value. Reusing the fantasy particles for each step means that BEAM training has roughly the same computational cost as training an RBM with persistent contrastive divergence (PCD).
- the gradients of the log-likelihood and the adversarial term both involve expectation values with respect to the model distribution. Unfortunately, these expectation values cannot be computed exactly. As a result, the expectation values can be approximated using Monte Carlo methods or other approximations. The accuracy of these approximate gradients can have a significant effect on the utility of the resulting model. Different approaches to improving the accuracy of the approximate gradients in accordance with certain embodiments of the invention are described below.
- Drawing samples from a probability distribution is an important component of many processes for training models in accordance with many embodiments of the invention. This can often be done with a simple function call for many 1-dimensional distributions. However, random sampling from Boltzmann machines is much more complicated.
- Sampling from a Boltzmann machine is usually performed using Gibbs sampling.
- Gibbs sampling is a local sampling process, which means that successive samples are correlated.
- Drawing uncorrelated samples requires one to make many Gibbs sampling steps for each successive sample.
- drawing a batch of uncorrelated random samples from a Boltzmann machine can take a long time.
- a batch of random samples is required for each gradient update, so if it takes a long time to generate each batch, training a Boltzmann machine can take so long that it becomes impractical. Therefore, methods that decrease the correlation between successive samples from a Boltzmann machine can greatly accelerate the learning process.
- the fictional temperature is useful because raising the temperature (i.e., decreasing $\beta$) decreases the autocorrelation between samples.
- the initial energy is E(v,h).
- the intermediate configurations will have varying energies. If the maximal energy among these intermediate configurations is $E_{\max}$, then the time to travel from $(v,h)$ to $(v', h')$ roughly scales as $\tau \sim e^{\beta (E_{\max} - E(v,h))} \quad (24)$. Therefore, decreasing $\beta$ will decrease the number of Gibbs sampling steps required to move between distant configurations.
- Processes in accordance with certain embodiments of the invention use a process called parallel tempering (in the machine learning and statistics literature) or replica exchange (in the physics community).
- in parallel tempering in accordance with a variety of embodiments of the invention, multiple Gibbs sampling chains are run in parallel, each at a different temperature. Periodically, one attempts to swap the configurations of two chains. In several embodiments, the swap can be accepted or rejected based on a criterion (e.g., the Metropolis criterion) to ensure that the entire system stays at equilibrium.
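The swap step of parallel tempering with the Metropolis criterion can be sketched as below. For chains at inverse temperatures $\beta_i$ and $\beta_j$ holding configurations with energies $E_i$ and $E_j$, exchanging the configurations is accepted with probability $\min(1, e^{(\beta_i - \beta_j)(E_i - E_j)})$, which leaves the joint equilibrium distribution of all chains unchanged. The helper names and the user-supplied `energy_fn` are assumptions, and the per-chain Gibbs updates that run between swap attempts are omitted.

```python
import numpy as np


def accept_swap(e_i, e_j, beta_i, beta_j, rng):
    """Metropolis test for exchanging configurations between two tempered chains."""
    log_ratio = (beta_i - beta_j) * (e_i - e_j)
    return bool(rng.random() < np.exp(min(0.0, log_ratio)))


def swap_neighbours(states, betas, energy_fn, rng):
    """Attempt swaps between adjacent temperatures; modifies `states` in place."""
    for i in range(len(betas) - 1):
        e_i, e_j = energy_fn(states[i]), energy_fn(states[i + 1])
        if accept_swap(e_i, e_j, betas[i], betas[i + 1], rng):
            states[i], states[i + 1] = states[i + 1], states[i]
    return states
```

A swap that moves a lower-energy configuration to the colder chain (larger $\beta$) is always accepted, while the reverse move is accepted only occasionally.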
- the process uses Temperature Driven Sampling (TDS), which greatly improves the ability to train Boltzmann machines without incurring significant additional computational cost.
- the inverse temperatures of each sample can be independently updated once for every Gibbs sampling iteration of the model.
- the updates are autocorrelated across time so that the inverse temperatures are slowly varying.
- An example of sampling from an autocorrelated Gamma distribution is described below.
- TDS includes, as a special case, a standard Gibbs-sampling-based sequential Monte Carlo sampler in the limit $\mathrm{Var}[\beta] \to 0$.
- the samples drawn with TDS are not samples from the equilibrium distribution of the Boltzmann machine. In certain embodiments, the drawn samples are re-weighted to correct for the bias due to the varying temperature.
- Input: number of samples $m$; number of update steps $k$; autocorrelation coefficient $\phi$ for the inverse temperature, between 0 and 1; variance of the inverse temperature $\mathrm{Var}[\beta] < 1$.
- for $i = 1, \ldots, m$ do
- Temperature Driven Sampling (TDS) improves sampling from a Boltzmann machine.
- GMM refers to samples from a Gaussian mixture model.
- GRBM refers to samples from the equivalent Boltzmann machine drawn using 10 steps of Gibbs sampling.
- TDS refers to samples from the equivalent Boltzmann machine drawn using TDS with 10 steps of Gibbs sampling.
- This example shows a Gaussian mixture model with three modes at (−1, 0, +1) with various standard deviations, together with a simple construction of an equivalent Boltzmann machine with a Gaussian visible layer and a one-hot hidden layer with 3 hidden units.
- the autocorrelation coefficient and the standard deviation of the inverse temperature were set to 0.9 and 0.95, respectively. All starting samples were initialized from the middle mode. Starting from the middle mode, regular Gibbs sampling is unable to sample from the neighboring modes after 10 steps when the modes are well separated. TDS, by contrast, has fatter tails, allowing for better sampling of the neighboring modes.
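The TDS example above can be reproduced in spirit with the following sketch. It assumes one particular "simple construction", which the text does not specify: equal mixture weights and energy $E(v,h) = (v - \mu_h)^2 / (2\sigma^2)$, so that at inverse temperature $\beta$ the hidden unit is drawn with probabilities proportional to $e^{-\beta E(v,h)}$ and $v \mid h \sim \mathcal{N}(\mu_h, \sigma^2/\beta)$. Each chain's inverse temperature follows the autocorrelated Gamma update from the Description, and, as a simplification, the samples are not re-weighted to remove the bias caused by the varying temperature. All function names are hypothetical.

```python
import numpy as np

MEANS = np.array([-1.0, 0.0, 1.0])  # mode locations in the example


def tempered_gibbs_step(v, beta, sigma, rng):
    """One block-Gibbs step at per-chain inverse temperature beta for a
    hypothetical Gaussian-visible / one-hot-hidden machine with
    E(v, h) = (v - mu_h)^2 / (2 sigma^2) and equal mixture weights."""
    logits = -beta[:, None] * (v[:, None] - MEANS[None, :]) ** 2 / (2 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    u = rng.random((len(v), 1))
    h = np.minimum((probs.cumsum(axis=1) < u).sum(axis=1), len(MEANS) - 1)
    # v | h ~ Normal(mu_h, sigma^2 / beta)
    return MEANS[h] + rng.standard_normal(len(v)) * sigma / np.sqrt(beta)


def next_beta(beta, phi, var, rng):
    """Autocorrelated Gamma update of the inverse temperatures (stationary mean 1)."""
    nu, c = 1.0 / var, (1.0 - phi) * var
    z = rng.poisson(beta * phi / c)
    return rng.gamma(nu + z, c)


def sample(m, k, sigma, rng, phi=0.9, std=0.95, tds=True):
    """Draw m samples with k Gibbs steps, all chains starting in the middle mode."""
    v = np.zeros(m)
    beta = np.ones(m)
    if tds:
        var = std**2
        beta = rng.gamma(1.0 / var, var, size=m)  # start from the stationary law
    for _ in range(k):
        if tds:
            beta = next_beta(beta, phi, std**2, rng)
        v = tempered_gibbs_step(v, beta, sigma, rng)
    return v
```

In this toy setting, with $\sigma = 0.1$ and 10 steps from the middle mode, plain Gibbs sampling (`tds=False`) essentially never leaves the middle mode, whereas a noticeable fraction of TDS chains reach values near the outer modes.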
- a patient can be represented as a collection of information that describes their symptoms, their genetic information, results from diagnostic tests, any medical treatments they are receiving, and other information that may be relevant for characterizing their health.
- a vector containing this information about a patient is sometimes called a phenotype vector.
- a method for prognostic prediction in accordance with many embodiments of the invention uses past and current health information about a patient to predict a health outcome at a future time.
- a patient trajectory refers to a time series that describes a patient's detailed health status (e.g., a patient's phenotype vector) at various points in time.
- prognostic prediction takes in a patient's trajectory (i.e., their past and current health information) and makes a prediction about a specific future health outcome (e.g., the likelihood they will have a heart attack within the next 2 years).
- predicting a patient's future trajectory involves predicting all of the information that characterizes the state of their health at all future times.
- models for simulating patient trajectories use discrete time steps (e.g., one month).
- the length of the time step in accordance with a number of embodiments of the invention will be selected to approximately match the frequency of treatment.
- a model for patient trajectories in accordance with many embodiments of the invention describes the joint probability distribution of all points along the trajectory, $p(v_0, \ldots, v_T)$.
- $p(v_0, \ldots, v_T)$ can be used for prediction by sampling from the conditional probability distribution of the future points of the trajectory given the observed past points.
- the model is a Boltzmann machine, because Boltzmann machines make it easy to express conditional distributions and can be adapted to heterogeneous datasets; however, one skilled in the art will recognize that many of the processes described herein can be applied to other architectures as well.
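As an illustration of sampling from a conditional distribution of a Boltzmann machine, the sketch below clamps the observed (past) visible units and runs block Gibbs sampling over the hidden units and the unobserved (future) visible units. It assumes a binary RBM whose visible vector is laid out as [past, future]; this layout and the name `sample_future` are hypothetical, and the models described here would typically use time-dependent architectures and heterogeneous layer types instead.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sample_future(past, n_future, W, a, b, k, rng):
    """Sample the unobserved ('future') visible units of a binary RBM given
    clamped 'past' units, via k steps of block Gibbs sampling.

    The visible vector is assumed to be laid out as [past, future].
    """
    n_past = past.shape[1]
    future = (rng.random((past.shape[0], n_future)) < 0.5).astype(float)  # random start
    for _ in range(k):
        v = np.hstack([past, future])                        # past units stay clamped
        ph = sigmoid(b + v @ W)                              # p(h = 1 | v)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv_future = sigmoid(a[n_past:] + h @ W[n_past:].T)   # p(future = 1 | h)
        future = (rng.random(pv_future.shape) < pv_future).astype(float)
    return future
```

Because the future units are conditionally independent given the hidden units, only their conditionals need to be resampled; the past units never change, so the chain targets $p(\text{future} \mid \text{past})$.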
- Clinical decision support systems provide information to patients, physicians, or other caregivers to help guide choices about patient care. Simulated patient trajectories provide insights into a patient's future health that can inform choices of care. For example, consider a patient with mild cognitive impairment. A physician or caregiver would benefit from knowing the risk that the patient's condition progresses to Alzheimer's disease, or that he or she begins to exhibit other cognitive or psychological symptoms. In certain embodiments, systems based on simulated patient trajectories can forecast these risks to guide care choices. Aggregating such predictions over a population of patients can also help estimate population-level risks, enabling long-term planning by organizations, such as elder care facilities, that act as caregivers to large groups of patients.
- a set of patient trajectories is collected from electronic medical records (also known as real world data), from natural history databases, or from clinical trials.
- the patient trajectories in accordance with many embodiments of the invention can be normalized and used to train a time-dependent Boltzmann machine.
- these simulated trajectories can be analyzed to understand the risks associated with specific outcomes (e.g., Alzheimer's diagnosis) at various future times.
- models that are trained on data with treatment information would contain variables that describe treatment choices. Such a model could be used to assess how different treatment choices would change the patient's future risks by comparing simulated outcome risks conditioned on different treatments.
- a caretaker or physician can treat a patient based on the treatment choices and/or the simulated trajectories.
- Randomized Clinical Trials (RCTs) are the gold standard for evidence in assessing therapeutic efficacy.
- each patient is randomly assigned to one of two study arms: a treatment arm where the patients are treated with an experimental therapy, and a placebo arm where the patients receive a dummy treatment and/or the current standard of care.
- a statistical analysis is performed to determine if patients in the treatment arm were more likely to respond positively to the new therapy than patients in the placebo arm were to respond to the dummy therapy.
- RCTs need to include a large number of patients. For example, it is not uncommon for Phase III clinical trials to include thousands of patients. Recruiting the large number of patients necessary to achieve sufficient power is challenging, and many clinical trials never meet their recruitment goals. Although there is, almost by definition, little-to-no data about an experimental therapy, there is likely a lot of data about the efficacy of the current standard of care. Therefore, one way to reduce the number of patients needed for clinical trials is to replace the control arm with a synthetic control arm that contains virtual patients simulated from a Boltzmann machine trained to model the current standard of care.
- Methods in accordance with several embodiments of the invention use simulations to create a synthetic, or virtual, control arm for a clinical trial by training a Boltzmann machine using data from the control arms of previous clinical trials.
- data sets can be constructed by aggregating data from the control arms of multiple clinical trials for a chosen disease. Then, Boltzmann machines can be trained to simulate patients with that disease under the current standard of care. This model can then be used to simulate a population of patients with particular characteristics (e.g., age, ethnicity, medical history) to create a cohort of simulated patients that match the inclusion criteria of the new trial.
- each patient in the experimental arm can be matched to a simulated patient with the same baseline measurements by simulating from the appropriate conditional distribution of the Boltzmann machine. This can provide a type of counterfactual (i.e., what would have happened to this patient if they had been given a placebo rather than the experimental therapy). In either case, data from simulated patients can be used to supplement, or replace, data from a concurrent placebo arm using standard statistical methods in accordance with many embodiments of the invention.
- value-based care means that the cost of a drug will be based on how effective it is, rather than a simple cost per pill. As a result, governments and other payers need to be able to compare the effectiveness of alternative therapies.
- Simulations in accordance with many embodiments of the invention provide an alternative approach for performing head-to-head trials.
- detailed individual level data from clinical trials of each drug can be included in the training data for a Boltzmann machine.
- samples generated with a Boltzmann machine, such as a BEAM, can be used to simulate a head-to-head clinical trial between two drugs, A and B.
- individual level data are not usually released for the experimental arms of clinical trials.
- aggregate level data from the experimental arms in accordance with a number of embodiments of the invention can be used to adjust a model that was trained on control arm data.
- the human genome encodes more than 20 thousand genes that engage in an incredibly complex network of interactions. This network of genetic interactions is so complex that it is intractable to develop a mechanistic model linking genotype to phenotype. Therefore, studies that aim to predict a phenotype from genomic information have to use machine learning methods.
- a common goal of a genomic study in the clinical setting is predicting whether or not a patient will respond to a given therapeutic.
- data describing gene expression (e.g., from messenger RNA sequencing experiments)
- the response of each patient to the therapeutic is recorded at the end of the trial, and a mathematical model (e.g., linear or logistic regression) is trained to predict the response of each patient from their baseline gene expression data.
- Successful prediction of patient response would enable the sponsor of the clinical trial to use a genomic test to narrow the study population to a subset of patients where the drug is most likely to be successful. This improves the likelihood of success in a subsequent phase-III trial, while also improving patient outcomes through precision medicine.
- phase-II clinical trials tend to be small (200 people).
- sequencing experiments used to measure gene expression are still fairly expensive.
- the standard task involves training a regression model with up to 20 thousand features (i.e., the expression of the genes) using fewer than 200 measurements.
- a linear regression model is underdetermined if the number of features is greater than the number of measurements.
- raw gene expression values are combined into a smaller number of composite features.
- individual genes interact as parts of biochemical pathways, so one approach is to use known biochemical information to derive scores that describe the activation of pathways. Then, pathway activation scores can be used as features instead of raw expression values.
- Deep Boltzmann Machines are implemented as a tool for unsupervised feature learning that may be useful for 'omics studies.
- Let v be a vector containing gene expression values determined from an experiment.
- the model in accordance with many embodiments of the invention can be trained without labels; therefore, in some embodiments, a large data set can be compiled by combining many different studies.
- $\langle h_L \rangle_v = \int dh_1 \cdots dh_L \, h_L \, p(h_1, \ldots, h_L \mid v)$
- Predicting the effect that a change in the activity, or expression, of a gene will have in-human is important for both drug design and drug development. For example, if one could predict the effect that a compound will have in-human then one could perform high-throughput computational screens for drug discovery. Similarly, if one could predict the effect that an investigational drug will have on different types of patients then one could optimize patient selection for phase II clinical trials even though there is no direct data on the action of the drug in-human.
- transcriptomic responses are predicted using a generative model of gene expression.
- let $v$ be a vector of raw gene expression values and let $p_\theta(v)$ be a model of the distribution of gene expression values that is parameterized by $\theta$.
- the model is parameterized such that $\theta_i$ is related to the mean value of $v_i$, such that increasing (or decreasing) $\theta_i$ leads to an increase (or decrease) in $v_i$.
- the effect of a drug that decreases the activity of gene $i$ is simulated by decreasing $\theta_i$ and computing the change in
- the utility of generative models in accordance with several embodiments of the invention relies on the ability of the model to implicitly learn interactions between gene expression values. That is, the model must learn that decreasing the activity of gene $i$ using a therapeutic will, via a complex network of interactions, lead to a decrease in the expression of some other gene $j$.
- DBMs as described in previous sections of this application are used as a generative model that implicitly (i.e., without trying to construct a mechanistic understanding of biochemical pathways or other methods of direct gene interaction) learns interaction between genes.
- DBMs trained on gene expression data in a fully unsupervised manner do not have a notion of an individual patient. Instead, the vector of observations v can be broken into two pieces: the vector of gene expression values x and a vector of metadata y.
- predictions for individual patients in accordance with several embodiments of the invention can use a notion of locality in gene expression space.
- let $E(x \mid y) := -\log p_\theta(x \mid y)$ define the energy of $x$ given $y$.
- this also involves integrating over all the hidden layers.
- local measures of gene interactions can be computed from the derivatives of this energy evaluated at $x$.
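The text does not specify which derivatives are used. As one illustrative assumption, the sketch below takes second derivatives (the Hessian) of an energy function by central finite differences and checks it on a Gaussian stand-in, $E(x) = \tfrac{1}{2}(x-\mu)^\top \Lambda (x-\mu)$, where the Hessian recovers the precision matrix $\Lambda$ and its off-diagonal entries describe conditional couplings between genes. For a DBM, evaluating the energy of $x$ given $y$ would also require integrating over the hidden layers, which this sketch does not do. The function name `energy_hessian` is hypothetical.

```python
import numpy as np


def energy_hessian(energy_fn, x, step=1e-3):
    """Central finite-difference Hessian of energy_fn at the point x."""
    x = np.asarray(x, float)
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ei[i] = step
            ej = np.zeros(n)
            ej[j] = step
            H[i, j] = (energy_fn(x + ei + ej) - energy_fn(x + ei - ej)
                       - energy_fn(x - ei + ej) + energy_fn(x - ei - ej)) / (4 * step**2)
    return H
```

For a quadratic energy this finite-difference formula is exact up to rounding error, so the Gaussian case is a convenient check before applying the same routine to a learned energy.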
Description
$p(v,h) = Z^{-1} e^{-E(v,h)}, \quad (1)$
Here, $E(v,h)$ is called the energy function, and $Z = \int dv\, dh\, e^{-E(v,h)}$ is called the partition function. In many embodiments, processes use the integral operator $\int dx$ to denote both standard integration and a sum over all of the elements in a discrete set.
or, in vector notation, $E(v,h) = -a^\top v - b^\top h - v^\top W h$. Notice that visible units interact with the hidden units through the weights, $W$. However, there are no visible-visible or hidden-hidden interactions.
Similarly, it is easy to compute the conditional moments.
However, it is generally very difficult to compute statistics from the joint distribution. As a result, statistics from the joint distribution have to be estimated using random sampling processes such as Markov Chain Monte Carlo (MCMC).
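For a very small binary RBM, the partition function can be computed exactly by enumerating every state, which makes Eq. (1) and the RBM energy easy to check; the same enumeration shows why sampling is needed for realistic models, since the number of states grows as $2^{n_v + n_h}$. The sketch assumes binary visible and hidden units, and the function names are hypothetical.

```python
import itertools

import numpy as np


def energy(v, h, W, a, b):
    """E(v, h) = -a^T v - b^T h - v^T W h for binary vectors v and h."""
    return -a @ v - b @ h - v @ W @ h


def exact_joint(W, a, b):
    """Enumerate all binary (v, h) pairs of a tiny RBM; return the states and p(v, h)."""
    n_v, n_h = W.shape
    states = [(np.array(v, float), np.array(h, float))
              for v in itertools.product([0, 1], repeat=n_v)
              for h in itertools.product([0, 1], repeat=n_h)]
    weights = np.array([np.exp(-energy(v, h, W, a, b)) for v, h in states])
    Z = weights.sum()  # partition function: the integral is a sum here
    return states, weights / Z
```

Conditioning the enumerated joint on a fixed $v$ recovers $p(h_j = 1 \mid v) = \sigma(b_j + \sum_i v_i W_{ij})$, which is why conditional statistics are easy even though the joint distribution is not.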
In the standard formulation of an RBM, there are three parameters a, b, and W. The derivatives are:
Input: Initial configuration (v, h).
    A number of Monte Carlo steps, k.
    An RBM.
Output: A new configuration (v′, h′).
set v_0 = v, h_0 = h;
for i = 1, ..., k do
    draw h_i ~ p(h | v_{i−1});
    draw v_i ~ p(v | h_i);
end
return (v_k, h_k)
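The k-step Gibbs sampling algorithm above can be written in Python for a binary RBM with energy $E(v,h) = -a^\top v - b^\top h - v^\top W h$, for which $p(h_j = 1 \mid v) = \sigma(b_j + \sum_i v_i W_{ij})$ and $p(v_i = 1 \mid h) = \sigma(a_i + \sum_j W_{ij} h_j)$. This is a minimal sketch assuming Bernoulli layers; other layer types would replace the two conditional draws. It runs a batch of chains at once (one chain per row), and the name `gibbs_k` is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gibbs_k(v, h, W, a, b, k, rng):
    """k steps of block Gibbs sampling for a binary RBM.

    v: (batch, n_visible), h: (batch, n_hidden). The initial h is accepted to
    mirror the algorithm's input; the first step redraws it from p(h | v).
    """
    for _ in range(k):
        ph = sigmoid(b + v @ W)                        # p(h_j = 1 | v)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(a + h @ W.T)                      # p(v_i = 1 | h)
        v = (rng.random(pv.shape) < pv).astype(float)
    return v, h
```

Because there are no visible-visible or hidden-hidden interactions, each half-step samples an entire layer in parallel.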
where a(•) and b(•) are arbitrary functions, and σ>0 and ε>0 are scale parameters of the visible and hidden layers, respectively. Different functions (called layer types) are used to represent different types of data. Examples of layer types used for modeling various types of data are described below.
Both the location,
over the domain $v_i \ge v_i^{\text{low}}$. Both the location,
domain $v_i^{\text{low}} \le v_i \le v_i^{\text{high}}$. Both the location,
and then taking the energy as
The upper value $N_i$ is specified ahead of time.
where the outputs of the previous RBM are used as the inputs of the next RBM. It can be difficult to get information from the data distribution to propagate into the deep layers of the model when training a DBM in this forward layerwise way. As a result, it is generally difficult to train DBMs with more than a couple of hidden layers.
The energy function of the ADBM is:
For simplicity, this has been illustrated with a single autoregressive connection connecting the last hidden layer with its previous value. However, one skilled in the art will recognize that this model can be extended to include multiple time delays or inter-temporal connections between layers.
that the sample v was drawn from the data distribution. Therefore, the discriminator divergence can be written as
$D_D(p_d \| p_\theta) = -\log 2 - \int dv\, p_\theta(v) \log p(\text{data} \mid v) \quad (16)$
to show that it measures the probability that the optimal discriminator will incorrectly classify a sample drawn from the model distribution as coming from the data distribution.
which is convex with f(1)=0, as required. It can be shown that the discriminator divergence upper bounds the reverse KL divergence:
Input:
  n = number of epochs;
  m = number of fantasy particles;
  k = number of Gibbs sampling steps;
  α = weight of the likelihood and adversarial gradients
Initialize:
  sample F ~ pθ(v) using k steps of Gibbs sampling;
for epoch = 1, ..., n do
  while True do
    V ← minibatch;
    if len(V) == 0 then
      break;
    end
    sample F ~ pθ(v) using k steps of Gibbs sampling;
    compute the log-likelihood gradient g_ℒ(V, F, θ);
    encode Ṽ = {E_{pθ(h|v)}[h]}_{v∈V} and F̃ = {E_{pθ(h|v)}[h]}_{v∈F};
    train discriminator on Ṽ and F̃;
    compute the adversarial gradient g_V(F̃, θ);
    compute the full gradient g = α g_ℒ + (1 − α) g_V;
    update the model parameters using the gradient;
  end
end
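The training loop above can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation, and it rests on several assumptions the text leaves open: a Bernoulli RBM energy E(v,h) = −aᵀv − bᵀh − vᵀWh, persistent fantasy particles F, a logistic-regression discriminator (retrained at every step) whose logit serves as the critic T, and an adversarial gradient of E_{pθ}[T] estimated with the identity ∂E_p[T]/∂θ = Cov_p(T, −∂E/∂θ) while treating the critic as fixed. All class and function names are hypothetical; iterating over a shuffled index plays the role of the `while True ... break` minibatch loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BernoulliRBM:
    """Bernoulli RBM with (assumed) energy E(v, h) = -a.v - b.h - v^T W h."""

    def __init__(self, n_vis, n_hid):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.a = np.zeros(n_vis)
        self.b = np.zeros(n_hid)

    def encode(self, v):
        # E_{p(h|v)}[h]: the encoding applied to both data and fantasy particles
        return sigmoid(v @ self.W + self.b)

    def gibbs(self, v, k):
        for _ in range(k):
            h = (rng.random((len(v), self.b.size)) < self.encode(v)).astype(float)
            v = (rng.random(v.shape) < sigmoid(h @ self.W.T + self.a)).astype(float)
        return v

def train_discriminator(enc_data, enc_fake, steps=50, lr=0.5):
    """Hypothetical critic: logistic regression on encodings, label 1 = data."""
    X = np.vstack([enc_data, enc_fake])
    y = np.concatenate([np.ones(len(enc_data)), np.zeros(len(enc_fake))])
    w, c = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        err = y - sigmoid(X @ w + c)
        w += lr * X.T @ err / len(y)
        c += lr * err.mean()
    return w, c

def train_beam_sketch(data, n_hid=8, n_epochs=5, m=64, k=5, alpha=0.5,
                      batch_size=32, lr=0.05):
    rbm = BernoulliRBM(data.shape[1], n_hid)
    F = rbm.gibbs(rng.integers(0, 2, (m, data.shape[1])).astype(float), k)
    for _ in range(n_epochs):
        order = rng.permutation(len(data))
        for start in range(0, len(data), batch_size):  # ends when no minibatch is left
            V = data[order[start:start + batch_size]]
            F = rbm.gibbs(F, k)  # persistent fantasy particles (an assumption)
            enc_V, enc_F = rbm.encode(V), rbm.encode(F)
            # log-likelihood gradient: data statistics minus model statistics
            g_L = (V.T @ enc_V / len(V) - F.T @ enc_F / m,
                   V.mean(0) - F.mean(0),
                   enc_V.mean(0) - enc_F.mean(0))
            w, c = train_discriminator(enc_V, enc_F)
            T = enc_F @ w + c  # critic score of each fantasy particle
            T_c = (T - T.mean())[:, None]
            # adversarial gradient: Cov_p(T, -dE/dtheta) estimated from fantasy particles
            g_A = (F.T @ (T_c * enc_F) / m, (T_c * F).mean(0), (T_c * enc_F).mean(0))
            for param, gl, ga in zip((rbm.W, rbm.a, rbm.b), g_L, g_A):
                param += lr * (alpha * gl + (1 - alpha) * ga)  # in-place gradient ascent
    return rbm
```

Because T here depends on v only through its encoding, replacing sampled hidden units by E[h|v] in the covariance is exact given each fantasy particle, which keeps the adversarial estimate low-variance.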
=−γ−(1−γ), (18)
which includes a contribution from an adversarial term computed from a critic. Adversarial terms in accordance with a number of embodiments of the invention can be defined as
$:= \int dv\, dh\, p_\theta(v,h)\, T(v,h). \qquad (19)$
where T(v,h) is a critic function. In some embodiments, the adversary uses the same architecture and weights as the RBM, and encodes visible units into hidden unit activations. These hidden unit activations, computed for both the data and fantasy particles sampled from the RBM, are used by a critic to estimate the distance between the data and model distributions.
where ε is a small parameter that regularizes the inverse distance.
However, the estimates may have a high variance when N is small. On the other hand, mean field estimates, such as those derived from the Thouless-Anderson-Palmer (TAP) expansion, are analytic and have zero variance, but have a bias that can be difficult to control. Let $f(\omega) = \omega f_{MC} + (1-\omega) f_{MF}$ be an estimate created from a convex combination of a Monte Carlo estimate $f_{MC}$ and a mean field estimate $f_{MF}$. Because $f_{MC}$ is unbiased and $f_{MF}$ has zero variance, it is easy to show that $\mathrm{Bias}^2[f] = (1-\omega)^2\,\mathrm{Bias}^2[f_{MF}]$ and $\mathrm{Var}[f] = \omega^2\,\mathrm{Var}[f_{MC}]$, so that the mean squared error of $f$ is $\mathrm{MSE}[f] = \mathrm{Bias}^2[f] + \mathrm{Var}[f] = (1-\omega)^2\,\mathrm{Bias}^2[f_{MF}] + \omega^2\,\mathrm{Var}[f_{MC}]$. Therefore, one can choose a value of $\omega$ that minimizes the mean squared error of the combined estimator.
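Setting the derivative of this MSE, −2(1−ω)Bias²[f_MF] + 2ωVar[f_MC], to zero gives the minimizing weight ω* = Bias²[f_MF]/(Bias²[f_MF] + Var[f_MC]). The toy check below uses a problem with a known answer: an unbiased Monte Carlo estimator (the mean of 20 Gaussian draws) and a fixed, deliberately biased stand-in for the mean field value. Both are hypothetical; in practice Bias²[f_MF] is not known and would itself have to be estimated.

```python
import numpy as np

def optimal_weight(bias2_mf, var_mc):
    """MSE-minimizing weight omega on the Monte Carlo estimate."""
    return bias2_mf / (bias2_mf + var_mc)

def combined_mse(omega, bias2_mf, var_mc):
    """MSE[f] = (1 - omega)^2 Bias^2[f_MF] + omega^2 Var[f_MC]."""
    return (1 - omega) ** 2 * bias2_mf + omega ** 2 * var_mc

# Toy problem (hypothetical): estimate E[x] = 1.0 for x ~ Normal(1, 2^2).
rng = np.random.default_rng(0)
true_value, n_samples, n_trials = 1.0, 20, 20000
f_mf = 1.3  # deterministic (zero-variance) but biased stand-in for a mean field value
f_mc = rng.normal(true_value, 2.0, size=(n_trials, n_samples)).mean(axis=1)  # unbiased
bias2_mf, var_mc = (f_mf - true_value) ** 2, 4.0 / n_samples
w = optimal_weight(bias2_mf, var_mc)
f = w * f_mc + (1 - w) * f_mf
empirical_mse = np.mean((f - true_value) ** 2)  # should match combined_mse(w, ...)
```

At ω*, the MSE equals Bias²Var/(Bias² + Var), which is never larger than the MSE of either estimator used alone.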
Tempered Sampling
$p_\beta(v,h) = Z_\beta^{-1} e^{-\beta E(v,h)}$
The original distribution of the Boltzmann machine is recovered by setting β=1.
π˜e β(E
Therefore, decreasing β will decrease the number of Gibbs sampling steps required to move between distant configurations.
Input:
  Variance of the distribution Var[β] < 1.
  Current value of β.
Set: v = 1/Var[β] and c = (1 − φ)Var[β].
Draw z ~ Poisson(βφ/c).
Draw β′ ~ Gamma(v + z, c).
return β′
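Reading Gamma(·, c) as (shape, scale), which is consistent with the initialization β_i ~ Gamma(1/Var[β], Var[β]) having mean 1, the update above is a Poisson-gamma autoregressive step. Its stationary distribution is Gamma(1/Var[β], Var[β]), so β has mean 1 and variance Var[β], and E[β′|β] = (1 − φ) + φβ, so φ acts as the lag-1 autocorrelation. The sketch below (hypothetical function name, φ taken to be the autocorrelation coefficient listed among the inputs of the sampling algorithm) checks these properties empirically.

```python
import numpy as np

def driven_gamma_step(beta, var_beta, phi, rng):
    """One driven gamma update; numpy's gamma takes (shape, scale)."""
    shape = 1.0 / var_beta          # "v" in the algorithm
    scale = (1.0 - phi) * var_beta  # "c" in the algorithm
    z = rng.poisson(beta * phi / scale)
    return rng.gamma(shape + z, scale)

rng = np.random.default_rng(1)
var_beta, phi, n_steps = 0.1, 0.8, 100_000
betas = np.empty(n_steps)
beta = rng.gamma(1.0 / var_beta, var_beta)  # start from the stationary distribution
for t in range(n_steps):
    beta = driven_gamma_step(beta, var_beta, phi, rng)
    betas[t] = beta

lag1 = np.corrcoef(betas[:-1], betas[1:])[0, 1]  # empirical lag-1 autocorrelation
```

Because the chain is mean-reverting around β = 1, excursions to small β (high temperature) are temporary, and the chain spends most of its time near the original distribution.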
Input:
  Number of samples m.
  Number of update steps k.
  Autocorrelation coefficient for the
  Variance of the inverse temperature Var[β] < 1.
Initialize:
  Randomly initialize m samples {(v_i, h_i)}_{i=1}^{m}.
  Randomly initialize m inverse temperatures β_i ~ Gamma(1/Var[β], Var[β]).
for t = 1, ..., k do
  for i = 1, ..., m do
    Update β_i using a driven gamma sampler.
    Update (v_i, h_i) using Gibbs sampling.
  end
end
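Putting the two pieces together, a minimal NumPy sketch of the tempered sampler might look like this. It assumes a Bernoulli RBM with energy E(v,h) = −aᵀv − bᵀh − vᵀWh, so that under p_β ∝ e^{−βE} each Gibbs conditional is a logistic function of β times the usual input; the function names are hypothetical. Because the m chains do not interact, the inner loop over i is vectorized, which is equivalent to updating the chains one at a time.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def driven_gamma_step(beta, var_beta, phi, rng):
    """Vectorized driven gamma update with Gamma(shape, scale) parameterization."""
    scale = (1.0 - phi) * var_beta
    z = rng.poisson(beta * phi / scale)
    return rng.gamma(1.0 / var_beta + z, scale)

def tempered_gibbs(W, a, b, m, k, phi, var_beta, seed=0):
    """Run m Bernoulli-RBM chains, each at its own fluctuating inverse temperature."""
    rng = np.random.default_rng(seed)
    n_vis, n_hid = W.shape
    v = rng.integers(0, 2, size=(m, n_vis)).astype(float)
    h = rng.integers(0, 2, size=(m, n_hid)).astype(float)
    beta = rng.gamma(1.0 / var_beta, var_beta, size=m)  # mean 1, variance Var[beta]
    for _ in range(k):
        beta = driven_gamma_step(beta, var_beta, phi, rng)
        bcol = beta[:, None]  # scale each chain's energy by its own beta_i
        h = (rng.random((m, n_hid)) < sigmoid(bcol * (v @ W + b))).astype(float)
        v = (rng.random((m, n_vis)) < sigmoid(bcol * (h @ W.T + a))).astype(float)
    return v, h, beta
```

Chains whose β_i drifts below 1 see flatter conditionals and therefore move between distant configurations more easily, while chains near β = 1 sample approximately from the original model.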
Claims (16)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/249,854 US11636309B2 (en) | 2018-01-17 | 2019-01-16 | Systems and methods for modeling probability distributions |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862618440P | 2018-01-17 | 2018-01-17 | |
US201962792648P | 2019-01-15 | 2019-01-15 | |
US16/249,854 US11636309B2 (en) | 2018-01-17 | 2019-01-16 | Systems and methods for modeling probability distributions |
Publications (2)
Publication Number | Publication Date |
---|---|
US20190220733A1 US20190220733A1 (en) | 2019-07-18 |
US11636309B2 true US11636309B2 (en) | 2023-04-25 |
Family
ID=67214040
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/249,854 Active 2040-10-13 US11636309B2 (en) | 2018-01-17 | 2019-01-16 | Systems and methods for modeling probability distributions |
Country Status (6)
Country | Link |
---|---|
US (1) | US11636309B2 (en) |
EP (1) | EP3740908A4 (en) |
JP (2) | JP7305656B2 (en) |
CN (1) | CN111758108A (en) |
CA (1) | CA3088204A1 (en) |
WO (1) | WO2019143737A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11868900B1 (en) | 2023-02-22 | 2024-01-09 | Unlearn.AI, Inc. | Systems and methods for training predictive models that ignore missing features |
US12008478B2 (en) | 2019-10-18 | 2024-06-11 | Unlearn.AI, Inc. | Systems and methods for training generative models using summary statistics and other constraints |
US12020789B1 (en) | 2023-02-17 | 2024-06-25 | Unlearn.AI, Inc. | Systems and methods enabling baseline prediction correction |
US12051487B2 (en) | 2019-08-23 | 2024-07-30 | Unlearn.AI, Inc. | Systems and methods for supplementing data with generative models |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7305656B2 (en) | 2018-01-17 | 2023-07-10 | アンラーン.エーアイ, インコーポレイテッド | Systems and methods for modeling probability distributions |
US11769070B2 (en) | 2019-10-09 | 2023-09-26 | Cornell University | Quantum computing based hybrid solution strategies for large-scale discrete-continuous optimization problems |
CN110751291B (en) * | 2019-10-29 | 2021-02-12 | 支付宝(杭州)信息技术有限公司 | Method and device for realizing multi-party combined training neural network of security defense |
WO2021257128A2 (en) * | 2020-02-14 | 2021-12-23 | Cornell University | Quantum computing based deep learning for detection, diagnosis and other applications |
CN111563721B (en) * | 2020-04-21 | 2023-07-11 | 上海爱数信息技术股份有限公司 | Mail classification method suitable for different label distribution occasions |
EP3902314B1 (en) * | 2020-04-21 | 2022-10-12 | Rohde & Schwarz GmbH & Co. KG | Method of training a test system for mobile network testing, test system as well as method of mobile testing |
US20210374488A1 (en) * | 2020-06-01 | 2021-12-02 | Salesforce.Com, Inc. | Systems and methods for a k-nearest neighbor based mechanism of natural language processing models |
US11076824B1 (en) * | 2020-08-07 | 2021-08-03 | Shenzhen Keya Medical Technology Corporation | Method and system for diagnosis of COVID-19 using artificial intelligence |
US11847390B2 (en) * | 2021-01-05 | 2023-12-19 | Capital One Services, Llc | Generation of synthetic data using agent-based simulations |
US12106026B2 (en) | 2021-01-05 | 2024-10-01 | Capital One Services, Llc | Extensible agents in agent-based generative models |
JP2024506976A (en) * | 2021-02-22 | 2024-02-15 | ベーリンガー インゲルハイム インターナショナル ゲゼルシャフト ミット ベシュレンクテル ハフツング | System and method for measuring therapeutic efficacy of drugs |
US11282609B1 (en) * | 2021-06-13 | 2022-03-22 | Chorus Health Inc. | Modular data system for processing multimodal data and enabling parallel recommendation system processing |
CN113449205B (en) * | 2021-08-30 | 2021-11-09 | 四川省人工智能研究院(宜宾) | Recommendation method and system based on metadata enhancement |
WO2023233664A1 (en) * | 2022-06-03 | 2023-12-07 | 日本電気株式会社 | Optimization device, optimization method, and program |
US20240169187A1 (en) * | 2022-11-16 | 2024-05-23 | Unlearn.AI, Inc. | Systems and Methods for Supplementing Data With Generative Models |
WO2024118360A1 (en) * | 2022-12-02 | 2024-06-06 | Valo Health, Inc. | System and method for predicting and optimizing clinical trial outcomes |
CN115936008B (en) * | 2022-12-23 | 2023-10-31 | 中国电子产业工程有限公司 | Training method of text modeling model, text modeling method and device |
Citations (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040193019A1 (en) | 2003-03-24 | 2004-09-30 | Nien Wei | Methods for predicting an individual's clinical treatment outcome from sampling a group of patient's biological profiles |
WO2006084196A2 (en) | 2005-02-04 | 2006-08-10 | Entelos, Inc. | Method for defining virtual patient populations |
US20080082359A1 (en) | 2006-09-29 | 2008-04-03 | Searete Llc, A Limited Liability Corporation Of State Of Delaware | Computational systems for biomedical data |
US20090326976A1 (en) | 2008-06-26 | 2009-12-31 | Macdonald Morris | Estimating healthcare outcomes for individuals |
US20100235310A1 (en) | 2009-01-27 | 2010-09-16 | Gage Fred H | Temporally dynamic artificial neural networks |
US20100254973A1 (en) | 2007-06-21 | 2010-10-07 | The Nemours Foundation | Materials and Methods for Diagnosis of Asthma |
US20110218817A1 (en) | 2008-11-12 | 2011-09-08 | Spiegel Rene | Method for carrying out clinical studies and method for establishing a prognosis model for clinical studies |
US8150629B2 (en) | 2005-11-10 | 2012-04-03 | In Silico Biosciences | Method and apparatus for computer modeling of the interaction between and among cortical and subcortical areas in the human brain for the purpose of predicting the effect of drugs in psychiatric and cognitive diseases |
US20140019059A1 (en) | 2012-07-13 | 2014-01-16 | Medical Care Corporation | Mapping Cognitive to Functional Ability |
US20140257128A1 (en) | 2011-06-01 | 2014-09-11 | Drexel University | System and method of detecting and predicting seizures |
US20150010610A1 (en) | 2010-02-18 | 2015-01-08 | Osiris Therapeutics, Inc. | Immunocompatible amniotic membrane products |
US20160140300A1 (en) | 2013-06-12 | 2016-05-19 | University Health Network | Method and system for automated quality assurance and automated treatment planning in radiation therapy |
US20160180053A1 (en) | 2014-12-18 | 2016-06-23 | Fresenius Medical Care Holdings, Inc. | System And Method Of Conducting In Silico Clinical Trials |
US20160222448A1 (en) | 2013-09-27 | 2016-08-04 | The Regents Of The University Of California | Method to estimate the age of tissues and cell types based on epigenetic markers |
WO2016145379A1 (en) | 2015-03-12 | 2016-09-15 | William Marsh Rice University | Automated Compilation of Probabilistic Task Description into Executable Neural Network Specification |
US20170286627A1 (en) | 2016-03-30 | 2017-10-05 | Jacob Barhak | Analysis and verification of models derived from clinical trials data extracted from a database |
US20170344706A1 (en) | 2011-11-11 | 2017-11-30 | Rutgers, The State University Of New Jersey | Systems and methods for the diagnosis and treatment of neurological disorders |
US20170357844A1 (en) | 2016-06-09 | 2017-12-14 | Siemens Healthcare Gmbh | Image-based tumor phenotyping with machine learning from synthetic data |
US20170372193A1 (en) | 2016-06-23 | 2017-12-28 | Siemens Healthcare Gmbh | Image Correction Using A Deep Generative Machine-Learning Model |
US20180018590A1 (en) | 2016-07-18 | 2018-01-18 | NantOmics, Inc. | Distributed Machine Learning Systems, Apparatus, and Methods |
US20180046780A1 (en) | 2015-04-22 | 2018-02-15 | Antidote Technologies Ltd. | Computer implemented method for determining clinical trial suitability or relevance |
US20180315505A1 (en) | 2017-04-27 | 2018-11-01 | Siemens Healthcare Gmbh | Optimization of clinical decision making |
US20190019570A1 (en) | 2017-07-12 | 2019-01-17 | Fresenius Medical Care Holdings, Inc. | Techniques for conducting virtual clinical trials |
WO2019143737A1 (en) | 2018-01-17 | 2019-07-25 | Unlearn Ai, Inc. | Systems and methods for modeling probability distributions |
US10398389B1 (en) | 2016-04-11 | 2019-09-03 | Pricewaterhousecoopers Llp | System and method for physiological health simulation |
US20190303471A1 (en) | 2018-03-29 | 2019-10-03 | International Business Machines Corporation | Missing value imputation using adaptive ordering and clustering analysis |
US20200035362A1 (en) | 2018-07-27 | 2020-01-30 | University Of Miami | System and method for ai-based eye condition determinations |
US10650929B1 (en) | 2017-06-06 | 2020-05-12 | PathAI, Inc. | Systems and methods for training a model to predict survival time for a patient |
US10726954B2 (en) | 2015-04-22 | 2020-07-28 | Reciprocal Labs Corporation | Predictive modeling of respiratory disease risk and events |
US20200357490A1 (en) | 2019-05-07 | 2020-11-12 | International Business Machines Corporation | System for creating a virtual clinical trial from electronic medical records |
US20200395103A1 (en) | 2018-02-21 | 2020-12-17 | Klaritos, Inc. | Methods of performing clinical trials |
US20200411199A1 (en) | 2018-01-22 | 2020-12-31 | Cancer Commons | Platforms for conducting virtual trials |
US20210057108A1 (en) | 2019-08-23 | 2021-02-25 | Unlearn.AI, Inc. | Systems and Methods for Supplementing Data with Generative Models |
US20210117842A1 (en) | 2019-10-18 | 2021-04-22 | Unlearn.AI, Inc. | Systems and Methods for Training Generative Models Using Summary Statistics and Other Constraints |
US20210353203A1 (en) | 2020-05-13 | 2021-11-18 | Rce Technologies, Inc. | Diagnostics for detection of ischemic heart disease |
WO2022101809A1 (en) | 2020-11-10 | 2022-05-19 | University Of Southern California | Noninvasive heart failure detection |
US20220157413A1 (en) | 2019-08-23 | 2022-05-19 | Unlearn.AI, Inc. | Systems and Methods for Designing Augmented Randomized Trials |
US20220172085A1 (en) | 2020-12-01 | 2022-06-02 | Unlearn.AI, Inc. | Methods and Systems to Account for Uncertainties from Missing Covariates in Generative Model Predictions |
WO2022187064A1 (en) | 2021-03-01 | 2022-09-09 | Evelo Biosciences, Inc. | Compositions and methods of treating inflammation using prevotella histicola |
-
2019
- 2019-01-16 JP JP2020539258A patent/JP7305656B2/en active Active
- 2019-01-16 EP EP19741291.9A patent/EP3740908A4/en active Pending
- 2019-01-16 CA CA3088204A patent/CA3088204A1/en active Pending
- 2019-01-16 US US16/249,854 patent/US11636309B2/en active Active
- 2019-01-16 CN CN201980014482.7A patent/CN111758108A/en active Pending
- 2019-01-16 WO PCT/US2019/013870 patent/WO2019143737A1/en unknown
-
2021
- 2021-11-15 JP JP2021185425A patent/JP2022031730A/en not_active Withdrawn
Patent Citations (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040193019A1 (en) | 2003-03-24 | 2004-09-30 | Nien Wei | Methods for predicting an individual's clinical treatment outcome from sampling a group of patient's biological profiles |
WO2006084196A2 (en) | 2005-02-04 | 2006-08-10 | Entelos, Inc. | Method for defining virtual patient populations |
US8150629B2 (en) | 2005-11-10 | 2012-04-03 | In Silico Biosciences | Method and apparatus for computer modeling of the interaction between and among cortical and subcortical areas in the human brain for the purpose of predicting the effect of drugs in psychiatric and cognitive diseases |
US20080082359A1 (en) | 2006-09-29 | 2008-04-03 | Searete Llc, A Limited Liability Corporation Of State Of Delaware | Computational systems for biomedical data |
US20100254973A1 (en) | 2007-06-21 | 2010-10-07 | The Nemours Foundation | Materials and Methods for Diagnosis of Asthma |
US20090326976A1 (en) | 2008-06-26 | 2009-12-31 | Macdonald Morris | Estimating healthcare outcomes for individuals |
US20110218817A1 (en) | 2008-11-12 | 2011-09-08 | Spiegel Rene | Method for carrying out clinical studies and method for establishing a prognosis model for clinical studies |
US20100235310A1 (en) | 2009-01-27 | 2010-09-16 | Gage Fred H | Temporally dynamic artificial neural networks |
US20150010610A1 (en) | 2010-02-18 | 2015-01-08 | Osiris Therapeutics, Inc. | Immunocompatible amniotic membrane products |
US20140257128A1 (en) | 2011-06-01 | 2014-09-11 | Drexel University | System and method of detecting and predicting seizures |
US20170344706A1 (en) | 2011-11-11 | 2017-11-30 | Rutgers, The State University Of New Jersey | Systems and methods for the diagnosis and treatment of neurological disorders |
US20140019059A1 (en) | 2012-07-13 | 2014-01-16 | Medical Care Corporation | Mapping Cognitive to Functional Ability |
US20160140300A1 (en) | 2013-06-12 | 2016-05-19 | University Health Network | Method and system for automated quality assurance and automated treatment planning in radiation therapy |
US20160222448A1 (en) | 2013-09-27 | 2016-08-04 | The Regents Of The University Of California | Method to estimate the age of tissues and cell types based on epigenetic markers |
US20160180053A1 (en) | 2014-12-18 | 2016-06-23 | Fresenius Medical Care Holdings, Inc. | System And Method Of Conducting In Silico Clinical Trials |
WO2016145379A1 (en) | 2015-03-12 | 2016-09-15 | William Marsh Rice University | Automated Compilation of Probabilistic Task Description into Executable Neural Network Specification |
US20180046780A1 (en) | 2015-04-22 | 2018-02-15 | Antidote Technologies Ltd. | Computer implemented method for determining clinical trial suitability or relevance |
US10726954B2 (en) | 2015-04-22 | 2020-07-28 | Reciprocal Labs Corporation | Predictive modeling of respiratory disease risk and events |
US20170286627A1 (en) | 2016-03-30 | 2017-10-05 | Jacob Barhak | Analysis and verification of models derived from clinical trials data extracted from a database |
US10398389B1 (en) | 2016-04-11 | 2019-09-03 | Pricewaterhousecoopers Llp | System and method for physiological health simulation |
US20170357844A1 (en) | 2016-06-09 | 2017-12-14 | Siemens Healthcare Gmbh | Image-based tumor phenotyping with machine learning from synthetic data |
US20170372193A1 (en) | 2016-06-23 | 2017-12-28 | Siemens Healthcare Gmbh | Image Correction Using A Deep Generative Machine-Learning Model |
US20180018590A1 (en) | 2016-07-18 | 2018-01-18 | NantOmics, Inc. | Distributed Machine Learning Systems, Apparatus, and Methods |
US20180315505A1 (en) | 2017-04-27 | 2018-11-01 | Siemens Healthcare Gmbh | Optimization of clinical decision making |
US10650929B1 (en) | 2017-06-06 | 2020-05-12 | PathAI, Inc. | Systems and methods for training a model to predict survival time for a patient |
US20190019570A1 (en) | 2017-07-12 | 2019-01-17 | Fresenius Medical Care Holdings, Inc. | Techniques for conducting virtual clinical trials |
JP2021511584A (en) | 2018-01-17 | 2021-05-06 | アンラーン.エーアイ, インコーポレイテッド | Systems and methods for modeling probability distributions |
CA3088204A1 (en) | 2018-01-17 | 2019-07-25 | Unlearn.AI, Inc. | Systems and methods for modeling probability distributions using restricted and deep boltzmann machines |
WO2019143737A1 (en) | 2018-01-17 | 2019-07-25 | Unlearn Ai, Inc. | Systems and methods for modeling probability distributions |
CN111758108A (en) | 2018-01-17 | 2020-10-09 | 非学习人工智能股份有限公司 | System and method for modeling probability distributions |
EP3740908A1 (en) | 2018-01-17 | 2020-11-25 | Unlearn AI, Inc. | Systems and methods for modeling probability distributions |
JP2022031730A (en) | 2018-01-17 | 2022-02-22 | アンラーン.エーアイ, インコーポレイテッド | System and method for modeling probability distribution |
US20200411199A1 (en) | 2018-01-22 | 2020-12-31 | Cancer Commons | Platforms for conducting virtual trials |
US20200395103A1 (en) | 2018-02-21 | 2020-12-17 | Klaritos, Inc. | Methods of performing clinical trials |
US20190303471A1 (en) | 2018-03-29 | 2019-10-03 | International Business Machines Corporation | Missing value imputation using adaptive ordering and clustering analysis |
US20200035362A1 (en) | 2018-07-27 | 2020-01-30 | University Of Miami | System and method for ai-based eye condition determinations |
US20200357490A1 (en) | 2019-05-07 | 2020-11-12 | International Business Machines Corporation | System for creating a virtual clinical trial from electronic medical records |
WO2021041128A1 (en) | 2019-08-23 | 2021-03-04 | Unlearn.AI, Inc. | Systems and methods for supplementing data with generative models |
US20210057108A1 (en) | 2019-08-23 | 2021-02-25 | Unlearn.AI, Inc. | Systems and Methods for Supplementing Data with Generative Models |
US20220157413A1 (en) | 2019-08-23 | 2022-05-19 | Unlearn.AI, Inc. | Systems and Methods for Designing Augmented Randomized Trials |
EP4018394A1 (en) | 2019-08-23 | 2022-06-29 | Unlearn.AI, Inc. | Systems and methods for supplementing data with generative models |
US20210117842A1 (en) | 2019-10-18 | 2021-04-22 | Unlearn.AI, Inc. | Systems and Methods for Training Generative Models Using Summary Statistics and Other Constraints |
US20210353203A1 (en) | 2020-05-13 | 2021-11-18 | Rce Technologies, Inc. | Diagnostics for detection of ischemic heart disease |
WO2022101809A1 (en) | 2020-11-10 | 2022-05-19 | University Of Southern California | Noninvasive heart failure detection |
US20220172085A1 (en) | 2020-12-01 | 2022-06-02 | Unlearn.AI, Inc. | Methods and Systems to Account for Uncertainties from Missing Covariates in Generative Model Predictions |
WO2022120350A2 (en) | 2020-12-01 | 2022-06-09 | Unlearn.AI, Inc. | Methods and systems to account for uncertainties from missing covariates in generative model predictions |
WO2022187064A1 (en) | 2021-03-01 | 2022-09-09 | Evelo Biosciences, Inc. | Compositions and methods of treating inflammation using prevotella histicola |
Non-Patent Citations (52)
Title |
---|
Aditya Grover, Manik Dhar, Stefano Ermon, "Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models", arXiv.org, Cornell University Library, 201 Olin Library Cornell University, Ithaca, NY 14853, May 24, 2017, XP081405378 |
Akhtar et al., "Improving the Robustness of Neural Networks Using K-Support Norm Based Adversarial Training", IEEE Access; Publication [online], Dec. 28, 2016, 10 pgs. |
Arici et al., "Associative Adversarial Networks", arXiv:1611.06953v1 [cs.LG], Nov. 18, 2016, 8 pgs. |
Arjovsky et al., "Wasserstein Generative Adversarial Networks", Proceedings of the 34th International Conference on Machine Learning, 2017, 32 pgs. |
Arjovsky, "Wasserstein gan," http://github.com/martinarjovsky/WassersteinGAN, 2017, 10 pgs. |
Balzer et al., "Adaptive pair-matching in randomized trials with unbiased and efficient effect estimation", Statist. Med. 2015, vol. 34, pp. 999-1011; DOI: 10.1002/sim.6380. |
Bengio et al., "Greedy Layer-Wise Training of Deep Networks", Advances in neural information processing systems, 2007, 13 pgs. |
Charles K. Fisher, Aaron M. Smith, Jonathan R. Walsh, "Boltzmann Encoded Adversarial Machines", arXiv.org, Cornell University Library, 201 Olin Library Cornell University, Ithaca, NY 14853, Apr. 23, 2018, XP081229135 |
Chatterjee et al. "Explaining Complex Distributions with Simple Models." 2008. Econophysics. pp. 1-14 (Year: 2008). * |
Cho et al., "Gaussian-Bernoulli deep Boltzmann machine", Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, Texas, Aug. 4-9, 2013, 9 pgs. |
Cho et al., "Gaussian-Bernoulli deep Boltzmann machine", Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, Texas, Aug. 4-9, 2013, 9 pgs. (Year: 2013). * |
Cui et al., "Multilevel Modeling and Value of Information in Clinical Trial Decision Support", BMC Systems Biology (2014) 8:6; DOI 10.1186/s12918-014-0140-0. |
Dutt et al. "Generative Adversarial Networks (GAN) Review." CVR Journal of Science and Technology, vol. 13, Dec. 2017 , pp. 1-5 (Year: 2017). * |
Extended European Search Report for European Application No. 19741291.9, Search completed Sep. 8, 2021, dated Sep. 17, 2021, 12 Pgs. |
Fisher et al., "Boltzmann Encoded Adversarial Machines", Arxiv.org: 1804.08682v1, Apr. 23, 2018, XP081229135, 17 pgs. |
Gabrie et al., "Training Restricted Boltzmann Machines via the Thouless-Anderson-Palmer Free Energy", Advances in Neural Information Processing Systems, vol. 28, 2015, 9 pgs. |
Goodfellow et al. "Generative Adversarial Networks." 2014. arXiv:1406.2661 (Year: 2014). * |
Goodfellow et al., "Generative Adversarial Nets", Advances in Neural Information Processing Systems, vol. 27, 2014, 9 pgs. |
Greydanus, "Generative Adversarial Networks for the MNIST dataset", "Mnist gan," http://github.com/greydanus/mnist-gan, 2017, 2 pgs. |
Grover et al., "Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models", Arxiv.org, Cornell University Library, May 24, 2017, XP081405378, 10 pgs. |
Hinton et al., "Reducing the Dimensionality of Data with Neural Networks", Science, vol. 313, No. 5786, Jul. 28, 2006, pp. 504-507. |
Hinton, "A Fast Learning Algorithm for Deep Belief Nets", Neural Computation 18, 2006, pp. 1527-1554. |
Hinton, "A Practical Guide to Training Restricted Boltzmann Machines", Neural networks: Tricks of the trade, Springer, Berlin, Heidelberg, 2012, 21 pgs. |
Hinton, "A Practical Guide to Training Restricted Boltzmann Machines", Neural networks: Tricks of the trade, Springer, Berlin, Heidelberg, 2012, 21 pgs. (Year: 2012). * |
International Preliminary Report on Patentability for International Application PCT/US2019/013870 Report dated Jul. 21, 2020, Mailed Jul. 30, 2020, 4 Pgs. |
International Preliminary Report on Patentability for International Application PCT/US2020/047054, Report dated Feb. 17, 2022, Mailed on Mar. 3, 2022, 5 Pgs. |
International Search Report and Written Opinion for International Application No. PCT/US2019/013870, Search completed Mar. 18, 2019, dated Mar. 27, 2019, 9 pgs. |
International Search Report and Written Opinion for International Application No. PCT/US2020/047054, Search completed Oct. 8, 2020, dated Nov. 23, 2020, 10 Pgs. |
International Search Report and Written Opinion for International Application PCT/US2021/072678, search completed Jan. 31, 2022, dated Jul. 1, 2022, 13 Pgs. |
Karcher et al., "The "RCT augmentation": a novel simulation method to add patient heterogeneity into phase III trials", BMC Medical Research Methodology (2018) 18:75; https://doi.org/10.1186/s12874-018-0534-6. |
Kim et al., "Deep Directed Generative Models with Energy-Based Probability Estimation", Arxiv.org, Cornell University Library, Jun. 10, 2016, XP080707281, 9 pgs. |
Kullback et al., "On Information and Sufficiency", The Annals of Mathematical Statistics, vol. 22, No. 1, 1951, pp. 79-86. |
Lamb et al. "GibbsNet: Iterative Adversarial Inference for Deep Graphical Models", 2017, arXiv preprint arXiv:1712.04120v1, 11 pages (Year: 2017). * |
Li et al. "Temperature based Restricted Boltzmann Machines." Jan. 13, 2016. Scientific Reports 6, 12 pages. (Year: 2016). * |
Liu et al., "A Survey of Deep Neural Network Architectures and their Applications", Neurocomputing, Elsevier, Dec. 18, 2016, pp. 11-26. |
López-Ruiz, et al. "Equiprobability, Entropy, Gamma Distributions and Other Geometrical Questions in Multi-Agent Systems." Entropy 2009, 11, 959-971. (Year: 2009). * |
Miotto et al., "Deep Learning for Healthcare: Review, Opportunities and Challenges", Briefings in Bioinformatics 19, No. 6 (2017), 11 pgs. |
Montavon et al., "Wasserstein Training of Restricted Boltzmann Machines", Advances in Neural Information Processing Systems, vol. 29, 2016, 9 pgs. |
Nguyen et al. "Latent Patient Profile Modelling and Applications with Mixed-Variate Restricted Boltzmann Machine." 2013, Advances in Knowledge Discovery and Data Mining, pp. 123-135 (Year: 2013). * |
Nguyen et al. "Supervised Restricted Boltzmann Machines." UAI. 2017. (Year: 2017). * |
Rogers et al., "Combining patient-level and summary-level data for Alzheimer's disease modeling and simulation: a beta regression meta-analysis", Journal of Pharmacokinetics and Pharmacodynamics 39.5 (2012), 20 pgs. |
Salakhutdinov et al., "Deep Boltzmann Machines", Proc. International Conference on Artificial Intelligence and Statistics, 2009, 8 pgs. |
Salakhutdinov et al., "Deep Boltzmann Machines", Proc. International Conference on Artificial Intelligence and Statistics, 2009, 8 pgs. (Year: 2009). * |
Song et al. "Generative Adversarial Learning of Markov Chains", 2017, 7 pages, Accessed at URL https://openreview.net/forum?id=S1L-hCNtl (Year: 2017). * |
Sutskever, et al., "The Recurrent Temporal Restricted Boltzmann Machine", Advances in Neural Information Processing Systems, 2009, 8 pgs. |
Taesup Kim, Yoshua Bengio, "Deep Directed Generative Models with Energy-Based Probability Estimation", arXiv.org, Cornell University Library, 201 Olin Library Cornell University, Ithaca, NY 14853, Jun. 10, 2016, XP080707281 |
Tran, et al., "Mixed-Variate Restricted Boltzmann Machines", Asian Conference on Machine Learning, JMLR: Workshop and Conference Proceedings 20, 2011, pp. 213-229. |
Tran, et al., "Mixed-Variate Restricted Boltzmann Machines", Asian Conference on Machine Learning, JMLR: Workshop and Conference Proceedings 20, 2011, pp. 213-229. (Year: 2011). * |
Tuzman, Karen Tkach, "Broadening role for external control arms in clinical trials", Biocentury, Tools & Techniques, reprint from Jul. 15, 2019, 5 pgs. |
Ventz et al., "Design and Evaluation of an External Control Arm Using Prior Clinical Trials and Real-World Data", Clinical Cancer Research 2019; 25:4993-5001; doi: 10.1158/1078-0432.CCR-19-0820. |
Yu et al., "Assessment and adjustment of approximate inference algorithms using the law of total variance", arxiv:1911.08725v1 [stat.CO], Nov. 20, 2019, 29 pgs. |
Zhang et al. "Predictive Deep Boltzmann Machine for Multiperiod Wind Speed Forecasting." 2015. IEEE Transactions on Sustainable Energy, vol. 6, Issue 4, pp. 1416-1425 (Year: 2015). * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12051487B2 (en) | 2019-08-23 | 2024-07-30 | Unlearn.AI, Inc. | Systems and methods for supplementing data with generative models |
US12008478B2 (en) | 2019-10-18 | 2024-06-11 | Unlearn.AI, Inc. | Systems and methods for training generative models using summary statistics and other constraints |
US12020789B1 (en) | 2023-02-17 | 2024-06-25 | Unlearn.AI, Inc. | Systems and methods enabling baseline prediction correction |
US11868900B1 (en) | 2023-02-22 | 2024-01-09 | Unlearn.AI, Inc. | Systems and methods for training predictive models that ignore missing features |
US11966850B1 (en) | 2023-02-22 | 2024-04-23 | Unlearn.AI, Inc. | Systems and methods for training predictive models that ignore missing features |
Also Published As
Publication number | Publication date |
---|---|
US20190220733A1 (en) | 2019-07-18 |
EP3740908A1 (en) | 2020-11-25 |
CN111758108A (en) | 2020-10-09 |
JP2021511584A (en) | 2021-05-06 |
JP2022031730A (en) | 2022-02-22 |
CA3088204A1 (en) | 2019-07-25 |
JP7305656B2 (en) | 2023-07-10 |
EP3740908A4 (en) | 2021-10-20 |
WO2019143737A8 (en) | 2023-03-23 |
WO2019143737A1 (en) | 2019-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11636309B2 (en) | | Systems and methods for modeling probability distributions |
Ghoshal et al. | | Estimating uncertainty in deep learning for reporting confidence to clinicians in medical image segmentation and diseases detection |
Heidari et al. | | The COVID-19 epidemic analysis and diagnosis using deep learning: A systematic literature review and future directions |
Weng et al. | | Disease prediction with different types of neural network classifiers |
Che et al. | | Deep computational phenotyping |
Khagi et al. | | Comparative analysis of Alzheimer's disease classification by CDR level using CNN, feature selection, and machine‐learning techniques |
Davoudi et al. | | Evolving convolutional neural network parameters through the genetic algorithm for the breast cancer classification problem |
Lu et al. | | A method for optimal detection of lung cancer based on deep learning optimized by marine predators algorithm |
Al-Ali et al. | | ANFIS-Net for automatic detection of COVID-19 |
Mabrouk et al. | | Medical image classification using transfer learning and chaos game optimization on the internet of medical things |
Zhuang et al. | | CS-AF: A cost-sensitive multi-classifier active fusion framework for skin lesion classification |
Ahsen et al. | | Unsupervised evaluation and weighted aggregation of ranked classification predictions |
Torse et al. | | Optimal feature selection for COVID-19 detection with CT images enabled by metaheuristic optimization and artificial intelligence |
Das et al. | | Managing uncertainty in imputing missing symptom value for healthcare of rural India |
Alkhathlan et al. | | Predicting and classifying breast cancer using machine learning |
Reddy et al. | | Classification of vertebral column using naïve bayes technique |
Nahian et al. | | Common human diseases prediction using machine learning based on survey data |
Marivate et al. | | Quantifying uncertainty in batch personalized sequential decision making |
US20240303493A1 (en) | | Systems and Methods for Training Conditional Generative Models |
Mienye | | Improved Machine Learning Algorithms with Application to Medical Diagnosis |
Amiri | | Uncertainty Quantification in Neural Network-Based Classification Models |
Krishnamoorthy et al. | | A novel NASNet model with LIME explanability for lung disease classification |
US20240169188A1 (en) | | Systems and Methods for Training Conditional Generative Models |
Raihan et al. | | A deep learning and machine learning approach to predict neonatal death in the context of São Paulo |
Drost | | Uncertainty estimation in deep neural networks for image classification |
Legal Events
Code | Title | Description |
---|---|---|
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
AS | Assignment | Owner name: UNLEARN.AI, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FISHER, CHARLES KENNETH;SMITH, AARON MICHAEL;WALSH, JONATHAN RYAN;REEL/FRAME:048095/0641. Effective date: 20190116 |
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |