US20200285948A1 - Robust auto-associative memory with recurrent neural network - Google Patents
- Publication number
- US20200285948A1 (application Ser. No. 16/646,071)
- Authority
- US
- United States
- Prior art keywords
- input
- neural network
- recurrent neural
- associative memory
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Definitions
- a memory system that can retrieve a data item based on a description of its content rather than by knowing its location is called a “content addressable” memory. If, furthermore, the retrieval can be done in spite of partial or imperfect knowledge of the contents, the memory system is said to be a “robust content addressable” memory.
- a Hopfield network may be used as a robust content addressable memory.
- such a Hopfield network is auto-associative.
- a variation on a Hopfield network, called a "bi-directional associative memory" (BAM), is hetero-associative.
- the number of learned parameters (a measure of memory capacity) in a Hopfield network is just the number of undirected arcs, which is to say, the number of unordered pairs of input variables. That is, the number of learned parameters of a Hopfield network is roughly one-half the square of the number of input variables. For a BAM, the number of learned parameters is the product of the number of input variables times the number of output variables.
- the capacity of the memory is determined by the number of input and output variables rather than by the number of data items to be learned.
- the number of scalar values to be represented is the product of the number of data items times the number of input variables.
- a Hopfield network does not have the capacity to learn a data base with say, 100,000,000 data items.
- a high-resolution color image may have twenty million pixels in three colors, so the number of learned parameters in a Hopfield network would be on the order of two hundred trillion for a black-and-white image or nearly two quadrillion for the full color image, which would be totally impractical.
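These capacity figures follow directly from the pair counts described above. A quick sanity check (function names are illustrative; the image sizes are the ones used in the text):

```python
def hopfield_params(n_inputs: int) -> int:
    # One weight per undirected arc, i.e., per unordered pair of input
    # variables: roughly half the square of the number of inputs.
    return n_inputs * (n_inputs - 1) // 2

def bam_params(n_inputs: int, n_outputs: int) -> int:
    # One weight per (input, output) pair.
    return n_inputs * n_outputs

# 20-megapixel black-and-white image: about 2e14 learned parameters.
bw = hopfield_params(20_000_000)

# Three color values per pixel (60 million variables): about 1.8e15.
color = hopfield_params(60_000_000)
```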
- Hopfield networks and BAMs are connected via undirected arcs and, unlike deep layered neural networks, do not have hidden layers and are not trained by the highly successful back propagation computation commonly used in deep learning.
- the present invention uses a machine learning system, such as a deep neural network, to implement a robust, content-addressable auto-associative memory system.
- the memory is content-addressable in the sense that an item in the memory system may be retrieved by a description or example rather than by knowing the address of the item in memory.
- the memory system is robust in that an item can be retrieved from a query that is a transformed, distorted, noisy version of the item to be retrieved. An item may also be retrieved based on an example of only a small portion of the item.
- the associative memory system may also be trained to be a classifier. The memory system is recurrent and auto-associative because it operates by feeding its output back to its input.
- the memory system may be based on a layered deep neural network with an arbitrary number of hidden layers.
- the number of learned parameters may be varied based on the number of data items to be learned or other considerations.
- Embodiments based on such deep neural networks may be trained by well-known deep learning techniques such as stochastic gradient descent based on back propagation.
- FIG. 1 is a block diagram of an illustrative embodiment of the invention described herein.
- FIG. 1A is a block diagram of another aspect of an illustrative embodiment of the invention.
- FIG. 1B is a flow chart of a process for training the associative memory according to various embodiments of the present invention.
- FIGS. 2A and 2B are flow charts of aspects of an illustrative process for training various embodiments of the invention.
- FIG. 3 is a diagram of a computer system for implementing various embodiments of the invention.
- FIG. 4 is a diagram of a type of neural network that may be used in various embodiments of the invention.
- FIG. 5 is a diagram depicting unrolling of a recurrent network.
- FIG. 1 is an illustrative embodiment of a robust auto-associative memory system 100 with corrective training.
- the associative memory unit 104 is a large machine learning system, for example a deep neural network, such as the example deep neural network illustrated in FIG. 4 .
- the machine learning system 104 by itself, may be a deep feed-forward neural network. It becomes recurrent and auto-associative because of the feedback 99 from the output prediction of the full pattern ( 105 ) back to the input to machine learning system 104 . Because the machine learning system 104 may, for example, be a layered deep, feed-forward neural network, it may have an arbitrary number of learned parameters.
- the learned parameters may comprise trainable parameters for the machine learning system 104 , such as the weights of directed arcs in a deep, feed-forward neural network and/or activation biases for nodes in the network.
- the number of learned parameters needs to be at least the number of data items times the number of variables in each data item, perhaps times a small multiple (e.g., a factor greater than but close to 1.0) to allow for redundancy and robustness.
- the number of learned parameters in the machine learning system 104 may be adjusted to meet this requirement.
- the number of learned parameters may be based in part on the number of data items to be learned (times the number of input variables).
- the factor that is based on the number of data items may be either much smaller than the number of input or output variables or may be much larger than the number of input or output variables.
- extra learned parameters may be used to achieve better performance.
- the machine learning system 104 may have any number of layers enabling it to compute arbitrarily complex functions, for example to better model and compensate for transformations and degradations to the input. If the machine learning system 104 is a deep neural network, the recurrence due to the feedback from the output 105 back to the input of network 104 may be unrolled by making multiple copies of the network 104 , producing a single, large feed-forward neural network.
- This feed-forward deep neural network has directed arcs rather than undirected arcs as in a Hopfield network or in a bi-directional associative memory (BAM).
- it may be trained by stochastic gradient descent computed by feed-forward activation and back propagation of the partial derivatives of the objective.
- Deep neural networks and the methods of stochastic gradient descent, unrolling the recurrence, feed-forward activation, and back propagation of partial derivatives are discussed in association with FIG. 4 .
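As a concrete sketch of the unrolling described above, applying the same small feed-forward auto-associator repeatedly is the same computation as one larger feed-forward network built from weight-sharing copies (the network here is a hypothetical tiny example, not the architecture of FIG. 4):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny auto-associator: one hidden layer, equal input/output width.
n_vis, n_hid = 8, 16
W1 = rng.normal(scale=0.1, size=(n_hid, n_vis))
b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_vis, n_hid))
b2 = np.zeros(n_vis)

def step(x):
    """One pass through the feed-forward network (unit 104)."""
    h = np.tanh(W1 @ x + b1)
    return np.tanh(W2 @ h + b2)

def unrolled(x, n_steps=3):
    """Unrolled recurrence: running the feedback (99) for n_steps rounds is
    equivalent to a single deep feed-forward network made of n_steps stacked,
    weight-sharing copies of the network."""
    for _ in range(n_steps):
        x = step(x)
    return x
```

Because the unrolled network is purely feed-forward with directed arcs, the gradients for every copy can be obtained by ordinary back propagation.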
- Embodiments of the machine learning system 104 other than deep neural networks may also have as many learned parameters as necessary, including having the number of learned parameters be based in part on the number of data items to be learned.
- the task of an auto-associative memory is to memorize its training data.
- a robust auto-associative memory 100 not only memorizes its training data, it is able to retrieve an example from its training data given only a partial input or a degraded input.
- the dashed arrows from blocks 106 , 107 to block 105 represent backpropagation for training the associative memory 104 and the solid arrows represent feedforward activation, with backpropagation in the opposite direction from feedforward activation.
- if the associative memory unit 104 is not a deep neural network, a similar training process may be used, provided the machine learning system implementing the associative memory 104 can be trained to avoid producing negative examples as well as to produce positive examples.
- an illustrative example of a computer system 300 that may perform the computations associated with FIG. 1 and other figures is shown in FIG. 3 .
- FIG. 1 shows a system diagram of a robust auto-associative memory machine learning system 100 configured for an illustrative embodiment of an aspect of the invention in which a computer system 300 performs preliminary training of a robust associative memory 104 , with a moderate level of transformations and data augmentations of input 101 optionally performed by the computer system 300 in step 102 .
- a training process with more extreme transformations and data augmentations is discussed in association with an illustrative flow chart in FIGS. 2A and 2B .
- the number of learned parameters in the associative memory unit may be increased as necessary to achieve the required performance.
- the number of learned parameters may be varied proportional to the number of data items to be learned. If the associative memory unit 104 is a deep neural network, the number of nodes in a layer may be increased or the number of layers may be increased. An illustrative example of an associative memory machine learning system during operational use is shown in FIG. 1A .
- the computer system 300 presents each training example ( 101 ) multiple times to the associative memory 104 , each time with zero or more randomly generated transformations, degradations, and subsampling ( 102 ). As indicated by the dotted arrow from 101 to 104 , in some embodiments, there is no transformation done in step 102 .
- the original input data pattern ( 101 ) may be transformed ( 102 ) by translation, rotation, or any linear transformation.
- the original data ( 101 ) may be degraded ( 102 ) by additive or multiplicative noise.
- the input ( 101 ) may be changed in other ways that are used for data augmentation ( 102 ) in training classifiers.
- Data augmentation is well-known to those skilled in the art of training machine learning systems.
- only a subset of the original input variables ( 101 ) or only a subset of the transformed input variables ( 102 ) are retained for the next transformation or to be sent to the associative memory 104 .
- each of these types of transformations may be used in combination with others, with each type of transformation possibly used multiple times, as indicated by the arrow from block 102 back to itself.
- the amount of a translation, rotation, or other transformation at step 102 may be characterized by a parameter, such as the distance of the translation or the angle of the rotation.
- This transformation-characterizing parameter may directly be controlled as a hyperparameter or its maximum magnitude may be controlled by a hyperparameter.
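The transformation pipeline of step 102 might be sketched as follows. The particular transformations, their probabilities, and the way each magnitude scales with the hyperparameter delta are illustrative assumptions, not the text's specification:

```python
import numpy as np

def augment(x, delta, rng):
    """Randomly transform/degrade an input pattern (step 102).  Zero or more
    transformations are applied; the hyperparameter delta bounds the maximum
    magnitude of each.  Probabilities and scaling functions are illustrative."""
    x = x.copy()
    if rng.random() < 0.5:                          # translation
        max_shift = int(3 * delta)
        x = np.roll(x, int(rng.integers(-max_shift, max_shift + 1)))
    if rng.random() < 0.5:                          # additive noise
        x = x + rng.normal(scale=0.1 * delta, size=x.shape)
    if rng.random() < 0.5:                          # subsampling: drop inputs
        x[rng.random(x.shape) < 0.2 * delta] = 0.0
    return x
```

With delta set to zero, every transformation degenerates to the identity, matching the dotted arrow from 101 to 104 in FIG. 1.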
- Hyperparameter values may be set by the system designer or may be controlled by a second machine learning system, called a “learning coach.”
- a learning coach may also adjust the number of learned parameters based on the number of data items to be learned, such as, for example, by adding nodes and/or layers, to thereby increase the number of arcs that need weights and the number of nodes that need biases.
- a learning coach is a second machine learning system that is trained to help manage the learning process of a first machine learning system.
- Learning coaches are described in more detail in the following applications, which are incorporated herein by reference in their entirety: published PCT application Pub. No. WO 2018/063840 A1, published Apr. 5, 2018, entitled “LEARNING COACH FOR MACHINE LEARNING SYSTEM”; and PCT Application No. PCT/US18/20887, filed Mar. 5, 2018, entitled “LEARNING COACH FOR MACHINE LEARNING SYSTEM.”
- the computer system 300 trains the robust auto-associative memory machine learning system 100 to predict the full original pattern 105 with the original, untransformed input data example as its target 106 .
- the computer system 300 also feeds back (via line 99 ) predicted pattern 105 as input to the associative memory 104 .
- the associative memory unit 104 is a recursive system, with its output repeatedly fed back to its input.
- the objective of the recursion is to produce a better match between the predicted pattern 105 and the target 106 with each round of the recursion.
- the computer system 300 causes the associative memory unit 104 to refine its prediction 105 of the full, undegraded pattern 106 .
- the computer system 300 may cause the associative memory 104 , for example, to recover some of the missing parts of the target 106 and to remove part of the noise and distortion from the pattern 105 in the first round of the recursion. With that more complete, somewhat cleaner, input, the computer system 300 may then cause the associative memory unit 104 to recover more of the original pattern 106 in the next round, and so on, until a stopping point is met, i.e., the output of the associative memory 104 satisfactorily matches the original pattern.
- the memorized patterns are the fixed points of this recursive process.
- examples of the input 101 and of the prediction 105 for later rounds of the recursion are saved by the computer system 300 in storage 109 . The choice of when a pattern should be saved may be made by fixed rules set by the system developer or may be made by a learning coach.
- FIG. 1B is a flow chart illustrating the training process of the associative memory system 100 according to various embodiments.
- the computer system 300 selects the first epoch of training data.
- the computer system 300 selects one of the training examples from the original epoch and transforms the selected training example at step 122 .
- the transformation at step 122 may comprise a transformation, augmentation, subsampling, etc. of the selected training example.
- the computer system 300 trains the associative memory unit 104 with the transformed training example.
- Step 123 may be repeated recursively as described above, as indicated by the feedback loop from step 124 , until the stopping point is met (e.g., the output of the associative memory unit 104 satisfactorily matches the training example).
- once the stopping point is met, the process advances to step 125 .
- at step 125 , if the selected training example is to be transformed in a different way, the process returns to step 122 , where the training example selected at step 121 is transformed in a second (or additional) random manner, and then the associative memory 104 is recursively trained at steps 123 - 124 with the training example transformed in a second (or additional) manner.
- This second, outer loop (i.e., steps 122 to 125 ) can be repeated for a desired number of transformations to a training example. That number may be a fixed number (e.g., set by the system designer) or controlled by a hyperparameter. Step 125 is also optional. That is, in other embodiments, a training item selected at step 121 is only transformed once for training the associative memory.
- otherwise, the process advances to step 126 , where, as shown by the feedback loop from step 126 back to step 121 , the process is repeated for the next training example in the epoch, and so on until the process has been performed for all of the training examples in the epoch.
- the process is then repeated for multiple epochs until the training has converged or another stopping criterion, such as a specified number of epochs, is met, as indicated by the feedback loop from the decision step 127 back to step 120 .
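The nested loops of FIG. 1B (steps 120-127) can be sketched as below. `ToyMemory` is a stand-in linear model so the sketch is self-contained, not the deep network of unit 104, and the stopping rule is one illustrative choice:

```python
import numpy as np

class ToyMemory:
    """Stand-in for unit 104: a linear map trained by one SGD step per call."""
    def __init__(self, n, lr=0.5):
        self.W = np.zeros((n, n))
        self.lr = lr

    def train_step(self, x, target):
        pred = self.W @ x
        # Gradient of 0.5 * ||pred - target||^2 with respect to W.
        self.W -= self.lr * np.outer(pred - target, x)
        return self.W @ x              # prediction after the update

def train_associative_memory(memory, examples, augment, n_epochs=5,
                             n_transforms=2, max_rounds=4, tol=1e-3):
    for _ in range(n_epochs):                      # step 127: next epoch
        for target in examples:                    # steps 121/126: next example
            for _ in range(n_transforms):          # steps 122/125: next transform
                x = augment(target)                # step 122: transform example
                for _ in range(max_rounds):        # steps 123/124: recursion
                    pred = memory.train_step(x, target)
                    if np.linalg.norm(pred - target) < tol:
                        break                      # stopping point met
                    x = pred                       # feedback 99: output -> input
    return memory
```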
- the use of the fixed points of the recursive process to represent memorized data items results in the robustness and other remarkable properties for the robust auto-associative memory system 100 .
- an entire complex image may be recovered from a small piece of the image if the small piece occurs in only one image in the set of images memorized by the robust auto-associative memory unit 100 .
- a memorized document (e.g., a word processing document, PDF file, spreadsheet, or presentation) may likewise be recovered from a small portion of its content.
- an audio recording may be recovered from a small interval of sound.
- the auto-associative memory system 100 may be trained to be robust against a wide variety of transformations or noise.
- a work of art or a photograph may be retrieved from a memorized database from a sketch-like query.
- This robustness and other properties are further enhanced by another training process for which an illustrative embodiment is discussed in association with FIGS. 2A and 2B .
- the training process illustrated in FIGS. 1 and 1B also may include negative feedback from negative examples, such as 107 .
- the recursive function implemented by the associative memory 104 may have other fixed points in addition to the memorized training data examples 101 .
- the computer system 300 may train the associative memory unit 104 to eliminate such extra fixed points by training it not to generate them as output by negative feedback from the undesired fixed points as negative examples 107 .
- Another example of use of negative feedback is for the computer system to cause the auto-associative memory unit 104 to forget or erase the memory of a pattern it has been previously trained to remember.
- the computer system 300 can train a separate associative memory (not shown) to learn all the patterns in one classification category but not to learn examples from any other category.
- the negative examples then give negative feedback to the separate associative memory when it outputs a pattern that matches a different category than the intended category, for example, as judged by an independent classifier (not shown).
- Other uses of negative examples are discussed in association with FIGS. 1A, 2A and 2B .
- Normal feedback may be represented by a loss function that has its minimum at a data item that is an intended target or positive example.
- Negative feedback may be represented by a loss function that has its maximum at a data item that is a negative example.
- the partial derivatives of the loss function may be computed by computer system 300 using back propagation if the associative memory 104 is a neural network. Back propagation is well-known to those skilled in the art of training neural networks and is discussed in association with FIG. 4 .
- computer system 300 may train the associative memory unit 104 on negative examples 107 by whatever method is appropriate for the type of machine learning system being used in the associative memory 104 .
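A minimal sketch of the two kinds of feedback: the positive loss has its minimum at the target, and the negative loss simply flips the sign so that its maximum sits at the negative example and gradient descent pushes the output away from that undesired fixed point (squared error is an illustrative choice of loss):

```python
import numpy as np

def positive_loss(pred, target):
    # Minimum (zero) when the output equals the positive example.
    return 0.5 * np.sum((pred - target) ** 2)

def negative_loss(pred, neg_example):
    # Sign flipped: maximum (zero) when the output equals the negative
    # example, so descending this loss moves the output away from it.
    return -0.5 * np.sum((pred - neg_example) ** 2)

def grad_positive(pred, target):
    # Partial derivative back-propagated into the network's output layer.
    return pred - target

def grad_negative(pred, neg_example):
    return -(pred - neg_example)
```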
- the number of learned parameters for the associative memory 104 may be adjusted during the learning process. For example, such a change may be based on testing the performance of the associative memory 104 on new data not used in the training. The number of learned parameters may be increased to increase the capacity of the memory 104 or the number of learned parameters may be decreased if testing reveals that there is spare capacity. Where, for example, the associative memory 104 comprises a deep neural network, additional learned parameters may be added by adding additional directed arcs, by adding additional nodes, or by adding additional layers to the network.
- FIG. 1A is a system diagram of the robust auto-associative memory machine learning system 100 of FIG. 1 in operational use.
- FIG. 1A is an illustration of the same system as illustrated in FIG. 1 , except configured for operational use rather than for training.
- components 104 , 105 , and 109 in FIG. 1A are the same as components 104 , 105 , and 109 , respectively, in FIG. 1 .
- the computer system 300 provides any input 103 to the associative memory 104 , not just training data items 101 and their transformations 102 , since in operational use, the purpose of the associative memory is to output a pattern 105 that corresponds to a pattern from its training examples 101 (see FIG. 1 ) in response to any input 103 .
- the computer system 300 provides the input 103 directly to the associative memory 104 . That is, there is no distortion, etc., as in block 102 of FIG. 1 . Having presented the associative memory 104 with an input data item 103 , the computer system 300 computes the output of associative memory 104 . For example, if associative memory is implemented as a deep neural network, the computer system 300 does a feed-forward activation computation from the input 103 to compute the activations of the nodes in the network 104 and then the activations of the output nodes 105 . The feed-forward activation computation is explained in association with FIG. 4 and is well-known to those skilled in the art of deep neural networks.
- the computer system 300 applies the output 105 recursively as input to the associative memory 104 .
- the computer system 300 repeats this recursion until the recursion converges or a stopping condition is met.
- a possible stopping condition is detection of an infinite cycle.
- An infinite cycle may be detected by observing that an output for a cycle is identical to the output for some previous cycle.
- the computer system 300 may save an output example 109 at convergence or at any stage of the recursive process, including the input 103 , for use in later training or for detection of an infinite cycle.
- all stages of the recursion process are saved.
- examples to be saved may be selected by a learning coach.
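The operational recursion of FIG. 1A, including convergence and infinite-cycle detection by comparing each output against saved stages, might look like this (a sketch; `memory_step` stands in for one forward pass of unit 104, and exact repetition is one simple cycle test):

```python
import numpy as np

def retrieve(memory_step, query, tol=1e-6, max_iters=100):
    """Feed the output (105) back as input to unit 104 until the recursion
    converges to a fixed point or an infinite cycle is detected."""
    x = np.asarray(query, dtype=float)
    history = [x]                          # saved stages (109)
    for _ in range(max_iters):
        y = memory_step(x)
        if np.linalg.norm(y - x) < tol:
            return y, history, "converged"
        if any(np.array_equal(y, h) for h in history):
            return y, history, "cycle"     # output repeats an earlier stage
        history.append(y)
        x = y
    return x, history, "max_iters"
```

The saved `history` corresponds to the stages stored in block 109; on a detected cycle, its members are candidates for later use as negative examples.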
- although the associative memory 104 is a machine learning system, such as a deep neural network, its task is not to classify its input as in most machine learning tasks. Instead, as its name implies, the associative memory 104 has the task of retrieving from its "memory" of the training data 101 (see FIG. 1 ) a training data item 105 that is "associated" with input 103 .
- the set of input items 103 that may be associated with a specific training data item 105 is the set of input items 103 such that the recursion converges to the specific training data item.
- the computer system 300 may save the output at convergence in 109 for later use in training as a negative example 107 of FIG. 1 . In some embodiments, the computer system 300 may save some or all of the data items that occur in an infinite cycle as negative examples.
- the computer system 300 may save the input and one or more of the intermediate stage outputs in association with the training data example as positive examples 108 for future training.
- for either negative or positive examples transferred by computer system 300 to block 108 , if the learned parameters of the associative memory 104 are being trained during operation, such as with adaptive training, then the computer system 300 also saves in block 108 a snapshot of the current values of the learned parameters of the associative memory unit 104 and links the snapshot to the corresponding positive or negative example.
- a negative example saved in block 108 of FIG. 1A is available to use as a negative example 107 in FIG. 1 .
- FIGS. 2A and 2B together represent a flow chart of an illustrative embodiment of another training process.
- the training process illustrated in FIGS. 2A and 2B may be based on a system such as the illustrative system 100 shown in FIG. 1 , but the training process illustrated in FIGS. 2A and 2B differs from the training process described above in association with FIG. 1 in several important aspects.
- the training process may make use of classification category labels and the robust auto-associative memory system 100 of FIG. 1 may be used as a classifier.
- the training process illustrated in FIGS. 2A and 2B is an iterative process that actively increases the amount or degree of the transformations 102 applied to the input 101 and tests to see if the amount or degree of a transformation 102 is too great.
- FIG. 2B illustrates the iterative aspect of the training process.
- FIG. 2A illustrates some aspects of the initial part of the training process and some aspects that apply to the training process as a whole rather than to individual steps in the process.
- the computer system 300 obtains a preliminary auto-associative memory system, such as the system 100 illustrated in FIG. 1 .
- the computer system 300 may obtain the preliminary auto-associative memory system by using the training process discussed in association with FIGS. 1 and 1B , or may obtain the preliminary auto-associative memory from a previous use of the training process illustrated in FIGS. 2A and 2B .
- the previous training of the preliminary auto-associative memory system 100 is done using the same training data set as is to be used in the current training, or a subset of the training data set to be used in the current training.
- the preliminary auto-associative memory system 100 may be trained to “forget” a previous training data example by using that previous example as a negative training example.
- the preliminary auto-associative memory system 100 may be untrained; for example, it may be a deep neural network with random initialization of the connection weights. Such random initialization is well-known to those skilled in the art of training deep neural networks.
- at step 201 , the computer system 300 obtains training data that is labeled with classification category labels.
- the computer system 300 may store the training data in memory.
- at step 202 , the computer system 300 sets aside some of the training data obtained at step 201 .
- the data set aside at step 202 may be used for development testing by a learning coach.
- the computer system 300 applies rules and procedures 203 , 204 , and 205 throughout the training process, as described below.
- the computer system 300 uses the auto-associative memory system 100 (e.g., the one obtained at step 200 ) as a classifier.
- the training data items obtained at step 201 are labeled with a classification category.
- the illustrative embodiment of the auto-associative memory system 100 shown in FIG. 1A accepts any input 103 and attempts to compute an output pattern 105 that closely matches a training data item.
- the computer system has obtained training data with classification category labels.
- the computer system 300 includes the classification category or an identifier associated with the classification category as part of the input data item to the auto-associative memory unit 104 and retrieves that information as part of the output pattern 105 .
- the identifier associated with a training data item may be a hash code or other index with check sums or error-correcting encoding.
- the computer system uses this redundant encoding to determine its estimate of the index of the actual item or of the classification category of the untransformed input 101 .
- the auto-associative memory system 100 acts as a classifier.
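One simple way to realize the redundant label encoding described above is repetition coding; check sums or stronger error-correcting codes would follow the same attach/decode pattern. The function names and the repetition scheme are illustrative assumptions:

```python
import numpy as np

def attach_label(x, label, n_classes, repeats=3):
    """Append a redundantly encoded class label to an input pattern, so the
    label is stored and retrieved as part of the memorized pattern."""
    code = np.zeros(n_classes)
    code[label] = 1.0
    return np.concatenate([x] + [code] * repeats)

def read_label(pattern, n_inputs, n_classes, repeats=3):
    """Estimate the class from a (possibly noisy) retrieved pattern (105) by
    summing the repeated code copies and taking the argmax."""
    codes = pattern[n_inputs:].reshape(repeats, n_classes)
    return int(np.argmax(codes.sum(axis=0)))
```

The redundancy lets the correct category survive even when one copy of the code is corrupted during retrieval.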
- the computer system 300 may control the maximum amount or degree to be allowed of a transformation or data augmentation in step 102 of FIG. 1 .
- the maximum amount allowed may be represented by a hyperparameter “delta.”
- the maximum magnitude of a translation may be a specified function of delta; the maximum degree of rotation may be some other specified function of delta; the maximum standard deviation of noise may be yet another specified function of delta; the maximum fraction of input variables that may be deleted may be a specified function of delta; and/or the maximum degree of each other transformation may be a specified function of delta.
- these specified functions may be set by the system developer. In some embodiments, a learning coach may be trained to optimize these specified functions.
- the hyperparameter delta as implemented in this illustrative embodiment of procedure 203 may be used in step 211 of the iterative training loop shown in FIG. 2B .
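The "specified functions" of delta described above might look like the following sketch. The linear forms and constants here are purely illustrative assumptions; in practice they are set by the system developer or optimized by a learning coach:

```python
# Hypothetical specified functions mapping the hyperparameter delta to
# the maximum allowed amount of each type of transformation in step 102.

def transformation_limits(delta):
    return {
        "max_translation_px": 4.0 * delta,   # max magnitude of a translation
        "max_rotation_deg": 15.0 * delta,    # max degree of rotation
        "max_noise_std": 0.2 * delta,        # max std. dev. of added noise
        "max_deleted_frac": 0.5 * delta,     # max fraction of deleted inputs
    }

# delta = 0 disables all transformations; larger delta permits
# progressively more extreme transformations in step 102.
assert all(v == 0.0 for v in transformation_limits(0.0).values())
```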
- the computer system 300 may train the auto-associative memory 100 using the unlabeled converged output pattern 105 as a negative example.
- At step 211, the computer system 300 trains the auto-associative memory 100 with a larger value of delta than has been used in any previous pass through the loop from step 211 to step 217, if there have been previous passes.
- the number of learned parameters may also be adjusted in step 211 .
- the number of learned parameters may be either increased or decreased, as indicated by the development testing in steps 212 and 216 .
- delta is set at a small value.
- the value of delta is set to zero, representing that no transformations, data augmentations, or input variable deletions are to be performed in step 102 of FIG. 1 .
- the computer system 300 increases the value of delta by an amount controlled by a hyperparameter.
- At step 212, the computer system 300 measures the performance of the auto-associative memory system 100 acting as a classifier. Preferably, this performance measurement is made on development data that has been set aside from the training data, as specified in step 202 of FIG. 2A.
- At step 213, the computer system 300 compares the performance measured in step 212 with measures of performance from previous passes through the loop from step 211 to step 217 or from the preliminary auto-associative memory obtained in step 200 of FIG. 2A. If there is degradation by more than a specified amount, the computer system 300 proceeds to step 214; otherwise the computer system returns to step 211. In some embodiments, the computer system 300 allows no degradation in performance in step 213. In other words, if there is no degradation (leaving possible further improvement), the process returns to step 211; otherwise (i.e., with more than zero degradation), the process advances to step 214. In other embodiments, some degradation in performance is allowed, but the limit on the amount of degradation is measured against the best previous performance, not just against the performance of the immediately preceding pass through the loop from step 211 to step 217.
- At step 214, the computer system generates data according to step 102 of FIG. 1 with various types of transformations, data augmentations, noise, and input variable subsampling at or near the limiting value delta.
- the computer may also select data that was generated in step 211 .
- the computer system is generating or selecting transformed input examples that would not have been generated in the previous passes through the loop from step 211 to step 217 with smaller values of delta. Thus, these data examples would not have been generated in the previous iteration that had better performance than the performance measured in step 212 of the current pass through the loop.
- Some of the data examples generated or selected in step 214 in the current round may be responsible for the degradation in measured performance.
- the computer system is attempting to replicate on training data the kind of degradation in performance that was measured on development data in step 212 .
- At step 215, the computer system classifies the data generated in step 214 and counts as a misclassification any transformed data item 102 that is classified with a category different from the category of the untransformed data 101. Under control of hyperparameters, some or all of these misclassifications are used for training as negative examples.
- At step 216, the computer system again measures the performance of the auto-associative memory system 100 acting as a classifier, with the performance measured on set-aside development data.
- the same set of set-aside development data may be used for performance comparisons in step 213 so that the performance difference does not depend on differences in the data on which the performance is measured.
- a second set of set-aside development data is used to confirm the cumulative progress of multiple passes through the loop from step 211 to step 217 .
- At step 217, the computer system again compares the performance of the current system with the previous performance. If there has been an improvement in performance, the computer system returns to step 211 to continue the process with a larger value of delta. If there has been no improvement in performance, the computer system backs up the learned parameter values to the best-performing values previously obtained and terminates the training process.
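The control flow of the loop from step 211 to step 217 can be sketched as follows. This is a highly simplified skeleton: the negative-example retraining of steps 214-216 is elided, and all function names are stand-ins rather than anything specified by the patent:

```python
# Simplified skeleton of the FIG. 2B loop: train with increasing delta
# (step 211), measure on set-aside development data (step 212), and on
# degradation (steps 213/217) back up to the best learned parameters.
# `train_step` and `measure_dev_performance` are caller-supplied stubs.

def train_with_increasing_delta(train_step, measure_dev_performance,
                                delta_increment=0.1, tolerance=0.0,
                                max_rounds=50):
    delta = 0.0
    best_params = train_step(delta)        # step 210: start with delta = 0
    best_perf = measure_dev_performance()  # baseline dev-set performance
    for _ in range(max_rounds):
        delta += delta_increment           # step 211: larger delta
        params = train_step(delta)
        perf = measure_dev_performance()   # step 212: measure on dev data
        if perf < best_perf - tolerance:   # degradation beyond tolerance:
            return best_params, best_perf  # back up and terminate
        if perf > best_perf:               # track best so far (step 217)
            best_perf, best_params = perf, params
    return best_params, best_perf
```

With `tolerance=0.0` this matches the "no degradation allowed" embodiment of step 213; a positive tolerance matches the embodiments that allow bounded degradation relative to the best previous performance.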
- the computer system may experimentally try some of these other remedial actions, returning to step 214 multiple times in an attempt to find an improvement and proceed to step 211, before eventually deciding to proceed to step 218.
- a learning coach may modify the specified function of delta for one or more types of transformation eliminating or limiting the amount of increase in such a transformation.
- the computer system may merely return to step 214 multiple times to generate more negative examples.
- Although the auto-associative memory unit 104 has generally been described as a single neural network, such as the one shown in FIG. 4, many variations are possible within the scope and spirit of the invention. As already mentioned, other types of machine learning systems may be used if they support back propagation of partial derivatives of the objective or if they support training against negative examples by some other means. As another example, the auto-associative memory unit 104 may be an ensemble of neural networks or other machine learning systems. As a particular example, with labeled training data, a separate auto-associative memory unit may be trained for each classification category, with training data consisting exclusively of items from that single classification category.
- FIG. 3 is a diagram of a computer system 300 that could be used to implement the embodiments described above, such as the process described in FIG. 1 .
- the illustrated computer system 300 comprises multiple processor units 302 A-B that each comprises, in the illustrated embodiment, multiple (N) sets of processor cores 304 A-N.
- Each processor unit 302 A-B may comprise on-board memory (ROM or RAM) (not shown) and off-board memory 306 A-B.
- the on-board memory may comprise primary, volatile and/or non-volatile, storage (e.g., storage directly accessible by the processor cores 304 A-N).
- the off-board memory 306 A-B may comprise secondary, non-volatile storage (e.g., storage that is not directly accessible by the processor cores 304 A-N), such as ROM, HDDs, SSDs, flash, etc.
- the processor cores 304 A-N may be CPU cores, GPU cores and/or AI accelerator cores. GPU cores operate in parallel (e.g., a general-purpose GPU (GPGPU) pipeline) and, hence, can typically process data more efficiently than a collection of CPU cores, but all the cores of a GPU execute the same code at one time.
- AI accelerators are a class of microprocessor designed to accelerate artificial neural networks. They typically are employed as a co-processor in a device with a host CPU 310 as well. An AI accelerator typically has tens of thousands of matrix multiplier units that operate at lower precision than a CPU core, such as 8-bit precision in an AI accelerator versus 64-bit precision in a CPU core.
- the different processor cores 304 may train and/or implement different networks or subnetworks or components.
- the cores of the first processor unit 302 A may implement the auto-associative memory 104 and the second processor unit 302 B may implement the learning coach.
- the cores of the first processor unit 302 A may train the machine learning system (e.g., neural network) of the auto-associative memory 104 according to techniques described herein, whereas the cores of the second processor unit 302 B may learn, from implementation of the learning coach, the hyperparameters for the auto-associative memory 104 .
- if the associative memory 104 comprises an ensemble of machine learning systems, different sets of cores in the first processor unit 302 A may be responsible for different ensemble members.
- One or more host processors 310 may coordinate and control the processor units 302 A-B.
- the system 300 could be implemented with one processor unit 302 .
- the processor units could be co-located or distributed.
- the processor units 302 may be interconnected by data networks, such as a LAN, WAN, the Internet, etc., using suitable wired and/or wireless data communication links. Data may be shared between the various processing units 302 using suitable data links, such as data buses (preferably high-speed data buses) or network links (e.g., Ethernet).
- the software for the various compute systems described herein and other computer functions described herein may be implemented in computer software using any suitable computer programming language such as .NET, C, C++, Python, and using conventional, functional, or object-oriented techniques.
- Programming languages for computer software and other computer-implemented instructions may be translated into machine language by a compiler or an assembler before execution and/or may be translated directly at run time by an interpreter. Examples of assembly languages include ARM, MIPS, and x86; examples of high level languages include Ada, BASIC, C, C++, C#, COBOL, Fortran, Java, Lisp, Pascal, Object Pascal, Haskell, ML; and examples of scripting languages include Bourne script, JavaScript, Python, Ruby, Lua, PHP, and Perl.
- FIG. 4 is a drawing of an example of a type of neural network such as might be used to implement the auto-associative memory unit 104 in FIG. 1 .
- This example neural network is a recurrent neural network.
- this example network and the auto-associative memory unit 104 have a special architecture that makes them easier to train than many other types of recurrent neural networks.
- a neural network comprises a network of nodes organized into layers: a layer of input nodes, zero or more inner or "hidden" layers of nodes, and a layer of output nodes.
- An inner layer may also be called a “hidden layer.”
- a given node in the output layer or in an inner layer is connected to one or more nodes in lower layers by means of a directed arc from the node in the lower layer to the given higher layer node.
- a directed arc is an arc where direction matters, as opposed to undirected arcs.
- the directed arcs are each associated with a trainable parameter, called its weight, which represents the strength of the connection from the lower node to the given higher node (or from an output node to its corresponding input node for the directed arcs from the output nodes to the input nodes).
- a trainable parameter is also called a “learned” parameter.
- Each node is also associated with an additional learned parameter called its “bias.”
- the weight associated with an arc from an output node to the corresponding input node implicitly has the value 1.0 and there is no learned parameter for these particular output-to-input arcs, as opposed to other arcs in the network that will have trainable (or learned) weights.
- the neural network in FIG. 4 and the auto-associative memory unit 104 have a set of target values, with a target objective for each output node.
- a neural network in which there is no cycle of directed arcs leading from a node back to itself is called a “feed-forward” network.
- a neural network in which there is a cycle of directed arcs is called a “recurrent neural network.”
- the only cycles in the recurrent neural network are the directed arcs from the output nodes to their corresponding input nodes, as shown in FIG. 4. That is, in preferred embodiments of the present invention, and as explained herein, there are no directed arcs from a higher-numbered layer to a lower-numbered layer, except for the directed arcs from the output nodes to their corresponding input nodes.
- a recurrent neural network R may be "unrolled" by making a sequence of copies of the neural network R(t) for each value of t in {0, 1, . . . , T}.
- the index t counts the number of rounds of recursion.
- the value of T is the number of rounds of recursion until the recursion is stopped at convergence, at an infinite cycle, or by some other stopping criterion. Unrolling of a recurrent neural network R is depicted in FIG. 5 .
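The "recursion until convergence" stopping criterion described above can be sketched directly. The function name and the scalar toy state are illustrative assumptions; `step` stands in for one feed-forward pass through the recurrent network:

```python
# Run the recurrence until the output pattern stops changing (converges)
# or a maximum number of rounds is reached, mirroring the stopping
# criteria that determine the value of T.

def recurse_until_converged(step, x0, tol=1e-6, t_max=1000):
    h = x0
    for t in range(t_max):
        h_next = step(h)
        if abs(h_next - h) < tol:   # converged: stop the recursion
            return h_next, t + 1    # final output and rounds used (T)
        h = h_next
    return h, t_max                 # stopped by the round limit instead
```

For a contractive `step` such as `lambda h: 0.5 * h + 1.0`, the recursion settles near the fixed point 2.0 after a few dozen rounds.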
- a network R(t) outputs value h(t) in response to some input x(t).
- the recurrent nature of the network R(t) is shown by the loop from the top of the network R(t) to the bottom of the network R(t), depicting, as per the preferred embodiment of the present invention, the directed arcs from the output layer of the recurrent neural network to the associated nodes in the input layer.
- the recurrent neural network R(t) on the left side of the equation in FIG. 5 is unrolled into a sequence of copies R(0), . . . , R(t), as shown on the right side of the equation of FIG. 5.
- the recurrent nature of the recurrent neural network is depicted by the directed arc from the output layer of the prior copy of R to the input layer of the next copy of R (e.g., the directed arc from the output layer of R(0) to the input layer of R(1), and so on). Note that the directed arcs from the output layer of the prior copy of R to the input layer of the next copy of R show the feed-forward activation direction; back propagation for training is in the opposite direction.
- each copy R(t) also has its own copy of the target objective as well as the back propagation from the input to network R(t+1) back to the output from R(t). This feature is a special feature of this recurrent neural network architecture and is not present in unrolled recurrent neural networks in general.
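The unrolling of FIG. 5 can be sketched as follows, under the assumption that one pass through R is represented by a single `step` function (an illustrative stand-in, not the patent's network):

```python
# Unroll the recurrence: apply T+1 copies of the network, each copy
# feeding its output pattern back in as the next copy's input. In the
# architecture described here, each copy R(t) would also feed its own
# copy of the target objective (not modeled in this toy sketch).

def unrolled_forward(step, x0, T):
    """Return the outputs h(0), ..., h(T) of the unrolled copies."""
    outputs = []
    h = x0
    for t in range(T + 1):
        h = step(h)        # feed-forward through copy R(t)
        outputs.append(h)  # h(t), fed into copy R(t+1) next
    return outputs
```

Training would back-propagate through this chain in the reverse order, each copy receiving gradient both from its own objective and from the next copy's input nodes.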
- a feed-forward neural network or an unrolled recurrent neural network may be trained using an iterative training process called stochastic gradient descent with a gradient estimate and learned parameter update for each minibatch of training data.
- An epoch of this iterative training process comprises a minibatch update for all the minibatches in the full batch of training data.
- the estimate of the gradient of the objective function for each minibatch may be computed by accumulating an estimate of the gradient of the objective function for each training data item in the minibatch.
- the estimate of the gradient for a training data item may be computed by a feed-forward computation of the activation of each node in the network followed by a backwards computation of the partial derivatives of the objective function based on the chain rule of calculus.
- the backwards computation of the partial derivatives of the objective function is called “back propagation.”
- Stochastic gradient descent including the feed-forward computation, the back propagation of partial derivatives of an objective function, and unrolling a recurrent neural network are all well-known to those skilled in the art of training neural networks.
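The minibatch procedure described above can be illustrated generically. The function names are hypothetical; `grad_for_item` stands in for the feed-forward activation plus back-propagation computation for one training data item:

```python
# One epoch of minibatch stochastic gradient descent: accumulate the
# gradient estimate over the items in each minibatch, then apply one
# learned-parameter update per minibatch.

def sgd_epoch(params, minibatches, grad_for_item, lr=0.01):
    for batch in minibatches:
        grad = [0.0] * len(params)
        for item in batch:                      # accumulate per-item grads
            g = grad_for_item(params, item)
            grad = [a + b for a, b in zip(grad, g)]
        params = [p - lr * g / len(batch)       # one update per minibatch
                  for p, g in zip(params, grad)]
    return params
```

An epoch, as defined above, is one such pass over all the minibatches in the full batch of training data.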
- each copy R(t) of the network receives a back propagated partial derivative from its own copy of the objective function as well as a back propagated partial derivative from the input nodes of the next copy of the network R(t+1).
- the partial derivatives from the input nodes of copy R(t+1) comprise the combined partial derivatives from the objective for copy R(t+1) and from all higher-numbered copies R(t+k) up to copy R(T).
- the unrolled feed-forward network is only an approximate model of the recurrent neural network, because the activation computation of the nodes in a cycle of the recurrent network can, in principle, go around the cycle an infinite number of times.
- a problem called “vanishing gradient” may occur for an unrolled recurrent neural network with too large a value of T.
- the magnitudes of the back propagated partial derivatives may decrease by roughly a multiplicative factor less than 1.0 for each round of recursion, producing an exponential decay in the magnitudes of the partial derivatives.
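The exponential decay can be illustrated with a toy calculation (not part of the patented method): if the back-propagated partial derivative shrinks by a factor r < 1.0 per round of recursion, its magnitude after T rounds is r**T.

```python
# Toy illustration of the vanishing-gradient decay: one multiplicative
# shrink per unrolled copy of the recurrent network.

def backprop_magnitude(initial, r, T):
    mag = initial
    for _ in range(T):
        mag *= r          # shrink by factor r at each round of recursion
    return mag

# With r = 0.5, the gradient reaching the earliest copy is negligible
# even for moderate T:
assert backprop_magnitude(1.0, 0.5, 20) < 1e-5
```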
- the present invention is directed to computer systems and computer-implemented methods for recursively training a content-addressable auto-associative memory.
- the system comprises a set of processor cores and computer memory that is in communication with the set of processor cores.
- the computer memory stores software that when executed by the set of processor cores, causes the set of processor cores to recursively train the content-addressable auto-associative memory system with a plurality of learned parameters and with a plurality of input examples, where each input example is represented by a plurality of input variables, such that: (i) the content addressable auto-associative memory system is trained to produce an output pattern for each of the input examples; and (ii) a quantity of the learned parameters for the content-addressable auto-associative memory is equal to the number of input variables times a quantity that is independent of the number of input variables.
- the software causes the set of processor cores to train the content-addressable auto-associative memory system such that the quantity of learned parameters for the content-addressable auto-associative memory system can be varied based on the number of input examples to be learned.
- the present invention is directed to computer systems and computer-implemented methods for recursively training a recurrent neural network with a plurality of input examples.
- the computer system comprises a set of processor cores and computer memory in communication with the set of processor cores.
- the computer memory stores software that when executed by the set of processor cores, causes the set of processor cores to recursively train a recurrent neural network with a plurality of input examples, such that: (i) the recurrent neural network comprises a deep neural network that comprises N+1 layers, numbered 0, . . .
- the recurrent neural network is trained to produce an output pattern for each of the input examples;
- a target for the output pattern for each input example is the input example;
- the recurrent neural network comprises a plurality of directed arcs, wherein at least some of the directed arcs are between a node in one layer of the recurrent neural network and a node in another layer of the recurrent neural network.
- the software stored by the computer memory causes the set of processor cores to recursively train the recurrent neural network such that: (i) the recurrent neural network is trained to produce an output pattern for each of the input examples; and (ii) a quantity of learned parameters for the recurrent neural network is equal to the number of input variables times a quantity that is independent of the number of input variables.
- the software can cause the set of processor cores to train the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, such that the quantity of learned parameters for the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, can be varied based on the number of input examples to be learned. Also, the software can cause the set of processor cores to train the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, by back propagating partial derivatives of a loss function through the content-addressable auto-associative memory system or the recurrent neural network.
- the software can cause the set of processor cores to train the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, by for each input example: randomly transforming the input example; and recursively providing the randomly transformed input example to the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, for training, until an output of the content-addressable auto-associative memory system converges to the input example.
- the random transformations of the input examples can comprise one or more of: translating the input example; rotating the input example; linearly transforming the input example; degrading the input example; and subsampling the input example.
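A toy sketch of such random transformations, applied to a one-dimensional input example, is given below. Real embodiments would operate on images, documents, or acoustic data; the function name and the specific forms of translation, linear transformation, degradation, and subsampling are illustrative assumptions only:

```python
import random

# Apply the categories of random transformation listed above to a list
# of input values, with the amount of each transformation scaled by the
# hyperparameter delta (delta = 0 leaves the input unchanged).

def random_transform(x, delta, rng=random):
    shift = rng.randint(0, int(delta * len(x)))          # translation
    x = x[shift:] + x[:shift]
    scale = 1.0 + rng.uniform(-delta, delta)             # linear transform
    x = [scale * v for v in x]
    x = [v + rng.gauss(0.0, delta) for v in x]           # additive noise
    keep = max(1, int(len(x) * (1.0 - delta)))           # subsampling
    idx = sorted(rng.sample(range(len(x)), keep))
    return [x[i] for i in idx]
```

With delta = 0 every step degenerates to the identity, matching the embodiments in which no transformation is performed.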
- the computer memory may store software that when executed by the set of processor cores further causes the set of processor cores to train the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, with negative input examples.
- the negative input examples may comprise input examples where the output of the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, in operation, does not converge to an input example.
- At least some of the input examples are labeled examples that have, for each such input example, a classification category label such that the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, is trained to act as a classifier.
- the classification category labels may comprise error-correcting encoding.
- embodiments of the auto-associative memory system described herein can be content-addressable in the sense that an item in the memory may be retrieved by a description or example rather than by knowing the address of the item in memory.
- the auto-associative memory is associative in that an item can be retrieved with a query based on an example of an associated item rather than by an example of the item itself.
- the auto-associative memory is also robust in that an item can be retrieved from a query that is a transformed, distorted, noisy version of the item to be retrieved.
- the auto-associative memory can be used, for example, to retrieve images, documents, acoustic files, etc. from inputs that can be small pieces of the images, documents, acoustic files, etc.
Abstract
Computer systems and computer-implemented methods recursively train a content-addressable auto-associative memory such that: (i) the content addressable auto-associative memory system is trained to produce an output pattern for each of the input examples; and (ii) a quantity of the learned parameters for the content-addressable auto-associative memory is equal to the number of input variables times a quantity that is independent of the number of input variables. The quantity of learned parameters for the content-addressable auto-associative memory system can be varied based on the number of input examples to be learned.
Description
- The present application claims priority to U.S. provisional patent application Ser. No. 62/564,754, entitled “Aggressive Development with Cooperative Generators,” filed Sep. 28, 2017, which is incorporated herein by reference in its entirety.
- In information retrieval and in data science it is often necessary to retrieve an item from a large collection of data without knowing where the data item is stored. Often there is only an incomplete, partial description of the item being sought. The data collection may contain billions or even trillions of items, so an exhaustive search of the collection may be prohibitively expensive and time consuming. In some cases, it is not known whether an exact match of a desired item is in the collection. In some cases, it is merely necessary to find any item that matches a query to some degree.
- A memory system that can retrieve a data item based on a description of its content rather than by knowing its location is called a “content addressable” memory. If, furthermore, the retrieval can be done in spite of partial or imperfect knowledge of the contents, the memory system is said to be a “robust content addressable” memory. For example, a Hopfield network may be used as a robust content addressable memory. In addition, such a Hopfield network is auto-associative. A variation on a Hopfield network, called a “Bi-directional associative memory (BAM)” is hetero-associative. However, the number of learned parameters (a measure of memory capacity) in a Hopfield network is just the number of undirected arcs, which is to say, the number of unordered pairs of input variables. That is, the number of learned parameters of a Hopfield network is roughly one-half the square of the number of input variables. For a BAM, the number of learned parameters is the product of the number of input variables times the number of output variables.
- There is no flexibility in the number of learned parameters for these networks. In either case, the capacity of the memory is determined by the number of input and output variables rather than by the number of data items to be learned. In an auto-associative memory, the number of scalar values to be represented is the product of the number of data items times the number of input variables. Thus, with, say, 100 input variables, a Hopfield network does not have the capacity to learn a database with, say, 100,000,000 data items. On the other hand, a high-resolution color image may have twenty-million pixels in three colors, so the number of learned parameters in a Hopfield network would be on the order of two hundred trillion for a black-and-white image or sixteen quadrillion for the full color image, which would be totally impractical.
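The capacity arithmetic above can be made explicit (a simple illustration; the function names are ours, not the patent's): a Hopfield network has one weight per unordered pair of input variables, and a BAM one weight per (input, output) pair.

```python
# Parameter counts for the fixed-capacity associative memories
# discussed above.

def hopfield_params(n_inputs):
    """One weight per unordered pair of inputs: roughly n^2 / 2."""
    return n_inputs * (n_inputs - 1) // 2

def bam_params(n_inputs, n_outputs):
    """One weight per (input, output) pair."""
    return n_inputs * n_outputs

# A 20-megapixel black-and-white image gives roughly two hundred
# trillion Hopfield weights, as noted in the text:
assert hopfield_params(20_000_000) // 10**12 == 199
```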
- In addition, Hopfield networks and BAMs are connected via undirected arcs and, unlike deep layered neural networks, do not have hidden layers and are not trained by the highly successful back propagation computation commonly used in deep learning.
- The present invention, in one general aspect, uses a machine learning system, such as a deep neural network, to implement a robust, content-addressable auto-associative memory system. The memory is content-addressable in the sense that an item in the memory system may be retrieved by a description or example rather than by knowing the address of the item in memory. The memory system is robust in that an item can be retrieved from a query that is a transformed, distorted, noisy version of the item to be retrieved. An item may also be retrieved based on an example of only a small portion of the item. The associative memory system may also be trained to be a classifier. The memory system is recurrent and auto-associative because it operates by feeding its output back to its input.
- The memory system may be based on a layered deep neural network with an arbitrary number of hidden layers. The number of learned parameters may be varied based on the number of data items to be learned or other considerations. Embodiments based on such deep neural networks may be trained by well-known deep learning techniques such as stochastic gradient descent based on back propagation.
- These and other benefits realizable with the present invention will be apparent from the description below.
- Various embodiments of the present invention are described herein by way of example in conjunction with the following figures.
- FIG. 1 is a block diagram of an illustrative embodiment of the invention described herein.
- FIG. 1A is a block diagram of another aspect of an illustrative embodiment of the invention.
- FIG. 1B is a flow chart of a process for training the associative memory according to various embodiments of the present invention.
- FIGS. 2A and 2B are flow charts of aspects of an illustrative process for training various embodiments of the invention.
- FIG. 3 is a diagram of a computer system for implementing various embodiments of the invention.
- FIG. 4 is a diagram of a type of neural network that may be used in various embodiments of the invention.
- FIG. 5 is a diagram depicting unrolling of a recurrent network.
FIG. 1 is an illustrative embodiment of a robust auto-associative memory system 100 with corrective training. The associative memory unit 104 is a large machine learning system, for example a deep neural network, such as the example deep neural network illustrated in FIG. 4. The machine learning system 104, by itself, may be a deep feed-forward neural network. It becomes recurrent and auto-associative because of the feedback 99 from the output prediction of the full pattern (105) back to the input to the machine learning system 104. Because the machine learning system 104 may, for example, be a layered, deep, feed-forward neural network, it may have an arbitrary number of learned parameters. The learned parameters may comprise trainable parameters for the machine learning system 104, such as the weights of directed arcs in a deep, feed-forward neural network and/or activation biases for nodes in the network. In an auto-associative memory, the number of learned parameters needs to be the number of data items times the number of variables in each data item, perhaps times a small multiple (e.g., a number greater than but close to 1.0) to allow for redundancy and robustness. The number of learned parameters in the machine learning system 104 may be adjusted to meet this requirement. In particular, the number of learned parameters may be based in part on the number of data items to be learned (times the number of input variables). The factor that is based on the number of data items may be either much smaller or much larger than the number of input or output variables. Furthermore, extra learned parameters may be used to achieve better performance. In addition, the machine learning system 104 may have any number of layers, enabling it to compute arbitrarily complex functions, for example to better model and compensate for transformations and degradations to the input.
If the machine learning system 104 is a deep neural network, the recurrence due to the feedback from the output 105 back to the input of network 104 may be unrolled by making multiple copies of the network 104, producing a single, large feed-forward neural network. This feed-forward deep neural network has directed arcs rather than undirected arcs as in a Hopfield network or in a bi-directional associative memory (BAM). Thus, it may be trained by stochastic gradient descent, computed by feed-forward activation and back propagation of the partial derivatives of the objective. Deep neural networks and the methods of stochastic gradient descent, unrolling the recurrence, feed-forward activation, and back propagation of partial derivatives are discussed in association with FIG. 4. Embodiments of the machine learning system 104 other than deep neural networks may also have as many learned parameters as necessary, including having the number of learned parameters be based in part on the number of data items to be learned.
- The task of an auto-associative memory is to memorize its training data. A robust auto-associative memory 100 not only memorizes its training data, it is able to retrieve an example from its training data given only a partial input or a degraded input. In FIG. 1, the dashed arrows leading into the associative memory 104 represent its backpropagation training, and the solid arrows represent feed-forward activation, with backpropagation proceeding in the direction opposite to feed-forward activation. If the associative memory unit 104 is not a deep neural network, a similar training process may be used if the machine learning system implementing the associative memory 104 can be trained to avoid producing negative examples as well as to produce positive examples.
- An illustrative example of a
computer system 300 that may perform the computations associated with FIG. 1 and other figures is shown in FIG. 3.
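The fixed-point retrieval from a partial or degraded input can be sketched as below. This is a toy stand-in: the nearest-pattern projection merely imitates what a trained associative memory unit 104 would compute, and the function names are illustrative assumptions; it also includes the infinite-cycle check discussed later in connection with FIG. 1A.

```python
import numpy as np

# Toy stand-in for the trained network 104: project the input toward
# the closest memorized pattern (a real system would use a trained
# deep network here).
PATTERNS = np.array([[1., 1., -1., -1.],
                     [-1., 1., -1., 1.]])

def memory_step(x):
    dists = np.sum((PATTERNS - x) ** 2, axis=1)
    return PATTERNS[np.argmin(dists)]

def retrieve(x, max_iters=50):
    """Feed the prediction (105) back as input (via 99) until a fixed
    point is reached; also detect an infinite cycle by checking whether
    an output repeats the output of a previous round."""
    seen = set()
    for _ in range(max_iters):
        y = memory_step(x)
        if np.array_equal(y, x):
            return y                      # converged: a memorized pattern
        key = y.tobytes()
        if key in seen:
            return None                   # infinite cycle detected
        seen.add(key)
        x = y
    return x

# Retrieval from a degraded, partial input recovers the stored pattern.
print(retrieve(np.array([0.9, 1.1, 0., -1.])))
```

The memorized patterns are exactly the fixed points of `memory_step`, which is the property the recursion exploits.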
FIG. 1 shows a system diagram of a robust auto-associative memory machine learning system 100 configured for an illustrative embodiment of an aspect of the invention in which a computer system 300 performs preliminary training of a robust associative memory 104, with a moderate level of transformations and data augmentations of the input 101 optionally performed by the computer system 300 in step 102. A training process with more extreme transformations and data augmentations is discussed in association with an illustrative flow chart in FIGS. 2A and 2B. In either the preliminary training process illustrated in FIG. 1 or the process illustrated in FIGS. 2A and 2B, the number of learned parameters in the associative memory unit may be increased as necessary to achieve the required performance. For example, the number of learned parameters may be varied in proportion to the number of data items to be learned. If the associative memory unit 104 is a deep neural network, the number of nodes in a layer may be increased or the number of layers may be increased. An illustrative example of an associative memory machine learning system during operational use is shown in FIG. 1A.
- In the illustrative embodiment of
FIG. 1, the computer system 300 presents each training example (101) multiple times to the associative memory 104, each time with zero or more randomly generated transformations, degradations, and subsampling (102). As indicated by the dotted arrow from 101 to 104, in some embodiments, there is no transformation done in step 102. In some embodiments, the original input data pattern (101) may be transformed (102) by translation, rotation, or any linear transformation. In some embodiments, the original data (101) may be degraded (102) by additive or multiplicative noise. In some embodiments, the input (101) may be changed in other ways that are used for data augmentation (102) in training classifiers. Data augmentation is well-known to those skilled in the art of training machine learning systems. In some embodiments, only a subset of the original input variables (101) or only a subset of the transformed input variables (102) is retained for the next transformation or to be sent to the associative memory 104. In some embodiments, each of these types of transformations may be used in combination with the others, with each type of transformation possibly used multiple times, as indicated by the arrow from block 102 back to itself.
- In some embodiments, the amount of a translation, rotation, or other transformation at
step 102 may be characterized by a parameter, such as the distance of the translation or the angle of the rotation. This transformation-characterizing parameter may be controlled directly as a hyperparameter, or its maximum magnitude may be controlled by a hyperparameter. Hyperparameter values may be set by the system designer or may be controlled by a second machine learning system, called a "learning coach." A learning coach may also adjust the number of learned parameters based on the number of data items to be learned, such as, for example, by adding nodes and/or layers, thereby increasing the number of arcs that need weights and the number of nodes that need biases.
- A learning coach is a second machine learning system that is trained to help manage the learning process of a first machine learning system. Learning coaches are described in more detail in the following applications, which are incorporated herein by reference in their entirety: published PCT application Pub. No. WO 2018/063840 A1, published Apr. 5, 2018, entitled "LEARNING COACH FOR MACHINE LEARNING SYSTEM"; and PCT Application No. PCT/US18/20887, filed Mar. 5, 2018, entitled "LEARNING COACH FOR MACHINE LEARNING SYSTEM."
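The hyperparameter-bounded transformations of step 102 can be sketched as follows on a 1-D pattern. This is an illustrative sketch only: the function name and the specific bounds are assumptions, and translation/noise/deletion stand in for the broader family of transformations named above.

```python
import numpy as np

def augment(x, max_shift=2, noise_std=0.1, drop_frac=0.2, rng=None):
    """Randomly transform a pattern, with each transformation's
    magnitude capped by a hyperparameter: translation (max_shift),
    additive noise (noise_std), and deletion of a subset of the
    input variables (drop_frac)."""
    rng = rng or np.random.default_rng()
    y = np.roll(x, rng.integers(-max_shift, max_shift + 1))  # translation
    y = y + rng.normal(0.0, noise_std, size=y.shape)         # degradation
    keep = rng.random(y.shape) >= drop_frac                  # subsampling
    return np.where(keep, y, 0.0)

x = np.linspace(-1.0, 1.0, 8)
print(augment(x, rng=np.random.default_rng(0)))
```

Setting every bound to zero reproduces the no-transformation case indicated by the dotted arrow from 101 to 104.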
- In the embodiment illustrated in
FIG. 1, the computer system 300 trains the robust auto-associative memory machine learning system 100 to predict the full original pattern 105 with the original, untransformed input data example as its target 106. The computer system 300 also feeds back (via line 99) the predicted pattern 105 as input to the associative memory 104. That is, the associative memory unit 104 is a recursive system, with its output repeatedly fed back to its input. The objective of the recursion is to produce a better match between the predicted pattern 105 and the target 106 with each round of the recursion. With each round of feedback, the computer system 300 causes the associative memory unit 104 to refine its prediction 105 of the full, undegraded pattern 106. In this recursion, the computer system 300 may cause the associative memory 104, for example, to recover some of the missing parts of the target 106 and to remove part of the noise and distortion from the pattern 105 in the first round of the recursion. With that more complete, somewhat cleaner input, the computer system 300 may then cause the associative memory unit 104 to recover more of the original pattern 106 in the next round, and so on, until a stopping point is met, i.e., the output of the associative memory 104 satisfactorily matches the original pattern. The memorized patterns are the fixed points of this recursive process. In some embodiments, examples of the input 101 and of the prediction 105 for later rounds of the recursion are saved by the computer system 300 in storage 109. The choice of when a pattern should be saved may be made by fixed rules set by the system developer or may be made by a learning coach.
- In that connection,
FIG. 1B is a flow chart illustrating the training process of the associative memory system 100 according to various embodiments. At step 120, the computer system 300 selects the first epoch of training data. At step 121, the computer system 300 selects one of the training examples from the original epoch and transforms the selected training example at step 122. As described above in connection with block 102 of FIG. 1, the transformation at step 122 may comprise a transformation, augmentation, subsampling, etc. of the selected training example. At step 123, the computer system 300 trains the associative memory unit 104 with the transformed training example. Step 123 may be repeated recursively as described above, as indicated by the feedback loop from step 124, until the stopping point is met (e.g., the output of the associative memory unit 104 satisfactorily matches the training example). Once the stopping point is met for training the associative memory on the transformed training example, the process advances to step 125. At step 125, if the selected training example is to be transformed in a different way, the process returns to step 122, where the training example selected at step 121 is transformed in a second (or additional) random manner, and then the associative memory 104 is recursively trained at steps 123-124 with the training example transformed in that second (or additional) manner. This second, outer loop (i.e., steps 122 to 125) can be repeated for a desired number of transformations of a training example. That number may be a fixed number (e.g., set by the system designer) or controlled by a hyperparameter. Step 125 is also optional. That is, in other embodiments, a training item selected at step 121 is only transformed once for training the associative memory.
- If there are no more transformations to be made to the selected training example at
step 125, or if step 125 is omitted, the process advances to step 126, where, as shown by the feedback loop from step 126 back to step 121, the process is repeated for the next training example in the epoch, and so on until the process has been performed for all of the training examples in the epoch. The process is then repeated for multiple epochs until the training has converged or another stopping criterion, such as a specified number of epochs, is met, as indicated by the feedback loop from the decision step 127 back to step 120.
- The use of the fixed points of the recursive process to represent memorized data items results in the robustness and other remarkable properties of the robust auto-associative memory system 100. For example, with the recursive process, an entire complex image may be recovered from a small piece of the image if the small piece occurs in only one image in the set of images memorized by the robust auto-associative memory unit 100. Similarly, a memorized document (e.g., word processing document, pdf file, spreadsheet, presentation, etc.) may be recovered from a small, unique portion of text; or an audio recording may be recovered from a small interval of sound. As already mentioned, the auto-associative memory system 100 may be trained to be robust against a wide variety of transformations or noise. As another example, a work of art or a photograph may be retrieved from a memorized database from a sketch-like query. This robustness and other properties are further enhanced by another training process for which an illustrative embodiment is discussed in association with FIGS. 2A and 2B.
- The training process illustrated in
FIGS. 1 and 1B also may include negative feedback from negative examples, such as 107. For example, the recursive function implemented by the associative memory 104 may have other fixed points in addition to the memorized training data examples 101. In some embodiments, the computer system 300 may train the associative memory unit 104 to eliminate such extra fixed points by training it not to generate them as output, using negative feedback from the undesired fixed points as negative examples 107. Another use of negative feedback is for the computer system to cause the auto-associative memory unit 104 to forget or erase the memory of a pattern it has previously been trained to remember. As another example, in some embodiments of this invention, the computer system 300 can train a separate associative memory (not shown) to learn all the patterns in one classification category but not to learn examples from any other category. The negative examples then give negative feedback to the separate associative memory when it outputs a pattern that matches a different category than the intended category, for example, as judged by an independent classifier (not shown). Other uses of negative examples are discussed in association with FIGS. 1A, 2A and 2B.
- Normal feedback may be represented by a loss function that has its minimum at a data item that is an intended target or positive example. Negative feedback may be represented by a loss function that has its maximum at a data item that is a negative example. For either type of feedback, the partial derivatives of the loss function may be computed by
computer system 300 using back propagation if the associative memory 104 is a neural network. Back propagation is well-known to those skilled in the art of training neural networks and is discussed in association with FIG. 4. If the associative memory unit 104 is some other type of machine learning system that supports training on negative examples, then preferably the computer system 300 may train the associative memory unit 104 on negative examples 107 by whatever method is appropriate for the type of machine learning system being used in the associative memory 104.
- In some embodiments, the number of learned parameters for the
associative memory 104 may be adjusted during the learning process. For example, such a change may be based on testing the performance of the associative memory 104 on new data not used in the training. The number of learned parameters may be increased to increase the capacity of the memory 104, or the number of learned parameters may be decreased if testing reveals that there is spare capacity. Where, for example, the associative memory 104 comprises a deep neural network, additional learned parameters may be added by adding additional directed arcs, by adding additional nodes, or by adding additional layers to the network.
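A simple capacity-control rule of this kind might be sketched as follows. This is purely illustrative: the thresholds, step size, and function name are assumptions, not values from the disclosure.

```python
def adjust_capacity(hidden_size, held_out_error, low=0.01, high=0.10, step=32):
    """Grow the network (more nodes, hence more arc weights and biases)
    when held-out error suggests capacity is exhausted; shrink it when
    very low error suggests spare capacity; otherwise leave it alone."""
    if held_out_error > high:
        return hidden_size + step
    if held_out_error < low:
        return max(step, hidden_size - step)
    return hidden_size

print(adjust_capacity(256, 0.15))  # high held-out error -> grow the layer
```

In practice such a rule could be applied per layer, or a learning coach could be trained to make the adjustment instead of a fixed threshold.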
FIG. 1A is a system diagram of the robust auto-associative memory machine learning system 100 of FIG. 1 in operational use. FIG. 1A is an illustration of the same system as illustrated in FIG. 1, except configured for operational use rather than for training. In particular, the components shown in FIG. 1A are the same as the corresponding components in FIG. 1. However, as shown in FIG. 1A, during operational use, the computer system 300 provides any input 103 to the associative memory 104, not just training data items 101 and their transformations 102, since in operational use, the purpose of the associative memory is to output a pattern 105 that corresponds to a pattern from its training examples 101 (see FIG. 1) in response to any input 103.
- In the embodiment illustrated in
FIG. 1A, the computer system 300 provides the input 103 directly to the associative memory 104. That is, there is no distortion, etc., as in block 102 of FIG. 1. Having presented the associative memory 104 with an input data item 103, the computer system 300 computes the output of the associative memory 104. For example, if the associative memory is implemented as a deep neural network, the computer system 300 does a feed-forward activation computation from the input 103 to compute the activations of the nodes in the network 104 and then the activations of the output nodes 105. The feed-forward activation computation is explained in association with FIG. 4 and is well-known to those skilled in the art of deep neural networks.
- Once
computer system 300 has computed the output 105 of the associative memory 104, the computer system 300 applies the output 105 recursively as input to the associative memory 104. The computer system 300 repeats this recursion until the recursion converges or a stopping condition is met. A possible stopping condition is detection of an infinite cycle. An infinite cycle may be detected by observing that an output for a cycle is identical to the output for some previous cycle. In some embodiments, the computer system 300 may save an output example 109 at convergence or at any stage of the recursive process, including the input 103, for use in later training or for detection of an infinite cycle. In some embodiments, all stages of the recursion process are saved. In some embodiments, examples to be saved may be selected by a learning coach.
- Although the
associative memory 104 is a machine learning system, such as a deep neural network, its task is not to classify its input, as in most machine learning tasks. Instead, as its name implies, the associative memory 104 has the task of retrieving from its "memory" of the training data 101 (see FIG. 1) a training data item 105 that is "associated" with the input 103. The set of input items 103 that may be associated with a specific training data item 105 is the set of input items 103 for which the recursion converges to that specific training data item.
- If the recursion converges to an output that is not a training data item, in some embodiments the
computer system 300 may save the output at convergence in storage 109 for later use in training as a negative example 107 of FIG. 1. In some embodiments, the computer system 300 may save some or all of the data items that occur in an infinite cycle as negative examples.
- If the recursion converges to an output that is a training data example, the
computer system 300 may save the input and one or more of the intermediate stage outputs in association with the training data example as positive examples 108 for future training.
- For either negative or positive examples transferred by
computer system 300 to block 108, if the learned parameters of the associative memory 104 are being trained during operation, such as with adaptive training, then the computer system also saves in block 108 a snapshot of the current values of the learned parameters of the associative memory unit 104 and links the snapshot to the corresponding positive or negative example.
- A negative example saved in
block 108 of FIG. 1A is available for use as a negative example 107 in FIG. 1.
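The two kinds of feedback described above can be sketched as loss terms. This is a hedged sketch: the hinge-with-margin form for the negative term is one common choice, not necessarily the one used in the disclosure, and the names are illustrative.

```python
import numpy as np

def positive_loss(pred, target):
    """Minimized (zero) exactly when the prediction matches the
    positive example / intended target."""
    return float(np.sum((np.asarray(pred) - np.asarray(target)) ** 2))

def negative_loss(pred, neg, margin=1.0):
    """Maximal when the prediction sits on the negative example and
    zero once the prediction is at least `margin` away, so gradient
    descent pushes the output away from undesired fixed points."""
    d = float(np.linalg.norm(np.asarray(pred) - np.asarray(neg)))
    return max(0.0, margin - d) ** 2
```

Both terms are differentiable almost everywhere, so their partial derivatives can be back-propagated through a neural network in the usual way.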
FIGS. 2A and 2B together represent a flow chart of an illustrative embodiment of another training process. The training process illustrated in FIGS. 2A and 2B may be based on a system such as the illustrative system 100 shown in FIG. 1, but the training process illustrated in FIGS. 2A and 2B differs from the training process described above in association with FIG. 1 in several important aspects. For example, in the illustrative embodiment of FIGS. 2A and 2B, the training process may make use of classification category labels, and the robust auto-associative memory system 100 of FIG. 1 may be used as a classifier. In another aspect, the training process illustrated in FIGS. 2A and 2B is an iterative process that actively increases the amount or degree of the transformations 102 applied to the input 101 and tests to see if the amount or degree of a transformation 102 is too great.
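The iterative, delta-increasing loop of FIG. 2B can be sketched at a high level like this. This is an illustrative skeleton with pluggable `train` and `evaluate` callables; the names, the step size, and the keep-the-best policy are assumptions.

```python
def train_with_increasing_delta(train, evaluate, delta_step=0.1, max_rounds=20):
    """Raise the transformation limit `delta` each round; stop and
    back up to the best model once held-out performance degrades."""
    delta, best_score, best_state = 0.0, float("-inf"), None
    for _ in range(max_rounds):
        state = train(delta)          # step 211: train at the current delta
        score = evaluate(state)       # steps 212/216: measure on dev data
        if score <= best_score:       # steps 213/217: compare performance
            break                     # degradation: keep best, terminate
        best_score, best_state = score, state
        delta += delta_step
    return best_state, best_score

# Toy example: performance peaks when delta reaches about 0.3.
state, score = train_with_increasing_delta(
    train=lambda d: d, evaluate=lambda d: -(d - 0.3) ** 2)
```

In the toy example the loop correctly stops after performance peaks and returns the best-performing state rather than the last one.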
FIG. 2B illustrates the iterative aspect of the training process. FIG. 2A illustrates some aspects of the initial part of the training process and some aspects that apply to the training process as a whole rather than to individual steps in the process.
- In
step 200 of FIG. 2A, the computer system 300 obtains a preliminary auto-associative memory system, such as the system 100 illustrated in FIG. 1. The computer system 300 may obtain the preliminary auto-associative memory system by using the training process discussed in association with FIGS. 1 and 1B, or may obtain the preliminary auto-associative memory from a previous use of the training process illustrated in FIGS. 2A and 2B. Preferably, the previous training of the preliminary auto-associative memory system 100 is done using the same training data set as is to be used in the current training, or a subset of the training data set to be used in the current training. In some embodiments, the preliminary auto-associative memory system 100 may be trained to "forget" a previous training data example by using that previous example as a negative training example. In some embodiments, the preliminary auto-associative memory system 100 may be untrained; for example, it may be a deep neural network with random initialization of the connection weights. Such random initialization is well-known to those skilled in the art of training deep neural networks.
- In
step 201, the computer system 300 obtains training data that is labeled with classification category labels. The computer system 300 may store the training data in memory. In step 202, the computer system 300 sets aside some of the training data obtained at step 201. The data set aside at step 202 may be used for development testing by a learning coach.
- The
computer system 300 applies rules and procedures 203, 204, and 205 of FIG. 2A during the training process. In procedure 203, the computer system 300 uses the auto-associative memory system 100 (e.g., the one obtained at step 200) as a classifier. As such, at least some of the training data items obtained at step 201 are labeled with a classification category. In operation, the illustrative embodiment of the auto-associative memory system 100 shown in FIG. 1A accepts any input 103 and attempts to compute an output pattern 105 that closely matches a training data item. In step 201, the computer system has obtained training data with classification category labels. In some embodiments of procedure 203, the computer system 300 includes the classification category, or an identifier associated with the classification category, as part of the input data item to the auto-associative memory unit 104 and retrieves that information as part of the output pattern 105. The identifier associated with a training data item may be a hash code or other index with check sums or error-correcting encoding. The computer system uses this redundant encoding to determine its estimate of the index of the actual item or of the classification category of the untransformed input 101. Thus, in this illustrative embodiment, the auto-associative memory system 100 acts as a classifier.
- In
procedure 204, the computer system 300 may control the maximum amount or degree to be allowed of a transformation or data augmentation in step 102 of FIG. 1. The maximum amount allowed may be represented by a hyperparameter "delta." In some embodiments, the maximum magnitude of a translation may be a specified function of delta; the maximum degree of rotation may be some other specified function of delta; the maximum standard deviation of noise may be yet another specified function of delta; the maximum fraction of input variables that may be deleted may be a specified function of delta; and/or the maximum degree of each other transformation may be a specified function of delta. In some embodiments, these specified functions may be set by the system developer. In some embodiments, a learning coach may be trained to optimize these specified functions. The hyperparameter delta as implemented in this illustrative embodiment may be used in step 211 of the iterative training loop shown in FIG. 2B.
- In operation of the auto-associative memory system 100 in the embodiment illustrated in FIG. 1A, at convergence the output pattern 105 may fail to be any of the training data items. In this case, under procedure 205 in FIG. 2A, the computer system 300 may train the auto-associative memory 100 using the unlabeled converged output pattern 105 as a negative example.
- After
steps 200-202 and procedures 203-205 of FIG. 2A, the training process proceeds to the iterative loop illustrated in FIG. 2B, beginning at step 211. In step 211, the computer system 300 trains the auto-associative memory 100 with a larger value of delta than has been used in previous passes through the loop from step 211 to step 217, if there have been any previous passes. In some embodiments, the number of learned parameters may also be adjusted in step 211. The number of learned parameters may be either increased or decreased, as indicated by the development testing in steps 212 and 216.
- In the first pass through the loop from
step 211 to step 217, preferably delta is set at a small value. In some embodiments, in the first pass the value of delta is set to zero, representing that no transformations, data augmentations, or input variable deletions are to be performed in step 102 of FIG. 1. In step 211 during later passes through the loop from step 211 to step 217, the computer system 300 increases the value of delta by an amount controlled by a hyperparameter.
- In
step 212, the computer system 300 measures the performance of the auto-associative memory system 100 acting as a classifier. Preferably, this performance measurement is made on development data that has been set aside from the training data, as specified in step 202 of FIG. 2A.
- In
step 213, the computer system 300 compares the performance measured in step 212 with measures of performance from previous passes through the loop from step 211 to step 217, or from the preliminary auto-associative memory obtained in step 200 of FIG. 2A. If there is degradation by more than a specified amount, the computer system 300 proceeds to step 214; otherwise the computer system returns to step 211. In some embodiments, the computer system 300 allows no degradation in performance in step 213. In other words, in such embodiments, if performance has not degraded at all, the process returns to step 211 to continue with a larger value of delta; if there is any degradation (i.e., more than zero), the process advances to step 214. In other embodiments, some degradation in performance is allowed, but the limit on the amount of degradation is measured against the best previous performance, not just against the performance of the immediately preceding pass through the loop from step 211 to step 217.
- In
step 214, the computer system generates data according to step 102 of FIG. 1 with various types of transformations, data augmentations, noise, and input variable subsampling at or near the limiting value delta. In step 214, the computer may also select data that was generated in step 211. In other words, in step 214, the computer system is generating or selecting transformed input examples that would not have been generated in previous passes through the loop from step 211 to step 217 with smaller values of delta. Thus, these data examples would not have been generated in the previous iteration, which had better performance than the performance measured in step 212 of the current pass through the loop. Some of the data examples generated or selected in step 214 in the current round may be responsible for the degradation in measured performance. In step 214, the computer system is attempting to replicate on training data the kind of degradation in performance that was measured on development data in step 212.
- In
step 215, the computer system classifies the data generated in step 214 and counts as a misclassification any data 102 that is classified with a category different from the category of the untransformed data 101. Under control of hyperparameters, some or all of these misclassifications are used for training as negative examples.
- In
step 216, the computer system again measures the performance of the auto-associative memory system 100 acting as a classifier, with the performance measured on set-aside development data. In some embodiments, the same set of set-aside development data may be used for the performance comparisons in step 213 so that the performance difference does not depend on differences in the data on which the performance is measured. In some embodiments, a second set of set-aside development data is used to confirm the cumulative progress of multiple passes through the loop from step 211 to step 217.
- In
step 217, the computer system again compares the performance of the current system with the previous performance. If there has been an improvement in performance, the computer system returns to step 211 to continue the process with a larger value of delta. If there has been no improvement in performance, the computer system backs up the learned parameter values to the best-performing values previously obtained and terminates the training process.
- In some embodiments, there may be other remedial actions to reduce the classification errors caused when delta is increased. In such embodiments, the computer system may experimentally try some of these other remedial actions, returning to step 214 multiple times to try to find an improvement and proceeding to step 211 if one is found, before eventually deciding to proceed to step 218. For example, in some embodiments, a learning coach may modify the specified function of delta for one or more types of transformation, eliminating or limiting the amount of increase in such a transformation. In some embodiments, the computer system may merely return to step 214 multiple times to generate more negative examples.
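The misclassification test of step 215 can be sketched as follows. This is an illustrative sketch: `classify` and `transform` are assumed, pluggable stand-ins for the associative-memory classifier and the step-102 transformations.

```python
def collect_negative_examples(classify, transform, labeled_data):
    """Transformed inputs whose predicted category differs from the
    category of the untransformed input count as misclassifications
    and become negative examples for further training (step 215)."""
    negatives = []
    for x, category in labeled_data:
        xt = transform(x)
        if classify(xt) != category:
            negatives.append((xt, category))
    return negatives

# Toy example: a sign-based classifier; a large shift flips one category.
negs = collect_negative_examples(
    classify=lambda v: "+" if v >= 0 else "-",
    transform=lambda v: v + 0.8,   # transformation near the delta limit
    labeled_data=[(0.4, "+"), (-0.7, "-")])
```

Under control of hyperparameters, some or all of the collected pairs would then be fed back as negative examples 107.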
- Although the auto-associative memory unit 104 has generally been described as a single neural network, such as shown in FIG. 4, many variations are possible within the scope and spirit of the invention. As already mentioned, other types of machine learning systems may be used if they support back propagation of partial derivatives of the objective or if they merely support training against negative examples by some other means. As another example, the auto-associative memory unit 104 may be an ensemble of neural networks or other machine learning systems. As a particular example, with labeled training data, a separate auto-associative memory unit may be trained for each classification category with data consisting exclusively of that single classification category.
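A per-category ensemble of this kind might be sketched as follows. This is an illustrative sketch; the selection rule, choosing the category whose memory best reproduces the input, is an assumption about one reasonable design, not the disclosed method.

```python
def ensemble_classify(memories, x):
    """One associative memory per category: run each memory's
    retrieval on the input and pick the category whose retrieved
    pattern is closest to the input."""
    def residual(retrieve):
        y = retrieve(x)
        return sum((a - b) ** 2 for a, b in zip(x, y))
    return min(memories, key=lambda category: residual(memories[category]))

# Toy stand-ins: each "memory" snaps any input to that category's
# single stored pattern.
memories = {
    "A": lambda x: [1.0, 1.0, 0.0],
    "B": lambda x: [0.0, 1.0, 1.0],
}
print(ensemble_classify(memories, [0.9, 1.0, 0.1]))
```

An independent classifier, as mentioned above, could play the same tie-breaking role when two memories reproduce the input equally well.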
FIG. 3 is a diagram of a computer system 300 that could be used to implement the embodiments described above, such as the process described in FIG. 1. The illustrated computer system 300 comprises multiple processor units 302A-B that each comprises, in the illustrated embodiment, multiple (N) sets of processor cores 304A-N. Each processor unit 302A-B may comprise on-board memory (ROM or RAM) (not shown) and off-board memory 306A-B. The on-board memory may comprise primary, volatile and/or non-volatile, storage (e.g., storage directly accessible by the processor cores 304A-N). The off-board memory 306A-B may comprise secondary, non-volatile storage (e.g., storage that is not directly accessible by the processor cores 304A-N), such as ROM, HDDs, SSDs, flash, etc. The processor cores 304A-N may be CPU cores, GPU cores and/or AI accelerator cores. GPU cores operate in parallel (e.g., in a general-purpose GPU (GPGPU) pipeline) and, hence, can typically process data more efficiently than a collection of CPU cores, but all the cores of a GPU execute the same code at one time. AI accelerators are a class of microprocessor designed to accelerate artificial neural networks. They typically are employed as a co-processor in a device with a host CPU 310 as well. An AI accelerator typically has tens of thousands of matrix multiplier units that operate at lower precision than a CPU core, such as 8-bit precision in an AI accelerator versus 64-bit precision in a CPU core.
- In various embodiments, the different processor cores 304 may train and/or implement different networks or subnetworks or components. For example, in one embodiment, the cores of the
first processor unit 302A may implement the auto-associative memory 104 and the second processor unit 302B may implement the learning coach. For example, the cores of the first processor unit 302A may train the machine learning system (e.g., neural network) of the auto-associative memory 104 according to techniques described herein, whereas the cores of the second processor unit 302B may learn, from implementation of the learning coach, the hyperparameters for the auto-associative memory 104. Further, where the associative memory 104 comprises an ensemble of machine learning systems, different sets of cores in the first processor unit 302A may be responsible for different ensemble members. One or more host processors 310 may coordinate and control the processor units 302A-B.
- In other embodiments, the
system 300 could be implemented with one processor unit 302. In embodiments where there are multiple processor units, the processor units could be co-located or distributed. For example, the processor units 302 may be interconnected by data networks, such as a LAN, WAN, the Internet, etc., using suitable wired and/or wireless data communication links. Data may be shared between the various processing units 302 using suitable data links, such as data buses (preferably high-speed data buses) or network links (e.g., Ethernet).
- The software for the various computer systems described herein and other computer functions described herein may be implemented in computer software using any suitable computer programming language, such as .NET, C, C++, Python, and using conventional, functional, or object-oriented techniques. Programming languages for computer software and other computer-implemented instructions may be translated into machine language by a compiler or an assembler before execution and/or may be translated directly at run time by an interpreter. Examples of assembly languages include ARM, MIPS, and x86; examples of high level languages include Ada, BASIC, C, C++, C#, COBOL, Fortran, Java, Lisp, Pascal, Object Pascal, Haskell, ML; and examples of scripting languages include Bourne script, JavaScript, Python, Ruby, Lua, PHP, and Perl.
-
FIG. 4 is a drawing of an example of a type of neural network such as might be used to implement the auto-associative memory unit 104 in FIG. 1. This example neural network is a recurrent neural network. However, this example network and the auto-associative memory unit 104 have a special architecture that makes them easier to train than many other types of recurrent neural networks.
- In this discussion, a neural network comprises a network of nodes organized into layers: a layer of input nodes, zero or more inner or "hidden" layers of nodes, and a layer of output nodes. There is an input node associated with each input variable and an output node associated with each output variable. An inner layer may also be called a "hidden layer." A given node in the output layer or in an inner layer is connected to one or more nodes in lower layers by means of a directed arc from the node in the lower layer to the given higher-layer node. In this example network, there is also a directed arc from each output node back to the corresponding input node. A directed arc is an arc where direction matters, as opposed to an undirected arc. Note that there are only directed arcs in the recurrent neural network shown in
FIG. 4, as indicated by each arc between nodes having an arrow going from the lower-level node to the higher-level node, and from the output nodes to the input nodes. There are no undirected arcs in the recurrent neural network shown in FIG. 4. - The directed arcs are each associated with a trainable parameter, called a weight, which represents the strength of the connection from the lower node to the given higher node (or from an output node to its corresponding input node for the directed arcs from the output nodes to the input nodes). A trainable parameter is also called a "learned" parameter. Each node is also associated with an additional learned parameter called its "bias." In a preferred embodiment, the weight associated with an arc from an output node to the corresponding input node implicitly has the value 1.0 and there is no learned parameter for these particular output-to-input arcs, as opposed to the other arcs in the network, which have trainable (or learned) weights. Other parameters that control the learning process are called "hyperparameters." The neural network illustrated in
FIG. 4 has an input layer ("layer 0"), an output layer ("layer N," where N=6 in this example), and five hidden layers (layers 1 through N−1). In addition, the neural network in FIG. 4 and the auto-associative memory unit 104 have a set of target values, with a target objective for each output node. - A neural network in which there is no cycle of directed arcs leading from a node back to itself is called a "feed-forward" network. A neural network in which there is a cycle of directed arcs is called a "recurrent neural network." In embodiments of the present invention, the cycles in the recurrent neural network are the directed arcs from the output nodes to their corresponding input nodes, as shown in
FIG. 4. That is, in preferred embodiments of the present invention, and as explained herein, there are no directed arcs from a higher numbered layer to a lower numbered layer, except for the directed arcs from the output nodes to their corresponding input nodes. - For training purposes, a recurrent neural network R may be "unrolled" by making a sequence of copies R(t) of the neural network R for each value of t in {0, 1, . . . , T}. In the case of the auto-
associative memory unit 104 in FIG. 1 and the example neural network shown in FIG. 4, the index t counts the number of rounds of recursion. In this case, the value of T is the number of rounds of recursion until the recursion is stopped at convergence, upon detection of an infinite cycle, or by some other stopping criterion. Unrolling of a recurrent neural network R is depicted in FIG. 5. On the left side of the equation, a network Rt outputs value ht in response to some input xt. The recurrent nature of the network Rt is shown by the loop from the top of the network Rt to the bottom of the network Rt, depicting, as per the preferred embodiment of the present invention, the directed arcs from the output layer of the recurrent neural network to the associated nodes in the input layer. - The recurrent neural network Rt on the left side of the equation in
FIG. 5 is unrolled into a sequence of copies R0, . . . , Rt, as shown on the right side of the equation of FIG. 5. As per the preferred embodiment of the present invention, the recurrent nature of the recurrent neural network is depicted by the directed arc from the output layer of the prior copy of R to the input layer of the next copy of R (e.g., the directed arc from the output layer of R0 to the input layer of R1, and so on). Note that the directed arcs from the output layer of the prior copy of R to the input layer of the next copy of R show the feed-forward activation direction; back propagation for training is in the opposite direction. - The copies R0, . . . , Rt therefore collectively form a single large feed-forward neural network, because each directed arc that would go from a node in a higher numbered layer to a destination node in a lower numbered layer does not go to that destination node in its own copy R(t) (nor from a node to itself or to a lower numbered destination node in its own layer), but rather goes to the copy of the corresponding destination node in the next copy of the network, R(t+1). Thus, in the unrolled network there are no cycles of directed arcs, so the unrolled network is a feed-forward network, as shown by the right side of
FIG. 5. - In the auto-
associative memory unit 104 or the network shown in FIG. 4, the only arcs that go from a higher numbered layer to a lower numbered layer are the directed arcs from the output nodes to the input nodes. In unrolling the recurrent neural network shown in FIG. 4 or the recurrent auto-associative memory unit 104, each copy R(t) also has its own copy of the target objective, as well as the back propagation from the input to network R(t+1) back to the output from R(t). This is a special feature of this recurrent neural network architecture and is not present in unrolled recurrent neural networks in general. - A feed-forward neural network or an unrolled recurrent neural network may be trained using an iterative training process called stochastic gradient descent, with a gradient estimate and learned-parameter update for each minibatch of training data. An epoch of this iterative training process comprises a minibatch update for each of the minibatches in the full batch of training data.
- The estimate of the gradient of the objective function for each minibatch may be computed by accumulating an estimate of the gradient of the objective function for each training data item in the minibatch. The estimate of the gradient for a training data item may be computed by a feed-forward computation of the activation of each node in the network followed by a backwards computation of the partial derivatives of the objective function based on the chain rule of calculus. The backwards computation of the partial derivatives of the objective function is called “back propagation.”
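As an illustrative sketch only, and not the specific implementation of the auto-associative memory unit 104, the feed-forward activation computation, chain-rule back propagation, and minibatch parameter update described above might look like the following for a tiny two-layer network trained with a mean-squared-error objective; the layer sizes, learning rate, and iteration count are all assumed values chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer feed-forward net: 4 inputs -> 8 hidden (tanh) -> 4 outputs.
W1, b1 = rng.normal(0, 0.5, (8, 4)), np.zeros(8)
W2, b2 = rng.normal(0, 0.5, (4, 8)), np.zeros(4)

def forward(X):
    Z1 = X @ W1.T + b1          # pre-activations of the hidden layer
    A1 = np.tanh(Z1)            # hidden-layer activations
    Y = A1 @ W2.T + b2          # linear output layer
    return Z1, A1, Y

def minibatch_update(X, T, lr=0.1):
    """One stochastic-gradient-descent step on L = mean of 0.5*||Y - T||^2."""
    global W1, b1, W2, b2
    n = len(X)
    Z1, A1, Y = forward(X)
    # Back propagation: partial derivatives of the objective via the chain rule.
    dY = (Y - T) / n                 # dL/dY
    dW2 = dY.T @ A1
    db2 = dY.sum(axis=0)
    dA1 = dY @ W2
    dZ1 = dA1 * (1 - A1 ** 2)        # tanh'(z) = 1 - tanh(z)^2
    dW1 = dZ1.T @ X
    db1 = dZ1.sum(axis=0)
    # Gradient step on every learned parameter (weights and biases).
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

X = rng.normal(size=(16, 4))
T = X.copy()                         # auto-associative target: the input itself
loss_before = 0.5 * np.mean(np.sum((forward(X)[2] - T) ** 2, axis=1))
for _ in range(50):                  # repeated updates on one minibatch
    minibatch_update(X, T)
loss_after = 0.5 * np.mean(np.sum((forward(X)[2] - T) ** 2, axis=1))
print(round(loss_before, 4), round(loss_after, 4))
```

The key structural point mirrored from the text is that the gradient estimate for each training item is obtained by one feed-forward pass followed by one backwards pass of chain-rule partial derivatives.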
- Stochastic gradient descent, including the feed-forward computation, the back propagation of partial derivatives of an objective function, and unrolling a recurrent neural network are all well-known to those skilled in the art of training neural networks.
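To make the unrolling concrete, the following minimal sketch runs copies R(0), R(1), . . . of a recurrent step, feeding each copy's output back into the input of the next copy until a stopping criterion is met. The "network" here is a hypothetical stand-in (a contraction toward a stored pattern, so the recursion provably converges), not the trained network of FIG. 4:

```python
import numpy as np

def unroll(step, x0, max_rounds=100, tol=1e-6):
    """Apply one copy R(t) of the recurrent network per round, feeding the
    output of each copy into the input of the next copy, and stop at
    convergence or after max_rounds (a simple stopping criterion)."""
    x = x0
    history = [x0]
    for t in range(max_rounds):
        x_next = step(x)                         # feed-forward pass through copy R(t)
        history.append(x_next)
        if np.max(np.abs(x_next - x)) < tol:     # converged to a fixed point
            break
        x = x_next
    return history

# Toy stand-in for the network: a contraction toward a stored pattern p.
p = np.array([1.0, -1.0, 0.5])
step = lambda x: p + 0.5 * (x - p)

history = unroll(step, np.zeros(3))
print(len(history))
```

Here the number of rounds actually executed plays the role of T in the description above: it is finite because the recursion stops at convergence.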
- In the auto-
associative memory unit 104 and in the neural network in FIG. 4, in the feed-forward network created by unrolling the recurrent neural network (see FIG. 5), each copy R(t) of the network receives a back propagated partial derivative from its own copy of the objective function as well as a back propagated partial derivative from the input nodes of the next copy of the network, R(t+1). The partial derivatives from the input nodes of copy R(t+1) comprise the combined partial derivatives from objective t+1 and all higher numbered copies R(t+k) up to copy R(T). - In many recurrent neural network architectures, the unrolled feed-forward network is only an approximate model of the recurrent neural network, because the activation computation of the nodes in a cycle of the recurrent network can, in principle, go around the cycle an infinite number of times. Furthermore, for many recurrent neural network architectures, a problem called "vanishing gradient" may occur during training for an unrolled recurrent neural network with too large a value of T. The magnitudes of the back propagated partial derivatives may decrease by roughly a multiplicative factor less than 1.0 for each round of recursion, producing an exponential decay in the magnitudes of the partial derivatives.
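The exponential decay just described is easy to see numerically. In this sketch the per-round attenuation factor r is purely hypothetical; the point is only that any fixed factor below 1.0 shrinks the back propagated partial derivative geometrically with the number of rounds:

```python
# Hypothetical per-round attenuation factor r < 1.0 applied to the
# magnitude of a back propagated partial derivative.
r = 0.8
grad = 1.0
magnitudes = []
for t in range(50):          # 50 rounds of recursion
    magnitudes.append(grad)
    grad *= r                # one multiplicative factor per round
print(magnitudes[10], magnitudes[49])
```

After 49 rounds the magnitude is r**49, which for r = 0.8 is already below 1e-4, so gradient information from the final objective barely reaches the earliest copies.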
- Having an objective for each unrolled copy R(t) of the network R and accumulating the combined back propagated partial derivatives from higher numbered copies of R prevents this form of "vanishing gradient." In addition, the number of rounds of recursion in the auto-associative memory is limited to a finite number because of convergence or some other stopping criterion for the recursion.
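A back-of-the-envelope sketch (with the same purely illustrative attenuation factor as above, not measured values) of why a per-copy objective keeps the gradient from vanishing at early copies:

```python
# Illustrative comparison of gradient magnitude reaching copy t = 0.
r, T = 0.8, 50               # hypothetical attenuation factor and number of rounds
own = 1.0                    # gradient magnitude contributed by each copy's own objective

# Without per-copy objectives: only the final objective's gradient survives,
# attenuated once per round on the way back to copy 0.
without = [own * r ** (T - 1 - t) for t in range(T)]

# With per-copy objectives: copy t receives its own objective's gradient plus
# the attenuated contributions back propagated from every later copy.
with_objs = [sum(own * r ** k for k in range(T - t)) for t in range(T)]

print(round(without[0], 6), round(with_objs[0], 3))
```

With only a final objective, the gradient at copy 0 has effectively vanished; with an objective at every copy, it stays bounded below (for r < 1 the geometric sum approaches 1 / (1 - r)).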
- In one general aspect, therefore, the present invention is directed to computer systems and computer-implemented methods for recursively training a content-addressable auto-associative memory. The system comprises a set of processor cores and computer memory that is in communication with the set of processor cores. The computer memory stores software that, when executed by the set of processor cores, causes the set of processor cores to recursively train the content-addressable auto-associative memory system with a plurality of learned parameters and with a plurality of input examples, where each input example is represented by a plurality of input variables, such that: (i) the content-addressable auto-associative memory system is trained to produce an output pattern for each of the input examples; and (ii) a quantity of the learned parameters for the content-addressable auto-associative memory is equal to the number of input variables times a quantity that is independent of the number of input variables. In various implementations, the software causes the set of processor cores to train the content-addressable auto-associative memory system such that the quantity of learned parameters for the content-addressable auto-associative memory system can be varied based on the number of input examples to be learned.
- In another general aspect, the present invention is directed to computer systems and computer-implemented methods for recursively training a recurrent neural network with a plurality of input examples. In such embodiments, the computer system comprises a set of processor cores and computer memory in communication with the set of processor cores. The computer memory stores software that when executed by the set of processor cores, causes the set of processor cores to recursively train a recurrent neural network with a plurality of input examples, such that: (i) the recurrent neural network comprises a deep neural network that comprises N+1 layers, numbered 0, . . . , N, wherein N>3, and wherein
layer 0 is an input layer and layer N is an output layer of the recurrent neural network, and whereinlayers 1 to N−1 are between the input layer and the output layer; (ii) the recurrent neural network is trained to produce an output pattern for each of the input examples; (iii) a target for the output pattern for each input example is the input example; and (iv) the recurrent neural network comprises a plurality of directed arcs, wherein at least some of the directed arcs are between a node in one layer of the recurrent neural network and a node in another layer of the recurrent neural network. - In another implementation, the software stored by the computer memory causes the set of processor cores to recursively train the recurrent neural network such that: (i) the recurrent neural network is trained to produce an output pattern for each of the input examples; and (ii) a quantity of learned parameters for the recurrent neural network is equal to the number of input variables times a quantity that is independent of the number of input variables.
- In various implementations, the software can cause the set of processor cores to train the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, such that the quantity of learned parameters for the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, can be varied based on the number of input examples to be learned. Also, the software can cause the set of processor cores to train the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, by back propagating partial derivatives of a loss function through the content-addressable auto-associative memory system or the recurrent neural network. Also, the software can cause the set of processor cores to train the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, by, for each input example: randomly transforming the input example; and recursively providing the randomly transformed input example to the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, for training, until an output of the content-addressable auto-associative memory system converges to the input example. The random transformations of the input examples can comprise one or more of: translating the input example; rotating the input example; linearly transforming the input example; degrading the input example; and subsampling the input example.
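A hedged sketch of that data pipeline follows. The specific transformations, their parameters, and the number of rounds are illustrative stand-ins (in practice they would be hyperparameters), not the patent's implementation, and the "input example" is modeled as a small 2-D array:

```python
import numpy as np

rng = np.random.default_rng(1)

def randomly_transform(image):
    """Apply one randomly chosen distortion of the kinds listed above
    (simplified versions for a square 2-D array)."""
    kind = rng.choice(["translate", "rotate", "degrade", "subsample"])
    if kind == "translate":
        return np.roll(image, shift=rng.integers(-2, 3), axis=1)
    if kind == "rotate":
        return np.rot90(image, k=rng.integers(1, 4))
    if kind == "degrade":
        return image + rng.normal(0, 0.1, image.shape)   # additive noise
    # subsample: keep a random half of the pixels, zero the rest
    mask = rng.random(image.shape) < 0.5
    return np.where(mask, image, 0.0)

def training_queries(example, rounds=5):
    """Recursively provide randomly transformed versions of one input example
    (here just collected; in training, each would be fed to the memory until
    its output converges to the original example)."""
    return [randomly_transform(example) for _ in range(rounds)]

example = rng.random((8, 8))
queries = training_queries(example)
print(len(queries), queries[0].shape)
```

Each distorted query keeps the original, undistorted example as its target, which is what makes the trained memory robust to transformed, noisy inputs.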
- In still further implementations, the computer memory may store software that when executed by the set of processor cores further causes the set of processor cores to train the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, with negative input examples. The negative input examples may comprise input examples where the output of the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, in operation, does not converge to an input example.
- In still further implementations, at least some of the input examples are labeled examples that have, for each such input example, a classification category label such that the content-addressable auto-associative memory system or the recurrent neural network, as the case may be, is trained to act as a classifier. The classification category labels may comprise error-correcting encoding.
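One standard way to give classification category labels an error-correcting encoding (a common technique consistent with, but not necessarily identical to, the patent's scheme) is an error-correcting output code: each category is assigned a binary codeword with large pairwise Hamming distance, and a noisy output vector is decoded to the nearest codeword. The codebook below is hypothetical:

```python
import numpy as np

# Hypothetical codebook: four categories, 8-bit codewords with pairwise
# Hamming distance >= 4, so any single flipped output bit is corrected.
codebook = {
    "cat":  np.array([0, 0, 0, 0, 0, 0, 0, 0]),
    "dog":  np.array([1, 1, 1, 1, 0, 0, 0, 0]),
    "bird": np.array([0, 0, 1, 1, 1, 1, 0, 0]),
    "fish": np.array([1, 1, 0, 0, 1, 1, 1, 1]),
}

def decode(output_bits):
    """Map a (possibly corrupted) output vector to the category whose
    codeword is nearest in Hamming distance."""
    return min(codebook, key=lambda c: int(np.sum(codebook[c] != output_bits)))

# An output that flipped one bit of "dog"'s codeword still decodes to "dog".
noisy = np.array([1, 0, 1, 1, 0, 0, 0, 0])
print(decode(noisy))
```

Because the minimum pairwise distance here is 4, the decoder tolerates one wrong output bit per codeword, which is the sense in which the label encoding is error-correcting.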
- Based on the above description, it is clear that embodiments of the auto-associative memory system described herein can be content-addressable in the sense that an item in the memory may be retrieved by a description or example rather than by knowing the address of the item in memory. Further, the auto-associative memory is associative in that an item can be retrieved with a query based on an example of an associated item rather than by an example of the item itself. The auto-associative memory is also robust in that an item can be retrieved from a query that is a transformed, distorted, noisy version of the item to be retrieved. The auto-associative memory can be used, for example, to retrieve images, documents, acoustic files, etc. from inputs, which can be small pieces of the images, documents, acoustic files, etc.
- The examples presented herein are intended to illustrate potential and specific implementations of the present invention. It can be appreciated that the examples are intended primarily for purposes of illustration of the invention for those skilled in the art. No particular aspect or aspects of the examples are necessarily intended to limit the scope of the present invention. Further, it is to be understood that the figures and descriptions of the present invention have been simplified to illustrate elements that are relevant for a clear understanding of the present invention, while eliminating, for purposes of clarity, other elements. While various embodiments have been described herein, it should be apparent that various modifications, alterations, and adaptations to those embodiments may occur to persons skilled in the art with attainment of at least some of the advantages. The disclosed embodiments are therefore intended to include all such modifications, alterations, and adaptations without departing from the scope of the embodiments as set forth herein.
Claims (49)
1. A computer system comprising:
a set of processor cores; and
computer memory in communication with the set of processor cores, wherein the computer memory stores software that when executed by the set of processor cores, causes the set of processor cores to recursively train a content-addressable auto-associative memory system with a plurality of learned parameters and with a plurality of input examples, wherein each input example is represented by a plurality of input variables, such that:
the content addressable auto-associative memory system is trained to produce an output pattern for each of the input examples; and
a quantity of the learned parameters for the content-addressable auto-associative memory is equal to the number of input variables times a quantity that is independent of the number of input variables.
2. The computer system of claim 1 , wherein the software causes the set of processor cores to train the content-addressable auto-associative memory system such that the quantity of learned parameters for the content-addressable auto-associative memory system can be varied based on the number of input examples to be learned.
3. The computer system of claim 1 , wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to train the content-addressable auto-associative memory system by back propagating partial derivatives of a loss function through the content-addressable auto-associative memory system.
4. The computer system of claim 1 , wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to train the content-addressable auto-associative memory system by, for each input example:
randomly transforming the input example; and
recursively providing the randomly transformed input example to the content-addressable auto-associative memory system for training, until an output of the content-addressable auto-associative memory system converges to the input example.
5. The computer system of claim 4 , wherein the computer memory stores software that when executed by the set of processor cores further causes the set of processor cores to train the content-addressable auto-associative memory system with negative input examples.
6. The computer system of claim 5 , wherein the negative input examples comprise input examples where the output of the content-addressable auto-associative memory system, in operation, does not converge to an input example.
7-11. (canceled)
12. A computer system comprising:
a set of processor cores; and
computer memory in communication with the set of processor cores, wherein the computer memory stores software that when executed by the set of processor cores, causes the set of processor cores to recursively train a recurrent neural network with a plurality of input examples, such that:
the recurrent neural network comprises a deep neural network that comprises N+1 layers, numbered 0, . . . , N, wherein N>3, and wherein layer 0 is an input layer and layer N is an output layer of the recurrent neural network, and wherein layers 1 to N−1 are between the input layer and the output layer;
the recurrent neural network is trained to produce an output pattern for each of the input examples;
a target for the output pattern for each input example is the input example; and
the recurrent neural network comprises a plurality of directed arcs, wherein at least some of the directed arcs are between a node in one layer of the recurrent neural network and a node in another layer of the recurrent neural network.
13-15. (canceled)
16. The computer system of claim 12 , wherein the software causes the set of processor cores to train the recurrent neural network such that the only directed arcs in the recurrent neural network from a higher numbered layer to a lower numbered layer are from a node in the output layer N to a node in the input layer 0.
17. The computer system of claim 16 , wherein:
the output layer of the recurrent neural network comprises a plurality of output layer nodes;
the input layer of the recurrent neural network comprises a plurality of input layer nodes;
the quantity of input layer nodes equals the quantity of output layer nodes, such that each output layer node has one and only one corresponding input layer node; and
the only directed arcs in the recurrent neural network that are from a higher numbered layer to a lower numbered layer are directed arcs from the output layer to the input layer, wherein there is a directed arc from each output layer node to its associated input layer node.
18. The computer system of claim 12 , wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to train the recurrent neural network by back propagating partial derivatives of a loss function through the recurrent neural network.
19. The computer system of claim 13, wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to train the recurrent neural network by back propagating partial derivatives of a loss function through the recurrent neural network.
20. The computer system of claim 12 , wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to train the recurrent neural network by back propagating partial derivatives of a loss function through the recurrent neural network.
21. The computer system of claim 17 , wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to train the recurrent neural network by, for each input example:
randomly transforming the input example; and
recursively providing the randomly transformed input example to the content-addressable auto-associative memory system for training, until an output of the content-addressable auto-associative memory system converges to the input example.
22. The computer system of claim 21 , wherein the computer memory stores software that when executed by the set of processor cores causes the set of processor cores to transform an input example by performing a distortion on the input example that comprises a distortion selected from the group consisting of:
translating the input example;
rotating the input example;
linearly transforming the input example;
degrading the input example; and
subsampling the input example.
23. The computer system of claim 21 , wherein the transformations of the input examples are controlled by one or more hyperparameters.
24. The computer system of claim 23 , further comprising:
a second set of processor cores; and
second computer memory in communication with the second set of processor cores, wherein the second computer memory stores software that when executed by the second set of processor cores causes the second set of processor cores to implement a machine-learning learning coach that learns, through machine learning, values for the one or more hyperparameters.
25. The computer system of claim 21 , wherein the computer memory stores software that when executed by the set of processor cores further causes the set of processor cores to train the recurrent neural network with negative input examples.
26. The computer system of claim 25 , wherein the negative input examples comprise input examples where the output of the recurrent neural network, in operation, does not converge to an input example.
27. The computer system of claim 12 , wherein at least some of the input examples are labeled examples that have, for each such input example, a classification category label such that the recurrent neural network is trained to act as a classifier.
28. The computer system of claim 27 , wherein the classification category labels comprise error-correcting encoding.
29. The computer system of claim 12 , wherein:
the input examples are digital images; and
the recurrent neural network is trained to, in operation, retrieve one of the digital images in response to receiving as input a portion of the digital image.
30. The computer system of claim 12 , wherein:
the input examples are audio files; and
the recurrent neural network is trained to, in operation, retrieve one of the audio files in response to receiving as input a portion of the audio file.
31. The computer system of claim 12 , wherein:
the input examples are document files; and
the recurrent neural network is trained to, in operation, retrieve one of the document files in response to receiving as input a portion of the document file.
32. A method comprising:
recursively training, by a computer system that comprises a set of processor cores, a content-addressable auto-associative memory system with a plurality of learned parameters and with a plurality of input examples, wherein each input example is represented by a plurality of input variables, such that:
the content addressable auto-associative memory system is trained to produce an output pattern for each of the input examples; and
a quantity of the learned parameters for the content-addressable auto-associative memory is equal to the number of input variables times a quantity that is independent of the number of input variables.
33. The method of claim 32 , wherein training the content-addressable auto-associative memory system comprises training the content-addressable auto-associative memory system such that the quantity of learned parameters for the content-addressable auto-associative memory system can be varied based on the number of input examples to be learned.
34. The method of claim 32 , wherein training the content-addressable auto-associative memory system comprises back propagating partial derivatives of a loss function through the content-addressable auto-associative memory system.
35. The method of claim 32 , wherein training the content-addressable auto-associative memory system comprises, for each input example:
randomly transforming the input example; and
recursively providing the randomly transformed input example to the content-addressable auto-associative memory system for training, until an output of the content-addressable auto-associative memory system converges to the input example.
36. The method of claim 35 , wherein training the content-addressable auto-associative memory system comprises training the content-addressable auto-associative memory system with negative input examples.
37. The method of claim 36 , wherein the negative input examples comprise input examples where the output of the content-addressable auto-associative memory system, in operation, does not converge to an input example.
38. The method of claim 35 , wherein at least some of the input examples are labeled examples that have, for each such input example, a classification category label such that the content-addressable auto-associative memory system is trained to act as a classifier.
39. The method of claim 38 , wherein the classification category labels comprise error-correcting encoding.
40. The method of claim 32 , wherein:
the input examples are digital images; and
the content-addressable auto-associative memory system is trained to, in operation, retrieve one of the digital images in response to receiving as input a portion of the digital image.
41. The method of claim 32 , wherein:
the input examples are audio files; and
the content-addressable auto-associative memory system is trained to, in operation, retrieve one of the audio files in response to receiving as input a portion of the audio file.
42. The method of claim 32 , wherein:
the input examples are document files; and
the content-addressable auto-associative memory system is trained to, in operation, retrieve one of the document files in response to receiving as input a portion of the document file.
43. A method comprising:
training, recursively, by a computer system that comprises a set of processor cores, a recurrent neural network with a plurality of input examples, such that:
the recurrent neural network comprises a deep neural network that comprises N+1 layers, numbered 0, . . . , N, wherein N>3, and wherein layer 0 is an input layer and layer N is an output layer of the recurrent neural network, and wherein layers 1 to N−1 are between the input layer and the output layer;
the recurrent neural network is trained to produce an output pattern for each of the input examples;
a target for the output pattern for each input example is the input example; and
the recurrent neural network comprises a plurality of directed arcs, wherein at least some of the directed arcs are between a node in one layer of the recurrent neural network and a node in another layer of the recurrent neural network.
44. A method comprising:
training, recursively, by a computer system that comprises a set of processor cores, a recurrent neural network with a plurality of input examples, such that:
the recurrent neural network is trained to produce an output pattern for each of the input examples; and
a quantity of learned parameters for the recurrent neural network is equal to the number of input variables times a quantity that is independent of the number of input variables.
45-46. (canceled)
47. The method of claim 43 , wherein training the recurrent neural network comprises training the recurrent neural network such that the only directed arcs in the recurrent neural network from a higher numbered layer to a lower numbered layer are from a node in the output layer N to a node in the input layer 0.
48. The method of claim 47 , wherein:
the output layer of the recurrent neural network comprises a plurality of output layer nodes;
the input layer of the recurrent neural network comprises a plurality of input layer nodes;
the quantity of input layer nodes equals the quantity of output layer nodes, such that each output layer node has one and only one corresponding input layer node; and
the only directed arcs in the recurrent neural network that are from a higher numbered layer to a lower numbered layer are directed arcs from the output layer to the input layer, wherein there is a directed arc from each output layer node to its associated input layer node.
49. The method of claim 43 , wherein training the recurrent neural network comprises back propagating partial derivatives of a loss function through the recurrent neural network.
50. The method of claim 44 , wherein training the recurrent neural network comprises back propagating partial derivatives of a loss function through the recurrent neural network.
51. (canceled)
52. The method of claim 48 , wherein training the recurrent neural network comprises, for each input example:
randomly transforming the input example; and
recursively providing the randomly transformed input example to the content-addressable auto-associative memory system for training, until an output of the content-addressable auto-associative memory system converges to the input example.
53. The method of claim 52 , wherein randomly transforming the input example comprises performing a distortion on the input example selected from the group consisting of:
translating the input example;
rotating the input example;
linearly transforming the input example;
degrading the input example; and
subsampling the input example.
54-55. (canceled)
56. The method of claim 52 , wherein training the recurrent neural network comprises training the recurrent neural network with negative input examples.
57-62. (canceled)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/646,071 US20200285948A1 (en) | 2017-09-28 | 2018-09-19 | Robust auto-associative memory with recurrent neural network |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762564754P | 2017-09-28 | 2017-09-28 | |
US16/646,071 US20200285948A1 (en) | 2017-09-28 | 2018-09-19 | Robust auto-associative memory with recurrent neural network |
PCT/US2018/051683 WO2019067281A1 (en) | 2017-09-28 | 2018-09-19 | Robust auto-associative memory with recurrent neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
US20200285948A1 true US20200285948A1 (en) | 2020-09-10 |
Family
ID=65807592
Family Applications (11)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/124,977 Active 2038-11-27 US10679129B2 (en) | 2017-09-28 | 2018-09-07 | Stochastic categorical autoencoder network |
US16/646,092 Active US11354578B2 (en) | 2017-09-28 | 2018-09-14 | Mixture of generators model |
US16/646,169 Active US11074506B2 (en) | 2017-09-28 | 2018-09-17 | Estimating the amount of degradation with a regression objective in deep learning |
US16/646,071 Abandoned US20200285948A1 (en) | 2017-09-28 | 2018-09-19 | Robust auto-associative memory with recurrent neural network |
US16/645,710 Abandoned US20200285939A1 (en) | 2017-09-28 | 2018-09-28 | Aggressive development with cooperative generators |
US16/646,096 Active US11074505B2 (en) | 2017-09-28 | 2018-09-28 | Multi-objective generators in deep learning |
US16/867,746 Active US11461661B2 (en) | 2017-09-28 | 2020-05-06 | Stochastic categorical autoencoder network |
US16/901,608 Active US11410050B2 (en) | 2017-09-28 | 2020-06-15 | Imitation training for machine learning systems with synthetic data generators |
US17/810,778 Active US11531900B2 (en) | 2017-09-28 | 2022-07-05 | Imitation learning for machine learning systems with synthetic data generators |
US17/815,851 Active US11687788B2 (en) | 2017-09-28 | 2022-07-28 | Generating synthetic data examples as interpolation of two data examples that is linear in the space of relative scores |
US18/196,855 Pending US20230289611A1 (en) | 2017-09-28 | 2023-05-12 | Locating a decision boundary for complex classifier |
Country Status (4)
Country | Link |
---|---|
US (11) | US10679129B2 (en) |
EP (3) | EP3688676A4 (en) |
CN (3) | CN111226232B (en) |
WO (3) | WO2019067236A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11531900B2 (en) | 2017-09-28 | 2022-12-20 | D5Ai Llc | Imitation learning for machine learning systems with synthetic data generators |
US20230342351A1 (en) * | 2022-04-26 | 2023-10-26 | Truist Bank | Change management process for identifying inconsistencies for improved processing efficiency |
US11836600B2 (en) | 2020-08-20 | 2023-12-05 | D5Ai Llc | Targeted incremental growth with continual learning in deep neural networks |
US11983162B2 (en) | 2022-04-26 | 2024-05-14 | Truist Bank | Change management process for identifying potential regulatory violations for improved processing efficiency |
Families Citing this family (164)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201718756D0 (en) * | 2017-11-13 | 2017-12-27 | Cambridge Bio-Augmentation Systems Ltd | Neural interface |
WO2018176000A1 (en) | 2017-03-23 | 2018-09-27 | DeepScale, Inc. | Data synthesis for autonomous control systems |
WO2018226492A1 (en) | 2017-06-05 | 2018-12-13 | D5Ai Llc | Asynchronous agents with learning coaches and structurally modifying deep neural networks without performance degradation |
WO2018226527A1 (en) | 2017-06-08 | 2018-12-13 | D5Ai Llc | Data splitting by gradient direction for neural networks |
EP3646252A4 (en) | 2017-06-26 | 2021-03-24 | D5Ai Llc | Selective training for decorrelation of errors |
WO2019005507A1 (en) | 2017-06-27 | 2019-01-03 | D5Ai Llc | Aligned training of deep networks |
US11157441B2 (en) | 2017-07-24 | 2021-10-26 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests |
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
US10671349B2 (en) | 2017-07-24 | 2020-06-02 | Tesla, Inc. | Accelerated mathematical engine |
US11270188B2 (en) | 2017-09-28 | 2022-03-08 | D5Ai Llc | Joint optimization of ensembles in deep learning |
JP6886112B2 (en) * | 2017-10-04 | 2021-06-16 | 富士通株式会社 | Learning program, learning device and learning method |
US10671435B1 (en) | 2017-10-19 | 2020-06-02 | Pure Storage, Inc. | Data transformation caching in an artificial intelligence infrastructure |
US12067466B2 (en) | 2017-10-19 | 2024-08-20 | Pure Storage, Inc. | Artificial intelligence and machine learning hyperscale infrastructure |
US11861423B1 (en) | 2017-10-19 | 2024-01-02 | Pure Storage, Inc. | Accelerating artificial intelligence (‘AI’) workflows |
US11494692B1 (en) | 2018-03-26 | 2022-11-08 | Pure Storage, Inc. | Hyperscale artificial intelligence and machine learning infrastructure |
US10360214B2 (en) | 2017-10-19 | 2019-07-23 | Pure Storage, Inc. | Ensuring reproducibility in an artificial intelligence infrastructure |
US11455168B1 (en) * | 2017-10-19 | 2022-09-27 | Pure Storage, Inc. | Batch building for deep learning training workloads |
US11263525B2 (en) | 2017-10-26 | 2022-03-01 | Nvidia Corporation | Progressive modification of neural networks |
US11250329B2 (en) * | 2017-10-26 | 2022-02-15 | Nvidia Corporation | Progressive modification of generative adversarial neural networks |
US11763159B2 (en) * | 2018-01-29 | 2023-09-19 | International Business Machines Corporation | Mitigating false recognition of altered inputs in convolutional neural networks |
US11321612B2 (en) | 2018-01-30 | 2022-05-03 | D5Ai Llc | Self-organizing partially ordered networks and soft-tying learned parameters, such as connection weights |
US10832137B2 (en) | 2018-01-30 | 2020-11-10 | D5Ai Llc | Merging multiple nodal networks |
CN111602149B (en) | 2018-01-30 | 2024-04-02 | D5Ai有限责任公司 | Self-organizing partial sequence network |
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
EP3707645A1 (en) * | 2018-02-09 | 2020-09-16 | Deepmind Technologies Limited | Neural network systems implementing conditional neural processes for efficient learning |
WO2019208564A1 (en) * | 2018-04-26 | 2019-10-31 | 日本電信電話株式会社 | Neural network learning device, neural network learning method, and program |
JP7002404B2 (en) * | 2018-05-15 | 2022-01-20 | 株式会社日立製作所 | Neural network that discovers latent factors from data |
US11151450B2 (en) * | 2018-05-21 | 2021-10-19 | Fair Isaac Corporation | System and method for generating explainable latent features of machine learning models |
US11797864B2 (en) * | 2018-06-18 | 2023-10-24 | Fotonation Limited | Systems and methods for conditional generative models |
US11215999B2 (en) | 2018-06-20 | 2022-01-04 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
US11403521B2 (en) * | 2018-06-22 | 2022-08-02 | Insilico Medicine Ip Limited | Mutual information adversarial autoencoder |
US11676026B2 (en) | 2018-06-29 | 2023-06-13 | D5Ai Llc | Using back propagation computation as data |
WO2020009881A1 (en) | 2018-07-03 | 2020-01-09 | D5Ai Llc | Analyzing and correcting vulnerabilities in neural networks |
US11195097B2 (en) | 2018-07-16 | 2021-12-07 | D5Ai Llc | Building ensembles for deep learning by parallel data splitting |
US11361457B2 (en) | 2018-07-20 | 2022-06-14 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
US11468330B2 (en) * | 2018-08-03 | 2022-10-11 | Raytheon Company | Artificial neural network growth |
US11501164B2 (en) | 2018-08-09 | 2022-11-15 | D5Ai Llc | Companion analysis network in deep learning |
US11074502B2 (en) | 2018-08-23 | 2021-07-27 | D5Ai Llc | Efficiently building deep neural networks |
US11010670B2 (en) | 2018-08-27 | 2021-05-18 | D5Ai Llc | Building a deep neural network with diverse strata |
US11037059B2 (en) | 2018-08-31 | 2021-06-15 | D5Ai Llc | Self-supervised back propagation for deep learning |
US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
US11593641B2 (en) * | 2018-09-19 | 2023-02-28 | Tata Consultancy Services Limited | Automatic generation of synthetic samples using dynamic deep autoencoders |
US11151334B2 (en) * | 2018-09-26 | 2021-10-19 | Huawei Technologies Co., Ltd. | Systems and methods for multilingual text generation field |
US11710035B2 (en) * | 2018-09-28 | 2023-07-25 | Apple Inc. | Distributed labeling for supervised learning |
KR20210072048A (en) | 2018-10-11 | 2021-06-16 | 테슬라, 인크. | Systems and methods for training machine models with augmented data |
US20200125924A1 (en) * | 2018-10-22 | 2020-04-23 | Siemens Aktiengesellschaft | Method and system for analyzing a neural network |
US11196678B2 (en) | 2018-10-25 | 2021-12-07 | Tesla, Inc. | QOS manager for system on a chip communications |
JP2020086479A (en) * | 2018-11-15 | 2020-06-04 | 株式会社日立製作所 | Calculator, construction method of neural network, and calculator system |
AU2019389175A1 (en) * | 2018-11-30 | 2021-06-10 | Caris Mpi, Inc. | Next-generation molecular profiling |
US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
WO2020129412A1 (en) * | 2018-12-17 | 2020-06-25 | ソニー株式会社 | Learning device, identification device, and program |
US11995854B2 (en) * | 2018-12-19 | 2024-05-28 | Nvidia Corporation | Mesh reconstruction using data-driven priors |
US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
US11610098B2 (en) * | 2018-12-27 | 2023-03-21 | Paypal, Inc. | Data augmentation in transaction classification using a neural network |
US11928556B2 (en) * | 2018-12-29 | 2024-03-12 | International Business Machines Corporation | Removing unnecessary history from reinforcement learning state |
US11514330B2 (en) * | 2019-01-14 | 2022-11-29 | Cambia Health Solutions, Inc. | Systems and methods for continual updating of response generation by an artificial intelligence chatbot |
DE102019200565A1 (en) * | 2019-01-17 | 2020-07-23 | Robert Bosch Gmbh | Device and method for classifying data, in particular for a controller area network or an automotive Ethernet network. |
US10997461B2 (en) | 2019-02-01 | 2021-05-04 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
US11150664B2 (en) | 2019-02-01 | 2021-10-19 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
US10510002B1 (en) * | 2019-02-14 | 2019-12-17 | Capital One Services, Llc | Stochastic gradient boosting for deep neural networks |
US10956755B2 (en) | 2019-02-19 | 2021-03-23 | Tesla, Inc. | Estimating object properties using visual image data |
US20200285947A1 (en) * | 2019-03-07 | 2020-09-10 | International Business Machines Corporation | Classical neural network with selective quantum computing kernel components |
US20200293942A1 (en) * | 2019-03-11 | 2020-09-17 | Cisco Technology, Inc. | Distributed learning model for fog computing |
US11704573B2 (en) * | 2019-03-25 | 2023-07-18 | Here Global B.V. | Method, apparatus, and computer program product for identifying and compensating content contributors |
DE102019206620A1 (en) * | 2019-04-18 | 2020-10-22 | Robert Bosch Gmbh | Method, device and computer program for creating a neural network |
US12112254B1 (en) | 2019-04-25 | 2024-10-08 | Perceive Corporation | Optimizing loss function during training of network |
CN110084863B (en) * | 2019-04-25 | 2020-12-25 | 中山大学 | Multi-domain image conversion method and system based on generation countermeasure network |
US11610154B1 (en) | 2019-04-25 | 2023-03-21 | Perceive Corporation | Preventing overfitting of hyperparameters during training of network |
US11531879B1 (en) | 2019-04-25 | 2022-12-20 | Perceive Corporation | Iterative transfer of machine-trained network inputs from validation set to training set |
US11900238B1 (en) * | 2019-04-25 | 2024-02-13 | Perceive Corporation | Removing nodes from machine-trained network based on introduction of probabilistic noise during training |
US11175959B2 (en) * | 2019-05-01 | 2021-11-16 | International Business Machines Corporation | Determine a load balancing mechanism for allocation of shared resources in a storage system by training a machine learning module based on number of I/O operations |
US11175958B2 (en) | 2019-05-01 | 2021-11-16 | International Business Machines Corporation | Determine a load balancing mechanism for allocation of shared resources in a storage system using a machine learning module based on number of I/O operations |
CN110096810B (en) * | 2019-05-05 | 2020-03-17 | 中南大学 | Industrial process soft measurement method based on layer-by-layer data expansion deep learning |
JP7202260B2 (en) * | 2019-06-07 | 2023-01-11 | 株式会社日立製作所 | HYPER-PARAMETER MANAGEMENT DEVICE, HYPER-PARAMETER MANAGEMENT SYSTEM AND HYPER-PARAMETER MANAGEMENT METHOD |
JP7328799B2 (en) * | 2019-06-12 | 2023-08-17 | 株式会社日立製作所 | Storage system and storage control method |
CN110113057B (en) * | 2019-06-12 | 2023-06-13 | 中国计量大学 | Polarization code decoder utilizing deep learning |
JP7116711B2 (en) * | 2019-06-14 | 2022-08-10 | 株式会社東芝 | Information processing device, information processing method, and computer program |
US12013962B1 (en) * | 2019-07-03 | 2024-06-18 | Intuit Inc. | Automatic entry validation using density based clustering |
US11514311B2 (en) * | 2019-07-03 | 2022-11-29 | International Business Machines Corporation | Automated data slicing based on an artificial neural network |
EP3767533A1 (en) * | 2019-07-17 | 2021-01-20 | Robert Bosch GmbH | A machine learnable system with normalizing flow |
US20220335085A1 (en) * | 2019-07-30 | 2022-10-20 | Nippon Telegraph And Telephone Corporation | Data selection method, data selection apparatus and program |
US11443137B2 (en) | 2019-07-31 | 2022-09-13 | Rohde & Schwarz Gmbh & Co. Kg | Method and apparatus for detecting signal features |
WO2021040944A1 (en) * | 2019-08-26 | 2021-03-04 | D5Ai Llc | Deep learning with judgment |
US20220327379A1 (en) * | 2019-09-02 | 2022-10-13 | Nippon Telegraph And Telephone Corporation | Neural network learning apparatus, neural network learning method, and program |
EP3789924A1 (en) * | 2019-09-09 | 2021-03-10 | Robert Bosch GmbH | Stochastic data augmentation for machine learning |
JP7392366B2 (en) * | 2019-10-01 | 2023-12-06 | 富士通株式会社 | Optimal solution acquisition program, optimal solution acquisition method, and information processing device |
US11586912B2 (en) * | 2019-10-18 | 2023-02-21 | International Business Machines Corporation | Integrated noise generation for adversarial training |
EP3816864A1 (en) * | 2019-10-28 | 2021-05-05 | Robert Bosch GmbH | Device and method for the generation of synthetic data in generative networks |
CN111008277B (en) * | 2019-10-30 | 2020-11-03 | 创意信息技术股份有限公司 | Automatic text summarization method |
SG11202010803VA (en) * | 2019-10-31 | 2020-11-27 | Alipay Hangzhou Inf Tech Co Ltd | System and method for determining voice characteristics |
US20210150306A1 (en) * | 2019-11-14 | 2021-05-20 | Qualcomm Incorporated | Phase selective convolution with dynamic weight selection |
US11710046B2 (en) * | 2019-11-29 | 2023-07-25 | 42Maru Inc. | Method and apparatus for generating Q and A model by using adversarial learning |
WO2021112918A1 (en) | 2019-12-02 | 2021-06-10 | Caris Mpi, Inc. | Pan-cancer platinum response predictor |
US11727284B2 (en) | 2019-12-12 | 2023-08-15 | Business Objects Software Ltd | Interpretation of machine learning results using feature analysis |
US20210192376A1 (en) * | 2019-12-23 | 2021-06-24 | Sap Se | Automated, progressive explanations of machine learning results |
CN111104997B (en) * | 2019-12-25 | 2023-05-23 | 青岛创新奇智科技集团股份有限公司 | Commodity two-dimensional code generation method and system based on deep learning |
CN113093967A (en) | 2020-01-08 | 2021-07-09 | 富泰华工业(深圳)有限公司 | Data generation method, data generation device, computer device, and storage medium |
US11138094B2 (en) | 2020-01-10 | 2021-10-05 | International Business Machines Corporation | Creation of minimal working examples and environments for troubleshooting code issues |
US11163592B2 (en) * | 2020-01-10 | 2021-11-02 | International Business Machines Corporation | Generation of benchmarks of applications based on performance traces |
CN111131658B (en) * | 2020-01-19 | 2021-08-24 | 中国科学技术大学 | Image steganography method, device, electronic equipment and medium |
US11675879B2 (en) * | 2020-02-20 | 2023-06-13 | K2Ai, LLC | Apparatus and method for operating a detection and response system |
US11776679B2 (en) * | 2020-03-10 | 2023-10-03 | The Board Of Trustees Of The Leland Stanford Junior University | Methods for risk map prediction in AI-based MRI reconstruction |
TWI714480B (en) * | 2020-03-19 | 2020-12-21 | 索爾智慧機械有限公司 | Data display method of pull cap installation tool test instrument |
US11741340B2 (en) | 2020-03-23 | 2023-08-29 | D5Ai Llc | Data-dependent node-to-node knowledge sharing by regularization in deep learning |
US11494496B2 (en) * | 2020-03-30 | 2022-11-08 | International Business Machines Corporation | Measuring overfitting of machine learning computer model and susceptibility to security threats |
US11580455B2 (en) | 2020-04-01 | 2023-02-14 | Sap Se | Facilitating machine learning configuration |
US11514318B2 (en) * | 2020-04-08 | 2022-11-29 | International Business Machines Corporation | Multi-source transfer learning from pre-trained networks |
CN111242948B (en) * | 2020-04-29 | 2020-09-01 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, model training method, model training device, image processing equipment and storage medium |
US11651225B2 (en) * | 2020-05-05 | 2023-05-16 | Mitsubishi Electric Research Laboratories, Inc. | Non-uniform regularization in artificial neural networks for adaptable scaling |
US20210406693A1 (en) * | 2020-06-25 | 2021-12-30 | Nxp B.V. | Data sample analysis in a dataset for a machine learning model |
US20220035334A1 (en) * | 2020-07-29 | 2022-02-03 | Abb Schweiz Ag | Technologies for producing training data for identifying degradation of physical components |
US20220043681A1 (en) * | 2020-08-04 | 2022-02-10 | Oracle International Corporation | Memory usage prediction for machine learning and deep learning models |
US11909482B2 (en) * | 2020-08-18 | 2024-02-20 | Qualcomm Incorporated | Federated learning for client-specific neural network parameter generation for wireless communication |
EP3958182A1 (en) * | 2020-08-20 | 2022-02-23 | Dassault Systèmes | Variational auto-encoder for outputting a 3d model |
US20230308317A1 (en) * | 2020-08-20 | 2023-09-28 | Nokia Technologies Oy | Neural-Network-Based Receivers |
US20220092478A1 (en) * | 2020-09-18 | 2022-03-24 | Basf Se | Combining data driven models for classifying data |
TWI810487B (en) * | 2020-09-25 | 2023-08-01 | 國立成功大學 | Solar power forecasting method |
KR20220046324A (en) | 2020-10-07 | 2022-04-14 | 삼성전자주식회사 | Training method for inference using artificial neural network, inference method using artificial neural network, and inference apparatus thereof |
US20220108434A1 (en) * | 2020-10-07 | 2022-04-07 | National Technology & Engineering Solutions Of Sandia, Llc | Deep learning for defect detection in high-reliability components |
US11580396B2 (en) | 2020-10-13 | 2023-02-14 | Aira Technologies, Inc. | Systems and methods for artificial intelligence discovered codes |
US20220122001A1 (en) * | 2020-10-15 | 2022-04-21 | Nvidia Corporation | Imitation training using synthetic data |
US20220129758A1 (en) * | 2020-10-27 | 2022-04-28 | Raytheon Company | Clustering autoencoder |
US11615782B2 (en) * | 2020-11-12 | 2023-03-28 | Sony Interactive Entertainment Inc. | Semi-sorted batching with variable length input for efficient training |
US11818147B2 (en) * | 2020-11-23 | 2023-11-14 | Fair Isaac Corporation | Overly optimistic data patterns and learned adversarial latent features |
US20220180254A1 (en) * | 2020-12-08 | 2022-06-09 | International Business Machines Corporation | Learning robust predictors using game theory |
CN112417895B (en) * | 2020-12-15 | 2024-09-06 | 广州博冠信息科技有限公司 | Barrage data processing method, device, equipment and storage medium |
US11088784B1 (en) | 2020-12-24 | 2021-08-10 | Aira Technologies, Inc. | Systems and methods for utilizing dynamic codes with neural networks |
CN114757244A (en) * | 2020-12-25 | 2022-07-15 | 华为云计算技术有限公司 | Model training method, device, storage medium and equipment |
US11483109B2 (en) | 2020-12-28 | 2022-10-25 | Aira Technologies, Inc. | Systems and methods for multi-device communication |
US11191049B1 (en) | 2020-12-28 | 2021-11-30 | Aira Technologies, Inc. | Systems and methods for improving wireless performance |
US11368250B1 (en) | 2020-12-28 | 2022-06-21 | Aira Technologies, Inc. | Adaptive payload extraction and retransmission in wireless data communications with error aggregations |
US11477308B2 (en) | 2020-12-28 | 2022-10-18 | Aira Technologies, Inc. | Adaptive payload extraction in wireless communications involving multi-access address packets |
US11575469B2 (en) | 2020-12-28 | 2023-02-07 | Aira Technologies, Inc. | Multi-bit feedback protocol systems and methods |
CN112685314A (en) * | 2021-01-05 | 2021-04-20 | 广州知图科技有限公司 | JavaScript engine security test method and test system |
US20240153299A1 (en) * | 2021-03-01 | 2024-05-09 | Schlumberger Technology Corporation | System and method for automated document analysis |
US11489624B2 (en) | 2021-03-09 | 2022-11-01 | Aira Technologies, Inc. | Error correction in network packets using lookup tables |
US11489623B2 (en) * | 2021-03-15 | 2022-11-01 | Aira Technologies, Inc. | Error correction in network packets |
US11496242B2 (en) | 2021-03-15 | 2022-11-08 | Aira Technologies, Inc. | Fast cyclic redundancy check: utilizing linearity of cyclic redundancy check for accelerating correction of corrupted network packets |
CN113095377B (en) * | 2021-03-26 | 2024-06-14 | 中国科学院电工研究所 | Dangerous driving scene data random generation method and system |
TWI769820B (en) * | 2021-05-19 | 2022-07-01 | 鴻海精密工業股份有限公司 | Method for optimizing the generative adversarial network and electronic equipment |
WO2022252013A1 (en) * | 2021-05-31 | 2022-12-08 | Robert Bosch Gmbh | Method and apparatus for training neural network for imitating demonstrator's behavior |
US11675817B1 (en) | 2021-06-22 | 2023-06-13 | Wells Fargo Bank, N.A. | Synthetic data generation |
US20220414447A1 (en) * | 2021-06-24 | 2022-12-29 | Paypal, Inc. | Implicit curriculum learning |
EP4367638A1 (en) * | 2021-07-06 | 2024-05-15 | PAIGE.AI, Inc. | Systems and methods to process electronic images for synthetic image generation |
US11797425B2 (en) * | 2021-07-09 | 2023-10-24 | International Business Machines Corporation | Data augmentation based on failure cases |
DE102021208726A1 (en) | 2021-08-10 | 2023-02-16 | Robert Bosch Gesellschaft mit beschränkter Haftung | Training a generator for synthetic measurement data with augmented training data |
EP4145401A1 (en) * | 2021-09-06 | 2023-03-08 | MVTec Software GmbH | Method for detecting anomalies in images using a plurality of machine learning programs |
TWI780940B (en) * | 2021-10-04 | 2022-10-11 | 國立中央大學 | Task-oriented denoising system and method based on deep learning |
WO2023100190A1 (en) * | 2021-12-02 | 2023-06-08 | Telefonaktiebolaget Lm Ericsson (Publ) | First node, second node and methods performed thereby for handling data augmentation |
US11983238B2 (en) * | 2021-12-03 | 2024-05-14 | International Business Machines Corporation | Generating task-specific training data |
WO2023107164A1 (en) * | 2021-12-08 | 2023-06-15 | Visa International Service Association | System, method, and computer program product for cleaning noisy data from unlabeled datasets using autoencoders |
US20230289658A1 (en) * | 2022-01-14 | 2023-09-14 | Home Depot Product Authority, Llc | Incremental machine learning training |
WO2023192766A1 (en) * | 2022-03-31 | 2023-10-05 | D5Ai Llc | Generation and discrimination training as a variable resolution game |
EP4276724A1 (en) * | 2022-05-09 | 2023-11-15 | RTL Deutschland GmbH | Automatic prediction of effects of a media object |
US20230376375A1 (en) * | 2022-05-21 | 2023-11-23 | Jpmorgan Chase Bank, N.A. | Method and system for automatically identifying and resolving errors in log file |
WO2024044704A1 (en) * | 2022-08-25 | 2024-02-29 | Sabic Global Technologies B.V. | Systems and methods for generating training data |
US11615316B1 (en) | 2022-09-19 | 2023-03-28 | Rain Neuromorphics Inc. | Machine learning using gradient estimate determined using improved perturbations |
US11822908B1 (en) * | 2023-02-10 | 2023-11-21 | CuraeChoice, Inc. | Extensible compilation using composite programming for hardware |
EP4425384A1 (en) * | 2023-02-28 | 2024-09-04 | Fujitsu Limited | Training deep belief networks |
CN116807479B (en) * | 2023-08-28 | 2023-11-10 | 成都信息工程大学 | Driving attention detection method based on multi-mode deep neural network |
Family Cites Families (80)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5131055A (en) * | 1990-02-16 | 1992-07-14 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Auto and hetero-associative memory using a 2-D optical logic gate |
US5959574A (en) | 1993-12-21 | 1999-09-28 | Colorado State University Research Foundation | Method and system for tracking multiple regional objects by multi-dimensional relaxation |
US6324532B1 (en) * | 1997-02-07 | 2001-11-27 | Sarnoff Corporation | Method and apparatus for training a neural network to detect objects in an image |
US6128606A (en) | 1997-03-11 | 2000-10-03 | At&T Corporation | Module for constructing trainable modular network in which each module inputs and outputs data structured as a graph |
AU2002305652A1 (en) | 2001-05-18 | 2002-12-03 | Biowulf Technologies, Llc | Methods for feature selection in a learning machine |
US7054847B2 (en) * | 2001-09-05 | 2006-05-30 | Pavilion Technologies, Inc. | System and method for on-line training of a support vector machine |
US7609608B2 (en) * | 2001-09-26 | 2009-10-27 | General Atomics | Method and apparatus for data transfer using a time division multiple frequency scheme with additional modulation |
US7016884B2 (en) * | 2002-06-27 | 2006-03-21 | Microsoft Corporation | Probability estimate for K-nearest neighbor |
US20040010480A1 (en) * | 2002-07-09 | 2004-01-15 | Lalitha Agnihotri | Method, apparatus, and program for evolving neural network architectures to detect content in media information |
US20040042650A1 (en) | 2002-08-30 | 2004-03-04 | Lockheed Martin Corporation | Binary optical neural network classifiers for pattern recognition |
US7437336B2 (en) * | 2003-08-01 | 2008-10-14 | George Mason Intellectual Properties, Inc. | Polyoptimizing genetic algorithm for finding multiple solutions to problems |
KR100506095B1 (en) * | 2003-11-17 | 2005-08-03 | 삼성전자주식회사 | Method and apparatus of landmark detection in intelligent system |
US7587064B2 (en) | 2004-02-03 | 2009-09-08 | Hrl Laboratories, Llc | Active learning system for object fingerprinting |
US20070289013A1 (en) * | 2006-06-08 | 2007-12-13 | Keng Leng Albert Lim | Method and system for anomaly detection using a collective set of unsupervised machine-learning algorithms |
US7565334B2 (en) * | 2006-11-17 | 2009-07-21 | Honda Motor Co., Ltd. | Fully bayesian linear regression |
US8204128B2 (en) * | 2007-08-01 | 2012-06-19 | Her Majesty The Queen In Right Of Canada, As Represented By The Minister Of Industry, Through The Communications Research Centre Canada | Learning filters for enhancing the quality of block coded still and video images |
WO2009068084A1 (en) * | 2007-11-27 | 2009-06-04 | Nokia Corporation | An encoder |
CN105740641A (en) * | 2009-10-19 | 2016-07-06 | 提拉诺斯公司 | Integrated health data capture and analysis system |
US8687923B2 (en) * | 2011-08-05 | 2014-04-01 | Adobe Systems Incorporated | Robust patch regression based on in-place self-similarity for image upscaling |
US8484022B1 (en) * | 2012-07-27 | 2013-07-09 | Google Inc. | Adaptive auto-encoders |
CN102930291B (en) * | 2012-10-15 | 2015-04-08 | 西安电子科技大学 | Automatic K adjacent local search heredity clustering method for graphic image |
US8527276B1 (en) | 2012-10-25 | 2013-09-03 | Google Inc. | Speech synthesis using deep neural networks |
US9646226B2 (en) * | 2013-04-16 | 2017-05-09 | The Penn State Research Foundation | Instance-weighted mixture modeling to enhance training collections for image annotation |
US20140358828A1 (en) | 2013-05-29 | 2014-12-04 | Purepredictive, Inc. | Machine learning generated action plan |
US10459117B2 (en) * | 2013-06-03 | 2019-10-29 | Exxonmobil Upstream Research Company | Extended subspace method for cross-talk mitigation in multi-parameter inversion |
US9247911B2 (en) * | 2013-07-10 | 2016-02-02 | Alivecor, Inc. | Devices and methods for real-time denoising of electrocardiograms |
US9753796B2 (en) * | 2013-12-06 | 2017-09-05 | Lookout, Inc. | Distributed monitoring, evaluation, and response for multiple devices |
US20150228277A1 (en) * | 2014-02-11 | 2015-08-13 | Malaspina Labs (Barbados), Inc. | Voiced Sound Pattern Detection |
US11232319B2 (en) | 2014-05-16 | 2022-01-25 | The Trustees Of The University Of Pennsylvania | Applications of automatic anatomy recognition in medical tomographic imagery based on fuzzy anatomy models |
WO2016037300A1 (en) * | 2014-09-10 | 2016-03-17 | Xiaoou Tang | Method and system for multi-class object detection |
US10832138B2 (en) | 2014-11-27 | 2020-11-10 | Samsung Electronics Co., Ltd. | Method and apparatus for extending neural network |
US10275719B2 (en) | 2015-01-29 | 2019-04-30 | Qualcomm Incorporated | Hyper-parameter selection for deep convolutional networks |
US9576250B2 (en) | 2015-02-24 | 2017-02-21 | Xerox Corporation | Method and system for simulating users in the context of a parking lot based on the automatic learning of a user choice decision function from historical data considering multiple user behavior profiles |
US10410118B2 (en) | 2015-03-13 | 2019-09-10 | Deep Genomics Incorporated | System and method for training neural networks |
US20160277767A1 (en) * | 2015-03-16 | 2016-09-22 | Thomson Licensing | Methods, systems and apparatus for determining prediction adjustment factors |
US20160321523A1 (en) | 2015-04-30 | 2016-11-03 | The Regents Of The University Of California | Using machine learning to filter monte carlo noise from images |
US10565518B2 (en) | 2015-06-23 | 2020-02-18 | Adobe Inc. | Collaborative feature learning from social media |
US10552730B2 (en) | 2015-06-30 | 2020-02-04 | Adobe Inc. | Procedural modeling using autoencoder neural networks |
US9699205B2 (en) * | 2015-08-31 | 2017-07-04 | Splunk Inc. | Network security system |
US10521902B2 (en) * | 2015-10-14 | 2019-12-31 | The Regents Of The University Of California | Automated segmentation of organ chambers using deep learning methods from medical imaging |
US10776712B2 (en) * | 2015-12-02 | 2020-09-15 | Preferred Networks, Inc. | Generative machine learning systems for drug design |
US11170294B2 (en) * | 2016-01-07 | 2021-11-09 | Intel Corporation | Hardware accelerated machine learning |
US10043243B2 (en) * | 2016-01-22 | 2018-08-07 | Siemens Healthcare Gmbh | Deep unfolding algorithm for efficient image denoising under varying noise conditions |
US10733531B2 (en) * | 2016-01-27 | 2020-08-04 | Bonsai AI, Inc. | Artificial intelligence engine having an architect module |
US11087234B2 (en) * | 2016-01-29 | 2021-08-10 | Verizon Media Inc. | Method and system for distributed deep machine learning |
US10176799B2 (en) * | 2016-02-02 | 2019-01-08 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for training language models to reduce recognition errors |
US10089717B2 (en) * | 2016-04-05 | 2018-10-02 | Flipboard, Inc. | Image scaling using a convolutional neural network |
US20170328194A1 (en) | 2016-04-25 | 2017-11-16 | University Of Southern California | Autoencoder-derived features as inputs to classification algorithms for predicting failures |
EP3459017B1 (en) * | 2016-05-20 | 2023-11-01 | Deepmind Technologies Limited | Progressive neural networks |
US10043252B2 (en) * | 2016-06-14 | 2018-08-07 | Intel Corporation | Adaptive filtering with weight analysis |
US10387765B2 (en) * | 2016-06-23 | 2019-08-20 | Siemens Healthcare Gmbh | Image correction using a deep generative machine-learning model |
US20180024968A1 (en) | 2016-07-22 | 2018-01-25 | Xerox Corporation | System and method for domain adaptation using marginalized stacked denoising autoencoders with domain prediction regularization |
US10504004B2 (en) * | 2016-09-16 | 2019-12-10 | General Dynamics Mission Systems, Inc. | Systems and methods for deep model translation generation |
EP3520038A4 (en) | 2016-09-28 | 2020-06-03 | D5Ai Llc | Learning coach for machine learning system
US10096088B2 (en) * | 2016-09-28 | 2018-10-09 | Disney Enterprises, Inc. | Robust regression method for image-space denoising |
US11205103B2 (en) | 2016-12-09 | 2021-12-21 | The Research Foundation for the State University | Semisupervised autoencoder for sentiment analysis |
US10621586B2 (en) * | 2017-01-31 | 2020-04-14 | Paypal, Inc. | Fraud prediction based on partial usage data |
US20180218256A1 (en) * | 2017-02-02 | 2018-08-02 | Qualcomm Incorporated | Deep convolution neural network behavior generator |
US11915152B2 (en) | 2017-03-24 | 2024-02-27 | D5Ai Llc | Learning coach for machine learning system |
US10489887B2 (en) * | 2017-04-10 | 2019-11-26 | Samsung Electronics Co., Ltd. | System and method for deep learning image super resolution |
EP3612984A4 (en) | 2017-04-18 | 2021-03-24 | D5Ai Llc | Multi-stage machine learning and recognition
US20180342045A1 (en) * | 2017-05-26 | 2018-11-29 | Microsoft Technology Licensing, Llc | Image resolution enhancement using machine learning |
WO2018226492A1 (en) | 2017-06-05 | 2018-12-13 | D5Ai Llc | Asynchronous agents with learning coaches and structurally modifying deep neural networks without performance degradation |
WO2018226527A1 (en) | 2017-06-08 | 2018-12-13 | D5Ai Llc | Data splitting by gradient direction for neural networks |
US20200143240A1 (en) | 2017-06-12 | 2020-05-07 | D5Ai Llc | Robust anti-adversarial machine learning |
EP3646252A4 (en) | 2017-06-26 | 2021-03-24 | D5Ai Llc | Selective training for decorrelation of errors |
WO2019005507A1 (en) | 2017-06-27 | 2019-01-03 | D5Ai Llc | Aligned training of deep networks |
US11403531B2 (en) * | 2017-07-19 | 2022-08-02 | Disney Enterprises, Inc. | Factorized variational autoencoders |
US11270188B2 (en) | 2017-09-28 | 2022-03-08 | D5Ai Llc | Joint optimization of ensembles in deep learning |
WO2019067831A1 (en) | 2017-09-28 | 2019-04-04 | D5Ai Llc | Multi-objective generators in deep learning |
WO2019067960A1 (en) | 2017-09-28 | 2019-04-04 | D5Ai Llc | Aggressive development with cooperative generators |
US10679129B2 (en) | 2017-09-28 | 2020-06-09 | D5Ai Llc | Stochastic categorical autoencoder network |
US10592779B2 (en) | 2017-12-21 | 2020-03-17 | International Business Machines Corporation | Generative adversarial network medical image generation for training of a classifier |
US10540578B2 (en) | 2017-12-21 | 2020-01-21 | International Business Machines Corporation | Adapting a generative adversarial network to new data sources for image classification |
US11138731B2 (en) * | 2018-05-30 | 2021-10-05 | Siemens Healthcare Gmbh | Methods for generating synthetic training data and for training deep learning algorithms for tumor lesion characterization, method and system for tumor lesion characterization, computer program and electronically readable storage medium |
US10692019B2 (en) * | 2018-07-06 | 2020-06-23 | Capital One Services, Llc | Failure feedback system for enhancing machine learning accuracy by synthetic data generation |
EP3624021A1 (en) * | 2018-09-17 | 2020-03-18 | Robert Bosch GmbH | Device and method for training an augmented discriminator |
US11580329B2 (en) * | 2018-09-18 | 2023-02-14 | Microsoft Technology Licensing, Llc | Machine-learning training service for synthetic data |
US11593641B2 (en) * | 2018-09-19 | 2023-02-28 | Tata Consultancy Services Limited | Automatic generation of synthetic samples using dynamic deep autoencoders |
US11366982B2 (en) | 2018-09-24 | 2022-06-21 | Sap Se | Computer systems for detecting training data usage in generative models |
- 2018
- 2018-09-07 US US16/124,977 patent/US10679129B2/en active Active
- 2018-09-14 WO PCT/US2018/051069 patent/WO2019067236A1/en active Search and Examination
- 2018-09-14 EP EP18861571.0A patent/EP3688676A4/en not_active Withdrawn
- 2018-09-14 US US16/646,092 patent/US11354578B2/en active Active
- 2018-09-14 CN CN201880067064.XA patent/CN111226232B/en active Active
- 2018-09-17 WO PCT/US2018/051332 patent/WO2019067248A1/en active Application Filing
- 2018-09-17 US US16/646,169 patent/US11074506B2/en active Active
- 2018-09-19 WO PCT/US2018/051683 patent/WO2019067281A1/en active Application Filing
- 2018-09-19 US US16/646,071 patent/US20200285948A1/en not_active Abandoned
- 2018-09-28 US US16/645,710 patent/US20200285939A1/en not_active Abandoned
- 2018-09-28 CN CN201880076808.4A patent/CN111542843A/en active Pending
- 2018-09-28 EP EP18861823.5A patent/EP3688677A4/en active Pending
- 2018-09-28 US US16/646,096 patent/US11074505B2/en active Active
- 2018-09-28 CN CN201880067035.3A patent/CN111226236B/en active Active
- 2018-09-28 EP EP18862297.1A patent/EP3688678A4/en not_active Withdrawn
- 2020
- 2020-05-06 US US16/867,746 patent/US11461661B2/en active Active
- 2020-06-15 US US16/901,608 patent/US11410050B2/en active Active
- 2022
- 2022-07-05 US US17/810,778 patent/US11531900B2/en active Active
- 2022-07-28 US US17/815,851 patent/US11687788B2/en active Active
- 2023
- 2023-05-12 US US18/196,855 patent/US20230289611A1/en active Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11531900B2 (en) | 2017-09-28 | 2022-12-20 | D5Ai Llc | Imitation learning for machine learning systems with synthetic data generators |
US11687788B2 (en) | 2017-09-28 | 2023-06-27 | D5Ai Llc | Generating synthetic data examples as interpolation of two data examples that is linear in the space of relative scores |
US11836600B2 (en) | 2020-08-20 | 2023-12-05 | D5Ai Llc | Targeted incremental growth with continual learning in deep neural networks |
US11948063B2 (en) | 2020-08-20 | 2024-04-02 | D5Ai Llc | Improving a deep neural network with node-to-node relationship regularization |
US20230342351A1 (en) * | 2022-04-26 | 2023-10-26 | Truist Bank | Change management process for identifying inconsistencies for improved processing efficiency |
US11983162B2 (en) | 2022-04-26 | 2024-05-14 | Truist Bank | Change management process for identifying potential regulatory violations for improved processing efficiency |
Also Published As
Publication number | Publication date |
---|---|
US20200265320A1 (en) | 2020-08-20 |
US11354578B2 (en) | 2022-06-07 |
US20190095798A1 (en) | 2019-03-28 |
WO2019067248A1 (en) | 2019-04-04 |
EP3688678A4 (en) | 2021-07-28 |
US11410050B2 (en) | 2022-08-09 |
EP3688676A1 (en) | 2020-08-05 |
US20200320371A1 (en) | 2020-10-08 |
US20220383131A1 (en) | 2022-12-01 |
US20200285939A1 (en) | 2020-09-10 |
US11531900B2 (en) | 2022-12-20 |
US11461661B2 (en) | 2022-10-04 |
US20200279188A1 (en) | 2020-09-03 |
US20220335305A1 (en) | 2022-10-20 |
WO2019067236A1 (en) | 2019-04-04 |
EP3688677A1 (en) | 2020-08-05 |
US11074505B2 (en) | 2021-07-27 |
EP3688676A4 (en) | 2021-06-23 |
EP3688678A1 (en) | 2020-08-05 |
CN111226236A (en) | 2020-06-02 |
US10679129B2 (en) | 2020-06-09 |
EP3688677A4 (en) | 2021-08-18 |
US11687788B2 (en) | 2023-06-27 |
CN111226236B (en) | 2024-10-15 |
US20200210842A1 (en) | 2020-07-02 |
CN111226232B (en) | 2024-04-12 |
CN111542843A (en) | 2020-08-14 |
CN111226232A (en) | 2020-06-02 |
WO2019067281A1 (en) | 2019-04-04 |
US20200279165A1 (en) | 2020-09-03 |
US11074506B2 (en) | 2021-07-27 |
US20230289611A1 (en) | 2023-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200285948A1 (en) | Robust auto-associative memory with recurrent neural network | |
US11270188B2 (en) | Joint optimization of ensembles in deep learning | |
US11676026B2 (en) | Using back propagation computation as data | |
US11037059B2 (en) | Self-supervised back propagation for deep learning | |
CN110914839B (en) | Selective training of error decorrelation | |
US11521064B2 (en) | Training a neural network model | |
US11074502B2 (en) | Efficiently building deep neural networks | |
US11195097B2 (en) | Building ensembles for deep learning by parallel data splitting | |
US11010670B2 (en) | Building a deep neural network with diverse strata | |
CN116635866A (en) | Method and system for mining minority class data samples to train a neural network | |
US10922587B2 (en) | Analyzing and correcting vulnerabilities in neural networks | |
US20210342683A1 (en) | Companion analysis network in deep learning | |
US11682223B1 (en) | Scoring sentiment in documents using machine learning and fuzzy matching | |
JP6643905B2 (en) | Machine learning method and machine learning device | |
Ledesma et al. | Feature selection using artificial neural networks | |
US20230289434A1 (en) | Diversity for detection and correction of adversarial attacks | |
CN113239077B (en) | Searching method, system and computer readable storage medium based on neural network | |
WO2021170215A1 (en) | Neural architecture search | |
KR20230022629A (en) | Method and Apparatus for Training Artificial Intelligence Based on Episode Memory | |
JPH07200520A (en) | Associative module learning method and device therefor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: D5AI LLC, FLORIDA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: BAKER, JAMES K.; REEL/FRAME: 052077/0473. Effective date: 20181005 |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |