US11526760B2 - Training system for artificial neural networks having a global weight constrainer - Google Patents

Training system for artificial neural networks having a global weight constrainer

Info

Publication number
US11526760B2
US11526760B2 US16/186,121 US201816186121A
Authority
US
United States
Prior art keywords
weights
neural network
network architecture
constraint
neuron
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US16/186,121
Other versions
US20200151570A1 (en)
Inventor
Sathya Narayanan Ravi
Tuan Quang Dinh
Vishnu Sai Rao Suresh Lokhande
Vikas Singh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wisconsin Alumni Research Foundation
Original Assignee
Wisconsin Alumni Research Foundation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wisconsin Alumni Research Foundation filed Critical Wisconsin Alumni Research Foundation
Priority to US16/186,121 priority Critical patent/US11526760B2/en
Assigned to WISCONSIN ALUMNI RESEARCH FOUNDATION reassignment WISCONSIN ALUMNI RESEARCH FOUNDATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SINGH, VIKAS, DINH, TUAN, Lokhande, Vishnu Sai Rao Suresh, RAVI, SATHYA
Publication of US20200151570A1 publication Critical patent/US20200151570A1/en
Application granted granted Critical
Publication of US11526760B2 publication Critical patent/US11526760B2/en
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • G06N3/0481
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

An architecture for training the weights of artificial neural networks provides a global constrainer modifying the neuron weights in each iteration not only by the back-propagated error but also by a global constraint constraining these weights based on the value of all weights at that iteration. The ability to accommodate a global constraint is made practical by using a constrained gradient descent which approximates the error gradient deduced in the training as a plane, offsetting the increased complexity of the global constraint.

Description

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
This invention was made with government support under AG040396 awarded by the National Institutes of Health and 1252725 awarded by the National Science Foundation. The government has certain rights in the invention.
CROSS REFERENCE TO RELATED APPLICATION
Background of the Invention
The present invention relates generally to computer architectures and, in particular, to an architecture providing improved training of artificial neural networks.
Artificial neural networks (henceforth neural network) are computing systems inspired by the brain. A common design of a neural net provides multiple layers of “neurons” where each layer has multiple connections to preceding and/or succeeding layers. Each of the multiple inputs to each neuron is associated with a weight, and the neuron provides an output that is a nonlinear function of the weighted sum of the data from the input connections.
The final layer of the neural net typically provides a classification, for example, expressed as an output vector having elements associated with different classification possibilities. In a common example of a neural net that is trained to review image data and classify that image data, the output vector may classify the image according to whether it shows a particular subject, for example, an automobile or a pedestrian.
The weights of the neural network are obtained by a training process in which example data with known classification is provided in a "training set" to the neural network and the weights are adjusted iteratively so that the output classification converges to the known classification of the training set data.
During training, examples from the training set are applied to the neural network, and output from the neural network is compared to a desired output to produce an error value representing a difference between the desired output and an actual output obtained with the current weights. This error value is "backpropagated" through the neurons and used iteratively to adjust the weights of each neuron. Over multiple training examples, the weights ideally converge to a set of weights that produces the desired output for each of the training set examples.
The large amount of data and large number of iterations required for training can incur substantial computational costs even with high-speed architectures using specialized hardware such as graphic processing units. Moreover, most real world uses of artificial neural networks anticipate constant retraining as the operating neural networks experience new situations. For example, an artificial neural network used in an autonomous vehicle will desirably be retrained regularly as it and other vehicles collect additional data during use.
Training an artificial neural network presents a gradient descent problem. The generated error is part of an error function whose surface height indicates an error corresponding to the current weights. Training the weight values involves iteratively adjusting the weights to try to move downhill on this error function surface. Determining the “downhill” direction at any given iteration can require substantial calculation in determining the local gradient for each weight.
This process of gradient descent can be made substantially faster by not determining a gradient over the entire training set at once but rather choosing a random subset of the training data and making the determination based on only that subset. This subset is then varied between iterations. This process is known as statistical (or stochastic) gradient descent and is widely used in the training of artificial neural networks in order to provide sufficient speed.
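By way of illustration only, a minimal sketch of one such stochastic update in Python is shown below; the function names, learning rate, and batch size here are assumptions for illustration and not part of the described architecture:

```python
import numpy as np

def stochastic_gradient_step(weights, examples, targets, loss_gradient,
                             lr=0.01, batch_size=32):
    # Choose a random subset of the training set for this iteration.
    idx = np.random.choice(len(examples), size=batch_size, replace=False)
    # Estimate the error gradient from that subset only, not the full set.
    grad = loss_gradient(weights, examples[idx], targets[idx])
    # Step "downhill" on the error surface using the estimated gradient.
    return weights - lr * grad
```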
Some researchers have noticed that training time and calculation burden can be reduced by constraining the values of the weights globally, for example, trying to keep the weight values close to one. Applying these constraints to the statistical gradient descent calculation, however, can be difficult or impossible and can lead to convergence problems.
SUMMARY OF THE INVENTION
The present invention provides an artificial neural net training processor that applies global constraints to the weights of the artificial neural network, thus producing substantial reductions in training time. In some cases, a reduction in training time of nearly one third can be obtained. Alternatively, for the same precision, the number of neurons required in the neural net may be reduced by as much as 50 percent.
The ability to apply global constraints during the training process is made practical through the use of a "conditional gradient descent" rather than a "statistical gradient descent." Unlike the statistical gradient descent, the conditional gradient descent considers the local gradients of all weights but makes up for this completeness by approximating the error surface at those weights as a sloped plane. While the error function itself can be much more complicated, optimizing this sloped plane under global constraints can be very simple. This provides a simplification of the calculation without the complexity and convergence problems associated with applying such constraints to statistical gradient descent.
Specifically, then, the present invention provides a neural network architecture for training weights of an artificial neural network, the architecture having a set of neurons arranged in multiple layers between network inputs and network outputs, each neuron providing a set of weights applied individually to separate neuron inputs to produce a neuron output. The architecture includes a memory adapted to hold a training set comprising multiple sets of examples, each said example linked to a particular desired output. An error calculator determines an error between an output at the network output for a given example of a given set compared to the desired output linked to a given set, and a constrained weight adjuster globally adjusts the weights of the neurons over multiple iterations according to a backpropagated portion of the error at each neuron at each iteration, the constrained weight adjuster operating to constrain adjustment of given weights according to a predetermined constraint dependent on the value of substantially all weights at a given iteration.
It is thus a feature of at least one embodiment of the invention to provide a global constraint on the weights allowing improved trade-off between training convergence time, precision, and network complexity.
The constrained weight adjuster may include a library of constraint definitions and provide a selector input for selecting among those constraint definitions to provide a different predetermined constraint.
It is thus a feature of at least one embodiment of the invention to allow selection of a global constraint based on a priori knowledge of the training set or empirical study of the best global constraint for a particular training set.
The constraint definitions may include constraints constraining a vector sum of the weights and constraints constraining a variance of the weights.
It is thus a feature of at least one embodiment of the invention to provide for the ability to handle both constraints on the value of the weight vector and constraints relating to statistical measures of the weight vector.
The constrained weight adjuster may adjust the weights of the neurons in a conditional gradient descent in which a gradient of the error at the current weights is approximated by a multidimensional plane.
It is thus a feature of at least one embodiment of the invention to offset the additional computational burden of a constraint that considers all weights by providing a simplified gradient descent that uses a planar approximation of the gradient at those weights. It is thus an object of at least one embodiment of the invention to consider all of the weights globally by simplifying the contribution of each weight.
The adjustment may limit a vector describing the weight values to within a predefined multidimensional volume having a number of dimensions equal to the number of substantially all the weights.
It is thus a feature of at least one embodiment of the invention to reduce the range of the weights to better match hardware requirements, for example, of limited precision computer dividers.
The multidimensional volume constraining the weight values may be a sphere.
It is thus a feature of at least one embodiment of the invention to provide a computationally intuitive constraint system that does not favor particular constraint values.
Alternatively, the multidimensional volume may be an n-dimensional parallelepiped. This parallelepiped may have sides aligned with axes of a multidimensional space holding the multidimensional volume or may have vertices lying on axes passing through the origin of the dimensions.
It is thus a feature of at least one embodiment of the invention to provide a computationally simple constraint that can look simply at vector component ranges (the former case) or a constrained system that promotes sparsity (the latter case).
In one embodiment the adjustment may be limited so as to limit a global variance across substantially all of the weights.
It is thus a feature of at least one embodiment of the invention to provide a constraint that considers statistical qualities of the weight vector.
These particular objects and advantages may apply to only some embodiments falling within the claims and thus do not define the scope of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a neural net processor of the present invention having a set of neurons arranged in a network and communicating with a training set stored in memory, the training set holding examples and desired outputs, the architecture further including a gradient generator determining training errors and a global constrainer having selectable constraint functions;
FIG. 2 is an expanded fragmentary view of the neurons of FIG. 1 showing their interconnection and communication of their weights to the global constrainer;
FIG. 3 is a block diagram of the global constrainer showing its receipt of a selection input for selecting among constraint definitions implemented by the constrainer; and
FIG. 4 is a highly simplified representation of a boundary-type constrainer constraining the weight vector according to one constraint definition.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring now to FIG. 1 , a high-speed training system 10 for artificial neural networks may provide for a memory 12 communicating with a neural net processor 14. The memory 12 may hold a training set 16 providing training subsets 18 each having a set of different examples 20 associated with a given target output 22. For example, in training an image classifier, each training subset 18 may be associated with different animals, and each training example 20 may be a different image of that type of animal. The target output 22 associated with the training subset 18 may be a particular classification of the animal (e.g., horse). The memory 12 may be a standard high-speed computer memory, for example, including random access memory and disk storage or the like.
The neural net processor 14 provides a special-purpose processor, for example, constructed of an array of graphic processing units or using custom integrated circuits intended for the purpose of implementing neurons of an artificial neural network such as Google's tensor processing unit (TPU). In this respect, the neural net processor 14 will include specific computational circuits and firmware or software to implement a set of interconnected neurons 24 forming a neural network 26. Each neuron 24 will include a set of weights (not shown in FIG. 1 ) which define normal operation of the neural network 26 and whose values are determined during the training process.
When training is undertaken, the neural network 26 may receive data successively from each example 20 of the training set 16 for processing by the neurons 24. The output of the neural network 26 may then be received by a gradient generator 28. The gradient generator 28 also receives from the training set 16 the target output 22 associated with that example 20; from this target output 22 and the output of the neural network 26, the gradient generator 28 determines an error value, being the difference between the target output 22 and the actual output from the neural network 26 for the particular training example 20.
This error value is then backpropagated through the network to provide a neuron error or neuron-related gradient 54 for each of the neurons 24. As will be discussed below, these neuron-associated gradients 54 will be used to modify the weights of that neuron 24 for that given iteration.
After this modification is complete, a new training example 20 is provided to the neural net processor 14 and this process is repeated, further modifying the weights, until an average error value over the entire training set drops below a predetermined threshold indicating a convergence of the weights to a desired solution and thus completion of the training.
Importantly, neural net processor 14 also includes a global constrainer 30 which communicates with each neuron 24 to receive weight values and the neuron-related gradients 54 and limits the amount of change in the weights during each stage of the iteration that would otherwise be suggested by the gradient and weight values. The constraint enforced by the global constrainer 30 is global meaning that it is a function of all of the weights of all the neurons rather than determined simply by the individual weight of a single neuron and its back-propagated, neuron-related gradient value.
Referring now to FIGS. 1 and 2, generally each neuron 24 will provide a set of interconnections 34 between it and adjacent neurons 24 so as to form the neural network 26 as a set of different layers 36. For most layers 36, the outputs of each neuron 24 of the preceding layer 36 will connect to an input of every neuron 24 in the succeeding layer 36. For the initial layer 36, the neurons 24 receive data directly from the memory 12 (training set examples), and for the final layer 36 the neurons 24 provide data directly to the gradient generator 28. The layers 36 may be actual layers or virtual layers, the latter arising in a recurrent neural network in which data is recycled in successive instantiations of a single hardware layer.
While the present invention can work with a variety of neuron types, as discussed above, all neurons 24 will be associated with a set of weights 38 designated as (Wj). Collectively, the weights 38 in the neural network 26 form a weight vector designated (W). A weight 38 is associated with each input of each neuron 24 and during operation of the neural network 26, the weight 38 for each neuron input is multiplied by the value received on that input. The output of each weight 38 of a neuron 24 is then summed by summing junction 40 and optionally added with a bias 41 (also trained like the weights 38 during training) and then compressed by activation function 42 which provides a single neuron output that may connect with the neurons 24 of succeeding layers 36. The activation function 42 will typically implement a sigmoid or hyperbolic tangent function compressing the value of the output between two values (for example, −1 and +1).
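A minimal sketch of the computation performed by one such neuron follows; the tanh activation and the names used are illustrative assumptions:

```python
import numpy as np

def neuron_output(inputs, weights, bias):
    # Each input is multiplied by its weight 38 and summed (summing junction 40),
    # the trained bias 41 is added, and the result is compressed by the
    # activation function 42, here a hyperbolic tangent mapping to (-1, +1).
    return np.tanh(np.dot(weights, inputs) + bias)

# Example: a neuron with three inputs.
out = neuron_output(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, -0.2]), 0.05)
```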
During backpropagation, an error generated by the gradient generator 28 moves backward through the neurons 24 passing through the weights 38 to produce a gradient 54 associated with each neuron 24.
Normally this neuron-related gradient 54 would be used to produce a small change in the weight 38 by multiplying the gradient 54 times the weight 38 and adding a fraction of that value to the current weight 38 to produce a new weight 38. This process is local, meaning that it can be done without data outside of the neuron 24 or looking at more than one weight 38. In the present invention, however, the change in the weight value of each weight 38 is constrained by the constrainer 30 as a function of all of the other weights 38 within that neuron 24 and other neurons 24 in the same and different layers 36.
In globally constraining the change in the weights 38, the constrainer 30 may implement a number of different types of constraints selected by a selector circuit 32 which may, for example, be a user interface allowing the individual managing the training to select a constraint type. Alternatively the selector circuit 32 may produce a selection automatically by analysis of the type of data of the training set based on empirical evaluation of these constraint types.
Referring to FIG. 3 , a selection of a particular constraint type for the constrainer 30 may be done by selector circuit 32 which points to a particular constraint definition 51 from a definition file 50 whose contents will be discussed below. This constraint definition 51 is used by a global constraint calculator 52 which also receives the weight values 38 from each neuron 24 as well as the backpropagated gradients 54 associated with each weight 38 to produce a new weight 38 that will then be used for the next iteration of a forward propagation of a training set example 20.
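One schematic way to realize the definition file 50 and selector circuit 32 in software is a table mapping each selectable constraint name to a routine that turns the backpropagated gradient into the constrained direction used in the conditional gradient update described below. The entries and names shown are illustrative assumptions; fuller sketches follow the individual constraint equations below.

```python
import numpy as np

# Hypothetical registry standing in for the constraint definitions 51.
CONSTRAINT_DEFINITIONS = {
    # Sphere-type (Frobenius Norm) constraint of size lam.
    "frobenius": lambda g, lam: -lam * g / np.linalg.norm(g),
    # Sparsity-promoting (l1 Norm) constraint: -lam on the largest-magnitude coordinate.
    "l1": lambda g, lam: -lam * (np.arange(g.size) == np.argmax(np.abs(g))),
}

def select_constraint(name):
    # The selector circuit 32 simply picks one constraint definition 51.
    return CONSTRAINT_DEFINITIONS[name]
```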
In order to practically constrain the weights 38 globally, the global constraint calculator 52 implements a conditional gradient descent process. Unlike the statistical gradient descent, the conditional gradient descent considers all of the dimensions of the error function in determining a gradient but simplifies the gradient to be a single multidimensional plane. This may be done, for example, by approximating the gradient as the first term of its Taylor series representation (being the linear term). Generally, a conditional gradient method solves a linear minimization problem instead of a quadratic one as follows:
$$s_t = \arg\min_{W} \; g_t^{T} W \quad \text{s.t.}\quad R(W) \le \lambda \qquad (1)$$
where $g_t^{T}$ is a transposition of the gradient vector of the error function, evaluated in the multidimensional space holding the weight vector at the current weight vector point $W_t$. As noted, this term $g_t^{T}$ essentially represents a linear approximation of the gradient (the first term of the Taylor expansion around $W_t$). The term $s_t$ represents a point in a restricted domain of the domain of $W$ and points to a direction of minimization. Using equation (1), the next iterative value of $W$ is then determined as:
$$W_{t+1}^{CG} \leftarrow \eta W_t + (1-\eta)\,s_t \qquad (2)$$
where the subscript $t$ is the iteration index and $\eta$ is a predetermined step size for the gradient descent, typically between 0 and 1. The value $W_{t+1}$ is a vector collecting all of the weights 38. Effectively this approach simplifies the gradient descent to descend on a simple planar surface that is adjusted at each iteration to approximate the gradient at the new weight point.
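A minimal sketch of one such iteration is given below; the names are assumptions, and the linear_minimizer argument stands for whichever constraint definition 51 solves the linear problem of equation (1):

```python
import numpy as np

def conditional_gradient_step(W_t, g_t, linear_minimizer, eta=0.9):
    # Equation (1): find the point s_t inside the constraint region that
    # minimizes the planar (first-order Taylor) approximation of the error.
    s_t = linear_minimizer(g_t)
    # Equation (2): blend the current weight vector with s_t using step size eta.
    return eta * W_t + (1.0 - eta) * s_t

# Example usage with a sphere-type constraint of size lam:
lam = 1.0
sphere = lambda g: -lam * g / np.linalg.norm(g)
W_next = conditional_gradient_step(np.ones(4), np.array([0.5, -1.0, 0.2, 0.1]), sphere)
```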
Referring now momentarily to FIG. 4, this new vector $W_{t+1}$, represented by new weight value 62, is not used as the next weight vector but is constrained by the constrainer 30. As discussed above, this constraint is a global constraint that is a function of all of the constituent weights 38. Conceptually, the invention provides a constraint boundary 60 within which a new weight value 64 is found and used to replace $W_{t+1}$ as computed in equation (2).
In some cases, the boundary 60 applies geometrically (in multiple dimensions) to the final weight value 62, and in some cases the constraints may operate on a statistical quality of the final weight value 62, for example, limiting the variance of the constituent weights (W1, W2, . . . ), which is less susceptible to the geometric representation.
As noted, the global constraint calculator 52 implements this conditional gradient descent with selected different constraint functions. These different constraints, as stored in the constraint definitions 51, may generally include the Frobenius Norm, the Nuclear Norm, the $\ell_1$ Norm, the $\ell_2$ Norm, and the Total Variation Norm.
The Frobenius Norm constraint constrains the changes in succeeding weights to be:
$$W_{t+1}^{CG} = W_t - (1-\eta)\!\left(W_t + \lambda\,\frac{g_t}{\lVert g_t\rVert_F}\right) \qquad (3)$$
where $\lVert\cdot\rVert_F$ is the Frobenius Norm function. This constraint basically describes an n-dimensional sphere constraining the weights, similar to that shown in FIG. 4 but at a higher dimension corresponding to many more weights 38.
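A sketch of this update follows, under the assumption that equation (3) corresponds to equation (2) with $s_t = -\lambda g_t/\lVert g_t\rVert_F$; the names are illustrative:

```python
import numpy as np

def frobenius_constrained_step(W_t, g_t, lam=1.0, eta=0.9):
    # Direction minimizing the planar approximation over an n-dimensional
    # sphere (Frobenius Norm ball) of radius lam.
    s_t = -lam * g_t / np.linalg.norm(g_t)
    # Same as equation (3): W_t - (1 - eta) * (W_t + lam * g_t / ||g_t||_F)
    return eta * W_t + (1.0 - eta) * s_t
```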
The Nuclear Norm constraint constrains the changes in succeeding weights to be:
$$W_{t+1}^{CG} = W_t - (1-\eta)\!\left(W_t + \lambda\, u_t v_t^{T}\right) \qquad (4)$$
where $u_t$ and $v_t$ are the largest left and right singular vectors of $W_t$.
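A sketch of the corresponding update is shown below; it follows the text above in taking the singular vectors from $W_t$ (other conditional gradient formulations take them from the gradient instead), and all names are assumptions:

```python
import numpy as np

def nuclear_constrained_step(W_t, lam=1.0, eta=0.9):
    # Left/right singular vectors of W_t associated with its largest singular
    # value give a rank-one direction for the update.
    U, _, Vt = np.linalg.svd(W_t)
    u_t, v_t = U[:, 0], Vt[0, :]
    # Equation (4): W_{t+1} = W_t - (1 - eta) * (W_t + lam * outer(u_t, v_t))
    return W_t - (1.0 - eta) * (W_t + lam * np.outer(u_t, v_t))
```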
The $\ell_1$ Norm constraint constrains the changes in succeeding weights to be according to equation (2) above where:
$$s_t^{j} = \begin{cases} -\lambda & \text{if } j = \arg\max_{j'} \lvert g_t^{j'}\rvert \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$
where $s_t$ and $g_t$ are vectors of the same size and the $j$th coordinate of the vector $s_t$ is $-\lambda$, where $j$ is chosen to be the coordinate with the largest magnitude among all coordinates in $g_t$.
This constraint constrains the values of W to an n-dimensional parallelepiped whose vertices are on the axes of the multidimensional coordinate system. This constraint tends to make the weight vector W sparse.
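A sketch of this sparsity-promoting update follows; the names are assumptions, and the weight vector and gradient are treated as flattened vectors:

```python
import numpy as np

def l1_constrained_step(W_t, g_t, lam=1.0, eta=0.9):
    # Equation (5): s_t is zero except at the coordinate of g_t with the
    # largest magnitude, which is set to -lam.
    s_t = np.zeros_like(g_t)
    s_t[np.argmax(np.abs(g_t))] = -lam
    # Equation (2): blend with the current weights.
    return eta * W_t + (1.0 - eta) * s_t
```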
The $\ell_2$ Norm constraint constrains the changes in succeeding weights to be according to equation (2) above where:
$$s_t^{j} = \begin{cases} +\lambda & \text{if } g_t^{j} < 0 \\ -\lambda & \text{otherwise} \end{cases} \qquad (6)$$
where all coordinates with negative values in $g_t$ are collected and all such coordinates in $s_t$ are set to $\lambda$, and the remaining coordinates in $s_t$ are set to $-\lambda$.
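A sketch of this box-type update follows; the names are assumptions, and the weight vector and gradient are again treated as flattened vectors:

```python
import numpy as np

def l2_constrained_step(W_t, g_t, lam=1.0, eta=0.9):
    # Equation (6): coordinates where g_t is negative get +lam, all others -lam,
    # so every component of s_t lies on the boundary of a box of size lam.
    s_t = np.where(g_t < 0, lam, -lam)
    # Equation (2): blend with the current weights.
    return eta * W_t + (1.0 - eta) * s_t
```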
The Total Variation Norm constraint constrains the changes in succeeding weights to have a variance that does not exceed a predetermined value. The changes in succeeding weights are again according to equation (2) above where:
$$s_t = \arg\min_{x} \lVert x\rVert \quad \text{subject to } x \in F_{uv}(g_t) \qquad (7)$$
where $F_{uv}(g_t)$ is the standard flow polytope with weights specified by $g_t$.
Certain terminology is used herein for purposes of reference only, and thus is not intended to be limiting. For example, terms such as “upper”, “lower”, “above”, and “below” refer to directions in the drawings to which reference is made. Terms such as “front”, “back”, “rear”, “bottom” and “side”, describe the orientation of portions of the component within a consistent but arbitrary frame of reference which is made clear by reference to the text and the associated drawings describing the component under discussion. Such terminology may include the words specifically mentioned above, derivatives thereof, and words of similar import. Similarly, the terms “first”, “second” and other such numerical terms referring to structures do not imply a sequence or order unless clearly indicated by the context.
When introducing elements or features of the present disclosure and the exemplary embodiments, the articles “a”, “an”, “the” and “said” are intended to mean that there are one or more of such elements or features. The terms “comprising”, “including” and “having” are intended to be inclusive and mean that there may be additional elements or features other than those specifically noted. It is further to be understood that the method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order discussed or illustrated, unless specifically identified as an order of performance. It is also to be understood that additional or alternative steps may be employed.
References to “a microprocessor” and “a processor” or “the microprocessor” and “the processor,” can be understood to include one or more microprocessors that can communicate in a stand-alone and/or a distributed environment(s), and can thus be configured to communicate via wired or wireless communications with other processors, where such one or more processor can be configured to operate on one or more processor-controlled devices that can be similar or different devices. Furthermore, references to memory, unless otherwise specified, can include one or more processor-readable and accessible memory elements and/or components that can be internal to the processor-controlled device, external to the processor-controlled device, and can be accessed via a wired or wireless network.
It is specifically intended that the present invention not be limited to the embodiments and illustrations contained herein and the claims should be understood to include modified forms of those embodiments including portions of the embodiments and combinations of elements of different embodiments as come within the scope of the following claims. All of the publications described herein, including patents and non-patent publications, are hereby incorporated herein by reference in their entireties.

Claims (10)

What we claim is:
1. A neural network architecture for training weights of an artificial neural network, the neural network architecture comprising:
a set of neurons arranged in multiple layers between network inputs and network outputs, each neuron providing a set of weights applied individually to separate neuron inputs to produce a neuron output;
a memory adapted to hold a training set comprising a multiple set of examples, each said set linked to a particular desired output;
an error calculator determining an error between an output at the network output for a given example of a given set compared to the desired output linked to a given set; and
a constrained weight adjuster globally adjusting the weights of the neurons over multiple iterations according to a backpropagated portion of the error at each neuron at each iteration, the constrained weight adjuster constraining adjustment of given weights according to a predetermined constraint dependent on a value of substantially all weights at a given iteration.
2. The neural network architecture of claim 1 wherein the constrained weight adjuster includes a library of constraint definitions and provides a selector input for selecting among those constraint definitions to provide a different predetermined constraint.
3. The neural network architecture of claim 2 wherein the constraint definitions include constraints constraining a vector sum of the weights and constraints constraining a variance of the weights.
4. The neural network architecture of claim 1 wherein the constrained weight adjuster adjusts the weights of the neurons in a constrained gradient descent in which a gradient of the error at current weights is approximated by a multidimensional plane.
5. The neural network architecture of claim 4 wherein the constrained weight adjuster adjusts the weights by limiting a vector describing the weight to within a predefined multidimensional volume having dimensions equal to a number of substantially all weights.
6. The neural network architecture of claim 5 wherein the predefined multidimensional volume is a sphere.
7. The neural network architecture of claim 6 wherein the predefined multidimensional volume is an n-dimensional parallelepiped.
8. The neural network architecture of claim 7 wherein the parallelepiped has sides aligned with axes of a multidimensional space holding the multidimensional volume.
9. The neural network architecture of claim 6 wherein the parallelepiped has vertices lying on axes passing through an origin of the dimensions.
10. The neural network architecture of claim 4 wherein the adjustment is limited to limit a global variance across the substantially all weights.
US16/186,121 2018-11-09 2018-11-09 Training system for artificial neural networks having a global weight constrainer Active 2041-10-14 US11526760B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/186,121 US11526760B2 (en) 2018-11-09 2018-11-09 Training system for artificial neural networks having a global weight constrainer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US16/186,121 US11526760B2 (en) 2018-11-09 2018-11-09 Training system for artificial neural networks having a global weight constrainer

Publications (2)

Publication Number Publication Date
US20200151570A1 US20200151570A1 (en) 2020-05-14
US11526760B2 true US11526760B2 (en) 2022-12-13

Family

ID=70550577

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/186,121 Active 2041-10-14 US11526760B2 (en) 2018-11-09 2018-11-09 Training system for artificial neural networks having a global weight constrainer

Country Status (1)

Country Link
US (1) US11526760B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200320002A1 (en) * 2019-04-04 2020-10-08 EMC IP Holding Company LLC Intelligently managing data facility caches
CN111737921B (en) * 2020-06-24 2024-04-26 深圳前海微众银行股份有限公司 Data processing method, equipment and medium based on cyclic neural network
US20220076127A1 (en) * 2020-09-09 2022-03-10 Microsoft Technology Licensing, Llc Forcing weights of transformer model layers
US11593819B2 (en) * 2021-06-09 2023-02-28 Maplebear Inc. Training a model to predict likelihoods of users performing an action after being presented with a content item

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190244098A1 (en) * 2018-02-06 2019-08-08 Fujitsu Limited Optimization system, optimization apparatus, and optimization system control method
US20190251441A1 (en) * 2018-02-13 2019-08-15 Adobe Systems Incorporated Reducing architectural complexity of convolutional neural networks via channel pruning
US20200005152A1 (en) * 2017-03-23 2020-01-02 Deepmind Technologies Limited Training neural networks using posterior sharpening
US10754744B2 (en) * 2016-03-15 2020-08-25 Wisconsin Alumni Research Foundation Method of estimating program speed-up in highly parallel architectures using static analysis

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10754744B2 (en) * 2016-03-15 2020-08-25 Wisconsin Alumni Research Foundation Method of estimating program speed-up in highly parallel architectures using static analysis
US20200005152A1 (en) * 2017-03-23 2020-01-02 Deepmind Technologies Limited Training neural networks using posterior sharpening
US20190244098A1 (en) * 2018-02-06 2019-08-08 Fujitsu Limited Optimization system, optimization apparatus, and optimization system control method
US20190251441A1 (en) * 2018-02-13 2019-08-15 Adobe Systems Incorporated Reducing architectural complexity of convolutional neural networks via channel pruning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
F. Morchen, "Analysis of speedup as function of block size and cluster size for parallel feed-forward neural networks on a Beowulf cluster," in IEEE Transactions on Neural Networks, vol. 15, No. 2, pp. 515-527, Mar. 2004. (Year: 2004). *
S. Venkataramani, A. Ranjan, K. Roy and A. Raghunathan, "AxNN: Energy-efficient neuromorphic systems using approximate computing," 2014 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), 2014, pp. 27-32. (Year: 2014). *

Also Published As

Publication number Publication date
US20200151570A1 (en) 2020-05-14

Similar Documents

Publication Publication Date Title
US11526760B2 (en) Training system for artificial neural networks having a global weight constrainer
JP7315748B2 (en) Data classifier training method, data classifier training device, program and training method
Zhu et al. Bayesian deep convolutional encoder–decoder networks for surrogate modeling and uncertainty quantification
Dettmers et al. Sparse networks from scratch: Faster training without losing performance
Vani et al. An experimental approach towards the performance assessment of various optimizers on convolutional neural network
Chan et al. Bayesian poisson regression for crowd counting
Vigdor et al. The bayesian artmap
US9129222B2 (en) Method and apparatus for a local competitive learning rule that leads to sparse connectivity
CN111898764A (en) Method, device and chip for federal learning
US11657285B2 (en) Methods, systems, and media for random semi-structured row-wise pruning in neural networks
CN109784474A (en) A kind of deep learning model compression method, apparatus, storage medium and terminal device
JP2023523029A (en) Image recognition model generation method, apparatus, computer equipment and storage medium
US20220300823A1 (en) Methods and systems for cross-domain few-shot classification
US11615292B2 (en) Projecting images to a generative model based on gradient-free latent vector determination
Mesquita et al. Embarrassingly parallel MCMC using deep invertible transformations
Ibragimovich et al. Effective recognition of pollen grains based on parametric adaptation of the image identification model
Magdon-Ismail et al. Density estimation and random variate generation using multilayer networks
Wang et al. Variational inference with NoFAS: Normalizing flow with adaptive surrogate for computationally expensive models
Gu et al. Gaussian orthogonal latent factor processes for large incomplete matrices of correlated data
Ojaghi et al. Using artificial neural network for classification of high resolution remotely sensed images and assessment of its performance compared with statistical methods
US11875263B2 (en) Method and apparatus for energy-aware deep neural network compression
US20220284261A1 (en) Training-support-based machine learning classification and regression augmentation
EP3660742B1 (en) Method and system for generating image data
KR20230107230A (en) Automatic early termination machine learning model
Mirza et al. Classifier tools: A comparative study

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE