WO2020176297A1 - Artificial neural network compression by an iterative hybrid reinforcement learning approach

Artificial neural network compression by an iterative hybrid reinforcement learning approach

Info

Publication number
WO2020176297A1
Authority
WO
WIPO (PCT)
Prior art keywords
compression
model
neural network
component
compressed
Prior art date
Application number
PCT/US2020/018723
Other languages
English (en)
Inventor
Venkata Ratnam Saripalli
Ravi SONI
Jiahui Guan
Gopal B. Avinash
Original Assignee
GE Precision Healthcare LLC
Priority date
Filing date
Publication date
Application filed by GE Precision Healthcare LLC
Publication of WO2020176297A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3059 Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression
    • H03M7/3068 Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H03M7/3071 Prediction
    • H03M7/3073 Time
    • H03M7/70 Type of the data to be coded, other than image and sound
    • H03M7/702 Software

Definitions

  • the subject disclosure relates to artificial neural network compression and, more specifically, to facilitating automated compression of artificial neural networks via reinforcement learning.
  • neural networks are a computational framework for implementing machine learning (e.g., the teaching of a computer system to perform a specific task without explicit instructions unique to that task).
  • neural networks include multiple, interconnected computational units called neurons.
  • the networks are usually organized into a sequence of layers (e.g., an input layer, an output layer, and optionally one or more hidden layers between the input and output layers), with each layer containing one or more of the neurons.
  • neural networks have fully-connected feedforward topologies (e.g., each neuron in a given layer receives input from every neuron in the preceding layer and sends output to every neuron in the succeeding layer).
  • the networks need not be fully-connected (e.g., convolutional neural networks), and other topologies are possible (e.g., short-cut topologies, direct/indirect recurrent topologies, lateral recurrent topologies, and so on).
  • a neuron receives a vector input (e.g., the vector of scalar activation values of all neurons in the preceding layer); applies a propagation function (e.g., weighted sum) to the vector input to yield a scalar net input; optionally adds a bias value to the scalar net input; computes a scalar activation value by applying a nonlinear activation function (e.g., sigmoid function, softmax function, hyperbolic tangent, and so on) to the scalar net input; and finally outputs its own scalar activation value to the neurons in the succeeding layer.
  • This mathematical transformation between two connected layers can be represented in matrix notation as a^L = f(W^L a^(L-1) + b^L), where a^L represents the vector of activation values for all neurons in layer L, a^(L-1) represents the same for all neurons in layer L-1, b^L represents the scalar bias values of the neurons in layer L, W^L represents the weight matrix containing the scalar weight values for all connections to the neurons in layer L, and f represents the nonlinear activation function.
  • the weights in W^L and the biases in b^L are what enable neural networks to recognize patterns.
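  • As a purely illustrative sketch (not part of the patent disclosure), a single fully-connected layer forward pass of the form a^L = f(W^L a^(L-1) + b^L) can be written in Python/NumPy as follows, with hypothetical shapes and a tanh activation:

```python
import numpy as np

def dense_layer_forward(a_prev, W, b, activation=np.tanh):
    """One fully-connected layer: a^L = f(W^L a^(L-1) + b^L)."""
    z = W @ a_prev + b       # propagation function (weighted sum) plus bias
    return activation(z)     # nonlinear activation function f

# Hypothetical example: 3 neurons in layer L-1 feeding 2 neurons in layer L
a_prev = np.array([0.5, -0.2, 0.8])       # activations of layer L-1
W = np.random.randn(2, 3) * 0.1           # weight matrix W^L
b = np.zeros(2)                           # bias vector b^L
print(dense_layer_forward(a_prev, W, b))  # activations of layer L (2 values)
```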
  • the weights and biases can be initialized randomly and then optimized (e.g., through cost function minimization via backpropagation, stochastic gradient descent, and so on). Once trained, the network’s optimized weights and biases allow it to consistently identify particular patterns in inputted data sets, which patterns it learned from the training data.
  • a fully -trained neural network can achieve impressive pattern recognition capabilities, and thus can be effectively applied in many fields (e.g., character recognition, audio recognition, computer vision, facial recognition, voice recognition, cancer cell detection, EEG analysis, ECG analysis, X-ray evaluation, MRI evaluation, CAT scan evaluation, ultrasound analysis, and so on).
  • Neural network compression is conventionally performed via knowledge distillation (e.g., training a small network to mimic a large, fully-trained network), channel pruning (e.g., zeroing irrelevant/redundant connection weights and keeping only the weights that contribute to the network’s output, and/or removing neurons/layers altogether), quantization (e.g., rounding, truncating, or reducing the number of bits representing weights in the network), and so on.
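  • As a purely illustrative sketch (not taken from the patent), magnitude-based weight pruning and uniform weight quantization of the kind mentioned above could be implemented in Python/NumPy as follows, with hypothetical threshold and bit-width values:

```python
import numpy as np

def prune_weights(W, threshold=0.05):
    """Zero weights whose magnitude falls below a (hypothetical) threshold."""
    return np.where(np.abs(W) < threshold, 0.0, W)

def quantize_weights(W, n_bits=8):
    """Uniformly quantize weights to 2**n_bits levels over their observed range."""
    w_min, w_max = float(W.min()), float(W.max())
    levels = 2 ** n_bits - 1
    scale = (w_max - w_min) / levels or 1.0   # guard against a constant matrix
    return np.round((W - w_min) / scale) * scale + w_min

W = np.random.randn(4, 4) * 0.1
print(prune_weights(W))          # sparser weight matrix
print(quantize_weights(W, 4))    # weights snapped to 16 levels
```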
  • conventional automated compression systems rely on model-free reinforcement learning (e.g., N2N learning, AMC engine compression, and so on), which achieves sufficiently accurate results at the expense of requiring a very large number of training trials (e.g., millions in some cases).
  • any automated compression systems that instead rely only on model-based reinforcement learning (which are not conceded to exist), while faster, would be particularly sensitive to model bias, and would thus be only as accurate as the environmental models they use.
  • the subject claimed innovation bridges the gap between these two automated methods/systems of neural network compression, thus achieving the superior accuracy of model-free reinforcement learning compression with the shorter convergence times of model-based reinforcement learning compression.
  • an artificial neural network compression system can comprise a processor that can execute computer-executable instructions stored on a computer-readable memory.
  • the system can include a reinforcement learning (“RL”) agent component that can determine, via a compression policy (e.g., a probabilistic mapping of states to compression actions), which compression actions to perform.
  • the system can include a model-free component that can, in some embodiments, comprise a first state component.
  • the first state component can receive electronic data indicating a state (e.g., number of layers, number of neurons, number/values of parameters, specific characteristics about a particular layer, and so on) of a neural network to be compressed.
  • the model-free component can have a first action component that can perform one or more compression actions determined by the RL agent component (e.g., layer removal, neuron removal, parameter/weight removal, parameter/weight adjustment, and so on) on the neural network to compress the neural network into a compressed neural network.
  • the system can also include a model-based component that can comprise, in various embodiments, a second state component that can receive electronic data indicating a state (e.g., number of layers, number of neurons, number/values of parameters, specific characteristics about a particular layer, and so on) of the neural network to be compressed.
  • the model-based component can also include a second action component that can perform one or more compression actions determined by the RL agent component (e.g., layer removal, neuron removal, parameter/weight removal, parameter/weight adjustment, and so on) on the neural network to compress the neural network into a compressed neural network.
  • the model-free component can compute, in some proportion of compression iterations (e.g., an α-proportion of the time that compression actions are performed, where α ∈ [0,1]), a first reward signal that can quantify how well the neural network was compressed.
  • the first reward signal can be based on a compression ratio and a model performance metric (e.g., an accuracy ratio) of the compressed neural network for the first state component and the first action component.
  • the model-based component can predict, in some remaining proportion of compression iterations (e.g., a (1 − α)-proportion of the time that compression actions are performed), a second reward signal that can quantify how well the neural network was compressed.
  • the second reward signal can be based on a compression model learned from the first state component and the first action component (e.g., a compression model trained on the model-free output).
  • the RL agent component can iteratively update the compression policy based on one or more first reward signals computed by the model-free component and/or one or more second reward signals predicted by the model-based component (e.g., update the policy using the model-free reward signal in an α-proportion of compression iterations/episodes, and update the policy using the model-based reward signal in a (1 − α)-proportion of compression iterations/episodes).
  • the RL agent component can, in some cases, update (e.g., via policy gradient methods) the compression policy until an optimal compression policy is substantially approximated (e.g., convergence).
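  • A minimal illustrative sketch of this hybrid reward generation (the helper names are hypothetical and not the patent's actual implementation) follows in Python:

```python
import random

def generate_reward(state, alpha, model_free_reward, reward_model, samples):
    """With probability alpha compute the reward model-free; otherwise predict it model-based.

    model_free_reward(state) -> float : e.g., compression ratio x accuracy ratio (assumed form)
    reward_model(state)      -> float : reward predicted by the learned compression model
    samples                  : list collecting (state, reward) tuples used to train reward_model
    """
    if random.random() < alpha:
        reward = model_free_reward(state)   # expensive but unbiased (requires evaluating the student)
        samples.append((state, reward))     # real experience for supervised training of the model
    else:
        reward = reward_model(state)        # cheap prediction from the learned compression model
    return reward
```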
  • a computer-implemented method for compressing artificial neural networks can comprise a series of acts.
  • the computer-implemented method can include receiving as input an original neural network to be compressed.
  • the computer-implemented method can also include performing one or more compression actions (e.g., layer removal, neuron removal, parameter/weight removal, parameter/weight adjustment, and so on) according to a reinforcement learning (RL) agent (e.g., a probabilistic mapping of states to compression actions) to compress the original neural network into a compressed neural network.
  • the computer-implemented method can further include generating a reward signal that quantifies how well the original neural network was compressed.
  • the generating the reward signal can be performed by computing, in some proportion of iterations (e.g., an α-proportion of the time that compression actions are performed, where α ∈ [0,1]), the reward signal in model-free fashion based on a compression ratio and an accuracy ratio of the compressed neural network.
  • the generating the reward signal can be performed by predicting, in some remaining proportion of compression iterations (e.g., a (1 − α)-proportion of the time that compression actions are performed), the reward signal in model-based fashion based on a compression model.
  • the compression model can be learned from one or more of the reward signals computed in model-free fashion (e.g., a compression model trained on the model-free output).
  • the computer-implemented method can, in some cases, include updating (e.g., via policy gradient methods) the RL agent based on the generated reward signal (e.g., updating the policy using the reward signal computed in model-free fashion in an α-proportion of compression iterations/episodes, and updating the policy using the reward signal predicted in model-based fashion in a (1 − α)-proportion of compression iterations/episodes).
  • the computer-implemented method can include iterating respective prior steps (e.g., performing compression actions, generating reward signals, and updating the compression policy) until an optimal compression policy is substantially approximated (e.g., convergence).
  • a computer program product that can compress artificial neural networks can comprise a non-transitory computer-readable storage medium having program instructions embodied therewith.
  • the program instructions can be executable by a processing component to cause the processing component to perform one or more acts.
  • the steps can include having the processing component receive as input an original neural network to be compressed.
  • the steps can also include having the processing component perform one or more compression actions (e.g., layer removal, neuron removal, parameter/weight removal, parameter/weight adjustment, and so on) according to a reinforcement learning (RL) agent (e.g., a probabilistic mapping of states to compression actions) to compress the original neural network into a compressed neural network.
  • the steps can include having the processing component generate a reward signal that quantifies how well the original neural network was compressed.
  • the generating the reward signal can be performed by computing, in some proportion of compression iterations (e.g., an α-proportion of the time that compression actions are performed, where α ∈ [0,1]), the reward signal in model-free fashion based on a compression ratio and an accuracy ratio of the compressed neural network.
  • the generating the reward signal can be performed by predicting, in some remaining proportion of compression iterations (e.g., a (1 − α)-proportion of the time that compression actions are performed), the reward signal in model-based fashion based on a compression model.
  • the compression model can be learned from one or more of the reward signals computed in model-free fashion (e.g., a compression model trained on the model-free output).
  • the acts can also include having the processing component update (e.g., via policy gradient methods) the RL agent based on the reward signal (e.g., updating the policy using the reward signal computed in model-free fashion in an α-proportion of compression iterations/episodes, and updating the policy using the reward signal predicted in model-based fashion in a (1 − α)-proportion of compression iterations/episodes).
  • the acts can also include having the processing component iterate respective prior steps (e.g., performing compression actions, generating a reward signal, updating the compression policy) until an optimal compression policy is substantially approximated (e.g., convergence).
  • FIG. 1 illustrates a schematic block diagram of a conventional automated network compression system using model-free reinforcement learning.
  • FIG. 2 illustrates a flow diagram of a conventional automated network compression method using model-free reinforcement learning.
  • FIG. 3 illustrates a high-level schematic block diagram of an example, non-limiting system that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach in accordance with one or more embodiments described herein.
  • FIG. 4 illustrates a flow diagram of an example, non-limiting computer-implemented method that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach in accordance with one or more embodiments described herein.
  • FIG. 5 illustrates a flow diagram of an example, non-limiting computer-implemented method that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach including α-decay in accordance with one or more embodiments described herein.
  • FIGs. 6A and 6B illustrate schematic block diagrams of example, non-limiting systems that facilitate automated neural network compression via an iterative hybrid reinforcement learning approach in accordance with one or more embodiments described herein.
  • FIG. 7 illustrates a schematic block diagram of an example, non-limiting system that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach including an update component in accordance with one or more embodiments described herein.
  • FIG. 8 illustrates a schematic block diagram of an example, non-limiting system that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach including a reward component in accordance with one or more embodiments described herein.
  • FIG. 9 illustrates a schematic block diagram of an example, non-limiting system that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach including a deep neural network in accordance with one or more embodiments described herein.
  • FIG. 10 illustrates a schematic block diagram of an example, non-limiting system that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach including a machine learning component in accordance with one or more embodiments described herein.
  • FIG. 11 illustrates a schematic block diagram of an example, non-limiting system that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach including a value component in accordance with one or more embodiments described herein.
  • FIG. 12 illustrates pseudocode of an example, non-limiting computer-implemented algorithm that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach in accordance with one or more embodiments described herein.
  • FIG. 13 illustrates a block diagram of an example, non-limiting operating environment in which one or more embodiments described herein can be facilitated.
  • smart medical devices that could benefit from neural network implementation include smart diagnostic/monitoring devices (e.g., smart sensors that can monitor patient heartrate, blood pressure, breathing, temperature, insulin level, and the like to detect maladies; smart image-analyzers that can evaluate X-rays, MRI scans, CAT scans, ultrasound images, and so on to identify infirmities; smart toilets that can analyze a patient’s biological waste for signs of disease; smart beds that can detect occupancy and attempts of occupants to rise; smart surveillance cameras that can determine when an unaccompanied patient has fallen or is struggling; and the like), smart rehabilitation devices (e.g., smart braces, exoskeletons, and/or prostheses that can monitor and/or react to patient motion and forces, and the like), smart therapeutic devices, and so on.
  • One or more embodiments described herein include systems, computer-implemented methods, apparatus, and/or computer program products that facilitate automated neural network compression. More specifically, one or more embodiments pertaining to automated neural network compression via an iterative hybrid reinforcement learning approach (also called “data-driven dyna model compression” or “D3MC”) are described. For example, in one or more embodiments, a compression architecture, which can be modeled as a Markov Decision Process, can receive an original neural network (also called the “teacher network”) to be compressed.
  • the teacher network can be any type of fully- and/or partially-trained neural network with any type of topology (e.g., feedforward network, radial basis network, deep feedforward network, recurrent network, long/short term memory network, gated recurrent unit network, auto encoder network, variational auto encoder network, denoising auto encoder network, sparse auto encoder network, Markov chain network, Hopfield network, Boltzmann machine network, restricted Boltzmann machine network, deep belief network, deep convolutional network, deconvolutional network, deep convolutional inverse graphics network, generative adversarial network, liquid state machine network, extreme learning machine network, echo state network, deep residual network, Kohonen network, support vector machine network, neural Turing machine network, and so on).
  • the compression architecture can, in one or more embodiments, compress the teacher network by iteratively performing one or more designated actions (e.g., layer removal, layer shrinkage, parameter adjustment, and so on), with each action deterministically changing the state (e.g., number of layers, number of neurons, number/values of weights/biases, and so on) of the network being compressed (also called the “student network”).
  • the compression architecture can choose from among the designated actions by following a policy (e.g., a probabilistic mapping of states to actions) implemented by an RL agent.
  • the policy can be parameterized, non-parameterized/tabular, stochastic, deterministic, and so on.
  • the policy in various embodiments, can be initialized in any way and iteratively optimized (e.g., via policy gradient methods, and so on), resulting in a policy that generally chooses the best (e.g., state-value maximizing and/or action-value maximizing) action, given the current state of the student network, thereby compressing the student network while maintaining comparable accuracy to the teacher network.
  • the compression architecture can exhibit a dyna structure; that is, the policy can receive feedback from both a model-free reinforcement learning component (e.g., computes reward based on compression ratio and accuracy ratio of a fully-compressed student network) and a model-based reinforcement learning component (e.g., predicts reward of potential actions based on a model of the environment).
  • the model-based component can learn and improve the environmental model by receiving tuples (e.g., final state of compressed student network and associated reward) from the model-free component, thereby eliminating the need for bias-inducing assumptions about the model.
  • the subject claimed innovation can avoid searching redundant state-action space, and thus can achieve the accuracy (e.g., optimally compressed student networks) of model-free-only compression systems/methods with the quicker speeds/run times of model-based-only compression systems/methods (which are not conceded to exist), thereby addressing the shortcomings of prior art compression automation.
  • the embodiments described herein relate to systems, computer-implemented methods, apparatus, and/or computer program products that employ highly technical hardware and/or software to provide concrete technological solutions to concrete technological problems in the field of automated neural network compression.
  • conventional systems/methods for automated compression of neural networks primarily use model-free-only reinforcement learning, meaning that they achieve sufficiently accurate results at the expense of requiring a significantly large number of training trials.
  • automated network compression that utilizes model-based-only reinforcement learning would compress networks more quickly and with fewer training trials, but at the expense of decreased accuracy and/or increased bias inherent in the environmental model used.
  • the present innovation provides a neural network compression architecture/pipeline that is structurally different from conventional automated compression pipelines and that reduces compression training-time without significant loss in accuracy.
  • FIG. 1 illustrates a schematic block diagram of a conventional automated network compression system 100 using model-free reinforcement learning.
  • the compression system 100 includes conventional automated compression architecture 102 that receives an original neural network (called the “teacher network”) 110 and outputs a compressed neural network (called the “student network”) 114.
  • the compression architecture 102 compresses the teacher network 110 into the student network 114 by iteratively applying one or more compression actions (e.g., layer removal, parameter removal, weight adjustment, and so on) to the environment 106 (e.g., the network being compressed).
  • the compression architecture 102 selects compression actions to perform according to an RL agent 104 (e.g., which can use a policy, a stochastic mapping from states to actions).
  • the compression architecture 102 utilizes a model-free reinforcement learning approach to compute a reward that characterizes how well or poorly the student network 114 has been compressed.
  • the reward is usually a function of the compression ratio, comparing the size of the student network 114 to that of the teacher network 110, and the accuracy ratio, comparing the accuracy of the student network 114 to that of the teacher network 110.
  • the compression ratio is simply a function of the number of parameters/layers in the compressed student network 114 and the number of parameters/layers in the teacher network 110.
  • the accuracy ratio is obtained by comparing the results of the teacher network 110 in response to given training data 108 to the results of the student network 114 in response to the same training data 108.
  • the compressed student network 114 can also be fed test data 112 to determine its level of accuracy.
  • the RL agent 104 can update (e.g., improve its policy via policy gradient methods) based on the reward. Such a method of updating a policy based on received rewards is called direct RL training.
  • This overall process of performing a sequence of compression actions, computing a reward based on the characteristics of the compressed network, and updating the policy of the RL agent 104 based on the reward is iterated until the policy converges (e.g., is optimized or approximately optimized); that is, until a cumulative reward function is maximized.
  • the RL agent 104 can choose the best compression action for any given state, and so the compression architecture 102 outputs the optimally-compressed student network 114.
  • FIG. 2 illustrates a flow diagram of a conventional automated network compression method 200 using model-free reinforcement learning.
  • a network compression architecture receives as input an original neural network (“teacher network”) to be compressed.
  • the compression architecture performs one or more compression actions, such as layer removal, parameter removal, weight adjustment, and so on, according to a compression policy in order to compress the teacher network into a compressed neural network (“student network”).
  • the compression architecture computes a reward based on the compression ratio and the accuracy ratio of the compressed student network.
  • the compression architecture updates the compression policy based on the computed reward.
  • the compression architecture repeatedly iterates 204 to 208 until an optimal compression policy, and thus an optimally compressed student network, is achieved or approximated (e.g., convergence).
  • FIG. 3 illustrates a high-level schematic block diagram of an example, non-limiting system 300 that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach in accordance with one or more embodiments described herein.
  • the system 300 can comprise a data-driven dyna model compression architecture (called the “D3MC architecture”) 302 that can receive an original neural network (“teacher network”) 110 (or, in some embodiments, a copy of a teacher network 110) and output an optimally-compressed neural network (“student network”) 114.
  • the D3MC architecture 302 can be modeled as a finite Markov decision process characterized by a state-space S, an action-space A, a transition function T, one or more reward functions, and a discount factor γ, as described below.
  • S can represent the state-space, which can include all possible reduced architectures - that is, all possible compressed student networks 114 - that can be derived from the teacher network 110.
  • Any student network 114 can be described by its state s ∈ S, which can include its number of layers, the number of neurons in each layer, the number of weights/parameters in the network, the values of those weights/parameters, the accuracy of the network, and so on.
  • the state s ∈ S can instead represent the state of a particular layer in the student network 114, such as the layer type, the number of kernels, the kernel size, the stride, the padding, the trainable parameters in the layer, and so on.
  • the state can represent any combination of the aforementioned, and so on.
  • A can represent the action-space, which can include all possible actions that can transform one network architecture into another, such as layer removal, neuron removal, parameter/weight removal, parameter/weight adjustment, and so on.
  • T: S × A → S can represent a transition function that describes how the state of the student network 114 changes based on a previous state and an action taken in that previous state.
  • T can be deterministic since a given compression action a ∈ A can take a student network 114 from one state s ∈ S to another state s' ∈ S without uncertainty.
  • the actions a ∈ A can be selected by an RL agent according to a compression policy π_θ: S → A, which is a probabilistic mapping of states to actions with a parameterization of θ (e.g., a vector of parameter values that influence the policy output).
  • the policy π can instead be tabular, non-parameterized, and so on. In some cases, the policy can be deterministic.
  • r_MF: S → R can represent a model-free reward function that computes a reward based on the state of the student network 114.
  • r_MB: S → R can represent a model-based reward function that predicts a reward based on the state of the student network 114 and a model of the learning environment.
  • a reward can be computed after each action a ∈ A.
  • a reward can be computed after a final compressed state s_n ∈ S is achieved via a sequence of actions a_0, a_1, …, a_n ∈ A.
  • γ ∈ [0,1] can represent a discount factor that determines how heavily future rewards are weighted compared to present rewards, which can influence the policy update process.
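  • To make the formulation concrete, the MDP ingredients above could be collected in a small Python container such as the following illustrative sketch (the names and types are assumptions for illustration, not taken from the patent):

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

State = Sequence[float]   # e.g., per-layer features: type, kernels, kernel size, stride, padding, parameters
Action = int              # e.g., index of a layer-removal / neuron-removal / weight-adjustment action

@dataclass
class CompressionMDP:
    actions: List[Action]                          # action-space A
    transition: Callable[[State, Action], State]   # deterministic T: S x A -> S
    r_mf: Callable[[State], float]                 # model-free reward function r_MF: S -> R
    r_mb: Callable[[State], float]                 # model-based reward function r_MB: S -> R
    gamma: float = 0.99                            # discount factor in [0, 1]
```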
  • the D3MC architecture 302 can include an RL agent 304 that can use a policy (e.g., π_θ) to probabilistically select one or more actions from the action-space to compress the teacher network 110 into the student network 114.
  • the actions can be performed by the RL agent 304 on the environment 310 (e.g., the network currently being compressed).
  • the policy can be initialized in any way (e.g., random initialization of parameters in θ) and can subsequently be iteratively updated/optimized (e.g., via policy gradient methods, REINFORCE policy gradient optimization, dynamic programming, Monte Carlo methods, temporal difference methods, n-step bootstrapping methods, any variations of the aforementioned, and so on).
  • a reward can be computed and/or predicted to characterize/quantify how well or how poorly the student network 114 was compressed.
  • the RL agent 304 can then iteratively optimize the policy based on the reward (and/or based on a sum of discounted and/or non-discounted future rewards) as mentioned above.
  • the D3MC architecture 302 can include a model-free reinforcement learning component 306 that can compute a reward based on the compressed state s_n ∈ S (e.g., via reward function r_MF: S → R).
  • the model-free reinforcement learning component 306 can compute the reward as a function of the compression ratio, comparing the size of the compressed student network 114 to the size of the original teacher network 110, and of the accuracy ratio, comparing the accuracy of the outputs of the compressed student network 114 to that of the original teacher network 110, or some other model performance metric.
  • the compression ratio can be computed by comparing the number of parameters, layers, and/or neurons in the compressed student network 114 to the number of parameters, layers, and/or neurons in the original teacher network 110.
  • the accuracy ratio can be obtained by comparing the outputs of the original teacher network 110 in response to given training data 108 to the outputs of the compressed student network 114 in response to the same training data 108.
  • test data 112 can be used to determine the accuracy of the compressed student network 114.
  • the D3MC architecture 302 can train the compressed student network 114 via cross-entropy loss and/or distillation loss from the teacher network 110 and based on the training data 108 and/or the test data 112, thereby yielding the accuracy of the compressed student network 114.
  • this process of performing one or more compression actions, computing a reward based on the compressed state of the student network, and updating the policy based on the reward is called direct RL learning/training.
  • the D3MC architecture 302 can also comprise a model-based reinforcement learning component 308 that can predict a reward based on a compressed state of the student network 114 and/or based on contemplated compression actions (e.g., predicting the reward that would occur if the contemplated compression actions were performed).
  • the model-based reinforcement learning component 308 can, in various embodiments, have a model (e.g., distribution and/or sample model) of the environment 310.
  • the model (e.g., the function r_MB: S → R) can be learned from the real experience generated by the model-free reinforcement learning component 306, as described below.
  • when the D3MC architecture 302 computes a reward via the model-free reinforcement learning component 306, that reward and its associated compressed state s_n ∈ S can be sent to the model-based reinforcement learning component 308.
  • the model-based reinforcement learning component 308 can, after receiving one or more of these samples (e.g., reward-and-final-state pairs), perform supervised training on its machine learning component (e.g., training the machine learning component to output the given rewards when the given compressed states, and/or similar compressed states, are encountered and/or contemplated).
  • the RL agent 304 can iteratively update/optimize the policy, as described above, based on the predicted reward.
  • the model-based reinforcement learning component 308 can perform background planning (e.g., using simulated experience to improve value functions and/or policy) and/or decision-time planning (e.g., using simulated experience to select an action in the current state).
  • the D3MC architecture 302 can select an α-proportion of its actions in a given compression episode, where α ∈ [0,1], to be rewarded via the model-free reinforcement learning component 306.
  • a (1 − α)-proportion of its actions in the given compression episode can be rewarded via the model-based reinforcement learning component 308.
  • rewards can be computed via the model-free reinforcement learning component 306 about 60% of the time, while rewards can be predicted via the model- based reinforcement learning component 308 about 40% of the time.
  • the value of α can be decayed over time.
  • the model-free reinforcement learning component 306 can be used more often during the early compression trials/episodes of the D3MC architecture 302, thereby allowing a robust and unbiased model of the environment to be generated by the model-based reinforcement learning component 308. Consequently, the model-based reinforcement learning component 308 can then be used more often in the later compression trials/episodes, thereby significantly cutting down on convergence time without sacrificing substantial accuracy.
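  • The excerpt does not fix a particular decay schedule for α; one plausible, purely illustrative choice is an exponential decay with a floor, sketched below in Python:

```python
def decayed_alpha(episode, alpha_start=1.0, decay_rate=0.99, alpha_min=0.1):
    """Decay the model-free proportion alpha over compression episodes (assumed schedule)."""
    return max(alpha_min, alpha_start * decay_rate ** episode)

print(decayed_alpha(0))    # 1.0 -> early episodes rely on the model-free component
print(decayed_alpha(300))  # 0.1 -> later episodes rely mostly on the model-based component
```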
  • the rewards predicted by the model-based reinforcement learning component 308 can be used by the RL agent 304 to update/optimize the policy. In various other embodiments, the rewards predicted by the model-based reinforcement learning component 308 can be used to select a compression action at decision-time without updating the policy.
  • a combination of the aforementioned is possible.
  • a significant training speed-up can be achieved by combining the model-based reinforcement learning component 308 with the model-free reinforcement learning component 306.
  • the environment 310 can exhibit the following behavior.
  • the environment can accept a list of layers with binary action (e.g., 0 to keep, 1 to remove) per layer from the teacher network 110.
  • the D3MC architecture 302 can receive this list and create a network with the removed layers.
  • the D3MC architecture 302 can then use the original weights/parameters of the teacher network 110 to initialize the student network 114.
  • the D3MC architecture 302 can train the student network 114 with a cross-entropy loss and/or a distillation loss from the teacher network 110.
  • the associated reward can then be computed and/or predicted, as described above.
  • the retrain time can be cut down significantly by predicting the reward signal, as in the illustrative sketch below.
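  • A hedged, illustrative PyTorch-style sketch of this environment behavior follows, assuming the teacher is a simple sequential stack of mutually compatible layers and using a standard distillation loss; the patent's actual layer bookkeeping and loss weighting may differ:

```python
import copy
import torch.nn as nn
import torch.nn.functional as F

def build_student(teacher: nn.Sequential, remove_mask):
    """remove_mask[i] == 1 removes layer i; kept layers reuse the teacher's weights."""
    kept = [copy.deepcopy(layer) for layer, rm in zip(teacher, remove_mask) if rm == 0]
    return nn.Sequential(*kept)

def student_loss(student_logits, teacher_logits, labels, temperature=2.0, weight=0.5):
    """Cross-entropy on the labels plus a distillation (KL) term toward the teacher's soft outputs."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                  F.softmax(teacher_logits / temperature, dim=1),
                  reduction="batchmean") * temperature ** 2
    return weight * ce + (1.0 - weight) * kd
```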
  • FIG. 4 illustrates a flow diagram of an example, non-limiting computer-implemented method 400 that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach in accordance with one or more embodiments described herein.
  • a D3MC architecture can receive as input an original neural network (“teacher network”) to be compressed.
  • the D3MC architecture can receive a copy/duplicate of the original teacher network, such that the original teacher network remains unaltered while the duplicate teacher network is iteratively compressed and becomes the resultant student network.
  • the D3MC architecture can perform one or more compression actions (e.g., layer removal, neuron removal, parameter/weight removal, parameter/weight adjustment, and so on) according to a compression policy to compress the teacher network into a compressed student network.
  • the D3MC architecture can compute a reward, via a model-free component, based on the compression ratio and the accuracy ratio of the compressed student network.
  • this reward computation can, in some embodiments, be performed after a sequence of compression actions are taken (e.g., after reaching a compressed state s_n ∈ S). In other embodiments, a reward can be computed after each compression action.
  • the compressed student network can be trained using cross-entropy loss and/or distillation loss on the teacher network in order to determine the compressed student network’s accuracy.
  • the D3MC architecture can use the computed reward and the final state of the compressed student network to facilitate supervised training of a model-based component in the D3MC architecture.
  • the D3MC architecture can predict a reward, via a model-based component, using a model trained on one or more prior final-state-and-reward tuples generated by the model-free component. As mentioned above, in various embodiments, this reward prediction can be computed after a sequence of compression actions and/or after each compression action.
  • FIG. 5 illustrates a flow diagram of an example, non-limiting computer-implemented method 500 that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach including α-decay in accordance with one or more embodiments described herein. As shown, the method 500 can, in various embodiments, have the same operations 402 to 412 as shown in FIG. 4.
  • the D3MC architecture can, in one or more embodiments, incrementally decay α to shift the bulk of reward generation from the model-free component to the model-based component over time and/or from compression episode to compression episode.
  • the architecture can iterate 404 to 412 and 502 until an optimal compression policy, and thus an optimally compressed student network, is achieved and/or approximated.
  • during the early compression episodes/trials (e.g., sequences of compression actions), the model-free component can help to generate a robust environmental model in the model-based component via the supervised training of 408.
  • once a sufficiently robust model has been trained/learned, α can be decayed, which can cause the later compression episodes/trials to rely more heavily on the model-based component.
  • This hybrid structure/pipeline reaps the advantages of both model-free and model-based learning; it enables the D3MC architecture to achieve the compression accuracy of the model-free approaches, without requiring their inordinately long run times.
  • FIGs. 6A and 6B illustrate schematic block diagrams of example, non-limiting systems 600 that facilitate automated neural network compression via an iterative hybrid reinforcement learning approach in accordance with one or more embodiments described herein.
  • the system 600 can include the data-driven dyna model compression (“D3MC”) architecture 302.
  • the D3MC architecture 302 can comprise a processor 602 and a computer-readable memory 604.
  • the computer-readable memory 604 can store computer-executable instructions that can be executed by the processor 602. These instructions and their execution can, in some embodiments, facilitate the functionality of the other components of the D3MC architecture 302 described herein.
  • the D3MC architecture 302 can also include a state component 606 that can receive electronic data signifying the state information of a student network to be compressed.
  • the state component 606 can receive data indicating the number of layers in the student network, the number of neurons in the student network, the number of parameters/weights in the student network, the values of those parameters/weights, specific characteristics about particular layers, and so on.
  • the initial state received by the state component 606 can, in some embodiments, be a state of an original teacher network (e.g., the student network’s architecture before any compression has been performed is identical to that of the teacher network, and the structures of any individual layers in the student network before any compression has been performed are identical to those in the teacher network).
  • the state component 606 can electronically receive/read the state information of the student network after each compression action and/or after each compression episode/trial (e.g., a sequence of compression actions). By reading the state information collected by the state component 606, the D3MC architecture 302 can select compression actions to perform on the student network based on the received state information.
  • the D3MC architecture 302 can comprise an action component 608 that can perform one or more of a set of designated compression actions on the student network.
  • the set of designated compression actions can include layer removal, neuron removal, parameter/weight removal, parameter/weight adjustment, and so on. That is, in one or more embodiments, the action component 608 can remove one or more layers from the student network, can remove one or more neurons from the student network, can remove/zero one or more parameters/weights in the student network, can otherwise adjust the values of one or more parameters/weights in the student network, and so on.
  • each action performed by the action component 608 can deterministically transform the architecture of the student network from one state s ∈ S to another s' ∈ S.
  • the D3MC architecture 302 can also comprise an agent component 614. The agent component 614 can use a compression policy (e.g., π_θ), which can probabilistically map states of the student network to compression actions.
  • the agent component 614 can determine which compression action and/or range of potential compression actions to take when the student network is in a particular state. For example, the agent component 614 can, in some cases, determine that a current state of the student network calls for removing a certain layer in the student network rather than merely removing one or more neurons in the layer or merely adjusting/removing the weights in the layer, and/or vice versa. The agent component 614 can make this determination since the policy assigns a higher probability to the compression action and/or actions that it favors the most.
  • the compression policy of the agent component 614 can be parameterized (e.g., π_θ), non-parameterized, tabular, stochastic, deterministic, and so on.
  • the compression policy π can be a probabilistic function of one or more parameters (e.g., parameters listed in vector θ) and can be optimized (e.g., via policy gradient methods) without consulting a state-value function and/or action-value function, although such a value function can still be incorporated (e.g., actor-critic approaches).
  • a parameterized policy can be a variation of the softmax function, for instance π_θ(a | s) = exp(h(s, a, θ)) / Σ_b exp(h(s, b, θ)), where h(s, a, θ) denotes a numerical preference for taking compression action a in state s.
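  • An illustrative NumPy sketch of such a softmax policy over action preferences is given below; the linear preference function h(s, a, θ) = θ_a · s is an assumption for illustration, not the patent's parameterization:

```python
import numpy as np

def softmax_policy(theta, state_features):
    """pi_theta(a|s) = exp(h(s,a,theta)) / sum_b exp(h(s,b,theta)), with linear preferences h."""
    prefs = theta @ state_features          # h(s, a, theta) = theta[a] . s for each action a
    prefs -= prefs.max()                    # subtract the max for numerical stability
    probs = np.exp(prefs)
    return probs / probs.sum()

theta = np.zeros((4, 3))                    # 4 compression actions, 3 state features (hypothetical)
state = np.array([0.2, 0.5, 0.3])
action = np.random.choice(4, p=softmax_policy(theta, state))
print(action)
```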
  • the D3MC architecture 302 can further comprise a model-free component 610 and a model-based component 612.
  • the compression policy of the agent component 614 can be updated/optimized in order to ensure that appropriate compression actions are being performed by the action component 608.
  • the model-free component 610 and the model-based component 612 can help to facilitate this optimization by computing (e.g., model-free) and/or predicting (e.g., model-based) a reward that characterizes and/or quantifies how well or how poorly the student network was compressed.
  • such rewards can, in some embodiments, be generated after a sequence of compression actions has fully compressed a student network (e.g., after each compression episode/trial). In other embodiments, such rewards can be generated after each compression action, and so on.
  • the preceding discussion of model-free and model-based reinforcement learning is applicable to the model-free component 610 and the model-based component 612, respectively.
  • the model-free component 610 and the model-based component 612 can each comprise their own state component 606 and action component 608, as shown in FIG. 6B.
  • the preceding discussion of the state component 606 and the action component 608 can apply to the state and action components depicted in FIG. 6B.
  • the remaining disclosure discusses other embodiments in relation to the configurations contemplated in FIG. 6A. However, those of skill will understand that all of this disclosure can be applied equally well to the configurations contemplated in FIG. 6B.
  • FIG. 7 illustrates a schematic block diagram of an example, non-limiting system 700 that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach including an update component in accordance with one or more embodiments described herein.
  • the D3MC architecture 302 can, in some embodiments, comprise all the components discussed in relation to FIG. 6A, and can further include an update component 702 that can update/optimize the compression policy π of the agent component 614.
  • the mathematical methods of updating/optimizing the compression policy of the agent component 614 can depend on the type of policy used (e.g., parameterized vs. non-parameterized/tabular, and so on).
  • the compression policy used by the agent component 614 can be parameterized (e.g., π_θ).
  • a policy can be optimized/updated via policy gradient methods known in the art, such as the REINFORCE family of policy gradient optimization.
  • Such methods can update the compression policy function of the agent component 614 directly, without first calculating a state-value and/or action-value function.
  • These methods generally update the parameter vector θ between episodes/time-steps as θ_{t+1} = θ_t + η ∇J(θ_t), where η is a step size and J(θ) is a scalar performance measure of the policy.
  • the performance measure gradient can generally be resolved, after application of the policy gradient theorem, as ∇J(θ) ∝ E_π[ G_t ∇_θ ln π_θ(A_t | S_t) ], yielding the update θ_{t+1} = θ_t + η G_t ∇_θ ln π_θ(A_t | S_t), where:
  • A_t ∈ A is an action and/or a sample of an action taken at time/episode t
  • S_t ∈ S is a state and/or a sample of a state taken at time/episode t
  • G_t is the expected return (e.g., discounted sum of rewards and/or average reward expected to be received by following the policy).
  • a state-independent and/or action-independent baseline can be subtracted from G_t to reduce variance.
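  • A hedged, illustrative Python sketch of a REINFORCE-style update for a linear-softmax policy, with an optional baseline subtracted from the return, is given below (the step size, baseline, and linear parameterization are assumptions, not the patent's implementation):

```python
import numpy as np

def softmax_probs(theta, s):
    prefs = theta @ s
    prefs -= prefs.max()
    p = np.exp(prefs)
    return p / p.sum()

def reinforce_update(theta, episode, step_size=0.01, baseline=0.0, gamma=1.0):
    """theta <- theta + step_size * (G_t - baseline) * grad_theta ln pi_theta(A_t | S_t)."""
    G = 0.0
    for s, a, r in reversed(episode):        # episode: list of (state, action, reward) tuples
        G = r + gamma * G                    # discounted return G_t
        probs = softmax_probs(theta, s)
        grad_ln_pi = -np.outer(probs, s)     # -pi(b|s) * s for every action row b
        grad_ln_pi[a] += s                   # plus the indicator term for the action actually taken
        theta = theta + step_size * (G - baseline) * grad_ln_pi
    return theta
```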
  • the compression policy can be non-parameterized and/or tabular.
  • methods other than policy gradient descent/ascent can be used to optimize the policy (e.g., action-value optimization, dynamic programming, Monte Carlo methods, temporal difference methods, n-step bootstrapping methods, and so on).
  • the updates to the compression policy of the agent component 614 can depend on the expected return (e.g., G_t) of following the given policy, and the expected return can itself be a function of the real and/or simulated rewards generated in response to the compressed state(s) of the student network.
  • FIG. 8 illustrates a schematic block diagram of an example, non-limiting system 800 that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach including a reward component in accordance with one or more embodiments described herein.
  • the D3MC architecture 302 can, in various embodiments, have the same components as the system 700 in FIG. 7, and can further include a reward component 802 in the model-free component 610.
  • the reward component 802 can compute a reward (e.g., via the reward function r_MF: S → R) based on the compression ratio and the accuracy ratio of a student network after one or more compression actions have been performed.
  • the reward function can be defined as the product R = R_C · R_A.
  • R_C can refer to the compression reward (e.g., higher reward for greater compression) and R_A can refer to the accuracy reward (e.g., higher reward for greater accuracy). By multiplying these constituent reward values together, the overall reward for a given compressed student network scales with both the compression and the accuracy of the student network.
  • C can represent the compression ratio itself, which, as shown, can be a function of the number of parameters in the compressed student network (e.g., #Parameters_student) and the number of parameters in the original teacher network (e.g., #Parameters_teacher).
  • the accuracy reward R_A can simply be the ratio of the accuracy of the compressed student network (e.g., Accuracy_student) to the accuracy of the original teacher network (e.g., Accuracy_teacher).
  • the accuracy of the student and teacher networks can be determined by respectively training the student and teacher networks on training data 108 and/or test data 112 and then comparing their results to the desired/correct results (e.g., supervised training).
  • the reward component 802 can compute rewards using different parameters, variables, formulas, and so on. Regardless of the particular formula used, the reward component 802 can drive direct RL learning of the D3MC architecture 302 by providing real experience (e.g., real rewards based on final state of compressed student network).
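  • A minimal illustrative sketch of a reward of this multiplicative form is given below; the exact shaping of the compression reward R_C is not spelled out in the excerpt, so a simple parameter-reduction fraction is assumed:

```python
def compression_reward(n_params_student, n_params_teacher, acc_student, acc_teacher):
    """R = R_C * R_A: reward grows with both compression and retained accuracy."""
    r_c = 1.0 - (n_params_student / n_params_teacher)   # compression reward (assumed form)
    r_a = acc_student / acc_teacher                      # accuracy ratio
    return r_c * r_a

print(compression_reward(2_500_000, 10_000_000, 0.91, 0.94))  # ~0.73
```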
  • FIG. 9 illustrates a schematic block diagram of an example, non-limiting system 900 that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach including a deep neural network in accordance with one or more embodiments described herein.
  • the D3MC architecture 302 can, in various embodiments, have the same components as shown in FIG. 8, and can further comprise a deep neural network 902 in the model-based component 612.
  • the deep neural network 902 can learn an environmental model, which the model-based component 612 can then leverage to predict rewards of potential/contemplated compression actions and thereby minimize compression training time of the D3MC architecture 302.
  • the deep neural network 902 can receive one or more samples (e.g., final-state-and-reward tuples) from the model-free component 610 (and/or can receive the rewards from the model-free component 610 and can receive the final-state information from the state component 606, and so on).
  • the deep neural network 902 can be trained to predict the rewards that the model-free component 610 would compute for any given state information. This can, in some cases, take the form of supervised training of the deep neural network 902, in which the deep neural network 902 receives as input the final-state information and then iteratively changes its connection weights/biases (e.g., via backpropagation, stochastic gradient descent, and so on) to minimize an error function (e.g., the average squared differences between the actual output of the deep neural network 902 and the actual/correct rewards computed by the model-free component 610, and so on).
  • the deep neural network 902 can serve as the environmental model for the model-based component 612, thereby allowing the model-based component 612 to predict at decision-time the reward (e.g., by learning the function r_MB: S → R) that would likely occur if a particular compression action and/or sequence of compression actions were taken.
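  • A hedged, illustrative PyTorch sketch of such supervised training of a reward model on (final-state, reward) samples collected from the model-free component is given below; the network size, optimizer, and epoch count are assumptions:

```python
import torch
import torch.nn as nn

def train_reward_model(samples, n_features, epochs=50, lr=1e-3):
    """samples: list of (state_vector, reward) tuples produced by the model-free component."""
    model = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                          nn.Linear(64, 64), nn.ReLU(),
                          nn.Linear(64, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                       # squared error against the model-free rewards
    states = torch.tensor([list(s) for s, _ in samples], dtype=torch.float32)
    rewards = torch.tensor([[r] for _, r in samples], dtype=torch.float32)
    for _ in range(epochs):                      # simple full-batch supervised training loop
        optimizer.zero_grad()
        loss = loss_fn(model(states), rewards)
        loss.backward()
        optimizer.step()
    return model                                 # approximates r_MB: compressed state -> predicted reward
```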
  • Training the D3MC architecture 302 in this way can help to reduce the overall convergence time of the D3MC architecture, meaning that it can converge on an optimal neural network compression policy more quickly than a compression architecture using model-free-only approaches could.
  • because the model-based component 612 can include the deep neural network 902, which can learn the reward model (e.g., the function r_MB : S → ℝ) by being directly trained on the real experience outputted from the model-free component 610, the D3MC architecture 302 can avoid suffering a significant loss in compression accuracy.
  • the subject claimed innovation can provide, in a sense, the best of both worlds: sufficiently high compression accuracy without inordinately long convergence times. This constitutes a significant technological benefit in the field of automated neural network compression.
  • the deep neural network 902 can estimate the function r_MB to predict a reward for actions that put the student network into state x_t.
  • the model r_MB can be driven solely by the samples generated by the model-free component 610, which can be more representative of the heuristic data structure.
  • the deep neural network 902 can have any topology (e.g., fully connected, feedforward, recurrent, and so on) and/or any number of layers/neurons.
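As a non-limiting sketch of how the deep neural network 902 could be trained as described above, the following assumes a small fully connected regressor fit with a mean-squared-error loss on (final-state, reward) samples. The feature encoding, topology, hyperparameters, and choice of PyTorch are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Illustrative reward model: maps a fixed-length encoding of the final
    compressed-network state (e.g., layer types, kernel counts, strides,
    parameter counts) to a scalar predicted reward r_MB(s)."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state):
        return self.net(state).squeeze(-1)

def fit_reward_model(model, states, rewards, epochs=200, lr=1e-3):
    """Supervised regression on (final-state, reward) samples produced by the
    model-free component (a sketch; the optimizer and epochs are assumptions)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # average squared difference, as described above
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(states), rewards)
        loss.backward()   # backpropagation
        opt.step()        # stochastic-gradient-style weight/bias update
    return model

# Example usage with synthetic samples (placeholders for real model-free experience)
states = torch.randn(32, 12)
rewards = torch.randn(32)
model = fit_reward_model(RewardModel(state_dim=12), states, rewards)
```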
  • FIG. 10 illustrates a schematic block diagram of an example, non-limiting system 1000 that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach including a machine learning component in accordance with one or more embodiments described herein.
  • the D3MC architecture 302 can, in various embodiments, have the same components as shown in FIG. 8, and can further comprise a machine learning component 1002.
  • while FIG. 9 contemplates embodiments containing a specific artificial intelligence structure to learn the environmental model for the model-based component 612 (e.g., the deep neural network 902), FIG. 10 contemplates embodiments in which any suitable machine learning and/or artificial intelligence technique (e.g., the machine learning component 1002) can be employed to learn that environmental model.
  • the embodiments of the present innovation herein can employ artificial intelligence (AI) to facilitate automating one or more features of the present innovation.
  • the components can employ various AI-based schemes for carrying out various aspects thereof.
  • components of the present innovation can examine the entirety or a subset of the data to which they are granted access and can provide for reasoning about or determining states of the system, environment, and so on from a set of observations as captured via events and/or data. Determinations can be employed to identify a specific context or action, or can generate a probability distribution over states, for example.
  • the determinations can be probabilistic; that is, the computation of a probability distribution over states of interest based on a consideration of data and events. Determinations can also refer to techniques employed for composing higher-level events from a set of events and/or data.
  • Such determinations can result in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources.
  • Components disclosed herein can employ various classification (explicitly trained (e.g., via training data) as well as implicitly trained (e.g., via observing behavior, preferences, historical information, receiving extrinsic information, and so on)) schemes and/or systems (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines, and so on) in connection with performing automatic and/or determined action in connection with the claimed subject matter.
  • classification schemes and/or systems can be used to automatically learn and perform a number of functions, actions, and/or determinations.
  • Such classification can employ a probabilistic and/or statistical-based analysis (e.g., factoring into the analysis utilities and costs) to determine an action to be automatically performed.
  • a support vector machine (SVM) can be an example of a classifier that can be employed. The SVM operates by finding a hyper-surface in the space of possible inputs, where the hyper-surface attempts to split the triggering criteria from the non-triggering events. Intuitively, this makes the classification correct for testing data that is near, but not identical to training data.
  • directed and undirected model classification approaches include, e.g., naive Bayes, Bayesian networks, decision trees, neural networks, fuzzy logic models, and/or probabilistic classification models providing different patterns of independence, any of which can be employed. Classification as used herein also is inclusive of statistical regression that is utilized to develop models of priority.
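For illustration only, a minimal sketch of one such classifier (an SVM, as described above) is given below; the synthetic data, the scikit-learn library, and the RBF kernel are assumptions, not part of the described embodiments.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data; in practice the features would come from
# observations captured via events and/or data as described above.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf")  # learns a hyper-surface separating triggering from non-triggering inputs
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on data near, but not identical to, the training data
```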
  • FIG. 11 illustrates a schematic block diagram of an example, non-limiting system 1100 that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach including a value component in accordance with one or more embodiments described herein.
  • the D3MC architecture 302 can, in various embodiments, include the same components as shown in FIG. 9, and can further comprise a value component 1102.
  • the value component 1102 can help implement an actor-critic policy optimization approach in the D3MC architecture 302, thereby helping to even further reduce compression training time.
  • actor-critic optimization can, in some cases, be formulated as follows:
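As a sketch, one common actor-critic formulation (an illustrative assumption, not necessarily the specific formulation of FIG. 11) pairs a temporal-difference critic with a policy-gradient actor; step sizes are written β to avoid clashing with the α used elsewhere for the model-free/model-based split:

```latex
% Critic: state-value function v(s, w); Actor: compression policy pi(a | s, theta)
\begin{align*}
\delta_t &= R_{t+1} + \gamma\, v(s_{t+1}, \mathbf{w}) - v(s_t, \mathbf{w}) && \text{(TD error)}\\
\mathbf{w} &\leftarrow \mathbf{w} + \beta_{w}\, \delta_t\, \nabla_{\mathbf{w}} v(s_t, \mathbf{w}) && \text{(critic update)}\\
\boldsymbol{\theta} &\leftarrow \boldsymbol{\theta} + \beta_{\theta}\, \delta_t\, \nabla_{\boldsymbol{\theta}} \log \pi(a_t \mid s_t, \boldsymbol{\theta}) && \text{(actor update)}
\end{align*}
```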
  • the value component 1102 can learn and/or generate a state-value function v (and/or an action-value function) that can be used to update the compression policy of the agent component 614.
  • any suitable methods known in the art can be employed (e.g., semi-gradient temporal difference methods, any other temporal difference methods, eligibility traces, n-step bootstrapping, dynamic programming, Monte Carlo methods, SARSA methods, Expected SARSA methods, Q-learning methods, stochastic gradient methods, and so on).
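As one non-limiting sketch of how the value component 1102 could learn such a state-value function with a semi-gradient TD(0) method (one of the options listed above), the linear feature encoding and step size below are assumptions:

```python
import numpy as np

def td0_value_update(w, phi_s, phi_s_next, reward, gamma=0.99, step=0.01):
    """One semi-gradient TD(0) update of a linear state-value function
    v(s) = w . phi(s). phi_s and phi_s_next are feature vectors of the
    current and next compression states (illustrative encoding)."""
    td_error = reward + gamma * np.dot(w, phi_s_next) - np.dot(w, phi_s)
    return w + step * td_error * phi_s  # semi-gradient: d v(s)/d w = phi_s

# Example usage with random placeholder features
rng = np.random.default_rng(0)
w = np.zeros(8)
w = td0_value_update(w, rng.normal(size=8), rng.normal(size=8), reward=0.5)
```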
  • FIG. 12 illustrates pseudocode of an example, non-limiting computer-implemented algorithm 1200 that facilitates automated neural network compression via an iterative hybrid reinforcement learning approach in accordance with one or more embodiments described herein.
  • the algorithm 1200 can receive as input the initial state s_0 of the student network (e.g., the network being compressed).
  • the algorithm 1200 can also receive as input the initial removal policy parameterization θ_remove,0 (e.g., the parameters of the compression policy that determines whether to remove layers).
  • the parameters can be randomly initialized.
  • a for-loop set to run N times can be entered with index i.
  • a nested for-loop set to run L_i times (e.g., where L_i can be the number of layers in the student network, or in some cases can represent a number of time-steps, and so on) can be entered with index t.
  • a compression action a_t can be taken for each t from 1 to L_i.
  • the action a_t can be chosen by the removal policy π_remove(s_{t-1} | θ_remove,i-1) based on the previous (e.g., before the policy update at index i) removal policy parameterization θ_remove,i-1 and the previous (e.g., before a_t is taken) state s_{t-1}.
  • the next state s_t can be computed based on the previous state s_{t-1} and the action just taken a_t according to the transition function T, which can be deterministic.
  • the nested for-loop can end, which can leave the student network in state s_{L_i}.
  • a random number u* can be chosen from the interval [0,1] with uniform probability.
  • a conditional (if) branch can be entered, asking whether the random number u* is less than some value α.
  • a reward R can be computed using the model-free reward function r_MF, discussed above, and the compressed state of the student network s_{L_i}.
  • the model-based function r_MB can be trained/learned, as discussed above, based on the reward R computed by the model-free reward function r_MF and the compressed state of the student network s_{L_i}.
  • the algorithm can determine whether the random number u* is not less than α.
  • the reward R can be predicted by the model-based reward function r_MB based on the compressed state of the student network s_{L_i}, the layer type l, the number of kernels k, the kernel size ks, the stride s, the padding p, and the number of trainable parameters n.
  • the updated policy parameterization θ_remove,i can be computed based on the gradient of the performance measure. At 1230, the first for-loop can finally end.
  • the algorithm can output the optimally compressed student network/model.
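Pulling the above steps together, a compact, non-limiting Python sketch of the iterative hybrid loop of algorithm 1200 is given below. The environment, policy, and reward objects and their methods are placeholders (assumptions), and the policy-gradient step is simplified relative to the pseudocode described above.

```python
import random

def hybrid_compression_loop(env, policy, r_mf, r_mb, n_episodes, alpha):
    """Sketch of algorithm 1200: compute the reward model-free with
    probability alpha, otherwise predict it model-based.

    env    - supplies the initial state s_0, the deterministic transition T,
             and the number of compression steps L_i per episode (assumed API)
    policy - removal policy pi_remove with parameters theta_remove (assumed API)
    r_mf   - model-free reward function (e.g., compression x accuracy terms)
    r_mb   - learnable model-based reward model (e.g., deep neural network 902)
    """
    for i in range(n_episodes):
        state = env.reset()                        # s_0
        for t in range(env.num_steps()):           # t = 1 .. L_i
            action = policy.sample(state)          # a_t ~ pi_remove(s_{t-1} | theta_remove,i-1)
            state = env.transition(state, action)  # s_t = T(s_{t-1}, a_t), deterministic
        u = random.random()                        # u* ~ Uniform[0, 1]
        if u < alpha:
            reward = r_mf(state)                   # real experience (model-free)
            r_mb.fit(state, reward)                # train the reward model on it
        else:
            reward = r_mb.predict(state)           # predicted experience (model-based)
        policy.update(reward)                      # gradient step on theta_remove,i
    return env.compressed_model()                  # compressed student network/model
```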
  • the term "article of manufacture," as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage media.
  • FIG. 13 illustrates a block diagram of an example, non-limiting operating environment in which one or more embodiments described herein can be facilitated. Repetitive description of like elements employed in other embodiments described herein is omitted for sake of brevity.
  • a suitable operating environment 1300 for implementing various aspects of this disclosure can also include a computer 1312.
  • the computer 1312 can also include a processing unit 1314, a system memory 1316, and a system bus 1318.
  • the system bus 1318 couples system components including, but not limited to, the system memory 1316 to the processing unit 1314.
  • the processing unit 1314 can be any of various available processors.
  • the system bus 1318 can be any of several types of bus structure(s) including the memory bus or memory controller, a peripheral bus or external bus, and/or a local bus using any variety of available bus architectures including, but not limited to, Industrial Standard Architecture (ISA), Micro-Channel Architecture (MSA), Extended ISA (EISA), Intelligent Drive Electronics (IDE), VESA Local Bus (VLB), and Peripheral Component Interconnect (PCI).
  • the system memory 1316 can also include volatile memory 1320 and nonvolatile memory 1322.
  • nonvolatile memory 1322 can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or nonvolatile random access memory (RAM) (e.g., ferroelectric RAM (FeRAM)). Volatile memory 1320 can also include random access memory (RAM), which acts as external cache memory.
  • RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), direct Rambus RAM (DRRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
  • Computer 1312 can also include removable/non-removable, volatile/non-volatile computer storage media.
  • FIG. 13 illustrates, for example, a disk storage 1324.
  • Disk storage 1324 can also include, but is not limited to, devices like a magnetic disk drive, floppy disk drive, tape drive, Jaz drive, Zip drive, LS-100 drive, flash memory card, or memory stick.
  • the disk storage 1324 also can include storage media separately or in combination with other storage media including, but not limited to, an optical disk drive such as a compact disk ROM device (CD-ROM), CD recordable drive (CD-R Drive), CD rewritable drive (CD-RW Drive) or a digital versatile disk ROM drive (DVD-ROM).
  • FIG. 13 also depicts software that acts as an intermediary between users and the basic computer resources described in the suitable operating environment 1300.
  • Such software can also include, for example, an operating system 1328.
  • Operating system 1328, which can be stored on disk storage 1324, acts to control and allocate resources of the computer 1312.
  • System applications 1330 take advantage of the management of resources by operating system 1328 through program modules 1332 and program data 1334, e.g., stored either in system memory 1316 or on disk storage 1324. It is to be appreciated that this disclosure can be implemented with various operating systems or combinations of operating systems.
  • Input devices 1336 include, but are not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, TV tuner card, digital camera, digital video camera, web camera, and the like. These and other input devices connect to the processing unit 1314 through the system bus 1318 via interface port(s) 1338.
  • Interface port(s) 1338 include, for example, a serial port, a parallel port, a game port, and a universal serial bus (USB).
  • Output device(s) 1340 use some of the same type of ports as input device(s) 1336.
  • a USB port can be used to provide input to computer 1312, and to output information from computer 1312 to an output device 1340.
  • Output adapter 1342 is provided to illustrate that there are some output devices 1340 like monitors, speakers, and printers, among other output devices 1340, which require special adapters.
  • the output adapters 1342 include, by way of illustration and not limitation, video and sound cards that provide a means of connection between the output device 1340 and the system bus 1318. It should be noted that other devices and/or systems of devices provide both input and output capabilities such as remote computer(s) 1344.
  • Computer 1312 can operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 1344.
  • The remote computer(s) 1344 can be a computer, a server, a router, a network PC, a workstation, a microprocessor-based appliance, a peer device or other common network node and the like, and typically can also include many or all of the elements described relative to computer 1312. For purposes of brevity, only a memory storage device 1346 is illustrated with remote computer(s) 1344.
  • Remote computer(s) 1344 is logically connected to computer 1312 through a network interface 1348 and then physically connected via communication connection 1350.
  • Network interface 1348 encompasses wire and/or wireless communication networks such as local-area networks (LAN), wide-area networks (WAN), cellular networks, etc.
  • LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet, Token Ring and the like.
  • WAN technologies include, but are not limited to, point-to-point links, circuit switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL).
  • Communication connection(s) 1350 refers to the hardware/software employed to connect the network interface 1348 to the system bus 1318.
  • while the communication connection 1350 is shown for illustrative clarity inside computer 1312, it can also be external to computer 1312.
  • the hardware/software for connection to the network interface 1348 can also include, for exemplary purposes only, internal and external technologies such as modems (including regular telephone grade modems, cable modems, and DSL modems), ISDN adapters, and Ethernet cards.
  • Embodiments can be a system, a computer-implemented method, an apparatus and/or a computer program product at any possible technical detail level of integration.
  • the computer program product can include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the herein described embodiments.
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable storage medium can be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • a non-exhaustive list of more specific examples of the computer readable storage medium can also include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punched cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
  • a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • the network can comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
  • Computer readable program instructions for carrying out operations of embodiments can be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the "C" programming language or similar programming languages.
  • the computer readable program instructions can execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection can be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) can execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the subject innovation.
  • These computer readable program instructions can also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer readable program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational acts to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • each block in the flowchart or block diagrams can represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks can occur out of the order noted in the Figures.
  • two blocks shown in succession can, in fact, be executed substantially concurrently, or the blocks can sometimes be executed in the reverse order, depending upon the functionality involved.
  • program modules can be located in both local and remote memory storage devices.
  • terms such as "component," "system," "platform," and "interface" can refer to and/or can include a computer-related entity or an entity related to an operational machine with one or more specific functionalities.
  • the entities disclosed herein can be either hardware, a combination of hardware and software, software, or software in execution.
  • a component can be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a server and the server can be a component.
  • One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers.
  • respective components can execute from various computer readable media having various data structures stored thereon.
  • the components can communicate via local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems via the signal).
  • a component can be an apparatus with specific functionality provided by mechanical parts operated by electric or electronic circuitry, which is operated by a software or firmware application executed by a processor.
  • the processor can be internal or external to the apparatus and can execute at least a part of the software or firmware application.
  • a component can be an apparatus that provides specific functionality through electronic components without mechanical parts, wherein the electronic components can include a processor or other means to execute software or firmware that confers at least in part the functionality of the electronic components.
  • a component can emulate an electronic component via a virtual machine, e.g., within a cloud computing system.
  • any aspect or design described herein as an "example" and/or "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art.
  • processor can refer to substantially any computing processing unit or device comprising, but not limited to, single-core processors; single-processors with software multithread execution capability; multi-core processors; multi-core processors with software multithread execution capability; multi-core processors with hardware multithread technology; parallel platforms; and parallel platforms with distributed shared memory.
  • a processor can refer to an integrated circuit, an application specific integrated circuit (ASIC), a digital signal processor (DSP), a field programmable gate array (FPGA), a programmable logic controller (PLC), a complex programmable logic device (CPLD), a discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.
  • processors can exploit nano-scale architectures such as, but not limited to, molecular and quantum-dot based transistors, switches and gates, in order to optimize space usage or enhance performance of user equipment.
  • a processor can also be implemented as a combination of computing processing units.
  • terms such as "store," "storage," "data store," "data storage," "database," and substantially any other information storage component relevant to operation and functionality of a component are utilized to refer to "memory components," entities embodied in a "memory," or components comprising a memory. It is to be appreciated that memory and/or memory components described herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
  • nonvolatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), flash memory, or nonvolatile random access memory (RAM) (e.g., ferroelectric RAM (FeRAM)).
  • Volatile memory can include RAM, which can act as external cache memory, for example.
  • RAM is available in many forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), direct Rambus RAM (DRRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Systems and computer-implemented methods are provided that facilitate automated compression of artificial neural networks using an iterative hybrid reinforcement learning approach. In various embodiments, a compression architecture can receive as input an original neural network to be compressed. The architecture can perform one or more compression actions to compress the original neural network into a compressed neural network. The architecture can then produce a reward signal quantifying how well the original neural network was compressed. In an (α)-proportion of compression iterations/episodes, where α ∈ [0,1], the reward signal can be computed model-free based on a compression ratio and an accuracy ratio of the compressed neural network. In a (1-α)-proportion of compression iterations/episodes, the reward signal can be predicted model-based using a compression model learned/trained on the model-free computed reward signals. This hybrid model-based and model-free architecture can considerably reduce convergence time without sacrificing significant accuracy.
PCT/US2020/018723 2019-02-26 2020-02-19 Compression de réseau neuronal artificiel par approche d'apprentissage de renforcement hybride itératif WO2020176297A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962810543P 2019-02-26 2019-02-26
US62/810,543 2019-02-26
US16/450,474 2019-06-24
US16/450,474 US20200272905A1 (en) 2019-02-26 2019-06-24 Artificial neural network compression via iterative hybrid reinforcement learning approach

Publications (1)

Publication Number Publication Date
WO2020176297A1 true WO2020176297A1 (fr) 2020-09-03

Family

ID=72141229

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/US2020/018720 WO2020176295A1 (fr) 2019-02-26 2020-02-19 Compression de réseau neuronal artificiel par approche d'apprentissage de renforcement hybride itératif
PCT/US2020/018723 WO2020176297A1 (fr) 2019-02-26 2020-02-19 Compression de réseau neuronal artificiel par approche d'apprentissage de renforcement hybride itératif

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/US2020/018720 WO2020176295A1 (fr) 2019-02-26 2020-02-19 Compression de réseau neuronal artificiel par approche d'apprentissage de renforcement hybride itératif

Country Status (2)

Country Link
US (1) US20200272905A1 (fr)
WO (2) WO2020176295A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204916A (zh) * 2021-04-15 2021-08-03 特斯联科技集团有限公司 基于强化学习的智能决策方法及系统

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117521725A (zh) * 2016-11-04 2024-02-06 渊慧科技有限公司 加强学习系统
CA3076424A1 (fr) * 2019-03-22 2020-09-22 Royal Bank Of Canada Systeme et methode de transmission des connaissances entre reseaux neuronaux
EP3796231A1 (fr) * 2019-09-19 2021-03-24 Robert Bosch GmbH Dispositif et procédé de génération d'un réseau comprimé à partir d'un réseau neuronal entraîné
CN114746870A (zh) * 2019-10-02 2022-07-12 诺基亚技术有限公司 用于神经网络压缩中优先级信令的高级语法
CN110826344B (zh) * 2019-10-24 2022-03-01 北京小米智能科技有限公司 神经网络模型压缩方法、语料翻译方法及其装置
US11388044B1 (en) * 2019-10-25 2022-07-12 Peraton Labs Inc. Independent situational awareness for a network
US11525596B2 (en) 2019-12-23 2022-12-13 Johnson Controls Tyco IP Holdings LLP Methods and systems for training HVAC control using simulated and real experience data
US11429996B2 (en) * 2020-01-21 2022-08-30 International Business Machines Corporation System and method for generating preferred ameliorative actions using generative adversarial networks
US11755603B1 (en) * 2020-03-26 2023-09-12 Amazon Technologies, Inc. Searching compression profiles for trained neural networks
US11620576B1 (en) * 2020-06-22 2023-04-04 Amazon Technologies, Inc. Systems and methods for knowledge transfer in machine learning
EP3944029A1 (fr) * 2020-07-21 2022-01-26 Siemens Aktiengesellschaft Procédé et système pour déterminer un taux de compression d'un modèle ia d'une tâche industrielle
CN112257858B (zh) * 2020-09-21 2024-06-14 华为技术有限公司 一种模型压缩方法及装置
US11941337B1 (en) * 2020-09-30 2024-03-26 Keysight Technologies, Inc. System and method for modeling nonlinear component for use in circuit design
CN112116441B (zh) * 2020-10-13 2024-03-12 腾讯科技(深圳)有限公司 金融风险分类模型的训练方法、分类方法、装置及设备
CN112257848B (zh) * 2020-10-22 2024-04-30 北京灵汐科技有限公司 确定逻辑核布局的方法、模型训练方法、电子设备、介质
CN112472530B (zh) * 2020-12-01 2023-02-03 天津理工大学 一种基于步行比趋势变化的奖励函数建立方法
US20220261685A1 (en) * 2021-02-15 2022-08-18 Bank Of America Corporation Machine Learning Training Device
CN113015152B (zh) * 2021-02-21 2024-02-27 中国电子科技集团公司第二十二研究所 一种基于SARSA(λ)算法的定向天线自组网邻居发现方法
CN117043787A (zh) * 2021-03-15 2023-11-10 高通股份有限公司 内核引导的架构搜索与知识蒸馏
KR102578377B1 (ko) * 2021-03-18 2023-09-15 한국과학기술원 편향-분산 딜레마를 해결하는 뇌모사형 적응 제어를 위한 전자 장치 및 그의 방법
CN113132482B (zh) * 2021-04-13 2022-10-14 河海大学 一种基于强化学习的分布式消息系统参数自适应优化方法
CN113255735B (zh) * 2021-04-29 2024-04-09 平安科技(深圳)有限公司 患者用药方案的确定方法及确定装置
CN113011570B (zh) * 2021-04-30 2023-04-07 电子科技大学 一种采用神经网络压缩系统的人脸表情识别方法
TWI774411B (zh) * 2021-06-07 2022-08-11 威盛電子股份有限公司 模型壓縮方法以及模型壓縮系統
CN113657592B (zh) * 2021-07-29 2024-03-05 中国科学院软件研究所 一种软件定义卫星自适应剪枝模型压缩方法
CN113867147B (zh) * 2021-09-29 2024-06-11 商汤集团有限公司 训练及控制方法、装置、计算设备和介质
CN114186379B (zh) * 2021-10-12 2024-09-24 武汉大学 基于回声网络和深度残差神经网络的变压器状态评估方法
US20230316536A1 (en) * 2022-03-31 2023-10-05 Adobe Inc. Systems and methods for object tracking
CN115856873B (zh) * 2022-11-15 2023-11-07 大连海事大学 一种岸基ais信号可信性判别模型、方法及装置
CN118786439A (zh) * 2023-02-03 2024-10-15 华为技术有限公司 使用强化学习进行低秩分解来压缩深度学习模型的系统和方法
CN117195747B (zh) * 2023-11-02 2024-01-23 北京金鼎兴成磁性材料有限公司 一种磁性材料烘干用均匀热分布优化方法
CN117272839B (zh) * 2023-11-20 2024-02-06 北京阿迈特医疗器械有限公司 基于神经网络的支架压握性能预测方法及装置

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180260695A1 (en) * 2017-03-07 2018-09-13 Qualcomm Incorporated Neural network compression via weak supervision

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180260695A1 (en) * 2017-03-07 2018-09-13 Qualcomm Incorporated Neural network compression via weak supervision

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
ANUBHAV ASHOK ET AL: "N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 18 September 2017 (2017-09-18), XP080817072 *
FABIO PARDO ET AL: "Time Limits in Reinforcement Learning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 1 December 2017 (2017-12-01), XP081106145 *
HE YIHUI ET AL: "AMC: AutoML for Model Compression and Acceleration on Mobile Devices", 6 October 2018, INTERNATIONAL CONFERENCE ON FINANCIAL CRYPTOGRAPHY AND DATA SECURITY; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NOTES COMPUTER], SPRINGER, BERLIN, HEIDELBERG, PAGE(S) 815 - 832, ISBN: 978-3-642-17318-9, XP047488271 *
NAGABANDI ANUSHA ET AL: "Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning", 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), IEEE, 21 May 2018 (2018-05-21), pages 7559 - 7566, XP033403772, DOI: 10.1109/ICRA.2018.8463189 *
SHIXIANG GU ET AL: "Continuous Deep Q-Learning with Model-based Acceleration", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 2 March 2016 (2016-03-02), XP080686753 *
SUTTON R S: "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming", MACHINE LEARNING. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE,, no. 7th conf, 21 June 1990 (1990-06-21), pages 216 - 224, XP009103093 *
VINCENT FRANCOIS-LAVET ET AL: "An Introduction to Deep Reinforcement Learning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 30 November 2018 (2018-11-30), XP081434255, DOI: 10.1561/2200000071 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204916A (zh) * 2021-04-15 2021-08-03 特斯联科技集团有限公司 基于强化学习的智能决策方法及系统
CN113204916B (zh) * 2021-04-15 2021-11-19 特斯联科技集团有限公司 基于强化学习的智能决策方法及系统

Also Published As

Publication number Publication date
US20200272905A1 (en) 2020-08-27
WO2020176295A1 (fr) 2020-09-03

Similar Documents

Publication Publication Date Title
US20200272905A1 (en) Artificial neural network compression via iterative hybrid reinforcement learning approach
Sangiorgio et al. Robustness of LSTM neural networks for multi-step forecasting of chaotic time series
Ma et al. End-to-end incomplete time-series modeling from linear memory of latent variables
Ritchie et al. Deep amortized inference for probabilistic programs
US20230108874A1 (en) Generative digital twin of complex systems
Kutyniok The mathematics of artificial intelligence
Pfeiffer et al. Reward-modulated Hebbian learning of decision making
Koeppe et al. Explainable artificial intelligence for mechanics: physics-explaining neural networks for constitutive models
KR20210117331A (ko) 회귀 신경망의 르장드르 메모리 유닛
Langarica et al. Contrastive blind denoising autoencoder for real time denoising of industrial IoT sensor data
Koeppe et al. Explainable artificial intelligence for mechanics: physics-informing neural networks for constitutive models
CN114694379A (zh) 一种基于自适应动态图卷积的交通流预测方法及系统
US20230206036A1 (en) Method for generating a decision support system and associated systems
US20230259802A1 (en) Generative Modeling of Quantum Hardware
Ray et al. Deep learning and computational physics (lecture notes)
EP4339840A1 (fr) Procédé et dispositif de construction de base de données d'apprentissage
CN117787470A (zh) 一种基于ewt和集成方法的时序预测方法和系统
Fakhari et al. A new restricted boltzmann machine training algorithm for image restoration
Xia et al. VI-DGP: A variational inference method with deep generative prior for solving high-dimensional inverse problems
Zeyad et al. Utilising Artificial Intelligence for Disease Classification and Prediction
Xiang et al. Semi-parametric training of autoencoders with Gaussian kernel smoothed topology learning neural networks
Paniagua et al. Nonlinear system identification using modified variational autoencoders
CN110853754A (zh) 一种在非确定性和非完整性条件下的决策支持系统方法
Jordana et al. Learning dynamical systems from noisy sensor measurements using multiple shooting
Carden et al. Small-sample reinforcement learning: Improving policies using synthetic data 1

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20711712

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20711712

Country of ref document: EP

Kind code of ref document: A1