US20210350233A1 - System and Method for Automated Precision Configuration for Deep Neural Networks - Google Patents

System and Method for Automated Precision Configuration for Deep Neural Networks

Info

Publication number
US20210350233A1
US20210350233A1 · US17/250,928 · US201917250928A · US2021350233A1
Authority
US
United States
Prior art keywords
precision
optimal
configuration
engine
application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/250,928
Inventor
Ehsan SABOORI
Davis Mangan SAWYER
MohammadHossein ASKARIHEMMAT
Olivier MASTROPIETRO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Deeplite Inc
Original Assignee
Deeplite Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2018-11-19
Filing date: 2019-11-18
Publication date: 2021-11-11
Application filed by Deeplite Inc
Priority to US17/250,928
Assigned to DEEPLITE INC. (assignment of assignors interest; see document for details). Assignors: TANDEMLAUNCH INC.
Assigned to TANDEMLAUNCH INC. (assignment of assignors interest; see document for details). Assignors: ASKARIHEMMAT, Mohammadhossein; MASTROPIETRO, Olivier; SABOORI, Ehsan; SAWYER, Davis Mangan
Publication of US20210350233A1
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/088 - Non-supervised learning, e.g. competitive learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 - Selection of the most significant subset of features
    • G06F 18/2115 - Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
    • G06K 9/6231
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/0454
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/004 - Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

There is provided a system and method of automated precision configuration for deep neural networks. The method includes obtaining an input model and one or more constraints associated with an application and/or target device or process used in the application configured to utilize a deep neural network; learning an optimal low-precision configuration of the architecture using the constraints, a training data set, and a validation data set; and deploying the optimal configuration on the target device or process for use in the application.

Description

    CROSS-REFERENCE TO RELATED APPLICATION(S)
  • This application claims priority to U.S. Provisional Patent Application No. 62/769,403 filed on Nov. 19, 2018, the contents of which are incorporated herein by reference.
  • TECHNICAL FIELD
  • The following relates to systems and methods for automated precision configuration for deep neural networks, for example by enabling low bit-precision weights and activations to be used effectively.
  • BACKGROUND
  • In modern intelligent applications and devices, deep neural networks (DNNs) have become ubiquitous when solving complex computer tasks, such as recognizing objects in images and translating natural language. The success of these networks has been largely dependent on high performance computing machinery, such as Graphics Processing Units (GPUs) and server-class Central Processing Units (CPUs). Consequently, the adoption of DNNs to solve real-world problems is typically limited to scenarios where such computing is available. Recently, many new computer processors specifically designed for artificial intelligence (AI) applications have emerged. These dedicated processors, such as Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs) and analog computers, offer the promise of more efficient and accessible AI products and services. However, designing DNN models optimized for these new processors remains a significant challenge for AI engineers and application developers. Significant domain expertise and trial-and-error are often required to create an optimized DNN for specialized hardware. One of the main challenges is how to enable a precision configuration for a given DNN architecture that maintains accuracy and optimizes for memory, energy and latency performance on a given hardware architecture. The task of quantizing individual layers of a DNN, which can contain dozens of layers, often results in sub-optimal performance in a real-world environment. Thus, there is significant interest in automating the task of enabling a precision configuration for an entire DNN architecture that considers the properties of the hardware architecture to optimize memory, energy and latency, as well as maintain a desired level of accuracy on the given dataset.
  • To address these problems, there has been a widespread push in academia and industry to make deep learning models more efficient by considering the properties of the hardware architecture in the model optimization process. Many techniques have been proposed for manual quantization of DNNs that show that lower bit-precision models are feasible for accurate inference on new input data.
  • Prior solutions include a variety of core quantization techniques for various DNN model architectures, as well as efficient kernels for computation in reduced precision, such as ARM CMSIS, Intel MKL-DNN and Nvidia TensorRT. The main approach to model quantization is uniform precision reduction across all layers of a DNN, for example from 32-bit floating point to 16-bit, or to 8-bit integer (INT8). It has been observed that once a model is trained, a lower bit precision is acceptable for the weights and activations of a DNN model to correctly compute the inference label for a given input. For this reason, many developers and hardware providers are developing in-house or add-on quantization methods that can naively convert the weights and activations of a DNN model to a supported precision for the target hardware (HW). However, when this process is applied and the model is then run on different HW, the result can often be slower, or the model may be incompatible with the new HW. Additionally, these uniform quantization approaches are often found to sacrifice too much accuracy or limit network performance on complex and large data sets.
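  • As a concrete illustration of the uniform approach described above, the following is a minimal sketch, in NumPy, of symmetric linear quantization of a single weight tensor from 32-bit floating point to INT8. The function names and the scale choice are assumptions for illustration and are not taken from this disclosure.

```python
import numpy as np

def quantize_uniform_int8(weights: np.ndarray):
    """Symmetric linear quantization of a float32 tensor to int8.

    Returns the int8 tensor and the scale needed to dequantize it.
    """
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Example: quantize one layer's weights and measure the rounding error.
w = np.random.randn(64, 128).astype(np.float32)
q, scale = quantize_uniform_int8(w)
print("max abs error:", np.max(np.abs(w - dequantize(q, scale))))
```

  Applying the same bit width to every layer in this way is exactly the uniform scheme whose accuracy limitations motivate the per-layer approach described later.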
  • At present, two fundamental challenges exist with current quantization techniques, namely: 1) that hand-crafted features and domain expertise are required for automated quantization; and 2) that time-consuming fine-tuning is often necessary to maintain accuracy.
  • There exists a need for scalable, automated processes for model quantization on diverse DNN architectures and hardware back-ends. Generally, it is found that the current capacity for model quantization is outpaced by the rapid development of new DNNs and disparate hardware platforms that aim to increase the applicability and efficiency of deep learning workloads.
  • It is an object of the following to address at least one of the above-mentioned challenges.
  • SUMMARY
  • It is recognized that a general approach that is agnostic to both the architecture and the target hardware(s) is needed to optimize DNNs, making them faster, smaller and more energy-efficient for use in daily life. The following relates to deep learning algorithms, for example, deep neural networks. A method for automated precision configuration, specifically quantization of DNN weights and activations, is described. The following relates to the design of a learning process to leverage trade-offs in different deep neural network precision configurations using computation constraints and hardware properties as inputs. The learning process trains an optimizer agent to adapt large, full-precision networks into smaller networks of similar performance that satisfy target constraints in a platform-aware way. By design, the learning process and agent are agnostic to both network architecture and target hardware platform.
  • In one aspect, there is provided a method of automated precision configuration for deep neural networks, the method comprising: obtaining an input model and one or more constraints associated with an application and/or target device or process used in the application configured to utilize a deep neural network; learning an optimal low-precision configuration of the optimal architecture using the input model, constraints, the training data set, and the validation data set; and deploying the optimal configuration on the target device or process for use in the application.
  • In another aspect, there is provided a computer readable medium comprising computer executable instructions for automated design space exploration for deep neural networks, the computer executable instructions comprising instructions for performing the above method.
  • In yet another aspect, there is provided a deep neural network optimization engine configured to perform automated precision configuration for deep neural networks, the engine comprising a processor and memory, the memory comprising computer executable instructions for performing the above method.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • One or more embodiments will now be described with reference to the appended drawings wherein:
  • FIG. 1 is a schematic diagram of a system for optimizing a DNN for use in a target device or process used in an artificial intelligence (AI) application;
  • FIG. 2 is a block diagram of an example of a DNN optimization engine;
  • FIG. 3 is a graph comparing energy consumption and computation costs for various example network designs;
  • FIG. 4 is a flow chart illustrating a process for optimizing an input DNN for deployment on a target device or process; and
  • FIG. 5 is a flow chart illustrating operations performed in learning an optimal low precision configuration.
  • DETAILED DESCRIPTION
  • AI should be accessible and beneficial to various applications in everyday life. With the emergence of deep learning on embedded and mobile devices, DNN application designers are faced with stringent power, memory and cost requirements, which often lead to inefficient solutions, possibly preventing people from moving to these devices. The system described below can be used to make deep learning applicable, affordable and scalable by bridging the gap between DNNs and hardware back-ends. To do so, a scalable, DNN-agnostic engine is provided, which can enable a platform-aware optimization. The engine targets information inefficiency in the implementation of DNNs, making them applicable for low-end devices. To provide such functionality, the engine:
      • is configured to be architecture independent, allowing the engine to support different DNN architectures such as convolution neural networks (CNNs), recurrent neural networks (RNNs), etc.;
      • is configured to be framework agnostic, enabling developers to readily apply the engine to a project without additional engineering overhead;
      • is configured to be hardware agnostic, helping end-users to readily change the back-end hardware or port a model from one hardware platform to another.
  • One of the core challenges with model optimization for DNN inference is evaluating which precision configuration is best-suited for a given application. The engine described herein uses an AI-driven optimizer to overcome the drawbacks of manual model quantization. Based on computation constraints, i.e. a "bit budget", a software agent selectively changes the bit precision of different layers in the model. Information inefficiencies and novel supported bit-precisions for AI hardware are leveraged to effectively quantize the layers of a network in a platform-aware way.
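  • To make the "bit budget" notion concrete, the following is a hypothetical sketch of how a per-layer precision configuration might be represented and checked against such a budget. The layer names, parameter counts and budget are invented for illustration and are not part of this disclosure.

```python
# Hypothetical per-layer precision configuration: layer name -> bit width.
config = {"conv1": 8, "conv2": 2, "conv3": 4, "fc": 4}

# Parameter counts per layer (invented numbers for illustration).
param_counts = {"conv1": 1_728, "conv2": 36_864, "conv3": 73_728, "fc": 512_000}

def model_bits(config, param_counts):
    """Total weight storage, in bits, implied by a precision configuration."""
    return sum(param_counts[name] * bits for name, bits in config.items())

# Example bit budget: half the footprint of a uniform INT8 model.
bit_budget = 8 * sum(param_counts.values()) // 2
print(model_bits(config, param_counts) <= bit_budget)  # True for these numbers
```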
  • Turning now to the figures, FIG. 1 illustrates a DNN optimization engine 10 which is configured, as described below, to take an initial DNN 12 and generate or otherwise determine an optimized DNN 14 to be used by or deployed upon a target device or process 16, the "target 16" for brevity. The target 16 is used in or purposed for an AI application 18 that uses the optimized DNN 14. The AI application 18 has one or more application constraints 19 that dictate how the optimized DNN 14 is generated or chosen.
  • FIG. 2 illustrates an example of an architecture for the DNN optimization engine 10. The engine 10 in this example configuration includes a model converter 22 which can interface with a number of frameworks 20, an intermediate representation model 24, a design space exploration module 26, a quantizer 28, and mapping algorithms 30 that can include algorithms for both heterogeneous hardware 32 and homogeneous hardware 34. The engine 10 also interfaces with a target hardware (HW) platform 16. The design space exploration module 26, quantizer 28, and mapping algorithms 30 adopt, apply, consider, or otherwise take into account the constraints 19. In this example, the constraints include accuracy, power, cost, supported precision and speed, among others that are possible, as shown in dashed lines. FIG. 2 illustrates a framework with maximum re-use in mind, so that new AI frameworks 20, new DNN architectures and new hardware architectures can be easily added to a platform utilizing the engine 10. The engine 10 addresses inference optimization of DNNs by leveraging state-of-the-art algorithms and methodologies to make DNNs applicable for any device 16. This provides an end-to-end framework to optimize DNNs from different deep learning framework front-ends down to low-level machine code for multiple hardware back-ends.
  • For the model converter 22, the engine 10 is configured to support multiple frameworks 20 (e.g. TensorFlow, PyTorch, etc.) and DNN architectures (e.g. CNN, RNN, etc.), to facilitate applying the engine's capabilities on different projects with different AI frameworks 20. To do so, two layers are included, namely: a) the model converter 22, which contains each AI framework's specifications and a DNN parser to produce the intermediate representation model (IRM) 24 from the original model; and b) the IRM 24, which represents all DNN models in a standard format.
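  • The disclosure does not prescribe a particular format for the IRM 24. Purely as an illustration of the idea of a framework-neutral representation, a minimal sketch might model a network as a list of typed operation nodes, each carrying a tunable precision attribute; all names below are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class IRNode:
    name: str                                   # e.g. "conv1"
    op_type: str                                # e.g. "Conv2D", "ReLU", "Dense"
    inputs: list                                # names of producer nodes
    attrs: dict = field(default_factory=dict)   # kernel size, strides, ...
    precision: int = 32                         # bit width, tunable later

@dataclass
class IRModel:
    nodes: list

# A toy two-layer model expressed in this hypothetical IRM.
irm = IRModel(nodes=[
    IRNode("conv1", "Conv2D", inputs=["input"], attrs={"kernel": 3, "filters": 16}),
    IRNode("fc", "Dense", inputs=["conv1"], attrs={"units": 10}),
])
```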
  • The engine 10 also provides content aware optimization, by providing a two-level intermediate layer composed of: a) the design space exploration module 26, which is an intermediate layer for finding a smaller architecture with similar performance as the given model to reduce memory footprint and computation (described in greater detail below); and b) the quantizer 28, which is a low-level layer for quantizing the network to gain further computation speedup.
  • Regarding the design space exploration module 26, DNNs are heavily dependent on the design of hyper-parameters like the number of hidden layers, nodes per layer and activation functions, which have traditionally been optimized manually. Moreover, hardware constraints 19 such as memory and power should be considered to optimize the model effectively. Given that design spaces can easily exceed thousands of solutions, it can be intractable to find a near-optimal solution manually.
  • Quantizing DNNs has the potential to decrease complexity and memory footprint and to facilitate deployment on edge devices. However, precision is typically considered at the design level of an entire model, making it difficult to treat as a tunable hyper-parameter. Moreover, exploring efficient precision requires tight integration between the network design, training and implementation, which is not always feasible. Typical implementations of low precision DNNs use uniform precision across all layers of the network, while mixed precision leads to better performance. The engine 10 described herein exploits low precision weights using reinforcement learning to learn an optimal precision configuration across the neural network, where each layer may have a different precision, to get the best out of the target platform 16. Besides mixed precision, the engine 10 also supports uniform precision, fixed-point, dynamic fixed-point and binary/ternary networks.
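  • Of the supported schemes, dynamic fixed-point assigns each tensor or layer its own fractional length based on its dynamic range. A minimal sketch of one common way to pick that fractional length is shown below; the details are assumptions for illustration and are not mandated by this disclosure.

```python
import numpy as np

def dynamic_fixed_point(x: np.ndarray, total_bits: int = 8):
    """Quantize a tensor to fixed point, choosing the fractional length per tensor.

    Enough integer bits are kept for the largest magnitude; the remaining bits
    (minus one sign bit) become fractional bits.
    """
    max_abs = float(np.max(np.abs(x)))
    int_bits = int(np.floor(np.log2(max_abs))) + 1 if max_abs >= 1.0 else 0
    frac_bits = total_bits - 1 - int_bits          # one bit reserved for the sign
    step = 2.0 ** (-frac_bits)
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x / step), qmin, qmax)
    return q * step, frac_bits

values, frac_bits = dynamic_fixed_point(np.array([0.02, -1.7, 3.4], dtype=np.float32))
print(values, frac_bits)   # quantized values and the chosen fractional length
```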
  • It is also recognized that a major challenge lies in enabling support for multiple hardware back-ends while keeping compute, memory and energy footprints at their lowest. Content aware optimization alone is not considered to be enough to solve the challenge of supporting different hardware back-ends, because primitive operations like convolution or matrix multiplication may be mapped and optimized in very different ways for each hardware back-end. These hardware-specific optimizations can vary drastically in terms of memory layout, parallelization and threading patterns, caching access patterns and choice of hardware primitives.
  • The platform aware optimization layer that includes the mapping algorithms 30 is configured to address this challenge. This layer contains standard transformation primitives commonly found in commodity hardware such as CPUs, GPUs, FPGAs, etc. This additional layer provides a toolset to optimize DNNs for FPGAs and automatically map them onto FPGAs for model inference. This automated toolset can save design time significantly. Importantly, many homogeneous and heterogeneous multicore architectures have recently been introduced to continually improve system performance. Compared to homogeneous multicore systems, heterogeneous ones offer more computation power and more efficient energy consumption because of the utilization of specialized cores for specific functions, and each computational unit provides distinct resource efficiencies when executing different inference phases of deep models (e.g. a binary network on an FPGA, full-precision parts on a GPU/DSP, regular arithmetic operations on a CPU, etc.). The engine 10 provides optimization primitives targeted at heterogeneous hardware 32, by automatically splitting the DNN's computation across different hardware cores to maximize energy efficiency and minimize execution time on the target hardware 16.
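  • Purely as an illustration of the partitioning idea (the mapping algorithms 30 themselves are not detailed here), layers could be assigned to cores based on their precision, e.g. binarized layers to an FPGA, full-precision layers to a GPU and the rest to a CPU. The placement rule and layer names below are hypothetical.

```python
# Hypothetical placement rule keyed only on a layer's bit width.
def assign_device(layer_bits: int) -> str:
    if layer_bits <= 2:
        return "fpga"   # binary/ternary layers
    if layer_bits >= 32:
        return "gpu"    # full-precision layers
    return "cpu"        # regular reduced-precision arithmetic

precision_config = {"conv1": 32, "conv2": 2, "conv3": 8, "fc": 8}
placement = {name: assign_device(bits) for name, bits in precision_config.items()}
print(placement)   # {'conv1': 'gpu', 'conv2': 'fpga', 'conv3': 'cpu', 'fc': 'cpu'}
```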
  • Using platform aware optimization techniques in combination with content aware optimization techniques achieves a significant reduction in performance cost across different hardware platforms while delivering the same inference accuracy as state-of-the-art deep learning approaches.
  • For example, assume an application that needs to run a CNN on low-end hardware with 60 MB of memory. The model size is 450 MB and it needs to meet a 10 ms critical response time for each inference operation. The model is 95% accurate; however, 90% accuracy is also acceptable. CNN designers usually use GPUs to train and run their models, but they would now need to deal with memory and computation power limitations, a new hardware architecture, and satisfying all constraints (such as memory and accuracy) at the same time. Finding a solution for the target hardware manually may be infeasible or may require tremendous engineering effort. In contrast, using the engine 10 and specifying the constraints 19, a user can effectively produce the optimized model by finding a feasible solution, reducing time to market and engineering effort, as illustrated in the chart shown in FIG. 3.
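  • A back-of-the-envelope check (not part of the disclosure) shows why a mixed low-precision configuration can satisfy the memory constraint in this example: going from 450 MB to 60 MB requires roughly 7.5x compression of the weights, which corresponds to an average of about 4.3 bits per weight, i.e. mostly 4-bit layers with a few 8-bit layers.

```python
model_size_mb = 450      # full-precision (FP32) model
memory_budget_mb = 60    # target device memory
required_ratio = model_size_mb / memory_budget_mb
print(required_ratio)            # 7.5x compression needed

# Uniform INT8 only gives 4x; the average bit width must drop further.
print(32 / required_ratio)       # ~4.27 bits per weight on average
```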
  • Referring now to FIG. 4, the engine 10 provides a quantizer 28 which formulates the quantization problem as a multi-objective design space exploration 42 for DNNs with respect to the supported precisions of the target hardware 16, where a reinforcement learning-based agent 50 (see also FIG. 5) exploits low precision weights by learning an optimal precision configuration across the neural network, where the precision assigned to each layer may differ (mixed precision), to get the best out of the target platform 16; the optimized model is then deployed on the target platform 16 at step 46.
  • The engine 10 provides for automated optimization of deep learning algorithms. The engine 10 also employs an efficient process for design space exploration 26 of DNNs that can satisfy target computation constraints 19 such as speed, model size, accuracy, power consumption, etc. There is provided a learning process for training optimizer agents that automatically explore design trade-offs, starting with large, initial DNNs, to produce compact DNN designs in a data-driven way. Once an engineer has trained an initial deep neural network on a training data set to achieve a target accuracy for a task, they would then need to satisfy other constraints for the real-world production environment and computing hardware. The proposed process makes this possible by automatically producing an optimized DNN model suitable for the production environment and hardware 16. Referring to FIG. 5, the agent 50 receives as inputs an initial DNN or teacher model 40, a training data set 52 and target constraints 19. This can be done using existing deep learning frameworks, without the need to introduce a new framework and the associated engineering overhead. The agent 50 then generates a new precision configuration from the initial DNN based on the target constraints 19. The agent 50 receives a reward based on the performance of the adapted model measured on the training data set 52, guiding the process towards a feasible design. The learning process can converge on a feasible precision configuration using minimal computing resources, time and human expert interaction. This process overcomes the disadvantages of manual optimization, which is often limited to certain DNN architectures, applications and hardware platforms, and requires domain expertise. The process is a universal method to leverage trade-offs in different DNN precision configurations and to ensure that target computation constraints are met. Furthermore, the process benefits end-users with multiple DNNs in production, each requiring updates and re-training at various intervals, by providing a fast, lightweight and flexible method for designing new and compact DNNs. This approach advances current approaches by enabling resource-efficient DNNs that economize data centers, are available for use on low-end, affordable hardware, and are accessible to a wider audience aiming to use deep learning algorithms in daily environments.
  • In step 42, shown in FIG. 5, a policy 53 exploits low precision weights by learning an optimal precision configuration across the neural network, where the precision assigned to each layer may be different. The precisions supported by the target hardware (e.g. INT8, INT16, FP16, etc.) and a bit-budget need to be defined as constraints 19 for this step 42. As shown in FIG. 5, the agent 50 observes a state that is generated through applying steps 58-64. The reinforcement learning policy repeatedly generates a set of precision configurations, with respect to the supported precisions and bit-budget, to create new networks by altering layers' precisions. This step 42 produces a quantized network at step 58 that is fine-tuned via knowledge distillation at step 60 on the training data set 52 and subsequently evaluated at step 62 for accuracy on the validation data set 54. The agent 50 then updates the policy 53 based on the reward achieved by the new architecture. Over a series of iterations, the agent 50 will select the precision configuration that achieves the best reward determined by the reward function 64, for the given constraints 19 on the target computing hardware platform 16. Once this model has been selected, the user can deploy the optimized model in production on their specified hardware(s).
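  • The loop of FIG. 5 can be summarized in code form. The sketch below is illustrative only: it assumes a generic policy object exposing propose and update methods, and takes the quantization, fine-tuning, evaluation and reward computations as caller-supplied callables rather than naming any particular API from this disclosure.

```python
def optimize_precision(policy, quantize, finetune, evaluate, reward_fn,
                       layers, supported_bits, bit_budget, iterations=100):
    """Sketch of the learning loop in FIG. 5 (steps 42, 58, 60, 62 and 64)."""
    best_config, best_reward = None, float("-inf")
    for _ in range(iterations):
        # Step 42: propose a bit width per layer within the supported set and budget.
        config = policy.propose(layers, supported_bits, bit_budget)
        net = quantize(config)                 # step 58: build the quantized network
        net = finetune(net)                    # step 60: knowledge-distillation fine-tuning
        accuracy = evaluate(net)               # step 62: accuracy on the validation set
        reward = reward_fn(accuracy, config)   # step 64: reward under the constraints
        policy.update(config, reward)          # update the policy 53 from the reward
        if reward > best_reward:
            best_config, best_reward = config, reward
    return best_config
```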
  • To reuse weights, the engine 10 leverages the class of function-preserving transformations that initialize the new network to represent the same function as the given network, but with a different parameterization that can be further trained to improve performance. Knowledge distillation at step 60 has been employed as a component of the training process to accelerate the training of the student network, especially for large networks.
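  • One common way to implement the distillation step 60, which is one possible choice rather than a requirement of this disclosure, is to blend a KL divergence against the teacher's temperature-softened outputs with the ordinary loss on the hard labels. A PyTorch-style sketch:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.9):
    """Blend KL divergence to the teacher's softened outputs with the hard-label loss."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```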
  • The transformation actions may lead to defective networks (e.g. with an unrealistic kernel size, number of filters, etc.). It is not worth training these networks, as they cannot learn properly. To improve the training process, an apparatus has been employed to detect these defective networks early and cut off the learning process by assigning them a negative reward.
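  • For instance, a simple sanity filter, with hypothetical keys and thresholds chosen only for illustration, could reject a candidate before any training is spent on it, after which the loop would record the negative reward mentioned above:

```python
def is_defective(layers):
    """Flag candidates whose transformed hyper-parameters are unrealistic.

    Each layer is a dict such as {"kernel": 3, "filters": 16, "input_size": 32};
    the keys and checks are assumptions for illustration.
    """
    for layer in layers:
        if layer["filters"] < 1:
            return True
        if not (1 <= layer["kernel"] <= layer["input_size"]):
            return True
    return False

# In the learning loop, such candidates receive a negative reward and no training time:
#   if is_defective(candidate_layers):
#       policy.update(candidate_layers, reward=-1.0)
```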
  • For simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the examples described herein. However, it will be understood by those of ordinary skill in the art that the examples described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the examples described herein. Also, the description is not to be considered as limiting the scope of the examples described herein.
  • It will be appreciated that the examples and corresponding diagrams used herein are for illustrative purposes only. Different configurations and terminology can be used without departing from the principles expressed herein. For instance, components and modules can be added, deleted, modified, or arranged with differing connections without departing from these principles.
  • It will also be appreciated that any module or component exemplified herein that executes instructions may include or otherwise have access to computer readable media such as storage media, computer storage media, or data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an application, module, or both. Any such computer storage media may be part of the engine 10, any component of or related to the engine, etc., or accessible or connectable thereto. Any application or module herein described may be implemented using computer readable/executable instructions that may be stored or otherwise held by such computer readable media.
  • The steps or operations in the flow charts and diagrams described herein are just for example. There may be many variations to these steps or operations without departing from the principles discussed above. For instance, the steps may be performed in a differing order, or steps may be added, deleted, or modified.
  • Although the above principles have been described with reference to certain specific examples, various modifications thereof will be apparent to those skilled in the art as outlined in the appended claims.

Claims (19)

1. A method of automated precision configuration for deep neural networks, the method comprising:
obtaining an input model and one or more constraints associated with an application and/or target device or process used in the application configured to utilize a deep neural network;
learning an optimal low-precision configuration of the optimal architecture using the input model, constraints, a training data set, and a validation data set; and
deploying the optimal configuration on the target device or process for use in the application.
2. The method of claim 1, wherein the optimal configuration is learned using a policy to generate an optimized model from the input model.
3. The method of claim 2, wherein the optimal low-precision configuration of the optimal architecture is learned using the policy to generate a quantized network, the method further comprising:
fine tuning the quantized network with a knowledge distillation process;
evaluating the fine-tuned network;
applying a reward function; and
iterating for at least one additional quantized network and selecting the optimal low-precision configuration.
4. The method of claim 3, wherein selecting the optimal low-precision configuration comprises selecting a precision configuration that achieves the best reward as determined by the reward function, for the constraints on the target device or process.
5. The method of claim 1, wherein learning the optimal low-precision configuration comprises exploiting low precision weights using reinforcement learning to learn the optimal low-precision configuration across the deep neural network.
6. The method of claim 5, wherein each layer comprises a different precision.
7. The method of claim 1, wherein the constraints comprise at least one of: accuracy, power, cost, supported precision, speed.
8. The method of claim 7, wherein a computation constraint comprises a bit budget.
9. The method of claim 1, wherein the application is an artificial intelligence-based application.
10. A non-transitory computer readable medium comprising computer executable instructions for automated design space exploration for deep neural networks, the computer executable instructions comprising instructions for:
obtaining an input model and one or more constraints associated with an application and/or target device or process used in the application configured to utilize a deep neural network;
learning an optimal low-precision configuration of the optimal architecture using the input model, constraints, a training data set, and a validation data set; and
deploying the optimal configuration on the target device or process for use in the application.
11. A deep neural network optimization engine configured to perform automated design space exploration for deep neural networks, the engine comprising a processor and memory, the memory comprising computer executable instructions for:
obtaining an input model and one or more constraints associated with an application and/or target device or process used in the application configured to utilize a deep neural network;
learning an optimal low-precision configuration of the optimal architecture using the input model, constraints, a training data set, and a validation data set; and
deploying the optimal configuration on the target device or process for use in the application.
12. The engine of claim 11, wherein the optimal configuration is learned using a policy to generate an optimized model from the input model.
13. The engine of claim 12, wherein the optimal low-precision configuration of the optimal architecture is learned using the policy to generate a quantized network, further comprising instructions for:
fine tuning the quantized network with a knowledge distillation process;
evaluating the fine-tuned network;
applying a reward function; and
iterating for at least one additional quantized network and selecting the optimal low-precision configuration.
14. The engine of claim 13, wherein selecting the optimal low-precision configuration comprises selecting a precision configuration that achieves the best reward as determined by the reward function, for the constraints on the target device or process.
15. The engine of claim 11, wherein learning the optimal low-precision configuration comprises exploiting low precision weights using reinforcement learning to learn the optimal low-precision configuration across the deep neural network.
16. The engine of claim 15, wherein each layer comprises a different precision.
17. The engine of claim 11, wherein the constraints comprise at least one of: accuracy, power, cost, supported precision, speed.
18. The engine of claim 17, wherein a computation constraint comprises a bit budget.
19. The engine of claim 11, wherein the application is an artificial intelligence-based application.
US17/250,928 2018-11-19 2019-11-18 System and Method for Automated Precision Configuration for Deep Neural Networks Pending US20210350233A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/250,928 US20210350233A1 (en) 2018-11-19 2019-11-18 System and Method for Automated Precision Configuration for Deep Neural Networks

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201862769403P 2018-11-19 2018-11-19
PCT/CA2019/051643 WO2020102888A1 (en) 2018-11-19 2019-11-18 System and method for automated precision configuration for deep neural networks
US17/250,928 US20210350233A1 (en) 2018-11-19 2019-11-18 System and Method for Automated Precision Configuration for Deep Neural Networks

Publications (1)

Publication Number Publication Date
US20210350233A1 true US20210350233A1 (en) 2021-11-11

Family

ID=70773445

Family Applications (2)

Application Number Title Priority Date Filing Date
US17/250,928 Pending US20210350233A1 (en) 2018-11-19 2019-11-18 System and Method for Automated Precision Configuration for Deep Neural Networks
US17/250,926 Pending US20220335304A1 (en) 2018-11-19 2019-11-18 System and Method for Automated Design Space Determination for Deep Neural Networks

Family Applications After (1)

Application Number Title Priority Date Filing Date
US17/250,926 Pending US20220335304A1 (en) 2018-11-19 2019-11-18 System and Method for Automated Design Space Determination for Deep Neural Networks

Country Status (4)

Country Link
US (2) US20210350233A1 (en)
EP (2) EP3884434A4 (en)
CA (2) CA3114635A1 (en)
WO (2) WO2020102887A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200250523A1 (en) * 2019-02-05 2020-08-06 Gyrfalcon Technology Inc. Systems and methods for optimizing an artificial intelligence model in a semiconductor solution
US20210064975A1 (en) * 2019-09-03 2021-03-04 International Business Machines Corporation Deep neural network on field-programmable gate array
US11315229B2 (en) * 2020-06-09 2022-04-26 Inventec (Pudong) Technology Corporation Method for training defect detector
US11455425B2 (en) * 2020-10-27 2022-09-27 Alipay (Hangzhou) Information Technology Co., Ltd. Methods, apparatuses, and systems for updating service model based on privacy protection
WO2023160290A1 (en) * 2022-02-23 2023-08-31 京东方科技集团股份有限公司 Neural network inference acceleration method, target detection method, device, and storage medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967568B (en) * 2020-06-29 2023-09-01 北京百度网讯科技有限公司 Adaptation method and device for deep learning model and electronic equipment
EP3944029A1 (en) * 2020-07-21 2022-01-26 Siemens Aktiengesellschaft Method and system for determining a compression rate for an ai model of an industrial task
KR20220101954A (en) * 2021-01-12 2022-07-19 삼성전자주식회사 Neural network processing method and apparatus
GB202206105D0 (en) * 2022-04-27 2022-06-08 Samsung Electronics Co Ltd Method for knowledge distillation and model generation
CN115774851B (en) * 2023-02-10 2023-04-25 四川大学 Method and system for detecting internal defects of crankshaft based on hierarchical knowledge distillation


Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106170800A (en) * 2014-09-12 2016-11-30 微软技术许可有限责任公司 Student DNN is learnt via output distribution
US20160328644A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Adaptive selection of artificial neural networks
US10733531B2 (en) * 2016-01-27 2020-08-04 Bonsai AI, Inc. Artificial intelligence engine having an architect module
US20180260687A1 (en) * 2016-04-26 2018-09-13 Hitachi, Ltd. Information Processing System and Method for Operating Same
DE202016004627U1 (en) * 2016-07-27 2016-09-23 Google Inc. Training a neural value network
US10621486B2 (en) * 2016-08-12 2020-04-14 Beijing Deephi Intelligent Technology Co., Ltd. Method for optimizing an artificial neural network (ANN)
WO2018051841A1 (en) * 2016-09-16 2018-03-22 Nippon Telegraph and Telephone Corporation Model learning device, method therefor, and program
US20180165602A1 (en) * 2016-12-14 2018-06-14 Microsoft Technology Licensing, Llc Scalability of reinforcement learning by separation of concerns
CN110326004B (en) * 2017-02-24 2023-06-30 谷歌有限责任公司 Training a strategic neural network using path consistency learning
US10713540B2 (en) * 2017-03-07 2020-07-14 Board Of Trustees Of Michigan State University Deep learning system for recognizing pills in images
US20180260695A1 (en) * 2017-03-07 2018-09-13 Qualcomm Incorporated Neural network compression via weak supervision
US9754221B1 (en) * 2017-03-09 2017-09-05 Alphaics Corporation Processor for implementing reinforcement learning operations
WO2018189728A1 (en) * 2017-04-14 2018-10-18 Cerebras Systems Inc. Floating-point unit stochastic rounding for accelerated deep learning
US10643297B2 (en) * 2017-05-05 2020-05-05 Intel Corporation Dynamic precision management for integer deep learning primitives
US10878273B2 (en) * 2017-07-06 2020-12-29 Texas Instruments Incorporated Dynamic quantization for deep neural network inference system and method
US20190050710A1 (en) * 2017-08-14 2019-02-14 Midea Group Co., Ltd. Adaptive bit-width reduction for neural networks
US20190171927A1 (en) * 2017-12-06 2019-06-06 Facebook, Inc. Layer-level quantization in neural networks
EP3543917B1 (en) * 2018-03-19 2024-01-03 SRI International Inc. Dynamic adaptation of deep neural networks
US11948074B2 (en) * 2018-05-14 2024-04-02 Samsung Electronics Co., Ltd. Method and apparatus with neural network parameter quantization
CN110378382A (en) * 2019-06-18 2019-10-25 South China Normal University Novel quantitative trading system based on deep learning and its implementation method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200125956A1 (en) * 2017-05-20 2020-04-23 Google Llc Application Development Platform and Software Development Kits that Provide Comprehensive Machine Learning Services

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ahmed T. Elthakeb, Prannoy Pilligundla, FatemehSadat Mireshghallah, Amir Yazdanbakhsh, & Hadi Esmaeilzadeh. (5 November 2018). "ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks." https://arxiv.org/abs/1811.01704v1 (Year: 2018) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200250523A1 (en) * 2019-02-05 2020-08-06 Gyrfalcon Technology Inc. Systems and methods for optimizing an artificial intelligence model in a semiconductor solution
US20210064975A1 (en) * 2019-09-03 2021-03-04 International Business Machines Corporation Deep neural network on field-programmable gate array
US11907828B2 (en) * 2019-09-03 2024-02-20 International Business Machines Corporation Deep neural network on field-programmable gate array
US11315229B2 (en) * 2020-06-09 2022-04-26 Inventec (Pudong) Technology Corporation Method for training defect detector
US11455425B2 (en) * 2020-10-27 2022-09-27 Alipay (Hangzhou) Information Technology Co., Ltd. Methods, apparatuses, and systems for updating service model based on privacy protection
WO2023160290A1 (en) * 2022-02-23 2023-08-31 BOE Technology Group Co., Ltd. Neural network inference acceleration method, target detection method, device, and storage medium

Also Published As

Publication number Publication date
EP3884434A4 (en) 2022-10-19
EP3884434A1 (en) 2021-09-29
CA3114635A1 (en) 2020-05-28
WO2020102888A1 (en) 2020-05-28
US20220335304A1 (en) 2022-10-20
WO2020102887A1 (en) 2020-05-28
EP3884435A1 (en) 2021-09-29
CA3114632A1 (en) 2020-05-28
EP3884435A4 (en) 2022-10-19

Similar Documents

Publication Publication Date Title
US20210350233A1 (en) System and Method for Automated Precision Configuration for Deep Neural Networks
US11790212B2 (en) Quantization-aware neural architecture search
WO2021057713A1 (en) Method for splitting neural network model by using multi-core processor, and related product
CN110674936A (en) Neural network processing method and device, computer equipment and storage medium
WO2018171717A1 (en) Automated design method and system for neural network processor
US20200042856A1 (en) Scheduler for mapping neural networks onto an array of neural cores in an inference processing unit
US20200125956A1 (en) Application Development Platform and Software Development Kits that Provide Comprehensive Machine Learning Services
TW202026858A (en) Exploiting activation sparsity in deep neural networks
CN111652367A (en) Data processing method and related product
CN110826708B (en) Method for realizing neural network model splitting by using multi-core processor and related product
CN110546611A (en) Reducing power consumption in a neural network processor by skipping processing operations
Daghero et al. Energy-efficient deep learning inference on edge devices
CN108171328B (en) Neural network processor and convolution operation method executed by same
EP3893104A1 (en) Methods and apparatus for low precision training of a machine learning model
US20220076095A1 (en) Multi-level sparse neural networks with dynamic rerouting
KR20190098671A (en) High speed processing method of neural network and apparatus using thereof
JP2022546271A (en) Method and apparatus for predicting kernel tuning parameters
de Prado et al. Automated design space exploration for optimized deployment of DNN on ARM Cortex-A CPUs
Lin et al. Tiny machine learning: progress and futures [feature]
CN117011118A (en) Model parameter updating method, device, computer equipment and storage medium
CN114721670A (en) NPU neural network model deployment method and device based on TVM
Guo et al. Algorithms and architecture support of degree-based quantization for graph neural networks
Sun et al. UniCNN: A pipelined accelerator towards uniformed computing for CNNs
Oltean et al. Method for rapid development of Arduino-based applications enclosing ANN
Thingom et al. A Review on Machine Learning in IoT Devices

Legal Events

Date Code Title Description
AS Assignment

Owner name: DEEPLITE INC., CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TANDEMLAUNCH INC.;REEL/FRAME:055755/0313

Effective date: 20200211

Owner name: TANDEMLAUNCH INC., CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SABOORI, EHSAN;SAWYER, DAVIS MANGAN;ASKARIHEMMAT, MOHAMMADHOSSEIN;AND OTHERS;REEL/FRAME:055755/0137

Effective date: 20191114

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED