CN111191772A - Intelligent computing general acceleration system facing embedded environment and construction method thereof

Info

Publication number
CN111191772A
Authority
CN
China
Prior art keywords
acceleration
reconfigurable
model
operator
processing unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010003133.XA
Other languages
Chinese (zh)
Other versions
CN111191772B (en)
Inventor
李欣瑶
刘飞阳
高泽
白林亭
文鹏程
李亚晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202010003133.XA
Publication of CN111191772A
Application granted
Publication of CN111191772B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Advance Control (AREA)
  • Stored Programmes (AREA)

Abstract

The invention belongs to the field of intelligent computing and discloses a general-purpose intelligent computing acceleration system for embedded environments together with its construction method. The construction method comprises: a preprocessing stage, in which the deep neural network is pre-trained, its model structure is analyzed, and the model's structural features are extracted; a reconfigurable stage, in which the model's structure-analysis data are parsed, the computation of each network layer is accelerated on programmable logic, and dynamic storage and dynamic configuration of the network model are realized on reconfigurable hardware resources; and a post-processing stage, in which the acceleration effect is verified, the result is output, and the programmable computation acceleration unit can be extended. The method accelerates complex deep neural networks markedly, offers good flexibility and applicability, effectively reduces the storage and computation cost of intelligent computing, and provides technical support for simultaneously deploying deep neural networks that meet different application requirements in resource-constrained embedded environments.

Description

Intelligent computing general acceleration system facing embedded environment and construction method thereof
Technical Field
The invention belongs to the field of intelligent computing and provides an intelligent computing acceleration method for embedded environments.
Background
In recent years, deep neural networks, represented by convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have shown great advantages in handling complex intelligence problems such as computer vision and speech recognition, and can serve a variety of applications: (1) decision support based on integrated situation information; (2) real-time target detection, identification, and tracking; and the like. However, the performance of today's deep neural networks rests mainly on enormous parameter counts and the computing power of multi-GPU clusters; the resulting storage and computation cost makes it almost impossible to deploy a deep neural network directly on embedded devices with limited hardware resources. It is therefore necessary to solve the problem of deploying and optimizing diverse deep neural networks in resource-constrained embedded environments.
At present, research on intelligent computing acceleration for embedded environments follows two main directions. The first is based on customizable application-specific integrated circuits (ASICs), which can be tailored and optimized for a specific algorithm and offer low power consumption and high computational efficiency; however, ASICs lack a unified software and hardware development environment, have long development cycles and poor flexibility and generality, can accelerate only one specific deep neural network, and are difficult to bring into embedded environments in the short term. The second direction is based on programmable, semi-custom FPGAs, which offer excellent iteration speed and flexibility, keep the technology independently controllable, and suit the inference stage of deep neural networks; the obstacle is the considerable difficulty of FPGA sequential-logic design.
Existing acceleration methods therefore target neural networks of a single structure and can hardly satisfy the multiple application requirements of an embedded environment at the same time.
Disclosure of Invention
For a set of diverse deep neural networks, the invention provides a reconfigurable intelligent computing acceleration method for embedded environments that is general-purpose yet able to accelerate each deep neural network in a targeted way.
The technical solution of the invention is as follows:
The general-purpose intelligent computing acceleration system for embedded environments targets a set of multiple deep neural networks (the deep neural networks to be accelerated) and comprises a reconfigurable control unit, a reconfigurable storage unit, and a programmable computation acceleration unit.
The reconfigurable storage unit is built from reconfigurable hardware resources and dynamically configures an optimal storage architecture, layer by layer, for the loaded network model, reducing data dependence on external memory.
The programmable computation acceleration unit encapsulates a common processing unit library and a special processing unit library. The common processing unit library accelerates the parts shared across deep neural networks and provides two working modes, CNN and RNN; the special processing unit library accelerates the network-specific parts.
The reconfigurable control unit parses the structure-analysis data of the deep neural network model to be accelerated, loads the network model into the reconfigurable storage unit layer by layer, switches the common processing unit library between the CNN and RNN working modes, and, following the structure-analysis data, wakes or sleeps the required operators in the special processing unit library so as to accelerate each network layer in turn.
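For concreteness, the control flow just described can be sketched in software. The following is a minimal sketch, assuming hypothetical names throughout (LayerInfo, ReconfigurableController, switch_mode, wake, sleep); the patent specifies the behaviour, layer-by-layer loading with mode switching and operator wake/sleep, not an API.

```python
# Illustrative sketch of the reconfigurable control unit's dispatch loop.
# All identifiers are hypothetical stand-ins for hardware control logic.
from dataclasses import dataclass, field

@dataclass
class LayerInfo:
    kind: str                                        # e.g. "conv", "lstm", "fc"
    special_ops: list = field(default_factory=list)  # special-library operators needed

class ReconfigurableController:
    RNN_KINDS = {"lstm", "gru"}

    def __init__(self, common_lib, special_lib, storage):
        self.common_lib = common_lib    # dual-mode common processing unit library
        self.special_lib = special_lib  # gate / multiply-add special library
        self.storage = storage          # reconfigurable storage unit

    def accelerate(self, layers):
        for layer in layers:                       # load the model layer by layer
            self.storage.configure_for(layer)      # dynamic on-chip storage setup
            mode = "RNN" if layer.kind in self.RNN_KINDS else "CNN"
            self.common_lib.switch_mode(mode)      # CNN <-> RNN working mode
            for op in self.special_lib.operators:  # wake needed operators, sleep the rest
                if op.name in layer.special_ops:
                    op.wake()
                else:
                    op.sleep()
            self.common_lib.run(layer)             # accelerate the current layer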
Optionally, the common processing unit library contains a convolution operator, Sigmoid, Tanh, and ReLU activation-function operators, an LSTM operator, a GRU operator, and a fully connected operator.
Optionally, the special processing unit library includes a floating-point multiply-add operator, an update gate operator, a forget gate operator, an input gate operator, and an output gate operator.
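For orientation, the sketch below writes out one LSTM time step in numpy and annotates which library might serve each line: the multiply-add and the forget, input, and output gates from the special library, Sigmoid and Tanh from the common library. The arithmetic is the standard LSTM formulation; the per-line mapping to hardware operators is an illustrative assumption, not a statement of the patent's implementation.

```python
# Textbook LSTM step, annotated with the operators named above.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # common library: Sigmoid operator

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W, U, b stack the forget/input/output/candidate
    parameters, so W is (4H, D), U is (4H, H), b is (4H,)."""
    z = W @ x + U @ h + b                  # special library: floating-point multiply-add
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)                         # special library: forget gate operator
    i = sigmoid(i)                         # special library: input gate operator
    o = sigmoid(o)                         # special library: output gate operator
    c_new = f * c + i * np.tanh(g)         # common library: Tanh operator
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```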
Optionally, the reconfigurable hardware resources include the look-up tables (LUTs) and flip-flops of the FPGA.
Correspondingly, the invention also provides a method for constructing the general-purpose intelligent computing acceleration system for embedded environments, comprising the following steps (a control-flow sketch follows them):
A preprocessing stage: pre-train the deep neural network to be accelerated, analyze the structure of the deep neural network model, and extract the model's structural features.
A reconfigurable stage: parse the model's structure-analysis data, accelerate the computation of each network layer on programmable logic, and realize dynamic storage and dynamic configuration of the network model on reconfigurable hardware resources.
A post-processing stage: verify the acceleration effect of the reconfigurable stage. If the effect meets expectations, output the result in the required format; if not, analyze the model structure further, update and supplement the common and special processing unit libraries according to the model's structural features, package new operator modules (adding them to the corresponding library), return to the reconfigurable stage, and accelerate the model again.
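Viewed as control flow, the three stages form a loop that repeats until verification passes. The sketch below is a minimal rendering of that loop under the assumption that each stage is supplied as a callable; none of the function names come from the patent.

```python
# Minimal control-flow sketch of the three-stage construction method.
# The stage implementations are injected; all names are placeholders.
from typing import Any, Callable

def build_acceleration_system(
    model: Any,
    preprocess: Callable[[Any], Any],         # pre-train, lighten, analyze structure
    accelerate: Callable[[Any], Any],         # reconfigurable stage on programmable logic
    verify: Callable[[Any], bool],            # post-processing acceleration check
    extend_libraries: Callable[[Any], None],  # package and add new operator modules
    max_rounds: int = 3,
) -> Any:
    structure_data = preprocess(model)        # preprocessing stage
    for _ in range(max_rounds):
        result = accelerate(structure_data)   # reconfigurable stage
        if verify(result):                    # effect meets expectations:
            return result                     # output in the required format
        extend_libraries(result)              # otherwise update the operator libraries
    raise RuntimeError("acceleration effect still below expectations")
```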
The preprocessing stage may specifically comprise: obtaining the weight parameters through pre-training, preliminarily reducing the network scale with lightweight techniques, analyzing the structure of the network model, and extracting its structural features.
Applying lightweight techniques to preliminarily reduce the network scale may specifically comprise pruning, sparsification, data quantization, Huffman coding, and binarization/ternarization, which preliminarily reduce the parameter count of the network model.
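As one concrete instantiation of two of the listed techniques, the sketch below applies magnitude pruning followed by symmetric 8-bit quantization with numpy. The patent names the techniques without fixing their parameters, so the sparsity target and bit width here are illustrative choices.

```python
# Illustrative lightweighting pass: magnitude pruning + symmetric int8
# quantization. Sparsity and bit width are example values, not the patent's.
import numpy as np

def prune_by_magnitude(w: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero the smallest-magnitude weights until the target sparsity is reached."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric linear quantization to int8, returning values and a scale."""
    scale = float(np.abs(w).max()) / 127.0 or 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_int8(prune_by_magnitude(w))
print(f"nonzero weights: {np.count_nonzero(q)} / {q.size}, scale = {scale:.5f}")
```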
The invention has the following beneficial effects:
the reconfigurable intelligent computation acceleration method provided by the invention has universality and can pointedly accelerate the deep neural network (accelerate the acceleration operators in deep neural network switching processing unit libraries with different structures), and the defect that the existing network acceleration method can only accelerate a single network is overcome. Particularly, three processing units in a reconfigurable stage are matched efficiently and tightly, a reconfigurable control unit analyzes the model structure analysis data, a reconfigurable storage unit dynamically configures a storage space, on-chip resources are utilized to the maximum extent, a programmable computation acceleration unit fully exerts the advantage of FPGA dynamic reconfiguration, two extensible processing unit libraries are packaged, and the flexibility of network model hardware acceleration is effectively improved.
The method provided by the invention accelerates complex deep neural networks markedly, offers good flexibility and applicability, effectively reduces the storage and computation cost required by intelligent computing, and provides technical support for deploying deep neural networks that meet different application requirements in resource-constrained embedded environments.
Drawings
FIG. 1 is a schematic flow diagram of a reconfigurable intelligent computing acceleration method oriented to an embedded environment;
FIG. 2 is a schematic diagram of a reconfigurable intelligent computing acceleration architecture.
Detailed Description
The present invention is described in further detail below through an embodiment, with reference to the accompanying drawings.
To meet multiple application requirements of embedded environments, such as situation awareness (relying mainly on RNNs) and target identification (relying mainly on CNNs), the embodiment provides a reconfigurable intelligent computing acceleration method for a set of multiple deep neural networks, builds a general-purpose intelligent computing acceleration architecture, and verifies that architecture on a programmable FPGA hardware platform.
The reconfigurable intelligent computing acceleration method for embedded environments comprises three stages: a preprocessing stage, a reconfigurable stage, and a post-processing stage. As shown in FIG. 1, the method proceeds as follows.
First, the preprocessing stage: the deep neural network model to be accelerated is preprocessed, and its weight parameters are obtained through pre-training. Lightweight techniques such as pruning, sparsification, data quantization, Huffman coding, and binarization/ternarization preliminarily reduce the parameter count of the network model, and the model structure is analyzed, laying the foundation for intelligent computing acceleration in the reconfigurable stage.
Second, the reconfigurable stage: convolution in a CNN consumes large amounts of computing and storage resources, while the data flow in an RNN is complex. To satisfy the generality requirement of reconfigurable intelligent computing acceleration while improving network model performance in a targeted, maximal way, a general-purpose intelligent computing acceleration architecture is designed and implemented, as shown in FIG. 2. The architecture consists of a reconfigurable control unit, a programmable computation acceleration unit, and a reconfigurable storage unit.
1. The reconfigurable control unit parses the model structure-analysis data obtained in the preprocessing stage, loads the network model into on-chip data storage layer by layer, switches the common processing unit library in the programmable computation acceleration unit between the CNN and RNN working modes, and wakes or sleeps operators in the special processing unit library according to the structure-analysis data to accelerate the current layer. This process repeats until the accelerated computation of every layer of the network model is complete.
2. Following a modular design, a common processing unit library and a special processing unit library are encapsulated in the programmable computation acceleration unit:
A. The common processing unit library accelerates the parts shared across deep neural networks and provides the CNN and RNN working modes; it includes a convolution operator, Sigmoid, Tanh, and ReLU activation-function operators, an LSTM operator, a GRU operator, a fully connected operator, and other operators.
B. The special processing unit library accelerates the network-specific parts and comprises a floating-point multiply-add operator, an update gate operator, a forget gate operator, an input gate operator, and an output gate operator.
3. The reconfigurable storage unit adopts a reconfigurable hardware structure. Using reconfigurable hardware resources such as the FPGA's look-up tables (LUTs) and flip-flops, it dynamically configures an optimal storage architecture layer by layer for the network model loaded by the reconfigurable control unit, reducing data dependence on external memory and, with it, the latency and power consumption caused by data transfers.
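The storage decision this unit makes can be pictured with a simple working-set estimate: if a layer's weights and input feature map fit on chip, keep them there, otherwise fall back to external memory. The sketch below is a software analogy of that policy; the capacity, data width, and fit rule are illustrative assumptions, not values from the patent.

```python
# Software analogy of layer-by-layer storage configuration: estimate each
# layer's working set and place it on chip when it fits.
from dataclasses import dataclass

@dataclass
class ConvLayer:
    in_ch: int
    out_ch: int
    k: int        # kernel is k x k
    in_h: int
    in_w: int

    def working_set_bytes(self, bytes_per_elem: int = 1) -> int:
        """Weights plus input feature map, e.g. int8 after quantization."""
        weights = self.in_ch * self.out_ch * self.k * self.k
        ifmap = self.in_ch * self.in_h * self.in_w
        return (weights + ifmap) * bytes_per_elem

def plan_storage(layers, on_chip_capacity: int):
    """Assign each layer's working set to on-chip or external memory."""
    return [("on_chip" if layer.working_set_bytes() <= on_chip_capacity
             else "external", layer) for layer in layers]

layers = [ConvLayer(3, 16, 3, 224, 224), ConvLayer(16, 32, 3, 112, 112)]
print([place for place, _ in plan_storage(layers, on_chip_capacity=512 * 1024)])
```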
Because the common processing unit library is designed for the parts shared across network models, the architecture has a degree of generality; because the special processing unit library is designed for each network model's specific structures, the architecture remains flexible and efficient. The reconfigurable hardware structure dynamically configures the on-chip storage space, reducing the high latency and power consumption caused by data communication. The architecture also extends well: updating the processing unit libraries lets it keep pace with rapidly evolving deep neural network models. It thus fully exploits the dynamic reconfigurability of the FPGA, effectively raises the inference speed of deep neural networks, and suits embedded environments where hardware resources are scarce.
Third, the post-processing stage: a hardware verification environment for the reconfigurable intelligent computing acceleration architecture is built on the programmable FPGA and used to verify the acceleration effect of the reconfigurable stage. If the effect meets expectations, the result is output in the required format. If it does not, the computation units the network model needs are missing from the current programmable computation acceleration unit: the network model structure is analyzed further, the common (or special) processing unit library is updated and supplemented according to the model's features, new operator modules are packaged and added to that library, and the flow returns to the reconfigurable stage for a second round of acceleration.
The embodiment fully accounts for the demands, both shared and specific, that neural network models of different depths place on hardware resources. Taking CNNs and RNNs as examples: apart from its input and output layers, a CNN consists of multiple convolutional, pooling, and fully connected layers, whose acceleration operators are all contained in the common (or special) processing unit library; an RNN consists of multiple gated structures and, in this example, contains two structures specific to it.
CNN network acceleration: in the preprocessing stage, the CNN model is pre-trained to obtain its weight parameters, the network scale is reduced with lightweight techniques, the model is analyzed layer by layer, and the structure-analysis data and the network model are passed layer by layer to the reconfigurable intelligent computing acceleration architecture. The reconfigurable stage then accelerates the network model: the reconfigurable control unit loads the network model into the reconfigurable storage unit, which dynamically configures a suitable on-chip storage space for it. Meanwhile, the control unit matches the structure-analysis data against the processing unit libraries in the programmable computation acceleration unit, switches the common processing unit library to the CNN working mode, and wakes the corresponding operators in the special processing unit library. After acceleration completes, the post-processing stage reports the acceleration effect and outputs the result.
RNN network acceleration: the preprocessing stage mirrors that of the CNN (pre-training and model lightweighting), and the structure-analysis data and model parameters are passed together to the reconfigurable intelligent computing acceleration architecture. In the reconfigurable stage, the reconfigurable control unit loads the network model into the reconfigurable storage unit, which dynamically configures a suitable on-chip storage space for it. Meanwhile, the control unit matches the structure-analysis data against the processing unit libraries, switches the common processing unit library to the RNN working mode, and wakes the corresponding operators in the special processing unit library. Because this network contains two structures specific to it, verification in the post-processing stage finds the acceleration effect below expectations; the model is therefore analyzed further, the operators in the common (or special) processing unit library are updated for the specific structures, and if any structure is still not served, new operator modules are packaged and added to the library, after which the flow returns to the reconfigurable stage and the model is accelerated a second time. A sketch of one plausible form of this acceleration-effect check follows.
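The patent does not pin down how the acceleration effect is measured; one plausible form of the check compares the accelerated path against a software reference on both accuracy and latency. In the sketch below, accel_fn and ref_fn are hypothetical stand-ins for the FPGA path and the software reference, and the tolerance and speedup threshold are illustrative.

```python
# Sketch of the post-processing check: the accelerated path must match a
# software reference within tolerance and beat it by a target speedup.
import time
import numpy as np

def verify_acceleration(accel_fn, ref_fn, inputs,
                        atol: float = 1e-2, min_speedup: float = 2.0) -> bool:
    t0 = time.perf_counter()
    ref = [ref_fn(x) for x in inputs]       # software reference pass
    t1 = time.perf_counter()
    out = [accel_fn(x) for x in inputs]     # accelerated pass
    t2 = time.perf_counter()
    accurate = all(np.allclose(r, o, atol=atol) for r, o in zip(ref, out))
    speedup = (t1 - t0) / max(t2 - t1, 1e-9)
    return accurate and speedup >= min_speedup
```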

Claims (7)

1. An intelligent computing general acceleration system for embedded environments, directed to a set of multiple deep neural networks, characterized in that: the system comprises a reconfigurable control unit, a reconfigurable storage unit, and a programmable computation acceleration unit;
the reconfigurable storage unit adopts reconfigurable hardware resources and dynamically configures an optimal storage architecture, layer by layer, for the loaded network model, so as to reduce data dependence on external memory;
the programmable computation acceleration unit encapsulates a common processing unit library and a special processing unit library; the common processing unit library accelerates the parts shared across deep neural networks and provides two working modes, CNN and RNN; the special processing unit library accelerates the network-specific parts;
the reconfigurable control unit parses the structure-analysis data of the deep neural network model to be accelerated, loads the network model into the reconfigurable storage unit layer by layer, switches the common processing unit library in the programmable computation acceleration unit between the CNN and RNN working modes, and, following the model structure-analysis data, wakes or sleeps the required operators in the special processing unit library so as to accelerate each layer of the network in turn.
2. The intelligent computing general acceleration system for embedded environments of claim 1, characterized in that: the common processing unit library comprises a convolution operator, a Sigmoid activation-function operator, a Tanh activation-function operator, a ReLU activation-function operator, an LSTM operator, a GRU operator, and a fully connected operator.
3. The intelligent computing general acceleration system for embedded environments of claim 1, characterized in that: the special processing unit library comprises a floating-point multiply-add operator, an update gate operator, a forget gate operator, an input gate operator, and an output gate operator.
4. The intelligent computing general acceleration system for embedded environments of claim 1, characterized in that: the reconfigurable hardware resources comprise the look-up tables (LUTs) and flip-flops of the FPGA.
5. A method of constructing the intelligent computing general acceleration system for embedded environments of claim 1, comprising:
a preprocessing stage: pre-training the deep neural network to be accelerated, analyzing the structure of the deep neural network model, and extracting the model's structural features;
a reconfigurable stage: parsing the model's structure-analysis data, accelerating the computation of each layer of the network on programmable logic, and realizing dynamic storage and dynamic configuration of the network model on reconfigurable hardware resources; and
a post-processing stage: verifying the acceleration effect of the reconfigurable stage; if the acceleration effect meets expectations, outputting the result in the required format; if it does not, analyzing the model structure further, updating and supplementing the common processing unit library and the special processing unit library according to the model's structural features, packaging new operator modules, returning to the reconfigurable stage, and accelerating the model again.
6. The method of constructing the intelligent computing general acceleration system for embedded environments of claim 5, characterized in that: in the preprocessing stage, weight parameters are obtained through pre-training, the network scale is preliminarily reduced by applying lightweight techniques, the structure of the network model is analyzed, and its structural features are extracted.
7. The method of constructing the intelligent computing general acceleration system for embedded environments of claim 6, characterized in that: applying the lightweight techniques to preliminarily reduce the network scale specifically comprises adopting pruning, sparsification, data quantization, Huffman coding, and binarization/ternarization to preliminarily reduce the parameter count of the network model.
CN202010003133.XA 2020-01-02 2020-01-02 Intelligent computing general acceleration system facing embedded environment and construction method thereof Active CN111191772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010003133.XA CN111191772B (en) 2020-01-02 2020-01-02 Intelligent computing general acceleration system facing embedded environment and construction method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010003133.XA CN111191772B (en) 2020-01-02 2020-01-02 Intelligent computing general acceleration system facing embedded environment and construction method thereof

Publications (2)

Publication Number Publication Date
CN111191772A (en) 2020-05-22
CN111191772B (en) 2022-12-06

Family

ID=70708099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010003133.XA Active CN111191772B (en) 2020-01-02 2020-01-02 Intelligent computing general acceleration system facing embedded environment and construction method thereof

Country Status (1)

Country Link
CN (1) CN111191772B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171321A * 2017-12-07 2018-06-15 中国航空工业集团公司西安航空计算技术研究所 Deep neural network embedded design method based on an SoC chip
CN209231976U * 2018-12-29 2019-08-09 南京宁麒智能计算芯片研究院有限公司 Accelerator for reconfigurable neural network algorithms
CN110135572A * 2019-05-17 2019-08-16 南京航空航天大学 SoC-based trainable flexible CNN design method

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967572A (en) * 2020-07-10 2020-11-20 逢亿科技(上海)有限公司 FPGA-based YOLO V3 and YOLO V3 Tiny network switching method
CN112215071A (en) * 2020-09-10 2021-01-12 华蓝设计(集团)有限公司 Vehicle-mounted multi-target coupling identification and tracking method for automatic driving under heterogeneous traffic flow
CN112596718A (en) * 2020-12-24 2021-04-02 中国航空工业集团公司西安航空计算技术研究所 Hardware code generation and performance evaluation method
CN112906887A (en) * 2021-02-20 2021-06-04 上海大学 Sparse GRU neural network acceleration realization method and device
CN113780542A (en) * 2021-09-08 2021-12-10 北京航空航天大学杭州创新研究院 FPGA-oriented multi-target network structure construction method
CN113780542B (en) * 2021-09-08 2023-09-12 北京航空航天大学杭州创新研究院 Method for constructing multi-target network structure facing FPGA

Also Published As

Publication number Publication date
CN111191772B (en) 2022-12-06

Similar Documents

Publication Publication Date Title
CN111191772B (en) Intelligent computing general acceleration system facing embedded environment and construction method thereof
US20230297846A1 (en) Neural network compression method, apparatus and device, and storage medium
CN105159148A (en) Robot instruction processing method and device
Chen et al. OCEAN: An on-chip incremental-learning enhanced processor with gated recurrent neural network accelerators
Sun et al. A high-performance accelerator for large-scale convolutional neural networks
CN112633477A (en) Quantitative neural network acceleration method based on field programmable array
CN108345934A (en) A kind of activation device and method for neural network processor
Xiao et al. FPGA implementation of CNN for handwritten digit recognition
Vasquez et al. Activation density based mixed-precision quantization for energy efficient neural networks
Xiyuan et al. A Review of FPGA‐Based Custom Computing Architecture for Convolutional Neural Network Inference
Vu et al. Efficient optimization and hardware acceleration of cnns towards the design of a scalable neuro inspired architecture in hardware
Kouris et al. A throughput-latency co-optimised cascade of convolutional neural network classifiers
Li et al. High-performance convolutional neural network accelerator based on systolic arrays and quantization
CN108734270B (en) Compatible neural network accelerator and data processing method
Zong-ling et al. The design of lightweight and multi parallel CNN accelerator based on FPGA
Liu et al. A 1D-CRNN inspired reconfigurable processor for noise-robust low-power keywords recognition
CN113780542A (en) FPGA-oriented multi-target network structure construction method
CN113553031A (en) Software definition variable structure computing framework and left-right brain integrated resource joint distribution method realized by using same
CN117521752A (en) Neural network acceleration method and system based on FPGA
US11769036B2 (en) Optimizing performance of recurrent neural networks
Mazouz et al. Automated offline design-space exploration and online design reconfiguration for CNNs
CN116822600A (en) Neural network search chip based on RISC-V architecture
US20220284260A1 (en) Variable quantization for neural networks
Xia et al. PAI-FCNN: FPGA based inference system for complex CNN models
Zhao et al. Research on machine learning optimization algorithm of CNN for FPGA architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant