CN109583006B - Dynamic optimization method of field programmable gate array convolution layer based on cyclic cutting and rearrangement - Google Patents


Info

Publication number
CN109583006B
Authority
CN
China
Prior art keywords
independent
data sharing
loop
programmable gate
convolution layer
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811201717.7A
Other languages
Chinese (zh)
Other versions
CN109583006A (en)
Inventor
陈朋
陈庆清
王海霞
赵�智
刘义鹏
梁荣华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-10-16
Filing date
2018-10-16
Publication date
2023-07-21
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201811201717.7A
Publication of CN109583006A
Application granted
Publication of CN109583006B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/34 Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

A dynamic optimization method for a field programmable gate array (FPGA) convolution layer based on loop cutting and rearrangement. The method is developed on an FPGA platform with a high-level synthesis (HLS) tool: the loops of the convolution computation are cut (tiled) and rearranged to optimize the convolution layer, which lets the resource occupation and processing performance of the layer be tuned, makes full use of the parallel processing capability of the FPGA, and improves the performance of the convolutional neural network. The method can substantially increase the internal computation speed and efficiency of the convolution layer, thereby shortening computation time.

Description

Dynamic optimization method of field programmable gate array convolution layer based on cyclic cutting and rearrangement
Technical Field
The invention belongs to the technical field of digital image processing and pattern recognition, and in particular relates to a dynamic optimization method for a field programmable gate array convolution layer based on loop cutting and rearrangement, directed at the design of the convolution layer, the core part of a convolutional neural network algorithm.
Background
A convolutional neural network (CNN) is a multi-layer perceptron developed on the basis of artificial neural networks. It adapts well to image deformations such as translation, scaling and rotation, is highly effective at extracting image features, and achieves high accuracy by imitating the behavior of the optic nerve in living organisms. CNNs are widely used in machine vision, pattern recognition, video surveillance, image retrieval and related fields. A CNN is a computation-intensive structure: as model complexity grows, the number of parameters, the model size and the number of floating-point operations required all increase, which places higher demands on hardware resources and makes the model difficult to deploy on devices with limited storage space and battery life.
At present, most CNN systems are implemented on GPUs. Although GPUs have strong parallel computing capability and largely solve the problem of computing speed, GPU-based CNN accelerators often suffer from high power consumption, large physical size and high cost.
Compared with GPUs, FPGA chips, which contain large arrays of logic and arithmetic units, have clear advantages in size, power consumption and parallel operation. A CNN built from the FPGA's abundant logic resources together with dedicated multipliers, digital signal processing (DSP) blocks and similar resources can execute the algorithm's many repetitive, independent multiply-accumulate operations with a high degree of parallelism, keeping power consumption as low as possible while preserving computing capacity.
Traditionally, CNNs for FPGAs are designed in a register-transfer level (RTL) hardware description language, which involves a complex flow, a long design cycle and limited room for optimization. In particular, for FPGAs there is a lack of effective analysis of the characteristics of CNN implementation methods, while convolution places high demands on hardware resources. Designing a method for constructing CNNs is therefore especially important in edge computing environments. For FPGA-based CNN accelerators, development with a high-level synthesis tool offers good scalability and requires only a short design time. In this cross-level design approach, the algorithm is written in a high-level programming language and then converted, through compilation, semantic transformation, mapping, placement and routing, into an RTL description usable for FPGA design. Circuits produced by high-level synthesis can achieve good performance when logic resources are plentiful, but when device types are diverse and resources are constrained, the design methods and theory still need further exploration.
Disclosure of Invention
To overcome the excessive computation time of convolution layers in the prior art, the invention provides a dynamic optimization method for an FPGA convolution layer based on loop cutting and rearrangement, which can substantially increase the internal computation speed and efficiency and thereby shorten computation time.
The technical solution adopted to solve this technical problem is as follows:
a dynamic optimization method for an FPGA convolution layer based on loop cutting and rearrangement comprises the following steps:
1) Derive the calculation formula of the convolution layer from the computation process of the convolution operation;
2) Set the corresponding tiling (cutting) parameters and cut the loops of the convolution layer formula obtained in step 1) into two sub-loops;
3) Analyze the data sharing relationships of the loop iterators for the convolution layer formula obtained in step 1) and the sub-loops obtained in step 2);
4) Based on the data sharing relationships obtained in step 3), reorder and unroll the sub-loops obtained in step 2) within the high-level synthesis tool by inserting compiler directives that take effect during the conversion process;
5) Use the simulation tool of the high-level synthesis tool to generate a synthesis report that includes the proportion of resources used by the computation; compare this resource report with the resource constraints to judge whether the result is optimal under the current constraints, and if not, modify the tiling parameters or the loop order and repeat steps 2), 3) and 4);
6) Use the high-level synthesis tool to instantiate the convolution operation produced in step 5), converting the C code into Verilog, generating a register-transfer level circuit and the corresponding convolution layer functional module.
Further, in step 1), the convolution layer takes N input feature maps of size w×h as input. Each input feature map is convolved with M convolution kernel windows of size K×K that slide with stride S, which is typically smaller than K, and the contributions of all N input feature maps are accumulated to form M output feature maps of size R×C. The formula is as follows:
where OUT denotes the set of output feature maps, IN the set of input feature maps, and W the set of weights.
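The formula itself is not reproduced in this text. For orientation, the standard direct-convolution formula consistent with the definitions above (assuming no zero padding and no bias term, neither of which is specified here) is:

```latex
\mathrm{OUT}[m][r][c] = \sum_{n=0}^{N-1} \sum_{i=0}^{K-1} \sum_{j=0}^{K-1}
    W[m][n][i][j] \cdot \mathrm{IN}[n][S r + i][S c + j],
\qquad 0 \le m < M,\quad 0 \le r < R,\quad 0 \le c < C,
\qquad R = \left\lfloor \frac{h - K}{S} \right\rfloor + 1,\quad
       C = \left\lfloor \frac{w - K}{S} \right\rfloor + 1 .
```

Under this formulation the layer performs M·N·R·C·K² multiply-accumulate operations. For a hypothetical layer with M = 64, N = 32, R = C = 56 and K = 3, that is 64 · 32 · 56 · 56 · 9 = 57,802,752 operations, far more than an FPGA can execute in parallel if the loops were fully unrolled.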
In step 2), the computation of the convolution layer is divided into two sub-loops, one of which is given by the following formula:
where the combination <Tm, Tn, Tr, Tc> is the set of tiling parameters: Tm, Tn, Tr and Tc are the tile sizes along the output feature map depth, the input feature map depth, and the output feature map rows and columns, respectively. The other sub-loop is given by the following formula:
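The two sub-loop formulas are likewise not reproduced here. One common way to express such a split (an assumption, using the standard convolution formulation and the loop iterators trr, tcc, too, tii, i and j that appear in the data sharing table) is an outer sub-loop that steps over tile origins in strides of the tile sizes Tm, Tn, Tr and Tc, and an inner sub-loop that computes one tile:

```latex
\begin{aligned}
&\text{Outer sub-loop: } \mathit{row} \in \{0, T_r, 2T_r, \ldots\} < R,\;
  \mathit{col} \in \{0, T_c, 2T_c, \ldots\} < C,\;
  \mathit{to} \in \{0, T_m, 2T_m, \ldots\} < M,\;
  \mathit{ti} \in \{0, T_n, 2T_n, \ldots\} < N \\
&\text{Inner sub-loop: } \mathrm{OUT}[\mathit{too}][\mathit{trr}][\mathit{tcc}] \mathrel{+}=
  W[\mathit{too}][\mathit{tii}][i][j] \cdot
  \mathrm{IN}[\mathit{tii}][S\,\mathit{trr} + i][S\,\mathit{tcc} + j], \\
&\qquad \mathit{trr} \in [\mathit{row}, \min(\mathit{row} + T_r, R)),\;
  \mathit{tcc} \in [\mathit{col}, \min(\mathit{col} + T_c, C)), \\
&\qquad \mathit{too} \in [\mathit{to}, \min(\mathit{to} + T_m, M)),\;
  \mathit{tii} \in [\mathit{ti}, \min(\mathit{ti} + T_n, N)),\; 0 \le i, j < K
\end{aligned}
```

The outer sub-loop then invokes the inner sub-loop ⌈M/Tm⌉·⌈N/Tn⌉·⌈R/Tr⌉·⌈C/Tc⌉ times, and each invocation touches only one tile of IN, W and OUT, which is what allows those tiles to be held in on-chip buffers.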
In step 3), based on the convolution formula obtained in step 1), the data sharing relationship between a loop dimension and an array across different loop iterations falls into one of three types: irrelevant, independent and dependent.
I) Irrelevant: if loop iterator i_k does not appear in any access function of array A, the corresponding loop dimension is irrelevant to array A;
II) Independent: if the union of the data space accessed by array A is completely separable along loop dimension i_k, or if for any two distinct values p_1 and p_2 the data spaces accessed with i_k = p_1 and i_k = p_2 are disjoint, then loop dimension i_k is independent of array A;
III) Dependent: if the union of the data space accessed on array A cannot be separated along loop dimension i_k, then loop dimension i_k is dependent on array A.
The data sharing relationship of each loop iterator with each array is tabulated under step 3) of the Detailed Description.
From a hardware implementation perspective, an independent relationship produces direct connections between buffers and computing modules, an irrelevant relationship produces broadcast connections, and a dependent relationship produces interconnections with complex topologies.
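A short worked example makes the dependent case concrete. It assumes the standard access functions IN[tii][S·trr + i][S·tcc + j] and OUT[too][trr][tcc] with illustrative values S = 1 and K = 3, and compares the rows touched by two neighboring values of trr:

```latex
\begin{aligned}
\mathit{trr} = 0:&\quad \text{IN rows } \{S \cdot 0 + i : 0 \le i < K\} = \{0, 1, 2\},
  &\quad \text{OUT rows } \{0\} \\
\mathit{trr} = 1:&\quad \text{IN rows } \{S \cdot 1 + i : 0 \le i < K\} = \{1, 2, 3\},
  &\quad \text{OUT rows } \{1\} \\
&\quad \{0, 1, 2\} \cap \{1, 2, 3\} = \{1, 2\} \neq \varnothing,
  &\quad \{0\} \cap \{1\} = \varnothing
\end{aligned}
```

The IN rows overlap, so trr is dependent for IN; the OUT rows are disjoint, so trr is independent for OUT; and trr does not appear in W at all, so it is irrelevant to W. In general, the rows for trr = p and trr = p + 1 overlap exactly when S < K, which is the typical case named in step 1).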
In step 4), the generated hardware structure is optimized. One optimization technique is loop unrolling; the other key technique is loop pipelining, which overlaps the execution of operations from different loop iterations.
To optimize the sub-loops obtained by tiling in step 2), the inner loops are first reordered according to the analyzed data sharing relationships, the innermost loops are then unrolled, and loop pipelining is added at the same time to increase system throughput. The optimized computation is given by the following formula:
where F(x) denotes loop unrolling and L(x) denotes loop pipelining.
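As a rough, assumed model of what this optimization buys (not stated in the text): if the depth loops too and tii are fully unrolled, so that each pipeline step performs Tm·Tn multiply-accumulates, and the remaining inner loops are pipelined with an initiation interval of 1, then, ignoring pipeline fill and off-chip transfers,

```latex
\text{cycles} \approx
  \left\lceil \frac{M}{T_m} \right\rceil \left\lceil \frac{N}{T_n} \right\rceil
  \left\lceil \frac{R}{T_r} \right\rceil \left\lceil \frac{C}{T_c} \right\rceil
  \cdot T_r \, T_c \, K^2,
\qquad \text{parallel multipliers} \approx T_m T_n .
```

For a hypothetical layer with M = 64, N = 32, R = C = 56, K = 3 and tiling <16, 8, 14, 14>, this gives 4 · 4 · 4 · 4 · (14 · 14 · 9) = 256 · 1764 = 451,584 cycles on 128 multipliers, against 57,802,752 cycles if the same multiply-accumulates ran one per cycle. Larger Tm·Tn shortens latency but consumes more DSP resources, which is the trade-off step 5) evaluates.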
Beneficial effects of the invention: the convolution layer is optimized with a high-level synthesis tool using a dynamic optimization method based on loop cutting and rearrangement, which improves the computational efficiency of convolutional neural networks on FPGAs. The method is broadly applicable: it can be extended to other structures such as pooling layers and fully connected layers, and applied to different convolutional neural network models.
Drawings
FIG. 1 is a flow chart of a method of dynamic optimization of a field programmable gate array convolutional layer based on loop cut and reorder;
FIG. 2 is a diagram of a calculation process of a convolution layer;
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIG. 1 and FIG. 2, a dynamic optimization method for an FPGA convolution layer based on loop cutting and rearrangement comprises the following steps:
1) The computation of the convolution layer is shown in FIG. 2. The convolution layer takes N input feature maps of size w×h as input. Each input feature map is convolved with M convolution kernel windows of size K×K that slide with stride S, which is generally smaller than K, and the contributions of all N input feature maps are accumulated to form M output feature maps of size R×C. The formula is as follows:
where OUT denotes the set of output feature maps, IN the set of input feature maps, and W the set of weights;
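As a reference point for the later steps, the convolution can be written as a plain C loop nest, the kind of input an HLS tool accepts. This is a minimal sketch, not the patent's code: it assumes the standard formulation with no padding or bias, and the layer sizes below are hypothetical.

```c
#include <assert.h>

/* Hypothetical layer dimensions, chosen only for illustration. */
#define M 4                     /* number of output feature maps */
#define N 3                     /* number of input feature maps */
#define K 3                     /* kernel window size (K x K) */
#define S 1                     /* window stride */
#define R 5                     /* output feature map rows */
#define C 5                     /* output feature map columns */
#define IN_H ((R - 1) * S + K)  /* input height h (no padding assumed) */
#define IN_W ((C - 1) * S + K)  /* input width w (no padding assumed) */

static_assert(S >= 1 && K >= 1, "stride and kernel size must be positive");

/* Direct convolution:
 * OUT[m][r][c] = sum over n, i, j of W[m][n][i][j] * IN[n][S*r + i][S*c + j]. */
void conv_ref(float out[M][R][C], float in[N][IN_H][IN_W], float w[M][N][K][K])
{
    for (int m = 0; m < M; m++)              /* output feature map */
        for (int r = 0; r < R; r++)          /* output row */
            for (int c = 0; c < C; c++) {    /* output column */
                float acc = 0.0f;
                for (int n = 0; n < N; n++)          /* input feature map */
                    for (int i = 0; i < K; i++)      /* kernel row */
                        for (int j = 0; j < K; j++)  /* kernel column */
                            acc += w[m][n][i][j] * in[n][S * r + i][S * c + j];
                out[m][r][c] = acc;
            }
}
```

Written this way, all six loops run sequentially; steps 2) to 4) restructure exactly this nest.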
2) Because FPGA resources are limited, the loops cannot be fully unrolled, so the computation of the convolution layer is divided into two sub-loops, one of which is given by the following formula:
where the combination <Tm, Tn, Tr, Tc> is the set of tiling parameters: Tm, Tn, Tr and Tc are the tile sizes along the output feature map depth, the input feature map depth, and the output feature map rows and columns, respectively. The other sub-loop is given by the following formula:
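The two-level split of step 2) can be sketched in C as follows. This is an illustrative assumption rather than the patent's exact formulation: the outer sub-loop walks the tile origins (row, col, to, ti), the inner sub-loop computes one tile with the iterators trr, tcc, too and tii from the data sharing table, and MIN() clamps partial tiles at the layer edges. TM, TN, TR and TC stand for Tm, Tn, Tr and Tc; all sizes are hypothetical.

```c
#include <assert.h>

/* Hypothetical layer dimensions. */
#define M 4
#define N 3
#define K 3
#define S 1
#define R 5
#define C 5
#define IN_H ((R - 1) * S + K)
#define IN_W ((C - 1) * S + K)

/* Tile sizes <Tm, Tn, Tr, Tc>; deliberately not divisors of M, N, R, C so the
 * partial-tile handling below is exercised. */
#define TM 3
#define TN 2
#define TR 2
#define TC 3

#define MIN(a, b) ((a) < (b) ? (a) : (b))

static_assert(TM > 0 && TN > 0 && TR > 0 && TC > 0, "tile sizes must be positive");

void conv_tiled(float out[M][R][C], float in[N][IN_H][IN_W], float w[M][N][K][K])
{
    for (int m = 0; m < M; m++)
        for (int r = 0; r < R; r++)
            for (int c = 0; c < C; c++)
                out[m][r][c] = 0.0f;

    /* Outer sub-loop: one iteration per tile origin. */
    for (int row = 0; row < R; row += TR)
        for (int col = 0; col < C; col += TC)
            for (int to = 0; to < M; to += TM)
                for (int ti = 0; ti < N; ti += TN)
                    /* Inner sub-loop: all work belonging to one tile. */
                    for (int trr = row; trr < MIN(row + TR, R); trr++)
                        for (int tcc = col; tcc < MIN(col + TC, C); tcc++)
                            for (int too = to; too < MIN(to + TM, M); too++)
                                for (int tii = ti; tii < MIN(ti + TN, N); tii++)
                                    for (int i = 0; i < K; i++)
                                        for (int j = 0; j < K; j++)
                                            out[too][trr][tcc] +=
                                                w[too][tii][i][j] *
                                                in[tii][S * trr + i][S * tcc + j];
}
```

Every (too, trr, tcc, tii, i, j) combination is still visited exactly once, so the result matches the direct loop nest up to floating-point summation order; the tile sizes only change how much data each inner sub-loop touches.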
3) Based on the convolution formula obtained in step 1), the data sharing relationship between a loop dimension and an array across different loop iterations falls into one of three types: irrelevant, independent and dependent.
I) Irrelevant. If loop iterator i_k does not appear in any access function of array A, the corresponding loop dimension is irrelevant to array A.
II) Independent. If the union of the data space accessed by array A is completely separable along loop dimension i_k, or if for any two distinct values p_1 and p_2 the data spaces accessed with i_k = p_1 and i_k = p_2 are disjoint, then loop dimension i_k is independent of array A.
III) Dependent. If the union of the data space accessed on array A cannot be separated along loop dimension i_k, then loop dimension i_k is dependent on array A.
The data sharing relationships are shown in the table below, where trr and tcc iterate over the rows and columns of an output tile, too and tii over the output and input feature maps, and i and j over the rows and columns of the kernel window.
Loop   Input IN      Weight W      Output OUT
trr    Dependent     Irrelevant    Independent
tcc    Dependent     Irrelevant    Independent
too    Irrelevant    Independent   Independent
tii    Independent   Independent   Irrelevant
i      Dependent     Independent   Irrelevant
j      Dependent     Independent   Irrelevant
From a hardware implementation perspective, an independent relationship produces direct connections between buffers and computing modules, an irrelevant relationship produces broadcast connections, and a dependent relationship produces interconnections with complex topologies;
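When the access functions are affine, the classification can be checked mechanically. The sketch below is an assumed, simplified implementation of the three rules, not the patent's procedure: each array index is stored as integer coefficients over trr, tcc, too, tii, i and j, and iterator k counts as separating array A along a dimension when its coefficient exceeds the largest offset the other iterators can add there, so that distinct values of k touch disjoint index ranges. The access functions IN[tii][S·trr+i][S·tcc+j], W[too][tii][i][j] and OUT[too][trr][tcc] come from the standard convolution formulation, and all function and type names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Loop iterators of the inner sub-loop, in table order. */
enum iter { IT_TRR, IT_TCC, IT_TOO, IT_TII, IT_I, IT_J, NUM_IT };
enum share { IRRELEVANT, INDEPENDENT, DEPENDENT };

#define MAX_DIMS 4

/* Affine access: index of dimension d = sum over k of coef[d][k] * iterator_k. */
struct access {
    const char *name;
    int dims;
    int coef[MAX_DIMS][NUM_IT];
};

/* Classify iterator k against one array. Assumes non-negative coefficients
 * and that iterator l ranges over [0, extent[l]) with extent[l] >= 1. */
enum share classify(const struct access *a, int k, const int extent[NUM_IT])
{
    assert(a->dims <= MAX_DIMS);
    int appears = 0;
    for (int d = 0; d < a->dims; d++) {
        if (a->coef[d][k] == 0)
            continue;
        appears = 1;
        int span = 0;  /* largest offset the other iterators add in dimension d */
        for (int l = 0; l < NUM_IT; l++)
            if (l != k)
                span += a->coef[d][l] * (extent[l] - 1);
        if (a->coef[d][k] > span)
            return INDEPENDENT;  /* distinct values of k give disjoint slices */
    }
    return appears ? DEPENDENT : IRRELEVANT;
}

/* IN[tii][S*trr + i][S*tcc + j], W[too][tii][i][j], OUT[too][trr][tcc]. */
void conv_accesses(int stride, struct access acc[3])
{
    static const struct access zero;  /* all-zero template */
    for (int a = 0; a < 3; a++)
        acc[a] = zero;

    acc[0].name = "IN";
    acc[0].dims = 3;
    acc[0].coef[0][IT_TII] = 1;
    acc[0].coef[1][IT_TRR] = stride;
    acc[0].coef[1][IT_I] = 1;
    acc[0].coef[2][IT_TCC] = stride;
    acc[0].coef[2][IT_J] = 1;

    acc[1].name = "W";
    acc[1].dims = 4;
    acc[1].coef[0][IT_TOO] = 1;
    acc[1].coef[1][IT_TII] = 1;
    acc[1].coef[2][IT_I] = 1;
    acc[1].coef[3][IT_J] = 1;

    acc[2].name = "OUT";
    acc[2].dims = 3;
    acc[2].coef[0][IT_TOO] = 1;
    acc[2].coef[1][IT_TRR] = 1;
    acc[2].coef[2][IT_TCC] = 1;
}

void print_sharing_table(int stride, const int extent[NUM_IT])
{
    static const char *const it_names[NUM_IT] = { "trr", "tcc", "too", "tii", "i", "j" };
    static const char *const share_names[] = { "Irrelevant", "Independent", "Dependent" };
    struct access acc[3];
    conv_accesses(stride, acc);
    printf("%-6s%-13s%-13s%s\n", "Loop", acc[0].name, acc[1].name, acc[2].name);
    for (int k = 0; k < NUM_IT; k++)
        printf("%-6s%-13s%-13s%s\n", it_names[k],
               share_names[classify(&acc[0], k, extent)],
               share_names[classify(&acc[1], k, extent)],
               share_names[classify(&acc[2], k, extent)]);
}
```

With stride 1, K = 3 and tile extents of at least 2, print_sharing_table should reproduce the table above. With S ≥ K, trr and tcc become independent of IN, since neighboring windows no longer overlap.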
4) The high-level synthesis tool can optimize the generated hardware structure when compiler directives are inserted during the conversion process. One such technique is loop unrolling, which turns sequentially executed loop iterations into parallel operations and thereby increases computation speed. Another key technique, loop pipelining, increases system throughput by overlapping the execution of operations from different loop iterations.
The sub-loops obtained by tiling in step 2) are optimized by first reordering the inner loops according to the analyzed data sharing relationships, then unrolling the innermost loops, and at the same time adding loop pipelining to increase system throughput. The optimized computation is given by the following formula:
where F(x) denotes loop unrolling and L(x) denotes loop pipelining;
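Step 4) can be illustrated with Xilinx (AMD) Vivado/Vitis HLS-style directives. In this sketch, which is an assumption and not the patent's code, the inner sub-loop is reordered to i, j, trr, tcc, too, tii so that the depth loops too and tii (irrelevant to IN and OUT respectively in the table, hence broadcast connections) sit innermost and are fully unrolled into a TM × TN array of multiply-accumulate units, while the tcc loop is pipelined. An ordinary C compiler ignores `#pragma HLS` lines (possibly with an unknown-pragma warning), so the sketch also runs as plain C. Sizes are hypothetical, and a real design would also copy each tile into on-chip buffers, which is omitted here.

```c
#include <assert.h>

/* Hypothetical sizes; this sketch assumes the tiles divide the layer evenly. */
#define M 4
#define N 3
#define K 3
#define S 1
#define R 4
#define C 4
#define IN_H ((R - 1) * S + K)
#define IN_W ((C - 1) * S + K)
#define TM 2   /* Tm */
#define TN 3   /* Tn */
#define TR 2   /* Tr */
#define TC 2   /* Tc */

static_assert(M % TM == 0 && N % TN == 0 && R % TR == 0 && C % TC == 0,
              "tile sizes must divide the layer dimensions in this sketch");

/* Inner sub-loop for one tile, reordered to i, j, trr, tcc, too, tii. */
static void conv_tile(float out[M][R][C], float in[N][IN_H][IN_W],
                      float w[M][N][K][K], int to, int ti, int row, int col)
{
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            for (int trr = row; trr < row + TR; trr++)
                for (int tcc = col; tcc < col + TC; tcc++) {
#pragma HLS PIPELINE II=1
                    /* Fully unrolled: TM x TN multiplies per pipeline step. */
                    for (int too = to; too < to + TM; too++) {
#pragma HLS UNROLL
                        for (int tii = ti; tii < ti + TN; tii++) {
#pragma HLS UNROLL
                            out[too][trr][tcc] +=
                                w[too][tii][i][j] * in[tii][S * trr + i][S * tcc + j];
                        }
                    }
                }
}

/* Outer sub-loop over tile origins. */
void conv_layer(float out[M][R][C], float in[N][IN_H][IN_W], float w[M][N][K][K])
{
    for (int m = 0; m < M; m++)
        for (int r = 0; r < R; r++)
            for (int c = 0; c < C; c++)
                out[m][r][c] = 0.0f;

    for (int row = 0; row < R; row += TR)
        for (int col = 0; col < C; col += TC)
            for (int to = 0; to < M; to += TM)
                for (int ti = 0; ti < N; ti += TN)
                    conv_tile(out, in, w, to, ti, row, col);
}
```

Whether the tool actually reaches an initiation interval of 1 depends on memory ports and on the accumulation into out, which is what the synthesis report in step 5) is used to check. In Vivado HLS, loops nested inside a pipelined loop are unrolled automatically, so the two UNROLL directives mainly document intent.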
5) Because FPGA resources are limited, the resource usage of the computation optimized by the tiling and reordering of steps 2) and 4) must be evaluated. The simulation tool of the high-level synthesis tool is used to generate a synthesis report, which includes the proportion of resources used by the computation;
the resulting resource report is compared with the resource constraints to judge whether the result is optimal under the current constraints; if not, the tiling parameters or the loop order are modified and steps 2), 3) and 4) are repeated;
6) The high-level synthesis tool is used to instantiate the convolution operation produced in step 5), converting the C code into Verilog, generating a register-transfer level circuit and the corresponding functional module.
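The evaluate-and-retry loop of step 5) can be sketched as a search over tilings. The patent takes the resource figures from the HLS synthesis report; the sketch below swaps in a simple analytic model instead, purely as an assumption for illustration. DSP usage is Tm·Tn times a hypothetical dsp_per_mac factor, buffer size is the words of one input, weight and output tile, and latency is ⌈M/Tm⌉·⌈N/Tn⌉·⌈R/Tr⌉·⌈C/Tc⌉·Tr·Tc·K² cycles, which assumes too and tii are fully unrolled and the remaining loops pipelined at II = 1. All names and budget values are hypothetical.

```c
#include <assert.h>

/* Hypothetical layer shape, resource budget and candidate design. */
struct layer  { int M, N, R, C, K, S; };
struct budget { long dsp; long buffer_words; int dsp_per_mac; };
struct design { int tm, tn, tr, tc; long cycles, dsp, words; };

static long ceil_div(long a, long b) { return (a + b - 1) / b; }

/* Analytic stand-in for the HLS synthesis report (an assumed model). */
static struct design estimate(const struct layer *L, const struct budget *B,
                              int tm, int tn, int tr, int tc)
{
    assert(tm > 0 && tn > 0 && tr > 0 && tc > 0);
    struct design d = { tm, tn, tr, tc, 0, 0, 0 };
    d.cycles = ceil_div(L->M, tm) * ceil_div(L->N, tn) *
               ceil_div(L->R, tr) * ceil_div(L->C, tc) *
               (long)tr * tc * L->K * L->K;
    d.dsp = (long)tm * tn * B->dsp_per_mac;
    long in_h = (long)(tr - 1) * L->S + L->K;  /* input tile rows */
    long in_w = (long)(tc - 1) * L->S + L->K;  /* input tile columns */
    d.words = tn * in_h * in_w                 /* input tile buffer */
            + (long)tm * tn * L->K * L->K      /* weight tile buffer */
            + (long)tm * tr * tc;              /* output tile buffer */
    return d;
}

/* Exhaustive search over <Tm, Tn, Tr, Tc>: keep the fastest design that meets
 * the budget, preferring fewer DSPs on ties. Returns 0 if nothing fits. */
int explore(const struct layer *L, const struct budget *B, struct design *best)
{
    int found = 0;
    for (int tm = 1; tm <= L->M; tm++)
        for (int tn = 1; tn <= L->N; tn++)
            for (int tr = 1; tr <= L->R; tr++)
                for (int tc = 1; tc <= L->C; tc++) {
                    struct design d = estimate(L, B, tm, tn, tr, tc);
                    if (d.dsp > B->dsp || d.words > B->buffer_words)
                        continue;  /* violates the resource constraint */
                    if (!found || d.cycles < best->cycles ||
                        (d.cycles == best->cycles && d.dsp < best->dsp)) {
                        *best = d;
                        found = 1;
                    }
                }
    return found;
}
```

For example, for a hypothetical layer with M = 8, N = 4, R = C = 8, K = 3, S = 1, a budget of 16 DSPs and dsp_per_mac = 1, the best estimate is 1152 cycles with Tm·Tn = 16. In the patent's flow, each candidate would instead be synthesized and its report compared with the constraints.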
The embodiments described above are preferred embodiments of the present invention, but embodiments of the present invention are not limited to them. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principles of the present invention shall be regarded as an equivalent replacement and is included within the scope of protection of the present invention.

Claims (4)

1. A dynamic optimization method for a field programmable gate array convolution layer based on loop cutting and rearrangement, the method comprising the following steps:
1) Deriving the calculation formula of the convolution layer from the computation process of the convolution operation;
2) Setting the corresponding tiling parameters and cutting the loops of the convolution layer formula obtained in step 1) into two sub-loops;
3) Analyzing the data sharing relationships of the loop iterators for the convolution layer formula obtained in step 1) and the sub-loops obtained in step 2);
4) According to the data sharing relationships obtained in step 3), reordering and unrolling the sub-loops obtained in step 2) in a high-level synthesis tool by inserting compiler directives during the conversion process;
5) Generating a synthesis report with the simulation tool of the high-level synthesis tool, the report including the proportion of resources used by the computation; comparing the resource report with the resource constraints to judge whether the result is optimal under the current constraints, and if not, modifying the tiling parameters or the loop order and repeating steps 2), 3) and 4);
6) Instantiating the convolution operation produced in step 5) with the high-level synthesis tool, converting the C code into Verilog, generating a register-transfer level circuit, and generating the corresponding convolution layer functional module;
wherein in step 2), the computation of the convolution layer is divided into two sub-loops, one of which is given by the following formula:
where the combination <Tm, Tn, Tr, Tc> is the set of tiling parameters, Tm, Tn, Tr and Tc being the tile sizes along the output feature map depth, the input feature map depth, and the output feature map rows and columns, respectively, and the other sub-loop is given by the following formula:
2. The dynamic optimization method for a field programmable gate array convolution layer based on loop cutting and rearrangement according to claim 1, wherein in step 1), the convolution layer takes N input feature maps of size w×h as input, each input feature map is convolved with M convolution kernel windows of size K×K, the window stride S is smaller than K, and the N input feature maps together form M output feature maps of size R×C, according to the following formula:
where OUT denotes the set of output feature maps, IN the set of input feature maps, and W the set of weights.
3. The dynamic optimization method for a field programmable gate array convolution layer based on loop cutting and rearrangement according to claim 1 or 2, wherein in step 3), based on the convolution formula obtained in step 1), the data sharing relationship between different loop iterations is divided into three types: irrelevant, independent and dependent;
I) Irrelevant: if loop iterator i_k does not appear in any access function of array A, the corresponding loop dimension is irrelevant to array A;
II) Independent: if the union of the data space accessed by array A is completely separable along loop dimension i_k, or if for any two distinct values p_1 and p_2 the data spaces accessed with i_k = p_1 and i_k = p_2 are disjoint, then loop dimension i_k is independent of array A;
III) Dependent: if the union of the data space accessed on array A cannot be separated along loop dimension i_k, then loop dimension i_k is dependent on array A;
the data sharing relationships of trr with the input IN, the weight W and the output OUT are dependent, irrelevant and independent, respectively;
the data sharing relationships of tcc with the input IN, the weight W and the output OUT are dependent, irrelevant and independent, respectively;
the data sharing relationships of too with the input IN, the weight W and the output OUT are irrelevant, independent and independent, respectively;
the data sharing relationships of tii with the input IN, the weight W and the output OUT are independent, independent and irrelevant, respectively;
the data sharing relationships of i with the input IN, the weight W and the output OUT are dependent, independent and irrelevant, respectively;
the data sharing relationships of j with the input IN, the weight W and the output OUT are dependent, independent and irrelevant, respectively;
from a hardware implementation perspective, an independent relationship produces direct connections between buffers and computing modules, an irrelevant relationship produces broadcast connections, and a dependent relationship produces interconnections with complex topologies.
4. The dynamic optimization method for a field programmable gate array convolution layer based on loop cutting and rearrangement according to claim 1 or 2, wherein in step 4), the generated hardware structure is optimized, one optimization technique being loop unrolling and the other key optimization technique being loop pipelining, which overlaps the execution of operations from different loop iterations;
the sub-loops obtained by tiling in step 2) are optimized by first reordering the inner loops according to the analyzed data sharing relationships, then unrolling the innermost loops, and at the same time adding loop pipelining to increase system throughput, the optimized computation being given by the following formula:
where F(x) denotes loop unrolling and L(x) denotes loop pipelining.
CN201811201717.7A 2018-10-16 2018-10-16 Dynamic optimization method of field programmable gate array convolution layer based on cyclic cutting and rearrangement Active CN109583006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811201717.7A CN109583006B (en) 2018-10-16 2018-10-16 Dynamic optimization method of field programmable gate array convolution layer based on cyclic cutting and rearrangement

Publications (2)

Publication Number Publication Date
CN109583006A CN109583006A (en) 2019-04-05
CN109583006B true CN109583006B (en) 2023-07-21

Family

ID=65920178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811201717.7A Active CN109583006B (en) 2018-10-16 2018-10-16 Dynamic optimization method of field programmable gate array convolution layer based on cyclic cutting and rearrangement

Country Status (1)

Country Link
CN (1) CN109583006B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084363B (en) * 2019-05-15 2023-04-25 电科瑞达(成都)科技有限公司 Deep learning model acceleration method based on FPGA platform
CN111176962B (en) * 2019-12-02 2021-09-10 深圳先进技术研究院 FPGA platform, performance evaluation and design optimization method thereof and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101118622A (en) * 2007-05-25 2008-02-06 清华大学 Minisize rudders three-dimensional track emulation method under city environment
CN107170024A (en) * 2017-04-01 2017-09-15 武汉市真意境文化科技有限公司 One kind is based on VR environment two dimension view generation methods and system
CN107368621A (en) * 2017-06-06 2017-11-21 中国核电工程有限公司 By the method for three-dimensional rack model generation CAD form two dimension standard three-view diagrams in PDMS
CN109598785A (en) * 2018-11-28 2019-04-09 佛山科学技术学院 A kind of three-dimensional grid model view conversion method

Non-Patent Citations (1)

Title
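Lu Zhi et al., "Embedded FPGA Convolutional Neural Network Construction Method for Edge Computing" (面向边缘计算的嵌入式FPGA卷积神经网络构建方法), Journal of Computer Research and Development (计算机研究与发展), 2018-03-31, pp. 552-555 *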

Also Published As

Publication number Publication date
CN109583006A (en) 2019-04-05

Similar Documents

Publication Publication Date Title
CN110321999B (en) Neural network computational graph optimization method
CN111178519B (en) Convolutional neural network acceleration engine, convolutional neural network acceleration system and method
CN111967468A (en) FPGA-based lightweight target detection neural network implementation method
CN109784489A (en) Convolutional neural networks IP kernel based on FPGA
CN109740731B (en) Design method of self-adaptive convolution layer hardware accelerator
CN111626300A (en) Image semantic segmentation model and modeling method based on context perception
CN109583006B (en) Dynamic optimization method of field programmable gate array convolution layer based on cyclic cutting and rearrangement
CN112288082A (en) Design method of reconfigurable universal standard convolution accelerator based on HLS
JP2022533704A (en) Classifying Patterns in Electronic Circuit Layouts Using Machine Learning-Based Encoding
CN111861906A (en) Pavement crack image virtual augmentation model establishment and image virtual augmentation method
Li et al. Optimizing the deep neural networks by layer-wise refined pruning and the acceleration on FPGA
CN116740527A (en) Remote sensing image change detection method combining U-shaped network and self-attention mechanism
CN113222998A (en) Semi-supervised image semantic segmentation method and device based on self-supervised low-rank network
Wang et al. Briefly Analysis about CNN Accelerator based on FPGA
CN113255892B (en) Decoupled network structure searching method, device and readable storage medium
CN115457363B (en) Image target detection method and system
Di et al. Exploring resource-efficient acceleration algorithm for transposed convolution of GANs on FPGA
Yu et al. Hardware implementation of CNN based on FPGA for EEG Signal Patterns Recognition
CN115374925A (en) Hardware acceleration method for underwater target identification
Wen FPGA-Based Deep Convolutional Neural Network Optimization Method
Lin et al. A design framework for hardware approximation of deep neural networks
Lee et al. High-speed bnn design in hls with optimized classification and computation method
CN113780553B (en) Deep learning model optimization method and system based on high-level comprehensive tool
Zhang et al. Design of a Convolutional Neural Network Accelerator based on PYNQ
CN116991564B (en) Operator internal parallel acceleration method for heterogeneous dual-core MCU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant