CN116957018A - Method for realizing channel-by-channel convolution


Info

Publication number: CN116957018A
Application number: CN202210320433.XA
Authority: CN (China)
Prior art keywords: convolution, channel, NNA, input, convolution kernel
Legal status: Pending (an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 刘子航, 王荔枝
Current Assignee: Hefei Ingenic Technology Co., Ltd.
Original Assignee: Hefei Ingenic Technology Co., Ltd.
Priority date / Filing date: 2022-03-29
Publication date: 2023-10-27


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The application provides a method for implementing channel-by-channel (depthwise) convolution. When depthwise convolution is encountered, NNA convolution acceleration is achieved by expanding the convolution kernels: after expansion, the number of channels of each convolution kernel equals the number of input channels, and the number of convolution kernels equals the number of input channels. To ensure that the convolution result obtained after kernel expansion is consistent with the depthwise convolution result, the fill values of the channel elements other than the corresponding channel must be calculated.

Description

Method for realizing channel-by-channel convolution
Technical Field
The application relates to the technical field of neural networks, in particular to a method for realizing channel-by-channel convolution.
Background
With the advent of the big-data era, neural network technology has become increasingly widespread, and large-scale data processing has become one of its important applications. Convolutional neural networks are widely used in image, video, and speech processing, and as consumer electronics and automotive electronics increasingly incorporate artificial intelligence, model training and inference demand a large amount of computation. Artificial intelligence has developed rapidly, and technologies such as deep learning and neural networks have entered a stage of vigorous development. As neural networks grow more complex, training and evaluation require significant resources, and hardware accelerators are steadily improving in both performance and versatility. A convolutional neural network mainly comprises an input layer, convolutional layers, pooling layers, and fully connected layers. The convolutional layer is the core layer of the network: most of the computation in the network occurs there, so the operating speed of the network essentially depends on the speed of the convolutional layers. Because of the nature of the algorithms and of the computation itself, the general-purpose chips widely used in the past cannot meet these computational demands, which has pushed chip manufacturers to build dedicated chips for neural network algorithms, especially inference-side chips, namely neural network accelerators (NNAs). In the prior art, the NNA (NNA 1.0) only supports fast operation of multi-channel convolution, while channel-by-channel (depthwise) convolution involves a large number of multiply-accumulate operations and is therefore slow.
That is, the prior art has the following drawbacks:
the convolution process involves a large number of multiply-accumulate computations whose speed directly affects the performance of the convolutional network; computation is slow and lacks real-time performance in practical applications; and NNA 1.0 only supports fast multi-channel convolution, so depthwise convolution cannot be accelerated through NNA 1.0. The NNA essentially supports fast multiply-accumulate of common unsigned inputs with common unsigned weights, but as a hardware accelerator it has certain limitations and lacks generality.
Furthermore, common terminology in the art is as follows:
1. Neural network: a mathematical model that simulates the structure and function of a biological neural network; by learning the internal rules of training sample data it acquires the ability to analyze or represent sample data, and it can be applied in fields such as object detection, scene classification, and character recognition.
2. Deep learning: a process and method for training a neural network.
3. Multi-channel convolution: for each pixel of each input channel, compute the products of the neighborhood pixels and the corresponding convolution kernel channel elements, accumulate them, and then accumulate the values across all channels to obtain the final convolution result (see the sketch after this list).
4. Depthwise convolution (channel-by-channel convolution): for each pixel of each input channel, compute the products of the neighborhood pixels and the corresponding convolution kernel channel elements and accumulate them to obtain the final convolution result. The number of convolution kernel channels equals 1, and the number of convolution kernels equals both the number of input channels and the number of output channels (see the sketch after this list).
5. NNA (neural network accelerator): a hardware-platform neural network accelerator; by configuring the relevant register parameters it performs fast multi-channel convolution, greatly reducing neural network running time, so that practical applications achieve better real-time performance and user experience.
6. FRAM: on-chip RAM inside the NNA that stores input image data.
7. WRAM: on-chip RAM inside the NNA that stores convolution kernel data.
8. ORAM: on-chip general-purpose RAM.
9. Pixel: the minimum unit of the input image.
10. pad: edge padding of the input image, divided into pad_top, pad_bottom, pad_left, and pad_right, denoting the padding sizes of the top, bottom, left, and right edges.
11. stride: the sliding step of the convolution kernel matrix, divided into stride_x and stride_y, denoting the horizontal and vertical sliding steps.
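To make the contrast between items 3 and 4 concrete, here is a minimal NumPy sketch of both operations (illustrative only: stride 1, no padding, small arbitrary shapes; the function names are ours, and this is not the NNA implementation):

```python
import numpy as np

def multichannel_conv(x, w):
    """Item 3: multiply-accumulate over the neighborhood AND every channel."""
    ic, ih, iw = x.shape                 # x: (IC, IH, IW) input image
    oc, _, kh, kw = w.shape              # w: (OC, IC, KH, KW) kernels
    out = np.zeros((oc, ih - kh + 1, iw - kw + 1), dtype=np.int64)
    for o in range(oc):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[o, i, j] = np.sum(x[:, i:i+kh, j:j+kw] * w[o])
    return out

def depthwise_conv(x, w):
    """Item 4: each single-channel kernel sees only its own input channel."""
    ic, ih, iw = x.shape                 # x: (IC, IH, IW) input image
    _, kh, kw = w.shape                  # w: (IC, KH, KW), one kernel per channel
    out = np.zeros((ic, ih - kh + 1, iw - kw + 1), dtype=np.int64)
    for c in range(ic):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(x[c, i:i+kh, j:j+kw] * w[c])
    return out
```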
Disclosure of Invention
In order to solve the above problems, the object of the present application is an NNA-based implementation method of depthwise convolution that completes fast depthwise convolution. The NNA hardware constraints include: the convolution kernel size must be at most 3, the convolution stride must be at most 2, and the number of convolution kernel input channels must be a multiple of 32. When a convolution does not natively satisfy these constraints, the weight data requires special processing in order to achieve convolution acceleration with the NNA. In particular, given that the input image data is common unsigned data and the weight data is common signed data, the question is how to obtain the correct depthwise convolution result using the NNA hardware accelerator.
Specifically, the application provides a method for implementing channel-by-channel convolution. When depthwise convolution is encountered, NNA convolution acceleration is achieved by expanding the convolution kernels: after expansion, the number of channels of each convolution kernel equals the number of input channels, and the number of convolution kernels equals the number of input channels. The NNA hardware constraints include: the convolution kernel size must be at most 3, the convolution stride must be at most 2, and the number of convolution kernel input channels must be a multiple of 32. When these constraints are not natively satisfied, the weight data requires special processing in order to achieve convolution acceleration with the NNA: when the input image data is common unsigned data and the weight data is common signed data, the fill values of the input channels other than the corresponding input channel are calculated while the depthwise convolution kernel is expanded.
The NNA supports fast multiply-accumulate of common unsigned inputs with common unsigned weights, whereas multi-channel convolution is the multiply-accumulate of common unsigned inputs with common signed weights; in actual use, the correct convolution result is obtained by configuring the relevant NNA registers. The NNA supports common convolution in which the number of convolution kernel input channels IC is a multiple of 32. To compute depthwise convolution with the NNA, the depthwise convolution kernel must undergo input channel expansion, after which the depthwise convolution is computed the same way the NNA computes common convolution. To obtain the correct depthwise result, the fill values of the input channels other than the corresponding input channel must be calculated so that the depthwise convolution formula is consistent with the NNA common convolution formula. Assume the convolution kernel size is size, the number of input channels is IC, and the weight bit width is nw.

The common unsigned input is $F_u$, the common signed weight is $W_s$, and the common unsigned weight is $W_u$. The conversion formula is $W_s = W_u - 2^{nw-1}$.
Multi-channel convolution process:

$$F_u * W_s = \sum_{i=1}^{K} F_u(i)\,W_s(i) = \sum_{i=1}^{K} F_u(i)\,W_u(i) - 2^{nw-1}\sum_{i=1}^{K} F_u(i),$$

where $K = size \cdot IC$.
For channel-by-channel convolution, since NNA 1.0 only supports multi-channel convolution, the convolution kernel of the channel-by-channel convolution must be channel-expanded.

For the n-th convolution kernel: the element values of its n-th channel equal the element values of the n-th convolution kernel before expansion, and the element values of the other channels are all $2^{nw-1}$, with $1 \le n \le IC$. Denote the expanded weight by $\hat{W}_u$. The actual calculation process is:

$$F_u * \hat{W}_s(n) = \sum_{i=1}^{M} F_u(i)\left(\hat{W}_u(i) - 2^{nw-1}\right) = \sum_{i=(n-1)k+1}^{nk} F_u(i)\,W_u(i) - 2^{nw-1}\sum_{i=(n-1)k+1}^{nk} F_u(i) = \sum_{i=(n-1)k+1}^{nk} F_u(i)\,W_s(i),$$

where $k = size$ and $M = size \cdot IC$; the fill elements contribute $2^{nw-1} - 2^{nw-1} = 0$ in the signed domain, leaving exactly the depthwise result for channel n.
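The expansion step itself is mechanical. The following is a minimal NumPy sketch of it under the definitions above (the unsigned depthwise weights as an (IC, KH, KW) array; the function name and shapes are illustrative, not part of the NNA interface):

```python
import numpy as np

def expand_depthwise_kernel(w_u, nw):
    """Expand unsigned depthwise kernels (IC, KH, KW) into multi-channel
    kernels (IC, IC, KH, KW): the n-th expanded kernel keeps its own
    channel's weights, and every other channel is filled with 2**(nw-1),
    the unsigned encoding of signed zero (since W_s = W_u - 2**(nw-1))."""
    ic, kh, kw = w_u.shape
    w_hat = np.full((ic, ic, kh, kw), 2 ** (nw - 1), dtype=np.int64)
    for n in range(ic):
        w_hat[n, n] = w_u[n]  # corresponding channel keeps the original weights
    return w_hat
```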
The NNA convolution process:
Write the convolution kernel data into the WRAM and the input image data into the FRAM; set the WRAM and FRAM read addresses, the input data bit width, the convolution kernel size, and the pad and stride parameters through the NNA hardware registers; then call the NNA hardware instruction NNMACG to obtain and output the convolution result. The input image size is IC × IH × IW (input channels × input image height × input image width); the convolution kernel size is OC × IC × KH × KW (output channels × input channels × kernel height × kernel width); the output image size is OC × OH × OW (output channels × output image height × output image width); pad is divided into pad_top, pad_bottom, pad_left, and pad_right; stride is divided into stride_y and stride_x.
the calculation formula of the output image size is:
the data sizes KH and KW of convolution kernels written into the WRAM by the hardware limitation of the NNA are smaller than or equal to 3, and the number of input channels of the convolution kernels is required to be a multiple of 32; before the NNA is called for convolution acceleration, the input image data and the convolution kernel data are processed, and the number of input channels IC is ensured to be a multiple of 32.
The NNA convolution acceleration process: write the expanded convolution kernel data (i.e., with the number of input channels equal to IC, a multiple of 32) into the WRAM, write the input image data into the FRAM, set the WRAM and FRAM read addresses, input data bit width, convolution kernel size, and pad and stride parameters through the NNA hardware registers, and call the NNA hardware instruction NNMACG to obtain and output the convolution result.
The method further comprises the following steps, with the convolution kernel size denoted KH × KW and the convolution kernel data bit width nw denoted wbit:

S1. Let the common unsigned input be $F_u$, the common signed weight $W_s$, and the common unsigned weight $W_u$. The conversion formula is $W_s = W_u - 2^{wbit-1}$, where wbit is the convolution kernel data bit width.

Common convolution: the convolution kernel size is KH × KW, the number of input image channels is IC, the number of convolution kernel input channels equals IC, and IC is a multiple of 32.

S2. The NNA computes common convolution by the formula

$$F_u * W_s = \sum_{i=1}^{K} F_u(i)\,W_u(i) - 2^{wbit-1}\sum_{i=1}^{K} F_u(i), \qquad K = KH \cdot KW \cdot IC.$$

Write the $W_u$ data into the WRAM and the $F_u$ data into the FRAM; configure the NNA registers, including the WRAM and FRAM read addresses, the convolution kernel size, the number of convolution kernel input channels (IC must be a multiple of 32), the $F_u$ data bit width, and the $W_u$ data bit width; then call the NNMACG instruction to obtain the convolution result, i.e. $F_u * W_s$.

S3. Compute the depthwise convolution: the convolution kernel size is KH × KW, the number of input image channels is IC (a multiple of 32), and the number of convolution kernel input channels equals 1. The convolution kernel after input channel expansion is $\hat{W}_u$; the expanded kernel has IC input channels, and the fill value of the input channels other than the corresponding input channel is $W_t$.

Derivation of the NNA depthwise convolution formula, for the n-th output channel with $k = KH \cdot KW$ and $M = KH \cdot KW \cdot IC$:

$$F_u * \hat{W}_s(n) = \sum_{i=1}^{M} F_u(i)\left(\hat{W}_u(i) - 2^{wbit-1}\right) = \sum_{i=(n-1)k+1}^{nk} F_u(i)\,W_u(i) - 2^{wbit-1}\sum_{i=(n-1)k+1}^{nk} F_u(i) + \left(W_t - 2^{wbit-1}\right)\sum_{i \notin \text{channel}\,n} F_u(i).$$

If $W_t = 2^{wbit-1}$, the last term vanishes and the formula reduces to

$$F_u * \hat{W}_s(n) = \sum_{i=(n-1)k+1}^{nk} F_u(i)\left(W_u(i) - 2^{wbit-1}\right) = \sum_{i=(n-1)k+1}^{nk} F_u(i)\,W_s(i).$$

That is, when the weight data is common signed data, to compute the depthwise convolution result with the NNA the depthwise convolution kernel must undergo input channel expansion, and the values of the input channels other than the corresponding channel are all filled with $2^{wbit-1}$, where wbit is the weight data bit width.

S4. Write the $\hat{W}_u$ data into the WRAM and the $F_u$ data into the FRAM; configure the NNA registers, including the WRAM and FRAM read addresses, the convolution kernel size, the expanded convolution kernel input channel count IC (which must be a multiple of 32), the $F_u$ data bit width, and the $\hat{W}_u$ data bit width; then call the NNMACG instruction to obtain the depthwise convolution result, i.e. $F_u * \hat{W}_s$.
Thus, the advantage of the present application is that, by a simple method, fast depthwise convolution is accomplished through NNA 1.0 acceleration. In particular, given that the weight data is common signed data, the filling of the input channels other than the corresponding input channel is completed during expansion.
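The identity behind S3 and S4 is easy to check numerically. The sketch below emulates the NNA-style unsigned multiply-accumulate in NumPy and verifies it against the signed depthwise reference, reusing expand_depthwise_kernel from the earlier sketch (assumptions: stride 1, no padding, and a small IC for readability, whereas the real NNA requires IC to be a multiple of 32 and executes NNMACG in hardware):

```python
import numpy as np

rng = np.random.default_rng(0)
wbit, IC, KH, KW, IH, IW = 8, 4, 3, 3, 5, 5      # small illustrative sizes
F_u = rng.integers(0, 2**wbit, (IC, IH, IW))     # common unsigned input
W_u = rng.integers(0, 2**wbit, (IC, KH, KW))     # unsigned depthwise weights
W_s = W_u - 2**(wbit - 1)                        # common signed weights

W_hat = expand_depthwise_kernel(W_u, wbit)       # fill value 2**(wbit-1)

OH, OW = IH - KH + 1, IW - KW + 1                # stride 1, no pad
for n in range(IC):
    for i in range(OH):
        for j in range(OW):
            patch = F_u[:, i:i+KH, j:j+KW]
            # NNA-style result: unsigned MAC minus 2**(wbit-1) * sum(inputs)
            nna = np.sum(patch * W_hat[n]) - 2**(wbit - 1) * np.sum(patch)
            # reference: depthwise convolution with signed weights
            ref = np.sum(F_u[n, i:i+KH, j:j+KW] * W_s[n])
            assert nna == ref
print("expanded-kernel NNA result matches depthwise convolution")
```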
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate the application and, together with the description, serve to explain it.
FIG. 1 is a schematic flow chart of the method of the present application.
Detailed Description
In order that the technical content and advantages of the present application may be more clearly understood, a further detailed description of the present application will now be made with reference to the accompanying drawings.
Because the NNA has certain hard constraints and only supports multi-channel convolution, its generality in practical applications is limited. To ensure generality while improving convolution speed, when depthwise convolution is encountered, NNA convolution acceleration is achieved by channel-expanding the convolution kernels: the number of channels of each expanded convolution kernel equals the number of input channels, and the number of convolution kernels equals the number of input channels.
NNA 1.0 actually supports fast multiply-accumulate of common unsigned inputs with common unsigned weights, whereas multi-channel convolution is the multiply-accumulate of common unsigned inputs with common signed weights. In practical use, the correct convolution result is quickly obtained by configuring the relevant NNA registers.

Assume the convolution kernel size is size, the number of input channels is IC, and the weight bit width is nw. The common signed weight is $W_s$ and the common unsigned weight is $W_u$. The conversion formula is $W_s = W_u - 2^{nw-1}$.
Multi-channel convolution process:

$$F_u * W_s = \sum_{i=1}^{K} F_u(i)\,W_s(i) = \sum_{i=1}^{K} F_u(i)\,W_u(i) - 2^{nw-1}\sum_{i=1}^{K} F_u(i),$$

where $K = size \cdot IC$.
For depthwise convolution, since NNA 1.0 only supports multi-channel convolution, channel expansion must be performed on the depthwise convolution kernel.

For the n-th convolution kernel: the element values of its n-th channel equal the element values of the n-th convolution kernel before expansion, and the element values of the other channels are all $2^{nw-1}$, with $1 \le n \le IC$. Denote the expanded weight by $\hat{W}_u$.

The actual calculation process is:

$$F_u * \hat{W}_s(n) = \sum_{i=1}^{M} F_u(i)\left(\hat{W}_u(i) - 2^{nw-1}\right) = \sum_{i=(n-1)k+1}^{nk} F_u(i)\,W_s(i),$$

where $k = size$ and $M = size \cdot IC$.
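As a one-element numeric check of this identity (values chosen arbitrarily): with $nw = 8$, a signed weight $W_s = -3$ is stored as the unsigned value $W_u = 125$, and for an input pixel $F_u = 10$,

$$F_u W_u - 2^{nw-1} F_u = 10 \cdot 125 - 128 \cdot 10 = 1250 - 1280 = -30 = 10 \cdot (-3) = F_u W_s.$$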
In summary, as shown in FIG. 1, the method comprises the following steps, with the convolution kernel size denoted KH × KW and the convolution kernel data bit width nw denoted wbit:

S1. Let the common unsigned input be $F_u$, the common signed weight $W_s$, and the common unsigned weight $W_u$. The conversion formula is $W_s = W_u - 2^{wbit-1}$, where wbit is the convolution kernel data bit width.

Common convolution: the convolution kernel size is KH × KW, the number of input image channels is IC, the number of convolution kernel input channels equals IC, and IC is a multiple of 32.

S2. The NNA computes common convolution by the formula

$$F_u * W_s = \sum_{i=1}^{K} F_u(i)\,W_u(i) - 2^{wbit-1}\sum_{i=1}^{K} F_u(i), \qquad K = KH \cdot KW \cdot IC.$$

Write the $W_u$ data into the WRAM and the $F_u$ data into the FRAM; configure the NNA registers, including the WRAM and FRAM read addresses, the convolution kernel size, the number of convolution kernel input channels (IC must be a multiple of 32), the $F_u$ data bit width, and the $W_u$ data bit width; then call the NNMACG instruction to obtain the convolution result, i.e. $F_u * W_s$.

S3. Compute the depthwise convolution: the convolution kernel size is KH × KW, the number of input image channels is IC (a multiple of 32), and the number of convolution kernel input channels equals 1. The convolution kernel after input channel expansion is $\hat{W}_u$; the expanded kernel has IC input channels, and the fill value of the input channels other than the corresponding input channel is $W_t$.

Derivation of the NNA depthwise convolution formula, for the n-th output channel with $k = KH \cdot KW$ and $M = KH \cdot KW \cdot IC$:

$$F_u * \hat{W}_s(n) = \sum_{i=(n-1)k+1}^{nk} F_u(i)\,W_u(i) - 2^{wbit-1}\sum_{i=(n-1)k+1}^{nk} F_u(i) + \left(W_t - 2^{wbit-1}\right)\sum_{i \notin \text{channel}\,n} F_u(i).$$

If $W_t = 2^{wbit-1}$, the formula reduces to

$$F_u * \hat{W}_s(n) = \sum_{i=(n-1)k+1}^{nk} F_u(i)\,W_s(i).$$

That is, when the weight data is common signed data, to compute the depthwise convolution result with the NNA the depthwise convolution kernel must undergo input channel expansion, and the values of the input channels other than the corresponding channel are all filled with $2^{wbit-1}$, where wbit is the weight data bit width.

S4. Write the $\hat{W}_u$ data into the WRAM and the $F_u$ data into the FRAM; configure the NNA registers, including the WRAM and FRAM read addresses, the convolution kernel size, the expanded convolution kernel input channel count IC (which must be a multiple of 32), the $F_u$ data bit width, and the $\hat{W}_u$ data bit width; then call the NNMACG instruction to obtain the depthwise convolution result, i.e. $F_u * \hat{W}_s$.
Thus, the key points of the application are:
1. Convolution kernel expansion: NNA 1.0 requires the number of convolution kernel channels to equal the number of input channels, while for depthwise convolution the number of convolution kernel channels equals 1, which is outside the range NNA 1.0 supports; NNA convolution acceleration is achieved by channel-expanding the convolution kernels.
2. To ensure that the convolution result obtained after kernel expansion is consistent with the depthwise convolution result, the fill values of the channel elements other than the corresponding channel must be calculated.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, and various modifications and variations can be made to the embodiments of the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (7)

1. A method for implementing channel-by-channel convolution, characterized in that, when channel-by-channel convolution is encountered, NNA convolution acceleration is achieved by expanding the convolution kernels, the number of channels of each expanded multi-channel convolution kernel equals the number of input channels, and the number of convolution kernels equals the number of input channels; the NNA hardware constraints include that the convolution kernel size must be at most 3, the convolution stride must be at most 2, and the number of convolution kernel input channels must be a multiple of 32; when the NNA hardware constraints are not satisfied, the weight data undergoes special processing in order to achieve convolution acceleration with the NNA: when the input image data is common unsigned data and the weight data is common signed data, the fill values of the input channels other than the corresponding input channel are calculated while the depthwise convolution kernel is expanded.
2. The method for implementing channel-by-channel convolution according to claim 1, wherein the NNA supports fast multiply-accumulate of common unsigned inputs with common unsigned weights, the multi-channel convolution is the multiply-accumulate of common unsigned inputs with common signed weights, and in actual use the correct convolution result is obtained by configuring the relevant NNA registers: the NNA supports common convolution in which the number of convolution kernel input channels IC is a multiple of 32; to compute depthwise convolution with the NNA, the depthwise convolution kernel undergoes input channel expansion, after which the depthwise convolution is computed the way the NNA computes common convolution; and to obtain the correct depthwise convolution result, the fill values of the input channels other than the corresponding input channel are calculated so that the depthwise convolution formula is consistent with the NNA common convolution formula:

assuming the convolution kernel size is size, the number of input channels is IC, and the weight bit width is nw, the common unsigned input is $F_u$, the common signed weight is $W_s$, and the common unsigned weight is $W_u$, with conversion formula $W_s = W_u - 2^{nw-1}$; the multi-channel convolution process is

$$F_u * W_s = \sum_{i=1}^{K} F_u(i)\,W_u(i) - 2^{nw-1}\sum_{i=1}^{K} F_u(i),$$

where $K = size \cdot IC$.
3. The method for implementing channel-by-channel convolution according to claim 2, wherein, for channel-by-channel convolution, since the NNA only supports multi-channel convolution, the convolution kernel of the channel-by-channel convolution is channel-expanded: for the n-th convolution kernel, the element values of its n-th channel equal the element values of the n-th convolution kernel before expansion, and the element values of the other channels are all $2^{nw-1}$, with $1 \le n \le IC$; the expanded weight is $\hat{W}_u$, and the actual calculation process is

$$F_u * \hat{W}_s(n) = \sum_{i=1}^{M} F_u(i)\left(\hat{W}_u(i) - 2^{nw-1}\right) = \sum_{i=(n-1)k+1}^{nk} F_u(i)\,W_s(i),$$

where $k = size$ and $M = size \cdot IC$.
4. The method for implementing channel-by-channel convolution according to claim 3, wherein the NNA convolution process comprises: writing the convolution kernel data into the WRAM, writing the input image data into the FRAM, setting the WRAM and FRAM read addresses, input data bit width, convolution kernel size, and pad and stride parameters through the NNA hardware registers, and calling the NNA hardware instruction NNMACG to obtain and output the convolution result; wherein the input image size is IC × IH × IW (input channels × input image height × input image width), the convolution kernel size is OC × IC × KH × KW (output channels × input channels × kernel height × kernel width), the output image size is OC × OH × OW (output channels × output image height × output image width), pad is divided into pad_top, pad_bottom, pad_left, and pad_right, and stride is divided into stride_y and stride_x; the output image size is computed as

$$OH = \left\lfloor \frac{IH + pad\_top + pad\_bottom - KH}{stride\_y} \right\rfloor + 1, \qquad OW = \left\lfloor \frac{IW + pad\_left + pad\_right - KW}{stride\_x} \right\rfloor + 1.$$
5. The method for implementing channel-by-channel convolution according to claim 4, wherein, due to NNA hardware constraints, the convolution kernel sizes KH and KW written into the WRAM are at most 3 and the number of convolution kernel input channels must be a multiple of 32; before calling the NNA for convolution acceleration, the input image data and convolution kernel data are processed to ensure that the number of input channels IC is a multiple of 32.
6. The method of claim 5, wherein the NNA convolution acceleration process comprises: writing the expanded convolution kernel data (i.e., with the number of input channels equal to IC, a multiple of 32) into the WRAM, writing the input image data into the FRAM, setting the WRAM and FRAM read addresses, input data bit width, convolution kernel size, and pad and stride parameters through the NNA hardware registers, and calling the NNA hardware instruction NNMACG to obtain and output the convolution result.
7. The method for implementing channel-by-channel convolution according to claim 6, further comprising the following steps, with the convolution kernel size denoted KH × KW and the convolution kernel data bit width nw denoted wbit:

S1. Let the common unsigned input be $F_u$, the common signed weight $W_s$, and the common unsigned weight $W_u$; the conversion formula is $W_s = W_u - 2^{wbit-1}$, where wbit is the convolution kernel data bit width. Common convolution: the convolution kernel size is KH × KW, the number of input image channels is IC, the number of convolution kernel input channels equals IC, and IC is a multiple of 32.

S2. The NNA computes common convolution by the formula

$$F_u * W_s = \sum_{i=1}^{K} F_u(i)\,W_u(i) - 2^{wbit-1}\sum_{i=1}^{K} F_u(i), \qquad K = KH \cdot KW \cdot IC;$$

write the $W_u$ data into the WRAM and the $F_u$ data into the FRAM, configure the NNA registers, including the WRAM and FRAM read addresses, the convolution kernel size, the number of convolution kernel input channels (IC must be a multiple of 32), the $F_u$ data bit width, and the $W_u$ data bit width, then call the NNMACG instruction to obtain the convolution result, i.e. $F_u * W_s$.

S3. Compute the depthwise convolution: the convolution kernel size is KH × KW, the number of input image channels is IC (a multiple of 32), and the number of convolution kernel input channels equals 1; the convolution kernel after input channel expansion is $\hat{W}_u$, the expanded kernel has IC input channels, and the fill value of the input channels other than the corresponding input channel is $W_t$. Derivation of the NNA depthwise convolution formula, for the n-th output channel with $k = KH \cdot KW$ and $M = KH \cdot KW \cdot IC$:

$$F_u * \hat{W}_s(n) = \sum_{i=(n-1)k+1}^{nk} F_u(i)\,W_u(i) - 2^{wbit-1}\sum_{i=(n-1)k+1}^{nk} F_u(i) + \left(W_t - 2^{wbit-1}\right)\sum_{i \notin \text{channel}\,n} F_u(i);$$

if $W_t = 2^{wbit-1}$, the formula reduces to

$$F_u * \hat{W}_s(n) = \sum_{i=(n-1)k+1}^{nk} F_u(i)\,W_s(i);$$

that is, when the weight data is common signed data, to compute the depthwise convolution result with the NNA the depthwise convolution kernel undergoes input channel expansion, and the values of the input channels other than the corresponding channel are all filled with $2^{wbit-1}$, where wbit is the weight data bit width.

S4. Write the $\hat{W}_u$ data into the WRAM and the $F_u$ data into the FRAM, configure the NNA registers, including the WRAM and FRAM read addresses, the convolution kernel size, the expanded convolution kernel input channel count IC (which must be a multiple of 32), the $F_u$ data bit width, and the $\hat{W}_u$ data bit width, then call the NNMACG instruction to obtain the depthwise convolution result, i.e. $F_u * \hat{W}_s$.
Priority Applications (1)

    CN202210320433.XA, priority date 2022-03-29, filing date 2022-03-29: Method for realizing channel-by-channel convolution

Publications (1)

    CN116957018A (en), publication date 2023-10-27

Family ID: 88451536

Country Status (1)

    CN: CN116957018A (en), Pending


Legal Events

    PB01: Publication
    SE01: Entry into force of request for substantive examination