CN109615071B - High-energy-efficiency neural network processor, acceleration system and method

High-energy-efficiency neural network processor, acceleration system and method

Info

Publication number
CN109615071B
CN109615071B
Authority
CN
China
Prior art keywords
data
calculation
convolution
weight data
input data
Prior art date
Legal status
Active
Application number
CN201811592475.9A
Other languages
Chinese (zh)
Other versions
CN109615071A (en)
Inventor
秦刚
姜凯
李朋
Current Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Original Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Shandong Inspur Scientific Research Institute Co Ltd
Priority to CN201811592475.9A
Publication of CN109615071A
Application granted
Publication of CN109615071B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Advance Control (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a high-energy-efficiency neural network processor, an acceleration system and a method, which belong to the field of neural network processing devices and aim to solve the technical problem of how to reduce the number of read-write operations of the multipliers and the data memory and accelerate neural network calculation. The processor is a main control chip containing an ARM core and comprises a processor unit and a logic calculation unit, the logic calculation unit being electrically connected with the processor unit through a bus interface. The acceleration system comprises the main control chip and a storage module. The method comprises the following steps: selecting and sorting the weight data; acquiring input data in a data multiplexing mode, and performing convolution and pooling operations on the weight data and the input data with a plurality of PE calculation subunits working in parallel; and acquiring the output data of each PE calculation subunit, performing an addition operation to obtain the final data, and storing the final data in the storage module. The invention can reduce the number of convolution operations and the number of read-write operations on external storage.

Description

High-energy-efficiency neural network processor, acceleration system and method
Technical Field
The invention relates to the field of neural network processing devices, in particular to a high-energy-efficiency neural network processor, an acceleration system and a method.
Background
Deep learning is a major driver of the development of artificial intelligence technology; it uses the topological structure of a deep neural network for training, optimization and inference.
The convolutional neural network is the basis of deep learning. The convolution operation accounts for a large share of the computation of the whole algorithm, requires a large number of multiplier units, and is the bottleneck that limits performance. The method currently adopted is to perform the multiplications and accumulations in parallel, in a structure where the outputs of multiple multipliers are connected to an addition tree. When existing systems and methods perform multiplication and accumulation in parallel, the weight data and the related input data have to be read many times, the wear on the storage units and multipliers is large, and the calculation speed is low.
How to reduce the number of read-write operations of the multipliers and the data memory and accelerate neural network calculation is therefore a technical problem to be solved.
Disclosure of Invention
In view of the above deficiencies, the technical task of the invention is to provide a high-energy-efficiency neural network processor, an acceleration system and a method that reduce the number of read-write operations of the multipliers and the data memory and accelerate neural network calculation.
In a first aspect, an embodiment of the present invention provides an energy-efficient neural network processor, which is a main control chip including an ARM core, and includes:
the processor unit is used for acquiring input data and weight data and generating instruction data according to a model of the neural network;
the logic calculation unit is electrically connected with the processor unit through a bus interface and comprises an instruction FIFO subunit, a data FIFO subunit, a sorting subunit, an addition subunit and a plurality of PE calculation subunits, wherein:
the instruction FIFO subunit is used for realizing FIFO of instruction data and activating a proper number of PE calculation subunits and resources of the PE calculation subunits according to the instruction data;
a data FIFO subunit for implementing FIFO of the weight data and the input data;
the sorting subunit is used for outputting the weight data and the input data in order, based on the principle that weight data which are positive numbers are output first, weight data which are negative numbers are output later, and weight data which are zero are not output;
the PE calculation subunit is used for performing convolution operation and pooling operation on the weight data and the input data and judging whether to automatically terminate the convolution operation;
the PE computing subunits are multiple in number, acquire input data in a data multiplexing mode and perform convolution operation and pooling operation on the weight data and the input data in a parallel computing mode;
and the addition subunit is used for performing addition operation on the data output by the PE calculation subunits.
In this embodiment, the weight data are sorted according to the principle that positive numbers are output first, negative numbers are output later, and zeros are not output, so that the number of convolution operations is reduced when the PE calculation subunits perform convolution; in addition, the input data of the PE calculation subunits are multiplexed, which further reduces the number of read-write operations on the external storage module, thereby reducing external memory accesses and the resources used by the units involved in the internal convolution operation.
Preferably, the PE calculation subunit includes:
the convolution calculation micro units are used for acquiring input data in a data multiplexing mode and carrying out convolution operation on the weight data and the input data in a serial calculation mode;
the activation function, configured as a ReLU function, is used for judging whether to terminate the convolution operation in the convolution calculation micro unit, according to the following principle: if the convolution result of the current M weight data and the related input data in the convolution calculation micro unit is negative, the convolution operation in the convolution calculation micro unit is automatically terminated and zero is output; otherwise, the convolution operation is performed on all N weight data and the related input data and the resulting convolution data, which are positive, are output; wherein M < N and N is the total number of weight data in the convolution calculation micro unit;
and the pooling layer is used for performing pooling operation on the output data of each convolution calculation micro unit.
In this preferred embodiment, the convolution calculation micro units multiplex the input data and the weight data; that is, during the convolution calculation inside each PE calculation subunit, each convolution calculation micro unit forwards its input data to the next convolution calculation micro unit for that unit's convolution calculation, so repeated reads from the external or internal storage area are unnecessary, which reduces the number of read-write operations on the external memory or internal storage area and thereby reduces power consumption. Owing to the characteristics of the activation function, if the first partial convolution results are already negative while a convolution calculation micro unit is working, the remaining convolution calculations can be skipped, which reduces the occupancy of the convolution calculation micro unit.
Preferably, the logic calculation unit further comprises:
the compression/decompression unit is used for compressing the weight data layer by layer, according to the hierarchy of the neural network, with a run-length encoding compression algorithm, or for performing lossless decompression of the compressed weight data.
In this preferred embodiment, the compression of the weight data is performed by the compression/decompression unit and the compressed weight data are stored in the external memory or the internal buffer, which reduces the required storage space; because the compression of the weight data is lossless, it also reduces the number of read-write operations on external storage and thereby reduces energy consumption.
Preferably, the logic calculation unit further comprises:
and the buffer subunit is used for temporarily storing the weight data, the input data and the instruction data.
Preferably, the bus interface is an AXI interface, which supports DMA data transfer.
Preferably, the main control chip is a zynq chip, a PS end of the zynq chip is used as a processor unit, and a PL end of the zynq chip is used as a logic calculation unit.
In a second aspect, an embodiment of the present invention provides an energy-efficient neural network acceleration system, including:
an energy-efficient neural network processor as in the first aspect; and
the storage module, which is electrically connected with the main control chip and used for storing the weight data and the output data of the addition subunit.
The embodiment provides an energy-efficient neural network acceleration system for performing neural network acceleration calculation.
In a third aspect, an embodiment of the present invention provides an energy-efficient neural network acceleration method, including:
storing the weight data in a storage module;
acquiring weight data and input data, and generating instruction data according to a model of the neural network;
acquiring instruction data in an FIFO mode, and activating a proper number of PE calculation subunits and resources of the PE calculation subunits according to the instruction data;
acquiring input data and weight data in an FIFO mode, and selectively sequencing the weight data and the input data based on the principle that weight data which is positive number is preferentially output, weight data which is negative number is output later, and weight data which is zero is not output;
acquiring input data in a data multiplexing mode, performing convolution operation and pooling operation on the weight data and the input data in a mode of parallel calculation of a plurality of PE calculation subunits, and judging whether to terminate the convolution operation according to the result of the convolution operation in the convolution operation process;
and acquiring the output data of each PE calculation subunit, performing addition operation to obtain final data, and storing the final data in a storage module.
Preferably, the PE calculation subunit performs convolution operation and pooling operation on the weight data and the related input data, and includes:
acquiring input data in a data multiplexing mode, and performing convolution operation on the weight data and the related input data in a mode of serial calculation of a plurality of convolution calculation micro units;
in the process of carrying out convolution operation in the convolution calculation micro unit, judging whether to terminate the convolution operation in the current convolution calculation micro unit through an activation function, wherein the method comprises the following steps: if the convolution operation of the current M weight data and the related input data in the convolution calculation micro unit is negative, automatically stopping the convolution operation in the convolution calculation micro unit and outputting zero; otherwise, carrying out convolution operation on the N weight data and the related input data and outputting convolution data which is positive; wherein M < N, N is the total number of weight data located in the convolution calculation microcell;
and performing pooling operation on the output data of each convolution calculation micro unit.
Preferably, before the weight data is stored in the external storage module, the weight data is compressed according to the level of the neural network and a run length coding compression algorithm, and the compressed weight data is stored in the external storage module;
and before the weight data and the input data are selected and sorted, lossless decompression is carried out on the compressed weight data.
The high-energy-efficiency neural network processor, the acceleration system and the method provided by the invention have the following advantages:
1. The weight data and the input data are sorted by the sorting subunit, so that in the subsequent PE calculation subunits the convolution operation is performed first on the weight data that are positive numbers and the related input data, and then on the weight data that are negative numbers and the related input data; during the convolution operation it can be judged, from the intermediate convolution result, whether to terminate the convolution automatically, which reduces the use of the convolution-related calculation units and accelerates the calculation;
2. The input data are acquired among the PE calculation subunits in a data multiplexing mode, and at the same time the input data are acquired among the convolution calculation micro units inside each PE calculation subunit in a data multiplexing mode, which reduces reads and writes of external storage, lowers power consumption, improves the utilization rate of the input data and increases the calculation speed;
3. The weight data are compressed before being stored, which reduces the occupied storage space.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of an energy-efficient neural network processor according to embodiment 1;
FIG. 2 is a schematic structural diagram of an improved energy-efficient neural network processor according to embodiment 1;
fig. 3 is a schematic structural diagram of an energy-efficient neural network acceleration system according to embodiment 2;
fig. 4 is a flow chart of an energy-efficient neural network acceleration method according to embodiment 3.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and specific embodiments so that those skilled in the art can better understand the present invention and can implement the present invention, but the embodiments are not intended to limit the present invention, and the embodiments and technical features of the embodiments can be combined with each other without conflict.
It is to be understood that "a plurality" in the embodiments of the present invention means two or more.
The embodiment of the invention provides a high-energy-efficiency neural network processor, an acceleration system and a method, which are used for solving the technical problems of reducing the read-write times of a multiplier and a data memory and accelerating the calculation of a neural network.
Example 1:
as shown in fig. 1, the embodiment provides an energy-efficient neural network processor, which is a main control chip including an ARM core, and includes a processor unit and a logic computation unit, where the processor unit and the logic computation unit are electrically connected through a bus interface.
The processor unit comprises a multi-core ARM processor and is used for acquiring input data and weight data and generating instruction data according to a model of the neural network.
The bus interface is an AXI interface and supports a DMA data transmission mode.
The logic computation unit includes an instruction FIFO subunit, a data FIFO subunit, a sorting subunit, an addition subunit, and a plurality of PE computation subunits.
The instruction FIFO subunit is implemented with a FIFO memory and is connected between the bus interface and the plurality of PE calculation subunits; it is used for realizing FIFO of the instruction data and activates an appropriate number of PE calculation subunits and their resources according to the instruction data.
The data FIFO subunit is implemented with a FIFO memory and is electrically connected between the bus interface and the sorting subunit and between the bus interface and the PE calculation subunits; it is used for realizing FIFO of the weight data and the input data.
The sorting subunit is used for acquiring the weight data from the data FIFO subunit, sorting the weight data and the input data based on the principle that weight data which are positive numbers are output first, weight data which are negative numbers are output later and weight data which are zero are not output, and outputting the weight data and the corresponding input data to the PE calculation subunits in order.
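As a concrete illustration of this sorting rule, the following Python sketch orders (weight, input) pairs so that positive weights come first, negative weights follow and zero weights are dropped; the function name and the list-based data layout are illustrative assumptions and not part of the patented hardware.

def sort_weight_input_pairs(weights, inputs):
    """Order (weight, input) pairs: positive weights first, negative weights
    later, zero weights dropped (their products contribute nothing)."""
    pairs = list(zip(weights, inputs))
    positives = [(w, x) for w, x in pairs if w > 0]
    negatives = [(w, x) for w, x in pairs if w < 0]
    # Zero-valued weights are simply not forwarded to the PE calculation subunits.
    return positives + negatives


ordered = sort_weight_input_pairs([3, -2, 0, 5], ["a", "b", "c", "d"])
print(ordered)  # [(3, 'a'), (5, 'd'), (-2, 'b')]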
The PE calculation subunit is used for performing convolution and pooling operations on the weight data and the input data and for judging, according to the intermediate result of the convolution operation, whether to terminate the convolution automatically.
In this embodiment, each PE calculation subunit includes N convolution calculation micro units, an activation function, and a pooling layer.
The N convolution calculation micro units have the following functions: acquiring the sorted weight data and the related input data from the sorting subunit, acquiring the input data among the N convolution calculation micro units in a data multiplexing mode, and performing the convolution operation on the weight data and the input data in a serial calculation mode. The N convolution calculation micro units acquire the input data in a data multiplexing mode as follows:
Each convolution calculation micro unit acquires part of the input data for its convolution operation, and the convolution calculation micro units perform their convolution operations serially in sequence. After the convolution operation of the part of the input data currently held by a convolution calculation micro unit and the weight data is finished, the micro unit registers this part of the input data into the adjacent (next) convolution calculation micro unit, so that the input data are shared among the convolution calculation micro units in a data multiplexing mode and the number of data fetches from the outside is reduced.
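This forwarding scheme can be pictured with the simplified software model below: a one-dimensional, output-stationary convolution in which each micro unit computes one output position and every input sample is fetched from external storage once, then only registered onward to the next unit. The 1-D formulation, the function name and the scheduling details are assumptions made purely for illustration.

def systolic_conv1d(x, w, n_units=None):
    """Output-stationary model of serial convolution calculation micro units.

    Unit u computes y[u] = sum_k w[k] * x[u + k].  Each input sample enters
    the chain once; afterwards it is only forwarded (registered) from one
    micro unit to the next, modelling the data multiplexing in the text.
    """
    K = len(w)
    n_units = n_units or len(x) - K + 1
    held = [None] * n_units      # sample currently registered in each unit
    skip = list(range(n_units))  # unit u ignores its first u samples
    step = [0] * n_units         # index of the weight each unit applies next
    y = [0] * n_units

    for sample in list(x) + [None] * (2 * n_units):   # pad so late units drain
        held = [sample] + held[:-1]                   # forward to the next unit
        for u in range(n_units):
            if held[u] is None:
                continue
            if skip[u] > 0:                           # x[u] has not reached unit u yet
                skip[u] -= 1
            elif step[u] < K:
                y[u] += w[step[u]] * held[u]
                step[u] += 1
    return y


print(systolic_conv1d([1, 2, 4, 8, 16], [1, 0, -1]))  # [-3, -6, -12]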
The activation function, configured as a ReLU function, is used for judging whether to terminate the convolution operation. The judgment principle is: if the convolution result of the current M weight data and the related input data in the convolution calculation micro unit is negative, the convolution operation is automatically terminated and zero is output; otherwise the convolution operation is performed on all N weight data and the related input data and the resulting convolution data, which are positive, are output; where M < N and N is the total number of weight data output by the sorting subunit. The reasoning is as follows: in each convolution calculation micro unit the positive weight data are input first and the negative weight data afterwards, i.e. during the convolution calculation the positive weight data and the corresponding input data are processed first, and only after all positive weight data and the corresponding input data have been processed are the negative weight data and the corresponding input data processed. If, after all positive weight data and part of the negative weight data have been accumulated, the partial convolution result is already negative, the final result is certain to be negative; because of the characteristics of the activation function, the subsequent convolution calculations are therefore terminated automatically, the convolution operation of the micro unit ends, and zero is output.
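The early-termination rule is illustrated by the sketch below. It implicitly relies on the input activations being non-negative (as they are, for example, after a preceding ReLU layer), so that once all positive weights have been accumulated the running sum can only decrease; the function name and data layout are illustrative assumptions rather than the hardware implementation.

def conv_with_relu_early_stop(sorted_weights, sorted_inputs):
    """Multiply-accumulate over weights that arrive already sorted: positive
    weights first, negative weights later, zero weights removed.  Stop as soon
    as the sign of the final result is decided."""
    acc = 0.0
    positives_done = False
    for m, (w, x) in enumerate(zip(sorted_weights, sorted_inputs), start=1):
        if w < 0:
            positives_done = True
        acc += w * x
        # With non-negative inputs, the remaining (negative-weight) terms can
        # only lower the sum, so a negative partial sum fixes the final sign.
        if positives_done and acc < 0:
            return 0.0, m          # ReLU output is zero; only m MACs were spent
    return max(acc, 0.0), len(sorted_weights)


out, macs = conv_with_relu_early_stop([3, 2, -4, -5], [1.0, 1.0, 2.0, 1.0])
print(out, macs)   # 0.0 3  -> the fourth multiply-accumulate was skipped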
And the pooling layer is used for performing pooling operation on the output data of each convolution calculation micro unit.
There are a plurality of PE calculation subunits, and they have the following functions: acquiring input data in a data multiplexing mode and performing convolution and pooling operations on the weight data and the input data in a parallel calculation mode. The working mode of the plurality of PE calculation subunits can be understood as follows:
Each PE calculation subunit acquires part of the input data and performs the convolution and pooling operations on it with the weight data; the PE calculation subunits work synchronously in parallel. After the last convolution operation of the part of the input data currently held by a PE calculation subunit and the weight data is finished, the subunit registers this part of the input data into the adjacent (next) PE calculation subunit, so that the input data are shared among the PE calculation subunits in a data multiplexing mode and the number of data fetches from the outside is reduced.
The addition subunit is electrically connected with the plurality of PE calculation subunits and is used for acquiring the pooled data from the PE calculation subunits and performing an addition operation to obtain the final data.
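One way to picture how the parallel PE calculation subunits and the addition subunit cooperate is the partitioned sketch below, in which each PE accumulates a partial sum over its share of the data and the addition subunit combines the per-PE results; the partitioning scheme, the function names and the omission of pooling are illustrative assumptions rather than the patented implementation.

import numpy as np

def pe_partial_sum(x_slice, w_slice):
    """One PE calculation subunit: multiply-accumulate over its share of the
    data (pooling is omitted here for brevity)."""
    return float(np.sum(x_slice * w_slice))

def accelerate(x, w, n_pe=4):
    """Split the data across n_pe PE subunits working 'in parallel' and let
    the addition subunit combine their partial results into the final value."""
    chunks_x = np.array_split(x, n_pe)
    chunks_w = np.array_split(w, n_pe)
    partials = [pe_partial_sum(cx, cw) for cx, cw in zip(chunks_x, chunks_w)]
    return sum(partials)                    # the addition subunit

x = np.arange(16, dtype=np.float32)
w = np.ones(16, dtype=np.float32)
print(accelerate(x, w))                     # 120.0, identical to np.dot(x, w)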
The high-energy-efficiency neural network processor can be used together with an external storage module to accelerate neural network calculation.
As shown in fig. 2, as a further improvement of this embodiment, the logic calculation unit further includes a compression/decompression unit, which is configured to compress the weight data layer by layer, according to the hierarchy of the neural network, with a run-length encoding compression algorithm, or to losslessly decompress the compressed weight data.
In the neural network acceleration application, the compression/decompression unit works as follows: before the neural network calculation, the weight data of the trained neural network are compressed with a run-length encoding compression algorithm, layer by layer according to the hierarchy of the neural network, which saves storage space. Before the weight data and the input data are operated on, the compressed weight data are losslessly decompressed.
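As an illustration of this layer-wise run-length coding, the sketch below encodes a flat list of weights (which, when sparse, contain long runs of zeros) into (value, run_length) pairs and restores it losslessly; the exact on-chip coding format is not specified in the text, so this pair format is an assumption.

def rle_compress(weights):
    """Run-length encode a sequence of weights as (value, run_length) pairs.
    Long runs of zeros in sparse weight data compress particularly well."""
    if not weights:
        return []
    runs = []
    current, count = weights[0], 1
    for w in weights[1:]:
        if w == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = w, 1
    runs.append((current, count))
    return runs

def rle_decompress(runs):
    """Losslessly restore the original weight sequence."""
    return [value for value, count in runs for _ in range(count)]

layer_weights = [0, 0, 0, 3, 3, 0, 0, 0, 0, -1]
encoded = rle_compress(layer_weights)
print(encoded)                              # [(0, 3), (3, 2), (0, 4), (-1, 1)]
assert rle_decompress(encoded) == layer_weights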
As a further improvement of this embodiment, the logic calculation unit further includes a buffer subunit, which is configured to temporarily store the weight data, the input data and the instruction data. The buffer subunit acquires the weight data, the input data and the instruction data through the bus interface, transmits the instruction data to the instruction FIFO subunit, and transmits the weight data and the input data to the sorting subunit.
In this embodiment, the main control chip is a zynqMP chip; the PS side of the zynqMP chip serves as the processor unit and the PL side serves as the logic calculation unit.
Example 2:
as shown in fig. 3, the present embodiment provides an energy-efficient neural network acceleration system, which includes a storage module and the energy-efficient neural network processor disclosed in embodiment 1. The storage module is used for storing the weight data and the result data output by the addition subunit; the weight data may be stored either in compressed or in decompressed form.
The energy-efficient neural network acceleration system provided by the embodiment can realize acceleration of neural network calculation.
Example 3:
as shown in fig. 4, the present embodiment provides an energy-efficient neural network acceleration method, which is implemented based on the energy-efficient neural network acceleration system disclosed in embodiment 2, and includes the following steps:
S100, storing the weight data in an external storage module;
S200, acquiring weight data and input data through the processor unit, and generating instruction data according to the model of the neural network;
S300, acquiring instruction data through the instruction FIFO subunit, realizing FIFO of the instruction data, and activating a proper number of PE calculation subunits and resources of the PE calculation subunits according to the instruction data;
acquiring input data and weight data through the data FIFO subunit, realizing FIFO of the input data and the weight data, and selectively sorting the weight data and the input data through the sorting subunit based on the principle that weight data which are positive numbers are output first, weight data which are negative numbers are output later, and weight data which are zero are not output;
S400, acquiring input data in a data multiplexing mode, performing convolution and pooling operations on the weight data and the input data in a mode of parallel calculation of a plurality of PE calculation subunits, and judging, during the convolution operation, whether to terminate the convolution according to the intermediate result;
S500, acquiring the output data of each PE calculation subunit, performing an addition operation to obtain the final data, and storing the final data in the storage module.
In step S300, the execution steps are preferably:
S310, acquiring instruction data through the instruction FIFO subunit, realizing FIFO of the instruction data, and activating a proper number of PE calculation subunits and resources of the PE calculation subunits according to the instruction data;
S320, acquiring input data and weight data through the data FIFO subunit to realize FIFO of the input data and the weight data;
S330, selecting and sorting the weight data and the input data through the sorting subunit, based on the principle that weight data which are positive numbers are output first, weight data which are negative numbers are output later, and weight data which are zero are not output.
In step S400, the plurality of PE calculation subunits acquire the input data in a data multiplexing mode and perform convolution and pooling operations on the weight data and the input data in a parallel calculation mode. The working mode is: each PE calculation subunit acquires part of the input data and performs convolution and pooling operations on it with the weight data; the PE calculation subunits work synchronously in parallel; after the last convolution operation of the part of the input data currently held by a PE calculation subunit and the weight data is finished, the subunit registers this part of the input data into the adjacent (next) PE calculation subunit, so that the PE calculation subunits share the input data in a data multiplexing mode and the number of data fetches from the outside is reduced.
The PE calculation subunit performs convolution and pooling operations on the weight data and the related input data, including:
S410, acquiring input data in a data multiplexing mode, and performing convolution operation on the weight data and the related input data in a mode of serial calculation of a plurality of convolution calculation micro units;
in the process of carrying out convolution operation in the convolution calculation micro unit, judging whether to terminate the convolution operation in the current convolution calculation micro unit through an activation function, wherein the method comprises the following steps: if the convolution operation of the current M weight data and the related input data in the convolution calculation micro unit is negative, automatically stopping the convolution operation in the convolution calculation micro unit and outputting zero; otherwise, carrying out convolution operation on the N weight data and the related input data and outputting convolution data which is positive number; wherein M < N, N is the total number of weight data in the convolution calculation micro unit;
S420, performing pooling operation on the output data of each convolution calculation micro unit.
In step S410, the plurality of convolution calculation micro units acquire the input data in a data multiplexing mode and perform the convolution operation on the weight data and the input data in a serial calculation mode. The working mode of the convolution calculation micro units is: each convolution calculation micro unit acquires part of the input data for its convolution operation, and the micro units perform their convolution operations serially in sequence; after the convolution operation of the part of the input data currently held by a micro unit and the weight data is finished, the micro unit registers this part of the input data into the adjacent (next) convolution calculation micro unit, so that the convolution calculation micro units share the input data in a data multiplexing mode and the number of data fetches from the outside is reduced.
As a further improvement of this embodiment, before step S100 is executed, i.e. before the weight data are stored in the external storage module, the weight data are compressed by the compression/decompression subunit. The compression step is: according to the hierarchy of the neural network, the weight data are compressed layer by layer with a run-length encoding algorithm, and the compressed weight data are stored in the external storage module. Before step S330 is executed, i.e. before the weight data are selected and sorted, the compressed weight data are losslessly decompressed.
Compressing the weight data in this way saves storage space.
The above embodiments are merely preferred embodiments given to fully illustrate the present invention, and the scope of the present invention is not limited to them. Equivalent substitutions or modifications made by those skilled in the art on the basis of the present invention all fall within the protection scope of the present invention, which is defined by the claims.

Claims (8)

1. An energy-efficient neural network processor is characterized in that the processor is a master control chip comprising an ARM core, and the processor comprises:
the processor unit is used for acquiring input data and weight data and generating instruction data according to a model of the neural network;
the logic calculation unit is electrically connected with the processor unit through a bus interface and comprises an instruction FIFO subunit, a data FIFO subunit, a sequencing subunit, an addition subunit and a plurality of PE calculation subunits, wherein:
the instruction FIFO subunit is used for realizing FIFO of instruction data and activating a proper number of PE calculation subunits and resources of the PE calculation subunits according to the instruction data;
a data FIFO subunit for implementing FIFO of the weight data and the input data;
the sorting subunit is used for outputting the weight data and the input data in sequence based on the principle that the weight data which is positive number is output preferentially, the weight data which is negative number is output later, and the weight data which is zero is not output;
the PE calculation subunit is used for performing convolution operation and pooling operation on the weight data and the input data and judging whether to automatically terminate the convolution operation;
the PE computing subunits are multiple in number, acquire input data in a data multiplexing mode and perform convolution operation and pooling operation on the weight data and the input data in a parallel computing mode;
an addition subunit, configured to perform addition operation on the data output by the plurality of PE calculation subunits;
the PE calculation subunit includes:
the convolution calculation micro units are used for acquiring input data in a data multiplexing mode and carrying out convolution operation on the weight data and the input data in a serial calculation mode;
and the activation function is configured as a relu function and used for judging whether to terminate convolution operation in the convolution calculation micro unit, and the judgment principle is as follows: if the convolution operation of the current M weight data and the related input data in the convolution calculation micro unit is negative, automatically stopping the convolution operation in the convolution calculation micro unit and outputting zero; otherwise, carrying out convolution operation on the N weight data and the related input data and outputting convolution data which is positive; wherein M < N, N is the total number of weight data located in the convolution calculation microcell;
and the pooling layer is used for performing pooling operation on the output data of each convolution calculation micro unit.
2. The energy efficient neural network processor of claim 1, wherein the logic computation unit further comprises:
and the compression/decompression unit is used for compressing the weight data according to the level of the neural network and according to the run-length coding compression algorithm, or performing lossless decompression on the compressed weight data.
3. The energy efficient neural network processor of claim 1, wherein the logic computation unit further comprises:
and the buffer subunit is used for temporarily storing the weight data, the input data and the instruction data.
4. The energy efficient neural network processor of claim 1, wherein the bus interface is an AXI interface that supports DMA data transfer.
5. The neural network processor with high energy efficiency according to claim 1, wherein the main control chip is a zynq chip, a PS end of the zynq chip is used as a processor unit, and a PL end of the zynq chip is used as a logic calculation unit.
6. An energy-efficient neural network acceleration system characterized by comprising:
an energy efficient neural network processor as claimed in any one of claims 1 to 4;
and the storage module is electrically connected with the main control chip and used for storing the weight data and the output data of the addition unit.
7. An energy-efficient neural network acceleration method, characterized by comprising:
storing the weight data in a storage module;
acquiring weight data and input data, and generating instruction data according to a model of the neural network;
acquiring instruction data in an FIFO mode, and activating a proper number of PE calculation subunits and resources of the PE calculation subunits according to the instruction data;
acquiring input data and weight data in an FIFO mode, and selectively sequencing the weight data and the input data based on the principle that weight data which is positive number is preferentially output, weight data which is negative number is output later, and weight data which is zero is not output;
acquiring input data in a data multiplexing mode, performing convolution operation and pooling operation on the weight data and the input data in a mode of parallel calculation of a plurality of PE calculation subunits, and judging whether to terminate the convolution operation according to the result of the convolution operation in the convolution operation process;
acquiring output data of each PE calculation subunit, performing addition operation to obtain final data, and storing the final data in a storage module;
the PE calculation subunit performs convolution and pooling operations on the weight data and the related input data, including:
acquiring input data in a data multiplexing mode, and performing convolution operation on the weight data and the related input data in a serial calculation mode of a plurality of convolution calculation micro units;
in the process of carrying out convolution operation in the convolution calculation micro unit, judging whether to terminate the convolution operation in the current convolution calculation micro unit through an activation function, wherein the method comprises the following steps: if the convolution operation of the current M weight data and the related input data in the convolution calculation micro unit is negative, automatically stopping the convolution operation in the convolution calculation micro unit and outputting zero; otherwise, carrying out convolution operation on the N weight data and the related input data and outputting convolution data which is positive; wherein M < N, N is the total number of weight data located in the convolution calculation microcell;
and performing pooling operation on the output data of each convolution calculation micro unit.
8. The energy-efficient neural network acceleration method of claim 7, characterized in that before the weight data is stored in the external storage module, the weight data is compressed according to the level of the neural network and according to the run length coding compression algorithm, and the compressed weight data is stored in the external storage module;
and before the weight data and the input data are selected and sorted, carrying out lossless decompression on the compressed weight data.
CN201811592475.9A 2018-12-25 2018-12-25 High-energy-efficiency neural network processor, acceleration system and method Active CN109615071B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811592475.9A CN109615071B (en) 2018-12-25 2018-12-25 High-energy-efficiency neural network processor, acceleration system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811592475.9A CN109615071B (en) 2018-12-25 2018-12-25 High-energy-efficiency neural network processor, acceleration system and method

Publications (2)

Publication Number Publication Date
CN109615071A CN109615071A (en) 2019-04-12
CN109615071B (en) 2023-04-18

Family

ID=66011554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811592475.9A Active CN109615071B (en) 2018-12-25 2018-12-25 High-energy-efficiency neural network processor, acceleration system and method

Country Status (1)

Country Link
CN (1) CN109615071B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781896B (en) * 2019-10-17 2022-07-19 暨南大学 Track garbage identification method, cleaning method, system and resource allocation method
CN111027683A (en) * 2019-12-09 2020-04-17 Oppo广东移动通信有限公司 Data processing method, data processing device, storage medium and electronic equipment
CN113627600B (en) * 2020-05-07 2023-12-29 合肥君正科技有限公司 Processing method and system based on convolutional neural network
CN111738427B (en) * 2020-08-14 2020-12-29 电子科技大学 Operation circuit of neural network
WO2024124808A1 (en) * 2022-12-14 2024-06-20 北京登临科技有限公司 Convolution calculation unit, ai operation array, sparse convolution operation method and related device
CN117391148A (en) * 2022-12-14 2024-01-12 北京登临科技有限公司 Convolution calculation unit, AI operation array and related equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator
CN107679621B (en) * 2017-04-19 2020-12-08 赛灵思公司 Artificial neural network processing device

Also Published As

Publication number Publication date
CN109615071A (en) 2019-04-12

Similar Documents

Publication Publication Date Title
CN109615071B (en) High-energy-efficiency neural network processor, acceleration system and method
CN108805267B (en) Data processing method for hardware acceleration of convolutional neural network
CN110991632B (en) Heterogeneous neural network calculation accelerator design method based on FPGA
CN106447034A (en) Neutral network processor based on data compression, design method and chip
CN107027036A (en) A kind of FPGA isomeries accelerate decompression method, the apparatus and system of platform
CN102457283A (en) Data compression and decompression method and equipment
CN109284824B (en) Reconfigurable technology-based device for accelerating convolution and pooling operation
CN114723033B (en) Data processing method, data processing device, AI chip, electronic device and storage medium
CN111880911A (en) Task load scheduling method, device and equipment and readable storage medium
CN110543936B (en) Multi-parallel acceleration method for CNN full-connection layer operation
CN111860773B (en) Processing apparatus and method for information processing
Shahshahani et al. Memory optimization techniques for fpga based cnn implementations
CN108647780B (en) Reconfigurable pooling operation module structure facing neural network and implementation method thereof
CN111008698A (en) Sparse matrix multiplication accelerator for hybrid compressed recurrent neural networks
CN113497627A (en) Data compression and decompression method, device and system
CN116167425A (en) Neural network acceleration method, device, equipment and medium
CN114065923A (en) Compression method, system and accelerating device of convolutional neural network
CN111382853B (en) Data processing device, method, chip and electronic equipment
CN111260046B (en) Operation method, device and related product
CN107831824B (en) Clock signal transmission method and device, multiplexing chip and electronic equipment
CN111679788A (en) NAND memory with auxiliary computing function
CN116187408B (en) Sparse acceleration unit, calculation method and sparse neural network hardware acceleration system
CN115796239B (en) Device for realizing AI algorithm architecture, convolution computing device, and related methods and devices
CN115115018A (en) Acceleration system for long and short memory neural network
CN111260070A (en) Operation method, device and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230327

Address after: 250000 building S02, No. 1036, Langchao Road, high tech Zone, Jinan City, Shandong Province

Applicant after: Shandong Inspur Scientific Research Institute Co.,Ltd.

Address before: 250100 First Floor of R&D Building 2877 Kehang Road, Sun Village Town, Jinan High-tech Zone, Shandong Province

Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co.,Ltd.

GR01 Patent grant