CN112965931A - Digital integration processing method based on CNN cell neural network structure - Google Patents


Info

Publication number
CN112965931A
CN112965931A
Authority
CN
China
Prior art keywords
data
neural network
cnn
method based
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110195846.5A
Other languages
Chinese (zh)
Inventor
蔡群林
周君临
兰军
彭杰
展晓宇
门爱东
黄笑天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Microchip Zhitong Technology Partnership LP
Original Assignee
Beijing Microchip Zhitong Technology Partnership LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Microchip Zhitong Technology Partnership LP filed Critical Beijing Microchip Zhitong Technology Partnership LP
Priority to CN202110195846.5A priority Critical patent/CN112965931A/en
Publication of CN112965931A publication Critical patent/CN112965931A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/161 Computing infrastructure, e.g. computer clusters, blade chassis or hardware partitioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Neurology (AREA)
  • Mathematical Optimization (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a digital integration processing method based on a CNN (cellular neural network) structure. The method supports synchronous, efficient two-dimensional neighborhood matrix calculation and performs synchronous block calculation on every point of the image pixel matrix, enabling the design of a special-purpose chip for efficient image processing. It achieves high efficiency and low power consumption at low cost, supports multiple data structures and data precisions, and supports mainstream protocols and neural network models.

Description

Digital integration processing method based on CNN cell neural network structure
Technical Field
The invention relates to the technical field of digital integration processing, and in particular to a digital integration processing method based on a CNN cellular neural network structure.
Background
Currently, with the rise of artificial intelligence, image recognition is becoming ever more important, and arithmetic units for image processing, an important component of that field, are developing alongside it. From the general-purpose computation of the CPU to the one-dimensional parallel computation of the GPU, special-purpose arithmetic integrated circuits are advancing rapidly. The arithmetic units of existing processors each have their own disadvantages. The CPU, a classical von Neumann design, is a general-purpose processor that accesses memory by addressing through a control module to fetch data for processing, and has complex control logic. Its individual arithmetic units are powerful, but computing power cannot be shared between cores: it is good at handling a single complex computing task, but poor at running a large number of tasks in parallel. The GPU is designed for large throughput; its cache occupies little space and it relies on massive threading. Its arithmetic units are numerous and well suited to large-scale repetitive computation, but it can only perform one-dimensional operations, is poor at two-dimensional operations on neighborhood data, and its operation speed on such workloads is therefore limited. Special-purpose image acceleration processors based on the traditional convolutional neural network are limited by the convolutional layer structure and the data structure: their operation speed is low, their precision is limited, and their learning capability is small. Existing arithmetic integrated circuit units either sacrifice high-speed parallelism for the computing power of a single unit, increase computational throughput at the cost of simplified arithmetic units, or are limited by their structural design.
Therefore, the problems to be solved are: the weak parallel processing capability of traditional-architecture processor computing units; the weak independent computing capability of high-throughput computing units; the limitations imposed by structure and data, which require converting a two-dimensional image into one-dimensional data for multiplication; the resulting deficiency in two-dimensional computing capability; and the high power consumption of computing large images.
Disclosure of Invention
The invention aims to provide a digital integration processing method based on a CNN cellular neural network structure that supports efficient two-dimensional neighborhood matrix calculation and performs synchronous block calculation on every point of the image pixel matrix, enabling the design of a special-purpose chip for efficient image processing. It achieves high efficiency and low power consumption at low cost, supports multiple data structures, and supports mainstream protocols and neural network models.
The invention provides a digital integration processing method based on a CNN cellular neural network structure, which comprises the following steps:
step one: the image is input, in pixel-matrix form, to a processing operation set circuit composed of cell circuit units and collected, and every circuit unit participates simultaneously in processing the image pixels;
step two: each cell circuit unit takes one pixel point in the image as its processing object, and the circuit transmits the object pixel point information, the surrounding pixel point information of the operation unit, and the operation parameters to the circuit module;
step three: a multiplier unit multiplies all pixel data by the operation parameters to obtain optimized pixel data;
step four: structurally adjacent data are pre-added pairwise through three layers of adders and then aggregated, finally yielding the processed pixel data;
step five: the relevant connection points in the neighborhood of the cell circuit unit are operated on respectively, and each operation result, after being latched for one clock beat by a register, is transmitted to the adder unit for calculation;
step six: the operation unit parts correspond respectively to different data widths; structurally, neighborhood processing parameters are added pairwise and then aggregated, and the idea of trading space for time is used to accelerate the multiply-add calculation over multiple bit widths and multiple data items.
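The per-cell data path of steps three and four can be sketched as follows. This is an illustrative software model, not the patented circuit itself; the function name `cell_output` and the row-major neighborhood ordering are assumptions:

```python
# Sketch of the per-cell computation: each of the 9 neighborhood pixels is
# multiplied by its operation parameter (step three), then the 9 products are
# aggregated through a three-layer pairwise adder tree (step four).

def cell_output(pixels, params):
    """pixels, params: 9-element sequences (3x3 neighborhood, row-major)."""
    # Step three: multiplier unit, one product per connection point.
    m = [i * p for i, p in zip(pixels, params)]
    # Step four: three layers of adders, adding structurally adjacent values pairwise.
    layer1 = [m[0] + m[1], m[2] + m[3], m[4] + m[5], m[6] + m[7], m[8]]
    layer2 = [layer1[0] + layer1[1], layer1[2] + layer1[3], layer1[4]]
    return layer2[0] + layer2[1] + layer2[2]  # final aggregation
```

For a 3 × 3 neighborhood this takes nine products, consistent with the 9-multiplier unit described below; the pairwise tree keeps the adder depth at three layers instead of a serial chain of eight additions.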
The further improvement lies in that: the number of the multipliers in the third step is 9, and the multipliers form a multiplication unit and respectively operate the relevant connection points in the neighborhood of the cell.
The further improvement lies in that: in the fourth step, the total number of the adders is 8 to form an addition unit part, and the three addition units are provided from the input and respectively correspond to different data widths and input preprocessing methods. The further improvement lies in that: and the calculation template in the fifth step is a correlation connection matrix relation.
The further improvement lies in that: the data structure of the arithmetic unit in the sixth step contains various formats, supports fixed point and high-precision floating point, and has a self-learning adjustment function.
The beneficial effects of the invention are: the target unit can be block-calculated within its neighborhood range; the operation unit can convolve the matrix directly, without splitting it into one-dimensional operations, achieving direct calculation with flexible control; embodying the multi-core concept, the operation unit can be reused, and power consumption is low after scaling. The structural design of the adders is optimized, effectively improving operation speed for multi-bit data input. Pixel blocks of different sizes (14 × 14 / 20 × 20) can all be processed effectively. The design structure can use a higher-frequency clock, and a special storage structure is added, which effectively reduces timing risk, eliminates load, and reduces the probability of glitches.
Drawings
Fig. 1 is block diagram A of the CNN neural network system of the present invention.
Fig. 2 is a block diagram B of the CNN neural network system of the present invention.
Fig. 3 is circuit diagram A of the CNN neural network of the present invention.
Fig. 4 is an enlarged view of the invention at box 1 of fig. 3.
Fig. 5 is an enlarged view of the invention at box 2 of fig. 3.
Fig. 6 is an enlarged view of the invention at box 3 of fig. 3.
Fig. 7 is an enlarged view of the invention at box 4 of fig. 3.
Fig. 8 is an enlarged view of the invention at box 5 of fig. 3.
Fig. 9 is an enlarged view of the invention at box 6 of fig. 3.
Fig. 10 is an enlarged view of the invention at box 7 of fig. 3.
Fig. 11 is an enlarged view of the invention at box 8 of fig. 3.
Fig. 12 is an enlarged view of the invention at box 9 of fig. 3.
Fig. 13 is a CNN cell neural network circuit diagram B of the present invention.
Fig. 14 is an enlarged view of the invention at box 1 of fig. 13.
Fig. 15 is an enlarged view of the invention at box 2 of fig. 13.
Fig. 16 is a circuit diagram of a multiplier of the present invention.
Fig. 17 is a circuit diagram of adder a of the present invention.
Fig. 18 is a circuit diagram of an adder B of the present invention.
Fig. 19 is a circuit diagram of an adder C of the present invention.
Detailed Description
For the purpose of enhancing understanding of the present invention, the present invention will be further described in detail with reference to the following examples, which are provided for illustration only and are not to be construed as limiting the scope of the present invention.
The system structure block diagram of the CNN cellular neural network operation unit is shown in fig. 1 and fig. 2. Taking image processing as an example, after passing through a cache unit the image is spread across the cell circuit units pixel by pixel, and all cell units in the cell circuit set simultaneously compute, for every corresponding pixel of the image, that pixel together with the pixels in its peripheral neighborhood according to the connection relationship. Looking at each individual cell unit circuit: it takes one pixel point in the image as its processing object, and the circuit transmits the object pixel information, the surrounding pixel information, and the operation parameters to the circuit module. In the first stage, a multiplier unit multiplies all pixel data by the operation weight parameters to obtain optimized pixel data. In the second stage, these data are added pairwise and aggregated through three layers of adders to finally obtain the processed pixel data.
Fig. 3 and 13 are circuit diagrams of the cellular neural network circuit, respectively depicting architectures for different data structures and data precisions. The process is as follows: the target pixel information, the pixel information i1 to i9 of the peripheral neighborhood, and the coefficient data p1 to p9 of the correlation matrix are transmitted to the circuit in parallel and multiplied by the multipliers to obtain the correlation weight values; first-stage accumulation is performed by the first-level adders, the results are stored in registers, then enter the second-level adders for further accumulation, and finally the third-level adder produces the final value cnn_out. The data formats of the i series and the p series are determined by the input data and the parameter settings. In an alternative mode using a selector, cnn_out is allowed to feed back into the aforementioned parameters, and dynamic comparison changes the parameter settings to suit the relevant application. Fig. 3 shows the first type of data structure and data precision. Fig. 4 and 5 show the multiplier operation part, in which the correlation coefficient weights are multiplied so that correlation calculation is performed for all relevant points in the matrix. Figs. 6, 7, 8 and 9 show the adder operations forming the first half of the tree accumulation. Figs. 10 and 11 show the second part of the adder operation, the second half of the tree accumulation. Fig. 12 shows how the final accumulated value is shifted, integrated and optimized, combining the parameters and pixel data to obtain the characteristic value. Fig. 13 shows the second type of data structure and data precision. Figs. 14 and 15 show the multiply-add operation and the comprehensive shift operation, respectively.
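The optional feedback path through the selector might be modeled as below. The patent does not specify how the dynamic comparison adjusts the parameters, so the comparison rule, the step size, and the name `feedback_update` are assumptions for illustration:

```python
# Hypothetical model of the selector feedback mode: cnn_out is compared with a
# reference value, and the parameter settings p are nudged accordingly.

def feedback_update(i_vals, p_vals, reference, step=1):
    """i_vals: neighborhood pixel data i1..i9; p_vals: coefficients p1..p9."""
    cnn_out = sum(i * p for i, p in zip(i_vals, p_vals))  # multiply-accumulate path
    if cnn_out > reference:
        p_vals = [p - step for p in p_vals]   # damp the response
    elif cnn_out < reference:
        p_vals = [p + step for p in p_vals]   # strengthen the response
    return cnn_out, p_vals                    # updated parameters feed the next pass
```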
Fig. 16 shows a multiplier in the cellular neural network circuit, used to multiply the filter parameter by the pixel data; the whole multiplication unit is composed of 9 such multiplier structures, so that all relevant connection points in the cell neighborhood are operated on. In the multiplication circuit structure, the pixel data i and the parameter data p are expanded, the operation mode encoded in the parameter data is selected, the two operands are then bit-extended according to the selected parameter structure, and the multiplication is performed to obtain the output value m. The result of the multiplication unit, after being latched for one clock beat by a register, is transmitted to the adder unit for calculation.
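The bit-extension step before multiplication might look like this in software. The 8-bit widths and the helper names are assumptions, not taken from the patent:

```python
# Hypothetical sketch of the multiplier: pixel data i and parameter p are
# sign-extended to their declared widths before the product m is formed.

def fixed_mul(i, p, in_bits=8, param_bits=8):
    def sign_extend(v, bits):
        mask = (1 << bits) - 1
        v &= mask                      # keep only the declared bit width
        return v - (1 << bits) if v & (1 << (bits - 1)) else v
    # Output value m, which a register would latch before the adder unit.
    return sign_extend(i, in_bits) * sign_extend(p, param_bits)
```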
Figs. 17, 18 and 19 show the adders in the cellular neural network arithmetic unit. There are 8 adders in total forming the addition units; viewed from the input there are two types of addition unit, corresponding to different data widths. Structurally, the neighborhood processing parameters are added pairwise and then aggregated, and the idea of trading space for time is used to accelerate the multi-bit-width, multi-data addition. The addition circuit takes two data of different structures, m_1 and m_2, as input, and takes structure selection signals m_1_e, m_2_e, m_1_s and m_2_s as control input; according to the selected parameter model it synthesizes mixed control signals, performing time-shared input control of the two data and forming a shift-add. Finally, the target data are output in a time-shared manner through a shift register to obtain the superposed values sum_m and sum_e.
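The shift-add that aligns two differently scaled values before producing sum_m and sum_e can be illustrated as below. The (mantissa, exponent) interpretation of the m/e pair is an assumption made for this sketch; the actual encoding in the circuit may differ:

```python
# Hypothetical shift-add: each value is (m, e) representing m * 2**e. The
# smaller-exponent mantissa is shifted right to align scales, then added.

def shift_add(m1, e1, m2, e2):
    if e1 < e2:                          # ensure (m1, e1) has the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    sum_m = m1 + (m2 >> (e1 - e2))       # align by shifting, then add mantissas
    sum_e = e1                           # result carries the larger exponent
    return sum_m, sum_e
```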
The design principle of the whole process is that, using the correlation connection matrix relation as a template and the pixel values of all points in the neighborhood of each target point, matrix convolution is carried out directly and synchronously over the whole image according to specific model parameters, thereby realizing image (pixel) processing according to a set parameter model.
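The overall principle, in which every cell applies the same correlation connection matrix to its neighborhood, corresponds to a 3 × 3 convolution over the whole image. The sketch below is a minimal software emulation; zero-padding at the borders is an assumption, and the nested loops merely emulate what the hardware cells compute synchronously:

```python
# Whole-image emulation: every output pixel is the template-weighted sum of
# its 3x3 neighborhood, mirroring the synchronous per-cell circuit.

def cnn_process(image, template):
    """image: list of rows of pixel values; template: 3x3 correlation matrix."""
    h, w = len(image), len(image[0])
    def px(r, c):                         # assumed zero-padding outside the image
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0
    out = [[0] * w for _ in range(h)]
    for r in range(h):                    # in hardware all cells run at once;
        for c in range(w):                # these loops only emulate that synchrony
            out[r][c] = sum(px(r + dr, c + dc) * template[dr + 1][dc + 1]
                            for dr in (-1, 0, 1) for dc in (-1, 0, 1))
    return out
```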

Claims (6)

1. A digital integration processing method based on a CNN cellular neural network structure, characterized in that the method comprises the following steps:
step one: the image is input, in pixel-matrix form, to a processing operation set circuit composed of cell circuit units and collected, and every circuit unit participates simultaneously in processing the image pixels;
step two: each cell circuit unit takes one pixel point in the image as its processing object, and the circuit transmits the object pixel point information, the surrounding pixel point information of the operation unit, and the operation parameters to the circuit module;
step three: a multiplier unit multiplies all pixel data by the operation parameters to obtain optimized pixel data;
step four: structurally adjacent data are pre-added pairwise through three layers of adders and then aggregated, finally yielding the processed pixel data;
step five: the relevant connection points in the neighborhood of the cell circuit unit are operated on respectively, and each operation result, after being latched for one clock beat by a register, is transmitted to the adder unit for calculation;
step six: the operation unit parts correspond respectively to different data widths and to different data structure and data precision types; structurally, neighborhood processing parameters are added pairwise and then aggregated, and the idea of trading space for time is used to accelerate the multiply-add calculation over multiple bit widths and multiple data items.
2. The digital integration processing method based on a CNN cellular neural network structure as claimed in claim 1, wherein: the number of multipliers in step three is 9; they form a multiplication unit and respectively operate on the relevant connection points in the cell neighborhood.
3. The digital integration processing method based on a CNN cellular neural network structure as claimed in claim 1, wherein: in step four, 8 adders in total form the adder unit part; viewed from the input there are three kinds of addition units, corresponding respectively to different data widths and input preprocessing methods.
4. The digital integration processing method based on a CNN cellular neural network structure as claimed in claim 3, wherein: the connection method of the adders adopts locally adjacent pre-addition followed by final aggregation.
5. The digital integration processing method based on a CNN cellular neural network structure as claimed in claim 1, wherein: the calculation template in step five is the correlation connection matrix relation.
6. The digital integration processing method based on a CNN cellular neural network structure as claimed in claim 1, wherein: the multiple data types in step six comprise fixed point and floating point; the fixed point supports mainstream input image formats and models; the floating point offers high precision and a learning, adjustment and feedback function.
CN202110195846.5A 2021-02-22 2021-02-22 Digital integration processing method based on CNN cell neural network structure Pending CN112965931A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110195846.5A CN112965931A (en) 2021-02-22 2021-02-22 Digital integration processing method based on CNN cell neural network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110195846.5A CN112965931A (en) 2021-02-22 2021-02-22 Digital integration processing method based on CNN cell neural network structure

Publications (1)

Publication Number Publication Date
CN112965931A 2021-06-15

Family

ID=76285404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110195846.5A Pending CN112965931A (en) 2021-02-22 2021-02-22 Digital integration processing method based on CNN cell neural network structure

Country Status (1)

Country Link
CN (1) CN112965931A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940815A (en) * 2017-02-13 2017-07-11 西安交通大学 A kind of programmable convolutional neural networks Crypto Coprocessor IP Core
US9940534B1 (en) * 2016-10-10 2018-04-10 Gyrfalcon Technology, Inc. Digital integrated circuit for extracting features out of an input image based on cellular neural networks
CN110033086A (en) * 2019-04-15 2019-07-19 北京异构智能科技有限公司 Hardware accelerator for neural network convolution algorithm
CN110780845A (en) * 2019-10-17 2020-02-11 浙江大学 Configurable approximate multiplier for quantization convolutional neural network and implementation method thereof
CN110807522A (en) * 2019-10-31 2020-02-18 合肥工业大学 General calculation circuit of neural network accelerator
CN111178519A (en) * 2019-12-27 2020-05-19 华中科技大学 Convolutional neural network acceleration engine, convolutional neural network acceleration system and method
CN111832719A (en) * 2020-07-28 2020-10-27 电子科技大学 Fixed point quantization convolution neural network accelerator calculation circuit
CN112308217A (en) * 2019-07-31 2021-02-02 北京欣奕华科技有限公司 Convolutional neural network acceleration method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination