CN112965931A - Digital integration processing method based on CNN cell neural network structure - Google Patents


Info

Publication number
CN112965931A
CN112965931A
Authority
CN
China
Prior art keywords
data
neural network
cnn
method based
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110195846.5A
Other languages
Chinese (zh)
Inventor
蔡群林
周君临
兰军
彭杰
展晓宇
门爱东
黄笑天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Microchip Zhitong Technology Partnership LP
Original Assignee
Beijing Microchip Zhitong Technology Partnership LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Microchip Zhitong Technology Partnership LP filed Critical Beijing Microchip Zhitong Technology Partnership LP
Priority to CN202110195846.5A priority Critical patent/CN112965931A/en
Publication of CN112965931A publication Critical patent/CN112965931A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/161 Computing infrastructure, e.g. computer clusters, blade chassis or hardware partitioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Neurology (AREA)
  • Mathematical Optimization (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a digital integration processing method based on a CNN (cellular neural network) structure. The method supports synchronous, efficient two-dimensional neighborhood matrix calculation and performs synchronous block calculation on every point of the image pixel matrix, enabling the design of a special-purpose chip for efficient image processing. It achieves high efficiency and low power consumption at low cost, supports multiple data structures and data precisions, and supports mainstream protocols and neural network models.

Description

Digital integration processing method based on CNN cell neural network structure
Technical Field
The invention relates to the technical field of digital integration processing, and in particular to a digital integration processing method based on a CNN cellular neural network structure.
Background
Currently, with the rise of artificial intelligence, image recognition is becoming ever more important, and arithmetic units for image processing, an important component of that field, are developing alongside it. From the general-purpose computation of the CPU to the one-dimensional parallel computation of the GPU, special-purpose arithmetic integrated circuits are advancing rapidly. The arithmetic units of existing processors each have their own disadvantages. The CPU, a classical von Neumann design, is a general-purpose processor that accesses memory by addressing through a control module to fetch data for processing, and has complex control logic. Its individual arithmetic units are powerful, but computing power cannot be shared between cores: it is good at handling a single complex computing task, but poor at running a large number of tasks in parallel. The GPU is designed for large throughput; its cache occupies little space and it relies on massive threading. Its arithmetic units are numerous and well suited to large-scale repetitive computation, but it can only perform one-dimensional operations, is poor at two-dimensional operations on neighborhood data, and its operation speed on such workloads is therefore limited. Special-purpose image acceleration processors based on the traditional convolutional neural network are limited by the convolutional layer structure and the data structure: their operation speed is low, their precision is limited, and their learning capability is small. Existing arithmetic integrated circuit units either sacrifice high-speed parallelism for the computing power of a single unit, increase computational throughput at the cost of simplified arithmetic units, or are limited by their structural design.
Therefore, the problems to be solved are: the weak parallel processing capability of traditional-architecture processor computing units; the weak independent computing capability of high-throughput computing units; the limitations imposed by structure and data, which require converting a two-dimensional image into one-dimensional data for multiplication; the resulting deficiency in two-dimensional computing capability; and the high power consumption of computing large images.
Disclosure of Invention
The invention aims to provide a digital integration processing method based on a CNN cellular neural network structure that supports efficient two-dimensional neighborhood matrix calculation and performs synchronous block calculation on every point of the image pixel matrix, enabling the design of a special-purpose chip for efficient image processing. It achieves high efficiency and low power consumption at low cost, supports multiple data structures, and supports mainstream protocols and neural network models.
The invention provides a digital integration processing method based on a CNN cellular neural network structure, which comprises the following steps:
step one: the image is input, in pixel-matrix form, to a processing operation set circuit composed of cell circuit units and collected, and every circuit unit participates simultaneously in processing the image pixels;
step two: each cell circuit unit takes one pixel point in the image as its processing object, and the circuit transmits the object pixel point information, the surrounding pixel point information of the operation unit, and the operation parameters to the circuit module;
step three: a multiplier unit multiplies all pixel data by the operation parameters to obtain optimized pixel data;
step four: structurally adjacent data are pre-added pairwise through three layers of adders and then aggregated, finally yielding the processed pixel data;
step five: the relevant connection points in the neighborhood of the cell circuit unit are operated on respectively, and each operation result, after being latched for one clock beat by a register, is transmitted to the adder unit for calculation;
step six: the operation unit parts correspond respectively to different data widths; structurally, neighborhood processing parameters are added pairwise and then aggregated, and the idea of trading space for time is used to accelerate the multiply-add calculation over multiple bit widths and multiple data items.
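The per-cell data path of steps three and four can be sketched as follows. This is an illustrative software model, not the patented circuit itself; the function name `cell_output` and the row-major neighborhood ordering are assumptions:

```python
# Sketch of the per-cell computation: each of the 9 neighborhood pixels is
# multiplied by its operation parameter (step three), then the 9 products are
# aggregated through a three-layer pairwise adder tree (step four).

def cell_output(pixels, params):
    """pixels, params: 9-element sequences (3x3 neighborhood, row-major)."""
    # Step three: multiplier unit, one product per connection point.
    m = [i * p for i, p in zip(pixels, params)]
    # Step four: three layers of adders, adding structurally adjacent values pairwise.
    layer1 = [m[0] + m[1], m[2] + m[3], m[4] + m[5], m[6] + m[7], m[8]]
    layer2 = [layer1[0] + layer1[1], layer1[2] + layer1[3], layer1[4]]
    return layer2[0] + layer2[1] + layer2[2]  # final aggregation
```

For a 3 × 3 neighborhood this takes nine products, consistent with the 9-multiplier unit described below; the pairwise tree keeps the adder depth at three layers instead of a serial chain of eight additions.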
The further improvement lies in that: the number of the multipliers in the third step is 9, and the multipliers form a multiplication unit and respectively operate the relevant connection points in the neighborhood of the cell.
The further improvement lies in that: in the fourth step, the total number of the adders is 8 to form an addition unit part, and the three addition units are provided from the input and respectively correspond to different data widths and input preprocessing methods. The further improvement lies in that: and the calculation template in the fifth step is a correlation connection matrix relation.
The further improvement lies in that: the data structure of the arithmetic unit in the sixth step contains various formats, supports fixed point and high-precision floating point, and has a self-learning adjustment function.
The beneficial effects of the invention are: the target unit can be block-calculated within its neighborhood range; the operation unit can convolve the matrix directly, without splitting it into one-dimensional operations, achieving direct calculation with flexible control; embodying the multi-core concept, the operation unit can be reused, and power consumption is low after scaling. The structural design of the adders is optimized, effectively improving operation speed for multi-bit data input. Pixel blocks of different sizes (14 × 14 / 20 × 20) can all be processed effectively. The design structure can use a higher-frequency clock, and a special storage structure is added, which effectively reduces timing risk, eliminates load, and reduces the probability of glitches.
Drawings
Fig. 1 is block diagram A of the CNN neural network system of the present invention.
Fig. 2 is a block diagram B of the CNN neural network system of the present invention.
Fig. 3 is circuit diagram A of the CNN neural network of the present invention.
Fig. 4 is an enlarged view of the invention at box 1 of fig. 3.
Fig. 5 is an enlarged view of the invention at box 2 of fig. 3.
Fig. 6 is an enlarged view of the invention at box 3 of fig. 3.
Fig. 7 is an enlarged view of the invention at box 4 of fig. 3.
Fig. 8 is an enlarged view of the invention at box 5 of fig. 3.
Fig. 9 is an enlarged view of the invention at box 6 of fig. 3.
Fig. 10 is an enlarged view of the invention at box 7 of fig. 3.
Fig. 11 is an enlarged view of the invention at box 8 of fig. 3.
Fig. 12 is an enlarged view of the invention at box 9 of fig. 3.
Fig. 13 is a CNN cell neural network circuit diagram B of the present invention.
Fig. 14 is an enlarged view of the invention at box 1 of fig. 13.
Fig. 15 is an enlarged view of the invention at box 2 of fig. 13.
Fig. 16 is a circuit diagram of a multiplier of the present invention.
Fig. 17 is a circuit diagram of adder a of the present invention.
Fig. 18 is a circuit diagram of an adder B of the present invention.
Fig. 19 is a circuit diagram of an adder C of the present invention.
Detailed Description
For the purpose of enhancing understanding of the present invention, the present invention will be further described in detail with reference to the following examples, which are provided for illustration only and are not to be construed as limiting the scope of the present invention.
The system structure block diagram of the CNN cellular neural network operation unit is shown in fig. 1 and fig. 2. Taking image processing as an example, after passing through a cache unit the image is spread across the cell circuit units pixel by pixel, and all cell units in the cell circuit set simultaneously compute, for every corresponding pixel of the image, that pixel together with the pixels in its peripheral neighborhood according to the connection relationship. Looking at each individual cell unit circuit: it takes one pixel point in the image as its processing object, and the circuit transmits the object pixel information, the surrounding pixel information, and the operation parameters to the circuit module. In the first stage, a multiplier unit multiplies all pixel data by the operation weight parameters to obtain optimized pixel data. In the second stage, these data are added pairwise and aggregated through three layers of adders to finally obtain the processed pixel data.
Fig. 3 and 13 are circuit diagrams of the cellular neural network circuit, respectively depicting architectures for different data structures and data precisions. The process is as follows: the target pixel information, the pixel information i1 to i9 of the peripheral neighborhood, and the coefficient data p1 to p9 of the correlation matrix are transmitted to the circuit in parallel and multiplied by the multipliers to obtain the correlation weight values; first-stage accumulation is performed by the first-level adders, the results are stored in registers, then enter the second-level adders for further accumulation, and finally the third-level adder produces the final value cnn_out. The data formats of the i series and the p series are determined by the input data and the parameter settings. In an alternative mode using a selector, cnn_out is allowed to feed back into the aforementioned parameters, and dynamic comparison changes the parameter settings to suit the relevant application. Fig. 3 shows the first type of data structure and data precision. Fig. 4 and 5 show the multiplier operation part, in which the correlation coefficient weights are multiplied so that correlation calculation is performed for all relevant points in the matrix. Figs. 6, 7, 8 and 9 show the adder operations forming the first half of the tree accumulation. Figs. 10 and 11 show the second part of the adder operation, the second half of the tree accumulation. Fig. 12 shows how the final accumulated value is shifted, integrated and optimized, combining the parameters and pixel data to obtain the characteristic value. Fig. 13 shows the second type of data structure and data precision. Figs. 14 and 15 show the multiply-add operation and the comprehensive shift operation, respectively.
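The optional feedback path through the selector might be modeled as below. The patent does not specify how the dynamic comparison adjusts the parameters, so the comparison rule, the step size, and the name `feedback_update` are assumptions for illustration:

```python
# Hypothetical model of the selector feedback mode: cnn_out is compared with a
# reference value, and the parameter settings p are nudged accordingly.

def feedback_update(i_vals, p_vals, reference, step=1):
    """i_vals: neighborhood pixel data i1..i9; p_vals: coefficients p1..p9."""
    cnn_out = sum(i * p for i, p in zip(i_vals, p_vals))  # multiply-accumulate path
    if cnn_out > reference:
        p_vals = [p - step for p in p_vals]   # damp the response
    elif cnn_out < reference:
        p_vals = [p + step for p in p_vals]   # strengthen the response
    return cnn_out, p_vals                    # updated parameters feed the next pass
```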
Fig. 16 shows a multiplier in the cellular neural network circuit, used to multiply the filter parameter by the pixel data; the whole multiplication unit is composed of 9 such multiplier structures, so that all relevant connection points in the cell neighborhood are operated on. In the multiplication circuit structure, the pixel data i and the parameter data p are expanded, the operation mode encoded in the parameter data is selected, the two operands are then bit-extended according to the selected parameter structure, and the multiplication is performed to obtain the output value m. The result of the multiplication unit, after being latched for one clock beat by a register, is transmitted to the adder unit for calculation.
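The bit-extension step before multiplication might look like this in software. The 8-bit widths and the helper names are assumptions, not taken from the patent:

```python
# Hypothetical sketch of the multiplier: pixel data i and parameter p are
# sign-extended to their declared widths before the product m is formed.

def fixed_mul(i, p, in_bits=8, param_bits=8):
    def sign_extend(v, bits):
        mask = (1 << bits) - 1
        v &= mask                      # keep only the declared bit width
        return v - (1 << bits) if v & (1 << (bits - 1)) else v
    # Output value m, which a register would latch before the adder unit.
    return sign_extend(i, in_bits) * sign_extend(p, param_bits)
```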
Figs. 17, 18 and 19 show the adders in the cellular neural network arithmetic unit. There are 8 adders in total forming the addition units; viewed from the input there are two types of addition unit, corresponding to different data widths. Structurally, the neighborhood processing parameters are added pairwise and then aggregated, and the idea of trading space for time is used to accelerate the multi-bit-width, multi-data addition. The addition circuit takes two data of different structures, m_1 and m_2, as input, and takes structure selection signals m_1_e, m_2_e, m_1_s and m_2_s as control input; according to the selected parameter model it synthesizes mixed control signals, performing time-shared input control of the two data and forming a shift-add. Finally, the target data are output in a time-shared manner through a shift register to obtain the superposed values sum_m and sum_e.
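The shift-add that aligns two differently scaled values before producing sum_m and sum_e can be illustrated as below. The (mantissa, exponent) interpretation of the m/e pair is an assumption made for this sketch; the actual encoding in the circuit may differ:

```python
# Hypothetical shift-add: each value is (m, e) representing m * 2**e. The
# smaller-exponent mantissa is shifted right to align scales, then added.

def shift_add(m1, e1, m2, e2):
    if e1 < e2:                          # ensure (m1, e1) has the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    sum_m = m1 + (m2 >> (e1 - e2))       # align by shifting, then add mantissas
    sum_e = e1                           # result carries the larger exponent
    return sum_m, sum_e
```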
The design principle of the whole process is that, using the correlation connection matrix relation as a template and the pixel values of all points in the neighborhood of each target point, matrix convolution is carried out directly and synchronously over the whole image according to specific model parameters, thereby realizing image (pixel) processing according to a set parameter model.
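The overall principle, in which every cell applies the same correlation connection matrix to its neighborhood, corresponds to a 3 × 3 convolution over the whole image. The sketch below is a minimal software emulation; zero-padding at the borders is an assumption, and the nested loops merely emulate what the hardware cells compute synchronously:

```python
# Whole-image emulation: every output pixel is the template-weighted sum of
# its 3x3 neighborhood, mirroring the synchronous per-cell circuit.

def cnn_process(image, template):
    """image: list of rows of pixel values; template: 3x3 correlation matrix."""
    h, w = len(image), len(image[0])
    def px(r, c):                         # assumed zero-padding outside the image
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0
    out = [[0] * w for _ in range(h)]
    for r in range(h):                    # in hardware all cells run at once;
        for c in range(w):                # these loops only emulate that synchrony
            out[r][c] = sum(px(r + dr, c + dc) * template[dr + 1][dc + 1]
                            for dr in (-1, 0, 1) for dc in (-1, 0, 1))
    return out
```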

Claims (6)

1. A digital integration processing method based on a CNN cellular neural network structure, characterized in that the method comprises the following steps:
step one: the image is input, in pixel-matrix form, to a processing operation set circuit composed of cell circuit units and collected, and every circuit unit participates simultaneously in processing the image pixels;
step two: each cell circuit unit takes one pixel point in the image as its processing object, and the circuit transmits the object pixel point information, the surrounding pixel point information of the operation unit, and the operation parameters to the circuit module;
step three: a multiplier unit multiplies all pixel data by the operation parameters to obtain optimized pixel data;
step four: structurally adjacent data are pre-added pairwise through three layers of adders and then aggregated, finally yielding the processed pixel data;
step five: the relevant connection points in the neighborhood of the cell circuit unit are operated on respectively, and each operation result, after being latched for one clock beat by a register, is transmitted to the adder unit for calculation;
step six: the operation unit parts correspond respectively to different data widths and to different data structure and data precision types; structurally, neighborhood processing parameters are added pairwise and then aggregated, and the idea of trading space for time is used to accelerate the multiply-add calculation over multiple bit widths and multiple data items.
2. The digital integration processing method based on a CNN cellular neural network structure as claimed in claim 1, wherein: the number of multipliers in step three is 9; they form a multiplication unit and respectively operate on the relevant connection points in the cell neighborhood.
3. The digital integration processing method based on a CNN cellular neural network structure as claimed in claim 1, wherein: in step four, 8 adders in total form the adder unit part; viewed from the input there are three kinds of addition units, corresponding respectively to different data widths and input preprocessing methods.
4. The digital integration processing method based on a CNN cellular neural network structure as claimed in claim 3, wherein: the connection method of the adders adopts locally adjacent pre-addition followed by final aggregation.
5. The digital integration processing method based on a CNN cellular neural network structure as claimed in claim 1, wherein: the calculation template in step five is the correlation connection matrix relation.
6. The digital integration processing method based on a CNN cellular neural network structure as claimed in claim 1, wherein: the multiple data types in step six comprise fixed point and floating point; the fixed point supports mainstream input image formats and models; the floating point offers high precision and a learning, adjustment and feedback function.
CN202110195846.5A 2021-02-22 2021-02-22 Digital integration processing method based on CNN cell neural network structure Pending CN112965931A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110195846.5A CN112965931A (en) 2021-02-22 2021-02-22 Digital integration processing method based on CNN cell neural network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110195846.5A CN112965931A (en) 2021-02-22 2021-02-22 Digital integration processing method based on CNN cell neural network structure

Publications (1)

Publication Number Publication Date
CN112965931A 2021-06-15

Family

ID=76285404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110195846.5A Pending CN112965931A (en) 2021-02-22 2021-02-22 Digital integration processing method based on CNN cell neural network structure

Country Status (1)

Country Link
CN (1) CN112965931A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940815A (en) * 2017-02-13 2017-07-11 西安交通大学 A kind of programmable convolutional neural networks Crypto Coprocessor IP Core
US9940534B1 (en) * 2016-10-10 2018-04-10 Gyrfalcon Technology, Inc. Digital integrated circuit for extracting features out of an input image based on cellular neural networks
CN110033086A (en) * 2019-04-15 2019-07-19 北京异构智能科技有限公司 Hardware accelerator for neural network convolution algorithm
CN110780845A (en) * 2019-10-17 2020-02-11 浙江大学 Configurable approximate multiplier for quantization convolutional neural network and implementation method thereof
CN110807522A (en) * 2019-10-31 2020-02-18 合肥工业大学 General calculation circuit of neural network accelerator
CN111178519A (en) * 2019-12-27 2020-05-19 华中科技大学 Convolutional neural network acceleration engine, convolutional neural network acceleration system and method
CN111832719A (en) * 2020-07-28 2020-10-27 电子科技大学 Fixed point quantization convolution neural network accelerator calculation circuit
CN112308217A (en) * 2019-07-31 2021-02-02 北京欣奕华科技有限公司 Convolutional neural network acceleration method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination