CN114120082A

CN114120082A - Image acceleration convolution calculation method, system, equipment and readable storage medium

Info

Publication number: CN114120082A
Application number: CN202111393744.0A
Authority: CN
Inventors: 杨柯; 吴新春; 孙彪; 朱书霖; 成鑫才; 李德鑫
Original assignee: Ningbo Handa Information Technology Co ltd; Southwest Jiaotong University
Current assignee: Ningbo Handa Information Technology Co ltd; Southwest Jiaotong University
Priority date: 2021-11-23
Filing date: 2021-11-23
Publication date: 2022-03-01

Abstract

The invention discloses an image acceleration convolution calculation method, system, equipment and readable storage medium. The method comprises the following steps: step S1: acquiring original pixel data through a camera and converting the original pixel data into an m × n first matrix, wherein each matrix point corresponds to one pixel point in the original pixel data; step S2: outputting the first matrix to a FIFO module and performing first-in first-out sequencing to obtain a second matrix; step S3: outputting the second matrix to a read control module and zero-padding the second matrix to obtain a matrix to be multiplied; step S4: outputting the matrix to be multiplied to an operation module and performing a convolution operation on the matrix to be multiplied and a convolution kernel matrix. The method speeds up the convolution calculation and accelerates the convolutional neural network on an FPGA, whose high speed and parallelism make it well suited to hardware acceleration of neural networks.

Description

Image acceleration convolution calculation method, system, equipment and readable storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a method, a system, a device, and a readable storage medium for image acceleration convolution calculation.

Background

With the development of artificial intelligence, machine learning has spread to many fields and is applied in various industries, including medical treatment and security. Deep learning, the leading branch of machine learning, has also developed rapidly in recent years. The convolutional neural network is an algorithm model widely used in deep learning; it has unique advantages in image processing and usually serves as the backbone of image feature extraction models. However, as convolutional neural networks grow more complex and the volume of image data becomes huge while computing resources remain limited, image feature processing becomes slow.

Therefore, there is a need for a convolution calculation method, system, device and readable storage medium capable of speeding up image processing.

Disclosure of Invention

To solve the existing problems, the invention provides an image acceleration convolution calculation method, system, equipment and readable storage medium. The method speeds up the convolution calculation and accelerates the convolutional neural network on an FPGA, whose high speed and parallelism make it well suited to hardware acceleration of neural networks, thereby solving the problem of low image processing speed in the prior art.

In a first aspect, the present invention provides an image acceleration convolution calculation method, including the following steps: step S1: acquiring original pixel data through a camera and converting the original pixel data into an m × n first matrix, wherein each matrix point corresponds to one pixel point in the original pixel data; step S2: outputting the first matrix to a FIFO module and performing first-in first-out sequencing to obtain a second matrix; step S3: outputting the second matrix to a read control module and zero-padding the second matrix to obtain a matrix to be multiplied; step S4: outputting the matrix to be multiplied to an operation module and performing a convolution operation on the matrix to be multiplied and a convolution kernel matrix. The method speeds up the convolution calculation and accelerates the convolutional neural network on an FPGA.

In some embodiments of the present application, in step S1, one pixel value is transmitted per pixel synchronization clock cycle, in order from left to right and from top to bottom.

In some embodiments of the present application, in step S2, the FIFO module includes a read-write address control circuit and a dual-port RAM; the width of the FIFO is the bit width of the original pixel data, and its depth is twice the lateral resolution of the original pixel data.

In some embodiments of the present application, step S3 further includes: S31: when the FIFO module is detected to be not empty, outputting one 0 value; S32: sending n read requests, outputting the received data, and then outputting two 0 values; S33: repeating step S32 (m-1) times; S34: sending n read requests, outputting the received data, and then outputting one 0 value.

In some embodiments of the present application, once the read control module detects that the FIFO module is not empty, it either sends a 0 value or sends a read request to the FIFO module and forwards the pixel data that the FIFO module outputs. Each time a read request is sent to the FIFO module, a counter is incremented by one; the count value is returned to the read control module, which uses it to decide whether to output a 0 value or to send a read request.

In some embodiments of the present application, the read control module only performs zero padding before and after each row of the second matrix.

In some embodiments of the present application, the operation module includes a single-port read-only memory (ROM) and shift registers. The depth of the single-port ROM is 9, and its initial values are the convolution parameters S1-S9 of the convolution kernel matrix. The shift registers include a first, a second, a third, a fourth and a fifth shift register. The width of the first and second shift registers is the pixel data bit width, and their depth is n + 2; the width of the third, fourth and fifth shift registers is the pixel data bit width, and their depth is 3. The matrix to be multiplied first enters the first shift register and the fifth shift register; the output end of the first shift register is connected to the input ends of the second and third shift registers, and the output end of the second shift register is connected to the input end of the fourth shift register.

In a second aspect, an image acceleration convolution computing system is further provided, which includes a camera, configured to collect original pixel data and convert the original pixel data into a first matrix of m × n, where one matrix point corresponds to one pixel point in the original pixel data; the FIFO module converts the first matrix into a second matrix through a first-in first-out sequence; the reading control module is used for zero padding the second matrix to obtain a matrix to be multiplied; and the operation module is used for performing convolution operation on the to-be-multiplied matrix and the convolution kernel matrix.

In a third aspect, there is also provided an image accelerated convolution computing apparatus, including: a memory for storing a computer program; a processor for implementing the steps of the image accelerated convolution calculation method as described above when executing the computer program.

In a fourth aspect, a readable storage medium is also provided, the readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the image accelerated convolution calculation method as described above.

The beneficial effects of the invention are as follows. The invention provides an image acceleration convolution calculation method, system, equipment and readable storage medium, in which the method comprises the following steps: step S1: acquiring original pixel data through a camera and converting the original pixel data into an m × n first matrix, wherein each matrix point corresponds to one pixel point in the original pixel data; step S2: outputting the first matrix to a FIFO module and performing first-in first-out sequencing to obtain a second matrix; step S3: outputting the second matrix to a read control module and zero-padding the second matrix to obtain a matrix to be multiplied; step S4: outputting the matrix to be multiplied to an operation module and performing a convolution operation on the matrix to be multiplied and a convolution kernel matrix. The method speeds up the convolution calculation and accelerates the convolutional neural network on an FPGA, whose high speed and parallelism make it well suited to hardware acceleration of neural networks.

Drawings

FIG. 1 is a diagram of an original pixel matrix of the present invention;

FIG. 2 is a diagram of a zero-padding matrix of the present invention;

FIG. 3 is a convolution kernel matrix diagram of the present invention;

FIG. 4 is a schematic of the convolution of the present invention;

FIG. 5 is a block diagram of the system of the present invention;

FIG. 6 is a circuit diagram of the FIFO module according to the present invention;

FIG. 7 is a circuit diagram of a read control module according to the present invention;

FIG. 8 is a flow chart of a read control module system according to the present invention;

FIG. 9 is a diagram of a multiplication matrix according to the present invention;

FIG. 10 is a block diagram of the operational module of the present invention;

FIG. 11 is a Shift RAM data flow diagram of the present invention;

FIG. 12 is a sub-matrix diagram of the present invention.

Detailed Description

The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments, not all embodiments, of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and for simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and therefore, should not be considered as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or including indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.

In the present application, the word "exemplary" is used to mean "serving as an example, instance, or illustration." Any embodiment described herein as exemplary is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the invention. In the following description, details are set forth for the purpose of explanation. It will be apparent to one of ordinary skill in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and processes are not shown in detail to avoid obscuring the description of the invention with unnecessary detail. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles disclosed herein.

At present, to obtain an output image of the same size as the image before the convolution calculation, the original image is usually extended at its periphery: a row or column of zero elements is added on each side of the original image before the convolution operation is performed, so that the output image has the same size as the original image and no original image information is lost. An image with a resolution of m × n is in fact an m × n matrix; FIG. 1 shows this original pixel matrix. To keep the output image the same size as the input image after the convolution operation, the original image is not convolved directly; instead, a ring of zero values is added around it to obtain the zero-padding matrix shown in FIG. 2. FIG. 3 shows a convolution kernel matrix with a kernel size of 3 × 3, that is, a 3 × 3 matrix whose values are the convolution parameters. Convolving the zero-padding matrix with the convolution kernel matrix yields the output matrix, as shown in FIG. 4.
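
The same-size convolution described above can be illustrated with a small software reference model. This is only a sketch under stated assumptions, not the hardware method of the invention: the function names `zero_pad` and `convolve3x3` are hypothetical, the kernel is applied without flipping (the usual convention in convolutional neural networks), and the stride is 1.

```python
def zero_pad(image):
    """Surround an m x n image (list of rows) with one ring of zeros."""
    n = len(image[0])
    padded = [[0] * (n + 2)]                    # zero row above the image
    for row in image:
        padded.append([0] + list(row) + [0])    # zero before and after each row
    padded.append([0] * (n + 2))                # zero row below the image
    return padded


def convolve3x3(image, kernel):
    """Slide a 3 x 3 kernel over the zero-padded image; the output is m x n."""
    m, n = len(image), len(image[0])
    padded = zero_pad(image)
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Multiply the 3 x 3 neighbourhood element-wise with the kernel and sum.
            out[i][j] = sum(
                padded[i + di][j + dj] * kernel[di][dj]
                for di in range(3)
                for dj in range(3)
            )
    return out


image = [[1, 2, 3],
         [4, 5, 6]]
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
result = convolve3x3(image, identity)   # same size as image, and equal to it
```

With the identity kernel the output equals the input, which confirms that the padding preserves the image size and pixel positions.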

Example 1: please refer to FIG. 5. The invention discloses an image acceleration convolution calculation method, which comprises the following steps: step S1: acquiring original pixel data through a camera and converting the original pixel data into an m × n first matrix, wherein each matrix point corresponds to one pixel point in the original pixel data; step S2: outputting the first matrix to a FIFO module and performing first-in first-out sequencing to obtain a second matrix; step S3: outputting the second matrix to a read control module and zero-padding the second matrix to obtain a matrix to be multiplied; step S4: outputting the matrix to be multiplied to an operation module and performing a convolution operation on the matrix to be multiplied and a convolution kernel matrix. The method speeds up the convolution calculation and accelerates the convolutional neural network on an FPGA.

Example 2: referring to FIG. 1, in some embodiments of the present application, in step S1, one pixel value is sent per pixel synchronization clock cycle, in order from left to right and from top to bottom. The camera starts transmitting at coordinate P_11 and continues until P_mn.
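
The transmission order of step S1 can be sketched in software as follows. This is an illustrative model only; the generator name `raster_stream` and the tick counter are assumptions introduced here, with one loop iteration standing in for one pixel synchronization clock cycle.

```python
def raster_stream(frame):
    """Yield (clock_tick, row, col, value), one pixel per clock tick.

    Rows and columns are numbered from 1 so that the first pixel is P_11.
    """
    tick = 0
    for i, row in enumerate(frame, start=1):        # top to bottom
        for j, value in enumerate(row, start=1):    # left to right
            yield tick, i, j, value
            tick += 1


frame = [[10, 11, 12],
         [20, 21, 22]]
stream = list(raster_stream(frame))
# The first entry is P_11 at tick 0; the last entry is P_mn at tick m*n - 1.
```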

Example 3: referring to FIG. 6, in some embodiments of the present application, in step S2, the FIFO module includes a read-write address control circuit and a dual-port RAM; the width of the FIFO is the bit width of the original pixel data, and its depth is twice the lateral resolution of the original pixel data. For an image of resolution m × n sent by the camera, the depth of the FIFO should be set to 2m.
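
A FIFO of this kind can be modelled in software as a RAM array with separate read and write address counters. The sketch below is a behavioural illustration under assumptions: the class name `Fifo`, the `count` bookkeeping and the example values `line_width = 4` and `width_bits = 8` are hypothetical and are not taken from the circuit of FIG. 6.

```python
class Fifo:
    """Behavioural model of a FIFO: a RAM plus read and write address counters."""

    def __init__(self, width_bits, depth):
        self.mask = (1 << width_bits) - 1   # keep only width_bits bits per entry
        self.depth = depth
        self.ram = [0] * depth
        self.wr_addr = 0
        self.rd_addr = 0
        self.count = 0                      # number of entries currently stored

    @property
    def empty(self):
        return self.count == 0

    @property
    def full(self):
        return self.count == self.depth

    def write(self, value):
        if self.full:
            raise OverflowError("FIFO full")
        self.ram[self.wr_addr] = value & self.mask
        self.wr_addr = (self.wr_addr + 1) % self.depth   # wrap the write address
        self.count += 1

    def read(self):
        if self.empty:
            raise IndexError("FIFO empty")
        value = self.ram[self.rd_addr]
        self.rd_addr = (self.rd_addr + 1) % self.depth   # wrap the read address
        self.count -= 1
        return value


line_width = 4                                   # hypothetical lateral resolution
fifo = Fifo(width_bits=8, depth=2 * line_width)  # depth = lateral resolution x 2
for pixel in [5, 6, 7, 8, 9]:
    fifo.write(pixel)
first_three = [fifo.read() for _ in range(3)]    # values leave in arrival order
```

Sizing the depth at two lines gives the reader side room to fall up to two rows behind the camera before any pixel would be lost.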

Example 4: referring to FIG. 7 to FIG. 9, in some embodiments of the present application, step S3 further includes the following steps: S31: when the FIFO module is detected to be not empty, outputting one 0 value; S32: sending n read requests, outputting the received data, and then outputting two 0 values; S33: repeating step S32 (m-1) times; S34: sending n read requests, outputting the received data, and then outputting one 0 value. Once the read control module detects that the FIFO module is not empty, it either sends a 0 value or sends a read request to the FIFO module and forwards the pixel data that the FIFO module outputs. Each time a read request is sent to the FIFO module, a counter is incremented by one; the count value is returned to the read control module, which uses it to decide whether to output a 0 value or to send a read request. In some embodiments of the present application, the read control module only performs zero padding before and after each row of the second matrix.
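
Steps S31 to S34 can be sketched as a short software routine. The sketch makes two assumptions that the text does not state explicitly: step S32 is executed (m-1) times in total, so that exactly m rows are read, and the FIFO already holds the whole frame when reading starts. The function name `read_control` is hypothetical, and a `collections.deque` stands in for the FIFO module.

```python
from collections import deque


def read_control(fifo, m, n):
    """Return the stream sent to the operation module.

    Each of the m rows of n pixels is framed by one 0 before and one 0 after it,
    so the stream has m * (n + 2) values. No zero rows are added above or below.
    """
    out = [0]                                          # S31: leading 0 of row 1
    for _ in range(m - 1):                             # S32/S33: rows 1 .. m-1
        out.extend(fifo.popleft() for _ in range(n))   # n read requests
        out.extend([0, 0])                             # end of this row, start of next
    out.extend(fifo.popleft() for _ in range(n))       # S34: last row
    out.append(0)                                      # trailing 0 of row m
    return out


m, n = 2, 3
fifo = deque([1, 2, 3, 4, 5, 6])
stream = read_control(fifo, m, n)
# stream == [0, 1, 2, 3, 0, 0, 4, 5, 6, 0]
```

The two zeros emitted between rows are the trailing zero of one row and the leading zero of the next, which is why the padding appears only before and after each row.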

Example 5: referring to FIG. 10 to FIG. 12, in some embodiments of the present application, the operation module includes a single-port read-only memory (ROM) and shift registers. The depth of the single-port ROM is 9, and its initial values are the convolution parameters S1 to S9 of the convolution kernel matrix. The shift registers include a first, a second, a third, a fourth and a fifth shift register. The width of the first and second shift registers is the pixel data bit width, and their depth is n + 2; the width of the third, fourth and fifth shift registers is the pixel data bit width, and their depth is 3. The matrix to be multiplied first enters the first shift register and the fifth shift register; the output end of the first shift register is connected to the input ends of the second and third shift registers, and the output end of the second shift register is connected to the input end of the fourth shift register.

In FIG. 10, Single Port ROM is the single-port read-only memory and Shift RAM is a shift register. Shift_RAM_1, Shift_RAM_2, Shift_RAM_3, Shift_RAM_4 and Shift_RAM_5 are the first, second, third, fourth and fifth shift registers, respectively; Shift_RAM_3, Shift_RAM_4 and Shift_RAM_5 have initial values of 0. Each time the Shift RAM outputs a value, a counter is incremented by one, so the count value indicates which rows of data are currently stored in the shift registers. The count value is fed to a multiplexer module, which determines the current calculation mode from the count value and generates a drive code; the drive code is input to the arithmetic logic unit (ALU) and causes the calculation unit to apply the corresponding operation formula. The data to be processed sent by the read control circuit first enters Shift_RAM_1 and Shift_RAM_5. The output of Shift_RAM_1 is the input of Shift_RAM_2 and Shift_RAM_3, and the output of Shift_RAM_2 is the input of Shift_RAM_4. The dashed box marks the convolution template; during the convolution calculation, the data stored in the convolution template and the parameters of the convolution kernel are read and convolved.
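
The connection of the five shift registers can be modelled in software to see how the convolution template fills. The following is a behavioural sketch under assumptions, not the circuit of FIG. 10: the class names `ShiftRam` and `LineBuffer` are hypothetical, every register (including Shift_RAM_1 and Shift_RAM_2) is assumed to start at 0, and each register is assumed to delay its input by exactly its depth.

```python
from collections import deque


class ShiftRam:
    """Fixed-depth shift register; shifting in a value pushes the oldest one out."""

    def __init__(self, depth):
        self.cells = deque([0] * depth, maxlen=depth)

    def shift(self, value):
        oldest = self.cells[0]
        self.cells.append(value)     # maxlen discards the oldest cell
        return oldest

    def taps(self):
        return list(self.cells)      # oldest value first


class LineBuffer:
    """Five shift registers wired as described for Shift_RAM_1 .. Shift_RAM_5."""

    def __init__(self, n):
        self.ram1 = ShiftRam(n + 2)  # delays the stream by one padded row
        self.ram2 = ShiftRam(n + 2)  # delays it by a second padded row
        self.ram3 = ShiftRam(3)
        self.ram4 = ShiftRam(3)
        self.ram5 = ShiftRam(3)

    def clock(self, value):
        out1 = self.ram1.shift(value)    # input enters Shift_RAM_1 ...
        self.ram5.shift(value)           # ... and Shift_RAM_5 at the same time
        out2 = self.ram2.shift(out1)     # Shift_RAM_1 feeds Shift_RAM_2 ...
        self.ram3.shift(out1)            # ... and Shift_RAM_3
        self.ram4.shift(out2)            # Shift_RAM_2 feeds Shift_RAM_4

    def window(self):
        # Oldest row first: Shift_RAM_4, then Shift_RAM_3, then Shift_RAM_5.
        return [self.ram4.taps(), self.ram3.taps(), self.ram5.taps()]


n = 3
padded_rows = [[0, 1, 2, 3, 0],
               [0, 4, 5, 6, 0],
               [0, 7, 8, 9, 0]]
buf = LineBuffer(n)
for row in padded_rows:
    for value in row:
        buf.clock(value)
window = buf.window()
# After three full padded rows, the template holds the last three columns
# of each row: [[2, 3, 0], [5, 6, 0], [8, 9, 0]]
```

Because the two long registers each hold one padded row, the three short registers always see vertically aligned pixels from three consecutive rows, which is what the 3 × 3 template needs.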

Example 6: referring to FIG. 12, once the operation module has received 2n + 7 data, the data stored in the convolution template in the dashed box is the 3 × 3 sub-matrix at the upper left corner of the matrix to be multiplied, and the output data becomes valid. The operation module provides two output ports, and each port outputs one valid pixel value per clock cycle. The output data of the operation module, i.e. the output matrix, also follows the order from top to bottom and from left to right. Specifically, row 1 of the output matrix is output together with row 2, and the last row is output together with the second-to-last row. That is, after receiving 2n + 7 data, the operation module outputs rows 1 and 2 simultaneously on the two output ports, then outputs rows 3 to m-2 row by row on one output port, and finally outputs rows m-1 and m simultaneously on the two output ports. The first row of the output matrix is calculated as:
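
The figure of 2n + 7 data is consistent with the shift register depths given in Example 5. As a worked check, assuming that each shift register delays its input by exactly its depth, the first value of the stream must pass through the first shift register, then the second, and then fill the three cells of the fourth:

```latex
N_{\text{valid}}
  = \underbrace{(n+2)}_{\text{first shift register}}
  + \underbrace{(n+2)}_{\text{second shift register}}
  + \underbrace{3}_{\text{fourth shift register}}
  = 2n + 7
```

For a hypothetical row length of n = 640, this gives 2 × 640 + 7 = 1287 input values before the first valid output.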

Q_1j = Addr(2,3)×S4 + Addr(2,2)×S5 + Addr(2,1)×S6 + Addr(1,3)×S7 + Addr(1,2)×S8 + Addr(1,1)×S9, j ∈ [1, n]

Rows 2 to m-1 of the output matrix:

Q_ij = Addr(2,3)×S1 + Addr(2,2)×S2 + Addr(2,1)×S3 + Addr(1,3)×S4 + Addr(1,2)×S5 + Addr(1,1)×S6 + Addr(3,3)×S7 + Addr(3,2)×S8 + Addr(3,1)×S9, i ∈ [2, m-1], j ∈ [1, n]

Row m of the output matrix:

Q_mj = Addr(1,3)×S1 + Addr(1,2)×S2 + Addr(1,1)×S3 + Addr(3,3)×S4 + Addr(3,2)×S5 + Addr(3,1)×S6, j ∈ [1, n]
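
The three formulas can be checked in software: the first and last rows drop the kernel row that would fall on a zero row, so no zero rows need to be stored above or below the image. The sketch below is a minimal illustration under assumptions: S1 to S9 are taken to be the kernel in row-major order, the Addr row index is read from the formulas as upper, centre and lower image row, and the function name `conv_rows` is hypothetical. It operates on whole rows rather than on a clocked stream, and it requires m ≥ 2.

```python
def conv_rows(image, S):
    """Same-size 3 x 3 convolution using 6 terms on the border rows, 9 elsewhere.

    image is an m x n list of rows; S is the flat list [S1, ..., S9].
    """
    m, n = len(image), len(image[0])
    # Zero padding before and after each row only, as the read control module does.
    rows = [[0] + list(r) + [0] for r in image]

    def dot(row, j, k0):
        # Three horizontally adjacent pixels times three consecutive kernel parameters.
        return row[j] * S[k0] + row[j + 1] * S[k0 + 1] + row[j + 2] * S[k0 + 2]

    out = []
    for i in range(m):
        line = []
        for j in range(n):
            acc = dot(rows[i], j, 3)               # centre row with S4..S6
            if i > 0:
                acc += dot(rows[i - 1], j, 0)      # row above with S1..S3
            if i < m - 1:
                acc += dot(rows[i + 1], j, 6)      # row below with S7..S9
            line.append(acc)
        out.append(line)
    return out


image = [[1, 2],
         [3, 4]]
ones = [1] * 9
result = conv_rows(image, ones)
# result == [[10, 10], [10, 10]]
```

For the 2 × 2 example every 3 × 3 neighbourhood covers the whole image, so each output equals 1 + 2 + 3 + 4 = 10, the same value a full ring of zero padding would give.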

In a second aspect, an image acceleration convolution computing system is further provided, which includes: a camera, configured to collect original pixel data and convert the original pixel data into an m × n first matrix, where each matrix point corresponds to one pixel point in the original pixel data; a FIFO module, which converts the first matrix into a second matrix through first-in first-out sequencing; a read control module, configured to zero-pad the second matrix to obtain a matrix to be multiplied; and an operation module, configured to perform a convolution operation on the matrix to be multiplied and the convolution kernel matrix. The system speeds up the convolution calculation and accelerates the convolutional neural network on an FPGA, whose high speed and parallelism make it well suited to hardware acceleration of neural networks.

The invention has the technical effects that:

The convolution calculation is sped up, and the convolutional neural network is accelerated on an FPGA, whose high speed and parallelism make it well suited to hardware acceleration of neural networks.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and parts that are not described in detail in a certain embodiment may refer to the above detailed descriptions of other embodiments, and are not described herein again.

Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the broad application. Various modifications, improvements and adaptations to the present application may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.

Also, this application uses specific language to describe embodiments of the application. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the present application is included in at least one embodiment of the present application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.

Similarly, it should be noted that in the preceding description of embodiments of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to require more features than are expressly recited in the claims. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.

Numerals describing the number of components, attributes, etc. are used in some embodiments, it being understood that such numerals used in the description of the embodiments are modified in some instances by the use of the modifier "about", "approximately" or "substantially". Unless otherwise indicated, "about", "approximately" or "substantially" indicates that the number allows a variation of ± 20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, the numerical parameter should take into account the specified significant digits and employ a general digit preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the range are approximations, in the specific examples, such numerical values are set forth as precisely as possible within the scope of the application.

For each patent, patent application publication, and other material cited in this application, such as articles, books, specifications, publications, documents, and the like, the entire contents of which are hereby incorporated by reference into this application, except for application history documents that are inconsistent with or conflict with the contents of this application, and except for documents that are currently or later become incorporated into this application as though fully set forth in the claims below. It is noted that the descriptions, definitions and/or use of terms in this application shall control if they are inconsistent or contrary to the present disclosure.

The image acceleration convolution calculation method, system, equipment and readable storage medium provided by the present invention have been described in detail above. Specific embodiments are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method and core idea of the invention. Meanwhile, those skilled in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims

1. An image acceleration convolution calculation method is characterized by comprising the following steps:

step S1: acquiring original pixel data through a camera, and converting the original pixel data into a first matrix of m × n, wherein one matrix point corresponds to one pixel point in the original pixel data;

step S2: outputting the first matrix to an FIFO module, and performing first-in first-out sequencing to obtain a second matrix;

step S3: outputting the second matrix to a reading control module, and performing zero filling on the second matrix to obtain a matrix to be multiplied;

step S4: outputting the matrix to be multiplied to an operation module, and performing a convolution operation on the matrix to be multiplied and a convolution kernel matrix.

2. The method of claim 1, wherein in step S1, one pixel value is sent per pixel synchronization clock cycle, in order from left to right and from top to bottom.

3. The method according to claim 1, wherein in step S2, the FIFO module comprises a read-write address control circuit and a dual-port RAM, the width of the FIFO is the bit width of the original pixel data, and the depth of the FIFO is twice the lateral resolution of the original pixel data.

4. The method of claim 1, wherein in step S3, the method further comprises the following steps:

S31: when the FIFO module is detected to be not empty, outputting one 0 value;

S32: sending n read requests, outputting the received data, and then outputting two 0 values;

S33: repeating step S32 (m-1) times;

S34: sending n read requests, outputting the received data, and then outputting one 0 value.

5. The image acceleration convolution calculation method according to claim 4, wherein, once the read control module detects that the FIFO module is not empty, it either sends a 0 value or sends a read request to the FIFO module and forwards the pixel data output by the FIFO module; and each time a read request is sent to the FIFO module, a counter is incremented by one, the count value is returned to the read control module, and the read control module decides according to the count value whether to output a 0 value or to send a read request.

6. The method according to claim 4, wherein the read control module performs zero padding only before and after each row of the second matrix.

7. The image acceleration convolution calculation method according to claim 1, wherein the operation module includes a single-port read-only memory (ROM) and shift registers; the depth of the single-port ROM is 9, and its initial values are the convolution parameters S1-S9 of the convolution kernel matrix; the shift registers include a first shift register, a second shift register, a third shift register, a fourth shift register and a fifth shift register; the width of the first shift register and the second shift register is the pixel data bit width, and their depth is n + 2; the width of the third shift register, the fourth shift register and the fifth shift register is the pixel data bit width, and their depth is 3; the matrix to be multiplied first enters the first shift register and the fifth shift register; the output end of the first shift register is connected to the input ends of the second shift register and the third shift register; and the output end of the second shift register is connected to the input end of the fourth shift register.

8. An image accelerated convolution computing system, comprising:

a camera, used for collecting original pixel data and converting the original pixel data into a first matrix of m × n, wherein one matrix point corresponds to one pixel point in the original pixel data;

the FIFO module converts the first matrix into a second matrix through a first-in first-out sequence;

the reading control module is used for zero padding the second matrix to obtain a matrix to be multiplied;

and the operation module is used for performing convolution operation on the to-be-multiplied matrix and the convolution kernel matrix.

9. An image accelerated convolution computing device, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the image accelerated convolution calculation method according to any one of claims 1 to 7 when executing said computer program.

10. A readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the image accelerated convolution calculation method according to any one of claims 1 to 7.