CN112330524A - Device and method for quickly realizing convolution in image tracking system

Info

Publication number: CN112330524A
Application number: CN202011153379.1A
Authority: CN
Inventors: 刘召军; 徐保树; 白魁一; 张程硕; 史志跃
Original assignee: Shenyang Sumboy Intelligent Imaging Technology Co ltd
Current assignee: Shenyang Sumboy Intelligent Imaging Technology Co ltd
Priority date: 2020-10-26
Filing date: 2020-10-26
Publication date: 2021-02-05
Anticipated expiration: 2040-10-26
Also published as: CN112330524B

Abstract

The invention belongs to the technical field of image tracking and recognition, and particularly relates to a device and a method for quickly realizing convolution in an image tracking system. The device adopts a ZYNQ-7010 heterogeneous computing platform and comprises an image data preprocessing module, a ZYNQ SOC control module, a DDR storage module and a convolution computing module. The ZYNQ SOC control module is connected to the image data preprocessing module and to the DDR storage module; the output end of the image data preprocessing module is connected to the DDR storage module; the DDR storage module is connected to the input end of the convolution computing module through a search area image data transmission line and a template image data transmission line; and the output end of the convolution computing module is connected to the DDR storage module. With this device and method, the convolution operation is fast, the device is small, and the integration level is high.

Description

Device and method for quickly realizing convolution in image tracking system

Technical Field

The invention belongs to the technical field of image tracking and recognition, and particularly relates to a device and a method for quickly realizing convolution in an image tracking system.

Background

Image convolution is widely used in digital image processing algorithms such as image enhancement, edge detection, image tracking, and target recognition. The template used in the calculation is generally small, for example 3 × 3, 5 × 5, 7 × 7, or 9 × 9, whereas in image tracking algorithms the template size directly determines how robustly large and small targets are tracked. Although the principle of image convolution is simple, the amount of computation is huge, and the nested loops are so time-consuming on a DSP or CPU that the result cannot be computed in real time for video images. With progress in integrated circuit design and manufacturing, Field Programmable Gate Arrays (FPGAs) with large amounts of high-speed programmable logic have developed rapidly. To further improve performance, mainstream chip manufacturers integrate an FPGA and an ARM core in a single chip, which makes real-time implementation of some complex image processing algorithms possible. In addition, digital signal processing (DSP) units integrated in the FPGA can perform fixed-point operations at high speed and low power consumption and complete a large number of complex operations.

Target tracking requires a large amount of algorithm computation, video decoding and encoding, and image preprocessing, so a typical tracker consists of a logic control unit plus an algorithm processing unit, such as an FPGA plus a DSP. A tracker built from two such chips is generally large, the interconnection between the FPGA and the DSP is complex, and the PCB hardware design becomes more difficult. For large templates, the nested-loop computation on a CPU or DSP takes so long that real-time calculation cannot be achieved in a video image system. In addition, a CPU or DSP is easily interrupted by other, higher-priority functions, so the calculation time is not fixed and a stable time margin cannot be provided for subsequent algorithms.

The conventional FPGA method for realizing convolution works well for small templates, but it lists all the data explicitly: for an M × M template, M × M data registers are needed, so M = 32 requires 1024 data registers. This demands a large amount of code from programmers and is hard to maintain. In addition, the conventional FPGA method can complete the image convolution only by repeatedly reading the stored image after the video image timing has ended.

Disclosure of Invention

In order to solve the above technical problems, the present invention provides an apparatus and method for quickly performing convolution in an image tracking system.

The invention is realized as follows. A device for quickly realizing convolution in an image tracking system is provided, which adopts a ZYNQ-7010 heterogeneous computing platform and comprises an image data preprocessing module, a ZYNQ SOC control module, a DDR storage module and a convolution computing module. The ZYNQ SOC control module is connected to the image data preprocessing module and to the DDR storage module; the output end of the image data preprocessing module is connected to the DDR storage module; the DDR storage module is connected to the input end of the convolution computing module through a search area image data transmission line and a template image data transmission line; and the output end of the convolution computing module is connected back to the DDR storage module.

Preferably, in the ZYNQ SOC control module, the template size of the image convolution calculation set is 32 × 32, the size of the search area image for matching is 128 × 128, and the final calculation result size of the convolution calculation module is 97 × 97.
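
The 97 × 97 result size is what one would expect when a T × T template slides over an S × S search area without padding. The worked equation below uses S and T as in the convolution steps of this description; the label N_out and the no-padding reading are assumptions inferred from the stated sizes, not wording from the source.

```latex
\[
  N_{\text{out}} = S - T + 1 = 128 - 32 + 1 = 97
\]
```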

More preferably, the ZYNQ SOC control module provides 5 modes of image data preprocessing, as follows:

Mode 1: cropping a 128 × 128 search area image, starting from the start row and column position, from the video image input to the image data preprocessing module;

Mode 2: first cropping a 256 × 256 image, starting from the start row and column position, from the video image input to the image data preprocessing module; then summing the image data of each 2 adjacent columns in timing order and storing the sums in the internal distributed RAM of the FPGA in the ZYNQ SOC control module; when the image data of the next row is summed, reading the RAM data and adding it; and finally dividing by 4 to obtain a 128 × 128 search area image;

Mode 3: first cropping a 512 × 512 image, starting from the start row and column position, from the video image input to the image data preprocessing module; then summing the image data of each 4 adjacent columns in timing order and storing the sums in the internal distributed RAM of the FPGA in the ZYNQ SOC control module; as the image data of the 4 adjacent rows is summed, reading the RAM data and adding it; and finally dividing by 16 to obtain a 128 × 128 search area image;

Mode 4: first cropping a 1024 × 1024 image, starting from the start row and column position, from the video image input to the image data preprocessing module; then summing the image data of each 8 adjacent columns in timing order and storing the sums in the internal distributed RAM of the FPGA in the ZYNQ SOC control module; as the image data of the 8 adjacent rows is summed, reading the RAM data and adding it; and finally dividing by 64 to obtain a 128 × 128 search area image;

Mode 5: first cropping a 2048 × 2048 image, starting from the start row and column position, from the video image input to the image data preprocessing module; then summing the image data of each 16 adjacent columns in timing order and storing the sums in the internal distributed RAM of the FPGA in the ZYNQ SOC control module; as the image data of the 16 adjacent rows is summed, reading the RAM data and adding it; and finally dividing by 256 to obtain a 128 × 128 search area image.
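
The five modes can be read as one crop-and-average scheme with a binning factor of 1, 2, 4, 8 or 16. The following is a minimal software sketch of that reading, not the FPGA implementation. The function name `preprocess`, the `out_size` parameter, the use of a Python list as a stand-in for the distributed RAM, and truncating integer division are all assumptions made for illustration.

```python
def preprocess(frame, row0, col0, mode, out_size=128):
    """Crop a window from `frame` and block-average it to out_size x out_size.

    `frame` is a list of rows of integer pixels; (row0, col0) is the start
    row/column position. Hypothetical helper, not taken from the source.
    """
    k = 2 ** (mode - 1)           # binning factor: 1, 2, 4, 8, 16 for modes 1..5
    n = out_size * k              # side of the cropped window: 128 ... 2048
    if row0 < 0 or col0 < 0 or row0 + n > len(frame) or col0 + n > len(frame[0]):
        raise ValueError("crop window exceeds the frame")
    out = []
    line_buf = [0] * out_size     # stands in for the FPGA distributed RAM
    for r in range(n):            # rows arrive one at a time, in timing order
        row = frame[row0 + r]
        for c in range(out_size):
            start = col0 + c * k
            col_sum = sum(row[start:start + k])   # sum k adjacent columns
            # The first row of a block overwrites the buffer; later rows add to it.
            line_buf[c] = col_sum if r % k == 0 else line_buf[c] + col_sum
        if r % k == k - 1:        # block of k rows complete: divide by k * k
            out.append([s // (k * k) for s in line_buf])
    return out
```

With `mode=2` the divisor is 4, and with `mode=5` it is 256, matching the divisors stated for the modes above. Only one row of partial sums is held at any time, which is the point of buffering column sums rather than whole rows.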

Further preferably, in the convolution calculation module, the convolution operation is performed on the search area image and the template image according to the following method:

1) simultaneously reading T search area image data and T template image data by address, the addresses being 0, 1, 2, ..., T-1; reading the first address data, multiplying the T search area image data by the T template image data to obtain T products, and accumulating them in a pipeline over 5 cycles; meanwhile reading the T addresses in sequence and accumulating, then summing the adjacent results to finally obtain one image convolution result; changing the read addresses to 1, 2, 3, ..., T, and so on, repeating the above calculation steps and switching addresses as required, to obtain one row of (S-T+1) image convolution results;

2) reading search area image data from the DDR storage module and storing the newly read (T+1)th row of data in the 1st distributed RAM of the FPGA; before repeating step 1, re-sorting the T search area data that are read: for example, with the (T+1)th row of search area image data stored in the 1st distributed RAM, the output of the 1st distributed RAM becomes the T-th image data, the output of the 2nd distributed RAM becomes the 1st image data, and so on, giving T newly ordered image data; repeating step 1 to obtain a new row of (S-T+1) image convolution results;

3) repeating steps 1 and 2 to obtain (S-T+1) × (S-T+1) image convolution results; the image convolution results are obtained at the T × (S-T+1)-th main clock cycle after acquisition of the search area image from the input video image is complete, and are stored in the DDR storage module.
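
Step 1 can be sketched in software as follows. This is one interpretation, under stated assumptions: the T search rows and the T template rows each sit in their own RAM, one address is read from all of them per clock cycle, the T products go through a pairwise adder tree (5 stages when T = 32), and the template is applied without flipping because the source describes none. The names `adder_tree` and `convolve_row` are hypothetical.

```python
def adder_tree(values):
    """Sum a list pairwise, stage by stage; returns (total, number_of_stages)."""
    stages = 0
    while len(values) > 1:
        if len(values) % 2:               # pad odd-length stages with a zero
            values = values + [0]
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        stages += 1
    return values[0], stages


def convolve_row(row_rams, template_rams, S, T):
    """Compute one row of S-T+1 results from T buffered search rows.

    row_rams[i] holds search row i (length S); template_rams[i] holds
    template row i (length T).
    """
    results = []
    for x in range(S - T + 1):            # address offset: 0, 1, 2, ...
        acc = 0
        for a in range(T):                # one address per clock cycle
            # T parallel multipliers: one product per RAM pair.
            products = [row_rams[i][x + a] * template_rams[i][a] for i in range(T)]
            partial, _ = adder_tree(products)
            acc += partial                # accumulate over the T addresses
        results.append(acc)
    return results
```

Under this reading each result costs T address reads, so a row of S-T+1 results costs on the order of T × (S-T+1) cycles plus pipeline latency.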

The invention also provides a method for quickly realizing convolution in an image tracking system by using the device, which comprises the following steps:

1) inputting video images to the image data preprocessing module row by row, in timing order;

2) preprocessing the video images in an image data preprocessing module according to a mode set in the ZYNQ SOC control module, finally processing the video images into search area images with the size of 128 × 128, and sending the search area images to the DDR storage module for storage;

3) the ZYNQ SOC control module reads template data from the image data preprocessing module according to the size of 32 × 32, and then stores the template data in 32 distributed RAMs with the size of 32 in an FPGA of the ZYNQ SOC control module;

4) the convolution operation module respectively reads the search area data and the template data through the DDR storage module, performs convolution operation, and stores a convolution result in the DDR storage module.

Preferably, in the ZYNQ SOC control module, 5 modes of image data preprocessing are set; these are modes 1 to 5 described above.

Compared with the prior art, the invention has the advantages that:

The image convolution calculation is completed within 32 × (128 − 31) main clock (clk) cycles after the search area image is complete, so the system has strong real-time performance. The search area image and the convolution results are stored in the DDR storage module, which saves resources inside the FPGA, allows large-template convolution to run on a chip with fewer resources, and reduces the size and cost of the image tracking system.
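
As a rough check of the 32 × (128 − 31) figure, the small sketch below evaluates the cycle count and converts it to time. The function names are hypothetical, and the 100 MHz clock in the usage note is purely an assumed value; the source does not state a clock frequency.

```python
def tail_latency_cycles(S=128, T=32):
    """Cycles for the last result row: T addresses per result, S-T+1 results."""
    return T * (S - T + 1)


def latency_us(cycles, clk_hz):
    """Convert a cycle count to microseconds for a given clock frequency."""
    return cycles / clk_hz * 1e6
```

For the sizes in this description, `tail_latency_cycles()` gives 3104 cycles. At an assumed 100 MHz main clock, `latency_us(3104, 100e6)` is about 31 microseconds.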

Drawings

FIG. 1 is a diagram of the apparatus of the present invention;

FIG. 2 is a schematic diagram of the five preprocessing modes;

FIG. 3 is a diagram illustrating a convolution operation process.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to FIG. 1, the present invention provides a device for quickly implementing convolution in an image tracking system. The device adopts a ZYNQ-7010 heterogeneous computing platform and comprises an image data preprocessing module, a ZYNQ SOC control module, a DDR storage module and a convolution computing module. The ZYNQ SOC control module is connected to the image data preprocessing module and to the DDR storage module; the output end of the image data preprocessing module is connected to the DDR storage module; the DDR storage module is connected to the input end of the convolution computing module through a search area image data transmission line and a template image data transmission line; and the output end of the convolution computing module is connected to the DDR storage module. In the ZYNQ SOC control module, the template size set for the image convolution calculation is 32 × 32, the size of the search area image used for matching is 128 × 128, and the final result calculated by the convolution computing module is 97 × 97.

The method for quickly realizing convolution in the image tracking system by utilizing the device comprises the following steps:

Referring to FIG. 2, 5 modes of image data preprocessing are set in the ZYNQ SOC control module.

The purpose of preprocessing is to obtain the search area image. Different modes suit different scenes or algorithms; in an image tracking system in particular, the mode is switched as the proportion of the whole image field occupied by the target increases or decreases, which effectively increases the robustness of tracking both large and small targets.

4) the convolution operation module respectively reads the search area data and the template data through the DDR storage module, performs convolution operation, and stores the convolution result in the DDR storage module:

When the (T-1)th row of search area image data has been written, the convolution operation module starts reading the corresponding T-1 rows of search area image data from the DDR storage module and stores them in T-1 distributed RAMs of the FPGA; at the same time it reads the T rows of the template image and stores them in T distributed RAMs of the FPGA. Then, as the next row of search area image data is written, that row is simultaneously read into the T-th distributed RAM of the FPGA.

The calculation process is shown in FIG. 3. First, 32 search area image data are read simultaneously, with addresses 0, 1, 2, ..., 31, and 32 template image data are obtained at the same time. 32 DSP multiplication units compute the products of the search area image data and the template image data in parallel; accumulation starts in the next cycle, and the sum is obtained after 5 accumulation stages. After 32 such multiply, accumulate and sum operations have been completed in sequence, their results are summed to give one convolution result of the search area image and the template image. The read addresses are then changed in turn, for example to 1, 2, 3, ..., 32, then 2, 3, 4, ..., 33, and so on; after 97 such operations, 97 image convolution results are obtained. Next, search area image data is read from the DDR storage area, and the newly read 33rd row of image data is stored in the 1st distributed RAM of the FPGA. The 32 search area image data are then read simultaneously and re-sorted: for example, the output of the 1st distributed RAM becomes the 32nd image data, the output of the 2nd distributed RAM becomes the 1st image data, and so on, giving 32 newly ordered image data. The 2nd row of image convolution results is then obtained in the same way, and repeating these two steps yields the 97 × 97 image convolution result. The time taken to compute each row of results can be increased or reduced as needed to ensure that the convolution results are output quickly after the search area is complete.
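
The row re-sorting described above behaves like a circular buffer: the newest row overwrites the RAM holding the oldest row, and a rotating index restores the logical row order at read time. The sketch below illustrates that idea in software under stated assumptions. The function name `convolve_streaming` and the `oldest` index are hypothetical, the search area and template are assumed square, and the template is applied without flipping.

```python
def convolve_streaming(search, template):
    """Slide a T x T template over an S x S search area using T row buffers.

    Returns an (S-T+1) x (S-T+1) list of results.
    """
    S, T = len(search), len(template)
    rams = [list(search[i]) for i in range(T)]   # first T rows fill the T RAMs
    oldest = 0                                   # physical RAM holding logical row 1
    out = []
    for y in range(S - T + 1):
        # Re-sort: logical row i is read from physical RAM (oldest + i) mod T.
        ordered = [rams[(oldest + i) % T] for i in range(T)]
        out.append([
            sum(ordered[i][x + a] * template[i][a]
                for a in range(T) for i in range(T))
            for x in range(S - T + 1)
        ])
        if y + T < S:
            rams[oldest] = list(search[y + T])   # new row overwrites the oldest row
            oldest = (oldest + 1) % T            # the next RAM now holds logical row 1
    return out
```

With a 128 × 128 search area and a 32 × 32 template this returns a 97 × 97 result. Only one RAM is rewritten per output row, so no stored row ever needs to be copied between buffers.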

Claims

1. The device for quickly realizing convolution in an image tracking system is characterized by adopting a ZYNQ-7010 heterogeneous computing platform and comprising an image data preprocessing module, a ZYNQ SOC control module, a DDR storage module and a convolution computing module, wherein the ZYNQ SOC control module is respectively connected with the image data preprocessing module and the DDR storage module, the output end of the image data preprocessing module is connected with the DDR storage module, the DDR storage module is connected with the input end of the convolution computing module through a search area image data transmission line and a template image data transmission line, and the output end of the convolution computing module is connected with the DDR storage module.

2. The apparatus for rapidly performing convolution in an image tracking system according to claim 1, wherein in the ZYNQ SOC control module, the template size for the image convolution calculation is set to 32 × 32, the size of the search area image used for matching is set to 128 × 128, and the size of the final result calculated by the convolution calculation module is 97 × 97.

3. The apparatus for fast convolution implementation in an image tracking system according to claim 1, wherein in the ZYNQ SOC control module, 5 modes of image data preprocessing are set, respectively:

4. The apparatus for quickly performing convolution in an image tracking system according to claim 1, wherein the convolution calculating module performs convolution operation on the search area image and the template image according to the following method:

3) repeating steps 1 and 2 to obtain (S-T+1) × (S-T+1) image convolution results; the image convolution results are obtained at the T × (S-T+1)-th main clock cycle after acquisition of the search area image from the input video image is complete, and are stored in the DDR storage module.

5. A method for quickly performing convolution in an image tracking system using the apparatus of claim 2, comprising the steps of:

6. The method of claim 5, wherein in the ZYNQ SOC control module, 5 modes of image data preprocessing are set, respectively:

7. The apparatus for quickly performing convolution in an image tracking system according to claim 5, wherein the convolution calculating module performs convolution operation on the search area image and the template image according to the following method: