CN112330524A - Device and method for quickly realizing convolution in image tracking system - Google Patents
- Publication number
- CN112330524A (application CN202011153379.1A)
- Authority
- CN
- China
- Prior art keywords
- image data
- image
- convolution
- module
- search area
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the technical field of image tracking and recognition, and particularly relates to a device and a method for quickly realizing convolution in an image tracking system. The device adopts a ZYNQ-7010 heterogeneous computing platform and comprises an image data preprocessing module, a ZYNQ SOC control module, a DDR storage module and a convolution computing module. The ZYNQ SOC control module is connected to the image data preprocessing module and to the DDR storage module; the output of the image data preprocessing module is connected to the DDR storage module; the DDR storage module is connected to the input of the convolution computing module through a search area image data transmission line and a template image data transmission line; and the output of the convolution computing module is connected back to the DDR storage module. With this device and method, the convolution operation is fast, the device is compact, and the level of integration is high.
Description
Technical Field
The invention belongs to the technical field of image tracking and recognition, and particularly relates to a device and a method for quickly realizing convolution in an image tracking system.
Background
Image convolution is widely used in digital image processing algorithms such as image enhancement, edge detection, image tracking and target recognition. The templates used are generally small, for example 3 × 3, 5 × 5, 7 × 7 or 9 × 9, whereas in image tracking algorithms the template size directly determines how robustly large and small targets are tracked. Although the principle of image convolution is simple, the amount of computation is huge, and the nested loops are very time-consuming on a DSP or CPU, so the convolution cannot be computed in real time for video images. With progress in integrated circuit design and manufacturing, field programmable gate arrays (FPGAs) with large amounts of high-speed programmable logic have developed rapidly. To further improve performance, mainstream chip vendors integrate an FPGA and ARM cores on a single chip, which makes real-time implementation of some complex image processing algorithms possible. In addition, the FPGA contains digital signal processing (DSP) units capable of high-speed, low-power fixed-point arithmetic, so a large number of complex operations can be completed.
Tracking a target requires a large amount of algorithm computation, video decoding and encoding, and image preprocessing, so a typical tracker consists of a logic control unit plus an algorithm processing unit, such as an FPGA plus a DSP. A tracker built from these two chips is generally large, the interconnection between the FPGA and the DSP is complex, and PCB hardware design becomes more difficult. For large templates, the nested-loop computation takes a long time on a CPU or DSP, so it cannot be performed in real time in a video image system. Moreover, the CPU or DSP is easily interrupted by other higher-priority functions, so the computation time is not fixed and no stable timing margin can be guaranteed for subsequent algorithms.
The conventional FPGA approach to convolution works well for small templates, where all data are laid out explicitly: an M × M template needs M × M data registers, so M = 32 already requires 1024 registers. This demands a large amount of hand-written code and is hard to maintain. Moreover, the conventional FPGA approach can complete the convolution only after the video frame timing has finished, by repeatedly re-reading the stored image.
Disclosure of Invention
In order to solve the above technical problems, the present invention provides an apparatus and method for quickly performing convolution in an image tracking system.
The invention is realized as follows. A device for quickly realizing convolution in an image tracking system is provided, which adopts a ZYNQ-7010 heterogeneous computing platform and comprises an image data preprocessing module, a ZYNQ SOC control module, a DDR storage module and a convolution calculation module. The ZYNQ SOC control module is connected to the image data preprocessing module and to the DDR storage module; the output of the image data preprocessing module is connected to the DDR storage module; the DDR storage module is connected to the input of the convolution calculation module through a search area image data transmission line and a template image data transmission line; and the output of the convolution calculation module is connected back to the DDR storage module.
Preferably, in the ZYNQ SOC control module, the template size for image convolution is set to 32 × 32 and the search area image used for matching is 128 × 128, so the final result computed by the convolution calculation module is 97 × 97 (128 - 32 + 1 = 97).
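The 97 × 97 result size follows from requiring the 32 × 32 template to lie entirely inside the 128 × 128 search area. The Python sketch below is a plain software reference for that computation, handy for checking a hardware implementation. It assumes, as the read-address description later suggests, that search area and template pixels are multiplied at aligned positions without flipping the template (a valid-mode cross-correlation); the function names are illustrative and do not come from the patent.

```python
def valid_output_size(search_size: int, template_size: int) -> int:
    """Side length of the result when the template must stay inside the search area."""
    return search_size - template_size + 1


def sliding_product_sum(search, template):
    """Reference 'valid' computation (hypothetical helper): for each placement
    (r, c) of the T x T template fully inside the S x S search area, sum the
    products of the aligned pixels. The template is not flipped."""
    s, t = len(search), len(template)
    n = valid_output_size(s, t)
    out = [[0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            out[r][c] = sum(
                search[r + i][c + j] * template[i][j]
                for i in range(t)
                for j in range(t)
            )
    return out


print(valid_output_size(128, 32))  # 97
```

Run directly on a 128 × 128 search area with a 32 × 32 template, this loop performs 97 × 97 × 32 × 32 = 9,634,816 multiply-accumulate operations, which illustrates why the computation is heavy for a CPU or DSP.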
More preferably, five image data preprocessing modes are set in the ZYNQ SOC control module, as follows:
mode 1: a 128 × 128 search area image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position;
mode 2: first, a 256 × 256 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 2 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; when the next row is summed, the RAM data are read back and added; finally, the result is divided by 4 to obtain a 128 × 128 search area image;
mode 3: first, a 512 × 512 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 4 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; the RAM data are read back and added while the 4 adjacent rows are summed; finally, the result is divided by 16 to obtain a 128 × 128 search area image;
mode 4: first, a 1024 × 1024 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 8 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; the RAM data are read back and added while the 8 adjacent rows are summed; finally, the result is divided by 64 to obtain a 128 × 128 search area image;
mode 5: first, a 2048 × 2048 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 16 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; the RAM data are read back and added while the 16 adjacent rows are summed; finally, the result is divided by 256 to obtain a 128 × 128 search area image.
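The five modes amount to cropping a square of 128 × k pixels, with k = 2^(m-1) for mode m, and averaging each k × k block down to a 128 × 128 search area. The following Python sketch models this row by row, the way the modes are described: k adjacent columns are summed as a row streams in, a one-line buffer stands in for the distributed RAM that accumulates those sums over k rows, and the total is divided by k × k. Assumptions: floor (integer) division for the final divide, and the hypothetical function name and `row0`/`col0`/`out_size` parameters; `out_size` exists only so the model can be exercised on small frames.

```python
def preprocess(frame, mode, row0=0, col0=0, out_size=128):
    """Hypothetical software model of preprocessing modes 1-5.

    frame: list of pixel rows (the input video image).
    Returns an out_size x out_size search-area image.
    """
    if not 1 <= mode <= 5:
        raise ValueError("mode must be 1..5")
    k = 2 ** (mode - 1)          # block size: 1, 2, 4, 8 or 16
    size = out_size * k          # crop size: 128, 256, ..., 2048 by default
    if row0 + size > len(frame) or col0 + size > len(frame[0]):
        raise ValueError("crop window exceeds the frame")

    result = []
    line_buffer = [0] * out_size  # models the distributed RAM of partial sums
    for r in range(size):
        row = frame[row0 + r][col0:col0 + size]
        for g in range(out_size):
            # Sum k adjacent columns, then read-add-write the buffer entry.
            line_buffer[g] += sum(row[g * k:(g + 1) * k])
        if r % k == k - 1:
            # k rows accumulated: divide by k*k and emit one output row.
            result.append([s // (k * k) for s in line_buffer])
            line_buffer = [0] * out_size
    return result


frame = [[4 * i + j for j in range(4)] for i in range(4)]
print(preprocess(frame, 2, out_size=2))  # [[2, 4], [10, 12]]
```

Because only one line of partial sums is kept, the buffer size depends on the output width rather than on the crop size, which fits the use of a small distributed RAM.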
Further preferably, the convolution calculation module performs the convolution of the search area image with the template image as follows:
1) the T search area rows and the T template rows are read simultaneously by address, the addresses being 0, 1, 2, ..., T-1 in turn; the data at the first address are read, and the T search area values are multiplied by the T template values to obtain T products, which are summed by pipelined accumulation over 5 clock cycles; the T addresses are read in sequence and their sums accumulated, and the adjacent results are summed to obtain one image convolution result; the read addresses are then changed to 1, 2, 3, ..., T, and the above steps are repeated, switching addresses as required, to obtain one row of (S - T + 1) image convolution results, where S is the search area size and T is the template size;
2) search area image data are read from the DDR storage module, and the newly read (T + 1)th row is stored in the 1st distributed RAM of the FPGA; before step 1 is repeated, the T search area rows read out are reordered: with the (T + 1)th row stored in the 1st distributed RAM, the output of the 1st distributed RAM is taken as the Tth row, the output of the 2nd distributed RAM as the 1st row, and so on; this gives a new set of T rows, and step 1 is repeated to obtain a new row of (S - T + 1) image convolution results;
3) steps 1 and 2 are repeated to obtain (S - T + 1) × (S - T + 1) image convolution results; the results are available by the T(S - T + 1)th main clock cycle after the search area image has been acquired from the input video image, and they are stored in the DDR storage module.
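Steps 1 to 3 can be modeled functionally in Python as below. This is a hypothetical, non-cycle-accurate sketch: `ram` stands for the T distributed RAMs (one search area row each), each read address contributes T products that are summed (the pipelined sum in hardware), and after each output row the next search area row overwrites the oldest RAM, with a rotating index doing the reordering of step 2. The counter adds one cycle per address read and ignores pipeline latency, so one output row costs T(S - T + 1) counted cycles.

```python
def row_buffered_convolution(search, template):
    """Hypothetical functional model of steps 1-3.

    Returns (results, counted_cycles) for an S x S search area and a
    T x T template; results is (S-T+1) x (S-T+1).
    """
    s, t = len(search), len(template)
    n = s - t + 1
    ram = [list(search[i]) for i in range(t)]  # rows 0..T-1 loaded first
    oldest = 0        # physical RAM that holds logical window row 0
    out, cycles = [], 0
    for r in range(n):
        # Step 2 reordering: logical row i lives in RAM (oldest + i) % T.
        order = [(oldest + i) % t for i in range(t)]
        row_out = []
        for c in range(n):                 # step 1: one window per start column
            acc = 0
            for addr in range(c, c + t):   # search addresses c .. c+T-1
                # T parallel products at this address; template address is addr - c.
                products = [ram[order[i]][addr] * template[i][addr - c]
                            for i in range(t)]
                acc += sum(products)       # adder tree + accumulation in hardware
                cycles += 1
            row_out.append(acc)
        out.append(row_out)
        if r + 1 < n:
            # Load search row T + r into the oldest RAM, then rotate.
            ram[oldest] = list(search[t + r])
            oldest = (oldest + 1) % t
    return out, cycles
```

Checking the output against a direct sliding-window sum on small inputs is a convenient way to validate the address and reordering logic before writing HDL.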
The invention also provides a method for quickly realizing convolution in an image tracking system using the above device, comprising the following steps:
1) the video image is input line by line, in line-sequential timing;
2) the video image is preprocessed in the image data preprocessing module according to the mode set in the ZYNQ SOC control module into a 128 × 128 search area image, which is sent to the DDR storage module for storage;
3) the ZYNQ SOC control module reads the 32 × 32 template data from the image data preprocessing module and stores them in 32 distributed RAMs of depth 32 in the FPGA of the ZYNQ SOC control module;
4) the convolution calculation module reads the search area data and the template data via the DDR storage module, performs the convolution, and stores the convolution result in the DDR storage module.
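Step 4 keeps both the search area and the 97 × 97 result in DDR rather than in FPGA fabric. To give a sense of the sizes involved, the sketch below computes buffer sizes and the worst-case accumulator width under assumptions the patent does not state: 8-bit unsigned pixels, unsigned products, and results stored as 32-bit words. The function names are hypothetical.

```python
def accumulator_bits(template_size: int, pixel_bits: int) -> int:
    """Bits needed to hold the largest possible sum of T*T unsigned products."""
    max_pixel = (1 << pixel_bits) - 1
    worst_case = template_size * template_size * max_pixel * max_pixel
    return worst_case.bit_length()


def buffer_bytes(search_size=128, template_size=32, pixel_bits=8, result_word_bytes=4):
    """Storage for the search area, the template RAMs and the result (assumed widths)."""
    n = search_size - template_size + 1
    pixel_bytes = (pixel_bits + 7) // 8
    return {
        "search_area_ddr": search_size * search_size * pixel_bytes,
        "template_rams": template_size * template_size * pixel_bytes,
        "result_ddr": n * n * result_word_bytes,
    }


print(accumulator_bits(32, 8))  # 26
print(buffer_bytes())
```

Under these assumptions a 26-bit accumulator is sufficient, and the result (37,636 bytes as 32-bit words) is more than twice the size of the 16,384-byte search area.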
Preferably, five image data preprocessing modes are set in the ZYNQ SOC control module, as follows:
mode 1: a 128 × 128 search area image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position;
mode 2: first, a 256 × 256 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 2 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; when the next row is summed, the RAM data are read back and added; finally, the result is divided by 4 to obtain a 128 × 128 search area image;
mode 3: first, a 512 × 512 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 4 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; the RAM data are read back and added while the 4 adjacent rows are summed; finally, the result is divided by 16 to obtain a 128 × 128 search area image;
mode 4: first, a 1024 × 1024 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 8 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; the RAM data are read back and added while the 8 adjacent rows are summed; finally, the result is divided by 64 to obtain a 128 × 128 search area image;
mode 5: first, a 2048 × 2048 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 16 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; the RAM data are read back and added while the 16 adjacent rows are summed; finally, the result is divided by 256 to obtain a 128 × 128 search area image.
Further preferably, the convolution calculation module performs the convolution of the search area image with the template image as follows:
1) the T search area rows and the T template rows are read simultaneously by address, the addresses being 0, 1, 2, ..., T-1 in turn; the data at the first address are read, and the T search area values are multiplied by the T template values to obtain T products, which are summed by pipelined accumulation over 5 clock cycles; the T addresses are read in sequence and their sums accumulated, and the adjacent results are summed to obtain one image convolution result; the read addresses are then changed to 1, 2, 3, ..., T, and the above steps are repeated, switching addresses as required, to obtain one row of (S - T + 1) image convolution results, where S is the search area size and T is the template size;
2) search area image data are read from the DDR storage module, and the newly read (T + 1)th row is stored in the 1st distributed RAM of the FPGA; before step 1 is repeated, the T search area rows read out are reordered: with the (T + 1)th row stored in the 1st distributed RAM, the output of the 1st distributed RAM is taken as the Tth row, the output of the 2nd distributed RAM as the 1st row, and so on; this gives a new set of T rows, and step 1 is repeated to obtain a new row of (S - T + 1) image convolution results;
3) steps 1 and 2 are repeated to obtain (S - T + 1) × (S - T + 1) image convolution results; the results are available by the T(S - T + 1)th main clock cycle after the search area image has been acquired from the input video image, and they are stored in the DDR storage module.
Compared with the prior art, the invention has the advantages that:
the image convolution is completed within 32 × (128 - 31) = 3104 main clock cycles after the search area has been acquired, so the system has strong real-time performance; the search area image and the convolution results are stored in the DDR storage module, which saves FPGA resources, so that convolution with a large template can run on a chip with fewer resources and the size and cost of the image tracking system can be reduced.
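The latency figure works out as follows: with T = 32 and S = 128, the last row of results needs T(S - T + 1) = 32 × 97 = 3104 main clock cycles once the search area is complete. The patent does not state a clock frequency; the 100 MHz used below is an assumed value for illustration only, and the function names are hypothetical.

```python
def latency_cycles(search_size: int, template_size: int) -> int:
    """Main clock cycles after search-area acquisition: T * (S - T + 1)."""
    return template_size * (search_size - template_size + 1)


def latency_us(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles / clock_hz * 1e6


cycles = latency_cycles(128, 32)
print(cycles)                                  # 3104
print(f"{latency_us(cycles, 100e6):.2f} us")   # 31.04 us at an assumed 100 MHz
```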
Drawings
FIG. 1 is a diagram of the apparatus of the present invention;
FIG. 2 is a schematic diagram of the five preprocessing modes;
FIG. 3 is a diagram illustrating a convolution operation process.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to FIG. 1, the present invention provides a device for quickly implementing convolution in an image tracking system, characterized in that it adopts a ZYNQ-7010 heterogeneous computing platform and comprises an image data preprocessing module, a ZYNQ SOC control module, a DDR storage module and a convolution calculation module. The ZYNQ SOC control module is connected to the image data preprocessing module and to the DDR storage module; the output of the image data preprocessing module is connected to the DDR storage module; the DDR storage module is connected to the input of the convolution calculation module through a search area image data transmission line and a template image data transmission line; and the output of the convolution calculation module is connected back to the DDR storage module. In the ZYNQ SOC control module, the template size for image convolution is set to 32 × 32 and the search area image used for matching is 128 × 128, so the final result computed by the convolution calculation module is 97 × 97.
The method for quickly realizing convolution in the image tracking system by utilizing the device comprises the following steps:
1) the video image is input line by line, in line-sequential timing;
2) the video image is preprocessed in the image data preprocessing module according to the mode set in the ZYNQ SOC control module into a 128 × 128 search area image, which is sent to the DDR storage module for storage;
referring to FIG. 2, five image data preprocessing modes are set in the ZYNQ SOC control module, as follows:
mode 1: a 128 × 128 search area image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position;
mode 2: first, a 256 × 256 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 2 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; when the next row is summed, the RAM data are read back and added; finally, the result is divided by 4 to obtain a 128 × 128 search area image;
mode 3: first, a 512 × 512 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 4 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; the RAM data are read back and added while the 4 adjacent rows are summed; finally, the result is divided by 16 to obtain a 128 × 128 search area image;
mode 4: first, a 1024 × 1024 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 8 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; the RAM data are read back and added while the 8 adjacent rows are summed; finally, the result is divided by 64 to obtain a 128 × 128 search area image;
mode 5: first, a 2048 × 2048 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 16 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; the RAM data are read back and added while the 16 adjacent rows are summed; finally, the result is divided by 256 to obtain a 128 × 128 search area image.
The preprocessing modes serve to obtain the search area image, and different modes are used for different scenes or algorithms. In an image tracking system in particular, the mode is switched as the proportion of the whole field of view occupied by the target increases or decreases, which effectively improves the robustness of tracking both large and small targets.
3) The ZYNQ SOC control module reads the 32 × 32 template data from the image data preprocessing module and stores them in 32 distributed RAMs of depth 32 in the FPGA of the ZYNQ SOC control module;
4) the convolution calculation module reads the search area data and the template data via the DDR storage module, performs the convolution, and stores the convolution result in the DDR storage module, as follows:
when the (T - 1)th row of the search area image has been written, the convolution calculation module starts to read the corresponding T - 1 rows of search area image data from the DDR storage module and stores them in T - 1 distributed RAMs of the FPGA; at the same time it reads the T template rows and stores them in T distributed RAMs of the FPGA. Then, while the next row of search area image data is being written, that row is also read into the Tth distributed RAM of the FPGA.
The calculation process is shown in FIG. 3. First, 32 search area values (one from each row) are read simultaneously at address 0, the addresses running 0, 1, 2, ..., 31 in turn, together with 32 template values; the search area data and template data are multiplied in parallel by 32 DSP multiplier units, the products are summed in the following clock cycles, and the sum is obtained after 5 accumulation stages. The 32 multiply-and-sum operations are completed in sequence, and their results are then summed to obtain one convolution result of the search area image with the template image. The read addresses are then shifted in turn, for example to 1, 2, 3, ..., 32, then 2, 3, 4, ..., 33, and so on, so that 97 image convolution results are obtained after 97 operations.
Next, search area image data are read from the DDR storage module, the newly read 33rd row is stored in the 1st distributed RAM of the FPGA, and the 32 search area values read simultaneously are then reordered: the output of the 1st distributed RAM becomes the 32nd row, the output of the 2nd distributed RAM becomes the 1st row, and so on. The convolution is then computed on the newly ordered 32 rows as above to obtain the 2nd row of results, and these two steps are repeated to obtain the full 97 × 97 image convolution result. The time spent on each row of results can be increased or reduced as needed, to ensure that the convolution result is output quickly after the search area has been acquired.
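The description says the 32 parallel products are summed after 5 accumulation stages. A natural reading, assumed here rather than stated in the patent, is a binary adder tree of depth log2(32) = 5, where each level would be one pipeline register stage. The Python sketch below performs that pairwise reduction and counts the levels; `adder_tree` is a hypothetical name.

```python
def adder_tree(values):
    """Sum values by pairwise reduction, as a binary adder tree would.

    Returns (total, stages), where stages is the number of tree levels.
    An odd element at any level is passed through to the next level.
    """
    level = list(values)
    if not level:
        return 0, 0
    stages = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # unpaired value skips this level's adder
        level = nxt
        stages += 1
    return level[0], stages


# 32 products from one read address (illustrative values): 32 -> 16 -> 8 -> 4 -> 2 -> 1
print(adder_tree(range(32)))  # (496, 5)
```

Pipelining the tree lets a new set of 32 products enter every clock cycle, which is consistent with reading one address per cycle and finishing one output in about 32 cycles plus the tree latency.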
Claims (7)
1. A device for quickly realizing convolution in an image tracking system, characterized in that it adopts a ZYNQ-7010 heterogeneous computing platform and comprises an image data preprocessing module, a ZYNQ SOC control module, a DDR storage module and a convolution computing module, wherein the ZYNQ SOC control module is connected to the image data preprocessing module and to the DDR storage module, the output of the image data preprocessing module is connected to the DDR storage module, the DDR storage module is connected to the input of the convolution computing module through a search area image data transmission line and a template image data transmission line, and the output of the convolution computing module is connected to the DDR storage module.
2. The device for quickly realizing convolution in an image tracking system according to claim 1, wherein, in the ZYNQ SOC control module, the template size for image convolution is set to 32 × 32, the size of the search area image used for matching is set to 128 × 128, and the size of the final result computed by the convolution calculation module is 97 × 97.
3. The device for quickly realizing convolution in an image tracking system according to claim 1, wherein five image data preprocessing modes are set in the ZYNQ SOC control module, as follows:
mode 1: a 128 × 128 search area image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position;
mode 2: first, a 256 × 256 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 2 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; when the next row is summed, the RAM data are read back and added; finally, the result is divided by 4 to obtain a 128 × 128 search area image;
mode 3: first, a 512 × 512 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 4 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; the RAM data are read back and added while the 4 adjacent rows are summed; finally, the result is divided by 16 to obtain a 128 × 128 search area image;
mode 4: first, a 1024 × 1024 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 8 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; the RAM data are read back and added while the 8 adjacent rows are summed; finally, the result is divided by 64 to obtain a 128 × 128 search area image;
mode 5: first, a 2048 × 2048 image is cropped from the video image input to the image data preprocessing module, starting at the starting row and column position; then every 16 adjacent columns of image data are summed in time order and the sums are stored in a distributed RAM inside the FPGA of the ZYNQ SOC control module; the RAM data are read back and added while the 16 adjacent rows are summed; finally, the result is divided by 256 to obtain a 128 × 128 search area image.
4. The device for quickly realizing convolution in an image tracking system according to claim 1, wherein the convolution calculation module performs the convolution of the search area image with the template image as follows:
1) the T search area rows and the T template rows are read simultaneously by address, the addresses being 0, 1, 2, ..., T-1 in turn; the data at the first address are read, and the T search area values are multiplied by the T template values to obtain T products, which are summed by pipelined accumulation over 5 clock cycles; the T addresses are read in sequence and their sums accumulated, and the adjacent results are summed to obtain one image convolution result; the read addresses are then changed to 1, 2, 3, ..., T, and the above steps are repeated, switching addresses as required, to obtain one row of (S - T + 1) image convolution results, where S is the search area size and T is the template size;
2) search area image data are read from the DDR storage module, and the newly read (T + 1)th row is stored in the 1st distributed RAM of the FPGA; before step 1 is repeated, the T search area rows read out are reordered: with the (T + 1)th row stored in the 1st distributed RAM, the output of the 1st distributed RAM is taken as the Tth row, the output of the 2nd distributed RAM as the 1st row, and so on; this gives a new set of T rows, and step 1 is repeated to obtain a new row of (S - T + 1) image convolution results;
3) steps 1 and 2 are repeated to obtain (S - T + 1) × (S - T + 1) image convolution results; the results are available by the T(S - T + 1)th main clock cycle after the search area image has been acquired from the input video image, and they are stored in the DDR storage module.
5. A method for quickly performing convolution in an image tracking system using the apparatus of claim 2, comprising the steps of:
1) inputting video images to the images according to line sequence time sequence;
2) preprocessing the video images in an image data preprocessing module according to a mode set in the ZYNQ SOC control module, finally processing the video images into search area images with the size of 128 × 128, and sending the search area images to the DDR storage module for storage;
3) the ZYNQ SOC control module reading 32 × 32 template data from the image data preprocessing module and storing them in 32 distributed RAMs, each of size 32, in the FPGA of the ZYNQ SOC control module;
4) the convolution operation module respectively reads the search area data and the template data through the DDR storage module, performs convolution operation, and stores a convolution result in the DDR storage module.
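For the sizes in claim 5 (S = 128, T = 32), the output size and workload follow directly: (128 - 32 + 1)² = 97² = 9409 results, each a sum of 32 × 32 = 1024 products. The helper below (hypothetical name `convolution_workload`) only makes that arithmetic explicit. Mapping one template row to each distributed RAM is our assumption; the claim states only that there are 32 RAMs of size 32.

```python
def convolution_workload(search_size: int = 128, template_size: int = 32) -> dict:
    """Output size and multiply-accumulate count for a valid-mode sliding window."""
    if template_size > search_size:
        raise ValueError("template larger than search area")
    per_side = search_size - template_size + 1          # S - T + 1
    outputs = per_side * per_side                       # (S - T + 1)^2 results
    return {
        "template_rams": template_size,                 # assumed: one RAM per template row
        "ram_depth": template_size,                     # each RAM stores T template values
        "outputs_per_side": per_side,
        "outputs": outputs,
        "macs": outputs * template_size * template_size,  # T*T products per result
    }
```

With the default arguments, the function reports 97 results per side, 9409 results in total and 9,634,816 multiply-accumulates.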
6. The method of claim 5, wherein five image data preprocessing modes are set in the ZYNQ SOC control module, namely:
mode 1: cropping a 128 × 128 search area image starting from the starting row and column position of the video image input to the image data preprocessing module;
mode 2: first cropping a 256 × 256 image starting from the starting row and column position of the video image input to the image data preprocessing module; then summing the image data of each 2 adjacent columns in time order and storing the sums in a distributed RAM inside the FPGA of the ZYNQ SOC control module; reading the RAM data back and adding it when the image data of the next row are summed; and finally dividing by 4 to obtain a 128 × 128 search area image;
mode 3: first cropping a 512 × 512 image starting from the starting row and column position of the video image input to the image data preprocessing module; then summing the image data of each 4 adjacent columns in time order and storing the sums in a distributed RAM inside the FPGA of the ZYNQ SOC control module; reading the RAM data back and adding it as the image data of 4 adjacent rows are summed; and finally dividing by 16 to obtain a 128 × 128 search area image;
mode 4: first cropping a 1024 × 1024 image starting from the starting row and column position of the video image input to the image data preprocessing module; then summing the image data of each 8 adjacent columns in time order and storing the sums in a distributed RAM inside the FPGA of the ZYNQ SOC control module; reading the RAM data back and adding it as the image data of 8 adjacent rows are summed; and finally dividing by 64 to obtain a 128 × 128 search area image;
mode 5: first cropping a 2048 × 2048 image starting from the starting row and column position of the video image input to the image data preprocessing module; then summing the image data of each 16 adjacent columns in time order and storing the sums in a distributed RAM inside the FPGA of the ZYNQ SOC control module; reading the RAM data back and adding it as the image data of 16 adjacent rows are summed; and finally dividing by 256 to obtain a 128 × 128 search area image.
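Modes 1 to 5 amount to cropping a square of side k·128 and averaging non-overlapping k × k blocks, with k = 1, 2, 4, 8, 16 and divisor k² = 1, 4, 16, 64, 256. The sketch below follows the streaming order described in the claim: k adjacent columns are summed as a row arrives, the partial sums are held in a line buffer (standing in for the distributed RAM) across k rows, and the total is then divided. The function name, the `out_size` parameter (added so that small inputs can be tested) and the use of a right shift, which floors the result, are our assumptions; the claim does not specify rounding.

```python
from typing import List

Matrix = List[List[int]]


def bin_downsample(frame: Matrix, k: int, out_size: int = 128) -> Matrix:
    """Crop a (k*out_size)-pixel square at the top-left corner and average k x k blocks.

    k must be a power of two (modes 1-5 use 1, 2, 4, 8, 16), so dividing by
    k*k can be done with a right shift. Pixel values are assumed non-negative.
    """
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    side = k * out_size
    if len(frame) < side or any(len(row) < side for row in frame[:side]):
        raise ValueError("frame smaller than crop window")
    shift = 2 * (k.bit_length() - 1)        # log2(k*k)
    line_buf = [0] * out_size               # per-output-column partial sums
    out: Matrix = []
    for r in range(side):                   # rows arrive in raster order
        row = frame[r]
        for oc in range(out_size):
            line_buf[oc] += sum(row[oc * k:(oc + 1) * k])  # k adjacent columns
        if r % k == k - 1:                  # k rows have been accumulated
            out.append([s >> shift for s in line_buf])
            line_buf = [0] * out_size
    return out
```

For a 4 × 4 frame holding 0..15 in raster order, `bin_downsample(frame, 2, out_size=2)` returns `[[2, 4], [10, 12]]`, and `k=1` reduces to the plain crop of mode 1.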
7. The apparatus for quickly performing convolution in an image tracking system according to claim 5, wherein the convolution calculating module performs the convolution operation on the search area image and the template image as follows:
1) simultaneously reading T search area image data and T template image data by address, the addresses being 0, 1, 2, ..., T-1 in turn. After the data at the first address is read, the T search area image data are multiplied by the T template image data to obtain T products, which are summed by pipelined accumulation over 5 cycles. The T addresses are read in sequence at the same time and accumulated, the adjacent results are summed, and one image convolution result is finally obtained. Then changing the read addresses to 1, 2, 3, ..., T, repeating the above calculation, and continuing to shift the addresses as required to obtain one row of (S-T+1) image convolution results;
2) reading search area image data from the DDR storage module and storing the newly read (T+1)-th row in the 1st distributed RAM of the FPGA. Before step 1 is repeated, the T rows of search area data are reordered. For example, with the (T+1)-th row of search area image data stored in the 1st distributed RAM, the 1st distributed RAM is output as the T-th image data, the 2nd distributed RAM as the 1st image data, and so on, giving a new set of T image data. Step 1 is then repeated to obtain a new row of (S-T+1) image convolution results;
3) repeating steps 1 and 2 to obtain (S-T+1) × (S-T+1) image convolution results, which are obtained in the T(S-T-1)-th master clock cycle after acquisition of the search area image from the input video image is complete, and storing the image convolution results in the DDR storage module.
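The "5 cycles" of pipelined accumulation in step 1) are consistent with a binary adder tree. With T = 32 products, pairwise addition needs log2(32) = 5 levels, which takes one clock per level if each level is registered. That reading is our interpretation and is not stated in the claim. A minimal model (hypothetical function `adder_tree`) returns both the sum and the number of levels:

```python
from typing import List, Tuple


def adder_tree(products: List[int]) -> Tuple[int, int]:
    """Sum values pairwise, level by level, as a pipelined adder tree would.

    Returns (sum, levels). If each level is registered, `levels` is the number
    of clock cycles of latency; 32 inputs give 5 levels.
    """
    level = list(products)
    levels = 0
    while len(level) > 1:
        if len(level) % 2:                  # pad an odd-length level with a zero
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels += 1
    return (level[0] if level else 0), levels
```

For example, `adder_tree(list(range(32)))` returns `(496, 5)`.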
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011153379.1A (CN112330524B) | 2020-10-26 | 2020-10-26 | Device and method for quickly realizing convolution in image tracking system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112330524A | 2021-02-05 |
CN112330524B | 2024-06-18 |
Family ID: 74311662
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011153379.1A (CN112330524B, Active) | 2020-10-26 | 2020-10-26 | Device and method for quickly realizing convolution in image tracking system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112330524B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7882164B1 (en) * | 2004-09-24 | 2011-02-01 | University Of Southern California | Image convolution engine optimized for use in programmable gate arrays |
CN104035750A (en) * | 2014-06-11 | 2014-09-10 | 西安电子科技大学 | Field programmable gate array (FPGA)-based real-time template convolution implementing method |
CN107403117A (en) * | 2017-07-28 | 2017-11-28 | 西安电子科技大学 | Three dimensional convolution device based on FPGA |
WO2018120446A1 (en) * | 2016-12-31 | 2018-07-05 | 华中科技大学 | Parallel and coordinated processing method for real-time target recognition-oriented heterogeneous processor |
CN108921182A (en) * | 2018-09-26 | 2018-11-30 | 苏州米特希赛尔人工智能有限公司 | Feature-extraction image sensor implemented on FPGA |
CN109034025A (en) * | 2018-07-16 | 2018-12-18 | 东南大学 | Face key-point detection system based on ZYNQ |
CN109389120A (en) * | 2018-10-29 | 2019-02-26 | 济南浪潮高新科技投资发展有限公司 | Object detection device based on ZynqMP |
CN109859178A (en) * | 2019-01-18 | 2019-06-07 | 北京航空航天大学 | FPGA-based real-time target detection method for infrared remote sensing images |
CN109871813A (en) * | 2019-02-25 | 2019-06-11 | 沈阳上博智像科技有限公司 | Real-time image tracking method and system |
CN110097174A (en) * | 2019-04-22 | 2019-08-06 | 西安交通大学 | Convolutional neural network implementation method, system and device based on FPGA and row-output priority |
US20200012881A1 (en) * | 2018-07-03 | 2020-01-09 | Irvine Sensors Corporation | Methods and Devices for Cognitive-based Image Data Analytics in Real Time Comprising Saliency-based Training on Specific Objects |
US20200074288A1 (en) * | 2017-12-06 | 2020-03-05 | Tencent Technology (Shenzhen) Company Limited | Convolution operation processing method and related product |
WO2020119188A1 (en) * | 2018-12-10 | 2020-06-18 | 广东浪潮大数据研究有限公司 | Program detection method, apparatus and device, and readable storage medium |
CN111460999A (en) * | 2020-03-31 | 2020-07-28 | 北京工业大学 | Low-altitude aerial image target tracking method based on FPGA |
CN111459877A (en) * | 2020-04-02 | 2020-07-28 | 北京工商大学 | FPGA acceleration-based Winograd YOLOv2 target detection model method |
- 2020-10-26: CN202011153379.1A (CN112330524B), Active
Non-Patent Citations (4)
Title |
---|
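Alok Prakash et al.: "Accelerating computer vision algorithms on heterogeneous edge computing platforms", 2020 IEEE Workshop on Signal Processing Systems (SiPS), 23 September 2020 (2020-09-23), pages 1 - 6 * |
Feng Kun: "Design and implementation of ZYNQ face tracking with multi-task cascaded convolution", China Master's Theses Full-text Database, Information Science and Technology, vol. 2020, no. 02, 15 February 2020 (2020-02-15), pages 138 - 1142 * |
Zhou Zheng: "Design and implementation of an FPGA-based intelligent target tracking system", China Master's Theses Full-text Database, Information Science and Technology, vol. 2020, no. 02, 15 February 2020 (2020-02-15), pages 135 - 747 * |
Li Bingqi: "FPGA-based target detection and tracking", China Master's Theses Full-text Database, Information Science and Technology, vol. 2019, no. 08, 15 August 2019 (2019-08-15), pages 138 - 863 * |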
Also Published As
Publication number | Publication date |
---|---|
CN112330524B (en) | 2024-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110458279B (en) | FPGA-based binary neural network acceleration method and system | |
US20230026006A1 (en) | Convolution computation engine, artificial intelligence chip, and data processing method | |
CN112487750B (en) | Convolution acceleration computing system and method based on in-memory computing | |
CN111583093B (en) | Hardware implementation method for ORB feature point extraction with good real-time performance | |
CN111210019B (en) | Neural network inference method based on software and hardware cooperative acceleration | |
Ngo et al. | Resource-aware architecture design and implementation of Hough transform for a real-time iris boundary detection system | |
CN111582465B (en) | Convolutional neural network acceleration processing system and method based on FPGA and terminal | |
CN117217274B (en) | Vector processor, neural network accelerator, chip and electronic equipment | |
CN110738317A (en) | FPGA-based deformable convolution network operation method, device and system | |
CN109472734B (en) | Target detection network based on FPGA and implementation method thereof | |
Xiao et al. | FPGA-based scalable and highly concurrent convolutional neural network acceleration | |
US20220253668A1 (en) | Data processing method and device, storage medium and electronic device | |
Gong et al. | Research and implementation of multi-object tracking based on vision DSP | |
CN102129419B (en) | Processor based on fast Fourier transform | |
CN112163612B (en) | Large-template convolution image matching method, device and system based on FPGA | |
CN109446478A (en) | Complex covariance matrix computing system based on iteration and reconfigurable mode | |
CN112330524B (en) | Device and method for quickly realizing convolution in image tracking system | |
CN113158132A (en) | Convolution neural network acceleration system based on unstructured sparsity | |
CN112508174B (en) | Weight binary neural network-oriented pre-calculation column-by-column convolution calculation unit | |
Ngo et al. | Real time iris segmentation on FPGA | |
Kim et al. | A configurable heterogeneous multicore architecture with cellular neural network for real-time object recognition | |
Park et al. | ShortcutFusion++: optimizing an end-to-end CNN accelerator for high PE utilization | |
CN113095024A (en) | Regional parallel loading device and method for tensor data | |
Li et al. | FPGA Accelerated Real-time Recurrent All-Pairs Field Transforms for Optical Flow | |
CN115861025B (en) | Reconfigurable image processor chip architecture supporting OpenCV and application |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |