CN112330524A - Device and method for quickly realizing convolution in image tracking system - Google Patents


Publication number
CN112330524A
Authority
CN
China
Prior art keywords
image data
image
convolution
module
search area
Prior art date
Legal status
Granted
Application number
CN202011153379.1A
Other languages
Chinese (zh)
Other versions
CN112330524B (en)
Inventor
刘召军
徐保树
白魁一
张程硕
史志跃
Current Assignee
Shenyang Sumboy Intelligent Imaging Technology Co ltd
Original Assignee
Shenyang Sumboy Intelligent Imaging Technology Co ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/20: Processor architectures; Processor configuration, e.g. pipelining
    • G06T1/60: Memory management

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image tracking and recognition, and particularly relates to a device and a method for quickly realizing convolution in an image tracking system. The device adopts a ZYNQ-7010 heterogeneous computing platform and comprises an image data preprocessing module, a ZYNQ SOC control module, a DDR storage module and a convolution calculation module. The ZYNQ SOC control module is connected with the image data preprocessing module and with the DDR storage module; the output end of the image data preprocessing module is connected with the DDR storage module; the DDR storage module is connected with the input end of the convolution calculation module through a search area image data transmission line and a template image data transmission line; and the output end of the convolution calculation module is connected back to the DDR storage module. With the device and the method, the convolution operation is fast, the device is small, and the integration level is high.

Description

Device and method for quickly realizing convolution in image tracking system
Technical Field
The invention belongs to the technical field of image tracking and recognition, and particularly relates to a device and a method for quickly realizing convolution in an image tracking system.
Background
Image convolution is widely used in digital image processing algorithms such as image enhancement, edge detection, image tracking and target recognition. The convolution template is generally small, for example 3 × 3, 5 × 5, 7 × 7 or 9 × 9, whereas in image tracking algorithms the template size directly determines how robustly both large and small targets can be tracked. Although the principle of image convolution is simple, the amount of computation is huge, and the nested loops are very time-consuming on a DSP or CPU, so the convolution result cannot be computed in real time for video images. With advances in integrated circuit design and manufacturing, Field Programmable Gate Arrays (FPGAs) with large amounts of high-speed programmable logic have developed rapidly. To further improve performance, mainstream chip manufacturers integrate an FPGA and an ARM core in a single chip, which makes real-time implementation of some complex image processing algorithms possible. In addition, the digital signal processing (DSP) units integrated in the FPGA can perform fixed-point operations at high speed and low power consumption and complete large numbers of complex operations.
Target tracking requires a large amount of algorithm computation, video decoding and encoding, and image preprocessing, so a typical tracker consists of a logic control unit plus an algorithm processing unit, such as an FPGA plus a DSP. A tracker built from two such chips is generally large, and the complex connections between the FPGA and the DSP make PCB hardware design harder. When a large template is used, the nested-loop computation on a CPU or DSP takes so long that real-time operation cannot be achieved in a video image system. In addition, the CPU or DSP is easily interrupted by other, higher-priority functions, so the computation time is not fixed and a stable time margin cannot be guaranteed for subsequent algorithms.
The conventional FPGA implementation of convolution works well for small templates, with all data listed explicitly in registers. For an M × M template, M × M data registers are needed; for M = 32, that is 1024 data registers, which requires programmers to write a large amount of code and is hard to maintain. Moreover, the conventional FPGA implementation can finish the image convolution only by repeatedly reading the stored image after the video image timing has ended.
Disclosure of Invention
In order to solve the above technical problems, the present invention provides an apparatus and method for quickly performing convolution in an image tracking system.
The invention is realized as follows: a device for quickly realizing convolution in an image tracking system is provided, which adopts a ZYNQ-7010 heterogeneous computing platform and comprises an image data preprocessing module, a ZYNQ SOC control module, a DDR storage module and a convolution calculation module, wherein the ZYNQ SOC control module is connected with the image data preprocessing module and with the DDR storage module, the output end of the image data preprocessing module is connected with the DDR storage module, the DDR storage module is connected with the input end of the convolution calculation module through a search area image data transmission line and a template image data transmission line, and the output end of the convolution calculation module is connected back to the DDR storage module.
Preferably, in the ZYNQ SOC control module, the template size of the image convolution calculation set is 32 × 32, the size of the search area image for matching is 128 × 128, and the final calculation result size of the convolution calculation module is 97 × 97.
More preferably, the ZYNQ SOC control module is provided with 5 modes of image data preprocessing, each of which is:
mode 1: cropping a 128 × 128 search area image from the video image input into the image data preprocessing module, starting at the starting row and column position;
mode 2: first cropping a 256 × 256 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 2 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the next row are summed, reading the RAM data back and adding them; and finally dividing by 4 to obtain a 128 × 128 search area image;
mode 3: first cropping a 512 × 512 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 4 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the 4 adjacent rows are summed, reading the RAM data back and adding them; and finally dividing by 16 to obtain a 128 × 128 search area image;
mode 4: first cropping a 1024 × 1024 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 8 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the 8 adjacent rows are summed, reading the RAM data back and adding them; and finally dividing by 64 to obtain a 128 × 128 search area image;
mode 5: first cropping a 2048 × 2048 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 16 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the 16 adjacent rows are summed, reading the RAM data back and adding them; and finally dividing by 256 to obtain a 128 × 128 search area image.
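As an illustration only, the streaming block averaging used in modes 2 to 5 can be sketched in Python. The function name `downsample_stream`, the `MODE_FACTOR` table and the use of truncating integer division are assumptions made for this sketch, not details given in the text; `line_buf` stands in for the FPGA distributed RAM that holds the running sums between rows.

```python
# Hypothetical mapping from preprocessing mode to block size k
# (the crop is k*128 pixels on a side, and each k x k block becomes one pixel).
MODE_FACTOR = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}


def downsample_stream(frame, row0, col0, k, out_size=128):
    """Block-average a (k*out_size)-square crop of `frame` to out_size x out_size.

    Rows are consumed one at a time, as they would arrive in video timing.
    `line_buf` plays the role of the distributed RAM holding running sums.
    """
    out = []
    line_buf = [0] * out_size
    for r in range(k * out_size):
        row = frame[row0 + r][col0:col0 + k * out_size]
        for c in range(out_size):
            # Sum k adjacent columns of the incoming row ...
            col_sum = sum(row[c * k:(c + 1) * k])
            # ... and add them to the sum kept from earlier rows of this block.
            line_buf[c] += col_sum
        if r % k == k - 1:
            # Last row of the block: divide by k*k and emit one output row.
            # (Truncating division is an assumption; k*k is a power of two,
            # so hardware could use a right shift.)
            out.append([s // (k * k) for s in line_buf])
            line_buf = [0] * out_size
    return out
```

With `k = MODE_FACTOR[mode]`, mode 1 reduces to a plain crop and mode 5 averages 16 × 16 blocks of a 2048 × 2048 crop, so every mode delivers a search area image of the same size.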
Further preferably, in the convolution calculation module, the convolution operation is performed on the search area image and the template image according to the following method:
1) simultaneously reading T search area image data and T template image data by address, the addresses being 0, 1, 2, ..., T-1 in turn; reading the data at the first address and multiplying the T search area image data by the T template image data to obtain T products, which are accumulated and summed in a pipeline over 5 cycles; meanwhile reading the T addresses in sequence and accumulating in the same way, then summing the adjacent results to finally obtain one image convolution result; then changing the read addresses to 1, 2, 3, ..., T, and so on, repeating the above calculation and switching addresses as required, to obtain one row of (S-T+1) image convolution results;
2) reading the next row of search area image data from the DDR storage module and storing the newly read (T+1)-th row in the 1st distributed RAM of the FPGA; before repeating step 1, reordering the T search area data that are read out: for example, once the (T+1)-th row of search area image data is stored in the 1st distributed RAM, the output of the 1st distributed RAM is taken as the T-th image data, the output of the 2nd distributed RAM as the 1st image data, and so on, giving T newly ordered image data; then repeating step 1 to obtain a new row of (S-T+1) image convolution results;
3) repeating steps 1 and 2 to obtain (S-T+1) × (S-T+1) image convolution results, the image convolution results being obtained within T × (S-T+1) main clock cycles after acquisition of the search area image from the input video image is completed, and storing the image convolution results in the DDR storage module.
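The three steps above can be mirrored by a small software reference model. The sketch below assumes that S is the side length of the square search area image and T that of the template (consistent with 128 - 32 + 1 = 97), and that the products are taken without flipping the template, as the address scheme suggests. The name `match_convolve` is hypothetical, and the loop structure only imitates the order of operations, not the hardware timing.

```python
def match_convolve(search, template):
    """Sliding-window sum of products of a T x T template over an S x S image.

    Returns an (S-T+1) x (S-T+1) list of lists.
    """
    S, T = len(search), len(template)
    n = S - T + 1
    result = []
    for top in range(n):
        # The T image rows currently held in the T row buffers (step 2).
        rows = search[top:top + T]
        out_row = []
        for start in range(n):          # address window start, start+1, ...
            acc = 0
            for a in range(T):          # T consecutive addresses (step 1)
                # T products formed in parallel, one per row buffer,
                # then reduced to a single partial sum.
                acc += sum(rows[i][start + a] * template[i][a]
                           for i in range(T))
            out_row.append(acc)
        result.append(out_row)
    return result
```

For example, a 3 × 3 image `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` with the 2 × 2 template `[[1, 0], [0, 1]]` gives `[[6, 8], [12, 14]]`, each entry being the sum of one diagonal of the window.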
The invention also provides a method for quickly realizing convolution in an image tracking system by using the device, which comprises the following steps:
1) inputting the video image into the image data preprocessing module row by row in time sequence;
2) preprocessing the video image in the image data preprocessing module according to the mode set in the ZYNQ SOC control module, finally producing a search area image with a size of 128 × 128, and sending the search area image to the DDR storage module for storage;
3) the ZYNQ SOC control module reads 32 × 32 template data from the image data preprocessing module and stores the template data in 32 distributed RAMs, each of depth 32, in the FPGA of the ZYNQ SOC control module;
4) the convolution calculation module reads the search area data and the template data respectively through the DDR storage module, performs the convolution operation, and stores the convolution result in the DDR storage module.
Preferably, in the ZYNQ SOC control module, 5 modes of image data preprocessing are set, respectively:
mode 1: cropping a 128 × 128 search area image from the video image input into the image data preprocessing module, starting at the starting row and column position;
mode 2: first cropping a 256 × 256 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 2 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the next row are summed, reading the RAM data back and adding them; and finally dividing by 4 to obtain a 128 × 128 search area image;
mode 3: first cropping a 512 × 512 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 4 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the 4 adjacent rows are summed, reading the RAM data back and adding them; and finally dividing by 16 to obtain a 128 × 128 search area image;
mode 4: first cropping a 1024 × 1024 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 8 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the 8 adjacent rows are summed, reading the RAM data back and adding them; and finally dividing by 64 to obtain a 128 × 128 search area image;
mode 5: first cropping a 2048 × 2048 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 16 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the 16 adjacent rows are summed, reading the RAM data back and adding them; and finally dividing by 256 to obtain a 128 × 128 search area image.
Further preferably, in the convolution calculation module, the convolution operation is performed on the search area image and the template image according to the following method:
1) simultaneously reading T search area image data and T template image data by address, the addresses being 0, 1, 2, ..., T-1 in turn; reading the data at the first address and multiplying the T search area image data by the T template image data to obtain T products, which are accumulated and summed in a pipeline over 5 cycles; meanwhile reading the T addresses in sequence and accumulating in the same way, then summing the adjacent results to finally obtain one image convolution result; then changing the read addresses to 1, 2, 3, ..., T, and so on, repeating the above calculation and switching addresses as required, to obtain one row of (S-T+1) image convolution results;
2) reading the next row of search area image data from the DDR storage module and storing the newly read (T+1)-th row in the 1st distributed RAM of the FPGA; before repeating step 1, reordering the T search area data that are read out: for example, once the (T+1)-th row of search area image data is stored in the 1st distributed RAM, the output of the 1st distributed RAM is taken as the T-th image data, the output of the 2nd distributed RAM as the 1st image data, and so on, giving T newly ordered image data; then repeating step 1 to obtain a new row of (S-T+1) image convolution results;
3) repeating steps 1 and 2 to obtain (S-T+1) × (S-T+1) image convolution results, the image convolution results being obtained within T × (S-T+1) main clock cycles after acquisition of the search area image from the input video image is completed, and storing the image convolution results in the DDR storage module.
Compared with the prior art, the invention has the advantages that:
the image convolution calculation is completed within 32 × (128-31) main clock (clk) cycles after the search area image is complete, so the system has strong real-time performance; the search area image and the convolution results are stored in the DDR storage module, which saves resources inside the FPGA, so that large-template convolution can run on a chip with few resources and the size and cost of the image tracking system can be reduced.
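The cycle count quoted above can be checked with a few lines of Python. The function name `latency_cycles` is hypothetical, and the 100 MHz clock used to convert cycles into time is purely an assumed figure for illustration; the text does not state the main clock frequency.

```python
def latency_cycles(S=128, T=32):
    """Main clock cycles needed for one row of results: T * (S - T + 1)."""
    return T * (S - T + 1)


cycles = latency_cycles()            # 32 * (128 - 31)
ASSUMED_CLK_HZ = 100e6               # assumption, not from the source
latency_us = cycles / ASSUMED_CLK_HZ * 1e6
print(f"{cycles} cycles, about {latency_us:.2f} us at the assumed clock")
```

Under these assumptions the last row of results follows the end of the search area by 3104 clock cycles, a fixed delay that does not depend on image content.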
Drawings
FIG. 1 is a diagram of the apparatus of the present invention;
FIG. 2 is a schematic diagram of the five preprocessing modes;
FIG. 3 is a diagram illustrating a convolution operation process.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the present invention provides a device for quickly implementing convolution in an image tracking system, which is characterized in that a ZYNQ-7010 heterogeneous computing platform is adopted, and the device comprises an image data preprocessing module, a ZYNQ SOC control module, a DDR storage module and a convolution computing module, wherein the ZYNQ SOC control module is respectively connected with the image data preprocessing module and the DDR storage module, an output end of the image data preprocessing module is connected with the DDR storage module, the DDR storage module is connected with an input end of the convolution computing module through a search area image data transmission line and a template image data transmission line, and an output end of the convolution computing module is connected with the DDR storage module. In the ZYNQ SOC control module, the template size of the image convolution calculation set is 32 × 32, the search area image size for matching is 128 × 128, and the final calculation result size by the convolution calculation module is 97 × 97.
The method for quickly realizing convolution in the image tracking system by utilizing the device comprises the following steps:
1) inputting the video image into the image data preprocessing module row by row in time sequence;
2) preprocessing the video image in the image data preprocessing module according to the mode set in the ZYNQ SOC control module, finally producing a search area image with a size of 128 × 128, and sending the search area image to the DDR storage module for storage;
referring to FIG. 2, 5 modes of image data preprocessing are set in the ZYNQ SOC control module, respectively:
mode 1: cropping a 128 × 128 search area image from the video image input into the image data preprocessing module, starting at the starting row and column position;
mode 2: first cropping a 256 × 256 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 2 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the next row are summed, reading the RAM data back and adding them; and finally dividing by 4 to obtain a 128 × 128 search area image;
mode 3: first cropping a 512 × 512 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 4 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the 4 adjacent rows are summed, reading the RAM data back and adding them; and finally dividing by 16 to obtain a 128 × 128 search area image;
mode 4: first cropping a 1024 × 1024 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 8 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the 8 adjacent rows are summed, reading the RAM data back and adding them; and finally dividing by 64 to obtain a 128 × 128 search area image;
mode 5: first cropping a 2048 × 2048 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 16 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the 16 adjacent rows are summed, reading the RAM data back and adding them; and finally dividing by 256 to obtain a 128 × 128 search area image.
The purpose of the preprocessing modes is to obtain the search area image. Different modes suit different scenes or algorithms; in an image tracking system in particular, the mode is switched as the proportion of the whole image field occupied by the target increases or decreases, which effectively improves the robustness of tracking both large and small targets.
3) the ZYNQ SOC control module reads 32 × 32 template data from the image data preprocessing module and stores the template data in 32 distributed RAMs, each of depth 32, in the FPGA of the ZYNQ SOC control module;
4) the convolution calculation module reads the search area data and the template data respectively through the DDR storage module, performs the convolution operation, and stores the convolution result in the DDR storage module:
Once the (T-1)-th row of image data has been written into the search area image, the convolution calculation module starts reading the corresponding T-1 rows of search area image data from the DDR storage module and stores them in T-1 distributed RAMs of the FPGA; at the same time it reads the T rows of the template image and stores them in T distributed RAMs of the FPGA. Then, while the next row of search area image data is being written, that row is simultaneously read into the T-th distributed RAM of the FPGA.
The calculation process is shown in FIG. 3. First, 32 search area image data are read simultaneously, at addresses 0, 1, 2, ..., 31, together with 32 template image data. The 32 DSP multiplication units compute the products of the search area image data and the template image data in parallel; accumulation starts in the next cycle, and the sum is obtained after 5 accumulation steps. This multiply-accumulate is carried out 32 times in sequence, and the results are then summed to give one convolution result of the search area image and the template image. The read addresses are then shifted in turn, for example to 1, 2, 3, ..., 32, then 2, 3, 4, ..., 33, and so on; after 97 such operations, 97 image convolution results are obtained. Next, search area image data are read from the DDR storage module, and the newly read 33rd row of image data is stored in the 1st distributed RAM of the FPGA. The 32 search area image data are then read simultaneously and reordered: for example, the output of the 1st distributed RAM becomes the 32nd image data, the output of the 2nd distributed RAM becomes the 1st image data, and so on, giving 32 newly ordered image data. The image convolution results of the 2nd row are then obtained in the convolution calculation manner described above, and repeating these two steps yields the 97 × 97 image convolution result. The time taken to calculate each row of results can be increased or reduced according to actual needs, to ensure that the convolution result is output quickly after the search area is finished.
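Two details of this process lend themselves to a short software model: the 5-step summation of 32 products and the reordering of the row buffers when a new row overwrites the oldest one. The sketch below is one possible reading, not the patented circuit. It assumes the 5 accumulation steps form a pairwise adder tree (32 → 16 → 8 → 4 → 2 → 1) and models the distributed RAMs as a circular buffer; the names `adder_tree` and `RowBuffers` are hypothetical.

```python
def adder_tree(values):
    """Pairwise summation; returns (total, number_of_stages).

    Assumes len(values) is a power of two, e.g. the 32 parallel products.
    """
    stages = 0
    while len(values) > 1:
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        stages += 1
    return values[0], stages


class RowBuffers:
    """T row buffers, one image row each, reused as a circular buffer."""

    def __init__(self, rows):
        self.rams = [list(r) for r in rows]
        self.oldest = 0  # index of the buffer holding the oldest row

    def push(self, new_row):
        # The new row overwrites the oldest one (row T+1 goes into buffer 1).
        self.rams[self.oldest] = list(new_row)
        self.oldest = (self.oldest + 1) % len(self.rams)

    def ordered(self):
        # Reorder outputs so they line up with template rows 1..T:
        # after one push, buffer 2 comes first and buffer 1 comes last.
        T = len(self.rams)
        return [self.rams[(self.oldest + i) % T] for i in range(T)]
```

For 32 inputs `adder_tree` needs exactly 5 stages, which matches the 5 accumulation steps in the text, and after one `push` the `ordered()` view begins with the 2nd buffer and ends with the 1st, as in the reordering example above. Only one row is rewritten per output row, so the other 31 rows are not read from the DDR storage module again.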

Claims (7)

1. The device for quickly realizing convolution in an image tracking system is characterized by adopting a ZYNQ-7010 heterogeneous computing platform and comprising an image data preprocessing module, a ZYNQ SOC control module, a DDR storage module and a convolution computing module, wherein the ZYNQ SOC control module is respectively connected with the image data preprocessing module and the DDR storage module, the output end of the image data preprocessing module is connected with the DDR storage module, the DDR storage module is connected with the input end of the convolution computing module through a search area image data transmission line and a template image data transmission line, and the output end of the convolution computing module is connected with the DDR storage module.
2. The apparatus for quickly performing convolution in an image tracking system according to claim 1, wherein in the ZYNQ SOC control module, the template size set for the image convolution calculation is 32 × 32, the size of the search area image used for matching is 128 × 128, and the size of the final result calculated by the convolution calculation module is 97 × 97.
3. The apparatus for fast convolution implementation in an image tracking system according to claim 1, wherein in the ZYNQ SOC control module, 5 modes of image data preprocessing are set, respectively:
mode 1: cropping a 128 × 128 search area image from the video image input into the image data preprocessing module, starting at the starting row and column position;
mode 2: first cropping a 256 × 256 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 2 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the next row are summed, reading the RAM data back and adding them; and finally dividing by 4 to obtain a 128 × 128 search area image;
mode 3: first cropping a 512 × 512 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 4 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the 4 adjacent rows are summed, reading the RAM data back and adding them; and finally dividing by 16 to obtain a 128 × 128 search area image;
mode 4: first cropping a 1024 × 1024 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 8 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the 8 adjacent rows are summed, reading the RAM data back and adding them; and finally dividing by 64 to obtain a 128 × 128 search area image;
mode 5: first cropping a 2048 × 2048 image from the video image input into the image data preprocessing module, starting at the starting row and column position; then summing each 16 adjacent columns of image data in time sequence and storing the sums in the distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the 16 adjacent rows are summed, reading the RAM data back and adding them; and finally dividing by 256 to obtain a 128 × 128 search area image.
4. The apparatus for quickly performing convolution in an image tracking system according to claim 1, wherein the convolution calculating module performs convolution operation on the search area image and the template image according to the following method:
1) simultaneously reading T search area image data and T template image data by address, the addresses being 0, 1, 2, ..., T-1 in turn; reading the data at the first address and multiplying the T search area image data by the T template image data to obtain T products, which are accumulated and summed in a pipeline over 5 cycles; meanwhile reading the T addresses in sequence and accumulating in the same way, then summing the adjacent results to finally obtain one image convolution result; then changing the read addresses to 1, 2, 3, ..., T, and so on, repeating the above calculation and switching addresses as required, to obtain one row of (S-T+1) image convolution results;
2) reading the next row of search area image data from the DDR storage module and storing the newly read (T+1)-th row in the 1st distributed RAM of the FPGA; before repeating step 1, reordering the T search area data that are read out: for example, once the (T+1)-th row of search area image data is stored in the 1st distributed RAM, the output of the 1st distributed RAM is taken as the T-th image data, the output of the 2nd distributed RAM as the 1st image data, and so on, giving T newly ordered image data; then repeating step 1 to obtain a new row of (S-T+1) image convolution results;
3) repeating steps 1 and 2 to obtain (S-T+1) × (S-T+1) image convolution results, the image convolution results being obtained within T × (S-T+1) main clock cycles after acquisition of the search area image from the input video image is completed, and storing the image convolution results in the DDR storage module.
5. A method for quickly performing convolution in an image tracking system using the apparatus of claim 2, comprising the steps of:
1) inputting video images to the image data preprocessing module in line-sequential timing;
2) preprocessing the video images in the image data preprocessing module according to the mode set in the ZYNQ SOC control module, so that they are finally processed into search area images of size 128 × 128, and sending the search area images to the DDR storage module for storage;
3) the ZYNQ SOC control module reading 32 × 32 template data from the image data preprocessing module and storing the template data in 32 distributed RAMs, each of size 32, in the FPGA of the ZYNQ SOC control module;
4) the convolution operation module reading the search area data and the template data respectively through the DDR storage module, performing the convolution operation, and storing the convolution result in the DDR storage module.
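As a rough size check on steps 2) to 4), the figures can be worked out under the assumption that S and T in the convolution steps of claim 4 denote the search area side (128) and the template side (32) given here. The symbols N_out, N_mac, and N_clk are labels introduced only for this illustration; N_clk simply evaluates the T(S-T-1) cycle expression stated in claim 4.

```latex
\begin{align*}
N_{\text{out}} &= (S-T+1)^2 = (128-32+1)^2 = 97^2 = 9409 \\
N_{\text{mac}} &= N_{\text{out}} \cdot T^2 = 9409 \cdot 1024 = 9\,634\,816 \\
N_{\text{clk}} &= T\,(S-T-1) = 32 \cdot 95 = 3040
\end{align*}
```

Under these assumptions the 32 distributed RAMs of size 32 hold exactly the T^2 = 1024 template values used for each result.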
6. The method of claim 5, wherein 5 image data preprocessing modes are set in the ZYNQ SOC control module, namely:
mode 1: cropping a 128 × 128 search area image from the starting row and column position of the video image input into the image data preprocessing module;
mode 2: first cropping a 256 × 256 image from the starting row and column position of the video image input into the image data preprocessing module; then summing the image data of each 2 adjacent columns in time sequence and storing the summation result in a distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the next row is summed, reading the RAM data and adding it in; and finally dividing by 4 to obtain a 128 × 128 search area image;
mode 3: first cropping a 512 × 512 image from the starting row and column position of the video image input into the image data preprocessing module; then summing the image data of each 4 adjacent columns in time sequence and storing the summation result in a distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the adjacent 4 rows are summed, reading the RAM data and adding it in; and finally dividing by 16 to obtain a 128 × 128 search area image;
mode 4: first cropping a 1024 × 1024 image from the starting row and column position of the video image input into the image data preprocessing module; then summing the image data of each 8 adjacent columns in time sequence and storing the summation result in a distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the adjacent 8 rows are summed, reading the RAM data and adding it in; and finally dividing by 64 to obtain a 128 × 128 search area image;
mode 5: first cropping a 2048 × 2048 image from the starting row and column position of the video image input into the image data preprocessing module; then summing the image data of each 16 adjacent columns in time sequence and storing the summation result in a distributed RAM inside the FPGA of the ZYNQ SOC control module; when the image data of the adjacent 16 rows are summed, reading the RAM data and adding it in; and finally dividing by 256 to obtain a 128 × 128 search area image.
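The five modes amount to cropping a square region and averaging non-overlapping blocks down to 128 × 128. The following Python sketch illustrates that arithmetic under stated assumptions: the names `MODE_FACTOR` and `preprocess` are hypothetical, the frame is a list of rows of integers, and the final division is taken to be integer (floor) division, which the claim does not specify.

```python
# Block side per mode: mode 1 crops only; modes 2-5 average 2x2 ... 16x16 blocks.
MODE_FACTOR = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}


def preprocess(frame, mode, row0=0, col0=0, out_size=128):
    """Crop (out_size * f) x (out_size * f) pixels starting at (row0, col0)
    and reduce them to out_size x out_size by block averaging.

    Assumption: the divide by f*f is an integer (floor) division.
    """
    f = MODE_FACTOR[mode]
    out = []
    for r in range(out_size):
        row_out = []
        for c in range(out_size):
            base = col0 + c * f
            total = 0
            for dr in range(f):
                src = frame[row0 + r * f + dr]
                # Column sums of f adjacent pixels, accumulated over f rows.
                total += sum(src[base:base + f])
            row_out.append(total // (f * f))  # 4, 16, 64 or 256 for modes 2-5
        out.append(row_out)
    return out


if __name__ == "__main__":
    frame = [[0, 2, 4, 6], [2, 4, 6, 8], [1, 1, 1, 1], [3, 3, 3, 3]]
    print(preprocess(frame, mode=2, out_size=2))  # [[2, 6], [2, 2]]
```

The hardware described in the claim keeps the running column sums in a distributed RAM between rows; the sketch recomputes them per block, which gives the same totals.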
7. The method for quickly performing convolution in an image tracking system according to claim 5, wherein the convolution calculating module performs the convolution operation on the search area image and the template image as follows:
1) simultaneously reading T search area image data and T template image data by address, the addresses running through 0, 1, 2, ..., T-1 in turn; on reading the first address, multiplying the T search area image data by the T template image data to obtain T products, and summing them by pipelined accumulation over 5 cycles; meanwhile reading the T addresses in sequence, accumulating each in the same way, and summing the successive results to finally obtain one image convolution result; then changing the address reading mode so that the addresses run through 1, 2, 3, ..., T, repeating the above calculation steps, and shifting the addresses as required, to obtain one row of (S-T+1) image convolution results;
2) reading search area image data from the DDR storage module and storing the (T+1)-th row of data read in the 1st distributed RAM of the FPGA; before repeating step 1, reordering the T search area data read, for example: with the (T+1)-th row of search area image data stored in the 1st distributed RAM, the output of the 1st distributed RAM is taken as the T-th image data, the output of the 2nd distributed RAM as the 1st image data, and so on, giving T new image data; then repeating step 1 to obtain a new row of (S-T+1) image convolution results;
3) repeating steps 1 and 2 to obtain (S-T+1) × (S-T+1) image convolution results, the image convolution results being obtained in the T(S-T-1)-th main clock cycle after acquisition of the search area image from the input video image is completed, and storing the image convolution results in the DDR storage module.
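The reordering in step 2 behaves like a circular buffer over the T row memories: the newest row overwrites the oldest one, and only the logical order of the outputs changes. Below is a minimal Python sketch of that idea; the class name `RowBuffers` and its methods are hypothetical, and Python lists stand in for the distributed RAMs.

```python
class RowBuffers:
    """T row buffers standing in for the T distributed RAMs (illustrative model)."""

    def __init__(self, rows):
        self.rams = [list(r) for r in rows]  # RAM i initially holds row i
        self.oldest = 0                      # RAM holding the top row of the window

    def push(self, new_row):
        """Overwrite the oldest row with the new one and rotate the logical order."""
        self.rams[self.oldest] = list(new_row)
        self.oldest = (self.oldest + 1) % len(self.rams)

    def window(self):
        """Return the rows top to bottom, whatever their physical RAM index."""
        t = len(self.rams)
        return [self.rams[(self.oldest + i) % t] for i in range(t)]


if __name__ == "__main__":
    buf = RowBuffers([[1], [2], [3]])  # T = 3, rows 1..3
    buf.push([4])                      # row T+1 lands in the 1st RAM
    print(buf.rams)                    # [[4], [2], [3]]
    print(buf.window())                # [[2], [3], [4]]
```

After the first `push`, the 2nd RAM supplies the 1st image data and the 1st RAM supplies the T-th, matching the example given in step 2, and no row is ever copied between memories.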
CN202011153379.1A 2020-10-26 2020-10-26 Device and method for quickly realizing convolution in image tracking system Active CN112330524B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011153379.1A CN112330524B (en) 2020-10-26 2020-10-26 Device and method for quickly realizing convolution in image tracking system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011153379.1A CN112330524B (en) 2020-10-26 2020-10-26 Device and method for quickly realizing convolution in image tracking system

Publications (2)

Publication Number Publication Date
CN112330524A (en) 2021-02-05
CN112330524B (en) 2024-06-18

Family

ID=74311662

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011153379.1A Active CN112330524B (en) 2020-10-26 2020-10-26 Device and method for quickly realizing convolution in image tracking system

Country Status (1)

Country Link
CN (1) CN112330524B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7882164B1 (en) * 2004-09-24 2011-02-01 University Of Southern California Image convolution engine optimized for use in programmable gate arrays
CN104035750A (en) * 2014-06-11 2014-09-10 西安电子科技大学 Field programmable gate array (FPGA)-based real-time template convolution implementing method
CN107403117A (en) * 2017-07-28 2017-11-28 西安电子科技大学 Three dimensional convolution device based on FPGA
WO2018120446A1 (en) * 2016-12-31 2018-07-05 华中科技大学 Parallel and coordinated processing method for real-time target recognition-oriented heterogeneous processor
CN108921182A (en) * 2018-09-26 2018-11-30 苏州米特希赛尔人工智能有限公司 The feature-extraction images sensor that FPGA is realized
CN109034025A (en) * 2018-07-16 2018-12-18 东南大学 A kind of face critical point detection system based on ZYNQ
CN109389120A (en) * 2018-10-29 2019-02-26 济南浪潮高新科技投资发展有限公司 A kind of object detecting device based on zynqMP
CN109859178A (en) * 2019-01-18 2019-06-07 北京航空航天大学 A kind of infrared remote sensing image real-time target detection method based on FPGA
CN109871813A (en) * 2019-02-25 2019-06-11 沈阳上博智像科技有限公司 A kind of realtime graphic tracking and system
CN110097174A (en) * 2019-04-22 2019-08-06 西安交通大学 Preferential convolutional neural networks implementation method, system and device are exported based on FPGA and row
US20200012881A1 (en) * 2018-07-03 2020-01-09 Irvine Sensors Corporation Methods and Devices for Cognitive-based Image Data Analytics in Real Time Comprising Saliency-based Training on Specific Objects
US20200074288A1 (en) * 2017-12-06 2020-03-05 Tencent Technology (Shenzhen) Company Limited Convolution operation processing method and related product
WO2020119188A1 (en) * 2018-12-10 2020-06-18 广东浪潮大数据研究有限公司 Program detection method, apparatus and device, and readable storage medium
CN111460999A (en) * 2020-03-31 2020-07-28 北京工业大学 Low-altitude aerial image target tracking method based on FPGA
CN111459877A (en) * 2020-04-02 2020-07-28 北京工商大学 FPGA (field programmable gate array) acceleration-based Winograd YOLOv2 target detection model method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
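ALOK PRAKASH et al.: "Accelerating computer vision algorithms on heterogeneous edge computing platforms", 《2020 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEM(SIPS)》, 23 September 2020 (2020-09-23), pages 1 - 6 *
冯坤: "Design and Implementation of ZYNQ Face Tracking Based on Multi-task Cascaded Convolution", 《China Master's Theses Full-text Database, Information Science and Technology》, vol. 2020, no. 02, 15 February 2020 (2020-02-15), pages 138 - 1142 *
周政: "Design and Implementation of an FPGA-Based Intelligent Target Tracking System", 《China Master's Theses Full-text Database, Information Science and Technology》, vol. 2020, no. 02, 15 February 2020 (2020-02-15), pages 135 - 747 *
李炳奇: "FPGA-Based Target Detection and Tracking", 《China Master's Theses Full-text Database, Information Science and Technology》, vol. 2019, no. 08, 15 August 2019 (2019-08-15), pages 138 - 863 *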

Also Published As

Publication number Publication date
CN112330524B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
CN110458279B (en) FPGA-based binary neural network acceleration method and system
US20230026006A1 (en) Convolution computation engine, artificial intelligence chip, and data processing method
CN112487750B (en) Convolution acceleration computing system and method based on in-memory computing
CN111583093B (en) Hardware implementation method for ORB feature point extraction with good real-time performance
CN111210019B (en) Neural network inference method based on software and hardware cooperative acceleration
Ngo et al. Resource-aware architecture design and implementation of Hough transform for a real-time iris boundary detection system
CN111582465B (en) Convolutional neural network acceleration processing system and method based on FPGA and terminal
CN117217274B (en) Vector processor, neural network accelerator, chip and electronic equipment
CN110738317A (en) FPGA-based deformable convolution network operation method, device and system
CN109472734B (en) Target detection network based on FPGA and implementation method thereof
Xiao et al. FPGA-based scalable and highly concurrent convolutional neural network acceleration
US20220253668A1 (en) Data processing method and device, storage medium and electronic device
Gong et al. Research and implementation of multi-object tracking based on vision DSP
CN102129419B (en) Based on the processor of fast fourier transform
CN112163612B (en) Big template convolution image matching method, device and system based on fpga
CN109446478A (en) A kind of complex covariance matrix computing system based on iteration and restructural mode
CN112330524B (en) Device and method for quickly realizing convolution in image tracking system
CN113158132A (en) Convolution neural network acceleration system based on unstructured sparsity
CN112508174B (en) Weight binary neural network-oriented pre-calculation column-by-column convolution calculation unit
Ngo et al. Real time iris segmentation on FPGA
Kim et al. A configurable heterogeneous multicore architecture with cellular neural network for real-time object recognition
Park et al. ShortcutFusion++: optimizing an end-to-end CNN accelerator for high PE utilization
CN113095024A (en) Regional parallel loading device and method for tensor data
Li et al. FPGA Accelerated Real-time Recurrent All-Pairs Field Transforms for Optical Flow
CN115861025B (en) Reconfigurable image processor chip architecture supporting OpenCV and application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant