CN115619618A - Image processing method and system based on a high-level synthesis tool - Google Patents


Info

Publication number
CN115619618A
CN115619618A (application CN202211242480.3A)
Authority
CN
China
Prior art keywords
image
algorithm
image processing
processing
calculation
Prior art date
Legal status (an assumption, not a legal conclusion)
Pending
Application number
CN202211242480.3A
Other languages
Chinese (zh)
Inventor
王自鑫
张仕杰
陈弟虎
胡胜发
汤锦基
袁悦来
Current Assignee
Guangzhou Ankai Microelectronics Co ltd
Sun Yat Sen University
Original Assignee
Guangzhou Ankai Microelectronics Co ltd
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Guangzhou Ankai Microelectronics Co ltd, Sun Yat Sen University filed Critical Guangzhou Ankai Microelectronics Co ltd
Priority to CN202211242480.3A
Publication of CN115619618A
Priority to PCT/CN2023/077516 (WO2024077833A1)

Classifications

    • G: PHYSICS > G06: COMPUTING; CALCULATING OR COUNTING > G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/20: Processor architectures; processor configuration, e.g. pipelining (under G06T 1/00, general purpose image data processing)
    • G06T 5/70: Denoising; smoothing (under G06T 5/00, image enhancement or restoration)
    • G06T 7/11: Region-based segmentation (under G06T 7/10, segmentation; edge detection, within G06T 7/00, image analysis)
    • G06T 7/13: Edge detection (under G06T 7/10)
    • G06T 7/33: Image registration using feature-based methods (under G06T 7/30, determination of transform parameters for the alignment of images, i.e. image registration)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the technical field of image processing and provides an image processing method and system based on a high-level synthesis (HLS) tool, comprising the following steps: constructing an image processing flow, wherein the image processing flow comprises, executed in sequence, image denoising based on the GAUSS algorithm, edge extraction based on the SOBEL algorithm, and feature point extraction based on the HARRIS or FAST algorithm; applying one or more of loop unrolling, dataflow optimization, pipelining and array partitioning to the image processing flow with a high-level synthesis tool to generate an image processing IP core; and performing C/RTL co-simulation of the image processing IP core in the Vivado HLS tool and, after simulation verification, calling the image processing IP core on the FPGA to complete image processing. The invention designs and optimizes the image processing algorithms with a high-level synthesis tool and parallelizes them, thereby improving image processing speed, meeting high-speed and high-throughput application requirements, and being particularly suitable for the field of industrial real-time inspection.

Description

Image processing method and system based on a high-level synthesis tool
Technical Field
The invention relates to the technical field of image processing, and in particular to an image processing method and system based on a high-level synthesis tool.
Background
At present, industrial real-time inspection places stringent requirements on the processing speed and throughput of image processing algorithms, so traditional CPU (Central Processing Unit) platforms cannot meet them and the algorithms must be moved to a hardware platform for acceleration. In the field of industrial inspection, common image processing algorithms include GAUSS filtering, SOBEL filtering, HARRIS corner extraction, FAST corner extraction, and the like.
Traditional industrial vision inspection systems cannot meet the requirements of modern industrial settings, mainly because of low detection precision, low detection speed and limited detection volume. Industrial-grade high-precision cameras can solve the problems of low detection precision and insufficient detection rate, but they bring with them problems such as a large volume of raw image data and a high image transmission rate. The processing speed of the image algorithm then becomes the bottleneck of the industrial inspection system. Industrial inspection systems built on CPU platforms cannot accommodate the growing volume of raw image data and the increasing complexity of image processing algorithms. Image processing algorithms require efficient development and a high-speed, high-throughput hardware platform to meet modern industrial vision inspection requirements. In the field of FPGA acceleration, FPGAs offer stronger raw data computation and reconfigurability; for example, an FPGA can process and adjust data of arbitrary precision. On the one hand, with pipeline optimization, FPGA hardware acceleration can overlap the computation of all stages; on the other hand, through pipeline optimization the FPGA can hide the latency of data transfer behind data computation, achieving high real-time performance.
Disclosure of Invention
The invention provides an image processing method and system based on a high-level synthesis tool, aiming at overcoming the defects of low detection precision, low detection speed and limited detection volume of traditional industrial vision inspection in the prior art.
In order to solve the technical problems, the technical scheme of the invention is as follows:
An image processing method based on a high-level synthesis tool comprises the following steps:
constructing an image processing flow, wherein the image processing flow comprises, executed in sequence, image denoising based on the GAUSS algorithm, edge extraction based on the SOBEL algorithm, and feature point extraction based on the HARRIS or FAST algorithm;
applying one or more of loop unrolling, dataflow optimization, pipelining and array partitioning to the image processing flow with a high-level synthesis tool to generate an image processing IP core;
and performing C/RTL co-simulation of the image processing IP core in the Vivado HLS tool and, after simulation verification, calling the image processing IP core on the FPGA to complete image processing.
Furthermore, the invention also provides an image processing system based on the high-level synthesis tool, to which the above image processing method is applied.
The image processing system comprises an image acquisition module, an image processing module and an image display module connected in sequence. The image processing module comprises an image denoising unit based on the GAUSS algorithm, an edge extraction unit based on the SOBEL algorithm, and a feature point extraction unit based on the HARRIS or FAST algorithm. The image denoising unit, edge extraction unit and feature point extraction unit each contain an image processing IP core generated by applying one or more of loop unrolling, dataflow optimization, pipelining and array partitioning to the image processing flow with a high-level synthesis tool.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects: the invention designs and optimizes the image processing algorithms with a high-level synthesis tool and parallelizes them, thereby improving image processing speed, meeting high-speed and high-throughput application requirements, and being particularly suitable for the field of industrial real-time inspection.
Drawings
Fig. 1 is a flowchart of the image processing method based on a high-level synthesis tool according to embodiment 1.
Fig. 2 is a flow chart of the GAUSS filtering algorithm.
Fig. 3 is a pseudo-code diagram of the weight parameter calculation part of the GAUSS filter algorithm.
Fig. 4 is a pseudo-code diagram of the GAUSS filter calculation part of the GAUSS filter algorithm.
Fig. 5 is a flow chart of the SOBEL filtering algorithm.
Fig. 6 is a pseudo-code diagram of the SOBEL filtering algorithm.
Fig. 7 is a flow chart of the HARRIS corner point extraction algorithm.
Fig. 8 is a pseudo-code diagram of the HARRIS corner point extraction algorithm.
Fig. 9 is a flowchart of the FAST feature point extraction algorithm.
Fig. 10 is a pseudo-code diagram of the FAST feature point extraction algorithm.
Fig. 11 is a diagram illustrating the effect of image processing at each stage in embodiment 2.
Fig. 12 is an architecture diagram of the image processing system based on a high-level synthesis tool of embodiment 3.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
The present embodiment provides an image processing method based on a high-level synthesis tool; fig. 1 shows the flowchart of this method.
The image processing method based on a high-level synthesis tool provided by this embodiment comprises the following steps:
s1, constructing an image processing flow, wherein the image processing flow comprises image denoising processing based on a GAUSS algorithm, edge extraction processing based on a SOBEL algorithm and feature point extraction processing based on a HARRIS algorithm or a FAST algorithm which are sequentially executed.
And S2, optimizing one or more of cyclic expansion processing, data stream processing, pipeline processing and data segmentation processing adopted by the image processing flow by using a high-level comprehensive tool to generate an image processing IP core.
And S3, carrying out C/RTL joint simulation on the image processing IP core in a Vivado HLS tool, and calling the image processing IP core in the FPGA to finish image processing after simulation verification.
The four image processing algorithms GAUSS, SOBEL, HARRIS and FAST are optionally designed in the C language, and the designed algorithms then undergo loop unrolling, dataflow optimization, pipelining and array partitioning, improving image processing speed and meeting high-speed, high-throughput application requirements.
The image processing method provided by this embodiment is designed mainly for industrial inspection scenarios. In a specific implementation, a template image and an image to be inspected are first denoised with the GAUSS algorithm to reduce the influence of camera noise floor; histogram equalization is then performed and edges are extracted with the SOBEL algorithm; feature points are then extracted with the HARRIS or FAST algorithm to complete image registration; finally, a defect map is judged and output through difference calculation and threshold judgment. The four image processing algorithms GAUSS, SOBEL, HARRIS and FAST are optimized with a high-level synthesis tool using one or more of loop unrolling, dataflow optimization, pipelining and array partitioning to generate an acceleration-optimized image processing IP core.
In an optional embodiment, image denoising based on the GAUSS algorithm comprises the following steps:
(1) Initialization: import the initial weight parameters and the input image, wherein the weight parameters comprise the spread parameter σ and the weight kernel size n;
(2) Weight parameter calculation: compute the weight kernel from σ and n through the Gaussian function and store it, repeating the calculation until the kernel boundary has been traversed; the n x n weight kernel is output;
(3) Convolution: convolve the image with the weight kernel and normalize by the sum of the weights, repeating the calculation until the whole image has been traversed; the denoised image is output and the weight kernel is stored.
Fig. 2 shows the flow chart of the GAUSS filtering algorithm; the dashed boxes in the diagram mark the higher-latency parts, which are also the parts requiring optimization. The pseudo code is shown in figs. 3 and 4.
This embodiment divides the GAUSS filtering algorithm into a weight function calculation part and a GAUSS filter calculation part. In one embodiment, the operating parameters before optimization are shown in Table 1 below.
Table 1. Latency of the GAUSS filter algorithm before optimization
(table available only as an image in the original document)
The weight function calculation is divided into two loops. The latency of the first loop is 51 clock cycles, with an iteration latency of 17 clock cycles and a trip count of 3; the total latency is the product of the iteration latency and the trip count. The second loop has a latency of 36 clock cycles, an iteration latency of 12, and a trip count of 3. Both parts of the weight function spend more than 10 clock cycles per iteration, reflecting that the FPGA is not well suited to exponential and floating-point calculation. The second function is the weight convolution calculation, which divides into initialization, row-loop and column-loop steps. The latency of the initialization loop body is 1024 clock cycles, which is high, with an iteration latency of 2 clock cycles. No iteration depends on another, so each is a perfect loop: the data result of each iteration in the loop body is independent, and the loop body can be unrolled or pipelined. The latency of the row loop exceeds 2099712 clock cycles, which is where the bulk of the time is spent. The row loop and the column loop are nested. The latency of the column loop inside the nest is 4096 clock cycles with an iteration latency of 8 cycles, and each iteration of the row loop bears the full latency of the column loop, so this section needs optimization. In addition, since the GAUSS weight parameters are used repeatedly once computed, they need to be kept in fixed storage to reduce parameter-scheduling time during pipelining.
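The latency bookkeeping quoted above follows one relation: for a non-pipelined loop, total latency equals iteration latency times trip count. A trivial helper makes the quoted figures easy to cross-check:

```c
/* Total latency of a non-pipelined loop: iteration latency x trip count.
   Cross-checks the figures quoted in the text: 51 = 17 x 3, 36 = 12 x 3,
   and 4096 = 8 x 512 for the column loop of the weight convolution. */
static long loop_latency(long iteration_latency, long trip_count)
{
    return iteration_latency * trip_count;
}
```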
Further, optimizing the GAUSS-based image denoising flow with a high-level synthesis tool comprises the following steps:
unrolling the initialization loop of the GAUSS algorithm with the #pragma HLS UNROLL directive;
pipelining the weight parameter calculation with the #pragma HLS PIPELINE directive;
pipelining the convolution of the image with the weight parameters with the #pragma HLS PIPELINE directive;
partitioning the stored weight parameter array with the #pragma HLS ARRAY_PARTITION complete directive.
In this embodiment, optimization of the GAUSS algorithm focuses on parallelism; specifically, pragma preprocessing directives are used to guide the high-level synthesis tool to perform targeted optimization.
In this embodiment, GAUSS-based image denoising is divided into two parts. The first part is the weight parameter algorithm, whose inputs are σ and n: σ determines the spread of the weight kernel and n determines the kernel size; the output is a weight kernel of width and height n. In this part, the weight function calculation involves relatively complex floating-point and exponential calculations, and the loop body of the weight parameter algorithm is a nested loop whose bound depends on the input n. In this embodiment, the initialization loop is fully unrolled with the #pragma HLS UNROLL directive, and the weight parameter calculation is pipelined with the #pragma HLS PIPELINE directive.
The second part is the noise removal algorithm, composed of two loop bodies: the calculation of the weight parameter sum, and the image convolution. The image convolution comprises four nested loop levels, and the outer loop has a data dependency. This embodiment pipelines the convolution of the image with the weight parameters using the #pragma HLS PIPELINE directive, and partitions the stored weight parameter array using the #pragma HLS ARRAY_PARTITION complete directive. In the specific implementation, complete is chosen as the partition type, so the arrays are fully scattered and implemented as registers.
The operating parameters of the GAUSS filtering algorithm after high-level synthesis optimization are shown in Table 2 below.
Table 2. Latency of the GAUSS filter algorithm after optimization
(table available only as an image in the original document)
As can be seen from Tables 1 and 2, after pipelining, the latency of the column loop in the weight convolution calculation is reduced from the original 4096 clock cycles to 516, one eighth of the original, with an initiation interval of 1. The latency of the loop initialization function after pipeline optimization becomes 512.
In an alternative embodiment, SOBEL-based edge extraction comprises the following steps:
importing the horizontal convolution kernel, the vertical convolution kernel and the SOBEL operator;
convolving the input image with the horizontal and vertical convolution kernels respectively, traversing the image to obtain a horizontally filtered image and a vertically filtered image;
computing a weighted average of the horizontally and vertically filtered images with the SOBEL operator to produce the edge-extracted image; and storing the SOBEL operator.
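The steps above can be sketched in C roughly as follows. The kernel values are the standard 3x3 SOBEL operators; the merge here uses an averaged absolute response, one common choice for the weighted average, and is not necessarily the patent's exact weighting:

```c
#include <stdlib.h>

/* Hedged sketch of the SOBEL step: horizontal and vertical 3x3
   convolutions followed by an averaged absolute merge.
   Function and kernel names are illustrative. */
static const int GX[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
static const int GY[3][3] = { {-1,-2,-1}, { 0, 0, 0}, { 1, 2, 1} };

static void sobel(const unsigned char *in, unsigned char *out, int w, int h)
{
    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++) {
            int gx = 0, gy = 0;
            for (int i = -1; i <= 1; i++)
                for (int j = -1; j <= 1; j++) {
                    int p = in[(y + i) * w + (x + j)];
                    gx += GX[i + 1][j + 1] * p;
                    gy += GY[i + 1][j + 1] * p;
                }
            /* weighted average of the two directional responses */
            int g = (abs(gx) + abs(gy)) / 2;
            out[y * w + x] = (unsigned char)(g > 255 ? 255 : g);
        }
}
```

A sharp vertical edge produces a strong horizontal response and a saturated output pixel, while a uniform image produces zero everywhere.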
Fig. 5 shows the flow chart of the SOBEL filtering algorithm, where the dashed boxes mark the higher-latency parts. The SOBEL algorithm performs edge extraction with the SOBEL operator, which approximates the first derivative. In this embodiment, edge extraction is performed in two directions, horizontal and vertical. The input and output of the algorithm are the input and output images, respectively. The SOBEL algorithm consists of two function bodies: the SOBEL filter function and the weighted calculation function. Both contain many loop bodies, including nested loops, and the perfect loops among them are particularly suitable for optimization with high-level synthesis. Fig. 6 shows the pseudo code of the SOBEL filtering algorithm.
Further, optimizing the SOBEL-based edge extraction flow with a high-level synthesis tool comprises:
linking the horizontal convolution and the vertical convolution with the weighted calculation for dataflow processing using the #pragma HLS DATAFLOW directive;
pipelining the horizontal convolution, the vertical convolution and the weighted average calculation with the #pragma HLS PIPELINE directive;
partitioning the SOBEL operator into one-dimensional array storage with the #pragma HLS ARRAY_PARTITION directive applied to the operator variable.
In this embodiment, the two main functions in the SOBEL algorithm carry large latency, where the trip count equals the length and width of the picture and forms the loop bound. This embodiment uses the #pragma HLS DATAFLOW directive to chain the horizontal and vertical convolutions with the weighted calculation for dataflow processing. The DATAFLOW pragma enables task-level pipelining, allowing functions and loops to overlap in operation, increasing the concurrency of the RTL implementation and the overall throughput of the design.
In the SOBEL algorithm, the horizontally and vertically filtered images are processed in parallel; because the two filtering results have no data dependency, a parallel architecture can be realized, and as perfect loop bodies they are suitable for pipeline optimization. This embodiment pipelines the horizontal convolution, vertical convolution and weighted average calculations using the #pragma HLS PIPELINE directive. Because these are perfect loop bodies, nothing stalls the pipeline, so the PIPELINE pragma executes operations concurrently. In one embodiment, the default initiation interval of 1 is used.
In addition, the #pragma HLS ARRAY_PARTITION directive is applied to the SOBEL operator to split it into one-dimensional array storage, which helps the scheduling of the pipeline operations that follow.
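A hedged sketch of the DATAFLOW decomposition described above: the two directional filters and the weighting stage are separate functions so the tool can overlap them as concurrent tasks communicating through intermediate buffers. The stand-in filter bodies below use simple central differences purely to keep the sketch self-contained; a plain C compiler ignores the pragma and runs the stages sequentially with the same result:

```c
#include <stdlib.h>

#define W 64
#define H 64

/* Stand-in stage bodies (central differences), illustrative only. */
static void filter_horizontal(const unsigned char *in, short *gx)
{
    for (int i = 0; i < W * H; i++)
        gx[i] = (i % W > 0 && i % W < W - 1) ? in[i + 1] - in[i - 1] : 0;
}
static void filter_vertical(const unsigned char *in, short *gy)
{
    for (int i = 0; i < W * H; i++)
        gy[i] = (i >= W && i < W * (H - 1)) ? in[i + W] - in[i - W] : 0;
}
static void weight_merge(const short *gx, const short *gy, unsigned char *out)
{
    for (int i = 0; i < W * H; i++) {
        int g = (abs(gx[i]) + abs(gy[i])) / 2;
        out[i] = (unsigned char)(g > 255 ? 255 : g);
    }
}

/* Top level: under Vivado HLS, DATAFLOW lets the three stages run as
   overlapping tasks, with gx/gy becoming stream buffers between them. */
static void sobel_top(const unsigned char in[W * H], unsigned char out[W * H])
{
#pragma HLS DATAFLOW
    short gx[W * H], gy[W * H];   /* task-to-task channels (FIFOs on FPGA) */
    filter_horizontal(in, gx);
    filter_vertical(in, gy);
    weight_merge(gx, gy, out);
}
```

The key design constraint for DATAFLOW is that data flows forward only, stage to stage, which is exactly the structure of the SOBEL chain here.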
The SOBEL algorithm operating parameters before and after optimization are shown in Tables 3 and 4 below, respectively.
Table 3. SOBEL algorithm latency before optimization
(table available only as an image in the original document)
Table 4. SOBEL algorithm latency after optimization
(table available only as an image in the original document)
As can be seen from the tables above, after pipelining the directional filtering calculation, the latency of the row loop is reduced by two thirds, and the latency of the column loop falls to 515 clock cycles. After optimizing the row loop in the weighting calculation, its latency is reduced by two thirds, and the latency of the column loop falls to 512 clock cycles. The total latency drops to one third of the original.
In an alternative embodiment, HARRIS-based feature point extraction comprises the following steps:
importing the horizontal convolution kernel and the vertical convolution kernel;
convolving the edge-extracted image with the horizontal and vertical convolution kernels respectively, traversing the image to obtain a horizontally filtered image and a vertically filtered image;
squaring the horizontally filtered image and the vertically filtered image through the square calculation function, and feeding the filtered images into the R response function for calculation;
judging against the window threshold according to the calculated R response values, marking the corner points on the image, and outputting the image.
Fig. 7 shows the flow chart of the HARRIS corner extraction algorithm; the input is the original image and the output is the image with calibrated feature points. The HARRIS corner extraction algorithm mainly comprises five function bodies: the directional filter function, the square calculation function, the multiplication calculation function, the R response function and the screening function. The directional filter function is the same as in the edge detection algorithm, except that in the original edge detection function the horizontal and vertical filtering results are combined by the weighting function, whereas in HARRIS feature point extraction the directionally filtered images do not need to be weighted but must be exported separately; so the input of the function is the original image and the outputs are the horizontally and vertically filtered images. The square calculation function and the multiplication calculation function are the element computations of the R response calculation: the image square and the image product, respectively. The R response function is followed by the screening function, which selects the feature points.
Fig. 8 is a pseudo-code schematic of the HARRIS corner extraction algorithm. Apart from the directional filter function, the other four functions all contain perfect loop bodies.
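The per-pixel R response that the square and multiplication functions feed into can be sketched as follows. The classic Harris formula R = det(M) - k * trace(M)^2 with k around 0.04 is assumed here, since the patent text does not give the formula explicitly:

```c
/* Hedged sketch of the R response assembled from the outputs of the
   square and multiplication functions: ixx = Ix^2, iyy = Iy^2,
   ixy = Ix*Iy (window-summed in the full algorithm). Assumes the
   standard Harris form R = det(M) - k * trace(M)^2. */
static double harris_response(double ixx, double iyy, double ixy, double k)
{
    double det   = ixx * iyy - ixy * ixy;  /* det(M)   */
    double trace = ixx + iyy;              /* trace(M) */
    return det - k * trace * trace;
}
```

The screening function then keeps pixels whose R exceeds the window threshold: R is strongly positive at corners, negative along edges, and near zero in flat regions.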
Further, optimizing the HARRIS-based feature point extraction flow with a high-level synthesis tool comprises the following steps:
pipelining, with the #pragma HLS PIPELINE directive, the horizontal convolution, vertical convolution, horizontal filter square, vertical filter square, horizontal-vertical filter product, R response calculation and window threshold judgment;
applying dataflow processing to the image-read operations during R response calculation with the #pragma HLS DATAFLOW directive;
applying loop unrolling optimization to the window threshold judgment with the #pragma HLS UNROLL directive.
According to the processing flow of the HARRIS corner extraction algorithm, after the horizontally and vertically filtered images are computed, the two filtered images must undergo square and product operations: squaring a filtered image squares each of its pixels, and the most time-consuming part is traversing the pixels of the whole image, followed by the image transfer operation. In the image transfer operation, a pipeline optimization directive is used to accelerate the transfer. In the square and product parts, loop unrolling expands the traversal calculation into parallel operations. When the R response function is computed, the horizontal filter square map, the vertical filter square map and the horizontal-vertical product map must all be read; reading images consumes much time, and pipeline and dataflow optimization reduce the time spent in the read operations of the image product and image square.
In this embodiment, the #pragma HLS PIPELINE directive and the #pragma HLS DATAFLOW directive are added to the corresponding function bodies, with an initiation interval of 1 specified. PIPELINE shortens the instruction initiation interval within a C function or loop, while DATAFLOW enables task-level pipelining, allowing functions and loops to overlap in operation, increasing the concurrency of the RTL implementation and the overall throughput of the design.
The window threshold judgment part of the algorithm mainly thresholds the R response results; it optionally uses pipelining to accelerate the image read, and loop unrolling to accelerate the threshold judgment during traversal. This embodiment reduces the total latency of feature point detection by applying high-level synthesis loop-unrolling optimization to the functions above. The operating parameters of the HARRIS corner extraction algorithm before and after optimization are shown in Tables 5 and 6 below.
Table 5. HARRIS corner extraction algorithm latency before optimization
(table available only as an image in the original document)
Table 6. HARRIS corner extraction algorithm latency after optimization
(table available only as an image in the original document)
As can be seen from the tables above, the total latency of each function is significantly reduced; in particular, in the R response function the latency of the row loop falls from 1310720 clock cycles to 264192 after optimization. In addition, the initiation interval is reduced from 3952172 clock cycles before optimization to 268326 clock cycles.
In an optional embodiment, FAST-based feature point extraction comprises the following steps:
importing the Bresenham circle coordinates and the edge-extracted image;
performing the Bresenham circle difference operation on each selected pixel through the candidate corner selection function, comparing the pixels on the Bresenham circle for contiguity, selecting candidate corners, and computing the corner score of each candidate corner;
importing the selected candidate corners and applying non-maximum suppression through the non-maximum suppression function to screen out the corners;
marking the corner points on the image and outputting the image.
As shown in fig. 9, a flowchart of the FAST feature point extraction algorithm is shown, where the algorithm mainly includes two function bodies, a function for selecting candidate corner points and a non-maximum suppression function. The candidate corner selection function is to select candidate corners by continuously comparing pixel points in the Bresham circle and calculate the corner value of each candidate corner. The non-maximum suppression function mainly screens candidate corners. When candidate angular points appear in the screening window, angular points in the window are compared in terms of angular point values, and the angular points are selected.
Fig. 10 is a schematic diagram of the pseudocode of the FAST feature point extraction algorithm. As the pseudocode shows, each of the two function bodies contains a loop.
A latency test yields the delay data of the FAST feature point extraction algorithm shown in Table 7.
TABLE 7 FAST feature point extraction algorithm delay before optimization
As the table shows, the total delay of the candidate corner selection function is 1129533 clock cycles. During this computation the centre point must be compared with the points on the surrounding Bresenham circle; the next operation proceeds only when a surrounding pixel satisfies the condition, and otherwise the point is skipped, which makes the function highly branched. The total delay of the non-maximum suppression function is 7372306 clock cycles, of which the read-data delay is 1545 clock cycles; the column-loop delay in the image traversal is 14364 clock cycles and the iteration delay is 28 clock cycles. These two functions are the high-delay core of the algorithm, so the acceleration design optimizes them first in order to further speed up FAST corner extraction.
Further, the step of optimizing the FAST-algorithm-based feature point extraction flow with a high-level synthesis tool includes:
pipelining the Bresenham circle difference operation on the selected pixels with the #pragma HLS PIPELINE directive;
optimizing the windowed non-maximum suppression operation by loop unrolling with the #pragma HLS UNROLL directive.
In this embodiment, the #pragma HLS PIPELINE directive pipelines the Bresenham circle difference operation on the selected pixels; this operation is performed for every selected centre point and is the main source of delay in the algorithm.
After the candidate corners are derived, non-maximum suppression is applied over a window. Because the function-level delay analysis shows this step to be expensive, it is a priority for optimization. Since the window bounds are fixed, this embodiment applies the #pragma HLS UNROLL directive to unroll the loops; the part that imports the candidate corners can additionally be pipelined with #pragma HLS PIPELINE II=1.
In addition, to save transmission time while importing and exporting images, a pipeline directive is optionally applied so that the image transfer rate approaches fully parallel operation. When the 7x7 sliding window is imported, the #pragma HLS ARRAY_PARTITION complete dim=1 directive stores the 16 pixels around the target pixel as independent units, which reduces access time during the successive comparisons.
After the difference value of each pixel is obtained, the pixels are classified, and a loop-unrolling directive reduces the traversal time during classification. The full directive is #pragma HLS ARRAY_PARTITION type=complete dim=1. Here type=complete partitions the array into individual elements; for a one-dimensional array this resolves the memory into independent registers. dim=1 selects which dimension of a multidimensional array is partitioned, in this case the first.
The FAST algorithm delay data after high-level synthesis optimization are shown in Table 8 below.
TABLE 8 optimized FAST Algorithm delay
Comparing Tables 7 and 8 shows that the delay of the FAST algorithm is significantly reduced after high-level synthesis optimization; in particular, the total delay of the candidate corner selection function drops from 1129533 clock cycles to 272440 clock cycles.
By designing and optimizing the image processing algorithms with a high-level synthesis tool and improving them through parallelization, the method increases the image processing speed, meets high-speed, high-throughput application requirements, and is particularly suited to the field of industrial real-time inspection.
Example 2
In this embodiment, the high-level-synthesis-based image processing method provided in Embodiment 1 is applied to the field of industrial real-time inspection.
First, an image processing flow is constructed. The image processing flow in this embodiment comprises:
(1) image denoising based on the GAUSS algorithm;
(2) gray-level histogram equalization;
(3) edge extraction based on the SOBEL algorithm;
(4) feature point extraction based on the HARRIS or FAST algorithm;
(5) image binarization segmentation based on the OTSU algorithm;
(6) difference calculation;
(7) picture display.
In this flow, the template image and the image under test are first denoised with the GAUSS algorithm to reduce the influence of camera sensor noise; histogram equalization is then performed and edges are extracted with the SOBEL algorithm; next, feature points are extracted with the HARRIS and FAST algorithms to complete image registration; finally, the defect map is produced and output by difference calculation followed by threshold judgment.
In the image processing method, a high-level synthesis tool optimizes the GAUSS-based image denoising, SOBEL-based edge extraction, and HARRIS- or FAST-based feature point extraction stages with one or more of loop unrolling, dataflow processing, pipelining, and data partitioning, generating an image processing IP core.
The IP core produced by high-level synthesis optimization undergoes C/RTL co-simulation in the Vivado HLS tool; after the simulation is verified, the image processing IP core is invoked on the FPGA to complete the image processing.
An image quality comparison test is performed on the acceleration-optimized IP core, and an industrial inspection flow is set up for testing. Fig. 11 shows the effect of each image processing stage in this embodiment. Fig. 11(a) is the original picture, which contains camera noise that must be filtered out. Fig. 11(b) is the denoised image; removing extraneous noise highlights the feature information. Fig. 11(c) is the histogram-equalized image, with clearly increased contrast. Figs. 11(d) and 11(e) are the edge extraction and feature point extraction results, respectively, which facilitate the subsequent image registration. Fig. 11(f) is the binary image produced by the OTSU method, which selects the threshold that maximizes the separation between feature and background. Fig. 11(g) is the image after difference calculation; besides the defect information, some irrelevant information remains in the image. Finally, a morphological opening operation removes the irrelevant information, yielding the defect detection result of Fig. 11(h), where the white spots are the detected defect locations.
Further, in this embodiment an image processing test is run with the unoptimized IP core as the reference group and the acceleration-optimized IP core as the improved group. The resource and time consumption of each algorithm are compared in Tables 9 to 16 below.
TABLE 9 GAUSS Filtering Algorithm resource consumption
TABLE 10 GAUSS Filtering Algorithm time consumption
TABLE 11 SOBEL Filtering Algorithm resource consumption
TABLE 12 SOBEL Filtering Algorithm time consumption
TABLE 13 HARRIS Algorithm resource consumption
TABLE 14 HARRIS Algorithm time consumption
TABLE 15 FAST Algorithm resource consumption
TABLE 16 FAST Algorithm time consumption
As the tables show, for the same resource consumption this embodiment achieves industrial visual inspection with higher detection accuracy, detection rate, and total inspection throughput, effectively increasing the image processing speed and meeting high-speed, high-throughput application requirements.
Example 3
This embodiment provides an image processing system based on a high-level synthesis tool that applies the image processing method provided in Embodiment 1. Fig. 12 shows the architecture of the image processing system of this embodiment.
The image processing system provided by the embodiment comprises an image acquisition module, an image processing module and an image display module which are sequentially connected.
The image processing module comprises an image denoising unit based on a GAUSS algorithm, an edge extraction unit based on a SOBEL algorithm and a feature point extraction unit based on a HARRIS algorithm or a FAST algorithm.
In this embodiment the image denoising unit, edge extraction unit, and feature point extraction unit comprise an image processing IP core generated by optimizing the image processing flow with a high-level synthesis tool using one or more of loop unrolling, dataflow processing, pipelining, and data partitioning.
In an optional embodiment, the step of optimizing the GAUSS-algorithm-based image denoising flow with a high-level synthesis tool includes:
unrolling the initialization operation of the GAUSS algorithm with the #pragma HLS UNROLL directive;
pipelining the weight parameter calculation with the #pragma HLS PIPELINE directive;
pipelining the convolution of the image with the weight parameters with the #pragma HLS PIPELINE directive;
storing the weight parameters with array partitioning via the #pragma HLS ARRAY_PARTITION complete directive.
In an optional embodiment, the step of optimizing the SOBEL-algorithm-based edge extraction flow with a high-level synthesis tool includes:
connecting the horizontal convolution with its weighting calculation and the vertical convolution with its weighting calculation for dataflow processing with the #pragma HLS DATAFLOW directive;
pipelining the horizontal convolution, vertical convolution, and weighted average calculation with the #pragma HLS PIPELINE directive;
partitioning the SOBEL operator into one-dimensional array storage with the #pragma HLS ARRAY_PARTITION variable directive.
In an optional embodiment, the step of optimizing the HARRIS-algorithm-based feature point extraction flow with a high-level synthesis tool includes:
pipelining the horizontal convolution, vertical convolution, square of the horizontal filtering, square of the vertical filtering, power of the horizontally and vertically filtered images, R response function calculation, and window threshold judgment, each with the #pragma HLS PIPELINE directive;
applying dataflow processing to the image-read operation during the R response function calculation with the #pragma HLS DATAFLOW directive;
unrolling the window threshold judgment operation with the #pragma HLS UNROLL directive.
In an optional embodiment, the step of optimizing the FAST-algorithm-based feature point extraction flow with a high-level synthesis tool includes:
pipelining the Bresenham circle difference operation on the selected pixels with the #pragma HLS PIPELINE directive;
optimizing the windowed non-maximum suppression operation by loop unrolling with the #pragma HLS UNROLL directive.
In a specific implementation, an OV5640 camera serves as the image acquisition module, and an ARM core configures the camera registers over the SCCB protocol. Once configured, the image enters the Zynq UltraScale+ device: an RGB565-format image is obtained in the Zynq through the camera acquisition interface and converted to RGB888 format for subsequent operations. Because the XILINX chip provides the powerful on-chip AXI interconnect bus, the RGB image information is converted to AXI-Stream for convenient interconnection, and the image information is subsequently transmitted in AXI form. The acquired image data must be buffered in DDR for later operations, and the PL accesses the DDR through VDMA. Two acquired frames are stored in the DDR while the image processing module reads the buffered images through the VDMA. The image processing module reads the images over the AXI bus; after each IP core in the module finishes its processing, the images are buffered back to the DDR through the VDMA and read by the PS side. After reading, the PS outputs over the DP interface protocol. Since the DP protocol clock for the output image requires 74.25 MHz, a PLL converts the 100 MHz PS output clock to 74.25 MHz. Finally, the signal passes through a level conversion circuit and a single-ended-to-differential circuit to the image display module for display.
The terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. An image processing method based on a high-level synthesis tool is characterized by comprising the following steps:
constructing an image processing flow, wherein the image processing flow comprises image denoising processing based on a GAUSS algorithm, edge extraction processing based on a SOBEL algorithm and feature point extraction processing based on a HARRIS algorithm or a FAST algorithm which are sequentially executed;
optimizing the image processing flow with a high-level synthesis tool using one or more of loop unrolling, dataflow processing, pipelining, and data partitioning, to generate an image processing IP core;
and performing C/RTL co-simulation on the image processing IP core in the Vivado HLS tool, and, after simulation verification, invoking the image processing IP core on the FPGA to complete the image processing.
2. The image processing method according to claim 1, wherein the GAUSS-algorithm-based image denoising comprises the steps of:
(1) initialization: importing the initial weight parameters and the input image, the weight parameters comprising the offset parameter sigma and the weight kernel size n;
(2) weight parameter calculation: computing and storing the weight parameters from sigma and n through the GAUSS function, repeating the calculation until the weight parameter boundary is traversed, and outputting the weight kernel of size n;
(3) convolution: convolving the image with the weight parameters, averaging by the sum of the weight parameters, repeating the calculation until the image is traversed, outputting the denoised image, and storing the weight parameters.
3. The image processing method according to claim 2, wherein the step of optimizing the GAUSS-algorithm-based image denoising flow with a high-level synthesis tool comprises:
unrolling the initialization operation of the GAUSS algorithm with the #pragma HLS UNROLL directive;
pipelining the weight parameter calculation with the #pragma HLS PIPELINE directive;
pipelining the convolution of the image with the weight parameters with the #pragma HLS PIPELINE directive;
storing the weight parameters with array partitioning via the #pragma HLS ARRAY_PARTITION complete directive.
4. The image processing method according to claim 1, wherein the SOBEL-algorithm-based edge extraction comprises the steps of:
importing a horizontal convolution kernel, a vertical convolution kernel, and the SOBEL operator;
convolving the input image with the horizontal and vertical convolution kernels respectively, and traversing the image to obtain a horizontally filtered image and a vertically filtered image;
computing a weighted average of the horizontally filtered and vertically filtered images with the SOBEL operator to derive the edge-extracted image; and storing the SOBEL operator.
5. The image processing method of claim 4, wherein the step of optimizing the SOBEL-algorithm-based edge extraction flow with a high-level synthesis tool comprises:
connecting the horizontal convolution with its weighting calculation and the vertical convolution with its weighting calculation for dataflow processing with the #pragma HLS DATAFLOW directive;
pipelining the horizontal convolution, vertical convolution, and weighted average calculation with the #pragma HLS PIPELINE directive;
partitioning the SOBEL operator into one-dimensional array storage with the #pragma HLS ARRAY_PARTITION variable directive.
6. The image processing method according to claim 1, wherein the HARRIS-algorithm-based feature point extraction comprises the steps of:
importing a horizontal convolution kernel and a vertical convolution kernel;
convolving the edge-extracted image with the horizontal and vertical convolution kernels respectively, and traversing the image to obtain a horizontally filtered image and a vertically filtered image;
passing the horizontally filtered image and the vertically filtered image each through a square calculation function, processing them through a power calculation function, and inputting the results into the R response function for calculation;
judging the window threshold according to the calculated R response value, marking the corners on the image, and outputting the image.
7. The image processing method according to claim 6, wherein the step of optimizing the HARRIS-algorithm-based feature point extraction flow with a high-level synthesis tool comprises:
pipelining the horizontal convolution, vertical convolution, square of the horizontal filtering, square of the vertical filtering, power of the horizontally and vertically filtered images, R response function calculation, and window threshold judgment, each with the #pragma HLS PIPELINE directive;
applying dataflow processing to the image-read operation during the R response function calculation with the #pragma HLS DATAFLOW directive;
unrolling the window threshold judgment operation with the #pragma HLS UNROLL directive.
8. The image processing method according to claim 1, wherein the FAST-algorithm-based feature point extraction comprises the steps of:
importing the Bresenham circle coordinates and the edge-extracted image;
performing the Bresenham circle difference operation on each selected pixel through a candidate corner selection function, comparing the pixels on the Bresenham circle for continuity, selecting candidate corners, and computing the corner score of each candidate corner;
importing the selected candidate corners, applying non-maximum suppression to them through a non-maximum suppression function, and screening out the corners;
marking the corners on the image and outputting the image.
9. The image processing method according to claim 8, wherein the step of optimizing the FAST-algorithm-based feature point extraction flow with a high-level synthesis tool comprises:
pipelining the Bresenham circle difference operation on the selected pixels with the #pragma HLS PIPELINE directive;
optimizing the windowed non-maximum suppression operation by loop unrolling with the #pragma HLS UNROLL directive.
10. An image processing system based on a high-level synthesis tool, applying the image processing method of any one of claims 1 to 9, characterized by comprising an image acquisition module, an image processing module, and an image display module connected in sequence;
the image processing module comprises an image denoising unit based on the GAUSS algorithm, an edge extraction unit based on the SOBEL algorithm, and a feature point extraction unit based on the HARRIS or FAST algorithm;
the image denoising unit, the edge extraction unit, and the feature point extraction unit comprise an image processing IP core generated by optimizing the image processing flow with a high-level synthesis tool using one or more of loop unrolling, dataflow processing, pipelining, and data partitioning.
CN202211242480.3A 2022-10-11 2022-10-11 Image processing method and system based on high-level comprehensive tool Pending CN115619618A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211242480.3A CN115619618A (en) 2022-10-11 2022-10-11 Image processing method and system based on high-level comprehensive tool
PCT/CN2023/077516 WO2024077833A1 (en) 2022-10-11 2023-02-21 Image processing method and system based on high-level synthesis tool

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211242480.3A CN115619618A (en) 2022-10-11 2022-10-11 Image processing method and system based on high-level comprehensive tool

Publications (1)

Publication Number Publication Date
CN115619618A true CN115619618A (en) 2023-01-17

Family

ID=84862752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211242480.3A Pending CN115619618A (en) 2022-10-11 2022-10-11 Image processing method and system based on high-level comprehensive tool

Country Status (2)

Country Link
CN (1) CN115619618A (en)
WO (1) WO2024077833A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024077833A1 (en) * 2022-10-11 2024-04-18 中山大学 Image processing method and system based on high-level synthesis tool

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10579762B2 (en) * 2017-05-15 2020-03-03 LegUp Computing Inc. High-level synthesis (HLS) method and apparatus to specify pipeline and spatial parallelism in computer hardware
CN109840892A (en) * 2019-01-14 2019-06-04 苏州长风航空电子有限公司 A kind of infrared video Enhancement Method based on High Level Synthesis
CN110717852B (en) * 2019-06-13 2022-09-16 内蒙古大学 FPGA-based field video image real-time segmentation system and method
CN115619618A (en) * 2022-10-11 2023-01-17 中山大学 Image processing method and system based on high-level comprehensive tool


Also Published As

Publication number Publication date
WO2024077833A1 (en) 2024-04-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination