CN110866885B - Template-configurable N-pixel parallel gray morphological filtering circuit and method

Info

Publication number: CN110866885B
Application number: CN201910975475.5A
Authority: CN
Inventors: 桑红石; 李强; 常诚; 姜庆峰; 高万苏维; 刘羽丰; 李玉涛
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2019-10-16
Filing date: 2019-10-16
Publication date: 2022-05-17
Anticipated expiration: 2039-10-16
Also published as: CN110866885A

Abstract

The invention discloses a template-configurable N-pixel parallel gray-scale morphological filtering IP module and method, belonging to the technical field of image processing. To meet the requirement that the IP implementation reduce storage resource consumption, two-dimensional morphological filtering is realized with one-dimensional morphological filtering, which reduces the storage consumed by a two-dimensional window during large-template filtering. This places a requirement on how data is parsed during the column operation of one-dimensional morphological filtering, so a column transposition circuit is designed to supply the operation circuit with N adjacent column pixels in every clock cycle. To avoid having to raise the clock frequency of the IP in order to meet the real-time requirement of the system, the reusability of the operation circuit during large-template morphological filtering is exploited and the parallelism of morphological filtering is fully developed, improving the data throughput of the gray-scale morphological filtering IP. When the IP is configured for different structural element sizes, the extremum operation circuits are identical and share one (M-N+1)-input extremum circuit, which reduces the consumption of operation resources.

Description

Template-configurable N-pixel parallel gray-scale morphological filtering circuit and method

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to a template-configurable N-pixel parallel gray-scale morphological filtering circuit and method.

Background

Image preprocessing is a necessary step in target detection and recognition, and gray-scale morphological filtering is an important part of image preprocessing, so it has wide application. In embedded image processing systems such as automatic target recognition and tracking, gray-scale morphological filtering is often applied to an image to smooth it, highlight regions of interest, and delineate the boundaries between different regions. In a precision guidance system, accurately identifying buildings requires calling gray-scale morphological filtering multiple times to suppress the image background and remove uninteresting regions smaller than the structural element. As the distance between the aircraft and the building changes, the size of the original image and the structural element size required for the gray-scale morphological filtering also change.

Gray-scale morphological filtering takes the minimum of the gray-level differences, or the maximum of the gray-level sums, over the image pixels that coincide with the structural element. After the extremum operation, the coordinate of the center of the structural element is the coordinate of the new pixel, and the result of the filtering operation is the gray value of the new pixel. The eroded/dilated image is obtained only after the center of the filter has visited every pixel of the image to be processed, and the result of a gray-scale morphological opening/closing operation is obtained by calling the erosion/dilation operations twice in succession. Therefore, as the structural element and the image to be processed grow, the complexity and computational load of gray-scale morphological filtering increase.

For a 512 × 640 image, a two-dimensional morphological erosion/dilation with an 81 × 81 structural element on a TMS320C6455 DSP requires about 10.7 × 10⁸ comparison operations and takes about 101 ms; a two-dimensional morphological opening/closing with an 81 × 81 structural element requires about 21.5 × 10⁸ comparison operations and takes about 202 ms. The embedded image processing system in the laboratory, however, must complete all operations on a 512 × 640 image within 20 ms, including: acquiring the original image; writing the original image into the FPGA; gray-scale morphological filtering; connected-region operations such as histogram statistics, labeling, feature calculation and similarity judgment; image differencing and Gaussian filtering; calculating and analyzing connected-region feature values; fusing feature data; tracking a single target; and outputting the result. It is therefore difficult to meet the real-time requirement of the embedded image processing system with software running on a CPU or DSP. At present, a hardware acceleration module is generally adopted, using hardware parallelism to reach a higher processing speed than software and so meet the real-time requirement that the embedded image processing system places on the gray-scale morphological filtering module.
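As a minimal sketch of the operations described above (not the patented circuit), the following Python defines one-dimensional gray-scale erosion and dilation and builds an opening from them. The function names, the list-based signal, the zero-valued flat element, and the choice to skip positions that fall outside the signal are all assumptions made for illustration; the element is assumed symmetric, so no reflection is applied.

```python
def erode_1d(signal, element):
    """Erosion: minimum of (pixel - element value) over the pixels under the element."""
    half = len(element) // 2
    out = []
    for center in range(len(signal)):
        diffs = [
            signal[center + k - half] - b
            for k, b in enumerate(element)
            if 0 <= center + k - half < len(signal)  # assumption: skip positions outside the signal
        ]
        out.append(min(diffs))
    return out


def dilate_1d(signal, element):
    """Dilation: maximum of (pixel + element value) over the pixels under the element."""
    half = len(element) // 2
    out = []
    for center in range(len(signal)):
        sums = [
            signal[center + k - half] + b
            for k, b in enumerate(element)
            if 0 <= center + k - half < len(signal)  # assumption: skip positions outside the signal
        ]
        out.append(max(sums))
    return out


def open_1d(signal, element):
    """Opening: erosion followed by dilation with the same element."""
    return dilate_1d(erode_1d(signal, element), element)


flat3 = [0, 0, 0]                  # hypothetical flat element of size 3
signal = [5, 5, 9, 5, 5, 1, 5]     # hypothetical gray values
eroded = erode_1d(signal, flat3)
opened = open_1d(signal, flat3)
```

In this example the opening removes the bright value 9, which is narrower than the element, illustrating why larger elements suppress larger regions at the cost of more comparisons per output pixel.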

Two-dimensional morphological filtering of an image generates a two-dimensional working window as the neighborhood according to the size of the structural element, and takes the minimum of the gray-level differences or the maximum of the gray-level sums within that neighborhood. In a hardware implementation, the required hardware resources and running time grow with the size of the structural element and of the image to be processed. Such an implementation is therefore unsuitable for embedded image processing systems with strict real-time requirements and limited hardware resources, such as those used in aerospace, autonomous driving and precision guidance.

Disclosure of Invention

To address the high hardware resource consumption and long operation time caused by larger structural elements in prior-art gray-scale morphological filtering, the invention provides a template-configurable N-pixel parallel gray-scale morphological filtering circuit and method that filter multiple pixels in parallel, increase processing speed, reduce redundant computation and reduce hardware resource overhead.

To achieve the above object, according to a first aspect of the present invention, there is provided a template-configurable N-pixel parallel gray-scale morphological filtering circuit, wherein the template is a flat symmetric structural element, the configurable structural element size is m with maximum value M, and the filtering circuit comprises:

a line analysis circuit, which parses each frame of image data when the filtering circuit is configured for row operation and outputs N adjacent row pixels to the parallel operation circuit in each clock cycle;

a column transposition circuit, which parses each frame of image data when the filtering circuit is configured for column operation and outputs N adjacent column pixels to the parallel operation circuit in each clock cycle;

a data splicing circuit, which expands the image when its number of rows or columns is not an integer multiple of N, so that the N adjacent pixels sent to the parallel operation circuit always belong to the same row/column;

a parallel operation circuit, comprising a shift register, a window generation circuit and an extremum operation circuit;

the shift register buffers the image data parsed and output by the line analysis circuit and the column transposition circuit;

the window generation circuit, when N pixels are filtered in parallel, fetches the buffered image data from the shift register according to the currently configured structural element size and the working mode of the filtering circuit, fills the shareable image data in order into positions W_N～W_M of the 1 × (M+N-1) one-dimensional working window, and fills the non-shareable image data in order into the remaining positions of that window;

the extremum operation circuit, when N pixels are filtered in parallel, takes the minimum/maximum of the pixels at the positions of the one-dimensional working window according to the working mode of the filtering circuit, yielding the N-pixel parallel gray-scale morphological filtering result.

Specifically, the column transposition circuit includes N simple dual-port RAMs and a data right-shift circuit; the RAM depth required to store the image data is the number of image rows after row processing divided by N. The parsing process is as follows:

(1) Read in the image data: the initial write address of the N RAMs is 0, and the image data is written into one RAM per clock cycle, switching RAMs in sequence; after the Nth RAM has been written, the write address is incremented by 1 and writing continues at the corresponding address of the N RAMs until all N RAMs are full, at which point the input of image data is suspended;

(2) Read out the image data: the initial read address of the N RAMs is 0; if every RAM is full, N data words are read from the N simple dual-port RAMs in parallel each time; otherwise, N data words are read in parallel once the image data has been written into the Nth RAM;

(3) When the N adjacent pixels of column 1 are parsed, the data right-shift circuit shifts the output data of the simple dual-port RAMs right by 0 bits; when the N adjacent pixels of column 2 are parsed, it shifts the output data right by w bits; as the parsed column number increases, the right shift grows by w bits each time, until the pixels of column […] have been parsed, which ends the full state of the simple dual-port RAMs; here w is the data bit width;

(4) Determine whether all pixels have been parsed; if so, end; otherwise return to step (1).
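The write and read scheduling in steps (1) and (2) can be sketched in software as follows. This is only an illustrative model under assumptions: each incoming data word is treated as an opaque value, the RAMs are modeled as Python lists, and the names `write_round_robin` and `read_parallel` are hypothetical.

```python
def write_round_robin(words, n_rams):
    """Write one word per cycle, switching RAMs in sequence.

    Word k lands in RAM (k mod N) at address (k div N), so the write
    address advances by 1 each time the Nth RAM has been written.
    """
    rams = [[] for _ in range(n_rams)]
    for k, word in enumerate(words):
        rams[k % n_rams].append(word)
    return rams


def read_parallel(rams, address):
    """Read the same address from all N RAMs in one step."""
    return [ram[address] for ram in rams]


rams = write_round_robin(list(range(16)), n_rams=8)  # 16 hypothetical words
first_group = read_parallel(rams, 0)
second_group = read_parallel(rams, 1)
```

Reading one address from all N RAMs returns N consecutively written words at once, which is what lets a later stage pick N adjacent pixels of the same column out of them.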

Specifically, the one-dimensional working window is filled as follows:

when (N-1) ≤ m ≤ M, the shareable data is filled into positions W_N～W_M of the one-dimensional working window; when the reusable data is insufficient to fill W_N～W_M, the remaining positions are filled with 2^w - 1 or 0 (2^w - 1 for erosion, 0 for dilation); the remaining data is filled in order into positions W_1～W_{M+N-1};

when 1 ≤ m < (N-1), positions W_{m+1}～W_M of the one-dimensional working window are filled with 2^w - 1 or 0 (2^w - 1 for erosion, 0 for dilation), and the remaining data is filled in order into positions W_1～W_{M+N-1}.

Specifically, the extremum operation circuit includes: one (M-N+1)-input extremum operation circuit, N (N-1)-input extremum operation circuits, and one (N+1)-input extremum operation circuit. After the one-dimensional working window has been filled, the data at fixed positions is sent to the corresponding extremum operation circuit, as follows:

when (N-1) ≤ m ≤ M, the data at positions W_N～W_M of the one-dimensional working window is sent to the (M-N+1)-input extremum operation circuit to obtain the extremum Y. The inputs of the N (N-1)-input extremum operation circuits are, respectively: the data at W_1～W_{N-1}, giving extremum y_1; the data at W_2～W_{N-1} and W_{M+1}, giving extremum y_2; and so on, up to the data at W_{M+1}～W_{M+N-1}, giving extremum y_N. Y is then compared with each of y_1, y_2, …, y_N to obtain the parallel processing results of the N pixels;

when 1 ≤ m < (N-1), the data at positions W_N～W_M of the one-dimensional working window is sent to the (M-N+1)-input extremum operation circuit to obtain the extremum Y. The inputs of the N (N-1)-input extremum operation circuits are, respectively: the data at W_1～W_m, with the remaining inputs set to 2^w - 1 for erosion or 0 for dilation, giving extremum y_1; the data at W_2～W_m and W_{M+1}, with the remaining inputs set in the same way, giving extremum y_2; and so on, up to the data at W_{M-m+1}～W_{M+N-1}, with the remaining inputs set in the same way, giving extremum y_N. Y is then compared with each of y_1, y_2, …, y_N to obtain the parallel processing results of the N pixels.

Specifically, when configured for different structural element sizes, the extremum operation circuits share one (M-N+1)-input extremum operation circuit.
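The decomposition into one shared extremum Y and N per-pixel extrema y_1 … y_N can be checked with a short software model. The sketch below covers only the full-size case m = M, assumes M ≥ N, and uses a hypothetical function name `parallel_extrema`; it is an illustration of the idea, not a description of the actual hardware.

```python
def parallel_extrema(window, n, big_m, op=min):
    """Compute N sliding extrema over a 1 x (M+N-1) window via a shared term.

    Positions are 1-based as in the text: W_N..W_M are shared by all N
    outputs, and output i additionally needs W_i..W_{N-1} and
    W_{M+1}..W_{M+i-1}, which is always N-1 values.
    """
    assert len(window) == big_m + n - 1
    w = [None] + list(window)                 # shift to 1-based indexing
    shared = op(w[n:big_m + 1])               # Y: extremum of W_N..W_M
    results = []
    for i in range(1, n + 1):
        private = w[i:n] + w[big_m + 1:big_m + i]   # inputs of the i-th (N-1)-input circuit
        results.append(op([shared] + private))      # compare Y with y_i
    return results


n, big_m = 4, 9                                          # hypothetical small sizes
window = [(7 * k) % 23 for k in range(big_m + n - 1)]    # hypothetical pixel values
direct_min = [min(window[i:i + big_m]) for i in range(n)]
direct_max = [max(window[i:i + big_m]) for i in range(n)]
shared_min = parallel_extrema(window, n, big_m, min)
shared_max = parallel_extrema(window, n, big_m, max)
```

Passing `min` models erosion with a flat element and `max` models dilation; in both cases the result equals the direct sliding-window computation while the shared range is evaluated only once.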

To achieve the above object, according to a second aspect of the present invention, there is provided a template-configurable N-pixel parallel grayscale morphological filtering method, the method comprising the steps of:

S1. When the number of rows or columns of the image is not an integer multiple of N, expand the image so that the N adjacent pixels output during row operation/column operation belong to the same row/column;

S2. When configured for row operation, parse each frame of image data and output and buffer N adjacent row pixels in each clock cycle; when configured for column operation, parse each frame of image data and output and buffer N adjacent column pixels in each clock cycle;

S3. When N pixels are filtered in parallel, fetch the buffered image data according to the currently configured structural element size and working mode, fill the shareable image data in order into positions W_N～W_M of the 1 × (M+N-1) one-dimensional working window, and fill the non-shareable image data in order into the remaining positions of that window, where M is the maximum value of the configurable structural element size m;

S4. When N pixels are filtered in parallel, take the minimum/maximum of the pixels at the positions of the one-dimensional working window according to the working mode, yielding the N-pixel parallel gray-scale morphological filtering result.
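Step S1 can be illustrated with a small padding routine. The text does not state which values are used to expand the image, so the `fill` argument below is an assumption: a neutral value (2^w - 1 for erosion, 0 for dilation) is borrowed from the window-filling convention. The name `pad_to_multiple` is hypothetical.

```python
def pad_to_multiple(image, n, fill):
    """Expand an image (list of rows) so rows and columns are multiples of n."""
    rows, cols = len(image), len(image[0])
    pad_rows = (-rows) % n          # rows to append
    pad_cols = (-cols) % n          # columns to append
    padded = [list(row) + [fill] * pad_cols for row in image]
    padded += [[fill] * (cols + pad_cols) for _ in range(pad_rows)]
    return padded


w = 8                                            # data bit width
erosion_fill = 2 ** w - 1                        # assumption: neutral value for a minimum
image = [[1, 2, 3, 4, 5] for _ in range(3)]      # hypothetical 3 x 5 image
padded = pad_to_multiple(image, 4, erosion_fill)
```

With a neutral fill the appended pixels can never win the minimum (or maximum) comparison, so the results for the original pixels are unaffected under this assumption.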

Specifically, when configured for column operation, each frame of image data is parsed and N adjacent column pixels are output and buffered in each clock cycle, as follows:

(3) When the N adjacent pixels of column 1 are parsed, the data right-shift circuit shifts the output data of the simple dual-port RAMs right by 0 bits; when the N adjacent pixels of column 2 are parsed, it shifts the output data right by w bits; as the parsed column number increases, the right shift grows by w bits each time, until column […]

when 1 ≤ m < (N-1), positions W_{m+1}～W_M of the one-dimensional working window are filled with 2^w - 1 or 0 (2^w - 1 for erosion, 0 for dilation), and the remaining data is filled in order into positions W_1～W_{M+N-1}.

Specifically, taking the minimum/maximum of the pixels at the positions of the one-dimensional working window according to the working mode, to obtain the N-pixel parallel gray-scale morphological filtering result, proceeds as follows:

when (N-1) ≤ m ≤ M, the data at positions W_N～W_M of the one-dimensional working window is sent to an (M-N+1)-input extremum operation circuit to obtain the extremum Y. The inputs of the N (N-1)-input extremum operation circuits are, respectively: the data at W_1～W_{N-1}, giving extremum y_1; the data at W_2～W_{N-1} and W_{M+1}, giving extremum y_2; and so on, up to the data at W_{M+1}～W_{M+N-1}, giving extremum y_N. Y is then compared with each of y_1, y_2, …, y_N to obtain the parallel processing results of the N pixels;

when 1 ≤ m < (N-1), the data at positions W_N～W_M of the one-dimensional working window is sent to an (M-N+1)-input extremum operation circuit to obtain the extremum Y. The inputs of the N (N-1)-input extremum operation circuits are, respectively: the data at W_1～W_m, with the remaining inputs set to 2^w - 1 for erosion or 0 for dilation, giving extremum y_1; the data at W_2～W_m and W_{M+1}, with the remaining inputs set in the same way, giving extremum y_2; and so on, up to the data at W_{M-m+1}～W_{M+N-1}, with the remaining inputs set in the same way, giving extremum y_N. Y is then compared with each of y_1, y_2, …, y_N to obtain the parallel processing results of the N pixels.

In general, the above technical solution conceived by the present invention achieves the following beneficial effects:

(1) When the large-template parallel gray-scale morphological filtering IP is configured for column operation, the column transposition circuit parses the image frame data so that N adjacent column pixels are supplied to the operation unit in every cycle; this supports parallelization of the operation unit, raises data throughput and reduces processing time.

(2) The one-dimensional working window is filled in a specific manner: shareable image data is filled in order into positions W_N～W_M of the 1 × (M+N-1) one-dimensional working window, and non-shareable image data is filled in order into the remaining positions of that window. This makes full use of the shareable image data, reduces redundant computation and reduces the consumption of computing resources.

(3) When the IP is configured for different structural element sizes, the extremum operation circuits share the result of one (M-N+1)-input extremum circuit, which reduces the consumption of operation resources.

(4) Realizing two-dimensional morphological filtering by calling the large-template parallel gray-scale morphological filtering circuit designed by the invention reduces the storage resources that a two-dimensional window would consume during large-template filtering.
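The idea of realizing two-dimensional filtering with one-dimensional filtering can be illustrated for a flat rectangular element: a row erosion followed by a column erosion gives the same result as a direct two-dimensional window. The sketch below is a software illustration under assumptions (odd element sizes, window clipped at the image border, hypothetical function names), not the circuit itself.

```python
def erode_1d_flat(line, size):
    """Flat 1-D erosion; the window is clipped at the ends of the line."""
    half = size // 2
    return [min(line[max(0, c - half):c + half + 1]) for c in range(len(line))]


def erode_2d_separable(image, rows_size, cols_size):
    """Row erosion, transpose, column erosion, transpose back."""
    row_pass = [erode_1d_flat(row, cols_size) for row in image]
    columns = [list(col) for col in zip(*row_pass)]
    col_pass = [erode_1d_flat(col, rows_size) for col in columns]
    return [list(row) for row in zip(*col_pass)]


def erode_2d_direct(image, rows_size, cols_size):
    """Reference: minimum over the full 2-D window, clipped at the border."""
    hr, hc = rows_size // 2, cols_size // 2
    n_rows, n_cols = len(image), len(image[0])
    return [
        [
            min(
                image[rr][cc]
                for rr in range(max(0, r - hr), min(n_rows, r + hr + 1))
                for cc in range(max(0, c - hc), min(n_cols, c + hc + 1))
            )
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


image = [[(r * 7 + c * 13) % 31 for c in range(6)] for r in range(5)]  # hypothetical data
separable = erode_2d_separable(image, 3, 5)
direct = erode_2d_direct(image, 3, 5)
```

Each one-dimensional pass only needs a one-dimensional window, whereas the direct form must hold a full two-dimensional neighborhood for every output pixel.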

Drawings

Fig. 1 is an overall block diagram of a template-configurable N-pixel parallel gray-scale morphological filtering circuit according to an embodiment of the present invention;

Fig. 2 is a block diagram of a column transposition circuit according to an embodiment of the present invention;

Fig. 3 is a block diagram of a parallel operation circuit according to an embodiment of the present invention;

Fig. 4 is an analysis diagram of the data reusable among 8 adjacent parallel pixels according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of a two-dimensional erosion operation implemented with one-dimensional gray-scale morphological filtering according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a two-dimensional opening operation implemented with one-dimensional gray-scale morphological filtering according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

First, terms and variables involved in the present invention are explained:

m: the configurable structural element size.

M: the maximum value of the configurable structural element size, an odd number not larger than the number of image rows and columns.

N: the total number of parallel pixels.

w: the image data bit width.

A and B: the original image has A rows and B columns.

m × n: the structural element size for two-dimensional morphological filtering.

IP module: Intellectual Property (IP) module.

To meet the requirement that the IP implementation reduce storage resource consumption, two-dimensional morphological filtering is realized with one-dimensional morphological filtering, which reduces the storage consumed by a two-dimensional window during large-template filtering. This places a requirement on how data is parsed during the column operation of one-dimensional morphological filtering, so a column transposition circuit is designed to supply the operation circuit with 8 adjacent column pixels in every clock cycle. To avoid having to raise the clock frequency of the IP in order to meet the real-time requirement of the system, the reusability of the operation circuit during large-template morphological filtering is exploited and the parallelism of morphological filtering is fully developed, improving the data throughput of the gray-scale morphological filtering IP. When the IP is configured for different structural element sizes, the extremum operation circuits are identical and share one 74-input extremum circuit, which reduces the consumption of operation resources.

As shown in Fig. 1, the present invention provides a template-configurable N-pixel parallel gray-scale morphological filtering circuit, wherein the template is a flat symmetric structural element, the configurable structural element size is m with maximum value M, and the filtering circuit comprises:

A line analysis circuit, which parses each frame of image data when the filtering circuit is configured for row operation and outputs N adjacent row pixels to the parallel operation circuit in each clock cycle.

A column transposition circuit, which parses each frame of image data when the filtering circuit is configured for column operation and outputs N adjacent column pixels to the parallel operation circuit in each clock cycle.

A data splicing circuit, which expands the image when its number of rows or columns is not an integer multiple of N, so that the N adjacent pixels sent to the parallel operation circuit always belong to the same row/column.

A parallel operation circuit, comprising a shift register, a window generation circuit and an extremum operation circuit.

The shift register buffers the image data parsed and output by the line analysis circuit and the column transposition circuit.

The window generation circuit, when N pixels are filtered in parallel, fetches the buffered image data from the shift register according to the currently configured structural element size and the working mode of the filtering circuit, fills the shareable image data in order into positions W_N～W_M of the 1 × (M+N-1) one-dimensional working window, and fills the non-shareable image data in order into the remaining positions of that window.

To obtain good results from the gray-scale morphological filtering algorithm, the structural element chosen for the filtering operation should satisfy two conditions. First, the structural element should be no more complex geometrically than the original image, and it should be bounded. Second, the structural element should form a convex subset. The morphological structural elements supported by the invention are rectangular structural elements with value 1, which are flat structures, with a maximum size of 81 × 81.

In the line analysis circuit, the row operation is row erosion or row dilation; in the column transposition circuit, the column operation is column erosion or column dilation. The filtering circuit works in erosion mode or dilation mode.

As shown in Fig. 2, the column transposition circuit includes N simple dual-port RAMs and a data right-shift circuit; the RAM depth required to store the image data is the number of image rows after row processing divided by N. The parsing process is as follows:

(1) Read in the image data: the initial write address of the N RAMs is 0, and the image data is written into one RAM per clock cycle, switching RAMs in sequence; after the Nth RAM has been written, the write address is incremented by 1 and writing continues at the corresponding address of the N RAMs until all N RAMs are full, at which point the input of image data is suspended.

(2) Read out the image data: the initial read address of the N RAMs is 0; if every RAM is full, N data words are read from the N simple dual-port RAMs in parallel each time; otherwise, N data words are read in parallel once the image data has been written into the Nth RAM.

The full state of the simple dual-port RAMs ends once the last column of pixels held in them has been parsed; w is the data bit width.

In this embodiment, N is 8. If the data bit width w is 8, each frame of data contains 32 pixels, i.e., 32 columns of pixels. Pixels 1 to 8 in the figure, the 8 parallel pixels of one column, are obtained by taking bits 0 to 7 of the output of the right-shift circuit. When the 8 adjacent pixels of column 1 are parsed, the data right-shift circuit shifts the output data of the simple dual-port RAMs right by 0 bits. When the 8 adjacent pixels of column 2 are parsed, it shifts the output data right by 8 bits. As the parsed column number increases, the right shift grows by 8 bits each time until the 32nd column of pixels has been parsed, which ends the full state of the simple dual-port RAMs.

If the data bit width w is 16, each frame of data contains 16 pixels, i.e., 16 columns of pixels. Pixels 1 to 8 in the figure, the 8 parallel pixels of one column, are obtained by taking bits 0 to 15 of the output of the right-shift circuit. When the 8 adjacent pixels of column 1 are parsed, the data right-shift circuit shifts the output data of the simple dual-port RAMs right by 0 bits. When the 8 adjacent pixels of column 2 are parsed, it shifts the output data right by 16 bits. As the parsed column number increases, the right shift grows by 16 bits each time until the 16th column of pixels has been parsed, which ends the full state of the simple dual-port RAMs.

After the full state of the simple dual-port RAMs ends, data continues to be read from the data FIFO, and the above steps repeat until all pixels have been parsed.
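The shift-and-select behavior for w = 8 and w = 16 can be modeled as below. The 256-bit word width is an inference from the two cases in the text (32 × 8 = 16 × 16 = 256), and placing column 1 in the least-significant bits is an assumption consistent with "shift right by 0 bits, take bits 0 to w-1". The names `pack_row` and `column_pixels` are hypothetical.

```python
WORD_BITS = 256  # assumption: inferred from 32 pixels x 8 bits = 16 pixels x 16 bits


def pack_row(pixels, w):
    """Pack one row segment into a word, column 1 in the lowest w bits (assumed layout)."""
    assert len(pixels) * w == WORD_BITS
    word = 0
    for index, pixel in enumerate(pixels):
        word |= (pixel & ((1 << w) - 1)) << (index * w)
    return word


def column_pixels(ram_words, column, w):
    """Right-shift each RAM output by (column - 1) * w bits and keep bits 0..w-1."""
    mask = (1 << w) - 1
    shift = (column - 1) * w          # column is 1-based, as in the text
    return [(word >> shift) & mask for word in ram_words]


# Eight hypothetical rows, one word per RAM, for each bit width.
rows8 = [[(r * 32 + c) % 256 for c in range(32)] for r in range(8)]
words8 = [pack_row(row, 8) for row in rows8]
col2_w8 = column_pixels(words8, 2, 8)

rows16 = [[r * 1000 + c for c in range(16)] for r in range(8)]
words16 = [pack_row(row, 16) for row in rows16]
col16_w16 = column_pixels(words16, 16, 16)
```

Each call returns 8 values, one per RAM, which together form the 8 adjacent pixels of a single image column.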

As shown in Fig. 3, the parallel operation circuit comprises a shift register, a window generation circuit and an extremum operation circuit. Its function is to raise the data throughput of the large-template parallel gray-scale morphological filtering IP circuit by fully reusing the results of the extremum operation circuit.

The window generation circuit, when N pixels are filtered in parallel, fetches the buffered image data from the shift register according to the currently configured structural element size and the working mode of the filtering circuit, fills the shareable image data in order into positions W_N～W_M of the 1 × (M+N-1) one-dimensional working window, and fills the non-shareable image data in order into the remaining positions of that window.

As shown in Fig. 4, when the structural element size m satisfies (N-1) < m, the pixels at positions N～M of the one-dimensional working window are reusable. When m satisfies 1 ≤ m ≤ (N-1), the one-dimensional working window contains no reusable image pixels when N pixels are filtered in parallel. Therefore, to make full use of reusable image data, reduce redundant computation and reduce the consumption of computing resources, the invention fills the one-dimensional working window in a specific manner, and the structural element size m can be configured to any value from 1 to 81.
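Which pixels are reusable can be checked by intersecting the N neighborhoods directly. The sketch below numbers positions from 1 and places the i-th neighborhood at positions i to i+m-1; that placement and the name `shared_positions` are assumptions for illustration, and the placement matches positions W_N～W_M of the working window only for the full-size case m = M.

```python
def shared_positions(m, n):
    """Positions common to all n neighborhoods of size m, the i-th starting at position i."""
    windows = [set(range(i, i + m)) for i in range(1, n + 1)]
    return sorted(set.intersection(*windows))


full_size = shared_positions(81, 8)   # N = 8, m = M = 81
boundary = shared_positions(8, 8)     # smallest m with any overlap
none_left = shared_positions(7, 8)    # m = N - 1
```

For N = 8 and m = 81 the intersection holds 74 positions, which agrees with the 74-input extremum circuit of the embodiment, and for m = N - 1 it is empty.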

The one-dimensional working window is filled as follows:

when (N-1) ≤ m ≤ M, the shareable data is filled into positions W_N～W_M of the one-dimensional working window; when the reusable data is insufficient to fill W_N～W_M, the remaining positions are filled with 2^w - 1 or 0 (2^w - 1 for erosion, 0 for dilation); the remaining data is filled in order into positions W_1～W_{M+N-1}.

The extremum operation circuit includes: 1 (M-N +1) input extremum operation circuit, N (N-1) input extremum operation circuits, and 1 (N +1) input extremum operation circuit. The (N-1) input extremum circuit is used for processing extremum of non-shared data of different pixels in a one-dimensional window, (M-N +1) input extremum circuit is used for processing extremum of different pixels in shared data, and (N +1) input extremum circuit is used for obtaining one-dimensional filtering results of different pixels. After the one-dimensional working window is filled, the data at the fixed position is taken and sent to the corresponding extreme value operation circuit, which is as follows:

when M is more than or equal to (N-1) and less than or equal to M, putting W in the one-dimensional working window_N～W_MThe position data is sent to (M- (N-1)) input extremum operation circuit to obtain extremum Y, and the input data of N (N-1) input extremum operation circuits are respectively W₁～W_N-1The data of the position is obtained as an extreme value y₁，W₂～W_N-1And W_M+1The data of the position is obtained as an extreme value y₂Up to W_M+1～W_M+N-1Obtain the extreme value y_N. Then, Y is respectively added with Y₁,y₂,…,y_NAnd comparing to obtain the parallel processing result of N pixels.

When 1 ≤ m < (N-1), the data at positions W_N～W_M of the one-dimensional working window are sent to the (M-(N-1))-input extremum operation circuit to obtain an extremum Y. The inputs of the N (N-1)-input extremum operation circuits are, respectively: the data at W_1～W_m, with the remaining inputs set to 2^w − 1 or 0 (2^w − 1 for erosion, 0 for dilation), giving the extremum y_1; the data at W_2～W_m and W_{M+1}, with the remaining inputs set to 2^w − 1 or 0 (2^w − 1 for erosion, 0 for dilation), giving the extremum y_2; and so on up to the data at W_{M-m+1}～W_{M+N-1}, with the remaining inputs set to 2^w − 1 or 0 (2^w − 1 for erosion, 0 for dilation), giving the extremum y_N. Y is then compared with each of y_1, y_2, …, y_N to obtain the parallel processing result of the N pixels.

Preferably, when the IP is configured for different structural element sizes, the extremum operation circuits are identical and share one (M-(N-1))-input extremum operation circuit, which reduces the consumption of operation resources.

In this embodiment, N = 8 and M = 81. When the structural element size m is greater than or equal to 7, the pixels at positions 8-81 of the one-dimensional working window are sent into a 74-input extremum operation circuit; the pixel data at positions 1-7; 2-7 and 82; 3-7 and 82-83; 4-7 and 82-84; 5-7 and 82-85; 6-7 and 82-86; 7 and 82-87; and 82-88 are sent respectively into the eight 7-input extremum operation circuits; and the outputs of these 9 extremum operation circuits are sent into a 9-input extremum operation circuit to obtain the parallel operation results.
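The data sharing in this embodiment can be illustrated with a small software model. The sketch below is not the circuit itself: the function name `parallel_extrema` and the 0-based list `window` (where `window[0]` stands for W_1) are assumptions made for the example, and the final stage is modelled simply as combining Y with each y_i.

```python
import random

def parallel_extrema(window, N, M, op=min):
    """Model N parallel 1-D outputs from one (M+N-1)-long working window.

    window[0] corresponds to W_1; op is min for erosion, max for dilation.
    """
    assert len(window) == M + N - 1 and N >= 2
    # Shared part W_N..W_M: one (M-N+1)-input extremum, computed once.
    Y = op(window[N - 1:M])
    results = []
    for i in range(1, N + 1):
        # Non-shared part of output i: W_i..W_{N-1} plus W_{M+1}..W_{M+i-1}.
        own = window[i - 1:N - 1] + window[M:M + i - 1]
        assert len(own) == N - 1          # each small circuit has N-1 inputs
        results.append(op(Y, op(own)))    # combine shared and private extrema
    return results

# Embodiment values from the text: N = 8, M = 81 (88 window positions).
random.seed(0)
N, M = 8, 81
window = [random.randrange(256) for _ in range(M + N - 1)]
erosion = parallel_extrema(window, N, M, min)
# Reference: output i is the plain minimum over W_i..W_{i+M-1}.
reference = [min(window[i:i + M]) for i in range(N)]
print(erosion == reference)
```

In this model the 74-input extremum is evaluated once and reused by all 8 outputs, which is the resource saving the embodiment describes.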

When the structural element size m is 3, the pixels at positions 8-81 of the one-dimensional working window are sent into a 74-input extremum operation circuit. The following are sent respectively into the eight 7-input extremum operation circuits: the pixels at positions 1-7; 2-7 and 82; 3-7 and 82-83; 4-7 and 82-84; 5-7 and 83-85 plus one extra pixel; 6-7 and 84-86 plus two extra pixels; 7 and 85-87 plus three extra pixels; and 86-88 plus four extra pixels (each extra pixel is 255 for erosion and 0 for dilation). The outputs of the 9 extremum operation circuits are then sent into a 9-input extremum operation circuit to obtain the parallel operation results.

When the structural element size satisfies 3 < m < 7, the pixels at positions 8-81 of the one-dimensional working window are sent into a 74-input extremum operation circuit. The following are sent respectively into the eight 7-input extremum operation circuits: the pixels at positions 1-7; 2-7 and 82; 3-7 and 82-83; 4-7 and 82-84; 5-7 and 82-85; 6-7 and 82-86; 7 and 83-87 plus one extra pixel; and 84-88 plus two extra pixels (each extra pixel is 255 for erosion and 0 for dilation). The outputs of the 9 extremum operation circuits are then sent into the 9-input extremum operation circuit to obtain the parallel operation results.
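The small-template cases can be modelled in the same spirit. The sketch below is one reading of the filling rule and of the position groups listed above, not a description of the actual hardware: the names `fill_window` and `small_template_outputs` are hypothetical, and it is assumed that the N+m-1 input pixels are placed at W_1～W_m and W_{M+1}～W_{M+N-1}, with every other position holding the neutral value (255 for erosion, 0 for dilation when w = 8).

```python
def fill_window(data, m, N, M, pad):
    """Place N+m-1 input pixels into an (M+N-1)-long working window."""
    assert 1 <= m < N - 1 and len(data) == N + m - 1
    window = [pad] * (M + N - 1)
    window[:m] = data[:m]      # first m pixels go to W_1..W_m
    window[M:] = data[m:]      # remaining N-1 pixels go to W_{M+1}..W_{M+N-1}
    return window

def small_template_outputs(window, m, N, M, op, pad):
    """N parallel outputs for a structural element of size m < N-1."""
    Y = op(window[N - 1:M])    # shared circuit sees only neutral values here
    results = []
    for i in range(1, N + 1):
        first = max(1, i - m)  # skip tail pixels that belong to earlier outputs
        own = window[i - 1:N - 1] + window[M + first - 1:M + i - 1]
        own += [pad] * (N - 1 - len(own))   # extra neutral inputs, as in the text
        results.append(op(Y, op(own)))
    return results

# m = 3, N = 8, M = 81: ten input pixels give eight outputs.
data = [12, 200, 7, 90, 33, 150, 64, 5, 250, 41]
eroded = small_template_outputs(fill_window(data, 3, 8, 81, 255), 3, 8, 81, min, 255)
dilated = small_template_outputs(fill_window(data, 3, 8, 81, 0), 3, 8, 81, max, 0)
print(eroded, dilated)
```

Because the neutral value never wins the comparison against real data, each output equals the extremum of its own m consecutive input pixels even though the 74-input and 7-input circuits keep their fixed sizes.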

Grayscale morphological erosion or dilation uses a structural element b(m, n) as the filtering template, where m is the number of rows of the morphological filtering window and n is the number of columns. Each pixel of the image f to be processed is visited with the origin of b as the center, and the erosion/dilation result of each pixel is the minimum value (for erosion) or the maximum value (for dilation) of the pixels in the region overlapping the structural element b.
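As a reference point, this definition can be written directly as a brute-force routine. The sketch below is illustrative only: the function name `morph2d` is hypothetical, m and n are assumed to be odd, and border pixels are handled by clipping the window to the image, which the text does not specify.

```python
def morph2d(image, m, n, op):
    """Flat grayscale erosion (op=min) or dilation (op=max), m rows x n cols."""
    rows, cols = len(image), len(image[0])
    rm, rn = m // 2, n // 2            # half-sizes; m and n assumed odd
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Pixels of the image that overlap the structural element
            # centered at (r, c); the window is clipped at the borders.
            region = [image[rr][cc]
                      for rr in range(max(0, r - rm), min(rows, r + rm + 1))
                      for cc in range(max(0, c - rn), min(cols, c + rn + 1))]
            out_row.append(op(region))
        out.append(out_row)
    return out

image = [[5, 9, 1],
         [4, 8, 7],
         [6, 2, 3]]
eroded = morph2d(image, 3, 3, min)
dilated = morph2d(image, 3, 3, max)
print(eroded[1][1], dilated[1][1])
```

This direct form needs an m × n window per output pixel, which is the storage cost that the one-dimensional decomposition is meant to avoid.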

The invention adopts a hardware acceleration scheme in which independent modules are called independently, and the modules do not communicate with each other directly. The embedded image processing system loads images into a central memory frame by frame, and a DSP sends instructions to a central control unit, which calls different modules to form different processing flows for different application scenarios; this gives the system high flexibility. In addition, when a module is called on its own, the central controller can supply it with high-bandwidth input data, so the processing-speed advantage of that module can be fully exploited.

The large-template parallel grayscale morphological filtering IP is designed to perform grayscale morphological filtering with structural elements up to 81 × 81 on grayscale images up to 512 × 640, with a pixel bit width of 8 or 16 bits. The row and column sizes of the image and the size of the structural element should be configurable, i.e., images of other sizes within 512 × 640 and structural elements of other sizes within 81 × 81 should also be supported. The IP clock frequency reaches 150 MHz or more.

In every clock cycle the central memory provides high-bandwidth image data to each functional IP in units of frames (256 bits per frame), and the processing results of each functional IP are likewise written back to the central memory in units of frames. Preferably, the operation results are spliced into 256-bit output data by the output data splicing circuit and output to the bus through the output FIFO.

When the data bit width of one row of the original grayscale image (B × w) is not an integral multiple of 256, the pixels contained in some frames in the central memory do not all belong to the same row, and the results of the row operation must be spliced at the output to complete row alignment, i.e., so that all pixels of each frame belong to the same row. After output data splicing, if the original image pixel bit width is 8 during row processing, the number of image columns is extended to an integral multiple of 32; if the pixel bit width is 16, the number of image columns is extended to an integral multiple of 16.
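The multiples 32 and 16 follow from dividing the 256-bit frame by the pixel bit width. A minimal sketch of that arithmetic, using a hypothetical helper name `aligned_columns`:

```python
FRAME_BITS = 256  # frame width stated in the text

def aligned_columns(columns, w):
    """Smallest multiple of (256 / w) pixels that is >= columns."""
    pixels_per_frame = FRAME_BITS // w        # 32 for w = 8, 16 for w = 16
    frames = -(-columns // pixels_per_frame)  # ceiling division
    return frames * pixels_per_frame

print(aligned_columns(100, 8), aligned_columns(100, 16))
```

An image whose column count is already a multiple of the frame capacity, such as 640 columns at 8 bits, is left unchanged by this rule.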

Furthermore, the large-template parallel grayscale morphological filtering circuit designed by the invention must be called to perform the row operation first and then the column operation in order to realize two-dimensional morphological filtering; this reduces the storage resources that a two-dimensional window would consume during large-template filtering.

As shown in fig. 5, when one-dimensional grayscale morphological filtering is used to implement the two-dimensional erosion operation, row erosion with a structural element of size n and column erosion with a structural element of size m must be performed on the image in sequence. At this point the filtered image has size N × M, with rows and columns transposed relative to the original image, so the IP is called once more with the structural element size configured to 1 and the operation mode set to column erosion. Therefore, the large-template parallel grayscale morphological filtering IP must be called three times to complete two-dimensional morphological erosion/dilation.

As shown in fig. 6, when one-dimensional grayscale morphological filtering is used to implement the two-dimensional opening operation, row erosion with a structural element of size n, column erosion with a structural element of size m, row dilation with a structural element of size m, and column dilation with a structural element of size n must be performed on the image in sequence. Therefore, the parallel grayscale morphological filtering IP must be called four times to complete the two-dimensional morphological opening/closing operation.
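The three-call and four-call sequences can be checked against a direct two-dimensional computation with a short software model. The sketch below rests on assumptions not stated in this form in the text: the column mode is modelled as returning its result transposed, windows are clipped at the image borders, m and n are odd, and all function names are hypothetical.

```python
import random

def row_pass(image, k, op):
    """1-D flat filter of odd size k along each row (borders clipped)."""
    r = k // 2
    return [[op(row[max(0, c - r):c + r + 1]) for c in range(len(row))]
            for row in image]

def transpose(image):
    return [list(col) for col in zip(*image)]

def col_pass(image, k, op):
    """Column filtering; ASSUMED to deliver its result transposed."""
    return row_pass(transpose(image), k, op)

def erode2d(image, m, n):
    # Three calls: row erosion (n), column erosion (m), size-1 column pass.
    return col_pass(col_pass(row_pass(image, n, min), m, min), 1, min)

def open2d(image, m, n):
    # Four calls: row erosion (n), column erosion (m),
    # row dilation (m) on the transposed image, column dilation (n).
    t = col_pass(row_pass(image, n, min), m, min)
    return col_pass(row_pass(t, m, max), n, max)

def reference2d(image, m, n, op):
    """Direct m-row by n-column flat filter, used only for checking."""
    rows, cols = len(image), len(image[0])
    return [[op(image[rr][cc]
                for rr in range(max(0, r - m // 2), min(rows, r + m // 2 + 1))
                for cc in range(max(0, c - n // 2), min(cols, c + n // 2 + 1)))
             for c in range(cols)] for r in range(rows)]

random.seed(1)
img = [[random.randrange(256) for _ in range(7)] for _ in range(6)]
print(erode2d(img, 3, 5) == reference2d(img, 3, 5, min))
```

In this model the size-1 column pass changes no pixel values and only restores the original orientation, while the opening needs no such extra pass because its second column call already transposes the image back.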

The data throughput rate of the IP circuit is calculated as follows:

IP data throughput rate = total number of image pixels / total time taken by the IP to process one image.

TABLE 1

As can be seen from Table 1, when the large-template parallel grayscale morphological filtering IP is called at a working clock frequency of 150 MHz on an image of size 256 × 320, the data throughput rate is 986 Mpx/s for one-dimensional grayscale morphological erosion/dilation, i.e., 986 × 10⁶ pixels are processed per second; 328.6 Mpx/s for two-dimensional grayscale morphological erosion/dilation, i.e., 328.6 × 10⁶ pixels per second; and 246.5 Mpx/s for the two-dimensional grayscale morphological opening or closing operation, i.e., 246.5 × 10⁶ pixels per second. For an image of size 512 × 640, the data throughput rate is 984 Mpx/s for one-dimensional grayscale morphological erosion/dilation, i.e., 984 × 10⁶ pixels per second; 328 Mpx/s for two-dimensional grayscale morphological erosion/dilation, i.e., 328 × 10⁶ pixels per second; and 246 Mpx/s for the two-dimensional grayscale morphological opening or closing operation, i.e., 246 × 10⁶ pixels per second. The large-template parallel grayscale morphological filtering IP thus achieves a high data throughput rate when performing large-template grayscale morphological filtering.
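The two-dimensional figures are close to the one-dimensional figure divided by the number of IP calls (three for erosion/dilation, four for opening/closing). The sketch below reproduces that arithmetic for the 256 × 320 image under the assumption, not stated in the text, that every call takes about the same time; the function name `throughput_mpx_per_s` is hypothetical.

```python
def throughput_mpx_per_s(pixels, seconds):
    """Throughput = total image pixels / total processing time, in Mpx/s."""
    return pixels / seconds / 1e6

pixels = 256 * 320
one_pass_time = pixels / 986e6   # time implied by the reported 986 Mpx/s
for calls in (1, 3, 4):          # 1-D, 2-D erosion/dilation, opening/closing
    print(calls, round(throughput_mpx_per_s(pixels, calls * one_pass_time), 1))
```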

TABLE 2

As can be seen from Table 2, compared with the grayscale morphological filtering implementations proposed by Mukherjee, Debasish et al. and by Gibson RM et al., the grayscale morphological filtering implementation of the present invention supports structural elements up to 81 × 81 and offers a considerable improvement in operation speed together with low power consumption.

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A template-configurable N-pixel parallel grayscale morphological filter circuit, the template being a flat symmetric structuring element with a configurable structuring element size m of at most M, the filter circuit comprising:

the extremum operation circuit is used for taking, according to the working mode of the filter circuit, the minimum/maximum among the pixels at each position of the one-dimensional working window when N pixels are filtered in parallel, to obtain the N-pixel parallel grayscale morphological filtering result;

the column transposition circuit includes a RAM for storing the row-processed image data, whose required depth is the number of image rows divided by N, and a data right-shift circuit; the parsing process is as follows:

(4) determining whether all pixels have been parsed; if so, ending; otherwise, returning to step (1);

the filling mode of the one-dimensional working window is as follows:

when 1 ≤ m < (N-1), positions W_{m+1}～W_M of the one-dimensional working window are filled with 2^w − 1 or 0 (2^w − 1 for erosion, 0 for dilation), and the remaining data are filled in order into positions W_1～W_{M+N-1};

the extremum operation circuit includes: one (M-N+1)-input extremum operation circuit, N (N-1)-input extremum operation circuits, and one (N+1)-input extremum operation circuit; after the one-dimensional working window is filled, the data at fixed positions are sent to the corresponding extremum operation circuits, as follows:

when (N-1) ≤ m ≤ M, the data at positions W_N～W_M of the one-dimensional working window are sent into the (M-N+1)-input extremum operation circuit to obtain an extremum Y, and the inputs of the N (N-1)-input extremum operation circuits are, respectively: the data at W_1～W_{N-1}, giving the extremum y_1; the data at W_2～W_{N-1} and W_{M+1}, giving the extremum y_2; and so on up to the data at W_{M+1}～W_{M+N-1}, giving the extremum y_N; Y is then compared with each of y_1, y_2, …, y_N to obtain the parallel processing result of the N pixels;

when 1 ≤ m < (N-1), the data at positions W_N～W_M of the one-dimensional working window are sent into the (M-N+1)-input extremum operation circuit to obtain an extremum Y, and the inputs of the N (N-1)-input extremum operation circuits are, respectively: the data at W_1～W_m, with the remaining inputs set to 2^w − 1 or 0 (2^w − 1 for erosion, 0 for dilation), giving the extremum y_1; the data at W_2～W_m and W_{M+1}, with the remaining inputs set to 2^w − 1 or 0 (2^w − 1 for erosion, 0 for dilation), giving the extremum y_2; and so on up to the data at W_{M-m+1}～W_{M+N-1}, with the remaining inputs set to 2^w − 1 or 0 (2^w − 1 for erosion, 0 for dilation), giving the extremum y_N; Y is then compared with each of y_1, y_2, …, y_N to obtain the parallel processing result of the N pixels;

when the circuit is configured for different structural element sizes, the extremum operation circuits share one (M-N+1)-input extremum operation circuit.

2. A template-configurable N-pixel parallel gray level morphological filtering method is characterized by comprising the following steps of:

S4: when N pixels are filtered in parallel, taking, according to the working mode, the minimum/maximum among the pixels at each position of the one-dimensional working window to obtain the N-pixel parallel grayscale morphological filtering result;

when the configuration is column operation, each frame of image data is analyzed, and N adjacent column pixels are output and buffered in each clock cycle, specifically as follows:

Column pixel, full state of simple dual port RAM is finished, wherein w is data bit width；

the filling mode of the one-dimensional working window is as follows:

according to the working mode, obtaining a minimum value/maximum value in each position pixel of the one-dimensional working window to obtain an N pixel parallel gray level morphological filtering result, which is as follows:

when (N-1) ≤ m ≤ M, the data at positions W_N～W_M of the one-dimensional working window are sent into the (M-N+1)-input extremum operation circuit to obtain an extremum Y, and the inputs of the N (N-1)-input extremum operation circuits are, respectively: the data at W_1～W_{N-1}, giving the extremum y_1; the data at W_2～W_{N-1} and W_{M+1}, giving the extremum y_2; and so on up to the data at W_{M+1}～W_{M+N-1}, giving the extremum y_N; Y is then compared with each of y_1, y_2, …, y_N to obtain the parallel processing result of the N pixels;

when 1 ≤ m < (N-1), the data at positions W_N～W_M of the one-dimensional working window are sent into the (M-N+1)-input extremum operation circuit to obtain an extremum Y, and the inputs of the N (N-1)-input extremum operation circuits are, respectively: the data at W_1～W_m, with the remaining inputs set to 2^w − 1 or 0 (2^w − 1 for erosion, 0 for dilation), giving the extremum y_1; the data at W_2～W_m and W_{M+1}, with the remaining inputs set to 2^w − 1 or 0 (2^w − 1 for erosion, 0 for dilation), giving the extremum y_2; and so on up to the data at W_{M-m+1}～W_{M+N-1}, with the remaining inputs set to 2^w − 1 or 0 (2^w − 1 for erosion, 0 for dilation), giving the extremum y_N; Y is then compared with each of y_1, y_2, …, y_N to obtain the parallel processing result of the N pixels;