CN115967785A - Data acceleration processing system based on FPGA (field programmable Gate array) affine inverse transformation - Google Patents

Info

Publication number: CN115967785A
Application number: CN202211640437.2A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 王攀, 夏永清, 卢孔照, 王国秀, 周俊
Current and original assignee: Zhejiang Dali Technology Co ltd
Prior art keywords: block, sub, bram, space, loading
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application filed by Zhejiang Dali Technology Co ltd; priority to CN202211640437.2A; publication of CN115967785A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a data acceleration processing system based on FPGA (Field Programmable Gate Array) affine inverse transformation, which belongs to the technical field of image processing and solves the prior-art problems that the FPGA takes too long to process data and that affine reduction transformation cannot be carried out because the BRAM (Block RAM) space is insufficient. The data acceleration processing system comprises: a first external memory for storing a plurality of map filling sub-image blocks; a second external memory for storing the original image; an acceleration control module for controlling the pre-loading control module and the map filling control module so that the pre-loading step and the addressing reading step are carried out on the odd column blocks and the even column blocks in a ping-pong manner; a pre-loading control module for performing the pre-loading step; a map filling control module for performing the addressing reading step; and BRAM spaces comprising a first BRAM space and a second BRAM space that store the second pre-loading sub-image blocks corresponding to the odd column blocks and the even column blocks, respectively. With this system the FPGA can process data quickly and carry out affine reduction transformation normally.

Description

Data acceleration processing system based on FPGA (Field Programmable Gate Array) affine inverse transformation
Technical Field
The invention relates to the technical field of image processing, in particular to a data acceleration processing system based on FPGA (field programmable gate array) affine inverse transformation.
Background
With the development of science and technology, the demand for high-definition video keeps growing, and scaling low-resolution video up to high definition has become a key problem. With the wide application of the FPGA (Field Programmable Gate Array), implementing high-definition video processing on an FPGA is becoming mainstream.
The affine transformation algorithm can realize the functions of rotation, translation, scaling and the like of the image, and is widely applied to the field of video image processing.
However, in the prior art, reads and writes to a Block Random Access Memory (BRAM) space can only be performed separately, so the FPGA spends too long processing the data; moreover, image reduction transformation requires loading a large block of original image data, and the BRAM space is not large enough to hold the required original image data, which causes the data processing to fail.
Disclosure of Invention
In view of the foregoing analysis, the embodiments of the present invention are directed to providing a data acceleration processing system based on an inverse FPGA affine transform, so as to solve the problems in the prior art that the time consumption for processing data by an FPGA is too long and an affine reduction transform cannot be implemented due to insufficient BRAM space.
The embodiment of the invention provides a data acceleration processing system based on FPGA (field programmable gate array) affine inverse transformation, which comprises:
the first external memory is used for storing a plurality of map filling sub-image blocks, the map filling sub-image blocks being sub-image blocks divided from a result image and classified into odd column blocks and even column blocks;
the second external memory is used for storing the original image;
the acceleration control module is used for controlling the pre-loading control module and the map filling control module to realize the pre-loading step and the addressing reading step of the odd column block and the even column block in a ping-pong operation mode;
a pre-loading control module for performing the pre-loading step, the pre-loading step comprising: determining a corresponding first pre-loading sub-image block in the original image according to the odd column block/even column block and an affine transformation inverse matrix, obtaining the first pre-loading sub-image block from the second external memory, calculating a second pre-loading sub-image block based on the first pre-loading sub-image block, storing the second pre-loading sub-image block corresponding to the odd column block into a first BRAM space, and storing the second pre-loading sub-image block corresponding to the even column block into a second BRAM space;
a map filling control module for performing the addressing reading step, the addressing reading step comprising: sequentially traversing each pixel point in the odd column block/even column block, determining the pixel value of each pixel point according to the affine transformation inverse matrix and the second pre-loading sub-image block in the first BRAM space/second BRAM space corresponding to the odd column block/even column block, and sending the pixel value to the first external memory for storage;
and BRAM spaces, which comprise the first BRAM space and the second BRAM space and are respectively used for storing the second pre-loading sub-image blocks corresponding to the odd column blocks and the even column blocks.
Based on the further improvement of the above system, the data acceleration processing system further comprises:
and the coordinate generator is used for generating the coordinate of any pixel point in the original image and the result image, and for generating the coordinate of any pixel point in the second pre-loading sub-image block.
Based on the further improvement of the above system, the data acceleration processing system further comprises:
the buffer register is used for buffering the pixel values in the first pre-loading sub-image block sent by the second external memory and transmitting the corresponding pixel values to the pre-loading control module according to the control instruction of the pre-loading control module; the pre-loading control module obtains the pixel value of each point of the second pre-loading sub-image block according to the obtained pixel values of the first pre-loading sub-image block.
Based on a further improvement of the above system, the determining a corresponding first pre-loading subblock block in the original image according to the odd/even subblocks and an inverse affine transformation matrix comprises:
determining a matting sub-tile block according to the odd/even column blocks and the inverse affine transformation matrix, and determining a first pre-loading sub-tile block according to the matting sub-tile block.
In a further refinement of the above system, the calculating a second preload sub-block based on the first preload sub-block comprises:
determining whether the size of the first pre-loading sub-image block is larger than the size of the first BRAM space/second BRAM space; if yes, determining a pre-reduction magnification parameter according to the sizes of the first pre-loading sub-image block and the first BRAM space/second BRAM space, and reducing the first pre-loading sub-image block based on the pre-reduction magnification parameter to obtain the second pre-loading sub-image block; if not, taking the first pre-loading sub-image block as the second pre-loading sub-image block.
Based on a further improvement of the above system, the determining a pixel value of each pixel point according to a second pre-loading subblock block in the first BRAM space or a second BRAM space corresponding to the inverse affine transformation matrix and the odd/even column block comprises:
obtaining the coordinate of the corresponding pixel point in the original image according to the coordinate of each pixel point and the affine transformation inverse matrix, subtracting the head coordinate of the first pre-loading sub-image block from the coordinate in the original image, and multiplying the result by the pre-reduction magnification parameter to obtain a first coordinate;
and determining the pixel value of each corresponding pixel point according to the first coordinate.
Based on the further improvement of the above system, the step of performing the pre-loading and the step of addressing and reading the odd column block and the even column block by using the ping-pong operation method specifically includes:
processing all map filling sub-image blocks in the result image in order from left to right and from top to bottom, and performing the pre-loading step and the addressing reading step on the odd column blocks/even column blocks based on a ping-pong operation;
while performing the pre-loading step on odd/even column blocks, the address reading step is performed on even/odd column blocks at the same time.
In a further refinement of the above system, the determining the matting sub-tile from the odd/even column blocks and the inverse affine transformation matrix comprises:
determining map filling coordinates corresponding to the odd column blocks/even column blocks; the map filling coordinates comprise four map filling vertex coordinates corresponding to the odd column blocks/even column blocks;
determining, based on the affine inverse transformation, matting coordinates of the matting sub-tile according to the four fill-up vertex coordinates and the affine transformation inverse matrix, the matting coordinates including four matting vertex coordinates;
and determining the matting sub-image blocks according to the four matting vertex coordinates.
In a further improvement of the above system, the determining whether the size of the first preload subblock is larger than the size of the first/second BRAM space includes:
determining a length and width of the first pre-load subblock block and determining a length and width of the first/second BRAM spaces;
calculating a length ratio of a length of the first pre-loaded subblock and a length of the first BRAM space/second BRAM space and calculating a width ratio of a width of the first pre-loaded subblock and a width of the first BRAM space/second BRAM space; the first BRAM space and the second BRAM space are the same in size;
if any one of the length ratio and the width ratio is larger than 1, the judgment is yes, otherwise, the judgment is no.
In a further improvement of the above system, the determining a pre-scaling factor parameter according to the sizes of the first pre-loaded subblock block and the first BRAM space/second BRAM space includes:
judging whether the length ratio is larger than the width ratio; if yes, determining the pre-reduction magnification parameter according to the length ratio; otherwise, determining the pre-reduction magnification parameter according to the width ratio; the pre-reduction magnification parameter is N/M, where N and M are both integers in [1,6] and M > N.
Compared with the prior art, the invention can realize at least one of the following beneficial effects:
1. judging whether the size of original image data (a first preloaded sub-tile block) needing to be loaded is larger than that of a BRAM space; if the judgment result is yes, the original image data is reduced according to the sizes of the original image data and the BRAM space, and the reduced original image data is stored in the BRAM space, so that the original image data is normally stored in the BRAM space, and the reduction transformation in the affine transformation can be successfully executed.
2. The original image data needing to be loaded is determined based on the affine inverse transformation through the map filling coordinates of the map filling sub-image blocks, and the accuracy of calculating the original image data needing to be loaded is improved.
3. According to the length ratio and the width ratio of the original image data to be loaded and the BRAM space, the pre-reduction magnification parameter is further determined, so that when the original image data is reduced, the reduction range is ensured, and the loss of the original image data is reduced.
4. The acceleration control module controls the pre-loading control module and the map filling control module to realize the pre-loading step and the addressing reading step on the odd column blocks and the even column blocks in a ping-pong operation mode, so that the FPGA can quickly process data, and the processing time delay is reduced.
In the invention, the technical schemes can be combined with each other to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
Fig. 1 is a schematic structural diagram of a data acceleration processing system based on an FPGA affine inverse transformation according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a result image divided into map-filling sub-blocks according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a cutout sub-block corresponding to the fill sub-block according to an embodiment of the present invention;
FIG. 4 is a second schematic diagram of a cutout sub-block corresponding to the fill sub-block according to the embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating interpolation expansion for a first preload subblock block according to an embodiment of the invention;
FIG. 6 is a schematic diagram illustrating a reduction in inter-sampling for an intermediate sub-tile according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a first coordinate according to an embodiment of the present invention;
fig. 8 is a second schematic structural diagram of the first coordinate according to the embodiment of the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
One embodiment of the present invention discloses a data accelerated processing system based on an FPGA affine inverse transform, as shown in fig. 1, the data accelerated processing system includes:
the first external memory is used for storing a plurality of map filling sub-image blocks, the map filling sub-image blocks being sub-image blocks divided from the result image and classified into odd column blocks and even column blocks;
the second external memory is used for storing the original image;
the acceleration control module is used for controlling the pre-loading control module and the map filling control module to realize the pre-loading step and the addressing reading step of the odd column block and the even column block in a ping-pong operation mode;
a pre-loading control module for performing the pre-loading step, the pre-loading step comprising: determining a corresponding first pre-loading sub-image block in the original image according to the odd column block/even column block and an affine transformation inverse matrix, obtaining the first pre-loading sub-image block from the second external memory, calculating a second pre-loading sub-image block based on the first pre-loading sub-image block, storing the second pre-loading sub-image block corresponding to the odd column block into a first BRAM space, and storing the second pre-loading sub-image block corresponding to the even column block into a second BRAM space;
a map filling control module for performing the addressing reading step, the addressing reading step comprising: sequentially traversing each pixel point in the odd column block/even column block, determining the pixel value of each pixel point according to the affine transformation inverse matrix and the second pre-loading sub-image block in the first BRAM space/second BRAM space corresponding to the odd column block/even column block, and sending the pixel value to the first external memory for storage;
and the BRAM spaces, which comprise the first BRAM space and the second BRAM space and are respectively used for storing the second pre-loading sub-image blocks corresponding to the odd column blocks and the even column blocks.
Specifically, when the FPGA is used to process a video image, original image pixel Data needs to be stored in a BRAM space in the FPGA chip from a DDR (Double Data Rate) memory configured outside the FPGA, and then the original image pixel Data is mapped into a result image according to an affine transformation matrix, and the result image is also stored in the DDR.
It should be noted that the first external memory is used to store pixel value data of the result image, the second external memory is used to store pixel value data of the original image, and the first external memory and the second external memory may be the same DDR externally configured by the FPGA or different DDRs. Specifically, the second external memory is connected with the preload control module through an AXI4 bus, and the first external memory is connected with the map filling control module through an AXI4 bus. AXI (Advanced eXtensible Interface) is a Bus protocol, which is the most important part of the AMBA (Advanced Microcontroller Bus Architecture) 3.0 protocol proposed by ARM corporation, and is an on-chip Bus oriented to high performance, high bandwidth and low latency.
Specifically, in the first external memory, the result image is divided into a plurality of sub-image blocks as map-filled sub-image blocks, and the map-filled sub-image blocks can be divided into odd-column blocks and even-column blocks according to the rows and columns in which the map-filled sub-image blocks are located. After the size of each map-filling sub-block is designed, it is not changed. Illustratively, as shown in fig. 2, the resulting image has a size of 1280 by 1024, a length of 1280 pixels and a width of 1024 pixels; the size of the map-filling sub-block is 128 × 128, the length of the map-filling sub-block is 128 pixels, and the width of the map-filling sub-block is 128 pixels; and dividing the result image into 10 map filling sub-blocks in length and 8 map filling sub-blocks in width, and dividing the result image into 80 map filling sub-blocks in total.
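As a rough illustration of the tiling described above, the following Python sketch (not part of the patent, which implements this in FPGA logic) divides the 1280 × 1024 result image of the example into 128 × 128 map filling sub-image blocks and classifies them into odd and even column blocks; the helper name tile_grid and the returned tuple layout are illustrative assumptions.
```python
RESULT_W, RESULT_H = 1280, 1024   # result image: length 1280, width 1024 pixels
TILE = 128                        # map filling sub-image block is 128 x 128

def tile_grid(width=RESULT_W, height=RESULT_H, tile=TILE):
    """Return a list of (index, x0, y0, parity) tuples, left-to-right, top-to-bottom."""
    tiles = []
    index = 1
    for ty in range(height // tile):          # 8 rows of sub-image blocks
        for tx in range(width // tile):       # 10 sub-image blocks per row
            parity = "odd" if (tx + 1) % 2 == 1 else "even"   # odd/even column block
            tiles.append((index, tx * tile, ty * tile, parity))
            index += 1
    return tiles

blocks = tile_grid()
assert len(blocks) == 80                      # 10 x 8 = 80 map filling sub-image blocks
assert blocks[0][3] == "odd" and blocks[1][3] == "even"
```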
Specifically, the second external memory is configured to store a plurality of matting sub-blocks, and the plurality of matting sub-blocks form an original image.
Preferably, as shown in fig. 1, the data acceleration processing system further includes:
and the coordinate generator is used for generating the coordinate of any pixel point in the original image and the result image, and for generating the coordinate of any pixel point in the second pre-loading sub-image block.
It can be understood that the coordinate generator uses a two-dimensional coordinate system to generate coordinates for, and locate, all the pixel points in the result image and the original image.
For affine transformation in a two-dimensional coordinate system, the inverse control matrices include:
1. The two-dimensional translation 3×3 control matrix is:
[ 1  0  -tx ]
[ 0  1  -ty ]
[ 0  0   1  ]
In this matrix, tx and ty are the translated pixel distances along the X and Y coordinate axes. For the inverse transformation, the positive translation direction is exactly opposite to the arrow direction of the coordinate axes, which is why the entries carry a negative sign.
2. The two-dimensional rotation 3×3 control matrix is:
[  cosθ  sinθ  0 ]
[ -sinθ  cosθ  0 ]
[   0     0    1 ]
In this matrix, θ represents the rotation angle; the signs of the sine terms in the inverse transformation are exactly opposite to those of the forward transformation.
3. The two-dimensional scaling 3×3 control matrix is:
[ 1/Sx   0    0 ]
[  0    1/Sy  0 ]
[  0     0    1 ]
In this matrix, Sx and Sy are the magnifications; since a magnification in the forward transformation is a reduction in the inverse transformation, the magnifications appear here in reciprocal form.
The image coordinates of the result image and the original image are related through the affine transformation inverse matrix: the pixel point coordinates of the result image are mapped to coordinates in the original image, and the pixel values at those original image coordinates are then carried back to the pixel point coordinates of the result image. The matrix operation expression is:
[ x ]        [ u ]
[ y ] = M⁻¹  [ v ]
[ 1 ]        [ 1 ]
where the coordinate (u, v) is the coordinate of any pixel point in the result image, the coordinate (x, y) is the corresponding coordinate in the original image, and M⁻¹ is the 3×3 affine transformation inverse matrix.
It should be noted that different image transformations are realized by changing the parameters in the control matrices; several deformation parameters can also be mixed so that multiple transformations are carried out in a single affine transformation process.
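The following Python sketch illustrates the matrix algebra described above: building the inverse translation, rotation and scaling control matrices and mapping a result image coordinate (u, v) back to an original image coordinate (x, y). It is only a numeric illustration under the stated conventions; the patent performs these calculations in FPGA hardware, and the example parameter values are arbitrary.
```python
import numpy as np

def inv_translate(tx, ty):
    return np.array([[1, 0, -tx],
                     [0, 1, -ty],
                     [0, 0,  1]], dtype=float)

def inv_rotate(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, s, 0],
                     [-s, c, 0],
                     [ 0, 0, 1]], dtype=float)     # sine terms flip sign vs. forward

def inv_scale(sx, sy):
    return np.array([[1/sx, 0,    0],
                     [0,    1/sy, 0],
                     [0,    0,    1]], dtype=float)  # reciprocal magnifications

def map_to_original(u, v, m_inv):
    x, y, _ = m_inv @ np.array([u, v, 1.0])
    return x, y

# mixing deformations: applied right-to-left, this undoes a translation by (10, 5)
# and then a 2x scaling
m_inv = inv_scale(2, 2) @ inv_translate(10, 5)
print(map_to_original(100, 80, m_inv))   # corresponding (x, y) in the original image
```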
Specifically, under the acceleration control module, the pre-loading step and the addressing reading step are carried out on the odd column blocks and the even column blocks as a ping-pong operation. The pre-loading step stores into the corresponding BRAM space the pixel value data of the original image sub-image block that corresponds to a map filling sub-image block (odd column block/even column block) divided from the result image stored in the first external memory. The addressing reading step fills the pixel value data stored in the corresponding BRAM space into each pixel point of that map filling sub-image block (odd column block/even column block), so that every pixel point of the map filling sub-image block receives its corresponding pixel value data.
Preferably, the step of performing the pre-loading and the step of addressing and reading the odd column block and the even column block by using the ping-pong operation is specifically:
processing all map filling sub-image blocks in the result image in order from left to right and from top to bottom, and performing the pre-loading step and the addressing reading step on the odd column blocks/even column blocks based on a ping-pong operation;
while performing the pre-loading step on odd/even column blocks, the address reading step is performed on even/odd column blocks at the same time.
Specifically, as shown in fig. 2, the result image is divided into 10 map filling sub-image blocks along its length and 8 along its width; all the map filling sub-image blocks divided from the result image are processed in order from left to right and from top to bottom, and the result image comprises 40 odd column blocks and 40 even column blocks in total. In the first row of map filling sub-image blocks, the 1st, 3rd, 5th, 7th and 9th are odd column blocks and the 2nd, 4th, 6th, 8th and 10th are even column blocks; similarly, in the second row, the 11th map filling sub-image block is an odd column block, the 12th is an even column block, and so on. It should be noted that in the present invention the map filling sub-image blocks of the result image may also be processed in other orders, as long as all of them are processed in some fixed order.
The following are exemplary:
in the 1 st execution cycle, the acceleration control module controls the preloading control module to execute the preloading step on the 1 st map filling sub-block;
in the 2 nd execution cycle, the acceleration control module controls the preloading control module to execute the preloading step on the 2 nd filling sub-image block, and simultaneously controls the filling control module to execute the addressing reading step on the 1 st filling sub-image block;
in the 3 rd execution cycle, the acceleration control module controls the preloading control module to execute the preloading step on the 3 rd map filling sub-block, and simultaneously controls the map filling control module to execute the addressing reading step on the 2 nd map filling sub-block;
……
in the 80 th execution cycle, the acceleration control module controls the preloading control module to execute the preloading step on the 80 th filling sub-block, and simultaneously controls the filling control module to execute the addressing reading step on the 79 th filling sub-block;
in the 81 st execution cycle, the acceleration control module controls the map filling control module to perform the address reading step on the 80 th map filling sub-block.
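A behavioural sketch of this execution schedule is given below; preload and address_read are placeholder callables standing in for the pre-loading control module and the map filling control module, and the BRAM bank numbering (0 for the first BRAM space, 1 for the second) is an illustrative assumption.
```python
def ping_pong_schedule(num_blocks, preload, address_read):
    # in every cycle, pre-load block k while addressing-reading block k-1
    for cycle in range(1, num_blocks + 2):           # e.g. 81 cycles for 80 blocks
        if cycle <= num_blocks:
            bank = 0 if cycle % 2 == 1 else 1        # odd column block -> first BRAM space
            preload(block=cycle, bram_bank=bank)
        if cycle >= 2:
            bank = 0 if (cycle - 1) % 2 == 1 else 1
            address_read(block=cycle - 1, bram_bank=bank)

ping_pong_schedule(
    80,
    preload=lambda block, bram_bank: print(f"preload block {block} -> BRAM {bram_bank}"),
    address_read=lambda block, bram_bank: print(f"read block {block} from BRAM {bram_bank}"),
)
```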
It is worth to say that, in the embodiment of the present invention, the acceleration control module controls the preload control module and the map filling control module to perform the preloading step and the addressing reading step on the map filling sub-blocks divided from the result image in a ping-pong operation manner, so as to accelerate the processing speed of the FPGA on the data, achieve the rapid processing of the FPGA on the data, and reduce the processing delay.
Specifically, the preloading step is performed by a preloading control module, and comprises: and determining a corresponding first pre-loading sub-block in the original image according to the odd/even column blocks and the affine transformation inverse matrix, acquiring the first pre-loading sub-block from a second external memory, calculating a second pre-loading sub-block based on the first pre-loading sub-block, storing the second pre-loading sub-block corresponding to the odd column block into the first BRAM space, and storing the second pre-loading sub-block corresponding to the even column block into the second BRAM space. For example, as shown in fig. 2, the second pre-loaded sub-tile corresponding to the 1 st, 3 rd, 5 th, 7 th, … and 79 th map sub-tile is stored in the first BRAM space, and the second pre-loaded sub-tile corresponding to the 2 nd, 4 th, 6 th, 8 … and 80 th map sub-tile is stored in the second BRAM space.
It is worth to be noted that the first BRAM space and the second BRAM space are the same in size design, and the first BRAM space and the second BRAM space are two independent BRAM spaces, and both belong to the on-chip BRAM space of the FPGA.
Preferably, the determining a corresponding first pre-loading subblock block in the original image according to the odd/even column blocks and an affine transformation inverse matrix comprises:
determining a matting sub-tile block according to the odd/even column blocks and the inverse affine transformation matrix, and determining a first pre-loading sub-tile block according to the matting sub-tile block.
Specifically, a matting sub-image block is calculated according to the odd column block/even column block and the affine transformation inverse matrix; the matting sub-image block is the sub-image block of the original image that corresponds to the map filling sub-image block.
Preferably, the determining a matting sub-tile according to the odd/even column block and the affine transformation inverse matrix includes:
determining map filling coordinates corresponding to the odd column blocks/even column blocks; the map filling coordinates comprise four map filling vertex coordinates corresponding to the odd column blocks/even column blocks;
determining, based on an inverse affine transformation, matting coordinates of the matting sub-tile from the four fill vertex coordinates and the inverse affine transformation matrix, the matting coordinates including four matting vertex coordinates;
and determining the matting sub-image blocks according to the four matting vertex coordinates.
Specifically, as shown in fig. 3, the map-filling sub-tile block is any one sub-tile block in the result image, the map-filling sub-tile block may be an odd-column block or an even-column block, and in combination with the two-dimensional coordinate system of the result image, four corresponding map-filling vertex coordinates Q1, Q2, Q3, and Q4 may be determined for the map-filling sub-tile block.
Based on the affine inverse transformation, according to the four fill-in vertex coordinates Q1, Q2, Q3 and Q4, the four corresponding matting vertex coordinates Q1', Q2', Q3 'and Q4' in the original image are respectively determined through affine inverse transformation matrix calculation, and as shown in fig. 3, the matting sub-image blocks are corresponding sub-image blocks of the fill-in sub-image blocks in the original image.
It is worth noting that the DDR configured outside the FPGA is read in bursts, so directly extracting from the DDR only the pixel data covered by the matting sub-image block would waste a great deal of timing. Therefore, a first pre-loading sub-image block is determined according to the matting sub-image block of the original image, and the first pre-loading sub-image block is the pixel value data that needs to be stored from the second external memory DDR into the BRAM space.
Preferably, the determining a first pre-loaded sub-tile according to the matting sub-tile comprises:
determining the maximum value and the minimum value of the matting sub-image block on two coordinate axes according to the four matting vertex coordinates of the matting sub-image block;
determining four vertex coordinates of the first pre-loading sub-image block according to the maximum and minimum values on the two coordinate axes:
A = (⌊Xmin⌋, ⌊Ymin⌋), B = (⌈Xmax⌉, ⌊Ymin⌋), C = (⌊Xmin⌋, ⌈Ymax⌉), D = (⌈Xmax⌉, ⌈Ymax⌉)
wherein A, B, C and D represent the four vertex coordinates of the first pre-loading sub-image block, Xmax and Xmin represent the maximum and minimum of the matting sub-image block on the X axis, Ymax and Ymin represent the maximum and minimum of the matting sub-image block on the Y axis, ⌊·⌋ denotes rounding down and ⌈·⌉ denotes rounding up.
Specifically, as shown in fig. 3, the four matting vertex coordinates Q1', Q2', Q3' and Q4' have a maximum value Xmax and a minimum value Xmin on the X axis, and a maximum value Ymax and a minimum value Ymin on the Y axis; the four vertex coordinates A, B, C and D of the first pre-loading sub-image block are obtained from them as above.
It should be noted that, when the pixel values of the pixel points are read from the DDR, the coordinate of each pixel point must be an integer.
It is understood that the rectangle enclosed by the first pre-loading sub-image block contains the matting region formed by the matting sub-image block, and the four vertex coordinates of the first pre-loading sub-image block are integers.
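The pre-loading geometry above can be sketched as follows: the four map filling vertex coordinates are mapped into the original image with the inverse matrix, and the axis-aligned integer bounding box of the resulting matting vertices is taken as the first pre-loading sub-image block. The function name and the (A, B, C, D) labelling are illustrative assumptions, and m_inv is a 3 × 3 affine transformation inverse matrix such as the one built in the earlier sketch.
```python
import math
import numpy as np

def first_preload_block(fill_vertices, m_inv):
    """fill_vertices: four (u, v) corners of the map filling sub-image block."""
    # map the fill vertices into the original image (the matting vertices)
    matting = [(m_inv @ np.array([u, v, 1.0]))[:2] for u, v in fill_vertices]
    xs = [p[0] for p in matting]
    ys = [p[1] for p in matting]
    xmin, xmax = math.floor(min(xs)), math.ceil(max(xs))
    ymin, ymax = math.floor(min(ys)), math.ceil(max(ys))
    # A, B, C, D: integer vertex coordinates of the enclosing rectangle
    return (xmin, ymin), (xmax, ymin), (xmin, ymax), (xmax, ymax)
```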
Preferably, the calculating a second pre-loading sub-image block based on the first pre-loading sub-image block comprises:
determining whether the size of the first pre-loading sub-image block is larger than the size of the first BRAM space/second BRAM space; if yes, determining a pre-reduction magnification parameter according to the sizes of the first pre-loading sub-image block and the first BRAM space/second BRAM space, and reducing the first pre-loading sub-image block based on the pre-reduction magnification parameter to obtain the second pre-loading sub-image block; if not, taking the first pre-loading sub-image block as the second pre-loading sub-image block.
Specifically, the first BRAM space and the second BRAM space are equally sized, and the size of the BRAM space is generally designed to be 1.5-2 times that of the map filling sub-block. Illustratively, the size of the map-filling sub-tile is 128 × 128, and the size of the bram space is designed to be 256 × 256. After the size of the BRAM space is determined, no change is made.
It should be noted that when the original image is reduced, a large area of the original image may need to be loaded when loading pixel data from the original image according to the map-filling sub-block of the result image, and the BRAM space determined by design may not be enough to place the required pixel data of the original image, which may result in data execution failure. Therefore, the present invention compares the size of the first preloaded subblock block with the size of the BRAM space before storing the first preloaded subblock block in the BRAM space.
As shown in fig. 3, when the size of the first pre-loaded sub-tile is larger than the corresponding size of the first/second BRAM space, the first pre-loaded sub-tile is scaled down according to the pre-scaling factor parameter to obtain a second pre-loaded sub-tile, so that the first/second BRAM space is larger than the second pre-loaded sub-tile, which enables the second pre-loaded sub-tile to be normally placed in the first/second BRAM space, thereby making the data execution successful.
As shown in fig. 4, when the size of the first pre-loaded sub-tile is smaller than or equal to the size of the first BRAM space/the second BRAM space, the first pre-loaded sub-tile is directly used as the second pre-loaded sub-tile.
Preferably, the determining whether the size of the first pre-loaded subblock block is larger than the size of the first/second BRAM space includes:
determining a length and width of the first pre-load subblock block and determining a length and width of the first/second BRAM spaces;
calculating a length ratio of a length of the first pre-load subblock block and a length of the first/second BRAM space and calculating a width ratio of a width of the first pre-load subblock block and a width of the first/second BRAM space; the first BRAM space and the second BRAM space are the same in size;
if any one of the length ratio and the width ratio is larger than 1, judging as yes, otherwise, judging as no.
Specifically, the first BRAM space and the second BRAM space have the same design size, which is fixed, for example L × W, where the length L is typically 256 pixels and the width W is typically 256 pixels. The size of the first pre-loading sub-image block is calculated from its four vertex coordinates A, B, C and D: its length Lp in pixels is the span of the vertex coordinates along the X axis, and its width Wp in pixels is the span of the vertex coordinates along the Y axis.
Specifically, the length ratio of the length of the first pre-loading sub-image block to the length of the first BRAM space/second BRAM space is Lp/L, and the width ratio of the width of the first pre-loading sub-image block to the width of the first BRAM space/second BRAM space is Wp/W.
Specifically, the length ratio Lp/L is compared with 1, and the width ratio Wp/W is compared with 1; if either the length ratio or the width ratio is greater than 1, the judgment result is yes, otherwise it is no.
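A minimal sketch of this size check, assuming the typical 256 × 256 BRAM space mentioned above; the helper name needs_pre_reduction is illustrative.
```python
BRAM_L, BRAM_W = 256, 256   # first/second BRAM space, fixed and identical in size

def needs_pre_reduction(block_len, block_wid, bram_len=BRAM_L, bram_wid=BRAM_W):
    length_ratio = block_len / bram_len
    width_ratio = block_wid / bram_wid
    # "yes" as soon as either ratio exceeds 1, i.e. the block does not fit the BRAM space
    return length_ratio > 1 or width_ratio > 1, length_ratio, width_ratio

print(needs_pre_reduction(410, 310))   # -> (True, 1.601..., 1.210...)
```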
Specifically, a pre-reduction magnification parameter is determined according to the sizes of the first pre-loading sub-image block and the first BRAM space/second BRAM space.
Preferably, the determining a pre-reduction magnification parameter according to the sizes of the first pre-loading sub-image block and the first BRAM space/second BRAM space comprises:
judging whether the length ratio is larger than the width ratio; if yes, determining the pre-reduction magnification parameter according to the length ratio; otherwise, determining the pre-reduction magnification parameter according to the width ratio; the pre-reduction magnification parameter is N/M, where N and M are both integers in [1,6] and M > N.
Specifically, the length ratio Lp/L is compared with the width ratio Wp/W. If the length ratio is greater than the width ratio, the pre-reduction magnification parameter is determined according to the length ratio; if the length ratio is less than or equal to the width ratio, the pre-reduction magnification parameter is determined according to the width ratio.
Preferably, the pre-reduction magnification parameter is N/M, where N and M are both integers in [1,6] and M > N.
It is worth noting that when the length ratio is greater than the width ratio, the reciprocal of the length ratio is taken and simplified into a fraction whose value is as close as possible to, but not larger than, the reciprocal; this fraction is the pre-reduction magnification parameter N/M, with N and M both integers in [1,6] and M > N. Similarly, when the length ratio is less than or equal to the width ratio, the reciprocal of the width ratio is taken and simplified into a fraction whose value is as close as possible to, but not larger than, the reciprocal; this fraction is the pre-reduction magnification parameter N/M, with N and M both integers in [1,6] and M > N.
For example, when the calculated length ratio is 1.6 and the calculated width ratio is 1.2, the pre-reduction magnification parameter may be 3/5 according to the length ratio. Specifically, the reciprocal of the length ratio is 1/1.6=0.625, and then the integers N and M are selected so that N/M is not greater than 0.625 and is as close to 0.625 as possible, so that N is 3,M is selected as 5, that is, the reciprocal is reduced to a fraction of 3/5, and the pre-reduction magnification parameter is 3/5.
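The selection of N/M can be sketched as a small search, assuming the reciprocal-and-best-fraction rule described above; the brute-force loop is an illustrative choice rather than the patent's hardware implementation.
```python
from fractions import Fraction

def pre_reduction_parameter(length_ratio, width_ratio):
    dominant = length_ratio if length_ratio > width_ratio else width_ratio
    target = 1.0 / dominant              # reciprocal of the larger ratio
    best = Fraction(0)
    for m in range(2, 7):                # M in [2, 6] (M > N >= 1)
        for n in range(1, m):            # N in [1, M-1]
            f = Fraction(n, m)
            if f <= target and f > best: # closest fraction not exceeding the reciprocal
                best = f
    return best

print(pre_reduction_parameter(1.6, 1.2))   # reciprocal 0.625 -> 3/5
```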
Preferably, the down scaling the first pre-loaded sub-tile based on the pre-scaling factor parameter to obtain a second pre-loaded sub-tile comprises:
inserting N-1 pixel points between two adjacent pixels of each row in the first pre-loaded sub-tile block, and then inserting N-1 pixel points between two adjacent pixels of each column in the first pre-loaded sub-tile block, thereby expanding the first pre-loaded sub-tile block by N times to obtain an intermediate sub-tile block;
and in the middle sub-image block, selecting one line as a line to be sampled every M-1 lines, and selecting one pixel point as a sampling point every M-1 points in each line to be sampled, so that the middle sub-image block is reduced by M times to obtain a second pre-loading sub-image block.
Specifically, as shown in fig. 5, exemplary: when N is 2, 1 pixel point is inserted between two adjacent pixels in each row of the first pre-loaded sub-tile block, and then 1 pixel point is inserted between two adjacent pixels in each column of the first pre-loaded sub-tile block, so that the first pre-loaded sub-tile block is expanded by 2 times to obtain a middle sub-tile block; when N is 3, 2 pixel points are inserted between two adjacent pixels in each row of the first pre-loaded sub-tile block, and then 2 pixel points are inserted between two adjacent pixels in each column of the first pre-loaded sub-tile block, so that the first pre-loaded sub-tile block is expanded by 3 times to obtain a middle sub-tile block. The original pixels represent pixel points in the first pre-loading sub-pixel block, and the interpolation pixels represent inserted pixel points.
Preferably, when a pixel is inserted into the first pre-loaded sub-vector block, the expression of the pixel value of the interpolated pixel is:
Dix_j=DixJ-j[(DixJ-DixK)/N];
DixJ and DixK respectively represent pixel values of two adjacent pixel points J and K in the first pre-loading sub pixel block, dix _ J represents a pixel value of a jth pixel point in N-1 pixel points sequentially inserted from the pixel point J to the pixel point K, and J is larger than or equal to 1 and smaller than or equal to (N-1).
Specifically, as shown in fig. 5, for example: when N is 2, 1 pixel point is inserted between two adjacent original pixels in the first row of the first pre-loading sub-image block; from left to right, the pixel values of the two adjacent original pixels are 20 and 30 respectively, so the pixel value of the inserted pixel point is 20 - 1 × [(20-30)/2], that is, 25. When N is 3, 2 pixel points are inserted between two adjacent original pixels in the first row; from left to right, the pixel values of the two adjacent original pixels are 20 and 30 respectively, so the pixel values of the inserted pixel points are 20 - 1 × [(20-30)/3] and 20 - 2 × [(20-30)/3], that is, 23 and 26.
Specifically, as shown in fig. 5, for example: when N is 2, 1 pixel point is inserted between two adjacent original pixels in the first column of the first pre-loading sub-image block; from top to bottom, the pixel values of the two adjacent original pixels are 30 and 40 respectively, so the pixel value of the inserted pixel point is 30 - 1 × [(30-40)/2], that is, 35. When N is 3, 2 pixel points are inserted between two adjacent original pixels in the first column; from top to bottom, the pixel values of the two adjacent original pixels are 30 and 40 respectively, so the pixel values of the inserted pixel points are 30 - 1 × [(30-40)/3] and 30 - 2 × [(30-40)/3], that is, 33 and 36.
Specifically, as shown in fig. 6, for example: when M is 2, one line is selected as a line to be sampled every 1 line in the middle sub-image block, and in each line to be sampled one pixel point is selected as a sampling point every 1 point, so that the middle sub-image block is reduced by 2 times to obtain the second pre-loading sub-image block; when M is 3, one line is selected as a line to be sampled every 2 lines, and in each line to be sampled one pixel point is selected as a sampling point every 2 points, so that the middle sub-image block is reduced by 3 times to obtain the second pre-loading sub-image block. The sampled pixels represent the pixel points that are kept, and the discarded pixels represent the pixel points that are not sampled.
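The expand-by-N / decimate-by-M reduction described above can be sketched as follows, using plain Python lists; integer truncation of the interpolated values is an assumption chosen so that the worked examples (20/30 giving 25 or 23, 26, and 30/40 giving 35 or 33, 36) are reproduced.
```python
def expand_line(line, n):
    # insert n-1 interpolated pixels between neighbours: Dix_j = DixJ - j*((DixJ - DixK)/N)
    out = []
    for a, b in zip(line, line[1:]):
        out.append(a)
        out.extend(int(a - j * (a - b) / n) for j in range(1, n))
    out.append(line[-1])
    return out

def reduce_block(block, n, m):
    # expand rows, then columns (transpose trick), giving the middle sub-image block
    rows = [expand_line(row, n) for row in block]
    cols = [expand_line(col, n) for col in zip(*rows)]
    middle = [list(r) for r in zip(*cols)]
    # inter-sample: keep one line/point every M, giving the second pre-loading sub-image block
    return [row[::m] for row in middle[::m]]

print(expand_line([20, 30], 2))   # -> [20, 25, 30]
print(expand_line([20, 30], 3))   # -> [20, 23, 26, 30]
```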
Compared with the prior art, the data acceleration processing system based on the FPGA affine inverse transformation provided by the embodiment of the invention has the advantages that the pre-loading step and the addressing reading step are carried out on the map filling sub-image blocks divided from the result image through the ping-pong operation, the processing speed of the FPGA on the data is accelerated, the rapid processing of the FPGA on the data is realized, and the processing time delay is reduced; judging whether the size of original image data needing to be loaded is larger than the size of a BRAM space; if yes, reducing the original image data according to the sizes of the original image data and the BRAM space, and storing the reduced original image data into the BRAM space, so that the technical problem that affine reduction transformation cannot be realized due to the fact that the original image is too large and the BRAM space is too small in the affine transformation process is solved; determining the original image data to be loaded based on the affine inverse transformation through the map filling coordinates of the map filling sub-image blocks, and improving the accuracy of calculating the original image data to be loaded; according to the length ratio and the width ratio of the original image data to be loaded and the BRAM space, the pre-reduction magnification parameter is further determined, so that when the original image data is reduced, the reduction range is ensured, and the loss of the original image data is reduced.
Specifically, the BRAM space includes a first BRAM space and a second BRAM space for storing a second pre-load sub-tile corresponding to the odd column block and the even column block, respectively.
Preferably, as shown in fig. 1, the data acceleration processing system further includes:
the buffer register is used for buffering the pixel values in the first pre-loading sub-image block sent by the second external memory and transmitting the corresponding pixel values to the pre-loading control module according to the control instruction of the pre-loading control module; the pre-loading control module obtains the pixel value of each point of the second pre-loading sub-image block according to the obtained pixel values of the first pre-loading sub-image block.
When the size of the first pre-loading sub-image block is larger than that of the BRAM space, the first pre-loading sub-image block is interpolated and then sampled at intervals to obtain the second pre-loading sub-image block; when the size of the first pre-loading sub-image block is not larger than that of the BRAM space, each pixel point of the first pre-loading sub-image block is directly used as a pixel point of the second pre-loading sub-image block, and the pixel value of each pixel point in the second pre-loading sub-image block is stored in the BRAM space.
In the map filling control module, the address reading step includes: and sequentially traversing each pixel point in the odd column block/even column block, determining the pixel value of each pixel point according to the affine transformation inverse matrix and a second pre-loading subblock block in the first BRAM space/second BRAM space corresponding to the odd column block/even column block, and sending the pixel value to the first external memory for storage.
Preferably, the determining the pixel value of each pixel point according to the affine transformation inverse matrix and the second pre-loading sub-image block in the first BRAM space or second BRAM space corresponding to the odd column block/even column block comprises:
obtaining the coordinate of the corresponding pixel point in the original image according to the coordinate of each pixel point and the affine transformation inverse matrix, subtracting the head coordinate of the first pre-loading sub-image block from the coordinate in the original image, and multiplying the result by the pre-reduction magnification parameter to obtain a first coordinate;
and determining the pixel value of each corresponding pixel point according to the first coordinate.
Specifically, for any pixel point coordinate (u, v) in the map filling sub-image block, the corresponding coordinate (u', v') in the original image is obtained by calculation with the affine transformation inverse matrix; it is worth noting that (u, v) and (u', v') belong to the same coordinate system. The coordinate system of the BRAM space is different from this coordinate system, so the head coordinate of the first pre-loading sub-image block corresponding to the map filling sub-image block is subtracted from the coordinate (u', v'), converting it into a coordinate point (u'', v'') in the BRAM space coordinate system; then, according to the pre-reduction magnification parameter N/M, the first coordinate (u''×N/M, v''×N/M) corresponding to the pixel point (u, v) in the BRAM space is obtained.
It is to be understood that the coordinate systems of the first BRAM space and the second BRAM space are the same.
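A sketch of this coordinate chain for one result image pixel point (u, v) is given below; head is assumed to be the top-left vertex (A) of the first pre-loading sub-image block computed during pre-loading, and the function name is illustrative.
```python
import numpy as np

def first_coordinate(u, v, m_inv, head, n, m):
    x, y, _ = m_inv @ np.array([u, v, 1.0])      # (u', v') in the original image
    bx, by = x - head[0], y - head[1]            # (u'', v'') in the BRAM coordinate system
    return bx * n / m, by * n / m                # first coordinate P(x, y)
```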
Specifically, the determining the pixel value of each corresponding pixel point according to the first coordinate includes:
let P (x, y) represent the first coordinate;
when x and y are integers, taking the pixel value of the P point as the pixel value of each corresponding pixel point;
when x and y are not integers, determining four second coordinates around a P point according to the P point, and determining pixel values of the four second coordinates from the BRAM space addressing; based on a bilinear interpolation algorithm, calculating a pixel value corresponding to the first coordinate through pixel values of the four second coordinates, and taking the pixel value corresponding to the first coordinate as a pixel value of each corresponding pixel point;
when only one of x and y is an integer, determining two third coordinates around a point P according to the point P, and determining pixel values of the two third coordinates from the BRAM spatial addressing; and calculating a pixel value corresponding to the first coordinate through the pixel values of the two third coordinates based on a bilinear interpolation algorithm, and taking the pixel value corresponding to the first coordinate as the pixel value of each corresponding pixel point.
Specifically, when x and y are integers, it is described that the first coordinate P (x, y) is a pixel point, and at this time, a pixel value corresponding to the P point in the BRAM space is directly taken out as a pixel value of a pixel point in the map-filling sub-block.
When only one of x and y is an integer, for example, as shown in fig. 7, y is an integer and x is not, the first coordinate P(x, y) lies between the integer coordinates of two adjacent pixel points. Two third coordinates P1 and P2 are determined from the point P, their pixel values DixP1 and DixP2 are read out of the BRAM space, and the pixel value corresponding to the first coordinate P is DixP = DixP1 × k1 + DixP2 × k2, where k1 and k2 are the linear interpolation weights obtained from the distances of P to P1 and P2 along the X axis. The pixel value corresponding to the P point is taken as the pixel value of the pixel point in the map filling sub-image block.
When neither x nor y is an integer, as shown in fig. 8, the first coordinate P(x, y) lies between 4 adjacent pixel points. Four second coordinates P1, P2, P3 and P4 are determined from the point P, and their pixel values DixP1, DixP2, DixP3 and DixP4 are read out of the BRAM space. The pixel value corresponding to the first coordinate P is DixP = (DixP1×k1 + DixP2×k3 + DixP3×k1 + DixP4×k3 + DixP1×k4 + DixP3×k2 + DixP2×k4 + DixP4×k2)/4, where k1, k2, k3 and k4 are the interpolation weights obtained from the fractional parts of x and y using the rounded-down coordinates ⌊x⌋, ⌊y⌋ and the rounded-up coordinates ⌈x⌉, ⌈y⌉, with ⌊·⌋ denoting rounding down and ⌈·⌉ denoting rounding up. The pixel value corresponding to the P point is taken as the pixel value of the pixel point in the map filling sub-image block.
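For illustration, the pixel fetch at a first coordinate P(x, y) can be sketched with the standard bilinear interpolation formula between the four neighbouring BRAM entries; the patent's own weight combination is the one given in the text above, and the integer and single-axis cases degenerate in the same way.
```python
import math

def fetch_pixel(bram, x, y):
    """bram: 2D list indexed as bram[row][col] = bram[y][x]."""
    x0, x1 = math.floor(x), math.ceil(x)
    y0, y1 = math.floor(y), math.ceil(y)
    if x0 == x1 and y0 == y1:                    # both integers: read the point directly
        return bram[y0][x0]
    fx, fy = x - x0, y - y0
    top = bram[y0][x0] * (1 - fx) + bram[y0][x1] * fx
    bot = bram[y1][x0] * (1 - fx) + bram[y1][x1] * fx
    return top * (1 - fy) + bot * fy             # collapses to 1-D when x or y is an integer

bram = [[20, 30], [30, 40]]
print(fetch_pixel(bram, 0.5, 0))    # 25.0, halfway between 20 and 30
print(fetch_pixel(bram, 0.5, 0.5))  # 30.0
```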
And after the pixel values in each map filling sub-image block are obtained in sequence according to the method, all map filling sub-image blocks form a result image.
Compared with the prior art, the data acceleration processing system based on the FPGA affine inverse transformation provided by the embodiment of the present invention determines the first coordinate corresponding to the BRAM space through any one pixel point in the map-filling sub-block, determines the pixel value corresponding to the first coordinate through different conditions of the first coordinate, and takes the pixel value of the first coordinate as the pixel value of the corresponding pixel point in the map-filling sub-block.
Those skilled in the art will appreciate that all or part of the processes for implementing the above embodiments may be implemented by a computer program, which is stored in a computer readable storage medium, to instruct associated hardware. The computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (10)

1. An accelerated data processing system based on an inverse affine transform of an FPGA, the accelerated data processing system comprising:
the first external memory is used for storing a plurality of map filling sub-image blocks, the map filling sub-image blocks are sub-image blocks divided according to a result image, and the map filling sub-image blocks are divided into odd column blocks and even column blocks;
the second external memory is used for storing the original image;
the acceleration control module is used for controlling the pre-loading control module and the map filling control module to realize the pre-loading step and the addressing reading step of the odd column blocks and the even column blocks in a ping-pong operation mode;
a pre-loading control module for performing the pre-loading step, the pre-loading step comprising: determining a corresponding first pre-loading sub-image block in the original image according to the odd column block/even column block and an affine transformation inverse matrix, obtaining the first pre-loading sub-image block from the second external memory, calculating a second pre-loading sub-image block based on the first pre-loading sub-image block, storing the second pre-loading sub-image block corresponding to the odd column block into a first BRAM space, and storing the second pre-loading sub-image block corresponding to the even column block into a second BRAM space;
a map filling control module for performing the addressing reading step, the addressing reading step comprising: sequentially traversing each pixel point in the odd column block/even column block, determining the pixel value of each pixel point according to the affine transformation inverse matrix and the second pre-loading sub-image block in the first BRAM space/second BRAM space corresponding to the odd column block/even column block, and sending the pixel value to the first external memory for storage;
and the BRAM spaces, which comprise the first BRAM space and the second BRAM space and are respectively used for storing the second pre-loading sub-image blocks corresponding to the odd column blocks and the even column blocks.
2. The data acceleration processing system of claim 1, further comprising:
and the coordinate generator is used for generating the coordinates of any pixel point in the original image and the result image, and generating the coordinates of any pixel point in the second pre-loading sub-image block.
3. The system of claim 2, further comprising:
the buffer register is used for buffering the pixel values in the first pre-loading sub-image block sent by the second external memory and transmitting the corresponding pixel values to the pre-loading control module according to a control instruction of the pre-loading control module, and the pre-loading control module obtains the pixel value of each point of the second pre-loading sub-image block according to the obtained pixel values of the first pre-loading sub-image block.
4. The system of claim 3, wherein the determining a corresponding first pre-loading sub-image block in the original image according to the odd column blocks/even column blocks and an affine transformation inverse matrix comprises:
determining a matting sub-image block according to the odd column blocks/even column blocks and the affine transformation inverse matrix, and determining the first pre-loading sub-image block according to the matting sub-image block.
5. The system of claim 4, wherein the calculating a second pre-loading sub-image block based on the first pre-loading sub-image block comprises:
judging whether the size of the first pre-loading sub-image block is larger than the size of the first BRAM space/second BRAM space; if the judgment result is yes, determining a pre-reduction magnification parameter according to the sizes of the first pre-loading sub-image block and the first BRAM space/second BRAM space, and reducing the first pre-loading sub-image block based on the pre-reduction magnification parameter to obtain the second pre-loading sub-image block; if not, taking the first pre-loading sub-image block as the second pre-loading sub-image block.
6. The system of claim 3, wherein the determining the pixel value of each pixel point according to the affine transformation inverse matrix and the second pre-loading sub-image block in the first BRAM space/second BRAM space corresponding to the odd column block/even column block comprises:
obtaining the coordinates of the corresponding pixel point in the original image according to the coordinates of each pixel point and the affine transformation inverse matrix, and converting the coordinates of the pixel point in the original image into a first coordinate in the BRAM space based on the first pre-loading sub-image block;
and determining the pixel value of each corresponding pixel point according to the first coordinate.
7. The system of claim 3, wherein the realizing of the pre-loading step and the addressing reading step of the odd column blocks and the even column blocks in a ping-pong operation mode specifically comprises:
processing all the map filling sub-image blocks in the result image in the order from left to right and from top to bottom, and performing the pre-loading step and the addressing reading step on the odd column blocks/even column blocks based on ping-pong operation;
and while the pre-loading step is performed on the odd column blocks/even column blocks, simultaneously performing the addressing reading step on the even column blocks/odd column blocks.
8. The data acceleration processing system of claim 4, wherein the determining a matting sub-image block according to the odd column blocks/even column blocks and the affine transformation inverse matrix comprises:
determining map filling coordinates corresponding to the odd column blocks/even column blocks; wherein the map filling coordinates comprise four map filling vertex coordinates corresponding to the odd column blocks/even column blocks;
determining, based on the affine inverse transformation, matting coordinates of the matting sub-image block according to the four map filling vertex coordinates and the affine transformation inverse matrix, the matting coordinates comprising four matting vertex coordinates;
and determining the matting sub-image block according to the four matting vertex coordinates.
9. The system of claim 5, wherein the judging whether the size of the first pre-loading sub-image block is larger than the size of the first BRAM space/second BRAM space comprises:
determining the length and width of the first pre-loading sub-image block, and determining the length and width of the first BRAM space/second BRAM space;
calculating a length ratio of the length of the first pre-loading sub-image block to the length of the first BRAM space/second BRAM space, and calculating a width ratio of the width of the first pre-loading sub-image block to the width of the first BRAM space/second BRAM space; wherein the first BRAM space and the second BRAM space are the same in size;
and if either the length ratio or the width ratio is greater than 1, the judgment result is yes; otherwise, the judgment result is no.
10. The system of claim 9, wherein the determining a pre-reduction magnification parameter according to the sizes of the first pre-loading sub-image block and the first BRAM space/second BRAM space comprises:
judging whether the length ratio is larger than the width ratio; if yes, determining the pre-reduction magnification parameter according to the length ratio; otherwise, determining the pre-reduction magnification parameter according to the width ratio; wherein the pre-reduction magnification parameter is N/M, N and M are both integers in [1, 6], and M > N.
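For claims 9 and 10, a small Python sketch of how a pre-reduction magnification parameter N/M could be chosen is given below; the constraint that N and M are integers in [1, 6] with M > N and the rule that the larger of the length ratio and width ratio governs come from the claims, while the strategy of picking the mildest reduction that still fits the BRAM space, together with the function and variable names, is an assumption of the sketch.

```python
from fractions import Fraction

def pre_reduction_parameter(block_w, block_h, bram_w, bram_h):
    """Choose a pre-reduction magnification parameter N/M for the first
    pre-loading sub-image block so that the reduced block fits the BRAM space.
    """
    length_ratio = block_w / bram_w
    width_ratio = block_h / bram_h
    if length_ratio <= 1 and width_ratio <= 1:
        return Fraction(1, 1)          # block already fits, no reduction needed

    # the larger ratio governs the choice (claim 10)
    governing = length_ratio if length_ratio > width_ratio else width_ratio

    # candidate parameters N/M with N, M integers in [1, 6] and M > N,
    # tried from the mildest reduction (5/6) down to the strongest (1/6)
    candidates = sorted({Fraction(n, m) for m in range(2, 7) for n in range(1, m)},
                        reverse=True)
    for ratio in candidates:
        if governing * ratio <= 1:     # reduced block fits the BRAM space
            return ratio
    return candidates[-1]              # fall back to the strongest reduction
```

With a block 1280 wide against a 1024-wide BRAM space, for example, the governing ratio is 1.25 and the sketch returns 4/5, since 1.25 × 4/5 = 1.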
CN202211640437.2A 2022-12-20 2022-12-20 Data acceleration processing system based on FPGA (field programmable Gate array) affine inverse transformation Pending CN115967785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211640437.2A CN115967785A (en) 2022-12-20 2022-12-20 Data acceleration processing system based on FPGA (field programmable Gate array) affine inverse transformation

Publications (1)

Publication Number Publication Date
CN115967785A true CN115967785A (en) 2023-04-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination