CN105160622A

CN105160622A - Field programmable gate array (FPGA) based implementation method for image super resolution

Info

Publication number: CN105160622A
Application number: CN201510623919.0A
Authority: CN
Inventors: 钟雪燕; 李春英
Original assignee: Nanjing Institute of Railway Technology
Current assignee: Nanjing Institute of Railway Technology
Priority date: 2015-09-25
Filing date: 2015-09-25
Publication date: 2015-12-16
Anticipated expiration: 2035-09-25
Also published as: CN105160622B

Abstract

The invention provides a field programmable gate array (FPGA) based implementation method for image super-resolution. In the method, a cycle control module controls the round-robin scheduling of random-access memory (RAM) modules to write data; when the RAM modules are read, single-input, dual-output-port RAMs are used, each with a depth equal to the number of pixels in one row of the source image and a width equal to the pixel data width, so that two adjacent rows of source pixels are stored; and the weights obtained by a position analysis module, which are normalized fractions, are mapped into the integer range for computation. The method improves the image processing rate and achieves super-resolution.

Description

FPGA-based implementation method for image super-resolution

Technical field

The present invention relates to an implementation method for image super-resolution, and in particular to an FPGA-based implementation method for image super-resolution.

Background technology

Common image displays have a fixed resolution. Low-resolution image data must undergo super-resolution processing to reach a resolution that matches the display device before it can be displayed normally (for example on HDTV, High-Definition TV); this process is in essence a form of image super-resolution processing.

Image super-resolution technology is widely used in many fields, such as public safety, medical imaging, military affairs, geology, industry, and consumer electronics. The technology raises image resolution as far as possible to achieve better recognition capability and recognition accuracy.

As the volume of image data grows, higher demands are placed on image processing speed, and hardware implementation of image processing has gradually become an important topic in image processing research.

FPGAs have attracted wide attention for their strong data-processing capability; they process data in a parallel, pipelined manner, which speeds up data processing. Whereas conventional software synthesizes one frame per second or less, a hardware FPGA implementation can reach 25 to 30 frames per second in real time. Hardware FPGA implementation of image processing is therefore worth studying.

Implementing an image processing algorithm on an FPGA requires a balance between algorithm performance and resource usage. Traditional linear interpolation algorithms include nearest-neighbor interpolation, bilinear interpolation, 4-point bicubic interpolation, and 6-point bicubic interpolation. Of these, nearest-neighbor interpolation gives unsatisfactory super-resolution image quality, while the higher-order interpolation methods are highly complex and not easy to implement in hardware.

Summary of the invention

The object of the present invention is to provide an FPGA-based implementation method for image super-resolution. For the FPGA-based bilinear-interpolation implementation of image super-resolution, it proposes a two-level round-robin scheduling mechanism based on single-input, dual-output-port RAM buffering, in order to realize shared resource allocation and parallel pipelined processing.

The invention provides the following technical scheme:

An FPGA-based implementation method for image super-resolution, in which a cycle control module controls the round-robin scheduling of the RAM modules to write data;

When the RAM modules are read, single-input, dual-output-port RAMs are used; the depth of each such RAM is defined as the number of pixels in one row of the source image and its width as the pixel data width, so that two adjacent rows of source pixels are stored;

The weights obtained by the position analysis module are normalized fractions, and the weights are mapped into the integer range for computation.

Preferably, four single-input, dual-output-port RAMs, RAM0 to RAM3, are defined in the bilinear interpolation hardware structure. While RAM0 and RAM1 are used in the weighted computation that interpolates the pixel values of the target image, RAM2 and RAM3 are written with the source-image pixel values needed to compute the next row of the target image. When the computation on RAM0 and RAM1 finishes, RAM2 and RAM3 take over the weighted computation and RAM0 and RAM1 begin to be written with source-image pixel values. This gives continuous computation and output in time and parallel reuse of the RAM space, improving computational efficiency.

Further, the cycle control module controls the round-robin scheduling of the four RAM modules to write data at two levels: between the pair RAM0, RAM1 and the pair RAM2, RAM3, and within each pair, between RAM0 and RAM1 and between RAM2 and RAM3. The pair RAM0, RAM1 and the pair RAM2, RAM3 switch cyclically between the functions of computing target-image pixel values and being written with source-image pixel values; within each pair, source-image pixel values are written cyclically between RAM0 and RAM1 and between RAM2 and RAM3. This structure makes full use of the FPGA's parallel, pipelined, and multiplexing features, ensuring full use of the data bandwidth while saving FPGA area resources.

Further, throughout the computation the weights are obtained from floating-point operations; by converting the floating-point numbers to integers, all operations can be carried out on integers. A floating-point number is converted to an integer by shifting it left by a given number of bits, and the result is shifted right by the same number of bits after the multiplication finishes.

The beneficial effects of the invention are as follows: for the FPGA-based bilinear-interpolation implementation of image super-resolution, the invention proposes a two-level round-robin scheduling mechanism based on single-input, dual-output-port RAM buffering to realize shared resource allocation and parallel pipelined processing. It improves image processing speed and achieves super-resolution.

Brief description of the drawings

The accompanying drawings provide a further understanding of the present invention and form part of the specification. Together with the embodiments of the invention they serve to explain the invention and do not limit it. In the drawings:

Fig. 1 is the bilinear interpolation hardware structure diagram;

Fig. 2 is the RAM round-robin scheduling mechanism diagram;

Fig. 3 is the FPGA resource utilization chart;

Fig. 4 is the setup and hold time chart of the algorithm module;

Fig. 5 is the Lena image before interpolation; the source image resolution is 512x512;

Fig. 6 is the Lena image after interpolation; the resolution after interpolation is 1024x1024;

Fig. 7 is the histogram of the Lena image before interpolation;

Fig. 8 is the histogram of the Lena image after interpolation.

Embodiment

An FPGA has two opposing characteristics: (1) with parallel processing and pipelining it can achieve high-performance processing, but M times the performance costs M times the logic; (2) with multiplexing it can reduce logic, but control complexity rises. Based on these characteristics, the present invention proposes a two-level round-robin scheduling mechanism based on single-input, dual-output-port RAM buffering to realize shared resource allocation and parallel pipelined processing. In addition, Xilinx FPGAs are based on a LUT structure and can implement floating-point operations and multiplication, but at a serious waste of resources. Here all floating-point numbers are therefore converted to integers and the data operations are carried out in the integer domain.

Bilinear interpolation: determining a plane from 4 points is an over-constrained problem, so first-order interpolation on a rectangular grid requires a bilinear function. Let f(x, y) be a function of two variables, defined at any point within the square formed by the 4 points. The bilinear equation

f(x, y) = ax + by + cxy + d \qquad (1)

defines a hyperbolic paraboloid fitted to the known points.
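
As a worked illustration that is not part of the original text, the coefficients of formula (1) can be written out for a unit square. The corner values f_{00}, f_{10}, f_{01}, f_{11} at the points (0,0), (1,0), (0,1), (1,1) are notation assumed here for the sketch.

```latex
\begin{aligned}
f(0,0) &= d = f_{00}, \\
f(1,0) &= a + d = f_{10} &&\Rightarrow\; a = f_{10} - f_{00}, \\
f(0,1) &= b + d = f_{01} &&\Rightarrow\; b = f_{01} - f_{00}, \\
f(1,1) &= a + b + c + d = f_{11} &&\Rightarrow\; c = f_{11} - f_{10} - f_{01} + f_{00}.
\end{aligned}
```

Four corner conditions determine the four coefficients exactly, which is why the cxy term is needed: a plane has only three coefficients and cannot in general pass through all four points.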

The image bilinear interpolation algorithm is implemented in three steps: sampling, horizontal linear interpolation, and vertical linear interpolation. Let X_s and Y_s be the sizes of the source image in the X and Y directions, and X_d and Y_d the sizes of the target image in the X and Y directions. Defining the scaling factor S between the two, the scaling factors in the horizontal and vertical directions are respectively

S_x = X_s / X_d \qquad (2)

S_y = Y_s / Y_d \qquad (3)

Define the set of sampled pixel positions in the horizontal direction of the source image as

I_x^s = \{0, 1, 2, 3, \ldots, X_s - 1\} \qquad (4)

Define the set of sampled pixel positions in the horizontal direction of the target image as

I_x^d = \{0, 1, 2, 3, \ldots, X_d - 1\} \qquad (5)

Defining the mapping between the pixel points of the two images as R_x^s, formula (2) gives

R_x^s = \{0, 1 \times S_x, 2 \times S_x, 3 \times S_x, \ldots, (X_d - 1) \times S_x\} \qquad (6)

The position in the source image to which point X_d(i) in the horizontal direction of the target image maps is therefore

R(X_d(i)) = X_d(i) \times S_x \qquad (7)

The resulting R(X_d(i)) is a real number. The pixel at point X_d(i) in the horizontal direction of the target image is interpolated between source-image points [R(X_d(i))] and [R(X_d(i))] + 1, where the square brackets denote the integer part. The quantities R(X_d(i)) - [R(X_d(i))] and [R(X_d(i))] + 1 - R(X_d(i)) are the normalized relative distances from target point X_d(i) to source points [R(X_d(i))] and [R(X_d(i))] + 1, respectively.

Let

F(X_d(i)) = R(X_d(i)) - [R(X_d(i))] \qquad (8)
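
The mapping in formulas (2), (7), and (8) can be sketched in a few lines of Python. This is only an illustration in floating point; the function name `map_to_source` and its parameters are hypothetical and do not come from the original text.

```python
import math


def map_to_source(x_d: int, src_size: int, dst_size: int):
    """Map a target index to (integer source index, fractional offset F)."""
    s_x = src_size / dst_size   # formula (2): horizontal scaling factor
    r = x_d * s_x               # formula (7): real-valued source position
    base = math.floor(r)        # [R(X_d(i))], the lower source neighbor
    frac = r - base             # formula (8): F(X_d(i))
    return base, frac


# Upscaling 512 -> 1024: target pixel 5 lies halfway between source pixels 2 and 3.
print(map_to_source(5, 512, 1024))  # (2, 0.5)
```

For a 2x enlargement every second target pixel coincides with a source pixel (F = 0) and the others fall midway (F = 0.5).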

Let the target-image pixel value be V_d and the source-image pixel value be V_s. Because F(X_d(i)) is the distance from the lower neighbor [R(X_d(i))], that neighbor is weighted by 1 - F(X_d(i)) and the neighbor [R(X_d(i))] + 1 by F(X_d(i)):

V_d(X_d(i)) = V_s([R(X_d(i))]) \times (1 - F(X_d(i))) + V_s([R(X_d(i))] + 1) \times F(X_d(i)) \qquad (9)

Similarly, with R(Y_d(j)) = Y_d(j) \times S_y and G(Y_d(j)) = R(Y_d(j)) - [R(Y_d(j))], interpolation in the vertical direction gives

V_d(X_d(i), Y_d(j)) = V_s(X_d(i), [R(Y_d(j))]) \times (1 - G(Y_d(j))) + V_s(X_d(i), [R(Y_d(j))] + 1) \times G(Y_d(j)) \qquad (10)

Substituting formula (9) into formula (10) gives

V_d(X_d(i), Y_d(j)) = V_s([R(X_d(i))], [R(Y_d(j))]) \times (1 - F(X_d(i))) \times (1 - G(Y_d(j)))
+ V_s([R(X_d(i))], [R(Y_d(j))] + 1) \times (1 - F(X_d(i))) \times G(Y_d(j))
+ V_s([R(X_d(i))] + 1, [R(Y_d(j))]) \times F(X_d(i)) \times (1 - G(Y_d(j)))
+ V_s([R(X_d(i))] + 1, [R(Y_d(j))] + 1) \times F(X_d(i)) \times G(Y_d(j)) \qquad (11)

Formula (11) has the same form as formula (1).
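
The four-term weighted sum can be sketched in floating-point Python as follows. The sketch assumes that F and G are the fractional offsets from the lower neighbor, so the +1 neighbor in each direction receives weight F or G and the lower neighbor receives 1 - F or 1 - G. The function name `bilinear_pixel`, the row-major `src[y][x]` layout, and the border clamping are assumptions made for illustration; the original text does not specify edge handling.

```python
import math


def bilinear_pixel(src, x_d, y_d, s_x, s_y):
    """Interpolate one target pixel from a source image stored as src[y][x]."""
    height, width = len(src), len(src[0])
    rx, ry = x_d * s_x, y_d * s_y            # real-valued source position
    x0, y0 = math.floor(rx), math.floor(ry)  # integer parts [R(X)], [R(Y)]
    f, g = rx - x0, ry - y0                  # fractional offsets F and G
    # Assumed edge handling: clamp the +1 neighbors to the last row/column.
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    return (src[y0][x0] * (1 - f) * (1 - g)
            + src[y1][x0] * (1 - f) * g
            + src[y0][x1] * f * (1 - g)
            + src[y1][x1] * f * g)


src = [[0, 10],
       [20, 30]]
# Target pixel (1, 1) at 2x enlargement sits at the center of the four source pixels.
print(bilinear_pixel(src, 1, 1, 0.5, 0.5))  # 15.0
```

The four weights always sum to 1, so a constant image stays constant after interpolation.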

The analysis above shows that any pixel value of the target image is determined by two adjacent points in each of two adjacent rows of the source image. A single-input, dual-output-port RAM is therefore designed so that the two data values at adjacent addresses can be read together. The depth of this RAM is defined as the number of pixels in one row of the source image and its width as the pixel data width, so that the pixels of two adjacent source rows can be stored.

From the position (X_d(i), Y_d(j)) of the target-image pixel, the position analysis module obtains the positions of the related source-image pixels, ([R(X_d(i))], [R(Y_d(j))]), ([R(X_d(i))] + 1, [R(Y_d(j))]), ([R(X_d(i))], [R(Y_d(j))] + 1), and ([R(X_d(i))] + 1, [R(Y_d(j))] + 1), together with the weights of those points, F(X_d(i)), (1 - F(X_d(i))), G(Y_d(j)), and (1 - G(Y_d(j))). The write control module uses the source-image positions obtained by the position analysis module to control the writing of the source image into the corresponding RAM.
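
A small behavioral model may help picture such a line buffer. It is a software sketch only, not hardware description code; the class name `LineBufferRam`, the method names, and the clamping at the end of the row are assumptions introduced here.

```python
class LineBufferRam:
    """Behavioral model of one line buffer: one write port, two read ports."""

    def __init__(self, depth):
        self.depth = depth       # number of pixels in one source row
        self.mem = [0] * depth   # each entry holds one pixel value

    def write(self, addr, value):
        """Single write port: store one pixel."""
        self.mem[addr] = value

    def read_pair(self, addr):
        """Two read ports: return the pixels at addr and addr + 1.

        The second address is clamped at the end of the row (an assumption).
        """
        nxt = min(addr + 1, self.depth - 1)
        return self.mem[addr], self.mem[nxt]


ram = LineBufferRam(depth=4)
for address, pixel in enumerate([10, 20, 30, 40]):
    ram.write(address, pixel)
print(ram.read_pair(1))  # (20, 30)
```

Two such buffers, one per source row, supply all four neighbors of a target pixel in a single read step.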

As shown in Fig. 1, four single-input, dual-output-port RAMs are defined in the bilinear interpolation hardware structure. While RAM0 and RAM1 are used in the weighted computation that interpolates the pixel values of the target image, RAM2 and RAM3 are written with the source-image pixel values needed to compute the next row of the target image. When the computation on RAM0 and RAM1 finishes, RAM2 and RAM3 take over the weighted computation and RAM0 and RAM1 begin to be written with source-image pixel values. This gives continuous computation and output in time and parallel reuse of the RAM space, improving computational efficiency.

The cycle control module controls the round-robin scheduling of the four RAM modules to write data. As shown in Fig. 2, the round-robin scheduling has two levels: between the pair RAM0, RAM1 and the pair RAM2, RAM3, and within each pair, between RAM0 and RAM1 and between RAM2 and RAM3. The pair RAM0, RAM1 and the pair RAM2, RAM3 switch cyclically between the functions of computing target-image pixel values and being written with source-image pixel values; within each pair, source-image pixel values are written cyclically between RAM0 and RAM1 and between RAM2 and RAM3. This structure makes full use of the FPGA's parallel, pipelined, and multiplexing features, ensuring full use of the data bandwidth while saving FPGA area resources.
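
The two levels of rotation can be pictured with a deliberately simplified Python model. It assumes one phase per computed target row and that the writing pair is filled in RAM order; neither detail is stated in the original text, and the names `PAIRS` and `schedule` are hypothetical.

```python
PAIRS = [(0, 1), (2, 3)]  # RAM0/RAM1 form one pair, RAM2/RAM3 the other


def schedule(num_phases):
    """Return, per phase, which pair computes and the write order of the other pair."""
    phases = []
    for phase in range(num_phases):
        compute_pair = PAIRS[phase % 2]      # level 1: pair feeding the weighted sum
        write_pair = PAIRS[(phase + 1) % 2]  # level 1: pair being refilled meanwhile
        write_sequence = list(write_pair)    # level 2: rows go to the pair's RAMs in turn
        phases.append((compute_pair, write_sequence))
    return phases


for compute_pair, write_sequence in schedule(4):
    print("compute from RAMs", compute_pair, "| write to RAMs", write_sequence)
```

Because reading and writing never touch the same pair within a phase, the computation does not have to wait for source data to arrive.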

The analysis above shows that the four weights obtained by the position analysis module are normalized fractions. Although an FPGA can support floating-point operations, these need a large amount of logic and routing resources and perform poorly, which makes them unsuitable for FPGA computation; the weights are therefore mapped into the integer range for computation. Throughout the computation, S_x and S_y are floating-point numbers and the weights are computed from S_x and S_y; converting S_x and S_y to integers allows all operations to be carried out on integers. A floating-point number is converted to an integer by shifting it left by a given number of bits, and the result is shifted right by the same number of bits after the multiplication finishes.
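
The shift-left, multiply, shift-right idea can be sketched as a fixed-point weighted sum. The choice of 8 fractional bits (`SHIFT = 8`) is an assumption for illustration, since the original text does not state the shift amount, and the function names are hypothetical.

```python
SHIFT = 8  # assumed number of fractional bits; not given in the original text


def to_fixed(x):
    """Scale a normalized fraction by 2**SHIFT (a left shift) and round to an integer."""
    return int(round(x * (1 << SHIFT)))


def weighted_sum_fixed(p0, p1, frac):
    """Integer-only version of p0 * (1 - frac) + p1 * frac."""
    w1 = to_fixed(frac)        # weight of the +1 neighbor
    w0 = (1 << SHIFT) - w1     # weight of the lower neighbor; the two sum to 2**SHIFT
    acc = p0 * w0 + p1 * w1    # integer multiply-accumulate
    return acc >> SHIFT        # shift right to undo the scaling


print(weighted_sum_fixed(100, 200, 0.25))  # 125
```

The right shift truncates, so results can differ from the floating-point value by less than one gray level; more fractional bits reduce the weight quantization error at the cost of wider multipliers.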

This embodiment was verified experimentally on a Xilinx Kintex-7 development board. Fig. 3 shows the FPGA resources used by the algorithm, including 645 flip-flops, 2 RAMs, and 11 DSPs; resource utilization is low.

Fig. 4 shows the setup and hold times of the hardware implementation of bilinear interpolation. The figure shows that data need only be held for a minimum of (0.067 + 5.175) = 5.242 ns to be transferred, which means the clock frequency can reach 190 MHz, and the time to interpolate to 1024x1024 pixels is 5.5 ms. In practice the ideal maximum frequency is not used; a lower frequency is selected to avoid data loss or corruption. Typically a rate of 25 to 30 frames per second is chosen.
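
The timing figures can be checked with a short calculation. It assumes one output pixel per clock cycle, which is not stated explicitly in the original text but is consistent with the quoted 5.5 ms; the variable names are chosen here for illustration.

```python
min_period_ns = 0.067 + 5.175                 # minimum data hold time from Fig. 4
f_max_mhz = 1e3 / min_period_ns               # period in ns -> frequency in MHz
pixels = 1024 * 1024                          # size of the interpolated image
frame_time_ms = pixels * min_period_ns * 1e-6 # assumes one pixel per clock cycle

print(f"maximum clock: {f_max_mhz:.1f} MHz")  # about 190.8 MHz
print(f"frame time: {frame_time_ms:.2f} ms")  # about 5.50 ms
```

A frame time of about 5.5 ms leaves a wide margin against the 33 to 40 ms budget of 25 to 30 frames per second, which is why a lower clock frequency can be chosen in practice.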

Figs. 5 and 6 show the Lena image before and after interpolation. The source image resolution is 512x512 and the resolution after interpolation is 1024x1024; Fig. 6 is visibly somewhat clearer in detail than Fig. 5.

Figs. 7 and 8 compare the histograms of the Lena image before and after interpolation; the histogram allows the continuity of image detail to be analyzed more directly. Comparing Fig. 7 with Fig. 8 shows that the histogram after interpolation is smoother than before, indicating better continuous transitions in the image and achieving histogram equalization.

In summary, the present invention effectively improves image resolution and image processing speed. This was verified on the Kintex-7 development board, which processed 25 to 30 frames per second; after interpolation the image detail is clearer, and the histogram shows that the image has been equalized.

The foregoing is only a preferred embodiment of the present invention and does not limit the invention. Although the invention has been described in detail with reference to the foregoing embodiment, a person skilled in the art may still modify the technical schemes described in the foregoing embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. An FPGA-based implementation method for image super-resolution, characterized in that a cycle control module controls the round-robin scheduling of the RAM modules to write data;

2. The FPGA-based implementation method for image super-resolution according to claim 1, characterized in that four single-input, dual-output-port RAMs, RAM0 to RAM3, are defined in the bilinear interpolation hardware structure; while RAM0 and RAM1 are used in the weighted computation that interpolates the pixel values of the target image, RAM2 and RAM3 are written with the source-image pixel values needed to compute the next row of the target image; when the computation on RAM0 and RAM1 finishes, RAM2 and RAM3 take over the weighted computation and RAM0 and RAM1 begin to be written with source-image pixel values, giving continuous computation and output in time and parallel reuse of the RAM space.

3. The FPGA-based implementation method for image super-resolution according to claim 1 or 2, characterized in that the cycle control module controls the round-robin scheduling of the four RAM modules to write data at two levels: between the pair RAM0, RAM1 and the pair RAM2, RAM3, and within each pair, between RAM0 and RAM1 and between RAM2 and RAM3; the pair RAM0, RAM1 and the pair RAM2, RAM3 switch cyclically between the functions of computing target-image pixel values and being written with source-image pixel values; within each pair, source-image pixel values are written cyclically between RAM0 and RAM1 and between RAM2 and RAM3.

4. The FPGA-based implementation method for image super-resolution according to claim 1, characterized in that throughout the computation the weights are obtained from floating-point operations, and by converting the floating-point numbers to integers all operations can be carried out on integers; a floating-point number is converted to an integer by shifting it left by a given number of bits, and the result is shifted right by the same number of bits after the multiplication finishes.