CN103546752A

CN103546752A - Image size compression traversing method on basis of hardware parallel architecture

Info

Publication number: CN103546752A
Application number: CN201310482288.6A
Authority: CN
Inventors: 徐向民; 陈晓仕; 吴岱玲; 黄帅凯
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2013-10-15
Filing date: 2013-10-15
Publication date: 2014-01-29
Anticipated expiration: 2033-10-15
Also published as: CN103546752B

Abstract

The invention discloses an image size compression and traversal method based on a hardware parallel architecture. The method comprises: storing an image; having a first address generation module generate the address of the next row and reading the corresponding image data from RAM (random access memory) into the second/first row buffer; having a second address generation module generate the compressed column addresses for the first/second row and outputting the corresponding buffered data to the N-th row of the output buffer; starting a third address generation module once M rows of the output buffer have been filled for the first time; when the compression stage is unchanged, having the third address generation module output the data of the M output buffer rows from row N-M to row N-1, reading until the length of the data read equals the row width of the current compression stage; when the compression stage changes, pointing the operating pointer of the third address generation module to the next K rows; and repeating these steps until the compression and traversal of the image is complete. N cycles through the values 1 to M+K, advancing each time the compression of a row is completed. The method occupies few resources and runs fast.

Description

An image size compression and traversal method based on a hardware parallel architecture

Technical field

The present invention relates to the technical field of target feature search, and in particular to an image size compression and traversal method based on a hardware parallel architecture.

Background art

As the demand for machine intelligence grows, the requirements placed on human-computer interaction systems keep rising. System resolution is one important technical indicator. Real-time processing of large volumes of data is the main feature of existing intelligent human-computer interaction systems, and it is also a technical bottleneck that restricts the development of high-resolution systems.

In the field of target detection there are two technical directions. One is the detection of a specific target, for example recognizing whether the person in an image is "Zhang San". The other is target search, for example finding out how many "people" there are in an image and where they are. The biggest difference between the two is that the latter requires the image to be compressed in size (to adapt to targets of different sizes) and traversed (to search every place in the image where a target may appear).

For the latter, image size compression and traversal refers to compressing a large image in size according to a specific ratio and then traversing the image with a window of a given size. It is an important front-end preprocessing technique in target detection algorithms. The performance of the compression and traversal method directly affects the speed of back-end processing and the scale of the whole system.

Target detection has to process a large amount of data, and the real-time requirements of human-computer interaction have restricted its development to a great extent. Existing technology addresses this problem mainly in two ways: one is to reduce algorithm complexity, meeting the real-time requirement at the cost of resolution; the other is to accelerate with a parallel architecture, obtaining faster processing and higher resolution at the cost of resources.

There are three mainstream ways of accelerating image size compression and traversal with a parallel architecture.

The first method stores the entire image in registers, compresses it row by row by address calculation, and traverses it through line buffers. Line-buffer traversal works as follows. Suppose the traversal window is 20*20 and the original image is 640*480. Twenty buffers of length 640 are instantiated. Data enters at the first row and is shifted; the data leaving the first row enters the second row, and so on. Once all 20 row buffers are full, the module outputs 20 data simultaneously, all from the same column. Each output therefore corresponds to moving the window by one position, which accomplishes the traversal.

The second method selects pixels while the data is being input and stores the compressed images in registers (only the compressed images are stored); the output is traversed by address calculation.

The third method caches the entire image in RAM, reads the image from RAM and compresses it, writes the compressed image back to RAM, and then reads the compressed image from RAM into line buffers for traversal.
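The line-buffer traversal used by the first method can be illustrated with a minimal software sketch. It is not taken from the patent: the function name `line_buffer_columns` and the tiny 3*4 test image are assumptions chosen for illustration, and a Python deque stands in for the hardware shift registers.

```python
from collections import deque


def line_buffer_columns(image, win_h):
    """Yield (row, col, pixels) once win_h line buffers are full.

    Hypothetical helper: `pixels` holds one pixel from each buffered row,
    all taken from the same column, as described for the first method.
    """
    buffers = deque(maxlen=win_h)      # the oldest row drops out when a new one arrives
    for r, row in enumerate(image):
        buffers.append(row)            # a new row is shifted in
        if len(buffers) == win_h:      # every line buffer now holds a row
            for c in range(len(row)):
                yield r, c, [buf[c] for buf in buffers]


# Tiny stand-in for the 640*480 image and 20-row window of the text.
image = [[10 * r + c for c in range(4)] for r in range(3)]
columns = list(line_buffer_columns(image, win_h=2))
print(columns[0])   # (1, 0, [0, 10])
```

Each yielded column corresponds to one shift of the window by one position; collecting as many consecutive columns as the window is wide gives a full window.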

Whichever of the above three methods is adopted, there are severe limitations in practical application.

The first limitation is resource consumption. The first method consumes 640*(480+20)*8 registers, and this register consumption translates directly into a larger system area. In the second method, assuming a compression factor of 1.25 and compression stages three to ten, the required storage is 640*480*0.7*8 registers, which is less than in the first method; however, because compression is carried out at the front end, the buffer of each stage needs its own address calculation, so the algorithm complexity rises and more logic resources are consumed. In the third method, because an external memory is used, register consumption is very small and the cost is relatively low.

The second limitation is speed. In the second method, traversal by address calculation reads one datum at a time, so reading 20 data takes 20 clock cycles, which greatly increases the time overhead compared with the first method. In the third method, the image has to be read and written repeatedly and throughput is limited by the data width of the external memory, so it is slower than the second method, running at about 1/8 of its speed (for 8 compression stages).
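The register counts quoted above for the first two methods can be checked with a short calculation. This is a sketch under assumptions that the text does not state: that the trailing factor 8 is the number of bits per pixel, and that the factor 0.7 is the combined relative area of the compressed images of stages 3 to 10, with stage c holding (640/1.25^c) * (480/1.25^c) pixels and rounding ignored.

```python
W, H, WIN, BITS = 640, 480, 20, 8   # image size, window height, assumed bits per pixel
A = 1.25                            # compression factor from the text

# Method one: the whole image plus 20 line buffers.
method1_bits = W * (H + WIN) * BITS

# Method two (assumed reading): relative areas of stages 3..10 added together.
fraction = sum(1 / A ** (2 * c) for c in range(3, 11))
method2_bits = W * H * fraction * BITS

print(method1_bits)         # 2560000
print(round(fraction, 2))   # 0.71
```

Under these assumptions the summed area comes out close to the 0.7 quoted in the text, which suggests but does not prove that this is how the figure was obtained.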

The difficulty of image size compression and traversal therefore lies in raising the speed as far as possible while using as few hardware resources as possible.

Summary of the invention

The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing an image size compression and traversal method based on a hardware parallel architecture that occupies few resources and runs fast.

The object of the present invention is achieved by the following technical solution:

An image size compression and traversal method based on a hardware parallel architecture, comprising the following steps in order:

(1) The raw image data is stored in RAM (random access memory); once the first row of data has been stored, it is read into the first row buffer;

(2) After the entire image has been stored, the compression operation starts;

(3) Address generation module one calculates the address of the next row from the compression factor and the compression stage, and the image data at that address in RAM is read into the second/first row buffer;

(4) While step (3) is being carried out, address generation module two starts operating on the first/second row: it calculates the compressed column addresses from the compression factor and the compression stage, and the buffered data at those addresses is output to row N of the output buffer, where N cycles through 1 to M+K;

(5) Step (4) is repeated until the compression of one row is complete; each time a row is completed, the compressed data of the next row is output to row N+1; if N+K is greater than M+K, N is set to 1;

(6) Steps (3) to (5) are repeated until M rows of the output buffer have been filled for the first time, at which point address generation module three starts working;

(7) When the compression stage is unchanged, address generation module three outputs the data of the M output buffer rows from row N-M to row N-1, where N is the number of the output buffer row being operated on in step (4); when N-M is less than 1, N-M is taken as N+K, and when N equals 1, N-1 is taken as M+K; data is read from these M output buffer rows simultaneously until the length of the data read equals the row width of the current compression stage;

(8) When the compression stage changes, each row contains fewer pixels, so step (5) may finish operating on row N before step (7) has finished processing the M output buffer rows N-M to N-1. Therefore, if step (7) has not finished when step (5) completes row N, steps (4) and (5) continue and operate on row N+1, while the numbers of the M output buffer rows operated on in step (7) remain unchanged; if step (7) has finished, it waits for step (5) to complete;

(9) Steps (3) to (8) are repeated until the compression and traversal of the image is complete.
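The circular numbering of the output buffer rows in steps (4) to (7) can be sketched as follows. The function names are hypothetical, and wrapping the write row at M+K is an assumption that follows the statement in step (4) that N cycles through 1 to M+K.

```python
def next_write_row(n, m, k):
    """Output buffer row written after row n; rows are numbered 1..m+k."""
    return 1 if n == m + k else n + 1


def read_rows(n, m, k):
    """Rows N-M .. N-1 read by address generation module three, wrapped into 1..m+k."""
    total = m + k
    return [(n - m + i - 1) % total + 1 for i in range(m)]


# With M = 25 and K = 2 there are 27 output buffer rows.
print(read_rows(26, 25, 2)[0], read_rows(26, 25, 2)[-1])   # 1 25
print(read_rows(1, 25, 2)[0], read_rows(1, 25, 2)[-1])     # 3 27
```

The second line reproduces the two special cases of step (7): for N = 1, row N-M wraps to N+K = 3 and row N-1 wraps to M+K = 27. Because there are K more rows than the M being read, the row being written is never among the rows being read.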

Step (3) is specifically as follows: address generation module one calculates, from the compression factor a and the current compression stage c, the address of the next row for the current compression stage, and the row is fetched from RAM and stored in the row buffer. The calculation is: if n rows have already been compressed, the row address of row n+1 is floor(n*a^c), where a^c denotes a raised to the power c and floor denotes rounding down.

Step (4) is specifically as follows: address generation module two calculates, from the compression factor a and the current compression stage c, the address of the next column for the current compression stage, and the data is fetched from the row buffer. The calculation is: if n columns have already been compressed, the column address of column n+1 is floor(n*a^c), where floor denotes rounding down.
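The address calculation shared by steps (3) and (4) can be written as a minimal sketch. Zero-based addressing and the helper names `source_address` and `stage_addresses` are assumptions made for illustration; the text only gives the formula floor(n*a^c).

```python
import math


def source_address(n, a, c):
    """Source row or column address of compressed index n+1: floor(n * a**c)."""
    return math.floor(n * a ** c)


def stage_addresses(length, a, c):
    """All source addresses that fall inside an axis of `length` pixels."""
    addresses, n = [], 0
    while source_address(n, a, c) < length:
        addresses.append(source_address(n, a, c))
        n += 1
    return addresses


# Compression factor 1.25, stage 1, on a 10-pixel axis.
print(stage_addresses(10, 1.25, 1))   # [0, 1, 2, 3, 5, 6, 7, 8]
```

In this example every fifth source pixel (addresses 4 and 9) is skipped, which is the expected behaviour for a ratio of 5/4.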

Compared with the prior art, the present invention has the following advantages and beneficial effects:

1. Few hardware resources and lower cost: the hardware architecture of the method comprises one off-chip RAM, two row buffers, one data selector, M+K rows of output buffer and three address generation modules. K is determined by the compression factor a and equals a rounded up; for example, if a is 5/4, K equals 2. M is determined by the height of the traversal window; for a window size of 20*25, M=25. The length of the row buffers is determined by the length of the original image; for example, for a 640*480 image the row buffer length is 640. The output buffer length is determined by the compressed row length of the first compression stage used; for example, with a compression factor of 1.25, an original image length of 640 and compression starting from the third stage, the output buffer length is 640/(1.25)³=327 (rounded down).

2. High running speed: the present invention can compress an image to a specified compression ratio and can carry out uninterrupted window traversal within one compression stage; at the transition between two compression stages only a very short wait is needed before traversal of the next stage's compressed image can begin, so the running speed is high.
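The sizing rules listed under the first advantage can be collected in a short sketch. The function name `buffer_sizes` and the dictionary keys are hypothetical; the rules themselves (K is a rounded up, M is the window height, the row buffer is as long as an original row, and the output buffer is as long as a row of the first stage used) are taken from the text.

```python
import math


def buffer_sizes(a, image_width, window_height, first_stage):
    """Return the buffer dimensions implied by the sizing rules in the text."""
    k = math.ceil(a)                                       # K: compression factor rounded up
    m = window_height                                      # M: traversal window height
    out_len = math.floor(image_width / a ** first_stage)   # longest compressed row
    return {
        "K": k,
        "M": m,
        "output_rows": m + k,
        "row_buffer_length": image_width,
        "output_buffer_length": out_len,
    }


sizes = buffer_sizes(a=1.25, image_width=640, window_height=25, first_stage=3)
print(sizes["K"], sizes["output_rows"], sizes["output_buffer_length"])   # 2 27 327
```

For the example values this gives two row buffers of 640 pixels and 27 output buffer rows of 327 pixels, far less than a full 640*480 frame held in registers.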

Brief description of the drawings

Fig. 1 is a hardware architecture diagram of the image size compression and traversal method based on a hardware parallel architecture according to the present invention;

Fig. 2 is a schematic diagram of address generation module one and address generation module two cooperating to carry out image compression in the method shown in Fig. 1;

Fig. 3 is a diagram of the traversal operation of the method shown in Fig. 1.

Embodiment

(3) Address generation module one calculates the address of the next row from the compression factor and the compression stage, and the image data at that address in RAM is read into the second/first row buffer. Specifically, address generation module one calculates, from the compression factor a and the current compression stage c, the address of the next row for the current compression stage, and the row is fetched from RAM and stored in the row buffer. The calculation is: if n rows have already been compressed, the row address of row n+1 is floor(n*a^c), where a^c denotes a raised to the power c and floor denotes rounding down;

(4) While step (3) is being carried out, address generation module two starts operating on the first/second row: it calculates the compressed column addresses from the compression factor and the compression stage, and the buffered data at those addresses is output to row N of the output buffer, where N cycles through 1 to M+K. Specifically, address generation module two calculates, from the compression factor a and the current compression stage c, the address of the next column for the current compression stage, and the data is fetched from the row buffer. The calculation is: if n columns have already been compressed, the column address of column n+1 is floor(n*a^c), where floor denotes rounding down;

Fig. 1 shows the hardware architecture of the image size compression and traversal method based on a hardware parallel architecture. The RAM (random access memory), address generation module one, address generation module two, row buffer one, row buffer two and a data selector form the image compression functional module; the output buffers numbered 1 to M+K, address generation module two and address generation module three form the image traversal functional module; control signals are exchanged among all three address generation modules.

The image compression steps are described below with reference to Fig. 2:

Step 1: Once an image has been stored in RAM, address generation module one starts working; its operating pointer points to row buffer one, and it waits until the first row of data in RAM has been stored into the row buffer that its operating pointer currently points to;

Step 2: After the first row has been stored, address generation module one generates the address of the next row and points its operating pointer to row buffer two; at the same time, address generation module one sends a buffering-complete signal to address generation module two, which then starts working;

Step 3: The operating pointer of address generation module two points to row buffer one (the operating pointer of address generation module one currently points to row buffer two); address generation module two calculates the compressed column addresses according to the current compression stage and reads the corresponding column data from the row buffer that its operating pointer currently points to;

Step 4: The data selector selects, for output, the data of the row buffer that the operating pointer of address generation module two currently points to;

Step 5: After address generation module two completes the compression of one row, its operating pointer switches to the other row buffer and it sends a compression-complete signal to address generation module one; address generation module one then generates the address of the next row, and its operating pointer switches to the other row buffer;

Step 6: Steps 2 to 5 are repeated until the multi-stage size compression of the image is complete.

The operating pointers of address generation module one and address generation module two always point to different row buffers, so the two address generation modules operate relatively independently and run in parallel, which keeps the data output uninterrupted.
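The alternating use of the two row buffers in Steps 1 to 6 can be modelled in software for a single compression stage. This is a sequential sketch, not the hardware design: the function name `compress_stage` and zero-based addressing are assumptions, and the fill and read that run in parallel in hardware happen one after the other here.

```python
import math


def compress_stage(image, a, c):
    """Sequential model of the two-row-buffer scheme for one compression stage."""
    scale = a ** c
    height, width = len(image), len(image[0])
    row_buffers = [None, None]         # row buffer one and row buffer two
    fill = 0                           # buffer that module one fills next
    row_buffers[fill] = image[0]       # Step 1: first row is loaded
    out, n_row = [], 0
    while True:
        read = fill                    # module two reads the buffer just filled
        fill = 1 - fill                # module one switches to the other buffer
        n_row += 1
        next_row = math.floor(n_row * scale)
        if next_row < height:
            row_buffers[fill] = image[next_row]   # overlaps with the read in hardware
        compressed, n_col = [], 0
        while math.floor(n_col * scale) < width:  # column addresses floor(n * a**c)
            compressed.append(row_buffers[read][math.floor(n_col * scale)])
            n_col += 1
        out.append(compressed)
        if next_row >= height:
            return out


image = [[10 * r + c for c in range(5)] for r in range(5)]
small = compress_stage(image, a=1.25, c=1)
print(small[0], len(small))   # [0, 1, 2, 3] 4
```

Because `read` and `fill` always differ, the model never reads a row buffer while it is being refilled, which is the property the text relies on for uninterrupted output.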

The image traversal steps are described below with reference to Fig. 3:

Step 1: While carrying out Step 4 of the compression operation, address generation module two generates the output buffer write address, and the data Data2 coming out of the data selector is written into the output buffer row that the write pointer currently points to;

Step 2: Step 1 is repeated; after a row has been written, address generation module two sends a write-complete signal to address generation module three, and the write pointer moves to the next output buffer row; when the write pointer points to row M+K, it returns to the first row after the write operation completes;

Step 3: When M rows have been filled for the first time, address generation module three starts working; the traversal pointer points to row M, and address generation module three operates simultaneously on M rows of data, namely row M and the rows above it;

Step 4: Address generation module three calculates the traversal output address, which runs from 1 to the image length of the current compression stage, and at each address reads M compressed image data from the above M output buffer rows. If the window size is N*M, a window of data is formed once N data have been output; when the (N+1)-th datum is output, it forms a window together with the preceding N-1 data. This step is repeated until the output length equals the image length of the current compression stage, which completes the traversal of one row of windows. After the write-complete signal from address generation module two has been received, the traversal pointer moves to the next row or down K rows;

In Step 4 the traversal pointer is advanced in one of two ways. When the compression stage is unchanged, that is, the data Data2 now being written belongs to a compressed image of the same compression stage as the data written before, the traversal pointer moves to the next row. When the compression stage changes, that is, the compression stage of the data Data2 being written differs from that of the data the traversal pointer currently refers to, the traversal pointer moves down K rows.
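The window formation in Step 4 and the two ways of advancing the traversal pointer can be sketched as follows. The function names are hypothetical, `win_w` stands for the window width that the text calls N, and wrapping the traversal pointer around M+K rows is an assumption based on the circular row numbering of the output buffer.

```python
def traverse_rows(rows, win_w):
    """Read one column of M pixels per address and yield each win_w-wide window."""
    columns = []
    for addr in range(len(rows[0])):
        columns.append([row[addr] for row in rows])   # M pixels read in parallel
        if len(columns) >= win_w:
            yield columns[-win_w:]   # newest column plus the previous win_w - 1


def advance_traversal_pointer(ptr, stage_changed, m, k):
    """Move down one row, or K rows when the compression stage changes (rows 1..m+k)."""
    step = k if stage_changed else 1
    return (ptr - 1 + step) % (m + k) + 1


rows = [[0, 1, 2, 3], [10, 11, 12, 13]]   # M = 2 output buffer rows
windows = list(traverse_rows(rows, win_w=3))
print(len(windows), windows[0])           # 2 [[0, 10], [1, 11], [2, 12]]
```

After the first window, every further address produces a new window, so a row of length L yields L - win_w + 1 windows without stalling.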

The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited to it. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and falls within the protection scope of the present invention.

Claims

1. An image size compression and traversal method based on a hardware parallel architecture, comprising the following steps in order:

(1) The raw image data is stored in RAM; once the first row of data has been stored, it is read into the first row buffer;

2. The image size compression and traversal method based on a hardware parallel architecture according to claim 1, characterized in that step (3) is specifically: address generation module one calculates, from the compression factor a and the current compression stage c, the address of the next row for the current compression stage, and the row is fetched from RAM and stored in the row buffer; the calculation is: if n rows have already been compressed, the row address of row n+1 is floor(n*a^c), where a^c denotes a raised to the power c and floor denotes rounding down.

3. The image size compression and traversal method based on a hardware parallel architecture according to claim 1, characterized in that step (4) is specifically: address generation module two calculates, from the compression factor a and the current compression stage c, the address of the next column for the current compression stage, and the data is fetched from the row buffer; the calculation is: if n columns have already been compressed, the column address of column n+1 is floor(n*a^c), where floor denotes rounding down.