CN111815502A

CN111815502A - FPGA (field programmable gate array) acceleration method for multi-image processing based on the WebP compression algorithm

Info

Publication number: CN111815502A
Application number: CN202010653783.9A
Authority: CN
Inventors: 杨晓成
Original assignee: Shanghai Xuehu Technology Co ltd
Current assignee: Shanghai Xuehu Technology Co ltd
Priority date: 2020-07-08
Filing date: 2020-07-08
Publication date: 2020-10-23
Anticipated expiration: 2040-07-08
Also published as: CN111815502B

Abstract

The invention relates to the technical field of image processing, and in particular to an FPGA (field programmable gate array) acceleration method for multi-image processing based on the WebP compression algorithm. The method comprises: transmitting a picture as RGB (red, green and blue) three-channel data and converting it into corresponding YUV data; caching the YUV data generated for each picture in an on-chip DDR cache, reading the data into a computing module over a bus, reading the corresponding data from the on-chip DDR cache according to the processing progress of each of a plurality of pictures, and placing the data in a dependent data cache region; and, when the calculation traversal of one YUV macroblock of one picture is completed, performing the macroblock calculation traversal of the next picture, switching between the pictures in turn until all macroblocks of the batch of pictures are completely encoded. The invention provides an effective acceleration scheme that implements the encoding as a parallel pipeline, which is more efficient than serial processing on a CPU; an FPGA is better suited than a CPU to processing such a blocked closed-loop algorithm, and the output frame rate of the whole WebP algorithm is improved.

Description

FPGA (field programmable gate array) acceleration method for multi-image processing based on the WebP compression algorithm

Technical Field

The invention relates to the technical field of image processing, and in particular to an FPGA (field programmable gate array) acceleration method for multi-image processing based on the WebP compression algorithm.

Background

With the development of image acquisition devices such as mobile phones, tablets and digital cameras, and with the growth in picture resolution, the volume of internet image data is increasing exponentially. Recent studies have shown that the data stored on data center servers will grow roughly fourfold, from 663 EB in 2016 to 2.6 ZB in 2021, with most of the stored data originating from images and videos.

Images currently account for 60%-65% of the bytes on most web pages. Image size is particularly important for mobile devices, where smaller images save bandwidth and battery life. WebP is a picture format proposed by Google on the basis of VP8 coding to meet ever-increasing bandwidth requirements. WebP uses predictive coding: the colors of some pixel blocks are used to predict the color values of their neighboring blocks, and only the difference between the prediction and the actual values is recorded. In most cases this difference is very small, or even zero, so the compression ratio is greatly improved. In comparisons between WebP and JPEG compression, when WebP compresses a JPG to 90% of the original image quality, the picture volume is reduced by about 50%; when WebP compresses a JPG to 80% of the original image quality, the picture volume is reduced by 60%-80%. Lossy WebP outperforms JPG mainly because of its more advanced predictive coding; adaptive quantization of macroblocks further improves compression efficiency, and Boolean arithmetic coding improves compression performance by 5%-10% compared with Huffman coding.
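The idea of recording only the difference between a prediction and the actual block can be illustrated with a minimal sketch. It assumes a simple DC-style predictor (the mean of the neighboring pixels); the helper names `predict_dc` and `residual` are hypothetical and the code does not reproduce the actual VP8/WebP prediction modes.

```python
# Illustrative sketch only: DC-style prediction from neighbouring pixels.
# Helper names are hypothetical; this is not the VP8/WebP bitstream logic.

def predict_dc(top, left):
    """Predict a flat block value as the mean of the top and left neighbours."""
    neighbours = list(top) + list(left)
    return round(sum(neighbours) / len(neighbours))

def residual(block, prediction):
    """Difference between the actual block and the flat prediction."""
    return [[pixel - prediction for pixel in row] for row in block]

top = [100, 101, 102, 101]    # row of pixels above the block
left = [100, 100, 101, 101]   # column of pixels to the left of the block
block = [
    [101, 101, 102, 101],
    [100, 101, 101, 101],
    [101, 100, 101, 102],
    [101, 101, 101, 101],
]

pred = predict_dc(top, left)
res = residual(block, pred)
print(pred)   # the single predicted value
print(res)    # small values clustered around zero
```

Because the residual values stay close to zero for smooth image regions, they can be coded with far fewer bits than the raw pixel values.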

In the prior art, the WebP lossy compression algorithm is as shown in FIG. 1. The algorithm first converts the original picture from its RGB three channels into YUV macroblocks (Y represents luminance and UV represents chrominance) and then splits the processing into two branches. One branch obtains the calculation parameters required by the quantization process through simple pre-analysis and segment calculation. The other branch distinguishes the Y, U and V macroblocks and processes each of them through the sub-blocks into which the macroblock is decomposed, so that each element of information is analyzed and information loss in the encoding process is greatly reduced. The whole process therefore forms a closed loop of prediction, DCT, quantization, inverse quantization and IDCT; each macroblock of a picture depends on the preceding macroblocks of the same picture, and the same holds for the sub-blocks.
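The closed loop and the dependency it creates between consecutive blocks can be sketched as follows. This is a simplified illustration under stated assumptions: a floating-point 4x4 DCT, a single uniform quantization step, and a flat prediction derived from the previous block's reconstruction. The names `closed_loop`, `flat` and `Q_STEP` are hypothetical; practical encoders typically use integer transforms and several prediction modes.

```python
import math

SIZE = 4  # sub-block edge length assumed for this sketch

def dct_matrix(n=SIZE):
    """Orthonormal DCT-II basis matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

C = dct_matrix()
CT = [list(col) for col in zip(*C)]  # transpose of C

def closed_loop(block, prediction, q_step):
    """Prediction -> DCT -> quantization -> inverse quantization -> IDCT."""
    res = [[block[i][j] - prediction[i][j] for j in range(SIZE)]
           for i in range(SIZE)]
    coeffs = matmul(matmul(C, res), CT)                      # forward DCT
    levels = [[round(c / q_step) for c in row] for row in coeffs]
    dequant = [[lv * q_step for lv in row] for row in levels]
    rec_res = matmul(matmul(CT, dequant), C)                 # inverse DCT
    recon = [[min(255, max(0, round(prediction[i][j] + rec_res[i][j])))
              for j in range(SIZE)] for i in range(SIZE)]
    return levels, recon

def flat(value):
    return [[value] * SIZE for _ in range(SIZE)]

block1 = [[52, 55, 61, 66], [63, 59, 55, 90],
          [62, 59, 68, 113], [63, 58, 71, 122]]
block2 = [[70, 61, 64, 73], [109, 85, 69, 72],
          [144, 104, 66, 73], [154, 106, 70, 69]]
Q_STEP = 8

levels1, recon1 = closed_loop(block1, flat(128), Q_STEP)
# Block 2 cannot start until block 1 is reconstructed: its prediction is
# taken from the right-hand column of the reconstructed block 1.
edge_mean = round(sum(row[-1] for row in recon1) / SIZE)
levels2, recon2 = closed_loop(block2, flat(edge_mean), Q_STEP)
```

The second call needs `recon1`, which only exists after the first call has run through the inverse transform. This is the front-back dependency that prevents consecutive blocks of one picture from being processed in parallel.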

The WebP algorithm has high complexity, and the calculation of the next macroblock must wait until the calculation of the previous macroblock has finished. This results in a "blocked" design with relatively low processing efficiency. For example, as shown in FIG. 2, when 4 pictures are processed, they are handled one after another in a blocking manner over the time span from T1 to T3.

With the arrival of the 5G era, highly reliable, low-latency, high-bandwidth data transmission raises the requirements on cloud computing performance, and the picture compression coding period must be shortened so as not to affect the customer experience. Although the WebP algorithm greatly reduces the amount of coded data, its overall algorithmic complexity is higher than that of other coding schemes.

Disclosure of Invention

In view of the above technical problems, the invention provides an FPGA acceleration method for multi-image processing based on the WebP compression algorithm. It provides an effective acceleration scheme for implementing the WebP algorithm on a field programmable gate array (FPGA): encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and makes reasonable use of the on-board resources of the FPGA. With the FPGA acceleration scheme, the processing time span can be shortened to the span from T1 to T2.

An FPGA acceleration method for multi-image processing based on the WebP compression algorithm, characterized by comprising the following steps:

Step S1: transmitting a picture as RGB three-channel data and converting the picture into corresponding YUV data;

Step S2: caching the YUV data generated for each picture in an on-chip DDR cache, reading the data into a computing module over a bus, reading the corresponding data from the on-chip DDR cache according to the processing progress of each of a plurality of pictures, and placing the data in a dependent data cache region;

Step S3: when the calculation traversal of one YUV macroblock of one picture is completed, performing the macroblock calculation traversal of the next picture, and switching between the pictures in turn until all macroblocks of the batch of pictures are completely encoded.
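Steps S1 to S3 can be sketched in software as follows. This is a minimal illustration, not the hardware design: the conversion uses BT.601-style full-range coefficients as an assumption (the source does not state which coefficients are used), the per-picture queues stand in for the cache regions of step S2, and the names `rgb_to_yuv` and `round_robin_order` are hypothetical.

```python
from collections import deque

def rgb_to_yuv(r, g, b):
    """Step S1 (assumed BT.601-style full-range coefficients, for illustration)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.5 * b + 128
    v = 0.5 * r - 0.419 * g - 0.081 * b + 128
    return tuple(min(255, max(0, round(c))) for c in (y, u, v))

def round_robin_order(macroblocks_per_picture):
    """Steps S2 and S3: one queue per picture, one macroblock per picture per turn."""
    # Step S2: a separate buffer of pending macroblock indices for each picture.
    queues = [deque(range(count)) for count in macroblocks_per_picture]
    order = []
    # Step S3: switch between pictures in turn until every queue is empty.
    while any(queues):
        for picture, queue in enumerate(queues):
            if queue:
                order.append((picture, queue.popleft()))
    return order

print(rgb_to_yuv(255, 255, 255))
print(round_robin_order([3, 2, 3]))
```

Each `(picture, macroblock)` pair in the returned list is one input to the computing module. Consecutive macroblocks of the same picture are separated by macroblocks of the other pictures, which gives the closed loop time to finish before the same picture is visited again.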

In a preferred embodiment, the FPGA acceleration method for multi-image processing based on the WebP compression algorithm is characterized by further comprising a parameter cache region; after the picture is converted into YUV data, the method further comprises calculating segment parameters of the picture through pre-analysis of the YUV data and caching the segment parameters in the parameter cache region.

In a preferred embodiment, the FPGA acceleration method for multi-image processing based on the WebP compression algorithm is characterized in that the parameter cache region is BRAM (block RAM), the internal storage of the FPGA.

In a preferred embodiment, the FPGA acceleration method for multi-image processing based on the WebP compression algorithm is characterized in that the dependent data cache region is a DDR storage area.

The technical scheme has the following advantages or beneficial effects:

The invention provides an FPGA (field programmable gate array) acceleration method for multi-image processing based on the WebP compression algorithm. It provides an effective acceleration scheme for implementing the WebP algorithm on an FPGA: encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU (central processing unit). An FPGA is better suited than a CPU to processing such a blocked closed-loop algorithm, and the output frame rate of the whole WebP algorithm is improved.

Drawings

The invention and its features, aspects and advantages will become more apparent from reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings. Like reference symbols in the various drawings indicate like elements. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.

FIG. 1 shows the prior-art WebP lossy compression algorithm;

FIG. 2 is a diagram of the time span from T1 to T3 for processing 4 pictures;

FIG. 3 is a schematic diagram of the FPGA acceleration method for multi-image processing based on the WebP compression algorithm.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 3, the invention discloses an FPGA acceleration method for multi-image processing based on the WebP compression algorithm. It provides an effective acceleration scheme for implementing the WebP algorithm on a field programmable gate array (FPGA): encoding is implemented as a parallel pipeline, which is more efficient than serial processing on a CPU and makes reasonable use of the on-board resources of the FPGA. With the FPGA acceleration scheme, the processing time span can be shortened to the span from T1 to T2. The specific method comprises steps S1 to S3 set out above.

In a preferred embodiment, the method further comprises a parameter cache region; after the picture is converted into YUV data, the method further comprises calculating segment parameters of the picture through pre-analysis and caching the segment parameters in the parameter cache region, wherein the parameter cache region is BRAM (block RAM), the internal storage of the FPGA.

Preferably, the dependent data cache region is a DDR storage area.

In a specific implementation, as shown in FIG. 3, the picture data is first written by an upper computer (host) into the DDR storage area of the FPGA chip. Following the calculation flow, the data of each picture is converted from the RGB three channels and divided into Y, U and V macroblocks of various sizes, which increases the complexity of the calculation. The first branch then calculates the segment parameters. The second branch establishes a separate cache region for the macroblock data of each of a plurality of pictures; the macroblock data then enters the computing module in sequence through a round-robin (circular) arbiter, with one round traversing N pictures. The number N depends on the overall operation period of the computing module (including prediction, DCT, quantization, inverse quantization and IDCT): the operation period of the computing module is hidden, or covered, by the input of one macroblock from each picture in turn, so that the parallel pipelined acceleration scheme is realized indirectly. When the macroblock data newly processed by the IDCT has been returned to the data cache region, the next macroblock of the first picture can be input to the computing module.

Each macroblock that completes the closed loop is written into the DDR storage space of the partition corresponding to its picture. Because encoding can be performed only after all the data of a picture has been calculated, the processed macroblock data of the N pictures must be cached in a larger cache space of the FPGA in order to hold the full information of each compressed picture. The data is then read out of the DDR in sequence and fed into the encoding module of the pipeline. Over the whole process, the overall computing time is reduced to about 1/N of the original compared with processing on a CPU, and the output frame rate is greatly improved.
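The effect of interleaving N pictures can be checked with a small timing model. The sketch below assumes, for illustration only, that the computing module is fully pipelined (it accepts one macroblock per issue slot) and that a macroblock's result returns `latency` slots after it is issued; the function name `total_slots` and the numbers used are hypothetical and do not come from the source.

```python
def total_slots(num_pictures, macroblocks, latency):
    """Slots until every macroblock of every picture has left the pipeline.

    A picture may issue its next macroblock only after its previous one
    has completed the closed loop (latency slots after it was issued).
    """
    next_mb = [0] * num_pictures    # next macroblock index per picture
    ready_at = [0] * num_pictures   # earliest slot each picture may issue
    pointer = 0                     # round-robin arbiter position
    issued = 0
    finish = 0
    slot = 0
    while issued < num_pictures * macroblocks:
        for offset in range(num_pictures):
            cand = (pointer + offset) % num_pictures
            if next_mb[cand] < macroblocks and ready_at[cand] <= slot:
                next_mb[cand] += 1
                ready_at[cand] = slot + latency
                finish = max(finish, slot + latency)
                issued += 1
                pointer = (cand + 1) % num_pictures
                break
        slot += 1                   # an idle slot if no picture was ready
    return finish

one_picture = total_slots(1, 10, 4)   # blocked processing of a single picture
serial_four = 4 * one_picture         # four pictures one after another
interleaved = total_slots(4, 10, 4)   # four pictures through the arbiter
print(one_picture, serial_four, interleaved)
```

With N equal to the loop latency, the idle slots of the single-picture case are all filled by macroblocks of other pictures, so the interleaved total is close to 1/N of the serial total under this model.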

The whole processing procedure extracts the information of multiple macroblocks of the same picture. The essence of the picture compression algorithm is to filter out similar information within each macroblock and retain the information that differs significantly, and the same applies to the information of adjacent macroblocks. The object of the closed-loop process is therefore a single macroblock, which means that the minimum cycle interval is the number of cycles spent on the calculation of a single macroblock; as the number of macroblocks into which a picture is decomposed increases, the number of cycles spent on processing the picture increases proportionally.

To avoid this situation and increase the output frame rate of the image compression algorithm, the acceleration scheme of the invention inserts the macroblock processing of a plurality of pictures in sequence into the closed-loop process to fill the intermediate blocking period. This is feasible because the macroblock calculations of different pictures do not interfere with one another and because FPGA resources are highly configurable. Compared with a CPU, an FPGA is better suited to processing such a blocked closed-loop algorithm, and the parallel pipelined calculation improves the output frame rate of the whole WebP algorithm.
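The gain from filling the blocking period can be written down under a simple idealized model. The symbols are assumptions introduced here for illustration: M is the number of macroblocks per picture, L is the latency of the closed loop measured in issue slots, N is the number of interleaved pictures, and the computing module is assumed to accept one macroblock per slot.

```latex
% Blocked (serial) processing of N pictures, one after another:
T_{\text{serial}} = N \, M \, L

% Round-robin interleaving of N pictures:
T_{\text{interleaved}} =
\begin{cases}
  M L + N - 1, & N \le L \quad \text{(idle slots remain in each round)} \\
  N M + L - 1, & N \ge L \quad \text{(the pipeline is always busy)}
\end{cases}

% With N = L the blocking period is exactly filled:
\frac{T_{\text{interleaved}}}{T_{\text{serial}}}
  = \frac{M L + L - 1}{N M L}
  = \frac{1}{N} + \frac{N - 1}{N^{2} M}
  \;\longrightarrow\; \frac{1}{N} \quad (M \to \infty)
```

Under these assumptions the ratio approaches 1/N for pictures with many macroblocks, which is consistent with the reduction to about 1/N stated in the description, and choosing N larger than L brings no further reduction per picture.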

Those skilled in the art will appreciate that variations can be implemented by combining the prior art with the above embodiments; the details are not described herein. Such variations do not affect the essence of the present invention.

The above description is of the preferred embodiment of the invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that devices and structures not described in detail are to be understood as being implemented in a manner common in the art. Using the methods and techniques disclosed above, those skilled in the art can make many possible variations and modifications to the disclosed embodiments, or modify them into equivalent embodiments, without departing from the scope and spirit of the invention. Therefore, any simple modification, equivalent change or adaptation made to the above embodiments according to the technical essence of the present invention remains within the scope of protection of the technical solution of the present invention, provided that it does not depart from the content of that technical solution.

Claims

1. An FPGA acceleration method for multi-image processing based on the WebP compression algorithm, characterized by comprising the following steps:

2. The FPGA acceleration method for multi-image processing based on the WebP compression algorithm of claim 1, further comprising a parameter cache region, wherein, after the conversion into YUV data, the method further comprises calculating segment parameters of the picture through pre-analysis of the YUV data and caching the segment parameters in the parameter cache region.

3. The FPGA acceleration method for multi-image processing based on the WebP compression algorithm of claim 2, characterized in that the parameter cache region is BRAM (block RAM), the internal storage of the FPGA.

4. The FPGA acceleration method for multi-image processing based on the WebP compression algorithm of claim 1, characterized in that the dependent data cache region is a DDR storage area.