CN111815502A - FPGA (field programmable gate array) acceleration method for multi-image processing based on the WebP compression algorithm


Publication number
CN111815502A
Authority
CN
China
Prior art keywords
data
pictures
picture
webp
yuv
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010653783.9A
Other languages
Chinese (zh)
Other versions
CN111815502B (en)
Inventor
杨晓成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xuehu Technology Co ltd
Original Assignee
Shanghai Xuehu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xuehu Technology Co ltd
Priority to CN202010653783.9A
Publication of CN111815502A
Application granted
Publication of CN111815502B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of image processing, and in particular to an FPGA (field programmable gate array) acceleration method for multi-image processing based on the WebP compression algorithm. The method comprises: transmitting a picture as RGB (red, green and blue) three-channel data and converting it into corresponding YUV data; caching the YUV data generated from each picture in an on-chip DDR cache, reading the data into a computing module over a bus, reading the corresponding data from the on-chip DDR cache according to the processing progress of each of a plurality of pictures, and placing the data into a dependent data cache region; and, when the calculation traversal of one YUV macroblock of one picture is completed, performing the macroblock calculation traversal of the next picture, switching between the pictures in turn until all macroblocks of the batch of pictures have been encoded. The invention provides an effective acceleration scheme that performs encoding in a parallel pipelined manner, which is more efficient than serial processing on a CPU. The FPGA is better suited than a CPU to a blocking closed-loop algorithm of this kind, and the output frame rate of the overall WebP algorithm is improved.

Description

FPGA (field programmable gate array) acceleration method for multi-image processing based on the WebP compression algorithm
Technical Field
The invention relates to the technical field of image processing, and in particular to an FPGA (field programmable gate array) acceleration method for multi-image processing based on the WebP compression algorithm.
Background
With the development of image acquisition devices such as mobile phones, tablets and digital cameras, and with the increase in picture resolution, the volume of internet image data is growing exponentially. Recent studies have shown that the data stored on data center servers will grow roughly fourfold, from 663 EB in 2016 to 2.6 ZB in 2021, with most of it originating from images and videos.
Images currently account for 60%-65% of the bytes on most web pages. Image size is particularly important for mobile devices, where less image data saves bandwidth and battery life. WebP is a picture format proposed by Google on the basis of VP8 coding to meet ever higher bandwidth requirements. WebP uses predictive coding: the color values of a block are predicted from the colors of neighboring pixel blocks, and only the difference between the prediction and the actual values is recorded. In most cases this difference is very small, or even zero, so the compression ratio is greatly improved. Comparing WebP with JPEG compression: when WebP compresses a JPG to 90% of the original image, the file size is reduced by about 50%; when WebP compresses a JPG to 80% of the original image, the file size is reduced by 60%-80%. Lossy WebP outperforms JPG mainly because its predictive coding is more advanced; adaptive quantization of macroblocks also improves compression efficiency, and Boolean arithmetic coding improves compression performance by 5%-10% compared with Huffman coding.
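The predictive coding idea described above can be illustrated with a minimal sketch. It assumes a simple DC-style predictor (the rounded mean of already-known neighboring pixels); the function names and sample values are hypothetical and are not taken from the patent or from the WebP reference encoder.

```python
def dc_prediction(neighbor_pixels):
    """Predict a block as the rounded mean of already-known neighbor pixels."""
    return round(sum(neighbor_pixels) / len(neighbor_pixels))


def to_residual(block, prediction):
    """Keep only the difference between each actual pixel and the prediction."""
    return [pixel - prediction for pixel in block]


def from_residual(residual, prediction):
    """Rebuild the block from the stored differences."""
    return [value + prediction for value in residual]


# Hypothetical sample data: a smooth region where the prediction is close.
neighbors = [100, 100, 100, 100]
block = [100, 101, 99, 100]

prediction = dc_prediction(neighbors)
residual = to_residual(block, prediction)
restored = from_residual(residual, prediction)
```

In a smooth region the residual consists of values at or near zero, which is what makes the subsequent transform and entropy coding stages effective.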
In the prior art, the WebP lossy compression algorithm is as shown in FIG. 1. The algorithm first converts the original picture, from its RGB three channels, into YUV macroblocks (Y represents luminance and UV represents chrominance), and then splits into two branches. One branch obtains the calculation parameters required by the quantization process through simple pre-analysis and segment calculation. The other branch distinguishes the Y, U and V macroblocks and processes each of them through the sub-blocks into which the macroblock is decomposed, so that every pixel is analyzed and the information loss in the encoding process is greatly reduced. The whole process therefore forms a closed loop of prediction, DCT, quantization, inverse quantization and IDCT, which creates a front-to-back dependency between the macroblocks of the same picture; the same holds for the sub-blocks.
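The closed loop and the dependency it creates can be sketched as follows. This is a simplified illustration under stated assumptions: blocks are one-dimensional lists of 4 samples, the transform is a floating-point orthonormal DCT rather than the integer transforms of VP8, each block is predicted from the reconstruction of the previous block, and the starting prediction of 128 is hypothetical. None of these details come from the patent.

```python
import math


def dct(samples):
    """Orthonormal DCT-II of a short list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            s * math.cos(math.pi * (i + 0.5) * k / n)
            for i, s in enumerate(samples)))
    return out


def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def encode_block(block, prediction, step):
    """One pass of the loop: predict, DCT, quantize, dequantize, IDCT."""
    residual = [b - p for b, p in zip(block, prediction)]
    levels = [round(c / step) for c in dct(residual)]      # quantization
    dequantized = [level * step for level in levels]       # inverse quantization
    recon_residual = idct(dequantized)                     # IDCT
    reconstruction = [p + r for p, r in zip(prediction, recon_residual)]
    return levels, reconstruction


def encode_blocks(blocks, step):
    """Encode blocks in order; each one waits for the previous reconstruction."""
    prediction = [128.0] * len(blocks[0])  # hypothetical starting prediction
    encoded = []
    for block in blocks:
        levels, reconstruction = encode_block(block, prediction, step)
        encoded.append(levels)
        prediction = reconstruction  # the dependency that blocks the next block
    return encoded, prediction
```

Because `prediction` for a block is only available after the previous block has gone through the full loop, the blocks of one picture cannot be overlapped in time; this is the blocking behavior the following paragraphs refer to.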
The WebP algorithm has high complexity, and the calculation of the next macroblock must wait until the calculation of the previous macroblock has finished. This forms a "blocked" design whose processing efficiency is relatively low. For example, as shown in FIG. 2, when 4 pictures are processed, the whole processing mode is a sequential blocking mode spanning the time from T1 to T3.
With the arrival of the 5G era, highly reliable, low-latency, high-bandwidth data transmission raises the requirements on cloud computing performance, and the picture compression and encoding time must be shortened so as not to affect the customer experience. Although the WebP algorithm greatly reduces the amount of encoded data, its overall algorithmic complexity is higher than that of other encoders.
Disclosure of Invention
In view of the above technical problems, the invention provides an FPGA acceleration method for multi-image processing based on the WebP compression algorithm. It offers an effective acceleration scheme for implementing the WebP algorithm on a field programmable gate array (FPGA): encoding is performed in a parallel pipelined manner, which is more efficient than serial processing on a CPU and makes reasonable use of the on-board resources of the FPGA. With the FPGA acceleration scheme, the processing time span can be shortened to the span from T1 to T2.
An FPGA acceleration method for multi-image processing based on the WebP compression algorithm, characterized by comprising the following steps:
step S1: transmitting the picture as RGB three-channel data and converting it into corresponding YUV data;
step S2: caching the YUV data generated from each picture in an on-chip DDR cache, reading the data into a computing module over a bus, reading the corresponding data from the on-chip DDR cache according to the processing progress of each of a plurality of pictures, and placing the data into a dependent data cache region;
step S3: when the calculation traversal of one YUV macroblock of one picture is completed, performing the macroblock calculation traversal of the next picture, and switching between the pictures in turn until all macroblocks of the batch of pictures have been encoded.
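The color conversion in step S1 can be sketched as below. The patent does not give the conversion coefficients, so this example assumes the full-range BT.601 (JFIF-style) formulas; an actual WebP encoder may use a different range and fixed-point arithmetic. The function names are hypothetical.

```python
def clamp_byte(value):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV (assumed full-range BT.601 coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return clamp_byte(y), clamp_byte(u), clamp_byte(v)


def convert_picture(rgb_pixels):
    """Convert a flat list of (r, g, b) tuples to (y, u, v) tuples."""
    return [rgb_to_yuv(r, g, b) for r, g, b in rgb_pixels]
```

Neutral colors carry no chrominance, so white, gray and black all map to U = V = 128 and differ only in Y.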
In a preferred embodiment, the FPGA acceleration method for multi-image processing based on the WebP compression algorithm is characterized by further comprising a parameter cache region; after the conversion into YUV data, the method further comprises calculating segment parameters of the picture by pre-analysis of the YUV data, and caching the segment parameters in the parameter cache region.
In a preferred embodiment, the FPGA acceleration method for multi-image processing based on the WebP compression algorithm is characterized in that the parameter cache region is BRAM (block RAM) inside the FPGA.
In a preferred embodiment, the FPGA acceleration method for multi-image processing based on the WebP compression algorithm is characterized in that the dependent data cache region is a DDR storage area.
The technical scheme has the following advantages or beneficial effects:
The invention provides an FPGA (field programmable gate array) acceleration method for multi-image processing based on the WebP compression algorithm. It offers an effective acceleration scheme for implementing the WebP algorithm on an FPGA: encoding is performed in a parallel pipelined manner, which is more efficient than serial processing on a CPU (central processing unit). The FPGA is better suited than a CPU to this blocking closed-loop algorithm, and the output frame rate of the overall WebP algorithm is improved.
Drawings
The invention and its features, aspects and advantages will become more apparent from reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings. Like reference symbols in the various drawings indicate like elements. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention.
FIG. 1 shows the prior-art WebP lossy compression algorithm;
FIG. 2 is a diagram of the time span from T1 to T3 for processing 4 pictures;
FIG. 3 is a schematic diagram of the FPGA acceleration method for multi-image processing based on the WebP compression algorithm.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 3, the invention discloses an FPGA acceleration method for multi-image processing based on the WebP compression algorithm. It offers an effective acceleration scheme for implementing the WebP algorithm on a field programmable gate array (FPGA): encoding is performed in a parallel pipelined manner, which is more efficient than serial processing on a CPU and makes reasonable use of the on-board resources of the FPGA. With the FPGA acceleration scheme, the processing time span can be shortened to the span from T1 to T2. The specific method comprises the following steps:
step S1: transmitting the picture as RGB three-channel data and converting it into corresponding YUV data;
step S2: caching the YUV data generated from each picture in an on-chip DDR cache, reading the data into a computing module over a bus, reading the corresponding data from the on-chip DDR cache according to the processing progress of each of a plurality of pictures, and placing the data into a dependent data cache region;
step S3: when the calculation traversal of one YUV macroblock of one picture is completed, performing the macroblock calculation traversal of the next picture, and switching between the pictures in turn until all macroblocks of the batch of pictures have been encoded.
In a preferred embodiment, the method further comprises a parameter cache region; after the conversion into YUV data, the method further comprises calculating segment parameters of the picture by pre-analysis and caching the segment parameters in the parameter cache region, wherein the parameter cache region is BRAM (block RAM) inside the FPGA.
Preferably, the dependent data cache region is a DDR storage area.
In a specific implementation, as shown in FIG. 3, the picture data can first be written by a host computer into the DDR storage area of the FPGA chip. Following the calculation flow, the RGB three-channel data of each picture is converted and divided into Y, U and V macroblocks of various sizes, which increases the complexity of the calculation. The upper branch is then used to calculate the segment parameters. The second branch establishes a separate cache region for the macroblock data of each of the plurality of pictures; the macroblock data then enters the computing module in turn through a round-robin arbiter, with one round traversing N pictures. The number N depends on the overall operation period of the computing module (including prediction, DCT, quantization, inverse quantization and IDCT). Feeding in one macroblock from each picture in turn hides, or covers, the operation period of the computing module, which in effect realizes the overall parallel pipelined acceleration scheme. When the newly IDCT-processed macroblock data has been returned to the data cache region, the next macroblock of the first picture can be input to the computing module.
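The round-robin interleaving described above can be modeled with a small cycle-level simulation. This is a sketch under assumptions the patent does not state: the computing module accepts at most one macroblock per cycle, every macroblock takes the same fixed number of cycles (`latency`) to complete the closed loop, and a picture's next macroblock may be issued only after its previous one has completed. The function name and parameters are hypothetical.

```python
def simulate_round_robin(num_pictures, macroblocks_per_picture, latency):
    """Return the cycle at which the last macroblock leaves the closed loop."""
    issued = [0] * num_pictures       # macroblocks already issued per picture
    ready_at = [0] * num_pictures     # earliest cycle for each picture's next issue
    remaining = num_pictures * macroblocks_per_picture
    pointer = 0                       # round-robin arbiter position
    cycle = 0
    finish = 0
    while remaining > 0:
        # Scan the pictures starting at the arbiter position and issue at most
        # one macroblock per cycle.
        for offset in range(num_pictures):
            p = (pointer + offset) % num_pictures
            if issued[p] < macroblocks_per_picture and ready_at[p] <= cycle:
                issued[p] += 1
                ready_at[p] = cycle + latency   # dependency within one picture
                finish = max(finish, cycle + latency)
                pointer = (p + 1) % num_pictures
                remaining -= 1
                break
        cycle += 1
    return finish


# Hypothetical numbers: 3 macroblocks per picture, 4-cycle closed loop.
serial_four_pictures = 4 * simulate_round_robin(1, 3, 4)
interleaved_four_pictures = simulate_round_robin(4, 3, 4)
```

With a single picture the module sits idle while each macroblock completes the loop; with as many pictures as the loop has cycles, a new macroblock can enter on every cycle and the idle time disappears.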
Each macroblock that completes the closed loop is written into the DDR storage space of the partition corresponding to its picture. Encoding can be performed only once all the data of a picture has been calculated, so to hold the information of each compressed picture, the completed macroblock data of the N pictures must be cached in a larger cache space on the FPGA. The data is then taken out of the DDR in sequence and enters the encoding module of the pipeline. Over the whole process, compared with processing on a CPU, the overall run time is reduced to about 1/N of the original number of calculation cycles, and the output frame rate is greatly improved.
Throughout the process, information is extracted from multiple macroblocks of the same picture. The essence of the picture compression algorithm is to filter out similar information within each macroblock and retain the information that differs more, and the same applies between adjacent macroblocks. The object of the closed-loop process is therefore a single macroblock, which means that the minimum cycle interval is the number of cycles needed to calculate a single macroblock; as the number of macroblocks into which a picture is decomposed increases, the number of cycles spent processing the picture increases proportionally.
To avoid this situation and increase the output frame rate of the image compression algorithm, the acceleration scheme of the invention inserts macroblocks from a plurality of pictures in turn into the closed-loop process to fill the intermediate blocking periods. This is feasible because the macroblock calculations of different pictures do not interfere with one another and because FPGA resources are highly configurable. Compared with a CPU, the FPGA is better suited to this blocking closed-loop algorithm, and the parallel pipelined calculation improves the output frame rate of the overall WebP algorithm.
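The cycle counts behind this argument can be written out under simplifying assumptions that the patent does not state: every macroblock takes the same $C$ cycles to pass through the closed loop, each picture has $M$ macroblocks, the computing module can accept one macroblock per cycle, and $N \ge C$ pictures are interleaved. The symbols $C$ and $M$ are introduced here for illustration only.

```latex
% Serial processing: each macroblock waits for the previous one to finish.
T_{\text{serial}} = N \cdot M \cdot C

% Interleaved processing (N >= C): one macroblock enters per cycle,
% and the last one needs C cycles to leave the loop.
T_{\text{interleaved}} = N \cdot M + C - 1

% Ratio of the two; the second term vanishes as N*M grows.
\frac{T_{\text{interleaved}}}{T_{\text{serial}}}
  = \frac{1}{C} + \frac{C - 1}{N \cdot M \cdot C}
  \approx \frac{1}{C}
```

If N is chosen equal to the loop length C, as the description suggests when it says N depends on the operation period of the computing module, the ratio is approximately 1/N, which matches the reduction stated in the description.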
Those skilled in the art will appreciate that variations can be implemented by combining the prior art with the above embodiments; the details are not described herein. Such variations do not affect the essence of the present invention.
The above describes preferred embodiments of the invention. It is to be understood that the invention is not limited to the particular embodiments described above; devices and structures not described in detail are to be understood as implemented in a manner common in the art. Using the methods and techniques disclosed above, those skilled in the art can make many possible variations and modifications to the disclosed embodiments, or modify them into equivalent embodiments, without departing from the scope of the invention and without affecting its essence. Therefore, any simple modification, equivalent change or adaptation made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.

Claims (4)

1. An FPGA acceleration method for multi-image processing based on the WebP compression algorithm, characterized by comprising the following steps:
step S1: transmitting the picture as RGB three-channel data and converting it into corresponding YUV data;
step S2: caching the YUV data generated from each picture in an on-chip DDR cache, reading the data into a computing module over a bus, reading the corresponding data from the on-chip DDR cache according to the processing progress of each of a plurality of pictures, and placing the data into a dependent data cache region;
step S3: when the calculation traversal of one YUV macroblock of one picture is completed, performing the macroblock calculation traversal of the next picture, and switching between the pictures in turn until all macroblocks of the batch of pictures have been encoded.
2. The FPGA acceleration method for multi-image processing based on the WebP compression algorithm of claim 1, further comprising a parameter cache region, wherein, after the conversion into YUV data, the method further comprises calculating segment parameters of the picture by pre-analysis of the YUV data, and caching the segment parameters in the parameter cache region.
3. The FPGA acceleration method for multi-image processing based on the WebP compression algorithm of claim 2, characterized in that the parameter cache region is BRAM (block RAM) inside the FPGA.
4. The FPGA acceleration method for multi-image processing based on the WebP compression algorithm of claim 1, characterized in that the dependent data cache region is a DDR storage area.
CN202010653783.9A 2020-07-08 2020-07-08 FPGA acceleration method for multi-graph processing based on WebP compression algorithm Active CN111815502B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010653783.9A CN111815502B (en) 2020-07-08 2020-07-08 FPGA acceleration method for multi-graph processing based on WebP compression algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010653783.9A CN111815502B (en) 2020-07-08 2020-07-08 FPGA acceleration method for multi-graph processing based on WebP compression algorithm

Publications (2)

Publication Number Publication Date
CN111815502A (en) 2020-10-23
CN111815502B (en) 2023-11-28

Family

ID=72843439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010653783.9A Active CN111815502B (en) 2020-07-08 2020-07-08 FPGA acceleration method for multi-graph processing based on WebP compression algorithm

Country Status (1)

Country Link
CN (1) CN111815502B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112437308A (en) * 2020-11-12 2021-03-02 北京深维科技有限公司 WebP coding method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488753A (en) * 2015-11-27 2016-04-13 武汉精测电子技术股份有限公司 Method and device for carrying out two-dimensional Fourier transform and inverse transform on image
US20170256023A1 (en) * 2016-03-02 2017-09-07 Alibaba Group Holding Limited Solid state storage local image processing system and method
CN107154062A (en) * 2017-05-12 2017-09-12 郑州云海信息技术有限公司 A kind of implementation method of WebP Lossy Compression Algorithms, apparatus and system
CN107483948A (en) * 2017-09-18 2017-12-15 郑州云海信息技术有限公司 Pixel macroblock processing method in a kind of webp compressions processing
US20180089091A1 (en) * 2016-09-26 2018-03-29 Intel Corporation Cache and compression interoperability in a graphics processor pipeline
CN109327698A (en) * 2018-11-09 2019-02-12 杭州网易云音乐科技有限公司 Dynamic previewing map generalization method, system, medium and electronic equipment
CN110689475A (en) * 2019-09-10 2020-01-14 浪潮电子信息产业股份有限公司 Image data processing method, system, electronic equipment and storage medium
CN110876078A (en) * 2018-08-30 2020-03-10 阿里巴巴集团控股有限公司 Animation picture processing method and device, storage medium and processor
CN110913225A (en) * 2019-11-19 2020-03-24 北京奇艺世纪科技有限公司 Image encoding method, image encoding device, electronic device, and computer-readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHENHUA GUO等: "An OpenCL Implementation of WebP Accelerator on FPGAs", APPLIED RECONFIGURABLE COMPUTING.ARCHITECTURES, TOOLS, AND APPLICATIONS., pages 578 - 589 *
韩宇等: "高分七号卫星图像压缩FPGA设计与实现技术", 航天器工程, vol. 29, no. 3, pages 169 - 176 *

Also Published As

Publication number Publication date
CN111815502B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
US11057585B2 (en) Image processing method and device using line input and output
KR101710001B1 (en) Apparatus and Method for JPEG2000 Encoding/Decoding based on GPU
CN110337002B (en) HEVC (high efficiency video coding) multi-level parallel decoding method on multi-core processor platform
CN108781298B (en) Encoder, image processing system, unmanned aerial vehicle and encoding method
AU2019101272A4 (en) Method and apparatus for super-resolution using line unit operation
CN107852509A (en) Method and apparatus for coding and decoding image
CN108769684A (en) Image processing method based on WebP image compression algorithms and device
CN104113761A (en) Code rate control method for video encoding and encoder
CN111815502B (en) FPGA acceleration method for multi-graph processing based on WebP compression algorithm
CN105100799A (en) Method for reducing intraframe coding time delay in HEVC encoder
CN106231307B (en) A kind of compression of images intra-coding prediction method and its hardware realization
CN116600134A (en) Parallel video compression method and device adapting to graphic engine
EP4300976A1 (en) Audio/video or image layered compression method and apparatus
CN110087085A (en) Image processing apparatus
WO2022116824A1 (en) Video decoding method, video encoding method, related devices, and storage medium
CN105472388A (en) Color filter array image encoding-decoding method, device and system
WO2022252222A1 (en) Encoding method and encoding device
CN114727116A (en) Encoding method and device
CN112437308A (en) WebP coding method and device
CN114697650A (en) Intra-frame division method based on down-sampling, related device equipment and medium
CN114173127A (en) Video processing method, device, equipment and storage medium
CN112422983A (en) Universal multi-core parallel decoder system and application thereof
TWI832661B (en) Methods, devices and storage media for image coding or decoding
CN104602026A (en) Reconstruction loop structure applicable to full multiplexing of encoder under HEVC (high efficiency video coding) standard
WO2023185806A9 (en) Image coding method and apparatus, image decoding method and apparatus, and electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant