CN115604489A

CN115604489A - Image compression method based on human visual quality and display resolution

Info

Publication number: CN115604489A
Application number: CN202110775153.3A
Authority: CN
Inventors: 刘锋; 蒋英海; 崔荣升; 张仕昂; 张爽
Original assignee: Nankai University
Current assignee: Nankai University
Priority date: 2021-07-09
Filing date: 2021-07-09
Publication date: 2023-01-13

Abstract

The invention belongs to the field of image compression processing. Image compression coding is an important component of modern multimedia and information technology. As storage and network transmission technologies mature, users' demand for image information keeps growing, and so does the consumption of storage and network transmission resources by image data. Conventional image compression coding techniques leave room for improvement in meeting human visual quality requirements and in multi-resolution image processing. To improve the efficiency of image storage and transmission and let users obtain image information more quickly, the invention provides an image compression coding method that takes human visual quality and display resolution as parameters. On the one hand, the two-dimensional discrete wavelet transform matches the characteristics of human visual perception and supports multi-resolution image processing. On the other hand, a visual threshold model based on the principle of human visual quality can effectively optimize the image code stream, eliminate the visual redundancy of the image, and improve compression performance. Based on these two points, the invention can compress an image to different degrees according to the required resolution and human visual quality. Compared with traditional image compression methods, for a given visual quality and image resolution the method reduces the coding rate and improves coding efficiency, thereby effectively reducing the resource consumption of image data storage and network transmission.

Description

Image compression method based on human visual quality and display resolution

Technical Field

The invention belongs to the field of image compression coding, and particularly relates to an image compression method based on human visual quality and display resolution. The method involves image compression coding, human visual quality, the discrete wavelet transform, multi-resolution image display, and the visual masking effect.

Background

With the continuous development of image processing and multimedia technology, users increasingly demand high-quality image storage and code stream transmission. Since the end of the 20th century, the discrete wavelet transform (DWT) has gradually been adopted in image processing. Compared with earlier image decorrelation transforms, the two-dimensional discrete wavelet transform is closer to the characteristics of human visual perception. Moreover, its scaling function and wavelet function make multi-resolution image processing possible. The discrete wavelet transform is therefore widely used in fields such as image denoising and image compression coding.

An image can be compressed because image data contain redundancy. Image compression coding retains the image information in a smaller amount of data by removing the correlation among image data. Image redundancy includes spatial redundancy, structural redundancy, visual redundancy, and so on. Wavelet-based image data compression generally follows one of the following two approaches:

(1) Image compression coding technology based on minimum arithmetic error

In conventional image compression coding techniques, the coding result is often optimized by minimizing an arithmetic error, such as the mean square error (MSE), during coding. In 1993, Shapiro proposed the Embedded Zerotree Wavelet (EZW) coding algorithm, which is based on a tree data structure of wavelet coefficients. Through the zerotree structure, the EZW algorithm exploits the correlation among wavelet coefficients of different levels that share the same direction and spatial position, uses this correlation to reduce the coding of high-frequency sub-band coefficients, and greatly improves coding efficiency. In 1996, A. Said and W. A. Pearlman proposed the Set Partitioning in Hierarchical Trees (SPIHT) algorithm. Its data structure keeps the zerotree's effective organization of correlated wavelet coefficients across scales and adds an organization of correlated wavelet coefficients within the same scale, so the wavelet coefficients are organized more effectively. However, the three coefficient lists generated during SPIHT encoding occupy a large amount of memory, and the energy-compaction property of the wavelet transform is not fully exploited. A. Islam and W. A. Pearlman therefore proposed the Set Partitioned Embedded Block (SPECK) algorithm in 1999. The SPECK algorithm has lower complexity than SPIHT, occupies less dynamic memory when scanning the coefficient lists, and has an advantage over SPIHT in encoding rate. To address the large memory footprint of encoding, Lin and Burgess proposed the Listless Zerotree Coding (LZC) algorithm, which is better suited to hardware implementation and lowers the memory demand at the cost of some encoder performance. Xiong and Ramchandran proposed the Space-Frequency Quantization (SFQ) algorithm. At the cost of greatly increased encoding complexity, this method searches jointly for the optimal zerotree structure and scalar quantization step size so that the reconstructed image has minimum distortion. Around 2000, the JPEG2000 image coding standard was proposed. JPEG2000 can freely combine code streams in different orders according to spatial position, image component, MSE-based image quality, image resolution, and other factors, and it supports progressive image encoding and decoding. As part of the JPEG family of image compression standards, JPEG2000 has become one of the mainstream standards for image compression coding.

(2) Image compression coding technology based on human visual quality

In recent years, the perception of the human visual system (HVS) has become increasingly important in image coding, and image compression based on human visual quality has become a research focus. In the 1990s, Chou and Li proposed a sub-band image coding scheme that employs a just-noticeable-difference model to simulate HVS perception and achieve nearly transparent coding. Watson et al. described a model for assessing the visibility of quantization distortion in the discrete wavelet transform; Liu et al. later incorporated this model into JPEG2000 for grayscale image coding. Han Oh et al. introduced a model of DWT coefficients and a distortion model of Deadzone quantization in JPEG2000, and developed a visually lossless image coding method that uses a visual threshold (VT). The model and the visually lossless coding method were later extended to multi-resolution and remote browsing mechanisms. Unlike the earlier literature, the work of Oh et al. achieves HVS-based code stream optimization for both grayscale and color images, and uses both subjective and objective evaluation methods.

Because they take human visual characteristics into account, existing image compression methods based on human visual quality perform somewhat better than traditional methods based on minimum arithmetic error. However, these works do not study image coding at HVS perceptual quality levels above the visibility threshold, and they cannot provide images of matching resolution and visual quality according to users' differing requirements, which wastes network transmission resources to some extent.

Based on the principle of human visual quality and the image wavelet transform, the invention provides a visibility-threshold image compression coding method that accommodates both multi-resolution image browsing and human visual quality, and achieves code stream optimization for grayscale and color images at different visual qualities and resolutions. The algorithm has low complexity; it satisfies users' image quality requirements while improving compression performance. The method can be used for remote transmission of large image code streams, occupying fewer network resources and improving transmission efficiency while guaranteeing a given resolution and human visual quality of the decoded image. It is well suited to applications that require remote browsing of very large images, such as telemedicine.

Summary of the Invention

The invention aims to solve two problems: traditional image compression techniques cannot completely eliminate the coding redundancy of an image with respect to human vision, and the prior art is not efficient enough at multi-resolution image coding based on human visual sensitivity. The invention focuses on measuring the sensitivity of human vision and on processing the wavelet transform coefficients on the basis of human visual quality. The data obtained from this processing can then be compressed by entropy coding or similar methods to form the final image code stream, giving users a flexible choice of visual quality and resolution.

The specific technical scheme for solving the problems is as follows: an image compression method based on human visual quality and display resolution comprises the following steps:

(1) Performing component decomposition on the image to obtain image data under each component (for a color image, each decomposed component may be an RGB component, a YCbCr component, or the like, and for a grayscale image, this step may be skipped);

(2) Selecting a wavelet transform for image compression coding, wherein the wavelet transform makes the data of each image component sparse, and the wavelet coefficients of each frequency band are divided into a series of coding units;

(3) Against a gray image background and at full resolution, measuring, for a specific display device and observation distance, the just visible amplitude to human vision of a single wavelet coefficient in each wavelet band of each component, so as to characterize the sensitivity of the human eye in detecting wavelet coefficients in different sub-bands;

(4) Calculating the just visible amplitude of the wavelet coefficient under other resolutions by using the just visible amplitude of the obtained wavelet coefficient under the full resolution (different display devices or observation distances can be converted into corresponding image resolutions through the pixel pitch and the observation distance of the display devices);

(5) Calculating the visual threshold of each coding unit by using the just visible amplitude of the calculated wavelet coefficient under the given relative resolution according to the human visual quality required by image compression coding and combining the visual masking effect;

(6) Quantizing wavelet transform data of each coding unit by using a Deadzone quantization method, and processing the precision of wavelet transform coefficients of the image according to the calculated visual threshold and wavelet quantization step size so as to optimize the length of binary data of a wavelet transform result;

(7) Compressing the data obtained in step (6) by entropy coding or a similar method to obtain the final compression coding result of the image.

The invention achieves freedom of resolution in image compression coding based on the selectable resolution inherent in the wavelet transform result; at the same time, it controls the visual quality of the decoded image through the control that Deadzone quantization and the visual threshold exert on the quantization precision of the wavelet transform. In the visual threshold system of the invention, the minimum visual threshold of each wavelet band is used to generate the coded data corresponding to a full-resolution visually lossless image. Executing the seven steps in reverse produces the decoded image corresponding to the coded data.

Description of the drawings:

FIG. 1 is a system frame structure of the present invention;

FIG. 2 is a wavelet transform multi-resolution processing mode;

FIG. 3 is a schematic diagram of a Deadzone quantizer;

FIG. 4 is a flow chart of a method of calculating a vision threshold using dichotomy;

FIG. 5 is a flow chart for controlling quantization encoding by visual thresholding;

FIG. 6 is a schematic diagram of 3AFC visual psychology experiments in a verification experiment;

Detailed Description

The system framework of the invention is shown in FIG. 1. The core design can be described in the following two parts:

(1) Scheme for measuring and calculating the sensitivity of human vision in detecting wavelet coefficients

Research shows that the human visual system has different sensitivities to different spatial modulation frequencies of light contrast, and that the human visual perception mechanism can be simulated by the two-dimensional wavelet transform. The two-dimensional wavelet transform proceeds level by level. At each level, the transformed frequency band is divided into four sub-bands: HH, HL, LH, and LL. HH is obtained by high-pass filtering and 2-fold down-sampling in both the horizontal and vertical directions of the image; HL is obtained by high-pass filtering in the horizontal direction and low-pass filtering in the vertical direction, each followed by 2-fold down-sampling; LH is obtained by low-pass filtering in the horizontal direction and high-pass filtering in the vertical direction, each followed by 2-fold down-sampling; and LL is obtained by low-pass filtering and 2-fold down-sampling in both directions. The wavelet bands of the next level are obtained by further decomposing the LL sub-band of the current level.
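As a minimal sketch of the level-by-level decomposition described above, the following Python code splits an image into the LL, HL, LH, and HH sub-bands and then decomposes LL again. It uses a simple averaging Haar filter pair as a stand-in for whatever wavelet is actually chosen (the verification section mentions the 9/7 wavelet), and all function names are illustrative assumptions rather than part of the invention.

```python
def haar_1d(signal):
    """One Haar analysis step on an even-length sequence: (low-pass, high-pass),
    each already down-sampled by 2."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def split_vertical(matrix):
    """Filter every column (vertical direction); returns (low rows, high rows)."""
    lows, highs = [], []
    for column in transpose(matrix):
        low, high = haar_1d(column)
        lows.append(low)
        highs.append(high)
    return transpose(lows), transpose(highs)


def haar_2d_level(image):
    """One decomposition level. First letter = horizontal filter, second = vertical,
    matching the sub-band naming used in the text."""
    row_low, row_high = [], []
    for row in image:  # filter every row (horizontal direction)
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = split_vertical(row_low)    # horizontal low-pass
    hl, hh = split_vertical(row_high)   # horizontal high-pass
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}


def haar_2d(image, levels):
    """Multi-level transform: each new level decomposes the previous LL sub-band."""
    pyramid, current = [], image
    for _ in range(levels):
        bands = haar_2d_level(current)
        pyramid.append(bands)
        current = bands["LL"]
    return pyramid
```

In this sketch an image made of vertical stripes puts all of its detail energy into HL and none into LH, which illustrates the directional meaning of the sub-band names.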

In view of these characteristics of wavelet decomposition, the invention measures, for a specific display device and observation distance and for each image component channel (such as RGB or YCbCr), the just visible amplitude to the human visual system of a single wavelet coefficient in each wavelet band, and thereby gauges the sensitivity of the human visual system to the spatial contrast modulation frequency (the larger the just visible amplitude, the lower the sensitivity).

Theoretically, against a gray image background (i.e., in YCbCr space with 8 bits per component per pixel, Y = 128 while the Cb and Cr components take the value 0), the probability that a single coefficient in a wavelet band is detected by the human visual system obeys:

$$P(y) = 1 - (1-\gamma)\exp\left(-\left|\frac{y}{T_{c,(b,l),r}}\right|^{\beta}\right) \qquad (1)$$

where y is the value of the wavelet coefficient, β is a constant taken to be 2 in the present invention, and γ depends on the design of the psychovisual test. For example, if the human subject is asked whether a single non-zero wavelet coefficient is present in an image (yes/no test), γ = 0. If the subject is asked to select which of two images contains a single non-zero wavelet coefficient (2-alternative forced-choice test, 2AFC test), γ = 1/2; if the subject is asked to select it among three images (3-alternative forced-choice test, 3AFC test), γ = 1/3.

In formula (1), $T_{c,(b,l),r}$ characterizes the detectability of the wavelet coefficient for a particular display device and viewing distance. In $T_{c,(b,l),r}$, c denotes the image component; (b, l) denotes the wavelet sub-band, where b is the decomposition direction, selected from LL, HL, LH, and HH, and l is the wavelet decomposition level; and r is the relative resolution of the image, i.e., the ratio between the current resolution and the full resolution (the ratio of the number of pixels along one dimension of the current image to that of the full-resolution image). Thus r takes a value between 0 and 1, with r = 1 when the image is at full resolution.

Notably, $T_{c,(b,l),r}$ equals the wavelet coefficient value at which the exponential term in formula (1) equals $e^{-1}$. A non-zero wavelet coefficient whose amplitude equals $T_{c,(b,l),r}$ therefore has a different detection probability in different vision tests: about 63% in the yes/no test, about 82% in the 2AFC test, and about 75% in the 3AFC test. $T_{c,(b,l),r}$ can thus be used to describe the sensitivity of the HVS to a given component and wavelet sub-band, where a higher $T_{c,(b,l),r}$ corresponds to lower HVS sensitivity.
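The quoted detection probabilities can be checked numerically. The sketch below assumes the psychometric function has the common Weibull form P(y) = 1 - (1 - γ)·exp(-|y/T|^β) and that γ is the chance probability of a correct guess (0 for yes/no, 1/2 for 2AFC, 1/3 for 3AFC); under those assumptions, evaluating it at y = T reproduces the three percentages. The function name is illustrative.

```python
import math


def detection_probability(y, threshold, gamma, beta=2.0):
    """Assumed Weibull psychometric function: chance level gamma at y = 0,
    approaching 1 as |y| grows well beyond the threshold."""
    return 1.0 - (1.0 - gamma) * math.exp(-abs(y / threshold) ** beta)


# Probability of detecting a coefficient whose amplitude equals the threshold T.
for test_name, gamma in [("yes/no", 0.0), ("2AFC", 1 / 2), ("3AFC", 1 / 3)]:
    p = detection_probability(1.0, 1.0, gamma)
    print(f"{test_name}: {p:.3f}")
# yes/no: 0.632
# 2AFC: 0.816
# 3AFC: 0.755
```

At y = 0 the same function returns γ, i.e., the rate at which a subject guessing at random would still answer correctly.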

The values of $T_{c,(b,l),r}$ are obtained through actual testing with human subjects. During the test of each frequency band, the candidate value of $T_{c,(b,l),r}$ is adjusted continuously until the desired measurement result is obtained. The present invention proposes measuring $T_{c,(b,l),r}$ with a 3AFC experiment, in which case the candidate $T_{c,(b,l),r}$ is adjusted until the detection probability in the test reaches about 75%. The test can be carried out with open-source tools such as the QUEST psychometric toolbox. The test method and procedure are not limited to this example.

For a given component and decomposition level, the HVS sensitivities to the HL and LH sub-bands are very close, so $T_{c,(HL,l),1}$ and $T_{c,(LH,l),1}$ are merged. In the test process, depending on the specific situation (that is, on the parameters required by the formulas in the second part below), $T_{c,(b,l),r}$ needs to be measured down to a certain wavelet decomposition depth.

As shown in FIG. 2, the wavelet transform supports multi-resolution processing of an image. If the relative resolution of the image is reduced by a factor of $2^{-u}$, the spatial frequencies covered by sub-band (b, l) in the original image shift into sub-band (b, l-u). Thus, for a given component and relative resolution r, the sensitivity of human vision to sub-band (b, l) at relative resolution $r \cdot 2^{-u}$ is the same as its sensitivity to sub-band (b, l-u) at relative resolution r. Expressed in terms of $T_{c,(b,l),r}$:

$$T_{c,(b,l),\,r\cdot 2^{-u}} = T_{c,(b,l-u),\,r} \qquad (2)$$

As can be seen from formula (2), for any positive integer u, $T_{c,(b,l),2^{-u}}$ can be retrieved as $T_{c,(b,l-u),1}$. When r is not an integer power of 2 and $r < 1/2$, a positive integer m can always be found such that $T_{c,(b,l),r}$ is equal to $T_{c,(b,l-m),\,2^{m}r}$ with $1/2 < 2^{m}r < 1$.

For $1/2 < r < 1$, $T_{c,(b,l),r}$ can be obtained by the following method. First, a full-resolution grayscale image is generated that contains only a single unit-valued wavelet coefficient, located at the center of sub-band (b, 2). The image is then scaled according to the relative resolution r and zero-padded around its periphery to the original image size. Next, the image is processed with a two-level two-dimensional wavelet transform, and the resulting wavelet coefficients are collected. Assuming that the HVS sensitivities to the individual non-zero wavelet coefficients are independent in this case, the present invention models $T_{c,(b,l),r}$ according to formula (1) as follows:

where β = 2 and the coefficient term represents, for sub-band $(b_0, k)$, the sum of the wavelet coefficients raised to the power β.
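The two calculation steps of this part can be sketched as follows, under stated assumptions. The first two functions assume the level-shift relation reads "sub-band (b, l) viewed at relative resolution r·2^-u is as visible as sub-band (b, l-u) at r", and factor any r into a power of 2 and a remainder in (1/2, 1]. The third function is the standard probability-summation result that follows from a Weibull-type psychometric function when detections are independent: the pooled threshold T satisfies T^-β = Σ S_k / T_k^β, where S_k is the sum of |coefficient|^β in sub-band k. Whether this matches formula (3) term by term is an assumption, since that formula is not reproduced in this text; all names are hypothetical.

```python
def split_resolution(r):
    """Factor r as r_base * 2**(-u) with 1/2 < r_base <= 1 and integer u >= 0."""
    if not 0.0 < r <= 1.0:
        raise ValueError("relative resolution must lie in (0, 1]")
    u = 0
    while r <= 0.5:
        r *= 2.0
        u += 1
    return u, r


def equivalent_subband(level, r):
    """Map sub-band level `level` viewed at resolution r to (level - u, r_base).

    Returns None when level - u < 1, i.e. the sub-band is finer than anything
    shown at that resolution (an assumption of this sketch).
    """
    u, r_base = split_resolution(r)
    shifted = level - u
    return (shifted, r_base) if shifted >= 1 else None


def pooled_threshold(terms, beta=2.0):
    """Probability-summation threshold (assumed form, see lead-in).

    `terms` holds one (power_sum, threshold) pair per sub-band: power_sum is the
    sum of |coefficient|**beta collected in that sub-band, and threshold is the
    full-resolution just visible amplitude measured for that sub-band.
    """
    total = sum(power_sum / threshold ** beta for power_sum, threshold in terms)
    return total ** (-1.0 / beta)


# A level-3 sub-band viewed at quarter resolution behaves like level 1 at full
# resolution; at r = 0.375 one level shift leaves a base resolution of 0.75.
print(equivalent_subband(3, 0.25))   # (1, 1.0)
print(split_resolution(0.375))       # (1, 0.75)
```

If all of the energy stays in one sub-band (power_sum = 1), the pooled threshold reduces to that sub-band's measured threshold, which is a useful sanity check for any implementation.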

(2) Calculation and coding scheme of visual threshold

The invention defines a quality metric Q (Q > 0) for human visual perception. Q is related to the theoretical probability of detecting the quantization error within the observed field of view (FOV), and is defined by the following formula:

wherein γ has the same meaning as in formula (1).

As can be seen from formula (4), the higher Q is, the lower the probability that the quantization error in the image is detected, which means that the image has higher visual quality.

The image coding of the present invention quantizes the wavelet transform result with a Deadzone quantizer; the Deadzone quantizer and its reconstruction method are shown in FIG. 3. To control the quality of the decoded image, a visual threshold (VT) is introduced for each wavelet sub-band coding unit. For an image coding unit (such as a wavelet coefficient code block) obtained after component decomposition and wavelet transformation, the present invention uses VT to control the actual quantization error and thereby the visual quality of the compressed image. VT is defined as the Deadzone quantization step size that yields image quality Q at a given relative resolution r. From formula (4), one obtains:

In a specific encoding process, the visual threshold $VT_{c,(b,l),r}$ is obtained by numerical calculation. f(x) is the probability density function of the quantization error, which can be modeled in the LL sub-band as:

where σ is the standard deviation of the wavelet transform coefficients of a certain coding unit sub-band, Δ represents the Deadzone quantization step size,

the quantization error probability density distribution in the LH, HL, and HH subbands can then be modeled as:

wherein

The field of view is the range on which the eye focuses when observing an object. $N_{(b,l),r}$ is the number of wavelet coefficients of sub-band (b, l) contained in the field of view at relative resolution r, and can be calculated by the following formula:

where $d_0$ is the distance from the human eye to the screen, in mm; θ is the human viewing angle, here 2°; and p is the pixel pitch of the display, in mm.
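The geometry behind these three parameters can be illustrated with a short sketch. It computes how many display pixels span the field of view, using the plain trigonometric relation width = 2·d₀·tan(θ/2), and how many samples of a level-l sub-band lie along that span at full resolution, assuming only that each dyadic decomposition level halves the sampling density per dimension. How these quantities enter the formula for $N_{(b,l),r}$ is not reproduced here, so the sketch stops at the geometry; the function names are hypothetical.

```python
import math


def fov_width_pixels(distance_mm, pixel_pitch_mm, angle_deg=2.0):
    """Width of the field of view on the screen, expressed in display pixels."""
    width_mm = 2.0 * distance_mm * math.tan(math.radians(angle_deg) / 2.0)
    return width_mm / pixel_pitch_mm


def subband_samples_across_fov(distance_mm, pixel_pitch_mm, level, angle_deg=2.0):
    """Samples of a level-`level` sub-band along one dimension of the field of
    view at full resolution (ASSUMPTION: density halves with every level)."""
    return fov_width_pixels(distance_mm, pixel_pitch_mm, angle_deg) / 2 ** level


# Viewing conditions of the verification experiment: 60 cm, 0.1845 mm pitch.
print(round(fov_width_pixels(600.0, 0.1845), 1))   # 113.5
```

With these viewing conditions a 2° field of view covers roughly 113 pixels in each direction, so a level-1 sub-band contributes about 57 samples along that span.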

$m_{c,(b,l),r}$ is the visual masking factor, which accounts for the visual masking effect. It is given by the following formula:

where e = 2.718281828459 is the natural constant, and α is a constant in the range 0 ≤ α ≤ 1.

For a given coding unit of a given wavelet sub-band of a given image component, there are many ways to calculate VT. For example, the value of VT in each coding unit may be calculated by the bisection method. The algorithm flow is as follows:

As shown in FIG. 4, when the calculation of VT starts, a very small threshold $VT_1$ and a very large threshold $VT_2$ are first set. Then, exploiting the monotonicity of the function, it is judged according to formula (5) whether the Q value corresponding to $(VT_1 + VT_2)/2$ is greater than the set image quality value. If so, let $VT_1 = (VT_1 + VT_2)/2$; otherwise, let $VT_2 = (VT_1 + VT_2)/2$.

These steps are repeated until the width of the threshold interval approaches 0, which yields the VT value required by the corresponding coding unit. Because a Deadzone quantizer is used, if a wavelet transform coefficient is Deadzone-quantized with a given quantization step size, removing one least significant bit from the resulting binary quantization index is equivalent to doubling the quantization step size. The truncation and encoding strategy that the invention applies to the binary code stream may be as shown in FIG. 5.

For a given coding unit of a given sub-band, the binary code stream of the quantized wavelet coefficients may be encoded progressively, in some manner, from the most significant bit to the least significant bit. Once the quantization error of every wavelet coefficient in the coding unit first falls below the quantization error that would result from using VT as the quantization step size, all subsequent coding passes can be discarded, and the truncation of the binary code stream is complete. The truncated code stream can then be compressed further with any entropy coding method.
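The bisection search for VT and the bit-plane truncation described above can be sketched together. Because formula (5) is not reproduced in this text, `toy_quality` below is a purely hypothetical stand-in that only shares the one property the search relies on, namely that quality decreases monotonically as the step size grows. The truncation rule is also a simplification: it drops the largest number p of least significant bits for which the effective step size Δ·2^p does not exceed VT, which uses the stated equivalence between dropping one bit and doubling the step size.

```python
def bisect_visual_threshold(quality_of, target_q, vt_low, vt_high, tol=1e-6):
    """Bisection on a quality function that decreases monotonically with VT."""
    while vt_high - vt_low > tol:
        mid = 0.5 * (vt_low + vt_high)
        if quality_of(mid) > target_q:
            vt_low = mid     # quality still above target: a larger step is allowed
        else:
            vt_high = mid    # quality too low: the step must shrink
    return 0.5 * (vt_low + vt_high)


def planes_to_drop(step, vt):
    """Largest p such that the effective step size step * 2**p stays <= vt."""
    p = 0
    while step * 2 ** (p + 1) <= vt:
        p += 1
    return p


def deadzone_index(y, step):
    """Deadzone quantization index: sign(y) * floor(|y| / step)."""
    magnitude = int(abs(y) // step)
    return -magnitude if y < 0 else magnitude


def drop_lsbs(index, p):
    """Discard the p least significant magnitude bits of a quantization index."""
    magnitude = abs(index) >> p
    return -magnitude if index < 0 else magnitude


def toy_quality(vt, jnd=4.0):
    # HYPOTHETICAL stand-in for formula (5); only its monotonic decrease matters.
    return jnd / vt


vt = bisect_visual_threshold(toy_quality, target_q=0.8, vt_low=1e-3, vt_high=1e3)
p = planes_to_drop(0.5, vt)
print(round(vt, 3), p)   # 5.0 3
```

Dropping p bits from an index computed with step Δ gives the same index as quantizing directly with step Δ·2^p, so the encoder never needs to requantize: it simply stops emitting bit planes.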

After encoding, the steps are carried out in reverse order, namely entropy decoding, Deadzone quantization reconstruction, rearrangement of the coding units according to their original positions in the wavelet transform result, inverse wavelet transform, and component synthesis (for color images), which completes the decoding of the image.
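The Deadzone quantization used in encoding and the reconstruction used in decoding can be sketched as a pair of functions. The sketch assumes the common form in which the zero bin is twice as wide as the other bins and non-zero indices are reconstructed at the midpoint of their bin (reconstruction offset δ = 0.5); the exact reconstruction point of FIG. 3 is not reproduced in this text, so δ is an assumption exposed as a parameter.

```python
import math


def deadzone_quantize(y, step):
    """Index q = sign(y) * floor(|y| / step); every |y| < step maps to 0."""
    magnitude = math.floor(abs(y) / step)
    return -magnitude if y < 0 else magnitude


def deadzone_reconstruct(q, step, delta=0.5):
    """Inverse mapping: 0 stays 0, otherwise sign(q) * (|q| + delta) * step.
    delta = 0.5 (bin midpoint) is an assumed default."""
    if q == 0:
        return 0.0
    magnitude = (abs(q) + delta) * step
    return -magnitude if q < 0 else magnitude


coefficient, step = 7.3, 2.0
q = deadzone_quantize(coefficient, step)
restored = deadzone_reconstruct(q, step)
print(q, restored)   # 3 7.0
```

The reconstruction error of any coefficient is smaller than the step size, which is why bounding the step size by the visual threshold bounds the visible error.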

The present invention can be verified within the JPEG2000 lossy coding framework. Standard JPEG2000 lossy coding applies the YCbCr color transform and then sparsifies and quantizes the image using the 9/7 wavelet transform and a Deadzone quantizer. Conventional JPEG2000 lossy image compression coding optimizes the code stream by minimizing the mean square error (MSE) of the decoded image.
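For reference, the YCbCr color transform mentioned here can be sketched with the widely used irreversible RGB to YCbCr coefficients (Y = 0.299R + 0.587G + 0.114B and the matching chroma rows). The sketch works on one pixel without any offset on the chroma components, so a mid-gray pixel maps to Y = 128 with Cb = Cr = 0, which is the gray background used in the sensitivity measurements. Function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Forward irreversible color transform for one pixel (no chroma offset)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse transform; exact only up to the rounding of the coefficients."""
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return r, g, b


# A mid-gray pixel carries no chroma: Y is about 128, Cb and Cr are about 0.
print(rgb_to_ycbcr(128, 128, 128))
```

Component synthesis at the decoder is the inverse mapping applied after the inverse wavelet transform.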

Using the human visual sensitivity measurement method described above, a visual sensitivity verification experiment was carried out on an ASUS PA328Q display, which has a pixel pitch of 0.1845 mm and a display brightness of 350 cd/m². The psychovisual experiment was a 3AFC experiment with the setup shown in FIG. 6: the human subject sat 60 cm from the display and, in each trial, had to indicate within 10 seconds which of the three images contained the non-zero wavelet coefficient.

The $T_{c,(b,l),1}$ data measured with the 3AFC psychovisual experiment and the QUEST toolbox are shown in Table 1.

Table 1. $T_{c,(b,l),1}$ measured on the ASUS PA328Q

To evaluate the effectiveness of the image compression method based on human visual quality and display resolution according to the present invention, image encoding experiments were performed on 29 images from the Eyetracker original color image database (ftp://ftp.ivc.polytech.univ-nantes.fr/IRCCyN_IVC_Eyetracker_Images_LIVE_Database/Images/). The quality index Q was set to 1, 0.7, 0.3, and 0.1; the relative resolution was set to 1, 0.75, and 0.5; and the α parameter in the visual masking factor (formula (9)) was set to 0.6. The resulting coding rates are shown in Table 2.

Table 2 results of image compression experiments

In Table 2, a larger code rate means a larger compressed file. As Table 2 shows, the code rate decreases as the visual quality Q and the relative resolution r decrease. Thus, at a fixed resolution, improving the visual quality of the image, i.e., reducing the probability that the compressed image is distinguished, requires a higher code rate. Conversely, at a fixed visual quality, an image displayed at a lower resolution yields a code stream with a lower code rate.

In addition, different images with the same visual quality and the same resolution have different code rates. This means that the encoding approach herein can take different degrees of compression for different images. For images with rich details, the encoder will compress less to preserve more image detail. According to the two points, the invention can adopt compression of different degrees according to the actual requirements of users, provide images meeting the requirements of users and reduce unnecessary storage or transmission resource consumption.

To verify the validity of the model and the encoder, visual experiments were also performed on the decoded images. The experiments tested pictures with quality index Q of 1 and 0.1 at relative resolutions of 1, 0.75, and 0.5, and used pictures generated by a conventional MSE encoder at the same average code rate as a control. The results are shown in Tables 3 and 4.

Table 3. Correct recognition rate of pictures compressed by the visual-quality-based encoder

Table 4. Correct recognition rate of pictures compressed by the MSE encoder at the same code rate

Comparing the results in Tables 3 and 4 for different visual qualities at the same resolution shows that, for higher Q values, the correct recognition rate of the picture tends to be lower, which indicates higher visual quality. Comparing the improved encoder with the conventional MSE encoder shows that the improved encoder has a lower correct recognition rate in most cases. The algorithm adopted by the invention therefore improves on the conventional MSE coding algorithm to a certain extent.

Claims

1. A method for measuring and calculating multi-resolution human visual sensitivity under the wavelet transform, characterized in that:

(1) For different image components (e.g., RGB or YCbCr), a psychovisual experiment is used to measure, against a gray background and for a particular display device and viewing distance, the just visible amplitude of a single wavelet coefficient in each full-resolution wavelet sub-band (at this amplitude, the probability of correct detection is about 63% in a "yes/no" psychovisual test, about 82% in a 2AFC psychovisual test, and about 75% in a 3AFC psychovisual test), and this amplitude is used to measure the sensitivity of human vision to that wavelet band;

(2) The just visible amplitude of the wavelet coefficients at the required image display resolution is calculated from the obtained just visible amplitude of the wavelet coefficients at full resolution (different display devices or viewing distances can be converted into a corresponding image resolution through the pixel pitch of the display device and the viewing distance), as shown in formulas (2) and (3) and the related description.

2. A method for calculating a visual threshold from the just visible amplitude of a single wavelet coefficient of each wavelet transform sub-band according to claim 1 in combination with a visual masking model, and an image compression coding method based on human visual quality and image display resolution, characterized in that:

(1) A visual masking factor is calculated from the just visible amplitude of a single wavelet coefficient of each wavelet transform sub-band according to claim 1 and the standard deviation of the wavelet sub-band coefficients of the corresponding coding unit, by the method shown in formula (9) and the related description;

(2) The visual threshold of each wavelet sub-band is calculated from the just visible amplitude of a single wavelet coefficient of each wavelet transform sub-band according to claim 1 and the visual masking factor, as shown in formulas (5)-(8), FIG. 3, and the related description;

(3) Each wavelet transform sub-band coefficient of each component of the image is quantized with a Deadzone quantizer, and the binary wavelet coefficient code stream of the image is optimized using the calculated visual threshold: for a given coding unit (e.g., a wavelet coefficient code block) of a given sub-band, the binary code stream of the quantized wavelet coefficients may be encoded progressively, in some manner, from the most significant bit to the least significant bit. Once the maximum quantization error in the coding unit first falls below the quantization error that would result from using VT as the quantization step size, all subsequent coding passes can be discarded, and the truncation of the binary code stream is complete;

(4) The truncated code stream can then be compressed further with any entropy coding method.