CN112634278B

CN112634278B - Super-pixel-based just noticeable distortion method

Info

Publication number: CN112634278B
Application number: CN202011188873.1A
Authority: CN
Inventors: 王永芳; 王闯
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2020-10-30
Filing date: 2020-10-30
Publication date: 2022-06-14
Anticipated expiration: 2040-10-30
Also published as: CN112634278A

Abstract

The invention discloses a novel super-pixel-based just noticeable distortion method. The method comprises the following establishing steps: 1. processing an input image, 2, calculating regional characteristics, 3, establishing a regional weight model, 4, establishing a contrast masking model of joint texture roughness, and 5, obtaining a perception model based on superpixels. The invention is tested on the relevant test images, and the experimental result shows that the method has good accuracy and robustness. The method introduces a region weight model and a region roughness modulation model, wherein the region weight model combines the color contrast of the region level and the concave modulation effect to estimate the visual importance degree of each region, and the region roughness modulation model refines the selection of the window size on the basis of Tamura texture roughness and considers the influence of the average gray difference between windows to estimate the texture condition of each region.

Description

Super-pixel-based just noticeable distortion method

Technical Field

The invention relates to an image visual redundancy estimation method, in particular to a Just Noticeable Distortion (JND) model based on superpixels, belonging to the field of image processing and computer vision.

Background

With the rapid development of internet and multimedia information technology, images and videos have become the main information transmission media, and the increasing data volume brings huge challenges to the existing coding technology. Considering that the Human eye is the final receiver of the image/video, and the Human eye is imperceptible to the change of the pixel value below a certain threshold, which is also called as the JND threshold, the characteristics of the Human Visual System (HVS) can be utilized to remove the Visual redundancy in the image, so as to improve the existing coding efficiency, and further save the precious network bandwidth resources. The JND model is an effective method for estimating visual redundancy, and has wide applications, such as video coding, subjective evaluation, digital watermarking, and the like.

The JND models are generally classified into two categories, namely pixel domain JND models and transform domain JND models, wherein the pixel domain JND models are models built according to the characteristics of pixel values of images, and the transform domain JND models are models built according to the characteristics of the pixel values in the transform domain. The JND model is developed based on the research of the human visual system, the pixel domain JND model mainly includes a Luminance Masking (LM) effect and a Contrast Masking (CM) effect, and the transform domain JND model mainly uses a spatial Contrast Sensitivity Function (CSF) model, a background Luminance Adaptation (LA) effect, and a Texture Masking (TM) effect. With the continuous exploration and mining of the HVS, more visual characteristics are proposed and applied to the establishment of new JND models, such as free energy principle, mode complexity, disorder masking effect and the like. More and more JND models have been proposed in recent years, but the creation of these models is entirely pixel-based and does not take into account the different contribution of different regions to vision at the region level. Therefore, according to the difference of attention and sensitivity of human eyes to each region, how to establish modulation factors of a region level through color features and texture features of the region and how to improve the prediction accuracy of a perception model to a human eye visual system by using the modulation factors are the problems to be further solved at present.

Disclosure of Invention

The invention aims to provide a super-pixel-based just noticeable distortion method for accurately estimating visual redundancy information of an image. Because the recognition capability of human eyes is limited, the attention degree and the sensitivity degree of human eyes to each region are different when an image is observed due to the difference of the color, the texture and other characteristics of each region.

Image segmentation is performed by using a single Linear adaptive Clustering (SLIC) algorithm, and pixel points in each segmented super pixel have the same or similar characteristics of color, texture and the like, so that a pixel block with certain visual information is formed. According to the difference of attention of human eyes to different regions, calculating the average value of all pixel points in the superpixel in space coordinates and the average value of color components to establish a region-level color contrast model, wherein the higher the contrast is, the more easily the attention is paid. Meanwhile, considering the concave characteristic of human eyes, namely that the human visual system is most sensitive to the area which is mapped to the central concave area of the human eyes, and the sensitivity of the human eyes is rapidly reduced along with the distance from the central concave area to the human eyes, the area with the highest contrast is selected as the central concave area, and the concave modulation model of the area level is established, and the closer to the central concave area, the more easily the concave modulation model is concerned. In addition, since the texture has a masking effect on human eyes, the texture roughness of the region is calculated and incorporated into the perceptual model as a modulation factor. The method can accurately estimate the visual redundancy information of the test image, better conforms to the visual characteristics of human eyes, and can effectively remove more visual redundancy information while ensuring the subjective perception quality. In addition, the method has important reference significance for researching the application of the super-pixel characteristics in human vision and perception models.

In order to achieve the purpose, the invention has the following conception:

first, the input image is area-divided by the superpixel segmentation algorithm SLIC, and the spatial coordinates and color components of each area are calculated and represented by the average value of the spatial coordinates X, Y and the average value of the color components L, A, B of all pixels in the area. Then, the color contrast is calculated using the color component and the spatial coordinates of the super-pixel, considering that the region with a higher contrast is more likely to be noticed by the human eye according to the attention of the human eye to each region. According to the pit characteristics of human eyes, a concave modulation model based on regions is established, so that the visual importance degree of each region is accurately estimated. Further, a texture roughness modulation model at a region level is designed in consideration of the masking effect of the region texture, and the masking effect of the region texture on distortion is considered. The input image is processed through the idea, namely, the superpixel-based JND model according to the present invention.

According to the conception, the invention adopts the following technical scheme:

a JND method based on superpixels, comprising the steps of:

step 1, preprocessing an input image: including superpixel segmentation and conversion of color space. Performing region segmentation on the input image by using a superpixel segmentation algorithm SLIC, and simultaneously converting the input image into an LAB color space with uniform perception;

step 2, calculating regional characteristics: including color features and texture features. For color features, firstly, in each super-pixel region, an average value of spatial coordinates X, Y and an average value of color components L, A, B of all pixel points in the region are obtained to represent the spatial coordinates and the color components of the super-pixel region, and then spatial distances and color differences between the super-pixel region and other super-pixels are calculated in sequence. Obviously, if the color of a certain area is greatly different from that of the surrounding area, the color contrast of the area is high; if super-pixels with high color contrast are adjacent or concentrated, the part is considered to be more attractive to human eyes, and therefore the color contrast is calculated by combining the color difference and the spatial distance. In addition, 5 areas with the highest color contrast are selected as concave areas, and a concave modulation model is established according to the concave characteristics of human eyes. For texture features, considering the masking effect of regional textures on vision, further refining the size of a window on the basis of Tamura texture roughness, considering the influence of average gray difference between windows, establishing a texture roughness model based on the regions, and taking the texture roughness model as a modulation factor to improve the accuracy of the model;

step 3, establishing a region weight model: according to the visual characteristics of human eyes, the attention and the sensitivity of human eyes to different areas in the same image are different. In step 2, a contrast model and a concave modulation model based on the regions are established through the color components and the space coordinates of the regions, so as to estimate the visual importance degree of the regions and establish a region weight model;

step 4, comparing and masking models based on the region texture roughness: since texture has a masking capability for distortion, in step 2, the texture condition of each region is estimated by a region-level texture roughness model. Considering that the comparison masking model still needs to be further improved, therefore, the comparison masking model of the joint texture roughness is established to improve the accuracy of the model;

step 5, a perception model based on the super pixels: and finally, on the basis of the step 3 and the step 4, fusing the contrast Masking Model combined with the texture roughness with the brightness Masking Model under the guidance of a Nonlinear Addition Model for Masking (NAMM), and obtaining the final perception Model under the weighting of the region weight Model.

Preferably, in the step 1, the image is divided into K regions by using a superpixel segmentation algorithm SLIC;

K＝[w·h/n] (1)

wherein K is a positive integer, and w and h are the width and height of the image, respectively; the super pixel is composed of adjacent pixel points with the same or similar color, texture and other characteristics, and can be regarded as a visual input unit.

Preferably, in the step 2, considering the relationship between the region feature and the human visual feature, the three links of the region-based color contrast, the region-based concave modulation effect and the region-based texture roughness are mainly used:

(1) region-based color contrast:

for color features, the average of the spatial coordinates X, Y of all the pixels and the average of the color component L, A, B are calculated for each superpixel region in turn as the spatial coordinates (x) of the respective superpixels_k，y_k) And color component values (l)_k，a_k，b_k) Calculating the color difference and the space distance between the super pixel areas:

where k and i are the indices of the superpixel, (l)_k，a_k，b_k，x_k，y_k) And (l)_i，a_i，b_i，x_i，y_i) Respectively representing the super-pixels k and i,

representing superpixels k and superpixelsThe difference in the color of the element i,

represents the spatial distance of the superpixel k from the superpixel i;

according to HVS characteristics, the color of a certain area is greatly different from that of the surrounding area, which indicates that the color contrast of the area is relatively large, and if super pixels with high color contrast are adjacent or concentrated, the part is considered to attract the attention of human eyes; the color contrast is therefore proportional to the euclidean distance of the color and inversely proportional to the euclidean distance of the location, the color contrast being calculated:

wherein is c_kTotal color contrast of super-pixel k, c_(k，i)For the color contrast between regions k and i,

and

has been normalized to [0, 1]Where c is₁Set to 3;

(2) region-based concave modulation model

According to the human eye concave feature, the human visual system is most sensitive to the area mapped to the human eye foveal area, and the sensitivity of the human eye decreases rapidly with increasing distance from the foveal area; selecting 5 areas with highest color contrast as central concave areas; the eccentricity is defined as follows:

wherein i is 1, 2，x_iAnd y_iRepresenting the space coordinate value of the selected area, x and y representing the space coordinate values of other positions, and d representing the observation distance;

the closer to the fovea, the smaller the eccentricity, and the more easily the human eye pays attention to, defining a concave modulation model:

f_i＝ln(λe_i) (8)

where m is the total number of selected foveal areas and has a value of 5, f_iA modulation model representing a single foveal region; e.g. of the type_iDenotes the eccentricity, λ is the adjustment parameter, here set to 1;

(3) region-based roughness model

According to the relationship between the texture and the visual characteristic, the more complex the texture is, the stronger the masking effect on the distortion is; texture roughness is an important texture feature, and the masking capability of the region texture on noise is estimated by using the texture roughness.

Calculating roughness by using the optimal size of each pixel point in the target image according to Tamura texture roughness, wherein the optimal size refers to the window size corresponding to the maximum average gray difference between adjacent active windows; calculating the texture roughness of the area on the basis of Tamura texture roughness, wherein the calculation process is as follows:

first, the average gray value in a multi-size window at a pixel point is calculated, the window size is 2n × 2n, here, the selection of the window size is refined:

where (x, y) is the pixel coordinate, and I (x, y) is the pixel value at pixel (x, y); n is 1, 2, 5, and the texture change near the pixel point can be estimated more accurately through careful window division;

then, the average gray difference between two windows in the horizontal direction and the vertical direction of each pixel is calculated respectively,

E_n，h(x，y)＝|A_n(x+n，y)-A_n(x-n，y)| (10)

E_n，v(x，y)＝|A_n(x，y+n)-A_n(x，y-n)| (11)

wherein E_n，hAnd E_n，vRespectively representing the difference of the mean gray values in the horizontal and vertical directions, A_nRepresenting the mean gray value within the window;

then, selecting the window size corresponding to the maximum average gray scale difference value to calculate the roughness,

S(x，y)＝2n (12)

E_n＝E_max＝max(E₁，E₂，...E_N) (13)

wherein n is a value corresponding to the maximum average gray difference, including the horizontal direction and the vertical direction;

finally, the roughness of the superpixel is calculated:

wherein n is_kS (x, y) represents the optimal size corresponding to the pixel point, wherein S is the total number of the pixel points in the super pixel;

the Tamura roughness describes the average roughness of a super-pixel area, and the larger the optimal size corresponding to a pixel point is, namely the larger the size of a texture element is, the larger the roughness value is; however, the texture structure of the areas is sparse, and the visual masking effect is weak; since the JND threshold reflects the maximum allowable distortion level, the final modulation model is inversely proportional to the roughness value; considering the influence of the average gray difference, a roughness modulation model is defined as follows:

F_k＝σ·E_max/F_crs (15)

wherein E_maxIs the maximum averageGray difference, F_crsFor roughness values, σ is an adjustment parameter less than 1, related to the window size.

Preferably, in step 3, the color contrast and the concave modulation model obtained in step 2 are used to establish a region weight model:

W_k＝(c₂-c_k)·f_fov (16)

wherein c is_kColor contrast of the representation area, f_fovRepresenting a concave modulation model, c₂A constant parameter is set here to 1.75.

Preferably, in the step 4, the texture roughness is used to improve the accuracy of the model, and the texture roughness is combined with the comparison masking model to establish a more accurate visual masking model:

CM′＝CM+F_k (17)

wherein CM' is a contrast masking model of the joint texture roughness, CM is an original contrast masking model, F_kFor the region-based roughness modulation model:

CM(x，y)＝β·G(x，y)·W(x，y) (18)

where β is a control parameter, and takes a value of 0.117, G (x, y) is a maximum weighted average gradient value at the pixel coordinate (x, y), and W (x, y) represents an edge weighting factor at the pixel coordinate (x, y):

W(x，y)＝L·h (21)

wherein g is_k(i, j) are four-way high pass filters; l is the image after it has passed through the Canny operator edge detector and h is a k × k gaussian low pass filter.

Preferably, in the above steps 3 and 4, a region weight model and a comparison masking model based on texture roughness are obtained respectively, and are used for estimating the visual importance of the region and improving the visual masking effect of the model, and the two are merged into the perceptual model to obtain a final perceptual model based on the region:

RJND＝JND_NAMM·W_k (22)

wherein W_kFor regional weight model, JND_NAMMJND model for joint texture coarseness:

JND_NAMM＝LA+CM′-α·min{LA，CM′} (23)

wherein LA is a background brightness masking model, CM' is a contrast masking model of the joint texture roughness, and alpha is an adjusting parameter, and 0.3 is taken.

Is the average background luminance at pixel (x, y).

The method mainly considers the relationship between the color texture characteristics of different areas in the same image and the response of a human visual system to visual information, pixel points in the same superpixel have the same or similar characteristics of color, texture and the like, the superpixel can be regarded as a processing unit, the attention and sensitivity of human eyes to different areas are different, and the visual importance degree of each area is estimated by using the characteristics of the superpixel. Establishing a color contrast model of a region level by using the average value of the space coordinates X, Y of all pixel points in the super-pixel and the average value of the color component L, A, B, then selecting a region with the highest contrast as a central concave region, and establishing a concave modulation model of the region level. Meanwhile, the method is improved on the basis of Tamura texture roughness, the influence of average gray difference is increased while the window size is refined, the texture roughness based on the region is calculated, and the texture roughness is combined with a comparison masking model to improve the accuracy of the model. Further, the contrast masking model of the joint texture roughness and the brightness masking model are subjected to nonlinear superposition, and finally multiplied and combined with the region weight model, so that the sensitivity and attention of human eyes to different regions are fully considered, and the JND model based on the super pixels is obtained.

Compared with the prior art, the invention has the following obvious and prominent substantive characteristics and remarkable advantages:

1. the method uses the super-pixels for establishing the JND model, fully considers the relation between the color and the textural features of the regional hierarchy and the visual characteristics of human eyes, and provides the JND model based on the super-pixels;

2. the method fully considers the relation between the regional characteristics and the attention of human eyes, not only calculates the color contrast between the regions, but also considers the fovea effect of the regional hierarchy, and the two jointly establish a regional weight model for estimating the visual importance degree of each region;

3. the method introduces a texture roughness modulation model of a region level, improves on the basis of Tamura texture roughness, further refines the selection of window sizes, considers the influence of average gray level difference between windows and is used for estimating the texture condition of each region;

the invention performs experiments on related test images, and the experimental result shows that the method has good accuracy and robustness.

Drawings

FIG. 1 is a block diagram of the process of the present invention.

Fig. 2 is a schematic diagram of a method of the superpixel-based JND model according to the present invention.

Fig. 3 is a super pixel segmentation diagram of the test image "basetballkill".

Fig. 4 is a schematic diagram of color contrast based on regions.

Fig. 5 shows the 5 regions with the highest contrast selected in the present invention.

FIG. 6 is a schematic diagram of a concave modulation centered on a single area and a general concave modulation diagram.

FIG. 7 is a schematic of texture roughness based on region.

FIG. 8 is a schematic diagram of a region weight model.

Fig. 9 is a comparison between noise-added and noise-not-added image areas in the original.

Fig. 10 is a comparison between models.

Fig. 11 is a comparison between image regions.

Detailed Description

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings:

the first embodiment is as follows:

referring to fig. 1, the method for super-pixel based just noticeable distortion comprises the following steps:

step 1, preprocessing an input image:

the method comprises the steps of super-pixel segmentation and color space conversion, wherein simple linear iterative clustering in a super-pixel segmentation algorithm is used for carrying out region segmentation on an input image, and super-pixels are taken as visual input units; simultaneously converting the input image into an LAB color space with uniform visual perception so as to facilitate subsequent operation processing;

step 2, calculating regional characteristics:

the method comprises the steps of including color features and texture features, for the color features, firstly calculating the average value of spatial coordinates X, Y and the average value of color components L, A, B of all pixels in a super-pixel region, then sequentially calculating the spatial distance and color difference with other super-pixels, and calculating the color contrast between the super-pixels; in addition, in consideration of the concave characteristic of human eyes, 5 areas with high color contrast are selected to establish a concave modulation model based on the areas; for texture features, further refining the selection of window sizes on the basis of Tamura texture roughness according to the masking effect of the texture roughness on human eyes, and meanwhile, considering the influence of average gray difference between windows to establish a texture roughness model based on regions;

step 3, establishing a region weight model:

according to the visual characteristics of human eyes, the attention degrees of the human eyes to different regions in the same image are different, in step 2, a region-based contrast model and a concave modulation model are established through the color components and the space coordinates of the regions, the attention degree of the regions is estimated, and a region weight model is established; the higher the contrast, the more easily the attention is paid; the closer to the foveal region, the more susceptible to attention;

step 4, combining a comparison masking model of texture roughness:

in step 2, establishing a texture roughness model based on the region according to the visual masking effect of the texture on human eyes; considering that the comparison masking model still needs to be further improved, the accuracy of the model is improved by utilizing the texture roughness, and the comparison masking model combining the texture roughness is established;

step 5, a perception model based on the super pixels:

and (4) fusing the comparison masking model of the joint texture roughness and the brightness masking model on the basis of the steps 3 and 4, and obtaining a final perception model under the weighting of the region weight model.

The method introduces a region weight model and a region roughness modulation model, wherein the region weight model combines the color contrast and the concave modulation effect of a region level to estimate the visual importance degree of each region, and the region roughness modulation model refines the selection of the window size on the basis of Tamura texture roughness and considers the influence of the average gray difference between windows to estimate the texture condition of each region; the invention performs experiments on related test images, and the experimental result shows that the method has good accuracy and robustness.

Example two:

the basic flowchart of the JND model based on superpixel and region features of this embodiment is shown in fig. 2. The method is realized in programming simulation under windows 10 and matlab 2016 environments. First, the input image is pre-processed, including superpixel segmentation and color space conversion. Then, carrying out feature calculation on each region, including color features and texture features, calculating color contrast through the average pixel value and the average coordinate value of the super-pixels, establishing a concave modulation model according to the concave characteristics of human eyes, and combining the two to establish a region weight model. In addition, a texture roughness model of the region is established to improve the masking capability of the region texture to distortion, and is combined with the comparison masking model. And finally, carrying out nonlinear superposition on the contrast masking model of the joint texture roughness and the brightness masking model, and multiplying the result by the region weight model to obtain a final new JND model, wherein the attention and the sensitivity of human eyes to each region are considered. The idea is the super-pixel-based JND model of the invention.

The method of the embodiment specifically comprises the following steps:

step 1, preprocessing an input image:

the method comprises the steps of super-pixel segmentation and color space conversion, wherein a super-pixel segmentation algorithm SLIC is used for segmenting an image into K regions;

K＝[w·h/n] (1)

wherein K is a positive integer, and w and h are the width and height of the image, respectively; the super-pixel is composed of adjacent pixel points with the same or similar color, texture and other characteristics, the super-pixel is taken as a visual input unit, and a super-pixel segmentation graph is shown in fig. 3;

converting the input image into the LAB color space for subsequent operation processing, because the perceived uniform LAB color space is more consistent with human perception characteristics;

step 2, calculating regional characteristics:

the method comprises color features and texture features, according to HVS characteristics, due to the difference of the color, texture and other features of each region, the attention degree and sensitivity degree of human eyes to different regions are different, the attention degree to salient regions is higher, and the sensitivity to texture complex regions is lower; considering the relationship between the region characteristics and the human visual characteristics, the three links of region-based color contrast, region-based concave modulation effect and region-based texture roughness are mainly used:

1. region-based color contrast

For color features, the average of the spatial coordinates X, Y of all the pixels and the average of the color component L, A, B are calculated for each superpixel region in turn as the spatial coordinates (x) of the respective superpixels_k，y_k) And color component values (l)_k，a_k，b_k) And calculating the color difference and the space distance between the super pixel areas.

representing the color difference of the super pixel k and the super pixel i,

representing the spatial distance of the superpixel k from the superpixel i.

According to the HVS characteristics, the color of a certain region is greatly different from the color of the surrounding region, indicating that the color contrast of the region is large, and if the regions with high color contrast are adjacent or concentrated, the region is considered to attract more attention of human eyes. The color contrast is therefore proportional to the euclidean distance of the color and inversely proportional to the euclidean distance of the location, the color contrast being calculated:

and

has been normalized to [0, 1 ]]Where c is₁Set to 3. The resulting color contrast plot is shown in fig. 4.

2. Region-based concave modulation model

According to the human eye's concave features, the human visual system is most sensitive to areas mapped to the human eye's fovea, and the sensitivity of the human eye decreases rapidly with increasing distance from the fovea. The 5 regions with the highest color contrast were chosen as the foveal region, as shown in fig. 5. The eccentricity is defined as follows:

wherein i is 1, 2_iAnd y_iThe spatial coordinate values representing the selected region, x and y represent spatial coordinate values of other positions, and d represents the observation distance.

f_i＝ln(λe_i) (8)

where m is the total number of selected foveal regions, f_iA modulation model representing a single foveal region; e.g. of the type_iDenotes the eccentricity and λ is the regulating parameter, here set to 1. Fig. 6 shows a schematic diagram of concave modulation with one area as a central concave area, and a general concave modulation diagram.

3. Region-based roughness model

According to the relationship between the texture and the visual characteristics, the more complex the texture is, the stronger the masking effect on the distortion is. Texture roughness is an important texture feature, and the masking capability of the region texture on noise is estimated by using the texture roughness.

And calculating the roughness by using the optimal size of each pixel point in the target image according to the Tamura texture roughness, wherein the optimal size refers to the window size corresponding to the maximum average gray difference between adjacent active windows. It is noted that the larger the optimal size at a pixel point, the more sparse the texture near the pixel point. The texture roughness of the regions is calculated here on the basis of Tamura texture roughness, in the following calculation process:

where (x, y) is the pixel coordinate and I (x, y) is the pixel value at pixel (x, y). n 1, 2, 5, the texture change near the pixel point can be estimated more accurately by careful window division.

E_n，h(x，y)＝|A_n(x+n，y)-A_n(x-n，y)| (10)

E_n，v(x，y)＝|A_n(x，y+n)-A_n(x，y-n)| (11)

wherein E_n，hAnd E_n，vRepresenting the difference of the mean gray values in the horizontal and vertical direction, respectively, a_nRepresenting the mean gray value within the window;

S(x，y)＝2n (12)

E_n＝E_max＝max(E₁，E₂，...E_N) (13)

wherein n is a value corresponding to the maximum average gray level difference, including the horizontal direction and the vertical direction.

Finally, the roughness of the superpixel is calculated.

Wherein n is_kS (x, y) represents the optimal size of the pixel point corresponding to the total number of pixel points in the superpixel.

The larger the optimal window size is, the more sparse the texture near the pixel point is, that is, the larger the output value of S is, the smoother the vision is. Considering that the JND threshold reflects the maximum allowable distortion level, the size of the roughness value of the super-pixel is proportional to the optimal window size, and therefore the final modulation model should be inversely proportional to the roughness value. In addition, considering the influence of the average gray difference, a roughness modulation model is defined as follows:

F_k＝σ·E_max/F_crs (15)

wherein E_maxIs the maximum average gray difference, F_crsFor roughness values, σ is an adjustment parameter less than 1, related to the window size. As shown in fig. 7, a schematic graph of texture roughness for the region is obtained.

Step 3, regional weight model

And (3) establishing a region weight model by using the color contrast and the concave modulation model obtained in the step (2) according to the difference of attention and sensitivity of human eyes to different regions, namely the different visual importance degrees of the different regions.

W_k＝(c₂-c_k)·f_fov (16)

Wherein c is_kColor contrast of the display area, f_fovRepresenting a concave modulation model, c₂A constant parameter is set here to 1.75. c. C_kThe larger the value is, the higher the color contrast is, and the higher the attention of human eyes is; f. of_fovThe larger the value, the farther away from the foveal region, the lower the attention of the human eye. The attention of the human eye is inversely proportional to the size of the JND threshold, i.e., the more easily the human eye pays attention to the region, the lower the tolerance to distortion, and therefore, the weightThe value is inversely proportional to the magnitude of the color contrast output value and directly proportional to the magnitude of the concave modulation model output value, i.e. the more visually important, the lower the corresponding JND threshold and the lower the weight value should be, as shown in fig. 8, to fit the human visual characteristics.

Step 4, combining the contrast masking model of the texture roughness

According to the visual characteristics of human eyes, the texture has a visual masking effect on the human eyes, and more distortion can be tolerated in the area with larger roughness of the texture. As shown in fig. 9, the texture region in fig. 9(b) is more tolerant to distortion than fig. 9(c) and 9 (d). The texture roughness is used to improve the accuracy of the model, and is combined with a comparison masking model to establish a more accurate visual masking model:

CM′＝CM+F_k (17)

wherein CM' is a comparison masking model of joint texture roughness, CM is an original comparison masking model, F_kIs a region-based roughness model.

CM(x，y)＝β·G(x，y)·W(x，y) (18)

Where β is a control parameter, and takes a value of 0.117, G (x, y) is the maximum weighted average gradient value at the pixel coordinate (x, y), and W (x, y) represents the edge weighting factor at the pixel coordinate (x, y).

W(x，y)＝L·h (21)

Wherein g is_k(i, j) are four-way high pass filters. L is the image after it has passed through the Canny operator edge detector and h is a k × k gaussian low pass filter.

Step 5, perception model based on region

In the above steps 3 and 4, a region weight model and a comparison masking model based on texture roughness are obtained respectively, and are used for estimating the visual importance degree of the region and improving the visual masking effect of the model, and the two are blended into the perception model to obtain the final perception model based on the region.

RJND＝JND_NAMM·W_k (22)

Wherein W_kJND, a region weight model_NAMMJND model based on texture roughness.

JND_NAMM＝LA+CM′-α·min{LA，CM′} (23)

Wherein LA is a background brightness masking model, CM' is a contrast masking model of the joint texture roughness, alpha is an adjusting parameter, 0.3 is taken,

is the average background luminance at pixel (x, y).

The following is an experiment performed on a test image to evaluate the superpixel-based JND model proposed by the present invention. The experimental environment is a matlab 2016 platform under a windows 10 operating system, the memory is 16G, and the CPU is Intel (R) core (TM) i 7-8700. In the experiment, two indexes, namely Peak Signal to Noise Ratio (PSNR) and Mean Opinion Score (MOS), are used for evaluating the performance of the JND model, the PSNR is used for calculating the objective distortion degree of an image, the MOS is used for measuring the subjective perception quality of the image, 20 observers trained through grading are invited to Score, and the scores sequentially represent that the subjective quality is from poor to good from 0 to 5. The higher the PSNR value is, the smaller the distortion degree of the image is, and the higher the MOS value is, the better the subjective perception quality of the image is. On the basis of the same subjective quality, the lower the PSNR, the better the performance of the model is. 3 different JND models (Liu, Wu's 2013, Wu's 2017) were selected as comparative models in the experiment. Fig. 10 is a comparison of the subjective quality and PSNR of an output image after the JND model processing on a test image. Fig. 11 is a comparison of different areas in the output image.

TABLE 1 comparison of model Performance

Where the average data of the experimental results are bolded in red font and the data that the model performs better on some test images are bolded in blue font. From the average results in the last column of the table, it can be seen that the present invention can effectively reduce PSNR and remove as much visual redundancy as possible while maintaining subjective quality as compared to the selected reference model. The experiment shows that the method has better robustness and accuracy in the aspect of image visual redundancy estimation, uses the super-pixel and region characteristics in the estimation of the JND threshold, has good biological principle support, and can be better suitable for the estimation of image perception redundancy.

The embodiments of the present invention have been described with reference to the accompanying drawings, but the present invention is not limited to the embodiments, and various changes and modifications can be made according to the purpose of the invention, and any changes, modifications, substitutions, combinations or simplifications made according to the spirit and principle of the technical solution of the present invention shall be equivalent substitutions, as long as the purpose of the present invention is met, and the present invention shall fall within the protection scope of the present invention without departing from the technical principle and inventive concept of the present invention.

Claims

1. A method for super-pixel based just noticeable distortion, comprising the steps of:

step 1, preprocessing an input image:

step 2, calculating regional characteristics:

step 3, establishing a region weight model:

according to the visual characteristics of human eyes, the attention degrees of the human eyes to different regions in the same image are different, in step 2, a region-based contrast model and a concave modulation model are established through the color components and the space coordinates of the regions, the attention degree of the regions is estimated, and a region weight model is established; the higher the contrast, the more easily it is focused on; the closer to the foveal region, the more susceptible to attention;

step 4, combining a texture roughness comparison and masking model:

step 5, a perception model based on the super pixels:

2. The method of claim 1, wherein in step 1, the image is divided into K regions using a superpixel segmentation algorithm SLIC;

K＝[w·h/n] (1)

3. The method of claim 1, wherein in step 2, considering the relationship between the region feature and the human visual feature, the three links of region-based color contrast, region-based concave modulation effect and region-based texture roughness are mainly considered:

(1) region-based color contrast:

representing the color difference of the super pixel k and the super pixel i,

represents the spatial distance of the superpixel k from the superpixel i;

according to HVS characteristics, the color of a certain region is greatly different from that of the surrounding region, which indicates that the color contrast of the region is relatively large, and if the regions with high color contrast are adjacent or concentrated, the region is considered to attract the attention of human eyes; thus, the color contrast is proportional to the Euclidean distance of the color and inversely proportional to the Euclidean distance of the location, and is calculated as follows:

and

has been normalized to [0, 1 ]]Where c is₁Set to 3;

(2) region-based concave modulation model

wherein i is 1, 2_iAnd y_iRepresenting the space coordinate value of the selected area, x and y representing the space coordinate values of other areas, and d representing the observation distance;

f_i＝ln(λe_i) (8)

where m is the total number of selected foveal regions, f_iA modulation model representing a single foveal region; e.g. of the type_iDenotes the eccentricity, λ is the adjustment parameter, here set to 1;

(3) region-based roughness model

According to the relationship between the texture and the visual characteristic, the more complex the texture is, the stronger the masking effect on the distortion is; texture roughness is an important texture feature, and the masking capability of regional textures on noise is estimated by the texture roughness;

calculating roughness by using the optimal size of each pixel point in the target image according to Tamura texture roughness, wherein the optimal size refers to the window size corresponding to the maximum average gray difference between adjacent active windows; the texture roughness of the regions is calculated here on the basis of Tamura texture roughness by the following procedure:

E_n，h(x，y)＝|A_n(x+n，y)-A_n(x-n，y)| (10)

E_n，v(x，y)＝|A_n(x，y+n)-A_n(x，y-n)| (11)

S(x，y)＝2n (12)

E_n＝E_max＝max(E₁，E₂，...E_N) (13)

finally, the roughness of the superpixel is calculated:

the texture structure of the region with larger roughness value is sparse, and the visual masking effect is weaker; since the JND threshold reflects the maximum allowable distortion level, the final modulation model is inversely proportional to the roughness value; considering the influence of the average gray difference, a roughness modulation model is defined as follows:

F_k＝σ·E_max/F_crs (15)

wherein E_maxIs the maximum average gray difference, F_crsFor roughness values, σ is an adjustment parameter less than 1, related to the window size.

4. The method of claim 1, wherein in step 3, the color contrast and the concave modulation model obtained in step 2 are used to establish a region weight model:

W_k＝(c₂-c_k)·f_fov (16)

5. The method of claim 1, wherein texture roughness is used to improve model accuracy in step 4, and combined with a comparative masking model to build a more accurate visual masking model:

CM′＝CM+F_k (17)

wherein CM' is a comparison masking model of joint texture roughness, CM is an original comparison masking model, F_kFor region-based roughness model:

CM(x，y)＝β·G(x，y)·W(x，y) (18)

W(x，y)＝L·h (21)

wherein g is_k(i, j) is a four-direction high pass filter; l is the image after it has passed through the Canny operator edge detector and h is a k × k gaussian low pass filter.

6. The method of claim 1, wherein in steps 3 and 4, a region weight model and a texture roughness-based comparison masking model are obtained respectively for estimating the visual importance of the region and enhancing the visual masking effect of the model, and the two are merged into the perceptual model to obtain a final region-based perceptual model:

RJND＝JND_NAMM·W_k (22)

wherein W_kJND, a region weight model_NAMMJND model based on texture roughness:

JND_NAMM＝LA+CM′-α·min{LA，CM′) (23)

is the average background luminance at pixel (x, y).