CN103218815B - Method for calculating an image saliency map using natural scene statistics - Google Patents


Info

Publication number
CN103218815B (application CN201310135762.8A; earlier publication CN103218815A)
Authority
CN (China)
Prior art keywords
image, saliency map, saliency, wavelet coefficient, NSSSal
Legal status
Expired - Fee Related
Other languages
Chinese (zh)
Inventors
黄虹, 张建秋
Current Assignee
Shanghai Jilian Network Technology Co., Ltd.
Original Assignee
Fudan University
Priority/filing date
2013-04-19 (application CN201310135762.8A, filed by Fudan University)
Publication dates
CN103218815A published 2013-07-24; CN103218815B granted 2016-03-30

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image saliency map modelling, and specifically discloses a method for calculating an image saliency map using natural scene statistics. The invention calculates the image saliency map using the multiplier random variable in the Gaussian scale mixture statistical distribution of natural scenes, thereby establishing an image saliency map model. Analysis shows that the proposed saliency map model has high consistency with the visual attention selection mechanism: it can suppress repeated stimuli while highlighting visual stimuli of higher saliency, and thus better describes the saliency distribution of the image's stimulation of human vision.

Description

Method for calculating an image saliency map using natural scene statistics
Technical Field
The invention belongs to the technical field of image saliency map models, and particularly relates to a method for calculating an image saliency map using the multiplier random variable in the Gaussian scale mixture statistical distribution of natural scenes.
Background
Visual Attention (VA) is an important mechanism of the Human Visual System (HVS). At any instant, the human eye receives a large number of visual stimuli from the outside world; because processing resources in the HVS are limited, the different visual stimuli compete for these resources. Eventually, the most informative of all the visual stimuli wins the competition while the others are suppressed [1-2]. By exploiting this VA selection mechanism, the HVS makes full use of its limited resources to process a large number of visual stimuli, thereby reducing the complexity of scene analysis [3].
In the neurological and biological fields, some experiments record, by means of special devices, the eye fixation points of observers viewing different image scenes, while in others the observers actively mark their regions of interest in different image scenes. The purpose of these experiments is to study the VA mechanism of the HVS from the experimental results. The findings show that, on the one hand, image regions with large grayscale changes, including texture information and edge information, tend to draw more attention from the HVS; on the other hand, the HVS has a certain inhibitory effect on redundant information that appears repeatedly, while information that is novel and not repeated by its surroundings draws more HVS attention. Therefore, the less frequently a visual stimulus structural template appears, the greater its saliency [3].
With the continuing and deepening research on the VA mechanism, some research results have been applied to image processing problems and have brought heuristic results. However, in the real-time processing of images it is unrealistic to obtain the distribution of human eye fixation points for different images experimentally; a computable VA model capable of simulating HVS properties needs to be established in order to predict the saliency of visual stimuli. Among these models, the most basic is the saliency map model, which uses a saliency map to describe how much attention the HVS pays to different locations in the scene [1].
The most fundamental conclusion on visual attention is the feature integration theory of attention proposed by Treisman & Gelade in 1980 on an experimental basis [5]. In 1989, Wolfe et al. proposed the Guided Search model [6], which uses the saliency map to implement the search for objects in a scene. In 1985, Koch & Ullman established the VA model framework according to neurological theory [7]. The classical Itti & Koch saliency map method [8] models the VA mechanism within the Koch & Ullman framework using multiple scales and multiple channels, extracts the relevant features, and fuses them to obtain a saliency map with high consistency with the HVS. STB (Saliency ToolBox) [9] improves the Itti & Koch method by combining it with the theory of coherent visual cognition. Further, document [16] proposed a saliency map model based on the phase spectrum of the image Fourier transform (PFT-based Saliency map, PFTSal), which has been widely accepted [1][4][17]. Recently, document [14] proposed the PCT (Pulse Discrete Cosine Transform) model based on the cosine transform; document [15] then named the underlying image descriptor the "signature", and experimental results showed that the saliency map model built on this descriptor (signature-based Saliency map, signatureSal) has higher consistency with the HVS than other existing saliency map models. Other work includes Bruce's method based on information theory [10], the SUN model based on Bayesian methods [12], the Surprise model [13], and so on.
The invention provides a statistical saliency map model of natural scenes. It uses the random multiplier variable in the natural scene Gaussian Scale Mixture (GSM) statistical distribution to calculate the image saliency map. The resulting saliency map model has high consistency with the VA mechanism.
Disclosure of Invention
The invention aims to provide a method for calculating an image saliency map consistent with the attention-based visual selection mechanism of the human visual system, so as to establish an image saliency map model.
The invention uses the multiplier random variable in the natural scene Gaussian Scale Mixture (GSM) statistical distribution to calculate an image saliency map. The specific steps of the method are as follows:
(1) Let the image be a grayscale image $I$, with $R$, $C$ respectively the numbers of rows and columns of the image. Perform a wavelet transformation on the image to obtain a number of wavelet coefficient sub-bands.

(2) Within each wavelet coefficient sub-band, for each coefficient $c$, select a suitable wavelet coefficient neighborhood and stack it into the wavelet coefficient neighborhood vector $c'$, where $M$ is the neighborhood size.

According to the statistical properties of the Natural Scene Statistics (NSS) model, the wavelet coefficient neighborhood vector $c'$ of a natural image can be described by a Gaussian Scale Mixture (GSM) distribution, i.e. $c' = \sqrt{s}\,u$, where $s$ is a random multiplier characterizing the changes of the neighborhood vector covariance and $u$ is a zero-mean Gaussian random variable with covariance matrix $C_u$. The probability density function $p(c')$ of the neighborhood vector $c'$ is therefore expressed as:

$$p(c') = \int p(c'|s)\, p_s(s)\, ds \qquad (1)$$

where $p_s(s)$ is the probability density function of the random multiplier $s$.

Thus, conditioned on the random multiplier $s$, the coefficient neighborhood vector $c'$ obeys a zero-mean Gaussian distribution with covariance matrix $sC_u$, and the conditional probability distribution function is expressed as:

$$p(c'|s) = \frac{\exp\!\left(-c'^T (sC_u)^{-1} c'/2\right)}{(2\pi)^{M/2}\, |sC_u|^{1/2}} \qquad (6)$$
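As an illustration of steps (1)-(2), the sketch below builds neighborhood vectors from wavelet sub-bands. It is a minimal sketch in Python, assuming numpy and pywt: a standard separable wavelet stands in for the over-complete steerable pyramid named in the patent, and only spatial neighbors are gathered (the patent's neighborhood also spans scale and orientation).

```python
import numpy as np
import pywt

def neighborhood_vectors(subband, radius=1):
    """Stack the (2r+1)x(2r+1) spatial neighborhood of every coefficient
    into a column vector c' (one column per coefficient)."""
    m = 2 * radius + 1
    M = m * m                                   # neighborhood size
    padded = np.pad(subband, radius, mode="reflect")
    rows, cols = subband.shape
    C = np.empty((M, rows * cols))
    k = 0
    for dy in range(m):
        for dx in range(m):
            C[k] = padded[dy:dy + rows, dx:dx + cols].ravel()
            k += 1
    return C

# Decompose a grayscale image into sub-bands (a separable DWT as a
# stand-in for the steerable pyramid of Fig. 1).
image = np.random.rand(128, 128)                # placeholder image
coeffs = pywt.wavedec2(image, "db2", level=3)   # coeffs[1:] hold detail sub-bands
C = neighborhood_vectors(coeffs[1][0])          # vectors for one detail sub-band
```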
(3) Calculate an estimate of the Gaussian scale mixture multiplier variable $s$.

When the selected wavelet coefficient neighborhood is small enough, i.e. $M$ is sufficiently small, the multiplier $s$ can be assumed to remain unchanged within the neighborhood, so that $s$ can temporarily be treated as a deterministic quantity or constant. In this case, the multiplier $s$ corresponding to the neighborhood vector $c'$ can be obtained by maximum likelihood estimation over the conditional probability distribution function $p(c'|s)$ [18]:

$$\hat{s} = \arg\max_s \log p(c'|s) \qquad (7)$$

Denote the eigendecomposition $C_u = Q\Lambda Q^T$, where $Q$ is the matrix composed of the eigenvectors of $C_u$ and $\Lambda$ is the diagonal matrix of its eigenvalues. The maximum likelihood estimation result is then:

$$\hat{s} = \frac{c'^T C_u^{-1} c'}{M} \qquad (8)$$

Without loss of generality, assume $E\{s\} = 1$, so that the covariance matrix of $c'$ is $C_c = C_u$. Thus the covariance matrix $C_c$ of $c'$ can be used in place of $C_u$ to obtain:

$$\hat{s} = \frac{c'^T C_c^{-1} c'}{M} \qquad (9)$$

where the superscript $T$ denotes transposition. Regarding the wavelet neighborhood vector $c'$ as a feature sample, the set of all $c'$ constitutes the feature space of the image. Since the wavelet coefficient mean is zero, $\sqrt{c'^T C_c^{-1} c'}$ is the Mahalanobis distance $d$ from $c'$ to the center of this feature space [22]. By the properties of the Mahalanobis distance, which comprehensively accounts for the relations among all dimensions of the feature vector: the larger the Mahalanobis distance $d$ from a sample $c'$ to the feature space center, the lower the probability that $c'$ belongs to the feature space, that is to say, the higher the "saliency" of the sample $c'$, and vice versa.

From equation (9), $\hat{s}$ is proportional to $d^2$, i.e.

$$\hat{s} \propto d^2 \qquad (10)$$

Therefore, $\hat{s}$ is a valid description of the saliency of the sample in the feature space: the higher the saliency of a feature sample, the larger the corresponding $\hat{s}$ value; conversely, the less salient the feature sample, the smaller the $\hat{s}$ value.
According to the nature of the wavelet decomposition of an image, the wavelet coefficients extract the discontinuity information of the image and, compared with the frequency domain, can describe the intensity distribution of the HVS's attention to the scene. If for each wavelet coefficient we select all the coefficients adjacent to it in space, scale and orientation to form a neighborhood, and express all the coefficients in that neighborhood as a neighborhood vector $c'$, then $c'$ describes the feature vector of the visual stimuli in the neighborhood. Thus the feature space formed by the set of all $c'$ describes not only the spatial distribution of the visual stimuli but also the correlations between visual stimuli adjacent in space, scale and orientation: adjacent coefficients within one sub-band describe the correlation of spatially adjacent visual features, coefficients at the same position in sub-bands of different orientations at the same scale describe the correlation of visual features adjacent in orientation, and coefficients in the same orientation at different scales describe the correlation of visual features adjacent in scale. The analysis of equation (6) and above then indicates that $\hat{s}$, computed from the covariance matrix $C_c$ of this feature space, comprehensively describes the saliency distribution of the visual scene while jointly accounting for the spatial distribution of the visual stimuli and the correlations between adjacent visual stimuli, which is quite consistent with the VA mechanism described in the neurological and biological fields.
(4) The saliency values corresponding to all wavelet coefficient sub-bands are fused to obtain the complete natural scene statistical saliency map NSSSal (Natural Scene Statistical Saliency map) model:

$$\mathrm{NSSSal} = g * \bigoplus_{l} \sum_{\theta} \hat{s}_{l,\theta} \qquad (11)$$

where $\sum_{\theta}$ denotes the superposition of the saliency descriptions in the different orientations $\theta$ at scale $l$; $\bigoplus$ denotes across-scale addition [9], in which the saliency descriptions at all scales are interpolated to $R \times C$ and then added; and $g$ is a Gaussian blur kernel that applies a certain smoothing to the saliency map [15].
(5) The gray dynamic range of formula (11) is adjusted to $[0, 1]$. Positions with values closer to 1 correspond to areas of higher saliency in the image, which attract the human eye more than positions with smaller values.
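Steps (4)-(5) might be realized as in the sketch below, assuming scipy, where each entry of per_scale_maps is already the orientation-summed saliency at one scale; bilinear interpolation and min-max rescaling are our assumed choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def fuse_saliency(per_scale_maps, out_shape, sigma):
    """Equation (11): interpolate each scale's saliency map to R x C,
    add across scales, blur with a Gaussian kernel g, then rescale to [0, 1]."""
    R, C = out_shape
    acc = np.zeros(out_shape)
    for m in per_scale_maps:
        acc += zoom(m, (R / m.shape[0], C / m.shape[1]), order=1)
    acc = gaussian_filter(acc, sigma)           # the smoothing kernel g
    acc -= acc.min()
    if acc.max() > 0:
        acc /= acc.max()                        # step (5): dynamic range [0, 1]
    return acc
```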
(6) If the image is an RGB color image with red channel $r$, green channel $g$ and blue channel $b$, compute the corresponding grayscale channel $I$, red-green antagonistic pair $RG$ and yellow-blue antagonistic pair $BY$.

The grayscale channel $I$ is:

$$I = (r + g + b)/3 \qquad (12)$$

According to the processing mechanism of the human eye for color information, the four broadly tuned red (R), green (G), blue (B) and yellow (Y) channels are respectively:

$$R = r - (g + b)/2 \qquad (13)$$

$$G = g - (r + b)/2 \qquad (14)$$

$$B = b - (r + g)/2 \qquad (15)$$

$$Y = (r + g)/2 - |r - g|/2 - b \qquad (16)$$

from which the red-green antagonistic pair $RG$ and the yellow-blue antagonistic pair $BY$ are obtained as:

$$RG = R - G \qquad (17)$$

$$BY = B - Y \qquad (18)$$
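The channel construction of step (6) as reconstructed above (equations (12)-(18), following the broadly tuned channels of Itti & Koch [8]) can be sketched as:

```python
import numpy as np

def opponent_channels(rgb):
    """rgb: float array of shape (H, W, 3). Returns the grayscale channel I
    and the RG / BY opponent channels of equations (12)-(18)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    I = (r + g + b) / 3.0                        # equation (12)
    R = r - (g + b) / 2.0                        # broadly tuned red
    G = g - (r + b) / 2.0                        # broadly tuned green
    B = b - (r + g) / 2.0                        # broadly tuned blue
    Y = (r + g) / 2.0 - np.abs(r - g) / 2.0 - b  # broadly tuned yellow
    return I, R - G, B - Y                       # I, RG (17), BY (18)
```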
(7) For the grayscale channel $I$, the red-green antagonistic pair channel $RG$ and the yellow-blue antagonistic pair channel $BY$, calculate the saliency maps according to steps (1) to (5), denoting the results NSSSal_I, NSSSal_RG and NSSSal_BY respectively. The weighted average of the three is taken as the saliency map cNSSSal of the color image, namely:

$$\mathrm{cNSSSal} = \omega_I\,\mathrm{NSSSal\_I} + \omega_{RG}\,\mathrm{NSSSal\_RG} + \omega_{BY}\,\mathrm{NSSSal\_BY} \qquad (19)$$

where $\omega_I$, $\omega_{RG}$ and $\omega_{BY}$ are respectively the weights of the 3 channels, with $\omega_I + \omega_{RG} + \omega_{BY} = 1$.
According to the method, in the saliency map calculated from an image, a higher pixel value corresponds to higher image saliency, and a lower pixel value corresponds to lower image saliency.
The saliency map model provided by the invention has high consistency with the visual attention selection mechanism: it can highlight visual stimuli of higher saliency while suppressing repeated stimuli, thereby better describing the saliency distribution of the image's visual stimulation of the human eye.
Drawings
FIG. 1: pyramid decomposition and wavelet coefficient neighborhood selection may be manipulated.
Detailed Description
In the following, the performance of the saliency map models of the invention, NSSSal/cNSSSal, is compared by example with that of other saliency map models in extracting saliency maps from natural images. At the same time, their AUC (area under the ROC curve) is quantitatively evaluated on the published Bruce database [10] and the ImgSal database [11].
In the experiment, the image is wavelet-decomposed with an over-complete steerable pyramid in order to preserve the directionality of the image; the numbers of resolution scales and orientations used are as shown in Fig. 1. At the same time, for each wavelet coefficient $c$, the adjacent coefficients in the same sub-band, the coefficients at the same position in the sub-bands of different orientations at the same scale, and the coefficient at the same position in the parent scale in the same orientation (marked by the small gray squares in the figure) together form its neighborhood vector $c'$ of neighborhood size $M$. The Gaussian kernel variance is chosen to be 0.045 times the saliency map width.
For any natural image, the saliency map obtained from a saliency map model is denoted $S$. A threshold $t$ is selected, $S$ is binarized according to $t$, and the result is denoted $S_t$. Based on the subjective attention distribution map $D$ (Fixation Density Map) provided in the database, the true positive rate (TPR) is:

$$\mathrm{TPR} = \frac{\|S_t \odot D\|_1}{\|D\|_1} \qquad (20)$$

where the symbol $\odot$ denotes element-wise multiplication between pixels, and $\|\cdot\|_1$ denotes the 1-norm, i.e., for a binary map, the number of matrix elements with value 1.
Similarly, the false positive rate (FPR) is:

$$\mathrm{FPR} = \frac{\|S_t \odot (1 - D)\|_1}{\|1 - D\|_1} \qquad (21)$$
for a given threshold valueWill be obtained for all images in the databaseThe mean value is used as the saliency map model at a threshold value ofPositive class rate of hourAnd, likewise, willIs taken as being at a threshold value ofFalse alarm rate of time. By selecting different threshold valuesTo do so byIs the vertical axis, inROC curves are plotted on the horizontal axis. Area under ROC curveA measure of the conformity of the saliency map model to the subjective attention distribution of the human eye is provided,values closer to 1 indicate higher agreement of the saliency map model with the VA mechanism.
(1) Grayscale image saliency map model performance evaluation
Table 1 shows the AUC values calculated, after the images in the Bruce and ImgSal databases were grayed, by the NSSSal, Itti & Koch, PFTSal and signatureSal models; the closer the AUC value is to 1, the higher the consistency of the saliency map model with the VA mechanism. As can be seen, on both databases the AUC of Itti & Koch is the lowest and those of PFTSal and signatureSal are statistically similar, while NSSSal attains an AUC closer to 1 than both, i.e., the highest consistency with the VA mechanism.

Table 1: AUC value comparison of grayscale image saliency maps
(2) Color image saliency map model performance evaluation
Further, for color natural images, we compare the ROC (Receiver Operating Characteristic) curves of cNSSSal with those of Itti & Koch [8], PQFTSal [16] and signatureSal [15], together with the area under the ROC curve (AUC). In the experiment $\omega_I = \omega_{RG} = \omega_{BY} = 1/3$, i.e., the same weight is taken for the 3 channels.

Table 2 shows the AUC values calculated on the Bruce and ImgSal databases by the cNSSSal, Itti & Koch, PQFTSal and signatureSal models. After taking color information into account, the AUC values of the different saliency map models increase accordingly, while cNSSSal still attains the highest value compared with the other models.

Table 2: AUC value comparison of color image saliency maps
Practical application effect of the invention
The saliency map model has wide application in image processing problems such as adaptive image compression, video summarization, coding and progressive image transmission, image segmentation, image and video quality assessment, target recognition, and content-aware image scaling. The effectiveness and superiority of the saliency map model of the invention are illustrated below by taking its application to the image quality assessment problem as an example.
An image quality assessment measure mimics the overall mechanism of the human visual system and is expected to achieve results consistent with the human eye's assessment of image quality. However, many existing image quality assessment methods overlook the important attention-based visual selection mechanism in the HVS. When this mechanism is ignored, the image quality assessment measure assumes that the human eye gives the same attention to all objects, including the natural scene and the image distortions. According to the attention-based visual selection mechanism, however, the human visual model does not treat the image as a simple high-dimensional spatial signal, but has different sensitivities to different attributes of the image, such as brightness, contrast, object shape and texture, orientation, and smoothness. Since the human visual system has different sensitivities to different components of an image, these different sensitivities need to be taken into account when evaluating the quality of a distorted image. Therefore, combining the attention-based visual selection mechanism can effectively improve the performance of image quality assessment algorithms.
In the experiment, the image saliency map obtained above is used as a weight to readjust the contribution of each part of the image to the result in the original image quality assessment method, with the aim of making the evaluation result more consistent with the subjective evaluation by the human eye. Three objective image quality measures (PSNR, MSSIM [25], VIF [26]) are weighted with saliency maps in the experiment.
(1) PSNR based on saliency maps
Since PSNR is pixel-based, the saliency map can be directly used as a weight matrix. Suppose $S_i$ is the gray value of the saliency map $S$ at the $i$-th pixel. The saliency-based PSNR (SPSNR) is defined as:

$$\mathrm{SPSNR} = 10 \log_{10} \frac{L^2}{\sum_{i=1}^{N} S_i (x_i - y_i)^2 \big/ \sum_{i=1}^{N} S_i} \qquad (22)$$

where $N$ is the total number of pixels of the image, $x_i$ and $y_i$ are respectively the gray values of the $i$-th pixel of images $X$ and $Y$, and $L$ is the image gray dynamic range. Since the gray levels of saliency maps obtained from different natural images differ, the weights are normalized by $\sum_{i=1}^{N} S_i$.
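A minimal sketch of the SPSNR of equation (22) as reconstructed above, assuming numpy and an 8-bit gray dynamic range:

```python
import numpy as np

def spsnr(reference, distorted, sal_map, dynamic_range=255.0):
    """Saliency-weighted PSNR: per-pixel squared errors are weighted by the
    saliency map, with weights normalized by their sum (equation (22))."""
    w = sal_map / sal_map.sum()                   # normalized weights
    err = reference.astype(float) - distorted.astype(float)
    wmse = np.sum(w * err ** 2)                   # saliency-weighted MSE
    return 10.0 * np.log10(dynamic_range ** 2 / wmse)
```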
(2) MSSIM based on saliency map
Since MSSIM is based on image sub-blocks, the obtained saliency map $S$ is first divided into sub-blocks of the same number and size as those of the reference image. Suppose $S$ is divided into $K$ sub-blocks, and the sub-block corresponding to the $j$-th image sub-block is $S_j$. The normalized gray mean of sub-block $S_j$ is taken as the weight $w_j$ of the corresponding sub-block:

$$w_j = \frac{1}{N_j} \sum_{n=1}^{N_j} S_j(n) \qquad (23)$$

where $N_j$ denotes the number of pixels of $S_j$ and $S_j(n)$ denotes the gray value of the $n$-th pixel of $S_j$. The saliency-based MSSIM (SMSSIM) can therefore be defined as:

$$\mathrm{SMSSIM} = \frac{\sum_{j=1}^{K} w_j\, \mathrm{SSIM}(x_j, y_j)}{\sum_{j=1}^{K} w_j} \qquad (24)$$

where $x_j$ and $y_j$ are respectively the $j$-th sub-blocks in the reference image and the distorted image, and $K$ is the total number of image sub-blocks.
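The weighting step of equations (23)-(24) might look as follows; a sketch assuming the per-block SSIM values have already been computed with any SSIM implementation:

```python
import numpy as np

def smssim(ssim_per_block, sal_blocks):
    """ssim_per_block: array of SSIM(x_j, y_j) for the K sub-blocks.
    sal_blocks: the co-located saliency map sub-blocks S_j.
    Returns the saliency-weighted MSSIM of equation (24)."""
    w = np.array([blk.mean() for blk in sal_blocks])   # weights, equation (23)
    return np.sum(w * np.asarray(ssim_per_block)) / np.sum(w)
```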
(3) VIF based on saliency maps
Since the spatial-domain VIF is multi-scale, the reference image is adjusted to the different scales and the saliency map $S^l$ at each scale is calculated, where $l$ is the scale index. Then, in accordance with the sub-block-based property of VIF, $S^l$ is divided into $K_l$ sub-blocks, the $j$-th of which is $S_j^l$. The normalized gray mean of sub-block $S_j^l$ is taken as the weight $w_j^l$ of the corresponding sub-block:

$$w_j^l = \frac{1}{N_j^l} \sum_{n=1}^{N_j^l} S_j^l(n) \qquad (25)$$

where $N_j^l$ denotes the number of pixels in $S_j^l$ and $S_j^l(n)$ denotes the gray value of the $n$-th pixel of $S_j^l$. The saliency-based VIF (SVIF) can therefore be defined as:

$$\mathrm{SVIF} = \frac{\sum_{l=1}^{L} \sum_{j=1}^{K_l} w_j^l\, I\!\left(c_j^l;\, f_j^l \mid s_j^l\right)}{\sum_{l=1}^{L} \sum_{j=1}^{K_l} w_j^l\, I\!\left(c_j^l;\, e_j^l \mid s_j^l\right)} \qquad (26)$$

where $L$ denotes the number of scales, $l$ is the scale index, $K_l$ is the number of image sub-blocks at scale $l$, and $j$ is the sub-block index; $c_j^l$ is the $j$-th sub-block of the reference image at scale $l$, $e_j^l$ and $f_j^l$ are respectively the reference and distorted image sub-blocks as received by the human eye, and $s_j^l$ are the corresponding model parameters; $I(\cdot\,;\cdot \mid \cdot)$ denotes mutual information.
In the experiment, PSNR, MSSIM and VIF were each weighted with four different saliency map models (Itti & Koch [8], PQFTSal [16], signatureSal [15], cNSSSal), deriving five different implementations for each measure (IQA, Itti_IQA, PQFT_IQA, signature_IQA and cNSS_IQA), where IQA denotes the unweighted baseline measure.
The Matlab implementations of MSSIM and the spatial-domain VIF come from the original authors' versions published online. In the experiment, the sub-block size when calculating MSSIM and SMSSIM is 8×8, with a distance of 1 pixel between adjacent sub-blocks; 4 scales are taken when calculating the spatial-domain VIF, with sub-blocks of size 3×3 that do not overlap. Meanwhile, to ensure fairness of comparison, the parameter settings used when extracting the saliency maps of the different models and combining them with image quality assessment are kept consistent. The saliency maps are blurred with the default blurring parameters in the version implemented by the author of signatureSal (see document [15]).
The performance of the present invention was verified on the LIVE image quality assessment database [28]. The LIVE database contains 982 images, of which 779 are distorted images. These images are obtained from 29 reference images at different distortion levels through five distortion modes: JPEG, JPEG2000, white noise, Gaussian blur and fast channel fading. The database also provides the subjective evaluation score (DMOS) corresponding to each image; the range of DMOS is [0, 100], DMOS = 0 represents an undistorted image, and as the degree of distortion of the image increases, the DMOS value increases accordingly. By comparing the DMOS with the results obtained by an image quality evaluation algorithm, the performance of the algorithm can be assessed.
After nonlinear regression fitting between the image quality estimation results and the DMOS, the consistency of an objective image quality assessment measure with the subjective quality evaluation results can be quantitatively evaluated through five objective indexes: 1) the Linear Correlation Coefficient (LCC), which describes the accuracy of the prediction, values closer to 1 indicating higher prediction accuracy; 2) the Mean Absolute Error (MAE), smaller values indicating a smaller absolute prediction error; 3) the Root Mean Square Error (RMSE), smaller values indicating a smaller root mean square prediction error; 4) the Outlier Ratio (OR), which describes the consistency of the prediction, smaller values indicating higher prediction consistency; 5) the Spearman rank-order correlation coefficient (SROCC), which describes the monotonicity of the prediction, values closer to 1 indicating better prediction monotonicity. The quantitative evaluation results are as follows:
table 3: method for improving and comparing image quality evaluation measure performance of color image saliency map model
The bolded values in Table 3 are the best results obtained among the different implementations for each evaluation index. It can be seen that all four saliency map models (Itti & Koch, PQFTSal, signatureSal and cNSSSal) improve the performance of the image quality assessment measures. Meanwhile, the improvement that the cNSSSal saliency map brings to the image quality assessment measures is far better than that of the other saliency map models, which demonstrates the superiority of the proposed saliency map model in the image quality assessment problem.
Reference to the literature
[1] U. Engelke, H. Kaprykowsky, H.-J. Zepernick, and P. Ndjiki-Nya. Visual attention in quality assessment. IEEE Signal Processing Magazine, 2011, 28(6): 50-59.
[2] E. Kowler. Eye movements: The past 25 years. Vision Res., 2011, 51(13): 1457-1483.
[3] M. Carrasco. Visual attention: The past 25 years. Vision Res., 2011, 51(13): 1484-1525.
[4] A. Toet. Computational versus psychophysical bottom-up image saliency: a comparative evaluation study. IEEE Trans. PAMI, 2011, 33(11): 2131-2146.
[5] A. M. Treisman and G. Gelade. A feature-integration theory of attention. Cogn. Psychol., 1980, 12(1): 97-136.
[6] J. M. Wolfe, K. R. Cave, and S. L. Franzel. Guided search: An alternative to the feature integration model for visual search. J. Exp. Psychol. Hum. Percept. Perform., 1989, 15(3): 419-433.
[7] C. Koch and S. Ullman. Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiol., 1985: 219-227.
[8] L. Itti, C. Koch and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. PAMI, 1998, 20(11): 1254-1259.
[9] D. Walther and C. Koch. Modeling attention to salient proto-objects. Neural Networks, 2006, 19(9): 1395-1407.
[10] N. D. B. Bruce and J. K. Tsotsos. Saliency based on information maximization. In Proc. Advances in Neural Information Processing Systems, 2005: 155-162.
[11] J. Li, M. D. Levine, X. An, X. Xu, and H. He. Visual saliency based on scale-space analysis in the frequency domain. IEEE Trans. PAMI, 2007, PP(99).
[12] C. Kanan, M. H. Tong, L. Zhang, and G. W. Cottrell. SUN: Top-down saliency using natural statistics. Visual Cognition, 2009, 17(6/7): 979-1003.
[13] L. Itti and P. Baldi. Bayesian surprise attracts human attention. In Proc. Advances in Neural Information Processing Systems, 2009: 547-554.
[14] Y. Ying, B. Wang, and L. M. Zhang. Pulse discrete cosine transform for saliency-based attention. In Proc. IEEE ICDL'09, 2009: 1-6.
[15] X. Hou, J. Harel, and C. Koch. Image signature: highlighting sparse salient regions. IEEE Trans. PAMI, 2012, 34(1): 194-201.
[16] M. Qi and L. M. Zhang. Saliency-based image quality assessment criterion. In Proc. Advanced Intelligent Computing Theories and Applications, 2008: 1124-1133.
[17] C. Guo and L. M. Zhang. Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. In Proc. IEEE CVPR'08, 2008: 1-8.
[18] M. J. Wainwright and E. P. Simoncelli. Scale mixtures of Gaussians and the statistics of natural images. Adv. Neural Inf. Process. Syst., 2000, 12: 855-861.
[19] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli. Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Processing, 2003, 12(11): 1338-1351.
[20] Z. Wang and A. C. Bovik. Reduced- and no-reference image quality assessment. IEEE Signal Processing Magazine, 2011, 28(6): 29-40.
[21] A. K. Moorthy and A. C. Bovik. Statistics of natural image distortions. In Proc. IEEE ICASSP'10, 2010: 962-965.
[22] P. C. Mahalanobis. On the generalized distance in statistics. Proceedings of the National Institute of Sciences of India, 1936, 2(1): 49-55.
[23] E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger. Shiftable multiscale transforms. IEEE Trans. Information Theory, 1992, 38(2): 587-607.
[24] Z. Wang and A. C. Bovik. Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Processing Magazine, 2009, 26(1): 98-117.
[25] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Processing, 2004, 13(4): 600-612.
[26] H. R. Sheikh and A. C. Bovik. Image information and visual quality. IEEE Trans. Image Processing, 2006, 15(2): 430-444.
[27] H. R. Sheikh, M. F. Sabir, and A. C. Bovik. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Processing, 2006, 15(11): 3440-3451.
[28] H. R. Sheikh, Z. Wang, A. C. Bovik, and L. K. Cormack. Image and Video Quality Assessment Research at LIVE [Online]. Available: http://live.ece.utexas.edu/research/quality.

Claims (2)

1. A method for calculating an image saliency map, characterized in that a multiplier random variable in a natural scene Gaussian scale mixture statistical distribution is used to calculate the image saliency map, the method comprising the following specific steps:
(1) setting an image as a grayscale image $I$, with $R$, $C$ respectively the numbers of rows and columns of the image, and performing a wavelet transformation on it to obtain a plurality of wavelet coefficient sub-bands;
(2) within each wavelet coefficient sub-band, for each coefficient $c$, selecting a suitable wavelet coefficient neighborhood and stacking it into the wavelet coefficient neighborhood vector $c'$, where $M$ is the neighborhood size; according to the statistical properties of the natural scene statistics model, describing the wavelet coefficient neighborhood vector $c'$ of a natural image by a Gaussian scale mixture distribution, i.e. $c' = \sqrt{s}\,u$, where $s$ is a random multiplier characterizing the changes of the neighborhood vector covariance and $u$ is a zero-mean Gaussian random variable with covariance matrix $C_u$;
(3) calculating the maximum likelihood estimate of the Gaussian scale mixture multiplier variable:

$$\hat{s} = c'^T C_u^{-1} c' / M \qquad (1)$$

assuming $E\{s\} = 1$ and using the covariance matrix $C_c$ of $c'$ in place of $C_u$, obtaining:

$$\hat{s} = c'^T C_c^{-1} c' / M \qquad (2)$$

assuming that the wavelet coefficient neighborhood vector $c'$ is a feature sample, the set of all $c'$ forms the feature space of the image; $\hat{s}$ is proportional to the square of the Mahalanobis distance $d$ from $c'$ to the center of this feature space, namely:

$$\hat{s} \propto d^2 \qquad (3)$$

$\hat{s}$ is an effective description of the saliency of the sample in the feature space: the higher the saliency of the feature sample, the larger the corresponding $\hat{s}$ value; conversely, the less salient the feature sample, the smaller the $\hat{s}$ value;
(4) fusing the saliency corresponding to all wavelet coefficient sub-bands to obtain the complete natural scene statistical saliency map NSSSal model:

$$\mathrm{NSSSal} = g * \bigoplus_{l} \sum_{\theta} \hat{s}_{l,\theta} \qquad (4)$$

where $\sum_{\theta}$ denotes the superposition of the saliency descriptions in the different orientations $\theta$ at scale $l$; $\bigoplus$ denotes across-scale addition, in which the saliency descriptions at all scales are interpolated to $R \times C$ and then added; and $g$ is a Gaussian blur kernel used to apply a certain smoothing to the saliency map;
(5) adjusting the gray dynamic range of formula (4) to $[0, 1]$; the closer a value is to 1, the more salient the corresponding area of the image, and the more it attracts the human eye compared with places of smaller value;
(6) if the image is an RGB color image, calculating saliency maps for the grayscale channel $I$, the red-green antagonistic pair channel $RG$ and the yellow-blue antagonistic pair channel $BY$ of the image according to steps (1) to (5), the results being denoted NSSSal_I, NSSSal_RG and NSSSal_BY respectively, and taking the weighted average of the three as the saliency map of the color image, namely:

$$\mathrm{cNSSSal} = \omega_I\,\mathrm{NSSSal\_I} + \omega_{RG}\,\mathrm{NSSSal\_RG} + \omega_{BY}\,\mathrm{NSSSal\_BY} \qquad (5)$$

where $\omega_I$, $\omega_{RG}$ and $\omega_{BY}$ are respectively the weights of the 3 channels, with $\omega_I + \omega_{RG} + \omega_{BY} = 1$.
2. The method according to claim 1, characterized in that, in the saliency map calculated from an image, a higher pixel value corresponds to higher saliency of the image, and a lower pixel value corresponds to lower image saliency.
CN201310135762.8A 2013-04-19 2013-04-19 Method for calculating an image saliency map using natural scene statistics Expired - Fee Related CN103218815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310135762.8A CN103218815B (en) 2013-04-19 2013-04-19 Method for calculating an image saliency map using natural scene statistics


Publications (2)

Publication Number Publication Date
CN103218815A CN103218815A (en) 2013-07-24
CN103218815B true CN103218815B (en) 2016-03-30

Family

ID=48816558


Country Status (1)

Country Link
CN (1) CN103218815B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298341B (en) * 2019-06-12 2023-09-19 上海大学 Enhanced image significance prediction method based on direction selectivity
CN110503162A (en) * 2019-08-29 2019-11-26 广东工业大学 A kind of media information prevalence degree prediction technique, device and equipment


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101103378A (en) * 2005-01-10 2008-01-09 汤姆森许可贸易公司 Device and method for creating a saliency map of an image
CN102754126A (en) * 2010-02-12 2012-10-24 高等技术学校 Method and system for determining a quality measure for an image using multi-level decomposition of images
EP2461274A1 (en) * 2010-09-16 2012-06-06 Thomson Licensing Method and device of determining a saliency map for an image
CN102184557A (en) * 2011-06-17 2011-09-14 电子科技大学 Salient region detection method for complex scene

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Gaussian Mixture Modeling by Exploiting the Mahalanobis Distance; Dimitrios Ververidis et al.; IEEE Transactions on Signal Processing; July 2008; 56(7): 2797-2811 *
Information Content Weighting for Perceptual Image Quality Assessment; Zhou Wang et al.; IEEE Transactions on Image Processing; September 2010: 1185-1198 *
On Saliency, Affect and Focused Attention; Lori McCay-Peet et al.; Proceedings of the SIGCHI Conference on Human Factors in Computing Systems; May 2012: 541-550 *
图像感兴趣区域检测技术研究 (Research on image region-of-interest detection technology); 吕宝成 et al.; 中国科技信息 (China Science and Technology Information); July 2008(13): 44-45 *

Also Published As

Publication number Publication date
CN103218815A (en) 2013-07-24

Similar Documents

Publication Publication Date Title
Zhang et al. A feature-enriched completely blind image quality evaluator
Gu et al. No-reference quality metric of contrast-distorted images based on information maximization
Zhang et al. FSIM: A feature similarity index for image quality assessment
CN105744256B (en) Based on the significant objective evaluation method for quality of stereo images of collection of illustrative plates vision
Zhou et al. Binocular responses for no-reference 3D image quality assessment
Sim et al. MaD-DLS: mean and deviation of deep and local similarity for image quality assessment
KR101697634B1 (en) Image processing device, image processing method, program, print medium, and recording medium
Zhang et al. Kurtosis-based no-reference quality assessment of JPEG2000 images
CN104036501B (en) A kind of objective evaluation method for quality of stereo images based on rarefaction representation
CN109978854B (en) Screen content image quality evaluation method based on edge and structural features
CN103200421B (en) No-reference image quality evaluation method based on Curvelet transformation and phase coincidence
CN102333233A (en) Stereo image quality objective evaluation method based on visual perception
CN105208374A (en) Non-reference image quality objective evaluation method based on deep learning
CN102708567B (en) Visual perception-based three-dimensional image quality objective evaluation method
CN102521825B (en) Three-dimensional image quality objective evaluation method based on zero watermark
Liu et al. No-reference image quality assessment method based on visual parameters
Fukiage et al. Visibility-based blending for real-time applications
CN105118053A (en) All-reference-image-quality objective evaluation method based on compressed sensing
CN103258326B (en) A kind of information fidelity method of image quality blind evaluation
CN109429051A (en) Based on multiple view feature learning without reference stereoscopic video quality method for objectively evaluating
Zhai et al. Image quality assessment metrics based on multi-scale edge presentation
Liu et al. An effective wavelet-based scheme for multi-focus image fusion
CN103218815B (en) Utilize the method for natural scene statistical computation image saliency map
CN103841411B (en) A kind of stereo image quality evaluation method based on binocular information processing
CN103369348B (en) Three-dimensional image quality objective evaluation method based on regional importance classification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190704

Address after: Room 1103, Building 21, 39 Jibang Road, Zhongming Town, Shanghai 202163

Patentee after: SHANGHAI JILIAN NETWORK TECHNOLOGY Co.,Ltd.

Address before: 200433 No. 220, Handan Road, Shanghai, Yangpu District

Patentee before: Fudan University

TR01 Transfer of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160330

CF01 Termination of patent right due to non-payment of annual fee