CN113361599B - Video time domain saliency measurement method based on perception characteristic parameter measurement - Google Patents

Video time domain saliency measurement method based on perception characteristic parameter measurement

Info

Publication number
CN113361599B
CN113361599B
Authority
CN
China
Prior art keywords
motion
time domain
perception
video
measurement
Prior art date
Legal status
Active
Application number
CN202110625964.5A
Other languages
Chinese (zh)
Other versions
CN113361599A (en)
Inventor
邢亚芬
殷海兵
陈勇
殷俊
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110625964.5A
Publication of CN113361599A
Application granted
Publication of CN113361599B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of video processing and machine vision, and discloses a video time domain saliency measurement method based on perception characteristic parameter measurement, comprising the following steps: step 1: extracting video time domain motion information; step 2: measuring and fusing the perception characteristic parameters. The invention considers five parameters that affect the HVS time domain perception characteristics in video, analyzes their mechanisms of action, and provides corresponding probability density functions, so that the perceptual saliency and uncertainty caused by these parameters can be quantitatively measured. The perceptual-information-theoretic measurement of the parameters maps them onto a uniform scale and solves the problem that heterogeneous characteristic parameters are difficult to fuse. The resulting time domain visual perception saliency measurement method achieves the expected effect.

Description

Video time domain saliency measurement method based on perception characteristic parameter measurement
Technical Field
The invention belongs to the technical field of video processing and machine vision, and particularly relates to a video time domain saliency measurement method based on perception characteristic parameter measurement.
Background
Video saliency detection aims at locating the region of a given video sequence that is most attractive to the human eye, and is widely used in behavior detection, object tracking, video compression, video quality assessment, and the like. In recent years, image saliency detection algorithms have become relatively mature, while video saliency detection remains a comparatively new research direction. Unlike image saliency detection, which is performed only in the spatial domain, video contains a large amount of motion information, which increases the difficulty of video saliency detection research. Therefore, how to correctly extract and effectively use time domain information has become a new research trend in the field of video saliency detection.
Early video saliency detection algorithms typically marked moving foreground objects as salient regions. More recent detection methods combine spatial and temporal information. Itti et al. use a Bayesian model to calculate the difference between the posterior and prior information of the observer, measuring the "surprise" of an event to predict video saliency. Ma et al. integrate a top-down mechanism into a classical bottom-up saliency detection model. Zhai et al. linearly combine the temporal and spatial saliency maps to obtain the saliency map of a video frame. Guo et al. use two color channels, one luminance channel, and one motion channel to predict video saliency. Fang et al. propose a new algorithm that detects visual saliency by combining spatio-temporal information with the uncertainty of the measured data, fusing spatio-temporal saliency with uncertainty weights. Bak et al. propose a spatio-temporal saliency network and study different fusion mechanisms to integrate spatial and temporal information. Jang et al. add a two-layer convolutional long short-term memory network (2C-LSTM) on top of their previous algorithm, using features extracted by OM-CNN as input. Wang et al. propose a novel video saliency model (ACL) that enables fast, end-to-end saliency learning. Zhang et al. propose a spatial saliency algorithm based on convolutional neural networks and a temporal saliency algorithm based on motion vectors.
The main challenge of video saliency detection is how to fully extract the motion information in a video and effectively fuse its perceptual saliency and uncertainty. Building on Stocker's research on human visual speed perception, the present method obtains the visual-perception prior probability distributions and likelihood functions of the characteristic parameters from psychovisual experiments, so that motion information can be quantified in a perceptually meaningful way and the perceptual saliency and perceptual uncertainty of various kinds of motion information can be fused more effectively.
Disclosure of Invention
The invention aims to provide a video time domain saliency measurement method based on perception characteristic parameter measurement, in order to solve the technical problem of how to fully extract the motion information in a video and effectively integrate its perceptual saliency and uncertainty.
In order to solve the technical problems, the specific technical scheme of the video time domain saliency measurement method based on the perception characteristic parameter measurement is as follows:
A method for measuring video time domain saliency based on perception characteristic parameter measurement, comprising the following steps:
step 1: extracting video time domain motion information: five main characteristic parameters affecting human visual attention are extracted: target relative motion, background motion, time domain duration, residual fluctuation intensity on the motion trajectory, and the adjacent inter-frame prediction residual; the saliency and uncertainty of these five perception characteristic parameters are then adjusted according to the size of the moving object, its motion direction, and the degree of spatio-temporal motion regularity.
Step 2: measurement and fusion of the perception characteristic parameters: perceptual saliency and perceptual uncertainty are measured with self-information and information entropy, and the time domain visual perception saliency is obtained after fusion.
Further, the step 2 comprises the following specific steps:
step 2.1: sensing the significance by using the self-information measurement, and sensing the uncertainty by using the information entropy measurement;
step 2.2: and fusing five perception characteristic parameters of the same dimension to obtain the time domain perception significance.
Further, the step 2.1 comprises the following specific steps:
step 2.1.1: relative motion homogenization;
step 2.1.2: background motion homogenization;
step 2.1.3: time domain duration homogenization;
step 2.1.4: homogenizing residual fluctuation intensity;
step 2.1.5: inter prediction residual homogenization.
Further, the specific steps of step 2.1.1 are as follows:
the prior probability distribution of relative motion is represented by a power function:
wherein v is r For relative movement speed, i.e. relative movement vector length, beta 1 Is a constant greater than 0; model parameter alpha 1 Modeling is a function related to the area of motion:
α 1 =k 1 e s(x)
wherein s (x) is the area of the moving object, k 1 A constant greater than zero; thus, a priori summaries are knownThe rate distribution, the visual perception significance of relative motion is measured using self-information:
I(v r )=-log 2 p(v r )=α 1 log 2 v r -log 2 β 1
A saliency adjustment factor h(θ) based on the motion direction is introduced:

where θ is the direction angle of the relative motion vector and the remaining parameter is an amplitude adjustment factor; thus, the visual perception saliency of relative motion is calculated as:

I(v_r) = I(v) × h(θ)
further, the specific steps of step 2.1.2 are as follows:
the likelihood function is used to represent the equivalent noise due to the video background motion:
wherein v is g For video background motion, m 1 To stimulate v g Equivalent noise generated; width parameter sigma of gaussian curve 1 Inversely proportional to the contrast threshold c:
parameter lambda 1 、γ 1 Is a constant greater than 0;
let v a (t, i, j) is the size of the motion vector of the current pixel point of the t frame, and the irregularity Deltav of the size of the airspace motion vector is defined in a certain area range omega (i, j) s Standard deviation for all motion vectors in this range; adding Deltav s Adjusting background movementThe visual perception uncertainty, using the information entropy metric formula, is as follows:
wherein the method comprises the steps ofIs constant ρ 1 To adjust the control parameters.
Further, the specific steps of step 2.1.3 are as follows:
the relationship of time domain duration τ to visual saliency of the human eye can be described using a sigmoid function:
wherein a and b are both constant adjustment factors;
measuring whether the target motion is regular or not by utilizing the change quantity of the target motion direction in the time domain duration, and combining the target motion vector size v a To adjust the significance of the time domain duration, modeling the probability density function as:
wherein the parameter alpha 2 、β 2 A constant greater than zero;is the change amount of the target movement direction in unit time, delta theta t The calculation method comprises the following steps:
θ (t, i, j) is the motion vector direction angle of the current pixel (i, j), θ (t-1, p, q) is the motion vector angle of its best matching position (p, q) in the t-1 frame; using k (t-1, p, q) to describe whether a motion matching point exists in the t-1 frame, if the matching point exists, k (t-1, p, q) =1, otherwise 0; therefore, after the two parameters are used for adjustment, the visual perception significance calculation formula of the time domain duration is as follows:
further, the specific steps of step 2.1.4 are as follows:
The probability density function is modeled as a lognormal distribution:

where δ is the stimulus, i.e. the residual fluctuation intensity, m_2 is the added equivalent noise, and the width parameter σ_2 is a constant with respect to the residual fluctuation intensity δ but is related to the temporal motion-vector irregularity Δv_t as follows:

where λ_2 and γ_2 are constants greater than 0; the temporal motion-vector irregularity Δv_t is defined as:

The temporal motion-vector magnitude irregularity is therefore added to adjust the visual perception uncertainty caused by the residual fluctuation intensity, using the following information entropy measure:

where the remaining parameter is a constant.
Further, the specific steps of step 2.1.5 are as follows:
The probability density function is modeled as a lognormal distribution:

where e is the stimulus source, i.e. the adjacent inter-frame prediction residual, calculated as follows:

where f(t,i,j) and f(t-1,i,j) are the corresponding luminance values in frames t and t-1, g(t,i,j) and g(t-1,i,j) are the corresponding average background luminance values, and m_3 is the added equivalent noise; the width parameter σ_3 is a constant with respect to e but is influenced by the luminance adaptation threshold LA; LA represents the sensitivity of the human eye to different background luminance, and the relationship between the curve width σ_3 and the luminance adaptation threshold LA is expressed as:

where λ_3 and γ_3 are constants greater than 0; thus, the visual perception uncertainty caused by the adjacent inter-frame prediction residual can be measured with information entropy:

where the remaining parameter is a constant.
Further, the step 2.2 comprises the following specific steps:
The perceptual saliencies and perceptual uncertainties are fused separately, and the time domain perceptual saliency a_t is calculated as follows:

U_t = (U(v_g) + U(e) + U(δ)) - 0.3·min(U(v_g), U(e), U(δ))

I_t = I(v_r) + I(τ) - 0.3·min(I(v_r), I(τ))
where parameter μ controls the slope of the curve, μ=0.6.
The video time domain significance measurement method based on the perception characteristic parameter measurement has the following advantages:
1. The invention considers five parameters that affect the HVS time domain perception characteristics in video, analyzes their mechanisms of action, and provides corresponding probability density functions, so that the perceptual saliency and uncertainty caused by these parameters can be quantitatively measured.
2. The proposed perceptual-information-theoretic measurement of the parameters maps them onto a uniform scale and solves the problem that heterogeneous characteristic parameters are difficult to fuse.
3. The proposed time domain visual perception saliency measurement method achieves the expected effect.
Drawings
FIG. 1 is a schematic diagram of the calculation of time-domain visual perception characteristic parameters according to the present invention;
FIG. 2 is an original video frame used to illustrate the video temporal saliency metric method based on perceptual feature parameter metrics of the present invention;
FIG. 3 is the corresponding saliency map produced by the video temporal saliency metric method based on perceptual feature parameter metrics;
fig. 4 is an overall block diagram of the temporal saliency metric of the present invention.
Detailed Description
For a better understanding of the objects, structures and functions of the present invention, a method for measuring video temporal saliency based on a perceptual feature parameter measurement is described in further detail below with reference to the accompanying drawings.
Measuring the visual perception saliency of a video in the time domain requires considering both visual attention and the influence of visual masking effects. Visual masking refers to a decrease in the ability of the human eye to respond to stimuli in a spatially or temporally complex background. Masking effects depend strongly on the randomness of the image or video content. Regular images or videos usually contain predictable content; when a stimulus appears, it conflicts with the subjective expectation and can be distinguished from its surroundings, producing a visually salient effect. In a random background, the content is unpredictable, so changes within it are less noticeable. Motion information in video, as an important video feature, is highly relevant to both visual attention and visual masking. Regular and steady motion tends to attract the attention of the human eye, whereas a large amount of random motion consumes perceptual energy, lowers the eye's resolving ability, and reduces the visual perception saliency. Because the masking effect depends strongly on structure and motion regularity in the spatial and temporal directions, the spatial motion regularity of the video is measured by the consistency of motion vector magnitudes within the current frame, the temporal motion regularity is measured by the consistency of motion vector direction and magnitude along the motion trajectory, and the prediction residual between the current frame and the next frame is taken as the temporal randomness used to measure the masking effect.
In order to measure the saliency and uncertainty of the video perception characteristic parameters and to effectively integrate the influence of the various heterogeneous perceptual parameters on visual perception saliency, the method works from the viewpoint of information theory: perceptual saliency is measured with self-information and perceptual uncertainty is measured with information entropy. The saliency of relative motion is adjusted by the size and motion direction of the moving object, and the uncertainty of background motion is adjusted by the inconsistency of spatial motion vector magnitudes. In addition, the saliency of the time domain duration is influenced by the amount of change of the motion vector direction along the motion trajectory, the uncertainty of the prediction residual along the temporal trajectory is influenced by the amount of change of the motion vector magnitude along the trajectory, and the temporal uncertainty of the prediction residual between adjacent frames is measured with information entropy. The calculation of the time domain visual perception characteristic parameters is shown in Fig. 1.
(1) Extracting video time domain motion information:
the method extracts five main characteristic parameters affecting the visual attention of human eyes: the method comprises the steps of relative movement of a target, background movement, time domain duration, residual fluctuation intensity on a movement track, prediction residual between adjacent frames and the like, and adjusting the significance of the relative movement by utilizing the size and the movement direction of the moving target and adjusting the uncertainty of the background movement by utilizing the irregularity of the size of a space domain movement vector in consideration of the influence of parameters such as the size, the movement direction, the space domain movement regularity and the like of the moving target. In addition, the significance of the temporal duration is affected by irregularities in the direction of the motion vector on the motion trajectory, and the uncertainty of the prediction residual on the motion trajectory is affected by irregularities in the size of the temporal motion vector on the trajectory.
(2) Measurement and fusion of perception characteristic parameters:
the method considers various time domain heterogeneous perception characteristic parameters affecting visual attention, measures the perception significance and the perception uncertainty by using self-information and information entropy, and obtains the time domain visual perception significance after fusion, and the method comprises the following steps:
step one: the self-information measure is utilized to sense the significance, and the information entropy measure is utilized to sense the uncertainty.
1. Relative motion homogenization
The larger the relative motion of a moving object and the larger its motion area, the more it attracts the eye, and the larger the corresponding visual perception saliency. Statistical analysis gives the prior probability distribution of relative motion, which can be approximately represented by a power function:

p(v_r) = β_1 · v_r^(-α_1)    (2)

where v_r is the relative motion speed, i.e. the length of the relative motion vector (the characteristic parameter extraction is shown in Fig. 1), and β_1 is a constant greater than 0. Since the visual perception saliency of relative motion should also be related to the area of the moving object, the model parameter α_1 is modeled as a function of the moving-object area:

α_1 = k_1 · e^(s(x))    (3)

where s(x) is the area of the moving object and k_1 is a constant greater than zero. Thus, given the prior probability distribution, the visual perception saliency of relative motion can be measured using self-information:

I(v_r) = -log_2 p(v_r) = α_1·log_2 v_r - log_2 β_1    (4)
In addition, human visual sensitivity is directional: the eye is relatively sensitive to horizontal and vertical components and insensitive to diagonal directions, i.e. the oblique effect. Based on this analysis, and considering the influence of motion direction on visual saliency, a saliency adjustment factor based on the motion direction is proposed:

where θ is the direction angle of the relative motion vector and the remaining parameter is an amplitude adjustment factor. Thus, the visual perception saliency of relative motion is calculated as:

I(v_r) = I(v) × h(θ)    (6)

The visual perception saliency computed in this way increases with the relative motion speed and the area of the moving object, and grows as the motion vector direction angle approaches the horizontal or vertical direction, which is consistent with human visual perception. A code sketch of this measure is given below.
2. Background motion homogenization
Large-scale random background motion consumes the perceptual energy of the human eye and reduces its ability to resolve fine distortions in the video. This suppression effect can be regarded as the visual perception uncertainty caused by background motion, equivalent to adding noise when the eye observes video details. This equivalent noise can be represented by a likelihood function:

where v_g is the video background motion and m_1 is the equivalent noise generated by the stimulus v_g. The width parameter σ_1 of the Gaussian curve is inversely proportional to the contrast threshold c:

where the parameters λ_1 and γ_1 are constants greater than 0.

Inconsistency in the magnitudes of neighboring pixels' motion vectors can degrade the perceived quality of the video, similar to a dithering artifact. Let v_a(t,i,j) be the motion vector magnitude at the current pixel of frame t; within a neighborhood Ω(i,j), the spatial motion-vector magnitude irregularity Δv_s is defined as the standard deviation of all motion vectors in that neighborhood. Δv_s is therefore added to adjust the visual perception uncertainty caused by background motion, using the following information entropy measure:

where the remaining parameter is a constant and ρ_1 is an adjustment control parameter. The visual perception uncertainty computed by this formula is consistent with subjective perception. On the one hand, subtle distortions in a moving video are harder to perceive than in a still image, so the uncertainty increases with increasing background motion; on the other hand, the larger the contrast threshold, the more clearly the eye sees and the more noticeable the distortion, so the uncertainty decreases as the contrast threshold increases. In addition, larger spatial motion-vector magnitude irregularity produces stronger spatial masking, and the visual perception uncertainty is correspondingly larger. A sketch of the ingredients of this computation is given below.
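The entropy formula itself is not reproduced above, so the sketch below only illustrates the ingredients: the spatial irregularity Δv_s as a local standard deviation of motion-vector magnitudes, a σ_1 that decreases with the contrast threshold, and a Gaussian-entropy-style term. The specific way these are combined (and the role of ρ_1) is an assumption of this sketch, not the patent's formula.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def background_motion_uncertainty(v_a, v_g, contrast_thr,
                                  lam1=1.0, gam1=0.5, rho1=0.5, win=7):
    """Sketch of U(v_g): ingredients only; the exact combination is assumed.

    v_a          : per-pixel motion-vector magnitude map of the current frame
    v_g          : background motion magnitude (scalar)
    contrast_thr : contrast threshold c
    """
    # sigma_1 inversely related to the contrast threshold c (assumed power form)
    sigma1 = lam1 / (contrast_thr ** gam1 + 1e-6)

    # Spatial irregularity dv_s: local std of |MV| within a win x win window Omega
    mean = uniform_filter(v_a, size=win)
    mean_sq = uniform_filter(v_a ** 2, size=win)
    dv_s = np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))

    # Gaussian-entropy-style base term, assumed to grow with background motion,
    # plus an irregularity-driven adjustment controlled by rho_1 (assumed form)
    base = 0.5 * np.log2(2 * np.pi * np.e * sigma1 ** 2) * (1.0 + v_g)
    return base + rho1 * np.log2(1.0 + dv_s)
```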
3. Time domain duration homogenization
The human visual system exhibits a recency effect and asymmetric perception. Regions of a video that persist longer leave stronger short-term memory traces in the brain, and the perceptual sensitivity to such image content is higher. The relationship between the time domain duration τ and the visual saliency of the human eye can be described by a sigmoid function:

where a and b are constant adjustment factors. In addition, for the same time domain duration, observing a moving object tends to consume more energy than observing a stationary one, so a stationary region attracts the eye more strongly when distortion fluctuations occur there. Subjective experiments show that distortion is more easily hidden within a large amount of random motion and harder to hide in regular, steady motion. Whether the target motion is regular is therefore measured by the amount of change in the target motion direction over the time domain duration, which is combined with the target motion vector magnitude v_a to adjust the saliency of the time domain duration; the probability density function is modeled as:

where the parameters α_2 and β_2 are constants greater than zero, and the remaining parameter denotes the amount of change of the target motion direction per unit time; Δθ_t is calculated as follows:

where θ(t,i,j) is the motion vector direction angle of the current pixel (i,j) and θ(t-1,p,q) is the motion vector angle at its best-matching position (p,q) in frame t-1. Since the object may be occluded or exposed during motion, κ(t-1,p,q) is used to indicate whether a motion matching point exists in frame t-1: κ(t-1,p,q) = 1 if a match exists, 0 otherwise. After adjustment with these two parameters, the visual perception saliency of the time domain duration is calculated as:

The visual perception saliency calculated by this formula increases with duration up to a time threshold, beyond which it tends to saturate. Furthermore, the larger the absolute motion vector v_a and the larger the irregularity of the temporal motion direction per unit time, the smaller the visual perception saliency, which conforms to HVS characteristics. A sketch of this behaviour is given below.
4. Residual fluctuation intensity homogenization
Frequent changes of pixel values over time along the motion trajectory appear as flicker, mosquito noise, and similar artifacts, which attract the eye's attention and annoy the viewer; they constitute a temporal uncertainty source during video playback. This uncertainty is equivalent to adding equivalent noise when viewing the video and can be represented by a likelihood function. Taking the time domain HVS characteristics into account, the probability density function is modeled as a lognormal distribution:

where δ is the stimulus, i.e. the residual fluctuation intensity, m_2 is the added equivalent noise, and the width parameter σ_2 can be regarded as a constant with respect to the residual fluctuation intensity δ but is related to the temporal motion-vector irregularity Δv_t as follows:

where λ_2 and γ_2 are constants greater than 0; the temporal motion-vector irregularity Δv_t is defined as:

The temporal motion-vector magnitude irregularity is therefore added to adjust the visual perception uncertainty caused by the residual fluctuation intensity, using the following information entropy measure:

where the remaining parameter is a constant. The visual perception uncertainty in Equation (17) is jointly affected by the residual fluctuation intensity and the temporal motion irregularity: it increases as the residual fluctuation intensity and the temporal motion-vector magnitude irregularity increase, which is consistent with human visual characteristics. A sketch of the trajectory statistics is given below.
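A sketch of the trajectory statistics feeding Equation (17): the residual fluctuation intensity δ is taken as the standard deviation of prediction residuals along a motion trajectory and Δv_t as the standard deviation of motion-vector magnitudes along the same trajectory. The entropy-like combination below (increasing in both δ and Δv_t) is an assumption consistent with the description, not the patent's exact formula.

```python
import numpy as np

def residual_fluctuation_uncertainty(residual_track, mv_mag_track,
                                     lam2=1.0, gam2=0.5):
    """Sketch of U(delta) for one motion trajectory.

    residual_track : prediction residuals of the tracked pixel over past frames
    mv_mag_track   : motion-vector magnitudes over the same trajectory
    """
    delta = np.std(residual_track)              # residual fluctuation intensity
    dv_t = np.std(mv_mag_track)                 # temporal MV-magnitude irregularity
    sigma2 = lam2 * (1.0 + dv_t) ** gam2        # width grows with irregularity (assumed)
    # Log-normal-entropy-style term plus a stimulus-strength term (assumed form)
    return 0.5 * np.log2(2 * np.pi * np.e * sigma2 ** 2) + np.log2(1.0 + delta)
```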
5. Inter prediction residual homogenization
Bayesian brain theory indicates that, for the current input image, the human brain automatically predicts the subsequent video frame in order to perceive the input scene. The error between the input image and the image predicted in the brain is the unpredictable part, i.e. the uncertainty of the temporal component. This uncertainty is equivalent to adding equivalent noise when viewing the video and can be represented by a likelihood function; taking the time domain HVS characteristics into account, it is modeled as a lognormal distribution:

where e is the stimulus source, i.e. the adjacent inter-frame prediction residual, calculated as follows:

where f(t,i,j) and f(t-1,i,j) are the corresponding luminance values in frames t and t-1, g(t,i,j) and g(t-1,i,j) are the corresponding average background luminance values, and m_3 is the added equivalent noise. The width parameter σ_3 can be regarded as a constant with respect to e but is influenced by the luminance adaptation threshold LA. LA represents the sensitivity of the human eye to different background luminance; subjective experiments indicate that the HVS is insensitive to regions of very high or very low background luminance and highly sensitive to regions of medium luminance. The larger LA is, the larger the eye's perception threshold for luminance changes. Accordingly, the relationship between the curve width σ_3 and the luminance adaptation threshold LA is expressed as:

where λ_3 and γ_3 are constants greater than 0. Thus, the visual perception uncertainty caused by the adjacent inter-frame prediction residual can be measured with information entropy:

where the remaining parameter is a constant. The visual perception uncertainty in Equation (21) is jointly affected by the adjacent inter-frame prediction residual and the luminance adaptation threshold: it increases as the prediction residual and the luminance adaptation threshold increase, which is consistent with human visual characteristics. A sketch of this computation is given below.
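A sketch of U(e), assuming the residual e is built from the luminance and mean-background-luminance frame differences as described, and using a classical luminance-adaptation curve (Chou-Li style JND) as a stand-in for the patent's LA threshold; the σ_3 relation and the entropy-style combination are assumptions of this sketch.

```python
import cv2
import numpy as np

def luminance_adaptation(bg_lum):
    """Classical luminance-adaptation threshold (Chou-Li style), used here as a
    stand-in for the patent's LA; the patent's own curve may differ."""
    return np.where(bg_lum <= 127,
                    17.0 * (1.0 - np.sqrt(bg_lum / 127.0)) + 3.0,
                    3.0 / 128.0 * (bg_lum - 127.0) + 3.0)

def inter_frame_residual_uncertainty(cur, prev, lam3=1.0, gam3=0.5, ksize=5):
    """Sketch of U(e): residual from luminance and mean-background-luminance
    differences, with sigma_3 tied to LA (exact forms are assumptions)."""
    f_cur, f_prev = cur.astype(np.float32), prev.astype(np.float32)
    g_cur = cv2.blur(f_cur, (ksize, ksize))          # average background luminance
    g_prev = cv2.blur(f_prev, (ksize, ksize))
    e = np.abs((f_cur - f_prev) - (g_cur - g_prev))  # adjacent inter-frame residual

    la = luminance_adaptation(g_cur)
    sigma3 = lam3 * (1.0 + la) ** gam3               # width grows with LA (assumed)
    return 0.5 * np.log2(2 * np.pi * np.e * sigma3 ** 2) + np.log2(1.0 + e)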
Step two: and fusing five perception characteristic parameters of the same dimension to obtain the time domain perception significance.
The visual perception saliency of the relative motion and time domain duration parameters, and the perceptual uncertainty of the background motion, residual fluctuation intensity, and inter-frame prediction residual, have now been computed. The perceptual saliencies and perceptual uncertainties are fused separately, and the time domain perceptual saliency a_t is calculated as follows:

U_t = (U(v_g) + U(e) + U(δ)) - 0.3·min(U(v_g), U(e), U(δ))

I_t = I(v_r) + I(τ) - 0.3·min(I(v_r), I(τ))
where parameter μ controls the slope of the curve, μ=0.6 according to subjective experiments.
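The U_t and I_t combinations below follow the formulas above; the final mapping of (I_t, U_t) to a_t is not reproduced in this text, so the sigmoid with slope μ = 0.6 in this sketch is an assumption consistent with the statement that μ controls the slope of the curve, and the [0, 1] normalization mirrors the visualization step described next.

```python
import numpy as np

def fuse_temporal_saliency(I_vr, I_tau, U_vg, U_e, U_delta, mu=0.6):
    """Fuse per-parameter saliency/uncertainty maps into the temporal saliency a_t.

    U_t and I_t follow the patent's fusion formulas; the sigmoid mapping of
    (I_t, U_t) to a_t is an assumption (mu controls its slope).
    """
    U_t = (U_vg + U_e + U_delta) - 0.3 * np.minimum(np.minimum(U_vg, U_e), U_delta)
    I_t = I_vr + I_tau - 0.3 * np.minimum(I_vr, I_tau)
    a_t = 1.0 / (1.0 + np.exp(-mu * (I_t - U_t)))    # assumed slope-mu sigmoid
    # Map to [0, 1] for visualization, as done for Figs. 2-3
    return (a_t - a_t.min()) / (a_t.max() - a_t.min() + 1e-12)
```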
For visualization, a_t is mapped to [0, 1]. Figs. 2 and 3 show an original image of the sixth frame of the RaceHorses sequence and its time domain perceptual saliency map a_t; in Fig. 3, brighter regions indicate greater visual perception saliency. An overall block diagram of the temporal saliency metric is shown in Fig. 4.
It will be understood that the invention has been described in terms of several embodiments, and that various changes and equivalents may be made to these features and embodiments by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (6)

1. The method for measuring the video time domain saliency based on the measurement of the perception characteristic parameters is characterized by comprising the following steps:
step 1: extracting video time domain motion information: five main characteristic parameters affecting human visual attention are extracted: target relative motion, background motion, time domain duration, residual fluctuation intensity on the motion trajectory, and the adjacent inter-frame prediction residual; the saliency and uncertainty of these five perception characteristic parameters are adjusted according to the size of the moving object, its motion direction, and the degree of spatio-temporal motion regularity;
step 2: measurement and fusion of the perception characteristic parameters: measuring perceptual saliency and perceptual uncertainty with self-information and information entropy, and obtaining the time domain visual perception saliency after fusion;
step 2.1: sensing the significance by using the self-information measurement, and sensing the uncertainty by using the information entropy measurement;
step 2.1.1: relative motion homogenization;
step 2.1.2: background motion homogenization;
step 2.1.3: time domain duration homogenization;
the relationship between the time domain duration τ and the visual saliency of the human eye can be described by a sigmoid function:

where a and b are constant adjustment factors;

whether the target motion is regular is measured by the amount of change in the target motion direction over the time domain duration, which is combined with the target motion vector magnitude v_a to adjust the saliency of the time domain duration; the probability density function is modeled as:

where the parameters α_2 and β_2 are constants greater than zero, and the remaining parameter denotes the amount of change of the target motion direction per unit time; Δθ_t is calculated as follows:

where θ(t,i,j) is the motion vector direction angle of the current pixel (i,j) and θ(t-1,p,q) is the motion vector angle at its best-matching position (p,q) in frame t-1; κ(t-1,p,q) indicates whether a motion matching point exists in frame t-1: κ(t-1,p,q) = 1 if a match exists, 0 otherwise; after adjustment with these two parameters, the visual perception saliency of the time domain duration is calculated as:
step 2.1.4: homogenizing residual fluctuation intensity;
step 2.1.5: homogenizing an inter-frame prediction residual;
step 2.2: and fusing five perception characteristic parameters of the same dimension to obtain the time domain perception significance.
2. The method of video temporal saliency measurement based on perceptual feature parameter measurement of claim 1, wherein step 2.1.1 comprises the specific steps of:
the prior probability distribution of relative motion is represented by a power function:

p(v_r) = β_1 · v_r^(-α_1)

where v_r is the relative motion speed, i.e. the length of the relative motion vector, and β_1 is a constant greater than 0; the model parameter α_1 is modeled as a function of the moving-object area:

α_1 = k_1 · e^(s(x))

where s(x) is the area of the moving object and k_1 is a constant greater than zero; thus, with the prior probability distribution known, the visual perception saliency of relative motion is measured using self-information:

I(v_r) = -log_2 p(v_r) = α_1·log_2 v_r - log_2 β_1
a saliency adjustment factor h(θ) based on the motion direction is introduced:

where θ is the direction angle of the relative motion vector and the remaining parameter is an amplitude adjustment factor; thus, the visual perception saliency of relative motion is calculated as:

I(v_r) = I(v) × h(θ).
3. the method of video temporal saliency measurement based on perceptual feature parameter measurement of claim 1, wherein step 2.1.2 comprises the specific steps of:
a likelihood function is used to represent the equivalent noise caused by video background motion:

where v_g is the video background motion and m_1 is the equivalent noise generated by the stimulus v_g; the width parameter σ_1 of the Gaussian curve is inversely proportional to the contrast threshold c:

where the parameters λ_1 and γ_1 are constants greater than 0;

let v_a(t,i,j) be the motion vector magnitude at the current pixel of frame t; within a neighborhood Ω(i,j), the spatial motion-vector magnitude irregularity Δv_s is defined as the standard deviation of all motion vectors in that neighborhood; Δv_s is added to adjust the visual perception uncertainty caused by background motion, using the following information entropy measure:

where the remaining parameter is a constant and ρ_1 is an adjustment control parameter.
4. The method of video temporal saliency measurement based on perceptual feature parameter measurement of claim 1, wherein the specific steps of step 2.1.4 are as follows:
the probability density function is modeled as a lognormal distribution:

where δ is the stimulus, i.e. the residual fluctuation intensity, m_2 is the added equivalent noise, and the width parameter σ_2 is a constant with respect to the residual fluctuation intensity δ but is related to the temporal motion-vector irregularity Δv_t as follows:

where λ_2 and γ_2 are constants greater than 0; the temporal motion-vector irregularity Δv_t is defined as:

therefore, the temporal motion-vector magnitude irregularity is added to adjust the visual perception uncertainty caused by the residual fluctuation intensity, using the following information entropy measure:

where the remaining parameter is a constant.
5. The method of video temporal saliency measurement based on perceptual feature parameter measurement of claim 1, wherein the specific steps of step 2.1.5 are as follows:
the probability density function is modeled as a lognormal distribution:

where e is the stimulus source, i.e. the adjacent inter-frame prediction residual, calculated as follows:

where f(t,i,j) and f(t-1,i,j) are the corresponding luminance values in frames t and t-1, g(t,i,j) and g(t-1,i,j) are the corresponding average background luminance values, and m_3 is the added equivalent noise; the width parameter σ_3 is a constant with respect to e but is influenced by the luminance adaptation threshold LA; LA represents the sensitivity of the human eye to different background luminance, and the relationship between the curve width σ_3 and the luminance adaptation threshold LA is expressed as:

where λ_3 and γ_3 are constants greater than 0; thus, the visual perception uncertainty caused by the adjacent inter-frame prediction residual can be measured with information entropy:

where the remaining parameter is a constant.
6. The method of video temporal saliency measurement based on perceptual feature parameter measurement of claim 1, wherein step 2.2 comprises the specific steps of:
the perceptual saliencies and perceptual uncertainties are fused separately, and the time domain perceptual saliency a_t is calculated as follows:

U_t = (U(v_g) + U(e) + U(δ)) - 0.3·min(U(v_g), U(e), U(δ))

I_t = I(v_r) + I(τ) - 0.3·min(I(v_r), I(τ))
where parameter μ controls the slope of the curve, μ=0.6.
CN202110625964.5A 2021-06-04 2021-06-04 Video time domain saliency measurement method based on perception characteristic parameter measurement Active CN113361599B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110625964.5A CN113361599B (en) 2021-06-04 2021-06-04 Video time domain saliency measurement method based on perception characteristic parameter measurement


Publications (2)

Publication Number Publication Date
CN113361599A CN113361599A (en) 2021-09-07
CN113361599B true CN113361599B (en) 2024-04-05

Family

ID=77532366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110625964.5A Active CN113361599B (en) 2021-06-04 2021-06-04 Video time domain saliency measurement method based on perception characteristic parameter measurement

Country Status (1)

Country Link
CN (1) CN113361599B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108965879A (en) * 2018-08-31 2018-12-07 杭州电子科技大学 A kind of Space-time domain adaptively just perceives the measure of distortion
CN112825557A (en) * 2019-11-20 2021-05-21 北京大学 Self-adaptive sensing time-space domain quantization method aiming at video coding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104539962B (en) * 2015-01-20 2017-12-01 北京工业大学 It is a kind of merge visually-perceptible feature can scalable video coding method


Also Published As

Publication number Publication date
CN113361599A (en) 2021-09-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant