CN111784694A - No-reference video quality evaluation method based on visual attention mechanism

No-reference video quality evaluation method based on visual attention mechanism

Info

Publication number
CN111784694A
CN111784694A
Authority
CN
China
Prior art keywords
video
optical flow
flow field
visual attention
attention mechanism
Prior art date
Legal status
Granted
Application number
CN202010841520.0A
Other languages
Chinese (zh)
Other versions
CN111784694B (en)
Inventor
Shi Ping
Hou Ming
Pan Da
Current Assignee
Communication University of China
Original Assignee
Communication University of China
Priority date
Filing date
Publication date
Application filed by Communication University of China
Priority to CN202010841520.0A
Publication of CN111784694A
Application granted
Publication of CN111784694B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention discloses a no-reference video quality evaluation method based on a visual attention mechanism. The method exploits how human eyes perceive distorted video: motion information in a video attracts visual attention, so viewers focus more on moving regions, which in turn influences their judgment of overall video quality. In addition, motion has a masking effect, so distortion in a moving region is less easily perceived by human eyes. The invention designs a visual attention mechanism model to simulate this perception process: the motion information of each video frame is represented pixel by pixel by an optical flow field, which serves as a visual attention map, and the attention map is applied to a deep neural network, thereby improving the performance of the video quality evaluation model.

Description

No-reference video quality evaluation method based on visual attention mechanism
Technical Field
The invention relates to a no-reference video quality evaluation method based on a visual attention mechanism, and belongs to the technical field of digital video processing.
Background
With the development of 5G network facilities and digital media, video is increasingly common in people's lives. However, video suffers a certain amount of distortion during acquisition, compression and transmission, which degrades the viewing experience. In order to improve the quality of video services, video providers need to evaluate video quality; this task is called Video Quality Assessment (VQA).
Video quality evaluation methods can be classified into subjective and objective methods. In subjective evaluation, observers score video quality directly; such evaluation is labor-intensive, time-consuming and inconvenient. In objective evaluation, a computer calculates a quality index for the video according to an algorithm. Depending on whether a reference video is needed during evaluation, objective methods are divided into Full Reference (FR), Reduced Reference (RR) and No Reference (NR) methods:
(1) Full-reference video quality evaluation. Given a lossless video as the reference, an FR algorithm compares the video to be evaluated against the reference and analyzes its degree of distortion, thereby obtaining a quality evaluation of the video to be evaluated. Common FR methods include: evaluation based on video pixel statistics (mainly peak signal-to-noise ratio and mean squared error), evaluation based on deep learning, and evaluation based on structural information (mainly structural similarity). FR algorithms are by far the most reliable objective video quality evaluation methods.
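For concreteness, the two pixel-statistics metrics just named can be written down directly; a minimal numpy sketch (the function names are ours, not from the patent) for 8-bit frames:

```python
import numpy as np

def mse(ref: np.ndarray, dist: np.ndarray) -> float:
    """Mean squared error between a reference frame and a distorted frame."""
    return float(np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2))

def psnr(ref: np.ndarray, dist: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming 8-bit frames (peak = 255)."""
    err = mse(ref, dist)
    return float("inf") if err == 0 else 10.0 * np.log10(peak ** 2 / err)
```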
(2) Reduced-reference video quality evaluation. An RR algorithm extracts partial feature information from the reference video and compares the video to be evaluated against it to obtain a quality evaluation. Common RR algorithms are mainly: methods based on original video features and methods based on wavelet-domain statistical models.
(3) No-reference video quality evaluation. An NR algorithm evaluates the quality of a video without any lossless video as reference. Common NR algorithms are mainly: methods based on natural scene statistics and methods based on deep learning.
Disclosure of Invention
Aiming at the poor performance of existing no-reference video quality evaluation, the invention provides a no-reference objective quality evaluation method.
The technical scheme adopted by the invention is a no-reference video quality evaluation method based on a visual attention mechanism, comprising the following steps:
Step 1, extracting video frames.
For a video, the extracted frames serve as the input units of the visual attention mechanism model.
Step 1.1, extract one video frame every 4 frames, and discard the remaining frames as redundant;
Step 1.2, discard the last extracted frame, because no optical flow field can be computed for it (see the sketch below);
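A minimal sketch of step 1, assuming OpenCV for decoding (the patent names no library; `extract_frames` is our name):

```python
import cv2

def extract_frames(video_path: str, interval: int = 4):
    """Step 1.1: keep one frame every `interval` frames; the rest are redundant.
    The last kept frame exists only to serve as the optical-flow target of its
    predecessor (step 2); per step 1.2 it is excluded from the scored set."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % interval == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```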
Step 2, generating optical flow field data.
An optical flow field for the video data is generated using the open-source model PWC-Net.
Step 2.1, build the PWC-Net model, using the open-source pretrained weights;
Step 2.2, form a video frame pair from each video frame and the frame that follows it, as the input to PWC-Net;
Step 2.3, feed each video frame pair into PWC-Net to obtain optical flow field data for all video frames (see the sketch below).
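A sketch of how steps 2.2 and 2.3 might be wired together. The `pwc_net` module and its `estimate_flow` function are hypothetical stand-ins for the open-source pretrained PWC-Net; the patent does not specify a calling interface:

```python
# Hypothetical wrapper: `pwc_net.estimate_flow(prev, nxt)` stands in for the
# open-source pretrained PWC-Net and is NOT a real API; it is assumed to
# return an (H, W, 2) array holding the X and Y flow channels (step 2.1).
from pwc_net import estimate_flow

def flows_for_video(frames):
    """Steps 2.2-2.3: pair each frame with the one that follows it and run
    PWC-Net, yielding len(frames) - 1 flow fields, one per scored frame."""
    return [estimate_flow(prev, nxt) for prev, nxt in zip(frames[:-1], frames[1:])]
```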
Step 3, preprocessing the optical flow field data.
The optical flow field data generated by PWC-Net are threshold-truncated, normalized, and converted to magnitudes.
Step 3.1, set thresholds Tx (default 140) and Ty (default 160) for the X and Y channels of the optical flow field data respectively; optical flow values beyond a threshold are clipped to that threshold;
Step 3.2, divide all values of the X and Y channels of the optical flow field data by Tx and Ty respectively, for normalization;
Step 3.3, calculate the amplitude M = √(X² + Y²) of all optical flow field data as the optical flow amplitude map;
Step 3.4, scale the optical flow amplitude map to one quarter of its original size, keeping the aspect ratio unchanged (see the sketch below).
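A minimal numpy/OpenCV sketch of steps 3.1 through 3.4. Two readings are assumed: truncation is symmetric clipping to ±Tx/±Ty (flow values can be negative), and "one quarter of the original size" means one quarter per dimension, which keeps the aspect ratio:

```python
import numpy as np
import cv2

TX, TY = 140.0, 160.0  # default truncation thresholds from step 3.1

def flow_to_attention_map(flow: np.ndarray) -> np.ndarray:
    """Steps 3.1-3.4 for one (H, W, 2) optical flow field."""
    # Steps 3.1 + 3.2: clip each channel to its threshold (read here as
    # symmetric +/-T, since flow values can be negative), then normalize.
    x = np.clip(flow[..., 0], -TX, TX) / TX
    y = np.clip(flow[..., 1], -TY, TY) / TY
    # Step 3.3: amplitude M = sqrt(X^2 + Y^2), the optical flow amplitude map.
    m = np.sqrt(x ** 2 + y ** 2).astype(np.float32)
    # Step 3.4: quarter size per dimension, which preserves the aspect ratio.
    h, w = m.shape
    return cv2.resize(m, (w // 4, h // 4), interpolation=cv2.INTER_LINEAR)
```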
Step 4, building and training the visual attention mechanism model.
A visual attention mechanism network based on ResNet50 is constructed and trained.
Step 4.1, modify the ResNet50 network by adding a visual attention mechanism module after its second group of convolutional layers, i.e., the optical flow amplitude map obtained in step 3 is multiplied element-wise with the feature map at that point; the output of the visual attention mechanism module serves as the input of the third group of convolutional layers of ResNet50;
Step 4.2, organize the training data: the video frames generated in step 1 and the corresponding optical flow amplitude maps generated in step 3 are input to the model, and the labels are the quality scores of the videos;
Step 4.3, train the visual attention mechanism network using MSE loss (a sketch of the modified network follows).
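The following PyTorch sketch shows one reading of step 4.1. The patent does not pin down which ResNet50 stage is the "second group of convolutional layers"; here it is assumed to be conv2_x (torchvision's `layer1`), since its stride-4 feature map has exactly the spatial size of the quarter-scaled attention map from step 3. The class name and the single-score regression head are ours:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class FlowAttentionResNet50(nn.Module):
    """Sketch of the step-4 model. "Second group of convolutional layers" is
    mapped to conv2_x (torchvision's `layer1`); its stride-4 output matches
    the 1/4-scaled attention map, and `layer2` plays the "third group"."""

    def __init__(self):
        super().__init__()
        net = resnet50(weights=None)  # pretrained weights are optional here
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.layer1 = net.layer1
        self.layer2, self.layer3, self.layer4 = net.layer2, net.layer3, net.layer4
        self.avgpool = net.avgpool
        self.fc = nn.Linear(net.fc.in_features, 1)  # regress one quality score

    def forward(self, frame: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
        # frame: (N, 3, H, W); attn: (N, 1, H/4, W/4) amplitude map from step 3
        x = self.layer1(self.stem(frame))   # (N, 256, H/4, W/4)
        x = x * attn                        # attention: element-wise product
        x = self.layer4(self.layer3(self.layer2(x)))
        x = self.avgpool(x).flatten(1)
        return self.fc(x).squeeze(1)        # one quality score per frame
```

Under this assumption no resizing of the attention map is needed; if the second group were instead taken to be conv3_x, the map would have to be downscaled once more before the element-wise product.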
Step 5, evaluating the quality of a video.
Frame extraction and optical flow computation are performed on the video, followed by quality evaluation.
Step 5.1, extract video frames from the video under test according to step 1;
Step 5.2, generate the optical flow amplitude maps of the frames under test using steps 2 and 3;
Step 5.3, perform quality evaluation with the visual attention mechanism network trained in step 4, obtaining a quality score for each video frame;
Step 5.4, average the quality scores of all video frames to obtain the overall quality score of the video (see the sketch below).
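Steps 5.3 and 5.4 then amount to running every (frame, attention map) pair through the trained network and averaging; a minimal sketch, assuming the FlowAttentionResNet50 sketch above and per-frame tensors produced by steps 1-3:

```python
import torch

@torch.no_grad()
def score_video(model, frames, attn_maps, device="cpu"):
    """Steps 5.3-5.4: score each (frame, attention map) pair, then average.
    `frames` are (3, H, W) tensors and `attn_maps` (1, H/4, W/4) tensors."""
    model = model.eval().to(device)
    scores = [
        model(f.unsqueeze(0).to(device), a.unsqueeze(0).to(device)).item()
        for f, a in zip(frames, attn_maps)
    ]
    return sum(scores) / len(scores)  # step 5.4: overall video quality score
```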
Compared with the prior art, the invention has the following advantages:
(1) The present invention exploits the human eye's distortion-perception characteristics for video motion regions to improve VQA performance. When human eyes perceive video distortion, motion information attracts attention, so viewers focus more easily on moving regions, which influences their judgment of overall video quality. On the other hand, motion has a masking effect: distortion produced in a moving region is less easily perceived. If motion regions can be identified, the human visual system can be simulated more faithfully, making the VQA model more accurate.
(2) The invention uses PWC-Net to generate the optical flow field, which extracts video motion regions well and better represents the visual perception characteristics in VQA. The optical flow field describes the motion information in a video pixel by pixel, and therefore serves well as the attention map of the visual attention mechanism. PWC-Net is a fast, accurate deep learning model that, compared with traditional methods, produces higher-quality optical flow fields more efficiently.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a diagram of a visual attention mechanism model of the present invention based on ResNet 50;
Detailed Description
The method is described in detail below with reference to the figures and examples.
The flow chart of an embodiment is shown in fig. 1, and comprises the following steps:
step S10, extracting video frames;
step S20, generating an optical flow field;
step S30, preprocessing optical flow field data;
step S40, building and training a visual attention mechanism model;
step S50, carrying out quality evaluation on the video;
the extracted video frame adjusting step S10 of an embodiment further includes the steps of:
step S100, extracting video frames, selecting the video frames at equal intervals, and directly discarding other video frames due to redundancy;
in step S110, the last frame of the extracted video frame is discarded because the optical flow field cannot be calculated.
The optical flow field generation step S20 of an embodiment further includes the following steps:
Step S200, build the PWC-Net model, using the open-source pretrained weights;
Step S210, form a video frame pair from each video frame and the frame that follows it, as the input to PWC-Net;
Step S220, feed each video frame pair into PWC-Net to obtain optical flow field data for all video frames.
The optical flow field data preprocessing step S30 of an embodiment further includes the following steps:
Step S300, set thresholds Tx and Ty for the X and Y channels of the optical flow field data respectively; optical flow values beyond a threshold are clipped to that threshold;
Step S310, divide all values of the X and Y channels by Tx and Ty respectively, for normalization;
Step S320, calculate the amplitude M of all optical flow field data;
Step S330, scale the optical flow amplitude map to one quarter of its original size, keeping the aspect ratio unchanged.
The visual attention mechanism model building and training step S40 of an embodiment further includes the following steps:
Step S400, modify the ResNet50 network by adding a visual attention mechanism module after its second group of convolutional layers, i.e., the optical flow amplitude map obtained in step S30 is multiplied element-wise with the feature map at that point;
Step S410, organize the training data: the model input is each individual video frame together with its corresponding optical flow amplitude map, and the label is the quality score of the video;
Step S420, train the visual attention mechanism network using MSE loss (a sketch follows).
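A minimal training pass consistent with steps S410 and S420, assuming a PyTorch DataLoader that yields (frame, attention map, score) triples; the patent fixes only the MSE loss, so the optimizer is left to the caller:

```python
import torch.nn as nn

def train_epoch(model, loader, optimizer, device="cpu"):
    """One pass over the step-S410 data for step S420: the model receives a
    frame and its attention map, the label is the video's quality score, and
    the loss is MSE. The optimizer is the caller's choice (not specified)."""
    criterion = nn.MSELoss()
    model = model.train().to(device)
    for frame, attn, score in loader:  # DataLoader yielding (frame, map, score)
        frame, attn = frame.to(device), attn.to(device)
        score = score.float().to(device)
        optimizer.zero_grad()
        loss = criterion(model(frame, attn), score)
        loss.backward()
        optimizer.step()
```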
The video quality evaluation step S50 of an embodiment further includes the following steps:
Step S500, extract video frames from the video under test according to step S10;
Step S510, generate the optical flow amplitude maps of the frames under test using steps S20 and S30;
Step S520, perform quality evaluation with the visual attention mechanism network trained in step S40, obtaining a quality score for each video frame;
Step S530, average the quality scores of all video frames to obtain the overall quality score of the video.
The results of experiments using the present invention are given below.
Table 1 shows the performance of the present invention on several VQA databases.
Note: SRCC (Spearman rank correlation coefficient)
PLCC (Pearson linear correlation coefficient)
Table 1 Results of testing the present invention on various VQA databases

Database    LIVE     CSIQ     KoNViD-1k
SRCC        0.824    0.801    0.801
PLCC        0.829    0.829    0.814

Claims (5)

1. A no-reference video quality evaluation method based on a visual attention mechanism, characterized in that the method comprises the following steps:
step 1, extracting video frames from a video;
step 2, generating optical flow field data for the extracted video frame by using an open source model PWC-Net;
step 3, preprocessing the optical flow field data to obtain a zoomed optical flow field amplitude map;
step 4, building and training a visual attention mechanism model, specifically building and training a visual attention mechanism model based on ResNet50, wherein the visual attention mechanism model is used for scoring the quality of each extracted video frame;
and step 5, extracting frames from the video to be evaluated according to step 1, scoring the quality of each extracted frame with the trained visual attention mechanism model, and averaging the quality scores of all frames to obtain the overall quality score of the video.
2. The method according to claim 1, characterized in that the step of extracting video frames from the video described in step 1 is specifically as follows:
step 1.1, extracting one video frame every 4 frames, and discarding the other video frames as redundant;
and step 1.2, discarding the last one of the extracted video frames.
3. The method according to claim 1, characterized in that the step of preprocessing the optical flow field data described in step 3 is as follows:
step 3.1, setting thresholds Tx and Ty for the X and Y channels of the optical flow field data respectively, and, for optical flow field data whose X channel exceeds Tx or whose Y channel exceeds Ty, setting the X channel value to Tx and the Y channel value to Ty;
step 3.2, dividing all threshold-truncated values of the X and Y channels by Tx and Ty respectively, for normalization;
the optical flow field amplitude map in step 3 is calculated as follows: after normalization, the amplitude M = √(X² + Y²) of all optical flow field data is computed and used as the optical flow field amplitude map.
4. The method according to claim 1, characterized in that the visual attention mechanism model in step 4 is a modified ResNet50 network, the modification specifically being the addition of a visual attention mechanism module after the second group of convolutional layers of ResNet50, i.e., the scaled optical flow field amplitude map obtained in step 3 is multiplied element-wise with the output feature map of the second group of convolutional layers of ResNet50, and the output of the visual attention mechanism module serves as the input of the third group of convolutional layers of ResNet50.
5. The method according to claim 1, characterized in that in the model training of step 4, the training data input to the model are the video frames obtained in step 1 and the corresponding optical flow field amplitude maps generated in step 3, and the labels are the quality scores of the training videos;
and in step 4 the model is trained with MSE Loss as the loss function.
CN202010841520.0A 2020-08-20 2020-08-20 No-reference video quality evaluation method based on visual attention mechanism Active CN111784694B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010841520.0A CN111784694B (en) 2020-08-20 2020-08-20 No-reference video quality evaluation method based on visual attention mechanism


Publications (2)

Publication Number Publication Date
CN111784694A 2020-10-16
CN111784694B CN111784694B (en) 2024-07-23

Family

ID=72762317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010841520.0A Active CN111784694B (en) 2020-08-20 2020-08-20 No-reference video quality evaluation method based on visual attention mechanism

Country Status (1)

Country Link
CN (1) CN111784694B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020126891A1 (en) * 2001-01-17 2002-09-12 Osberger Wilfried M. Visual attention model
CN102769772A (en) * 2011-05-05 2012-11-07 浙江大学 Method and device for evaluating video sequence distortion
US20170154415A1 (en) * 2015-11-30 2017-06-01 Disney Enterprises, Inc. Saliency-weighted video quality assessment
CN107318014A (en) * 2017-07-25 2017-11-03 西安电子科技大学 The video quality evaluation method of view-based access control model marking area and space-time characterisation
US20190258902A1 (en) * 2018-02-16 2019-08-22 Spirent Communications, Inc. Training A Non-Reference Video Scoring System With Full Reference Video Scores
CN110598537A (en) * 2019-08-02 2019-12-20 杭州电子科技大学 Video significance detection method based on deep convolutional network
CN111193923A (en) * 2019-09-24 2020-05-22 腾讯科技(深圳)有限公司 Video quality evaluation method and device, electronic equipment and computer storage medium
CN110677639A (en) * 2019-09-30 2020-01-10 中国传媒大学 Non-reference video quality evaluation method based on feature fusion and recurrent neural network
CN111182292A (en) * 2020-01-05 2020-05-19 西安电子科技大学 No-reference video quality evaluation method and system, video receiver and intelligent terminal
CN111314733A (en) * 2020-01-20 2020-06-19 北京百度网讯科技有限公司 Method and apparatus for evaluating video sharpness

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Zemin; Peng Taopin; Tian Chang; Hu Lei; Wang Lumeng: "No-reference video quality assessment algorithm fusing spatio-temporal perceptual characteristics", Acta Electronica Sinica, No. 03 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112954312A (en) * 2021-02-07 2021-06-11 福州大学 No-reference video quality evaluation method fusing spatio-temporal characteristics
CN112954312B (en) * 2021-02-07 2024-01-05 福州大学 Non-reference video quality assessment method integrating space-time characteristics
CN114202728A (en) * 2021-12-10 2022-03-18 北京百度网讯科技有限公司 Video detection method, device, electronic equipment, medium and product
CN114202728B (en) * 2021-12-10 2022-09-02 北京百度网讯科技有限公司 Video detection method, device, electronic equipment and medium

Also Published As

Publication number Publication date
CN111784694B (en) 2024-07-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Ying Zefeng
Inventor after: Shi Ping
Inventor after: Hou Ming
Inventor after: Pan Da
Inventor before: Shi Ping
Inventor before: Hou Ming
Inventor before: Pan Da
GR01 Patent grant