CN110677639B - Non-reference video quality evaluation method based on feature fusion and recurrent neural network - Google Patents

Non-reference video quality evaluation method based on feature fusion and recurrent neural network

Info

Publication number
CN110677639B
CN110677639B (application CN201910938025.9A)
Authority
CN
China
Prior art keywords
video
feature fusion
network
training
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910938025.9A
Other languages
Chinese (zh)
Other versions
CN110677639A (en)
Inventor
史萍
侯明
潘达
应泽峰
韩明良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China filed Critical Communication University of China
Priority to CN201910938025.9A
Publication of CN110677639A
Application granted
Publication of CN110677639B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00Diagnosis, testing or measuring for television systems or their details

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a no-reference video quality evaluation method based on feature fusion and a recurrent neural network. The neural network used by the invention takes video segments directly as input and adopts a feature fusion network; this design better captures the relationships between video frames, so the overall quality evaluation index of the video is obtained more accurately. The feature fusion network processes multiple frames at once and produces a low-dimensional feature, i.e. the feature scale is greatly reduced relative to the data volume, so the total running time for a whole video is greatly reduced.

Description

Non-reference video quality evaluation method based on feature fusion and recurrent neural network
Technical Field
The invention relates to a no-reference video quality evaluation method based on feature fusion and a recurrent neural network, and belongs to the technical field of digital video processing.
Background
Video, as a complex carrier of visual information, contains a large amount of valuable information. Video quality directly affects viewers' subjective experience and information acquisition, and it also provides feedback and a measure for other video tasks such as video compression. Research on Video Quality Assessment (VQA) has therefore received wide attention in recent years.
Video quality evaluation can be divided into subjective and objective methods. In subjective evaluation, observers score the video quality directly; this is reliable but labor-intensive, time-consuming and inconvenient. In objective evaluation, a computer calculates a quality index for the video according to an algorithm. Depending on whether a reference video is needed during evaluation, objective methods are divided into full reference (FR), reduced reference (RR, also called semi-reference) and no reference (NR) methods:
(1) Full-reference video quality evaluation. Given an ideal video as the reference, an FR algorithm compares the video to be evaluated against the reference and analyzes its degree of distortion to obtain a quality estimate. Common FR methods include: evaluation based on video pixel statistics (mainly peak signal-to-noise ratio and mean square error), evaluation based on deep learning, and evaluation based on structural information (mainly structural similarity). FR algorithms are by far the most reliable objective video quality evaluation methods.
(2) Reduced-reference (semi-reference) video quality evaluation. An RR algorithm extracts partial feature information of the reference video and compares it with the video to be evaluated to obtain a quality estimate. Common RR algorithms are mainly methods based on original-video features and methods based on wavelet-domain statistical models.
(3) No-reference video quality evaluation. An NR algorithm evaluates the quality of a video without any ideal reference video. Common NR algorithms are mainly methods based on natural scene statistics and methods based on deep learning.
During acquisition, processing, transmission and recording, videos are distorted and degraded by imperfections of the imaging system, processing method, transmission medium and recording equipment, as well as by object motion and noise interference. It is therefore often necessary to measure the quality of a distorted video directly from the distorted video itself, without using its reference video; this is called no-reference objective video quality evaluation.
CN201811071199.1 discloses a no-reference image quality evaluation method based on a hierarchical feature fusion network, which mainly addresses the low accuracy and low speed of the prior art. The scheme is: select reference images from the MSCOCO data set and build a distorted image database by adding noise; apply mean removal and cropping to the training-set and test-set images; design a hierarchical feature fusion network model for end-to-end joint optimization, following the local-feature-to-global-semantics hierarchical processing mechanism of the human visual system; train the hierarchical feature fusion network model with the training and test sets; apply mean removal and cropping to the image to be evaluated and feed it into the trained model to obtain an image quality prediction score. This improves the accuracy and speed of no-reference quality evaluation and can be used for image screening, compression and video quality monitoring.
CN201810239888.2 discloses a full-reference virtual reality video quality evaluation method based on a convolutional neural network, comprising the following steps: video preprocessing, in which a VR difference video is obtained from the left-view and right-view videos of the VR video, frames are uniformly sampled from the difference video and divided into non-overlapping blocks, the blocks at the same position across frames forming a VR video patch; building two convolutional neural network models with the same configuration; training the convolutional neural network models, in which, using gradient descent, the VR video patches are taken as input, each patch is paired with the original video quality score as its label, the patches are fed into the network in batches, and after multiple iterations the weights of each layer are fully optimized to obtain a convolutional neural network model for extracting virtual reality video features; extracting features with the convolutional neural network; and obtaining local scores with a support vector machine and a final score with a score fusion strategy, improving the accuracy of the objective evaluation method.
The invention aims to perform no-reference objective quality evaluation of video quality by means of feature fusion and a recurrent neural network.
Disclosure of Invention
The invention provides a no-reference objective quality evaluation method to address the poor performance of existing no-reference video quality evaluation.
The technical scheme adopted by the invention is a no-reference video quality evaluation method based on feature fusion and a recurrent neural network, comprising the following steps:
step 1, obtaining a video segment from a video.
For a video, video segments are obtained through frame extraction, cropping and combination, and these segments serve as the input of the VQA model.
Step 1.1, extract video frames: one video frame is selected every 4 frames, and the other, redundant frames are directly discarded;
Step 1.2, crop video frames: each video frame is cut into 280 × 280 image blocks by window cropping, so that M image blocks are obtained from one frame;
Step 1.3, combine the cropped image blocks: N starting points are randomly taken in the video sequence and, from each starting point, T consecutive frames are taken at the same block position along the time direction, with T taken as 8, to obtain a T × 280 × 280 video segment; such a segment is the minimum input unit of the VQA model, and M × N video segments are obtained from one video (an illustrative sketch follows).
Step 2, build and train a feature fusion network.
Construct and train a ResNet50-based feature fusion network, whose input is a video segment obtained in step 1 and whose output is a 1024-dimensional feature vector:
Step 2.1, transform ResNet50 into a feature fusion network: the input is [(Batch-Size × T) × Channel × 280 × 280] and is reshaped to [(Batch-Size × 1) × (Channel × T) × 280 × 280] after the 2nd bottleneck layer of ResNet50, which realizes the feature fusion;
Step 2.2, prepare training data: the video segments generated in step 1 are the network input, and the label of each video segment is the quality score of the whole video;
Step 2.3, train the feature fusion network: append a fully connected layer with output dimension 1 to the end of the feature fusion network; the input is a video segment, the output label is the quality score, and MSE Loss is used for training (an illustrative sketch follows).
Step 3, obtain the feature vector representation of the video.
A 1024-dimensional feature vector is generated for each video segment by the trained feature fusion network, and these vectors form the video features.
Step 3.1, discard the last fully connected layer of the trained feature fusion network so that it outputs a 1024-dimensional vector;
Step 3.2, use the network obtained in step 3.1, i.e. the trained feature fusion network with its fully connected layer discarded, to generate a feature vector for each video segment;
Step 3.3, combine the features of the video: arranging the segment features by crop position and along the time axis yields an M × N × 1024 feature, which is taken as the video feature (see the sketch below).
Step 4, build and train a recurrent neural network.
Build and train an LSTM recurrent neural network; its input is the video features of one crop position output in step 3, i.e. an N × 1024 feature, and its output is the quality score of the video.
Step 4.1, build an LSTM recurrent neural network: the network contains a 2-layer LSTM structure, the first hidden layer has size 2048 and the second has size 256, followed by a fully connected layer with output dimension 1;
Step 4.2, organize the training data: the feature vectors of the N video segments at one crop position are arranged into N × 1024 as the input of the recurrent neural network.
Step 4.3, train the recurrent neural network, using the video quality score as the label and MSE Loss for training (see the sketch below).
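A minimal PyTorch sketch of the step 4 regressor: the two hidden sizes and the final 1-output fully connected layer follow step 4.1, while reading the score from the last time step is an assumption.

```python
import torch
import torch.nn as nn

class QualityLSTM(nn.Module):
    """Sketch of the 2-layer LSTM quality regressor described in step 4.1."""
    def __init__(self, in_dim=1024):
        super().__init__()
        self.lstm1 = nn.LSTM(in_dim, 2048, batch_first=True)   # first hidden layer: 2048
        self.lstm2 = nn.LSTM(2048, 256, batch_first=True)      # second hidden layer: 256
        self.fc = nn.Linear(256, 1)                             # fully connected layer, output 1

    def forward(self, x):            # x: (batch, N, 1024) segment features of one crop position
        h, _ = self.lstm1(x)
        h, _ = self.lstm2(h)
        return self.fc(h[:, -1])     # assumed: score taken from the last time step
```

As in step 4.3, training would minimize nn.MSELoss() between this output and the video quality score label.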
Step 5, evaluate the quality of the video.
A video is cut into segments and sampled, features are extracted, and its quality is evaluated (an end-to-end sketch follows step 5.4).
Step 5.1, cut the video to be tested into video segments as in step 1;
Step 5.2, extract features of the video segments cut in step 5.1 with the feature fusion network trained in step 2;
Step 5.3, perform quality evaluation with the recurrent neural network trained in step 4, obtaining M local quality scores for one video.
Step 5.4, average the M local quality scores to obtain the overall quality score of the video.
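Putting the pieces together, an end-to-end sketch of step 5 that reuses the hypothetical helpers from the earlier sketches (pixel normalization and mini-batching are omitted for brevity; M and N must match the segmentation):

```python
import torch

@torch.no_grad()
def evaluate_video(frames, fusion_net, lstm_net, M, N):
    # hypothetical end-to-end sketch under the assumptions stated above
    segs = torch.from_numpy(extract_video_segments(frames)).float()   # step 5.1
    segs = segs.permute(0, 1, 4, 2, 3)               # (M*N, T, 3, 280, 280)
    feats = video_features(fusion_net, segs, M, N)   # step 5.2: M x N x 1024
    local_scores = lstm_net(feats).squeeze(1)        # step 5.3: one score per crop position
    return local_scores.mean().item()                # step 5.4: overall quality score
```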
Compared with the prior art, the invention has the following advantages:
(1) Conventional deep-learning-based VQA methods usually evaluate quality at the frame level with a frame-level network first, and then derive the quality score of the whole video from the per-frame results. The neural network used by the invention takes video segments directly as input, adopts a feature fusion network, and fuses the segment features with a recurrent neural network. This design better captures the relationships between video frames, so the overall quality evaluation index of the video is obtained more accurately.
(2) Compared with the neural networks used for conventional images, the feature fusion network used by the invention fuses features along the time axis, so the content correlation between video frames is extracted more fully and the resulting features better represent the overall characteristics of the video.
(3) Compared with the recurrent neural networks used in conventional video tasks, which take frame-level features as input, the recurrent neural network used in the invention takes the features of video segments as input, so the network assesses quality over a wider range and the overall quality evaluation of the video is more accurate.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a diagram of a feature fusion network and recurrent neural network architecture according to the present invention;
Detailed Description
The method is described in detail below with reference to the figures and examples.
An embodiment is provided below.
The flow chart of the embodiment is shown in FIG. 1 and comprises the following steps:
Step S10, extract and crop video segments;
step S20, building and training a feature fusion network;
step S30, obtaining the feature vector representation of the video;
step S40, building and training a recurrent neural network;
step S50, carrying out quality evaluation on the video;
the extract cropped video segment adjusting step S10 of an embodiment further comprises the steps of:
step S100, extracting video frames, selecting the video frames at equal intervals, and directly discarding other video frames due to redundancy;
step S110, cutting video frames, cutting each video frame into image blocks in a window cutting mode, and setting one frame capable of cutting M image blocks;
and step S120, combining the cut image blocks, randomly taking N starting points in a video sequence, and continuously taking T frames at the same position of the image blocks along the time direction to obtain a video segment, wherein the video segment is used as a minimum unit for VQA model input, and M multiplied by N video segments can be obtained from a video segment.
The step S20 of building and training the feature fusion network in the embodiment further comprises the following steps:
Step S200, modify ResNet50 into a feature fusion network to realize feature fusion;
Step S210, prepare training data: set labels for the video segments generated in step S10, the labels being the quality scores of the videos;
Step S220, train the feature fusion network: append a fully connected layer with output dimension 1 to the end of the network; the input is the video segments of S210, the output label is the quality score, and MSE Loss is used for training.
The step S30 of obtaining the feature vector representation of the video in the embodiment further comprises the following steps:
Step S300, discard the last fully connected layer of the trained feature fusion network so that it outputs a 1024-dimensional vector;
Step S310, generate a feature vector for each video segment using the fusion network of S300;
Step S320, combine the features of the video: arranging the segment features by crop position and along the time axis yields an M × N × 1024 feature as the video feature.
The step S40 of building and training the recurrent neural network in the embodiment further comprises the following steps:
Step S400, build an LSTM recurrent neural network: the network contains a 2-layer LSTM structure, the first hidden layer has size 2048 and the second has size 256, followed by a fully connected layer with output dimension 1;
Step S410, organize the training data: the feature vectors of the N video segments obtained in step S320 are arranged into N × 1024 as the input of the recurrent neural network;
Step S420, train the recurrent neural network, using the video quality score as the label and MSE Loss for training.
The video quality evaluation step S50 of the embodiment further comprises the following steps:
Step S500, cut the video to be tested into video segments as in step S10;
Step S510, extract features of the video segments cut in step S500 with the feature fusion network trained in step S20;
Step S520, perform quality evaluation with the recurrent neural network trained in step S40, obtaining M local quality scores for one video;
Step S530, average the M local quality scores to obtain the overall quality score of the video.
The results of experiments using the present invention are given below.
Table 1 shows the performance results of the present invention on various VQA databases (without pretraining).
Table 1 results of testing the present invention in various VQA databases
Database   LIVE    CISQ    KoNVid-1k
SRCC       0.784   0.751   0.762
PLCC       0.799   0.779   0.784

Claims (2)

1. A no-reference video quality evaluation method based on feature fusion and a recurrent neural network, characterized in that the method comprises the following steps:
step 1, obtaining video segments from a video;
for a video, video segments are obtained through frame extraction, cropping and combination and are used as input of a VQA model;
step 2, building and training a feature fusion network;
building and training a ResNet50-based feature fusion network, the input being a video segment obtained in step 1 and the output being a 1024-dimensional feature vector:
step 2.1, transforming ResNet50 into a feature fusion network: the input is [(Batch-Size × T) × Channel × 280 × 280] and is reshaped to [(Batch-Size × 1) × (Channel × T) × 280 × 280] after the 2nd bottleneck layer of ResNet50, which realizes the feature fusion;
step 2.2, preparing training data: the video segments generated in step 1 are the network input, and the label of each video segment is the quality score of the whole video;
step 2.3, training the feature fusion network: a fully connected layer with output dimension 1 is appended to the end of the feature fusion network; the input is a video segment, the output label is the quality score, and MSE Loss is used for training;
step 3, obtaining the feature vector representation of the video;
generating a 1024-dimensional feature vector for each video segment through the trained feature fusion network, and further forming video features;
step 4, building and training a recurrent neural network;
building and training an LSTM recurrent neural network, the input being the video features of one crop position output in step 3 and the output being the quality score of the video;
constructing an LSTM recurrent neural network, wherein the network comprises a 2-layer LSTM structure, the size of a first hidden layer is 2048, the size of a second hidden layer is 256, and then connecting a full-connection layer with the output of 1;
organizing the training data: the feature vectors of the N video segments obtained in step 3 are arranged into N × 1024 as the input of the recurrent neural network;
training the recurrent neural network, using the video quality score as the label and MSE Loss for training;
step 5, evaluating the quality of the video;
segmenting and sampling a video, extracting features and evaluating its quality.
2. The method according to claim 1, characterized in that the step of obtaining video segments from a video is as follows:
step 1.1, extracting video frames: one video frame is selected every 4 frames, and the other, redundant frames are directly discarded;
step 1.2, cropping video frames: each video frame is cut into 280 × 280 image blocks by window cropping, so that M image blocks are obtained from one frame;
step 1.3, combining the cropped image blocks: N starting points are randomly taken in the video sequence and, from each, T consecutive frames are taken at the same block position along the time direction, with T taken as 8, to obtain a T × 280 × 280 video segment; such a segment is the minimum input unit of the VQA model, and M × N video segments are obtained from one video.
CN201910938025.9A 2019-09-30 2019-09-30 Non-reference video quality evaluation method based on feature fusion and recurrent neural network Active CN110677639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910938025.9A CN110677639B (en) 2019-09-30 2019-09-30 Non-reference video quality evaluation method based on feature fusion and recurrent neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910938025.9A CN110677639B (en) 2019-09-30 2019-09-30 Non-reference video quality evaluation method based on feature fusion and recurrent neural network

Publications (2)

Publication Number Publication Date
CN110677639A CN110677639A (en) 2020-01-10
CN110677639B true CN110677639B (en) 2021-06-11

Family

ID=69080456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910938025.9A Active CN110677639B (en) 2019-09-30 2019-09-30 Non-reference video quality evaluation method based on feature fusion and recurrent neural network

Country Status (1)

Country Link
CN (1) CN110677639B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784694B (en) * 2020-08-20 2024-07-23 中国传媒大学 No-reference video quality evaluation method based on visual attention mechanism
CN112330613B (en) * 2020-10-27 2024-04-12 深思考人工智能科技(上海)有限公司 Evaluation method and system for cytopathology digital image quality
CN112669270B (en) * 2020-12-21 2024-10-01 北京金山云网络技术有限公司 Video quality prediction method, device and server
CN113411566A (en) * 2021-05-17 2021-09-17 杭州电子科技大学 No-reference video quality evaluation method based on deep learning
CN113473117B (en) * 2021-07-19 2022-09-02 上海交通大学 Non-reference audio and video quality evaluation method based on gated recurrent neural network
CN113822856B (en) * 2021-08-16 2024-06-21 南京中科逆熵科技有限公司 End-to-end non-reference video quality evaluation method based on hierarchical time-space domain feature representation
CN113784113A (en) * 2021-08-27 2021-12-10 中国传媒大学 No-reference video quality evaluation method based on short-term and long-term time-space fusion network and long-term sequence fusion network
WO2023195603A1 (en) * 2022-04-04 2023-10-12 Samsung Electronics Co., Ltd. System and method for bidirectional automatic sign language translation and production
CN114972267A (en) * 2022-05-31 2022-08-30 腾讯音乐娱乐科技(深圳)有限公司 Panoramic video evaluation method, computer device and computer program product

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101282481A (en) * 2008-05-09 2008-10-08 中国传媒大学 Method for evaluating video quality based on artificial neural net
KR101465664B1 (en) * 2013-12-31 2014-12-01 성균관대학교산학협력단 Image data quality assessment apparatus, method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101087438A (en) * 2006-06-06 2007-12-12 安捷伦科技有限公司 System and method for computing packet loss measurement of video quality evaluation without reference
CN109308696B (en) * 2018-09-14 2021-09-28 西安电子科技大学 No-reference image quality evaluation method based on hierarchical feature fusion network
CN109961434B (en) * 2019-03-30 2022-12-06 西安电子科技大学 No-reference image quality evaluation method for hierarchical semantic attenuation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101282481A (en) * 2008-05-09 2008-10-08 中国传媒大学 Method for evaluating video quality based on artificial neural net
KR101465664B1 (en) * 2013-12-31 2014-12-01 성균관대학교산학협력단 Image data quality assessment apparatus, method and system

Also Published As

Publication number Publication date
CN110677639A (en) 2020-01-10

Similar Documents

Publication Publication Date Title
CN110677639B (en) Non-reference video quality evaluation method based on feature fusion and recurrent neural network
CN108090902B (en) Non-reference image quality objective evaluation method based on multi-scale generation countermeasure network
CN112861720B (en) Remote sensing image small sample target detection method based on prototype convolutional neural network
CN113269237B (en) Assembly change detection method, device and medium based on attention mechanism
CN109961049B (en) Cigarette brand identification method under complex scene
CN108074239B (en) No-reference image quality objective evaluation method based on prior perception quality characteristic diagram
CN104023230B (en) A kind of non-reference picture quality appraisement method based on gradient relevance
CN110751612A (en) Single image rain removing method of multi-channel multi-scale convolution neural network
CN110728640B (en) Fine rain removing method for double-channel single image
CN106127741A (en) Non-reference picture quality appraisement method based on improvement natural scene statistical model
CN110598613B (en) Expressway agglomerate fog monitoring method
CN111462002B (en) Underwater image enhancement and restoration method based on convolutional neural network
CN111402237A (en) Video image anomaly detection method and system based on space-time cascade self-encoder
CN111369548A (en) No-reference video quality evaluation method and device based on generation countermeasure network
CN109859166A (en) It is a kind of based on multiple row convolutional neural networks without ginseng 3D rendering method for evaluating quality
CN110910365A (en) Quality evaluation method for multi-exposure fusion image of dynamic scene and static scene simultaneously
CN108830829B (en) Non-reference quality evaluation algorithm combining multiple edge detection operators
CN108259893B (en) Virtual reality video quality evaluation method based on double-current convolutional neural network
Xu et al. Remote-sensing image usability assessment based on ResNet by combining edge and texture maps
CN117058735A (en) Micro-expression recognition method based on parameter migration and optical flow feature extraction
CN111784694B (en) No-reference video quality evaluation method based on visual attention mechanism
CN114915777A (en) Non-reference ultrahigh-definition video quality objective evaluation method based on deep reinforcement learning
CN114372962A (en) Laparoscopic surgery stage identification method and system based on double-particle time convolution
CN114359167A (en) Insulator defect detection method based on lightweight YOLOv4 in complex scene
CN113256563A (en) Method and system for detecting surface defects of fine product tank based on space attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant