CN110677639B - Non-reference video quality evaluation method based on feature fusion and recurrent neural network - Google Patents
- Publication number: CN110677639B (application number CN201910938025.9A)
- Authority: CN (China)
- Prior art keywords: video, feature fusion, network, training, neural network
- Prior art date: 2019-09-30
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N17/00—Diagnosis, testing or measuring for television systems or their details
Abstract
The invention discloses a no-reference video quality evaluation method based on feature fusion and a recurrent neural network. The neural network used by the invention takes video segments directly as input and employs a feature fusion network; this design better captures the relationships between video frames and therefore yields a more accurate overall quality score for the video. The feature fusion network processes multiple frames at a time and produces a low-dimensional feature, i.e. the feature size is greatly reduced relative to the raw data volume, so the total processing time for a whole video can be reduced substantially.
Description
Technical Field
The invention relates to a no-reference video quality evaluation method based on feature fusion and a recurrent neural network, and belongs to the technical field of digital video processing.
Background
Video, as a complex source of visual information, carries a large amount of valuable information. Video quality directly affects people's subjective experience and their ability to extract information, and it also provides feedback for, and a measure of, other video tasks such as video compression. Research on Video Quality Assessment (VQA) has therefore received wide attention in recent years.
Video quality evaluation can be divided into subjective and objective methods. In subjective evaluation, observers score the video quality directly; this is reliable but labor-intensive, time-consuming and inconvenient. In objective evaluation, a computer calculates a quality index for the video according to some algorithm. Depending on whether a reference video is required during evaluation, objective methods fall into three categories: Full Reference (FR), Reduced Reference (RR) and No Reference (NR):
(1) Full-reference video quality evaluation. Given an ideal (pristine) video as the reference, the FR algorithm compares the video to be evaluated against the reference and analyzes its degree of distortion, thereby obtaining a quality score for the video under test. Common FR methods include: evaluation based on video pixel statistics (mainly peak signal-to-noise ratio and mean squared error), evaluation based on deep learning, and evaluation based on structural information (mainly structural similarity). FR algorithms are by far the most reliable objective video quality evaluation methods (the pixel-statistics metrics named here are sketched in the code after this list).
(2) Reduced-reference video quality evaluation. The RR algorithm extracts part of the feature information of the reference video and uses it as the reference against which the video under test is compared and analyzed, thereby obtaining its quality score. Common RR algorithms are mainly: methods based on original-video features and methods based on wavelet-domain statistical models.
(3) No-reference video quality evaluation. The NR algorithm evaluates the quality of the video under test without any ideal reference video. Common NR algorithms are mainly: methods based on natural scene statistics and methods based on deep learning.
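As an illustration of the pixel-statistics metrics named in item (1) above, the following sketch (not part of the patent text; the 8-bit peak value of 255 and the function name `mse_psnr` are assumptions) computes MSE and PSNR between a reference frame and a distorted frame:

```python
import numpy as np

def mse_psnr(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0):
    """Mean squared error and peak signal-to-noise ratio (dB) between two frames,
    assumed to be arrays of identical shape with pixel values in [0, peak]."""
    err = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    psnr = float("inf") if err == 0 else 10.0 * np.log10(peak ** 2 / err)
    return err, psnr
```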
During acquisition, processing, transmission and recording, videos are distorted and degraded because imaging systems, processing methods, transmission media and recording equipment are imperfect, and because of object motion, noise interference and similar factors. It is therefore often necessary to measure the quality of a distorted video, and obtaining that quality measure directly from the distorted video itself, without using its reference video, is called no-reference objective video quality evaluation.
CN201811071199.1 discloses a no-reference image quality evaluation method based on a hierarchical feature fusion network, which mainly addresses the low accuracy and low speed of the prior art. Its implementation is as follows: select reference images from the MSCOCO data set and build a distorted-image database by adding noise; apply mean removal and cropping to both the training-set and test-set images; design a hierarchical feature fusion network model for end-to-end joint optimization, following the human visual system's hierarchical processing from local features to global semantics; train the hierarchical feature fusion network model on the training and test sets; apply mean removal and cropping to the image to be evaluated and feed it into the trained model to obtain the predicted image quality score. This improves the accuracy and speed of no-reference quality evaluation, and the method can be used for image screening, image compression and video quality monitoring.
CN201810239888.2 discloses a full-reference virtual reality video quality evaluation method based on a convolutional neural network, comprising the following steps. Video preprocessing: obtain a VR difference video from the left-view and right-view videos of the VR video, extract frames from the difference video at uniform intervals, and divide each frame into non-overlapping blocks, where the blocks at the same position across frames form a VR video patch. Build two convolutional neural network models with identical configurations. Train the convolutional neural network models: using gradient descent, take the VR video patches as input, pair each patch with the original video quality score as its label, feed the patches into the network in batches, and after many iterations fully optimize the weights of each layer, finally obtaining a convolutional neural network model for extracting virtual reality video features. Extract features with the convolutional neural network; obtain local scores with a support vector machine and a final score with a score fusion strategy, improving the accuracy of the objective evaluation method.
The present invention aims to perform no-reference objective quality evaluation of video quality using feature fusion and a recurrent neural network.
Disclosure of Invention
Aiming at the poor performance of no-reference video quality evaluation in existing video quality assessment, the invention provides a no-reference objective quality evaluation method.
The invention adopts the technical scheme that a no-reference video quality evaluation method based on feature fusion and a recurrent neural network comprises the following steps:
Step 1, obtaining video segments from a video.
For a video, video segments are obtained through frame extraction, cropping and combination, and the video segments are used as input of the VQA model.
Step 1.1, extracting video frames, selecting the video frames at intervals of 4, and directly discarding other video frames due to redundancy;
Step 1.2, crop video frames: crop each selected frame into 280 × 280 image blocks with a sliding window, and suppose M image blocks can be cropped from one frame;
Step 1.3, combine the cropped image blocks: randomly choose N starting points in the video sequence and, from each starting point, take T consecutive frames of the block at the same spatial position along the time axis, with T = 8, giving a T × 280 × 280 video segment; such a segment is the minimum input unit of the VQA model, and one video thus yields M × N video segments (a sketch of this step is given below).
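The following sketch, which is not part of the patent text, illustrates step 1 under stated assumptions: the video is already decoded into a NumPy array, the cropping window is non-overlapping so that M equals the number of 280 × 280 windows per frame, and all function and parameter names are illustrative only.

```python
import numpy as np

def extract_segments(frames: np.ndarray, interval: int = 4, block: int = 280,
                     t: int = 8, n_starts: int = 4, seed: int = 0):
    """Cut a decoded video of shape (F, H, W, 3) into M*N video segments,
    each of shape (t, block, block, 3)."""
    rng = np.random.default_rng(seed)
    sampled = frames[::interval]                              # step 1.1: keep every 4th frame
    f, h, w, _ = sampled.shape
    positions = [(y, x) for y in range(0, h - block + 1, block)
                 for x in range(0, w - block + 1, block)]     # step 1.2: M crop positions
    starts = rng.integers(0, f - t + 1, size=n_starts)        # step 1.3: N random start points
    return [sampled[s:s + t, y:y + block, x:x + block]        # T consecutive sampled frames
            for (y, x) in positions for s in starts]          # M*N segments in total
```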
Step 2, building and training a feature fusion network.
Build and train a Resnet50-based feature fusion network whose input is a video segment obtained in step 1 and whose output is a 1024-dimensional feature vector:
Step 2.1, adapt Resnet50 into a feature fusion network: the input has shape [(Batch-Size × T) × Channel × 280 × 280]; after the 2nd Bottleneck Layer of Resnet50 the tensor is reshaped to [(Batch-Size × 1) × (Channel × T) × 280 × 280], realizing the feature fusion along the time axis;
Step 2.2, prepare the training data: use the video segments generated in step 1 as the network input, and use the quality score of the whole video as the label of each video segment;
Step 2.3, train the feature fusion network: append a fully connected layer with output dimension 1 to the end of the feature fusion network, take video segments as input and the quality score as the target label, and train with MSE Loss (a sketch of the network follows below).
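A minimal PyTorch sketch of steps 2.1–2.3 is given below; it is not part of the patent text. The patent does not spell out how the fused (Channel × T) tensor is fed into the remaining Resnet50 stages, so the 1×1 fusion convolution, the 2048 → 1024 projection, the reading of the "2nd Bottleneck Layer" as torchvision's `layer2`, and all class, variable and parameter names are assumptions of this sketch.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class FeatureFusionNet(nn.Module):
    """Sketch of the Resnet50-based feature fusion network (steps 2.1-2.3)."""
    def __init__(self, t_frames: int = 8, feat_dim: int = 1024):
        super().__init__()
        backbone = resnet50(weights=None)          # trained from scratch, no pretraining
        self.t = t_frames
        # Frame-level stages, applied to every one of the Batch*T input frames.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                                  backbone.maxpool, backbone.layer1, backbone.layer2)
        # Feature fusion: fold T into the channel axis, then squeeze back to 512
        # channels so the remaining Resnet50 stages can be reused (an assumption).
        self.fuse = nn.Conv2d(512 * t_frames, 512, kernel_size=1)
        self.tail = nn.Sequential(backbone.layer3, backbone.layer4,
                                  nn.AdaptiveAvgPool2d(1))
        self.feat = nn.Linear(2048, feat_dim)      # 1024-d segment feature (step 3.1)
        self.score = nn.Linear(feat_dim, 1)        # FC head used only during training

    def forward(self, x: torch.Tensor, return_feature: bool = False):
        b, t, c, h, w = x.shape                    # x: (Batch, T, 3, 280, 280)
        x = self.stem(x.reshape(b * t, c, h, w))   # per-frame features after layer2
        x = x.reshape(b, t * x.shape[1], x.shape[2], x.shape[3])   # (B, 512*T, h', w')
        x = self.tail(self.fuse(x)).flatten(1)     # (B, 2048)
        feat = self.feat(x)
        return feat if return_feature else self.score(feat)

# Step 2.3: train against the whole-video quality score with MSE Loss.
model, criterion = FeatureFusionNet(), nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
segments = torch.randn(2, 8, 3, 280, 280)          # stand-in batch of video segments
labels = torch.tensor([[55.0], [72.0]])            # stand-in whole-video quality scores
loss = criterion(model(segments), labels)
loss.backward(); optimizer.step()
```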
Step 3, obtaining the feature vector representation of the video.
Each video segment is passed through the trained feature fusion network to generate a 1024-dimensional feature vector, and these vectors are then assembled into the video features.
Step 3.1, discard the last fully connected layer of the trained feature fusion network, so that the network outputs a 1024-dimensional vector;
Step 3.2, use the network obtained in step 3.1, i.e. the trained feature fusion network with its fully connected layer removed, to generate a feature vector for each video segment;
Step 3.3, combine the features of the video: group the segment features by crop position and order them along the time axis, giving an M × N × 1024 tensor as the video features (see the sketch below).
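Continuing the sketch above with its illustrative names, step 3 could amount to the following, where `segments_by_position` is an assumed list of M lists, each holding the N segment tensors (of shape T × 3 × 280 × 280) cropped at one spatial position:

```python
# Step 3: one 1024-d feature per video segment, arranged as an (M, N, 1024) tensor.
import torch

model.eval()                                   # trained FeatureFusionNet from step 2
with torch.no_grad():
    video_feat = torch.stack([
        torch.stack([model(seg.unsqueeze(0), return_feature=True).squeeze(0)
                     for seg in position_segments])        # N segments at this position
        for position_segments in segments_by_position])    # M crop positions
print(video_feat.shape)                        # (M, N, 1024)
```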
Step 4, building and training a recurrent neural network.
Build and train an LSTM recurrent neural network; its input is the video features of one crop position output by step 3, i.e. an N × 1024 feature sequence, and its output is the quality score of the video.
Step 4.1, build the LSTM recurrent neural network: the network contains 2 LSTM layers, the first with hidden size 2048 and the second with hidden size 256, followed by a fully connected layer with output size 1;
Step 4.2, organize the training data: arrange the feature vectors of the N video segments of one crop position into an N × 1024 matrix as the input of the recurrent neural network;
Step 4.3, train the recurrent neural network with the video quality score as the label, using MSE Loss (a sketch follows below).
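A minimal sketch of the step-4 regressor follows, again with illustrative names and not taken from the patent: two stacked LSTMs with the stated hidden sizes, followed by a single-output fully connected layer. Reading out the last time step is an assumption, since the patent does not say how the per-step LSTM outputs are reduced to one score.

```python
import torch
import torch.nn as nn

class QualityLSTM(nn.Module):
    """Step 4: LSTM (hidden 2048) -> LSTM (hidden 256) -> FC with output 1."""
    def __init__(self, in_dim: int = 1024):
        super().__init__()
        self.lstm1 = nn.LSTM(in_dim, 2048, batch_first=True)
        self.lstm2 = nn.LSTM(2048, 256, batch_first=True)
        self.fc = nn.Linear(256, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, N, 1024) -- the N segment features of one crop position
        x, _ = self.lstm1(x)
        x, _ = self.lstm2(x)
        return self.fc(x[:, -1])                # read out the last time step (assumption)

# Step 4.3: train against the video quality score with MSE Loss.
rnn, criterion = QualityLSTM(), nn.MSELoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=1e-4)
features = torch.randn(4, 6, 1024)              # stand-in batch: 4 positions, N = 6 segments
scores = torch.randn(4, 1)                      # stand-in video quality scores
loss = criterion(rnn(features), scores)
loss.backward(); optimizer.step()
```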
Step 5, evaluating the quality of the video.
The video to be evaluated is segmented and sampled, its features are extracted, and its quality is evaluated.
Step 5.1, cut the video under test into video segments as in step 1;
Step 5.2, extract features from the video segments of step 5.1 with the feature fusion network trained in step 2;
Step 5.3, perform quality evaluation with the recurrent neural network trained in step 4, obtaining M local quality scores (one per crop position) for the video;
Step 5.4, average the M local quality scores to obtain the overall quality score of the video (see the sketch below).
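Putting the pieces together, an inference pass over one video could look like the following sketch, reusing the illustrative helpers defined above (`extract_segments`, `FeatureFusionNet`, `QualityLSTM`, all of which are assumptions of these sketches) and assuming `frames` holds the decoded video as a NumPy array:

```python
import torch

# Step 5: segment the test video, extract features, score each crop position, average.
def evaluate_video(frames, model, rnn, t=8, n_starts=4):
    segs = extract_segments(frames, t=t, n_starts=n_starts)      # M*N segments
    m = len(segs) // n_starts                                    # M crop positions
    with torch.no_grad():
        feats = torch.stack([model(torch.from_numpy(s).float().permute(0, 3, 1, 2)
                                   .unsqueeze(0), return_feature=True).squeeze(0)
                             for s in segs]).reshape(m, n_starts, -1)   # (M, N, 1024)
        local_scores = rnn(feats).squeeze(-1)                    # M local quality scores
    return local_scores.mean().item()                            # overall quality score
```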
Compared with the prior art, the invention has the following advantages:
(1) Existing deep-learning-based VQA methods usually evaluate quality at the frame level with a frame-level network first, and then derive the quality score of the whole video from the per-frame results. The neural network used by the invention takes video segments directly as input, employs a feature fusion network, and fuses the segment features with a recurrent neural network. This design better captures the relationships between video frames and therefore yields a more accurate overall quality score for the video.
(2) Compared with neural networks used for conventional images, the feature fusion network used by the invention fuses features along the time axis, so the correlation of content between video frames is extracted more fully and the resulting features better represent the overall characteristics of the video.
(3) Compared with recurrent neural networks that take frame-level features as input in conventional video tasks, the recurrent neural network used by the invention takes the features of video segments as input, so the network assesses quality over a wider temporal range and the overall quality evaluation of the video is more accurate.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a diagram of the feature fusion network and recurrent neural network architecture of the present invention.
Detailed Description
The method is described in detail below with reference to the figures and examples.
An embodiment is described below.
The flow chart of an embodiment is shown in fig. 1, and comprises the following steps:
step S10, extracting a cutting video segment;
step S20, building and training a feature fusion network;
step S30, obtaining the feature vector representation of the video;
step S40, building and training a recurrent neural network;
step S50, carrying out quality evaluation on the video;
the extract cropped video segment adjusting step S10 of an embodiment further comprises the steps of:
Step S100, extract video frames: select frames at equal intervals and discard the other frames directly, since they are redundant;
Step S110, crop video frames: crop each frame into image blocks with a sliding window, and suppose M image blocks can be cropped from one frame;
Step S120, combine the cropped image blocks: randomly choose N starting points in the video sequence and take T consecutive frames of the block at the same spatial position along the time axis, giving a video segment; such a segment is the minimum input unit of the VQA model, and one video yields M × N video segments.
In this embodiment, step S20, building and training a feature fusion network, further comprises the following steps:
Step S200, modify Resnet50 into a feature fusion network to realize feature fusion;
Step S210, prepare the training data: assign a label to each video segment generated in step S10, the label being the quality score of the video;
Step S220, train the feature fusion network: append a fully connected layer with output dimension 1 to the end of the network, take the video segments of S210 as input and the quality score as the target label, and train with MSE Loss.
In this embodiment, step S30, obtaining the feature vector representation of the video, further comprises the following steps:
step S300, discarding the last full connection layer of the trained feature fusion network, and outputting a 1024-dimensional vector;
Step S310, generate a feature vector for each video segment with the network of S300;
Step S320, combine the features of the video: group the segment features by crop position along the time axis to obtain an M × N × 1024 tensor as the video features.
In this embodiment, step S40, building and training the recurrent neural network, further comprises the following steps:
s400, building an LSTM recurrent neural network, wherein the network comprises 2 layers of LSTM structures, the size of a first hidden layer is 2048, the size of a second hidden layer is 256, and then connecting a full-connection layer with the output of 1;
step S410, training data are arranged, and feature vectors of N sections of video bands obtained in step S320 are arranged into Nx 1024 to be used as input of a recurrent neural network;
and step S420, training a cyclic neural network, using the video quality score as a label, and using MSE Loss for training.
In this embodiment, step S50, evaluating the quality of the video, further comprises the following steps:
Step S500, cut the video under test into video segments as in step S10;
Step S510, extract features from the video segments of step S500 with the feature fusion network trained in step S20;
Step S520, perform quality evaluation with the recurrent neural network trained in step S40, obtaining M local quality scores for the video;
Step S530, average the M local quality scores to obtain the overall quality score of the video.
The results of experiments using the present invention are given below.
Table 1 shows the performance of the present invention on various VQA databases (without pretraining).
Table 1 Test results of the present invention on various VQA databases
Database | LIVE | CSIQ | KoNVid-1k
---|---|---|---
SRCC | 0.784 | 0.751 | 0.762
PLCC | 0.799 | 0.779 | 0.784
Claims (2)
1. A no-reference video quality evaluation method based on feature fusion and a recurrent neural network, characterized in that the method comprises the following steps:
step 1, obtaining a video segment from a video;
for a video, video segments are obtained through frame extraction, clipping and combination and are used as input of an VQA model;
step 2, building and training a feature fusion network;
building and training a Resnet50-based feature fusion network, whose input is a video segment obtained in step 1 and whose output is a 1024-dimensional feature vector:
step 2.1, adapting Resnet50 into a feature fusion network: the input has shape [(Batch-Size × T) × Channel × 280 × 280]; after the 2nd Bottleneck Layer of Resnet50 the tensor is reshaped to [(Batch-Size × 1) × (Channel × T) × 280 × 280], realizing the feature fusion;
step 2.2, preparing training data: the video segments generated in step 1 are used as the network input, and the quality score of the whole video is used as the label of each video segment;
step 2.3, training the feature fusion network: a fully connected layer with output dimension 1 is appended to the end of the feature fusion network, the input is a video segment, the output label is the quality score, and training uses MSE Loss;
step 3, obtaining the feature vector representation of the video;
generating a 1024-dimensional feature vector for each video segment through the trained feature fusion network, and further forming video features;
step 4, building and training a recurrent neural network;
building and training an LSTM recurrent neural network, whose input is the video features of one crop position output in step 3 and whose output is the quality score of the video;
building the LSTM recurrent neural network: the network comprises 2 LSTM layers, the first with hidden size 2048 and the second with hidden size 256, followed by a fully connected layer with output size 1;
organizing the training data: the feature vectors of the N video segments of one crop position, obtained in step 3, are arranged into an N × 1024 matrix as the input of the recurrent neural network;
training the recurrent neural network with the video quality score as the label, using MSE Loss;
step 5, evaluating the quality of the video;
the video is segmented and sampled, its features are extracted, and its quality is evaluated.
2. The method according to claim 1, characterized in that the step of obtaining video segments from a video is as follows:
step 1.1, extracting video frames: one frame is selected every 4 frames and the other frames are discarded directly, since they are redundant;
step 1.2, cropping video frames: each selected frame is cropped into 280 × 280 image blocks with a sliding window, and M image blocks can be cropped from one frame;
step 1.3, combining the cropped image blocks: N starting points are randomly chosen in the video sequence and, from each, T consecutive frames of the block at the same spatial position are taken along the time axis, with T taken as 8 by default, giving a T × 280 × 280 video segment; such a segment is the minimum input unit of the VQA model, and M × N video segments are obtained from one video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910938025.9A CN110677639B (en) | 2019-09-30 | 2019-09-30 | Non-reference video quality evaluation method based on feature fusion and recurrent neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110677639A CN110677639A (en) | 2020-01-10 |
CN110677639B true CN110677639B (en) | 2021-06-11 |
Family
ID=69080456
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910938025.9A Active CN110677639B (en) | 2019-09-30 | 2019-09-30 | Non-reference video quality evaluation method based on feature fusion and recurrent neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110677639B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111784694B (en) * | 2020-08-20 | 2024-07-23 | 中国传媒大学 | No-reference video quality evaluation method based on visual attention mechanism |
CN112330613B (en) * | 2020-10-27 | 2024-04-12 | 深思考人工智能科技(上海)有限公司 | Evaluation method and system for cytopathology digital image quality |
CN112669270B (en) * | 2020-12-21 | 2024-10-01 | 北京金山云网络技术有限公司 | Video quality prediction method, device and server |
CN113411566A (en) * | 2021-05-17 | 2021-09-17 | 杭州电子科技大学 | No-reference video quality evaluation method based on deep learning |
CN113473117B (en) * | 2021-07-19 | 2022-09-02 | 上海交通大学 | Non-reference audio and video quality evaluation method based on gated recurrent neural network |
CN113822856B (en) * | 2021-08-16 | 2024-06-21 | 南京中科逆熵科技有限公司 | End-to-end non-reference video quality evaluation method based on hierarchical time-space domain feature representation |
CN113784113A (en) * | 2021-08-27 | 2021-12-10 | 中国传媒大学 | No-reference video quality evaluation method based on short-term and long-term time-space fusion network and long-term sequence fusion network |
WO2023195603A1 (en) * | 2022-04-04 | 2023-10-12 | Samsung Electronics Co., Ltd. | System and method for bidirectional automatic sign language translation and production |
CN114972267A (en) * | 2022-05-31 | 2022-08-30 | 腾讯音乐娱乐科技(深圳)有限公司 | Panoramic video evaluation method, computer device and computer program product |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101282481A (en) * | 2008-05-09 | 2008-10-08 | 中国传媒大学 | Method for evaluating video quality based on artificial neural net |
KR101465664B1 (en) * | 2013-12-31 | 2014-12-01 | 성균관대학교산학협력단 | Image data quality assessment apparatus, method and system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101087438A (en) * | 2006-06-06 | 2007-12-12 | 安捷伦科技有限公司 | System and method for computing packet loss measurement of video quality evaluation without reference |
CN109308696B (en) * | 2018-09-14 | 2021-09-28 | 西安电子科技大学 | No-reference image quality evaluation method based on hierarchical feature fusion network |
CN109961434B (en) * | 2019-03-30 | 2022-12-06 | 西安电子科技大学 | No-reference image quality evaluation method for hierarchical semantic attenuation |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |