CN116071691B - Video quality evaluation method based on content perception fusion characteristics - Google Patents
- Publication number
- CN116071691B (application CN202310343979.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention provides a video quality evaluation method based on content perception fusion characteristics, which comprises the following steps: step 1, constructing a multi-directional differential second-order differential Gaussian filter feature extraction module for extracting features of an input image; step 2, building a residual feature extraction network model based on the multi-directional differential second-order differential Gaussian filter feature extraction module and a deep convolutional neural network, and feeding the video frame by frame into the residual feature extraction network model to obtain the content perception features of each frame image; step 3, reducing the dimension of the content perception features, inputting them into a gated recurrent neural network GRU, and modeling the long-term dependency to obtain the quality elements and weights of the video at different moments; and step 4, determining the final quality score of the video based on the quality elements and weights at different moments. By extracting the content perception features of the video, the proposed video quality evaluation method achieves a more accurate video quality evaluation effect.
Description
Technical Field
The invention relates to the field of computer vision, in particular to a video quality evaluation method based on content perception fusion characteristics.
Background
In recent years, with the widespread use of intelligent devices in production and daily life, a huge amount of video material is generated every day. Owing to the limitations of real-world environments and hardware performance, however, video quality inevitably degrades to different degrees, which can make the video unusable in practical application scenarios. Quality evaluation of a video is therefore necessary before it is applied in a real scene.
Commonly used video quality evaluation methods fall into two categories: subjective quality evaluation and objective quality evaluation. Subjective quality evaluation relies on people to score videos of different quality. This approach is direct and simple, but it is constrained by limited manpower and time, and the subjective bias of different viewers toward the same video segment means there is no unified scoring standard, so it cannot be applied on a large scale in practice.
Objective video quality assessment can be divided into three categories, namely full-reference, reduced-reference and no-reference, according to whether the original lossless video is available. Since lossless reference video is rarely available in real application scenarios, no-reference video quality evaluation has become the focus of current research. With the continuous progress of deep learning, the technology is increasingly applied in practice. Although some no-reference quality evaluation methods already exist, several obstacles remain: human visual characteristics are not fully considered, traditional methods require extracting a large number of hand-crafted features, which is time-consuming and labor-intensive, and the various kinds of feature information in an image are not fully exploited.
Disclosure of Invention
Aiming at the problems existing in the prior art, a video quality evaluation method based on content perception fusion characteristics is provided. The content perception features of the video frames are obtained through a multi-directional differential second-order differential Gaussian filter feature extraction module and a deep convolutional neural network; a gated recurrent neural network GRU then models the long-term dependency to obtain quality scores, and the video quality is determined by combining these scores with weights.
The technical solution adopted by the invention is as follows. A video quality evaluation method based on content perception fusion characteristics comprises the following steps:
step 1, constructing a multi-directional differential second-order differential Gaussian filter feature extraction module for extracting features of an input image;
step 2, building a residual feature extraction network model based on the multi-directional differential second-order differential Gaussian filter feature extraction module and a deep convolutional neural network, and feeding the video frame by frame into the residual feature extraction network model to obtain the content perception features of each frame image;
step 3, reducing the dimension of the content perception features, inputting them into a gated recurrent neural network GRU, and modeling the long-term dependency to obtain the quality elements and weights of the video at different moments;
and step 4, determining the final quality score of the video based on the quality elements and weights at different moments.
Further, the substeps of the step 1 are as follows:
step 1.1, constructing a multidirectional differential second-order differential Gaussian kernel and a directional derivative thereof; in construction, the number of directions is preferably 8;
and 1.2, performing convolution operation on the input image and the multi-direction second-order differential Gaussian directional derivative to finish characteristic information extraction.
Further, the substep of the step 2 is as follows:
step 2.1, frame-by-frame splitting is carried out on an input video to obtain T RGB three-channel color images;
step 2.2, uniformly scaling the obtained image to 224 pixels by 224 pixels;
step 2.3, passing the image obtained in step 2.2 through a 2D convolution layer to obtain image features with dimension 112×112×64;
step 2.4, inputting the image obtained in step 2.2 into the multi-directional differential second-order differential Gaussian filter feature extraction module for feature extraction, fusing the extracted features with the features output in step 2.3 to obtain fused features of dimension 112×112×72, and restoring the channel number to 64 by a convolution operation on the fused features;
step 2.5, sending the 64-channel fused features to a maximum pooling layer, the output features having dimension 56×56×64;
step 2.6, establishing a Bottleneck convolution structure, and inputting the output features of step 2.5 into the Bottleneck convolution structure to obtain output features W_t, where W_t comprises a plurality of feature maps and t ranges from 1 to T;
step 2.7, subjecting each feature map in W_t to spatial global pooling, namely the joint operation of spatial global average pooling and spatial global standard deviation pooling, to obtain the content perception features of the feature map.
Further, in the step 2.5, the kernel size of the maximum pooling layer is 3×3, the stride is 2, and the padding is 1.
Further, the specific process of establishing the Bottleneck convolution structure in the step 2.6 is as follows:
step 2.6.1, setting a 2D convolution layer Conv_2D_2 with C1 convolution kernels of size 1×1, stride 1 and padding 0;
step 2.6.2, setting a 2D convolution layer Conv_2D_3 with C1 convolution kernels of size 7×7, stride 1 and padding 1;
step 2.6.3, setting a 2D convolution layer Conv_2D_4 with C2 convolution kernels of size 1×1, stride 1 and padding 0;
step 2.6.4, sequentially connecting the 2D convolution layers Conv_2D_2, Conv_2D_3 and Conv_2D_4 to obtain a convolution module named the Bottleneck-A structure;
step 2.6.5, setting the numbers of convolution kernels of the three 2D convolution layers in the Bottleneck-A structure to 2C1, 2C1 and 2C2 to obtain the Bottleneck-B structure; similarly, setting the numbers of convolution kernels to 4C1, 4C1, 4C2 and to 8C1, 8C1, 8C2 to obtain the Bottleneck-C and Bottleneck-D structures;
step 2.6.6, connecting 3 Bottleneck-A structures, 4 Bottleneck-B structures, 6 Bottleneck-C structures and 3 Bottleneck-D structures in sequence to obtain the Bottleneck convolution structure.
Further, the substep of the step 3 is as follows:
step 3.1, performing dimension reduction on the content perception features through a fully connected layer FC_1 to obtain dimension-reduced features;
step 3.2, sending the dimension-reduced features into a gated recurrent neural network GRU, which can integrate the features and learn the long-term dependency;
step 3.3, taking the hidden-layer state of the GRU network as the integrated feature and computing the hidden-layer state at time t to obtain the integrated feature at time t;
step 3.4, inputting the integrated feature into a fully connected layer FC_2 to obtain the quality score at time t;
step 3.5, taking the lowest quality score among the previous frames as the memory quality element at time t;
step 3.6, constructing the current quality element at frame t by weighting the quality scores of the following frames, assigning larger weights to frames with low quality scores.
Further, in the step 3.5, the memory quality element is computed as
l_t = min_{k∈Ω_t} q_k,
where l_t denotes the memory quality element, Ω_t denotes the index set of the previous moments considered at time t, q_t and q_k denote the quality scores at time t and time k, and s is a hyperparameter associated with time t that determines the extent of Ω_t.
Further, in the step 3.6, the current quality element is computed as
m_t = Σ_{k∈Ω'_t} w_k · q_k, with w_k = e^{−q_k} / Σ_{j∈Ω'_t} e^{−q_j},
where m_t is the current quality element, the weights w_k are defined using a softmin function, Ω'_t denotes the index set of the relevant moments, and e denotes the natural constant.
Further, the substep of the step 4 is as follows:
step 4.1, linearly combining the memory quality element with the current quality element to obtain the approximate quality score at the subjective frame moment;
and 4.2, carrying out temporal global average pooling on the approximate quality scores to obtain the final video quality score.
Further, in the step 4.1, the approximate quality score is computed as
q'_t = r · l_t + (1 − r) · m_t,
where q'_t denotes the approximate quality score, l_t denotes the memory quality element, m_t denotes the current quality element, and r is a hyperparameter that balances the contributions of the memory quality element and the current quality element.
Compared with the prior art, the beneficial effects of adopting the technical scheme include:
1. the constructed multi-direction differential second-order differential Gaussian filter characteristic extraction module can extract rich edge characteristic information in the image.
2. The feature extraction network model obtained by combining the constructed feature extraction module with the deep convolutional neural network has the capability of identifying different content information.
3. The recurrent neural network GRU can effectively model long-term dependency relationship of quality elements at different moments in video.
Therefore, the video quality evaluation method provided by the invention can realize more accurate video quality evaluation effect.
Drawings
Fig. 1 is a flowchart of a video quality evaluation method according to the present invention.
Fig. 2 is a schematic diagram of extracting content-aware features according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of modeling long-term dependency and evaluating video quality according to an embodiment of the present invention.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar modules or modules having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. On the contrary, the embodiments of the present application include all alternatives, modifications, and equivalents as may be included within the spirit and scope of the appended claims.
Example 1
Aiming at the defects of the prior art, namely that human visual characteristics are not fully considered, that traditional methods require extracting a large number of hand-crafted features, which is time-consuming and labor-intensive, and that the various kinds of feature information in an image are not fully exploited, and referring to fig. 1, this embodiment provides a video quality evaluation method based on content perception fusion characteristics, which comprises the following steps:
step 1, constructing a multi-directional differential second-order differential Gaussian filter feature extraction module for extracting features of an input image;
step 2, building a residual feature extraction network model based on the multi-directional differential second-order differential Gaussian filter feature extraction module and a deep convolutional neural network, and feeding the video frame by frame into the residual feature extraction network model to obtain the content perception features of each frame image;
step 3, reducing the dimension of the content perception features, inputting them into a gated recurrent neural network GRU, and modeling the long-term dependency to obtain the quality elements and weights of the video at different moments;
and step 4, determining the final quality score of the video based on the quality elements and weights at different moments.
In step 1 of this embodiment, the number of directions is selected to be 8, and gradient information of different angles of the image is obtained through a multi-direction differential second-order differential gaussian filtering characteristic extraction module.
In the embodiment, gradient information is extracted through the multi-directional differential second-order differential Gaussian filter characteristic extraction module established in the step 1, and content perception characteristics are extracted through cooperation with the deep convolutional neural network, wherein the multi-directional differential second-order differential Gaussian filter characteristic extraction module and the deep convolutional neural network can form a residual characteristic extraction network model.
The gated recurrent neural network GRU in step 3 can integrate the features and learn the long-term dependency.
Example 2
On the basis of embodiment 1, this embodiment further describes a multi-directional differential second order differential gaussian filter feature extraction module and a feature extraction method in step 1, which are specifically as follows:
construction of a multidirectional differentiated second order differential Gaussian kernelAnd its directional derivative>The method is specifically as follows:
wherein,,and->Respectively representing the abscissa and the ordinate of pixels in the image; />Representing the differentiation factor; />;/>,/>The selected angle value is represented, and the calculation formula is as follows:the value range of m is +.>,/>The number of the selected directions is represented, and the value range of M is any positive integer.
In this embodiment, the number of directions M = 8 is selected to obtain gradient information of the image at different angles. In the feature extraction, the input image I(x, y) is convolved with the multi-directional second-order differential Gaussian directional derivatives to extract the feature information.
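For illustration only, the following Python sketch (NumPy/PyTorch) shows one way such a bank of multi-directional second-order differential Gaussian kernels could be built and applied by convolution. The exact kernel formula, the scale σ, the kernel size, the angle spacing θ_m = mπ/M, the stride and the grayscale input are assumptions, since the corresponding formulas of the embodiment are not reproduced here.

```python
import numpy as np
import torch
import torch.nn.functional as F

def second_order_dog_kernels(num_dirs=8, sigma=1.5, ksize=7):
    """Build num_dirs second-order derivative-of-Gaussian kernels.

    Assumption: the m-th angle is theta_m = m*pi/num_dirs, and each kernel is
    the second derivative of a 2-D Gaussian taken along that direction.
    """
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    kernels = []
    for m in range(num_dirs):
        theta = m * np.pi / num_dirs
        u = x * np.cos(theta) + y * np.sin(theta)          # coordinate along theta
        g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
        d2 = (u ** 2 / sigma ** 4 - 1.0 / sigma ** 2) * g  # d^2/du^2 of the Gaussian
        d2 -= d2.mean()                                    # zero-mean, edge-type filter
        kernels.append(d2)
    k = torch.tensor(np.stack(kernels), dtype=torch.float32)
    return k.unsqueeze(1)                                  # (num_dirs, 1, ksize, ksize)

def extract_gradient_features(img, kernels, stride=2):
    """Convolve a grayscale image (B, 1, H, W) with the directional kernels.

    Stride 2 is an assumption chosen so that a 224x224 input yields
    112x112xnum_dirs maps, matching the fusion with Conv_2D_1 in step 2.4.
    """
    pad = kernels.shape[-1] // 2
    return F.conv2d(img, kernels, stride=stride, padding=pad)

# usage: 8 directions, as preferred in step 1.1
kernels = second_order_dog_kernels(num_dirs=8)
frame = torch.rand(1, 1, 224, 224)          # stand-in for one video frame
feat = extract_gradient_features(frame, kernels)
print(feat.shape)                           # torch.Size([1, 8, 112, 112])
```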
Example 3
On the basis of embodiment 1 or 2, as shown in fig. 2, the specific process of extracting the content perception feature in step 2 is further described, and it should be noted that the feature extraction module in fig. 2 refers to a multi-direction differential second order differential gaussian filter feature extraction module:
step 2.1, for an input video material, carrying out frame-by-frame splitting on the video material to obtain T RGB three-channel color images;
step 2.2, uniformly scaling each obtained image I_t, where t ranges from 1 to T, to 224 pixels by 224 pixels through an image resizing operation;
step 2.3, setting a 2D convolution layer Conv_2D_1 with 64 convolution kernels of size 7×7, stride 2 and padding 3; after passing through Conv_2D_1, the image obtained in step 2.2 yields an output of dimension 112×112×64;
step 2.4, inputting the image obtained in step 2.2 into the multi-directional differential second-order differential Gaussian filter feature extraction module for feature extraction, the output features having dimension 112×112×8; performing a concat feature fusion operation with the features output in step 2.3 to obtain fused features of dimension 112×112×72; and then sending the fused features into a 1×1×64 convolution to restore the channel number to 64;
step 2.5, sending the 64-channel fused features into a maximum pooling layer with kernel size 3×3, stride 2 and padding 1, the output features having dimension 56×56×64;
step 2.6, establishing a Bottleneck convolution structure, and inputting the output features of step 2.5 into the Bottleneck convolution structure to obtain output features W_t, where W_t comprises a plurality of feature maps and t ranges from 1 to T;
step 2.7, subjecting each feature map in W_t to spatial global pooling (SpatialGP), namely the joint operation of spatial global average pooling (GP_mean) and spatial global standard deviation pooling (GP_std), to obtain the feature F_t of the frame.
The feature F_t, obtained by fusing the multi-directional differential second-order differential Gaussian filter feature extraction module with the deep convolutional neural network, has the ability to distinguish information of different content, and the feature therefore has content perception properties.
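For illustration only, the following sketch shows how the content perception feature F_t of step 2.7 could be computed from the feature maps W_t of one frame. Concatenating the two pooled statistics as the "joint operation", and the stand-in channel and spatial sizes, are assumptions.

```python
import torch

def content_aware_feature(w_t: torch.Tensor) -> torch.Tensor:
    """Spatial global average + standard-deviation pooling (step 2.7).

    w_t: feature maps of one frame with shape (C, H, W).
    Returns a 2C-dimensional vector; concatenating GP_mean and GP_std is an
    assumed reading of the joint operation described above.
    """
    mean = w_t.mean(dim=(1, 2))            # GP_mean: one value per channel
    std = w_t.std(dim=(1, 2))              # GP_std:  one value per channel
    return torch.cat([mean, std], dim=0)   # F_t, the content perception feature

# usage: a stand-in for the Bottleneck output of one frame
w_t = torch.rand(2048, 7, 7)               # hypothetical channel/spatial sizes
f_t = content_aware_feature(w_t)
print(f_t.shape)                           # torch.Size([4096])
```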
Example 4
On the basis of embodiment 3, this embodiment presents the specific construction process of the Bottleneck convolution structure, as follows:
step 2.6.1, setting a 2D convolution layer Conv_2D_2 with C1 convolution kernels of size 1×1, stride 1 and padding 0;
step 2.6.2, setting a 2D convolution layer Conv_2D_3 with C1 convolution kernels of size 7×7, stride 1 and padding 1;
step 2.6.3, setting a 2D convolution layer Conv_2D_4 with C2 convolution kernels of size 1×1, stride 1 and padding 0;
step 2.6.4, sequentially connecting the 2D convolution layers Conv_2D_2, Conv_2D_3 and Conv_2D_4 to obtain a convolution module named the Bottleneck-A structure;
step 2.6.5, setting the numbers of convolution kernels of the three 2D convolution layers in the Bottleneck-A structure to 2C1, 2C1 and 2C2 to obtain the Bottleneck-B structure; similarly, setting the numbers of convolution kernels to 4C1, 4C1, 4C2 and to 8C1, 8C1, 8C2 to obtain the Bottleneck-C and Bottleneck-D structures;
step 2.6.6, connecting 3 Bottleneck-A structures, 4 Bottleneck-B structures, 6 Bottleneck-C structures and 3 Bottleneck-D structures in sequence to obtain the Bottleneck convolution structure.
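For illustration only, the following PyTorch sketch builds the described stack of 3 Bottleneck-A, 4 Bottleneck-B, 6 Bottleneck-C and 3 Bottleneck-D structures. The values of C1 and C2 are placeholders, the 7×7 convolution is padded with 3 here (rather than the stated 1) so that the spatial size is preserved through the 16-block stack, and normalisation, activations and any residual shortcuts of the full network are omitted.

```python
import torch
import torch.nn as nn

def bottleneck_block(in_ch, mid_ch, out_ch):
    """One Bottleneck block as in steps 2.6.1-2.6.4: 1x1 -> 7x7 -> 1x1 conv."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=1, stride=1, padding=0),   # Conv_2D_2
        nn.Conv2d(mid_ch, mid_ch, kernel_size=7, stride=1, padding=3),  # Conv_2D_3
        nn.Conv2d(mid_ch, out_ch, kernel_size=1, stride=1, padding=0),  # Conv_2D_4
    )

def build_bottleneck_stack(c1=16, c2=64, in_ch=64):
    """3 x Bottleneck-A, 4 x B, 6 x C, 3 x D (step 2.6.6).

    C1 and C2 are free parameters in the description; the defaults here are
    placeholders chosen only to keep the sketch lightweight.
    """
    stages = [(3, c1, c2), (4, 2 * c1, 2 * c2),
              (6, 4 * c1, 4 * c2), (3, 8 * c1, 8 * c2)]
    blocks, ch = [], in_ch
    for count, mid, out in stages:
        for _ in range(count):
            blocks.append(bottleneck_block(ch, mid, out))
            ch = out
    return nn.Sequential(*blocks)

# usage: the 56x56x64 max-pooled fusion feature from step 2.5
net = build_bottleneck_stack()
w_t = net(torch.rand(1, 64, 56, 56))
print(w_t.shape)      # torch.Size([1, 512, 56, 56]) with the placeholder C2
```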
Example 5
On the basis of embodiment 3 or 4, as shown in fig. 3, this embodiment further describes the specific procedure of modeling the long-term dependency and acquiring the quality elements using the gated recurrent neural network. Specifically:
Step 3.1, the content perception feature F_t is reduced in dimension through a fully connected layer FC_1 to obtain the dimension-reduced feature X_t:
X_t = W_f1 · F_t + b_f1,
where W_f1 and b_f1 are the two parameters of the fully connected layer FC_1, representing the scaling (weight) term and the bias term, respectively.
Step 3.2, the dimension reduction characteristics are sent into a gate control recurrent neural network GRU which can integrate and adjust and learn long-term dependency;
step 3.3, calculating the hidden layer state at the time t by taking the hidden layer state of the GRU network as the comprehensive characteristic to obtain the integrated characteristic;
in this embodiment, the hidden layer has an initial value ofh 0 Hidden layer integration feature at time th t From input features X at time t t And the hidden layer at the previous momenth t-1 And (3) calculating to obtain:
Step 3.4, the integrated feature h_t is input into another fully connected layer FC_2 to obtain the quality score q_t at time t:
q_t = W_f2 · h_t + b_f2,
where W_f2 and b_f2 are the two parameters of the fully connected layer FC_2, representing the scaling (weight) term and the bias term, respectively.
Step 3.5, taking the lowest quality fraction in the previous frames as a memory quality element at the time t;
Wherein,,representing memory quality element->Index set representing all moments +.>、/>The quality scores at time t and time k are indicated, s being a super parameter associated with time t.
Step 3.6, in order to simulate the phenomenon that human beings have deep memory for video quality degradation and have weak perceptibility for video quality enhancement, in this embodiment, the current quality element is constructed in the t-th frameAnd weighting the quality score in the next few frames (may be made of +.>Determined) a greater weight is assigned to frames with low quality scores by:
wherein,,for the current quality element->For the weights, a softmin function definition is used,an index set indicating the relevant time, e indicating a natural constant.
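For illustration only, the following sketch covers steps 3.1 to 3.6: FC_1 dimension reduction, the GRU, FC_2 frame scores, and the memory and current quality elements. The feature dimension, hidden size, the window length s, and the exact index sets used for the minimum and the softmin weighting are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalQualityHead(nn.Module):
    """Sketch of steps 3.1-3.4: FC_1 dimension reduction, GRU, FC_2 frame score."""

    def __init__(self, feat_dim=4096, reduced_dim=128, hidden_dim=32):
        super().__init__()
        self.fc1 = nn.Linear(feat_dim, reduced_dim)   # FC_1: dimension reduction
        self.gru = nn.GRU(reduced_dim, hidden_dim, batch_first=True)
        self.fc2 = nn.Linear(hidden_dim, 1)           # FC_2: per-frame quality score

    def forward(self, frame_feats):                   # (B, T, feat_dim)
        x = self.fc1(frame_feats)                     # X_t
        h, _ = self.gru(x)                            # integrated features h_t
        return self.fc2(h).squeeze(-1)                # q_t, shape (B, T)

def frame_quality_elements(q, s=12):
    """Steps 3.5-3.6 for one video, q: (T,) per-frame scores.

    Memory element l_t = minimum of the previous scores in a window of length s;
    current element m_t = softmin-weighted sum of the following scores, so low
    scores receive larger weights. The window length s and its symmetric use
    for both directions are assumptions.
    """
    T = q.shape[0]
    l = torch.empty(T)
    m = torch.empty(T)
    for t in range(T):
        prev = q[max(0, t - s):t + 1]                 # previous frames (incl. t)
        l[t] = prev.min()
        nxt = q[t:min(T, t + s + 1)]                  # current and following frames
        w = F.softmin(nxt, dim=0)                     # larger weight for low scores
        m[t] = (w * nxt).sum()
    return l, m

# usage with stand-in frame features of a 240-frame video
head = TemporalQualityHead()
q = head(torch.rand(1, 240, 4096))[0]
l, m = frame_quality_elements(q)
```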
Example 6
On the basis of embodiment 5, this embodiment further describes the method for obtaining the final quality score of the video in step 4, specifically:
Step 4.1, the approximate quality score at the subjective frame moment is obtained by linearly combining the memory quality element and the current quality element:
q'_t = r · l_t + (1 − r) · m_t,
where r is a hyperparameter that balances the contributions of the memory quality element and the current quality element.
Step 4.2, the approximate quality scores are subjected to temporal global average pooling to obtain the final video quality score Q.
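For illustration only, the per-frame elements can then be combined and pooled over time as in step 4; the value of r used below is a placeholder hyperparameter.

```python
import torch

def video_quality_score(l, m, r=0.5):
    """Step 4: combine memory (l_t) and current (m_t) quality elements, then
    apply temporal global average pooling. r = 0.5 is only a placeholder."""
    q_approx = r * l + (1.0 - r) * m       # approximate per-frame quality q'_t
    return q_approx.mean()                 # temporal global average pooling -> Q

# usage with stand-in per-frame elements of a 240-frame video
l, m = torch.rand(240), torch.rand(240)
print(float(video_quality_score(l, m)))
```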
Based on any of embodiments 1 to 6, the invention can be well implemented, and the quality score of a video segment can be obtained accurately.
It should be noted that, in the description of the embodiments of the present invention, unless explicitly specified and limited otherwise, the terms "disposed," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; may be directly connected or indirectly connected through an intermediate medium. The specific meaning of the above terms in the present invention will be understood in detail by those skilled in the art; the accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.
Claims (9)
1. A video quality assessment method based on content-aware fusion features, comprising:
step 1, constructing a multi-directional differential second-order differential Gaussian filter feature extraction module for extracting features of an input image;
step 2, building a residual feature extraction network model based on the multi-directional differential second-order differential Gaussian filter feature extraction module and a deep convolutional neural network, and feeding the video frame by frame into the residual feature extraction network model to obtain the content perception features of each frame image;
step 3, reducing the dimension of the content perception features, inputting them into a gated recurrent neural network GRU, and modeling the long-term dependency to obtain the quality elements and weights of the video at different moments;
step 4, determining the final quality score of the video based on quality elements and weights at different moments;
the substep of the step 3 is as follows:
step 3.1, performing dimension reduction on the content perception features through a fully connected layer FC_1 to obtain dimension-reduced features;
step 3.2, sending the dimension-reduced features into a gated recurrent neural network GRU, which can integrate the features and learn the long-term dependency;
step 3.3, taking the hidden-layer state of the GRU network as the integrated feature and computing the hidden-layer state at time t to obtain the integrated feature at time t;
step 3.4, inputting the integrated feature into a fully connected layer FC_2 to obtain the quality score at time t;
step 3.5, taking the lowest quality score among the previous frames as the memory quality element at time t;
step 3.6, constructing the current quality element at frame t by weighting the quality scores of the following frames, assigning larger weights to frames with low quality scores.
2. The method for evaluating video quality based on content aware fusion feature according to claim 1, wherein the substeps of step 1 are:
step 1.1, constructing a multidirectional differential second-order differential Gaussian kernel and a directional derivative thereof;
and 1.2, performing convolution operation on the input image and the multi-direction second-order differential Gaussian directional derivative to finish characteristic information extraction.
3. The video quality evaluation method based on content aware fusion feature according to claim 1 or 2, wherein the sub-steps of step 2 are:
step 2.1, frame-by-frame splitting is carried out on an input video to obtain T RGB three-channel color images;
step 2.2, uniformly scaling the obtained image to 224 pixels by 224 pixels;
step 2.3, passing the image obtained in step 2.2 through a 2D convolution layer to obtain image features with dimension 112×112×64;
step 2.4, inputting the image obtained in step 2.2 into the multi-directional differential second-order differential Gaussian filter feature extraction module for feature extraction, fusing the extracted features with the features output in step 2.3 to obtain fused features of dimension 112×112×72, and restoring the channel number to 64 by a convolution operation on the fused features;
step 2.5, sending the 64-channel fused features to a maximum pooling layer, the output features having dimension 56×56×64;
step 2.6, establishing a Bottleneck convolution structure, and inputting the output features of step 2.5 into the Bottleneck convolution structure to obtain output features W_t, where W_t comprises a plurality of feature maps and t ranges from 1 to T;
step 2.7, subjecting each feature map in W_t to spatial global pooling, namely the joint operation of spatial global average pooling and spatial global standard deviation pooling, to obtain the content perception features of the feature map.
4. The video quality evaluation method based on content aware fusion feature according to claim 3, wherein in the step 2.5, the kernel size of the maximum pooling layer is 3×3, the stride is 2, and the padding is 1.
5. The video quality evaluation method based on content aware fusion feature according to claim 3, wherein in the step 2.6, the specific process of establishing the Bottleneck convolution structure is as follows:
step 2.6.1, setting a 2D convolution layer Conv_2D_2 with C1 convolution kernels of size 1×1, stride 1 and padding 0;
step 2.6.2, setting a 2D convolution layer Conv_2D_3 with C1 convolution kernels of size 7×7, stride 1 and padding 1;
step 2.6.3, setting a 2D convolution layer Conv_2D_4 with C2 convolution kernels of size 1×1, stride 1 and padding 0;
step 2.6.4, sequentially connecting the 2D convolution layers Conv_2D_2, Conv_2D_3 and Conv_2D_4 to obtain a convolution module named the Bottleneck-A structure;
step 2.6.5, setting the numbers of convolution kernels of the three 2D convolution layers in the Bottleneck-A structure to 2C1, 2C1 and 2C2 to obtain the Bottleneck-B structure; similarly, setting the numbers of convolution kernels to 4C1, 4C1, 4C2 and to 8C1, 8C1, 8C2 to obtain the Bottleneck-C and Bottleneck-D structures;
step 2.6.6, connecting 3 Bottleneck-A structures, 4 Bottleneck-B structures, 6 Bottleneck-C structures and 3 Bottleneck-D structures in sequence to obtain the Bottleneck convolution structure.
6. The method for evaluating video quality based on content aware fusion feature according to claim 1, wherein in the step 3.5, the memory quality element is the minimum of the quality scores over the previous frames: l_t = min_{k∈Ω_t} q_k, where Ω_t denotes the index set of the previous moments considered at time t, q_k denotes the quality score at time k, and s is a hyperparameter associated with time t.
7. The method for evaluating video quality based on content aware fusion feature according to claim 6, wherein in step 3.6, the current quality element is m_t = Σ_{k∈Ω'_t} w_k · q_k, with softmin weights w_k = e^{−q_k} / Σ_{j∈Ω'_t} e^{−q_j}, where Ω'_t denotes the index set of the relevant moments and e denotes the natural constant.
8. The method for evaluating video quality based on content aware fusion feature according to claim 1, wherein the sub-step of step 4 is:
step 4.1, linearly combining the memory quality element with the current quality element to obtain the approximate quality score at the subjective frame moment;
and 4.2, carrying out temporal global average pooling on the approximate quality scores to obtain the final video quality score.
9. The method for evaluating video quality based on content aware fusion feature according to claim 8, wherein in the step 4.1, the approximate quality score is computed as q'_t = r · l_t + (1 − r) · m_t, where r is a hyperparameter balancing the contributions of the memory quality element and the current quality element.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310343979.1A CN116071691B (en) | 2023-04-03 | 2023-04-03 | Video quality evaluation method based on content perception fusion characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310343979.1A CN116071691B (en) | 2023-04-03 | 2023-04-03 | Video quality evaluation method based on content perception fusion characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116071691A (en) | 2023-05-05
CN116071691B (en) | 2023-06-23
Family
ID=86171795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310343979.1A Active CN116071691B (en) | 2023-04-03 | 2023-04-03 | Video quality evaluation method based on content perception fusion characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116071691B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115511858A (en) * | 2022-10-08 | 2022-12-23 | Hangzhou Dianzi University | Video quality evaluation method based on novel time sequence characteristic relation mapping
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140044197A1 (en) * | 2012-08-10 | 2014-02-13 | Yiting Liao | Method and system for content-aware multimedia streaming |
CN111833246B (en) * | 2020-06-02 | 2022-07-08 | 天津大学 | Single-frame image super-resolution method based on attention cascade network |
US11335033B2 (en) * | 2020-09-25 | 2022-05-17 | Adobe Inc. | Compressing digital images utilizing deep learning-based perceptual similarity |
CN112784698B (en) * | 2020-12-31 | 2024-07-02 | 杭州电子科技大学 | No-reference video quality evaluation method based on deep space-time information |
CN113554599B (en) * | 2021-06-28 | 2023-08-18 | 杭州电子科技大学 | Video quality evaluation method based on human visual effect |
-
2023
- 2023-04-03 CN CN202310343979.1A patent/CN116071691B/en active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115511858A (en) * | 2022-10-08 | 2022-12-23 | Hangzhou Dianzi University | Video quality evaluation method based on novel time sequence characteristic relation mapping
Also Published As
Publication number | Publication date |
---|---|
CN116071691A (en) | 2023-05-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |