WO2023036045A1 - Model training method, video quality assessment method, apparatus, device, and medium - Google Patents
- Publication number
- WO2023036045A1 (PCT/CN2022/116480)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video data
- training
- quality assessment
- model
- module
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
Definitions
- This application relates to, but is not limited to, the technical field of image processing.
- the present disclosure provides a model training method for video quality assessment, a video quality assessment method, a model training device for video quality assessment, a video quality assessment device, an electronic device, and a computer storage medium.
- the present disclosure provides a model training method for video quality assessment, including: acquiring training video data, wherein the training video data includes reference video data and distorted video data; determining the Mean Opinion Score (MOS) value of each piece of training video data; and training a preset initial video quality assessment model according to the training video data and their MOS values until a convergence condition is reached, to obtain a final video quality assessment model.
- the present disclosure provides a video quality assessment method, including: processing the video data to be evaluated with a final quality assessment model trained and obtained according to any method described herein, to obtain a quality assessment score of the video data to be evaluated.
- the present disclosure provides a model training device for video quality assessment, including: an acquisition module configured to acquire training video data, wherein the training video data includes reference video data and distorted video data; a processing module configured to determine the Mean Opinion Score (MOS) value of each piece of training video data; and a training module configured to train a preset initial video quality assessment model according to the training video data and their MOS values until a convergence condition is reached, to obtain a final video quality assessment model.
- the present disclosure provides a video quality assessment device, including: an assessment module configured to process the video data to be assessed with the final quality assessment model obtained through training according to the aforementioned model training method for video quality assessment, to obtain a quality evaluation score of the video data to be assessed.
- the present disclosure provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement any model training method for video quality assessment described herein.
- the present disclosure provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement any video quality assessment method described herein.
- the present disclosure provides a computer storage medium on which a computer program is stored, wherein when the program is executed by a processor, any model training method for video quality assessment described herein is implemented.
- the present disclosure provides a computer storage medium on which a computer program is stored, wherein when the program is executed by a processor, any video quality assessment method described herein is implemented.
- Fig. 1 is a schematic flow chart of a model training method for video quality assessment provided by the present disclosure
- FIG. 2 is a schematic flow diagram of training an initial video quality assessment model provided by the present disclosure
- FIG. 3 is a schematic diagram of a three-dimensional convolutional neural network provided by the present disclosure.
- FIG. 4 is a schematic flow diagram of a dense convolutional network provided by the present disclosure.
- FIG. 5 is a schematic flow diagram of the attention mechanism network provided by the present disclosure.
- FIG. 6 is a schematic flow diagram of a layered convolutional network provided by the present disclosure.
- FIG. 7 is a schematic flow diagram of an initial video quality assessment model provided by the present disclosure.
- Figure 8a is a schematic diagram of the 3D-PVQA method provided by the present disclosure.
- Fig. 8b is a screenshot of reference video data and a screenshot of distorted video data provided by the present disclosure
- Fig. 9 is a schematic flow chart of determining the average opinion value MOS value of each training video data provided by the present disclosure.
- FIG. 10 is a schematic flowchart of a video quality assessment method provided by the present disclosure.
- FIG. 11 is a block diagram of a model training device for video quality assessment provided by the present disclosure.
- Fig. 12 is a schematic diagram of modules of a video quality assessment device provided by the present disclosure.
- FIG. 13 is a schematic diagram of an electronic device provided by the present disclosure.
- FIG. 14 is a schematic diagram of a computer storage medium provided by the present disclosure.
- Embodiments described herein may be described with reference to plan views and/or cross-sectional views by way of idealized schematic representations of the disclosure. Accordingly, the example illustrations may be modified according to manufacturing techniques and/or tolerances. Therefore, the embodiments are not limited to those shown in the drawings but include modifications of configurations formed based on manufacturing processes. Accordingly, the regions illustrated in the figures have schematic properties, and the shapes of the regions shown in the figures illustrate the specific shapes of the regions of the elements, but are not intended to be limiting.
- This disclosure proposes to preset an initial video quality assessment model for fully extracting features and accurately detecting boundaries in an image, obtain reference video data and distorted video data, and use the reference video data and distorted video data together with their MOS (Mean Opinion Score) values to train the initial video quality assessment model to obtain the final video quality assessment model, so as to improve the accuracy of video quality assessment.
- the present disclosure provides a model training method for video quality assessment, which may include the following steps S11 to S13.
- In step S11, training video data are acquired, wherein the training video data include reference video data and distorted video data.
- In step S12, the MOS value of each piece of training video data is determined.
- In step S13, the preset initial video quality assessment model is trained according to the training video data and their MOS values until a convergence condition is reached, and a final video quality assessment model is obtained.
- Reference video data can be regarded as standard video data.
- Reference video data and distorted video data can be obtained through the open-source data sets LIVE, CSIQ, and IVP and the self-made data set CENTER.
- the MOS value is a numerical value used to characterize the quality of video data.
- the video data in the open source data sets LIVE, CSIQ, and IVP usually carry corresponding MOS values, but the video data in the self-made data set CENTER does not carry corresponding MOS values, so it is necessary to determine the MOS values of each training video data.
- Here, the initial video quality assessment model for fully extracting image features and accurately detecting boundaries in the image is preset, training video data including reference video data and distorted video data are acquired, and the reference video data and distorted video data are used to train the initial video quality assessment model to obtain the final video quality assessment model. The model can clearly distinguish distorted video data from non-distorted video data (that is, reference video data), thereby ensuring the independence and diversity of the video data used to train the model.
- The final video quality assessment model obtained by training the initial video quality assessment model can fully extract image features and accurately detect boundaries in the image, and the final video quality assessment model can be used directly to perform quality assessment on video data to be evaluated, which improves the accuracy of video quality assessment.
- When the network structure of the model is determined, there are two parts that affect the final performance of the model.
- One part is the parameters of the model, such as weights and biases; the other part is the hyperparameters of the model, such as the learning rate and the number of network layers. If the same training data is used to optimize both the parameters and the hyperparameters of the model, it will likely lead to overfitting. Therefore, two independent data sets can be used to optimize the parameters and hyperparameters of the initial video quality assessment model, respectively.
- the training of the preset initial video quality assessment model according to the training video data and its MOS value until reaching the convergence condition may include the following steps S131 and S132.
- step S131 a training set and a verification set are determined according to a preset ratio and training video data, wherein the intersection of the training set and the verification set is an empty set.
- In step S132, the parameters of the initial video quality assessment model are adjusted according to the training set and the MOS value of each piece of video data in the training set, and the hyperparameters of the initial video quality assessment model are adjusted according to the verification set and the MOS value of each piece of video data in the verification set, until the convergence condition is reached.
- the training data may be divided into a training set and a verification set according to a ratio of 6:4.
- the preset ratio can also be other ratios such as 8:2 and 5:5.
- The training video data can also be divided into a training set, a validation set, and a test set.
- the training video data can be divided into a training set, a verification set, and a test set according to a ratio of 6:2:2, and the intersection between the training set, the verification set, and the test set is an empty set.
- the training set and verification set are used to train the initial video quality assessment model to obtain the final video quality assessment model, and the test set is used to evaluate the generalization ability of the final video quality assessment model.
- The more data in the test set, the longer it takes to use the test set to evaluate the generalization ability of the final video quality assessment model; the more video data used to train the initial video quality assessment model, the better the final video quality assessment model tends to perform.
- The number of training video data can be appropriately increased, and the proportion of the training set and verification set in the training video data can be increased. For example, the training video data can be divided into a training set, a validation set, and a test set according to other ratios such as 10:1:1.
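The disjoint split described above can be sketched as follows; `split_dataset` is a hypothetical helper, and the shuffling seed and ratio handling are illustrative assumptions rather than part of the disclosure:

```python
import random

def split_dataset(items, ratios=(6, 2, 2), seed=0):
    """Split items into disjoint training/verification/test sets by ratio.

    `ratios` follows the 6:2:2 example from the text; other ratios
    such as 8:2 (with a zero test share) or 10:1:1 also work.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)  # randomize so each subset is representative
    total = sum(ratios)
    n = len(shuffled)
    train_end = n * ratios[0] // total
    val_end = train_end + n * ratios[1] // total
    # Slicing guarantees the intersection of any two subsets is empty.
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]
```

Because the three subsets are non-overlapping slices of one shuffled list, their pairwise intersections are empty sets by construction, as the text requires.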
- In this way, the training set and the verification set whose intersection is an empty set are determined according to the preset ratio and the training video data, and the training set and verification set are used to train the initial video quality assessment model until the convergence condition is reached. A final video quality assessment model that can fully extract image features and accurately detect boundaries in the image is thereby obtained, which improves the accuracy of video quality assessment.
- Training the preset initial video quality assessment model based on the training video data and their MOS values is a model training process based on deep learning, which is equivalent to taking the MOS values of the training video data as a benchmark and continuously driving the output of the model closer to the MOS values.
- When the gap between the evaluation result output by the model and the MOS value is small, it can be considered that the model has met the requirements of video quality evaluation.
- the convergence condition includes that the evaluation error rate of each piece of video data in the training set and the verification set does not exceed a preset threshold, and the evaluation error rate is calculated using the following formula:
- E = |S − MOS| / MOS
- E is the evaluation error rate of the current video data
- S is the evaluation score of the current video data output by the initial quality evaluation model after adjusting parameters and hyperparameters
- MOS is the MOS value of the current video data.
- the MOS value of the current video data has been pre-determined.
- The initial quality assessment model, after adjusting parameters and hyperparameters, will output the evaluation score S of the current video data, so the evaluation error rate E of the current video data can be calculated.
- the preset threshold may be 0.28, 0.26, 0.24 and so on.
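A minimal sketch of the convergence check, assuming the evaluation error rate is the relative deviation E = |S − MOS| / MOS, reconstructed from the variable definitions above (the disclosure's exact formula may differ), with threshold values like those mentioned in the text:

```python
def evaluation_error_rate(score, mos):
    """Relative deviation between the model's score S and the MOS value.

    Assumes E = |S - MOS| / MOS, consistent with the variable
    definitions in the text; the exact formula is an assumption.
    """
    return abs(score - mos) / mos

def converged(scored_pairs, threshold=0.26):
    """Convergence condition: every (score, MOS) pair in the training
    and verification sets stays within the preset threshold."""
    return all(evaluation_error_rate(s, m) <= threshold for s, m in scored_pairs)
```

Training would repeat parameter and hyperparameter updates until `converged` returns true for all video data in both the training set and the verification set.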
- the initial video quality assessment model preset in the present disclosure may include a three-dimensional convolutional neural network for extracting motion information, so as to improve the accuracy of video quality assessment.
- the initial video quality assessment model includes a three-dimensional convolutional neural network for extracting motion information of image frames.
- FIG. 3 is a schematic diagram of a three-dimensional convolutional neural network provided by the present disclosure.
- The 3D convolutional neural network stacks multiple consecutive image frames into a cube and then applies a 3D convolution kernel within the cube.
- Each feature map in the convolutional layer (as shown in the right half of Figure 3) is connected to multiple adjacent consecutive frames in the previous layer (as shown in the left half of Figure 3), so that motion information between consecutive image frames can be captured.
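A naive pure-Python sketch of a "valid" 3D convolution over a stacked frame cube. It is illustrative only (a real implementation would use an optimized library), but it shows how a kernel spanning several consecutive frames mixes information across time, which is what captures motion:

```python
def conv3d_valid(cube, kernel):
    """Naive 3D convolution ('valid' padding, stride 1) over a frame cube.

    `cube` is a list of frames (time x height x width). Because the
    kernel spans several consecutive frames, each output value depends
    on more than one frame, i.e. on motion between frames.
    """
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    T, H, W = len(cube), len(cube[0]), len(cube[0][0])
    out = []
    for t in range(T - kt + 1):
        frame = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                # Weighted sum over the temporal and spatial extent of the kernel.
                row.append(sum(
                    cube[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                    for dt in range(kt) for di in range(kh) for dj in range(kw)))
            frame.append(row)
        out.append(frame)
    return out
```

For example, a kernel of temporal depth 2 with weights +1 and -1 per frame acts as a frame-difference detector: constant regions produce zero, moving or changing regions do not.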
- The initial video quality evaluation model may also include an attention model, a data fusion processing module, a global pooling module, and a fully connected layer, wherein the attention model, the data fusion processing module, the three-dimensional convolutional neural network, the global pooling module, and the fully connected layer are cascaded in sequence.
- the attention model includes a cascaded multi-input network, a two-dimensional convolutional module, a dense convolutional network, a downsampling processing module, a layered convolutional network, an upsampling processing module, and an attention mechanism network
- the dense convolutional network includes at least two cascaded dense convolutional modules
- the dense convolutional module includes four cascaded densely connected convolutional layers.
- The dense convolutional network includes at least two cascaded dense convolutional modules, each dense convolutional module includes four cascaded densely connected convolutional layers, and the input of each densely connected convolutional layer is the fusion of the feature maps of all preceding densely connected convolutional layers in the current dense convolutional module.
- The feature map after each pooling layer of the encoder passes through a dense convolution module, and each time it passes through a dense convolution module, a BN (Batch Normalization) operation, a ReLU (Rectified Linear Unit) activation function operation, and a convolution (Conv) operation are performed.
- BN: Batch Normalization
- ReLU: Rectified Linear Unit
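The dense connectivity pattern can be sketched as follows; here each "layer" is an arbitrary function standing in for the BN-ReLU-Conv sequence, and list concatenation stands in for feature-map fusion, which are simplifying assumptions:

```python
def dense_block(x, layers):
    """Dense connectivity sketch: each layer receives the fused feature
    maps of every preceding layer in the block.

    `layers` is a list of callables; each one takes the list of all
    feature maps produced so far (fusion by concatenation) and returns
    a new feature map. In the real model each callable would be a
    BN -> ReLU -> Conv sequence.
    """
    features = [x]
    for layer in layers:
        # Input to each densely connected layer is the fusion of all
        # preceding feature maps in the current dense module.
        features.append(layer(features))
    return features[-1]
```

With four layers per module, as the text describes, the fourth layer sees the block input plus the outputs of the first three layers.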
- the attention mechanism network includes a cascaded attention convolution module, a linear correction unit activation module, a nonlinear activation module, and an attention upsampling processing module.
- FIG. 5 is a schematic flow diagram of the attention mechanism network provided by the present disclosure.
- The inputs of the attention mechanism network are the low-dimensional feature g i and the high-dimensional feature x l , where x l is obtained by double upsampling the output x i of the layered convolutional network; one part of the output of the dense convolutional network is processed by the upsampling processing module and input into the layered convolutional network, which then outputs x i ; g i is another part of the output of the dense convolutional network. A 1*1 convolution is performed on g i (W g : Conv 1*1), a 1*1 convolution is performed on x l (W x : Conv 1*1), and the two convolution results are then subjected to matrix addition; the sum is processed by the linear correction unit activation module (ReLU), a 1*1 convolution (ψ: Conv 1*1), nonlinear activation (Sigmoid), and upsampling processing (Upsample), and the result is combined with x l to obtain the output of the attention mechanism network.
- W g is the result of 1*1 convolution of g i
- W x is the result of 1*1 convolution of x l
- T is the matrix transposition symbol
- ⁇ is the result of 1*1 convolution on the output of the linear correction unit activation module
- the layered convolutional network includes a first layered network, a second layered network, a third layered network, and a fourth upsampling processing module
- the first layered network includes a cascaded first downsampling processing module and a first layered convolution module
- the second layered network includes a cascaded second downsampling processing module, a second layered convolution module, and a second upsampling processing module
- the third layered network includes a cascaded global pooling module, a third layered convolution module, and a third upsampling processing module
- the first layered convolution module is also cascaded with the second downsampling processing module
- the second upsampling processing module is cascaded with the fourth upsampling processing module
- the fourth upsampling processing module and the third upsampling processing module are also cascaded with the third layered convolution module.
- the first down-sampling processing module and the second down-sampling processing module are both configured to down-sample the data.
- the second up-sampling processing module, the third up-sampling processing module and the fourth up-sampling processing module are all used to perform up-sampling processing on the data.
- FIG. 6 is a schematic flow chart of the layered convolutional network provided by the present disclosure.
- After the data is input into the layered convolutional network, it is respectively input into the first layered network, the second layered network, the third layered network, and the fourth upsampling processing module for processing. The output of the first layered network and the output of the second layered network are fused, and the fused data is then input into the fourth upsampling processing module.
- The data input into the layered convolutional network is also input into the global pooling module for processing and then into the third layered convolution module for processing. The output X1 of the third layered convolution module and the output P(X) of the fourth upsampling processing module are subjected to matrix multiplication, the product is subjected to matrix addition with the output X2 of the third upsampling processing module, and the sum is input into the third layered convolution module again for processing to obtain the output of the layered convolutional network, namely the high-dimensional feature x.
- The first layered convolution module can perform a Conv 5*5 operation on the data (i.e., 5*5 convolution), the second layered convolution module can perform a Conv 3*3 operation (i.e., 3*3 convolution), and the third layered convolution module can perform a Conv 1*1 operation (i.e., 1*1 convolution). It should be understood that the same convolution module can also be used to perform the Conv 5*5, Conv 3*3, and Conv 1*1 operations respectively.
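The fusion step of the layered convolutional network (multiply X1 with P(X), add X2, then apply the third layered convolution module again) can be sketched on small matrices; the `third_conv` callback here is a stand-in assumption for the actual convolution module:

```python
def matmul(a, b):
    """Plain matrix multiplication for small 2-D lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(a, b):
    """Element-wise matrix addition for equally sized 2-D lists."""
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]

def layered_fusion(x1, p_x, x2, third_conv):
    """Sketch of the layered network's fusion step from the text:
    multiply X1 with P(X), add X2, then pass the sum through the third
    layered convolution module again (a caller-supplied function here).
    """
    return third_conv(matadd(matmul(x1, p_x), x2))
```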
- the initial video quality assessment model may include a multi-input network, a two-dimensional convolution module, a dense convolution network, a downsampling processing module, a layered convolution network, an upsampling processing module, an attention mechanism network, a data fusion processing module, 3D Convolutional Neural Networks, Global Pooling Modules and Fully Connected Layers.
- The video quality assessment model provided in this disclosure may be called a 3D-PVQA (3 Dimensions Pyramid Video Quality Assessment) model, and the corresponding method may be called the 3D-PVQA method.
- Each piece of video data in the training set and each piece of video data in the verification set is divided into distorted video data and residual video data and input into the 3D-PVQA model respectively, that is, as the residual multi-input (Residual-Multi-Input) and the distorted multi-input (Distorted-Multi-Input).
- Residual video data can be obtained by processing residual frame Residual Frames based on distorted video data and reference video data.
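One plausible sketch of forming residual frames is the per-pixel absolute difference between each distorted frame and its reference frame; the disclosure does not fix the exact operation, so this is an assumption:

```python
def residual_frames(distorted, reference):
    """Per-pixel absolute difference between distorted and reference frames.

    Both inputs are lists of frames (time x height x width) of equal
    shape. The exact residual operation in the disclosure is not
    specified, so this difference is illustrative.
    """
    return [
        [[abs(d - r) for d, r in zip(d_row, r_row)]
         for d_row, r_row in zip(d_frame, r_frame)]
        for d_frame, r_frame in zip(distorted, reference)
    ]
```

Regions where the distorted video matches the reference produce zero residual, so the residual input highlights exactly where distortion occurred.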
- The multi-input network outputs the input data as two sets of data: the first set is the original input data, and the second set is obtained by downscaling the original input data by a factor of two in frame size.
- The multi-input network outputs two sets of data. The first set of data is processed by the two-dimensional convolution module, input into the dense convolutional network for processing, and the result is then processed by the downsampling processing module. The second set of data is processed by the two-dimensional convolution module and, after being fused (concat) with the output of the downsampling processing module, is input into the dense convolutional network again for processing. At this point, one part of the output of the dense convolutional network is input into the downsampling processing module again for processing, and the output of the downsampling processing module is input into the layered convolutional network for processing.
- the output of the layered convolutional network will be input to the attention mechanism network together with another part of the output of the dense convolutional network for processing.
- the data fusion processing module performs data fusion processing on the output results of the residual video data obtained by the attention mechanism network processing and the output results of the distorted video data.
- The output of the data fusion processing module is input into two 3D convolutional neural networks.
- The 3D convolutional neural networks output the threshold of frame-loss perceptibility; matrix multiplication is performed between this threshold and the residual data frames obtained from the residual frames, and the result is finally input into the global pooling module and the fully connected layer for processing, which outputs a quality assessment score for the video data.
- Figure 6 shows two first layered convolution modules, two second layered convolution modules, and three third layered convolution modules only schematically; it does not mean that the layered convolutional network actually contains two first layered convolution modules, two second layered convolution modules, and three third layered convolution modules.
- The downsampling processing module and the downsampling processing module in the layered convolutional network can be the same downsampling processing module or different downsampling processing modules, and the upsampling processing module, the upsampling processing module in the layered convolutional network, and the attention upsampling processing module in the attention mechanism network may be the same upsampling processing module or different upsampling processing modules.
- The training video data can be divided into a training set, a verification set, and a test set according to the preset ratio; the training set is input into the 3D-PVQA model for training, the verification set is input into the 3D-PVQA model for verification, and the test set is input into the 3D-PVQA model for testing, from which the corresponding quality assessment scores can be obtained.
- The test set can be used to evaluate the generalization ability of the final video quality assessment model. As shown in Figure 8b, the screenshot of the reference video data is on the left and the screenshot of the distorted video data is on the right. Table 1 below shows the MOS values of the video data and the corresponding quality evaluation scores output by the 3D-PVQA model.
- the determination of the average opinion value MOS value of each training video data may include the following steps S121 to S124.
- In step S121, the training video data are grouped, each group includes one piece of reference video data and a plurality of pieces of distorted video data, the resolution of each piece of video data in each group is the same, and the frame rate of each piece of video data in each group is the same.
- In step S122, the distorted video data in each group are classified.
- In step S123, the distorted video data of each category in each group are graded.
- In step S124, the MOS value of each piece of training video data is determined according to the grouping, classification, and grading of the training video data.
- When classifying the distorted video data in each group, the distorted video data can be divided into different categories such as packet-loss distortion and encoding distortion; when grading the distorted video data of each category in each group, the distorted video data can be classified into three different levels of distortion: mild, moderate, and severe.
- After grouping, classifying, and grading the training video data, each group includes one piece of reference video data and multiple pieces of distorted video data, the distorted video data belong to different categories, and the distorted video data under each category belong to different distortion levels. Based on the reference video data in each group, the MOS value of each piece of training video data can be determined by using the SAMVIQ (Subjective Assessment Method for Video Quality evaluation) method together with the grouping, classification, and grading situations.
- SAMVIQ Subjective Assessment Method for Video Quality evaluation
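The grouping, classification, and grading of steps S121 to S123 can be sketched as a nested index over video records; all key names here (`resolution`, `frame_rate`, `category`, `level`, and so on) are illustrative assumptions, not terms from the disclosure:

```python
from collections import defaultdict

def group_training_videos(videos):
    """Organize training videos as steps S121-S123 describe: group by
    (resolution, frame_rate), then index the distorted videos of each
    group by distortion category and distortion level.

    Each video is a dict; one video per group is the reference.
    """
    groups = defaultdict(lambda: {"reference": None,
                                  "distorted": defaultdict(dict)})
    for v in videos:
        key = (v["resolution"], v["frame_rate"])  # same within a group
        if v.get("is_reference"):
            groups[key]["reference"] = v["name"]
        else:
            # e.g. category "packet_loss"/"encoding", level "mild"/"moderate"/"severe"
            groups[key]["distorted"][v["category"]][v["level"]] = v["name"]
    return groups
```

The resulting structure gives, for every group, the reference video plus one distorted video per (category, level) pair, which is the layout a SAMVIQ-style subjective scoring session would iterate over.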
- the present disclosure also provides a video quality assessment method, which may include the following step S21.
- In step S21, the video data to be evaluated is processed with the final quality evaluation model obtained through training according to the above-mentioned model training method for video quality evaluation, to obtain the quality evaluation score of the video data to be evaluated.
- The initial video quality assessment model for fully extracting image features and accurately detecting boundaries in the image is preset, training video data including reference video data and distorted video data are acquired, and the reference video data and distorted video data are used to train the initial video quality assessment model to obtain the final video quality assessment model, which ensures the independence and diversity of the video data used to train the model.
- The final video quality evaluation model can be used directly to perform quality evaluation on the video data to be evaluated, which improves the accuracy of video quality evaluation.
- the present disclosure also provides a model training device for video quality assessment, which may include: an acquisition module 101 , a processing module 102 , and a training module 103 .
- the acquiring module 101 is configured to acquire training video data; wherein, the training video data includes reference video data and distorted video data.
- the processing module 102 is configured to determine the MOS value of each training video data.
- the training module 103 is configured to train a preset initial video quality assessment model according to the training video data and its MOS value until a convergence condition is reached, so as to obtain a final video quality assessment model.
- the training module 103 is configured to: determine a training set and a verification set according to the preset ratio and the training video data, wherein the intersection of the training set and the verification set is an empty set; adjust the parameters of the initial video quality assessment model according to the training set and the MOS value of each piece of video data in the training set; and adjust the hyperparameters of the initial video quality assessment model according to the verification set and the MOS value of each piece of video data in the verification set, until the convergence condition is reached.
- the convergence condition includes that the evaluation error rate of each piece of video data in the training set and the verification set does not exceed a preset threshold, and the evaluation error rate is calculated using the following formula:
- E = |S − MOS| / MOS
- E is the evaluation error rate of the current video data
- S is the evaluation score of the current video data output by the initial quality evaluation model after adjusting parameters and hyperparameters
- Mos is the Mos value of the current video data.
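The convergence criterion above can be sketched directly in Python. The 0.05 default threshold used here is an assumed placeholder, since the patent only calls for "a preset threshold".

```python
def evaluation_error_rate(score, mos):
    """E = |S - MOS| / MOS, per the patent's convergence criterion."""
    return abs(score - mos) / mos

def converged(scores, mos_values, threshold=0.05):
    """Convergence: every video's evaluation error rate stays within the
    preset threshold (0.05 here is an assumed placeholder value)."""
    return all(evaluation_error_rate(s, m) <= threshold
               for s, m in zip(scores, mos_values))
```

In practice this check would be applied to the model's scores on both the training set and the verification set before training stops.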
- the initial video quality assessment model includes a three-dimensional convolutional neural network for extracting motion information of image frames.
- the initial video quality assessment model also includes an attention model, a data fusion processing module, a global pooling module, and a fully connected layer; the attention model, the data fusion processing module, the three-dimensional convolutional neural network, the global pooling module, and the fully connected layer are cascaded in sequence.
- the attention model includes a cascaded multi-input network, two-dimensional convolution module, dense convolutional network, downsampling processing module, layered convolutional network, upsampling processing module, and attention mechanism network
- the dense convolutional network includes at least two cascaded dense convolutional modules
- each dense convolutional module includes four cascaded densely connected convolutional layers.
- the attention mechanism network includes a cascaded attention convolution module, rectified linear unit (ReLU) activation module, nonlinear activation module, and attention upsampling processing module.
- the layered convolutional network includes a first layered network, a second layered network, a third layered network, and a fourth upsampling processing module
- the first layered network includes a cascaded first downsampling processing module and first layered convolution module
- the second layered network includes a cascaded second downsampling processing module, second layered convolution module, and second upsampling processing module
- the third layered network includes a cascaded global pooling module, third layered convolution module, and third upsampling processing module
- the first layered convolution module is also cascaded with the second downsampling processing module
- the first layered convolution module and the second upsampling processing module are cascaded with the fourth upsampling processing module
- the fourth upsampling processing module and the third upsampling processing module are also cascaded with the third layered convolution module.
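A purely illustrative sketch of the top-level cascade (attention model, data fusion, 3D CNN, global pooling, fully connected layer, in sequence) is given below. Every stage except the global pooling and the final linear mapping is a stand-in placeholder, and none of the implementations should be read as the patent's actual networks.

```python
# Illustrative-only sketch of the cascaded top-level pipeline:
# attention model -> data fusion -> 3D CNN -> global pooling -> fully connected.
# All stages except global_pool and fc are placeholders, not real networks.

def make_pipeline(stages):
    """Cascade the given stages in sequence, feeding each output to the next."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run

# Stand-in stages operating on a clip shaped [frames][height][width] (nested lists)
attention = lambda clip: clip          # placeholder: would reweight salient regions
fuse      = lambda clip: clip          # placeholder: would fuse attention output with input
conv3d    = lambda clip: clip          # placeholder: would extract motion features

def global_pool(clip):
    """Global average pooling over all frames and pixels of the clip."""
    values = [v for frame in clip for row in frame for v in row]
    return sum(values) / len(values)

fc = lambda feat: 5.0 * feat           # placeholder linear map from pooled feature to a score

score_clip = make_pipeline([attention, fuse, conv3d, global_pool, fc])
```

The point of the sketch is only the cascade order: each module's output is the next module's input, ending in a single quality score.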
- the processing module 102 is configured to: group the training video data, where each group includes one piece of reference video data and multiple pieces of distorted video data, and the video data in each group share the same resolution and the same frame rate; classify the video data in each group; grade the video data of each classification in each group; and determine the MOS value of each piece of training video data according to its grouping, classification, and grading.
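The grouping step described for the processing module 102 can be sketched as follows. The dictionary-based bookkeeping, the field names, and the exactly-one-reference check are illustrative assumptions rather than the patent's implementation; classification and grading within each group would follow the same pattern.

```python
# Hypothetical sketch: group clips so that each group shares one resolution and
# one frame rate and holds exactly one reference clip plus its distorted versions.
from collections import defaultdict

def group_videos(videos):
    """videos: list of dicts with keys 'id', 'resolution', 'fps', 'reference' (bool).
    Returns a dict keyed by (resolution, fps), each value being the group's clips."""
    groups = defaultdict(list)
    for v in videos:
        groups[(v["resolution"], v["fps"])].append(v)
    for key, members in groups.items():
        refs = [v for v in members if v["reference"]]
        # invariant from the description: one reference clip per group
        assert len(refs) == 1, f"each group needs exactly one reference clip: {key}"
    return dict(groups)
```

With the groups in hand, each clip can then be classified and graded inside its group before a MOS value is assigned.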
- the present disclosure also provides a video quality assessment device, including an assessment module 201 configured to process the video data to be assessed using the final video quality assessment model obtained by the aforementioned model training method for video quality assessment, so as to obtain the quality assessment score of the video data to be assessed.
- an embodiment of the present disclosure also provides an electronic device, including: one or more processors 301; and a storage device 302 on which one or more programs are stored. When the one or more programs are executed by the one or more processors 301, the one or more processors 301 implement at least one of the following methods: the model training method for video quality assessment provided by the aforementioned embodiments; and the video quality assessment method provided by the aforementioned embodiments.
- an embodiment of the present disclosure also provides a computer storage medium on which a computer program is stored, wherein, when the program is executed by a processor, at least one of the following methods is implemented: the model training method for video quality assessment provided by the aforementioned embodiments; and the video quality assessment method provided by the aforementioned embodiments.
- the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components.
- Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
- Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
- computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data.
- Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
- communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media.
- Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic, descriptive sense only and not for purposes of limitation. In some instances, as will be apparent to those skilled in the art, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone, or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, it will be understood by those of ordinary skill in the art that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
Abstract
Description
Claims (16)
- A model training method for video quality assessment, comprising: acquiring training video data, wherein the training video data includes reference video data and distorted video data; determining a mean opinion score (MOS) value of each piece of the training video data; and training a preset initial video quality assessment model according to the training video data and the MOS values thereof until a convergence condition is reached, to obtain a final video quality assessment model.
- The method according to claim 1, wherein training the preset initial video quality assessment model according to the training video data and the MOS values thereof until the convergence condition is reached comprises: determining a training set and a verification set according to a preset ratio and the training video data, wherein the intersection of the training set and the verification set is an empty set; and adjusting parameters of the initial video quality assessment model according to the training set and the MOS value of each piece of video data in the training set, and adjusting hyperparameters of the initial video quality assessment model according to the verification set and the MOS value of each piece of video data in the verification set, until the convergence condition is reached.
- The method according to claim 2, wherein the convergence condition includes that the evaluation error rate of each piece of video data in the training set and the verification set does not exceed a preset threshold, and the evaluation error rate is calculated by the formula E = |S - MOS| / MOS, where E is the evaluation error rate of the current video data, S is the evaluation score of the current video data output by the initial quality assessment model after the parameters and hyperparameters are adjusted, and MOS is the MOS value of the current video data.
- The method according to any one of claims 1 to 3, wherein the initial video quality assessment model includes a three-dimensional convolutional neural network for extracting motion information of image frames.
- The method according to claim 4, wherein the initial video quality assessment model further includes an attention model, a data fusion processing module, a global pooling module, and a fully connected layer, and the attention model, the data fusion processing module, the three-dimensional convolutional neural network, the global pooling module, and the fully connected layer are cascaded in sequence.
- The method according to claim 5, wherein the attention model includes a cascaded multi-input network, two-dimensional convolution module, dense convolutional network, downsampling processing module, layered convolutional network, upsampling processing module, and attention mechanism network, the dense convolutional network includes at least two cascaded dense convolutional modules, and each dense convolutional module includes four cascaded densely connected convolutional layers.
- The method according to claim 6, wherein the attention mechanism network includes a cascaded attention convolution module, rectified linear unit activation module, nonlinear activation module, and attention upsampling processing module.
- The method according to claim 5, wherein the layered convolutional network includes a first layered network, a second layered network, a third layered network, and a fourth upsampling processing module; the first layered network includes a cascaded first downsampling processing module and first layered convolution module; the second layered network includes a cascaded second downsampling processing module, second layered convolution module, and second upsampling processing module; the third layered network includes a cascaded global pooling module, third layered convolution module, and third upsampling processing module; the first layered convolution module is further cascaded with the second downsampling processing module; the first layered convolution module and the second upsampling processing module are cascaded with the fourth upsampling processing module; and the fourth upsampling processing module and the third upsampling processing module are further cascaded with the third layered convolution module.
- The method according to any one of claims 1 to 3, wherein determining the mean opinion score (MOS) value of each piece of the training video data comprises: grouping the training video data, wherein each group includes one piece of reference video data and multiple pieces of distorted video data, the video data in each group have the same resolution, and the video data in each group have the same frame rate; classifying the video data in each group; grading the video data of each classification in each group; and determining the MOS value of each piece of the training video data according to the grouping, classification, and grading of the training video data.
- A video quality assessment method, comprising: processing video data to be assessed using the final quality assessment model obtained by training according to the method of any one of claims 1 to 9, to obtain a quality assessment score of the video data to be assessed.
- A model training device for video quality assessment, comprising: an acquisition module configured to acquire training video data, wherein the training video data includes reference video data and distorted video data; a processing module configured to determine a mean opinion score (MOS) value of each piece of the training video data; and a training module configured to train a preset initial video quality assessment model according to the training video data and the MOS values thereof until a convergence condition is reached, to obtain a final video quality assessment model.
- A video quality assessment device, comprising: an assessment module configured to process video data to be assessed using the final quality assessment model obtained by training according to the model training method for video quality assessment of any one of claims 1 to 9, to obtain a quality assessment score of the video data to be assessed.
- An electronic device, comprising: one or more processors; and a storage device on which one or more programs are stored, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the model training method for video quality assessment according to any one of claims 1 to 9.
- An electronic device, comprising: one or more processors; and a storage device on which one or more programs are stored, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the video quality assessment method according to claim 10.
- A computer storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the model training method for video quality assessment according to any one of claims 1 to 9 is implemented.
- A computer storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the video quality assessment method according to claim 10 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020247009562A KR20240052000A (ko) | 2021-09-09 | 2022-09-01 | Model training method, video quality assessment method, apparatus, device and medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111055446.0 | 2021-09-09 | ||
CN202111055446.0A CN115775218A (zh) | 2021-09-09 | 2021-09-09 | Model training method, video quality assessment method, apparatus, device and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023036045A1 true WO2023036045A1 (zh) | 2023-03-16 |
Family
ID=85387481
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/116480 WO2023036045A1 (zh) | 2021-09-09 | 2022-09-01 | Model training method, video quality assessment method, apparatus, device and medium |
Country Status (3)
Country | Link |
---|---|
KR (1) | KR20240052000A (zh) |
CN (1) | CN115775218A (zh) |
WO (1) | WO2023036045A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117079081A (zh) * | 2023-10-16 | 2023-11-17 | 山东海博科技信息系统股份有限公司 | Multi-modal video-text processing model training method and system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190258902A1 (en) * | 2018-02-16 | 2019-08-22 | Spirent Communications, Inc. | Training A Non-Reference Video Scoring System With Full Reference Video Scores |
CN110674925A (zh) * | 2019-08-29 | 2020-01-10 | 厦门大学 | No-reference VR video quality assessment method based on 3D convolutional neural network |
CN110751649A (zh) * | 2019-10-29 | 2020-02-04 | 腾讯科技(深圳)有限公司 | Video quality assessment method and apparatus, electronic device, and storage medium |
CN110958467A (zh) * | 2019-11-21 | 2020-04-03 | 清华大学 | Video quality prediction method and apparatus, and electronic device |
CN111524110A (zh) * | 2020-04-16 | 2020-08-11 | 北京微吼时代科技有限公司 | Video quality evaluation model construction method, evaluation method, and apparatus |
CN113196761A (zh) * | 2018-10-19 | 2021-07-30 | 三星电子株式会社 | Method and apparatus for evaluating subjective quality of video |
CN114598864A (zh) * | 2022-03-12 | 2022-06-07 | 中国传媒大学 | Deep-learning-based full-reference objective quality evaluation method for ultra-high-definition video |
- 2021-09-09: CN application CN202111055446.0A filed; published as CN115775218A (status: pending)
- 2022-09-01: KR application KR1020247009562A filed; published as KR20240052000A
- 2022-09-01: PCT application PCT/CN2022/116480 filed; published as WO2023036045A1 (active application filing)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190258902A1 (en) * | 2018-02-16 | 2019-08-22 | Spirent Communications, Inc. | Training A Non-Reference Video Scoring System With Full Reference Video Scores |
CN113196761A (zh) * | 2018-10-19 | 2021-07-30 | 三星电子株式会社 | Method and apparatus for evaluating subjective quality of video |
CN110674925A (zh) * | 2019-08-29 | 2020-01-10 | 厦门大学 | No-reference VR video quality assessment method based on 3D convolutional neural network |
CN110751649A (zh) * | 2019-10-29 | 2020-02-04 | 腾讯科技(深圳)有限公司 | Video quality assessment method and apparatus, electronic device, and storage medium |
CN110958467A (zh) * | 2019-11-21 | 2020-04-03 | 清华大学 | Video quality prediction method and apparatus, and electronic device |
CN111524110A (zh) * | 2020-04-16 | 2020-08-11 | 北京微吼时代科技有限公司 | Video quality evaluation model construction method, evaluation method, and apparatus |
CN114598864A (zh) * | 2022-03-12 | 2022-06-07 | 中国传媒大学 | Deep-learning-based full-reference objective quality evaluation method for ultra-high-definition video |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117079081A (zh) * | 2023-10-16 | 2023-11-17 | 山东海博科技信息系统股份有限公司 | Multi-modal video-text processing model training method and system |
CN117079081B (zh) * | 2023-10-16 | 2024-01-26 | 山东海博科技信息系统股份有限公司 | Multi-modal video-text processing model training method and system |
Also Published As
Publication number | Publication date |
---|---|
KR20240052000A (ko) | 2024-04-22 |
CN115775218A (zh) | 2023-03-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kang et al. | Robust median filtering forensics using an autoregressive model | |
CN111079539B (zh) | Video abnormal behavior detection method based on anomaly tracking | |
CN111178120B (zh) | Pest image detection method based on cascaded crop recognition | |
TWI729861B (zh) | Device and method for processing anomaly detection | |
CN110457524B (zh) | Model generation method, video classification method and apparatus | |
WO2023036045A1 (zh) | Model training method, video quality assessment method, apparatus, device and medium | |
Goh et al. | A hybrid evolutionary algorithm for feature and ensemble selection in image tampering detection | |
US20230066499A1 (en) | Method for establishing defect detection model and electronic apparatus | |
CN108830829B (zh) | No-reference quality evaluation algorithm combining multiple edge detection operators | |
CN111931857A (zh) | MSCFF-based low-illumination object detection method | |
CN112686869A (zh) | Cloth defect detection method and device | |
Li et al. | Robust median filtering detection based on the difference of frequency residuals | |
CN114862857A (zh) | Industrial product appearance anomaly detection method and system based on two-stage learning | |
CN113888477A (zh) | Network model training method, metal surface defect detection method, and electronic device | |
Agarwal et al. | Median filtering forensics based on optimum thresholding for low-resolution compressed images | |
CN112418229A (zh) | Deep-learning-based real-time segmentation method for maritime scene images of unmanned surface vehicles | |
CN116152577B (zh) | Image classification method and apparatus | |
Rodríguez-Lois et al. | A Critical Look into Quantization Table Generalization Capabilities of CNN-based Double JPEG Compression Detection | |
CN112949344B (zh) | Feature autoregression method for anomaly detection | |
CN114897842A (zh) | Infrared small-target segmentation and detection method based on texture enhancement network | |
Hebbar et al. | A Deep Learning Framework with Transfer Learning Approach for Image Forgery Localization | |
CN114155198A (zh) | Quality evaluation method and device for dehazed images | |
Zhu et al. | Recaptured image detection through multi-resolution residual-based correlation coefficients | |
Sornalatha et al. | Detecting contrast enhancement based image forgeries by parallel approach | |
Okarma | Image and video quality assessment with the use of various verification databases |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22866509 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 20247009562 Country of ref document: KR Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022866509 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2022866509 Country of ref document: EP Effective date: 20240322 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |