CN114125495A - Video quality evaluation model training method, video quality evaluation method and device - Google Patents
Video quality evaluation model training method, video quality evaluation method and device Download PDFInfo
- Publication number
- CN114125495A CN114125495A CN202010801975.XA CN202010801975A CN114125495A CN 114125495 A CN114125495 A CN 114125495A CN 202010801975 A CN202010801975 A CN 202010801975A CN 114125495 A CN114125495 A CN 114125495A
- Authority
- CN
- China
- Prior art keywords
- video
- characteristic parameters
- parameter
- video data
- video quality
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/475—End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data
- H04N21/4756—End-user interface for inputting end-user data, e.g. personal identification number [PIN], preference data for rating content, e.g. scoring a recommended movie
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Multimedia (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
Abstract
The invention discloses a video quality evaluation model training method, a video quality evaluation method and a video quality evaluation device, and relates to the field of video communication. The method comprises the following steps: acquiring sample video data and quality scores of the sample video data; extracting video characteristic parameters related to human eye visual experience in sample video data; and taking the quality score of the sample video data as marking data, training a neural network model based on the video characteristic parameters to obtain a video quality evaluation model, so as to evaluate the quality of the video data to be evaluated according to the video quality evaluation model. The video quality assessment method and device can improve accuracy of video quality assessment.
Description
Technical Field
The present disclosure relates to the field of video communication, and in particular, to a video quality assessment model training method, a video quality assessment method, and an apparatus thereof.
Background
With the development and commercialization of 5G technology, high-definition video services have become typical 5G applications, including video on demand, live broadcast, high-definition video conferencing, VR live broadcast, and VR and AR games. Video content itself and transmission distortion in the network environment have a significant influence on video quality, so continuously improving video quality is especially important for the development of related services.
VQA (Video Quality Assessment) includes both manual and objective quantitative assessment. Manual evaluation is inefficient, costly, and highly subjective. Objective evaluation includes FR (Full Reference), RR (Reduced Reference), and NR (No Reference) methods. Full-reference and reduced-reference evaluation require a reference standard, which limits the conditions of use, whereas no-reference evaluation needs no reference, so it is flexible, widely applicable, and has attracted attention in practical applications.
Traditional objective video evaluation systems, such as the G.1070 no-reference video quality evaluation model published by the ITU in 2012, mainly evaluate objective parameters such as mean square error (MSE) and peak signal-to-noise ratio (PSNR), but their evaluation effect is poor.
Disclosure of Invention
The technical problem to be solved by the present disclosure is to provide a video quality assessment model training method, a video quality assessment method and an apparatus, which can improve the accuracy of video quality assessment.
According to an aspect of the present disclosure, a video quality assessment model training method is provided, including: acquiring sample video data and quality scores of the sample video data; extracting video characteristic parameters related to human eye visual experience in sample video data; and taking the quality score of the sample video data as marking data, training a neural network model based on the video characteristic parameters to obtain a video quality evaluation model, so as to evaluate the quality of the video data to be evaluated according to the video quality evaluation model.
In some embodiments, the video feature parameters include: at least one of spatial domain characteristic parameters, temporal domain characteristic parameters, and transmission characteristic parameters.
In some embodiments, the spatial signature parameters include: at least one of an image luminance chrominance perception parameter, an image contrast parameter, an image blur parameter, and an image edge energy parameter.
In some embodiments, the time domain feature parameters include: at least one of a scene cut perceptual parameter and an inter-frame difference parameter.
In some embodiments, the transmission characteristic parameters include: at least one of an interruption time parameter, a disorder rate parameter, a delay parameter, a packet loss rate parameter, and a code rate parameter.
In some embodiments, deriving the video quality assessment model comprises: comparing the output result of the neural network model with the quality score of the sample video data; judging whether the comparison result meets the requirement of the loss function or not, and adjusting the parameters of the neural network model through repeated iteration; and taking the neural network model when the comparison result meets the requirement of the loss function as a video quality evaluation model.
In some embodiments, the neural network is a back propagation algorithm BP neural network.
According to another aspect of the present disclosure, a video quality evaluation method is further provided, including: acquiring video data to be evaluated; extracting video characteristic parameters related to human eye visual experience in video data to be evaluated; and inputting the video characteristic parameters into a video quality evaluation model to obtain the quality score of the video data to be evaluated, wherein the video quality evaluation model is obtained by training based on the video characteristic parameters in the sample video data and the quality score of the sample video data.
In some embodiments, the video feature parameters include at least one of spatial, temporal, and transmission feature parameters.
According to another aspect of the present disclosure, there is also provided a video quality assessment model training apparatus, including: a sample information acquisition unit configured to acquire sample video data and a quality score of the sample video data; a characteristic parameter extraction unit configured to extract video characteristic parameters related to human visual experience in the sample video data; and the model training unit is configured to train the neural network model based on the video characteristic parameters by taking the quality scores of the sample video data as the marking data to obtain a video quality evaluation model so as to evaluate the quality of the video data to be evaluated according to the video quality evaluation model.
In some embodiments, the video feature parameters include at least one of spatial, temporal, and transmission feature parameters.
According to another aspect of the present disclosure, there is also provided a video quality evaluation apparatus including: an actual information acquisition unit configured to acquire video data to be evaluated; the video characteristic extraction unit is configured to extract video characteristic parameters related to human visual experience in video data to be evaluated; and the video quality evaluation unit is configured to input the video characteristic parameters into a video quality evaluation model to obtain a quality score of the video data to be evaluated, wherein the video quality evaluation model is obtained by training based on the video characteristic parameters in the sample video data and the quality score of the sample video data.
In some embodiments, the video feature parameters include at least one of spatial, temporal, and transmission feature parameters.
According to another aspect of the present disclosure, there is also provided an electronic device, including: a memory; and a processor coupled to the memory, the processor configured to perform the video quality assessment model training method as described above, or the video quality assessment method as described above, based on instructions stored in the memory.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium is also presented, on which computer program instructions are stored, which instructions, when executed by a processor, implement the video quality assessment model training method described above, or the video quality assessment method described above.
In the embodiment of the disclosure, the neural network model is trained through the video characteristic parameters related to human visual experience and the quality scores of the sample video data to obtain the video quality assessment model, so that the subsequent actually acquired video data can be more objectively, quantitatively, accurately, efficiently and quickly assessed in quality, and the accuracy of video quality assessment is improved.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
fig. 1 is a flow diagram of some embodiments of a video quality assessment model training method of the present disclosure.
Fig. 2 is a schematic flow chart diagram illustrating further embodiments of a video quality assessment model training method according to the present disclosure.
Fig. 3 is a flow diagram of some embodiments of a video quality evaluation method of the present disclosure.
Fig. 4 is a schematic structural diagram of some embodiments of the video quality assessment model training apparatus according to the present disclosure.
Fig. 5 is a schematic structural diagram of another embodiment of a video quality assessment model training apparatus according to the present disclosure.
Fig. 6 is a schematic structural diagram of some embodiments of the video quality evaluation apparatus of the present disclosure.
Fig. 7 is a schematic structural diagram of some embodiments of an electronic device of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
Fig. 1 is a flow diagram of some embodiments of a video quality assessment model training method of the present disclosure.
At step 110, sample video data and a quality score for the sample video data are obtained.
In some embodiments, a training sample library is constructed, with samples chosen to be as diverse as possible according to the requirements and characteristics of the service. For example, source sequences are selected according to the ITU-R BT.1210 standard and encoded by a codec into video samples of different qualities, including 18M, 10M, 8M, 6M, 2M and other compressed streams, as well as derived samples with different luminance and chrominance qualities.
In some embodiments, subjective MOS (Mean Opinion Score) scoring is performed on the sample video data under different network conditions such as packet loss, packet disorder, and delay jitter; scoring may follow the double-stimulus continuous quality scale method or other methods. The quality score of the sample video data is used as the annotation value for training the model.
In step 120, video feature parameters related to human visual experience (based on the Human Visual System, HVS) are extracted from the sample video data.
In some embodiments, the video feature parameters include at least one of spatial, temporal, and transmission feature parameters.
In a service scenario, the video service platform transmits a video stream to a terminal over networks such as 5G or broadband; the terminal receives, decodes, and plays the stream. Factors influencing video quality include the spatial and temporal characteristics of the video and the distortion introduced during transmission.
For example, the spatial domain feature parameters are mainly image-level quality parameters: a video stream is composed of a sequence of frames, and each frame produces a subjective impression on the viewer. In addition, video is a stream of frames along the time axis, so changes and switching between frames also matter to the viewing experience, beyond the quality of each image itself. Moreover, transmission distortion such as packet loss, delay, and interruption in the network environment can significantly deteriorate the viewing experience, and different video services such as video on demand, live broadcast, and video conferencing are affected differently. For example, video conferencing is sensitive to latency, so video quality evaluation must account for distortion effects in network transmission.
Therefore, in this embodiment, spatial domain feature parameters, temporal domain feature parameters, and transmission feature parameters of the video may be extracted.
In step 130, the quality score of the sample video data is used as the annotation data, and the neural network model is trained based on the video characteristic parameters to obtain a video quality evaluation model, so that the quality of the video data to be evaluated is evaluated according to the video quality evaluation model.
In some embodiments, the neural network is a BP (Back Propagation) neural network. Compared with a convolution neural network with large operation amount, the BP neural network is simpler and more efficient during training.
In the embodiment, the neural network model is trained through the video characteristic parameters related to human visual experience and the quality scores of the sample video data to obtain the video quality evaluation model, so that the subsequent actually acquired video data can be evaluated more objectively, quantitatively, accurately, efficiently and quickly.
Fig. 2 is a schematic flow chart diagram illustrating further embodiments of a video quality assessment model training method according to the present disclosure.
In step 210, sample video data and subjective quality scores of the sample video data under different network states are obtained.
In some embodiments, the sample video stream is decoded.
In step 220, spatial domain feature parameters, temporal domain feature parameters, and transmission feature parameters of the sample video data are extracted.
In some embodiments, there are many candidate image feature parameters; in combination with the HVS, the spatial domain feature parameters include one or more of an image luminance chrominance perception parameter, an image contrast parameter, an image blur parameter, and an image edge energy parameter.
In some embodiments, the maximum value and the average value are respectively selected for the image brightness and chrominance perception parameter, the image contrast parameter, the image fuzziness parameter and the image edge energy parameter, and the human visual experience is reflected by taking the maximum value and the average value.
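For illustration only (not part of the original disclosure), a minimal NumPy sketch of this mean/max aggregation over per-frame feature values, with an assumed helper name, might look like:

```python
# Minimal sketch (assumed helper name): aggregate any per-frame spatial feature
# into the two values used here -- the average over all frames and the maximum
# observed for a single frame.
import numpy as np

def aggregate_frame_feature(per_frame_values):
    """per_frame_values: 1-D array with one feature value per frame."""
    values = np.asarray(per_frame_values, dtype=np.float64)
    return float(values.mean()), float(values.max())

# Example: avg_blur, max_blur = aggregate_frame_feature(blur_scores)
```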
In some embodiments, the image luminance chrominance perception parameter mainly reflects the perception of image intensity, i.e., a measure of the brightness and darkness perceived by human eyes for a static object.
In some embodiments, a formula is utilized to determine the image luminance chrominance perception parameter V_CSF, in which the luminance of each frame is transformed by the DCT, weighted by the CSF, and transformed back by the IDCT, where m and n are the size of the video frame image, i and j are the pixel coordinates of the image, the transform operates on the pixel luminance values, CSF is the contrast sensitivity function, DCT is the discrete cosine transform, and IDCT is the inverse discrete cosine transform. The average over all frame images and the maximum value of a single frame are taken to obtain the luminance chrominance perception parameter V_CSF.
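A hedged sketch of this kind of CSF-weighted DCT/IDCT computation is shown below; the specific CSF curve and the frequency scaling are assumptions (a generic Mannos–Sakrison-style weighting), since the exact formula is not reproduced in this text:

```python
# Illustrative only: the patent's exact CSF weighting is not reproduced here,
# so a generic Mannos-Sakrison-style CSF is assumed as a stand-in.
import numpy as np
from scipy.fft import dctn, idctn

def csf_weight(m, n):
    """Assumed contrast-sensitivity weights over the DCT frequency grid."""
    u = np.arange(m)[:, None] / m
    v = np.arange(n)[None, :] / n
    f = np.sqrt(u ** 2 + v ** 2) * 32.0          # assumed spatial-frequency scale
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

def luminance_csf_perception(gray_frame):
    """One frame's V_CSF-style value: IDCT of the CSF-weighted DCT of luminance."""
    m, n = gray_frame.shape
    weighted = dctn(gray_frame.astype(np.float64), norm='ortho') * csf_weight(m, n)
    filtered = idctn(weighted, norm='ortho')
    return float(np.abs(filtered).mean())        # averaged over the m x n pixels
```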
In some embodiments, the image contrast parameter is an extracted image gray level distribution value, and the contrast has a relatively obvious influence on the visual effect.
In some embodiments, a formula is utilized to determine the image contrast parameter V_c, which computes, for each pixel, its contrast against the surrounding 8 pixels, where L1 and L2 index the adjacent positions of a pixel (i.e., ranging over the previous and next position in each direction); I(i, j) − I(i − L1, j − L2) represents the luminance difference between a pixel and an adjacent pixel, and the average of the luminance of a pixel and its neighbours is also used. The image contrast parameter V_c is obtained by computing the average and maximum values over all frames.
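A rough NumPy sketch of an 8-neighbour contrast measure of this kind might be as follows; the exact normalization in the patent formula is not reproduced, so this is only illustrative:

```python
# Sketch under the description above: per-pixel contrast against the 8 surrounding
# pixels, then mean/max taken over frames (borders wrap around via np.roll,
# a simplification for brevity).
import numpy as np

def frame_contrast(gray_frame):
    """Mean absolute luminance difference of each pixel to its 8 neighbours."""
    img = gray_frame.astype(np.float64)
    diffs = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            shifted = np.roll(np.roll(img, di, axis=0), dj, axis=1)
            diffs.append(np.abs(img - shifted))
    return float(np.mean(diffs))   # averaged over the 8 directions and all pixels

# V_c for the stream: mean and max of frame_contrast over all decoded frames.
```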
In some embodiments, the blur parameter measures the sharpness of the image and is an important parameter for image perception.
In some embodiments, a point sharpness function is employed to determine the image blur of a single frame, where m and n are the rows and columns of image pixels, dx denotes the distance increment, df denotes the gray-level change amplitude, and a denotes the distance from a pixel to its adjacent pixels, taken over the neighbouring 8-pixel range. The average over all frames and the maximum value of a single frame are taken as the point-sharpness (EAV) quality feature parameter.
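An illustrative point-sharpness computation consistent with this description could look like the following; the exact constants and normalization are assumptions:

```python
# Illustrative point-sharpness (EAV-style) sketch: grey-level changes df to the
# 8 neighbours, each weighted by the inverse neighbour distance dx, summed and
# normalised by the number of pixels m*n.
import numpy as np

def frame_sharpness(gray_frame):
    img = gray_frame.astype(np.float64)
    m, n = img.shape
    total = 0.0
    for di, dj in [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]:
        dx = np.hypot(di, dj)                       # 1 for edge, sqrt(2) for corner
        shifted = np.roll(np.roll(img, di, axis=0), dj, axis=1)
        total += np.abs(img - shifted).sum() / dx   # |df| weighted by 1/dx
    return total / (m * n)

# Blur feature for the stream: average over all frames plus the single-frame maximum.
```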
In some embodiments, the image edge energy parameter is an important image feature as a key measure of spatial domain.
In some embodiments, a formula is employed to determine the image edge energy parameter V_EdgeEn, where E(i, j) = E1(Y(i, j)) + E2(Y(i, j)), and E1 and E2 are two 3 × 3 templates with specified values.
V_EdgeEn is the single-frame edge energy value; the average value and the maximum value over all frames are calculated as the output of this parameter.
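Since the template values themselves are given as figures in the original, the sketch below assumes Sobel-style 3 × 3 templates purely for illustration:

```python
# Sketch only: the patent's two 3x3 templates E1/E2 are not reproduced in this
# text, so Sobel-style horizontal/vertical kernels are assumed as stand-ins.
import numpy as np
from scipy.ndimage import convolve

E1 = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)  # assumed
E2 = E1.T                                                              # assumed

def frame_edge_energy(gray_frame):
    img = gray_frame.astype(np.float64)
    e = convolve(img, E1, mode='nearest') + convolve(img, E2, mode='nearest')
    return float(np.abs(e).mean())      # single-frame edge-energy value

# V_EdgeEn output: average and maximum of frame_edge_energy over all frames.
```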
In some embodiments, the time domain feature parameters include: at least one of a scene cut perceptual parameter and an inter-frame difference parameter. Scene cuts and interframe changes that are too fast cause discomfort in the subjective experience, so scene cut perceptual parameters and interframe difference parameters need to be referenced in video quality assessment.
In some embodiments, the scene cut perceptual parameter and the inter-frame difference parameter are selected to be a maximum value and an average value, respectively.
In some embodiments, scene switching perception uses the contrast difference of luminance, chrominance, and texture (expected values) to reflect scene switching; a gray gradient comparison is computed between the current frame and the 30 frames of the preceding sequence.
In some embodiments, the scene switching perception parameter is determined using one formula when N is not less than 30 and another when N is less than 30, where N refers to the frame sequence before the current frame; Value_N − Value_{N−5} expresses the difference in gray gradient value between the current Nth frame and the earlier frame, and the average of the gray gradient values of the two frames is also used. The average value and the maximum value of the scene change perception over the video stream are calculated as the evaluation parameter values.
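A hedged sketch of such a gray-gradient scene-change measure is given below; because the exact formula is not reproduced here, it simply compares the current frame's gradient value against the mean over up to 30 preceding frames:

```python
# Illustrative only: scene-change score from grey-gradient values, compared
# against the preceding (up to 30-frame) sequence.
import numpy as np

def gray_gradient_value(gray_frame):
    gy, gx = np.gradient(gray_frame.astype(np.float64))
    return float(np.hypot(gx, gy).mean())

def scene_cut_scores(frames):
    grads = [gray_gradient_value(f) for f in frames]
    scores = []
    for k in range(1, len(grads)):
        window = grads[max(0, k - 30):k]            # previous sequence, at most 30 frames
        scores.append(abs(grads[k] - np.mean(window)))
    return float(np.mean(scores)), float(np.max(scores))   # average and maximum
```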
In some embodiments, the inter-frame difference parameter is an important temporal quality feature; changes over time are measured by the mean squared luminance error between two consecutive frames, and the average value and the maximum value of the inter-frame differences over the whole video stream are finally obtained.
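This feature is fully specified by the description above; a small NumPy sketch (frames assumed to be grayscale arrays) could be:

```python
# Inter-frame difference: mean squared luminance error between consecutive frames,
# then mean and max over the whole stream.
import numpy as np

def interframe_difference(frames):
    diffs = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        diffs.append(np.mean((curr.astype(np.float64) - prev.astype(np.float64)) ** 2))
    return float(np.mean(diffs)), float(np.max(diffs))
```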
In some embodiments, transmission distortion such as packet loss, time delay, interruption and the like of a video in a network environment can significantly deteriorate viewing and using experiences of the video, and experiences generated by different video services such as video on demand, live broadcast and video conference are also different, so that it is necessary for video quality evaluation to pay attention to distortion influence in network transmission. The transmission characteristic parameter comprises at least one of an interruption time parameter, a disorder rate parameter, a time delay parameter, a packet loss rate parameter and a code rate parameter.
In some embodiments, when the delay of a data packet exceeds 2 seconds, an interruption is generally considered to have occurred; when an interruption occurs, the picture freezes and the visual experience deteriorates. The interruption parameter is the counted number of interruptions.
In some embodiments, when video data packets reach the terminal through switching devices and network lines, they may arrive out of order, which adversely affects the displayed picture. The disorder rate is the ratio of out-of-order packets to total packets, expressed as a percentage; for example, if it is N%, the disorder rate parameter is N.
In some embodiments, latency and jitter contribute significantly to transmission distortion, particularly for video conferencing services. The feature parameters may be the average delay (a statistical average over all video data packets) and the maximum delay, both of which matter for viewing experience degradation.
In some embodiments, under degraded network conditions, the loss of video data packets may cause picture stuttering and mosaic effects, degrading the viewing experience; the packet loss rate parameter is the ratio of lost packets to total data packets.
In some embodiments, the code rate of the video stream can reflect the richness of the video content, which has a direct effect on video quality; generally, the higher the code rate, the better the indicated quality.
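For illustration, the transmission feature set described above could be computed from a packet log roughly as follows; the log structure (sequence number, send/receive timestamps in seconds, byte counts) is an assumption, since the real probe would collect these at the receiving terminal:

```python
# Sketch with an assumed packet-log structure; field names are illustrative.
import numpy as np

def transmission_features(packets, expected_count, duration_s):
    """packets: list of dicts like {'seq': int, 'sent': float, 'recv': float, 'bytes': int}"""
    delays = [p['recv'] - p['sent'] for p in packets]
    seqs = [p['seq'] for p in packets]
    out_of_order = sum(1 for a, b in zip(seqs, seqs[1:]) if b < a)
    return {
        'interruptions': sum(1 for d in delays if d > 2.0),   # delay > 2 s counted as an interruption
        'disorder_rate_pct': 100.0 * out_of_order / len(packets),
        'avg_delay': float(np.mean(delays)),
        'max_delay': float(np.max(delays)),
        'loss_rate_pct': 100.0 * (expected_count - len(packets)) / expected_count,
        'bitrate_bps': 8.0 * sum(p['bytes'] for p in packets) / duration_s,
    }
```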
In step 230, a BP neural network model is constructed and initialized, and then the learning rate, the expected error, and the network impairment settings are configured.
In some embodiments, the BP neural network comprises an input layer, hidden layers, and an output layer; the feature parameters are fed into the input layer, and the video evaluation score is output from the output layer after processing by the hidden layers.
In some embodiments, the input layer of the BP neural network has 18 neurons and the output layer has 1 neuron, whose output value is the quality evaluation score; there are 5 hidden layers with 26, 19, 23, 18, and 12 neurons respectively. The learning rate is 0.01 and can be adjusted during the training loop, the connection weights and thresholds are initialized to random numbers, and the target error is 0.001. During training, packet loss, disorder, delay jitter, interruption, and the like can be emulated in the video transmission network by equipment such as a network impairment instrument to create transmission distortion.
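A minimal NumPy sketch of a BP network with exactly this topology and learning rate is shown below; the sigmoid activation, the weight-initialization range, and the scaling of the MOS label to [0, 1] are assumptions, not part of the original disclosure:

```python
# Minimal sketch: 18 inputs, hidden layers 26/19/23/18/12, 1 output,
# learning rate 0.01, target error 0.001. Activation and init are assumed.
import numpy as np

rng = np.random.default_rng(0)
sizes = [18, 26, 19, 23, 18, 12, 1]
weights = [rng.uniform(-0.5, 0.5, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [rng.uniform(-0.5, 0.5, b) for b in sizes[1:]]
lr, target_error = 0.01, 0.001

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    activations = [x]
    for w, b in zip(weights, biases):
        activations.append(sigmoid(activations[-1] @ w + b))
    return activations

def train_step(x, mos):
    """One back-propagation update; mos is the subjective score scaled to [0, 1]."""
    acts = forward(x)
    delta = (acts[-1] - mos) * acts[-1] * (1 - acts[-1])
    for layer in range(len(weights) - 1, -1, -1):
        grad_w = np.outer(acts[layer], delta)
        delta_prev = (delta @ weights[layer].T) * acts[layer] * (1 - acts[layer])
        weights[layer] -= lr * grad_w
        biases[layer] -= lr * delta
        delta = delta_prev
    return float(0.5 * np.sum((acts[-1] - mos) ** 2))   # compare against target_error
```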
In step 240, the spatial domain characteristic parameters, the temporal domain characteristic parameters, the transmission characteristic parameters and the subjective quality scores of the sample video data are input to a BP neural network model, and the BP neural network model is trained until an expected error is achieved.
In some embodiments, each sample has 18 parameters, organized as 11 groups across the 3 classes, plus a subjective MOS score.
In step 250, it is determined whether the comparison result of the output result of the neural network model and the subjective quality score of the sample video data meets the requirement of the loss function, if yes, step 260 is executed, otherwise, step 270 is executed.
In some embodiments, the neural network model is subjected to effect verification, for example, the output result of the neural network model is subjected to correlation evaluation with the subjective quality score of the sample video data.
In some embodiments, the correlation between the subjective evaluation results and the system's evaluation results is computed using algorithms such as SROCC and KROCC; the higher the correlation, the more accurate the evaluation system and the better the effect. If the correlation is lower than 0.8, the BP neural network needs to be adjusted, for example by changing the learning rate or the expected error, and trained again.
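A small sketch of this correlation check using SciPy's rank-correlation functions might be:

```python
# SROCC (Spearman) and KROCC (Kendall) between predicted scores and subjective MOS,
# with the 0.8 acceptance threshold mentioned above.
from scipy.stats import spearmanr, kendalltau

def correlation_ok(predicted_scores, subjective_mos, threshold=0.8):
    srocc, _ = spearmanr(predicted_scores, subjective_mos)
    krocc, _ = kendalltau(predicted_scores, subjective_mos)
    return srocc >= threshold, srocc, krocc   # below threshold -> retune and retrain
```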
In step 260, the trained neural network model is used as a video quality assessment model.
At step 270, parameters of the neural network model are adjusted, followed by continuing to perform step 240.
In this embodiment, the BP neural network in the no-reference video quality evaluation subsystem is trained and optimized with samples of different qualities and contents and with simulation of different transmission network conditions; a video quality evaluation model is obtained through continuous iteration and is used for quality evaluation testing of video services or for real-time monitoring of video services. The model fully considers the typical spatial domain, temporal domain, and transmission factors that influence human visual experience, reflects subjective visual experience, and improves the accuracy of video evaluation.
Fig. 3 is a flow diagram of some embodiments of a video quality evaluation method of the present disclosure.
In step 310, video data to be evaluated is obtained.
In some embodiments, the terminal receives and decodes the transmitted video stream.
In step 320, video characteristic parameters related to human visual experience in the video data to be evaluated are extracted.
In some embodiments, the video feature parameters include at least one of spatial, temporal, and transmission feature parameters.
In some embodiments, the video feature parameters include: at least one of spatial domain characteristic parameters, temporal domain characteristic parameters, and transmission characteristic parameters. The spatial domain characteristic parameters comprise: at least one of an image luminance chrominance perception parameter, an image contrast parameter, an image blur parameter, and an image edge energy parameter. The time domain characteristic parameters comprise: at least one of a scene cut perceptual parameter and an inter-frame difference parameter. The transmission characteristic parameters include: at least one of an interruption time parameter, a disorder rate parameter, a delay parameter, a packet loss rate parameter, and a code rate parameter.
In step 320, the video characteristic parameters are input into a video quality evaluation model to obtain a quality score of the video data to be evaluated, wherein the video quality evaluation model is obtained by training based on the video characteristic parameters in the sample video data and the quality score of the sample video data.
In the embodiment, the quality score of the video data to be evaluated is obtained by using the video quality evaluation model and the video characteristic parameters related to the human eye visual experience, so that the quality score is more objective, quantitative and accurate.
Fig. 4 is a schematic structural diagram of some embodiments of the video quality assessment model training apparatus according to the present disclosure. The apparatus includes a sample information obtaining unit 410, a feature parameter extracting unit 420, and a model training unit 430.
The sample information obtaining unit 410 is configured to obtain sample video data and a quality score of the sample video data.
The feature parameter extraction unit 420 is configured to extract video feature parameters related to human visual experience in the sample video data.
In some embodiments, the video feature parameters include at least one of spatial, temporal, and transmission feature parameters.
In some embodiments, the video feature parameters include: at least one of spatial domain characteristic parameters, temporal domain characteristic parameters, and transmission characteristic parameters. The spatial domain characteristic parameters comprise: at least one of an image luminance chrominance perception parameter, an image contrast parameter, an image blur parameter, and an image edge energy parameter. The time domain characteristic parameters comprise: at least one of a scene cut perceptual parameter and an inter-frame difference parameter. The transmission characteristic parameters include: at least one of an interruption time parameter, a disorder rate parameter, a delay parameter, a packet loss rate parameter, and a code rate parameter.
The model training unit 430 is configured to train the neural network model based on the video feature parameters with the quality scores of the sample video data as the annotation data, resulting in a video quality assessment model, so as to perform quality assessment on the video data to be evaluated according to the video quality assessment model.
The model training unit 430 is further configured to compare the output result of the neural network model with the quality score of the sample video data; judging whether the comparison result meets the requirement of the loss function or not, and adjusting the parameters of the neural network model through repeated iteration; and taking the neural network model when the comparison result meets the requirement of the loss function as a video quality evaluation model.
In the embodiment, the neural network model is trained through the video characteristic parameters related to human visual experience and the quality scores of the sample video data to obtain the video quality evaluation model, so that the subsequent actually acquired video data can be evaluated more objectively, quantitatively, accurately, efficiently and quickly.
The sample information obtaining unit 410, the feature parameter extracting unit 420, and the model training unit 430 in the above embodiments may be implemented by a plurality of systems and modules in the systems as shown in fig. 5.
Fig. 5 is a schematic structural diagram of some embodiments of a video quality assessment model training apparatus according to the present disclosure. The apparatus includes a video feature extraction subsystem 510, a no-reference video quality evaluation subsystem 520, and an optimization training subsystem 530.
The video feature extraction subsystem 510 includes a video receiving module 511, a spatial domain feature parameter extraction sub-module 512, a temporal domain feature parameter extraction sub-module 513, and a transmission feature parameter extraction sub-module 514. The video receiving module 511 is configured to receive the transmitted video stream and decode it; the spatial domain feature parameter extraction sub-module 512, the temporal domain feature parameter extraction sub-module 513, and the transmission feature parameter extraction sub-module 514 respectively extract and count the 18 feature parameters in 11 groups across 3 classes from the video data packet stream, and transmit them to the feature parameter collection module of the BP neural network subsystem.
The no-reference video quality evaluation subsystem 520 comprises a characteristic parameter acquisition module 521, a BP neural network 522 and a BP neural network parameter control module 523.
The feature parameter collection module 521 collects the 18 video stream feature parameters in 11 groups and inputs them into the input layer of the BP neural network. The BP neural network 522 includes an input layer, hidden layers, and an output layer; the feature parameters are fed into the input layer, and the video evaluation score is output from the output layer after processing by the hidden layers. The BP neural network parameter control module 523 is responsible for controlling the parameters of each layer of the BP neural network, including neuron connection weights, the learning rate, thresholds, and the like; the control module is only responsible for setting these parameters — the parameter set itself is not generated by the control module but is provided as input by the neural network parameter module of the optimization training subsystem.
The optimization training subsystem 530 includes a video quality evaluation scoring module 531, a video quality subjective MOS scoring module 532, a BP neural network parameter module 533, a BP neural network parameter optimization training module 534, and a training optimization control module 535.
The video quality evaluation score generated by each training of the BP neural network subsystem is input to the video quality evaluation score module 531 for storage. The video quality subjective MOS scoring module 532 retrieves the subjective MOS score corresponding to the sample from the sample library according to the trained sample. The BP neural network parameter module 533 is responsible for generating all parameters of the BP neural network, including initializing a parameter set and storing the parameter set after optimization training, and is also responsible for providing the parameter set to the parameter optimization training module for parameter optimization; and the parameter set is also transmitted to a parameter control module of the neural network subsystem, and the parameter set is assigned to the BP neural network by the control module. The BP neural network parameter optimization training module 534 is a core module of the subsystem, optimizes parameters of the neural network according to errors of the trained scores and the subjective MOS scores, and transmits the optimized parameter sets to the neural network parameter module. The training optimization control module 535 is a control core of the subsystem, and controls the operation and coordination of each module.
After the trained video quality evaluation model is obtained, it can be deployed and put into use. Because the model is based on a BP neural network, its computational cost is low; it can be deployed on the video terminal side as a probe system that collects terminal-side video quality in real time and supports fault diagnosis, intelligent maintenance, and similar work of the video service operation and maintenance system. It can also be used as a test tool running in real time at each node of the video service, or to provide a pre-deployment evaluation reference for video content.
The video quality evaluation model training device is general-purpose and cross-platform; it can be implemented and deployed on various platforms or terminal devices, on physical equipment or virtual machines, and has wide application prospects.
Fig. 6 is a schematic structural diagram of some embodiments of the video quality evaluation apparatus of the present disclosure. The apparatus includes an actual information acquisition unit 610, a video feature extraction unit 620, and a video quality evaluation unit 630.
The actual information acquisition unit 610 is configured to acquire video data to be evaluated.
The video feature extraction unit 620 is configured to extract video feature parameters related to human visual experience in the video data to be evaluated.
In some embodiments, the video feature parameters include at least one of spatial, temporal, and transmission feature parameters.
In some embodiments, the video feature parameters include: at least one of spatial domain characteristic parameters, temporal domain characteristic parameters, and transmission characteristic parameters. The spatial domain characteristic parameters comprise: at least one of an image luminance chrominance perception parameter, an image contrast parameter, an image blur parameter, and an image edge energy parameter. The time domain characteristic parameters comprise: at least one of a scene cut perceptual parameter and an inter-frame difference parameter. The transmission characteristic parameters include: at least one of an interruption time parameter, a disorder rate parameter, a delay parameter, a packet loss rate parameter, and a code rate parameter.
The video quality evaluation unit 630 is configured to input the video feature parameters to a video quality evaluation model, and obtain a quality score of the video data to be evaluated, where the video quality evaluation model is trained based on the video feature parameters in the sample video data and the quality score of the sample video data.
In the embodiment, the quality score of the video data to be evaluated is obtained by using the video quality evaluation model and the video characteristic parameters related to the human eye visual experience, so that the quality score is more objective, quantitative and accurate.
Fig. 7 is a schematic structural diagram of some embodiments of an electronic device of the present disclosure. The electronic device includes a memory 710 and a processor 720. Wherein: the memory 710 may be a magnetic disk, flash memory, or any other non-volatile storage medium. The memory is used to store instructions in the embodiments corresponding to fig. 1-3. Processor 720, coupled to memory 710, may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 720 is configured to execute instructions stored in the memory.
In some embodiments, processor 720 is coupled to memory 710 through a BUS BUS 730. The electronic device 700 may also be connected to an external storage system 750 through a storage interface 740 for retrieving external data, and may also be connected to a network or another computer system (not shown) through a network interface 760. And will not be described in detail herein.
In this embodiment, instructions are stored in the memory and processed by the processor; the neural network model is trained with the video feature parameters related to human visual experience and the quality scores of the sample video data to obtain a video quality evaluation model, so that subsequently acquired video data can be evaluated objectively, quantitatively, accurately, efficiently, and quickly.
In other embodiments, a computer-readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the steps of the method in the embodiments corresponding to fig. 1-3. As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Thus far, the present disclosure has been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.
Claims (15)
1. A video quality assessment model training method comprises the following steps:
acquiring sample video data and a quality score of the sample video data;
extracting video characteristic parameters related to human eye visual experience in the sample video data; and
and taking the quality score of the sample video data as marking data, training a neural network model based on the video characteristic parameters to obtain a video quality evaluation model, and performing quality evaluation on the video data to be evaluated according to the video quality evaluation model.
2. The video quality assessment model training method of claim 1,
the video feature parameters include: at least one of spatial domain characteristic parameters, temporal domain characteristic parameters, and transmission characteristic parameters.
3. The video quality assessment model training method of claim 2,
the spatial domain characteristic parameters comprise: at least one of an image luminance chrominance perception parameter, an image contrast parameter, an image blur parameter, and an image edge energy parameter.
4. The video quality assessment model training method of claim 2,
the time domain characteristic parameters comprise: at least one of a scene cut perceptual parameter and an inter-frame difference parameter.
5. The video quality assessment model training method of claim 2,
the transmission characteristic parameters include: at least one of an interruption time parameter, a disorder rate parameter, a delay parameter, a packet loss rate parameter, and a code rate parameter.
6. The video quality assessment model training method according to any one of claims 1 to 5, wherein obtaining the video quality assessment model comprises:
comparing the output of the neural network model to a quality score of the sample video data;
judging whether the comparison result meets the requirement of the loss function or not, and adjusting the parameters of the neural network model through repeated iteration; and
and taking a neural network model when the comparison result meets the requirement of the loss function as the video quality evaluation model.
7. The video quality assessment model training method according to any one of claims 1 to 5,
the neural network model is a back propagation algorithm BP neural network model.
8. A video quality evaluation method comprises the following steps:
acquiring video data to be evaluated;
extracting video characteristic parameters related to human eye visual experience in the video data to be evaluated; and
inputting the video characteristic parameters into a video quality evaluation model to obtain the quality score of the video data to be evaluated, wherein,
the video quality evaluation model is obtained by training based on video characteristic parameters in the sample video data and quality scores of the sample video data.
9. The video quality evaluation method according to claim 8,
the video characteristic parameters comprise at least one of spatial domain characteristic parameters, temporal domain characteristic parameters and transmission characteristic parameters.
10. A video quality assessment model training apparatus, comprising:
a sample information acquisition unit configured to acquire sample video data and a quality score of the sample video data;
a characteristic parameter extraction unit configured to extract video characteristic parameters related to human visual experience in the sample video data; and
and the model training unit is configured to train a neural network model based on the video characteristic parameters by taking the quality scores of the sample video data as labeling data to obtain a video quality evaluation model so as to evaluate the quality of the video data to be evaluated according to the video quality evaluation model.
11. The video quality assessment model training device of claim 10,
the video characteristic parameters comprise at least one of spatial domain characteristic parameters, temporal domain characteristic parameters and transmission characteristic parameters.
12. A video quality evaluation apparatus comprising:
an actual information acquisition unit configured to acquire video data to be evaluated;
a video feature extraction unit configured to extract video characteristic parameters related to human visual experience from the video data to be evaluated; and
a video quality evaluation unit configured to input the video characteristic parameters into a video quality evaluation model to obtain a quality score of the video data to be evaluated, wherein
the video quality evaluation model is obtained by training based on video characteristic parameters extracted from sample video data and on quality scores of the sample video data.
13. The video quality evaluation apparatus according to claim 12, wherein
the video characteristic parameters comprise at least one of spatial domain characteristic parameters, temporal domain characteristic parameters, and transmission characteristic parameters.
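Claims 10 to 13 describe the training and evaluation apparatuses as compositions of acquisition, extraction, training and evaluation units. One possible way to mirror that decomposition in code is sketched below; the class names, fields and signatures are illustrative assumptions rather than patent terminology.

```python
# One possible mapping of the claimed units onto plain classes; names and
# signatures are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class TrainingApparatus:
    acquire_samples: Callable[[], Tuple[List[object], List[float]]]      # sample information acquisition unit
    extract_features: Callable[[object], Sequence[float]]                # characteristic parameter extraction unit
    train_model: Callable[[List[Sequence[float]], List[float]], object]  # model training unit

    def build_model(self) -> object:
        videos, scores = self.acquire_samples()
        features = [self.extract_features(v) for v in videos]
        return self.train_model(features, scores)     # yields the video quality evaluation model

@dataclass
class EvaluationApparatus:
    acquire_video: Callable[[], object]                    # actual information acquisition unit
    extract_features: Callable[[object], Sequence[float]]  # video feature extraction unit
    predict_quality: Callable[[Sequence[float]], float]    # video quality evaluation unit (wraps the model)

    def evaluate(self) -> float:
        video = self.acquire_video()
        return self.predict_quality(self.extract_features(video))
```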
14. An electronic device, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the video quality assessment model training method of any of claims 1 to 7, or the video quality assessment method of claim 8 or 9, based on instructions stored in the memory.
15. A non-transitory computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the video quality assessment model training method of any one of claims 1 to 7, or the video quality assessment method of claim 8 or 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010801975.XA CN114125495A (en) | 2020-08-11 | 2020-08-11 | Video quality evaluation model training method, video quality evaluation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114125495A true CN114125495A (en) | 2022-03-01 |
Family
ID=80373585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010801975.XA Pending CN114125495A (en) | 2020-08-11 | 2020-08-11 | Video quality evaluation model training method, video quality evaluation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114125495A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114884965A (en) * | 2022-04-27 | 2022-08-09 | 抖动科技(深圳)有限公司 | Server capacity expansion decision method based on artificial intelligence and related equipment |
CN115994713A (en) * | 2023-03-22 | 2023-04-21 | 中国人民解放军火箭军工程大学 | Operation training effect evaluation method and system based on multi-source data |
CN116071623A (en) * | 2023-01-06 | 2023-05-05 | 北京百度网讯科技有限公司 | Model training method, image-based processing method, device, equipment and medium |
CN116506622A (en) * | 2023-06-26 | 2023-07-28 | 瀚博半导体(上海)有限公司 | Model training method and video coding parameter optimization method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101282481A (en) * | 2008-05-09 | 2008-10-08 | 中国传媒大学 | Method for evaluating video quality based on artificial neural net |
CN102137271A (en) * | 2010-11-04 | 2011-07-27 | 华为软件技术有限公司 | Method and device for evaluating image quality |
CN110599468A (en) * | 2019-08-30 | 2019-12-20 | 中国信息通信研究院 | No-reference video quality evaluation method and device |
US20200162725A1 (en) * | 2018-11-21 | 2020-05-21 | Huawei Technologies Co., Ltd. | Video Quality Assessment Method and Apparatus |
CN111193923A (en) * | 2019-09-24 | 2020-05-22 | 腾讯科技(深圳)有限公司 | Video quality evaluation method and device, electronic equipment and computer storage medium |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101282481A (en) * | 2008-05-09 | 2008-10-08 | 中国传媒大学 | Method for evaluating video quality based on artificial neural net |
CN102137271A (en) * | 2010-11-04 | 2011-07-27 | 华为软件技术有限公司 | Method and device for evaluating image quality |
US20200162725A1 (en) * | 2018-11-21 | 2020-05-21 | Huawei Technologies Co., Ltd. | Video Quality Assessment Method and Apparatus |
CN110599468A (en) * | 2019-08-30 | 2019-12-20 | 中国信息通信研究院 | No-reference video quality evaluation method and device |
CN111193923A (en) * | 2019-09-24 | 2020-05-22 | 腾讯科技(深圳)有限公司 | Video quality evaluation method and device, electronic equipment and computer storage medium |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114884965A (en) * | 2022-04-27 | 2022-08-09 | 抖动科技(深圳)有限公司 | Server capacity expansion decision method based on artificial intelligence and related equipment |
CN114884965B (en) * | 2022-04-27 | 2024-01-16 | 抖动科技(深圳)有限公司 | Artificial intelligence-based server capacity expansion decision method and related equipment |
CN116071623A (en) * | 2023-01-06 | 2023-05-05 | 北京百度网讯科技有限公司 | Model training method, image-based processing method, device, equipment and medium |
CN115994713A (en) * | 2023-03-22 | 2023-04-21 | 中国人民解放军火箭军工程大学 | Operation training effect evaluation method and system based on multi-source data |
CN115994713B (en) * | 2023-03-22 | 2023-06-16 | 中国人民解放军火箭军工程大学 | Operation training effect evaluation method and system based on multi-source data |
CN116506622A (en) * | 2023-06-26 | 2023-07-28 | 瀚博半导体(上海)有限公司 | Model training method and video coding parameter optimization method and device |
CN116506622B (en) * | 2023-06-26 | 2023-09-08 | 瀚博半导体(上海)有限公司 | Model training method and video coding parameter optimization method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ying et al. | Patch-VQ: 'Patching up' the video quality problem | |
Ying et al. | From patches to pictures (PaQ-2-PiQ): Mapping the perceptual space of picture quality | |
CN114125495A (en) | Video quality evaluation model training method, video quality evaluation method and device | |
Gu et al. | Hybrid no-reference quality metric for singly and multiply distorted images | |
CN114584849B (en) | Video quality evaluation method, device, electronic equipment and computer storage medium | |
Mittal et al. | Blind/referenceless image spatial quality evaluator | |
CN110751649B (en) | Video quality evaluation method and device, electronic equipment and storage medium | |
CN104243973B (en) | Video perceived quality non-reference objective evaluation method based on areas of interest | |
CN104023225B | No-reference video quality evaluation method based on spatio-temporal natural scene statistical features | |
CN110944200B (en) | Method for evaluating immersive video transcoding scheme | |
CN109859166A | No-reference 3D image quality assessment method based on multi-column convolutional neural networks | |
Shao et al. | No-reference view synthesis quality prediction for 3-D videos based on color–depth interactions | |
CN111985281A (en) | Image generation model generation method and device and image generation method and device | |
Gao et al. | VDPVE: VQA dataset for perceptual video enhancement | |
WO2023169318A1 (en) | Image quality determination method, apparatus, device, and storage medium | |
WO2018153161A1 (en) | Video quality evaluation method, apparatus and device, and storage medium | |
WO2017101347A1 (en) | Method and device for identifying and encoding animation video | |
Sun et al. | Enhancing Blind Video Quality Assessment with Rich Quality-aware Features | |
Lu et al. | Automatic region selection for objective sharpness assessment of mobile device photos | |
CN112866683B (en) | Quality evaluation method based on video preprocessing and transcoding | |
Farah et al. | Full-reference and reduced-reference quality metrics based on SIFT | |
Zhang et al. | Multi-layer and Multi-scale feature aggregation for DIBR-Synthesized image quality assessment | |
Luo et al. | Saliency and texture information based full-reference quality metrics for video QoE assessment | |
Menor et al. | Objective video quality assessment based on neural networks | |
Seikavandi et al. | Evaluating Video Quality by Differentiating Between Spatial and Temporal Distortions. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||