CN113573044B

CN113573044B - Video data processing method and device, computer equipment and readable storage medium

Info

Publication number: CN113573044B
Application number: CN202110078355.2A
Authority: CN
Inventors: 刘强; 李松南
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-01-19
Filing date: 2021-01-19
Publication date: 2022-12-09
Anticipated expiration: 2041-01-19
Also published as: CN113573044A

Abstract

Embodiments of the present application disclose a video data processing method and apparatus, a computer device, and a readable storage medium, relating to video processing technology in artificial intelligence. The method comprises: acquiring a video frame image and a to-be-detected video attribute, and performing attribute detection on the video frame image based on the to-be-detected video attribute to obtain an attribute detection result; if the attribute detection result indicates that the video frame image matches the to-be-detected video attribute, determining a first quality evaluation value as a target quality evaluation value of the video frame image; if the attribute detection result indicates that the video frame image does not match the to-be-detected video attribute, detecting an evaluation object in the video frame image to obtain a second quality evaluation value of the video frame image, identifying an image scene to which the video frame image belongs, and adjusting the second quality evaluation value based on the image scene to obtain the target quality evaluation value of the video frame image. The method and apparatus can improve the accuracy of quality evaluation of video data.

Description

Video data processing method and device, computer equipment and readable storage medium

Technical Field

The present application relates to the field of video processing, and in particular, to a method and an apparatus for processing video data, a computer device, and a readable storage medium.

Background

With the rapid development of computer networks, computers have been widely used in fields such as education, entertainment, and medical care. In the entertainment field, for example, a user may record a video with a computer device and upload the recorded video data to a social platform so that more users can view the people, figures, scenery, and other content that the user has filmed. The social platform can in turn promote itself using the video data uploaded by users, thereby increasing its user utilization rate.

In the prior art, the quality of video data recorded by users is uneven, so the video data uploaded by users may be of low quality, which weakens the effect of increasing the user utilization rate of a social platform. How to accurately evaluate the quality of the video data uploaded by users is therefore a problem to be solved urgently. At present, the quality of video data is generally determined from the content of the video data and the definition of the video: if the content contains no sensitive information and is relatively clear, the quality of the video data is determined to be relatively high.

Disclosure of Invention

The embodiment of the application provides a video data processing method and device, a computer device and a readable storage medium, which can improve the accuracy of quality evaluation on video data.

An embodiment of the present application provides a video data processing method, including:

acquiring a video frame image and a to-be-detected video attribute, and performing attribute detection on the video frame image based on the to-be-detected video attribute to obtain an attribute detection result; the to-be-detected video attribute indicates an attribute that affects the quality of the video frame image;

if the attribute detection result is the result that the video frame image is matched with the attribute of the video to be detected, determining a first quality evaluation value as a target quality evaluation value of the video frame image;

if the attribute detection result is the result that the video frame image is not matched with the attribute of the video to be detected, detecting an evaluation object in the video frame image to obtain a second quality evaluation value of the evaluation object in the video frame image, identifying an image scene to which the video frame image belongs, and adjusting the second quality evaluation value based on the image scene to obtain a target quality evaluation value of the video frame image.
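
The control flow of the three steps above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `evaluate_frame`, the callables passed to it, and the placeholder value `FIRST_QUALITY_VALUE = 0.0` are all assumptions, since the text does not give a numeric value for the first quality evaluation value or fix how detection, scoring, or adjustment is implemented.

```python
from typing import Any, Callable, Dict, Iterable

# Assumption: the source gives no numeric value for the "first quality
# evaluation value"; 0.0 is a placeholder for a fixed low score.
FIRST_QUALITY_VALUE = 0.0


def evaluate_frame(
    frame: Any,
    attribute_detectors: Iterable[Callable[[Any], bool]],
    object_scorers: Dict[str, Callable[[Any], float]],
    classify_scene: Callable[[Any], str],
    adjust: Callable[[Dict[str, float], str], float],
) -> float:
    """Return the target quality evaluation value of one video frame image."""
    # Primary detection: any matched attribute ends the evaluation early,
    # so the evaluation-object scorers are never called for such frames.
    if any(detect(frame) for detect in attribute_detectors):
        return FIRST_QUALITY_VALUE

    # Secondary detection: one second quality evaluation value per
    # evaluation object.
    second_values = {name: score(frame) for name, score in object_scorers.items()}

    # Scene-based adjustment yields the target quality evaluation value.
    return adjust(second_values, classify_scene(frame))
```

Passing the detectors, scorers, and adjustment rule as callables keeps the sketch independent of any particular model or threshold.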

An aspect of an embodiment of the present application provides a video data processing apparatus, including:

the category detection module is used for acquiring a video frame image and a to-be-detected video attribute, and performing attribute detection on the video frame image based on the to-be-detected video attribute to obtain an attribute detection result; the to-be-detected video attribute indicates an attribute that affects the quality of the video frame image;

the category matching module is used for determining a first quality evaluation value as a target quality evaluation value of the video frame image if the attribute detection result is the result that the video frame image is matched with the attribute of the video to be detected;

and the object detection module is used for detecting the evaluation object in the video frame image if the attribute detection result is the result that the video frame image is not matched with the attribute of the video to be detected, obtaining a second quality evaluation value of the evaluation object in the video frame image, identifying an image scene to which the video frame image belongs, and adjusting the second quality evaluation value based on the image scene to obtain a target quality evaluation value of the video frame image.

Optionally, the category detection module includes:

the data acquisition unit is used for acquiring a pixel mean value and a pixel variance corresponding to pixel values of pixel points forming the video frame image based on the attribute of the video to be detected and determining statistical data of the video frame image based on the pixel mean value and the pixel variance;

and the threshold matching unit is used for determining the result of matching the video frame image with the attribute of the video to be detected as the attribute detection result of the video frame image if the statistical data of the video frame image is smaller than the statistical abnormal threshold.

Optionally, the category detection module includes:

the icon detection unit is used for carrying out icon detection on the video frame image based on the attribute of the video to be detected, if the video frame image is detected to have a video icon, acquiring a mark icon, and carrying out feature matching on the features of the video icon and the features of the mark icon;

and the icon matching unit is used for determining the result of matching the video frame image with the attribute of the video to be detected as the attribute detection result of the video frame image if the characteristics of the video icon are not matched with the characteristics of the mark icon.

Optionally, the category detection module includes:

the characteristic identification unit is used for acquiring a reference coordinate system based on the attribute of the video to be detected and identifying key characteristic points in the video frame image;

the position determining unit is used for determining the relative position information of the key characteristic point based on the characteristic position of the key characteristic point in the reference coordinate system;

and the position matching unit is used for determining the result of matching the video frame image with the attribute of the video to be detected as the attribute detection result of the video frame image if the relative position information does not belong to the standard relative position range.

Optionally, the number of the evaluation objects in the video frame image is at least two, and the at least two evaluation objects include a color value object, an aesthetic object and a video quality object; the object detection module includes:

a color value determination unit for detecting a color value object in the video frame image based on a color value detection model, and determining a second quality evaluation value of the color value object in the video frame image;

an aesthetic determination unit for detecting an aesthetic object in the video frame image based on an aesthetic detection model, determining an image composition structure of the video frame image, identifying color value combination information of the video frame image, and determining a second quality assessment value of the aesthetic object in the video frame image according to the image composition structure and the color value combination information;

the quality determination unit is used for detecting a video quality object in the video frame image based on a quality detection model, determining color gradation distribution information of the video frame image, identifying noise information in the video frame image, and determining a second quality evaluation value of the video quality object in the video frame image according to the color gradation distribution information and the noise information.

Optionally, the number of the evaluation objects is at least two; the object detection module includes:

the weight obtaining unit is used for obtaining weight information corresponding to an image scene to which the video frame image belongs; the weight information comprises weight parameters respectively corresponding to at least two evaluation objects;

and the weight calculation unit is used for carrying out weighted summation on the second quality evaluation values respectively corresponding to the at least two evaluation objects based on the weight parameters respectively corresponding to the at least two evaluation objects to obtain the target quality evaluation value of the video frame image.
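
As a minimal sketch of the weighted summation described above, the snippet below looks up the weight parameters for an image scene and combines the second quality evaluation values of the evaluation objects. The scene names, the weight values in `SCENE_WEIGHTS`, and the function name `target_quality` are hypothetical; the text does not specify them.

```python
from typing import Dict

# Hypothetical weight information: one weight parameter per evaluation
# object for each image scene. The scenes and numbers are illustrative only.
SCENE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "portrait": {"color_value": 0.5, "aesthetic": 0.25, "video_quality": 0.25},
    "landscape": {"color_value": 0.0, "aesthetic": 0.5, "video_quality": 0.5},
}


def target_quality(
    second_values: Dict[str, float],
    scene: str,
    scene_weights: Dict[str, Dict[str, float]] = SCENE_WEIGHTS,
) -> float:
    """Weighted sum of the second quality evaluation values for one scene."""
    weights = scene_weights[scene]
    missing = set(second_values) - set(weights)
    if missing:
        raise KeyError(f"no weight parameter for evaluation objects: {sorted(missing)}")
    return sum(weights[name] * value for name, value in second_values.items())
```

For example, with second quality evaluation values of 0.8 (color value), 0.4 (aesthetic), and 0.6 (video quality), the hypothetical "portrait" weights give 0.5 × 0.8 + 0.25 × 0.4 + 0.25 × 0.6 = 0.65.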

Optionally, the object detection module includes:

a structure determining unit for detecting an image composition structure of the video frame image;

the image splitting unit is used for carrying out frame splitting processing on the video frame image to obtain at least two pieces of separated image data if the image composition structure of the video frame image is a frame synthesis structure;

an image detection unit, configured to detect an evaluation object in each of the at least two separated image data to obtain a second quality evaluation value of the evaluation object in each of the at least two separated image data;

a data composing unit operable to compose the second quality evaluation value of the evaluation object in each of the separated image data into a second quality evaluation value of the evaluation object in the video frame image.
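
The split-then-compose idea above can be illustrated with a small sketch. Everything specific here is an assumption: the text does not say how a frame synthesis structure is split or how the per-piece values are combined, so the sketch splits a frame (a list of pixel rows) into equal horizontal bands and composes the per-band values with a plain average. The names `split_into_bands`, `compose_values`, and `score_synthesized_frame` are hypothetical.

```python
from typing import Callable, List, Sequence

Frame = Sequence[Sequence[float]]  # rows of grayscale pixel values (assumed layout)


def split_into_bands(frame: Frame, parts: int = 2) -> List[Frame]:
    """Split a frame into `parts` bands stacked from top to bottom."""
    height = len(frame)
    bounds = [round(i * height / parts) for i in range(parts + 1)]
    return [frame[bounds[i]:bounds[i + 1]] for i in range(parts)]


def compose_values(values: Sequence[float]) -> float:
    """Compose per-piece values into one value (assumed rule: arithmetic mean)."""
    return sum(values) / len(values)


def score_synthesized_frame(
    frame: Frame, scorer: Callable[[Frame], float], parts: int = 2
) -> float:
    """Score each separated piece, then compose the second quality evaluation value."""
    pieces = split_into_bands(frame, parts)
    return compose_values([scorer(piece) for piece in pieces])
```

Other composition rules, such as taking the minimum so that one poor sub-picture lowers the whole frame, would fit the same structure.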

One aspect of the present application provides a computer device, including: a processor, a memory, a network interface;

the processor is connected to the memory and the network interface, wherein the network interface is used for providing a data communication function, the memory is used for storing a computer program, and the processor is used for calling the computer program to execute the method in the above aspect of the embodiments of the present application.

An aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, where the computer program is adapted to be loaded by a processor to execute the method in the above aspect.

An aspect of an embodiment of the present application provides a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method provided in the various optional implementations of the above aspect of the embodiments of the present application.

In the embodiments of the present application, attribute detection on a video frame image determines whether the video frame image has an attribute that affects its quality, such as a pure screen, a watermark, or rotation. If such an attribute is detected, the target quality evaluation value of the video frame image is determined to be the first quality evaluation value. If no such attribute is detected, an evaluation object in the video frame image is detected to obtain a second quality evaluation value of the evaluation object in the video frame image, the image scene to which the video frame image belongs is identified, and the second quality evaluation value is adjusted based on the image scene to obtain the target quality evaluation value of the video frame image. In other words, a primary detection determines whether the video frame image has a low-quality video frame image attribute, and only when no such attribute exists is a secondary detection performed, in combination with the image scene to which the video frame image belongs, to determine the target quality evaluation value. The primary detection quickly identifies low-quality video frame image attributes; when one is found, the subsequent detection is skipped, which reduces the amount of calculation. When none is found, the secondary detection combined with the image scene improves the accuracy of the quality evaluation of the video frame image. In addition, because the video frame image is evaluated along multiple dimensions (pure screen, watermark, rotation, color value, aesthetics, and video quality), the accuracy of quality evaluation of the video data can be improved.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a network architecture diagram of a video data processing system according to an embodiment of the present application;

FIG. 2 is a schematic view of an application scenario of a video data processing method according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of a video data processing method according to an embodiment of the present application;

FIG. 4 is a scene schematic diagram for determining relative position information of key feature points according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of another video data processing method according to an embodiment of the present application;

FIG. 6 is a schematic view of a scene of splitting a video frame image according to an embodiment of the present application;

FIG. 7 is a schematic flowchart of another video data processing method according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a video data processing apparatus according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Computer Vision (CV) is the science of how to make machines see. More specifically, it uses cameras and computers in place of human eyes to identify, track, and measure targets, and further performs graphic processing so that the result is an image better suited for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition. The present application relates to video processing technology and cloud computing in artificial intelligence. These can be used to detect a video frame image, determine the attributes of the video frame image, detect an evaluation object in the video frame image to obtain the quality evaluation value of the evaluation object, and determine the quality evaluation value of the video frame image, thereby improving the accuracy of quality evaluation of video data.

Referring to FIG. 1, FIG. 1 is a network architecture diagram of a video data processing system according to an embodiment of the present application. As shown in FIG. 1, a computer device 101 may exchange data with one or more user terminals; when there are multiple user terminals, they may include the user terminal 102a, the user terminal 102b, the user terminal 102c, and so on in FIG. 1. Taking the user terminal 102a as an example, the computer device 101 may respond to a data upload request from the user terminal 102a, obtain a video frame image and a to-be-detected video attribute based on the request, and perform attribute detection on the video frame image based on the to-be-detected video attribute to obtain an attribute detection result, where the to-be-detected video attribute indicates an attribute that affects the quality of the video frame image. The computer device 101 may then determine whether the attribute detection result indicates that the video frame image matches the to-be-detected video attribute. If it does, the computer device 101 determines the first quality evaluation value as the target quality evaluation value of the video frame image. If it does not, the computer device 101 detects the evaluation object in the video frame image to obtain a second quality evaluation value of the evaluation object in the video frame image, identifies the image scene to which the video frame image belongs, and adjusts the second quality evaluation value based on the image scene to obtain the target quality evaluation value of the video frame image.

In the present application, a primary detection is performed on the video frame image to determine whether it has an attribute that affects its quality, that is, a low-quality video frame image attribute. If no such attribute exists, a secondary detection is performed on the video frame image, and the target quality evaluation value of the video frame image is determined in combination with the image scene to which the video frame image belongs. The primary detection quickly determines whether the video frame image has a low-quality video frame image attribute; if it does, the subsequent detection is skipped, which reduces the amount of calculation. If it does not, the secondary detection combined with the scene to which the video frame image belongs improves the accuracy of the quality evaluation of the video frame image.

It is understood that the computer device mentioned in the embodiments of the present application includes, but is not limited to, a terminal device or a server. In other words, the computer device or the user device may be a server, a terminal device, or a system composed of a server and a terminal device. The terminal device may be an electronic device with a display function, including but not limited to a mobile phone, a tablet computer, a desktop computer, a notebook computer, a palm-top computer, an Augmented Reality/Virtual Reality (AR/VR) device, a helmet-mounted display, a wearable device, a smart speaker, a digital camera, a camera, and other Mobile Internet Devices (MID) with network access capability. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform.

Optionally, data involved in the embodiments of the present application, such as the video frame image and the quality evaluation value, may be stored in a server, in a memory of a computer device, or by means of cloud storage technology, which is not limited herein.

Further, refer to FIG. 2, which is a schematic view of an application scenario of a video data processing method according to an embodiment of the present application. As shown in FIG. 2, the user terminal 20 sends a data upload request carrying video data to the computer device 22. The computer device 22 may obtain a video frame image 21 from the video data, together with a to-be-detected video attribute that indicates an attribute affecting the quality of the video frame image, and perform attribute detection on the video frame image 21 based on the to-be-detected video attribute to obtain an attribute detection result. If the attribute detection result indicates that the video frame image 21 matches the to-be-detected video attribute, the computer device 22 determines the first quality evaluation value as the target quality evaluation value of the video frame image 21. Optionally, the computer device 22 may also output the attribute that the video frame image 21 matches. For example, if the video frame image 21 matches the watermark attribute among the to-be-detected video attributes, the output marked video frame image 23 includes the watermark. If the attribute detection result indicates that the video frame image 21 does not match the to-be-detected video attribute, the computer device 22 detects the evaluation object in the video frame image 21 to obtain a second quality evaluation value of the evaluation object in the video frame image 21, identifies the image scene to which the video frame image 21 belongs, and adjusts the second quality evaluation value based on the image scene to obtain the target quality evaluation value of the video frame image 21, thereby completing the quality evaluation of the video frame image 21.

Further, refer to FIG. 3, which is a schematic flowchart of a video data processing method according to an embodiment of the present application. As shown in FIG. 3, the method includes:

S101, acquiring a video frame image and the attribute of a video to be detected, and performing attribute detection on the video frame image based on the attribute of the video to be detected to obtain an attribute detection result.

In the embodiments of the present application, the computer device may acquire video data from a local database, from another storage medium, or from a user terminal, and may obtain at least two video frame images by performing frame extraction on the acquired video data. Taking acquisition from the user terminal as an example, when a user sends a data upload request through the user terminal, the computer device acquires the video data based on the data upload request. If the video data is a single-frame image, the video data is determined to be the video frame image. If the video data is composed of multiple frames of images, the computer device may split the video data to obtain the at least two video frames composing it; the computer device may either determine these video frames directly as the at least two video frame images to be processed, or perform frame extraction on them to obtain the at least two video frame images to be processed. The embodiments of the present application are described using one video frame image as an example: the computer device obtains the video frame image and the to-be-detected video attribute, and performs attribute detection on the video frame image based on the to-be-detected video attribute to obtain an attribute detection result. If the video data is composed of multiple frames of images, the processing of steps S101 to S103 is performed for each frame of image in the video data. The to-be-detected video attribute indicates an attribute that affects the quality of the video frame image. For example, the to-be-detected video attribute may include a pure screen attribute, a watermark attribute, a rotation attribute, or the like.
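
A minimal sketch of the frame extraction step is given below, assuming the video data has already been decoded into a sequence of frames. The fixed sampling interval `step` and the function name `extract_frames` are assumptions; the text does not state how frames are selected.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def extract_frames(frames: Sequence[T], step: int = 5) -> List[T]:
    """Keep every `step`-th frame; a single-frame input is returned unchanged."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    frames = list(frames)
    if len(frames) <= 1:
        # Single-frame video data is itself the video frame image.
        return frames
    return frames[::step]
```

Each extracted frame would then be processed independently through steps S101 to S103.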

Optionally, the to-be-detected video attribute includes a pure screen attribute, and the computer device may examine the pixel values of the pixel points of the video frame image to obtain an attribute detection result indicating whether the video frame image has the pure screen attribute. Specifically, the computer device may obtain, based on the to-be-detected video attribute, a pixel mean value and a pixel variance of the pixel values of the pixel points constituting the video frame image, and determine statistical data of the video frame image based on the pixel mean value and the pixel variance. If the statistical data of the video frame image is smaller than the statistical anomaly threshold, the result that the video frame image matches the to-be-detected video attribute is determined as the attribute detection result of the video frame image. In that case the computer device may determine that the video frame image has the pure screen attribute: statistical data below the statistical anomaly threshold means that the pixel points of the video frame image differ little from one another, so the video frame image can be regarded as a pure screen image. The statistical anomaly threshold may be a default value, such as 5, 10, or 15, or may be a variable adjusted during the process of determining the quality of a video frame image, which is not limited in the embodiments of the present application.

In a specific implementation, if the statistical anomaly threshold includes a variance anomaly threshold, the computer device may first obtain the pixel value of each pixel point in the video frame image and determine the pixel mean value of the video frame image from these pixel values. The computer device then calculates the pixel variance of the video frame image based on the pixel mean value and the pixel value of each pixel point, and determines the attribute of the video frame image. If the pixel variance is smaller than the variance anomaly threshold, the video frame image has the pure screen attribute, and the result that the video frame image matches the to-be-detected video attribute is determined as the attribute detection result of the video frame image. If the pixel variance is greater than or equal to the variance anomaly threshold, the video frame image does not have the pure screen attribute.
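
The variance check described above can be sketched in a few lines. The sketch assumes a single-channel (grayscale) frame stored as rows of pixel values and uses the population variance; the default threshold of 10 is one of the example values mentioned in the text, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Frame = Sequence[Sequence[float]]  # rows of grayscale pixel values (assumed layout)


def pixel_mean_and_variance(frame: Frame) -> Tuple[float, float]:
    """Return the pixel mean value and the (population) pixel variance."""
    pixels = [p for row in frame for p in row]
    if not pixels:
        raise ValueError("frame contains no pixels")
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return mean, variance


def has_pure_screen_attribute(frame: Frame, variance_threshold: float = 10.0) -> bool:
    """True when the pixel variance is below the variance anomaly threshold."""
    _, variance = pixel_mean_and_variance(frame)
    return variance < variance_threshold
```

A uniformly black frame has variance 0 and is flagged, while a frame alternating between 0 and 255 has a variance far above any of the example thresholds and is not.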

Further, the statistical anomaly threshold may also include a mean anomaly threshold. The computer device may first obtain the pixel value of each pixel point in the video frame image, determine the pixel mean value of the video frame image from these pixel values, and then calculate the pixel variance of the video frame image based on the pixel mean value and the pixel value of each pixel point. If the pixel variance is smaller than the variance anomaly threshold, the video frame image has the pure screen attribute, and the result that the video frame image matches the to-be-detected video attribute is determined as the attribute detection result of the video frame image. The computer device may further obtain the hue range to which the pixel mean value belongs, where the hue range may include a white range, a black range, a green range, and the like, and determine a pure screen attribute category based on that hue range, where the pure screen attribute category may include a pure black screen attribute category, a pure white screen attribute category, a pure green screen attribute category, or the like.
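
For a frame already found to have the pure screen attribute, the mapping from pixel mean value to pure screen attribute category might look like the sketch below. The RGB pixel layout, the cut-off values `dark` and `bright`, and the category labels are all assumptions; the text names the white, black, and green ranges but does not define their boundaries.

```python
from typing import Sequence, Tuple

RGBFrame = Sequence[Sequence[Tuple[float, float, float]]]  # rows of (R, G, B) pixels


def mean_rgb(frame: RGBFrame) -> Tuple[float, float, float]:
    """Per-channel pixel mean value of an RGB frame."""
    pixels = [p for row in frame for p in row]
    if not pixels:
        raise ValueError("frame contains no pixels")
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) / n,
        sum(p[1] for p in pixels) / n,
        sum(p[2] for p in pixels) / n,
    )


def pure_screen_category(frame: RGBFrame, dark: float = 40.0, bright: float = 215.0) -> str:
    """Map the pixel mean value to a hypothetical pure screen attribute category."""
    r, g, b = mean_rgb(frame)
    if max(r, g, b) <= dark:
        return "pure_black"  # every channel is near 0
    if min(r, g, b) >= bright:
        return "pure_white"  # every channel is near 255
    if g >= bright and r <= dark and b <= dark:
        return "pure_green"  # only the green channel is high
    return "other"
```

Further hue ranges could be added as extra branches without changing the structure.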

Optionally, the attribute of the video to be detected includes a watermark attribute, and the computer device may perform icon detection on the video frame image to obtain an attribute detection result of the video frame image, where the attribute detection result may indicate whether the attribute of the video frame image has the watermark attribute. Specifically, the computer device can perform icon detection on the video frame image based on the attribute of the video to be detected, and if the video frame image is detected to have the video icon, the mark icon is obtained, and the characteristics of the video icon and the characteristics of the mark icon are subjected to characteristic matching; and if the characteristics of the video icon are not matched with the characteristics of the mark icon, determining the result of matching the video frame image with the attributes of the video to be detected as the attribute detection result of the video frame image.

The number of the marker icons may include one or more, and the marker icon may refer to an enterprise icon having a cooperative relationship with the social platform on which the video frame image is located, for example, the marker icon may include a station logo of the cooperative enterprise (i.e., a logo indicating a television station playing the television program), an enterprise identifier (logo) of the cooperative enterprise, and the like. If the computer equipment detects that the characteristics of the video icon in the video frame image are not matched with the characteristics of the marked icon, the watermark attribute of the video frame image is determined to exist, and the result that the video frame image is matched with the attributes of the video to be detected is determined as the attribute detection result of the video frame image. And if the computer equipment detects that the characteristics of the video icon in the video frame image are matched with the characteristics of the marked icon, determining that the video frame image does not have the watermark attribute. That is, if a video icon in a video frame image belongs to an enterprise icon having a cooperative relationship with the social platform, the computer device considers that the video frame image does not have the watermark attribute; and if the video icon in the video frame image does not belong to the enterprise icon with the cooperative relationship with the social platform, the computer equipment considers that the video frame image has the watermark attribute.
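The watermark decision described above can be sketched as follows. The embodiment does not specify how features are matched, so the cosine similarity measure, the threshold of 0.9, the plain-list feature vectors, and the treatment of a frame without any detected icon as having no watermark are all assumptions of this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def has_watermark_attribute(video_icon_features, marker_icon_features,
                            match_threshold=0.9):
    """Return True when a detected video icon matches none of the marker icons.

    `video_icon_features` is None when no icon was detected (assumed to mean
    no watermark). `marker_icon_features` holds one vector per partner icon.
    """
    if video_icon_features is None:
        return False
    matches_marker = any(
        cosine_similarity(video_icon_features, marker) >= match_threshold
        for marker in marker_icon_features
    )
    # An icon that matches a partner (marker) icon is not a watermark.
    return not matches_marker
```

In practice the feature vectors would come from the icon detection network; here they are treated as given inputs.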

In a specific implementation, the computer device may detect the video frame image by using a deep learning network such as a Convolutional Neural Network (CNN), a Visual Geometry Group network (VGG), or a residual network (ResNet), and determine whether a video icon exists in the video frame image. Optionally, the computer device may further perform text recognition on the video frame image to obtain text features in the video frame image. If the computer device detects that the text features in the video frame image match the text features in the marker icon and that the icon corresponding to the text features in the video frame image matches the marker icon, it determines that the video frame image does not have the watermark attribute. If the computer device detects that the text features in the video frame image do not match the text features in the marker icon and that the icon corresponding to the text features in the video frame image does not match the marker icon, it determines that the video frame image has the watermark attribute, and the result that the video frame image matches the attribute of the video to be detected is determined as the attribute detection result of the video frame image.

Optionally, the to-be-detected video attribute includes a rotation attribute, and the computer device may detect the key feature point in the video frame image to obtain an attribute detection result of the video frame image, where the attribute detection result may indicate whether the video frame image has the rotation attribute. Specifically, the computer device can acquire a reference coordinate system based on the attribute of the video to be detected, and identify key feature points in the video frame image; determining the relative position information of the key characteristic points based on the characteristic positions of the key characteristic points in the reference coordinate system; and if the relative position information does not belong to the standard relative position range, determining the result of matching the video frame image with the attribute of the video to be detected as the attribute detection result of the video frame image.

The reference coordinate system may be a preset coordinate system, for example, a coordinate system established with any one of the upper left corner, the lower left corner, the upper right corner, the lower right corner, or the midpoint of the video frame image as the coordinate origin. It can be understood that, once the reference coordinate system is established, position information in the video frame image is described relative to it, so that all video frame images share one coordinate system. The key feature points in the video frame image are feature points that describe an object included in the video frame image. For example, if the object is a person, the key feature points may be feature points of the person's facial features or limbs. If the object is a landscape, the key feature points describe the landscape: for a tree, they may be the root, the trunk, or the tip of the tree; for a mountain, they may be the foot, the middle, or the summit of the mountain. By identifying the key feature points in the video frame image, the computer device can determine the relative position information of the key feature points based on their feature positions in the reference coordinate system. If the computer device detects that the relative position information does not belong to the standard relative position range, it determines that the video frame image has the rotation attribute, and the result that the video frame image matches the attribute of the video to be detected is determined as the attribute detection result of the video frame image. 
And if the computer equipment detects that the relative position information belongs to the standard relative position range, determining that the video frame image does not have the rotation attribute. The standard relative position range can be determined according to the relative position information of the key feature points corresponding to the historical video frame images.

As shown in fig. 4, fig. 4 is a schematic diagram of a scene for determining relative position information of key feature points according to an embodiment of the present application. The object included in the video frame image 40 is a person. By identifying the video frame image 40, the computer device determines a key feature image 41 of the person, which includes key feature points 1 to 5 (corresponding to the eyebrows, eyes, ears, nose, and mouth, respectively). Based on the feature positions of the key feature points in the reference coordinate system, the computer device determines the relative position information 42 of the key feature points: key feature point 1 is above key feature point 2, key feature point 2 is to the left or right of key feature point 3, key feature point 2 is obliquely above key feature point 4, and key feature point 4 is above key feature point 5. In this case, the relative position information belongs to the standard relative position range. 
If instead the relative position information of the key feature points is determined to be 43, that is, key feature point 1 is below key feature point 2, key feature point 2 is to the left or right of key feature point 3, key feature point 2 is obliquely below key feature point 4, and key feature point 4 is below key feature point 5, the relative position information does not belong to the standard relative position range. The computer device then determines that the video frame image has the rotation attribute, and the result that the video frame image matches the attribute of the video to be detected is determined as the attribute detection result of the video frame image.
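The relative position check in the fig. 4 example can be sketched as follows. This sketch assumes a reference coordinate system whose origin is the upper left corner with the y axis pointing downward, uses hypothetical key names for the key feature points, and hard-codes the vertical ordering of the standard relative position range, whereas the embodiment derives that range from historical video frame images. The left/right relation between the eyes and ears is omitted because it holds in both cases of the example.

```python
def is_above(p, q):
    """True when point p is above point q (y grows downward from the top-left origin)."""
    return p[1] < q[1]


def has_rotation_attribute(key_points):
    """Return True when the key feature points violate the upright vertical ordering.

    `key_points` maps hypothetical names to (x, y) positions in the
    reference coordinate system.
    """
    in_standard_range = (
        is_above(key_points["eyebrow"], key_points["eye"])   # point 1 above point 2
        and is_above(key_points["eye"], key_points["nose"])  # point 2 (obliquely) above point 4
        and is_above(key_points["nose"], key_points["mouth"])  # point 4 above point 5
    )
    return not in_standard_range
```

An upright face satisfies all three relations, while a frame rotated by 180 degrees reverses them and is reported as having the rotation attribute.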

When the video frame image has one or more of the pure screen attribute, the watermark attribute, or the rotation attribute, the video frame image has problems such as low definition and inconvenient viewing, so it can be considered a low-quality video image. Subsequent detection and other processing then need not be performed on the video frame image, which improves the efficiency of video data processing.

The video attributes to be detected can include m attributes, m is a positive integer, attribute detection is performed on the video frame image based on the m attributes to obtain attributes of the video frame image, and if the attributes of the video frame image are matched with one or more of the m attributes, the result of matching the video frame image with the attributes of the video to be detected is determined as the attribute detection result of the video frame image. And if the attributes of the video frame image are not matched with the m attributes, determining the result that the video frame image is not matched with the attributes of the video to be detected as the attribute detection result of the video frame image. For example, the m attributes include a pure screen attribute, a watermark attribute, a rotation attribute, or a reproduction attribute (that is, recording video data that has been recorded by a user), and the like, and the video attribute to be detected may be any one of the pure screen attribute, the watermark attribute, the rotation attribute, or the reproduction attribute, or a combination of at least two of the pure screen attribute, the watermark attribute, the rotation attribute, or the reproduction attribute. For example, m is 1, the attribute of the video to be detected is a pure screen attribute, and if the computer device detects that the attribute of the video frame image is not matched with the pure screen attribute, the attribute detection result is determined to be a result that the attribute of the video frame image is not matched with the attribute of the video to be detected. 
Or m is 2, the attributes of the video to be detected are the pure screen attribute and the watermark attribute, and if the computer equipment detects that the attributes of the video frame image are not matched with the pure screen attribute and the watermark attribute, the attribute detection result is determined to be the result that the attributes of the video frame image are not matched with the attributes of the video to be detected. Or m is 3, the attributes of the video to be detected are a pure screen attribute, a watermark attribute and a rotation attribute, and if the computer equipment detects that the attributes of the video frame image are not matched with the pure screen attribute, the watermark attribute and the rotation attribute, the attribute detection result is determined to be the result that the video frame image is not matched with the attributes of the video to be detected.
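The matching rule for m attributes can be sketched as follows: the frame matches the attributes of the video to be detected when at least one of the m attribute detectors fires. The dictionary of detector callables and the function name are assumptions introduced for this example.

```python
def detect_attributes(frame, detectors):
    """Run every attribute detector on the frame.

    `detectors` maps an attribute name (e.g. "pure_screen") to a callable
    that returns True when the frame has that attribute. Returns a pair:
    (matched, names of the attributes that matched).
    """
    matched_names = [name for name, detect in detectors.items() if detect(frame)]
    # One or more matching attributes means the frame matches the
    # attributes of the video to be detected.
    return bool(matched_names), matched_names
```

With m = 2, for example, `detectors` would hold one callable for the pure screen attribute and one for the watermark attribute, and the frame is unmatched only when both return False.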

S102, if the attribute detection result is that the video frame image matches the attribute of the video to be detected, determining the first quality evaluation value as the target quality evaluation value of the video frame image.

In the embodiment of the application, when the video frame image has one or more of the pure screen attribute, the watermark attribute, the rotation attribute, and the like, the video frame image has problems such as low definition and inconvenient viewing, so it can be considered a low-quality video image, and the computer device determines the first quality evaluation value as the target quality evaluation value of the video frame image, thereby completing the quality evaluation of the video frame image. The first quality evaluation value may be 0 or another value; that is, when the video frame image is a low-quality video image, a default quality evaluation value (e.g., 0, 0.1, 0.15, or another value) may be used as the target quality evaluation value of the video frame image. Optionally, the computer device may also display, in the video frame image, the attributes that the video frame image has (such as the pure screen attribute, the watermark attribute, or the rotation attribute), so that a user can determine why the target quality evaluation value is low and correct the problem quickly.

S103, if the attribute detection result is that the video frame image is not matched with the attribute of the video to be detected, detecting the evaluation object in the video frame image to obtain a second quality evaluation value of the evaluation object in the video frame image, identifying the image scene to which the video frame image belongs, and adjusting the second quality evaluation value based on the image scene to obtain a target quality evaluation value of the video frame image.

In the embodiment of the application, if the attribute detection result is the result that the video frame image and the attribute of the video to be detected are not matched, the computer equipment detects the evaluation object in the video frame image to obtain a second quality evaluation value of the evaluation object in the video frame image; and the computer equipment identifies the image scene to which the video frame image belongs and adjusts the second quality evaluation value based on the image scene to obtain a target quality evaluation value of the video frame image.

The number of the evaluation objects in the video frame image may be one, such as a color value object, an aesthetic object, or a video quality object. When the number of evaluation objects in the video frame image is one, for example, an aesthetic object, the computer device obtains a second quality evaluation value of the aesthetic object in the video frame image by detecting the aesthetic object in the video frame image, and determines the second quality evaluation value of the aesthetic object in the video frame image as a target quality evaluation value of the video frame image. Alternatively, the number of the evaluation objects in the video frame image is at least two, and the at least two evaluation objects include, but are not limited to, a color value object, an aesthetic object, or a video quality object, etc., that is, the number of the evaluation objects in the video frame image is two or more, and specifically may be a color value object and an aesthetic object, or a color value object and a video quality object, or an aesthetic object and a video quality object, or a color value object, an aesthetic object, and a video quality object. The color value object is used for indicating the color value characteristic of the evaluation object in the video frame image, the aesthetic object is used for indicating the aesthetic characteristic of the evaluation object in the video frame image, and the video quality object is used for indicating the video quality characteristic of the evaluation object in the video frame image.

Specifically, the computer device detects a color value object in the video frame image based on a color value detection model, and determines a second quality evaluation value of the color value object in the video frame image; detecting an aesthetic object in the video frame image based on the aesthetic detection model, determining an image composition structure of the video frame image, identifying color value combination information of the video frame image, and determining a second quality evaluation value of the aesthetic object in the video frame image according to the image composition structure and the color value combination information; the method comprises the steps of detecting a video quality object in a video frame image based on a quality detection model, determining color gradation distribution information of the video frame image, identifying noise information in the video frame image, and determining a second quality evaluation value of the video quality object in the video frame image according to the color gradation distribution information and the noise information.

In a specific implementation, the computer device detects the color value object in the video frame image based on the color value detection model, and may determine the facial feature information of the color value object and the angle of the color value object in the video frame image (for example, whether the color value object is a frontal face or a side face). It then determines the second quality evaluation value of the color value object according to the facial feature information and the angle. The computer device detects the aesthetic object in the video frame image based on the aesthetic detection model and determines the image composition structure of the video frame image, for example, whether the composition is a three-division (rule-of-thirds) structure. It also identifies the color value combination information of the video frame image, that is, the color matching information in the video frame image, and determines the second quality evaluation value of the aesthetic object according to the image composition structure and the color value combination information. Optionally, the computer device may further detect the aesthetic object in the video frame image based on the aesthetic detection model to determine the video content in the video frame image, for example, whether the video content contains vulgar information, sensitive information, or the like, and determine the second quality evaluation value of the aesthetic object by combining the image composition structure, the color value combination information, and the video content. 
The computer device detects a video quality object in the video frame image based on the quality detection model, determines color gradation distribution information of the video frame image, wherein the color gradation distribution information can refer to the exposure degree of the video frame image, such as whether the video frame image is over-exposed or over-dark, identifies noise information in the video frame image, the noise information can comprise noise jitter and the like in the video frame image, and determines a second quality evaluation value of the video quality object in the video frame image according to the color gradation distribution information and the noise information. Optionally, before the color value detection model, the aesthetic detection model, and the quality detection model are used to detect the video frame image, the color value detection model, the aesthetic detection model, and the quality detection model may be trained separately, and the following training method for the scene recognition model may be referred to in the specific model training method, which is not described herein too much.

Optionally, the number of the evaluation objects in the video frame image is at least two, and the computer device may acquire an image scene to which the acquired video frame image belongs, and adjust the second quality evaluation value based on the image scene to obtain the target quality evaluation value of the video frame image. Specifically, the computer device acquires weight information corresponding to an image scene to which the video frame image belongs; wherein, the weight information comprises weight parameters respectively corresponding to at least two evaluation objects; and weighting and summing the second quality evaluation values respectively corresponding to the at least two evaluation objects based on the weight parameters respectively corresponding to the at least two evaluation objects to obtain a target quality evaluation value of the video frame image.

In a specific implementation, the computer device may obtain scene features in the video frame image and identify them based on the scene recognition model, so as to determine the image scene to which the video frame image belongs. The specific scene features may be determined according to the type of object in the video frame image and the content of the video frame image. The image scenes may include live broadcast scenes, landscape scenes, meeting scenes, street shooting scenes, or other scenes. Optionally, the scene recognition model may be trained before it is used to recognize the video frame image. During training, the sample scene images are labeled first, with labels such as the scene type, the objects in the scene, and the positions of the objects in the scene image. An initial model is then trained using the labeled sample scene images, and the model is saved when it converges and reaches a certain precision (the loss function value is smaller than the loss threshold and the precision is greater than the precision threshold); the saved model is the scene recognition model. By recognizing the video frame image with the scene recognition model, the computer device can determine the image scene to which the video frame image belongs, obtain the weight parameters that the image scene assigns to each of the at least two evaluation objects, and perform weighted summation on the second quality evaluation values of the at least two evaluation objects based on those weight parameters to obtain the target quality evaluation value of the video frame image.

Optionally, if the video frame image is an image in video data composed of e video frame images, the computer device may identify scene features in each video frame image to determine the image scene to which that video frame image belongs. Alternatively, the computer device may take the scene that occurs most often in the first f video frame images of the e video frame images as the scene of all e video frame images, where e and f are positive integers and e is greater than f.
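The second option above, taking the most frequent scene among the first f frames as the scene of the whole video, can be sketched as follows. The function name and the list of per-frame scene labels are assumptions of this example; how ties are broken is not specified by the embodiment, and this sketch simply keeps the scene that was seen first.

```python
from collections import Counter


def video_scene_from_first_frames(per_frame_scenes, f):
    """Return the most frequent scene among the first f of e frames (e > f > 0)."""
    e = len(per_frame_scenes)
    if not 0 < f < e:
        raise ValueError("f must be a positive integer smaller than e")
    counts = Counter(per_frame_scenes[:f])
    # most_common(1) yields the (scene, count) pair with the highest count.
    return counts.most_common(1)[0][0]
```

Only the first f frames need to pass through the scene recognition model, which is the saving this option offers over recognizing every frame.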

In a specific implementation, taking an example that at least two evaluation objects include a color value object, an aesthetic object, and a video quality object, a corresponding relationship between the weight parameters of the image scene and the evaluation objects may be as shown in table 1:

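TABLE 1 Correspondence between image scenes and weight parameters of evaluation objects

Image scene | Color value object | Aesthetic object | Video quality object
Live broadcast scene | a1 | a2 | a3
Landscape scene | b1 | b2 | b3
Street shooting scene | c1 | c2 | c3

The image scenes include a live broadcast scene, a landscape scene, and a street shooting scene, and the evaluation objects in the video frame image include a color value object, an aesthetic object, and a video quality object, where a1 > a3 > a2, b2 > b3 > b1, c3 > c2 > c1, a1 + a2 + a3 = 1, b1 + b2 + b3 = 1, and c1 + c2 + c3 = 1.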

For example, suppose the number of evaluation objects in the video frame image is 3, namely a color value object, an aesthetic object, and a video quality object, whose second quality evaluation values in the video frame image are s1, s2, and s3, respectively. If the computer device detects that the image scene to which the video frame image belongs is a live broadcast scene, the weight parameters corresponding to the color value object, the aesthetic object, and the video quality object in that image scene are a1, a2, and a3, respectively, and the target quality evaluation value of the video frame image is calculated as shown in formula (1-1):

S = a1*s1 + a2*s2 + a3*s3 (1-1)

where S is the target quality evaluation value of the video frame image. If the evaluation objects in the video frame image are any 2 of the color value object, the aesthetic object, and the video quality object, the weight parameters a1 and a2 corresponding to those two evaluation objects in the image scene are obtained, with a1 + a2 = 1. If the video frame image contains only 1 of the color value object, the aesthetic object, and the video quality object, the weight parameter of that evaluation object in the image scene is 1. When video frame images belong to different image scenes, the evaluation objects influence the quality evaluation to different degrees. For example, the color value object influences the quality evaluation of the video frame image more in a live broadcast scene than in a landscape scene, whereas the aesthetic object influences it more in a landscape scene than in a live broadcast scene. Therefore, by obtaining the image scene to which the video frame image belongs and determining the weight parameter of each evaluation object according to that image scene, the accuracy of the quality evaluation of the video frame image can be improved.
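Formula (1-1) and the scene-dependent weights can be sketched as follows. The numeric weights are hypothetical and were chosen only so that they satisfy the orderings stated for Table 1 (a1 > a3 > a2, b2 > b3 > b1, c3 > c2 > c1) and sum to 1 per scene; assigning b1 to b3 and c1 to c3 to the three objects in the same order as a1 to a3 is an assumption by analogy with the live broadcast example. Renormalizing the weights when only some evaluation objects are present is also an assumption, made so that the weights of two objects sum to 1 and a single object receives weight 1.

```python
# Hypothetical weights per image scene (not values from the embodiment).
SCENE_WEIGHTS = {
    "live":      {"color_value": 0.5, "aesthetic": 0.2, "video_quality": 0.3},
    "landscape": {"color_value": 0.2, "aesthetic": 0.5, "video_quality": 0.3},
    "street":    {"color_value": 0.2, "aesthetic": 0.3, "video_quality": 0.5},
}


def target_quality_value(scene, second_values):
    """Weighted sum S = sum(weight_i * s_i) over the evaluation objects present.

    `second_values` maps an evaluation object name to its second quality
    evaluation value. Weights of the objects present are renormalized to
    sum to 1 (an assumption of this sketch).
    """
    weights = {name: SCENE_WEIGHTS[scene][name] for name in second_values}
    total = sum(weights.values())
    return sum(weights[name] / total * value
               for name, value in second_values.items())
```

For a live broadcast scene with s1 = 0.8, s2 = 0.6, and s3 = 0.7, the sketch computes 0.5*0.8 + 0.2*0.6 + 0.3*0.7, which follows formula (1-1) with the hypothetical weights above.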

In the embodiment of the application, by performing attribute detection on a video frame image, whether the video frame image has attributes which affect the quality of the video frame image, such as pure screen, watermark, rotation and the like, can be determined, and if the video frame image is detected to have the attributes, a target quality evaluation value corresponding to the video frame image is determined to be a first quality evaluation value. If the video frame image is detected to have no attributes, detecting an evaluation object in the video frame image to obtain a second quality evaluation value of the evaluation object in the video frame image, identifying an image scene to which the video frame image belongs, and adjusting the second quality evaluation value based on the image scene to obtain a target quality evaluation value of the video frame image. That is to say, the present application may determine whether the video frame image has an attribute that affects the quality of the video frame image, that is, a low-quality video frame image attribute, by performing a primary detection on the video frame image. And under the condition that no attribute influencing the quality of the video frame image exists, determining a target quality evaluation value of the video frame image by carrying out secondary detection on the video frame image and combining the image scene to which the video frame image belongs. By the primary detection of the video frame image, whether the video frame image has low-quality video frame image attributes or not can be quickly determined, if yes, the subsequent detection is not carried out, and the subsequent calculation amount can be reduced; the secondary detection is carried out on the video frame image by combining the scene to which the video frame image belongs, so that the accuracy of the quality evaluation of the video frame image can be improved. 
In addition, since the video frame image is evaluated from multiple dimensions (including multiple dimensions of pure screen, watermark, rotation, color value, aesthetics, and video quality), the quality evaluation accuracy of the video data can be improved.
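The two-stage flow summarized above (primary attribute detection, then secondary detection combined with the image scene) can be sketched as follows. All parameter names are hypothetical, the detectors, scorers, and scene recognizer are passed in as plain callables standing in for the trained models, and the default first quality evaluation value of 0.0 is one of the example defaults mentioned in the embodiment.

```python
def evaluate_frame(frame, attribute_detectors, object_scorers,
                   recognize_scene, scene_weights, first_quality_value=0.0):
    """Return the target quality evaluation value of one video frame image."""
    # Primary detection: any low-quality attribute short-circuits the evaluation,
    # so the scoring models below are never invoked for such frames.
    if any(detect(frame) for detect in attribute_detectors):
        return first_quality_value

    # Secondary detection: one second quality evaluation value per evaluation object.
    second_values = {name: score(frame) for name, score in object_scorers.items()}

    # Adjust by the image scene: weighted sum with the scene's weight parameters.
    weights = scene_weights[recognize_scene(frame)]
    return sum(weights[name] * value for name, value in second_values.items())
```

Because the primary detection returns early, the cost of the scoring models and of scene recognition is paid only for frames that pass the attribute checks, which is the reduction in computation the embodiment refers to.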

Optionally, when the computer device detects the evaluation object in the video frame image to obtain the second quality evaluation value of the evaluation object in the video frame image, the computer device may further process the video frame image according to the image composition structure of the video frame image, so as to improve accuracy of quality evaluation of the video frame image. Furthermore, the computer equipment can also display and popularize the video frame image in the social platform where the video frame image is located, so that the utilization rate of users of the social platform is increased. Referring to fig. 5, fig. 5 is a schematic flowchart of another video data processing method according to an embodiment of the present application, and as shown in fig. 5, the method includes:

S201, obtaining a video frame image and the attribute of a video to be detected, and performing attribute detection on the video frame image based on the attribute of the video to be detected to obtain an attribute detection result.

S202, if the attribute detection result is the result that the video frame image is matched with the attribute of the video to be detected, determining the first quality evaluation value as the target quality evaluation value of the video frame image.

Here, the specific implementation manner of steps S201 to S202 may refer to the description of steps S101 to S102 in the embodiment corresponding to fig. 3, and is not described herein again.

S203, if the attribute detection result is that the video frame image is not matched with the attribute of the video to be detected, detecting the image composition structure of the video frame image.

In the embodiment of the application, if the attribute detection result is the result that the video frame image and the attribute of the video to be detected are not matched, the computer equipment detects the image composition structure of the video frame image. The image composition structure may include a video frame image composed of a single frame image or a video frame image composed of a plurality of frame images. And if the image composition structure is a video frame image composed of multiple frames of images, determining that the video frame image is a frame composition structure. In specific implementation, the computer device may obtain terminal device information of a recording terminal corresponding to the video frame image, and if the video frame image corresponds to one terminal device information, the video frame image is considered to be a single frame image; and if the video frame image corresponds to the information of the plurality of terminal devices, the video frame image is considered to be a video frame image formed by a plurality of frame images, and the image forming structure of the video frame image is a frame synthesis structure. Optionally, the split-screen detection model may also be used to detect the video frame image and determine the image composition structure of the video frame image.
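The device-information rule described above can be sketched as follows. The function name, the returned labels, and the representation of terminal device information as a list of identifiers are assumptions of this example; treating repeated identifiers as a single device is also an assumption.

```python
def image_composition_structure(terminal_device_ids):
    """Classify a frame by how many recording terminals contributed to it.

    `terminal_device_ids` is a hypothetical list of device identifiers
    attached to the video frame image.
    """
    distinct_devices = set(terminal_device_ids)
    # More than one recording terminal: the frame was composed of several images.
    if len(distinct_devices) > 1:
        return "frame_composition"
    return "single_frame"
```

When device information is unavailable, the split-screen detection model mentioned above would be the alternative way to obtain the same label.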

S204, if the image composition structure of the video frame image is a frame composition structure, performing frame splitting processing on the video frame image to obtain at least two pieces of separated image data.

In the embodiment of the application, if the image composition structure of the video frame image is a frame composition structure, the computer device performs frame splitting processing on the video frame image to obtain at least two pieces of separated image data. Here, separated image data refers to the at least two images obtained by splitting the video frame image.
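The frame splitting step can be illustrated with a short Python sketch. It assumes that the frame is a grayscale image stored as a list of pixel rows and that the combined pictures sit side by side with equal widths; neither the layout nor the function name comes from the embodiment, which leaves the splitting method open.

```python
from typing import List

Frame = List[List[int]]  # rows of pixel values (grayscale for brevity)


def split_frame(frame: Frame, parts: int = 2) -> List[Frame]:
    """Split a frame into `parts` side-by-side regions of nearly equal width."""
    if parts < 2:
        raise ValueError("a frame is split into at least two regions")
    if not frame or len(frame[0]) < parts:
        raise ValueError("frame is too small to split")
    width = len(frame[0])
    # Column boundaries of each region, e.g. [0, 2, 4] for width 4 and 2 parts.
    bounds = [width * i // parts for i in range(parts + 1)]
    return [
        [row[bounds[i]:bounds[i + 1]] for row in frame]
        for i in range(parts)
    ]
```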

S205, detecting the evaluation object in each of the at least two separated image data, and obtaining a second quality evaluation value of the evaluation object in each of the separated image data.

Here, the method for detecting the evaluation object in each of the at least two separated image data by the computer device may refer to the method for detecting the evaluation object in the video frame image in step S103, and is not described herein again. For example, the number of the at least two separated image data is 2, and the separated image data is the separated image data 1 and the separated image data 2, respectively, then the computer device obtains the second quality evaluation value j1 of the evaluation object in the separated image data 1 by detecting the evaluation object in the separated image data 1, and obtains the second quality evaluation value j2 of the evaluation object in the separated image data 2 by detecting the evaluation object in the separated image data 2. A second quality evaluation value of the evaluation object in each of the at least two separated image data can thereby be calculated.

S206, combining the second quality evaluation values of the evaluation object in each separated image data to form the second quality evaluation value of the evaluation object in the video frame image.

Here, since the computer device splits the video frame image to obtain the at least two separated image data, the second quality evaluation values of the evaluation objects in the at least two separated image data together make up the second quality evaluation value of the evaluation object in the video frame image. For example, the video frame image is split to obtain separated image data 1 and separated image data 2. The number of evaluation objects in the separated image data 1 is 3, and their second quality evaluation values are j1, j2, and j3, respectively; the number of evaluation objects in the separated image data 2 is 3, and their second quality evaluation values are k1, k2, and k3, respectively. The second quality evaluation values of the evaluation objects in the video frame image therefore include j1, j2, j3, k1, k2, and k3.

S207, identifying an image scene to which the video frame image belongs, adjusting the second quality assessment value based on the image scene to obtain a target quality assessment value of the video frame image, and determining a target quality assessment value of video data corresponding to the video frame image based on the target quality assessment value of the video frame image.

In the embodiment of the present application, for the specific method of identifying the image scene to which a video frame image belongs, reference may be made to the description of step S103, and details are not repeated here. When the computer device performs frame splitting processing on the video frame image to obtain two separated image data, the image scenes to which the two separated image data belong can be identified respectively, and the second quality evaluation values of the two separated image data are adjusted respectively based on those image scenes. For example, if it is recognized that the separated image data corresponding to the video frame image all belong to a live broadcast scene, and the weight parameters corresponding to the color object, the aesthetic object, and the video quality object in the live broadcast scene are a1, a2, and a3, respectively, the method by which the computer device adjusts the second quality assessment value corresponding to the separated image data 1 based on the image scene may refer to formula (1-2):

J = a1*j1 + a2*j2 + a3*j3 (1-2)

where j1 is the second quality evaluation value of the color object in the separated image data 1, j2 is the second quality evaluation value of the aesthetic object in the separated image data 1, j3 is the second quality evaluation value of the video quality object in the separated image data 1, a1 is the weight parameter of the color object corresponding to the live broadcast scene, a2 is the weight parameter of the aesthetic object corresponding to the live broadcast scene, a3 is the weight parameter of the video quality object corresponding to the live broadcast scene, and J is the target quality evaluation value of the separated image data 1.

The method for adjusting the second quality assessment value corresponding to the separated image data 2 based on the image scene may refer to formula (1-3):

K = a1*k1 + a2*k2 + a3*k3 (1-3)

where k1 is the second quality evaluation value of the color object in the separated image data 2, k2 is the second quality evaluation value of the aesthetic object in the separated image data 2, k3 is the second quality evaluation value of the video quality object in the separated image data 2, a1 is the weight parameter of the color object corresponding to the live broadcast scene, a2 is the weight parameter of the aesthetic object corresponding to the live broadcast scene, a3 is the weight parameter of the video quality object corresponding to the live broadcast scene, and K is the target quality evaluation value of the separated image data 2.
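Formulas (1-2) and (1-3) are the same weighted sum applied to two sets of second quality evaluation values. The Python sketch below computes it; the function name and the numeric weights and scores in the usage lines are hypothetical values chosen only to exercise the formula.

```python
from typing import Sequence


def weighted_target_value(second_values: Sequence[float],
                          weights: Sequence[float]) -> float:
    """Weighted sum a1*v1 + a2*v2 + a3*v3 as in formulas (1-2) and (1-3)."""
    if len(second_values) != len(weights):
        raise ValueError("one weight parameter is needed per evaluation object")
    return sum(a * v for a, v in zip(weights, second_values))


# Hypothetical live broadcast weights (a1, a2, a3) and scores for two regions.
live_weights = (0.5, 0.3, 0.2)
J = weighted_target_value((0.8, 0.6, 0.9), live_weights)  # separated image data 1
K = weighted_target_value((0.4, 0.7, 0.5), live_weights)  # separated image data 2
```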

Optionally, when the computer device processes video data composed of multiple frames of video frame images, the computer device may split the video data to obtain at least two video frames composing the video data, where the computer device may determine the at least two video frames as the at least two video frame images to be processed in the embodiment of the present application, and may also perform frame extraction processing on the at least two video frames to obtain the at least two video frame images to be processed in the embodiment of the present application. For example, the computer device splits the video data to obtain g video frames constituting the video data, where the computer device may determine the g video frames as at least two video frame images to be processed in this embodiment, or may perform frame extraction on the g video frames to extract h video frames therein as the at least two video frame images to be processed in this embodiment. Wherein g and h are both positive integers, g is greater than h, and h can be 5, 9, 12 or other values, which are not limited herein.

Taking the example that the computer device extracts h video frames from g video frames, the quality evaluation of the h video frame images in the video data is performed in the embodiment of the present application. The computer device may perform frame splitting processing on each of the h video frame images based on the method in steps S204 to S207, calculate second quality assessment values of the evaluation object in the at least two split image data obtained after the splitting, and form the second quality assessment values of the evaluation object in the video frame images by using the second quality assessment values of the evaluation object in the at least two split image data; and adjusting the second quality evaluation value of the evaluation object in the video frame image based on the image scene to which the video frame image belongs to obtain a target quality evaluation value of the video frame image, and obtaining a target quality evaluation value of the video data based on the target quality evaluation value of each video frame image. Specifically, the computer device may calculate an average value of the target quality assessment values of each of the h video frame images as the target quality assessment value of the video data; or, performing weighted summation on the target quality assessment value of each video frame image in the h video frame images to serve as the target quality assessment value of the video data; alternatively, the minimum value of the target quality assessment values of each of the h video frame images is acquired as the target quality assessment value of the video data.
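The frame extraction and the three aggregation options described above (average, weighted sum, minimum) can be sketched as follows. Evenly spaced sampling is an assumption, since the embodiment does not say how the h frames are chosen, and the function names and mode strings are hypothetical.

```python
from statistics import fmean
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def extract_frames(frames: Sequence[T], h: int) -> List[T]:
    """Pick h evenly spaced frames out of the g frames of the video data."""
    g = len(frames)
    if not 0 < h <= g:
        raise ValueError("h must satisfy 0 < h <= g")
    return [frames[i * g // h] for i in range(h)]


def aggregate_video_value(frame_values: Sequence[float],
                          mode: str = "mean",
                          weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-frame target values into a target value for the video data."""
    if not frame_values:
        raise ValueError("at least one frame value is required")
    if mode == "mean":
        return fmean(frame_values)
    if mode == "weighted":
        if weights is None or len(weights) != len(frame_values):
            raise ValueError("one weight is needed per frame value")
        return sum(w * v for w, v in zip(weights, frame_values))
    if mode == "min":
        return min(frame_values)
    raise ValueError(f"unknown mode: {mode}")
```

Taking the minimum is the strictest of the three choices, because a single poor frame then determines the value for the whole video.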

For the process of the foregoing steps S203 to S207, reference may be made to fig. 6, which is a schematic view of a scene for splitting a video frame image provided in an embodiment of the present application. The computer device obtains the video frame image 61, detects the video frame image 61 based on the split-screen detection model 62, and determines that the image composition structure of the video frame image 61 is a frame composition structure; the computer device then performs frame splitting processing on the video frame image 61 to obtain at least two separated image data 63a and 63b. By detecting the evaluation objects (including the color object, the aesthetic object, and the video quality object) in the separated image data 63a and 63b respectively, the computer device obtains the second quality evaluation values j1, j2, j3 of the evaluation objects in the separated image data 63a and the second quality evaluation values k1, k2, k3 of the evaluation objects in the separated image data 63b. Further, the computer device identifies the image scene to which the video frame image 61 belongs to obtain the weight parameters a1, a2, and a3 of the plurality of objects corresponding to the image scene, and adjusts the second quality assessment values of the separated image data 63a based on the weight parameters a1, a2, and a3 to obtain the target quality assessment value of the separated image data 63a. The computer device likewise adjusts the second quality assessment values of the separated image data 63b based on the weight parameters a1, a2, and a3 to obtain the target quality assessment value of the separated image data 63b. The computer device then determines the target quality evaluation value of the video frame image 61 based on the target quality evaluation value of the separated image data 63a and the target quality evaluation value of the separated image data 63b.

Alternatively, if the separated image data 63a and the separated image data 63b belong to different image scenes, the image scene to which the separated image data 63a belongs and the image scene to which the separated image data 63b belongs may be acquired separately. The second quality assessment value of the separated image data 63a is then adjusted based on the weight parameters of the objects corresponding to the image scene to which the separated image data 63a belongs, and the second quality assessment value of the separated image data 63b is adjusted based on the weight parameters of the objects corresponding to the image scene to which the separated image data 63b belongs.
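When the separated image data belong to different image scenes, each region looks up its own weight parameters before the weighted sum is taken. The following Python sketch shows this lookup; the scene names and weight values in the table are hypothetical, and the sketch stops at per-region target values because the embodiment does not fix how they are merged into the value for the whole video frame image.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical weight parameters (color, aesthetic, video quality) per image scene.
SCENE_WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "live_broadcast": (0.5, 0.3, 0.2),
    "landscape": (0.2, 0.5, 0.3),
}


def region_target_values(
    regions: Sequence[Tuple[str, Tuple[float, float, float]]]
) -> List[float]:
    """Adjust each region's second quality values with its own scene weights.

    Each entry of `regions` is (scene name, (color, aesthetic, video quality)).
    """
    targets = []
    for scene, (v1, v2, v3) in regions:
        a1, a2, a3 = SCENE_WEIGHTS[scene]
        targets.append(a1 * v1 + a2 * v2 + a3 * v3)
    return targets
```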

S208, transmitting the target quality assessment value of the video data to the user terminal.

Here, the computer device may transmit the target quality assessment value of the video data to the user terminal, and the user who uploads the video data may view the target quality assessment value of the video data through the user terminal. Optionally, the computer device may also send the factors affecting the target quality assessment value of the video data (for example, noise, overexposure, a non-frontal face, sensitive video content, poor color matching, and the like) to the user terminal, so that the user can quickly determine why the quality assessment value of the video data is low and modify the video data accordingly.

S209, acquiring an uploading determination request of the user terminal for the video data, uploading the video data to a social platform where the video data is located, and if the target quality assessment value of the video data is larger than a score threshold value, displaying the video data in a home page of the social platform.

Here, the user terminal sends an upload determination request for the video data. If the computer device receives the upload determination request for the video data from the user terminal, or if the target quality assessment value is greater than the score threshold, the computer device uploads the video data to the social platform where the video data is located; when the target quality assessment value of the video data is greater than the score threshold, the video data is displayed in the home page of the social platform. The social platform where the video data is located refers to the platform on which the user terminal intends the video data to be displayed. The computer device may also promote the video data so that more users can view it. The score threshold may be determined according to the amount of video data on the social platform and the amount of video data being uploaded by the user terminal, and may be 0.6, 0.7, 0.8, or another value.
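The upload and home page decision of step S209 reduces to two boolean conditions, sketched below in Python. The class and function names are hypothetical, and the default threshold of 0.7 is simply one of the example values mentioned above.

```python
from dataclasses import dataclass


@dataclass
class UploadDecision:
    upload: bool             # send the video data to the social platform
    show_on_home_page: bool  # additionally display it in the home page


def decide_upload(upload_confirmed: bool,
                  target_value: float,
                  score_threshold: float = 0.7) -> UploadDecision:
    """Decide on upload and home page display for the video data."""
    above_threshold = target_value > score_threshold
    # Upload on an explicit upload determination request or a high enough value.
    upload = upload_confirmed or above_threshold
    # Home page display additionally requires the value to exceed the threshold.
    return UploadDecision(upload=upload,
                          show_on_home_page=upload and above_threshold)
```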

Optionally, the computer device may further obtain a quality assessment value corresponding to voice data in the video data, and promote the video data by combining the quality assessment value corresponding to the voice data with a target quality assessment value of the video data, for example, when the quality assessment value corresponding to the voice data in the video data is greater than a sound threshold and the target quality assessment value of the video data is greater than a score threshold, promote the video data. In an implementation scenario, the video data is video data in a social platform about singing, the quality assessment value corresponding to the voice data in the video data may be a quality assessment value obtained by scoring the voice data in the video data through the computer device or other scoring devices corresponding to the social platform, and the computer device obtains the quality assessment value corresponding to the voice data, so that the video data can be popularized by combining the quality assessment value corresponding to the voice data and a target quality assessment value of the video data. Optionally, after the computer device sends the target quality assessment value of the video data to the user terminal, the computer device may further obtain a video frame image updated by the user terminal, and perform the processing of steps S101 to S103 on the updated video frame image, so as to obtain the target quality assessment value of the updated video frame image.

In the embodiment of the application, when the second quality assessment value of the assessment object in the video frame image is obtained by detecting the assessment object in the video frame image, the accuracy of the video frame image quality assessment can be improved because the image composition structure of the video frame image is obtained, the video frame image is split according to the image composition structure of the video frame image, and the split separated image data is processed. In addition, the video frame image is displayed and popularized in the social platform where the video frame image is located, and therefore the user utilization rate of the social platform can be increased.

Optionally, in the embodiment of the present application, the video frame image is processed mainly by a serial module and a parallel module. The serial module is specifically configured to perform attribute detection on the video frame image based on the attribute of the video to be detected, so as to obtain an attribute detection result. The serial module can detect whether the video frame image is a low-quality video image, such as a pure screen image, a watermark image, a rotated image, and the like. When the serial module detects that the video frame image is a low-quality video image, the first quality evaluation value is determined as the target quality evaluation value corresponding to the video frame image, and no subsequent detection is performed on the video frame image, so that the amount of calculation can be reduced and the accuracy of the quality evaluation of the video frame image is improved. The parallel module is specifically configured to, when the attribute detection result obtained by the serial module is that the video frame image does not match the attribute of the video to be detected, detect an evaluation object in the video frame image to obtain a second quality evaluation value of the evaluation object in the video frame image, identify the image scene to which the video frame image belongs, and adjust the second quality evaluation value based on the image scene to obtain a target quality evaluation value of the video frame image. The parallel module thus implements secondary detection of the video frame image, and adjusting the quality evaluation value of the video frame image in combination with the image scene to which it belongs can further improve the accuracy of the quality evaluation of the video frame image.

Optionally, referring to fig. 7, fig. 7 is a schematic flowchart of another video data processing method provided in an embodiment of the present application, and as shown in fig. 7, the method includes:

S301, video data is acquired.

S302, frame extraction processing is carried out on the video data to obtain a video frame image.

S303, detecting whether the video frame image has a pure screen attribute.

The computer equipment obtains the attribute detection result by obtaining the attribute of the video to be detected and carrying out attribute detection on the video frame image based on the attribute of the video to be detected. If the video frame image has the pure screen attribute, executing step S306; if not, go to step S304.

S304, detecting whether the watermark attribute exists in the video frame image.

If the video frame image has the watermark attribute, executing step S306; if not, step S305 is executed.

S305, detecting whether the video frame image has the rotation attribute.

If the video frame image has the rotation attribute, executing step S306; if not, go to step S307.

S306, determining that the attribute detection result is that the video frame image matches the attribute of the video to be detected, and determining the first quality evaluation value as the target quality evaluation value of the video frame image.

The computer equipment determines a result of matching the video frame image with the attribute of the video to be detected as an attribute detection result of the video frame image, and determines the first quality evaluation value as a target quality evaluation value of the video frame image. After the computer device determines the first quality assessment value as a target quality assessment value of the video frame image, step S310 is performed to determine a target quality assessment value of video data corresponding to the video frame image based on the target quality assessment value of the video frame image.

S307, detecting whether the composition structure of the video frame image is a frame composition structure.

If the composition structure of the video frame image is a frame composition structure, step S308 is executed; if not, go to step S309.

S308, carrying out frame splitting processing on the video frame image, and detecting an evaluation object in the video frame image after the frame splitting processing to obtain a second quality evaluation value of the evaluation object in the video frame image.

Wherein the evaluation object includes a color value object, an aesthetic object, and a video quality object.

S309, identifying a weight parameter corresponding to an image scene to which the video frame image belongs, and adjusting the second quality assessment value based on the weight parameter to obtain a target quality assessment value of the video frame image.

S310, determining a target quality evaluation value of video data corresponding to the video frame image based on the target quality evaluation value of the video frame image.
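The flow of steps S303 to S310 can be summarized as a cascade: ordered attribute checks that stop at the first match, followed by optional splitting, scoring, and scene weighting. The Python sketch below is one possible arrangement under stated assumptions. All detector and scoring callables are supplied by the caller and are hypothetical, FIRST_QUALITY_VALUE is a placeholder for the first quality evaluation value, and averaging over regions and over frames is an assumption where the embodiment leaves the combination open.

```python
from statistics import fmean
from typing import Any, Callable, Sequence

FIRST_QUALITY_VALUE = 0.0  # hypothetical placeholder for the first quality evaluation value


def evaluate_frame(frame: Any,
                   attribute_checks: Sequence[Callable[[Any], bool]],
                   is_frame_composite: Callable[[Any], bool],
                   split_frame: Callable[[Any], Sequence[Any]],
                   score_objects: Callable[[Any], Sequence[float]],
                   scene_weights: Callable[[Any], Sequence[float]]) -> float:
    # S303 to S305: pure screen, watermark, rotation; any() stops at the first match.
    if any(check(frame) for check in attribute_checks):
        return FIRST_QUALITY_VALUE  # S306
    # S307 and S308: split only when the frame has a frame composition structure.
    regions = split_frame(frame) if is_frame_composite(frame) else [frame]
    region_values = []
    for region in regions:
        second_values = score_objects(region)  # second quality evaluation values
        weights = scene_weights(region)        # S309: weights of the image scene
        region_values.append(sum(a * v for a, v in zip(weights, second_values)))
    return fmean(region_values)  # assumption: regions are averaged


def evaluate_video(frames: Sequence[Any], **stages: Any) -> float:
    # S310: the mean over frames is used here as one of the options in the text.
    return fmean([evaluate_frame(frame, **stages) for frame in frames])
```

Placing the cheap attribute checks first means the scoring models are never invoked for frames that already match a low-quality attribute, which is the saving in calculation that the embodiment attributes to the serial module.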

The method of the embodiments of the present application is described above, and the apparatus of the embodiments of the present application is described below.

Referring to fig. 8, fig. 8 is a schematic diagram illustrating a component structure of a video data processing apparatus according to an embodiment of the present application, where the video data processing apparatus may be a computer program (including program code) running in a computer device, for example, the video data processing apparatus is an application software; the apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. The apparatus 80 comprises:

the category detection module 81 is configured to obtain a video frame image and the attribute of a video to be detected, and perform attribute detection on the video frame image based on the attribute of the video to be detected to obtain an attribute detection result; the attribute of the video to be detected is used for indicating an attribute that affects the quality of the video frame image;

a category matching module 82, configured to determine a first quality assessment value as a target quality assessment value of the video frame image if the attribute detection result is a result that the video frame image matches the attribute of the video to be detected;

and an object detection module 83, configured to detect the evaluation object in the video frame image if the attribute detection result is a result that the video frame image and the attribute of the video to be detected are not matched, obtain a second quality evaluation value of the evaluation object in the video frame image, identify an image scene to which the video frame image belongs, and adjust the second quality evaluation value based on the image scene to obtain a target quality evaluation value of the video frame image.

Optionally, the category detecting module 81 includes:

the data acquisition unit 811 is configured to acquire a pixel mean and a pixel variance corresponding to pixel values of pixel points that form the video frame image based on the attribute of the video to be detected, and determine statistical data of the video frame image based on the pixel mean and the pixel variance;

a threshold matching unit 812, configured to determine, if the statistical data of the video frame image is smaller than the statistical anomaly threshold, a result that the video frame image matches the attribute of the video to be detected as an attribute detection result of the video frame image.
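A pure screen check along the lines of units 811 and 812 might look like the following Python sketch. The embodiment does not state how the statistical data is derived from the pixel mean and pixel variance, so the ratio of standard deviation to mean used here is an assumption, as are the function names and the threshold of 0.05.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence


def frame_statistic(pixels: Sequence[float]) -> float:
    """Statistic built from the pixel mean and pixel variance of a frame.

    Assumption: standard deviation divided by (mean + 1); the +1 avoids a
    division by zero for an all-black frame.
    """
    mu = fmean(pixels)
    variance = pvariance(pixels, mu)
    return math.sqrt(variance) / (mu + 1.0)


def matches_pure_screen(pixels: Sequence[float],
                        anomaly_threshold: float = 0.05) -> bool:
    """True when the statistic falls below the statistical anomaly threshold."""
    return frame_statistic(pixels) < anomaly_threshold
```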

Optionally, the category detecting module 81 includes:

an icon detecting unit 813, configured to perform icon detection on the video frame image based on the attribute of the video to be detected, and if a video icon is detected to exist in the video frame image, obtain a marker icon, and perform feature matching on features of the video icon and features of the marker icon;

an icon matching unit 814, configured to determine, if the feature of the video icon does not match the feature of the marker icon, a result that the video frame image matches the attribute of the video to be detected as an attribute detection result of the video frame image.

Optionally, the category detecting module 81 includes:

a feature identification unit 815, configured to obtain a reference coordinate system based on the attribute of the video to be detected, and identify a key feature point in the video frame image;

a position determining unit 816, configured to determine, based on the feature position of the key feature point in the reference coordinate system, relative position information of the key feature point;

a position matching unit 817, configured to determine, if the relative position information does not belong to a standard relative position range, a result that the video frame image matches the attribute of the video to be detected as an attribute detection result of the video frame image.
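The rotation check performed by units 815 to 817 can be illustrated with two key feature points. The sketch below assumes, purely for illustration, that the key feature points are the midpoint between the eyes and the center of the mouth, that the reference coordinate system is the image coordinate system with the y axis pointing down, and that the standard relative position range is a tolerance of 45 degrees around the upright direction; none of these choices is stated in the embodiment.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y pointing down


def relative_angle_deg(eye_center: Point, mouth_center: Point) -> float:
    """Angle of the eye-to-mouth vector measured from the downward y axis.

    0 degrees means the mouth lies directly below the eyes (upright face).
    """
    dx = mouth_center[0] - eye_center[0]
    dy = mouth_center[1] - eye_center[1]
    return math.degrees(math.atan2(dx, dy))


def matches_rotation_attribute(eye_center: Point,
                               mouth_center: Point,
                               tolerance_deg: float = 45.0) -> bool:
    """True when the relative position falls outside the standard range."""
    return abs(relative_angle_deg(eye_center, mouth_center)) > tolerance_deg
```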

Optionally, the number of the evaluation objects in the video frame image is at least two, and the at least two evaluation objects include a color value object, an aesthetic object and a video quality object; the object detection module 83 includes:

a color value determining unit 831 for detecting a color value object in the video frame image based on a color value detection model, determining a second quality assessment value of the color value object in the video frame image;

an aesthetics determining unit 832 for detecting an aesthetic object in the video frame image based on an aesthetics detection model, determining an image composition structure of the video frame image, identifying color value combination information of the video frame image, determining a second quality assessment value of the aesthetic object in the video frame image according to the image composition structure and the color value combination information;

a quality determining unit 833, configured to detect a video quality object in the video frame image based on the quality detection model, determine color gradation distribution information of the video frame image, identify noise information in the video frame image, and determine a second quality assessment value of the video quality object in the video frame image according to the color gradation distribution information and the noise information.

Optionally, the number of the evaluation objects is at least two; the object detection module 83 includes:

a weight obtaining unit 834 for obtaining weight information corresponding to an image scene to which the video frame image belongs; the weight information comprises weight parameters respectively corresponding to at least two evaluation objects;

a weight calculating unit 835, configured to perform weighted summation on the second quality assessment values respectively corresponding to the at least two assessment objects based on the weight parameters respectively corresponding to the at least two assessment objects, so as to obtain a target quality assessment value of the video frame image.

Optionally, the object detecting module 83 includes:

a structure determination unit 836 for detecting an image composition structure of the video frame image;

an image splitting unit 837, configured to, if the image composition structure of the video frame image is a frame composition structure, perform frame splitting processing on the video frame image to obtain at least two pieces of separated image data;

an image detecting unit 838 for detecting the evaluation object in each of the at least two separated image data to obtain a second quality evaluation value of the evaluation object in each of the at least two separated image data;

a data composing unit 839, configured to combine the second quality evaluation values of the evaluation object in each of the separated image data to form the second quality evaluation value of the evaluation object in the video frame image.

It should be noted that, for content that is not mentioned in the embodiment corresponding to fig. 8, reference may be made to the description of the method embodiment, and details are not repeated here.

In the embodiment of the application, by performing attribute detection on a video frame image, whether the video frame image has attributes which affect the quality of the video frame image, such as pure screen, watermark, rotation and the like, can be determined, and if the video frame image is detected to have the attributes, a target quality evaluation value corresponding to the video frame image is determined to be a first quality evaluation value. If the video frame image is detected to have no attributes, detecting an evaluation object in the video frame image to obtain a second quality evaluation value of the evaluation object in the video frame image, identifying an image scene to which the video frame image belongs, and adjusting the second quality evaluation value based on the image scene to obtain a target quality evaluation value of the video frame image. That is to say, the present application may determine, by performing a primary detection on a video frame image, whether the video frame image has an attribute that affects the quality of the video frame image, that is, a low-quality video frame image attribute. And under the condition that no attribute influencing the quality of the video frame image exists, determining a target quality evaluation value of the video frame image by carrying out secondary detection on the video frame image and combining the image scene to which the video frame image belongs. By the primary detection of the video frame image, whether the video frame image has low-quality video frame image attributes or not can be quickly determined, if yes, the subsequent detection is not carried out, and the subsequent calculation amount can be reduced; the secondary detection is carried out on the video frame image by combining the scene to which the video frame image belongs, so that the accuracy of the quality evaluation of the video frame image can be improved. 
In addition, since the video frame image is evaluated from multiple dimensions (including pure screen, watermark, rotation, color value, aesthetics, and video quality), the quality evaluation accuracy of the video data can be improved.

Referring to fig. 9, fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure. As shown in fig. 9, the computer device 90 may include a processor 901, a network interface 904, and a memory 905; the computer device 90 further includes a user interface 903 and at least one communication bus 902. The communication bus 902 is used to enable connection and communication between these components. The user interface 903 may include a display and a keyboard, and optionally may also include a standard wired interface and a standard wireless interface. The network interface 904 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 905 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). Optionally, the memory 905 may also be at least one storage device located remotely from the processor 901. As shown in fig. 9, the memory 905, which is a kind of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.

In the computer device 90 shown in fig. 9, the network interface 904 may provide a network communication function; the user interface 903 is primarily an interface through which a user provides input; and the processor 901 may be configured to invoke the device control application stored in the memory 905 to implement:

acquiring a video frame image and the attribute of a video to be detected, and performing attribute detection on the video frame image based on the attribute of the video to be detected to obtain an attribute detection result; the attribute of the video to be detected is used for indicating an attribute that affects the quality of the video frame image;

It should be understood that the computer device 90 described in this embodiment of the present application may perform the video data processing method described in the embodiments corresponding to fig. 3, fig. 5, and fig. 7, and may also implement the video data processing apparatus described in the embodiment corresponding to fig. 8, which are not described herein again. In addition, the beneficial effects of the same method are not described in detail again.

In the embodiment of the application, performing attribute detection on a video frame image makes it possible to determine whether the video frame image has an attribute that affects its quality, such as pure screen, watermark, or rotation. If the video frame image is detected to have such an attribute, the target quality evaluation value corresponding to the video frame image is determined to be the first quality evaluation value. If the video frame image is detected to have no such attribute, an evaluation object in the video frame image is detected to obtain a second quality evaluation value of the evaluation object in the video frame image, the image scene to which the video frame image belongs is identified, and the second quality evaluation value is adjusted based on the image scene to obtain the target quality evaluation value of the video frame image. That is, the application first performs a primary detection on the video frame image to determine whether it has an attribute that affects its quality, namely a low-quality video frame image attribute. When no such attribute exists, a secondary detection is performed on the video frame image and combined with the image scene to which it belongs to determine the target quality evaluation value of the video frame image. The primary detection quickly determines whether the video frame image has a low-quality attribute; if it does, no subsequent detection is carried out, which reduces the subsequent amount of computation. The secondary detection takes into account the scene to which the video frame image belongs, which improves the accuracy of the quality evaluation of the video frame image. In addition, since the video frame image is evaluated from multiple dimensions (including pure screen, watermark, rotation, color value, aesthetics, and video quality), the quality evaluation accuracy of the video data can be improved.
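The two-stage flow described above can be illustrated with a minimal sketch. It is not the implementation of the application: the names `target_quality`, `attribute_detectors`, and `secondary_evaluation`, the representation of a frame as rows of pixel values, and the fixed value `FIRST_QUALITY_VALUE = 0.0` are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Sequence

# Hypothetical frame representation: rows of grayscale pixel values.
Frame = Sequence[Sequence[int]]

# Assumed fixed score assigned to frames that have a low-quality attribute.
FIRST_QUALITY_VALUE = 0.0


def target_quality(
    frame: Frame,
    attribute_detectors: Iterable[Callable[[Frame], bool]],
    secondary_evaluation: Callable[[Frame], float],
) -> float:
    """Return the target quality evaluation value of one frame."""
    # Primary detection: any() stops at the first detector that reports a
    # low-quality attribute (e.g. pure screen, watermark, rotation).
    if any(detector(frame) for detector in attribute_detectors):
        # The costly secondary detection is skipped entirely.
        return FIRST_QUALITY_VALUE
    # Secondary detection: evaluation objects combined with the image scene.
    return secondary_evaluation(frame)
```

Because the secondary evaluation is passed in as a callable and only invoked when no attribute matches, the sketch reflects the saving in computation that the paragraph above attributes to the primary detection.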

Embodiments of the present application also provide a computer-readable storage medium storing a computer program. The computer program comprises program instructions which, when executed by a computer, cause the computer to perform the method according to the foregoing embodiments. The computer may be a part of the above-mentioned computer device, such as the processor 901 described above. As an example, the program instructions may be deployed to be executed on one computer device, or on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network; the multiple computer devices distributed across the multiple sites and interconnected by the communication network may constitute a blockchain network.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above disclosure describes only preferred embodiments of the present application and is not to be construed as limiting the scope of the present application; equivalent variations and modifications made to the present application therefore still fall within its scope.

Claims

1. A method of processing video data, comprising:

if the attribute detection result is the result that the video frame image is not matched with the attribute of the video to be detected, detecting an evaluation object in the video frame image to obtain a second quality evaluation value of the evaluation object in the video frame image; the number of the evaluation objects is at least two;

identifying an image scene to which the video frame image belongs, and acquiring weight information corresponding to the image scene to which the video frame image belongs; the weight information comprises weight parameters respectively corresponding to at least two evaluation objects;

and performing weighted summation on the second quality evaluation values respectively corresponding to the at least two evaluation objects based on the weight parameters respectively corresponding to the at least two evaluation objects to obtain a target quality evaluation value of the video frame image.
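As a hedged illustration of the scene-dependent weighted summation recited in claim 1, the sketch below looks up weight parameters by image scene and sums the weighted second quality evaluation values. The scene names, the weight table `SCENE_WEIGHTS`, and the function name `weighted_target_quality` are hypothetical; the claims do not specify concrete scenes or weights.

```python
from typing import Dict, Mapping

# Hypothetical weight information: image scene -> evaluation object -> weight.
SCENE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "portrait": {"color_value": 0.5, "aesthetics": 0.25, "video_quality": 0.25},
    "landscape": {"color_value": 0.0, "aesthetics": 0.5, "video_quality": 0.5},
}


def weighted_target_quality(
    second_values: Mapping[str, float],
    scene: str,
    scene_weights: Mapping[str, Mapping[str, float]] = SCENE_WEIGHTS,
) -> float:
    """Weighted sum of the second quality evaluation values for one frame."""
    if len(second_values) < 2:
        raise ValueError("at least two evaluation objects are required")
    # Weight parameters corresponding to the identified image scene.
    weights = scene_weights[scene]
    return sum(weights[name] * value for name, value in second_values.items())
```

For example, with second quality evaluation values of 0.8, 0.4, and 0.6 for the three evaluation objects, the assumed "portrait" weights emphasize the color value object, while the assumed "landscape" weights ignore it.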

2. The method according to claim 1, wherein the performing attribute detection on the video frame image based on the attribute of the video to be detected to obtain an attribute detection result includes:

acquiring a pixel mean value and a pixel variance corresponding to pixel values of pixel points forming the video frame image based on the attribute of the video to be detected, and determining statistical data of the video frame image based on the pixel mean value and the pixel variance;

and if the statistical data of the video frame image is smaller than a statistical anomaly threshold value, determining the result of matching the video frame image with the attribute of the video to be detected as the attribute detection result of the video frame image.
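The statistical check of claim 2 can be sketched as follows. The claim does not state how the statistical data is derived from the pixel mean value and the pixel variance, so this sketch assumes the statistic is simply the variance computed around the mean; the threshold `STAT_ANOMALY_THRESHOLD` and the function names are likewise hypothetical.

```python
from statistics import fmean, pvariance
from typing import Sequence, Tuple

# Hypothetical statistical anomaly threshold, applied to the pixel variance.
STAT_ANOMALY_THRESHOLD = 4.0


def frame_statistics(frame: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (pixel mean, pixel variance) over all pixels of the frame."""
    pixels = [p for row in frame for p in row]
    mean = fmean(pixels)
    # Population variance computed around the already known mean.
    variance = pvariance(pixels, mu=mean)
    return mean, variance


def matches_pure_screen(
    frame: Sequence[Sequence[int]],
    threshold: float = STAT_ANOMALY_THRESHOLD,
) -> bool:
    """True if the frame is (nearly) a single color under the assumed statistic."""
    _, variance = frame_statistics(frame)
    return variance < threshold
```

A frame whose pixels are all identical has zero variance and is reported as matching, whereas a high-contrast frame is not.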

3. The method according to claim 1, wherein the performing attribute detection on the video frame image based on the attribute of the video to be detected to obtain an attribute detection result includes:

performing icon detection on the video frame image based on the attribute of the video to be detected, if the video frame image is detected to have a video icon, acquiring a mark icon, and performing feature matching on the features of the video icon and the features of the mark icon;

and if the characteristics of the video icon are not matched with the characteristics of the mark icon, determining the result of matching the video frame image with the attributes of the video to be detected as the attribute detection result of the video frame image.

4. The method according to claim 1, wherein the performing attribute detection on the video frame image based on the attribute of the video to be detected to obtain an attribute detection result comprises:

acquiring a reference coordinate system based on the attribute of the video to be detected, and identifying key feature points in the video frame image;

determining relative position information of the key feature points based on the feature positions of the key feature points in the reference coordinate system;

and if the relative position information does not belong to the standard relative position range, determining the result of matching the video frame image with the attribute of the video to be detected as the attribute detection result of the video frame image.
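One possible reading of claim 4, assuming it targets the rotation attribute mentioned in the description, is sketched below. The choice of key feature points (an eye center and a mouth position), the use of image coordinates as the reference coordinate system, the angle used as relative position information, and the range `STANDARD_ANGLE_RANGE` are all assumptions for illustration only.

```python
import math
from typing import Tuple

# Hypothetical key feature point: (x, y) in image coordinates, y grows downward.
Point = Tuple[float, float]

# Assumed standard relative position range, in degrees.
STANDARD_ANGLE_RANGE = (-45.0, 45.0)


def relative_angle(eye_center: Point, mouth: Point) -> float:
    """Angle of the eye-to-mouth vector, 0 degrees when the mouth is directly below."""
    dx = mouth[0] - eye_center[0]
    dy = mouth[1] - eye_center[1]
    return math.degrees(math.atan2(dx, dy))


def matches_rotation(
    eye_center: Point,
    mouth: Point,
    standard_range: Tuple[float, float] = STANDARD_ANGLE_RANGE,
) -> bool:
    """True if the relative position falls outside the standard range."""
    low, high = standard_range
    return not (low <= relative_angle(eye_center, mouth) <= high)
```

Under these assumptions, an upright face yields an angle near 0 degrees and is not flagged, while a frame rotated by 90 or 180 degrees falls outside the range and is treated as matching the attribute.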

5. The method according to any one of claims 1-4, wherein the at least two evaluation objects comprise at least two of a color value object, an aesthetic object, and a video quality object; the color value object is used for indicating color value characteristics in the video frame image and representing facial-feature (five sense organs) information and angles; the aesthetic object is used for representing the image composition structure, color value combination and video content of the video frame image; the video quality object is used for representing the tone scale distribution and the noise of the video frame image;

the detecting the evaluation object in the video frame image to obtain a second quality evaluation value of the evaluation object in the video frame image includes:

detecting a color value object in the video frame image based on a color value detection model, and determining a second quality evaluation value of the color value object in the video frame image;

detecting an aesthetic object in the video frame image based on an aesthetic detection model, determining an image composition structure of the video frame image, identifying color value combination information of the video frame image, and determining a second quality assessment value of the aesthetic object in the video frame image according to the image composition structure and the color value combination information;

detecting a video quality object in the video frame image based on a quality detection model, determining color gradation distribution information of the video frame image, identifying noise information in the video frame image, and determining a second quality evaluation value of the video quality object in the video frame image according to the color gradation distribution information and the noise information.

6. The method according to claim 1, wherein the detecting the evaluation object in the video frame image to obtain a second quality evaluation value of the evaluation object in the video frame image comprises:

detecting an image composition structure of the video frame image;

if the image composition structure of the video frame image is a frame synthesis structure, performing frame splitting processing on the video frame image to obtain at least two separated image data;

detecting an evaluation object in each of the at least two separated image data to obtain a second quality evaluation value of the evaluation object in each of the at least two separated image data;

composing the second quality evaluation value of the evaluation object in each of the separated image data into a second quality evaluation value of the evaluation object in the video frame image;

the method further comprises the following steps:

and if the number of the video frame images is at least two, determining the target quality evaluation value of the video data to which the at least two video frame images belong based on the target quality evaluation value of each video frame image.
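The frame splitting and aggregation steps of claim 6 can be sketched as follows. The claim does not specify how a frame synthesis structure is split, how the per-part values are composed, or how frame values are combined into a video value; this sketch assumes a side-by-side layout split into equal-width parts and uses the arithmetic mean in both places. All function names are hypothetical.

```python
from statistics import fmean
from typing import Callable, List, Sequence

# Hypothetical frame representation: rows of grayscale pixel values.
Frame = List[List[int]]


def split_side_by_side(frame: Frame, parts: int = 2) -> List[Frame]:
    """Split a composite frame into `parts` vertical strips (assumed layout)."""
    width = len(frame[0])
    step = width // parts
    strips = []
    for i in range(parts):
        # The last strip absorbs any remainder columns.
        end = width if i == parts - 1 else (i + 1) * step
        strips.append([row[i * step:end] for row in frame])
    return strips


def composite_second_value(
    frame: Frame, scorer: Callable[[Frame], float], parts: int = 2
) -> float:
    """Compose the per-part second quality values by averaging (an assumption)."""
    return fmean(scorer(part) for part in split_side_by_side(frame, parts))


def video_target_quality(frame_values: Sequence[float]) -> float:
    """Combine per-frame target values into a video value by averaging (an assumption)."""
    if len(frame_values) < 2:
        raise ValueError("at least two video frame images are required")
    return fmean(frame_values)
```

Other compositions, such as taking the minimum over the separated image data, would fit the wording of the claim equally well; the mean is chosen here only for simplicity.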

7. A video data processing apparatus, comprising:

the category detection module is used for acquiring a video frame image and the attribute of a video to be detected, and performing attribute detection on the video frame image based on the attribute of the video to be detected to obtain an attribute detection result; the attribute of the video to be detected is used for indicating the attribute influencing the quality of the video frame image;

the category matching module is used for determining a first quality evaluation value as a target quality evaluation value of the video frame image if the attribute detection result is a result that the video frame image is matched with the attribute of the video to be detected;

the object detection module is used for detecting an evaluation object in the video frame image to obtain a second quality evaluation value of the evaluation object in the video frame image if the attribute detection result is the result that the video frame image and the attribute of the video to be detected are not matched; the number of the evaluation objects is at least two;

the object detection module is further configured to identify an image scene to which the video frame image belongs, and obtain weight information corresponding to the image scene to which the video frame image belongs; the weight information comprises weight parameters respectively corresponding to at least two evaluation objects;

the object detection module is further configured to perform weighted summation on the second quality assessment values respectively corresponding to the at least two assessment objects based on the weight parameters respectively corresponding to the at least two assessment objects, so as to obtain a target quality assessment value of the video frame image.

8. A computer device, comprising: a processor, a memory, and a network interface;

the processor is coupled to the memory and the network interface, wherein the network interface is configured to provide data communication functionality, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method of any of claims 1-6.

9. A computer-readable storage medium, in which a computer program is stored which is adapted to be loaded by a processor and to carry out the method of any one of claims 1 to 6.