CN117615146A - Video processing method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN117615146A
CN117615146A (application CN202311511860.7A)
Authority
CN
China
Prior art keywords
filtered
video
video frame
frames
picture complexity
Prior art date
Legal status
Pending
Application number
CN202311511860.7A
Other languages
Chinese (zh)
Inventor
宁沛荣
高敏
(Name withheld at the inventor's request)
樊星星
曲建峰
段晨辉
陈靖
Current Assignee
Shuhang Technology Beijing Co ltd
Original Assignee
Shuhang Technology Beijing Co ltd
Priority date
Filing date
Publication date
Application filed by Shuhang Technology Beijing Co ltd filed Critical Shuhang Technology Beijing Co ltd
Priority to CN202311511860.7A
Publication of CN117615146A


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117Filters, e.g. for pre-processing or post-processing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a video processing method and apparatus, an electronic device, and a computer readable storage medium. The method comprises the following steps: acquiring a video to be processed; selecting a video frame from the video to be processed as a video frame to be filtered; determining a picture complexity of the video frame to be filtered, wherein the picture complexity characterizes the complexity of textures in the video frame to be filtered; and determining a target number of reference frames of the video frame to be filtered according to the picture complexity, wherein the reference frames are used for filtering the video frame to be filtered, and the target number is inversely related to the picture complexity.

Description

Video processing method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a video processing method and apparatus, an electronic device, and a computer readable storage medium.
Background
Because the video frames in a video are correlated, when a video frame to be filtered is filtered, other video frames in the video can be used to filter it; the video frames in the video that are used to filter the video frame to be filtered are reference frames. Obviously, the number of reference frames affects the filtering effect on the video frame to be filtered, so how to determine the number of reference frames is of great importance for filtering the video frame to be filtered.
Disclosure of Invention
The application provides a video processing method and apparatus, an electronic device, and a computer readable storage medium for determining the number of reference frames.
In a first aspect, a video processing method is provided, the method comprising:
acquiring a video to be processed;
selecting a video frame from the video to be processed as a video frame to be filtered;
determining picture complexity of the video frame to be filtered, wherein the picture complexity represents complexity of textures in the video frame to be filtered;
and determining the target number of reference frames of the video frames to be filtered according to the picture complexity, wherein the reference frames are used for filtering the video frames to be filtered, and the target number is inversely related to the picture complexity.
In combination with any one of the embodiments of the present application, the determining the picture complexity of the video frame to be filtered includes:
and obtaining the picture complexity according to the first amplitude of the transverse gradient of the video frame to be filtered and/or the second amplitude of the longitudinal gradient of the video frame to be filtered, wherein the picture complexity is positively correlated with the first amplitude and the second amplitude.
In combination with any one of the embodiments of the present application, the obtaining the picture complexity according to the first amplitude of the transverse gradient of the video frame to be filtered and/or the second amplitude of the longitudinal gradient of the video frame to be filtered includes:
determining a first amplitude of a transverse gradient of the video frame to be filtered and a second amplitude of a longitudinal gradient of the video frame to be filtered;
calculating the sum of the first amplitude and the second amplitude to obtain a target value;
and obtaining the picture complexity according to the target value, wherein the picture complexity is positively correlated with the target value.
In combination with any one of the embodiments of the present application, the determining the first amplitude of the transverse gradient of the video frame to be filtered and the second amplitude of the longitudinal gradient of the video frame to be filtered includes:
determining a human eye region of interest from the video frame to be filtered, wherein the human eye region of interest is a region of interest of human eyes in the video frame to be filtered under the condition that the video frame to be filtered is displayed;
determining the amplitude of the transverse gradient of the eye region of interest to obtain the first amplitude;
and determining the amplitude of the longitudinal gradient of the eye region of interest to obtain the second amplitude.
In combination with any one of the embodiments of the present application, the determining, according to the picture complexity, the target number of reference frames of the video frame to be filtered includes:
in the case that the picture complexity is less than a first threshold, obtaining the target number by increasing the number of the reference frames on the basis of a predetermined number, wherein the predetermined number is a preset number of reference frames;
determining the predetermined number as the target number in a case where the picture complexity is greater than or equal to the first threshold and less than or equal to a second threshold;
and in the case that the picture complexity is greater than the second threshold, obtaining the target number by reducing the number of the reference frames on the basis of the predetermined number.
In combination with any one of the embodiments of the present application, after determining the target number of reference frames of the video frame to be filtered according to the picture complexity, the method further includes:
determining, from the video to be processed, n video frames whose time stamps are closest to the time stamp of the video frame to be filtered, as n reference frames of the video frame to be filtered, wherein n is the same as the target number;
filtering the video frame to be filtered according to the n frame reference frames to obtain a filtered video frame;
and replacing the video frame to be filtered in the video to be processed with the filtered video frame, so as to obtain a filtered video.
In combination with any one of the embodiments of the present application, the filtering the video frame to be filtered according to the n frame reference frames to obtain a filtered video frame includes:
dividing the video frame to be filtered into m reference image blocks;
for each of the m reference image blocks, determining a target image block having a matching relationship from the n frames of reference frames;
carrying out weighted average on each reference image block and the target image block having a matching relationship with it, to obtain m filtered image blocks;
and obtaining the filtered video frame according to the m filtered image blocks.
In combination with any of the embodiments of the present application, after obtaining the filtered video, the method further includes:
and obtaining the coded video of the video to be processed by coding the filtered video.
In combination with any one of the embodiments of the present application, the determining the picture complexity of the video frame to be filtered includes:
determining a variance of pixel values of the video frame to be filtered;
and determining the picture complexity according to the variance, wherein the picture complexity is positively correlated with the variance.
In a second aspect, there is provided a video processing apparatus, the apparatus comprising:
the acquisition unit is used for acquiring the video to be processed;
the selecting unit is used for selecting one frame of video frame from the video to be processed as a video frame to be filtered;
a determining unit, configured to determine a picture complexity of the video frame to be filtered, where the picture complexity characterizes a complexity of a texture in the video frame to be filtered;
the determining unit is configured to determine, according to the picture complexity, a target number of reference frames of the video frame to be filtered, where the reference frames are used to filter the video frame to be filtered, and the target number is inversely related to the picture complexity.
In combination with any one of the embodiments of the present application, the determining unit is configured to:
and obtaining the picture complexity according to the first amplitude of the transverse gradient of the video frame to be filtered and/or the second amplitude of the longitudinal gradient of the video frame to be filtered, wherein the picture complexity is positively correlated with the first amplitude and the second amplitude.
In combination with any one of the embodiments of the present application, the obtaining the picture complexity according to the first amplitude of the transverse gradient of the video frame to be filtered and/or the second amplitude of the longitudinal gradient of the video frame to be filtered includes:
determining a first amplitude of a transverse gradient of the video frame to be filtered and a second amplitude of a longitudinal gradient of the video frame to be filtered;
calculating the sum of the first amplitude and the second amplitude to obtain a target value;
and obtaining the picture complexity according to the target value, wherein the picture complexity is positively correlated with the target value.
In combination with any one of the embodiments of the present application, the determining unit is configured to:
determining a human eye region of interest from the video frame to be filtered, wherein the human eye region of interest is a region of interest of human eyes in the video frame to be filtered under the condition that the video frame to be filtered is displayed;
determining the amplitude of the transverse gradient of the eye region of interest to obtain the first amplitude;
and determining the amplitude of the longitudinal gradient of the eye region of interest to obtain the second amplitude.
In combination with any one of the embodiments of the present application, the determining unit is configured to:
in the case that the picture complexity is less than a first threshold, obtaining the target number by increasing the number of the reference frames on the basis of a predetermined number, wherein the predetermined number is a preset number of reference frames;
determining the predetermined number as the target number in a case where the picture complexity is greater than or equal to the first threshold and less than or equal to a second threshold;
and in the case that the picture complexity is greater than the second threshold, obtaining the target number by reducing the number of the reference frames on the basis of the predetermined number.
In combination with any one of the embodiments of the present application, the determining unit is further configured to determine, from the video to be processed, an n-frame video frame having a timestamp closest to a timestamp of the video frame to be filtered, as an n-frame reference frame of the video frame to be filtered, where n is the same as the target number;
the device further comprises:
the filtering unit is used for filtering the video frame to be filtered according to the n frame reference frames to obtain a filtered video frame;
and a replacing unit, configured to replace the video frame to be filtered in the video to be processed with the filtered video frame, so as to obtain a filtered video.
In combination with any one of the embodiments of the present application, the filtering unit is configured to:
dividing the video frame to be filtered into m reference image blocks;
for each of the m reference image blocks, determining a target image block having a matching relationship from the n frames of reference frames;
carrying out weighted average on each reference image block and the target image block having a matching relationship with it, to obtain m filtered image blocks;
and obtaining the filtered video frame according to the m filtered image blocks.
In combination with any one of the embodiments of the present application, the apparatus further includes: and the encoding unit is used for obtaining the encoded video of the video to be processed by encoding the filtered video.
In combination with any one of the embodiments of the present application, the determining unit is configured to:
determining a variance of pixel values of the video frame to be filtered;
and determining the picture complexity according to the variance, wherein the picture complexity is positively correlated with the variance.
In a third aspect, an electronic device is provided, comprising: a processor and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform a method as described in the first aspect and any one of its possible implementations.
In a fourth aspect, there is provided another electronic device comprising: a processor, a transmitting means, an input means, an output means and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the first aspect and any implementation thereof as described above.
In a fifth aspect, there is provided a computer readable storage medium having stored therein a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the first aspect and any implementation thereof as described above.
In a sixth aspect, there is provided a computer program product comprising a computer program or instructions which, when run on a computer, cause the computer to perform the first aspect and any embodiments thereof.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
In the application, after the video processing device acquires the video to be processed, a video frame is selected from the video to be processed as a video frame to be filtered. A picture complexity of the video frame to be filtered is determined, wherein the picture complexity characterizes the complexity of the texture in the video frame to be filtered. The target number of reference frames of the video frame to be filtered is then determined according to the picture complexity, wherein the reference frames are used for filtering the video frame to be filtered and the target number is inversely related to the picture complexity. In this way, when the complexity of the texture in the video frame to be filtered is low, the number of reference frames is large and the filtering applied using those reference frames is correspondingly strong; because the texture is simple, even strong filtering causes little texture loss. When the complexity of the texture in the video frame to be filtered is high, the number of reference frames is small and the filtering is correspondingly weak; because the texture is complex, the weaker filtering reduces the texture loss caused by filtering. That is, with the video processing method of the present application, a target number of reference frames that matches the complexity of the texture in the video frame to be filtered can be determined.
Drawings
In order to more clearly describe the technical solutions in the embodiments or the background of the present application, the following description will describe the drawings that are required to be used in the embodiments or the background of the present application.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the technical aspects of the application.
Fig. 1 is a schematic flow chart of a video processing method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of determining a target number of reference frames of a video frame to be filtered according to an embodiment of the present application;
fig. 3 is a schematic architecture diagram of a video publishing system according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The execution body of the embodiments of the application is a video processing device, where the video processing device may be any electronic device capable of executing the technical solutions disclosed in the method embodiments of the application. Optionally, the video processing device may be a computer or a server.
It should be understood that the method embodiments of the present application may also be implemented by way of a processor executing computer program code. Embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application. Referring to fig. 1, fig. 1 is a flowchart of a video processing method according to an embodiment of the present application.
101. And acquiring the video to be processed.
In this embodiment of the present application, the video to be processed may be a video including any content, for example, the video to be processed is a video of a basketball game, for example, the video to be processed is a dance video, and for example, the video to be processed is a video including both a dance and a basketball game.
In one implementation of acquiring the video to be processed, the video processing device receives the video to be processed input by a user through an input component. The input component includes at least one of: a keyboard, a mouse, a touch screen, a touch pad, and an audio input device.
In another implementation manner of acquiring the video to be processed, the video processing device receives the video to be processed sent by a terminal. The terminal may be any of the following: a mobile phone, a computer, a tablet computer, or a server.
In yet another implementation of obtaining the video to be processed, the video processing apparatus obtains the video to be processed by downloading the video from the internet.
In still another implementation manner of acquiring the video to be processed, a communication connection exists between the video processing device and a camera, and the video processing device acquires, through the communication connection, the video captured by the camera as the video to be processed.
102. And selecting a video frame from the video to be processed as a video frame to be filtered.
103. And determining the picture complexity of the video frame to be filtered.
In this embodiment of the present application, the picture complexity characterizes the complexity of the texture in the video frame to be filtered: the more complex the texture in the video frame to be filtered, the higher the picture complexity of the video frame to be filtered. Optionally, the larger the area of flat regions in the video frame to be filtered, the lower its picture complexity. Optionally, the picture complexity reflects the variety of colors in the video frame to be filtered, where the more kinds of colors, the higher the picture complexity. Optionally, the picture complexity reflects the number of edges in the video frame to be filtered, where the greater the number of edges, the higher the picture complexity. Optionally, the picture complexity reflects the number of objects in the video frame to be filtered, where the greater the number of objects, the higher the picture complexity.
In one possible implementation, the video processing device calculates the magnitude of the lateral gradient of the video frame to be filtered to obtain the first magnitude. And obtaining the picture complexity according to the first amplitude value of the transverse gradient of the video frame to be filtered, wherein the picture complexity is positively correlated with the first amplitude value.
The larger the amplitude of the transverse gradient of the video frame to be filtered is, the higher the complexity of the texture in the video frame to be filtered is, so that the video processing device obtains the picture complexity according to the first amplitude of the transverse gradient of the video frame to be filtered under the condition that the picture complexity is positively correlated with the first amplitude. Optionally, the video processing device uses the first amplitude as a picture complexity of the video frame to be filtered.
In another possible implementation, the video processing device calculates the magnitude of the longitudinal gradient of the video frame to be filtered to obtain the second magnitude. And obtaining the picture complexity according to the second amplitude of the longitudinal gradient of the video frame to be filtered, wherein the picture complexity and the second amplitude are positively correlated.
The larger the amplitude of the longitudinal gradient of the video frame to be filtered is, the higher the complexity of the texture in the video frame to be filtered is, so that the video processing device obtains the picture complexity according to the second amplitude of the longitudinal gradient of the video frame to be filtered under the condition that the picture complexity and the second amplitude are positively correlated. Optionally, the video processing device uses the second amplitude as a picture complexity of the video frame to be filtered.
In yet another possible implementation, the video processing apparatus calculates the magnitude of the lateral gradient of the video frame to be filtered to obtain the first magnitude. The video processing device calculates the amplitude of the longitudinal gradient of the video frame to be filtered to obtain a second amplitude. And calculating the sum of the first amplitude value and the second amplitude value to obtain a target value. And obtaining the picture complexity according to the target value, wherein the picture complexity is positively correlated with the target value, namely, the picture complexity is positively correlated with the first amplitude value and the second amplitude value.
Optionally, the video processing device uses the sum of the first amplitude and the second amplitude as the picture complexity of the video frame to be filtered.
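As a concrete illustration of this gradient-based measure, the following is a minimal Python/NumPy sketch. It is an illustrative reconstruction, not the patent's reference implementation: the function name, the use of the Sobel operator (one of the gradient operators named later in the description), and taking the mean over the whole frame are assumptions.

```python
import cv2
import numpy as np

def picture_complexity(frame: np.ndarray) -> float:
    """Picture complexity as the mean of the summed magnitudes of the
    transverse (horizontal) and longitudinal (vertical) gradients.
    Illustrative sketch; the patent does not fix an exact formula."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float64)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # transverse gradient
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # longitudinal gradient
    first_amplitude = np.abs(gx)   # amplitude of the transverse gradient
    second_amplitude = np.abs(gy)  # amplitude of the longitudinal gradient
    # target value: sum of the two amplitudes, averaged over the frame
    return float(np.mean(first_amplitude + second_amplitude))
```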
Optionally, the video processing device determines the first magnitude of the lateral gradient of the video frame to be filtered and the second magnitude of the longitudinal gradient of the video frame to be filtered by performing the steps of: and determining a human eye region of interest from the video frame to be filtered, wherein the human eye region of interest is a region of interest of human eyes in the video frame to be filtered under the condition that the video frame to be filtered is displayed. And determining the amplitude of the transverse gradient of the eye region of interest to obtain a first amplitude. And determining the amplitude of the longitudinal gradient of the eye region of interest to obtain a second amplitude.
In this embodiment of the present application, the human eye region of interest is the region in the video frame to be filtered on which human eyes focus when the frame is displayed. In one possible implementation, the human eye region of interest is the pixel area covered by the foreground in the video frame to be filtered. For example, for an image that includes a person, the person is the foreground; when people watch the image, their focus is the person, and therefore the pixel area covered by the person is the human eye region of interest. The video processing device determines the pixel area covered by the foreground from the video frame to be filtered, thereby obtaining the human eye region of interest.
In another possible implementation, the pixel area covered by the foreground includes regions with large changes in image content and regions with small changes in image content, where the basis for distinguishing whether a change in image content is large or small can be determined according to actual requirements. For example, for any region within the pixel area covered by the foreground, a variance of pixel values greater than a variance threshold may determine that the region is a region of large image content variation, and a variance of pixel values less than or equal to the variance threshold may determine that the region is a region of small image content variation. For another example, for any region within the pixel area covered by the foreground, a variance of the gradient greater than a gradient threshold may determine that the region is a region of large image content variation, and a variance of the gradient less than or equal to the gradient threshold may determine that the region is a region of small image content variation. The video processing apparatus uses a region where the image content changes greatly as the human eye region of interest.
In yet another possible implementation, the pixel area covered by the foreground includes at least two object pixel areas, where different object pixel areas are pixel areas covered by different objects, for example, in the case where the foreground is a person, the pixel area covered by the person includes a pixel area covered by a face, a pixel area covered by four limbs, and a pixel area covered by a torso other than four limbs, and at this time, the pixel area covered by the face, the pixel area covered by four limbs, and the pixel area covered by a torso other than four limbs are all object pixel areas.
Because the probability that different objects are focused by human eyes is different, the basis for determining whether the image content of different object pixel areas is large or small is different, specifically, the lower the probability that the object corresponding to the object pixel area is focused by human eyes, the higher the standard for determining that the object pixel area is the area with large image content variation. For example, when the foreground is a person, the probability that the human face is focused on by the human eye is higher than the probability that the limbs are focused on by the human eye, and therefore, the criterion for determining that the pixel area covered by the human face is an area where the image content is greatly changed is lower than the criterion for determining that the pixel area covered by the limbs is an area where the image content is greatly changed. For example, for a pixel region covered by a face, a variance of a gradient greater than a face gradient threshold may determine that the region is a region with a large change in image content, and a variance of a gradient less than or equal to the face gradient threshold may determine that the region is a region with a small change in image content. For a pixel region covered by an extremity, a region with a gradient variance greater than an extremity gradient threshold can be determined to be a region with large image content variation, and a region with a gradient variance less than or equal to the extremity gradient threshold can be determined to be a region with small image content variation. Then the face gradient threshold is less than the limb gradient threshold.
By differentiating the criteria for determining whether different pixel regions are regions of great variation in image content, the probability that such regions are truly regions of interest to the human eye can be improved. For example, the pixel area covered by the foreground may include a color-gradient region; its image content does change, but the change is gradual, so the location where the color gradient appears affects the probability that it attracts the human eye. In the case where the foreground is a person, if the color-gradient region appears on clothing worn by the person, the probability that it is focused on by the human eye is small, whereas if it appears on the face, that probability is large. That is, judging all pixel regions against the same criterion tends to result in a low probability that the regions identified as having large image content variation are actually regions of interest to the human eye. Therefore, by differentiating the criteria for different pixel regions, the probability that the identified regions are regions of interest to the human eye can be improved.
After the video processing device determines the human eye region of interest from the video frame to be filtered, determining the amplitude of the transverse gradient of the human eye region of interest as a first amplitude and determining the amplitude of the longitudinal gradient of the human eye region of interest as a second amplitude, thus, the picture complexity of the video frame to be filtered is determined based on the first amplitude and/or the second amplitude, and the picture complexity of the video frame to be filtered can be better represented.
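A short sketch of restricting the amplitude computation to the human eye region of interest follows; the boolean `roi_mask` input is an assumption, since the description leaves open how the foreground region is obtained (segmentation, face detection, and so on).

```python
import numpy as np

def roi_amplitudes(gx: np.ndarray, gy: np.ndarray, roi_mask: np.ndarray):
    """First and second amplitudes computed only over the human eye
    region of interest. `roi_mask` is a boolean array the same shape
    as the gradient maps, True where a pixel belongs to the region."""
    first_amplitude = float(np.abs(gx)[roi_mask].mean())
    second_amplitude = float(np.abs(gy)[roi_mask].mean())
    return first_amplitude, second_amplitude
```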
In yet another possible implementation manner, the video processing apparatus determines a variance of pixel values of the video frame to be filtered, and in particular, the video processing apparatus calculates variances of pixel values of all pixels in the video frame to be filtered, to obtain variances of pixel values of the video frame to be filtered. And determining the picture complexity according to the variance of the pixel value, wherein the picture complexity is positively correlated with the variance of the pixel value.
The variance of the pixel values of the video frame to be filtered is large, which indicates that the difference of the pixel values of different pixels in the video frame to be filtered is large, and also indicates that the difference of the image content of different pixels in the video frame to be filtered is large, and further indicates that the picture complexity of the video frame to be filtered is high. Therefore, the video processing apparatus determines the picture complexity from the variance of the pixel values in the case where the picture complexity and the variance of the pixel values are positively correlated. Optionally, the video processing device uses the variance of the pixel values as the picture complexity of the video frame to be filtered.
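In code, this variant is essentially a one-liner; the sketch below assumes the variance is used directly as the picture complexity, which the description names as an option.

```python
import numpy as np

def picture_complexity_variance(frame: np.ndarray) -> float:
    """Variance of all pixel values of the frame, used directly as the
    picture complexity (one of the options the description mentions)."""
    return float(np.var(frame.astype(np.float64)))
```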
In yet another possible implementation, the video processing apparatus inputs the video frame to be filtered into a picture complexity network to obtain the picture complexity of the video frame to be filtered. The picture complexity network is a neural network trained as follows: a training image is input into the picture complexity network, which predicts the picture complexity of the training image to obtain a training complexity. A loss of the picture complexity network is obtained according to the difference between the training complexity and the label of the training image, wherein the loss is positively correlated with the difference, the label of the training image is the actual picture complexity of the training image, and the actual picture complexity is the ground truth (GT) of the picture complexity of the training image.
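A minimal PyTorch sketch of such a training step is shown below. The network architecture, optimizer, and exact loss are not specified in the description; a small convolutional regressor with an MSE loss (which is positively correlated with the difference between training complexity and label) is assumed purely for illustration.

```python
import torch
import torch.nn as nn

class ComplexityNet(nn.Module):
    """Tiny CNN regressor standing in for the picture complexity network;
    the real architecture is not disclosed."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def train_step(net, optimizer, training_images, labels):
    """One training step: predict the training complexity and penalize
    its difference from the labeled (ground truth) picture complexity.
    `labels` has shape (N, 1), matching the network output."""
    optimizer.zero_grad()
    training_complexity = net(training_images)
    loss = nn.functional.mse_loss(training_complexity, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```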
104. And determining the target number of the reference frames of the video frames to be filtered according to the picture complexity.
In the embodiment of the application, the reference frame is used for filtering the video frame to be filtered; the reference frame is the basis for filtering the video frame to be filtered. In one possible implementation, the video frame to be filtered is filtered using motion compensation based weighted temporal filtering (motion compensated temporal filter, MCTF) and the reference frames. In another possible implementation, the reference frames may be used to perform weighted average filtering on the video frame to be filtered; specifically, the reference frames and the video frame to be filtered may be subjected to weighted average to obtain the filtered video frame.
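As a minimal illustration of the weighted-average variant (the second implementation above), the sketch below averages the frame with its references. The weights are assumptions, and the motion compensation performed by MCTF before averaging is omitted.

```python
import numpy as np

def weighted_average_filter(frame, reference_frames, self_weight=0.5):
    """Filter a frame as a weighted average of itself and the mean of
    its reference frames. The 0.5 weight is a placeholder; MCTF would
    additionally motion-compensate the references before averaging."""
    refs = np.mean(np.stack(reference_frames).astype(np.float64), axis=0)
    out = self_weight * frame.astype(np.float64) + (1.0 - self_weight) * refs
    return out.astype(frame.dtype)
```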
The more the number of the reference frames is, the greater the intensity of filtering the video frames to be filtered by using the reference frames is, and correspondingly, the more textures in the video frames to be filtered are lost by filtering, whereas the fewer the number of the reference frames is, the less the intensity of filtering the video frames to be filtered by using the reference frames is, and correspondingly, the less textures in the video frames to be filtered are lost by filtering. Thus, the number of reference frames should be determined for the video frame to be filtered based on the complexity of the texture in the video frame to be filtered.
As described above, the picture complexity of the video frame to be filtered characterizes the complexity of the texture in the video frame to be filtered, and therefore, the video processing apparatus may determine the target number of the reference frames of the video frame to be filtered according to the picture complexity of the video frame to be filtered, where the target number is inversely related to the picture complexity. Specifically, the higher the picture complexity of the video frame to be filtered, the smaller the target number, and the lower the picture complexity of the video frame to be filtered, the larger the target number.
In this embodiment of the present application, after obtaining a video to be processed, the video processing apparatus selects a video frame from the video to be processed as a video frame to be filtered. A picture complexity of the video frame to be filtered is determined, wherein the picture complexity characterizes a complexity of a texture in the video frame to be filtered. And determining the target number of the reference frames of the video frames to be filtered according to the picture complexity under the condition that the target number of the reference frames is inversely related to the picture complexity, wherein the reference frames are used for filtering the video frames to be filtered. In this way, in the case that the complexity of the texture in the video frame to be filtered is low, the number of reference frames of the video frame to be filtered is large, and accordingly, the intensity of filtering performed on the video frame to be filtered by using the reference frames is large, and due to the low complexity of the texture in the video frame to be filtered, even if the intensity of filtering is large, the loss of the texture caused by filtering is small. Under the condition that the complexity of textures in the video frames to be filtered is high, the number of reference frames of the video frames to be filtered is small, correspondingly, the strength of filtering carried out on the video frames to be filtered by using the reference frames is low, and due to the fact that the complexity of textures in the video frames to be filtered is high, the strength of filtering is low, and loss of textures caused by filtering can be reduced. That is, with the video processing method of the present application, the target number of reference frames that match the complexity of the texture in the video frame to be filtered can be determined.
It should be understood that the video frame to be filtered is a description object selected for briefly describing the technical solution; it should not be understood that the video processing apparatus determines the number of reference frames for only one video frame in the video to be processed through steps 101 to 104. In practical application, the video processing apparatus may determine the number of reference frames for each video frame in the video to be processed in the manner in which the target number of reference frames of the video frame to be filtered is determined in steps 101 to 104.
As an alternative embodiment, the video processing device performs the following steps in performing step 104:
201. in the case that the picture complexity is smaller than the first threshold, the target number is obtained by increasing the number of the reference frames on the basis of a predetermined number.
202. And determining the predetermined number as the target number when the picture complexity is greater than or equal to the first threshold value and less than or equal to the second threshold value.
203. And when the picture complexity is greater than the second threshold, the target number is obtained by reducing the number of the reference frames based on the predetermined number.
In this embodiment of the present application, the predetermined number is a preset number of reference frames. The predetermined number may be regarded as a reference value of reference frames of the video frames to be filtered, and in case the complexity of textures in the video frames to be filtered is in a reference range, the number of reference frames of the video frames to be filtered should be the predetermined number. Accordingly, in the case that the complexity of the texture in the video frame to be filtered is higher than the reference range, the number of reference frames of the video frame to be filtered should be reduced on the basis of the predetermined number, and in the case that the complexity of the texture in the video frame to be filtered is lower than the reference range, the number of reference frames of the video frame to be filtered should be increased on the basis of the predetermined number.
The reference range may be a result obtained by counting the picture complexity of the image, wherein the picture complexity of the image characterizes the complexity of the texture in the image. Specifically, the reference range is a range including a predetermined proportion of picture complexity. For example, by counting the picture complexity of 100 images, a range including the picture complexity of 80% of the images is determined as a reference range. Alternatively, the reference range may be a result obtained by counting the picture complexity of an image belonging to the same type as the video frame to be filtered. For example, the image content of the video frame to be filtered is a sports game, and the image of the same type as the video frame to be filtered may be an image of the sports game. For another example, the video frame to be filtered is acquired on a sunny day, and then the image of the same type as the video frame to be filtered may be an image acquired on a sunny day. For another example, the video frame to be filtered is acquired with the target imaging device, and then the image of the same type as the video frame to be filtered may be an image acquired with the target imaging device.
In this embodiment of the present application, the video processing apparatus determines, based on the first threshold and the second threshold, whether the complexity of the texture in the video frame to be filtered is in the reference range. Specifically, the complexity of the texture in the video frame to be filtered is greater than or equal to the first threshold and less than or equal to the second threshold, indicating that the complexity of the texture in the video frame to be filtered is in the reference range. The complexity of the texture in the video frame to be filtered is less than a first threshold, indicating that the complexity of the texture in the video frame to be filtered is below a reference range. The complexity of the texture in the video frame to be filtered is greater than the second threshold, indicating that the complexity of the texture in the video frame to be filtered is higher than the reference range.
Therefore, the video processing apparatus obtains the target number by increasing the number of reference frames on the basis of the predetermined number in the case where the picture complexity is smaller than the first threshold. In the case where the picture complexity is greater than or equal to the first threshold value and less than or equal to the second threshold value, the predetermined number is determined to be the target number. In case the picture complexity is larger than the second threshold, the target number is obtained by reducing the number of reference frames on the basis of the predetermined number. Therefore, the target number of the reference frames of the video frames to be filtered can be determined according to the complexity of textures in the video frames to be filtered. Alternatively, the predetermined number is 4 or 6.
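The three-way rule of steps 201 to 203 can be sketched as follows. The thresholds and the adjustment step are tuning parameters the patent leaves open; the default predetermined number of 4 follows the optional values (4 or 6) mentioned above, and everything else is a placeholder.

```python
def target_reference_frame_count(picture_complexity: float,
                                 first_threshold: float,
                                 second_threshold: float,
                                 predetermined: int = 4,
                                 step: int = 2) -> int:
    """Target number of reference frames, inversely related to picture
    complexity: more references (stronger filtering) for simple frames,
    fewer (gentler filtering) for texture-rich frames."""
    if picture_complexity < first_threshold:
        return predetermined + step          # below the reference range
    if picture_complexity > second_threshold:
        return max(1, predetermined - step)  # above the reference range
    return predetermined                     # within the reference range
```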
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for determining a target number of reference frames of a video frame to be filtered according to an embodiment of the present application. As shown in fig. 2, after the process flow starts, a video frame to be filtered is input, and then the complexity of the texture of each pixel is calculated pixel by pixel for the video frame to be filtered.
In one possible implementation, the lateral and longitudinal gradients of each pixel are first calculated using a gradient operator, where a gradient operator is an operator used to calculate gradients. Optionally, the gradient operator includes: the Sobel operator, the Roberts operator, and the Laplace operator.
Alternatively, the lateral gradient of each pixel is calculated by:

g(x) = Σ_{(i,j)∈Ω(e)} T_x(i,j) · I(i,j)

where g(x) is the lateral gradient of pixel e, which is any one pixel in the video frame to be filtered, I(i,j) is the pixel value at position (i,j), Ω(e) represents a pixel neighborhood built centered on pixel e, and T_x represents the operator template used by the gradient operator to calculate the lateral gradient of the pixel.

Similarly, the longitudinal gradient of each pixel is calculated by:

g(y) = Σ_{(i,j)∈Ω(e)} T_y(i,j) · I(i,j)

where g(y) is the longitudinal gradient of pixel e, Ω(e) again represents the pixel neighborhood centered on pixel e, and T_y represents the operator template used by the gradient operator to calculate the longitudinal gradient of the pixel.
Then, the magnitudes of the lateral gradients and the longitudinal gradients of the individual pixels are summed to obtain the complexity of the texture of the individual pixels.
And calculating the average value of the complexity of the textures of all pixels in the video frame to be filtered, and taking the average value as the picture complexity of the video frame to be filtered. Then it is judged whether the picture complexity of the video frame to be filtered is greater than the second threshold. If the picture complexity is greater than the second threshold, the number of reference frames of the video frame to be filtered is adaptively reduced; specifically, the target number of reference frames is obtained by reducing the number of reference frames on the basis of a predetermined number. If the picture complexity is less than or equal to the second threshold, it is judged whether the picture complexity is less than the first threshold. If the picture complexity is less than the first threshold, the number of reference frames of the video frame to be filtered is adaptively increased; specifically, the target number of reference frames is obtained by increasing the number of reference frames on the basis of the predetermined number. If the picture complexity is greater than or equal to the first threshold (and thus greater than or equal to the first threshold and less than or equal to the second threshold), the target number of reference frames of the video frame to be filtered is determined to be the predetermined number. The processing flow ends after the target number of reference frames of the video frame to be filtered is obtained.
As an alternative embodiment, the video processing apparatus further performs the following steps after determining the target number of reference frames of the video frames to be filtered according to the picture complexity:
301. and determining the n frames of video frames with the time stamps closest to the time stamp of the video frames to be filtered from the video to be processed as n frames of reference frames of the video frames to be filtered.
In this embodiment of the present application, all video frames in the video to be processed have time stamps, where a time stamp is the time offset of the video frame from the moment the video to be processed starts to be played. For example, the video to be processed includes a video frame a whose time stamp is 1 minute 3 seconds; if the video to be processed starts to be played at 10:00, then according to the time stamp of video frame a, the frame played at 10:01:03 is known to be video frame a.
In the video to be processed, the closer the time stamps of the two frames of video frames are, the higher the matching degree of the image contents of the two frames of video frames is. Thus, the video processing device determines, from the video to be processed, the n frames of video whose time stamps are closest to the time stamps of the video frames to be filtered, and can determine, from the video to be processed, the n frames of video whose image contents most match the image contents of the video frames to be filtered. For example, the video to be processed includes a video frame a, a video frame b, a video frame c, and a video frame d, wherein the time stamp of the video frame a is smaller than the time stamp of the video frame b, the time stamp of the video frame b is smaller than the time stamp of the video frame c, the time stamp of the video frame c is smaller than the time stamp of the video frame d, and the time difference between the time stamp of the video frame a and the time stamp of the video frame b, the time difference between the time stamp of the video frame b and the time stamp of the video frame c, and the time difference between the time stamp of the video frame c and the time stamp of the video frame d are all equal. If the video frame c is a video frame to be filtered, and n is 2, the 2 frame reference frames determined by the video processing device from the video to be processed are a video frame b and a video frame d.
In the embodiment of the present application, n is the same as the target number, that is, the number of reference frames of the video frames to be filtered determined from the video to be processed is the target number.
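A sketch of step 301 is given below; the helper and its signature are illustrative assumptions, with `frames` and `timestamps` as parallel lists.

```python
def select_reference_frames(frames, timestamps, filtered_index, n):
    """Return the n video frames whose time stamps are closest to the
    time stamp of the video frame to be filtered (n equals the target
    number determined from the picture complexity)."""
    t0 = timestamps[filtered_index]
    candidates = [i for i in range(len(frames)) if i != filtered_index]
    candidates.sort(key=lambda i: abs(timestamps[i] - t0))
    return [frames[i] for i in candidates[:n]]
```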
302. And filtering the video frame to be filtered according to the n frame reference frames to obtain a filtered video frame.
In one possible implementation, the video processing apparatus obtains the filtered video frame by performing weighted average filtering on the n frame reference frames and the video frame to be filtered.
In another possible implementation, the video processing device divides the video frame to be filtered into m reference image blocks, where m is a positive integer. Optionally, the size of a reference image block is less than or equal to a fixed value. Specifically, in the case where the size of the video frame to be filtered is an integer multiple of the fixed value, the sizes of the reference image blocks are all the fixed value. In the case where the size of the video frame to be filtered is not an integer multiple of the fixed value, after the reference image blocks of the fixed value are divided from the video frame to be filtered, the remaining image blocks, whose sizes are smaller than the fixed value, are also taken as reference image blocks. For example, if the video frame to be filtered has a size of 1080×1080 and the fixed value is 36×36, then the video frame to be filtered may be divided into 900 reference image blocks of size 36×36 (30 blocks per row in 30 rows).
The video processing apparatus determines, for each of m reference image blocks, a target image block having a matching relationship from n frames of reference frames. Optionally, the video processing device divides each frame of reference frame into m image blocks to be matched, wherein the dividing mode of the reference frame is the same as the dividing mode of the video frame to be filtered. The video processing apparatus determines, for each reference image block, an image block having a matching relationship from among image blocks to be matched of reference frames of each frame as a target image block, so that n target image blocks can be obtained.
A target image block having a matching relationship with a reference image block is the image block that matches that reference image block. For example, the m reference image blocks include a reference image block a, and the m image blocks to be matched include an image block b to be matched and an image block c to be matched. If the matching degree of the reference image block a and the image block b to be matched is higher than the matching degree of the reference image block a and the image block c to be matched, the image block b to be matched is the image block, among the m image blocks to be matched, that matches the reference image block a, that is, the image block b to be matched is the image block having a matching relationship with the reference image block a.
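The matching criterion is not fixed by the text; the sketch below uses the sum of absolute differences (SAD) as a common assumption, where a lower SAD means a higher matching degree.

import numpy as np

def best_matching_block(reference_block, blocks_to_match):
    # blocks_to_match: list of (position, block) pairs from one reference
    # frame, each block having the same size as reference_block. Returns the
    # pair with the smallest SAD, i.e. the target image block.
    ref = reference_block.astype(np.int64)
    return min(blocks_to_match,
               key=lambda item: int(np.abs(ref - item[1].astype(np.int64)).sum()))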
The reference image blocks and the target image blocks having the matching relationship are then weighted-averaged to obtain m filtered image blocks. That is, filtering of a reference image block is achieved by weighted-averaging the reference image block with the target image blocks having a matching relationship with it, and m filtered image blocks are obtained after the m reference image blocks are filtered. Finally, the video processing device obtains the filtered video frame according to the m filtered image blocks.
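Putting the pieces together, a sketch of the block-wise filtering follows; it reuses divide_into_blocks and best_matching_block from the sketches above, and the uniform weights are again an assumption of the illustration.

import numpy as np

def filter_frame_blockwise(frame, reference_frames, block_size=36):
    # For each reference image block, take one target image block from every
    # reference frame and average them all; the filtered blocks are written
    # back into the output frame at their original positions.
    out = np.empty_like(frame)
    divided_refs = [divide_into_blocks(r, block_size) for r in reference_frames]
    for (y, x), block in divide_into_blocks(frame, block_size):
        targets = [best_matching_block(block,
                       [b for b in blocks if b[1].shape == block.shape])[1]
                   for blocks in divided_refs]
        stack = np.stack([block] + targets).astype(np.float64)
        out[y:y + block.shape[0], x:x + block.shape[1]] = \
            stack.mean(axis=0).astype(frame.dtype)
    return out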
303. And replacing the video frame to be filtered in the video to be processed with the filtered video frame, so as to obtain the filtered video.
The video processing apparatus may implement filtering of the video frame to be filtered in the video to be processed by executing step 303, to obtain a filtered video.
In this embodiment, after determining the target number of reference frames of the video frames to be filtered, the video processing apparatus determines, from the video to be processed, n frames of video frames having time stamps closest to the time stamps of the video frames to be filtered, as n frames of reference frames of the video frames to be filtered, where n is the same as the target number. And then filtering the video frame to be filtered according to the n frame reference frames, so that loss of textures in the video frame to be filtered caused by filtering can be reduced under the condition of filtering the video frame to be filtered, and more textures can be reserved for the video frame to be filtered under the condition of reducing noise in the video frame to be filtered through filtering.
It should be understood that the video frame to be filtered is merely a description object selected to describe the technical solution concisely; it should not be understood that the video processing apparatus filters only one video frame in the video to be processed through steps 301 to 303. In practical applications, the video processing apparatus may filter each video frame in the video to be processed separately in the manner in which the video frame to be filtered is filtered in steps 301 to 303.
Optionally, the video processing device determines the first magnitude of the lateral gradient of the video frame to be filtered and the second magnitude of the longitudinal gradient of the video frame to be filtered by performing the steps of: and determining a human eye region of interest from the video frame to be filtered, wherein the human eye region of interest is a region of interest of human eyes in the video frame to be filtered under the condition that the video frame to be filtered is displayed. And determining the amplitude of the transverse gradient of the eye region of interest to obtain a first amplitude. And determining the amplitude of the longitudinal gradient of the eye region of interest to obtain a second amplitude.
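For illustration, the two amplitudes may be computed, for example, as mean absolute gradients over the region; the gradient operator (central differences here), the averaging, and the rectangular region representation are assumptions of the sketch.

import numpy as np

def roi_gradient_amplitudes(frame, roi):
    # frame: H x W luma array; roi: (top, left, bottom, right) bounds of the
    # human eye region of interest within the video frame to be filtered.
    top, left, bottom, right = roi
    region = frame[top:bottom, left:right].astype(np.float64)
    gy, gx = np.gradient(region)          # longitudinal / lateral gradients
    first_amplitude = np.abs(gx).mean()   # amplitude of the transverse gradient
    second_amplitude = np.abs(gy).mean()  # amplitude of the longitudinal gradient
    return first_amplitude, second_amplitude

The picture complexity can then be obtained from the two amplitudes, for example as their sum, that is, the target value described earlier.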
At this time, the video processing device determines the amplitude of the transverse gradient of the human eye region of interest as a first amplitude and determines the amplitude of the longitudinal gradient of the human eye region of interest as a second amplitude after determining the human eye region of interest from the video frame to be filtered, so that the picture complexity of the video frame to be filtered is determined based on the first amplitude and/or the second amplitude, and the picture complexity of the video frame to be filtered can be better represented. Therefore, after determining the target number of the reference frames of the video frames to be filtered according to the picture complexity of the video frames to be filtered, the video processing device determines, from the video to be processed, n frames of video frames with time stamps closest to the time stamps of the video frames to be filtered as n frames of reference frames of the video frames to be filtered, wherein n is the same as the target number. And then filtering the video frame to be filtered according to the n frame reference frames, so that loss of textures in the eye region of interest caused by filtering can be reduced under the condition of filtering the video frame to be filtered, and more textures can be reserved for the eye region of interest under the condition of reducing noise in the video frame to be filtered through filtering. Therefore, the quality of the video frames to be filtered perceived by human eyes can be improved when the video frames to be filtered are displayed.
As an alternative embodiment, the video processing apparatus obtains the encoded video of the video to be processed by encoding the filtered video after obtaining the filtered video. This may improve the quality of the encoded video.
Optionally, the video processing device encodes the filtered video through MCTF to obtain the encoded video. MCTF includes motion estimation and motion compensation, and noise in a video frame may cause errors in motion estimation and motion compensation. Because the video frames in the filtered video have already been filtered, encoding the filtered video using MCTF reduces the errors of motion estimation and motion compensation, and thus the accuracy of encoding can be improved.
Optionally, the video processing device obtains the encoded video by performing residual coding on the filtered video. Noise in a video frame includes high-frequency components, and high-frequency components consume code rate in the residual coding process. Because the video frames in the filtered video have already been filtered, the code rate consumed by noise is reduced, thereby saving the code rate consumed in encoding the filtered video.
In one possible implementation scenario, video distribution may be implemented based on the video processing method provided above. Referring to fig. 3, fig. 3 is a schematic diagram of an architecture of a video distribution system provided in an embodiment of the present application. As shown in fig. 3, the video distribution system 1 includes a client 11, a client 12, and a video processing device 13, and communication connections exist between the clients 11 and 12 and the video processing device 13. Through these communication connections, the clients 11 and 12 can upload videos to the video processing device 13, and thereby distribute videos on the video platform operated by the video processing device 13. In one possible implementation, the video platform operated by the video processing device 13 is a short video platform; in another possible implementation, it is a live streaming platform.
Optionally, the client 11 and the client 12 may each be one of the following: a mobile phone, a computer, a tablet computer, or a wearable smart device. For example, the client 11 is a mobile phone and the client 12 is a computer; or, for example, both the client 11 and the client 12 are tablet computers. Optionally, the video processing device 13 is a server.
It should be understood that the clients 11 and 12 shown in fig. 3 are only examples; it should not be understood that the number of clients having a communication connection with the video processing apparatus 13 is limited to 2. In practical applications, the number of clients having a communication connection with the video processing apparatus 13 may be m, where m is a positive integer.
The clients 11 and 12 may log onto a video platform on which the video processing apparatus 13 operates, and the user may further upload video to the video platform through the clients 11 or 12 to distribute the uploaded video on the video platform. After receiving the video uploaded by the user, the video processing device 13 first filters the video uploaded by the user to obtain a filtered video, then transcodes the filtered video to obtain a transcoded video, and finally issues the transcoded video to the video platform. In this way, the video processing device filters the video uploaded by the user, so that noise in the uploaded video can be removed to obtain a filtered video, then the filtered video is transcoded to obtain a transcoded video, and the quality of the transcoded video can be improved. And finally, the transcoded video is released to the video platform, so that the quality of the video platform can be improved.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
The foregoing details the method of embodiments of the present application, and the apparatus of embodiments of the present application is provided below.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application. The video processing apparatus 2 includes: an acquisition unit 21, a selection unit 22, and a determination unit 23. Optionally, the video processing apparatus 2 further includes: a filtering unit 24, a replacing unit 25, and an encoding unit 26. Specifically:
an acquisition unit 21 for acquiring a video to be processed;
a selecting unit 22, configured to select a video frame from the video to be processed as a video frame to be filtered;
a determining unit 23, configured to determine a picture complexity of the video frame to be filtered, where the picture complexity characterizes a complexity of a texture in the video frame to be filtered;
the determining unit 23 is configured to determine, according to the picture complexity, a target number of reference frames of the video frame to be filtered, where the reference frames are used to filter the video frame to be filtered, and the target number is inversely related to the picture complexity.
In combination with any one of the embodiments of the present application, the determining unit 23 is configured to:
and obtaining the picture complexity according to the first amplitude of the transverse gradient of the video frame to be filtered and/or the second amplitude of the longitudinal gradient of the video frame to be filtered, wherein the picture complexity is positively correlated with the first amplitude and the second amplitude.
In combination with any one of the embodiments of the present application, the obtaining the picture complexity according to the first magnitude of the lateral gradient of the video frame to be filtered and/or the second magnitude of the longitudinal gradient of the video frame to be filtered includes:
determining a first amplitude of a transverse gradient of the video frame to be filtered and a second amplitude of a longitudinal gradient of the video frame to be filtered;
calculating the sum of the first amplitude and the second amplitude to obtain a target value;
and obtaining the picture complexity according to the target value, wherein the picture complexity is positively correlated with the target value.
In combination with any one of the embodiments of the present application, the determining unit 23 is configured to:
determining a human eye region of interest from the video frame to be filtered, wherein the human eye region of interest is a region of interest of human eyes in the video frame to be filtered under the condition that the video frame to be filtered is displayed;
determining the amplitude of the transverse gradient of the eye region of interest to obtain the first amplitude;
and determining the amplitude of the longitudinal gradient of the eye region of interest to obtain the second amplitude.
In combination with any one of the embodiments of the present application, the determining unit 23 is configured to:
in a case where the picture complexity is less than a first threshold, obtaining the target number by increasing the number of the reference frames on the basis of a predetermined number;
determining the predetermined number as the target number in a case where the picture complexity is greater than or equal to the first threshold value and less than or equal to a second threshold value;
and in the case that the picture complexity is greater than the second threshold, obtaining the target number by reducing the number of the reference frames on the basis of the predetermined number.
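A sketch of this mapping follows; the predetermined number, the thresholds, and the step sizes below are placeholders chosen for illustration, the embodiment fixing only the inverse correlation between picture complexity and the target number.

def target_number_of_reference_frames(picture_complexity,
                                      predetermined=4,
                                      first_threshold=10.0,
                                      second_threshold=30.0,
                                      step=2):
    if picture_complexity < first_threshold:
        return predetermined + step       # low complexity: more reference frames
    if picture_complexity <= second_threshold:
        return predetermined              # medium complexity: predetermined number
    return max(1, predetermined - step)   # high complexity: fewer reference frames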
In combination with any one of the embodiments of the present application, the determining unit 23 is further configured to determine, from the video to be processed, n frames of video frames having a time stamp closest to a time stamp of the video frame to be filtered, as n frame reference frames of the video frame to be filtered, where n is the same as the target number;
The video processing apparatus 2 further includes:
the filtering unit 24 is configured to filter the video frame to be filtered according to the n frame reference frames, so as to obtain a filtered video frame;
and a replacing unit 25, configured to replace the video frame to be filtered in the video to be processed with the filtered video frame, so as to obtain a filtered video.
In combination with any one of the embodiments of the present application, the filtering unit 24 is configured to:
dividing the video frame to be filtered into m reference image blocks;
for each of the m reference image blocks, determining a target image block having a matching relationship from the n frames of reference frames;
carrying out weighted average on the reference image block and the target image block with the matching relation to obtain m filtering image blocks;
and obtaining the filtered video frame according to the m filtered image blocks.
In combination with any one of the embodiments of the present application, the video processing apparatus 2 further includes: and the encoding unit 26 is configured to obtain an encoded video of the video to be processed by encoding the filtered video.
In combination with any one of the embodiments of the present application, the determining unit 23 is configured to:
determining a variance of pixel values of the video frame to be filtered;
and determining the picture complexity according to the variance, wherein the picture complexity is positively correlated with the variance.
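This variance-based alternative can be sketched in a few lines; the use of NumPy's variance as the complexity measure is an illustration, not a requirement of the embodiment.

import numpy as np

def picture_complexity_from_variance(frame):
    # The variance of the pixel values is used directly as the picture
    # complexity (positively correlated by construction).
    return float(np.var(frame.astype(np.float64)))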
In this embodiment of the present application, after obtaining a video to be processed, the video processing apparatus selects a video frame from the video to be processed as a video frame to be filtered. A picture complexity of the video frame to be filtered is determined, wherein the picture complexity characterizes a complexity of a texture in the video frame to be filtered. And determining the target number of the reference frames of the video frames to be filtered according to the picture complexity under the condition that the target number of the reference frames is inversely related to the picture complexity, wherein the reference frames are used for filtering the video frames to be filtered. In this way, in the case that the complexity of the texture in the video frame to be filtered is low, the number of reference frames of the video frame to be filtered is large, and accordingly, the intensity of filtering performed on the video frame to be filtered by using the reference frames is large, and due to the low complexity of the texture in the video frame to be filtered, even if the intensity of filtering is large, the loss of the texture caused by filtering is small. Under the condition that the complexity of textures in the video frames to be filtered is high, the number of reference frames of the video frames to be filtered is small, correspondingly, the strength of filtering carried out on the video frames to be filtered by using the reference frames is low, and due to the fact that the complexity of textures in the video frames to be filtered is high, the strength of filtering is low, and loss of textures caused by filtering can be reduced. That is, the video processing device in this application may determine a target number of reference frames that match the complexity of the texture in the video frame to be filtered.
In some embodiments, functions or modules included in the apparatus provided in the embodiments of the present application may be used to perform the methods described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
Fig. 5 is a schematic hardware structure of an electronic device according to an embodiment of the present application. The electronic device 3 comprises a processor 31, a memory 32. Optionally, the electronic device 3 further comprises input means 33 and output means 34. The processor 31, memory 32, input device 33, and output device 34 are coupled by connectors, including various interfaces, transmission lines or buses, etc., as the embodiments are not limited in this respect. It should be understood that in various embodiments of the present application, coupled is intended to mean interconnected by a particular means, including directly or indirectly through other devices, e.g., through various interfaces, transmission lines, buses, etc.
The processor 31 may comprise one or more processors, for example one or more central processing units (central processing unit, CPU), which in the case of a CPU may be a single core CPU or a multi core CPU. Alternatively, the processor 31 may be a processor group constituted by a plurality of CPUs, the plurality of processors being coupled to each other through one or more buses. In the alternative, the processor may be another type of processor, and the embodiment of the present application is not limited.
Memory 32 may be used to store computer program instructions as well as various types of computer program code for performing aspects of the present application. Optionally, the memory includes, but is not limited to, a random access memory (random access memory, RAM), a read-only memory (ROM), an erasable programmable read-only memory (erasable programmable read only memory, EPROM), or a portable read-only memory (compact disc read-only memory, CD-ROM) for associated instructions and data.
The input means 33 are for inputting data and/or signals and the output means 34 are for outputting data and/or signals. The input device 33 and the output device 34 may be separate devices or may be an integral device.
It will be appreciated that, in the embodiment of the present application, the memory 32 may be used to store not only related instructions, but also related data, for example, the memory 32 may be used to store the video to be processed acquired through the input device 33, or the memory 32 may also be used to store the target number of reference frames of the video frames to be filtered obtained through the processor 31, etc., where the data specifically stored in the memory is not limited in the embodiment of the present application.
It will be appreciated that fig. 5 shows only a simplified design of an electronic device. In practical applications, the electronic device may further include other necessary elements, including but not limited to any number of input/output devices, processors, memories, etc., and all electronic devices that may implement the embodiments of the present application are within the scope of protection of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein. It will be further apparent to those skilled in the art that the descriptions of the various embodiments herein are provided with emphasis, and that the same or similar parts may not be explicitly described in different embodiments for the sake of convenience and brevity of description, and thus, parts not described in one embodiment or in detail may be referred to in the description of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains an integration of one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
Those of ordinary skill in the art will appreciate that all or part of the flows of the above-described method embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer readable storage medium, and when the program is executed, the flows of the above-described method embodiments may be performed. The aforementioned storage medium includes: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.

Claims (13)

1. A method of video processing, the method comprising:
acquiring a video to be processed;
selecting a video frame from the video to be processed as a video frame to be filtered;
determining picture complexity of the video frame to be filtered, wherein the picture complexity comprises the complexity of textures in the video frame to be filtered;
and determining the target number of reference frames of the video frames to be filtered according to the picture complexity, wherein the reference frames are used for filtering the video frames to be filtered, and the target number is inversely related to the picture complexity.
2. The method of claim 1, wherein said determining a picture complexity of the video frame to be filtered comprises:
And obtaining the picture complexity according to the first amplitude of the transverse gradient of the video frame to be filtered and/or the second amplitude of the longitudinal gradient of the video frame to be filtered, wherein the picture complexity is positively correlated with the first amplitude and the second amplitude.
3. The method according to claim 2, wherein said deriving the picture complexity from a first magnitude of a lateral gradient of the video frame to be filtered and/or from a second magnitude of a longitudinal gradient of the video frame to be filtered comprises:
determining a first amplitude of a transverse gradient of the video frame to be filtered and a second amplitude of a longitudinal gradient of the video frame to be filtered;
calculating the sum of the first amplitude and the second amplitude to obtain a target value;
and obtaining the picture complexity according to the target value, wherein the picture complexity is positively correlated with the target value.
4. A method according to claim 3, wherein said determining a first magnitude of a lateral gradient of the video frame to be filtered and a second magnitude of a longitudinal gradient of the video frame to be filtered comprises:
determining a human eye region of interest from the video frame to be filtered, wherein the human eye region of interest is a region of interest of human eyes in the video frame to be filtered under the condition that the video frame to be filtered is displayed;
determining the amplitude of the transverse gradient of the eye region of interest to obtain the first amplitude;
and determining the amplitude of the longitudinal gradient of the eye region of interest to obtain the second amplitude.
5. The method according to any one of claims 1 to 4, wherein said determining a target number of reference frames of the video frames to be filtered based on the picture complexity comprises:
in a case where the picture complexity is less than a first threshold, obtaining the target number by increasing the number of the reference frames on the basis of a predetermined number;
determining the predetermined number as the target number in a case where the picture complexity is greater than or equal to the first threshold value and less than or equal to a second threshold value;
and in the case that the picture complexity is greater than the second threshold, obtaining the target number by reducing the number of the reference frames on the basis of the predetermined number.
6. The method according to any one of claims 1 to 4, wherein after determining the target number of reference frames of the video frames to be filtered according to the picture complexity, the method further comprises:
determining, from the video to be processed, n frames of video frames whose time stamps are closest to the time stamp of the video frame to be filtered, as n frame reference frames of the video frame to be filtered, wherein n is the same as the target number;
filtering the video frame to be filtered according to the n frame reference frames to obtain a filtered video frame;
and replacing the video frame to be filtered in the video to be processed with the filtered video frame, so as to obtain a filtered video.
7. The method of claim 6, wherein filtering the video frame to be filtered based on the n frame reference frames to obtain a filtered video frame comprises:
dividing the video frame to be filtered into m reference image blocks;
for each of the m reference image blocks, determining a target image block having a matching relationship from the n frames of reference frames;
carrying out weighted average on the reference image block and the target image block with the matching relation to obtain m filtering image blocks;
and obtaining the filtered video frame according to the m filtered image blocks.
8. The method of claim 6, wherein after obtaining the filtered video, the method further comprises:
And obtaining the coded video of the video to be processed by coding the filtered video.
9. The method of claim 1, wherein said determining a picture complexity of the video frame to be filtered comprises:
determining a variance of pixel values of the video frame to be filtered;
and determining the picture complexity according to the variance, wherein the picture complexity is positively correlated with the variance.
10. A video processing apparatus, the apparatus comprising:
the acquisition unit is used for acquiring the video to be processed;
the selecting unit is used for selecting one frame of video frame from the video to be processed as a video frame to be filtered;
a determining unit, configured to determine a picture complexity of the video frame to be filtered, where the picture complexity characterizes a complexity of a texture in the video frame to be filtered;
the determining unit is configured to determine, according to the picture complexity, a target number of reference frames of the video frame to be filtered, where the reference frames are used to filter the video frame to be filtered, and the target number is inversely related to the picture complexity.
11. An electronic device, comprising: a processor and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method of any one of claims 1 to 9.
12. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1 to 9.
13. A computer program product, characterized in that the computer program product comprises a computer program or instructions; the computer program or instructions, when run on a computer, cause the computer to perform the method of any one of claims 1 to 9.
CN202311511860.7A 2023-11-13 2023-11-13 Video processing method and device, electronic equipment and computer readable storage medium Pending CN117615146A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311511860.7A CN117615146A (en) 2023-11-13 2023-11-13 Video processing method and device, electronic equipment and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN117615146A 2024-02-27

Family

ID=89947247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311511860.7A Pending CN117615146A (en) 2023-11-13 2023-11-13 Video processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN117615146A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1812580A (en) * 2005-01-04 2006-08-02 三星电子株式会社 Deblocking control method considering intra bl mode and multilayer video encoder/decoder using the same
US20180343448A1 (en) * 2017-05-23 2018-11-29 Intel Corporation Content adaptive motion compensated temporal filtering for denoising of noisy video for efficient coding
US10469749B1 (en) * 2018-05-01 2019-11-05 Ambarella, Inc. Temporal filter with criteria setting maximum amount of temporal blend
CN111711825A (en) * 2020-06-23 2020-09-25 腾讯科技(深圳)有限公司 Deblocking filtering method, apparatus, device and medium in video encoding and decoding
CN112311962A (en) * 2019-07-29 2021-02-02 深圳市中兴微电子技术有限公司 Video denoising method and device and computer readable storage medium
CN114913099A (en) * 2021-12-28 2022-08-16 天翼数字生活科技有限公司 Method and system for processing video file
US20230084472A1 (en) * 2021-08-26 2023-03-16 Tencent America LLC Method and apparatus for temporal filter in video coding
CN116456116A (en) * 2023-04-07 2023-07-18 阿里巴巴(中国)有限公司 Time domain filtering method and device, electronic equipment and computer storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination