CN115022679B - Video processing method, device, electronic equipment and medium


Info

Publication number
CN115022679B
CN115022679B
Authority
CN
China
Prior art keywords
image frame
image
determining
video
image frames
Prior art date
Legal status
Active
Application number
CN202210604658.8A
Other languages
Chinese (zh)
Other versions
CN115022679A
Inventor
房龙江
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210604658.8A priority Critical patent/CN115022679B/en
Publication of CN115022679A publication Critical patent/CN115022679A/en
Application granted granted Critical
Publication of CN115022679B publication Critical patent/CN115022679B/en

Classifications

    • H ELECTRICITY > H04 ELECTRIC COMMUNICATION TECHNIQUE > H04N PICTORIAL COMMUNICATION, e.g. TELEVISION > H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/234363 Processing of video elementary streams (server side) involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • H04N21/234381 Processing of video elementary streams (server side) involving reformatting operations by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • H04N21/440263 Processing of video elementary streams (client side) involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • H04N21/440281 Processing of video elementary streams (client side) involving reformatting operations by altering the temporal resolution, e.g. by frame skipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a video processing method, apparatus, device, medium and product, relating to the field of computer technology and in particular to computer vision, image processing and related technical fields. The video processing method includes the following steps: determining change information of a target object in candidate image frames based on pixel information of the candidate image frames in the video to be processed; determining a selection policy for the candidate image frames based on the change information; selecting target image frames from the candidate image frames based on the selection policy; and generating a target video based on the target image frames.

Description

Video processing method, device, electronic equipment and medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to the technical field of computer vision, image processing, and the like, and more particularly, to a video processing method, apparatus, electronic device, medium, and program product.
Background
When playing a video, users have various playback requirements; for example, some users need to play the video at an accelerated speed. However, accelerated playback suffers from problems such as reduced smoothness, blurred pictures, and stuttering.
Disclosure of Invention
The present disclosure provides a video processing method, apparatus, electronic device, storage medium, and program product.
According to an aspect of the present disclosure, there is provided a video processing method including: determining change information of a target object in a candidate image frame based on pixel information of the candidate image frame in a video to be processed; determining a selection policy for the candidate image frame based on the change information; selecting a target image frame from the candidate image frames based on the selection policy; a target video is generated based on the target image frame.
According to another aspect of the present disclosure, there is provided a video processing apparatus including: the device comprises a first determining module, a second determining module, a selecting module and a generating module. The first determining module is used for determining the change information of the target object in the candidate image frame based on the pixel information of the candidate image frame in the video to be processed; a second determining module for determining a selection policy for the candidate image frames based on the change information; a selection module for selecting a target image frame from the candidate image frames based on the selection policy; and the generating module is used for generating a target video based on the target image frame.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video processing method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the video processing method described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the video processing method described above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates a system architecture for video processing according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a video processing method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a diagram of determining change information according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of generating a target video according to an embodiment of the disclosure;
fig. 5 schematically illustrates a block diagram of a video processing apparatus according to an embodiment of the present disclosure; and
fig. 6 is a block diagram of an electronic device for performing video processing to implement an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions like "at least one of A, B and C" are used, they should generally be interpreted according to the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Video technology is currently a primary means of presenting information. Videos come in different resolutions; the higher the resolution, the more pixels each image frame in the video contains. Taking 1080P as an example, each image frame includes 1920×1080 pixels, and the pixel value (color value) of each pixel is an integer from 0 to 255, so 256 values represent the different color levels. Image frames in a video generally have three channels, combining red, green and blue to form full-color video information. The frame rate of a video is typically between 25 and 60 frames per second, meaning that one second of video includes 25 to 60 image frames; some videos use higher frame rates. Video is characterized by a large amount of information and high information density, and can be played online or offline.
Existing video formats mainly include MP4, AVI, WMA and other formats, and the main encoding forms include H.264 and H.265; these necessarily compress the original image information in the video in order to achieve a high frame rate and smooth playback.
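For a concrete sense of why compression is indispensable, the raw data rate of an uncompressed stream can be computed directly from the figures above (a back-of-the-envelope sketch; the 30 frames-per-second figure is an assumed value within the typical 25-60 range):

```python
# Raw (uncompressed) data rate of a 1080P RGB video stream.
width, height = 1920, 1080   # pixels per frame at 1080P
channels = 3                 # red, green, blue; one byte (0-255) per channel
fps = 30                     # assumed frame rate within the typical 25-60 range

bytes_per_frame = width * height * channels      # 6,220,800 bytes per frame
bytes_per_second = bytes_per_frame * fps         # ~186.6 MB per second, uncompressed
print(f"{bytes_per_frame:,} bytes/frame, {bytes_per_second / 1e6:.1f} MB/s")
```

Encodings such as H.264/H.265 bring this rate down by orders of magnitude, which is what makes high-frame-rate online playback practical.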
When playing a video, a playback speed multiple can be selected; the higher the multiple, the less time is required to play the video. Common multiples include 1.5×, 2×, 2.5×, 3×, and the like. Fast-forwarding at a higher multiple, however, usually results in blurred pictures, frame skipping, and unsmooth playback in the fast-forwarded sections.
In view of this, an embodiment of the present disclosure provides a video processing method including: determining change information of a target object in candidate image frames based on pixel information of the candidate image frames in a video to be processed; then determining a selection policy for the candidate image frames based on the change information, and selecting target image frames from the candidate image frames based on the selection policy; and finally generating a target video based on the target image frames.
Fig. 1 schematically illustrates a system architecture of video processing according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied, provided to help those skilled in the art understand the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include clients 101, 102, 103, a network 104, and a server 105. The network 104 is the medium used to provide communication links between the clients 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 105 through the network 104 using clients 101, 102, 103 to receive or send messages, etc. Various communication client applications may be installed on clients 101, 102, 103, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, and the like (by way of example only).
The clients 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like. The clients 101, 102, 103 of the disclosed embodiments may, for example, run applications.
The server 105 may be a server providing various services, such as a background management server (by way of example only) that provides support for websites browsed by users using clients 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the client. In addition, the server 105 may also be a cloud server, i.e. the server 105 has cloud computing functionality.
It should be noted that the video processing method provided by the embodiment of the present disclosure may be performed by the server 105. Accordingly, the video processing apparatus provided by the embodiments of the present disclosure may be provided in the server 105.
In one example, the server 105 may process the video to be processed to obtain the target video. The server 105 may actively send the target video to the clients 101, 102, 103 for playback. Alternatively, after receiving the request of the client 101, 102, 103, the server 105 transmits the target video to the client 101, 102, 103 for playback in response to the request.
It should be understood that the number of clients, networks, and servers in fig. 1 is merely illustrative. There may be any number of clients, networks, and servers, as desired for implementation.
A video processing method according to an exemplary embodiment of the present disclosure is described below with reference to figs. 2 to 4 in conjunction with the system architecture of fig. 1. The video processing method of the embodiment of the present disclosure may be performed, for example, by the server shown in fig. 1, which is, for example, the same as or similar to the electronic device described below.
Fig. 2 schematically illustrates a flow chart of a video processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the video processing method 200 of the embodiment of the present disclosure may include, for example, operations S210 to S240.
In operation S210, change information of a target object in a candidate image frame is determined based on pixel information of the candidate image frame in a video to be processed.
In operation S220, a selection policy for the candidate image frames is determined based on the change information.
In operation S230, a target image frame is selected from among the candidate image frames based on a selection policy.
In operation S240, a target video is generated based on the target image frame.
According to an embodiment of the present disclosure, the video to be processed includes, for example, a plurality of image frames, and the candidate image frames include, for example, two or more image frames. The candidate image frames contain target objects such as scenes, characters, trees, or animals in the video.
After determining the candidate image frames from the video to be processed, the pixels of the candidate image frames may be processed to obtain change information for the candidate image frames, the change information characterizing the degree of change of the target object. For example, when the target object is a scene, the change information may characterize the speed of scene cuts in the video picture. When the target object is a person, the change information may characterize the moving speed of the person in the video picture. In other words, the change information is used to characterize the candidate image frames as either dynamically changing or stationary.
After determining the change information for the candidate image frames, a selection policy for the candidate image frames may be determined based on the change information. The selection policy indicates how to select appropriate target image frames from the candidate image frames. When the change information indicates a large degree of change, the video picture of the candidate image frames is changing quickly, and more target image frames are selected based on the selection policy. When the change information indicates a small degree of change, the video picture of the candidate image frames is relatively static, and fewer target image frames are selected based on the selection policy. After selecting the target image frames based on the selection policy, a target video may be generated based on the target image frames.
It will be appreciated that the number of target image frames is typically less than the number of candidate image frames, so the number of frames in the target video generated from the target image frames is typically less than the number of frames in the video to be processed; the target video is thus a speed-up video relative to the video to be processed. For example, suppose the video to be processed includes n image frames (n an integer greater than 0) and takes 30 minutes to play normally, i.e. n/30 frames per minute. If the target video includes n/2 image frames, it takes 15 minutes to play, i.e. (n/2)/15 = n/30 frames per minute. The playing time of the target video is halved, so it is a 2× speed video relative to the video to be processed.
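The arithmetic above can be captured in a couple of lines (a sketch only; the frame counts are illustrative):

```python
def effective_speedup(frames_in: int, frames_out: int) -> float:
    """Playback speed-up when frames_out frames are shown in place of
    frames_in frames, at the same displayed frame rate."""
    return frames_in / frames_out

n = 54000                            # e.g. 30 minutes of video at 30 frames/second
print(effective_speedup(n, n // 2))  # 2.0 -> the target video plays at 2x speed
```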
According to the embodiment of the disclosure, the target image frames are selected based on the change information of the candidate image frames, and different numbers of target image frames are selected according to how quickly the video picture changes. This makes the selection of target image frames more intelligent and ensures that the selected target image frames largely retain the important information of the candidate image frames, which in turn improves the playback fluency and clarity of the generated target video.
Fig. 3 schematically illustrates a schematic diagram of determining change information according to an embodiment of the present disclosure.
As shown in fig. 3, the video 310 to be processed includes a plurality of image frames, for example. The plurality of image frames in the video to be processed 310 are extracted and buffered to obtain a plurality of buffered image frames 320. The plurality of buffered image frames 320 includes, for example, n frames, n being an integer greater than 0.
A plurality of video clips 331, 332, 333 is determined from the plurality of buffered image frames 320, with adjacent clips containing the same image frames. For example, video clip 331 and video clip 332 are adjacent and both include the image frames M_2 and M_3. Video clip 332 and video clip 333 are adjacent and both include the image frames M_3 and M_4.
The image frames contained in any of the plurality of video clips 331, 332, 333 are taken as candidate image frames. The candidate image frames are, for example, adjacent image frames; generating the target video based on adjacent image frames improves the quality of the target video and avoids omitting image frames of the video to be processed. For example, the image frames M_1, M_2, and M_3 included in video clip 331 are candidate image frames, the image frames M_2, M_3, and M_4 included in video clip 332 are candidate image frames, and the image frames M_3, M_4, and M_5 included in video clip 333 are candidate image frames.
In another example, the image frames in a video clip may not be adjacent. For example, video clip 331 may contain M_1, M_3, M_5, video clip 332 may contain M_2, M_4, M_6, and video clip 333 may contain M_3, M_5, M_7.
It will be appreciated that the embodiments of the present disclosure do not specifically limit the number of identical image frames contained in adjacent video clips, and do not specifically limit the number of candidate image frames contained in each video clip.
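A minimal sketch of how such overlapping clips could be formed from the buffered frames is given below (the window size of 3 and step of 1 are assumptions for illustration; as noted, the disclosure does not fix the amount of overlap, and a stride inside the window would give the non-adjacent variant):

```python
from typing import List, Sequence

def overlapping_clips(frames: Sequence, window: int = 3, step: int = 1) -> List[list]:
    """Split buffered frames into clips of `window` frames; adjacent clips
    share `window - step` frames, as in clips 331/332/333 above."""
    return [list(frames[i:i + window])
            for i in range(0, len(frames) - window + 1, step)]

clips = overlapping_clips(["M_1", "M_2", "M_3", "M_4", "M_5"])
# [['M_1', 'M_2', 'M_3'], ['M_2', 'M_3', 'M_4'], ['M_3', 'M_4', 'M_5']]
```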
How the change information for the candidate image frames M_1, M_2, M_3 is acquired is described below, taking the candidate image frames M_1, M_2, M_3 included in the video clip 331 as an example.
For example, a first difference image is determined based on pixel information of the candidate image frames M_1, M_2, M_3; a second difference image is determined based on pixel information of the first difference image; and the change information is then determined based on pixel information of the second difference image.
For example, the candidate image frames include a first image frame M_1, a second image frame M_2, and a third image frame M_3. The first difference image includes a first difference sub-image ΔD_1 and a second difference sub-image ΔD_2.
As shown in equation (1), the first difference sub-image ΔD_1 is determined based on the pixel difference between the first image frame M_1 and the second image frame M_2. For example, each pixel value in the first image frame M_1 is subtracted from the corresponding pixel value in the second image frame M_2, and this is done for all pixels to obtain the first difference sub-image ΔD_1.
Similarly, as shown in equation (2), the second difference sub-image ΔD_2 is determined based on the pixel difference between the second image frame M_2 and the third image frame M_3.
ΔD_1=|M_1-M_2| (1)
ΔD_2=|M_2-M_3| (2)
where M_1, M_2, and M_3 denote the pixel values of the corresponding image frames, and ΔD_1 and ΔD_2 denote the difference sub-images composed of the differences between corresponding pixels.
In an example, the pixel values include, for example, gray values ranging from 0 to 255. The first image frame M_1, the second image frame M_2, and the third image frame M_3 are, for example, grayscale images; if the image frames are color images, they may be converted into grayscale images for the calculation. In this case, equation (1) and equation (2) are gray value subtractions.
In another example, the pixel values include, for example, color values or sub-pixel values; e.g., the first image frame M_1, the second image frame M_2, and the third image frame M_3 are RGB (Red, Green, Blue) images. Each pixel in an image frame includes an R value, a G value, and a B value, each ranging, for example, from 0 to 255. The R, G, and B values may be referred to as color values or sub-pixel values.
Taking the calculation of the first difference sub-image ΔD_1 as an example, the R value of each pixel in the first image frame M_1 and the R value of the corresponding pixel in the second image frame M_2 are subtracted to obtain a first subtraction result. Likewise, the G values of corresponding pixels in M_1 and M_2 are subtracted to obtain a second subtraction result, and the B values are subtracted to obtain a third subtraction result. The first, second, and third subtraction results are then added or averaged to obtain the first difference sub-image ΔD_1.
After the first difference sub-image ΔD_1 and second difference sub-image ΔD_2 are obtained, the second difference image ΔP_1 may be determined based on the pixel difference between ΔD_1 and ΔD_2, as shown in equation (3).
ΔP_1=|ΔD_1-ΔD_2| (3)
After the second difference image ΔP_1 is obtained, the pixel values of ΔP_1 may be summed, as shown in formula (4), to obtain the change information I_1 for the candidate image frames M_1, M_2, M_3.
I_1 = Σ_{j=1}^{k} ΔP_1(j) (4)
where ΔP_1(j) denotes the j-th pixel value of the second difference image, and k is the total number of pixels in the second difference image ΔP_1, i.e. the total number of pixels in each candidate image frame. For example, if the resolution of the video 310 to be processed is a number of rows by a number of columns of pixels, then k is the number of rows multiplied by the number of columns.
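Formulas (1) to (4) can be summarized in a short NumPy sketch (illustrative only; it assumes the three frames arrive as equally sized arrays, and uses the channel-averaging option described above for color frames):

```python
import numpy as np

def change_value(m1, m2, m3) -> float:
    """Change information I_1 for three candidate frames per formulas (1)-(4)."""
    m1, m2, m3 = (np.asarray(m, dtype=np.int64) for m in (m1, m2, m3))
    if m1.ndim == 3:                 # color frames: average the R, G, B sub-pixel values
        m1, m2, m3 = (m.mean(axis=2) for m in (m1, m2, m3))
    d1 = np.abs(m1 - m2)             # first difference sub-image, formula (1)
    d2 = np.abs(m2 - m3)             # second difference sub-image, formula (2)
    p1 = np.abs(d1 - d2)             # second difference image, formula (3)
    return float(p1.sum())           # sum over all k pixels, formula (4)
```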
Similarly, the change information I_2 for the candidate image frames M_2, M_3, M_4 included in the video clip 332, and the change information I_3 for the candidate image frames M_3, M_4, M_5 included in the video clip 333, may be obtained.
It can be appreciated that the calculation of the change information I_1 is described above using the first image frame M_1, the second image frame M_2, and the third image frame M_3, but the embodiment of the present disclosure does not specifically limit the way the change information I_1 is calculated. For example, the change information I_1 may be calculated based on more image frames. Taking 4 image frames as an example, a first difference sub-image is obtained from the first and second image frames, a second difference sub-image from the second and third image frames, and a third difference sub-image from the third and fourth image frames. Then, one second difference image is obtained from the first and second difference sub-images and another from the second and third difference sub-images; the two second difference images may be added or averaged to obtain a final second difference image. Alternatively, the two second difference images may be further subtracted to obtain the final second difference image. Finally, the change information I_1 is obtained based on the final second difference image.
According to an embodiment of the present disclosure, a first difference image is obtained based on pixel differences between candidate image frames, and characterizes the difference between any two of the candidate image frames. A second difference image is then obtained based on pixel differences between the difference sub-images of the first difference image, and characterizes how those differences change across three or more adjacent image frames. Change information obtained from the second difference image can therefore reflect the differences between the candidate image frames more accurately. With the embodiment of the disclosure, the determined change information is more accurate.
In another example of the present disclosure, the change information for the candidate image frames is represented, for example, as a change value. If the change value is less than a first threshold, the selection policy includes selecting a preceding image frame from the candidate image frames, where the time corresponding to the preceding image frame is before that of the other candidate image frames. If the change value is greater than or equal to the first threshold and less than a second threshold, the selection policy includes selecting a preceding image frame and a subsequent image frame from the candidate image frames, where the time corresponding to the subsequent image frame is after that of the other candidate image frames. If the change value is greater than or equal to the second threshold, the selection policy includes selecting a preceding image frame, a subsequent image frame, and an intermediate image frame from the candidate image frames, where the time corresponding to the intermediate image frame is after that of the preceding image frame and before that of the subsequent image frame.
For ease of understanding, consider candidate image frames including a first image frame (the preceding image frame), a second image frame (the intermediate image frame), and a third image frame (the subsequent image frame). The time corresponding to the second image frame in the video to be processed is after that of the first image frame, and the time corresponding to the third image frame is after that of the second image frame. For a given set of candidate image frames, the change value is denoted as I_1.
When the change value I_1 is less than the first threshold, the candidate image frames are relatively static, i.e. the degree of change between pictures is small and the similarity among the candidate image frames is high. In this case, the selection policy may be determined to include selecting only the first image frame, so that few of the candidate image frames are retained; the subsequently generated target video then has a small data volume and high fluency while satisfying accelerated playback.
When the change value I_1 is greater than or equal to the first threshold and less than the second threshold, there is some relative change between the candidate image frames, but the degree of change is not large. For example, the degree of change between the first and second image frames is smaller than that between the first and third image frames; that is, the change between the first and third image frames is slightly larger and their similarity is lower. In this case, the selection policy may be determined to include selecting the first and third image frames, so that the image frames with the larger difference are retained; the subsequently generated target video then has high fluency while satisfying accelerated playback, avoiding stuttering or blurring during playback.
When the change value I_1 is greater than or equal to the second threshold, the degree of relative change between the candidate image frames is large, and the similarity among the first, second, and third image frames is low. In this case, the selection policy may be determined to include selecting the first, second, and third image frames, so that all of the substantially different image frames are retained; the subsequently generated target video then contains rich video pictures and plays with high fluency, avoiding stuttering or blurring during playback.
The second threshold is, for example, greater than the first threshold, and the first and second thresholds are derived, for example, from the total number of pixels of an image frame and a specific pixel value. In one example, when the total number of pixels of an image frame is 1920×1080, the maximum pixel value (the specific pixel value) of each pixel is 255. In this case, the first threshold is, for example, (1920×1080×255)/4, and the second threshold is, for example, (1920×1080×255)/2.
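Put together, the three-branch policy might be sketched as follows (the threshold values are the 1080P example values above, not constants fixed by the method; the function name is illustrative):

```python
FIRST_TH = 1920 * 1080 * 255 / 4    # example first threshold for 1080P frames
SECOND_TH = 1920 * 1080 * 255 / 2   # example second threshold (greater than the first)

def select_frames(change_value, first, middle, last):
    """Select target frames from (preceding, intermediate, subsequent) candidates."""
    if change_value < FIRST_TH:      # near-static clip: keep only the preceding frame
        return [first]
    if change_value < SECOND_TH:     # moderate change: keep preceding and subsequent
        return [first, last]
    return [first, middle, last]     # strong change: keep all three frames
```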
According to the embodiment of the disclosure, the first threshold value and the second threshold value are determined based on the total number of pixels of the image frame and the specific pixel value, so that the first threshold value and the second threshold value as selection references represent the degree of change of the image frame as a whole, and the selection strategy determined based on the first threshold value and the second threshold value is more accurate.
Fig. 4 schematically illustrates a schematic diagram of generating a target video according to an embodiment of the present disclosure.
As shown in fig. 4, a plurality of video clips 431, 432, 433 are determined from the video to be processed, and image frames contained in any of the plurality of video clips 431, 432, 433 are determined as candidate image frames.
For example, for the candidate image frames M_1, M_2, and M_3 in video clip 431, the corresponding change information (change value) I_1 is, for example, greater than or equal to the first threshold TH_1 and less than the second threshold TH_2; the video clip 431 is therefore filtered to obtain a corresponding set of target image frames 441, which includes, for example, the target image frames M_1 and M_3.
For example, for the candidate image frames M_2, M_3, and M_4 in video clip 432, the corresponding change information (change value) I_2 is, for example, less than the first threshold TH_1; the video clip 432 is filtered to obtain a corresponding set of target image frames 442, which includes, for example, the target image frame M_2.
For example, for the candidate image frames M_3, M_4, and M_5 in video clip 433, the corresponding change information (change value) I_3 is, for example, greater than or equal to the first threshold TH_1 and less than the second threshold TH_2; the video clip 433 is filtered to obtain a corresponding set of target image frames 443, which includes, for example, the target image frames M_3 and M_5.
Thus, a plurality of sets of target image frames 441, 442, 443 is obtained, corresponding one-to-one to the plurality of video clips 431, 432, 433.
When it is determined that the plurality of sets of target image frames 441, 442, 443 include repeated target image frames, the repeated frames are removed; for example, the target image frame M_3 in the third set, which duplicates M_3 in the first set, is removed, resulting in processed sets of target image frames 451, 452, 453. A target video 460 is then generated based on the image frames of the processed sets 451, 452, 453.
According to embodiments of the present disclosure, when determining the video clips, adjacent video clips are made to contain the same image frames as far as possible, so that the selected target image frames cover more of the video content. When the target image frames contain repeated frames, the target video is generated after removing the repetition, ensuring high fluency of the resulting target video while satisfying accelerated playback and avoiding stuttering or blurring.
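The de-duplication and recombination step could be sketched as follows (illustrative; frames are identified here by index pairs, and the chronological sort reflects the ordering visible in fig. 4 rather than a detail prescribed by the disclosure):

```python
def merge_target_groups(groups):
    """Merge groups of (frame_index, frame) pairs, dropping repeated indices
    (e.g. M_3 occurs in both the first and third groups and is kept once),
    and return the frames in chronological order for the target video."""
    seen, merged = set(), []
    for group in groups:
        for idx, frame in group:
            if idx not in seen:
                seen.add(idx)
                merged.append((idx, frame))
    merged.sort(key=lambda pair: pair[0])
    return [frame for _, frame in merged]

groups = [[(1, "M_1"), (3, "M_3")], [(2, "M_2")], [(3, "M_3"), (5, "M_5")]]
print(merge_target_groups(groups))   # ['M_1', 'M_2', 'M_3', 'M_5']
```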
The target video in the embodiment of the disclosure can be used for online playing: the back end (server) processes the video to be processed to obtain the target image frames for storage and playback, and recombines the target image frames at a new frame rate into the target video, reducing the data volume of the video. When the front end (client) downloads the target video offline or plays it online, storage space and network bandwidth are effectively reduced. Playing the target video at the front end allows the content of the video to be processed to be played quickly, improving playback fluency and the playing experience.
The target video of the embodiment of the disclosure can also be applied in the field of video query: processing the video to be processed at the back end (server) effectively reduces the space occupied by the video data and saves hard-disk storage capacity. The recombined target video supports fast playback of rapidly changing video pictures, enabling quick lookup of important pictures in the video.
Fig. 5 schematically shows a block diagram of a video processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the video processing apparatus 500 of the embodiment of the present disclosure includes, for example, a first determination module 510, a second determination module 520, a selection module 530, and a generation module 540.
The first determining module 510 may be configured to determine change information of a target object in candidate image frames based on pixel information of the candidate image frames in the video to be processed. According to an embodiment of the present disclosure, the first determining module 510 may perform, for example, operation S210 described above with reference to fig. 2, which will not be repeated here.
The second determining module 520 may be configured to determine a selection policy for the candidate image frames based on the change information. According to an embodiment of the present disclosure, the second determining module 520 may perform, for example, operation S220 described above with reference to fig. 2, which will not be repeated here.
The selection module 530 may be configured to select target image frames from the candidate image frames based on the selection policy. According to an embodiment of the present disclosure, the selection module 530 may perform, for example, operation S230 described above with reference to fig. 2, which will not be repeated here.
The generation module 540 may be configured to generate a target video based on the target image frames. According to an embodiment of the present disclosure, the generation module 540 may perform, for example, operation S240 described above with reference to fig. 2, which will not be repeated here.
According to an embodiment of the present disclosure, the first determining module 510 includes: the first, second and third determination sub-modules. A first determining sub-module for determining a first difference image based on pixel information of the candidate image frames; a second determining sub-module for determining a second difference image based on pixel information of the first difference image; and a third determining sub-module for determining the change information based on the pixel information of the second difference image.
According to an embodiment of the present disclosure, the candidate image frames include a first image frame, a second image frame, and a third image frame; the first difference image comprises a first difference sub-image and a second difference sub-image; wherein the first determination submodule includes: a first determination unit and a second determination unit. A first determining unit configured to determine a first difference sub-image based on pixel differences of the first image frame and the second image frame; and a second determining unit configured to determine a second difference sub-image based on a pixel difference between the second image frame and the third image frame.
According to an embodiment of the present disclosure, the second determination submodule is further configured to: determining a second difference image based on pixel differences between the first difference sub-image and the second difference sub-image; the third determination submodule is further configured to: and adding the pixel values of the second difference image to obtain the change information.
According to an embodiment of the present disclosure, the change information includes a change value, and the second determination module 520 includes a fourth determination sub-module, a fifth determination sub-module, and a sixth determination sub-module. The fourth determination sub-module is configured to determine, in response to determining that the change value is less than a first threshold, that the selection policy includes selecting a preceding image frame from the candidate image frames, where the time corresponding to the preceding image frame is before that of the other candidate image frames. The fifth determination sub-module is configured to determine, in response to determining that the change value is greater than or equal to the first threshold and less than a second threshold, that the selection policy includes selecting a preceding image frame and a subsequent image frame from the candidate image frames, where the time corresponding to the subsequent image frame is after that of the other candidate image frames. The sixth determination sub-module is configured to determine, in response to determining that the change value is greater than or equal to the second threshold, that the selection policy includes selecting a preceding image frame, a subsequent image frame, and an intermediate image frame from the candidate image frames, where the time corresponding to the intermediate image frame is after that of the preceding image frame and before that of the subsequent image frame; the second threshold is greater than the first threshold.
According to an embodiment of the present disclosure, the candidate image frames include a first image frame, a second image frame, and a third image frame; the first image frame is the preceding image frame, the second image frame is the intermediate image frame, and the third image frame is the subsequent image frame.
According to an embodiment of the present disclosure, the candidate image frames include neighboring image frames.
According to an embodiment of the present disclosure, the apparatus 500 may further include: the third determination module and the fourth determination module. A third determining module, configured to determine a plurality of video segments from the video to be processed, where adjacent video segments in the plurality of video segments contain the same image frame; and the fourth determining module is used for determining the image frames contained in any video clip in the plurality of video clips as candidate image frames.
According to an embodiment of the present disclosure, the target image frames include a plurality of sets of target image frames, the plurality of sets of target image frames corresponding one-to-one to the plurality of video clips; wherein the generating module 540 includes: removing the sub-module and generating the sub-module. A removing sub-module, configured to remove the repeated target image frames in response to determining that the multiple sets of target image frames include the repeated target image frames, and obtain multiple sets of processed target image frames; and the generating sub-module is used for generating a target video based on the processed multiple groups of target image frames.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of users' personal information all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the video processing method described above.
According to an embodiment of the present disclosure, there is provided a computer program product comprising a computer program/instruction which, when executed by a processor, implements the video processing method described above.
Fig. 6 is a block diagram of an electronic device for performing video processing to implement an embodiment of the present disclosure.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. The electronic device 600 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, for example, a video processing method. For example, in some embodiments, the video processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When a computer program is loaded into RAM 603 and executed by computing unit 601, one or more steps of the video processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the video processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above can be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable video processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (12)

1. A video processing method, comprising:
determining change information of a target object in a candidate image frame based on pixel information of the candidate image frame in a video to be processed;
determining a selection policy for the candidate image frame based on the change information;
selecting a target image frame from the candidate image frames based on the selection policy; and
generating a target video based on the target image frame;
wherein the determining the change information of the target object in the candidate image frame based on the pixel information of the candidate image frame in the video to be processed includes:
determining a first difference image based on pixel information of the candidate image frame;
determining a second difference image based on pixel information of the first difference image; and
determining the change information based on pixel information of the second difference image;
wherein the candidate image frames comprise a first image frame, a second image frame and a third image frame; the determining a first difference image based on pixel information of the candidate image frame includes:
determining a first difference sub-image based on pixel differences between the first image frame and the second image frame; and
determining a second difference sub-image based on pixel differences between the second image frame and the third image frame;
the determining a second difference image based on pixel information of the first difference image includes:
determining the second difference image based on pixel differences between the first difference sub-image and the second difference sub-image;
wherein the determining the change information based on the pixel information of the second difference image includes:
adding pixel values of the second difference image to obtain the change information;
wherein the change information includes a change value; the determining a selection policy for the candidate image frame based on the change information includes:
in response to determining that the change value is less than a first threshold, determining that the selection policy includes selecting a leading image frame from the candidate image frames, wherein a time instant corresponding to the leading image frame is before a time instant corresponding to the other image frames in the candidate image frames;
in response to determining that the change value is greater than or equal to the first threshold and less than a second threshold, determining that the selection policy includes selecting the leading image frame and a trailing image frame from the candidate image frames, wherein a time instant corresponding to the trailing image frame is after a time instant corresponding to the other image frames in the candidate image frames; and
in response to determining that the change value is greater than or equal to the second threshold, determining that the selection policy includes selecting the leading image frame, the trailing image frame, and an intermediate image frame from the candidate image frames, wherein a time instant corresponding to the intermediate image frame is after a time instant corresponding to the leading image frame and before a time instant corresponding to the trailing image frame,
wherein the second threshold is greater than the first threshold.
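To make the computation in claim 1 above concrete, here is a minimal sketch of the three-frame differencing and the threshold-based selection policy. It is an illustrative reading rather than the patented implementation: the assumption of grayscale frames as NumPy arrays, the use of absolute differences, and the names change_value, select_target_frames, t1, and t2 are choices not fixed by the claim language.

```python
import numpy as np

def change_value(leading, intermediate, trailing):
    """Three-frame differencing (illustrative reading of claim 1).

    Two first difference sub-images are computed from adjacent frame
    pairs, a second difference image is computed from those sub-images,
    and its pixel values are summed into a scalar change value.
    Absolute differences are an assumption; the claim says only
    "pixel differences".
    """
    f1 = leading.astype(np.int32)
    f2 = intermediate.astype(np.int32)
    f3 = trailing.astype(np.int32)
    d1 = np.abs(f1 - f2)           # first difference sub-image
    d2 = np.abs(f2 - f3)           # second difference sub-image
    second_diff = np.abs(d1 - d2)  # second difference image
    return int(second_diff.sum())  # change information (change value)

def select_target_frames(leading, intermediate, trailing, t1, t2):
    """Selection policy of claim 1; t1 < t2 are assumed thresholds."""
    v = change_value(leading, intermediate, trailing)
    if v < t1:   # little change: keep only the leading frame
        return [leading]
    if v < t2:   # moderate change: keep leading and trailing frames
        return [leading, trailing]
    # strong change: keep leading, intermediate, and trailing frames
    return [leading, intermediate, trailing]
```

Under this reading, the number of frames retained from a three-frame candidate group grows with the measured change, so nearly static passages are thinned aggressively while fast motion keeps more frames.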
2. The method of claim 1, wherein the first image frame is the leading image frame, the second image frame is the intermediate image frame, and the third image frame is the trailing image frame.
3. The method of any of claims 1-2, wherein the candidate image frames comprise neighboring image frames.
4. The method of any of claims 1-2, further comprising:
determining a plurality of video clips from the video to be processed, wherein adjacent video clips in the plurality of video clips contain the same image frame; and
determining the image frames contained in any video clip of the plurality of video clips as the candidate image frames.
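The clip splitting of claim 4 can be sketched as follows, under the assumptions that each clip holds three frames (matching the three-frame candidate groups of claim 1) and that adjacent clips share exactly one boundary frame; the claim itself requires only that adjacent clips contain the same image frame, so clip_len and the single-frame overlap are illustrative.

```python
def split_into_clips(frames, clip_len=3):
    """Split a frame sequence into clips in which each adjacent pair
    of clips shares one frame: the last frame of one clip is the
    first frame of the next."""
    step = clip_len - 1  # advance so exactly one frame overlaps
    return [frames[i:i + clip_len]
            for i in range(0, len(frames) - clip_len + 1, step)]
```

For a seven-frame video this yields clips over frame indices (0, 1, 2), (2, 3, 4), and (4, 5, 6), with frames 2 and 4 shared between neighboring clips.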
5. The method of claim 4, wherein the target image frames comprise a plurality of sets of target image frames, the plurality of sets of target image frames corresponding one-to-one to the plurality of video clips;
wherein the generating a target video based on the target image frame comprises:
in response to determining that the plurality of sets of target image frames include a repeated target image frame, removing the repeated target image frame to obtain a plurality of sets of processed target image frames; and
generating the target video based on the plurality of sets of processed target image frames.
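A sketch of the duplicate removal in claim 5: because adjacent clips share a boundary frame, the per-clip sets of target image frames can repeat that frame, and merging drops the repeats. Identifying frames by their index in the source video, and the name merge_target_frames, are assumptions made for illustration.

```python
def merge_target_frames(groups):
    """Concatenate per-clip groups of target frame indices, keeping
    only the first occurrence of each frame."""
    seen = set()
    merged = []
    for group in groups:
        for idx in group:
            if idx not in seen:  # drop a repeated boundary frame
                seen.add(idx)
                merged.append(idx)
    return merged
```

For example, merge_target_frames([[0, 2], [2, 4], [4, 5, 6]]) returns [0, 2, 4, 5, 6]; the repeated boundary frames 2 and 4 each appear once, and the target video is then assembled from the merged list.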
6. A video processing apparatus comprising:
a first determining module configured to determine change information of a target object in a candidate image frame based on pixel information of the candidate image frame in a video to be processed;
a second determining module configured to determine a selection policy for the candidate image frames based on the change information;
a selection module configured to select a target image frame from the candidate image frames based on the selection policy; and
a generation module configured to generate a target video based on the target image frame;
wherein the first determining module includes:
a first determining sub-module configured to determine a first difference image based on pixel information of the candidate image frame;
a second determining sub-module configured to determine a second difference image based on pixel information of the first difference image; and
a third determining sub-module configured to determine the change information based on pixel information of the second difference image;
wherein the candidate image frames comprise a first image frame, a second image frame and a third image frame; the first determining sub-module includes:
a first determining unit configured to determine a first difference sub-image based on pixel differences between the first image frame and the second image frame; and
a second determining unit configured to determine a second difference sub-image based on pixel differences between the second image frame and the third image frame;
wherein the second determining sub-module is configured to determine the second difference image based on pixel differences between the first difference sub-image and the second difference sub-image;
wherein the third determining sub-module is configured to add pixel values of the second difference image to obtain the change information;
wherein the change information includes a change value; the second determining module includes:
a fourth determining sub-module, configured to determine, in response to determining that the change value is less than a first threshold, that the selection policy includes selecting a leading image frame from the candidate image frames, wherein a time instant corresponding to the leading image frame is before a time instant corresponding to the other image frames in the candidate image frames;
a fifth determining sub-module, configured to determine, in response to determining that the change value is greater than or equal to the first threshold and less than a second threshold, that the selection policy includes selecting the leading image frame and a trailing image frame from the candidate image frames, wherein a time instant corresponding to the trailing image frame is after a time instant corresponding to the other image frames in the candidate image frames; and
a sixth determining sub-module, configured to determine, in response to determining that the change value is greater than or equal to the second threshold, that the selection policy includes selecting the leading image frame, the trailing image frame, and an intermediate image frame from the candidate image frames, wherein a time instant corresponding to the intermediate image frame is after a time instant corresponding to the leading image frame and before a time instant corresponding to the trailing image frame,
wherein the second threshold is greater than the first threshold.
7. The apparatus of claim 6, wherein the first image frame is the leading image frame, the second image frame is the intermediate image frame, and the third image frame is the trailing image frame.
8. The apparatus of any of claims 6-7, wherein the candidate image frames comprise neighboring image frames.
9. The apparatus of any of claims 6-7, further comprising:
a third determining module, configured to determine a plurality of video clips from the video to be processed, wherein adjacent video clips in the plurality of video clips contain the same image frame; and
a fourth determining module, configured to determine an image frame contained in any video clip of the plurality of video clips as the candidate image frame.
10. The apparatus of claim 9, wherein the target image frames comprise a plurality of sets of target image frames, the plurality of sets of target image frames corresponding one-to-one to the plurality of video clips;
wherein the generation module includes:
a removing sub-module, configured to, in response to determining that the plurality of sets of target image frames include a repeated target image frame, remove the repeated target image frame to obtain a plurality of sets of processed target image frames; and
a generation sub-module configured to generate the target video based on the plurality of sets of processed target image frames.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5.
CN202210604658.8A 2022-05-30 2022-05-30 Video processing method, device, electronic equipment and medium Active CN115022679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210604658.8A CN115022679B (en) 2022-05-30 2022-05-30 Video processing method, device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210604658.8A CN115022679B (en) 2022-05-30 2022-05-30 Video processing method, device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN115022679A CN115022679A (en) 2022-09-06
CN115022679B true CN115022679B (en) 2023-08-29

Family

ID=83070234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210604658.8A Active CN115022679B (en) 2022-05-30 2022-05-30 Video processing method, device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN115022679B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661485B (en) * 2022-12-23 2023-03-10 南京芯驰半导体科技有限公司 Image feature extraction method, device, equipment and storage medium
CN116579964B (en) * 2023-05-22 2024-02-02 北京拙河科技有限公司 Dynamic frame gradual-in gradual-out dynamic fusion method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8605787B2 (en) * 2008-03-28 2013-12-10 Nec Corporation Image processing system, image processing method, and recording medium storing image processing program
TW201310392A (en) * 2011-08-26 2013-03-01 Novatek Microelectronics Corp Estimating method of predicted motion vector
WO2020153622A1 (en) * 2019-01-25 2020-07-30 Samsung Electronics Co., Ltd. Apparatus and method for producing slow motion video
US10977808B2 (en) * 2019-02-18 2021-04-13 Raytheon Company Three-frame difference target acquisition and tracking using overlapping target images

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4636850A (en) * 1984-09-07 1987-01-13 Adac Laboratories, Inc. Apparatus and method for enhancement of video images
US6424789B1 (en) * 1999-08-17 2002-07-23 Koninklijke Philips Electronics N.V. System and method for performing fast forward and slow motion speed changes in a video stream based on video content
JP2019079024A (en) * 2017-03-28 2019-05-23 キヤノン株式会社 Imaging apparatus, control method, and program
CN108322831A (en) * 2018-02-28 2018-07-24 广东美晨通讯有限公司 video playing control method, mobile terminal and computer readable storage medium
CN113542725A (en) * 2020-04-22 2021-10-22 百度在线网络技术(北京)有限公司 Video auditing method, video auditing device and electronic equipment
WO2022105734A1 (en) * 2020-11-18 2022-05-27 华为技术有限公司 Slow motion video recording method and device
CN112929695A (en) * 2021-01-25 2021-06-08 北京百度网讯科技有限公司 Video duplicate removal method and device, electronic equipment and storage medium
CN113038010A (en) * 2021-03-12 2021-06-25 Oppo广东移动通信有限公司 Video processing method, video processing device, storage medium and electronic equipment
CN113691721A (en) * 2021-07-28 2021-11-23 浙江大华技术股份有限公司 Synthesis method and device of time-lapse video, computer equipment and medium
CN114125498A (en) * 2021-11-24 2022-03-01 北京百度网讯科技有限公司 Video data processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115022679A (en) 2022-09-06

Similar Documents

Publication Publication Date Title
CN115022679B (en) Video processing method, device, electronic equipment and medium
CN111654746B (en) Video frame insertion method and device, electronic equipment and storage medium
CN112954450B (en) Video processing method and device, electronic equipment and storage medium
CN108235116B (en) Feature propagation method and apparatus, electronic device, and medium
EP3952312A1 (en) Method and apparatus for video frame interpolation, and device and storage medium
CN112291634B (en) Video processing method and device
CN113014937B (en) Video frame insertion method, device, equipment and storage medium
WO2023160617A1 (en) Video frame interpolation processing method, video frame interpolation processing device, and readable storage medium
CN111310744A (en) Image recognition method, video playing method, related device and medium
CN113365110A (en) Model training method, video frame interpolation method, device, equipment and storage medium
CN112929728A (en) Video rendering method, device and system, electronic equipment and storage medium
US20220130020A1 (en) Image processing method and apparatus, video processing method and apparatus, electronic device, and storage medium
CN113657518B (en) Training method, target image detection method, device, electronic device, and medium
CN114125498A (en) Video data processing method, device, equipment and storage medium
CN114168793A (en) Anchor display method, device, equipment and storage medium
CN110636331B (en) Method and apparatus for processing video
JP2015527818A (en) Video display changes for video conferencing environments
US20220239920A1 (en) Video processing method, related apparatus, storage medium, and program product
CN113810755B (en) Panoramic video preview method and device, electronic equipment and storage medium
CN113988294A (en) Method for training prediction network, image processing method and device
JP2023539273A (en) Methods, devices, electronic devices and media for determining target addition methods
CN114092359A (en) Screen-splash processing method and device and electronic equipment
CN113691866B (en) Video processing method, device, electronic equipment and medium
CN113762016A (en) Key frame selection method and device
CN113409199A (en) Image processing method, image processing device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant