WO2022116772A1

WO2022116772A1 - Video clipping method and apparatus, storage medium, and electronic device

Info

Publication number: WO2022116772A1
Application number: PCT/CN2021/128711
Authority: WO
Inventors: 吴昊; 王长虎
Original assignee: 北京有竹居网络技术有限公司
Priority date: 2020-12-02
Filing date: 2021-11-04
Publication date: 2022-06-09
Also published as: CN112561839A; CN112561839B

Abstract

A video clipping method and apparatus, a storage medium, and an electronic device. The video clipping method comprises: obtaining size information of an original video to be clipped and a target clipping frame (201); performing storyboard detection on the original video to determine storyboard clips in the original video (202); for each storyboard clip, determining, according to the main content of a target video frame in the storyboard clip, a clipping path corresponding to the storyboard clip (203), the target video frame being a part or all of video frames in the storyboard clip, and the clipping path being used for representing a position movement path of the target clipping frame along the width directions or the length directions of the video frames in all the video frames comprised in the storyboard clip; and according to the size information of the target clipping frame and the clipping path corresponding to each storyboard clip, clipping the original video to obtain a clipped target video (204). The method can reduce the problem of frequent shaking of a picture in the playing process of a clipped video, thereby improving the playing effect of the clipped video.

Description

Video cropping method, device, storage medium and electronic device

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is based on the Chinese application with the application number of 202011401472.X and the filing date of December 2, 2020, and claims its priority. The disclosure of the Chinese application is hereby incorporated into the present application as a whole.

Technical Field

The present disclosure relates to the technical field of video processing, and in particular, to a video cropping method, apparatus, storage medium and electronic device.

Background

Video cropping is required in scenarios where the playback size differs from the size of the original video. Video cropping algorithms in the related art usually crop each video frame with a cropping frame of the target playback size. Specifically, a loss function is applied to the text information included in each video frame: the loss is smallest when the text lies completely inside or completely outside the cropping frame, and largest when half of the text lies inside the cropping frame and half lies outside it. Minimizing this loss is intended to improve the cropping result.

However, when a vertical video is cropped into a horizontal video, satisfying the above loss function for text information such as subtitles and logos yields different cropping frame positions in different video frames. As a result, the picture shakes frequently during playback, which degrades the playback effect of the cropped video.

SUMMARY OF THE INVENTION

This Summary is provided to introduce concepts in a simplified form that are described in detail in the Detailed Description section that follows. This summary section is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to be used to limit the scope of the claimed technical solution.

In a first aspect, the present disclosure provides a video cropping method, the method comprising:

obtaining an original video to be cropped and size information of a target cropping frame;

performing storyboard detection on the original video to determine storyboard segments in the original video;

for each storyboard segment, determining a cropping path corresponding to the storyboard segment according to the main content of target video frames in the storyboard segment, where the target video frames are some or all of the video frames in the storyboard segment, and the cropping path represents the position movement path of the target cropping frame, along the width direction or the length direction of the video frame, across all video frames included in the storyboard segment; and

cropping the original video according to the size information of the target cropping frame and the cropping path corresponding to each storyboard segment, to obtain a cropped target video.

In a second aspect, the present disclosure provides a video cropping device, the device comprising:

an acquisition module, configured to obtain an original video to be cropped and size information of a target cropping frame;

a first determination module, configured to perform storyboard detection on the original video to determine storyboard segments in the original video;

a second determination module, configured to, for each storyboard segment, determine a cropping path corresponding to the storyboard segment according to the main content of target video frames in the storyboard segment, where the target video frames are some or all of the video frames in the storyboard segment, and the cropping path represents the position movement path of the target cropping frame, along the width direction or the length direction of the video frame, across all video frames included in the storyboard segment; and

a cropping module, configured to crop the original video according to the size information of the target cropping frame and the cropping path corresponding to each storyboard segment, to obtain a cropped target video.

In a third aspect, the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processing apparatus, implements the steps of the method described in the first aspect.

In a fourth aspect, the present disclosure provides an electronic device, comprising:

a storage device on which a computer program is stored;

a processing device configured to execute the computer program in the storage device to implement the steps of the method described in the first aspect.

Other features and advantages of the present disclosure will be described in detail in the detailed description that follows.

Brief Description of the Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent when taken in conjunction with the accompanying drawings and with reference to the following detailed description. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale. In the drawings:

FIG. 1 is a schematic diagram of a cropping result of a video cropping method in the related art;

FIG. 2 is a flowchart of a video cropping method according to an exemplary embodiment of the present disclosure;

FIG. 3 is a schematic diagram of interpolation calculation in a video cropping method according to an exemplary embodiment of the present disclosure;

FIG. 4 is a schematic diagram of smoothing filtering processing in a video cropping method according to an exemplary embodiment of the present disclosure;

FIG. 5 is a block diagram of a video cropping apparatus according to an exemplary embodiment of the present disclosure;

FIG. 6 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are only for exemplary purposes, and are not intended to limit the protection scope of the present disclosure.

It should be understood that the various steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this regard.

As used herein, the term "including" and variations thereof are open-ended inclusions, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.

It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different devices, modules or units, and are not used to limit the order or interdependence of the functions performed by these devices, modules or units. In addition, it should be noted that the modifiers "a" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".

The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are only for illustrative purposes, and are not intended to limit the scope of these messages or information.

As mentioned in the background, the video cropping algorithm in the related art applies a loss function to the text information included in each video frame. The loss is smallest when the text lies completely inside or completely outside the cropping frame, and largest when half of the text lies inside the cropping frame and half lies outside it. Minimizing this loss is intended to improve the cropping result.

However, when a vertical video is cropped into a horizontal video, satisfying the above loss function for text information such as subtitles and logos yields different cropping frame positions in different video frames. As a result, the picture shakes frequently during playback, which degrades the playback effect of the cropped video. For example, referring to FIG. 1, the first, second and third video frames are three consecutive frames of the same video. The cropping frame of the first video frame is A. In the second video frame, cropping frame B is moved down as a whole relative to cropping frame A in order to include the subtitles. In the third video frame, cropping frame C is moved up as a whole relative to cropping frame B in order to include the whole face. Therefore, during playback of the cropped video, the picture frequently shakes up and down, and the playback effect is poor.

In view of this, the present disclosure provides a video cropping method, device, storage medium and electronic device, so as to reduce the problem of frequent picture shaking during the playback of the cropped video, thereby improving the playback effect of the cropped video. FIG. 2 is a flowchart of a video cropping method according to an exemplary embodiment of the present disclosure. Referring to Figure 2, the video cropping method may include:

Step 201: Obtain the size information of the original video to be cropped and the target cropping frame.

For example, the user can input a URL (Uniform Resource Locator) corresponding to the original video in the electronic device, and the electronic device can then download the original video from the corresponding resource server according to the URL in order to perform video cropping. Alternatively, the electronic device may, in response to a video cropping request triggered by the user, acquire a stored video from its memory as the original video for video cropping. The embodiments of the present disclosure do not limit the manner in which the original video is acquired.

For example, the size information of the target cropping frame may define the width and length of the cropped video, and may be determined according to the playback size of the video playback device. For example, if the size of the original video is 720×1280 pixels and the playback aspect ratio of the video playback device is 1:1, the size of the target cropping frame can be determined to be 720×720 pixels. Alternatively, the size information of the target cropping frame may be customized according to actual business requirements, which is not limited in the embodiments of the present disclosure.
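The size determination in the example above can be sketched as follows. This is a minimal illustration only: the function name `target_crop_size` and the rule of taking the largest frame of the requested aspect ratio that fits inside the original frame are assumptions made for the sketch and are not specified by the present disclosure.

```python
from fractions import Fraction


def target_crop_size(src_w, src_h, aspect_w, aspect_h):
    """Return (crop_w, crop_h): the largest frame with aspect ratio
    aspect_w:aspect_h that fits inside a src_w x src_h video frame.

    Hypothetical helper; the fitting rule is an assumption of this sketch.
    """
    ratio = Fraction(aspect_w, aspect_h)
    if Fraction(src_w, src_h) > ratio:
        # The source is relatively wider than the target: keep the full
        # length (height) and reduce the width.
        crop_h = src_h
        crop_w = int(src_h * ratio)
    else:
        # The source is relatively taller than the target: keep the full
        # width and reduce the length (height).
        crop_w = src_w
        crop_h = int(src_w / ratio)
    return crop_w, crop_h


# A 720x1280 original and a 1:1 playback aspect ratio give a 720x720 frame.
size = target_crop_size(720, 1280, 1, 1)  # (720, 720)
```

Exact fractions are used here only to avoid floating-point rounding when comparing aspect ratios.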

Step 202: Perform storyboard detection on the original video to determine storyboard segments in the original video.

Illustratively, storyboard detection determines the different storyboard segments in the original video. A storyboard segment includes multiple video frames that correspond to the same or a similar scene. Therefore, if the video frames of one storyboard segment are cropped by cropping frames whose positions differ greatly, the picture of the cropped video shakes frequently, which degrades the playback effect. Video frames in different storyboard segments correspond to different scenes, so a change in cropping frame position between segments does not greatly affect the playback effect. Accordingly, in the embodiments of the present disclosure, in order to reduce frequent picture shaking during playback of the cropped video, storyboard detection is performed on the original video to determine its storyboard segments, so that a cropping path can be planned separately for each storyboard segment in subsequent processing and the video frames within each storyboard segment correspond to cropping frames whose positions change only slightly.

Step 203: For each storyboard segment, determine a cropping path corresponding to the storyboard segment according to the main content of the target video frames in the storyboard segment. The target video frames are some or all of the video frames in the storyboard segment, and the cropping path represents the position movement path of the target cropping frame, along the width direction or the length direction of the video frame, across all video frames included in the storyboard segment.

For example, the target video frames can be obtained by performing frame extraction on the storyboard segment. The target video frames may include all of the video frames in the storyboard segment or only some of them, which is not limited here. Alternatively, frame extraction may first be performed on the original video and the extracted video frames marked; storyboard detection is then performed to obtain the storyboard segments, and the marked video frames in each storyboard segment are used as its target video frames. The embodiments of the present disclosure do not limit the manner in which the target video frames are determined.
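As a minimal sketch of one possible frame extraction scheme, the target video frames of a storyboard segment could be chosen at a fixed stride. The function name `sample_target_frames`, the fixed stride, and the choice to always include the last frame of the segment (so that later interpolation does not need to extrapolate) are assumptions of this sketch, not requirements of the present disclosure.

```python
def sample_target_frames(start, end, step):
    """Return indices of target video frames for the segment [start, end).

    Every `step`-th frame is taken, and the last frame of the segment is
    always included. Hypothetical helper for illustration only.
    """
    if end <= start or step < 1:
        raise ValueError("segment must be non-empty and step must be >= 1")
    indices = list(range(start, end, step))
    if indices[-1] != end - 1:
        indices.append(end - 1)
    return indices


# A 10-frame segment sampled every 4 frames.
targets = sample_target_frames(0, 10, 4)  # [0, 4, 8, 9]
```

With `step=1` every frame of the segment becomes a target video frame, which corresponds to the case in which the target video frames are all of the video frames in the segment.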

Exemplarily, the video frames included in each storyboard segment correspond to the same or a similar scene, so the main content of these video frames does not differ greatly. Determining the cropping path according to the main content of the target video frames in the storyboard segment therefore keeps the positional deviation of the target cropping frame small along the cropping path of a given storyboard segment, which reduces frequent picture shaking during playback of the cropped video and improves the video playback effect.

Step 204: Crop the original video according to the size information of the target cropping frame and the cropping path corresponding to each storyboard segment to obtain a cropped target video.

Exemplarily, the size information of the target cropping frame defines the size of the cropping frame, and the cropping path corresponding to each storyboard segment defines the position of the cropping frame. Each frame of the original video can therefore be cropped according to the size information of the target cropping frame and the cropping path corresponding to each storyboard segment, to obtain the cropped target video.

In a possible way, for each storyboard segment, each video frame in the storyboard segment is cropped according to the size information of the target cropping frame and the cropping path corresponding to the storyboard segment, and the cropped video frames are then spliced in chronological order to obtain the cropped target video. That is to say, the video frames in each storyboard segment are cropped first, and the cropped video frames of the different storyboard segments are then spliced in chronological order to obtain the cropped target video.

For example, suppose the first storyboard segment includes video frame 1, video frame 2 and video frame 3, and the second storyboard segment includes video frame 4 and video frame 5, where the times corresponding to video frames 1 to 5 increase in sequence; that is, video frames 1, 2, 3, 4 and 5 are played in that order during playback. In this case, video frames 1, 2 and 3 of the first storyboard segment can be cropped while video frames 4 and 5 of the second storyboard segment are cropped, and the cropped video frames of the first storyboard segment and those of the second storyboard segment are then spliced in chronological order to obtain the cropped target video.
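The crop-then-splice procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions: frames are represented as nested lists of pixel rows, each storyboard segment is a half-open index range `(start, end)`, each cropping path is a list of top-left `(x, y)` positions with one entry per frame, and the names `crop_frame` and `crop_video` are hypothetical.

```python
def crop_frame(frame, x, y, crop_w, crop_h):
    """Cut a crop_w x crop_h window whose top-left corner is (x, y)."""
    return [row[x:x + crop_w] for row in frame[y:y + crop_h]]


def crop_video(frames, segments, paths, crop_w, crop_h):
    """Crop every frame along the cropping path of its segment and splice.

    frames   : list of frames in chronological order
    segments : list of (start, end) index ranges, in chronological order
    paths    : one list of (x, y) positions per segment, one per frame
    """
    output = []
    for (start, end), path in zip(segments, paths):
        if len(path) != end - start:
            raise ValueError("cropping path must have one position per frame")
        for frame, (x, y) in zip(frames[start:end], path):
            output.append(crop_frame(frame, x, y, crop_w, crop_h))
    return output


# Five tiny 2x4 frames (width 2, length 4); each pixel records
# (frame index, row, column) so that the result is easy to inspect.
frames = [[[(f, r, c) for c in range(2)] for r in range(4)] for f in range(5)]
segments = [(0, 3), (3, 5)]           # frames 1-3 and frames 4-5 of the text
paths = [[(0, 0), (0, 1), (0, 1)],    # path of the first segment
         [(0, 2), (0, 2)]]            # path of the second segment
cropped = crop_video(frames, segments, paths, crop_w=2, crop_h=2)
```

Because the segments are visited in chronological order, appending the cropped frames directly yields the spliced target video; segments could equally be cropped in parallel and concatenated afterwards.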

In the above manner, for each storyboard segment, a cropping path corresponding to the storyboard segment can be determined according to the main content of the target video frames in the storyboard segment, so that video cropping is performed according to the cropping path of each storyboard segment. Since the video frames included in each storyboard segment correspond to the same or a similar scene, their main content does not differ greatly. Determining the cropping path according to the main content of the target video frames in the storyboard segment therefore keeps the positional deviation of the cropping frame small along the cropping path of a given storyboard segment, which reduces frequent picture shaking during playback of the cropped video and improves the video playback effect. For example, the original video may be a vertical video and the size information of the target cropping frame may correspond to a horizontal video, so that the video cropping method provided by the present disclosure can reduce picture shaking in the cropped video in scenarios where a vertical video is cropped into a horizontal video, improving the video playback effect.

In order to make those skilled in the art better understand the video cropping method provided by the present disclosure, the above steps are illustrated in detail below.

In a possible manner, performing storyboard detection on the original video to determine the storyboard segments in the original video may be: performing storyboard detection on the original video by a frame difference method to determine the storyboard segments in the original video; or inputting the original video into a pre-trained storyboard detection model and determining the storyboard segments in the original video according to the output of the model, where the storyboard detection model is trained on sample videos and the sample storyboard segments corresponding to those sample videos.

Illustratively, the frame difference method is a moving-object detection and segmentation method. Its basic principle is to compute pixel-wise temporal differences between two or three adjacent frames of an image sequence and to extract the moving regions in the image by thresholding. The specific calculation of the frame difference method is similar to that in the related art and will not be repeated here. In the embodiments of the present disclosure, the motion regions corresponding to different video frames in the original video can be determined by the frame difference method, and video frames with the same or similar motion regions are assigned to the same storyboard segment, so that at least one storyboard segment corresponding to the original video is obtained. Alternatively, a storyboard detection model may be trained on sample videos and the sample storyboard segments corresponding to those sample videos, so that at least one storyboard segment corresponding to the original video is determined by the trained storyboard detection model.
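A simplified frame-difference sketch is given below for illustration. It does not reproduce the motion-region comparison described above; instead it assumes that a segment boundary is declared wherever the mean absolute pixel difference between two adjacent frames exceeds a threshold. The grayscale flat-list frame representation, the function names and the default threshold of 30 are all assumptions of this sketch.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames,
    each given as a flat list of grayscale values."""
    return sum(abs(p - q) for p, q in zip(frame_a, frame_b)) / len(frame_a)


def split_into_segments(frames, threshold=30.0):
    """Split a frame sequence into (start, end) index ranges.

    A new segment starts at frame i whenever the difference between
    frame i-1 and frame i exceeds `threshold` (an assumed tuning value).
    """
    if not frames:
        return []
    segments = []
    start = 0
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(frames)))
    return segments


# Three dark frames followed by two bright frames: one cut at index 3.
demo_frames = [[10] * 4, [12] * 4, [11] * 4, [200] * 4, [198] * 4]
demo_segments = split_into_segments(demo_frames)  # [(0, 3), (3, 5)]
```

In practice the threshold would need tuning for the content at hand, and a trained detection model may be preferable for gradual transitions, which a simple adjacent-frame difference tends to miss.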

After at least one storyboard segment of the original video is obtained, a cropping path corresponding to each storyboard segment may be determined according to the main content of the target video frames in that storyboard segment. In a possible way, if the target video frames are all of the video frames in the storyboard segment, then for each target video frame a target cropping frame that can include the main content of that frame is determined, and the position coordinates of that target cropping frame in the width direction or the length direction of the target video frame are determined, so as to obtain the cropping path. However, this approach determines a target cropping frame for every video frame in the storyboard segment one by one, which requires a large amount of computation and reduces video cropping efficiency.

In order to solve this problem and improve video cropping efficiency, in a possible way, if the target video frames are only some of the video frames in the storyboard segment, then for each target video frame a target cropping frame that can include the main content of that frame is determined, and the position coordinates of that target cropping frame in the width direction or the length direction of the target video frame are determined. Interpolation is then performed on the position coordinates of the target video frames to obtain the position coordinates of the target cropping frame for the other video frames in the storyboard segment. Finally, the cropping path corresponding to the storyboard segment is determined according to the position coordinates corresponding to each video frame in the storyboard segment.

For example, the main content may be the picture content that occupies most of the image area; in a video in which a character is telling a story, for instance, the character is the main content of the target video frame. For each target video frame, at least one of the following detection methods may be performed to determine the main content: saliency detection, face detection, text detection and logo detection. Saliency detection is used to detect the position of the main component of the target video frame. Face detection is used to detect the position of faces in the target video frame. Text detection is used to detect the position and content of text in the target video frame. Logo detection is used to detect the position of logos, watermarks and the like in the target video frame. In addition, border detection may be performed on the target video frame before the main content is detected, and the detected useless borders, such as black borders and Gaussian-blurred borders, may be removed, so as to improve the accuracy of the subsequent main content detection.
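As a hedged illustration of how detection results could be turned into a cropping frame position, the sketch below takes the extents of the detected main content along the cropping axis and centers the target cropping frame on their union, clamped to the video frame. The present disclosure does not prescribe this rule; the centering strategy, the fallback to a centered crop when nothing is detected, and the name `crop_offset_for_content` are assumptions.

```python
def crop_offset_for_content(boxes, crop_len, frame_len):
    """Return the start coordinate of the cropping frame along one axis.

    boxes     : list of (start, end) extents of detected main content
                (e.g. from saliency, face, text or logo detection)
    crop_len  : size of the target cropping frame along this axis
    frame_len : size of the video frame along this axis
    """
    max_offset = frame_len - crop_len
    if not boxes:
        # Nothing detected: fall back to a centered cropping frame.
        return max_offset // 2
    lo = min(start for start, _ in boxes)
    hi = max(end for _, end in boxes)
    center = (lo + hi) / 2
    offset = int(round(center - crop_len / 2))
    # Keep the cropping frame inside the video frame.
    return max(0, min(offset, max_offset))


# A face at rows 300-500 and a subtitle at rows 450-700 in a 1280-row
# frame, cropped with a 720-row cropping frame.
y = crop_offset_for_content([(300, 500), (450, 700)], 720, 1280)  # 140
```

If the union of the detected content is larger than the cropping frame, some content is necessarily cut off; a weighting between detection types would then be needed, which this sketch does not attempt.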

In a possible manner, determining the position coordinates of the target cropping frame in the width direction or the length direction of the target video frame may be as follows. If it is determined, according to the size information of the original video and the size information of the target cropping frame, that cropping is to be performed along the width direction, the position coordinates of the target cropping frame in the width direction of the target video frame are determined. If it is determined, according to the size information of the original video and the size information of the target cropping frame, that cropping is to be performed along the length direction, the position coordinates of the target cropping frame in the length direction of the target video frame are determined.

It should be understood that, in general, both the length and the width of each frame of the original video can be cropped according to the size information of the target cropping frame. In that case, the position coordinates of the target cropping frame in both the width direction and the length direction of the target video frame can be determined. That is, taking the width direction of the target video frame as the X axis and the length direction as the Y axis, the position coordinates include an X coordinate value and a Y coordinate value.

In the embodiments of the present disclosure, in order to improve video cropping efficiency, cropping may be performed along only the length or only the width of each frame, according to the size information of the target cropping frame and the size information of the original video. For example, if the target cropping frame has an aspect ratio of 1:1 and the size of the original video is 720×1280 pixels, the length of each frame (along the Y-axis direction) can be cropped, and the size of the cropped video is 720×720 pixels. In this case, the position coordinates of the target cropping frame in the length direction of the target video frame can be determined; that is, taking the width direction of the target video frame as the X axis and the length direction as the Y axis, the position coordinates include the Y coordinate value. In other cases, if it is determined according to the size information of the target cropping frame and the size information of the original video that cropping is to be performed along the width direction, the position coordinates of the target cropping frame in the width direction of the target video frame can be determined; that is, the position coordinates include the X coordinate value.
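The choice of cropping direction can be sketched as a comparison of the two sizes. The function name `cropping_direction` and its return convention (a direction label plus the largest valid X and Y offsets of the cropping frame) are assumptions introduced for illustration.

```python
def cropping_direction(src_w, src_h, crop_w, crop_h):
    """Decide along which direction the target cropping frame can move.

    Returns (direction, max_x, max_y), where direction is "width",
    "length", "both" or "none", and max_x / max_y are the largest valid
    top-left X and Y coordinates of the cropping frame.
    """
    if crop_w > src_w or crop_h > src_h:
        raise ValueError("target cropping frame must fit inside the frame")
    max_x = src_w - crop_w
    max_y = src_h - crop_h
    if max_x == 0 and max_y == 0:
        direction = "none"
    elif max_x == 0:
        direction = "length"   # only the Y coordinate varies
    elif max_y == 0:
        direction = "width"    # only the X coordinate varies
    else:
        direction = "both"
    return direction, max_x, max_y


# The 720x1280 original with a 720x720 cropping frame: crop along the
# length direction, with Y allowed to range over 0..560.
result = cropping_direction(720, 1280, 720, 720)  # ("length", 0, 560)
```

When the direction is "length" or "width", the cropping path reduces to a single coordinate per frame, which is what makes the later one-dimensional interpolation sufficient.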

Illustratively, the position coordinates used in the interpolation calculation may be any one of: the position coordinates of the target cropping frame in the width direction of the target video frame; its position coordinates in the length direction of the target video frame; or its position coordinates in both the width direction and the length direction of the target video frame. Which of these is used can be determined according to actual business requirements in the specific application. For ease of understanding, the following description takes the position coordinates of the target cropping frame in the length direction of the target video frame (that is, position coordinates that include the Y coordinate value) as an example.

For example, if the position coordinates of the target video frames are $y_0, y_1, y_2, \ldots, y_{n-1}$ (where $n$ is the number of target video frames), these position coordinates can be interpolated to obtain the position coordinates of the target cropping frame corresponding to the video frames in the original video other than the target video frames. The interpolation calculation method may be any interpolation calculation method in the related art, which is not limited in the present disclosure.

In a possible manner, the usual linear interpolation method may produce abrupt changes at the interpolation points, which may cause the position of the target cropping frame along the interpolated cropping path to move sharply and thus cause the cropped video picture to shake. The embodiments of the present disclosure therefore adopt cubic spline interpolation, so that the cropping path obtained by interpolation is smoother and the shaking of the cropped video picture is reduced.

Specifically, an objective function can be determined according to the position coordinates of the target video frames. The objective function includes a plurality of piecewise functions, each of which is determined according to the position coordinates of two adjacent target video frames. Each piecewise function and the objective function are cubic equations whose independent variable is time and whose dependent variable is the position coordinate, and the first-order and second-order derivatives of the objective function are continuous in time. Correspondingly, the position coordinates of the target cropping frame corresponding to the other video frames in the storyboard segment can be determined according to the objective function and the times corresponding to those other video frames.

Illustratively, the position coordinates of every two adjacent target video frames define a segment interval, and each segment interval corresponds to a piecewise function that is a cubic equation whose independent variable is time and whose dependent variable is the position coordinate, so a piecewise curve corresponding to each piecewise function can be obtained. The objective function includes the multiple piecewise functions; that is, the objective function is the sum of the multiple piecewise functions. Moreover, the objective function is a cubic equation whose independent variable is time and whose dependent variable is the position coordinate, and its first-order and second-order derivatives are continuous in time, so the piecewise curves corresponding to the multiple piecewise functions can be joined into one smooth curve, which reduces the shaking of the cropped video picture.

For example, for times $t_0 \le t_1 \le t_2 \le \ldots \le t_{n-1}$ with corresponding position coordinates $y_0, y_1, y_2, \ldots, y_{n-1}$, the objective function $S(t) = S_0(t) + S_1(t) + S_2(t) + \ldots + S_{n-1}(t)$ satisfies: 1) on each segment interval $[t_i, t_{i+1}]$, the piecewise function $S_i(t)$ is a cubic function; 2) $S_i(t_i) = y_i$; 3) the first derivative $S'(t)$ and the second derivative $S''(t)$ of the objective function $S(t)$ are continuous on $[t_0, t_{n-1}]$, and the objective function $S(t)$ is a cubic function. The expression of $S_i(t)$ can be $S_i(t) = a_i + b_i (t - t_i) + c_i (t - t_i)^2 + d_i (t - t_i)^3$, where $i$ takes the values $0, 1, 2, \ldots, (n-1)$.

That is to say, based on cubic spline interpolation, the embodiments of the present disclosure can calculate the position coordinates of the target cropping frame for the other video frames in the storyboard segment from the position coordinates of the target cropping frame in the target video frames. The second-order continuity of cubic spline interpolation makes the interpolated cropping path smoother, thereby reducing the shaking of the cropped video picture. For example, referring to FIG. 3, for the position coordinates Y1, Y2, Y3 and Y4 of the target video frames, interpolation is performed in the above manner to obtain the position coordinates of the other video frames between every two adjacent target video frames, thereby determining the cropping path corresponding to the storyboard segment.
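A minimal sketch of cubic spline interpolation in the form $S_i(t) = a_i + b_i (t - t_i) + c_i (t - t_i)^2 + d_i (t - t_i)^3$ is given below. The present disclosure does not state which boundary conditions are used; this sketch assumes natural boundary conditions (zero second derivative at both ends) and solves the resulting tridiagonal system with the standard textbook recurrence. The names `natural_cubic_spline` and `evaluate_spline` are hypothetical, and $n$ sample points are taken to give $n - 1$ segments.

```python
from bisect import bisect_right


def natural_cubic_spline(t, y):
    """Return one (a, b, c, d) tuple per segment [t[i], t[i+1]].

    t must be strictly increasing. Natural boundary conditions are an
    assumption of this sketch.
    """
    n = len(t) - 1  # number of segments
    if n < 1 or len(y) != len(t):
        raise ValueError("need at least two points and matching lengths")
    h = [t[i + 1] - t[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (y[i + 1] - y[i]) / h[i] - 3 * (y[i] - y[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal solve for the c coefficients.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (t[i + 1] - t[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution; c[0] and c[n] stay 0 (natural spline).
    c = [0.0] * (n + 1)
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (y[j + 1] - y[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    return [(y[j], b[j], c[j], d[j]) for j in range(n)]


def evaluate_spline(t, coeffs, x):
    """Evaluate the spline at time x (clamped to the first/last segment)."""
    i = min(max(bisect_right(t, x) - 1, 0), len(coeffs) - 1)
    a, b, c, d = coeffs[i]
    dt = x - t[i]
    return a + b * dt + c * dt ** 2 + d * dt ** 3


# Target video frames at frame indices 0, 10, 20, 30 with Y coordinates
# Y1..Y4; the cropping path gives a Y coordinate for every frame 0..30.
times = [0, 10, 20, 30]
ys = [100, 140, 120, 160]
coeffs = natural_cubic_spline(times, ys)
path = [evaluate_spline(times, coeffs, f) for f in range(31)]
```

A library routine such as a spline class from a numerical package could replace this hand-written solver; it is written out here only to keep the example self-contained.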

In a possible manner, performing interpolation calculation according to the position coordinates of each target video frame, so as to obtain the position coordinates of the target cropping frame corresponding to the other video frames in the storyboard clip except the target video frames, may also be: performing smoothing filtering on the position coordinates of each target video frame to obtain the smooth position coordinates of each target video frame, and then performing interpolation calculation according to the smooth position coordinates of each target video frame to obtain the position coordinates of the target cropping frame corresponding to the other video frames in the storyboard clip except the target video frames.

For example, the position coordinates of each target video frame may be processed through Gaussian smoothing filtering to obtain the smooth position coordinates of each target video frame. For example, if the position coordinates of the target cropping frame in the target video frames are $y_0, y_1, y_2, \dots, y_{n-1}$, a Gaussian smoothing filter with a window of length $2M+1$ ($M$ is a positive integer) can be used, in which the weight $G(\Delta y)$ assigned to the position at offset $\Delta y$ from the center of the window follows a Gaussian distribution.

The sliding window convolution kernel of length 2M+1 is [G(-M), G(-M+1),...,G(0),...,G(M-1),G(M)]. For the meaning of the relevant parameters in the Gaussian distribution formula, reference may be made to the related art, which will not be repeated here. Certainly, in other possible manners, the position coordinates of each target video frame may also be processed through other smoothing filtering manners, such as mean filtering, etc., which is not limited in this embodiment of the present disclosure.

Then, interpolation calculation may be performed according to the smooth position coordinates of each target video frame, so as to obtain the position coordinates of the target cropping frame corresponding to the other video frames in the storyboard clip except the target video frames. In this way, since the position coordinates of the target cropping frame in the target video frames are smoothed before the interpolation calculation, the position offset between the target cropping frames in the target video frames can be reduced. For example, referring to FIG. 4, smoothing filtering is performed on the initial position coordinates Y1, Y2, Y3 and Y4 of the target cropping frame in the target video frames to obtain the smooth position coordinates Y1', Y2', Y3' and Y4'. It can be seen from FIG. 4 that, compared with the initial position coordinates, the position offset between the smooth position coordinates is reduced, so that the position offset between the target cropping frames corresponding to the other video frames obtained by interpolation can also be reduced, which further improves the smoothness of the clipping path and reduces shaking of the cropped video picture.
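The smoothing step can be illustrated with a short pure-Python sketch. It is only one possible reading of the text: the kernel uses the standard Gaussian shape exp(-k^2 / (2*sigma^2)) normalized so the weights sum to 1, and the window is handled at the ends by repeating the first and last coordinates. The normalization, the edge handling, the value of `sigma`, and the function names are all assumptions that the text does not state.

```python
import math


def gaussian_kernel(M, sigma):
    """Kernel [G(-M), ..., G(0), ..., G(M)] of length 2M+1.

    Assumption: standard Gaussian shape, normalized to sum to 1.
    """
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-M, M + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_positions(y, M=1, sigma=1.0):
    """Slide the kernel over the key-frame coordinates y.

    Assumption: positions outside the sequence are replaced by the
    nearest end value (edge replication).
    """
    kernel = gaussian_kernel(M, sigma)
    n = len(y)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k in range(-M, M + 1):
            j = min(max(i + k, 0), n - 1)  # clamp index at the ends
            acc += kernel[k + M] * y[j]
        smoothed.append(acc)
    return smoothed


# Hypothetical coordinates Y1..Y4 with large jumps between key frames.
initial = [100.0, 160.0, 110.0, 170.0]
smooth = smooth_positions(initial, M=1, sigma=1.0)
```

With these sample values the largest jump between adjacent coordinates shrinks after filtering, which is the effect the text attributes to FIG. 4; the smoothed coordinates would then be passed to the interpolation step.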

Based on the same inventive concept, the present disclosure also provides a video cropping device, which can be implemented as part or all of an electronic device through software, hardware, or a combination of the two. Referring to FIG. 5, the video cropping device 500 includes:

an obtaining module 501, configured to obtain size information of the original video to be cropped and of the target cropping frame;

a first determining module 502, configured to perform storyboard detection on the original video to determine storyboard clips in the original video;

a second determining module 503, configured to, for each storyboard clip, determine a clipping path corresponding to the storyboard clip according to the main content of the target video frames in the storyboard clip, where the target video frames are some or all of the video frames in the storyboard clip, and the clipping path is used to represent the position movement path of the target cropping frame, along the width direction or the length direction of the video frames, across all the video frames included in the storyboard clip;

a cropping module 504, configured to crop the original video according to the size information of the target cropping frame and the clipping path corresponding to each storyboard clip to obtain a cropped target video.

Optionally, the second determining module 503 is used for:

For each of the target video frames, determine a target cropping frame that can include the main content in the target video frame, and determine the position coordinates of the target cropping frame in the width direction or the length direction of the target video frame;

Perform interpolation calculation according to the position coordinates of each target video frame, so as to obtain the position coordinates of the target cropping frame corresponding to the other video frames in the storyboard clip except the target video frames;

Determine the clipping path corresponding to the storyboard clip according to the position coordinates corresponding to each video frame in the storyboard clip.

Optionally, the second determining module 503 is used for:

If it is determined to cut along the width direction according to the size information of the original video and the size information of the target cropping frame, then determine the position coordinates of the target cropping frame in the width direction of the target video frame;

If it is determined to perform cropping along the length direction according to the size information of the original video and the size information of the target cropping frame, the position coordinates of the target cropping frame in the length direction of the target video frame are determined.
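How the direction is chosen from the two sets of size information is not spelled out in this passage. One plausible rule, shown here purely as an assumption, is to compare aspect ratios: if the original video is relatively wider than the target cropping frame, the frame moves along the width direction; if it is relatively taller, along the length direction. The function name `crop_direction` and the return labels are hypothetical.

```python
def crop_direction(src_w, src_h, crop_w, crop_h):
    """Pick the axis along which the target cropping frame moves.

    Assumption: the decision is made by comparing aspect ratios.
    Cross-multiplication avoids floating-point division.
    """
    lhs = src_w * crop_h  # proportional to the source aspect ratio
    rhs = src_h * crop_w  # proportional to the target aspect ratio
    if lhs > rhs:
        return "width"   # source is relatively wider: slide horizontally
    if lhs < rhs:
        return "length"  # source is relatively taller: slide vertically
    return "none"        # same aspect ratio: no sliding needed


# Hypothetical example: a 1920x1080 landscape video cropped to 9:16 portrait.
direction = crop_direction(1920, 1080, 9, 16)
```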

Optionally, the second determining module 503 is used for:

Determine an objective function according to the position coordinates of each target video frame, where the objective function includes a plurality of piecewise functions, each piecewise function is determined according to the position coordinates of two adjacent target video frames, each piecewise function and the objective function are cubic equations whose independent variable is time and whose dependent variable is the position coordinate, and the first-order derivative and second-order derivative of the objective function are continuous in time;

Determine the position coordinates of the target cropping frame corresponding to the other video frames according to the objective function and the times corresponding to the other video frames in the storyboard clip except the target video frames.

Optionally, the second determining module 503 is used for:

performing smooth filtering processing on the position coordinates of each target video frame to obtain the smooth position coordinates of each target video frame;

Perform interpolation calculation according to the smooth position coordinates of each target video frame, so as to obtain the position coordinates of the target cropping frame corresponding to the other video frames in the storyboard clip except the target video frames.

Optionally, the first determining module 502 is configured to:

Perform storyboard detection on the original video by the frame difference method to determine the storyboard clips in the original video; or, input the original video into a pre-trained storyboard detection model, and determine the storyboard clips in the original video according to the output result of the storyboard detection model, where the storyboard detection model is trained on sample videos and the sample storyboard clips corresponding to the sample videos.
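As an illustration of the frame difference option, the sketch below splits a video wherever the mean absolute pixel difference between consecutive frames exceeds a threshold. This is a generic frame-difference scheme under stated assumptions, not necessarily the one used in the disclosure: frames are modeled as nested lists of grayscale values, and the threshold value and function names are hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two equally sized grayscale frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def split_into_clips(frames, threshold=30.0):
    """Return (start, end) index pairs, end exclusive, one per storyboard clip.

    Assumption: a cut is declared when the difference between two
    consecutive frames exceeds `threshold`.
    """
    if not frames:
        return []
    clips = []
    start = 0
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            clips.append((start, i))
            start = i
    clips.append((start, len(frames)))
    return clips


# Hypothetical 2x2 frames: three dark frames followed by two bright frames.
dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
clips = split_into_clips([dark, dark, dark, bright, bright])
```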

Optionally, the cropping module 504 is used for:

For each storyboard clip, crop each video frame in the storyboard clip according to the size information of the target cropping frame and the clipping path corresponding to the storyboard clip;

Splice the cropped video frames in time order to obtain the cropped target video.
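The two steps above can be sketched as follows. This is a minimal illustration under assumptions the text does not confirm: frames are nested lists of pixel values, each position coordinate in the clipping path is taken as the offset of the leading edge of the target cropping frame along the moving direction, the other axis starts at 0, and offsets are rounded and clamped so the frame stays inside the picture. All names are hypothetical.

```python
def crop_frame(frame, offset, crop_w, crop_h, direction):
    """Cut a crop_w x crop_h window out of one frame."""
    if direction == "width":
        # Window slides horizontally; rows are taken from the top.
        return [row[offset:offset + crop_w] for row in frame[:crop_h]]
    # Window slides vertically; columns are taken from the left.
    return [row[:crop_w] for row in frame[offset:offset + crop_h]]


def crop_video(clips, paths, crop_w, crop_h, direction="width"):
    """Crop every frame of every storyboard clip and splice in time order.

    clips: list of clips, each a list of frames in time order.
    paths: list of clipping paths, one position coordinate per frame.
    """
    output = []
    for clip, path in zip(clips, paths):
        for frame, pos in zip(clip, path):
            height = len(frame)
            width = len(frame[0])
            limit = (width - crop_w) if direction == "width" else (height - crop_h)
            # Round to a whole pixel and keep the window inside the frame.
            offset = min(max(int(round(pos)), 0), limit)
            output.append(crop_frame(frame, offset, crop_w, crop_h, direction))
    return output


# Hypothetical 2x4 frame, cropped to 2x2 at horizontal offset 1.
frame = [[0, 1, 2, 3], [4, 5, 6, 7]]
result = crop_video([[frame]], [[1.0]], crop_w=2, crop_h=2)
```

Because the clips are processed in their original order and frames are appended as they are cropped, the output list is already the spliced target video.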

Regarding the apparatus in the above-mentioned embodiment, the specific manner in which each module performs operations has been described in detail in the embodiment of the method, and will not be described in detail here.

Based on the same inventive concept, an embodiment of the present disclosure also provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, implements the steps of any of the above video cropping methods.

Based on the same inventive concept, an embodiment of the present disclosure also provides an electronic device, including:

a storage device on which a computer program is stored;

A processing device is configured to execute the computer program in the storage device to implement the steps of any of the above video cropping methods.

Referring next to FIG. 6, a schematic structural diagram of an electronic device 600 suitable for implementing an embodiment of the present disclosure is shown. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., in-vehicle navigation terminals), and stationary terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 6 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.

As shown in FIG. 6, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. Various programs and data required for the operation of the electronic device 600 are also stored in the RAM 603. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Typically, the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 608 including, for example, a magnetic tape, hard disk, etc.; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 6 shows the electronic device 600 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication device 609, or from the storage device 608, or from the ROM 602. When the computer program is executed by the processing apparatus 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.

It should be noted that the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optics, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing. In this disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, communication may use any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.

The above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.

The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: obtain size information of the original video to be cropped and of the target cropping frame; perform storyboard detection on the original video to determine the storyboard clips in the original video; for each storyboard clip, determine the clipping path corresponding to the storyboard clip according to the main content of the target video frames in the storyboard clip, where the target video frames are some or all of the video frames in the storyboard clip, and the clipping path is used to represent the position movement path of the target cropping frame, along the width direction or the length direction of the video frames, across all the video frames included in the storyboard clip; and crop the original video according to the size information of the target cropping frame and the clipping path corresponding to each storyboard clip to obtain the cropped target video.

Computer program code for performing operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or can be implemented in a combination of dedicated hardware and computer instructions.

The modules involved in the embodiments of the present disclosure may be implemented in software or hardware. Among them, the name of the module does not constitute a limitation of the module itself under certain circumstances.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optics, compact disk read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, Example 1 provides a video cropping method, including:

According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, wherein the determining of the clipping path corresponding to the storyboard clip according to the main content of the target video frames in the storyboard clip includes:

According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 2, wherein the determining the position coordinates of the target cropping frame in the width direction or the length direction of the target video frame includes:

According to one or more embodiments of the present disclosure, Example 4 provides the method of Example 2, wherein the performing of interpolation calculation according to the position coordinates of each target video frame, so as to obtain the position coordinates of the target cropping frame corresponding to the other video frames in the storyboard clip except the target video frames, includes:

According to one or more embodiments of the present disclosure, Example 5 provides the method of any one of Examples 2-4, wherein the performing of interpolation calculation according to the position coordinates of each target video frame, so as to obtain the position coordinates of the target cropping frame corresponding to the other video frames in the storyboard clip except the target video frames, includes:

Smoothing filtering is performed on the position coordinates of each target video frame to obtain the smooth position coordinates of each target video frame;

According to one or more embodiments of the present disclosure, Example 6 provides the method of any one of Examples 1-4, wherein the performing of storyboard detection on the original video to determine storyboard clips in the original video includes:

According to one or more embodiments of the present disclosure, Example 7 provides the method of any one of Examples 1-4, wherein the cropping of the original video according to the size information of the target cropping frame and the clipping path corresponding to each storyboard clip to obtain the cropped target video includes:

According to one or more embodiments of the present disclosure, Example 8 provides a video cropping apparatus, the apparatus comprising:

a second determining module, configured to, for each storyboard clip, determine the clipping path corresponding to the storyboard clip according to the main content of the target video frames in the storyboard clip, where the target video frames are some or all of the video frames in the storyboard clip, and the clipping path is used to represent the position movement path of the target cropping frame, along the width direction or the length direction of the video frames, across all the video frames included in the storyboard clip;

According to one or more embodiments of the present disclosure, Example 9 provides the apparatus of Example 8, the second determining module is configured to:

According to one or more embodiments of the present disclosure, Example 10 provides the apparatus of Example 9, the second determining module being configured to:

According to one or more embodiments of the present disclosure, Example 11 provides the apparatus of Example 9, the second determining module being configured to:

According to one or more embodiments of the present disclosure, Example 12 provides the apparatus of any one of Examples 9-11, wherein the second determination module is configured to:

According to one or more embodiments of the present disclosure, Example 13 provides the apparatus of any one of Examples 8-11, wherein the first determining module is configured to:

According to one or more embodiments of the present disclosure, Example 14 provides the apparatus of any one of Examples 8-11, the cropping module for:

For each storyboard clip, crop each video frame in the storyboard clip according to the size information of the target cropping frame and the clipping path corresponding to the storyboard clip;

According to one or more embodiments of the present disclosure, Example 15 provides a computer-readable medium having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1 to 7.

According to one or more embodiments of the present disclosure, Example 16 provides an electronic device comprising:

a storage device on which a computer program is stored;

A processing device for executing the computer program in the storage device to implement the steps of the method in any one of Examples 1 to 7.

The above description is merely a preferred embodiment of the present disclosure and an illustration of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to the technical solutions formed by the specific combination of the above-mentioned technical features, and should also cover, without departing from the above-mentioned disclosed concept, other technical solutions formed by any combination of the above-mentioned technical features or their equivalent features, for example, a technical solution formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Additionally, although operations are depicted in a particular order, this should not be construed as requiring that the operations be performed in the particular order shown or in a sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several implementation-specific details, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or logical acts of method, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims. Regarding the apparatus in the above-mentioned embodiment, the specific manner in which each module performs operations has been described in detail in the embodiment of the method, and will not be described in detail here.

Claims

A video cropping method, the method comprising:

obtaining size information of an original video to be cropped and of a target cropping frame;

performing storyboard detection on the original video to determine storyboard clips in the original video;

for each storyboard clip, determining a clipping path corresponding to the storyboard clip according to the main content of the target video frames in the storyboard clip, where the target video frames are some or all of the video frames in the storyboard clip, and the clipping path is used to represent the position movement path of the target cropping frame, along the width direction or the length direction of the video frames, across all the video frames included in the storyboard clip;

cropping the original video according to the size information of the target cropping frame and the clipping path corresponding to each storyboard clip to obtain a cropped target video.
The method according to claim 1, wherein the determining of the clipping path corresponding to the storyboard clip according to the main content of the target video frames in the storyboard clip comprises:

for each of the target video frames, determining a target cropping frame that can include the main content in the target video frame, and determining the position coordinates of the target cropping frame in the width direction or the length direction of the target video frame;

performing interpolation calculation according to the position coordinates of each target video frame, so as to obtain the position coordinates of the target cropping frame corresponding to the other video frames in the storyboard clip except the target video frames;

determining the clipping path corresponding to the storyboard clip according to the position coordinates corresponding to each video frame in the storyboard clip.
The method according to claim 2, wherein the determining the position coordinates of the target cropping frame in the width direction or the length direction of the target video frame comprises:

If it is determined to cut along the width direction according to the size information of the original video and the size information of the target cropping frame, then determine the position coordinates of the target cropping frame in the width direction of the target video frame;

If it is determined to perform cropping along the length direction according to the size information of the original video and the size information of the target cropping frame, the position coordinates of the target cropping frame in the length direction of the target video frame are determined.
The method according to claim 2, wherein the performing of interpolation calculation according to the position coordinates of each target video frame, so as to obtain the position coordinates of the target cropping frame corresponding to the other video frames in the storyboard clip except the target video frames, comprises:

determining an objective function according to the position coordinates of each target video frame, wherein the objective function includes a plurality of piecewise functions, each piecewise function is determined according to the position coordinates of two adjacent target video frames, each piecewise function and the objective function are cubic equations whose independent variable is time and whose dependent variable is the position coordinate, and the first-order derivative and second-order derivative of the objective function are continuous in time;

determining the position coordinates of the target cropping frame corresponding to the other video frames according to the objective function and the times corresponding to the other video frames in the storyboard clip except the target video frames.
The method according to any one of claims 2-4, wherein the performing of interpolation calculation according to the position coordinates of each target video frame, so as to obtain the position coordinates of the target cropping frame corresponding to the other video frames in the storyboard clip except the target video frames, comprises:

performing smoothing filtering on the position coordinates of each target video frame to obtain the smooth position coordinates of each target video frame;

performing interpolation calculation according to the smooth position coordinates of each target video frame, so as to obtain the position coordinates of the target cropping frame corresponding to the other video frames in the storyboard clip except the target video frames.
The method according to any one of claims 1-5, wherein the performing of storyboard detection on the original video to determine storyboard clips in the original video comprises:

performing storyboard detection on the original video by the frame difference method to determine the storyboard clips in the original video; or, inputting the original video into a pre-trained storyboard detection model, and determining the storyboard clips in the original video according to the output result of the storyboard detection model, wherein the storyboard detection model is trained on sample videos and the sample storyboard clips corresponding to the sample videos.
The method according to any one of claims 1-6, wherein the cropping of the original video according to the size information of the target cropping frame and the clipping path corresponding to each storyboard clip to obtain the cropped target video comprises:

for each storyboard clip, cropping each video frame in the storyboard clip according to the size information of the target cropping frame and the clipping path corresponding to the storyboard clip;

splicing the cropped video frames in time order to obtain the cropped target video.
A video cropping device, the device comprising:

an acquisition module, configured to acquire the size information of the original video to be cropped and the target cropping frame;

a first determining module, configured to perform storyboard detection on the original video to determine storyboard clips in the original video;

a second determining module, configured to, for each storyboard clip, determine a clipping path corresponding to the storyboard clip according to the main content of the target video frames in the storyboard clip, where the target video frames are some or all of the video frames in the storyboard clip, and the clipping path is used to represent the position movement path of the target cropping frame, along the width direction or the length direction of the video frames, across all the video frames included in the storyboard clip;

a cropping module, configured to crop the original video according to the size information of the target cropping frame and the clipping path corresponding to each storyboard clip, so as to obtain a cropped target video.
A computer-readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the method of any one of claims 1-7.
An electronic device comprising:

a storage device on which a computer program is stored;

A processing device configured to execute the computer program in the storage device to implement the steps of the method of any one of claims 1-7.
A computer program product comprising a computer program which, when executed by a processing device, implements the steps of the method of any one of claims 1-7.