CN112561840B - Video clipping method and device, storage medium and electronic equipment

Video clipping method and device, storage medium and electronic equipment

Info

Publication number
CN112561840B
Authority
CN
China
Prior art keywords
frame
target
video
cutting
original video
Prior art date
Legal status
Active
Application number
CN202011405847.XA
Other languages
Chinese (zh)
Other versions
CN112561840A (en)
Inventor
吴昊
王长虎
Current Assignee
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd
Priority to CN202011405847.XA
Publication of CN112561840A
Priority to PCT/CN2021/134119 (published as WO2022116947A1)
Application granted
Publication of CN112561840B
Legal status: Active

Classifications

    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/11 Region-based segmentation
    • G06T 7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20224 Image subtraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Multimedia (AREA)
  • Television Signal Processing For Recording (AREA)
  • Image Processing (AREA)

Abstract

The disclosure relates to a video cropping method and device, a storage medium, and an electronic device, which reduce the time overhead of video cropping and improve cropping efficiency. The video cropping method comprises the following steps: acquiring an original video to be cropped and a target crop size; determining an initial crop box according to the subject content of a plurality of target video frames in the original video; determining, according to the difference between the size of the initial crop box and the target crop size and the variation of the subject content across the plurality of target video frames, whether every frame of the original video is to be cropped with the same crop box; if so, determining, among a plurality of candidate crop boxes each having the same size as the initial crop box, a target crop box that includes the most subject content in the original video; and cropping the original video according to the target crop box.

Description

Video clipping method and device, storage medium and electronic equipment
Technical Field
The disclosure relates to the technical field of video processing, in particular to a video clipping method, a video clipping device, a storage medium and electronic equipment.
Background
Video cropping is required in scenarios where the playback size differs from the size of the original video. A related-art video cropping algorithm crops each video frame with a crop box of the target playback size according to the content of the original video and then reassembles the cropped frames into a new video. Specifically, a cropping position is planned for every video frame by a dynamic-programming optimization algorithm, the cropping positions are smoothed by interpolation, and the cropped frames are finally video-encoded to form a new video. The overall time overhead is large, so this approach does not serve low-latency video cropping scenarios well.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a video cropping method, the method comprising:
acquiring an original video to be cropped and a target crop size;
determining an initial crop box according to the subject content of a plurality of target video frames in the original video;
determining whether every frame of the original video is to be cropped with the same crop box, according to the difference between the size of the initial crop box and the target crop size and the variation of the subject content across the plurality of target video frames;
if it is determined that every frame of the original video is to be cropped with the same crop box, determining, among a plurality of candidate crop boxes, a target crop box that includes the most subject content in the original video, wherein the size of each candidate crop box is consistent with the size of the initial crop box;
and cropping the original video according to the target crop box.
In a second aspect, the present disclosure provides a video cropping device, the device comprising:
an acquisition module, configured to acquire an original video to be cropped and a target crop size;
a first determining module, configured to determine an initial crop box according to the subject content of a plurality of target video frames in the original video;
a second determining module, configured to determine whether every frame of the original video is to be cropped with the same crop box, according to the difference between the size of the initial crop box and the target crop size and the variation of the subject content across the plurality of target video frames;
a third determining module, configured to determine, when it is determined that every frame of the original video is to be cropped with the same crop box, a target crop box that includes the most subject content in the original video among a plurality of candidate crop boxes, wherein the size of each candidate crop box is consistent with the size of the initial crop box;
and a cropping module, configured to crop the original video according to the target crop box.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which when executed by a processing device implements the steps of the method described in the first aspect.
In a fourth aspect, the present disclosure provides an electronic device comprising:
A storage device having a computer program stored thereon;
Processing means for executing said computer program in said storage means to carry out the steps of the method described in the first aspect.
According to the above technical solution, during video cropping it can first be determined whether every frame of the original video is to be cropped with the same crop box. If so, neither the dynamic-programming computation of a cropping path nor the interpolation smoothing is needed, which saves time and improves cropping efficiency. Moreover, once it is determined that every frame is cropped with the same crop box, a target crop box that includes the most subject content in the original video can be selected from a plurality of candidate crop boxes, so that important information in the original video is retained to the greatest extent while time is saved, content loss in the cropped video is avoided, and the cropping effect is guaranteed.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flowchart illustrating a video cropping method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a block diagram of a video cropping device according to an exemplary embodiment of the present disclosure;
FIG. 3 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that the terms "first", "second", and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order of, or interdependence between, the functions performed by these devices, modules, or units. It should further be noted that the modifiers "a", "an", and "a plurality of" in this disclosure are illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
As described in the background, a related-art video cropping algorithm crops each video frame with a crop box of the target playback size according to the content of the original video and then reassembles the cropped frames into a new video. The inventors found that, because a cropping position must be planned for every video frame and interpolation smoothing and video encoding must then be performed, the time overhead of this related-art method is large and it cannot serve low-latency video cropping scenarios well. The inventors further found that the subject content of the video to be cropped may be relatively static (for example, a portrait video, or a video of a person telling a story), in which case there is no need to determine a crop box separately for each frame. A cropping scheme that fixes the cropping position for all frames is better suited to scenarios that require low latency and whose subject content does not vary much.
In view of this, the present disclosure provides a video cropping method and device, a storage medium, and an electronic device. During video cropping it is first determined whether a fixed-position cropping scheme is applicable; if it is, the position of an optimal crop box is computed from the subject content of the video and every frame of the video is cropped with that crop box, so that important information in the video is retained to the greatest extent while time overhead is saved.
Fig. 1 is a flowchart illustrating a video cropping method according to an exemplary embodiment of the present disclosure. Referring to fig. 1, the video cropping method includes:
Step 101, acquiring an original video to be cropped and a target crop size.
For example, a user may enter a URL (Uniform Resource Locator) corresponding to the original video on an electronic device, and the electronic device may then download the original video from the corresponding resource server according to the URL for cropping. Alternatively, the electronic device may, in response to a video cropping request triggered by the user, retrieve a stored video from memory as the original video to be cropped. The embodiments of the disclosure do not limit how the original video is acquired.
For example, the target crop size defines the width and length of the cropped video and may be determined according to the playback size of the video playback device. For example, if the original video is 720 × 1280 pixels and the playback aspect ratio of the playback device is 1:1, the target crop size may be determined to be 720 × 720 pixels. Alternatively, the target crop size may be customized according to actual service requirements; the embodiments of the present disclosure do not limit this.
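The following sketch (not part of the patent; the function name and the fit-inside-the-frame rule are illustrative assumptions) shows one way a target crop size could be derived from the original frame size and a playback aspect ratio, reproducing the 720 × 1280 to 720 × 720 example above:

```python
def target_crop_size(orig_w, orig_h, play_aspect_w, play_aspect_h):
    """Derive a target crop size that fits inside the original frame
    while matching the playback aspect ratio (e.g. 1:1)."""
    # Try to keep the full width first; shrink whichever side overflows.
    target_w = orig_w
    target_h = orig_w * play_aspect_h // play_aspect_w
    if target_h > orig_h:
        target_h = orig_h
        target_w = orig_h * play_aspect_w // play_aspect_h
    return target_w, target_h

# Example from the description: a 720x1280 source and a 1:1 playback size
# yield a 720x720 target crop size.
print(target_crop_size(720, 1280, 1, 1))  # (720, 720)
```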
Step 102, determining an initial crop box according to the subject content of a plurality of target video frames in the original video.
The plurality of target video frames may be obtained by frame extraction from the original video and may be some or all of the video frames in the original video. That is, frame extraction may take only a portion of the video frames of the original video or all of them; the embodiments of the disclosure do not limit this.
For example, the subject content may be the main picture content that occupies most of the image area; in a video of a person telling a story, for instance, the person is the subject content of the target video frame. For each target video frame, at least one of the following detections may be performed to determine the subject content: saliency detection, face detection, text detection, and logo detection. Saliency detection locates the main component of the target video frame. Face detection locates faces in the target video frame. Text detection locates text in the target video frame and recognizes its content. Logo detection locates content such as logos and watermarks in the target video frame. In addition, before the subject content is detected, border detection may be performed on the target video frame and useless borders such as black edges and Gaussian-blurred margins removed, which improves the accuracy of the subsequent subject-content detection.
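As an illustration of this step, the sketch below uses OpenCV's fine-grained static saliency detector and a Haar face cascade as stand-ins for the saliency and face detection mentioned above, plus a rough black-border stripper; it assumes opencv-contrib-python is installed, and the function names are illustrative rather than taken from the patent:

```python
import cv2
import numpy as np

# Illustrative detector choices; the patent does not prescribe specific models.
_saliency = cv2.saliency.StaticSaliencyFineGrained_create()
_faces = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_subject(frame_bgr):
    """Return a saliency map and face boxes describing the frame's subject content."""
    ok, sal = _saliency.computeSaliency(frame_bgr)
    sal = (sal * 255).astype("uint8") if ok else np.zeros(frame_bgr.shape[:2], "uint8")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = _faces.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return sal, faces

def strip_black_borders(frame_bgr, thresh=8):
    """Drop near-black rows/columns before subject detection (rough heuristic)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    rows = np.where(gray.max(axis=1) > thresh)[0]
    cols = np.where(gray.max(axis=0) > thresh)[0]
    if rows.size == 0 or cols.size == 0:
        return frame_bgr
    return frame_bgr[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```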
After the subject content of the plurality of target video frames in the original video is determined, a crop box that can include the subject content of the plurality of target video frames may be determined as the initial crop box, and the initial crop box is then used to decide whether the original video is suitable for a fixed-crop-box cropping scheme.
Step 103, determining whether every frame of the original video is to be cropped with the same crop box, according to the difference between the size of the initial crop box and the target crop size and the variation of the subject content across the plurality of target video frames.
Illustratively, as described above, the initial crop box may be a crop box that can include the subject content of the plurality of target video frames. The difference between the size of the initial crop box and the target crop size therefore gives a preliminary indication of whether the original video is suitable for a fixed-crop-box cropping scheme. Specifically, if the initial crop box differs greatly from the target crop size, then in subsequent processing a large amount of padding (such as black bars) would be needed to bring the cropped video to the target crop size, and the cropped video would look poor; in that case it is not appropriate to crop every frame of the original video with the same crop box. Otherwise, every frame of the original video can be cropped with the same crop box. In addition, when the subject content of the plurality of target video frames varies greatly, cropping every frame of the original video with the same crop box could lose most of the content of some frames, so a single crop box is again not appropriate. Otherwise, every frame of the original video can be cropped with the same crop box.
Step 104, if it is determined that every frame of the original video is to be cropped with the same crop box, determining, among a plurality of candidate crop boxes, a target crop box that includes the most subject content in the original video. The size of each candidate crop box is consistent with the size of the initial crop box.
For example, the initial crop box is a crop box that can include the subject content of the plurality of target video frames in the original video, which indicates that its size is suitable for cropping the original video. Therefore, when determining the target crop box, a plurality of candidate crop boxes whose size is consistent with that of the initial crop box may be determined first. The plurality of candidate crop boxes may include the initial crop box itself.
For example, the plurality of candidate crop boxes may be determined as follows. For each frame, a preset crop box with the same size as the initial crop box, placed so that it coincides with at least one edge of the frame, is taken as the first candidate crop box. The candidate crop box is then shifted by a preset position offset to obtain a new candidate position, and this is repeated until the boundary of the candidate crop box reaches or exceeds the boundary of the frame. The preset position offset may be set as needed; for example, with an offset of 20 pixels, the first candidate crop box is shifted laterally (or longitudinally) by 20 pixels to obtain a new candidate, that candidate is shifted by another 20 pixels, and so on until the boundary of the candidate crop box reaches or exceeds the boundary of the frame. A plurality of candidate crop boxes is thus obtained.
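A minimal sketch of this candidate enumeration, assuming the crop box fits inside the frame; the 20-pixel step mirrors the example above and the function name is illustrative:

```python
def candidate_crop_boxes(frame_w, frame_h, crop_w, crop_h, step=20):
    """Enumerate candidate crop-box positions: start flush with one edge and
    slide by `step` pixels until the box reaches the opposite edge.
    Assumes crop_w <= frame_w and crop_h <= frame_h."""
    xs = list(range(0, max(frame_w - crop_w, 0) + 1, step))
    ys = list(range(0, max(frame_h - crop_h, 0) + 1, step))
    # Make sure the final, edge-aligned position is always included.
    if xs[-1] != frame_w - crop_w:
        xs.append(frame_w - crop_w)
    if ys[-1] != frame_h - crop_h:
        ys.append(frame_h - crop_h)
    return [(x, y, crop_w, crop_h) for y in ys for x in xs]
```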
It should be understood that if it is determined that the original video is cropped with the same crop box, the subject content of the original video does not change much, so a crop box that includes the most subject content of any one frame effectively includes the most subject content of the whole original video. Accordingly, among the plurality of candidate crop boxes, the target crop box that includes the most subject content of the frames of the original video can be determined according to how much of that subject content each candidate crop box contains, so that the target crop box encloses as much of the important information in the video as possible and the content loss of the cropped video is reduced.
Step 105, cropping the original video according to the target crop box.
For example, the video cropping method provided by the present disclosure may be applied to a server. After the target crop box is determined, the server may crop every frame of the original video according to the target crop box, stitch the cropped frames into a video in chronological order, and finally send the stitched (i.e., cropped) video to a front-end device (such as a video playback device) for playback.
Alternatively, in one possible implementation, since the whole video needs only one crop box, video encoding can be bypassed and the crop-box information handed directly to the front-end device, which then plays the video according to that information, saving the time cost of video encoding. That is, the size information and position information of the target crop box may be sent to the video playback device so that the playback device crops the original video according to the target crop box. The size information defines the length and width of the target crop box, and the position information defines its coordinates along the width and/or length direction of the video frame. After receiving the size and position information of the target crop box, the video playback device plays only the video content inside the target crop box, thereby achieving the cropping.
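The sketch below illustrates the kind of crop-box message a server might hand to the playback device instead of re-encoding; the field names and JSON encoding are assumptions for illustration, not a format defined by the patent:

```python
import json

def crop_box_message(x, y, width, height, video_id):
    """Illustrative payload: position and size of the single crop box that the
    player applies to every frame, so no server-side re-encode is needed."""
    return json.dumps({
        "video_id": video_id,
        "crop": {"x": x, "y": y, "width": width, "height": height},
    })

print(crop_box_message(0, 280, 720, 720, "example-video"))
```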
In this way, during video cropping it can be determined whether every frame of the original video is to be cropped with the same crop box; if so, neither the dynamic-programming computation of a cropping path nor the interpolation smoothing is needed, which saves time and improves cropping efficiency. Specifically, in the inventors' tests, the video cropping method provided by the disclosure is roughly ten times faster than the related-art approach of planning a cropping position for every frame, which significantly alleviates the time consumption of related-art video cropping.
In addition, according to the video cropping method provided by the disclosure, once it is determined that every frame of the original video is cropped with the same crop box, a target crop box that includes the most subject content in the original video can be selected from a plurality of candidate crop boxes, so that important information in the original video is retained to the greatest extent while time is saved, content loss in the cropped video is avoided, and the cropping effect is guaranteed. Video encoding can also be bypassed by handing the crop-box information directly to the front-end device, which plays the video according to that information, saving the time cost of video encoding and further reducing the overall overhead.
To help those skilled in the art better understand the video cropping method provided in the present disclosure, the above steps are described in detail below.
In a possible implementation, the initial crop box may be determined from the subject content of the plurality of target video frames in the original video as follows. For each target video frame in the original video, a saliency detection result, which characterizes the distribution of the subject content in the frame, and the circumscribed rectangle of the largest connected region in the frame are determined. Then, using the saliency detection results as weights, the widths of the circumscribed rectangles of the target video frames are summed with weighting to obtain a weighted width, and the ratio of the weighted width to the sum of the saliency detection results is taken as the width of the initial crop box; and/or, using the saliency detection results as weights, the lengths of the circumscribed rectangles are summed with weighting to obtain a weighted length, and the ratio of the weighted length to the sum of the saliency detection results is taken as the length of the initial crop box.
Illustratively, saliency detection simulates human visual characteristics with an algorithm and extracts the salient regions of an image (i.e., the regions people attend to). The saliency detection result may, for example, be the mean pixel value over all pixels of the salient region obtained by saliency detection, used as the weight; the embodiments of the disclosure do not limit this. In practice, saliency detection may be performed with a graph-based attention model, a frequency-domain attention model, or the like to obtain the saliency detection result of a target video frame. A connected region is an image region composed of adjacent foreground pixels with the same pixel value and can be used to determine the contour regions of different targets in the image; the circumscribed rectangle of the largest connected region can therefore enclose the contour of the largest target in the image, i.e., the subject content of the image.
For example, for the plurality of target video frames in the original video, the corresponding saliency detection results are determined as $S_1, S_2, \ldots, S_k$, where $k$ is the number of target video frames. The width of the circumscribed rectangle of the largest connected region of the $i$-th target video frame is $W_i = W(\mathrm{convex}(S_i))$, where $\mathrm{convex}$ denotes the function that determines the circumscribed rectangle of the largest connected region of a target video frame from its saliency detection result; its computation is the same as in the related art and is not repeated here. Similarly, the length of the circumscribed rectangle of the largest connected region of the $i$-th target video frame is $H_i = H(\mathrm{convex}(S_i))$. The width of the initial crop box can then be calculated as

$$W = \frac{\sum_{i=1}^{k} S_i \cdot W_i}{\sum_{i=1}^{k} S_i}$$

and the length of the initial crop box as

$$H = \frac{\sum_{i=1}^{k} S_i \cdot H_i}{\sum_{i=1}^{k} S_i}$$
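A sketch of this weighted-average computation, assuming the saliency weight S_i is taken as the mean of frame i's saliency map (one possible choice per the description above) and using OpenCV connected components for the largest connected region; helper names are illustrative:

```python
import cv2
import numpy as np

def largest_component_rect(saliency_map):
    """Bounding rectangle (w, h) of the largest connected region of the
    binarized saliency map (expects an 8-bit single-channel map)."""
    _, binary = cv2.threshold(saliency_map, 0, 255,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    if n <= 1:  # no foreground found: fall back to the full frame
        return saliency_map.shape[1], saliency_map.shape[0]
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    return stats[largest, cv2.CC_STAT_WIDTH], stats[largest, cv2.CC_STAT_HEIGHT]

def initial_crop_size(saliency_maps):
    """W = sum(S_i * W_i) / sum(S_i), H = sum(S_i * H_i) / sum(S_i)."""
    S = np.array([m.mean() for m in saliency_maps], dtype=np.float64)
    rects = [largest_component_rect(m) for m in saliency_maps]
    W = sum(s * w for s, (w, _) in zip(S, rects)) / S.sum()
    H = sum(s * h for s, (_, h) in zip(S, rects)) / S.sum()
    return int(round(W)), int(round(H))
```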
It should be appreciated that both the length and the width of each frame of the original video may be cropped according to the target crop size, in which case the length and the width of the initial crop box may both be determined as described above. Alternatively, to improve efficiency, cropping may be performed along only the length or only the width of each frame, depending on the target crop size and the size of the original video. For example, for a target crop aspect ratio of 1:1 and an original video of 720 × 1280 pixels, the frame may be cropped along its length (the y-axis direction) to obtain a 720 × 720 pixel video. Accordingly, when determining the initial crop box, the width may be determined as described above and the length taken from the original video, or the length may be determined as described above and the width taken from the original video.
In this way, the size of the initial crop box can be determined from the saliency detection result and the circumscribed rectangle of the largest connected region of each target video frame, so that the initial crop box includes most of the subject content of every target video frame and content loss in the cropped video is avoided.
In a possible implementation, in step 103, whether every frame of the original video is to be cropped with the same crop box may be determined from the difference between the size of the initial crop box and the target crop size and the variation of the subject content across the plurality of target video frames as follows. A length difference between the length of the initial crop box and the length defined by the target crop size, a width difference between the width of the initial crop box and the width defined by the target crop size, and the standard deviation of the circumscribed rectangles of the largest connected regions of the target video frames are determined. A cropping-policy decision value is then determined from the length difference, the width difference, and the standard deviation. If the decision value is smaller than a preset threshold, it is determined that every frame of the original video is cropped with the same crop box; if the decision value is greater than or equal to the preset threshold, it is determined that the frames of the original video are not cropped with the same crop box. The preset threshold may be set according to actual service requirements, which the embodiments of the present disclosure do not limit.
For example, the length difference may be the absolute value of the difference between the length of the initial crop box and the length defined by the target crop size, and the width difference may be the absolute value of the difference between the width of the initial crop box and the width defined by the target crop size. The standard deviation of the circumscribed rectangles of the largest connected regions of the target video frames characterizes how much the subject content of the whole video varies. It may be the standard deviation of the widths of the circumscribed rectangles, of their lengths, or of their areas, and so on; the embodiments of the disclosure do not limit this.
For example, the cropping-policy decision value may be determined from the length difference, the width difference, and the standard deviation as

$$\mathrm{score} = \lambda_1\left(\lvert W - W_{out}\rvert + \lvert H - H_{out}\rvert\right) + \lambda_2\sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(R_i - \bar{R}\right)^2}$$

where score is the cropping-policy decision value, $\lambda_1$ and $\lambda_2$ are preset weights, $\lvert W - W_{out}\rvert$ is the width difference, $\lvert H - H_{out}\rvert$ is the length difference, $W_{out}$ is the width defined by the target crop size, $H_{out}$ is the length defined by the target crop size, $R_i$ denotes the circumscribed rectangle of the largest connected region of the $i$-th target video frame (in practice $R_i$ may be the width or the length of that rectangle), and $\bar{R}$ is correspondingly the mean width or mean length of the circumscribed rectangles of the $k$ target video frames.
In a possible implementation, to make the comparison of the cropping-policy decision value with the preset threshold easier, the length difference, the width difference, and the standard deviation may also be normalized, and the decision value determined from the normalized quantities. For example, the length difference may be normalized by the length defined by the target crop size to obtain a target length difference, the width difference may be normalized by the width defined by the target crop size to obtain a target width difference, and the standard deviation may be normalized by the width or the length of the initial crop box to obtain a target standard deviation. The cropping-policy decision value is then determined from the target length difference, the target width difference, and the target standard deviation.
For example, if the standard deviation is computed over the widths of the circumscribed rectangles, the cropping-policy decision value may be determined from the length difference, the width difference, and the standard deviation as

$$\mathrm{score} = \lambda_1\left(\frac{\lvert W - W_{out}\rvert}{W_{out}} + \frac{\lvert H - H_{out}\rvert}{H_{out}}\right) + \frac{\lambda_2}{W}\sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(R_i - \bar{R}\right)^2}$$

or, if the standard deviation is computed over the lengths of the circumscribed rectangles, as

$$\mathrm{score} = \lambda_1\left(\frac{\lvert W - W_{out}\rvert}{W_{out}} + \frac{\lvert H - H_{out}\rvert}{H_{out}}\right) + \frac{\lambda_2}{H}\sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(R_i - \bar{R}\right)^2}$$

where $\lvert W - W_{out}\rvert / W_{out}$ is the normalized width difference (i.e., the target width difference), $\lvert H - H_{out}\rvert / H_{out}$ is the normalized length difference (i.e., the target length difference), and the last term in each formula is the normalized standard deviation (i.e., the target standard deviation).
It should be appreciated that a change in video size does not change the video content: if the video size changes from 720 × 720 pixels to 1280 × 1280 pixels, the resolution changes but the content is essentially unchanged. In that case, however, the standard deviation of the circumscribed rectangles scales with the video size (in this example it grows by a factor of about 1.8), so the cropping-policy decision value would increase and the decision would become inaccurate. By normalizing the standard deviation by the length or width of the initial crop box as described above, this size-induced change in the decision value is avoided and a more accurate decision is obtained. In addition, normalization yields a decision value on the same numerical scale as the preset threshold, which makes the comparison with the threshold, and hence the decision of whether to crop every frame of the original video with the same crop box, more convenient.
For example, if the cropping-policy decision value is smaller than the preset threshold, the size of the initial crop box differs little from the target crop size, so the cropped video needs little padding and looks good; moreover, the subject content of the target video frames does not vary much, so a fixed crop box can include the subject content of every frame of the original video and content loss in the cropped video is avoided. In that case every frame of the original video can be cropped with the same crop box, obtaining a good cropping result while saving time. Conversely, if the decision value is greater than or equal to the preset threshold, the size of the initial crop box differs considerably from the target crop size, so the cropped video would need more padding and would look poor; in addition, the subject content of the target video frames varies greatly, so a fixed crop box can hardly include the subject content of every frame and the cropped video may lose content. In that case it is determined that the frames of the original video are not cropped with the same crop box.
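A minimal sketch of the normalized decision value in the width-based form given above, with λ1 and λ2 left as tuning parameters; the threshold value shown is an illustrative assumption, not a value from the patent:

```python
import numpy as np

def crop_policy_score(W, H, W_out, H_out, rect_widths, lam1=1.0, lam2=1.0):
    """Normalized cropping-policy decision value: size mismatch between the
    initial crop box (W, H) and the target size (W_out, H_out), plus the
    normalized spread of the subject bounding-rectangle widths across frames.
    lam1 and lam2 are preset weights."""
    size_term = abs(W - W_out) / W_out + abs(H - H_out) / H_out
    spread_term = float(np.std(rect_widths)) / W  # population std, as in the formula
    return lam1 * size_term + lam2 * spread_term

def use_fixed_crop_box(score, threshold=0.2):
    """Smaller score means a single fixed crop box is acceptable.
    The threshold here is illustrative and would be tuned per service."""
    return score < threshold
```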
In this way, the difference between the size of the initial crop box and the target crop size, and the degree to which the position of the subject content in the original video varies, are both assessed to decide whether a fixed-crop-box cropping policy is applied to the original video. If it is, neither the dynamic-programming computation of a cropping path nor the interpolation smoothing is needed, which saves time and improves cropping efficiency.
In a possible implementation, the target crop box that includes the most subject content in the original video may be determined among the plurality of candidate crop boxes as follows. For each of the N frames of the original video, a cost function is computed from the subject content of that frame and the M candidate crop boxes corresponding to that frame, where the cost function characterizes how much of the frame's subject content each candidate crop box contains, and N and M are positive integers. Then, according to the values of the cost function, the target crop box that includes the most subject content is determined among the N × M candidate crop boxes.
For example, following the candidate-crop-box construction described above (such as generating one candidate every 20 pixels), M candidate crop boxes can be determined per frame. If the original video contains N frames, N × M candidate crop boxes are obtained. Accordingly, a cost function may be computed from the subject content of each frame and the N × M candidate crop boxes, and the target crop box that includes the most subject content in the original video is determined among them according to the values of the cost function.
For example, the cost function may take the following form:

$$f(C) = \sum_{i=1}^{N}\left(A(I_i) - A_i(C)\right)$$

where $f$ is the value of the cost function, $A_i(C)$ is the subject content of the $i$-th frame that falls inside the candidate crop box $C$, and $A(I_i)$ is the complete subject content of the $i$-th frame.
In this case, the candidate crop box that minimizes the cost function may be taken as the target crop box. Its position can be found by enumeration: the cropping positions of the N × M candidate crop boxes are discretized, the cost function is evaluated at each candidate position, and the candidate crop box with the smallest cost is selected.
As another example, the cost function may take the following form:

$$f(C) = \sum_{i=1}^{N} A_i(C)$$

In this case, the candidate crop box that maximizes the cost function may be taken as the target crop box. Similarly, its position can be found by enumeration: the cropping positions of the N × M candidate crop boxes are discretized, the cost function is evaluated at each candidate position, and the candidate crop box with the largest value is selected.
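A sketch of the enumeration step, assuming binarized saliency maps stand in for the per-frame subject content and that all frames share the same size, so one candidate set serves every frame; it minimizes the lost-content form of the cost function given above, and the helper names are illustrative:

```python
def subject_area(saliency_binary, box):
    """Subject content (number of foreground saliency pixels) inside `box`."""
    x, y, w, h = box
    return int(saliency_binary[y:y + h, x:x + w].sum())

def select_target_crop_box(saliency_binaries, candidates):
    """Pick the candidate crop box that loses the least subject content over
    all N frames, i.e. minimizes f(C) = sum_i (A(I_i) - A_i(C))."""
    totals = [int(m.sum()) for m in saliency_binaries]  # A(I_i) per frame
    best_box, best_cost = None, float("inf")
    for box in candidates:
        cost = sum(total - subject_area(m, box)
                   for m, total in zip(saliency_binaries, totals))
        if cost < best_cost:
            best_box, best_cost = box, cost
    return best_box
```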
In this way, the cost function can be computed from the subject content of each frame and the candidate crop boxes corresponding to each frame, so that the target crop box that includes the most subject content is determined among the candidate crop boxes; important information in the video is thus retained to the greatest extent while the time cost of cropping is saved, and content loss in the cropped video is reduced.
In a possible implementation, for each frame of the original video, the complete subject content of the frame, the subject content along the length direction (length subject content), and the subject content along the width direction (width subject content) may also be determined and cached. Accordingly, computing the cost function from the subject content of a frame and the M candidate crop boxes corresponding to that frame may proceed as follows. If, given the size of the original video and the target crop size, the original video is cropped along the length direction, the cached complete subject content and width subject content of the frame are retrieved and the cost function is computed from the complete subject content, the width subject content, and the M candidate crop boxes. If the original video is cropped along the width direction, the cached complete subject content and length subject content of the frame are retrieved and the cost function is computed from the complete subject content, the length subject content, and the M candidate crop boxes.
It should be appreciated that computing the cost function requires a quantity that characterizes the complete subject content of a frame (the term A(I_i) in the cost function above). In the embodiments of the disclosure, the complete subject content of each frame can be determined in advance and cached, so that in later evaluations of the cost function the cached value is retrieved directly instead of being recomputed each time, which further saves time and computation and improves cropping efficiency.
It should also be appreciated that, depending on the target crop size and the size of the original video, cropping may be performed along only the length or only the width of each frame. In that case the pixels along the direction that is not cropped are unchanged, so the length subject content (along the length direction) and the width subject content (along the width direction) of each frame of the original video can be determined in advance and cached. When the cost function later needs the subject content included in a candidate crop box (the term A_i(C) in the cost function above), the cached subject content along the uncropped dimension can be retrieved directly instead of being recomputed each time, saving time and computation.
For example, if it is determined from the size of the original video and the target crop size that cropping is performed along the length direction, the pixels along the width direction are unchanged; during the computation of the cost function the cached width subject content can therefore be retrieved directly and combined with the subject content over the actual length covered by the candidate crop box to obtain the subject content included in that candidate crop box (the term A_i(C) above). Similarly, if cropping is performed along the width direction, the pixels along the length direction are unchanged, so the cached length subject content can be retrieved directly and combined with the subject content over the actual width covered by the candidate crop box to obtain the subject content included in that candidate crop box.
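A sketch of this caching idea, again assuming binarized saliency maps stand in for the subject content; helper names are illustrative. The column profile serves crops that keep the full height and slide along the width, and the row profile serves crops that keep the full width and slide along the length:

```python
def build_subject_cache(saliency_binaries):
    """Pre-compute, per frame: the total subject content, a column profile
    (one entry per x position, each column summed over the frame height) and
    a row profile (one entry per y position, each row summed over the frame
    width), so later cost evaluations reuse them instead of re-scanning frames."""
    cache = []
    for m in saliency_binaries:
        cache.append({
            "total": int(m.sum()),
            "per_column": m.sum(axis=0),
            "per_row": m.sum(axis=1),
        })
    return cache

def content_in_width_slide(entry, x, crop_w):
    """Subject content of a crop that keeps the full height and starts at
    column x, computed from the cached column profile."""
    return int(entry["per_column"][x:x + crop_w].sum())

def content_in_length_slide(entry, y, crop_h):
    """Subject content of a crop that keeps the full width and starts at
    row y, computed from the cached row profile."""
    return int(entry["per_row"][y:y + crop_h].sum())
```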
Based on the same inventive concept, the present disclosure also provides a video cropping device, which may be part or all of an electronic device by means of software, hardware, or a combination of both. Referring to fig. 2, the video cropping device 200 may include:
an acquisition module 201, configured to acquire an original video to be cropped and a target crop size;
a first determining module 202, configured to determine an initial crop box according to the subject content of a plurality of target video frames in the original video;
a second determining module 203, configured to determine whether every frame of the original video is to be cropped with the same crop box, according to the difference between the size of the initial crop box and the target crop size and the variation of the subject content across the plurality of target video frames;
a third determining module 204, configured to determine, when it is determined that every frame of the original video is to be cropped with the same crop box, a target crop box that includes the most subject content in the original video among a plurality of candidate crop boxes, wherein the size of each candidate crop box is consistent with the size of the initial crop box;
and a cropping module 205, configured to crop the original video according to the target crop box.
Optionally, the cropping module 205 is configured to:
send the size information and position information of the target crop box to a video playback device, so that the video playback device crops the original video according to the target crop box.
Optionally, the first determining module 202 is configured to:
for each target video frame in the original video, determine a saliency detection result corresponding to the target video frame and the circumscribed rectangle of the largest connected region in the target video frame, where the saliency detection result characterizes the distribution of the subject content in the target video frame;
using the saliency detection results as weights, sum the widths of the circumscribed rectangles of the target video frames with weighting to obtain a weighted width, and take the ratio of the weighted width to the sum of the saliency detection results as the width of the initial crop box; and/or, using the saliency detection results as weights, sum the lengths of the circumscribed rectangles with weighting to obtain a weighted length, and take the ratio of the weighted length to the sum of the saliency detection results as the length of the initial crop box.
Optionally, the second determining module 203 is configured to:
determine a length difference between the length of the initial crop box and the length defined by the target crop size, a width difference between the width of the initial crop box and the width defined by the target crop size, and the standard deviation of the circumscribed rectangles of the largest connected regions of the target video frames;
determine a cropping-policy decision value according to the length difference, the width difference, and the standard deviation;
if the decision value is smaller than a preset threshold, determine that every frame of the original video is cropped with the same crop box; if the decision value is greater than or equal to the preset threshold, determine that the frames of the original video are not cropped with the same crop box.
Optionally, the second determining module 203 is further configured to:
normalize the length difference by the length defined by the target crop size to obtain a target length difference, normalize the width difference by the width defined by the target crop size to obtain a target width difference, and normalize the standard deviation by the width or length of the initial crop box to obtain a target standard deviation;
and determine the cropping-policy decision value according to the target length difference, the target width difference, and the target standard deviation.
Optionally, the third determining module 204 is configured to:
for each of the N frames of the original video, compute a cost function from the subject content of that frame and the M candidate crop boxes corresponding to that frame, where the cost function characterizes how much of the frame's subject content each candidate crop box contains, and N and M are positive integers;
and determine, among the N × M candidate crop boxes, the target crop box that includes the most subject content according to the values of the cost function.
Optionally, the device 200 further includes:
a caching module, configured to determine, for each frame of the original video, the complete subject content of the frame, the length subject content along the length direction, and the width subject content along the width direction, and to cache the complete subject content, the length subject content, and the width subject content;
and the third determining module 204 is configured to:
if, given the size of the original video and the target crop size, the original video is cropped along the length direction, retrieve the cached complete subject content and width subject content of the frame and compute the cost function from the complete subject content, the width subject content, and the M candidate crop boxes;
and if the original video is cropped along the width direction, retrieve the cached complete subject content and length subject content of the frame and compute the cost function from the complete subject content, the length subject content, and the M candidate crop boxes.
The specific manner in which the modules of the device in the above embodiments perform their operations has been described in detail in the embodiments of the method and is not repeated here.
Based on the same inventive concept, the embodiments of the present disclosure also provide a computer readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of any of the video cropping methods described above.
Based on the same inventive concept, the embodiments of the present disclosure further provide an electronic device, including:
A storage device having a computer program stored thereon;
Processing means for executing the computer program in the storage means to implement the steps of any of the video cropping methods described above.
Referring now to fig. 3, a schematic diagram of an electronic device 300 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 3 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various suitable actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data required for the operation of the electronic apparatus 300 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to bus 304.
In general, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 308 including, for example, magnetic tape, hard disk, etc.; and communication means 309. The communication means 309 may allow the electronic device 300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 shows an electronic device 300 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via a communication device 309, or installed from a storage device 308, or installed from a ROM 302. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing means 301.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, communications may be made using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring an original video to be cut and a target cutting size; determining an initial cutting frame according to the main body content of a plurality of target video frames in the original video; determining whether to cut each frame of the original video through the same cutting frame according to the difference between the size of the initial cutting frame and the target cutting size and the change condition of main body contents in the target video frames; if each frame of picture of the original video is determined to be cut through the same cutting frame, determining a target cutting frame capable of comprising the most main body content in the original video in a plurality of candidate cutting frames, wherein the size of each candidate cutting frame is consistent with the size of the initial cutting frame; and clipping the original video according to the target clipping frame.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or hardware. In some cases, the name of a module does not constitute a limitation of the module itself.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In accordance with one or more embodiments of the present disclosure, example 1 provides a video cropping method comprising:
Acquiring an original video to be cut and a target cutting size;
determining an initial cutting frame according to the main body content of a plurality of target video frames in the original video;
Determining whether to cut each frame of the original video through the same cutting frame according to the difference between the size of the initial cutting frame and the target cutting size and the change condition of main body contents in the target video frames;
If each frame of picture of the original video is determined to be cut through the same cutting frame, determining a target cutting frame capable of comprising the most main body content in the original video in a plurality of candidate cutting frames, wherein the size of each candidate cutting frame is consistent with the size of the initial cutting frame;
And clipping the original video according to the target clipping frame.
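Read together, the five steps of example 1 amount to the control flow sketched below. Every helper name here (compute_saliency, initial_crop_size, largest_salient_rect, crop_policy_decision, select_target_crop_box, apply_crop, dynamic_crop) is a hypothetical stand-in introduced only for illustration and roughly mirrors the later examples; this is a sketch of the flow, not the claimed implementation.

```python
def crop_video(frames, target_size, compute_saliency):
    """Sketch of the flow in example 1. `compute_saliency` is assumed to
    return a per-frame saliency map describing that frame's main content;
    the remaining helpers are hypothetical and mirror the later examples."""
    target_len, target_wid = target_size

    # Main content of the target video frames.
    saliency_maps = [compute_saliency(f) for f in frames]

    # Initial crop box derived from that main content (example 3).
    crop_len, crop_wid = initial_crop_size(saliency_maps)

    # Decide whether one shared crop box is enough, from the gap between the
    # initial box and the target size plus how much the main content moves
    # between frames (examples 4 and 5).
    rects = [largest_salient_rect(s) for s in saliency_maps]
    if crop_policy_decision(crop_len, crop_wid, target_len, target_wid, rects):
        # One fixed box: among candidates of the initial size, keep the box
        # that includes the most main content over the whole video (example 6).
        box = select_target_crop_box(saliency_maps, int(crop_len), int(crop_wid))
        return [apply_crop(f, box) for f in frames]

    # Otherwise fall back to a per-frame (dynamic) cropping strategy.
    return dynamic_crop(frames, saliency_maps, target_size)
```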
Example 2 provides the method of example 1, clipping the original video according to the target clipping frame, comprising:
And sending the size information and the position information of the target cutting frame to video playing equipment so that the video playing equipment cuts the original video according to the target cutting frame.
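Since only the target crop box itself is transmitted, the payload can stay very small; the field names below are illustrative assumptions rather than a format defined by the disclosure.

```python
import json

def crop_box_message(video_id, x, y, width, height):
    """Hypothetical metadata message: the video playing equipment receives only
    the size and position of the target crop box and applies the crop locally,
    so no re-encoded video has to be delivered."""
    return json.dumps({
        "video_id": video_id,
        "crop_box": {"x": x, "y": y, "width": width, "height": height},
    })

# For example: crop_box_message("v-001", x=420, y=0, width=1080, height=1080)
```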
According to one or more embodiments of the present disclosure, example 3 provides the method of example 1, the determining an initial cutting frame according to the main body content of a plurality of target video frames in the original video including:
For each target video frame in the original video, determining a saliency detection result corresponding to the target video frame and a circumscribed rectangle corresponding to a maximum connected domain in the target video frame, wherein the saliency detection result is used for representing the distribution condition of the main content in the target video frame;
Taking the saliency detection result as a weight value, carrying out weighted summation on the width of the circumscribed rectangle corresponding to each target video frame to obtain a width weighted result, and taking the ratio of the width weighted result to the sum of the saliency detection results as the width of the initial cutting frame; and/or taking the saliency detection result as a weight value, carrying out weighted summation on the length of the circumscribed rectangle corresponding to each target video frame to obtain a length weighted result, and taking the ratio of the length weighted result to the sum of the saliency detection results as the length of the initial cutting frame.
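As a concrete illustration of example 3, the sketch below (Python with OpenCV and NumPy) binarises each frame's saliency map, takes the circumscribed rectangle of the largest connected salient region, and averages the rectangle lengths and widths weighted by each frame's total saliency. The 0.5 binarisation threshold and the use of a frame's summed saliency as its weight are assumptions made for illustration; the disclosure does not fix these details.

```python
import cv2
import numpy as np

def initial_crop_size(saliency_maps):
    """Estimate the initial crop-box (length, width) from per-frame saliency maps.

    saliency_maps: list of 2-D float arrays in [0, 1], one per target video frame.
    The rectangle's horizontal extent is treated here as the "length" and its
    vertical extent as the "width" (an assumption about the axis naming).
    """
    weights, lengths, widths = [], [], []
    for sal in saliency_maps:
        # Binarise the saliency map and label its connected components.
        mask = (sal > 0.5).astype(np.uint8)
        n, _, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
        if n <= 1:                      # no salient region detected in this frame
            continue
        # Row 0 of stats is the background; keep the largest foreground component.
        idx = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
        lengths.append(float(stats[idx, cv2.CC_STAT_WIDTH]))   # horizontal extent
        widths.append(float(stats[idx, cv2.CC_STAT_HEIGHT]))   # vertical extent
        # Weight each frame by its total saliency (its saliency detection result).
        weights.append(float(sal.sum()))
    if not weights:
        raise ValueError("no salient content found in any target frame")
    w = np.asarray(weights)
    crop_len = float(np.dot(w, lengths) / w.sum())
    crop_wid = float(np.dot(w, widths) / w.sum())
    return crop_len, crop_wid
```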
According to one or more embodiments of the present disclosure, example 4 provides the method of example 1, the determining whether to clip each frame picture of the original video by the same clipping frame according to a difference between a size of the initial clipping frame and the target clipping size and a change in content of a main body in the plurality of target video frames, including:
Determining a length difference value between the length of the initial cutting frame and the length defined by the target cutting size, a width difference value between the width of the initial cutting frame and the width defined by the target cutting size, and a standard deviation of the circumscribed rectangle corresponding to the maximum connected domain in each target video frame;
determining a clipping strategy judgment value according to the length difference value, the width difference value and the standard deviation;
If the clipping strategy judgment value is smaller than a preset threshold value, determining that each frame of the original video is clipped through the same clipping frame, and if the clipping strategy judgment value is larger than or equal to the preset threshold value, determining that each frame of the original video is not clipped through the same clipping frame.
According to one or more embodiments of the present disclosure, example 5 provides the method of example 4, the determining a clipping policy decision value from the length difference value, the width difference value, and the standard deviation, comprising:
Normalizing the length difference according to the length limited by the target cutting size to obtain a target length difference, normalizing the width difference according to the width limited by the target cutting size to obtain a target width difference, and normalizing the standard deviation according to the width or length of the initial cutting frame to obtain a target standard deviation;
and determining the clipping strategy judgment value according to the target length difference value, the target width difference value and the target standard deviation.
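A minimal sketch of examples 4 and 5 follows, assuming the three normalised terms are combined as an unweighted sum and that the preset threshold is 0.3; the disclosure specifies only that the normalised length difference, width difference and standard deviation are combined into a judgment value and compared against a preset threshold.

```python
import numpy as np

def crop_policy_decision(init_len, init_wid, target_len, target_wid,
                         rects, threshold=0.3):
    """Decide whether every frame picture can be cropped with the same crop box.

    rects: per-frame (length, width) of the circumscribed rectangle of the
    largest connected salient region. Returns True when a single shared crop
    box is judged sufficient, False when per-frame cropping is needed.
    """
    lengths = np.array([r[0] for r in rects], dtype=float)
    widths = np.array([r[1] for r in rects], dtype=float)

    # Differences between the initial crop box and the target size, each
    # normalised by the corresponding target dimension (example 5).
    d_len = abs(init_len - target_len) / target_len
    d_wid = abs(init_wid - target_wid) / target_wid

    # Spread of the salient rectangles across frames, normalised by the
    # initial crop-box size: a large spread means the main content moves
    # or changes size too much for one fixed box.
    spread = 0.5 * (lengths.std() / init_len + widths.std() / init_wid)

    judgment = d_len + d_wid + spread      # unweighted combination (assumption)
    return judgment < threshold
```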
In accordance with one or more embodiments of the present disclosure, example 6 provides the method of any one of examples 1-5, the determining a target crop box capable of including the most subject content in the original video among a plurality of candidate crop boxes, comprising:
For each frame picture in the N frame pictures of the original video, calculating a cost function according to the main content in the frame picture and the M candidate cutting frames corresponding to the frame picture, wherein the cost function is used for representing how much of the main content in the frame picture is included by the candidate cutting frames, and N and M are positive integers;
and determining a target cutting frame capable of comprising the most main content in N times M candidate cutting frames according to the calculation result of the cost function.
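One way to realise example 6, sketched under the assumption that the main content a candidate box includes can be scored by the saliency it encloses: the candidates share the initial crop-box size and are enumerated on a regular grid (the stride is an illustrative choice), each candidate is scored per frame with an integral image, and the box with the largest total over all N frames becomes the target crop box.

```python
import numpy as np

def select_target_crop_box(saliency_maps, crop_len, crop_wid, stride=16):
    """Return (x, y, crop_len, crop_wid) of the candidate box, shared by all
    frames, that encloses the most main content over the whole video.

    saliency_maps: list of N 2-D arrays (one per frame picture); crop_len and
    crop_wid are the integer horizontal/vertical size of every candidate box.
    """
    h, w = saliency_maps[0].shape
    crop_len, crop_wid = min(int(crop_len), w), min(int(crop_wid), h)

    # Candidate positions on a regular grid (an illustrative enumeration of M boxes).
    xs = range(0, w - crop_len + 1, stride)
    ys = range(0, h - crop_wid + 1, stride)

    # Integral images make each per-box saliency sum O(1).
    integrals = [sal.cumsum(axis=0).cumsum(axis=1) for sal in saliency_maps]

    def box_sum(ii, x, y):
        x2, y2 = x + crop_len - 1, y + crop_wid - 1
        total = ii[y2, x2]
        if x > 0:
            total -= ii[y2, x - 1]
        if y > 0:
            total -= ii[y - 1, x2]
        if x > 0 and y > 0:
            total += ii[y - 1, x - 1]
        return float(total)

    best_box, best_score = None, float("-inf")
    for y in ys:
        for x in xs:
            # Enclosed saliency summed over the N frames; maximising it is the
            # same as minimising a cost that penalises excluded main content.
            score = sum(box_sum(ii, x, y) for ii in integrals)
            if score > best_score:
                best_box, best_score = (x, y, crop_len, crop_wid), score
    return best_box
```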
Example 7 provides the method of example 6, according to one or more embodiments of the present disclosure, the method further comprising:
For each frame of the original video, determining the complete main content, the length main content along the length direction and the width main content along the width direction of the frame, and caching the complete main content, the length main content and the width main content;
the calculating the cost function according to the main body content in the frame picture and the M preset candidate clipping frames corresponding to the frame picture includes:
If the original video is cut along the length direction according to the size of the original video and the target cutting size, acquiring the complete main body content and the width main body content corresponding to the cached frame picture, and calculating the cost function according to the complete main body content, the width main body content and the M candidate cutting frames;
And if the original video is cut along the width direction according to the size of the original video and the target cutting size, acquiring the complete main body content and the length main body content corresponding to the cached frame picture, and calculating the cost function according to the complete main body content, the length main body content and the M candidate cutting frames.
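A sketch of the caching in example 7, under the assumption that the complete main content is a frame's total saliency and the length/width main content are 1-D projections of the saliency onto the length and width axes. With those three values cached per frame, each candidate's cost reads only pre-computed numbers, and the cutting direction determines which projection is consumed, following the two branches described above.

```python
import numpy as np

def cache_frame_content(saliency):
    """Per-frame cache: the complete main content plus two 1-D profiles of it.
    Interpreting the length/width main content as saliency projections is an
    assumption made for this sketch."""
    return {
        "complete": float(saliency.sum()),
        "along_length": saliency.sum(axis=0),   # content profile along the length axis
        "along_width": saliency.sum(axis=1),    # content profile along the width axis
    }

def candidate_cost(cache, start, size, profile_key):
    """Cost of one candidate crop box: the share of the complete main content
    that falls outside the window [start, start + size) measured on the cached
    1-D profile named by profile_key ("along_length" or "along_width"). Which
    profile is used for a given cutting direction follows the branches above;
    only cached values are read per candidate."""
    profile = cache[profile_key]
    inside = float(profile[start:start + size].sum())
    return 1.0 - inside / max(cache["complete"], 1e-6)
```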
In accordance with one or more embodiments of the present disclosure, example 8 provides a video cropping device, the device comprising:
the acquisition module is used for acquiring an original video to be cut and a target cutting size;
The first determining module is used for determining an initial cutting frame according to the main body content of a plurality of target video frames in the original video;
The second determining module is used for determining whether each frame picture of the original video is cut through the same cutting frame according to the difference condition between the size of the initial cutting frame and the target cutting size and the change condition of main body content in the plurality of target video frames;
A third determining module, configured to determine, when determining that each frame of the original video is cropped by the same cropping frame, a target cropping frame capable of including the most main content in the original video among a plurality of candidate cropping frames, where a size of each of the candidate cropping frames is consistent with a size of the initial cropping frame;
And the clipping module is used for clipping the original video according to the target clipping frame.
Example 9 provides the apparatus of example 8, according to one or more embodiments of the disclosure, the clipping module to:
And sending the size information and the position information of the target cutting frame to video playing equipment so that the video playing equipment cuts the original video according to the target cutting frame.
According to one or more embodiments of the present disclosure, example 10 provides the apparatus of example 8, the first determining module to:
For each target video frame in the original video, determining a saliency detection result corresponding to the target video frame and a circumscribed rectangle corresponding to a maximum connected domain in the target video frame, wherein the saliency detection result is used for representing the distribution condition of the main content in the target video frame;
Taking the saliency detection result as a weight value, carrying out weighted summation on the width of the circumscribed rectangle corresponding to each target video frame to obtain a width weighted result, and taking the ratio of the width weighted result to the sum of the saliency detection results as the width of the initial cutting frame; and/or taking the saliency detection result as a weight value, carrying out weighted summation on the length of the circumscribed rectangle corresponding to each target video frame to obtain a length weighted result, and taking the ratio of the length weighted result to the sum of the saliency detection results as the length of the initial cutting frame.
According to one or more embodiments of the present disclosure, example 11 provides the apparatus of example 8, the second determining module to:
Determining a length difference value between the length of the initial cutting frame and the length defined by the target cutting size, a width difference value between the width of the initial cutting frame and the width defined by the target cutting size, and a standard deviation of the circumscribed rectangle corresponding to the maximum connected domain in each target video frame;
determining a clipping strategy judgment value according to the length difference value, the width difference value and the standard deviation;
If the clipping strategy judgment value is smaller than a preset threshold value, determining that each frame of the original video is clipped through the same clipping frame, and if the clipping strategy judgment value is larger than or equal to the preset threshold value, determining that each frame of the original video is not clipped through the same clipping frame.
In accordance with one or more embodiments of the present disclosure, example 12 provides the apparatus of example 8, the second determining module to:
Normalizing the length difference according to the length limited by the target cutting size to obtain a target length difference, normalizing the width difference according to the width limited by the target cutting size to obtain a target width difference, and normalizing the standard deviation according to the width or length of the initial cutting frame to obtain a target standard deviation;
and determining the clipping strategy judgment value according to the target length difference value, the target width difference value and the target standard deviation.
According to one or more embodiments of the present disclosure, example 13 provides the apparatus of any one of examples 8-12, the third determining module to:
For each frame picture in the N frame pictures of the original video, calculating a cost function according to the main content in the frame picture and the M candidate cutting frames corresponding to the frame picture, wherein the cost function is used for representing how much of the main content in the frame picture is included by the candidate cutting frames, and N and M are positive integers;
and determining a target cutting frame capable of comprising the most main content in N times M candidate cutting frames according to the calculation result of the cost function.
Example 14 provides the apparatus of example 13, according to one or more embodiments of the disclosure, further comprising:
the buffer module is used for determining the complete main body content, the length main body content along the length direction and the width main body content along the width direction of each frame picture of the original video, and buffering the complete main body content, the length main body content and the width main body content;
the third determining module 204 is configured to:
If the original video is cut along the length direction according to the size of the original video and the target cutting size, acquiring the complete main body content and the width main body content corresponding to the cached frame picture, and calculating the cost function according to the complete main body content, the width main body content and the M candidate cutting frames;
And if the original video is cut along the width direction according to the size of the original video and the target cutting size, acquiring the complete main body content and the length main body content corresponding to the cached frame picture, and calculating the cost function according to the complete main body content, the length main body content and the M candidate cutting frames.
According to one or more embodiments of the present disclosure, example 15 provides a computer-readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the method of any of examples 1 to 7.
Example 16 provides an electronic device according to one or more embodiments of the present disclosure, comprising:
A storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the method of any one of examples 1 to 7.
The foregoing description is only of the preferred embodiments of the present disclosure and a description of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in this disclosure is not limited to the specific combinations of the features described above, but also covers other embodiments which may be formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure, for example, embodiments formed by mutually substituting the features described above with technical features having similar functions disclosed in the present disclosure (but not limited thereto).
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (10)

1. A method of video cropping, the method comprising:
Acquiring an original video to be cut and a target cutting size;
determining an initial cutting frame according to the main body contents of a plurality of target video frames in the original video, wherein the initial cutting frame is a cutting frame capable of comprising the main body contents of the plurality of target video frames;
Determining whether to cut each frame of the original video through the same cutting frame according to the difference between the size of the initial cutting frame and the target cutting size and the change condition of main body contents in the target video frames;
If each frame of picture of the original video is determined to be cut through the same cutting frame, determining a target cutting frame capable of comprising the most main body content in the original video in a plurality of candidate cutting frames, wherein the size of each candidate cutting frame is consistent with the size of the initial cutting frame;
And clipping the original video according to the target clipping frame.
2. The method of claim 1, wherein cropping the original video according to the target crop box comprises:
And sending the size information and the position information of the target cutting frame to video playing equipment so that the video playing equipment cuts the original video according to the target cutting frame.
3. The method of claim 1, wherein determining an initial crop box from the body content of the plurality of target video frames in the original video comprises:
For each target video frame in the original video, determining a saliency detection result corresponding to the target video frame and a circumscribed rectangle corresponding to a maximum connected domain in the target video frame, wherein the saliency detection result is used for representing the distribution condition of the main content in the target video frame;
Taking the saliency detection result as a weight value, carrying out weighted summation on the width of the circumscribed rectangle corresponding to each target video frame to obtain a width weighted result, and taking the ratio of the width weighted result to the sum of the saliency detection results as the width of the initial cutting frame; and/or taking the saliency detection result as a weight value, carrying out weighted summation on the length of the circumscribed rectangle corresponding to each target video frame to obtain a length weighted result, and taking the ratio of the length weighted result to the sum of the saliency detection results as the length of the initial cutting frame.
4. The method of claim 1, wherein determining whether to clip each frame of the original video with the same clipping frame based on a difference between the size of the initial clipping frame and the target clipping size and a change in content of a main body in the plurality of target video frames comprises:
Determining a length difference value between the length of the initial cutting frame and the length defined by the target cutting size, a width difference value between the width of the initial cutting frame and the width defined by the target cutting size, and a standard deviation of the circumscribed rectangle corresponding to the maximum connected domain in each target video frame;
determining a clipping strategy judgment value according to the length difference value, the width difference value and the standard deviation;
If the clipping strategy judgment value is smaller than a preset threshold value, determining that each frame of the original video is clipped through the same clipping frame, and if the clipping strategy judgment value is larger than or equal to the preset threshold value, determining that each frame of the original video is not clipped through the same clipping frame.
5. The method of claim 4, wherein determining a clipping policy decision value based on the length difference value, the width difference value, and the standard deviation comprises:
Normalizing the length difference according to the length limited by the target cutting size to obtain a target length difference, normalizing the width difference according to the width limited by the target cutting size to obtain a target width difference, and normalizing the standard deviation according to the width or length of the initial cutting frame to obtain a target standard deviation;
and determining the clipping strategy judgment value according to the target length difference value, the target width difference value and the target standard deviation.
6. The method of any of claims 1-5, wherein the determining a target crop box that can include the most subject content in the original video from among a plurality of candidate crop boxes comprises:
For each frame picture in the N frame pictures of the original video, calculating a cost function according to the main content in the frame picture and the M candidate cutting frames corresponding to the frame picture, wherein the cost function is used for representing how much of the main content in the frame picture is included by the candidate cutting frames, and N and M are positive integers;
and determining a target cutting frame capable of comprising the most main content in N times M candidate cutting frames according to the calculation result of the cost function.
7. The method of claim 6, wherein the method further comprises:
For each frame of the original video, determining the complete main content, the length main content along the length direction and the width main content along the width direction of the frame, and caching the complete main content, the length main content and the width main content;
the calculating the cost function according to the main body content in the frame picture and the M preset candidate clipping frames corresponding to the frame picture includes:
If the original video is cut along the length direction according to the size of the original video and the target cutting size, acquiring the complete main body content and the width main body content corresponding to the cached frame picture, and calculating the cost function according to the complete main body content, the width main body content and the M candidate cutting frames;
And if the original video is cut along the width direction according to the size of the original video and the target cutting size, acquiring the complete main body content and the length main body content corresponding to the cached frame picture, and calculating the cost function according to the complete main body content, the length main body content and the M candidate cutting frames.
8. A video cropping device, the device comprising:
the acquisition module is used for acquiring an original video to be cut and a target cutting size;
The first determining module is used for determining an initial cutting frame according to the main body contents of a plurality of target video frames in the original video, wherein the initial cutting frame is a cutting frame capable of comprising the main body contents of the plurality of target video frames;
The second determining module is used for determining whether each frame picture of the original video is cut through the same cutting frame according to the difference condition between the size of the initial cutting frame and the target cutting size and the change condition of main body content in the plurality of target video frames;
A third determining module, configured to determine, when determining that each frame of the original video is cropped by the same cropping frame, a target cropping frame capable of including the most main content in the original video among a plurality of candidate cropping frames, where a size of each of the candidate cropping frames is consistent with a size of the initial cropping frame;
And the clipping module is used for clipping the original video according to the target clipping frame.
9. A computer readable medium on which a computer program is stored, characterized in that the program, when being executed by a processing device, carries out the steps of the method according to any one of claims 1-7.
10. An electronic device, comprising:
A storage device having a computer program stored thereon;
processing means for executing said computer program in said storage means to carry out the steps of the method according to any one of claims 1-7.
CN202011405847.XA 2020-12-02 2020-12-02 Video clipping method and device, storage medium and electronic equipment Active CN112561840B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011405847.XA CN112561840B (en) 2020-12-02 2020-12-02 Video clipping method and device, storage medium and electronic equipment
PCT/CN2021/134119 WO2022116947A1 (en) 2020-12-02 2021-11-29 Video cropping method and apparatus, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011405847.XA CN112561840B (en) 2020-12-02 2020-12-02 Video clipping method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112561840A CN112561840A (en) 2021-03-26
CN112561840B true CN112561840B (en) 2024-05-28

Family

ID=75048045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011405847.XA Active CN112561840B (en) 2020-12-02 2020-12-02 Video clipping method and device, storage medium and electronic equipment

Country Status (2)

Country Link
CN (1) CN112561840B (en)
WO (1) WO2022116947A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112561840B (en) * 2020-12-02 2024-05-28 北京有竹居网络技术有限公司 Video clipping method and device, storage medium and electronic equipment
CN115633255B (en) * 2021-08-31 2024-03-22 荣耀终端有限公司 Video processing method and electronic equipment
CN115049968B (en) * 2022-08-12 2022-11-11 武汉东信同邦信息技术有限公司 Dynamic programming video automatic cutting method, device, equipment and storage medium
CN115942046B (en) * 2022-12-08 2024-05-31 北京中科闻歌科技股份有限公司 Method for intelligently cutting video and storage medium
CN117082294B (en) * 2023-10-18 2024-02-02 广东视腾电子科技有限公司 Video all-in-one machine switching method and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006129101A (en) * 2004-10-29 2006-05-18 Casio Comput Co Ltd Method of trimming image, imaging apparatus, image processing unit and program
CN102124727A (en) * 2008-03-20 2011-07-13 无线电技术研究学院有限公司 A method of adapting video images to small screen sizes
CN105263049A (en) * 2015-10-28 2016-01-20 努比亚技术有限公司 Video cropping device based on frame coordinate, method and mobile terminal
CN106454407A (en) * 2016-10-25 2017-02-22 广州华多网络科技有限公司 Video live broadcast method and device
CN109523503A (en) * 2018-09-11 2019-03-26 北京三快在线科技有限公司 A kind of method and apparatus of image cropping
CN111031178A (en) * 2019-12-19 2020-04-17 维沃移动通信有限公司 Video stream clipping method and electronic equipment
CN111050204A (en) * 2019-12-27 2020-04-21 北京达佳互联信息技术有限公司 Video clipping method and device, electronic equipment and storage medium
CN111815645A (en) * 2020-06-23 2020-10-23 广州筷子信息科技有限公司 Method and system for cutting advertisement video picture
CN112016371A (en) * 2019-05-31 2020-12-01 广州市百果园信息技术有限公司 Face key point detection method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9626584B2 (en) * 2014-10-09 2017-04-18 Adobe Systems Incorporated Image cropping suggestion using multiple saliency maps
CN110189378B (en) * 2019-05-23 2022-03-04 北京奇艺世纪科技有限公司 Video processing method and device and electronic equipment
CN111881755B (en) * 2020-06-28 2022-08-23 腾讯科技(深圳)有限公司 Method and device for cutting video frame sequence
CN112561840B (en) * 2020-12-02 2024-05-28 北京有竹居网络技术有限公司 Video clipping method and device, storage medium and electronic equipment

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006129101A (en) * 2004-10-29 2006-05-18 Casio Comput Co Ltd Method of trimming image, imaging apparatus, image processing unit and program
CN102124727A (en) * 2008-03-20 2011-07-13 无线电技术研究学院有限公司 A method of adapting video images to small screen sizes
CN105263049A (en) * 2015-10-28 2016-01-20 努比亚技术有限公司 Video cropping device based on frame coordinate, method and mobile terminal
CN106454407A (en) * 2016-10-25 2017-02-22 广州华多网络科技有限公司 Video live broadcast method and device
CN109523503A (en) * 2018-09-11 2019-03-26 北京三快在线科技有限公司 A kind of method and apparatus of image cropping
CN112016371A (en) * 2019-05-31 2020-12-01 广州市百果园信息技术有限公司 Face key point detection method, device, equipment and storage medium
CN111031178A (en) * 2019-12-19 2020-04-17 维沃移动通信有限公司 Video stream clipping method and electronic equipment
CN111050204A (en) * 2019-12-27 2020-04-21 北京达佳互联信息技术有限公司 Video clipping method and device, electronic equipment and storage medium
CN111815645A (en) * 2020-06-23 2020-10-23 广州筷子信息科技有限公司 Method and system for cutting advertisement video picture

Also Published As

Publication number Publication date
WO2022116947A1 (en) 2022-06-09
CN112561840A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN112561840B (en) Video clipping method and device, storage medium and electronic equipment
CN112101305B (en) Multi-path image processing method and device and electronic equipment
WO2022105740A1 (en) Video processing method and apparatus, readable medium, and electronic device
CN112561839B (en) Video clipping method and device, storage medium and electronic equipment
US20240112299A1 (en) Video cropping method and apparatus, storage medium and electronic device
CN110188782B (en) Image similarity determining method and device, electronic equipment and readable storage medium
CN111915532B (en) Image tracking method and device, electronic equipment and computer readable medium
CN111783632B (en) Face detection method and device for video stream, electronic equipment and storage medium
CN110852242A (en) Watermark identification method, device, equipment and storage medium based on multi-scale network
CN110852250A (en) Vehicle weight removing method and device based on maximum area method and storage medium
CN115965520A (en) Special effect prop, special effect image generation method, device, equipment and storage medium
CN113033552B (en) Text recognition method and device and electronic equipment
CN114399696A (en) Target detection method and device, storage medium and electronic equipment
CN112418233A (en) Image processing method, image processing device, readable medium and electronic equipment
CN115086541A (en) Shooting position determining method, device, equipment and medium
CN113283436B (en) Picture processing method and device and electronic equipment
CN111583283B (en) Image segmentation method, device, electronic equipment and medium
CN113256659B (en) Picture processing method and device and electronic equipment
CN112651909B (en) Image synthesis method, device, electronic equipment and computer readable storage medium
CN112766285B (en) Image sample generation method and device and electronic equipment
CN112214187B (en) Water ripple image implementation method and device
CN112395826B (en) Text special effect processing method and device
CN117115139A (en) Endoscope video detection method and device, readable medium and electronic equipment
CN112906551A (en) Video processing method and device, storage medium and electronic equipment
CN115130608A (en) Multimedia content labeling method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant